[FLINK-39421][table] Fix metadata filter contract and add coverage #28162
Open
jnh5y wants to merge 1 commit into
Conversation
Repair the metadata filter push-down contract introduced in FLINK-39421
so that connectors can express arbitrary subset acceptance through
MetadataFilterResult, and add a coverage precondition + integration
tests.
Rule changes (PushFilterIntoTableSourceScanRule):
- Replace position-based slicing (accepted-as-prefix, remaining-as-suffix)
with identity-based correlation. Each ResolvedExpression in accepted/
remaining is looked up by instance identity to recover the original
RexNode (one in metadata-key index space for spec storage, one in
scan-row index space for the runtime Calc above the scan). The
ResolvedExpression list is immutable, so identity round-trip is the
natural correspondence.
- Add a coverage precondition: every input ResolvedExpression must
appear in accepted, remaining, or both. Dropping a predicate from
both lists silently produces wrong results — the predicate would be
neither pushed nor preserved.
- Widen the early-return so the rule terminates when no buckets can
usefully push (physical predicates can't be pushed AND metadata
predicates are empty), avoiding wasted Hep iterations.
- Route a metadata-only predicate that cannot currently be pushed
(source does not support metadata push-down, or the spec was already
attached) directly to the runtime Calc rather than the physical path.
Routing it physically produced a stale FilterPushDownSpec with
out-of-range RexInputRef indices that crashed compiled-plan restore.
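The identity-based correlation and the coverage precondition described above can be sketched together as follows. `Expr` and `Rex` are hypothetical stand-ins for `ResolvedExpression` and `RexNode`, and the method names are illustrative, not the actual rule's API:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for ResolvedExpression and RexNode.
class Expr {}
class Rex {}

class MetadataFilterCorrelation {
    // Index each resolved expression by instance identity so that any
    // expression the source hands back (in accepted or remaining) maps to
    // the RexNode it was originally translated from.
    static Map<Expr, Rex> indexByIdentity(List<Expr> resolved, List<Rex> originals) {
        Map<Expr, Rex> byIdentity = new IdentityHashMap<>();
        for (int i = 0; i < resolved.size(); i++) {
            byIdentity.put(resolved.get(i), originals.get(i));
        }
        return byIdentity;
    }

    // Coverage precondition: every input predicate must reappear, by
    // instance identity, in accepted, remaining, or both; a predicate
    // missing from both would be neither pushed nor preserved.
    static boolean everyInputCovered(List<Expr> inputs, List<Expr> accepted, List<Expr> remaining) {
        Set<Expr> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(accepted);
        seen.addAll(remaining);
        return inputs.stream().allMatch(seen::contains);
    }
}
```

Because the lookups use `IdentityHashMap`, two structurally equal predicates at different positions are still distinguished, which is what makes non-contiguous subset acceptance unambiguous.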
Spec changes (MetadataFilterPushDownSpec):
- apply() enforces the same identity-based round-trip on compiled-plan
restore — every previously-accepted predicate must come back from the
source, identified by ResolvedExpression instance identity.
- needAdjustFieldReferenceAfterProjection() now returns false. The spec
stores predicates in metadata-row index space against a self-contained
predicateRowType used at restore; narrowing the scan via
ProjectPushDownSpec does not invalidate them, so ScanReuser is now
free to reuse a scan when two queries differ only in projection but
share the same metadata filter.
Helper change (FilterPushDownSpec):
- resolvePredicates returns a ResolvedPredicates struct carrying both
the resolved expressions and an identity-keyed reverse map from each
resolved expression to the input RexNode it came from. Used by the
rule and by MetadataFilterPushDownSpec.apply().
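A plausible shape for the ResolvedPredicates struct is a small value type pairing the expressions with an identity-keyed reverse map; the field and factory names below are guesses for illustration, not the actual FilterPushDownSpec API:

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch. E stands in for ResolvedExpression, R for RexNode.
// The reverse map must be identity-keyed, because the rule and
// MetadataFilterPushDownSpec.apply() look expressions up by instance
// identity, not by equals().
record ResolvedPredicates<E, R>(List<E> expressions, Map<E, R> inputRexByExpression) {

    // Convenience factory pairing each resolved expression positionally
    // with the input RexNode it was resolved from.
    static <E, R> ResolvedPredicates<E, R> of(List<E> expressions, List<R> inputs) {
        Map<E, R> reverse = new IdentityHashMap<>();
        for (int i = 0; i < expressions.size(); i++) {
            reverse.put(expressions.get(i), inputs.get(i));
        }
        return new ResolvedPredicates<>(expressions, reverse);
    }
}
```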
Test infrastructure:
- TestValuesScanTableSourceWithWatermarkPushDown.copy() now propagates
enableMetadataFilterPushDown. Without this, copy() silently disables
metadata filter push-down for the watermark-enabled test source,
making the combination of watermark + metadata filter push-down
untestable through TestValuesTableFactory.
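The copy() fix follows the usual pattern of an ability flag that must survive copy(); the class below is a simplified stand-in for the test source, not the actual TestValuesScanTableSourceWithWatermarkPushDown:

```java
// Simplified stand-in: the planner copies sources during optimization, so
// any ability flag that copy() forgets is silently disabled on the copy.
class FakeSource {
    final boolean enableMetadataFilterPushDown;

    FakeSource(boolean enableMetadataFilterPushDown) {
        this.enableMetadataFilterPushDown = enableMetadataFilterPushDown;
    }

    FakeSource copy() {
        // The bug pattern: constructing the copy with a hard-coded false
        // drops the flag. The fix: propagate the current value.
        return new FakeSource(enableMetadataFilterPushDown);
    }
}
```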
Tests:
- New MetadataFilterInReadingMetadataTest cases:
* testBestEffortMetadataFilter — overlap pattern (accepted == remaining)
pushes metadataFilter onto the scan AND keeps a runtime Calc.
* testSubsetAcceptedNonPrefix — source accepts inputs at non-contiguous
positions; spec stores exactly the accepted predicates and the
runtime Calc carries exactly the rejected predicate.
* testSourceDroppingPredicateRaisesError — source returning empty
accepted AND empty remaining triggers the new coverage precondition.
- New MetadataFilterResultShapesITCase covers the four shapes a source
can return end-to-end through MiniCluster: accept-all, accept-none,
partial split, and best-effort overlap.
- DynamicTableSourceSpecSerdeTest spec3 round-trips MetadataFilterPushDownSpec
through JSON serde.
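The four shapes can be characterized purely from the sizes and overlap of the two returned lists; classify below is an illustrative helper for reasoning about the cases, not planner code:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ResultShapes {
    // Classifies a MetadataFilterResult-like pair of lists relative to the
    // inputs. Identity-based, matching the rule's correlation.
    static <T> String classify(List<T> inputs, List<T> accepted, List<T> remaining) {
        Set<T> acc = Collections.newSetFromMap(new IdentityHashMap<>());
        acc.addAll(accepted);
        boolean overlap = remaining.stream().anyMatch(acc::contains);
        if (accepted.size() == inputs.size() && remaining.isEmpty()) {
            return "accept-all";       // everything pushed, no runtime Calc
        }
        if (accepted.isEmpty()) {
            return "accept-none";      // nothing pushed, all predicates in Calc
        }
        if (overlap) {
            return "best-effort overlap"; // pushed AND re-checked at runtime
        }
        return "partial split";        // disjoint: some pushed, rest in Calc
    }
}
```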
Does this pull request potentially affect one of the following parts:
- Anything annotated with @Public(Evolving): no
Was generative AI tooling used to co-author this PR?
Generated-by: Claude (Opus 4.7)