Refactor parquet row filter setup#22191
Conversation
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
run benchmark clickbench_partitioned |
|
run benchmarks clickbench_partitioned |
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing xudong963/prune-page-index-followup (d6ac2fb) to 3f501f4 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
Benchmark follow-up: I reran |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
@xudong963 I left you some suggestions in xudong963#6. It goes a bit further but I think leaves things in a nicer state and is related enough to the other refactors to make sense to bundle. |
alamb
left a comment
There was a problem hiding this comment.
Thanks @xudong963 -- this looks good to me
I thnk @adriangb's xudong963#6 also looks good, but we could also potentially do that as a follow on
| Ok(prepared_access_plan) | ||
| } | ||
|
|
||
| struct DecoderBuilderConfig<'a> { |
There was a problem hiding this comment.
It would be nice to add some comments here explaining what this represnets (namely the state needed to build the ParquetPushDecoder)
| decoder_limit: Option<usize>, | ||
| } | ||
|
|
||
| fn build_decoder_builder( |
There was a problem hiding this comment.
Is there a reason this isn't a method on DecoderBuilderConfig? That seems like a natural way to encapsulate the functionality
impl DecoderBuilderConfig {
fn build(self, prepared_access_plan, metadata) -> ParquetPushDecoderBuilder| /// Result of applying page-index pruning to a [`ParquetAccessPlan`]. | ||
| pub(crate) struct PagePruningResult { | ||
| pub(crate) access_plan: ParquetAccessPlan, | ||
| /// Pages skipped because the containing row group was fully matched by |
Which issue does this PR close?
N/A, follow-up to review comments from #21637.
Rationale for this change
PR #21637 added logic to skip row filters and page-index pruning for row groups that are fully matched by row-group statistics. The resulting opener code had a few local closures and manually tracked
Option<RowFilter>state, which made the scan setup harder to follow.This also restores the public
PagePruningAccessPlanFilter::prune_plan_with_page_indexreturn type toParquetAccessPlan. The extra fully-matched page count is still available internally for metrics without changing the public API.What changes are included in this PR?
RowFilterGenerator.PagePruningResultandprune_plan_with_page_index_and_metricsfor opener metrics.prune_plan_with_page_indexAPI returningParquetAccessPlan.Are these changes tested?
Yes.
Are there any user-facing changes?
No user-facing behavior changes. This preserves the existing public
prune_plan_with_page_indexreturn type.