DataFusion: Push down decimals with some explicit coercion #7919
Signed-off-by: Adam Gutglick <[email protected]>
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.020x ➖
datafusion / vortex-file-compressed (1.020x ➖, 0↑ 1↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.028x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.062x ➖, 0↑ 2↓)
datafusion / parquet (1.058x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.038x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.034x ➖, 0↑ 1↓)
duckdb / parquet (1.053x ➖, 0↑ 1↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.984x ➖, 5↑ 1↓)
datafusion / vortex-compact (1.005x ➖, 1↑ 1↓)
datafusion / parquet (1.016x ➖, 0↑ 1↓)
datafusion / arrow (0.897x ✅, 11↑ 2↓)
duckdb / vortex-file-compressed (0.978x ➖, 1↑ 0↓)
duckdb / vortex-compact (1.011x ➖, 0↑ 0↓)
duckdb / parquet (0.990x ➖, 1↑ 2↓)
duckdb / duckdb (0.974x ➖, 1↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.974x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.970x ➖, 3↑ 3↓)
datafusion / parquet (0.978x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 1↑ 2↓)
duckdb / vortex-compact (0.983x ➖, 4↑ 0↓)
duckdb / parquet (0.983x ➖, 3↑ 0↓)
duckdb / duckdb (0.990x ➖, 1↑ 1↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.062x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.996x ➖, 1↑ 1↓)
datafusion / parquet (1.064x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.028x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.101x ➖, 0↑ 1↓)
duckdb / parquet (1.021x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.973x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.996x ➖, 0↑ 0↓)
duckdb / parquet (0.988x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.997x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.043x ➖, 0↑ 1↓)
datafusion / parquet (1.066x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.022x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.001x ➖, 0↑ 0↓)
duckdb / parquet (1.014x ➖, 0↑ 0↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.009x ➖, 0↑ 1↓)
datafusion / parquet (0.964x ➖, 7↑ 1↓)
duckdb / vortex-file-compressed (0.999x ➖, 1↑ 0↓)
duckdb / parquet (1.005x ➖, 1↑ 2↓)
duckdb / duckdb (1.012x ➖, 1↑ 3↓)
File Sizes: Clickbench on NVME
File size changes: 1 file changed, -0.0% overall (0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.121x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.091x ➖, 0↑ 2↓)
datafusion / parquet (0.935x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.103x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.009x ➖, 0↑ 0↓)
duckdb / parquet (1.017x ➖, 0↑ 1↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.988x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.997x ➖, 0↑ 0↓)
datafusion / parquet (1.003x ➖, 0↑ 0↓)
datafusion / arrow (0.915x ➖, 7↑ 0↓)
duckdb / vortex-file-compressed (1.009x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.008x ➖, 0↑ 0↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
duckdb / duckdb (0.998x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Merging this PR will improve performance by 20.27%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | new_bp_prim_test_between[i64, 16384] | 144.4 µs | 115.1 µs | +25.45% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 32768] | 236.7 µs | 178 µs | +32.97% |
| ⚡ | Simulation | new_bp_prim_test_between[i16, 32768] | 134.1 µs | 120.2 µs | +11.58% |
| ⚡ | Simulation | new_alp_prim_test_between[f64, 16384] | 148.8 µs | 126.9 µs | +17.22% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 16384] | 109.1 µs | 94.8 µs | +15.13% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 32768] | 169.9 µs | 141.1 µs | +20.46% |
Comparing adamg/df-decimal (40ca877) with develop (7349cd6)
Footnote: 24 benchmarks were skipped, so the baseline results were used for them instead.
Summary
Allow pushing down expressions on decimals, together with the casts they need, to bridge a difference in semantics between DataFusion and Vortex.
The underlying issue is that DataFusion accepts decimals of different types (different precision and scale) within a single expression, while Vortex does not. To bridge this gap, we inject explicit cast expressions wherever they are required.
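The coercion step can be pictured with a minimal Rust sketch. It finds a common decimal type for the two operands of a binary expression, then wraps any operand whose type differs from it in an explicit cast. Every name here is hypothetical: `DecimalType`, `Expr`, `common_decimal_type`, `coerce_operands` and `rescale_up` are illustrations, not the Vortex or DataFusion APIs. The widening rule is also an assumption. It keeps the larger scale and the larger number of integer digits, capped at 38 digits of precision, which is modeled on common Decimal128 coercion rules and is not necessarily what this PR implements.

```rust
/// Hypothetical descriptor for a 128-bit decimal type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DecimalType {
    precision: u8, // total number of significant digits
    scale: u8,     // digits after the decimal point (assumed non-negative)
}

/// Maximum precision of a 128-bit decimal (as in Arrow's Decimal128).
const MAX_PRECISION: u8 = 38;

/// Assumed widening rule: keep the larger scale and the larger number of
/// integer digits, capping the resulting precision at MAX_PRECISION.
fn common_decimal_type(a: DecimalType, b: DecimalType) -> DecimalType {
    let scale = a.scale.max(b.scale);
    let int_digits = a
        .precision
        .saturating_sub(a.scale)
        .max(b.precision.saturating_sub(b.scale));
    let precision =
        (u16::from(int_digits) + u16::from(scale)).min(u16::from(MAX_PRECISION)) as u8;
    DecimalType { precision, scale }
}

/// Hypothetical expression tree with only the nodes this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A reference to a decimal column.
    Column(String, DecimalType),
    /// An explicit cast of a decimal expression to another decimal type.
    Cast(Box<Expr>, DecimalType),
}

impl Expr {
    fn data_type(&self) -> DecimalType {
        match self {
            Expr::Column(_, ty) | Expr::Cast(_, ty) => *ty,
        }
    }
}

/// Wrap `expr` in a cast only when its type differs from `target`.
fn cast_if_needed(expr: Expr, target: DecimalType) -> Expr {
    if expr.data_type() == target {
        expr
    } else {
        Expr::Cast(Box::new(expr), target)
    }
}

/// Coerce both operands of a binary decimal expression to one type, so an
/// engine that requires matching operand types can evaluate it.
fn coerce_operands(lhs: Expr, rhs: Expr) -> (Expr, Expr) {
    let target = common_decimal_type(lhs.data_type(), rhs.data_type());
    (cast_if_needed(lhs, target), cast_if_needed(rhs, target))
}

/// Rescale an unscaled decimal value to a larger scale; None on overflow
/// or if `to_scale` is smaller than `from_scale`.
fn rescale_up(unscaled: i128, from_scale: u8, to_scale: u8) -> Option<i128> {
    let diff = to_scale.checked_sub(from_scale)?;
    let factor = 10i128.checked_pow(u32::from(diff))?;
    unscaled.checked_mul(factor)
}

fn main() {
    let price = Expr::Column("price".to_string(), DecimalType { precision: 10, scale: 2 });
    let limit = Expr::Column("limit".to_string(), DecimalType { precision: 5, scale: 0 });
    let (lhs, rhs) = coerce_operands(price, limit);
    // `price` already has the common type; `limit` is wrapped in a cast.
    println!("lhs = {lhs:?}");
    println!("rhs = {rhs:?}");

    // 12.5 at scale 1 is stored as 125; at scale 2 it must become 1250 (12.50).
    println!("{:?}", rescale_up(125, 1, 2));
}
```

The cast is not a formality. Decimals are stored as unscaled integers, so 12.5 at scale 1 is `125` while 12.50 at scale 2 is `1250`. Comparing the raw values without first rescaling one side would give the wrong answer.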