duckdb: flatten runend arrays on export if requested #7951
Signed-off-by: Mikhail Kot <[email protected]>
Force-pushed from 733aaf4 to edf8146
Polar Signals Profiling Results (Latest Run)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.987x ➖
datafusion / vortex-file-compressed (0.987x ➖, 0↑ 0↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Merging this PR will degrade performance by 12.27%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | encode_varbin[(1000, 2)] | 148.5 µs | 169.2 µs | -12.27% |
Comparing myrrc/duckdb-runend-flatten (edf8146) with develop (254f91b)
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.990x ➖, 0↑ 0↓)
datafusion / parquet (0.973x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.954x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.955x ➖, 1↑ 0↓)
duckdb / parquet (0.923x ➖, 2↑ 0↓)
File Sizes: FineWeb NVMe
No file size changes detected.
    Ok(array) if !flatten => return run_end::new_exporter(array, cache, ctx),
    Ok(array) => array.into_array(), // .into_array() does flattening
add a new_exporter_with_flatten here?
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.963x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.968x ➖, 1↑ 0↓)
datafusion / parquet (1.006x ➖, 0↑ 1↓)
datafusion / arrow (0.991x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.961x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.943x ➖, 3↑ 0↓)
duckdb / parquet (0.993x ➖, 0↑ 0↓)
duckdb / duckdb (0.989x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.083x ➖, 2↑ 39↓)
datafusion / vortex-compact (1.075x ➖, 2↑ 19↓)
datafusion / parquet (1.085x ➖, 0↑ 34↓)
duckdb / vortex-file-compressed (1.039x ➖, 3↑ 16↓)
duckdb / vortex-compact (0.986x ➖, 13↑ 8↓)
duckdb / parquet (1.055x ➖, 1↑ 12↓)
duckdb / duckdb (1.078x ➖, 0↑ 32↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.024x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.119x ➖, 0↑ 3↓)
datafusion / parquet (1.157x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.073x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.031x ➖, 0↑ 0↓)
duckdb / parquet (1.078x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: Likely improvement (medium confidence)
duckdb / vortex-file-compressed (0.775x ✅, 7↑ 1↓)
duckdb / vortex-compact (0.689x ✅, 7↑ 1↓)
duckdb / parquet (1.018x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.831x ✅, 21↑ 0↓)
datafusion / vortex-compact (0.859x ✅, 20↑ 0↓)
datafusion / parquet (0.900x ➖, 10↑ 0↓)
datafusion / arrow (0.870x ✅, 19↑ 0↓)
duckdb / vortex-file-compressed (0.872x ✅, 17↑ 0↓)
duckdb / vortex-compact (0.872x ✅, 17↑ 0↓)
duckdb / parquet (0.954x ➖, 0↑ 0↓)
duckdb / duckdb (0.930x ➖, 1↑ 0↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.996x ➖, 0↑ 0↓)
datafusion / parquet (0.992x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.966x ➖, 6↑ 0↓)
duckdb / parquet (0.993x ➖, 1↑ 0↓)
duckdb / duckdb (1.028x ➖, 0↑ 1↓)
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.957x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.093x ➖, 0↑ 2↓)
datafusion / parquet (1.126x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (0.984x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.949x ➖, 0↑ 0↓)
duckdb / parquet (0.981x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.980x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.025x ➖, 0↑ 0↓)
datafusion / parquet (1.043x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (0.939x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.956x ➖, 0↑ 0↓)
duckdb / parquet (0.930x ➖, 0↑ 0↓)
The runend exporter had a bug: it did not flatten the array.
As a result it produced dictionary arrays instead of flat arrays, and DuckDB had
to re-flatten them, which caused a regression in statpopgen.
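
To illustrate the difference between the two export paths, here is a minimal standalone sketch. It is not the Vortex exporter: `RunEnd`, `Exported`, and `export` are hypothetical names invented for this example, and it assumes run ends are strictly increasing exclusive end offsets. With `flatten` set, each run value is repeated once per logical row; without it, the run values are kept as a dictionary and the engine receives a per-row index into them.

```rust
/// Hypothetical run-end encoded column (not the Vortex `RunEndArray`).
/// `values[i]` covers logical rows `ends[i - 1]..ends[i]`, with the first
/// run starting at row 0. `ends` is assumed to be strictly increasing.
pub struct RunEnd<T> {
    pub ends: Vec<usize>,
    pub values: Vec<T>,
}

/// Hypothetical shapes an exporter could hand to the query engine.
#[derive(Debug, PartialEq)]
pub enum Exported<T> {
    /// One entry per logical row.
    Flat(Vec<T>),
    /// The run values plus, for every row, the index of its run.
    Dictionary { values: Vec<T>, indices: Vec<u32> },
}

/// Export a run-end column, flattening it only if `flatten` is requested.
pub fn export<T: Clone>(array: &RunEnd<T>, flatten: bool) -> Exported<T> {
    assert_eq!(array.ends.len(), array.values.len());
    // The last run end is the logical length of the column.
    let len = array.ends.last().copied().unwrap_or(0);
    let mut start = 0;

    if flatten {
        // Repeat each run value for every row in its run.
        let mut out = Vec::with_capacity(len);
        for (&end, value) in array.ends.iter().zip(&array.values) {
            out.extend(std::iter::repeat(value.clone()).take(end - start));
            start = end;
        }
        Exported::Flat(out)
    } else {
        // Keep the run values and emit one dictionary index per row.
        let mut indices = Vec::with_capacity(len);
        for (run, &end) in array.ends.iter().enumerate() {
            indices.extend(std::iter::repeat(run as u32).take(end - start));
            start = end;
        }
        Exported::Dictionary {
            values: array.values.clone(),
            indices,
        }
    }
}
```

For a column with ends `[2, 5]` and values `[10, 20]`, the flattened export is `[10, 10, 20, 20, 20]`, while the non-flattened export keeps values `[10, 20]` with indices `[0, 0, 1, 1, 1]`. A consumer that needs flat data then has to do the expansion itself, which is the extra work the description attributes to DuckDB.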