[ci] More CI critical-path length optimizations#13352
Conversation
This commit introduces three different optimizations:
1. Only run `cargo check [..]` with MSRV instead of the whole test suite, based on the assumption that all possible failures would be identified statically.
2. Reduce the work done for MPK builds.
3. Shard the `wasmtime-cli` job into three buckets that run in parallel, at the cost of rebuilding the binaries multiple times.

We'll see how kind the real world is to the theoretical analysis here.
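The bucket assignment for the sharded job can be thought of as a simple round-robin split of the test targets. A hypothetical Python sketch of that idea (the bucket count is from the description above, but the target names and assignment strategy are illustrative, not necessarily what the PR does):

```python
def shard(targets, n_buckets):
    """Assign each test target to a bucket round-robin, so every
    bucket ends up with roughly the same number of targets."""
    buckets = [[] for _ in range(n_buckets)]
    for i, target in enumerate(sorted(targets)):
        buckets[i % n_buckets].append(target)
    return buckets

# Illustrative target names only; the real list comes from `cargo metadata`.
targets = ["all", "wast", "disas", "cli_tests", "host_segfault"]
for bucket in shard(targets, 3):
    print(bucket)
```

Each bucket then becomes one matrix entry, trading duplicate build work for parallel test execution.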
Most importantly, change the MPK availability check to use the same mechanism Wasmtime itself uses, to prevent potential future drift. prtest:full
Specifically:
- only install rustc wasm targets for jobs that need them
- only check out submodules for jobs that need them
- only install the `rust-src` component for `-min` builds
The `Miri (wasmtime-cli)` job is currently one of the longest for full test runs, so sharding it should (together with other measures) help reduce wall-clock duration.
These jobs are regularly the longest-running on full CI test runs by a few minutes, and this should hopefully shave a few of those off. With a bit of luck, wall-clock duration drops a bit below 20 minutes. This comes at the cost of a bunch of duplicate work, so it will only be worth it if the wins are sizable enough. prtest:full
Previously, newly introduced test targets for `wasmtime-cli` would silently be ignored in CI. Now, `cargo metadata` is used to identify and run them all. This is applied both for the normal test jobs and for miri.
```sh
# Dynamically compute --test targets for wasmtime-cli that are:
# - standard harness (custom-harness tests can't run under nextest)
# - not "all" (has its own dedicated Miri bucket)
# This uses cargo metadata for the full target list and Cargo.toml for
# the harness flag, so new [[test]] entries are picked up automatically.
tests=$(python3 -c "
import tomllib, json, subprocess
# Get all test targets from cargo metadata
meta = json.loads(subprocess.check_output(['cargo', 'metadata', '--format-version=1', '--no-deps']))
pkg = next(p for p in meta['packages'] if p['name'] == 'wasmtime-cli')
all_tests = {t['name'] for t in pkg['targets'] if 'test' in t['kind']}
# Find custom-harness tests from Cargo.toml
with open('Cargo.toml', 'rb') as f:
    manifest = tomllib.load(f)
custom_harness = {t['name'] for t in manifest.get('test', []) if not t.get('harness', True)}
# Exclude custom-harness tests and 'all' (has its own bucket)
excluded = custom_harness | {'all'}
for t in sorted(all_tests - excluded):
    print(f'--test {t}')
")
```
While I've no doubt this works, personally I dread this sort of CI configuration, a "giant wad of script/shell/etc in yml", as it's generally quite difficult to reproduce and/or onerous to work with locally. Especially if this is duplicated with the (already quite large) build-test-matrix.js logic, I feel like we're not really getting much bang for our buck by refactoring/shuffling tests.
```js
// Expand each platform entry into two jobs: one for the CLI binary and one for
// the C API. This lets them build in parallel on separate runners.
const expanded = [];
for (const entry of builds) {
  for (const component of ["cli", "capi"]) {
    expanded.push(Object.assign({}, entry, { component }));
  }
}

console.log(JSON.stringify(expanded));
```
Personally I'm wary of continuing to fork jobs, as I feel like we're already more-or-less at the limit, even at 500 concurrency. A single run already spawns 100+ jobs for all the platforms/checks/etc, and doubling all release jobs will probably push this closer to 200. That can be a pretty big problem for things like security advisories, where we do multiple CI runs at once -- that already takes hours -- and continuing to fork things will push that time even higher.
Personally I feel like 20-30m is pretty reasonable for CI for Wasmtime at this time. Shaving a few minutes off here and there while drastically increasing total CPU time doesn't feel worth it to me.
I'll also point out that the increase in complexity of all the shell scripts here is pretty significant. I already loathe shell scripts and I'm always wary of making them even more complicated. Basically my personal complexity threshold for a shell script is quite low, and I feel that the additions here definitely push it over the edge.
As mentioned in the PR description, I wasn't at all sure whether these changes would be worth it, yeah. I personally think the complexity is manageable, but I certainly agree that it comes at a pretty steep cost. And it seems like the "quick" check improvements are more important anyway. I'll drop the changes here except for the ones that don't introduce tradeoffs (which probably means dropping all but the first commit, which just eliminates needless work).
If the goal is primarily to get CI to complete faster, I think it'd be reasonable to specialize entirely on that, for example having just two jobs that run `cargo test --test all` and `cargo test --test wast` or something like that. That way we wouldn't have to worry about losing testing (other builders wouldn't be affected), nor would we have to worry about the long pole (both just run a single test suite). We could then turn down some testing automatically done in CI from the other matrix entries (e.g. skip the `wasmtime` job by default) or something like that.
These ones are a bit more speculative than the ones in #13340, and are meant to shorten the wall-clock duration of full CI runs, not just PR runs.
More speculative because at least the last commit, which parallelizes release builds of the wasmtime CLI and libwasmtime, comes at the cost of a fairly high degree of duplicated work per job, so we'll have to assess whether the reduction in wall-clock time is worth it.
Stacked on top of #13340.