feat(patch): add package- and diff-level patch sources#67
Open
Mikola Lysenko (mikolalysenko) wants to merge 2 commits into
Open
feat(patch): add package- and diff-level patch sources#67Mikola Lysenko (mikolalysenko) wants to merge 2 commits into
Mikola Lysenko (mikolalysenko) wants to merge 2 commits into
Conversation
Adds two new optional pathways to the socket-patch CLI alongside the
existing per-file blob path:
- Per-package archives at `.socket/packages/<uuid>.tar.gz` — a tarball
of patched files for a single patch, extracted in one shot.
- Per-file bsdiff archives at `.socket/diffs/<uuid>.tar.gz` — bsdiff
deltas that transform `before_hash` content into `after_hash` content.
The apply pipeline now tries sources in the order package → diff →
blob, falling through to the next on any failure. Every strategy
post-write-verifies the file's git-sha256 against `after_hash`, so the
existing safety invariant is unchanged.
A new `--download-mode {diff,package,file}` flag (default: `diff`)
controls what `apply`, `get`, `scan`, and `repair` fetch when local
artifacts are missing. The manifest schema is intentionally unchanged:
archives are keyed by patch UUID (already present in `PatchRecord`),
so legacy manifests keep working with no migration.
Highlights:
- New core modules `patch/diff.rs` (qbsdiff bspatch wrapper) and
`patch/package.rs` (tar+flate2 reader with path-traversal guards,
whitelist filtering against `expected_files`, and hard caps on
decompressed bytes / per-entry size / entry count to defuse
gzip-bomb and `Vec::with_capacity` allocation attacks).
- New `PatchSources` struct and `AppliedVia` enum in `patch/apply.rs`;
`apply_package_patch` takes a `PatchSources` and an optional UUID.
Passing `uuid = None` restores pre-2.2 blob-only behavior.
- `try_apply_from_diff` gates on the captured pre-apply `current_hash`
rather than `VerifyStatus`, so `--force` cannot drive a diff against
garbage content.
- `apply`'s offline guard now reports per-patch source availability
instead of a global blobs/diffs/packages bucket count.
- `ApiClient::fetch_diff(uuid)` and `fetch_package(uuid)` mirror
`fetch_blob(hash)`; a private `fetch_binary` helper deduplicates the
proxy/auth client split and 200/404/error handling.
- `DownloadMode` enum + `fetch_missing_sources` in `api/blob_fetcher.rs`
dispatch downloads by kind. `cleanup_unused_archives` in
`utils/cleanup_blobs.rs` reaps orphaned `.socket/packages/` and
`.socket/diffs/` files via `repair`.
Tests: 307 unit + 2 e2e gem (was 263 + 2 before this change). New
coverage spans diff round-trips, package extraction safety (traversal,
oversize-header, too-many-entries, decompression-bomb truncation),
fallback chain ordering (`via package`/`diff`/`blob`), force-mode +
diff regression, dry-run safety, UUID validation, and archive
download/cleanup helpers. All existing tests pass unchanged.
Server-side `/patch/diff/<uuid>` and `/patch/package/<uuid>` endpoints
are not live yet — 404 responses fall through gracefully to the file
blob path, so this PR ships safely ahead of server support.
Assisted-by: Claude Code:claude-opus-4-7
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Contributor
Author
|
Cursor (@cursor) review |
|
Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings. |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
All alerts resolved. Learn more about Socket for GitHub. This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored. Ignoring alerts on:
|
Contributor
Author
|
@socketsecurity-staging ignore-all |
- e2e_npm.rs: NPM_PURL is actually used by 5 assertions; drop the stale `#[allow(dead_code)]`. - maven_crawler.rs: remove `read_pom_in_dir`, an async helper that was never called and only existed under `#[allow(dead_code)]`. No behavior change. 307 tests still pass; cargo build clean. Assisted-by: Claude Code:opus-4-7
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds two new optional patch pathways to the socket-patch CLI alongside the existing per-file blob path:
.socket/packages/<uuid>.tar.gz— a tarball of patched files for a single patch, extracted in one shot..socket/diffs/<uuid>.tar.gz— bsdiff deltas that transformbefore_hashcontent intoafter_hashcontent.The apply pipeline tries sources in order package → diff → blob, falling through on any failure. Every strategy still post-write-verifies file git-sha256 against
after_hash, so the existing safety invariant is unchanged.A new
--download-mode {diff,package,file}flag (default:diff) controls whatapply,get,scan, andrepairfetch when local artifacts are missing.Why?
Backwards compatibility
PatchRecord.uuid, so legacy.socket/manifest.jsonfiles Just Work.PatchSources; tests proveuuid = Nonerestores the pre-2.2 behavior exactly./patch/diff/<uuid>and/patch/package/<uuid>endpoints are not live yet — 404s fall through gracefully to the file-blob path, so this is safe to ship ahead of server support.What's in the diff
patch/diff.rs—qbsdiff::Bspatchwrapper plus round-trip + malformed-delta tests.patch/package.rs— tar + flate2 reader with path-traversal guards, whitelist filtering againstexpected_files, and hard caps on cumulative decompressed bytes (MAX_TOTAL_DECOMPRESSED_BYTES = 64 MiB), per-entry size (MAX_ENTRY_BYTES = 16 MiB), and entry count (MAX_ENTRIES = 10_000) to defuse gzip-bomb /Vec::with_capacityDoS.patch/apply.rs:PatchSourcesstruct,AppliedViaenum,try_apply_from_archive/try_apply_from_diffhelpers;apply_package_patchtakes aPatchSourcesand optionaluuid.try_apply_from_diffgates on the captured pre-applycurrent_hashso--forcecannot run a diff against unexpected content.api/client.rs:fetch_diff(uuid)andfetch_package(uuid)mirrorfetch_blob(hash)via a new privatefetch_binaryhelper that dedupes the proxy/auth client split and HTTP handling.is_valid_uuidvalidator added.api/blob_fetcher.rs:DownloadModeenum +fetch_missing_sourcesdispatch by kind.get_missing_archivesparallelsget_missing_blobs.utils/cleanup_blobs.rs:cleanup_unused_archivesreaps orphan.tar.gzs in.socket/packages/and.socket/diffs/, invoked fromrepair.apply,get,scan,repairall accept--download-mode;apply's offline guard now reports per-patch source availability (one bucket per PURL) rather than a global blobs/diffs/packages count.qbsdiff = "1",tar = "0.4",flate2 = "1".Tests
via package/diff/blob), force-mode + diff regression (test_apply_via_diff_falls_through_when_before_hash_mismatch), dry-run safety, UUID validation, archive download/cleanup helpers.cargo clippy --all-targets --workspaceis clean for code I touched (4 pre-existingmap_orwarnings in e2e test files remain).Review summary
Pre-merge security + quality review run by automated reviewers. Findings:
CRITICAL fixed: tarball decompression bomb (CWE-409) — added cumulative-bytes cap via
Read::take, per-entry size check, entry-count cap, andVec::with_capacityclamp.HIGH fixed: force-mode + diff-strategy loophole (now gated on real
current_hash == before_hash), offline guard rewritten per-patch, empty-archive-fetch round eliminated when blobs are already local,cleanup_unused_archivesdoc reconciled with behavior.Three new bomb-defense tests verify each cap fires before allocation.
Test plan
cargo test --workspace(307 unit + 2 e2e gem, 0 failures)cargo clippy --all-targets --workspace(no new warnings)cargo build --releaseapply --offline --verbosereports(via blob)and content matchesapply --offlinereports(via package)Assisted-by: Claude Code:claude-opus-4-7