Skip to content

feat(patch): add package- and diff-level patch sources#67

Open
Mikola Lysenko (mikolalysenko) wants to merge 2 commits into
mainfrom
feat/diff-package-pathways
Open

feat(patch): add package- and diff-level patch sources#67
Mikola Lysenko (mikolalysenko) wants to merge 2 commits into
mainfrom
feat/diff-package-pathways

Conversation

@mikolalysenko
Copy link
Copy Markdown
Contributor

Adds two new optional patch pathways to the socket-patch CLI alongside the existing per-file blob path:

  • Per-package archives at .socket/packages/<uuid>.tar.gz — a tarball of patched files for a single patch, extracted in one shot.
  • Per-file bsdiff archives at .socket/diffs/<uuid>.tar.gz — bsdiff deltas that transform before_hash content into after_hash content.

The apply pipeline tries sources in order package → diff → blob, falling through on any failure. Every strategy still post-write-verifies file git-sha256 against after_hash, so the existing safety invariant is unchanged.

A new --download-mode {diff,package,file} flag (default: diff) controls what apply, get, scan, and repair fetch when local artifacts are missing.

Why?

  • Smaller downloads — a one-line fix to a large minified bundle no longer ships the whole file (diff mode), and multi-file patches can ship as one archive (package mode).
  • One round-trip per patch instead of one per file, for users with many patches applied.

Backwards compatibility

  • Manifest schema is unchanged. Archives are keyed by the existing PatchRecord.uuid, so legacy .socket/manifest.json files Just Work.
  • The new pathways are opt-in via PatchSources; tests prove uuid = None restores the pre-2.2 behavior exactly.
  • Server-side /patch/diff/<uuid> and /patch/package/<uuid> endpoints are not live yet — 404s fall through gracefully to the file-blob path, so this is safe to ship ahead of server support.

What's in the diff

  • New core modules:
    • patch/diff.rsqbsdiff::Bspatch wrapper plus round-trip + malformed-delta tests.
    • patch/package.rs — tar + flate2 reader with path-traversal guards, whitelist filtering against expected_files, and hard caps on cumulative decompressed bytes (MAX_TOTAL_DECOMPRESSED_BYTES = 64 MiB), per-entry size (MAX_ENTRY_BYTES = 16 MiB), and entry count (MAX_ENTRIES = 10_000) to defuse gzip-bomb / Vec::with_capacity DoS.
  • patch/apply.rs: PatchSources struct, AppliedVia enum, try_apply_from_archive / try_apply_from_diff helpers; apply_package_patch takes a PatchSources and optional uuid. try_apply_from_diff gates on the captured pre-apply current_hash so --force cannot run a diff against unexpected content.
  • api/client.rs: fetch_diff(uuid) and fetch_package(uuid) mirror fetch_blob(hash) via a new private fetch_binary helper that dedupes the proxy/auth client split and HTTP handling. is_valid_uuid validator added.
  • api/blob_fetcher.rs: DownloadMode enum + fetch_missing_sources dispatch by kind. get_missing_archives parallels get_missing_blobs.
  • utils/cleanup_blobs.rs: cleanup_unused_archives reaps orphan .tar.gzs in .socket/packages/ and .socket/diffs/, invoked from repair.
  • CLI: apply, get, scan, repair all accept --download-mode; apply's offline guard now reports per-patch source availability (one bucket per PURL) rather than a global blobs/diffs/packages count.
  • Cargo deps added: qbsdiff = "1", tar = "0.4", flate2 = "1".

Tests

  • 307 unit tests + 2 e2e gem pass (was 263 + 2). 42 new tests added across modules.
  • New coverage: diff round-trips (text + binary), package extraction safety (traversal, oversize-header, too-many-entries, decompression-bomb truncation), fallback ordering (via package/diff/blob), force-mode + diff regression (test_apply_via_diff_falls_through_when_before_hash_mismatch), dry-run safety, UUID validation, archive download/cleanup helpers.
  • All previously existing tests pass unchanged.
  • cargo clippy --all-targets --workspace is clean for code I touched (4 pre-existing map_or warnings in e2e test files remain).

Review summary

Pre-merge security + quality review run by automated reviewers. Findings:

CRITICAL fixed: tarball decompression bomb (CWE-409) — added cumulative-bytes cap via Read::take, per-entry size check, entry-count cap, and Vec::with_capacity clamp.
HIGH fixed: force-mode + diff-strategy loophole (now gated on real current_hash == before_hash), offline guard rewritten per-patch, empty-archive-fetch round eliminated when blobs are already local, cleanup_unused_archives doc reconciled with behavior.

Three new bomb-defense tests verify each cap fires before allocation.

Test plan

  • cargo test --workspace (307 unit + 2 e2e gem, 0 failures)
  • cargo clippy --all-targets --workspace (no new warnings)
  • cargo build --release
  • Smoke test: blob-only manifest → apply --offline --verbose reports (via blob) and content matches
  • Smoke test: manifest + package archive → apply --offline reports (via package)
  • Smoke test: offline + no sources → fails with per-patch report listing affected PURL

Assisted-by: Claude Code:claude-opus-4-7

Adds two new optional pathways to the socket-patch CLI alongside the
existing per-file blob path:

- Per-package archives at `.socket/packages/<uuid>.tar.gz` — a tarball
  of patched files for a single patch, extracted in one shot.
- Per-file bsdiff archives at `.socket/diffs/<uuid>.tar.gz` — bsdiff
  deltas that transform `before_hash` content into `after_hash` content.

The apply pipeline now tries sources in the order package → diff →
blob, falling through to the next on any failure. Every strategy
post-write-verifies the file's git-sha256 against `after_hash`, so the
existing safety invariant is unchanged.

A new `--download-mode {diff,package,file}` flag (default: `diff`)
controls what `apply`, `get`, `scan`, and `repair` fetch when local
artifacts are missing. The manifest schema is intentionally unchanged:
archives are keyed by patch UUID (already present in `PatchRecord`),
so legacy manifests keep working with no migration.

Highlights:

- New core modules `patch/diff.rs` (qbsdiff bspatch wrapper) and
  `patch/package.rs` (tar+flate2 reader with path-traversal guards,
  whitelist filtering against `expected_files`, and hard caps on
  decompressed bytes / per-entry size / entry count to defuse
  gzip-bomb and `Vec::with_capacity` allocation attacks).
- New `PatchSources` struct and `AppliedVia` enum in `patch/apply.rs`;
  `apply_package_patch` takes a `PatchSources` and an optional UUID.
  Passing `uuid = None` restores pre-2.2 blob-only behavior.
- `try_apply_from_diff` gates on the captured pre-apply `current_hash`
  rather than `VerifyStatus`, so `--force` cannot drive a diff against
  garbage content.
- `apply`'s offline guard now reports per-patch source availability
  instead of a global blobs/diffs/packages bucket count.
- `ApiClient::fetch_diff(uuid)` and `fetch_package(uuid)` mirror
  `fetch_blob(hash)`; a private `fetch_binary` helper deduplicates the
  proxy/auth client split and 200/404/error handling.
- `DownloadMode` enum + `fetch_missing_sources` in `api/blob_fetcher.rs`
  dispatch downloads by kind. `cleanup_unused_archives` in
  `utils/cleanup_blobs.rs` reaps orphaned `.socket/packages/` and
  `.socket/diffs/` files via `repair`.

Tests: 307 unit + 2 e2e gem (was 263 + 2 before this change). New
coverage spans diff round-trips, package extraction safety (traversal,
oversize-header, too-many-entries, decompression-bomb truncation),
fallback chain ordering (`via package`/`diff`/`blob`), force-mode +
diff regression, dry-run safety, UUID validation, and archive
download/cleanup helpers. All existing tests pass unchanged.

Server-side `/patch/diff/<uuid>` and `/patch/package/<uuid>` endpoints
are not live yet — 404 responses fall through gracefully to the file
blob path, so this PR ships safely ahead of server support.

Assisted-by: Claude Code:claude-opus-4-7
@socket-security
Copy link
Copy Markdown

socket-security Bot commented May 12, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​flate2@​1.1.910010093100100
Addedcargo/​qbsdiff@​1.4.49710093100100
Addedcargo/​tar@​0.4.4510010093100100

View full report

@mikolalysenko
Copy link
Copy Markdown
Contributor Author

Cursor (@cursor) review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 12, 2026

Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings.

@socket-security-staging
Copy link
Copy Markdown

socket-security-staging Bot commented May 12, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​flate2@​1.1.910010093100100
Addedcargo/​qbsdiff@​1.4.49710093100100
Addedcargo/​tar@​0.4.4510010093100100

View full report

@socket-security-staging
Copy link
Copy Markdown

socket-security-staging Bot commented May 12, 2026

All alerts resolved. Learn more about Socket for GitHub.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

Ignoring alerts on:

View full report

@mikolalysenko
Copy link
Copy Markdown
Contributor Author

@socketsecurity-staging ignore-all

- e2e_npm.rs: NPM_PURL is actually used by 5 assertions; drop the
  stale `#[allow(dead_code)]`.
- maven_crawler.rs: remove `read_pom_in_dir`, an async helper that
  was never called and only existed under `#[allow(dead_code)]`.

No behavior change. 307 tests still pass; cargo build clean.

Assisted-by: Claude Code:opus-4-7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant