feat(vm): replace time-based token refill with completion-based return#20618
Draft
guzalv wants to merge 2 commits into
The VM index report rate limiter previously refilled tokens at a fixed rate over time, which could leave capacity unused when processing was fast or overcommit when processing was slow. Tokens are now returned explicitly when pipeline processing completes (or when a message is dropped by deduplication), so available capacity always tracks actual processing throughput. This eliminates both wasted and overused capacity.

Key changes:
- Replace golang.org/x/time/rate per-client buckets with counter-based clientBucket tracking available/capacity
- Add Return(clientID, msg) to the Limiter and rateLimiter interface
- Wire Return into sensorEventHandler.handleMessages (after pipeline completes) and addMultiplexed (after dedup drop)
- Replace PerClientRate metric with InFlightTokens gauge
- Update env var documentation to reflect concurrency semantics

Co-Authored-By: Claude Opus 4.6 <[email protected]>
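For reviewers who want the mechanism in miniature, here is a minimal sketch of a completion-based bucket. It is not the code in this PR: the `Limiter` layout, the `NewLimiter` constructor, and the fixed per-client capacity are assumptions made for illustration, and `Return` here omits the `msg` argument and the rebalancing across clients that the real limiter has.

```go
package main

import (
	"fmt"
	"sync"
)

// clientBucket mirrors the available/capacity counters named in the commit
// message; the exact field layout is an assumption.
type clientBucket struct {
	available int
	capacity  int
}

// Limiter is a hypothetical, simplified stand-in for the PR's limiter.
type Limiter struct {
	mu       sync.Mutex
	capacity int
	buckets  map[string]*clientBucket
}

// NewLimiter is a hypothetical constructor: every client gets the same capacity.
func NewLimiter(capacity int) *Limiter {
	return &Limiter{capacity: capacity, buckets: make(map[string]*clientBucket)}
}

// TryConsume takes one token for clientID, or reports false if none are left.
func (l *Limiter) TryConsume(clientID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[clientID]
	if !ok {
		b = &clientBucket{available: l.capacity, capacity: l.capacity}
		l.buckets[clientID] = b
	}
	if b.available == 0 {
		return false
	}
	b.available--
	return true
}

// Return hands one token back when work completes, never exceeding capacity.
func (l *Limiter) Return(clientID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if b, ok := l.buckets[clientID]; ok && b.available < b.capacity {
		b.available++
	}
}

// InFlight reports how many tokens clientID currently holds.
func (l *Limiter) InFlight(clientID string) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	if b, ok := l.buckets[clientID]; ok {
		return b.capacity - b.available
	}
	return 0
}

func main() {
	l := NewLimiter(2)
	fmt.Println(l.TryConsume("sensor-a")) // true
	fmt.Println(l.TryConsume("sensor-a")) // true
	fmt.Println(l.TryConsume("sensor-a")) // false: two reports are in flight
	l.Return("sensor-a")                  // one report finished processing
	fmt.Println(l.TryConsume("sensor-a")) // true: the returned token is reused
}
```

Because no token reappears until work finishes, the number of in-flight reports per client can never exceed the capacity, regardless of how fast or slow processing is.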
Add E2E test report documenting the cluster comparison between time-based and completion-based rate limiters, reproduction guide, and benchmark tests that demonstrate the throughput difference at the limiter level. These files are intended for PR review only and should be removed before merging.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
🚀 Build Images Ready
Images are ready for commit 4ef3694. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-978-g4ef369487d
Description
Replace the VM index report rate limiter's time-based token refill mechanism (golang.org/x/time/rate.Limiter) with a completion-based token return. Tokens are no longer refilled at a fixed rate per second; they are returned to the bucket when processing completes. This turns the rate limiter into a concurrency limiter: the bucket capacity controls how many reports can be in flight simultaneously, and throughput naturally tracks Scanner V4's actual processing speed.

- TryConsume() decrements available tokens; Return() increments them back when work completes
- OnClientDisconnect reclaims all in-flight tokens for the disconnected client
- *Limiter)
- golang.org/x/time/rate dependency

Files changed
- pkg/rate/limiter.go: Replace golang.org/x/time/rate.Limiter with clientBucket; add Return()
- pkg/rate/metrics.go: Replace PerClientRate gauge with InFlightTokens gauge
- pkg/rate/limiter_test.go
- pkg/env/virtualmachine.go
- central/sensor/service/connection/connection_impl.go: Add Return() to rateLimiter interface
- central/sensor/service/connection/sensorevents.go: Call Return() at processing completion and dedup drop
- central/sensor/service/connection/connection_test.go: noopRL helper

User-facing documentation
No user-facing behavior change. The semantics of the ROX_VM_INDEX_REPORT_BUCKET_CAPACITY env var shift from "burst size" to "max concurrent in-flight", but the configuration surface is unchanged.

Testing and quality
The VM feature is gated behind the ROX_VIRTUAL_MACHINES feature flag. The rate limiter activates only when ROX_VM_INDEX_REPORT_RATE_LIMIT > 0.

Automated testing
19 unit tests in pkg/rate/limiter_test.go covering Return(), concurrency, rebalancing, disconnect, and nil safety. Comparison benchmarks in pkg/rate/benchmark_comparison_test.go.

How I validated my change
Unit tests:
Comparison benchmarks (limiter-level, simulated processing):
E2E cluster test on ga-acp (OCP 4.21, 3 masters + 4 workers), ~21 minutes total wall clock, 6-minute monitoring window per variant. Load: 100 fake VMs with 500 real RHEL 9 packages each, reports every 10 seconds, Scanner V4 performing real vulnerability matching (404 CVEs per VM). Rate limiter: capacity=30, refill=0.3/sec.
The time-based limiter's sustained throughput (0.3 rps) matches the token refill rate exactly, so roughly 90% of Scanner V4's actual capacity (~3 rps) goes unused. The completion-based limiter processes at Scanner V4's real speed while keeping Central memory stable and bounded (500-544 Mi over 6 minutes, no upward trend).
Full E2E test report and step-by-step reproduction guide in commit 4ef3694 (remove before merging):
- pkg/rate/testdata/E2E_TEST_REPORT.md: Full methodology, durations, raw data
- pkg/rate/testdata/E2E_REPRODUCTION_GUIDE.md: Step-by-step reproduction instructions