# vLLM patches OOM DoS via unbounded `n` parameter
TL;DR: A missing upper bound on the `n` parameter in vLLM's OpenAI-compatible API can let attackers crash inference hosts (event-loop starvation and OOM) with a single crafted request.
## What happened
vLLM is an open-source LLM inference/serving engine commonly deployed behind an OpenAI-compatible HTTP API (`vllm.entrypoints.openai.api_server`).
A GitHub-reviewed advisory (CVE-2026-34756) reports a resource exhaustion / denial-of-service issue where request parsing and scheduling accept an unbounded integer `n` in `ChatCompletionRequest` / `CompletionRequest`. By sending an astronomically large `n`, an attacker can force the server to fan out the request `n` times, monopolizing the single-threaded asyncio event loop and rapidly allocating copies until the OS OOM-kills the process.
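To make the failure mode concrete, here is a minimal sketch of the kind of request body the advisory describes. This is illustrative only, not an exploit script; the model name is a placeholder, and the field names follow the OpenAI completions schema.

```python
# Illustrative request body only. On an unpatched server, the scheduler
# fans this request out n times; a patched server should reject the
# out-of-range value at validation time. Model name is a placeholder.
payload = {
    "model": "my-model",   # placeholder model name
    "prompt": "hi",
    "max_tokens": 16,
    "n": 10**9,            # astronomically large fan-out factor
}
```

The point is that `n` is a trusted-too-early model parameter: it multiplies work inside the scheduler before any cost accounting happens.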
Notably, the advisory's narrative describes this as unauthenticated against public-facing API servers, while the included CVSS v3.1 vector indicates PR:L (low privileges required). Treat real-world exploitability as deployment-dependent: authentication and gateway controls matter. Either way, single-request availability failures are operationally high-impact for AI serving fleets, and they are increasingly common in OpenAI-compatible gateways that trust model parameters too early in the request lifecycle.
## Who is impacted
- Anyone exposing vLLM’s OpenAI-compatible API server to untrusted clients (directly or via a reverse proxy).
- Deployments without strict request validation on OpenAI parameters (specifically `n`).
| Component | Affected versions (per advisory) | Patched (per advisory) |
|---|---|---|
| `vllm` (PyPI) | >= 0.1.0, < 0.19.0 | 0.19.0 |
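A quick way to triage hosts against the advisory's range is a version comparison. This is a stdlib-only sketch that assumes simple `X.Y.Z` version strings (no pre-release suffixes); the `is_vulnerable` / `installed_vllm_vulnerable` helper names are ours, not part of any tool.

```python
# Sketch: flag builds inside the advisory's affected range ">= 0.1.0, < 0.19.0".
# Assumes plain X.Y.Z versions; pre-release tags would need a real parser.
from importlib.metadata import PackageNotFoundError, version

def parse(v: str) -> tuple[int, ...]:
    """Turn 'X.Y.Z' into a comparable integer tuple."""
    return tuple(int(part) for part in v.split("."))

def is_vulnerable(v: str) -> bool:
    """True if the version string falls in the advisory's affected range."""
    return parse("0.1.0") <= parse(v) < parse("0.19.0")

def installed_vllm_vulnerable() -> bool:
    """Check the vllm package installed in the current environment, if any."""
    try:
        return is_vulnerable(version("vllm"))
    except PackageNotFoundError:
        return False  # vllm not installed on this host
```

For fleets with non-trivial version strings, a library such as `packaging` is a safer comparator than tuple parsing.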
## What to do now
- Follow vendor remediation guidance and apply the latest patched release available at the time of writing (the advisory marks the issue fixed in `0.19.0`).
- Inventory where you run `vllm.entrypoints.openai.api_server` (including internal "shadow" deployments spun up for evaluation) and verify the deployed `vllm` version.
- Add defense-in-depth request controls at the edge:
  - Enforce a reasonable upper bound for `n` (and consider bounding other fan-out parameters) at the API gateway/reverse proxy.
  - Rate-limit and reject anomalously large/expensive requests early (before they reach the Python event loop).
- If you suspect abuse, review ingress logs for unusually large `n` values and correlate with OOM-kills/restarts and liveness probe failures.
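The edge-validation step above can be sketched as a small pre-flight check run at the gateway before a request reaches the inference process. `MAX_N`, the function name, and the error messages are local policy choices for illustration, not vLLM defaults.

```python
# Defense-in-depth sketch: bound "n" at the edge before forwarding to the
# inference server. MAX_N is a local policy cap, not a vLLM setting.
MAX_N = 8

def validate_openai_params(body: dict) -> tuple[bool, str]:
    """Return (ok, reason) for the fan-out parameters of a request body."""
    n = body.get("n", 1)  # OpenAI-compatible APIs default n to 1
    if not isinstance(n, int) or n < 1:
        return False, "'n' must be a positive integer"
    if n > MAX_N:
        return False, f"'n' exceeds server cap of {MAX_N}"
    return True, "ok"
```

A gateway would reject failing requests with HTTP 400/422 immediately, so an oversized `n` never consumes scheduler time or memory on the serving host.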
Content is AI-assisted and reviewed by our team, but issues may be missed and best practices evolve rapidly; send corrections to [email protected]. Always consult official documentation and validate key implementation decisions before making design or security choices.
