# vLLM patches OOM DoS via unbounded `n` parameter
TL;DR: A missing upper bound on the `n` parameter in vLLM's OpenAI-compatible API can let attackers crash inference hosts (event-loop starvation and OOM) with a single crafted request.
## What happened
vLLM is an open-source LLM inference/serving engine commonly deployed behind an OpenAI-compatible HTTP API (`vllm.entrypoints.openai.api_server`).
A GitHub-reviewed advisory (CVE-2026-34756) reports a resource exhaustion / denial-of-service issue where request parsing and scheduling accept an unbounded integer `n` in `ChatCompletionRequest` / `CompletionRequest`. By sending an astronomically large `n`, an attacker can force the server to fan out the request `n` times, monopolizing the single-threaded asyncio event loop and rapidly allocating copies until the OS OOM-kills the process.
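To make the failure mode concrete, here is a minimal sketch of the kind of request body the advisory describes. This is illustrative only, not an exploit script; the model name is a placeholder, and the field names follow the OpenAI completions schema.

```python
# Illustrative request body only. On an unpatched server, the scheduler
# fans this request out n times; a patched server should reject the
# out-of-range value at validation time. Model name is a placeholder.
payload = {
    "model": "my-model",   # placeholder model name
    "prompt": "hi",
    "max_tokens": 16,
    "n": 10**9,            # astronomically large fan-out factor
}
```

The point is that `n` is a trusted-too-early model parameter: it multiplies work inside the scheduler before any cost accounting happens.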
Notably, the advisory's narrative describes this as unauthenticated against public-facing API servers, while the included CVSS v3.1 vector indicates PR:L (low privileges required). Treat real-world exploitability as deployment-dependent: authentication and gateway controls matter. Either way, single-request availability failures are operationally high-impact for AI serving fleets, and they are increasingly common in OpenAI-compatible gateways that trust model parameters too early in the request lifecycle.
## Who is impacted
- Anyone exposing vLLM’s OpenAI-compatible API server to untrusted clients (directly or via a reverse proxy).
- Deployments without strict request validation on OpenAI parameters (specifically `n`).
| Component | Affected versions (per advisory) | Patched (per advisory) |
|---|---|---|
| `vllm` (PyPI) | >= 0.1.0, < 0.19.0 | 0.19.0 |
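A quick way to triage hosts against the advisory's range is a version comparison. This is a stdlib-only sketch that assumes simple `X.Y.Z` version strings (no pre-release suffixes); the `is_vulnerable` / `installed_vllm_vulnerable` helper names are ours, not part of any tool.

```python
# Sketch: flag builds inside the advisory's affected range ">= 0.1.0, < 0.19.0".
# Assumes plain X.Y.Z versions; pre-release tags would need a real parser.
from importlib.metadata import PackageNotFoundError, version

def parse(v: str) -> tuple[int, ...]:
    """Turn 'X.Y.Z' into a comparable integer tuple."""
    return tuple(int(part) for part in v.split("."))

def is_vulnerable(v: str) -> bool:
    """True if the version string falls in the advisory's affected range."""
    return parse("0.1.0") <= parse(v) < parse("0.19.0")

def installed_vllm_vulnerable() -> bool:
    """Check the vllm package installed in the current environment, if any."""
    try:
        return is_vulnerable(version("vllm"))
    except PackageNotFoundError:
        return False  # vllm not installed on this host
```

For fleets with non-trivial version strings, a library such as `packaging` is a safer comparator than tuple parsing.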
## What to do now
- Follow vendor remediation guidance and apply the latest patched release available at the time of writing (the advisory marks the issue fixed in `0.19.0`).
- Inventory where you run `vllm.entrypoints.openai.api_server` (including internal "shadow" deployments spun up for evaluation) and verify the deployed `vllm` version.
- Add defense-in-depth request controls at the edge:
  - Enforce a reasonable upper bound for `n` (and consider bounding other fan-out parameters) at the API gateway/reverse proxy.
  - Rate-limit and reject anomalously large/expensive requests early (before they reach the Python event loop).
- If you suspect abuse, review ingress logs for unusually large `n` values and correlate with OOM-kills/restarts and liveness probe failures.
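The edge-validation step above can be sketched as a small pre-flight check run at the gateway before a request reaches the inference process. `MAX_N`, the function name, and the error messages are local policy choices for illustration, not vLLM defaults.

```python
# Defense-in-depth sketch: bound "n" at the edge before forwarding to the
# inference server. MAX_N is a local policy cap, not a vLLM setting.
MAX_N = 8

def validate_openai_params(body: dict) -> tuple[bool, str]:
    """Return (ok, reason) for the fan-out parameters of a request body."""
    n = body.get("n", 1)  # OpenAI-compatible APIs default n to 1
    if not isinstance(n, int) or n < 1:
        return False, "'n' must be a positive integer"
    if n > MAX_N:
        return False, f"'n' exceeds server cap of {MAX_N}"
    return True, "ok"
```

A gateway would reject failing requests with HTTP 400/422 immediately, so an oversized `n` never consumes scheduler time or memory on the serving host.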
Content is AI-assisted and reviewed by our team, but issues may be missed and best practices evolve rapidly; send corrections to [email protected]. Always consult official documentation and validate key implementation decisions before making design or security choices.
