SGLang rerank endpoint enables RCE via malicious GGUF templates
TL;DR — Loading an attacker-crafted GGUF model into SGLang can lead to remote code execution when `/v1/rerank` renders the model's `tokenizer.chat_template` in an unsandboxed Jinja2 environment.
What happened
SGLang is an open-source framework for serving large language models (LLMs) and exposing OpenAI-compatible APIs. On April 20, 2026, CERT/CC published Vulnerability Note VU#915947 for CVE-2026-5760, describing a remote code execution issue in SGLang’s reranking endpoint (/v1/rerank).
Per CERT/CC, the exploit chain is: an attacker prepares a malicious GGUF model file whose `tokenizer.chat_template` metadata contains a Jinja2 server-side template injection (SSTI) payload; a victim downloads and loads that model into SGLang; and when the `/v1/rerank` endpoint is hit, the template is rendered and attacker-controlled Python code executes in the SGLang service context.
CERT/CC attributes the root cause to using `jinja2.Environment()` without sandboxing in `getjinjaenv()`, allowing template rendering to escape into arbitrary code execution. CERT/CC notes it did not receive a response from project maintainers during coordination.
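The failure mode is the classic Jinja2 SSTI pattern. The sketch below is illustrative, not SGLang's actual code: the template string is a stand-in for hostile `tokenizer.chat_template` metadata, and the attribute walk is a well-known public SSTI gadget, shown only to demonstrate why an unsandboxed environment is dangerous.

```python
import jinja2

# Stand-in for hostile tokenizer.chat_template metadata carried in a GGUF
# file -- the attribute walk below is a standard Jinja2 SSTI gadget.
hostile_template = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

# Unsandboxed environment, the pattern CERT/CC identifies as the root cause:
env = jinja2.Environment()
rendered = env.from_string(hostile_template).render()

# The template escaped into the Python object graph and enumerated every
# loaded class -- a common first step toward subprocess/os.popen RCE gadgets.
print(rendered[:120])
```

From that class list, real-world payloads pivot to importing `os` or `subprocess`, which is why a request to a rendering endpoint is enough to trigger code execution once a malicious model is loaded.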
This is a high-signal reminder that “model files” are part of the application supply chain: if your platform loads third-party artifacts that contain executable templating or code-like metadata, you should treat them as code with the same isolation and provenance controls.
Who is impacted
- Teams running SGLang deployments that load GGUF model files and expose or use the `/v1/rerank` endpoint.
- Highest-risk environments: those where model acquisition is not tightly controlled (e.g., operators downloading models from untrusted sources) and where the service interface is reachable from untrusted networks.
| Component | Risk factor | Why it matters |
|---|---|---|
| `/v1/rerank` endpoint | Requests can trigger template rendering | A request hitting `/v1/rerank` is the execution trigger once a malicious model is loaded |
| GGUF model ingestion | Untrusted model files allowed | The payload is carried in `tokenizer.chat_template` metadata |
| Jinja2 rendering | Unsandboxed environment | Template rendering can execute arbitrary Python code per CERT/CC |
What to do now
- Implement CERT/CC’s mitigation guidance: use `ImmutableSandboxedEnvironment` instead of `jinja2.Environment()` to render the chat templates.
- Treat model files as untrusted inputs:
  - restrict who can introduce new models into environments that run SGLang
  - prefer allowlisted registries and verified artifacts for GGUF model distribution
- Reduce blast radius while you assess exposure:
  - avoid exposing `/v1/rerank` to untrusted networks unless it’s explicitly required
  - run the service with strong isolation (least privilege, container/VM boundaries) to limit host compromise impact
- Detection/scoping:
  - inventory where SGLang is deployed and whether reranking is enabled/used
  - review model provenance for any GGUF models recently added to production (especially from public sources)
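The first mitigation above can be sketched in a few lines. This is not SGLang's code; it assumes a hypothetical template string and shows the behavior difference: Jinja2's `ImmutableSandboxedEnvironment` renders legitimate chat templates normally but raises `SecurityError` on underscore-prefixed attribute access, which defeats the usual SSTI gadgets.

```python
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# A typical benign chat template (hypothetical example) and a hostile one.
benign_template = "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
hostile_template = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

env = ImmutableSandboxedEnvironment()

# Legitimate templates still render as expected...
out = env.from_string(benign_template).render(
    messages=[{"role": "user", "content": "hello"}]
)

# ...while the SSTI gadget is rejected at render time instead of executed.
blocked = False
try:
    env.from_string(hostile_template).render()
except SecurityError:
    blocked = True
```

`ImmutableSandboxedEnvironment` additionally forbids in-template mutation of Python objects (e.g. `list.append`), which is usually the right trade-off for templates sourced from model metadata.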
Additional Information
- CERT/CC Vulnerability Note: VU#915947
- Proof-of-concept repository referenced by CERT/CC: Stuub/SGLang-0.5.9-RCE
- Background on GGUF/template risks referenced by CERT/CC: JFrog research on GGUF SSTI
Content is AI-assisted and reviewed by our team, but issues may be missed and best practices evolve rapidly; send corrections to [email protected]. Always consult official documentation and validate key implementation decisions before making design or security choices.
