SGLang rerank endpoint enables RCE via malicious GGUF templates
TL;DR — Loading an attacker-crafted GGUF model into SGLang can lead to remote code execution when `/v1/rerank` renders the model's `tokenizer.chat_template` in an unsandboxed Jinja2 environment.
What happened
SGLang is an open-source framework for serving large language models (LLMs) and exposing OpenAI-compatible APIs. On April 20, 2026, CERT/CC published Vulnerability Note VU#915947 for CVE-2026-5760, describing a remote code execution issue in SGLang’s reranking endpoint (/v1/rerank).
Per CERT/CC, the exploit chain is: an attacker prepares a malicious GGUF model file whose `tokenizer.chat_template` metadata contains a Jinja2 server-side template injection (SSTI) payload; a victim downloads and loads that model into SGLang; and when the `/v1/rerank` endpoint is hit, the template is rendered and attacker-controlled Python code executes in the SGLang service context.
CERT/CC attributes the root cause to using `jinja2.Environment()` without sandboxing in `getjinjaenv()`, allowing template rendering to escape into arbitrary code execution. CERT/CC notes it did not receive a response from project maintainers during coordination.
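The failure mode is the classic Jinja2 SSTI pattern. The sketch below is illustrative, not SGLang's actual code: the template string is a stand-in for hostile `tokenizer.chat_template` metadata, and the attribute walk is a well-known public SSTI gadget, shown only to demonstrate why an unsandboxed environment is dangerous.

```python
import jinja2

# Stand-in for hostile tokenizer.chat_template metadata carried in a GGUF
# file -- the attribute walk below is a standard Jinja2 SSTI gadget.
hostile_template = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

# Unsandboxed environment, the pattern CERT/CC identifies as the root cause:
env = jinja2.Environment()
rendered = env.from_string(hostile_template).render()

# The template escaped into the Python object graph and enumerated every
# loaded class -- a common first step toward subprocess/os.popen RCE gadgets.
print(rendered[:120])
```

From that class list, real-world payloads pivot to importing `os` or `subprocess`, which is why a request to a rendering endpoint is enough to trigger code execution once a malicious model is loaded.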
This is a high-signal reminder that “model files” are part of the application supply chain: if your platform loads third-party artifacts that contain executable templating or code-like metadata, you should treat them as code with the same isolation and provenance controls.
Who is impacted
- Teams running SGLang deployments that load GGUF model files and expose or use the `/v1/rerank` endpoint.
- Highest-risk environments: those where model acquisition is not tightly controlled (e.g., operators downloading models from untrusted sources) and where the service interface is reachable from untrusted networks.
| Component | Risk factor | Why it matters |
|---|---|---|
| `/v1/rerank` endpoint | Requests can trigger template rendering | A request hitting `/v1/rerank` is the execution trigger once a malicious model is loaded |
| GGUF model ingestion | Untrusted model files allowed | The payload is carried in `tokenizer.chat_template` metadata |
| Jinja2 rendering | Unsandboxed environment | Template rendering can execute arbitrary Python code per CERT/CC |
What to do now
- Implement CERT/CC’s mitigation guidance: use `ImmutableSandboxedEnvironment` instead of `jinja2.Environment()` to render the chat templates.
- Treat model files as untrusted inputs:
  - restrict who can introduce new models into environments that run SGLang
  - prefer allowlisted registries and verified artifacts for GGUF model distribution
- Reduce blast radius while you assess exposure:
  - avoid exposing `/v1/rerank` to untrusted networks unless it’s explicitly required
  - run the service with strong isolation (least privilege, container/VM boundaries) to limit host compromise impact
- Detection/scoping:
  - inventory where SGLang is deployed and whether reranking is enabled/used
  - review model provenance for any GGUF models recently added to production (especially from public sources)
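The first mitigation above can be sketched in a few lines. This is not SGLang's code; it assumes a hypothetical template string and shows the behavior difference: Jinja2's `ImmutableSandboxedEnvironment` renders legitimate chat templates normally but raises `SecurityError` on underscore-prefixed attribute access, which defeats the usual SSTI gadgets.

```python
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# A typical benign chat template (hypothetical example) and a hostile one.
benign_template = "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
hostile_template = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

env = ImmutableSandboxedEnvironment()

# Legitimate templates still render as expected...
out = env.from_string(benign_template).render(
    messages=[{"role": "user", "content": "hello"}]
)

# ...while the SSTI gadget is rejected at render time instead of executed.
blocked = False
try:
    env.from_string(hostile_template).render()
except SecurityError:
    blocked = True
```

`ImmutableSandboxedEnvironment` additionally forbids in-template mutation of Python objects (e.g. `list.append`), which is usually the right trade-off for templates sourced from model metadata.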
Additional Information
- CERT/CC Vulnerability Note: VU#915947
- Proof-of-concept repository referenced by CERT/CC: Stuub/SGLang-0.5.9-RCE
- Background on GGUF/template risks referenced by CERT/CC: JFrog research on GGUF SSTI
Content is AI-assisted and reviewed by our team, but issues may be missed and best practices evolve rapidly; send corrections to [email protected]. Always consult official documentation and validate key implementation decisions before making design or security choices.
