JustAppSec

SGLang rerank endpoint enables RCE via malicious GGUF templates

2 min read · Published 20 Apr 2026 · Updated 20 Apr 2026 · Source: CERT/CC Vulnerability Note

TL;DR — Loading an attacker-crafted GGUF model into SGLang can lead to remote code execution: a request to /v1/rerank renders the model’s tokenizer.chat_template with an unsandboxed Jinja2 environment.

What happened

SGLang is an open-source framework for serving large language models (LLMs) and exposing OpenAI-compatible APIs. On April 20, 2026, CERT/CC published Vulnerability Note VU#915947 for CVE-2026-5760, describing a remote code execution issue in SGLang’s reranking endpoint (/v1/rerank).

Per CERT/CC, the exploit chain is: an attacker prepares a malicious GGUF model file whose tokenizer.chat_template metadata contains a Jinja2 server-side template injection (SSTI) payload; a victim downloads and loads that model into SGLang; and when the /v1/rerank endpoint is hit, the template is rendered and attacker-controlled Python code executes in the SGLang service context.

CERT/CC attributes the root cause to using jinja2.Environment() without sandboxing in getjinjaenv(), allowing template rendering to escape into arbitrary code execution. CERT/CC notes it did not receive a response from project maintainers during coordination.
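The class of bug is easy to reproduce outside SGLang (this is an illustrative sketch, not SGLang’s actual code). A default `jinja2.Environment` resolves Python dunder attributes, which is the foothold SSTI payloads use to walk from a harmless string literal to arbitrary Python objects:

```python
from jinja2 import Environment

# A default (unsandboxed) Environment resolves dunder attributes, so a
# template can traverse Python's object graph. A malicious
# tokenizer.chat_template would use the same mechanism to reach modules
# like os/subprocess and run commands in the service's context.
env = Environment()
payload = "{{ ''.__class__.__mro__[1].__name__ }}"  # str -> object
print(env.from_string(payload).render())  # → object
```

The rendered output is the name of the `object` base class, proving the template escaped into Python introspection; real payloads continue the traversal to reach code execution.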

This is a high-signal reminder that “model files” are part of the application supply chain: if your platform loads third-party artifacts that contain executable templating or code-like metadata, you should treat them as code with the same isolation and provenance controls.
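As a triage aid for that provenance review, a naive check like the following (a hypothetical helper, not part of SGLang, and easily bypassed, so no substitute for sandboxed rendering) can flag chat templates that reach for Python internals before a model is admitted:

```python
import re

# Hypothetical pre-load heuristic (not an SGLang feature): flag chat
# templates that reference Python internals. A denylist like this is
# trivially bypassable and only useful for triage, never as the fix.
SUSPICIOUS = re.compile(r"__\w+__|\bimport\b|\bos\b|\bsubprocess\b|popen|builtins")

def template_looks_suspicious(chat_template: str) -> bool:
    return bool(SUSPICIOUS.search(chat_template))

benign = "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
payload = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
print(template_looks_suspicious(benign))   # → False
print(template_looks_suspicious(payload))  # → True
```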

Who is impacted

  • Teams running SGLang deployments that load GGUF model files and expose or use the /v1/rerank endpoint.
  • Highest risk: environments where model acquisition is not tightly controlled (e.g., operators downloading models from untrusted sources) and where the service interface is reachable from untrusted networks.
| Component | Risk factor | Why it matters |
| --- | --- | --- |
| /v1/rerank endpoint | Requests can trigger template rendering | A request hitting /v1/rerank is the execution trigger once a malicious model is loaded |
| GGUF model ingestion | Untrusted model files allowed | The payload is carried in tokenizer.chat_template metadata |
| Jinja2 rendering | Unsandboxed environment | Template rendering can execute arbitrary Python code per CERT/CC |

What to do now

  • Implement CERT/CC’s mitigation guidance:

    To mitigate this vulnerability, it is recommended to use ImmutableSandboxedEnvironment instead of jinja2.Environment() to render the chat templates.

  • Treat model files as untrusted inputs:
    • restrict who can introduce new models into environments that run SGLang
    • prefer allowlisted registries and verified artifacts for GGUF model distribution
  • Reduce blast radius while you assess exposure:
    • avoid exposing /v1/rerank to untrusted networks unless it’s explicitly required
    • run the service with strong isolation (least privilege, container/VM boundaries) to limit host compromise impact
  • Detection/scoping:
    • inventory where SGLang is deployed and whether reranking is enabled/used
    • review model provenance for any GGUF models recently added to production (especially from public sources)
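CERT/CC’s recommended mitigation can be sketched in isolation (assumed variable names; not a drop-in SGLang patch): swapping `jinja2.Environment` for `ImmutableSandboxedEnvironment` refuses unsafe attribute access while ordinary chat templates keep rendering normally.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment
from jinja2.exceptions import SecurityError

env = ImmutableSandboxedEnvironment()

# Ordinary chat templates still render normally in the sandbox.
benign = "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
out = env.from_string(benign).render(messages=[{"role": "user", "content": "hi"}])
print(out)  # → user: hi

# Chained access to unsafe attributes raises SecurityError instead of
# exposing Python internals to the template author.
payload = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
try:
    env.from_string(payload).render()
except SecurityError as e:
    print("blocked:", e)
```

`ImmutableSandboxedEnvironment` additionally rejects in-template mutation of objects (e.g., list `.append`), which makes it the stricter of Jinja2’s two sandbox classes and the one CERT/CC names.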

