LMDeploy SSRF Vulnerability: CVE-2026-33626 Under Active Exploitation

Article Content
The landscape of artificial intelligence security shifted significantly on April 24, 2026, as a high-severity zero-day vulnerability in the LMDeploy framework moved from public disclosure to active, widespread exploitation in less than 13 hours. This incident, now officially tracked as CVE-2026-33626, represents a watershed moment for the security of Large Language Model (LLM) serving infrastructure. The LMDeploy SSRF vulnerability, carrying a CVSS score of 7.5, highlights a critical oversight in how multimodal AI systems handle external inputs, specifically within the framework’s vision-language module. As organizations race to deploy AI agents capable of “seeing” and processing images, the underlying code responsible for fetching these assets has become the primary battleground for modern cybercriminals.
The Anatomy of CVE-2026-33626: How the LMDeploy SSRF Vulnerability Works
At its core, the LMDeploy SSRF vulnerability is a classic Server-Side Request Forgery (SSRF) flaw (CWE-918) residing in the load_image() function within lmdeploy/vl/utils.py. LMDeploy, an open-source toolkit developed by the Shanghai AI Laboratory for compressing and serving LLMs, includes a vision-language module that allows models like InternVL2 or Qwen2-VL to process image data alongside text prompts. When a user submits a multimodal request via an OpenAI-compatible API, the server must retrieve the image from a provided image_url.
The technical failure in versions prior to 0.12.3 was the absence of a robust validation layer for these URLs. The load_image() function would indiscriminately fetch any URL provided by the user, failing to verify if the destination belonged to a private IP range, a loopback address, or a cloud-specific metadata service. By crafting a prompt containing a malicious URL, an attacker can coerce the LMDeploy server into making outbound HTTP requests to resources it was never intended to access. These resources often include:
- Cloud Metadata Services: Specifically the AWS Instance Metadata Service (IMDS) at
169.254.169.254. - Internal Service Interfaces: Local databases like Redis or MySQL running on the same host or in the same VPC.
- Internal Network Scanning: Probing for other internal HTTP interfaces or administrative dashboards.
Because the LMDeploy server acts as the requester, it effectively bypasses traditional firewall rules that prevent external entities from reaching these internal-only endpoints. This makes the LMDeploy SSRF vulnerability a “trusted-to-untrusted” bridge that collapses the perimeter of the AI infrastructure.
The 13-Hour Race: From Disclosure to Active Weaponization
The speed at which CVE-2026-33626 was weaponized underscores the new reality of automated threat intelligence. Security researchers at Sysdig first observed exploitation attempts against their honeypots just 12 hours and 31 minutes after the advisory was published on GitHub. This rapid pivot is particularly alarming because no public proof-of-concept (PoC) code existed at the time; the technical details within the advisory alone were sufficient for attackers to build a functional exploit chain.
Telemetry data indicates that the primary exploitation wave originated from IP addresses located in Kowloon Bay, Hong Kong. The attackers did not perform a simple “hit and run” validation; instead, they engaged in a sophisticated, multi-phase reconnaissance operation lasting approximately eight minutes per target. During this window, the following steps were observed:
- Phase 1: Cloud Credential Probing: Initial requests targeted
169.254.169.254/latest/meta-data/iam/security-credentials/to attempt the exfiltration of IAM roles. - Phase 2: Out-of-Band (OOB) Confirmation: Attackers used DNS callbacks to services like
requestrepo.comto verify that the server had unrestricted egress and that the SSRF was functional. - Phase 3: Internal Enumeration: The vision-language image loader was used as a generic HTTP primitive to scan for internal ports, specifically 6379 (Redis), 3306 (MySQL), and 8080 (administrative UI).
The use of automated scanners and AI-assisted tools allowed the adversary to iterate through multiple vision-language models—switching between internlm-xcomposer2 and InternVL2-8B—to find which model configuration was most susceptible to the crafted input. This level of agility demonstrates that modern attackers are intimately familiar with the disaggregated architecture of AI serving stacks.
Critical Infrastructure at Risk: Why AI Servers are High-Value Targets
Exploiting the LMDeploy SSRF vulnerability is not just about crashing a service; it is a gateway to the entire cloud environment. AI inference servers are unique in their infrastructure requirements. They typically run on high-performance GPU instances (such as AWS P4/P5 or Azure ND-series) that are often granted broad IAM permissions. These permissions are necessary for the server to fetch model weights from S3 buckets, log telemetry to centralized collectors, and access massive training datasets.
If an attacker successfully retrieves a temporary security token through the IMDS via SSRF, they can inherit these broad permissions. This allows for several high-impact outcomes:
- Theft of Proprietary Model Data: Attackers can gain access to S3 buckets containing the proprietary weights of the models being served.
- Poisoning of Training Datasets: With write access to data lakes, an adversary could subtly alter training data, leading to model degradation or the insertion of backdoors.
- Lateral Movement: The inference server often resides in a VPC with access to internal databases. The SSRF allows the attacker to map these databases and plan further attacks without ever being detected by external-facing security controls.
Furthermore, because LMDeploy exposes an OpenAPI schema and various administrative endpoints under /distserve/*, a successful SSRF can be used to interact with the internal control plane of the distributed serving engine, potentially allowing the attacker to disrupt the prefill/decode routes for other peers in the cluster.
Technical Deep Dive: Hardening the Vision-Language Module
The emergency patch provided in LMDeploy v0.12.3 introduces a critical security function named _is_safe_url(). This function acts as a gatekeeper for the load_image() process. To understand the depth of the fix, one must look at the validation logic now required for any AI framework processing multimodal URLs. The LMDeploy SSRF vulnerability remediation involves three layers of defense:
1. Hostname and IP Resolution: The framework now resolves the provided hostname before making the request. This prevents “DNS Rebinding” attacks where a hostname initially resolves to a safe IP but later points to an internal IP during the fetch phase.
2. Deny-listing Reserved Ranges: The system now explicitly blocks requests to:
- Loopback addresses (
127.0.0.0/8,::1). - RFC 1918 private ranges (
10.0.0.0/8,172.16.0.0/12,192.168.0.0/16). - Link-local addresses (
169.254.0.0/16), effectively cutting off the IMDS vector.
3. Protocol Restrictions: The load_image() function is now restricted to standard http and https schemes, preventing the use of file://, gopher://, or ftp:// wrappers that are often used in advanced SSRF exploitation to read local system files like /etc/passwd.
Strategic Mitigation: Beyond the Patch
While updating to LMDeploy v0.12.3 is the immediate priority, organizations must adopt a “defense-in-depth” posture to protect their AI assets. The LMDeploy SSRF vulnerability is a symptom of a larger trend where AI-specific software moves faster than traditional security vetting processes. To mitigate future risks, security teams should implement the following:
- Enforce IMDSv2: On AWS, transition all GPU instances to IMDSv2, which requires a session-oriented header. This effectively neutralizes most simple SSRF attacks that cannot add custom headers to the request.
- Egress Filtering: Implement strict outbound network rules. Inference nodes should only be allowed to talk to known, allow-listed endpoints (e.g., Hugging Face, specific S3 buckets, and logging services). Block all other traffic by default.
- Network Segmentation: Run LMDeploy and other inference engines in isolated subnets with no direct route to sensitive internal databases or administrative interfaces.
- Runtime Protection: Utilize security tools that can detect “Contact EC2 Instance Metadata Service from Container” events. Any outbound connection from an inference process to the metadata IP should trigger an immediate alert and automated isolation.
The Future of AI Security and “Secure by Design”
The exploitation of CVE-2026-33626 serves as a stark reminder that as AI becomes more multimodal, its attack surface expands exponentially. The LMDeploy SSRF vulnerability was not a failure of the AI model itself, but a failure of the “plumbing” that supports it. This incident highlights a dangerous pattern: AI infrastructure tools, despite their popularity, often evade standard enterprise scanning workflows and security reviews.
The rapid 13-hour window from disclosure to exploitation suggests that attackers are now treating AI advisories with the same urgency as critical Windows or Linux kernel flaws. For the AI community, this means that the era of “move fast and break things” must come to an end. Frameworks must be Secure by Design, incorporating input validation and least-privilege principles from the very first commit. For organizations running internal AI applications, the message is clear: the patch cycle for AI infrastructure is no longer measured in weeks or days, but in hours. Immediate action is the only defense against a threat landscape that moves at the speed of thought.
Written by
TempMail Ninja
Digital privacy and online security expert. Passionate about creating tools that protect users' identity on the internet.


