File uploads are one of the most dangerous features you can add to a web application. If handled poorly, they can lead to remote code execution, data exfiltration, denial of service, and stored XSS. This lesson covers how to upload, store, and serve files safely.
The risks
When a user uploads a file, they are sending arbitrary binary data to your server. The risks include:
- Remote code execution: uploading a web shell (e.g., a `.php` or `.jsp` file) that the server then executes
- Stored XSS: uploading an HTML or SVG file containing JavaScript that is served to other users
- Path traversal: a filename like `../../etc/passwd` that writes to unintended locations
- Denial of service: uploading extremely large files or zip bombs that exhaust disk or memory
- Malware distribution: using your server as a hosting platform for malicious files
Never trust file metadata
The filename, MIME type, and file extension are all user-controlled. An attacker can:
- Send a file named `photo.jpg` that is actually an executable
- Set the `Content-Type` header to `image/jpeg` for an HTML file
- Use double extensions like `malware.php.jpg`
Do not rely on any of these for security decisions. Validate the actual file content.
Validation strategies
Check file content, not just the extension
Use magic bytes (file signatures) to verify the actual file type:
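```python
import magic  # python-magic, a wrapper around libmagic
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/gif"}
header = uploaded_file.read(2048)  # uploaded_file: the incoming file object
uploaded_file.seek(0)  # Rewind so later steps see the whole file
mime = magic.from_buffer(header, mime=True)
if mime not in ALLOWED_TYPES:
    raise ValueError("Invalid file type")
```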
Restrict allowed types to a strict allowlist
Only accept the specific file types your feature requires. If the feature is profile photos, accept JPEG, PNG, and WebP. Reject everything else.
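If you would rather not depend on libmagic, or want the check pinned to exactly the formats a feature accepts, a minimal sketch can compare the leading bytes against the known signatures itself. The function and constant names below are illustrative assumptions; the byte signatures are the standard ones for JPEG, PNG and WebP, matching the profile-photo allowlist. A match only shows that the file *starts* like an image, not that the rest is valid, so this complements re-encoding rather than replacing it.

```python
from typing import Optional

# Leading "magic bytes" for the profile-photo allowlist: JPEG, PNG and WebP
JPEG_SIGNATURE = b"\xff\xd8\xff"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_type(header: bytes) -> Optional[str]:
    """Return the MIME type implied by the file's first bytes, or None if not allowlisted."""
    if header.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    if header.startswith(PNG_SIGNATURE):
        return "image/png"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then the form type "WEBP"
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None  # Everything else (GIF, SVG, HTML, executables...) is rejected
```

Read at least the first 12 bytes of the upload, rewind the file, and reject the request whenever the function returns `None`.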
Enforce file size limits
Set limits at multiple layers:
- Web server level (e.g., Nginx `client_max_body_size`)
- Application level (check `Content-Length` and the actual bytes read)
- Storage level (per-user quotas)
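The application-level check can be sketched as a reader that counts bytes as they arrive instead of trusting `Content-Length`, which the client controls and may omit. `read_limited`, `UploadTooLarge` and the 5 MiB limit are hypothetical names and values chosen for illustration; `stream` is any file-like object with a `read(n)` method.

```python
import io

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # Hypothetical 5 MiB limit
CHUNK_SIZE = 64 * 1024


class UploadTooLarge(ValueError):
    """Raised when an upload exceeds the configured size limit."""


def read_limited(stream, max_bytes=MAX_UPLOAD_BYTES, declared_length=None):
    """Read a file-like upload stream, refusing to buffer more than max_bytes."""
    # Cheap early rejection when the client-declared Content-Length is already too big
    if declared_length is not None and declared_length > max_bytes:
        raise UploadTooLarge("Declared Content-Length exceeds the limit")
    buffer = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        total += len(chunk)
        # Content-Length can be missing or wrong, so enforce the limit on bytes actually read
        if total > max_bytes:
            raise UploadTooLarge("Upload exceeds the limit")
        buffer.write(chunk)
    return buffer.getvalue()
```

The web server limit should still reject oversized bodies first; this is the application-level backstop for anything that gets through.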
Re-encode images
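The safest approach for image uploads is to decode the image and re-encode it using a library like Pillow (Python), Sharp (Node.js), or ImageMagick. Because only the decoded pixel data is written back out, this discards embedded scripts, EXIF data containing sensitive information (such as GPS coordinates), and malformed content, as long as you do not copy the original metadata into the new file.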
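```python
import io
from PIL import Image

img = Image.open(uploaded_file)
img.verify()  # Verify the image is valid; the object cannot be used afterwards

# verify() consumes the file, so rewind, re-open and re-encode
uploaded_file.seek(0)
img = Image.open(uploaded_file)
output = io.BytesIO()
img.save(output, format="PNG")
```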
Sanitise filenames
Never use the user-provided filename directly. Generate a new filename:
```python
import uuid

ext = ".png"  # Determined from validated content type, not from user input
safe_filename = f"{uuid.uuid4()}{ext}"
```
This prevents path traversal, special character issues, and filename collisions.
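To make "determined from validated content type" concrete, a small sketch can map the type returned by the content check to the extension used for storage. The mapping and the `storage_name` function are illustrative assumptions, not a fixed convention.

```python
import uuid

# Hypothetical mapping from the *validated* MIME type to the extension we store under
EXTENSION_FOR_MIME = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
}


def storage_name(validated_mime: str) -> str:
    """Build a server-chosen filename; the user's original name is never used."""
    # A KeyError here means a type slipped past validation, so fail loudly
    ext = EXTENSION_FOR_MIME[validated_mime]
    return f"{uuid.uuid4()}{ext}"
```

If you want to show users their original filename, keep it as a separate display value in your database rather than in the storage path.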
Storage
Store outside the web root
If uploaded files are stored within the web server's document root, the server may execute them. For example, a `.php` file in the web root will be executed if the server is configured to run PHP there (Apache with `mod_php`, or Nginx forwarding `.php` requests to PHP-FPM).
Store uploaded files in a location that the web server does not serve directly. Use a separate endpoint to serve files, with explicit content-type headers.
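A minimal sketch of the lookup such an endpoint might perform before streaming a file back, assuming files are stored under UUID-based names like the ones generated earlier and kept in a directory the web server does not serve. `UPLOAD_DIR`, the extension set and `resolve_upload` are hypothetical. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import uuid
from pathlib import Path

# Hypothetical storage directory that the web server is NOT configured to serve
UPLOAD_DIR = Path("/srv/app-data/uploads")
ALLOWED_EXTENSIONS = {".jpg", ".png", ".webp"}


def resolve_upload(name: str, base: Path = UPLOAD_DIR) -> Path:
    """Map a requested file name to a path inside the upload directory, or raise ValueError."""
    candidate = Path(name)
    # Reject anything with directory components, such as "../" or "a/b.png"
    if candidate.name != name:
        raise ValueError("Directory components are not allowed")
    if candidate.suffix not in ALLOWED_EXTENSIONS:
        raise ValueError("Unexpected extension")
    uuid.UUID(candidate.stem)  # Raises ValueError unless the stem is a UUID we generated
    path = (base / candidate.name).resolve()
    # Defence in depth: refuse anything that resolves outside the upload directory
    if not path.is_relative_to(base.resolve()):
        raise ValueError("Path escapes upload directory")
    return path
```

The endpoint would then read the file from the returned path and set the response headers itself, rather than letting the web server infer them.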
Use object storage
Cloud object storage (S3, GCS, Azure Blob) is the preferred approach:
- Files are stored outside your application server
- No risk of server-side execution
- Built-in access control, encryption, and CDN integration
- Pre-signed URLs with expiry for time-limited access
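```python
import boto3

s3_client = boto3.client("s3")

# Generate a pre-signed URL (AWS S3)
url = s3_client.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "uploads",
        "Key": safe_filename,
        "ResponseContentDisposition": "attachment",  # Ask S3 to force a download
    },
    ExpiresIn=3600,  # URL is valid for one hour
)
```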
Encrypt at rest
Enable server-side encryption on your storage (S3 SSE, GCS default encryption, etc.). This protects against data exposure from storage-level breaches.
Serving files safely
Set Content-Type explicitly
When serving uploaded files, set the Content-Type header based on your validated type, not the stored metadata:
```http
Content-Type: image/png
```
Set Content-Disposition for downloads
Force the browser to download files rather than rendering them inline:
```http
Content-Disposition: attachment; filename="photo.png"
```
This prevents HTML or SVG files from being rendered in the browser context.
Use a separate domain
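Serve user-uploaded content from a separate domain, ideally a separate registrable domain (e.g., example-usercontent.com instead of example.com). This creates a different origin, so even if an attacker manages to upload an HTML file, any scripts in it cannot access the DOM or data of your main application. A subdomain such as uploads.example.com is also a different origin, but cookies scoped to the parent domain (`Domain=example.com`) are shared with it: scripts there can read any that are not `HttpOnly` and can set cookies for the whole domain.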
Set restrictive headers
```http
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'none'
```
nosniff prevents the browser from guessing the content type. A restrictive CSP prevents any scripts from executing in the served content.
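Putting the serving headers together, a framework-agnostic sketch might build them in one place. `upload_response_headers` is a hypothetical helper: it expects the MIME type produced by your own content validation and a display name, which it reduces to a conservative character set so the name cannot inject quotes or line breaks into the header.

```python
import re


def upload_response_headers(validated_mime: str, display_name: str) -> dict:
    """Build response headers for serving a previously validated upload."""
    # Replace anything outside a safe character set; fall back to a generic name
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", display_name)[:100] or "download"
    return {
        "Content-Type": validated_mime,  # From content validation, never from the upload request
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "X-Content-Type-Options": "nosniff",
        "Content-Security-Policy": "default-src 'none'",
    }
```

Replacing characters is deliberately blunt; it trades prettier non-ASCII filenames for a header that cannot be broken out of.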
Specific file type risks
| File type | Risk | Mitigation |
|---|---|---|
| SVG | Can contain embedded JavaScript | Sanitise with a library, or reject SVGs entirely |
| HTML | Executes scripts when rendered | Never serve as text/html; use Content-Disposition: attachment |
| PDF | Can contain JavaScript and external links | Serve with Content-Disposition: attachment |
| ZIP / archive | Zip bombs (expand to enormous size), path traversal in entries | Set extraction size limits, sanitise entry paths |
| Office documents | Macro execution, external data connections | Scan with antivirus, convert to PDF for viewing |
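For the archive row, a sketch of extraction with limits using Python's standard `zipfile` module. The limits and the `safe_extract` name are hypothetical. Note that `file_size` is the uncompressed size the archive *declares* about itself; this sketch checks those headers only, so a stricter version would also count bytes as they are written. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import zipfile
from pathlib import Path

# Hypothetical limits; tune them for your feature
MAX_TOTAL_BYTES = 100 * 1024 * 1024
MAX_ENTRIES = 1_000


def safe_extract(zip_path, dest_dir, max_total_bytes=MAX_TOTAL_BYTES, max_entries=MAX_ENTRIES):
    """Extract an archive only if it passes entry-count, size and path checks."""
    dest = Path(dest_dir).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ValueError("Too many entries in archive")
        # Check the declared uncompressed total before writing anything to disk
        if sum(info.file_size for info in infos) > max_total_bytes:
            raise ValueError("Archive expands beyond the size limit")
        for info in infos:
            target = (dest / info.filename).resolve()
            # Reject entries such as "../../etc/cron.d/x" or absolute paths
            if not target.is_relative_to(dest):
                raise ValueError(f"Unsafe entry path: {info.filename!r}")
        zf.extractall(dest)
```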
Antivirus scanning
For applications that accept arbitrary file types, scan uploads with an antivirus engine (ClamAV is a common open-source option). This is not foolproof — novel malware can bypass signatures — but it catches known threats.
Process scans asynchronously. Do not block the upload response while scanning. Store the file in a quarantine area, scan it, and move it to the final location only if it passes.
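The quarantine flow can be sketched as a function that a background worker runs after the upload response has already been returned. The `scan` callable is a placeholder for whatever wraps your antivirus engine (for example, a client for ClamAV's daemon); its name and signature, like the directory layout and `release_from_quarantine` itself, are assumptions for illustration.

```python
import shutil
from pathlib import Path
from typing import Callable


def release_from_quarantine(path: Path, clean_dir: Path, rejected_dir: Path,
                            scan: Callable[[Path], bool]) -> Path:
    """Move a quarantined upload to clean_dir if scan() reports it clean, else to rejected_dir.

    Intended to run in a background worker. If scan() raises, the exception
    propagates and the file stays in quarantine (fail closed).
    """
    destination_dir = clean_dir if scan(path) else rejected_dir
    destination_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(destination_dir / path.name)))
```

Until a file reaches the clean directory, the application should treat it as unavailable to other users.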
Summary
File uploads are a high-risk feature. Never trust the filename, extension, or MIME type. Validate actual file content with magic bytes. Re-encode images to strip embedded payloads. Store files outside the web root (preferably in object storage). Serve files with explicit content types, Content-Disposition: attachment, and from a separate domain. Combine these controls to reduce a dangerous feature to a manageable one.
