JustAppSec

Input Validation and Schema Enforcement

Validate early, validate strictly — schemas, allowlists, and type-safe boundaries.

Input validation is your first line of defence — the point where you establish what is acceptable before data enters your system. This lesson covers practical validation strategies, schema enforcement, and the common mistakes that leave gaps.

Input validation is not a silver bullet

Input validation alone does not prevent most vulnerabilities. SQL injection is prevented by parameterised queries. XSS is prevented by output encoding. SSRF is prevented by URL validation and network controls. Input validation is a supporting layer that catches obviously bad data early, reducing the attack surface for everything downstream.

Never rely on input validation as the sole defence against any specific vulnerability class.

Client-side vs. server-side

Client-side validation (HTML5 attributes, JavaScript checks) is a UX feature. It gives users immediate feedback. It is not a security control.

Every validation that matters must be enforced on the server. The client is completely under the user's control — they can disable JavaScript, modify the DOM, or send requests directly to your API using curl or Burp Suite.

Validation strategies

Allowlists over denylists

Accept known-good values. Reject everything else.

ALLOWED_STATUS = {"active", "inactive", "pending"}

if status not in ALLOWED_STATUS:
    raise ValidationError("Invalid status")

Denylists (blocking known-bad values) are fragile. You will always miss something. Attackers are creative — there are hundreds of ways to encode a <script> tag, dozens of SQL injection variants, and new bypass techniques discovered regularly.
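To make the difference concrete, here is a minimal sketch comparing the two approaches on a display-name field. The patterns, function names, and the 50-character cap are illustrative assumptions, not rules from any particular framework.

```python
import re

# Hypothetical denylist: block the literal "<script" tag opener.
DENYLIST_PATTERN = re.compile(r"<script", re.IGNORECASE)

# Hypothetical allowlist: letters, digits, spaces and a little punctuation,
# between 1 and 50 characters.
ALLOWLIST_PATTERN = re.compile(r"[A-Za-z0-9 .'-]{1,50}")


def passes_denylist(value: str) -> bool:
    """Return True if the value contains no blocked substring."""
    return DENYLIST_PATTERN.search(value) is None


def passes_allowlist(value: str) -> bool:
    """Return True only if the whole value consists of permitted characters."""
    return ALLOWLIST_PATTERN.fullmatch(value) is not None


# An XSS payload that never uses a <script> tag.
payload = '<img src=x onerror="alert(1)">'

print(passes_denylist(payload))   # the denylist lets it through
print(passes_allowlist(payload))  # the allowlist rejects it
```

The denylist has to anticipate every dangerous form, while the allowlist only has to describe what a legitimate value looks like.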

Type enforcement

Ensure data matches the expected type before processing:

import uuid

# Expected: integer
# Raises ValueError if not an integer, TypeError if the parameter is missing
order_id = int(request.params.get("order_id"))

# Expected: UUID
# Raises ValueError if not a valid UUID, TypeError if the parameter is missing
user_id = uuid.UUID(request.params.get("user_id"))

Type enforcement eliminates entire classes of payloads. A SQL injection string cannot be a valid integer.
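Python's built-in int() is more lenient than it looks: it accepts surrounding whitespace, underscores between digits, and non-ASCII decimal digits. Where that matters, a stricter parser can be sketched as follows. The helper names and the 10-digit cap are assumptions made for illustration.

```python
import re
import uuid

# ASCII digits only, at most 10 of them (assumed upper bound for an ID).
_DIGITS = re.compile(r"[0-9]{1,10}")


def parse_non_negative_int(raw: object) -> int:
    """Parse a plain decimal string such as "42"; reject everything else."""
    if not isinstance(raw, str) or _DIGITS.fullmatch(raw) is None:
        raise ValueError("expected a decimal integer")
    return int(raw)


def parse_uuid(raw: object) -> uuid.UUID:
    """Parse a UUID string; a missing parameter becomes ValueError too."""
    if not isinstance(raw, str):
        raise ValueError("expected a UUID string")
    return uuid.UUID(raw)  # raises ValueError on malformed input
```

Both helpers raise a single exception type, so the caller can translate every failure into one validation error instead of handling TypeError separately.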

Length limits

Every string input should have a maximum length. This limits exposure to:

  • Buffer overflows in downstream systems
  • Storage exhaustion
  • ReDoS (Regular Expression Denial of Service), since the cost of a vulnerable pattern grows with input size
  • General abuse

username = request.body.get("username", "")
if len(username) > 64:
    raise ValidationError("Username too long")

Format validation

For structured data (emails, phone numbers, dates, URLs), validate the format:

import re

# fullmatch() anchors both ends; with match(), "$" would still accept a trailing newline
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
if not EMAIL_PATTERN.fullmatch(email):
    raise ValidationError("Invalid email format")

Be careful with regex validation — overly complex patterns can be vulnerable to ReDoS. Keep patterns simple and set input length limits.
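A minimal sketch of that advice: check the length before the regex runs, and match the whole string. The function name and the 254-character cap are assumptions. The second pattern is included only to show a common anchoring pitfall in Python, where "$" also matches just before a trailing newline.

```python
import re

MAX_EMAIL_LENGTH = 254  # assumed cap; adjust to your own requirements

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Same pattern anchored with ^ and $, used with match() for comparison.
ANCHORED_WITH_DOLLAR = re.compile(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
)


def is_valid_email(value: str) -> bool:
    """Cheap length check first, then a whole-string regex match."""
    if len(value) > MAX_EMAIL_LENGTH:
        return False
    return EMAIL_PATTERN.fullmatch(value) is not None


with_newline = "user@example.com\n"
print(ANCHORED_WITH_DOLLAR.match(with_newline) is not None)  # "$" tolerates the newline
print(is_valid_email(with_newline))                          # fullmatch() does not
```

Running the length check first means the regex engine never sees an oversized input, whatever the pattern's worst-case behaviour is.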

Range validation

For numeric inputs, validate minimum and maximum values:

quantity = int(request.body.get("quantity"))
if quantity < 1 or quantity > 1000:
    raise ValidationError("Quantity must be between 1 and 1000")

Schema enforcement

For APIs that accept structured data (JSON bodies, GraphQL queries), use a schema validation library. This enforces type, format, length, and required fields in a single declaration.

JSON Schema

{
  "type": "object",
  "required": ["name", "email"],
  "properties": {
    "name": { "type": "string", "minLength": 1, "maxLength": 100 },
    "email": { "type": "string", "format": "email", "maxLength": 254 },
    "age": { "type": "integer", "minimum": 0, "maximum": 150 }
  },
  "additionalProperties": false
}

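additionalProperties: false is critical: it rejects any fields not defined in the schema. This helps prevent mass assignment attacks where an attacker adds fields like "role": "admin" to the request body. Note that many JSON Schema validators treat format as an annotation only unless format checking is explicitly enabled, so confirm that "format": "email" is actually enforced by your validator.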
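The effect of rejecting unknown fields can be illustrated without any library. The sketch below is not a JSON Schema implementation. It only mirrors the required and additionalProperties checks for the schema above, using hypothetical names.

```python
# Field sets mirroring the example schema (names are illustrative).
ALLOWED_FIELDS = {"name", "email", "age"}
REQUIRED_FIELDS = {"name", "email"}


def check_fields(body: dict) -> list[str]:
    """Return a list of problems with the set of keys in a request body."""
    present = set(body)
    errors = []
    for field in sorted(REQUIRED_FIELDS - present):
        errors.append(f"missing required field: {field}")
    for field in sorted(present - ALLOWED_FIELDS):
        errors.append(f"unknown field: {field}")
    return errors


# A mass assignment attempt: the attacker adds a field the API never declared.
attempt = {"name": "Mallory", "email": "m@example.com", "role": "admin"}
print(check_fields(attempt))
```

In a real service a schema validator performs this check along with type and length enforcement. The point here is only that an undeclared key is treated as an error instead of being silently passed to the data layer.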

Zod (TypeScript)

import { z } from "zod";

const CreateUserSchema = z.object({
  name: z.string().min(1).max(100),
  email: z.string().email().max(254),
  age: z.number().int().min(0).max(150).optional(),
}).strict(); // Reject unknown keys

const data = CreateUserSchema.parse(req.body);

Pydantic (Python)

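from pydantic import BaseModel, ConfigDict, EmailStr, Field

class CreateUser(BaseModel):
    # Pydantic v2 syntax; v1 used an inner `class Config` with extra = "forbid"
    model_config = ConfigDict(extra="forbid")  # Reject unknown fields

    name: str = Field(min_length=1, max_length=100)
    email: EmailStr  # Requires the email-validator package
    age: int | None = Field(default=None, ge=0, le=150)

data = CreateUser.model_validate(request.json)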

Handling validation failures

Return clear errors

Tell the user what is wrong, but do not expose internal details:

{
  "error": "Validation failed",
  "details": [
    { "field": "email", "message": "Invalid email format" },
    { "field": "age", "message": "Must be between 0 and 150" }
  ]
}

Use appropriate HTTP status codes

  • 400 Bad Request — the input is malformed or invalid
  • 422 Unprocessable Entity — the input is well-formed but semantically invalid

Do not leak internal state

Error messages should not include stack traces, database column names, or SQL error messages in production.
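One way to keep internal details out of responses is to separate what gets logged from what gets returned. The following is a framework-agnostic sketch. ValidationError, run_handler, the logger name, and the incident_id field are all assumptions made for illustration.

```python
import logging
import uuid

logger = logging.getLogger("api.errors")  # hypothetical logger name


class ValidationError(Exception):
    """Raised by validators; carries only user-safe information."""

    def __init__(self, field: str, message: str) -> None:
        super().__init__(message)
        self.field = field
        self.message = message


def run_handler(handler, payload):
    """Call handler(payload) and map failures to (status, body) pairs."""
    try:
        return 200, handler(payload)
    except ValidationError as exc:
        # Safe to return: the message was written for the user.
        return 400, {
            "error": "Validation failed",
            "details": [{"field": exc.field, "message": exc.message}],
        }
    except Exception:
        # Full details go to the server log only; the client gets an opaque ID.
        incident_id = str(uuid.uuid4())
        logger.exception("Unhandled error (incident %s)", incident_id)
        return 500, {"error": "Internal server error", "incident_id": incident_id}
```

The incident ID lets support staff find the matching log entry without the response ever containing a stack trace or database error text.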

Canonicalisation

Input can be encoded in multiple ways that resolve to the same value. Canonicalise (normalise) input before validation:

  • Unicode normalisation: é can be represented as a single code point or as e + combining accent. Normalise to NFC or NFD before comparing or validating.
  • URL encoding: %2e%2e%2f decodes to ../. Decode before checking for path traversal.
  • Case normalisation: if you reserve or block the value admin, an attacker might try Admin, ADMIN, or aDmIn. Normalise case before comparison.

Validate after canonicalisation, not before.
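The three normalisation steps can be sketched with the standard library. The function names are illustrative, and rejecting double-encoded input is an assumed policy that may not suit every application.

```python
import unicodedata
from urllib.parse import unquote


def canonicalise_name(raw: str) -> str:
    """Apply Unicode NFC normalisation, then case folding, before comparing."""
    return unicodedata.normalize("NFC", raw).casefold()


def decode_path_once(raw: str) -> str:
    """URL-decode once; treat input that is still encoded as suspicious."""
    decoded = unquote(raw)
    if unquote(decoded) != decoded:
        # Assumed policy: double-encoded input is rejected, not decoded again.
        raise ValueError("nested URL encoding is not allowed")
    return decoded


def is_safe_relative_path(raw: str) -> bool:
    """Validate only after decoding, so an encoded ../ cannot slip through."""
    try:
        decoded = decode_path_once(raw)
    except ValueError:
        return False
    return ".." not in decoded.split("/") and not decoded.startswith("/")


print(canonicalise_name("Cafe\u0301") == canonicalise_name("CAF\u00c9"))
print(is_safe_relative_path("%2e%2e%2fetc/passwd"))
```

Checking the raw string "%2e%2e%2fetc/passwd" for ".." would find nothing, which is why the traversal check runs on the decoded value.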

GraphQL-specific validation

GraphQL introduces additional validation challenges:

  • Query depth limits: reject deeply nested queries, which can multiply resolver calls and database joins at every level
  • Query complexity limits: assign a cost to each field and reject queries exceeding a threshold
  • Introspection control: disable introspection in production to make schema enumeration harder; treat this as hardening, not access control
  • Input type validation: GraphQL has a type system; use it. Define strict input types with field-level validation.
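As a rough illustration of a depth limit, the sketch below measures brace nesting in the raw query text. It is a deliberate simplification: it does not parse GraphQL, it counts input-object braces in arguments as nesting, and it does not follow fragment spreads. A production server would more likely enforce the limit on the parsed query using its GraphQL library. MAX_DEPTH and the function names are assumptions.

```python
MAX_DEPTH = 4  # assumed limit for this sketch


def max_brace_depth(query: str) -> int:
    """Return the deepest level of { } nesting, ignoring string literals."""
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


def check_depth(query: str) -> None:
    """Raise ValueError when the query nests deeper than MAX_DEPTH."""
    if max_brace_depth(query) > MAX_DEPTH:
        raise ValueError("query is nested too deeply")


nested = "{ user { posts { comments { author { name } } } } }"
print(max_brace_depth(nested))
```

Rejecting the query before execution means the expensive nested resolvers never run.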

Summary

Input validation is a supporting defence — necessary but not sufficient on its own. Prefer allowlists over denylists. Enforce types, lengths, formats, and ranges. Use schema validation libraries (Zod, Pydantic, JSON Schema) to declare and enforce structure in one place. Reject unknown fields to prevent mass assignment. Canonicalise input before validating. Always validate on the server, regardless of what the client does.

