JustAppSec

Trust Boundaries and Data Flow

Where data crosses a trust boundary, security decisions need to happen.


Many security vulnerabilities, from SQL injection to cross-site scripting, are trust boundary violations: data crosses from an untrusted zone into a trusted one without proper validation. This lesson teaches you to identify trust boundaries and trace data flow so you can spot where defences are needed.

What is a trust boundary?

A trust boundary is any point where the level of trust changes. The most obvious one is the boundary between the internet and your server — you do not trust incoming HTTP requests. But there are many others:

  • Browser → Server — the user controls the browser. Everything from the client is untrusted.
  • Server → Database — the application trusts data in the database, but what if an attacker already wrote data there via a different path?
  • Your service → Another microservice — does the downstream service validate its own inputs, or does it trust your service to have already done so?
  • Your code → A third-party library — you trust the library to behave correctly. What if it has a vulnerability?
  • User A → User B — in multi-tenant systems, one user's data should never leak to or influence another user's experience.

Every time data crosses a trust boundary, it needs to be validated, sanitised, or encoded for the target context.
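As a minimal sketch of validation at the browser → server boundary, the Python function below checks a username against an allowlist before the rest of the application touches it. The names `validate_username` and `ValidationError`, and the 3 to 32 character policy, are hypothetical choices for illustration, not a recommended standard. It deliberately accepts `O'Brien`: validation decides what is allowed in, but it does not make the value safe for every context it enters later.

```python
import re

# Hypothetical policy for this sketch: 3 to 32 characters drawn from
# letters, digits, underscore, dot, apostrophe and hyphen.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.'-]{3,32}")


class ValidationError(ValueError):
    """Raised when untrusted input fails validation at a trust boundary."""


def validate_username(raw: object) -> str:
    """Validate a username received from the browser (an untrusted zone)."""
    if not isinstance(raw, str):
        raise ValidationError("username must be a string")
    # fullmatch requires the whole string to match the allowlist, so
    # trailing newlines and markup characters are rejected.
    if USERNAME_PATTERN.fullmatch(raw) is None:
        raise ValidationError("username has an invalid length or character")
    return raw


# Usage: call this as the first step of the request handler.
print(validate_username("O'Brien"))  # O'Brien
```

An allowlist (describe what is permitted) is generally easier to get right than a denylist of known-bad characters, which tends to miss cases.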

Tracing data flow

To understand where trust boundaries exist, trace how data moves through your system. Pick any user action — submitting a form, requesting a page, uploading a file — and follow the data:

  1. Where does the data originate? (User input, third-party API, scheduled job, database read)
  2. Where does it travel? (HTTP request, message queue, internal API call, WebSocket)
  3. Where is it stored? (Database, cache, object storage, log file)
  4. Where is it rendered or used? (HTML template, SQL query, email body, PDF generator, shell command)

At every step, ask: has this data been validated for the context it is entering?

Context matters

The same piece of data can be safe in one context and dangerous in another:

  • A username like O'Brien is fine in an HTML page (with proper encoding) but will break a SQL query (without parameterisation).
  • A string like <b>bold</b> is harmless in a log file but dangerous if rendered as HTML in a browser.
  • A URL like javascript:alert(1) is fine stored as text but dangerous if placed in an href attribute.

This is why generic "sanitise all input" approaches fail. You need output encoding for the specific context where data is used, not just input validation at the boundary.
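To make "encode for the specific context" concrete, here is a small Python sketch with one encoder per output context, using only the standard library's `html.escape` and `urllib.parse.urlsplit`. The function names and the http/https-only link policy are assumptions for illustration. The SQL case is deliberately absent: it needs parameterised queries rather than an escaping function.

```python
import html
from urllib.parse import urlsplit

# Assumption for this sketch: only ordinary web links may appear in href.
ALLOWED_LINK_SCHEMES = {"http", "https"}


def encode_for_html(value: str) -> str:
    """HTML text or quoted attribute context: escape & < > " and '."""
    return html.escape(value, quote=True)


def encode_for_href(url: str) -> str:
    """href context: check the URL scheme first, then attribute-encode."""
    candidate = url.strip()
    try:
        scheme = urlsplit(candidate).scheme.lower()
    except ValueError:  # e.g. a malformed IPv6 host
        return "#"
    if scheme not in ALLOWED_LINK_SCHEMES:
        # javascript:, data:, relative URLs and anything else fall back
        # to a harmless placeholder.
        return "#"
    return html.escape(candidate, quote=True)


print(encode_for_html("<b>bold</b>"))          # &lt;b&gt;bold&lt;/b&gt;
print(encode_for_href("javascript:alert(1)"))  # #
```

HTML escaping alone would not have helped in the href case: `javascript:alert(1)` contains no characters that `html.escape` changes, which is why the scheme check comes first.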

Data flow diagrams

A data flow diagram (DFD) is the simplest tool for visualising trust boundaries. The notation is straightforward:

  • External entities (users, third-party services) — rectangles
  • Processes (your code, APIs, workers) — circles
  • Data stores (databases, caches, file systems) — parallel lines
  • Data flows (HTTP, SQL, gRPC, etc.) — arrows
  • Trust boundaries — dashed lines separating zones of different trust

You do not need formal diagramming tools. A whiteboard sketch or a quick diagram in a shared doc is enough. The act of drawing it is what surfaces the assumptions.

Example: user comment system

[User Browser] --HTTP POST--> [API Server] --SQL INSERT--> [Database]
                                                              |
[Other Users' Browsers] <--HTML Response-- [API Server] <--SQL SELECT--+

Trust boundaries:

  • Between the user's browser and the API server (the user is untrusted)
  • Between the API server and the database (queries that carry user data must be parameterised; data read back must be encoded before it is rendered)
  • Between the API server and other users' browsers (stored data from one user must not execute as code in another user's browser)
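A data flow diagram can also be kept as data, which makes it easy to list every flow that crosses a boundary. This Python sketch models the comment system from the diagram above; the `Element` and `Flow` classes and the zone labels ("internet", "application", "data") are hypothetical names chosen for illustration, not a standard DFD format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    zone: str  # trust zone label chosen for this sketch


@dataclass(frozen=True)
class Flow:
    source: Element
    target: Element
    channel: str


def boundary_crossings(flows: list[Flow]) -> list[Flow]:
    """Return every flow whose two ends sit in different trust zones."""
    return [f for f in flows if f.source.zone != f.target.zone]


# The comment system, with hypothetical zone names.
author = Element("User Browser", "internet")
readers = Element("Other Users' Browsers", "internet")
api = Element("API Server", "application")
db = Element("Database", "data")

flows = [
    Flow(author, api, "HTTP POST"),
    Flow(api, db, "SQL INSERT"),
    Flow(db, api, "SQL SELECT"),
    Flow(api, readers, "HTML Response"),
]

for f in boundary_crossings(flows):
    print(f"{f.source.name} -> {f.target.name} ({f.channel}): needs a control")
```

In this small system every flow crosses a zone, which is exactly why each one needs its own control.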

Each boundary needs a control:

  • Input validation at the API layer
  • Parameterised queries at the database layer
  • Output encoding at the rendering layer

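If any one of these is missing, you have a gap. Without parameterised queries the database layer is open to SQL injection, and without output encoding a stored comment can run as script in other users' browsers. Input validation narrows what reaches those layers, but it cannot replace either control.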
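The three controls can be sketched end to end in Python with the standard library's `sqlite3` and `html` modules. The table schema, the function names and the 2000-character limit are hypothetical. The validation step intentionally lets `<script>` through: stopping XSS is the job of the output encoder, and the encoder is applied to every row read back, whatever path wrote it.

```python
import html
import sqlite3

MAX_COMMENT_LENGTH = 2000  # hypothetical limit for this sketch


def validate_comment(body: object) -> str:
    """Control 1: input validation at the API layer."""
    if not isinstance(body, str) or not body.strip():
        raise ValueError("comment must be a non-empty string")
    if len(body) > MAX_COMMENT_LENGTH:
        raise ValueError("comment is too long")
    return body


def save_comment(conn: sqlite3.Connection, author: str, body: str) -> None:
    """Control 2: a parameterised query at the database layer.

    The author is assumed to come from the authenticated session,
    not from the request body.
    """
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)",
        (author, validate_comment(body)),
    )


def render_comments(conn: sqlite3.Connection) -> str:
    """Control 3: output encoding at the rendering layer.

    Every value read back is escaped, because rows in the database may have
    been written by an attacker through some other path.
    """
    rows = conn.execute("SELECT author, body FROM comments ORDER BY rowid")
    items = [
        f"<li><strong>{html.escape(author)}</strong>: {html.escape(body)}</li>"
        for author, body in rows
    ]
    return "<ul>" + "".join(items) + "</ul>"


# Usage with a throwaway in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")
save_comment(conn, "O'Brien", "<script>alert(1)</script>")
print(render_comments(conn))
```

The apostrophe in `O'Brien` reaches the database intact because the `?` placeholders keep data separate from the SQL text, and the stored script payload is rendered as inert text.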

Common trust boundary mistakes

Trusting data from your own database

A frequent mistake: developers treat database reads as "safe" because the data is "internal." But if an attacker managed to write malicious data into the database (via a different endpoint, a direct database compromise, or any other write path you did not anticipate), every page that renders that data without encoding is now vulnerable to stored XSS.

Rule: encode data for the output context regardless of where it came from.

Trusting requests from internal services

Microservice A calls Microservice B. "It's internal, so we don't need auth." This is a dangerous assumption. If any machine in the internal network is compromised, all unauthenticated internal APIs are immediately accessible. Always authenticate service-to-service calls, even in private networks.
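Mutual TLS and signed tokens are common ways to authenticate service-to-service calls. As one minimal alternative that fits in the standard library, this Python sketch signs each internal request with an HMAC-SHA256 over a shared secret and a timestamp. The secret, the five-minute window and the function names are hypothetical. The timestamp window narrows replay attacks but does not eliminate them; that would also need a store of recently seen requests.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values for this sketch. Load real secrets from a secrets
# manager and rotate them; never hard-code them.
SHARED_SECRET = b"example-secret-for-illustration-only"
MAX_SKEW_SECONDS = 300


def sign_request(method: str, path: str, body: bytes, timestamp: int) -> str:
    """Caller side: HMAC-SHA256 over the parts of the request that matter."""
    message = b"\n".join(
        [method.upper().encode(), path.encode(), str(timestamp).encode(), body]
    )
    return hmac.new(SHARED_SECRET, message, hashlib.sha256).hexdigest()


def verify_request(
    method: str,
    path: str,
    body: bytes,
    timestamp: int,
    signature: str,
    now: Optional[int] = None,
) -> bool:
    """Receiver side: reject stale or forged requests, even on a private network."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False  # outside the allowed window
    expected = sign_request(method, path, body, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A compromised machine on the same network can still reach the receiving service, but without the secret it cannot produce a valid signature.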

Trusting client-side validation

HTML5 form validation, JavaScript checks, and disabled buttons are user experience features, not security controls. They exist in the user's browser, which the user fully controls. Every validation that matters must be repeated on the server.
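For example, if a form declares `<input type="number" min="1" max="10">`, anyone can send `quantity=-5` directly with curl or an intercepting proxy. This Python sketch repeats the same rule on the server; the field name and limits are hypothetical.

```python
# Mirrors a hypothetical form field: <input type="number" min="1" max="10">
MIN_QUANTITY = 1
MAX_QUANTITY = 10


def parse_quantity(raw: str) -> int:
    """Server-side re-check of a rule the browser also enforces.

    The browser's min/max attributes can be bypassed, so the server
    applies the same limits itself.
    """
    if not isinstance(raw, str):
        raise ValueError("quantity must be submitted as a form string")
    try:
        quantity = int(raw)
    except ValueError:
        raise ValueError("quantity must be a whole number") from None
    if not MIN_QUANTITY <= quantity <= MAX_QUANTITY:
        raise ValueError(f"quantity must be between {MIN_QUANTITY} and {MAX_QUANTITY}")
    return quantity
```

Keeping the limits in named constants makes it easier to keep the client-side hint and the server-side rule in step.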

Trusting file metadata

A file uploaded as photo.jpg with a MIME type of image/jpeg might actually be an HTML file containing JavaScript. Trust the content, not the metadata. Validate the actual file content (magic bytes, re-encoding images, restricting content types at the server level).
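A minimal Python sketch of the magic-byte check is shown below. It ignores the filename, identifies the format from the file's leading bytes, and (as a strict policy chosen for this sketch) rejects uploads whose declared MIME type disagrees with the content. The accepted formats and function names are assumptions. A matching signature does not by itself prove a file is safe, because a file can be valid in two formats at once. That is why the re-encoding and server-level content type restrictions mentioned above remain worthwhile.

```python
from typing import Optional

# Leading "magic bytes" for the formats this sketch accepts.
SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Identify the format from the file's first bytes, ignoring its name."""
    for mime, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return mime
    return None


def check_upload(declared_mime: str, data: bytes) -> str:
    """Return the detected type, or raise if the content is unacceptable."""
    detected = sniff_image_type(data)
    if detected is None:
        raise ValueError("content is not an accepted image format")
    if detected != declared_mime:
        raise ValueError(f"declared {declared_mime} but content is {detected}")
    return detected
```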

Data classification

Not all data needs the same level of protection. Classify your data so you can prioritise defences:

Classification | Examples | Handling
Public | Marketing content, documentation | Minimal controls
Internal | Internal dashboards, team data | Authentication required
Confidential | PII, email addresses, usernames | Encryption, access controls, audit logging
Restricted | Passwords, API keys, financial data, health records | Encryption at rest and in transit, strict access controls, minimal retention

When you know what classification of data is flowing across each boundary, you can make proportionate security decisions.
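One way to apply the classification table to a data flow is sketched below in Python. It assumes a "high-water mark" rule, a common convention in which a flow is treated as sensitive as the most sensitive field it carries; the sign-up fields are a hypothetical example.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Ordered so that a larger value means more sensitive data."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Handling requirements copied from the classification table.
HANDLING = {
    Classification.PUBLIC: ["minimal controls"],
    Classification.INTERNAL: ["authentication required"],
    Classification.CONFIDENTIAL: ["encryption", "access controls", "audit logging"],
    Classification.RESTRICTED: [
        "encryption at rest and in transit",
        "strict access controls",
        "minimal retention",
    ],
}


def flow_classification(fields: dict) -> Classification:
    """High-water mark: a flow is as sensitive as its most sensitive field."""
    return max(fields.values(), default=Classification.PUBLIC)


# Hypothetical sign-up request crossing the browser -> server boundary.
signup_fields = {
    "username": Classification.CONFIDENTIAL,
    "email": Classification.CONFIDENTIAL,
    "password": Classification.RESTRICTED,
}
level = flow_classification(signup_fields)
print(level.name, HANDLING[level])
```

Because one password field lifts the whole request to Restricted, every boundary that request crosses inherits the Restricted handling rules.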

Summary

Trust boundaries exist wherever the level of trust changes — between users and servers, between services, between your code and its data stores. Trace how data flows through your system, identify every boundary, and ensure that data is validated or encoded appropriately for each context it enters. Treat every data source as potentially hostile, including your own database.


This training content is AI-assisted and reviewed by our team, but issues may be missed and best practices evolve rapidly. Send corrections to [email protected].