Trust Boundaries and Data Flow

Trace almost any vulnerability back to its root and you find the same thing: data crossed a line where the trust level changed, and nobody re-checked it on the other side. Learn to see those lines and you start spotting bugs before they're written.

What a trust boundary is

A trust boundary is any point where the level of trust changes. The classic ones:

Browser to server. The user controls the browser, so everything arriving from the client is untrusted, full stop.
Server to database. You trust the database, but what if an attacker already wrote to it through some other path?
Your service to another service. Does the downstream service validate, or does it assume you already did?
Your code to a third-party library. You trust it to behave. What happens when it has a vulnerability?
One user to another. User A's data should never leak into User B's session.

Following the data

Pick any user action and follow the data through its life:

Where it originates: user input, an API, a job, a database read.
How it travels: HTTP, a queue, an internal API, a WebSocket.
Where it rests: a database, a cache, object storage, a log.
How it's used: rendered as HTML, run as SQL, sent as email, baked into a PDF, passed to a shell.

At every step, ask one question: has this data been made safe for the context it's about to enter?

The same data is dangerous in different places

A value that's harmless in one context is an exploit in another, which is why "just sanitise all input" never quite works.

O'Brien is fine in HTML once encoded, but its apostrophe breaks your SQL the moment you build the query by string concatenation.
<b>bold</b> is noise in a log file and a problem the second it's rendered as HTML.
javascript:alert(1) sits harmlessly in storage and turns into code inside an href.

There's no universal "clean" version of a string. You encode for the specific place it's going.
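As a rough illustration of encoding per destination, the sketch below handles the three example values with Python's standard library. The helper names (for_html_text, for_href, insert_name), the people table, and the http/https allowlist are assumptions made for the example, not part of any particular framework.

```python
import html
import sqlite3
from urllib.parse import urlsplit

# Assumption for this sketch: only absolute web links are acceptable in an href.
ALLOWED_SCHEMES = {"http", "https"}


def for_html_text(value: str) -> str:
    """Encode a value for HTML text or a quoted attribute."""
    return html.escape(value, quote=True)


def for_href(value: str) -> str:
    """Return the URL encoded for an href, or '#' if its scheme is not allowed."""
    candidate = value.strip()
    try:
        scheme = urlsplit(candidate).scheme.lower()
    except ValueError:
        return "#"  # unparseable input is treated as unsafe
    if scheme not in ALLOWED_SCHEMES:
        return "#"  # rejects javascript:, data:, and (in this sketch) relative URLs
    return html.escape(candidate, quote=True)


def insert_name(conn: sqlite3.Connection, name: str) -> None:
    """Pass the value as a bound parameter; it is never spliced into the SQL text."""
    conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
```

The same string goes through a different function depending on where it lands: O'Brien is stored unchanged by insert_name, and only encoded when for_html_text prepares it for a page.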

Drawing it out

A data flow diagram makes the boundaries visible. The notation is simple:

External entities (users, third parties) are rectangles.
Processes (your code, your APIs) are circles.
Data stores (databases, caches) sit between parallel lines.
Data flows are arrows.
Trust boundaries are dashed lines cutting across those arrows.

Here's a comment system reduced to its flow:

[User Browser] --HTTP POST--> [API Server] --SQL INSERT--> [Database]
                                                               |
[Other Users] <--HTML Response-- [API Server] <--SQL SELECT----+

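Three controls keep that safe: input validation at the API, parameterised queries at the database, and output encoding at render time. Each one covers a different failure. Build the INSERT by concatenation and you have SQL injection; skip the output encoding and you have a stored XSS that fires at every other user. Input validation narrows what gets in, but it can't stand in for either of the other two.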
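The comment flow and its three controls can be sketched in a few functions. This is a minimal example using sqlite3 and html.escape from the standard library; the comments table, the function names, and the 2000-character limit are hypothetical choices for illustration.

```python
import html
import sqlite3

MAX_COMMENT_LENGTH = 2000  # hypothetical limit chosen for this sketch


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical comments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)"
    )
    return conn


def validate_comment(body: object) -> str:
    """Control 1: input validation where the request crosses into the API."""
    if not isinstance(body, str):
        raise ValueError("comment must be text")
    body = body.strip()
    if not body or len(body) > MAX_COMMENT_LENGTH:
        raise ValueError(f"comment must be 1 to {MAX_COMMENT_LENGTH} characters")
    return body


def store_comment(conn: sqlite3.Connection, author_id: int, body: str) -> None:
    """Control 2: parameterised query where the data crosses into the database."""
    conn.execute(
        "INSERT INTO comments (author_id, body) VALUES (?, ?)", (author_id, body)
    )
    conn.commit()


def render_comments(conn: sqlite3.Connection) -> str:
    """Control 3: output encoding where stored data crosses into another user's page."""
    rows = conn.execute("SELECT body FROM comments ORDER BY id").fetchall()
    items = "".join(f"<li>{html.escape(body)}</li>" for (body,) in rows)
    return f"<ul>{items}</ul>"
```

Note that validate_comment happily accepts a comment containing a script tag. It is render_comments that neutralises it, which is why the encoding step cannot be dropped even when validation is in place.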

The assumptions that bite

Trusting your own database. If an attacker planted malicious data through another route, every page that renders it without encoding becomes a vulnerability.

Trusting internal services. "It's internal, it doesn't need auth" means that one compromised machine reaches every unauthenticated API you have.
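One way to stop treating the network as the credential is to have every internal caller sign its requests. The sketch below uses an HMAC over the request with a shared secret; the message layout, the function names, and the five-minute clock-skew window are assumptions for illustration, and a real deployment might prefer mutual TLS or signed tokens.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # hypothetical replay window for this sketch


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over method, path, timestamp, and body."""
    message = b"\n".join(
        [method.upper().encode(), path.encode(), str(timestamp).encode(), body]
    )
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(
    secret: bytes,
    method: str,
    path: str,
    body: bytes,
    timestamp: int,
    signature: str,
    now: Optional[int] = None,
) -> bool:
    """Accept the request only if it is recent and the signature matches."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future: treat as a replay
    expected = sign_request(secret, method, path, body, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A service that calls verify_request on every inbound request no longer accepts traffic simply because it came from inside the network.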

Trusting client-side validation. HTML5 constraints, JavaScript checks, and disabled buttons are user-experience features. They are not security.

Trusting file metadata. A file called photo.jpg claiming to be image/jpeg might be HTML. Validate the actual bytes.
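Checking the actual bytes can start with the file's leading signature. The sketch below recognises only JPEG and PNG by their well-known magic numbers; the function names and the two-entry allowlist are assumptions for the example. A signature match is a first filter rather than proof, since a crafted file can begin with valid magic bytes, so re-encoding the image on the server is a stronger follow-up.

```python
from typing import Optional

# Leading byte signatures for the formats this sketch accepts.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unrecognised."""
    for mime, magic in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def accept_upload(filename: str, claimed_type: str, data: bytes) -> str:
    """Ignore the filename, and reject content that does not match its claimed type."""
    actual = sniff_image_type(data)
    if actual is None:
        raise ValueError("content is not a recognised image")
    if actual != claimed_type:
        raise ValueError(f"claimed {claimed_type} but content looks like {actual}")
    return actual
```

The filename parameter is deliberately unused: photo.jpg tells you what the client wants you to believe, not what the file is.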

Knowing what you're protecting

How hard you defend data depends on what it is, so classify it and let that drive the controls:

Classification	Examples	Handling
Public	Marketing, docs	Minimal
Internal	Dashboards, team data	Auth required
Confidential	PII, emails, usernames	Encryption, access control, audit
Restricted	Passwords, keys, financial, health	Encryption in transit and at rest, strict access, minimal retention
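The table can be carried into code so that handling is looked up rather than remembered. This is one possible sketch: the Handling flags are a simplified reading of the table (audit logging for Restricted data is an added assumption), and the field names in FIELD_CLASSIFICATION are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass(frozen=True)
class Handling:
    auth_required: bool
    encrypt_at_rest: bool
    audit_access: bool
    minimal_retention: bool


# Simplified reading of the table; audit for RESTRICTED is an assumption.
POLICY = {
    Classification.PUBLIC: Handling(False, False, False, False),
    Classification.INTERNAL: Handling(True, False, False, False),
    Classification.CONFIDENTIAL: Handling(True, True, True, False),
    Classification.RESTRICTED: Handling(True, True, True, True),
}

# Hypothetical field names, for illustration only.
FIELD_CLASSIFICATION = {
    "blog_post": Classification.PUBLIC,
    "team_dashboard": Classification.INTERNAL,
    "email": Classification.CONFIDENTIAL,
    "password_hash": Classification.RESTRICTED,
}


def handling_for(field: str) -> Handling:
    """Look up handling for a field; unclassified fields get the strictest policy."""
    classification = FIELD_CLASSIFICATION.get(field, Classification.RESTRICTED)
    return POLICY[classification]
```

Defaulting unknown fields to Restricted means a forgotten classification fails safe instead of quietly being treated as public.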

Get the classification right and the proportionate security decisions tend to follow. Trace the flow, find the boundaries, and validate or encode for the context on the far side of each one, including the boundary into your own database.