JustAppSec

Logging and Detection Engineering

Writing logs that actually help you find attackers — structured, contextual, actionable.

Most application logs are written for debugging. They tell you what happened in the code, but not who did it, why it matters, or whether it was malicious. Detection engineering starts with logs that are structured, contextual, and actionable — designed from day one to answer security questions.

What makes a log security-useful

A log line is useful for detection when it answers:

  1. Who — authenticated user, API key, session ID, source IP
  2. What — the action taken (login, file download, permission change)
  3. Where — which service, endpoint, resource
  4. When — UTC timestamp with millisecond precision
  5. Outcome — success or failure, with error codes
  6. Context — request ID, correlation ID, tenant ID

Structured logging

Unstructured logs are nearly impossible to query at scale:

# Bad — unstructured
[2025-03-15 14:23:01] User admin logged in from 203.0.113.45

# Good — structured JSON
{
  "timestamp": "2025-03-15T14:23:01.042Z",
  "level": "info",
  "event": "auth.login.success",
  "actor": {
    "userId": "u_8f3k2",
    "username": "admin",
    "ip": "203.0.113.45",
    "userAgent": "Mozilla/5.0..."
  },
  "metadata": {
    "mfaUsed": true,
    "method": "password",
    "sessionId": "sess_a1b2c3"
  },
  "service": "auth-api",
  "traceId": "abc-123-def"
}

Event naming conventions

Use a consistent, hierarchical naming scheme:

Event                       Name
Successful login            auth.login.success
Failed login                auth.login.failure
Password reset requested    auth.password_reset.requested
Permission changed          authz.permission.changed
File uploaded               resource.file.uploaded
API key created             credential.api_key.created
Admin action                admin.user.disabled

Consistent naming lets you write detection rules like event:auth.login.failure AND count > 10 within 5m without guessing field names.
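One lightweight way to keep the scheme consistent is to validate event names at log time. A minimal Python sketch, where the domain list and regex are illustrative choices for the taxonomy above, not a standard:

```python
import re

# Illustrative top-level domains for the naming scheme; adjust to your taxonomy.
ALLOWED_DOMAINS = {"auth", "authz", "resource", "credential", "admin", "system"}
EVENT_PATTERN = re.compile(r"^[a-z]+(\.[a-z_]+){1,3}$")

def validate_event_name(event: str) -> bool:
    """Check that an event name is lowercase, dot-delimited, and starts
    with a known domain, e.g. 'auth.login.failure'."""
    if not EVENT_PATTERN.match(event):
        return False
    return event.split(".")[0] in ALLOWED_DOMAINS
```

Wiring this check into the logging helper (failing loudly in development, counting violations in production) catches drift before it pollutes the detection rules.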

What to log

Always log these events

Authentication events:

  • Login success / failure (with reason: wrong password, locked account, expired MFA)
  • MFA enrolment and verification
  • Password changes and resets
  • Session creation, expiry, and explicit logout
  • Token issuance and revocation

Authorisation events:

  • Access denied (which resource, which permission was missing)
  • Privilege escalation (role changes, permission grants)
  • Admin actions (user management, configuration changes)

Data access:

  • Access to sensitive resources (PII, financial data, secrets)
  • Bulk data exports or downloads
  • API access to restricted endpoints

System events:

  • Configuration changes
  • Deployment events
  • Service-to-service authentication
  • Rate limit triggers
  • Input validation failures

What NOT to log

  • Passwords, tokens, API keys, or other credentials
  • Full credit card numbers or bank account details
  • Session tokens (log the session ID, not the token value)
  • Personally identifiable information unless required and masked
  • Health check pings (they create noise, not signal)

// BAD — logs the actual password
logger.info('Login attempt', { username, password });

// GOOD — logs the event without secrets
logger.info({
  event: 'auth.login.attempt',
  actor: { username, ip: req.ip },
  metadata: { method: 'password' }
});
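Beyond avoiding secrets in individual call sites, a recursive redaction pass can enforce the deny-list for anything that reaches the logger. A minimal Python sketch; the key names in `SENSITIVE_KEYS` are illustrative and should match your own field conventions:

```python
# Illustrative deny-list of field names whose values must never be logged.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret",
                  "authorization", "cookie", "card_number"}

def redact(value):
    """Recursively replace values of sensitive keys with a placeholder."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```

This complements, rather than replaces, library-level redaction such as Pino's `redact` option: the deny-list catches fields your path patterns miss.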

Detection rules

A detection rule defines a pattern of log events that indicates something worth investigating. Detection engineering is the practice of writing, testing, and maintaining these rules.

Anatomy of a detection rule

name: Brute force login attempt
description: Multiple failed logins to the same account from the same IP
severity: medium
query: |
  event:auth.login.failure
  | stats count by actor.username, actor.ip
  | where count > 10
timeframe: 5m
response:
  - alert: security-team
  - action: temporary-block-ip

Common detection patterns

Threshold-based:

# More than 5 failed logins in 2 minutes
auth.login.failure | count by actor.username | where count > 5 within 2m
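The same threshold logic can be sketched as an in-memory sliding-window counter. This is a toy illustration of what the SIEM query does, keyed on the `actor` fields from the examples above; a real deployment would evaluate this in the detection engine, not in application code:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=2)
THRESHOLD = 5

class ThresholdRule:
    """Fire when one (username, ip) pair exceeds THRESHOLD failures in WINDOW."""
    def __init__(self):
        self.events = defaultdict(deque)  # (username, ip) -> recent timestamps

    def ingest(self, event: dict) -> bool:
        if event.get("event") != "auth.login.failure":
            return False
        key = (event["actor"]["username"], event["actor"]["ip"])
        ts = datetime.fromisoformat(event["timestamp"])
        q = self.events[key]
        q.append(ts)
        # Drop timestamps that fell out of the sliding window.
        while q and ts - q[0] > WINDOW:
            q.popleft()
        return len(q) > THRESHOLD
```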

Anomaly-based:

# Login from a country the user has never logged in from
auth.login.success WHERE actor.geo.country NOT IN user.historical_countries

Sequence-based:

# Failed login → successful login → immediate privilege escalation
auth.login.failure FOLLOWED BY auth.login.success FOLLOWED BY authz.permission.changed
  WHERE actor.username is same
  WITHIN 10m
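Sequence matching amounts to a small per-user state machine. A minimal sketch of the pattern above, assuming the same event shape as earlier examples (this ignores out-of-order delivery, which a production engine must handle):

```python
from datetime import datetime, timedelta

SEQUENCE = ["auth.login.failure", "auth.login.success", "authz.permission.changed"]
WINDOW = timedelta(minutes=10)

class SequenceRule:
    """Fire when one user emits the events in SEQUENCE, in order, within WINDOW."""
    def __init__(self):
        self.state = {}  # username -> (next_index, first_event_timestamp)

    def ingest(self, event: dict) -> bool:
        user = event["actor"]["username"]
        ts = datetime.fromisoformat(event["timestamp"])
        idx, first_ts = self.state.get(user, (0, None))
        if first_ts is not None and ts - first_ts > WINDOW:
            idx, first_ts = 0, None  # sequence took too long; restart
        if event["event"] == SEQUENCE[idx]:
            if idx == 0:
                first_ts = ts
            idx += 1
            if idx == len(SEQUENCE):
                self.state[user] = (0, None)
                return True
        self.state[user] = (idx, first_ts)
        return False
```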

Absence-based:

# Expected daily backup did not run
NOT event:system.backup.completed WITHIN 24h

MITRE ATT&CK mapping

Map detection rules to MITRE ATT&CK techniques to identify coverage gaps:

Technique                           Detection
T1110 — Brute Force                 Failed login threshold
T1078 — Valid Accounts              Login from unusual location/time
T1098 — Account Manipulation        Permission change outside change window
T1530 — Data from Cloud Storage     Bulk download exceeding normal volume
T1190 — Exploit Public-Facing App   Input validation failures + error spikes

Logging implementation

Node.js (Pino)

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level(label) { return { level: label }; }
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  redact: ['req.headers.authorization', 'req.headers.cookie']
});

// Security event helper
function logSecurityEvent(event, actor, metadata = {}) {
  logger.info({
    event,
    actor: {
      userId: actor.id,
      username: actor.username,
      ip: actor.ip,
      sessionId: actor.sessionId
    },
    metadata,
    service: process.env.SERVICE_NAME
  });
}

// Usage
logSecurityEvent('auth.login.success', {
  id: user.id,
  username: user.username,
  ip: req.ip,
  sessionId: req.session.id
}, { mfaUsed: true });

Python (structlog)

import structlog

# Render events as JSON with a UTC timestamp so the pipeline can parse them.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()

def log_security_event(event: str, actor: dict, **metadata):
    logger.info(
        event,
        actor=actor,
        metadata=metadata,
        service="auth-api",
    )

# `request` here is your web framework's request object (e.g. flask.request).
log_security_event(
    "auth.login.failure",
    actor={"user_id": "u_8f3k2", "ip": request.remote_addr},
    reason="invalid_password",
    attempt_count=3,
)

Log pipeline architecture

Application → Log Shipper → Log Aggregator → Detection Engine → Alerts
  (stdout)     (Fluentd,      (Elasticsearch,   (Sigma rules,    (PagerDuty,
                Vector,         Loki,              custom rules)    Slack,
                Filebeat)       Datadog)                            SIEM)

Key decisions

Decision     Options
Log format   JSON (preferred), logfmt, CEF
Transport    stdout → sidecar (Kubernetes), file → shipper, direct API
Storage      Elasticsearch, Loki, Datadog, Splunk, CloudWatch
Retention    30 days hot, 90 days warm, 1+ year cold (compliance dependent)
Detection    SIEM rules, Sigma, custom alerting pipelines

Sigma rules

Sigma is a vendor-agnostic format for detection rules that can be converted to any SIEM query language:

title: Multiple Failed Logins Followed by Success
status: stable
logsource:
  category: authentication
detection:
  failed:
    event: auth.login.failure
  success:
    event: auth.login.success
  condition: failed | count() > 5 and success
  timeframe: 5m
level: medium
tags:
  - attack.credential_access
  - attack.t1110

Testing detection rules

Detection rules are code. Test them like code.

  1. Unit test — feed known-malicious log patterns through the rule and confirm it fires
  2. Replay test — run the rule against historical logs to check for false positives
  3. Red team validation — simulate the attack and verify the rule detects it
  4. Noise assessment — run in alert-but-don't-page mode for a week before promoting to production
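The first step, unit testing, can be an ordinary test file. A sketch using a toy threshold rule (the function and its batch-oriented shape are illustrative; timeframe handling is omitted for brevity):

```python
def brute_force_rule(events, threshold=10):
    """Toy rule: return each (username, ip) pair with more than
    `threshold` login failures in the batch."""
    counts = {}
    for e in events:
        if e["event"] == "auth.login.failure":
            key = (e["actor"]["username"], e["actor"]["ip"])
            counts[key] = counts.get(key, 0) + 1
    return {k for k, c in counts.items() if c > threshold}

def test_rule_fires_on_brute_force():
    # Known-malicious pattern: 11 failures from one IP against one account.
    attack = [{"event": "auth.login.failure",
               "actor": {"username": "admin", "ip": "203.0.113.45"}}] * 11
    assert brute_force_rule(attack) == {("admin", "203.0.113.45")}

def test_rule_ignores_single_failure():
    # Benign pattern: a single typo'd password must not fire.
    benign = [{"event": "auth.login.failure",
               "actor": {"username": "bob", "ip": "198.51.100.7"}}]
    assert brute_force_rule(benign) == set()
```

Replay and noise-assessment tests follow the same shape, but feed the rule exported historical logs instead of hand-built fixtures.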

Summary

Write structured logs with consistent event names, actor context, and correlation IDs. Log authentication, authorisation, data access, and system events — but never credentials or raw PII. Build detection rules mapped to MITRE ATT&CK techniques and test them the same way you test application code. A log that nobody queries is just storage cost; a detection rule that nobody tunes is just noise.


This training content is AI-assisted and reviewed by our team, but issues may be missed and best practices evolve rapidly. Send corrections to [email protected].