Most application logs are written for debugging. They tell you what happened in the code, but not who did it, why it matters, or whether it was malicious. Detection engineering starts with logs that are structured, contextual, and actionable — designed from day one to answer security questions.
## What makes a log security-useful
A log line is useful for detection when it answers:
- Who — authenticated user, API key, session ID, source IP
- What — the action taken (login, file download, permission change)
- Where — which service, endpoint, resource
- When — UTC timestamp with millisecond precision
- Outcome — success or failure, with error codes
- Context — request ID, correlation ID, tenant ID
## Structured logging
Unstructured logs are nearly impossible to query at scale:
Bad — unstructured:

```
[2025-03-15 14:23:01] User admin logged in from 203.0.113.45
```

Good — structured JSON:

```json
{
  "timestamp": "2025-03-15T14:23:01.042Z",
  "level": "info",
  "event": "auth.login.success",
  "actor": {
    "userId": "u_8f3k2",
    "username": "admin",
    "ip": "203.0.113.45",
    "userAgent": "Mozilla/5.0..."
  },
  "metadata": {
    "mfaUsed": true,
    "method": "password",
    "sessionId": "sess_a1b2c3"
  },
  "service": "auth-api",
  "traceId": "abc-123-def"
}
```
## Event naming conventions
Use a consistent, hierarchical naming scheme:
| Event | Name |
|---|---|
| Successful login | auth.login.success |
| Failed login | auth.login.failure |
| Password reset requested | auth.password_reset.requested |
| Permission changed | authz.permission.changed |
| File uploaded | resource.file.uploaded |
| API key created | credential.api_key.created |
| Admin action | admin.user.disabled |
Consistent naming lets you write detection rules like `event:auth.login.failure AND count > 10 within 5m` without guessing field names.
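A naming scheme stays consistent only if it is enforced at the logging call site. A minimal sketch, assuming a three-part `domain.object.action` convention (the regex and `is_valid_event_name` helper are hypothetical):

```python
import re

# Hypothetical validator for a three-part "domain.object.action" scheme:
# lowercase words, optionally underscore-separated, exactly two dots.
EVENT_NAME = re.compile(r"^[a-z]+(_[a-z]+)*(\.[a-z]+(_[a-z]+)*){2}$")

def is_valid_event_name(name: str) -> bool:
    """Accept names like 'auth.login.failure'; reject free-form strings."""
    return bool(EVENT_NAME.fullmatch(name))

print(is_valid_event_name("auth.login.failure"))         # True
print(is_valid_event_name("auth.password_reset.requested"))  # True
print(is_valid_event_name("User admin logged in"))       # False
```

Wiring a check like this into the logging helper (rejecting or flagging bad names in development) keeps the event namespace queryable as the codebase grows.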
## What to log

### Always log these events
**Authentication events:**
- Login success / failure (with reason: wrong password, locked account, expired MFA)
- MFA enrolment and verification
- Password changes and resets
- Session creation, expiry, and explicit logout
- Token issuance and revocation
**Authorisation events:**
- Access denied (which resource, which permission was missing)
- Privilege escalation (role changes, permission grants)
- Admin actions (user management, configuration changes)
**Data access:**
- Access to sensitive resources (PII, financial data, secrets)
- Bulk data exports or downloads
- API access to restricted endpoints
**System events:**
- Configuration changes
- Deployment events
- Service-to-service authentication
- Rate limit triggers
- Input validation failures
### What NOT to log
- Passwords, tokens, API keys, or other credentials
- Full credit card numbers or bank account details
- Session tokens (log the session ID, not the token value)
- Personally identifiable information unless required and masked
- Health check pings (they create noise, not signal)
```js
// BAD — logs the actual password
logger.info('Login attempt', { username, password });

// GOOD — logs the event without secrets
logger.info({
  event: 'auth.login.attempt',
  actor: { username, ip: req.ip },
  metadata: { method: 'password' }
});
```
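The same rule can be enforced mechanically before anything reaches the logger. A minimal sketch of a pre-log scrubber (the `scrub` helper and `SECRET_KEYS` list are illustrative, shown here in Python):

```python
# Hypothetical pre-log scrubber: mask fields that must never be logged.
SECRET_KEYS = {"password", "token", "api_key", "authorization", "cookie"}

def scrub(payload: dict) -> dict:
    """Return a copy with secret values masked; recurse into nested dicts."""
    clean = {}
    for key, value in payload.items():
        if key.lower() in SECRET_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = scrub(value)
        else:
            clean[key] = value
    return clean

print(scrub({"username": "admin", "password": "hunter2"}))
# {'username': 'admin', 'password': '[REDACTED]'}
```

A denylist like this is a safety net, not a substitute for never passing secrets to the logger in the first place.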
## Detection rules
A detection rule defines a pattern of log events that indicates something worth investigating. Detection engineering is the practice of writing, testing, and maintaining these rules.
### Anatomy of a detection rule
```yaml
name: Brute force login attempt
description: Multiple failed logins to the same account from the same IP
severity: medium
query: |
  event:auth.login.failure
  | stats count by actor.username, actor.ip
  | where count > 10
timeframe: 5m
response:
  - alert: security-team
  - action: temporary-block-ip
```
### Common detection patterns
**Threshold-based:**

```
# More than 5 failed logins in 2 minutes
auth.login.failure | count by actor.username | where count > 5 within 2m
```
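The threshold pattern can be sketched as a sliding-window counter. The `ThresholdRule` class below is illustrative, keyed to the structured-log schema shown earlier, with timestamps as epoch seconds for brevity:

```python
from collections import defaultdict, deque

# Sketch of a threshold rule: more than `limit` auth.login.failure events
# per (username, ip) inside a sliding window of `window_s` seconds.
class ThresholdRule:
    def __init__(self, limit: int = 5, window_s: int = 120):
        self.limit = limit
        self.window_s = window_s
        self.buckets = defaultdict(deque)  # (username, ip) -> timestamps

    def check(self, event: dict) -> bool:
        """Return True when this event pushes its bucket over the limit."""
        if event.get("event") != "auth.login.failure":
            return False
        key = (event["actor"]["username"], event["actor"]["ip"])
        ts = event["timestamp"]
        bucket = self.buckets[key]
        bucket.append(ts)
        # Evict timestamps that have fallen out of the window
        while bucket and ts - bucket[0] > self.window_s:
            bucket.popleft()
        return len(bucket) > self.limit

rule = ThresholdRule()
hits = [rule.check({"event": "auth.login.failure", "timestamp": t,
                    "actor": {"username": "admin", "ip": "203.0.113.45"}})
        for t in range(6)]
print(hits)  # [False, False, False, False, False, True]
```

Grouping by both username and IP (as in the rule anatomy above) cuts false positives from shared accounts behind many addresses.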
**Anomaly-based:**

```
# Login from a country the user has never logged in from
auth.login.success WHERE actor.geo.country NOT IN user.historical_countries
```
**Sequence-based:**

```
# Failed login → successful login → immediate privilege escalation
auth.login.failure FOLLOWED BY auth.login.success FOLLOWED BY authz.permission.changed
WHERE actor.username is same
WITHIN 10m
```
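Sequence rules are essentially a per-actor state machine: each event either advances the actor to the next expected step or leaves the state unchanged. A sketch with illustrative field names and epoch-second timestamps (`detect_sequence` is hypothetical):

```python
# Expected sequence: failure -> success -> permission change, same username.
STEPS = ["auth.login.failure", "auth.login.success", "authz.permission.changed"]

def detect_sequence(events, window_s: int = 600) -> set:
    """Return usernames that completed the full sequence in order and in time."""
    state = {}    # username -> (index of next expected step, first-step timestamp)
    flagged = set()
    for e in sorted(events, key=lambda e: e["timestamp"]):
        user = e["actor"]["username"]
        step, started = state.get(user, (0, e["timestamp"]))
        if e["timestamp"] - started > window_s:
            step, started = 0, e["timestamp"]   # window expired, restart
        if e["event"] == STEPS[step]:
            if step == 0:
                started = e["timestamp"]
            step += 1
            if step == len(STEPS):
                flagged.add(user)
                step = 0
        state[user] = (step, started)
    return flagged

events = [
    {"event": "auth.login.failure", "timestamp": 0, "actor": {"username": "eve"}},
    {"event": "auth.login.success", "timestamp": 30, "actor": {"username": "eve"}},
    {"event": "authz.permission.changed", "timestamp": 60, "actor": {"username": "eve"}},
]
print(detect_sequence(events))  # {'eve'}
```

Real engines add out-of-order tolerance and partial-match expiry, but the per-actor state plus time window is the core of the pattern.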
**Absence-based:**

```
# Expected daily backup did not run
NOT event:system.backup.completed WITHIN 24h
```
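Absence rules invert the usual logic: the alert fires when an expected event does not appear. A sketch assuming epoch-second timestamps (`backup_missing` is a hypothetical helper):

```python
# Sketch of an absence rule: alert when no system.backup.completed event
# has been seen within the last 24 hours.
def backup_missing(events, now_s: float, max_age_s: int = 24 * 3600) -> bool:
    last = max((e["timestamp"] for e in events
                if e["event"] == "system.backup.completed"), default=None)
    return last is None or now_s - last > max_age_s

events = [{"event": "system.backup.completed", "timestamp": 1_000}]
print(backup_missing(events, now_s=1_000 + 25 * 3600))  # True  -> alert
print(backup_missing(events, now_s=1_000 + 3600))       # False -> healthy
```

Because absence checks need a scheduler rather than a stream trigger, they typically run as periodic queries against the aggregator.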
### MITRE ATT&CK mapping
Map detection rules to MITRE ATT&CK techniques to identify coverage gaps:
| Technique | Detection |
|---|---|
| T1110 — Brute Force | Failed login threshold |
| T1078 — Valid Accounts | Login from unusual location/time |
| T1098 — Account Manipulation | Permission change outside change window |
| T1530 — Data from Cloud Storage | Bulk download exceeding normal volume |
| T1190 — Exploit Public-Facing App | Input validation failures + error spikes |
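Once rules carry ATT&CK tags, the coverage-gap check itself can be automated. A sketch with hypothetical rule definitions:

```python
# Techniques this team has decided to track (from the table above).
TRACKED = {"T1110", "T1078", "T1098", "T1530", "T1190"}

# Hypothetical rule inventory; each rule lists the techniques it covers.
rules = [
    {"name": "Brute force login attempt", "attack": ["T1110"]},
    {"name": "Login from unusual country", "attack": ["T1078"]},
]

covered = {t for rule in rules for t in rule["attack"]}
gaps = sorted(TRACKED - covered)
print(gaps)  # ['T1098', 'T1190', 'T1530']
```

Running a check like this in CI keeps the coverage map honest as rules are added and retired.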
## Logging implementation

### Node.js (Pino)
```js
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level(label) { return { level: label }; }
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  redact: ['req.headers.authorization', 'req.headers.cookie']
});

// Security event helper
function logSecurityEvent(event, actor, metadata = {}) {
  logger.info({
    event,
    actor: {
      userId: actor.id,
      username: actor.username,
      ip: actor.ip,
      sessionId: actor.sessionId
    },
    metadata,
    service: process.env.SERVICE_NAME
  });
}

// Usage
logSecurityEvent('auth.login.success', {
  id: user.id,
  username: user.username,
  ip: req.ip,
  sessionId: req.session.id
}, { mfaUsed: true });
```
### Python (structlog)
```python
import structlog

# Render events as JSON with a UTC ISO-8601 timestamp
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),
    ]
)
logger = structlog.get_logger()

def log_security_event(event: str, actor: dict, **metadata):
    logger.info(
        event,
        actor=actor,
        metadata=metadata,
        service="auth-api",
    )

# `request` is the web framework's request object (e.g. Flask)
log_security_event(
    "auth.login.failure",
    actor={"user_id": "u_8f3k2", "ip": request.remote_addr},
    reason="invalid_password",
    attempt_count=3,
)
```
## Log pipeline architecture
```
Application → Log Shipper → Log Aggregator  → Detection Engine → Alerts
(stdout)      (Fluentd,     (Elasticsearch,   (Sigma rules,      (PagerDuty,
               Vector,       Loki,             custom rules)      Slack,
               Filebeat)     Datadog)                             SIEM)
```
### Key decisions
| Decision | Options |
|---|---|
| Log format | JSON (preferred), logfmt, CEF |
| Transport | stdout → sidecar (Kubernetes), file → shipper, direct API |
| Storage | Elasticsearch, Loki, Datadog, Splunk, CloudWatch |
| Retention | 30 days hot, 90 days warm, 1+ year cold (compliance dependent) |
| Detection | SIEM rules, Sigma, custom alerting pipelines |
## Sigma rules

Sigma is a vendor-agnostic format for detection rules that can be converted into the query languages of many SIEMs:
```yaml
title: Multiple Failed Logins Followed by Success
status: stable
logsource:
  category: authentication
detection:
  failed:
    event: auth.login.failure
  success:
    event: auth.login.success
  condition: failed | count() > 5 and success
  timeframe: 5m
level: medium
tags:
  - attack.credential_access
  - attack.t1110
```
## Testing detection rules
Detection rules are code. Test them like code.
- Unit test — feed known-malicious log patterns through the rule and confirm it fires
- Replay test — run the rule against historical logs to check for false positives
- Red team validation — simulate the attack and verify the rule detects it
- Noise assessment — run in alert-but-don't-page mode for a week before promoting to production
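The unit-test step might look like the sketch below: feed a known-malicious pattern through the rule logic and assert it fires, then feed a benign pattern and assert it stays quiet. `brute_force_fires` is a simplified stand-in for the real rule engine:

```python
# Simplified stand-in for the brute-force rule: more than `limit` failures
# within any `window_s`-second span (timestamps in epoch seconds).
def brute_force_fires(events, limit: int = 10, window_s: int = 300) -> bool:
    times = sorted(e["timestamp"] for e in events
                   if e["event"] == "auth.login.failure")
    for t in times:
        in_window = [x for x in times if t <= x <= t + window_s]
        if len(in_window) > limit:
            return True
    return False

def test_detects_rapid_failures():
    # Known-malicious pattern: 12 failures in 12 seconds
    attack = [{"event": "auth.login.failure", "timestamp": i} for i in range(12)]
    assert brute_force_fires(attack)

def test_ignores_slow_failures():
    # Benign pattern: 12 failures spread over 12 hours
    slow = [{"event": "auth.login.failure", "timestamp": i * 3600} for i in range(12)]
    assert not brute_force_fires(slow)

test_detects_rapid_failures()
test_ignores_slow_failures()
```

The replay and noise-assessment steps use the same assertions, just pointed at historical and live log streams instead of synthetic fixtures.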
## Summary
Write structured logs with consistent event names, actor context, and correlation IDs. Log authentication, authorisation, data access, and system events — but never credentials or raw PII. Build detection rules mapped to MITRE ATT&CK techniques and test them the same way you test application code. A log that nobody queries is just storage cost; a detection rule that nobody tunes is just noise.
