Security · Lesson 03
Securing APIs with Input Validation
Your API's attack surface is every byte it accepts. Client-controlled input is the most reliable path into a system, which is why injection attacks top vulnerability charts decade after decade. This lesson shows how to close that door.
By the end you'll be able to
- Explain why "never trust the client" is the foundational rule of input handling.
- Distinguish allow-list from deny-list validation and state which is stronger.
- Rewrite a concatenated SQL query as a parameterized query and explain why it prevents injection.
The foundational rule: never trust the client
An API must treat every byte from outside its trust boundary as potentially hostile — even if the sender is your own mobile app running your own JavaScript. Why? Because an attacker can bypass the client entirely and send raw HTTP requests using curl, Burp Suite, or a custom script. The client-side validation you wrote, the JavaScript that "prevents" bad input, the mobile app that "only" shows valid options — none of it exists from the server's perspective. The only validation that counts is validation on the server.
Think of it like a bank teller window. The rope maze and the numbered ticket system are conveniences for honest customers. A determined bad actor ignores both and walks straight to the counter. The teller — the server — must check ID regardless of how the person arrived.
Allow-list vs deny-list
There are two philosophies for deciding what input is acceptable:
| Approach | How it works | Strength |
|---|---|---|
| Allow-list (whitelist) | Define exactly what is valid; reject everything else | Strong — unknown inputs fail closed |
| Deny-list (blacklist) | Define what is forbidden; pass everything else | Weak — attackers find gaps in the list |
Allow-listing is almost always the right default. Specify the type, length, format, and allowed range for every field. If an input doesn't match, reject it with a 400. Do not try to sanitise it — sanitisation logic is where bypasses live.
# Allow-list examples — validate field by field
# Good: exact type + range check
age: integer, 0 ≤ age ≤ 150
# Good: constrained string with regex
username: string, matches /^[a-zA-Z0-9_]{3,32}$/"
# Good: enum for known values
status: one of ["active", "inactive", "pending"]
# Bad: blacklisting — an attacker encodes the blocked chars
comment: reject if contains "'" or "--" # ← bypassed with Unicode lookalikes
Injection attacks — the anatomy
Injection happens when unsanitised user input is interpreted as code or commands by a downstream system. The three most common flavours in API contexts:
| Type | Downstream system targeted | What the attacker injects |
|---|---|---|
| SQL injection | Relational database | SQL keywords / clauses that modify the intended query |
| NoSQL injection | MongoDB, Redis, etc. | Operator keys like $where, $gt to bypass filters |
| Command injection | OS shell (when an API runs system commands) | Shell metacharacters: ; | ` $() |
SQL injection — vulnerable vs fixed
Here is a login endpoint that concatenates user input into a SQL string, followed by how an attacker exploits it, followed by the correct parameterized fix:
// VULNERABLE — string concatenation (Node.js + mysql2 example)
const query = "SELECT * FROM users WHERE username='"
+ req.body.username
+ "' AND password='"
+ req.body.password + "'";
await db.query(query); // ← executes attacker-controlled SQL
// SAFE — parameterized query (? placeholders, values passed separately)
const query = "SELECT * FROM users WHERE username=? AND password=?";
await db.query(query, [req.body.username, req.body.password]);
// The DB driver treats ? values as data, never as SQL syntax.
// A username of "' OR '1'='1" matches no row — it is just a string.
Parameterized queries (also called prepared statements) work because the SQL structure is fixed when it is sent to the database. The user-supplied values fill placeholders after the database has already parsed the query. No matter what string the attacker puts in, the database can never be tricked into treating it as SQL syntax.
Validate type, size, and range — everywhere
Injection is the dramatic example, but most real-world input bugs are quieter. A JSON body that contains a 50 MB string where you expected 200 characters can exhaust memory. A negative integer where you expected a positive quantity can underflow account balances. An unexpected field in a POST body can silently overwrite a protected attribute (mass-assignment — see below). Validate all of these at the boundary:
# OpenAPI 3.1 allow-list inline — validate in the schema, before any code runs
requestBody:
content:
application/json:
schema:
type: object
required: [name, quantity]
additionalProperties: false # rejects unknown fields
properties:
name:
type: string
minLength: 1
maxLength: 120
pattern: '^[a-zA-Z0-9 \-]+$'
quantity:
type: integer
minimum: 1
maximum: 1000
Mass-assignment
Mass-assignment is what happens when you automatically bind all incoming JSON fields to a model object without checking which fields the caller is allowed to set. A simple example: your User model has an is_admin field. Your update endpoint is:
// UNSAFE — blindly bind all request fields to the user object
PATCH /v1/users/42
Body: { "email": "alice@example.com", "is_admin": true }
// The server does: user.update(request.body) ← promotes caller to admin
// SAFE — explicit allow-list of writable fields
const allowed = ['email', 'display_name'];
const updates = pick(req.body, allowed); // ignores is_admin entirely
user.update(updates);
Output encoding
Input validation stops bad data entering. Output encoding prevents stored bad data from being executed when it leaves. If you store a user-supplied string in a database and later return it inside an HTML response without encoding it, a stored XSS attack is possible. For JSON APIs the risk is lower (JSON is data, not markup), but any endpoint that renders HTML or serves files must encode output for the target context (HTML entities for HTML, URL encoding for URLs, etc.).
When asked "how do you prevent SQL injection?" a junior answer is "sanitize inputs." A senior answer is "use parameterized queries — sanitisation is error-prone because it operates on the attack vector itself. Parameterization removes the attack surface entirely: user input is always data, never parsed as SQL." Then add: "validate on allow-list before the query, and use a least-privilege DB user so a compromised query can only read what it needs."
Deny-list filtering as a primary defence. Blocking ' and -- seems like it prevents SQL injection. Attackers use URL encoding (%27), Unicode look-alikes (ʼ), double-encoding, and comment variants to bypass it. Every new bypass found in the wild means your deny-list needs updating — an arms race you will lose. Use parameterized queries instead; they eliminate the category, not individual characters.
Do validate at the API boundary using an allow-list schema (OpenAPI, Joi, Pydantic, Zod — pick your ecosystem's tool), use parameterized queries for every database call, and set additionalProperties: false in your request schema to block mass-assignment by default. Don't rely on client-side validation, deny-list filters, or "trusted" fields passed in query parameters.
Under the hood: how it actually works
String concatenation creates injection because the database's SQL parser sees user input as part of the SQL grammar — it has no way to distinguish "code" from "data" when they arrive as one string. The parser treats the entire concatenated string as a SQL statement, so any SQL syntax embedded in the user's value becomes part of the query's structure.
# Step 1 — The vulnerable server code
# Developer intends this query:
SELECT * FROM users WHERE id = '<user_input>'
# Server code (Node.js):
const query = "SELECT * FROM users WHERE id = '" + req.body.id + "'";
# Step 2 — Attacker sends id = ' OR '1'='1
# Resulting query string sent to DB:
SELECT * FROM users WHERE id = '' OR '1'='1'
# DB parser sees:
# WHERE clause: (id = '') OR ('1' = '1')
# '1'='1' is always TRUE
# → condition is TRUE for every row
# → returns all users
# Step 3 — More dangerous: UNION-based extraction
# Attacker sends id = ' UNION SELECT username, password, null FROM users --
# Resulting query:
SELECT * FROM users WHERE id = '' UNION SELECT username, password, null FROM users --'
# → dumps the entire users table including password hashes
# Step 4 — Even more dangerous: stacked queries (database-dependent)
# Attacker sends id = '; DROP TABLE orders; --
# Resulting query on MySQL with multi-statement enabled:
SELECT * FROM users WHERE id = ''; DROP TABLE orders; --'
# → first query returns empty, second drops the table
| Attacker input (id field) | Resulting WHERE clause | Effect |
|---|---|---|
' OR '1'='1 |
WHERE id='' OR '1'='1' |
Returns all rows — auth bypass |
' OR 1=1 LIMIT 1 -- |
WHERE id='' OR 1=1 LIMIT 1 |
Returns first row regardless of id — log in as first user |
' UNION SELECT table_name,null FROM information_schema.tables -- |
WHERE id='' UNION SELECT table_name... |
Reveals all table names in the database |
'; UPDATE users SET is_admin=1 WHERE id=42; -- |
Two statements: SELECT + UPDATE | Promotes user 42 to admin (if multi-statement allowed) |
Parameterized queries close this gap because the SQL driver sends the query template and the parameter values as separate protocol messages to the database. The database parses the SQL structure first, with the ? placeholder already understood as "a data slot, not SQL". When the value arrives, the parser has already finished — there is no grammar left to inject into.
How to debug & inspect it
Here is how to probe your own API for SQL injection and how to read the signs when something is wrong.
| Symptom | Cause | Fix |
|---|---|---|
| API returns a raw database error message (SQL syntax, ORA-, SQLSTATE) | Input containing SQL metacharacters (quotes, dashes) breaks string concatenation | Switch to parameterized queries; catch DB exceptions and return a generic 500 without the DB message |
Different response body/size for id=1 AND 1=1 vs id=1 AND 1=2 |
Boolean-based blind SQL injection — query structure is injectable | Parameterized queries; also validate that id is an integer before it reaches any query |
| Response contains data from unexpected tables or columns | UNION-based injection — attacker appended a second SELECT | Parameterized queries; use least-privilege DB user that can only SELECT from allowed tables |
| Rows that shouldn't be visible appear for certain users | BOLA + missing WHERE owner_id clause (different from injection but similar symptom) | Add AND owner_id = ? to every query scoped to a user; review all queries that use user-supplied IDs |
Log shows '; DROP TABLE or UNION SELECT in request parameters |
Active SQL injection attempt in logs | Block/rate-limit the IP; audit whether any request with those payloads returned a 200; confirm parameterized queries are in place |
- Search the codebase for string concatenation into queries:
grep -rn "f\"SELECT\|+ \"SELECT\|+ 'SELECT"— any hit is a candidate vulnerability. - Confirm every DB call uses
?/%s/$1placeholders, not f-strings or.format(). - Send a single quote (
') to every parameter that reaches a DB query — a 500 with a DB error means concatenation. - Verify the DB user has only SELECT/INSERT/UPDATE on the tables it needs — no DROP, no GRANT, no access to system tables.
- Ensure DB error messages are caught and replaced with a generic 500 response before reaching the client.
🧠 Quick check
1. Why is an allow-list stronger than a deny-list for validating API input?
A deny-list can only block known bad input — attackers find encoding tricks or novel patterns to slip through. An allow-list rejects everything outside the defined valid set, so unknown-bad inputs fail closed.
2. What makes parameterized queries effective against SQL injection?
With a parameterized query, the database receives the query structure first and compiles it. The user-supplied values are bound afterward as data — the parser has already finished, so injection is structurally impossible, not just unlikely.
3. A PATCH /users/42 endpoint automatically applies all fields from the request body to the user record. Why is this a security problem?
Mass-assignment lets a caller set fields they should not control — roles, billing status, internal flags. The fix is an explicit allow-list of writable fields, so unknown or protected fields are ignored regardless of what the client sends.
4. Which of these is the most dangerous input-handling approach?
Concatenating user input into a shell command enables command injection — the attacker uses ;, |, or $() to execute arbitrary OS commands. Options A and C are both valid allow-list approaches.
✍️ Exercise: spot and fix the injection (try before opening)
The following Python snippet is the handler for GET /v1/products?category=electronics. Identify the vulnerability, explain how it could be exploited, and rewrite it to be safe.
category = request.args.get('category')
result = db.execute(f"SELECT * FROM products WHERE category='{category}'")
Model answer:
Vulnerability: The category query parameter is interpolated directly into the SQL string. An attacker sends ?category=' OR '1'='1 to dump every product, or ?category='; DROP TABLE products; -- to attempt destructive commands.
# Safe version — parameterized query
category = request.args.get('category')
# Allow-list: only accept known category slugs
valid_categories = {'electronics', 'clothing', 'books'}
if category not in valid_categories:
return 400, {"error": "invalid category"}
result = db.execute(
"SELECT * FROM products WHERE category=%s",
(category,) # ← tuple of values; db driver handles escaping
)
Rubric: ✓ named the attack (SQL injection) ✓ showed a concrete exploit payload ✓ fixed with parameterized query ✓ added allow-list as a second layer ✓ did not rely solely on escaping.
Key takeaways
- Never trust the client — all validation that matters runs server-side; client-side checks are UX, not security.
- Allow-list (define what is valid and reject the rest) is stronger than deny-list (define what is forbidden).
- Parameterized queries eliminate SQL injection by separating SQL structure from user data — the database never parses user input as code.
- Validate type, length, and range for every input field; set
additionalProperties: falseto prevent mass-assignment. - Output encoding is the complement to input validation — it prevents stored data from executing when rendered.