Security · Lesson 03

Securing APIs with Input Validation

Your API's attack surface is every byte it accepts. Client-controlled input is the most reliable path into a system, which is why injection attacks top vulnerability charts decade after decade. This lesson shows how to close that door.

⏱ 13 min Difficulty: core Prereq: OWASP overview (sec-01)

By the end you'll be able to

Explain why "never trust the client" is the foundational rule of input handling.
Distinguish allow-list from deny-list validation and state which is stronger.
Rewrite a concatenated SQL query as a parameterized query and explain why it prevents injection.

The foundational rule: never trust the client

An API must treat every byte from outside its trust boundary as potentially hostile — even if the sender is your own mobile app running your own JavaScript. Why? Because an attacker can bypass the client entirely and send raw HTTP requests using curl, Burp Suite, or a custom script. The client-side validation you wrote, the JavaScript that "prevents" bad input, the mobile app that "only" shows valid options — none of it exists from the server's perspective. The only validation that counts is validation on the server.

Think of it like a bank teller window. The rope maze and the numbered ticket system are conveniences for honest customers. A determined bad actor ignores both and walks straight to the counter. The teller — the server — must check ID regardless of how the person arrived.

Every caller — your own app or a raw script — meets the same server-side gate. Only input that matches the allow-list passes; the rest is rejected. Client-side checks don't exist from the server's view.

Allow-list vs deny-list

There are two philosophies for deciding what input is acceptable:

Approach	How it works	Strength
Allow-list (whitelist)	Define exactly what is valid; reject everything else	Strong — unknown inputs fail closed
Deny-list (blacklist)	Define what is forbidden; pass everything else	Weak — attackers find gaps in the list

Allow-listing is almost always the right default. Specify the type, length, format, and allowed range for every field. If an input doesn't match, reject it with a 400. Do not try to sanitise it — sanitisation logic is where bypasses live.

# Allow-list examples — validate field by field

# Good: exact type + range check
age: integer, 0 ≤ age ≤ 150

# Good: constrained string with regex
username: string, matches /^[a-zA-Z0-9_]{3,32}$/"

# Good: enum for known values
status: one of ["active", "inactive", "pending"]

# Bad: blacklisting — an attacker encodes the blocked chars
comment: reject if contains "'" or "--"    # ← bypassed with Unicode lookalikes

Injection attacks — the anatomy

Injection happens when unsanitised user input is interpreted as code or commands by a downstream system. The three most common flavours in API contexts:

Type	Downstream system targeted	What the attacker injects
SQL injection	Relational database	SQL keywords / clauses that modify the intended query
NoSQL injection	MongoDB, Redis, etc.	Operator keys like `$where`, `$gt` to bypass filters
Command injection	OS shell (when an API runs system commands)	Shell metacharacters: `;` `\|` ` `$()`

SQL injection — vulnerable vs fixed

Here is a login endpoint that concatenates user input into a SQL string, followed by how an attacker exploits it, followed by the correct parameterized fix:

# Attacker's HTTP request to the vulnerable endpoint: POST /v1/login HTTP/1.1 Content-Type: application/json { "username": "' OR '1'='1", "password": "anything" } # The server concatenates and executes: SELECT * FROM users WHERE username='' OR '1'='1' AND password='anything' # '1'='1' is always true → returns every user → attacker is logged in as user #1

// VULNERABLE — string concatenation (Node.js + mysql2 example)
const query = "SELECT * FROM users WHERE username='"
            + req.body.username
            + "' AND password='"
            + req.body.password + "'";
await db.query(query);   // ← executes attacker-controlled SQL

// SAFE — parameterized query (? placeholders, values passed separately)
const query = "SELECT * FROM users WHERE username=? AND password=?";
await db.query(query, [req.body.username, req.body.password]);
// The DB driver treats ? values as data, never as SQL syntax.
// A username of "' OR '1'='1" matches no row — it is just a string.

Parameterized queries (also called prepared statements) work because the SQL structure is fixed when it is sent to the database. The user-supplied values fill placeholders after the database has already parsed the query. No matter what string the attacker puts in, the database can never be tricked into treating it as SQL syntax.

Validate type, size, and range — everywhere

Injection is the dramatic example, but most real-world input bugs are quieter. A JSON body that contains a 50 MB string where you expected 200 characters can exhaust memory. A negative integer where you expected a positive quantity can underflow account balances. An unexpected field in a POST body can silently overwrite a protected attribute (mass-assignment — see below). Validate all of these at the boundary:

# OpenAPI 3.1 allow-list inline — validate in the schema, before any code runs
requestBody:
  content:
    application/json:
      schema:
        type: object
        required: [name, quantity]
        additionalProperties: false   # rejects unknown fields
        properties:
          name:
            type: string
            minLength: 1
            maxLength: 120
            pattern: '^[a-zA-Z0-9 \-]+$'
          quantity:
            type: integer
            minimum: 1
            maximum: 1000

Mass-assignment

Mass-assignment is what happens when you automatically bind all incoming JSON fields to a model object without checking which fields the caller is allowed to set. A simple example: your User model has an is_admin field. Your update endpoint is:

// UNSAFE — blindly bind all request fields to the user object
PATCH /v1/users/42
Body: { "email": "alice@example.com", "is_admin": true }

// The server does: user.update(request.body)   ← promotes caller to admin

// SAFE — explicit allow-list of writable fields
const allowed = ['email', 'display_name'];
const updates = pick(req.body, allowed);   // ignores is_admin entirely
user.update(updates);

Output encoding

Input validation stops bad data entering. Output encoding prevents stored bad data from being executed when it leaves. If you store a user-supplied string in a database and later return it inside an HTML response without encoding it, a stored XSS attack is possible. For JSON APIs the risk is lower (JSON is data, not markup), but any endpoint that renders HTML or serves files must encode output for the target context (HTML entities for HTML, URL encoding for URLs, etc.).

🎯 Interview angle

When asked "how do you prevent SQL injection?" a junior answer is "sanitize inputs." A senior answer is "use parameterized queries — sanitisation is error-prone because it operates on the attack vector itself. Parameterization removes the attack surface entirely: user input is always data, never parsed as SQL." Then add: "validate on allow-list before the query, and use a least-privilege DB user so a compromised query can only read what it needs."

⚠️ Common trap

Deny-list filtering as a primary defence. Blocking ' and -- seems like it prevents SQL injection. Attackers use URL encoding (%27), Unicode look-alikes (ʼ), double-encoding, and comment variants to bypass it. Every new bypass found in the wild means your deny-list needs updating — an arms race you will lose. Use parameterized queries instead; they eliminate the category, not individual characters.

✅ Do this, not that

Do validate at the API boundary using an allow-list schema (OpenAPI, Joi, Pydantic, Zod — pick your ecosystem's tool), use parameterized queries for every database call, and set additionalProperties: false in your request schema to block mass-assignment by default. Don't rely on client-side validation, deny-list filters, or "trusted" fields passed in query parameters.

Under the hood: how it actually works

String concatenation creates injection because the database's SQL parser sees user input as part of the SQL grammar — it has no way to distinguish "code" from "data" when they arrive as one string. The parser treats the entire concatenated string as a SQL statement, so any SQL syntax embedded in the user's value becomes part of the query's structure.

# Step 1 — The vulnerable server code

# Developer intends this query:
SELECT * FROM users WHERE id = '<user_input>'

# Server code (Node.js):
const query = "SELECT * FROM users WHERE id = '" + req.body.id + "'";


# Step 2 — Attacker sends id = ' OR '1'='1

# Resulting query string sent to DB:
SELECT * FROM users WHERE id = '' OR '1'='1'

# DB parser sees:
#   WHERE clause: (id = '') OR ('1' = '1')
#   '1'='1' is always TRUE
#   → condition is TRUE for every row
#   → returns all users


# Step 3 — More dangerous: UNION-based extraction

# Attacker sends id = ' UNION SELECT username, password, null FROM users --
# Resulting query:
SELECT * FROM users WHERE id = '' UNION SELECT username, password, null FROM users --'
# → dumps the entire users table including password hashes


# Step 4 — Even more dangerous: stacked queries (database-dependent)

# Attacker sends id = '; DROP TABLE orders; --
# Resulting query on MySQL with multi-statement enabled:
SELECT * FROM users WHERE id = ''; DROP TABLE orders; --'
# → first query returns empty, second drops the table

Attacker input (id field)	Resulting WHERE clause	Effect
`' OR '1'='1`	`WHERE id='' OR '1'='1'`	Returns all rows — auth bypass
`' OR 1=1 LIMIT 1 --`	`WHERE id='' OR 1=1 LIMIT 1`	Returns first row regardless of id — log in as first user
`' UNION SELECT table_name,null FROM information_schema.tables --`	`WHERE id='' UNION SELECT table_name...`	Reveals all table names in the database
`'; UPDATE users SET is_admin=1 WHERE id=42; --`	Two statements: SELECT + UPDATE	Promotes user 42 to admin (if multi-statement allowed)

Parameterized queries close this gap because the SQL driver sends the query template and the parameter values as separate protocol messages to the database. The database parses the SQL structure first, with the ? placeholder already understood as "a data slot, not SQL". When the value arrives, the parser has already finished — there is no grammar left to inject into.

# Parameterized query — what the driver sends to MySQL: [Prepare] SELECT * FROM users WHERE id = ? ↑ DB parses this SQL, compiles execution plan, id=? is a typed data slot [Execute] values: ["' OR '1'='1"] ↑ treated as the literal string "' OR '1'='1" — quotes are just characters ↑ no SQL parsing happens again; the value cannot escape its data slot # Result: 0 rows (no user whose id is literally the string "' OR '1'='1")

How to debug & inspect it

Here is how to probe your own API for SQL injection and how to read the signs when something is wrong.

# Basic probe: send a single quote and watch what happens $ curl -s "https://api.example.com/v1/products?id=1'" # Vulnerable response leaks a DB error: {"error":"You have an error in your SQL syntax near '1''' at line 1"} # Probe for boolean-based blind SQLi (no error, but behavior changes) $ curl -s "https://api.example.com/v1/products?id=1 AND 1=1" # should return normal response if vulnerable $ curl -s "https://api.example.com/v1/products?id=1 AND 1=2" # should return empty if vulnerable # Check if database errors are leaking in logs $ grep -i "sql\|syntax\|mysql\|postgres\|ora-" /var/log/app/errors.log | tail -20 # If you see raw DB errors, they're probably reaching the client too — double problem # sqlmap (automated SQL injection scanner) — test only on your own APIs in dev/staging $ sqlmap -u "https://staging.api.example.com/v1/products?id=1" --batch --level=2

Symptom	Cause	Fix
API returns a raw database error message (SQL syntax, ORA-, SQLSTATE)	Input containing SQL metacharacters (quotes, dashes) breaks string concatenation	Switch to parameterized queries; catch DB exceptions and return a generic 500 without the DB message
Different response body/size for `id=1 AND 1=1` vs `id=1 AND 1=2`	Boolean-based blind SQL injection — query structure is injectable	Parameterized queries; also validate that `id` is an integer before it reaches any query
Response contains data from unexpected tables or columns	UNION-based injection — attacker appended a second SELECT	Parameterized queries; use least-privilege DB user that can only SELECT from allowed tables
Rows that shouldn't be visible appear for certain users	BOLA + missing WHERE owner_id clause (different from injection but similar symptom)	Add `AND owner_id = ?` to every query scoped to a user; review all queries that use user-supplied IDs
Log shows `'; DROP TABLE` or `UNION SELECT` in request parameters	Active SQL injection attempt in logs	Block/rate-limit the IP; audit whether any request with those payloads returned a 200; confirm parameterized queries are in place

Search the codebase for string concatenation into queries: grep -rn "f\"SELECT\|+ \"SELECT\|+ 'SELECT" — any hit is a candidate vulnerability.
Confirm every DB call uses ? / %s / $1 placeholders, not f-strings or .format().
Send a single quote (') to every parameter that reaches a DB query — a 500 with a DB error means concatenation.
Verify the DB user has only SELECT/INSERT/UPDATE on the tables it needs — no DROP, no GRANT, no access to system tables.
Ensure DB error messages are caught and replaced with a generic 500 response before reaching the client.

🧠 Quick check

1. Why is an allow-list stronger than a deny-list for validating API input?

A deny-list can only block known bad input — attackers find encoding tricks or novel patterns to slip through. An allow-list rejects everything outside the defined valid set, so unknown-bad inputs fail closed.

2. What makes parameterized queries effective against SQL injection?

With a parameterized query, the database receives the query structure first and compiles it. The user-supplied values are bound afterward as data — the parser has already finished, so injection is structurally impossible, not just unlikely.

3. A PATCH /users/42 endpoint automatically applies all fields from the request body to the user record. Why is this a security problem?

Mass-assignment lets a caller set fields they should not control — roles, billing status, internal flags. The fix is an explicit allow-list of writable fields, so unknown or protected fields are ignored regardless of what the client sends.

4. Which of these is the most dangerous input-handling approach?

Concatenating user input into a shell command enables command injection — the attacker uses ;, |, or $() to execute arbitrary OS commands. Options A and C are both valid allow-list approaches.

✍️ Exercise: spot and fix the injection (try before opening)

The following Python snippet is the handler for GET /v1/products?category=electronics. Identify the vulnerability, explain how it could be exploited, and rewrite it to be safe.

category = request.args.get('category')
result = db.execute(f"SELECT * FROM products WHERE category='{category}'")

Model answer:

Vulnerability: The category query parameter is interpolated directly into the SQL string. An attacker sends ?category=' OR '1'='1 to dump every product, or ?category='; DROP TABLE products; -- to attempt destructive commands.

# Safe version — parameterized query
category = request.args.get('category')
# Allow-list: only accept known category slugs
valid_categories = {'electronics', 'clothing', 'books'}
if category not in valid_categories:
    return 400, {"error": "invalid category"}
result = db.execute(
    "SELECT * FROM products WHERE category=%s",
    (category,)   # ← tuple of values; db driver handles escaping
)

Rubric: ✓ named the attack (SQL injection) ✓ showed a concrete exploit payload ✓ fixed with parameterized query ✓ added allow-list as a second layer ✓ did not rely solely on escaping.

Key takeaways

Never trust the client — all validation that matters runs server-side; client-side checks are UX, not security.
Allow-list (define what is valid and reject the rest) is stronger than deny-list (define what is forbidden).
Parameterized queries eliminate SQL injection by separating SQL structure from user data — the database never parses user input as code.
Validate type, length, and range for every input field; set additionalProperties: false to prevent mass-assignment.
Output encoding is the complement to input validation — it prevents stored data from executing when rendered.