If you've ever wondered why `%20`, `&amp;`, and `aGVsbG8=` all show up in the same web request, you're not alone. These three encodings, URL (percent), HTML (entity), and Base64, solve different problems but get mixed up constantly. Reaching for the wrong one corrupts data, breaks rendering, or opens an XSS hole. Here's a short field guide.
The one-line definition of each
- URL encoding (also called percent-encoding) escapes characters that have special meaning in a URL (`/`, `?`, `&`, `#`, spaces) so they can appear inside a URL component as data.
- HTML encoding (also called entity encoding) escapes characters that have special meaning in HTML (`<`, `>`, `&`, `"`, `'`) so they render as literal text instead of being parsed as markup.
- Base64 encodes arbitrary bytes as printable ASCII so binary data can travel through text-only channels (JSON, email, headers).
Three problems, three encodings. They're not interchangeable.
When to use each
| Scenario | Encoding |
|---|---|
| Putting a user search term in a query string | URL |
| Showing user text inside HTML | HTML |
| Embedding an image inside a JSON payload | Base64 |
| Building a mailto: link with a subject line | URL |
| Displaying a `<` character in a paragraph | HTML |
| Sending a binary file in an SMTP email | Base64 |
| Passing user input through a redirect URL | URL (and the redirect should validate) |
| Storing a small icon inline in CSS | Base64 (as a data: URL) |
If a value crosses more than one boundary — say a user-typed string gets put in a URL and then rendered into HTML — you need to apply each encoding at its own boundary, in order. Decode after each one, encode again before the next.
URL encoding in detail
URL syntax reserves a handful of characters. : separates the scheme from the rest. / separates path segments. ? starts the query string. & separates query parameters. # starts the fragment. % itself is the escape introducer.
To put any reserved character inside a value — say a search term that contains & — you replace it with %XX, where XX is its byte value in hex. Spaces become %20. The é in café becomes %C3%A9 (UTF-8 byte pair). The 🎉 emoji becomes %F0%9F%8E%89 (UTF-8 four-byte sequence).
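As a quick check of the byte-level rule above, here is a small sketch using JavaScript's built-in `encodeURIComponent`, which percent-encodes the UTF-8 bytes of every character it escapes. The sample strings are only illustrative.

```javascript
// Percent-encoding works on UTF-8 bytes: each escaped byte becomes %XX.
const samples = [
  "a b",          // space
  "Tom & Jerry",  // reserved "&"
  "caf\u00e9",    // café: é is two UTF-8 bytes
  "\u{1F389}",    // party-popper emoji: four UTF-8 bytes
];

for (const value of samples) {
  const encoded = encodeURIComponent(value);
  // decodeURIComponent reverses the escape and restores the original string.
  console.log(`${value} -> ${encoded} -> ${decodeURIComponent(encoded)}`);
}
// a b -> a%20b -> a b
// Tom & Jerry -> Tom%20%26%20Jerry -> Tom & Jerry
// café -> caf%C3%A9 -> café
// 🎉 -> %F0%9F%8E%89 -> 🎉
```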
JavaScript has two relevant functions:
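- `encodeURIComponent(value)` escapes every character that has structural meaning in a URL, leaving only letters, digits, and `- _ . ! ~ * ' ( )` untouched. Use this for each individual query value or path segment.
- `encodeURI(url)` leaves URL structure intact (characters such as `:`, `/`, `?`, `&`, `=`, and `#` pass through) and only escapes characters that cannot appear in a URL at all, such as spaces. Use this for an entire URL where you want the result to still parse as a URL.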
The most common mistake is using encodeURI on a query value. It will leave & and = unescaped, so a value like Tom & Jerry will appear to your server as two separate query parameters. Always use encodeURIComponent per value.
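The difference is easy to see in a few lines. This is a minimal sketch in which `URLSearchParams` stands in for whatever query parser the server actually uses; the parameter name `q` is just an example.

```javascript
const value = "Tom & Jerry";

// encodeURI keeps "&" because it is legal URL structure, so the value splits.
const wrong = "q=" + encodeURI(value);          // q=Tom%20&%20Jerry
// encodeURIComponent escapes "&" as %26, so the value stays in one piece.
const right = "q=" + encodeURIComponent(value); // q=Tom%20%26%20Jerry

// URLSearchParams plays the role of the server-side query parser here.
const wrongParams = new URLSearchParams(wrong);
const rightParams = new URLSearchParams(right);

console.log([...wrongParams.keys()]); // two parameters: "q" and " Jerry"
console.log(wrongParams.get("q"));    // "Tom " (cut off at the &)
console.log(rightParams.get("q"));    // "Tom & Jerry"
```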
The URL Encoder supports both modes side by side, plus a query-string breakout that decodes each parameter into a table — handy when debugging a malformed URL.
HTML encoding in detail
HTML reserves a different set of characters. < and > delimit tags. & starts a character reference. Inside attribute values, " and ' delimit the attribute. To make any of these characters appear as literal text, you replace them with an entity:
| Character | Entity |
|---|---|
| `<` | `&lt;` |
| `>` | `&gt;` |
| `&` | `&amp;` |
| `"` | `&quot;` |
| `'` | `&#39;` |
The full rule for safe HTML output is more nuanced: different contexts (HTML body, attribute, URL inside `href`, JavaScript inside `<script>`, CSS inside `<style>`) need different escaping. The OWASP XSS cheat sheet has the complete list. For everyday work, the five entities above cover the common cases: text in element content and in quoted attribute values.
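The five replacements can be sketched as one small helper. The name `escapeHtml` is hypothetical, and using `&#39;` for the apostrophe is one common choice; in real projects a template engine or framework usually does this for you.

```javascript
// The five characters and their entities (&#39; is assumed for the apostrophe).
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape text for HTML element content or a quoted attribute value.
// A single pass avoids escaping the "&" of an entity that was just produced.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

This covers body text and quoted attributes only; the other contexts listed above (script, style, URLs) need their own escaping.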
The mistake here is getting the order wrong. If you receive HTML-encoded user input and want to put it into a URL, don't URL-encode the entity text as it stands (the server would receive a literal `&amp;`), and don't decode it and paste the raw result into the URL unescaped. The lifecycle is: decode the HTML entities once to recover the original Unicode, then URL-encode that for the URL boundary. Forgetting the URL encoding lets the value change the structure of the URL, which can lead to XSS via redirect URLs.
Base64 in detail
Base64 represents arbitrary bytes using only the characters A-Z, a-z, 0-9, +, and / (plus = for padding). Every three input bytes become four output characters, so the output is about 33% larger than the input.
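A short sketch of the three-bytes-to-four-characters rule, assuming a runtime that provides the global `btoa` and `TextEncoder` (browsers and recent Node.js versions). The helper names `bytesToBase64` and `base64Length` are made up for this example.

```javascript
// Encode raw bytes as standard Base64. btoa expects a "binary string"
// (one character per byte), so the bytes are converted first.
function bytesToBase64(bytes) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Padded output length: every group of up to 3 bytes yields 4 characters.
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

const hello = new TextEncoder().encode("hello"); // 5 bytes
console.log(bytesToBase64(hello));        // aGVsbG8=
console.log(base64Length(hello.length));  // 8
console.log(base64Length(3000));          // 4000
```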
Use it when you need to put bytes into a text channel. Three common cases:
- JSON. JSON has no binary type. To embed a small image or a certificate inside a JSON payload, Base64-encode it as a string.
- data: URLs. `data:image/png;base64,iVBORw0KG...` lets the browser render an inline image without a separate HTTP request. Useful for tiny icons, not for hero images (33% bloat, no caching).
- JSON Web Tokens. A JWT is three Base64URL-encoded segments separated by dots.
Base64 has a URL-safe variant (- and _ instead of + and /, padding optionally dropped). Use it any time the encoded value touches a URL, a JWT, or a filename. The Base64 Encoder supports both standard and URL-safe modes.
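Converting between the two alphabets is a matter of character substitution. The sketch below assumes a global `btoa`; `toBase64Url` and `fromBase64Url` are hypothetical helper names, and the two sample bytes were chosen because they produce both `+` and `/`.

```javascript
// Convert standard Base64 to the URL-safe alphabet and drop the padding.
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Reverse the conversion: restore the alphabet and the "=" padding.
function fromBase64Url(base64url) {
  const base64 = base64url.replace(/-/g, "+").replace(/_/g, "/");
  return base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
}

const standard = btoa(String.fromCharCode(0xfb, 0xff)); // "+/8="
const urlSafe = toBase64Url(standard);                  // "-_8"
console.log(standard, urlSafe, fromBase64Url(urlSafe)); // +/8= -_8 +/8=
```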
What Base64 is not for: encryption (it's fully reversible), hashing (it doesn't lose information), or "obfuscating" a config value (anyone who sees the encoded string can decode it instantly).
When two encodings combine
The interesting bugs happen at boundaries. A few real examples:
User types Tom & Jerry in a search box.
1. JavaScript URL-encodes the value: `Tom%20%26%20Jerry`.
2. Server decodes: `Tom & Jerry`.
3. Server renders to HTML inside a heading.
4. HTML-encode for output: `Tom &amp; Jerry`.
Skip step 4 and the & may be parsed as a malformed entity (browsers are forgiving but inconsistent). Skip step 1 and the server may receive the value as two fragments. Both steps are required.
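The four steps can be walked through end to end in a few lines. This is a sketch under assumptions: `URLSearchParams` stands in for the server's query parser, `escapeHtml` is a hypothetical helper, and the `<h1>` wrapper is just an example of "a heading".

```javascript
// Hypothetical helper: escapes the five HTML-significant characters.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const typed = "Tom & Jerry";

// Step 1: the client URL-encodes the value for the query string.
const query = "q=" + encodeURIComponent(typed);       // q=Tom%20%26%20Jerry
// Step 2: the server decodes it back to the original text.
const received = new URLSearchParams(query).get("q"); // Tom & Jerry
// Steps 3 and 4: HTML-encode at the moment the value is written into markup.
const heading = `<h1>${escapeHtml(received)}</h1>`;   // <h1>Tom &amp; Jerry</h1>

console.log(query, received, heading);
```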
An attacker tries to inject HTML through a query string.
1. Attacker visits `/search?q=<script>alert(1)</script>`.
2. The address bar may show the URL as typed, but the browser sends `q=%3Cscript%3Ealert(1)%3C/script%3E`.
3. Server decodes URL: `q = "<script>alert(1)</script>"`.
4. If the server interpolates `q` into HTML without HTML-encoding: XSS.
5. If the server HTML-encodes on output: the page renders `<script>alert(1)</script>` as literal text. Safe.
URL encoding does not protect against XSS. HTML encoding does. The two are not substitutes for each other.
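To make the point concrete, the sketch below decodes the wire value from the example and builds the page both ways. The `escapeHtml` helper and the `<p>Results for ...</p>` template are assumptions for illustration, not a real server's code.

```javascript
// Hypothetical helper: escapes the five HTML-significant characters.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// The percent-encoded query value as it arrives at the server.
const wire = "%3Cscript%3Ealert(1)%3C/script%3E";
// URL-decoding restores the attacker's markup exactly; it removes no danger.
const q = decodeURIComponent(wire);

const unsafe = `<p>Results for ${q}</p>`;           // contains a live <script> tag
const safe = `<p>Results for ${escapeHtml(q)}</p>`; // tag is rendered as text

console.log(unsafe);
console.log(safe); // <p>Results for &lt;script&gt;alert(1)&lt;/script&gt;</p>
```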
Storing a binary blob inside a URL.
1. Application has 1 KB of binary data to pass through a redirect URL.
2. Base64-encode the binary: now text, about 1.33 KB.
3. URL-encode the Base64 output: still text, slightly bigger (Base64 contains `+` and `/`, which need escaping).
4. Or use Base64URL (`-` and `_` instead of `+` and `/`) to skip step 3.
Base64URL is the right tool here. Standard Base64 + URL encoding works but doubles up unnecessarily.
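The size difference can be measured directly. This sketch assumes a global `btoa`; the 1024 sample bytes are arbitrary, so the exact size of the URL-encoded variant depends on the data and is not asserted here.

```javascript
// 1 KB of sample binary data (the byte values are arbitrary).
const bytes = Uint8Array.from({ length: 1024 }, (_, i) => (i * 7) % 256);

// Standard Base64, then the URL-safe variant with padding dropped.
const standard = btoa(String.fromCharCode(...bytes));
const urlSafe = standard.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

console.log(standard.length);                         // 1368
console.log(encodeURIComponent(standard).length);     // larger: each + / = becomes 3 characters
console.log(urlSafe.length);                          // 1366
console.log(encodeURIComponent(urlSafe) === urlSafe); // true: nothing left to escape
```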
The mental shortcut
When you're about to encode something, ask: "What kind of channel is this value about to travel through?"
- About to live inside a URL? → URL-encode.
- About to be inserted into HTML? → HTML-encode.
- About to carry raw bytes through a channel that only accepts text? → Base64.
Apply the encoding at the boundary, decode at the next boundary, re-encode for the channel after that. Three encodings, one rule.