5 min read

URL encoding vs HTML encoding vs Base64: which one when

Three encodings that look similar, solve different problems, and break in different ways when you reach for the wrong one. A practical guide.

If you've ever wondered why %20, &amp;, and aGVsbG8= all show up in the same web request, you're not alone. These three encodings, URL (percent), HTML (entity), and Base64, solve different problems but get mixed up constantly. Reaching for the wrong one corrupts data, breaks rendering, or opens an XSS hole. Here's a short field guide.

The one-line definition of each

  • URL encoding (also called percent-encoding) escapes characters that have special meaning in a URL — /, ?, &, #, spaces — so they can appear inside a URL component as data.
  • HTML encoding (also called entity encoding) escapes characters that have special meaning in HTML — <, >, &, ", ' — so they render as literal text instead of being parsed as markup.
  • Base64 encodes arbitrary bytes as printable ASCII so binary data can travel through text-only channels (JSON, email, headers).

Three problems, three encodings. They're not interchangeable.

When to use each

Scenario | Encoding
Putting a user search term in a query string | URL
Showing user text inside HTML | HTML
Embedding an image inside a JSON payload | Base64
Building a mailto: link with a subject line | URL
Displaying a < character in a paragraph | HTML
Sending a binary file in an SMTP email | Base64
Passing user input through a redirect URL | URL (and the redirect should validate)
Storing a small icon inline in CSS | Base64 (as a data: URL)

If a value crosses more than one boundary — say a user-typed string gets put in a URL and then rendered into HTML — you need to apply each encoding at its own boundary, in order. Decode after each one, encode again before the next.

URL encoding in detail

URL syntax reserves a handful of characters. : separates the scheme from the rest. / separates path segments. ? starts the query string. & separates query parameters. # starts the fragment. % itself is the escape introducer.

To put any reserved character inside a value — say a search term that contains & — you replace it with %XX, where XX is its byte value in hex. Spaces become %20. The é in café becomes %C3%A9 (UTF-8 byte pair). The 🎉 emoji becomes %F0%9F%8E%89 (UTF-8 four-byte sequence).
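To make the %XX rule concrete, here is a minimal sketch that percent-encodes a string by hand: convert to UTF-8 bytes, then escape every byte that is not an unreserved character. The function name percentEncode is made up for illustration; in real code you would call the built-in encodeURIComponent.

```javascript
// Hand-rolled percent-encoding, for illustration only.
// Unreserved characters (letters, digits, - . _ ~) pass through unchanged;
// every other UTF-8 byte becomes %XX in uppercase hex.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/[A-Za-z0-9\-._~]/.test(ch)) {
      out += ch;
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a b"));  // a%20b
console.log(percentEncode("café")); // caf%C3%A9
console.log(percentEncode("🎉"));   // %F0%9F%8E%89
```

For these inputs the output matches encodeURIComponent, which is slightly more lenient and also leaves ! ' ( ) * unescaped.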

JavaScript has two relevant functions:

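  • encodeURIComponent(value): escapes every character except letters, digits, and - _ . ! ~ * ' ( ), so the delimiters / ? & = # are all escaped. Use this for each individual query value or path segment.
  • encodeURI(url): leaves structural characters such as : / ? & = # intact and escapes only characters that cannot appear in a URL at all, such as spaces and non-ASCII text. Use this for an entire URL where you want the result to still parse as a URL.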

The most common mistake is using encodeURI on a query value. It will leave & and = unescaped, so a value like Tom & Jerry will appear to your server as two separate query parameters. Always use encodeURIComponent per value.
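The difference is easy to see side by side. This sketch builds the same query string both ways and then parses it the way a server would, using the standard URL class; example.com and the q parameter are placeholders.

```javascript
const value = "Tom & Jerry";

// encodeURI treats & as URL structure and leaves it alone, so the value splits.
const wrong = "https://example.com/search?q=" + encodeURI(value);
// encodeURIComponent escapes &, so the value stays inside one parameter.
const right = "https://example.com/search?q=" + encodeURIComponent(value);

console.log(wrong); // https://example.com/search?q=Tom%20&%20Jerry
console.log(right); // https://example.com/search?q=Tom%20%26%20Jerry

// Parse both query strings.
const wrongParams = new URL(wrong).searchParams;
const rightParams = new URL(right).searchParams;

console.log(wrongParams.get("q"));         // "Tom " (truncated at the &)
console.log([...wrongParams.keys()].length); // 2 (a second, bogus parameter appeared)
console.log(rightParams.get("q"));         // "Tom & Jerry"
```

If you build query strings often, URLSearchParams does the per-value encoding for you, although it writes spaces as + rather than %20.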

The URL Encoder supports both modes side by side, plus a query-string breakout that decodes each parameter into a table — handy when debugging a malformed URL.

HTML encoding in detail

HTML reserves a different set of characters. < and > delimit tags. & starts a character reference. Inside attribute values, " and ' delimit the attribute. To make any of these characters appear as literal text, you replace them with an entity:

Character | Entity
< | &lt;
> | &gt;
& | &amp;
" | &quot;
' | &#39;
The full rule for safe HTML output is more nuanced: different contexts (HTML body, attribute, URL inside href, JavaScript inside <script>, CSS inside <style>) need different escaping. The OWASP Cross Site Scripting Prevention Cheat Sheet has the complete list. For everyday work, the five entities above cover the two most common contexts: text in the HTML body and quoted attribute values.
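As a rough illustration of the five replacements, here is a minimal sketch. The names HTML_ENTITIES and escapeHtml are made up for this post, and the function only covers HTML body text and quoted attribute values; a templating engine's built-in escaping is usually the safer choice in production.

```javascript
// Map each reserved character to its entity (the same five as in the table).
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace every reserved character in a single pass, so an & that is
// produced by one replacement is never escaped a second time.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```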

The mistake here is mixing up the layers. If you receive HTML-encoded user input and want to put it into a URL, do not paste the entity-encoded text into the URL as-is, and do not decode it more than once. The lifecycle is: decode HTML once to recover the original Unicode, then URL-encode that for the URL boundary. Forgetting the URL encoding lets characters such as & and # change the URL's structure, which can lead to injection and XSS via redirected URLs.
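Here is a small sketch of that lifecycle. The helper decodeBasicEntities is hypothetical and only understands the five entities from the table; a real application should use a full HTML parser or entity library for decoding.

```javascript
// Hypothetical helper: reverses only the five entities listed earlier.
const ENTITY_TO_CHAR = {
  "&lt;": "<",
  "&gt;": ">",
  "&amp;": "&",
  "&quot;": '"',
  "&#39;": "'",
};

function decodeBasicEntities(html) {
  // One pass only: "&amp;lt;" decodes to "&lt;", never all the way to "<".
  return html.replace(/&(?:lt|gt|amp|quot|#39);/g, (m) => ENTITY_TO_CHAR[m]);
}

const stored = "Tom &amp; Jerry";             // HTML-encoded input
const raw = decodeBasicEntities(stored);      // decode HTML once
const query = "q=" + encodeURIComponent(raw); // then encode for the URL

console.log(query); // q=Tom%20%26%20Jerry

// Skipping the decode step leaks the entity into the URL value instead.
const leaky = "q=" + encodeURIComponent(stored);
console.log(leaky); // q=Tom%20%26amp%3B%20Jerry
```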

Base64 in detail

Base64 represents arbitrary bytes using only the characters A-Z, a-z, 0-9, +, and / (plus = for padding). Every three input bytes become four output characters, so the output is about 33% larger than the input (slightly more when the last group has to be padded).
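The three-bytes-to-four-characters arithmetic can be checked directly. This sketch assumes a runtime with the global btoa function (browsers and recent Node.js versions); bytesToBase64 and base64Length are illustrative names, not built-ins.

```javascript
// Encode raw bytes as standard Base64. btoa expects a "binary string"
// in which each character code is one byte, so build that first.
function bytesToBase64(bytes) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Encoded length with padding: 4 characters per 3-byte group, rounded up.
function base64Length(byteCount) {
  return Math.ceil(byteCount / 3) * 4;
}

const hello = new TextEncoder().encode("hello"); // 5 bytes
console.log(bytesToBase64(hello)); // aGVsbG8=
console.log(base64Length(5));      // 8
console.log(base64Length(3000));   // 4000
```

The loop-based conversion is fine for small inputs; for large buffers a platform API such as Node's Buffer is a better fit.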

Use it when you need to put bytes into a text channel. Three common cases:

  • JSON. JSON has no binary type. To embed a small image or a certificate inside a JSON payload, Base64-encode it as a string.
  • data: URLs. data:image/png;base64,iVBORw0KG... lets the browser render an inline image without a separate HTTP request. Useful for tiny icons, not for hero images (33% bloat, and the image cannot be cached separately from the file that embeds it).
  • JSON Web Tokens. A JWT is three Base64URL-encoded segments separated by dots.

Base64 has a URL-safe variant (- and _ instead of + and /, padding optionally dropped). Use it any time the encoded value touches a URL, a JWT, or a filename. The Base64 Encoder supports both standard and URL-safe modes.
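Converting between the two alphabets is a character swap plus padding handling. The sketch below assumes the global btoa and atob functions; toBase64Url and fromBase64Url are names invented for this example. The sample segment is the widely used example JWT header for HS256.

```javascript
// Standard Base64 -> Base64URL: swap the two unsafe characters, drop padding.
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64URL -> standard Base64: swap back and restore padding to a multiple of 4.
function fromBase64Url(base64url) {
  const base64 = base64url.replace(/-/g, "+").replace(/_/g, "/");
  return base64 + "=".repeat((4 - (base64.length % 4)) % 4);
}

// Bytes 0xFB 0xFF 0xFE happen to encode to the characters + and / only.
const standard = btoa(String.fromCharCode(0xfb, 0xff, 0xfe));
console.log(standard);              // +//+
console.log(toBase64Url(standard)); // -__-

console.log(fromBase64Url("aGVsbG8")); // aGVsbG8=

// Decoding the first segment of a JWT (the header) is the same operation.
const headerSegment = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9";
const header = JSON.parse(atob(fromBase64Url(headerSegment)));
console.log(header.alg); // HS256
```

Decoding a JWT segment this way only reads it; it does not verify the signature.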

What Base64 is not for: encryption (it's fully reversible), hashing (it doesn't lose information), or "obfuscating" a config value (anyone who sees the encoded string can decode it instantly).

When two encodings combine

The interesting bugs happen at boundaries. A few real examples:

User types Tom & Jerry in a search box.

  1. JavaScript URL-encodes the value: Tom%20%26%20Jerry.
  2. Server decodes: Tom & Jerry.
  3. Server renders to HTML inside a heading.
  4. HTML-encode for output: Tom &amp; Jerry.

Skip step 4 and the & may be parsed as a malformed entity (browsers are forgiving but inconsistent). Skip step 1 and the server may receive the value as two fragments. Both steps are required.
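The four steps above fit in a few lines. This is a sketch, not a real server: example.com, the q parameter, and the escapeHtml helper are all assumptions made for illustration.

```javascript
// Illustrative HTML escaper; & is replaced first so the entities
// produced by the later replacements are not escaped again.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const typed = "Tom & Jerry";

// Step 1: the client URL-encodes the value before it enters the URL.
const requestUrl = "https://example.com/search?q=" + encodeURIComponent(typed);
console.log(requestUrl); // https://example.com/search?q=Tom%20%26%20Jerry

// Step 2: the server decodes the query string back to the raw text.
const received = new URL(requestUrl).searchParams.get("q");
console.log(received); // Tom & Jerry

// Steps 3 and 4: HTML-encode at the moment the value enters the markup.
const heading = "<h1>Results for " + escapeHtml(received) + "</h1>";
console.log(heading); // <h1>Results for Tom &amp; Jerry</h1>
```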

An attacker tries to inject HTML through a query string.

  1. Attacker visits /search?q=<script>alert(1)</script>.
  2. The address bar may still show the raw text, but the browser sends q=%3Cscript%3Ealert(1)%3C/script%3E on the wire.
  3. Server decodes URL: q = "<script>alert(1)</script>".
  4. If the server interpolates q into HTML without HTML-encoding: XSS.
  5. If the server HTML-encodes on output: the page renders &lt;script&gt;alert(1)&lt;/script&gt; as literal text. Safe.

URL encoding does not protect against XSS. HTML encoding does. The two are not substitutes for each other.
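A short sketch of the attack path makes the point: percent-decoding hands the server the original payload, and only HTML encoding on output neutralizes it. The escapeHtml helper and the surrounding <p> markup are illustrative assumptions.

```javascript
// Illustrative HTML escaper (& first, so entities are not double-escaped).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// What arrives on the wire, and what the server sees after URL decoding.
const wireValue = "%3Cscript%3Ealert(1)%3C/script%3E";
const q = decodeURIComponent(wireValue);
console.log(q); // <script>alert(1)</script>

// Interpolating the decoded value directly produces live markup: XSS.
const unsafe = "<p>You searched for: " + q + "</p>";

// HTML-encoding on output turns the payload into inert text.
const safe = "<p>You searched for: " + escapeHtml(q) + "</p>";
console.log(safe);
// <p>You searched for: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```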

Storing a binary blob inside a URL.

  1. Application has 1 KB of binary data to pass through a redirect URL.
  2. Base64-encode the binary: now text, about 1.33 KB.
  3. URL-encode the Base64 output: still text, slightly bigger (Base64 can contain +, /, and =, which need escaping).
  4. Or use Base64URL (- and _ instead of + and /) to skip step 3.

Base64URL is the right tool here. Standard Base64 + URL encoding works but doubles up unnecessarily.
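To compare the two routes on a 1 KB payload, here is a sketch that assumes Node.js, since Buffer and its "base64url" encoding are Node-specific. The byte pattern is arbitrary filler, not real data.

```javascript
// 1024 bytes of arbitrary filler data (Node.js Buffer assumed).
const blob = Buffer.from(Array.from({ length: 1024 }, (_, i) => (i * 37) % 256));

// Route A: standard Base64, then percent-encode the result.
const standard = blob.toString("base64");
const escaped = encodeURIComponent(standard);

// Route B: Base64URL, which needs no further escaping.
const urlSafe = blob.toString("base64url");

console.log(standard.length); // 1368
console.log(escaped.length > standard.length); // true (+, / and = each grow to 3 characters)
console.log(urlSafe.length);  // 1366 (Node omits the padding)

// The URL-safe form survives encodeURIComponent unchanged and round-trips.
console.log(encodeURIComponent(urlSafe) === urlSafe);          // true
console.log(Buffer.from(urlSafe, "base64url").equals(blob));   // true
```

How much route A grows depends on how many +, /, and = characters the data happens to produce, which is one more reason to prefer the URL-safe alphabet.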

The mental shortcut

When you're about to encode something, ask: "What kind of channel is this value about to travel through?"

  • About to live inside a URL? → URL-encode.
  • About to be inserted into HTML? → HTML-encode.
  • About to be carried as text inside something that expects text but the data is bytes? → Base64.

Apply the encoding at the boundary, decode at the next boundary, re-encode for the channel after that. Three encodings, one rule.

Related tools

  • URL Encoder & Decoder: Encode and decode URLs and query parameters in your browser. Handles full URLs and individual components. Always private.
  • Base64 Encoder & Decoder: Encode text to Base64 or decode Base64 back to text instantly in your browser. Unicode-safe. Nothing is uploaded.
  • Markdown Preview: Write Markdown and see the rendered result side-by-side. GitHub-flavored syntax, sanitized output, copy HTML. Never leaves your browser.