
Base64 explained: why your data sometimes grows by 33%

What Base64 actually does, why it exists, when to use it, and the small mistakes that bloat your payloads or corrupt your data.

Base64 is one of those bits of plumbing that quietly shows up in every part of the web — embedded images in CSS, JSON Web Tokens, email attachments, OAuth flows, data URLs in HTML. It looks like gibberish, behaves like text, and grows your payload by about a third. If you've ever wondered why it exists and when to reach for it, this is the long answer.

What Base64 actually is

Base64 is an encoding, not encryption. It takes a sequence of bytes (any bytes: image data, a UTF-8 string, raw binary) and represents them using only 64 ASCII characters: A-Z, a-z, 0-9, +, and /. The trailing = you sometimes see is padding.

The recipe is mechanical:

  1. Take the input bytes three at a time. Three bytes = 24 bits.
  2. Slice those 24 bits into four 6-bit chunks.
  3. Each 6-bit chunk has a value from 0 to 63; use it as an index into the alphabet above, in order: A-Z are 0-25, a-z are 26-51, 0-9 are 52-61, + is 62, and / is 63.

That's the whole thing. Three input bytes become four output characters. Always.
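Here is that recipe as a from-scratch TypeScript sketch, for illustration only; in real code you would call a built-in encoder such as btoa. The name encodeBase64 is hypothetical, and the handling of a short final group follows the padding rules described in the next section.

```typescript
// The standard Base64 alphabet: index 0 is "A", index 63 is "/".
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode raw bytes as padded, standard-alphabet Base64.
function encodeBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Step 1: take up to three bytes; missing bytes count as zero bits.
    const remaining = bytes.length - i;
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2; // 24 bits

    // Step 2: slice the 24 bits into four 6-bit chunks.
    const chunks = [
      (group >> 18) & 0x3f,
      (group >> 12) & 0x3f,
      (group >> 6) & 0x3f,
      group & 0x3f,
    ];

    // Step 3: map each chunk (0..63) to a character. A short final group
    // emits (bytes in group + 1) characters and fills the rest with "=".
    const emitted = Math.min(remaining, 3) + 1;
    for (let k = 0; k < 4; k++) {
      out += k < emitted ? ALPHABET[chunks[k]] : "=";
    }
  }
  return out;
}
```

For example, encodeBase64(new TextEncoder().encode("Cat")) returns "Q2F0", the same result the worked example below arrives at by hand.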

Why four characters per three bytes — and where the 33% comes from

Three bytes carry 24 bits of information. Four Base64 characters carry the same 24 bits, but each character costs a full byte to store (because text is one byte per ASCII character). So you've turned 3 bytes of input into 4 bytes of output. That's a 33% increase. Math doesn't care if the input was an image, a string, or a binary blob — the overhead is constant.
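The size math fits in one line. This sketch assumes padded output with no line breaks (MIME, for instance, adds line breaks on top); both function names are hypothetical.

```typescript
// Padded Base64 length: every started group of 3 bytes becomes 4 characters.
function encodedLength(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Growth relative to the raw input as a fraction (approaches 1/3 for large inputs).
function overhead(byteCount: number): number {
  if (byteCount === 0) return 0;
  return encodedLength(byteCount) / byteCount - 1;
}
```

encodedLength(3_000_000) is 4_000_000, exactly a third larger. Tiny inputs pay proportionally more because of padding: encodedLength(1) is 4, so overhead(1) is 3, a 300% increase.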

If the input length isn't a multiple of three, the encoder zero-fills the last partial group and pads the output. Two leftover bytes (16 bits) need three 6-bit characters, followed by one =. One leftover byte (8 bits) needs two characters, followed by ==. The padding lets a decoder reconstruct the exact original byte count.
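A from-scratch decoder makes the role of padding concrete: the number of = signs tells it how many bytes to drop from the final group. decodeBase64 is a hypothetical name, and this sketch only accepts padded, standard-alphabet input with no whitespace.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: decode padded standard Base64 back to bytes.
// It does not check that "=" appears only at the end; it is a sketch.
function decodeBase64(text: string): Uint8Array {
  if (text.length % 4 !== 0) {
    throw new Error("padded Base64 length must be a multiple of 4");
  }
  // Padding says how many bytes the final group really held.
  const padding = text.endsWith("==") ? 2 : text.endsWith("=") ? 1 : 0;
  const out = new Uint8Array((text.length / 4) * 3 - padding);

  let o = 0;
  for (let i = 0; i < text.length; i += 4) {
    // Rebuild the 24-bit group from four 6-bit values ("=" counts as 0).
    let group = 0;
    for (let k = 0; k < 4; k++) {
      const ch = text[i + k];
      const value = ch === "=" ? 0 : ALPHABET.indexOf(ch);
      if (value < 0) throw new Error(`invalid character: ${ch}`);
      group = (group << 6) | value;
    }
    // Write up to three bytes, stopping at the exact byte count.
    const bytes = [(group >> 16) & 0xff, (group >> 8) & 0xff, group & 0xff];
    for (const b of bytes) {
      if (o < out.length) out[o++] = b;
    }
  }
  return out;
}
```

decodeBase64("QQ==") yields one byte (65, the letter A): the two = signs tell the decoder to discard the two zero bytes that the final group would otherwise produce.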

Why it exists at all

Base64 solves one specific problem: getting binary data through systems that only handle text safely. That's a real constraint in more places than you'd think.

  • Email (MIME): SMTP was designed for 7-bit ASCII. Attachments are Base64-encoded so they survive every relay along the way.
  • JSON: JSON has no native binary type. To embed an image, a certificate, or any raw bytes, you Base64 them and store the string.
  • Data URLs: data:image/png;base64,... lets a browser render an inline image without a separate HTTP request.
  • JSON Web Tokens: A JWT is three Base64URL-encoded segments separated by dots. The header and payload are JSON and the signature is raw bytes; encoding all three lets the token travel through URLs and HTTP headers as plain text.
  • HTTP headers: Basic Auth credentials, X.509 fingerprints, and a long tail of other headers use Base64 because the header line is a text channel.

The common thread: you have a transport that's safe for printable ASCII but not for arbitrary bytes, and you need bytes to get through.
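Two of those uses, sketched in TypeScript on top of the built-in btoa (available in browsers and in recent Node.js versions). The helper names are hypothetical; the Basic Auth value follows the user:password format from RFC 7617, and the UTF-8 step matters for non-ASCII input (see the btoa bug below).

```typescript
// Hypothetical helper: Base64-encode raw bytes with btoa, which expects a
// "binary string" holding one character (0..255) per byte.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// HTTP Basic Auth: "user:password" as UTF-8 bytes, Base64-encoded after "Basic ".
function basicAuthHeader(user: string, password: string): string {
  return "Basic " + bytesToBase64(new TextEncoder().encode(`${user}:${password}`));
}

// Data URL: a MIME type plus the Base64 payload, usable in src="..." or url(...).
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${bytesToBase64(bytes)}`;
}
```

basicAuthHeader("Aladdin", "open sesame") produces "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the example value from RFC 7617. Anyone who sees that header can decode the password, so Basic Auth only makes sense over HTTPS.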

Base64 vs Base64URL

Standard Base64 uses + and /. Both characters mean something in a URL: in query strings, + is often decoded as a space, and / is a path separator. So RFC 4648 defines a URL-safe variant (base64url) where:

  • + becomes -
  • / becomes _
  • Padding = is often dropped (a decoder can infer it from the string length)

If you're using Base64 in a query string, a path segment, or a JWT, you almost certainly want Base64URL. Our Base64 Encoder & Decoder supports both — and the URL Encoder is where to go if you're escaping non-Base64 strings for URLs.
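Converting between the two alphabets is a few string replacements. This is a sketch with hypothetical helper names; peekJwtHeader also assumes the header segment is plain ASCII JSON, which is what JWT headers contain in practice.

```typescript
// Standard Base64 -> Base64URL: swap the two URL-hostile characters, drop padding.
function toBase64Url(standard: string): string {
  return standard.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64URL -> standard Base64: swap back, then restore padding to a multiple of 4.
function fromBase64Url(urlSafe: string): string {
  let standard = urlSafe.replace(/-/g, "+").replace(/_/g, "/");
  // A remainder of 1 can never come from a valid encoding.
  if (standard.length % 4 === 1) throw new Error("invalid Base64URL length");
  while (standard.length % 4 !== 0) standard += "=";
  return standard;
}

// Peek at a JWT's header segment. This only decodes; it does NOT verify the signature.
function peekJwtHeader(token: string): unknown {
  const headerSegment = token.split(".")[0];
  return JSON.parse(atob(fromBase64Url(headerSegment)));
}
```

Decoding a header this way is handy for debugging, but it says nothing about whether the token is genuine; only signature verification does.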

When Base64 is the wrong tool

Base64 is overused. Three patterns to push back on:

Storing binary in a database. Most databases have a binary type (Postgres bytea, MySQL BLOB). Storing the same data as Base64 text uses about 33% more disk, forces queries to compare against the encoded string instead of the raw bytes, and adds an encode/decode step to every client. Use the binary column.

"Encrypting" anything. Base64 is encoding: fully reversible without a key. It is not a hash, not a cipher, not a signature. If you need confidentiality, use real crypto (AES-GCM, age, libsodium). If you need integrity, use a hash, or a MAC or signature if someone might deliberately tamper with the data. If you need a token that's hard to guess, use a CSPRNG.
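The last case is where Base64 legitimately sits next to security code: the unpredictability comes from the CSPRNG, and Base64URL is only the wrapping that turns those bytes into URL-safe text. A sketch, assuming the Web Crypto crypto.getRandomValues global (present in browsers and in current Node.js releases); randomToken and its 32-byte default are our own choices, not a standard.

```typescript
// Hypothetical helper: a hard-to-guess, URL-safe token.
// Security comes from crypto.getRandomValues; Base64URL only changes the format.
function randomToken(byteLength = 32): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  // Standard Base64, then switch to the URL-safe alphabet and drop padding.
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
```

With the default 32 bytes (256 bits of randomness) the token is 43 characters long. Base64-encoding a predictable value such as a user ID or timestamp would look just as random and protect nothing.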

Inline images in HTML emails or web pages, for everything. Data URLs skip an HTTP request but get bigger by 33%, can't be cached separately, bloat the HTML, and force the browser to re-decode on every render. They make sense for tiny icons (sub-1KB), maybe a few SVGs above the fold. For anything larger, serve a real file.

Common bugs

A few classes of bug that crop up in code that deals with Base64:

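  • UTF-8 round-trip errors. btoa only accepts characters in the Latin-1 range (U+0000 to U+00FF). btoa("résumé") doesn't throw, but it encodes each é as the single Latin-1 byte 0xE9 instead of its two UTF-8 bytes, so any consumer that decodes the result as UTF-8 sees invalid bytes; btoa("日本") or btoa("✓") throws outright. The fix is to encode the string as UTF-8 bytes first (TextEncoder) and then Base64 those bytes. On decode, do the inverse: atob, then TextDecoder.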
  • Padding mismatches. Some libraries emit padding (=, ==), others don't. JWTs omit padding because the JWS spec requires it; some general-purpose Base64URL encoders keep it. A decoder that's strict about padding will reject input that another encoder produced. If you control both ends, pick one rule.
  • Whitespace inside Base64. PEM-encoded keys and MIME email bodies break Base64 into 64- or 76-character lines. Strict decoders reject the embedded newlines; permissive ones strip whitespace silently. If you're writing your own decoder, strip whitespace first.
  • Treating Base64 as opaque. A leading eyJ is almost always a JSON object's {" Base64-encoded — a quick way to spot a JWT or a config blob without running a decoder.
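The UTF-8 and whitespace fixes, sketched with built-ins only (TextEncoder, TextDecoder, btoa, atob). The helper names are hypothetical; the byte-by-byte loops avoid spreading a large array into String.fromCharCode, which can exceed an engine's argument limit.

```typescript
// Encode a string as UTF-8 bytes first, then Base64 those bytes with btoa.
function utf8ToBase64(text: string): string {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// The inverse: strip whitespace (PEM/MIME line breaks), atob to a binary
// string, copy it into bytes, then decode those bytes as UTF-8.
function base64ToUtf8(b64: string): string {
  const binary = atob(b64.replace(/\s+/g, ""));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}
```

utf8ToBase64("résumé") returns "csOpc3Vtw6k=" (eight UTF-8 bytes), while btoa("résumé") silently produces a shorter string built from six Latin-1 bytes.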

A worked example

Take the string Cat. Three bytes — exactly one group, no padding.

ASCII:    C        a        t
Decimal:  67       97       116
Binary:   01000011 01100001 01110100
Regroup:  010000 110110 000101 110100
Decimal:  16     54     5      52
Base64:   Q      2      F      0

Cat becomes Q2F0. Three bytes in, four characters out, no padding needed.

For one-byte input A:

Binary:   01000001  (pad with zeros to fill 12 bits)
Regroup:  010000 010000
Base64:   Q      Q      ==

Two characters plus two = signs of padding, for four in total. With padding, Base64 output is always a multiple of four characters.
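To repeat the walk-through on your own input, this sketch reproduces the rows of the tables above for a single group. traceGroup is a hypothetical name, and it only handles ASCII text, where each character is exactly one byte.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

interface GroupTrace {
  binary: string; // input bytes as 8-bit binary, space-separated
  regroup: string; // the same bits, zero-filled, cut into 6-bit chunks
  indices: number[]; // each chunk's value, 0..63
  output: string; // four characters, including any "=" padding
}

// Hypothetical helper: trace one group of 1 to 3 ASCII characters.
function traceGroup(chunk: string): GroupTrace {
  if (chunk.length < 1 || chunk.length > 3) {
    throw new RangeError("expected 1 to 3 characters");
  }
  const octets: string[] = [];
  for (let i = 0; i < chunk.length; i++) {
    const code = chunk.charCodeAt(i);
    if (code > 0x7f) throw new RangeError("ASCII input only");
    octets.push(("00000000" + code.toString(2)).slice(-8));
  }
  // Concatenate and zero-fill to a multiple of 6 bits (12, 18, or 24).
  let bits = octets.join("");
  while (bits.length % 6 !== 0) bits += "0";

  const sextets: string[] = [];
  for (let i = 0; i < bits.length; i += 6) sextets.push(bits.slice(i, i + 6));
  const indices = sextets.map((s) => parseInt(s, 2));

  // Look up each index, then pad with "=" to four characters.
  let output = indices.map((n) => ALPHABET.charAt(n)).join("");
  while (output.length < 4) output += "=";

  return { binary: octets.join(" "), regroup: sextets.join(" "), indices, output };
}
```

traceGroup("Cat") returns regroup "010000 110110 000101 110100", indices [16, 54, 5, 52], and output "Q2F0", matching the first table row for row.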

When to reach for Base64

Use it when you need to put bytes into a text channel and you control the consumer. Use Base64URL specifically for anything that touches a URL, a query string, or a JWT. Don't use it as a security measure — it doesn't hide anything from anyone looking. And don't use it as a default for binary storage when you have a real binary column.

For everything else, the encoder is on this site — paste your input, get the output, move on with your day.

Related tools

  • Base64 Encoder & Decoder: Encode text to Base64 or decode Base64 back to text instantly in your browser. Unicode-safe. Nothing is uploaded.
  • URL Encoder & Decoder: Encode and decode URLs and query parameters in your browser. Handles full URLs and individual components. Always private.
  • Hash Generator (MD5, SHA-256…): Generate MD5, SHA-1, SHA-256, SHA-384, and SHA-512 hashes from text in your browser. Verify checksums without leaking the input.