How PDF compression actually works (and why some PDFs barely shrink)

Every time someone says "compress this PDF" they expect the file to halve in size. Sometimes it does. Often it shrinks 5%. Sometimes it grows. The reason isn't the compression tool — it's what was in the PDF to begin with. This post explains where PDF bytes come from, why compression isn't a single thing, and how to predict whether your PDF will actually shrink.

A PDF is not one thing

When you compress a JPEG, you're choosing a quality level and the encoder does the rest. When you compress a PDF, you're potentially:

Re-compressing already-compressed image streams.
Re-encoding raster images at lower quality or resolution.
Subsetting embedded fonts.
Removing unused objects.
Switching to a more efficient image codec that PDF actually supports (JPEG → JPEG 2000, CCITT Group 4 → JBIG2).
Rewriting stream wrappers with a denser compression filter.

Different PDFs benefit from different ones. There's no universal "compress this PDF" — only a stack of specific operations, each of which helps with a specific kind of waste.

Where the bytes come from

Open a typical PDF and the bytes break down roughly like:

Embedded fonts: 10–500 KB each. A PDF with two custom fonts and three weights can hide a megabyte of font data before any content appears.
Raster images: typically 50–95% of any PDF that contains photographs, scans, or screenshots. A single full-page JPEG at 300 DPI is ~2-5 MB.
Content streams: usually small (5-50 KB per page) once Flate-compressed. Mostly text positioning instructions.
Metadata, structure: tiny (a few KB).
Unused objects: zero in a clean PDF, sometimes many KB in an edited or merged one.

If you're trying to compress a PDF, look at the file size and ask: "What's the dominant content?" That tells you which lever to pull.
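One rough way to answer that question is to tally stream bytes by kind. The sketch below uses only Python's standard library and a regular expression rather than a real PDF parser, so treat it as an approximation. It assumes streams are stored as ordinary top-level objects (anything packed into PDF 1.5 object streams is invisible to it), and it guesses "fonts" from the /Length1 key or a font-program /Subtype. The names classify, tally_stream_bytes and dominant are made up for this example.

```python
import re
from collections import Counter

# Matches "N G obj << ... >> stream <payload> endstream" for top-level objects.
# The (?!endobj) guard keeps the dictionary match from running into the next object.
_STREAM_RE = re.compile(
    rb"\d+\s+\d+\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n(.*?)\r?\n?endstream",
    re.DOTALL,
)
_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# Heuristic (an assumption): embedded font programs carry /Length1 or one of these subtypes.
_FONT_RE = re.compile(rb"/Length1\b|/Subtype\s*/(?:Type1C|CIDFontType0C|OpenType)\b")


def classify(dictionary: bytes) -> str:
    """Guess what a stream holds from its dictionary text."""
    if _IMAGE_RE.search(dictionary):
        return "images"
    if _FONT_RE.search(dictionary):
        return "fonts"
    return "other streams"


def tally_stream_bytes(pdf_bytes: bytes) -> Counter:
    """Sum stored (still-filtered) stream payload bytes per category."""
    totals = Counter()
    for match in _STREAM_RE.finditer(pdf_bytes):
        dictionary, payload = match.group(1), match.group(2)
        totals[classify(dictionary)] += len(payload)
    return totals


def dominant(totals: Counter):
    """Return the category holding the most bytes, or None if nothing was found."""
    return max(totals, key=totals.get) if totals else None
```

Reading a file with open(path, "rb").read() and passing the bytes to tally_stream_bytes gives a per-category byte count; dominant then names the lever worth pulling first. A proper PDF library will give more reliable numbers on files that use object streams or encryption.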

The dominant lever: image re-encoding

For most PDFs in the wild — scans, slide decks, reports with screenshots — the biggest win is re-encoding the embedded images:

Walk every page, find every image xobject.
Decode it (it's JPEG, JBIG2, or a lossless format like Flate-wrapped pixel data).
Re-encode at lower quality and/or lower resolution.
Rewrite the xobject's stream with the smaller payload.

The savings depend on what you started with:

Lossless image (PNG-style inside PDF) → JPEG at quality 80 typically saves 70–95% of that image's size.
JPEG at quality 95 → JPEG at quality 75 typically saves 40–60% of that image's size.
JPEG at quality 75 (already compressed) → quality 60 typically saves 10–20%, and visible artifacts start showing.
300 DPI scan → 150 DPI halves the resolution and quarters the pixel count, with little visible loss on screens.

Multiplying these by "how much of the PDF is images" gives you a realistic estimate. If the PDF is 90% images and you save 50% of image bytes, you save 45% of the PDF. If it's 10% images, the same operation saves 5%.
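The arithmetic in that paragraph is simple enough to write down. This is a minimal sketch, assuming only image bytes change and everything else in the file stays the same size; the function names are illustrative.

```python
def estimate_pdf_saving(image_share: float, image_saving: float) -> float:
    """Fraction of the whole PDF saved when only the images shrink.

    image_share  -- fraction of the file that is image data (0..1)
    image_saving -- fraction of image bytes removed by re-encoding (0..1)
    """
    if not (0.0 <= image_share <= 1.0 and 0.0 <= image_saving <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return image_share * image_saving


def estimate_output_size(size_bytes: int, image_share: float, image_saving: float) -> int:
    """Expected file size in bytes after image re-encoding."""
    return round(size_bytes * (1.0 - estimate_pdf_saving(image_share, image_saving)))
```

For a 10 MB file that is 90% images, a 50% image saving gives estimate_output_size(10_000_000, 0.9, 0.5), about 5.5 MB. The same operation on a file that is 10% images leaves about 9.5 MB.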

Fonts: small wins, easy to leave on the table

Most PDFs embed font subsets: only the glyphs they actually use. But that fails more often than you'd expect. A "Save as PDF" from Microsoft Office sometimes embeds the full font rather than a subset. A PDF stitched together from multiple sources often duplicates fonts because each source embedded its own copy.

The fix:

Subset fonts if any are fully embedded. Easy win, 100–500 KB per font.
Deduplicate fonts if the same font is embedded multiple times under different names. Common when merging PDFs.

This is rarely the dominant win but it's free bytes if your tool does it.
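Duplicate detection can be sketched by hashing each embedded font program and grouping identical digests. This is an illustration, not how any particular tool does it: it assumes you have already extracted the decoded font bytes into a dict keyed by font name, and it only catches byte-identical copies. Two subsets of the same font with different glyph sets will not match.

```python
import hashlib
from collections import defaultdict


def find_duplicate_fonts(font_programs: dict[str, bytes]) -> list[list[str]]:
    """Group font names whose embedded programs are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in font_programs.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only groups with more than one name are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def wasted_bytes(font_programs: dict[str, bytes], groups: list[list[str]]) -> int:
    """Bytes recoverable by keeping one copy per duplicate group."""
    return sum(len(font_programs[group[0]]) * (len(group) - 1) for group in groups)
```

A merged PDF often shows the same font under different subset prefixes, for example "ABCDEF+Helvetica" and "GHIJKL+Helvetica" (hypothetical names). If their programs hash the same, one copy can be dropped and both references pointed at the survivor.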

Stream-level compression

Every stream in a PDF can have a filter chain; /FlateDecode (zlib/Deflate) is the standard. Older PDFs sometimes use /LZWDecode (LZW), which usually compresses noticeably worse than Flate on typical content. Re-encoding an LZW-compressed stream with Flate is a pure win: same content, smaller file.

Modern PDF compressors also try:

Re-ordering content streams to expose more repetition (better for zlib).
Switching to object streams in PDF 1.5+, which Flate-compress groups of small non-stream objects together. The classic cross-reference table is replaced by a compressed cross-reference stream to match.
Discarding superseded object versions left behind by incremental saves.

These are all unglamorous structural wins, usually 5–20% on a verbose source PDF.
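To see why Flate helps content streams but does nothing for image payloads, here is a small experiment with Python's zlib module, which implements the same Deflate algorithm that /FlateDecode uses. The content stream is made up for the example, and pseudo-random bytes stand in for already-compressed data such as a JPEG; both are assumptions for illustration.

```python
import random
import zlib


def flate_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# A made-up page content stream: repetitive text-positioning operators.
content_stream = b"".join(
    b"BT /F1 11 Tf 72 %d Td (Line %d of the report) Tj ET\n" % (720 - 14 * i, i)
    for i in range(50)
)

# Stand-in for an already-compressed payload: bytes with no exploitable repetition.
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(4096))

if __name__ == "__main__":
    print(f"content stream ratio:     {flate_ratio(content_stream):.2f}")
    print(f"already-compressed ratio: {flate_ratio(already_compressed):.2f}")
```

The repetitive stream shrinks to a small fraction of its size, while the incompressible bytes come out slightly larger than they went in because of the zlib header and block overhead. That second case is the mechanism behind PDFs that grow when "compressed".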

What re-encoding can't do

There's a floor below which compression hurts more than it helps:

Already-tight scans. A 400 KB JPEG of a single page at 200 DPI is close to as small as you can get without visible degradation. Re-encoding at lower quality smears text.
Vector-heavy PDFs. A complex chart with thousands of paths is fully described by its vector data; there's no image to re-compress.
Text-only documents. A 50-page text report can already be 100 KB. There's almost no fat to trim.

When tools advertise "compress any PDF to 1/10 the size", they're banking on you not noticing the quality drop. A real compressor tells you when it can't help.

DPI: the lever nobody talks about

For scanned PDFs, the resolution is often higher than it needs to be. A typical scan at 300 DPI produces about 8.7 megapixels per A4 page (2480 × 3508). For on-screen viewing, you only need ~150 DPI (roughly 2.2 megapixels). For phone viewing, 100 DPI is plenty.

Halving DPI quarters the pixel count, which roughly quarters JPEG bytes at the same quality setting. A 4 MB scan at 300 DPI becomes ~1 MB at 150 DPI, with no visible difference unless you zoom past 100%.

The trade-off: printing. If the PDF will be printed at full A4 size, 200–300 DPI is the safe range. For "send by email and view on screen", 100–150 DPI is fine.
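The pixel arithmetic behind those numbers, as a short sketch. It assumes the scan covers the full page and uses the A4 dimensions of 210 × 297 mm; the helper names are invented for this example.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: float) -> tuple[int, int]:
    """Pixel dimensions of a full-page scan at the given DPI."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def megapixels(width_mm: float, height_mm: float, dpi: float) -> float:
    """Total pixel count of the page in millions."""
    width_px, height_px = page_pixels(width_mm, height_mm, dpi)
    return width_px * height_px / 1_000_000


def downsample_scale(current_dpi: float, target_dpi: float) -> float:
    """Linear scale factor to reach target_dpi; never upsamples."""
    return min(1.0, target_dpi / current_dpi)


if __name__ == "__main__":
    for dpi in (300, 150, 100):
        print(dpi, page_pixels(210, 297, dpi), round(megapixels(210, 297, dpi), 2))
```

Because the scale factor applies to both width and height, a factor of 0.5 leaves a quarter of the pixels. Capping the factor at 1.0 matters: enlarging a low-resolution scan adds bytes without adding detail.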

The compression tools you'll see

Real-world compressors split roughly into three tiers:

Object/stream compressors: qpdf with --object-streams=generate and --recompress-flate, pdfcpu optimize. These touch filters, fonts, and structure. They leave image data alone, so nothing loses quality. Typical savings: 5–25% on PDFs that haven't been optimized.
Image-aware recompressors: Ghostscript with -dPDFSETTINGS=/screen or /ebook, Adobe Acrobat's "Reduce File Size", iLovePDF compress, Smallpdf compress. They downsample images and re-encode JPEGs at lower quality. Big savings on scan-heavy PDFs (50–90%); modest on text-heavy ones.
Format-converting compressors: Adobe Acrobat Pro, some commercial tools. They can convert JPEG to JPEG 2000, or bilevel scans to JBIG2, for further savings. Heavy hammer, usually overkill.

A browser-based tool can do tier 1 and (with more work) tier 2. We're building a focused tier-2 compressor that re-encodes embedded JPEGs at chosen quality and resolution; that's the simplest thing that produces real savings for the average user.

In the meantime, two practical workflows that already help:

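For a multi-page scan: use the PDF to JPG tool at Medium quality, then re-make the PDF from those JPGs with a PDF builder. The savings come from re-encoding every page as a smaller JPEG. This flattens each page to an image, so use it only where selectable text doesn't matter.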
For a too-big image inside a PDF: extract it with the conversion tool, resize with the Image Resizer, and rebuild the PDF.

Neither is as clean as a single-button compress. Both produce dramatic savings on scan-heavy PDFs in 2026.

How to predict your shrink rate

A quick mental model before you compress:

All text, no images (Word reports, ebooks): expect 0–15% savings. Skip compression.
Mostly text, some images (slide decks, mixed reports): expect 10–30% if a tool does subset + stream re-compression.
Image-heavy (photo-heavy decks, design exports): expect 30–60% with image re-encoding.
Pure scans (phone-scanned documents, archived paper): expect 50–95% with DPI reduction and JPEG re-encoding.
Already-optimized PDF (any of the above run through a compressor already): expect 0–5% more. Don't keep re-running.

If you've compressed twice and you're still over the limit, the problem isn't the tool — the PDF wants to be that big. Time to split it, or rethink whether everyone needs all the pages.
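The ranges in this section can be turned into a quick pre-check against a size limit. A minimal sketch: the profile keys are labels invented for this example, the percentages are the rough ranges quoted above and not measurements, and picking the right profile is still up to you.

```python
# Expected fraction of the file saved, (low, high), per profile.
EXPECTED_SAVINGS = {
    "text_only": (0.00, 0.15),
    "mostly_text": (0.10, 0.30),
    "image_heavy": (0.30, 0.60),
    "pure_scan": (0.50, 0.95),
    "already_optimized": (0.00, 0.05),
}


def expected_size_range(size_bytes: int, profile: str) -> tuple[int, int]:
    """Return (smallest, largest) plausible output size in bytes."""
    low_saving, high_saving = EXPECTED_SAVINGS[profile]
    smallest = round(size_bytes * (1.0 - high_saving))
    largest = round(size_bytes * (1.0 - low_saving))
    return smallest, largest


def can_fit(size_bytes: int, profile: str, limit_bytes: int) -> bool:
    """True if even the best-case output could fit under the limit."""
    return expected_size_range(size_bytes, profile)[0] <= limit_bytes
```

For a 20 MB file and a 10 MB attachment limit, can_fit(20_000_000, "pure_scan", 10_000_000) is True, while the same call with "text_only" is False. In the second case splitting the document is the realistic option.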

A note on privacy

PDF compression in this site's tools runs entirely in your browser. The original PDF, the parsed object tree, and every intermediate image stay in your tab. They never reach a server. Most "free PDF compressor" sites upload your file to do the work on a server you didn't pick. Your contract, your scan, your medical PDF — all on someone else's machine, retained for some unstated period. Worth knowing what you're handing over before you click "Compress".