Regex anchors and lookarounds in plain English

The five anchors and four lookarounds, demystified with practical examples — and the one mistake almost everyone makes with ^ and $.

Regex anchors and lookarounds are the parts of a regular expression that don't consume any characters. They look like a match but they don't move the cursor. Once you understand that, every weird quirk falls into place. Here's the short guide.

What "doesn't consume" means

A normal character class like [a-z] consumes a character — it matches one letter and advances the position by one. An anchor like ^ matches a position — the start of a line — without moving anywhere. After ^ matches, the next part of the pattern picks up at the same character that ^ was looking at.

This distinction is the key to everything. Anchors and lookarounds are zero-width assertions. They check something about the current position and either succeed or fail without changing where in the string we are.
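
A minimal JavaScript sketch of what "zero-width" looks like in practice. The sample string and variable names are made up for illustration. Both assertions succeed, but the matched text is empty and only the reported position differs.

```javascript
// Illustrative input; any short string works.
const sample = "abc";

// ^ asserts "start of input": it succeeds at index 0 and matches no characters.
const anchorMatch = /^/.exec(sample);

// (?=b) asserts "a b comes next": it succeeds at index 1, again matching nothing.
const lookaheadMatch = /(?=b)/.exec(sample);

// [a-z] is a normal consumer: it matches one character.
const classMatch = /[a-z]/.exec(sample);

console.log(anchorMatch[0].length, anchorMatch.index);       // 0 0
console.log(lookaheadMatch[0].length, lookaheadMatch.index); // 0 1
console.log(classMatch[0], classMatch.index);                // a 0
```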

The five anchors

  • ^: start of the string (or start of a line in multiline mode).
  • $: end of the string (or end of a line in multiline mode).
  • \b: word boundary (between a word character and a non-word character, or between a word character and the start or end of the string).
  • \B: non-word-boundary (anything that isn't a \b position).
  • \A and \z: true start and end of the string, regardless of multiline mode. (Not all flavors support these: Ruby and Java support both, Python has \A but spells the end anchor \Z, and JavaScript has neither.)

^ and $ and the multiline trap

In JavaScript, ^ matches only the start of the input by default. With the m flag, it matches the start of any line. Most people learn ^ as "start of input" and never learn the multiline switch, so they're surprised when their pattern ^\d+ doesn't match the first number on each line.
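
The difference is easy to see side by side. This is a small sketch with an invented three-line sample; the only thing that changes between the two calls is the m flag.

```javascript
// Invented sample: one number at the start of each line.
const lines = "10 apples\n20 pears\n30 plums";

// Without m, ^ only matches at index 0, so only the first number is found.
const withoutM = lines.match(/^\d+/g);

// With m, ^ also matches right after each newline.
const withM = lines.match(/^\d+/gm);

console.log(withoutM); // [ '10' ]
console.log(withM);    // [ '10', '20', '30' ]
```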

The most common ^/$ bug: writing ^password$ expecting it to match the whole-string value "password", then running it against "password\n" and being surprised by the result. In JavaScript, $ without the m flag matches only at the very end of the string, so the trailing newline makes the match fail; with the m flag, $ also matches just before each line terminator, so it succeeds. Other flavors differ: in Python, Perl, and PCRE, $ by default also matches just before a final newline, so the same pattern accepts "password\n". The details matter here; test the corner case in the Regex Tester before you commit.
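
A quick way to check the trailing-newline case in JavaScript specifically. This is a sketch; the variable names are illustrative.

```javascript
// The value with a stray trailing newline, as it might arrive from a file or form.
const value = "password\n";

// No m flag: $ only matches at the very end (after the newline), so this fails.
const strict = /^password$/.test(value);

// m flag: $ also matches just before a line terminator, so this succeeds.
const multiline = /^password$/m.test(value);

console.log(strict, multiline); // false true
```

If the goal is to accept exactly "password" and nothing else, the version without the m flag is the safe one in JavaScript.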

\b — the most useful anchor

\b matches the position between a word character (\w, which is [A-Za-z0-9_]) and a non-word character. It's how you say "match this word but not when it appears inside a longer word."

  • \bcat\b matches cat but not category or concat.
  • \bcat matches cat and catalog but not concat.
  • cat\b matches cat and concat but not catalog.
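
The three bullets above can be checked directly. In this sketch the word list and the matching helper are invented for the example.

```javascript
const words = ["cat", "category", "concat", "catalog"];

// Helper for this sketch: which of the sample words does a pattern match?
const matching = (pattern) => words.filter((word) => pattern.test(word));

const wholeWord = matching(/\bcat\b/); // boundary on both sides
const prefix = matching(/\bcat/);      // boundary only before
const suffix = matching(/cat\b/);      // boundary only after

console.log(wholeWord); // [ 'cat' ]
console.log(prefix);    // [ 'cat', 'category', 'catalog' ]
console.log(suffix);    // [ 'cat', 'concat' ]
```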

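The catch: \b's definition of "word character" is ASCII-only in many flavors. There, \bcafé\b treats é as a non-word character, which puts a boundary between f and é and none after the é. JavaScript is one of these flavors: \b and \w stay ASCII-based even with the u flag. What ES2018 added is Unicode property escapes such as \p{L}, which let you spell out a Unicode-aware boundary with lookarounds, for example (?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_]) with the u flag.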
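
A sketch of how Unicode property escapes can stand in for \b in JavaScript. Note that \b itself stays ASCII-based even under the u flag, so the boundary here is spelled out by hand with lookarounds. The choice of "letter, digit, or underscore" as the word-character set is an assumption for this example, not a standard definition.

```javascript
const text = "un café noir";

// \b sees é as a non-word character, so there is no boundary between é and the
// following space (non-word next to non-word) and the match fails.
const asciiBoundary = /\bcafé\b/u.test(text);

// Hand-rolled boundary: no letter, digit, or underscore on either side.
const unicodeWord = /(?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_])/u;

const standalone = unicodeWord.test(text);        // café as its own word
const embedded = unicodeWord.test("cafétéria");   // café inside a longer word

console.log(asciiBoundary, standalone, embedded); // false true false
```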

The four lookarounds

A lookaround is a zero-width assertion that checks a pattern at the current position.

  • (?=…): lookahead. Match here only if … would match starting at this position.
  • (?!…): negative lookahead. Match here only if … would NOT match starting at this position.
  • (?<=…): lookbehind. Match here only if … would match ending at this position.
  • (?<!…): negative lookbehind. Match here only if … would NOT match ending at this position.

The defining property is the same as for anchors: they don't consume characters. After a (?=foo) succeeds, the cursor is still in the same place; the next part of the pattern starts from where the lookahead started.
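
To make "the cursor is still in the same place" concrete, here is a short sketch with an invented input string. The lookahead inspects text without adding it to the match, and a pattern placed after the lookahead matches the very same characters.

```javascript
const input = "foobar";

// The lookahead checks for "bar" but leaves it out of the match.
const lookahead = /foo(?=bar)/.exec(input);

// Because (?=foo) consumed nothing, the following foo matches the same text.
const samePlace = /(?=foo)foo/.exec(input);

// Only "foo" is replaced; "bar" was inspected, not consumed.
const replaced = input.replace(/foo(?=bar)/, "X");

console.log(lookahead[0]);                  // foo
console.log(samePlace[0], samePlace.index); // foo 0
console.log(replaced);                      // Xbar
```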

Three useful examples

Password validation: must contain at least one digit, at least one letter, ≥8 chars.

^(?=.*\d)(?=.*[A-Za-z])[A-Za-z\d]{8,}$

Two lookaheads stack at the start, right after ^: the first checks that the rest of the string contains a digit, the second checks that it contains a letter. Neither moves the cursor, so the normal pattern that follows, [A-Za-z\d]{8,}$, still starts at the first character and consumes the whole string, which also restricts it to letters and digits. Lookaheads here let you check multiple independent conditions without worrying about order.
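
Wrapped in a function, the pattern can be exercised against a few inputs. isValidPassword is a made-up helper name for this sketch, and the sample passwords are invented.

```javascript
// The pattern from above, unchanged.
const passwordPattern = /^(?=.*\d)(?=.*[A-Za-z])[A-Za-z\d]{8,}$/;

// Hypothetical helper: true when the candidate satisfies all three rules.
function isValidPassword(candidate) {
  return passwordPattern.test(candidate);
}

console.log(isValidPassword("abc12345"));  // true
console.log(isValidPassword("abcdefgh"));  // false: no digit
console.log(isValidPassword("12345678"));  // false: no letter
console.log(isValidPassword("abc123"));    // false: shorter than 8
console.log(isValidPassword("abc 12345")); // false: space is outside [A-Za-z\d]
```

The last case is worth noticing: the consuming part of the pattern only allows letters and digits, so any other character is rejected even when both lookaheads pass.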

Find numbers not preceded by a dollar sign.

(?<!\$)\b\d+\b

(?<!\$) rejects the position if the previous character was $. Then \b\d+\b matches a standalone number.
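
A short sketch of the pattern at work, with invented sample text. The second call shows a limitation to keep in mind: the pattern only looks one character back, so the fractional part of a dollar amount still matches.

```javascript
const notDollarAmount = /(?<!\$)\b\d+\b/g;

// 25 is skipped because it is preceded by $; 3 and 10 are plain numbers.
const plain = "3 items at $25 each, 10 percent off".match(notDollarAmount);

// Caveat: in "$1.50" the 50 is preceded by ".", not "$", so it still matches.
const decimal = "$1.50".match(notDollarAmount);

console.log(plain);   // [ '3', '10' ]
console.log(decimal); // [ '50' ]
```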

Replace cat with dog only when not at the end of a word.

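cat(?!\b)

(?!\b) rejects the position right after cat if it is a word boundary, so cat matches only when it's followed by another word character, i.e., when it's part of a longer word like catalog. cat\B is an equivalent, shorter spelling.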
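
Applied as a replacement, the behavior looks like this. The sentence is invented for the example, and the g flag is added so every occurrence is considered.

```javascript
const sentence = "The cat read a catalog about concatenation; bobcat too.";

// cat at the end of a word (cat, bobcat) is left alone; cat followed by
// another word character (catalog, concatenation) is replaced.
const result = sentence.replace(/cat(?!\b)/g, "dog");

// \B after cat asserts the same thing, so the output is identical.
const sameWithB = sentence.replace(/cat\B/g, "dog");

console.log(result); // The cat read a dogalog about condogenation; bobcat too.
```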

Variable-length lookbehind

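JavaScript (since ES2018) and .NET support variable-length lookbehind: (?<=\w+) matches a position preceded by one or more word characters. Many other engines do not: Python's built-in re module requires a fixed-width lookbehind, and PCRE has traditionally required each alternative inside the lookbehind to have a fixed length. If you target one of those engines, you may have to consume the prefix with an ordinary group and capture the part you actually want.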
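
A sketch in JavaScript, where a lookbehind may contain a quantifier, followed by one possible rewrite for engines that insist on a fixed length. The key=value sample string is invented for illustration.

```javascript
const config = "timeout=30 retries=5 name=web";

// (?<=\w+=) contains \w+, so the lookbehind has no fixed length.
const values = config.match(/(?<=\w+=)\d+/g);

// Fallback without lookbehind: consume the key and capture the value instead.
const captured = [...config.matchAll(/\w+=(\d+)/g)].map((m) => m[1]);

console.log(values);   // [ '30', '5' ]
console.log(captured); // [ '30', '5' ]
```

The capture-group version consumes the key as part of the match, which is fine for extraction but changes what a replace call would overwrite.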

Greedy, lazy, possessive — not the same as lookarounds

A common confusion: people lump the +? (lazy quantifier) and ++ (possessive quantifier, where supported) in with lookarounds because they look like they're "modifying" matching behavior. They're not zero-width — they consume. They control how many characters a quantifier matches; lookarounds control whether a position is allowed.
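
The contrast can be seen in a few lines. This sketch uses an invented HTML-like string and covers only greedy and lazy quantifiers, since JavaScript has no possessive ++.

```javascript
const html = "<b>one</b> <b>two</b>";

// Greedy .+ takes as much as it can, then backtracks to the last </b>.
const greedy = html.match(/<b>.+<\/b>/)[0];

// Lazy .+? takes as little as it can and stops at the first </b>.
const lazy = html.match(/<b>.+?<\/b>/)[0];

// A lookahead does not change how much is consumed; it only vets a position.
const vetted = html.match(/one(?=<\/b>)/)[0];

console.log(greedy); // <b>one</b> <b>two</b>
console.log(lazy);   // <b>one</b>
console.log(vetted); // one
```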

If your regex is doing something unexpected, ask: is the symptom that the cursor moved when I didn't expect, or that the match boundary is in a weird place? Cursor problems mean lookaround/anchor issues. Boundary problems mean quantifier issues.

The one mistake almost everyone makes

You write ^https?://.*$ and assume it matches a URL. It does, but only when the URL is the entire input. Then you run it against multi-line input expecting to extract every URL, one per line, and get nothing: . stops at the first newline, and without the m flag $ can only match at the very end of the input. Add the s flag to "fix" it and the opposite happens: . now crosses newlines and the regex returns the entire input as a single URL.

Two fixes:

  • Add the m flag so ^ and $ match line boundaries.
  • Replace .* with [^\n]* so the match can't span lines.

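Combined, with the character class tightened to non-whitespace and the g flag added so every match is returned: /^https?:\/\/[^\s]+$/gm.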
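
A sketch that runs the naive pattern and the combined one against the same multi-line input. The URLs and variable names are placeholders invented for the example.

```javascript
// Two URLs on separate lines with a non-URL line between them.
const log = "https://example.com/a\nnot a url\nhttp://example.org/b";

// No flags: . stops at the first newline and $ needs the end of input, so nothing matches.
const noFlags = log.match(/^https?:\/\/.*$/);

// s flag only: . crosses newlines and the whole input comes back as one match.
const dotAll = log.match(/^https?:\/\/.*$/s);

// Combined fix: m for line anchors, g for all matches, [^\s]+ to stay on one line.
const perLine = log.match(/^https?:\/\/[^\s]+$/gm);

console.log(noFlags);           // null
console.log(dotAll[0] === log); // true
console.log(perLine);           // [ 'https://example.com/a', 'http://example.org/b' ]
```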

The 60-second test for this kind of bug is: open the Regex Tester, paste a multi-line sample with two of the thing you want to match, and check that you get two matches, not one. If you get zero or one, either the m or g flag is missing or your . is crossing newlines.

A cheat sheet to keep around

Symbol    Meaning
^         Start of string (or line with m)
$         End of string (or line with m)
\b        Word boundary
\B        Not a word boundary
(?=x)     Lookahead: followed by x
(?!x)     Negative lookahead: not followed by x
(?<=x)    Lookbehind: preceded by x
(?<!x)    Negative lookbehind: not preceded by x
.         Any character except newline (use s flag for any)
\w        Word character (ASCII-only in JavaScript, even with the u flag)

The mental model that makes all of this click: anchors and lookarounds check a position, the rest of regex matches characters. When in doubt, ask "is the cursor supposed to move here?" — if no, you want an anchor or a lookaround.
