Guilty by Punctuation

Em-dash: linguistic stigmatisation and sociolinguistic marker

Apr 06, 2026

Post something online with an em-dash in it. Wait. Within hours, someone will tell you it was written by AI.

This is where we are now. A punctuation mark with centuries of continuous use in English has become a shibboleth. Not a marker of literacy or style, but of machine authorship. The em-dash — that long horizontal stroke that writers from Emily Dickinson to Zadie Smith have relied on to interrupt, redirect, and dramatise their prose — is guilty by association.

The charge is straightforward. Large language models use em-dashes. They use them a lot. Therefore, if you use one, you are probably not writing your own text. The logic is identical to arguing that anyone wearing a trench coat must be a spy, because spies in films wear trench coats.

Humphrey Bogart as Rick Blaine in Casablanca (1942).

The pedigree

The dash has been part of English typography since at least the seventeenth century. Parkes (1993) traces its emergence in printed texts to the early modern period, where it served functions that commas and semicolons could not: marking abrupt shifts in thought, setting off parenthetical material with more force than brackets, and signalling interrupted speech. The specific em-dash — a dash the width of the letter “m” — became standardised as printing technology matured, and by the nineteenth century it was a fixture of English prose.

Its literary credentials are well established. Dickinson made it a structural principle of her poetry, the signature of her lineation. Woolf used it to trace the movement of consciousness in prose that resists the stopping points of conventional punctuation. Sterne turned it into a device of comic timing in Tristram Shandy. In the twentieth century, writers as different as Joan Didion and David Foster Wallace found that their prose rhythms could absorb the dash without strain. And in the twenty‑first, Maggie Nelson’s Bluets and The Argonauts show how the dash can hold two ideas in suspension without forcing a premature resolution.

Crystal (2015) describes the em-dash as one of the most versatile marks in English punctuation. It functions variously as a parenthetical separator, an appositive marker, a signal of amplification, and a substitute for the colon. No other single mark covers this range. The semicolon is too formal. The colon too declarative. The comma too weak. The em-dash sits in a space that no other mark occupies.

In academic writing, both the Chicago Manual of Style and the Oxford Style Manual accept and describe em-dash usage. APA permits it. The em-dash is not slang. It is not informal. It is standard equipment in every serious style guide published in English.

How we got here

Large language models are trained on text. Vast quantities of it. That training data includes literary fiction, journalism, academic papers, blogs, and everything else that constitutes the written record of English. The em-dash appears frequently in this material because good writers use it frequently.

When a model produces text containing em-dashes, it is doing exactly what it was designed to do: reproducing patterns found in its training data. The model did not invent the em-dash. It learned it from us. Blaming the em-dash for appearing in AI output is like blaming the word “however” for appearing in undergraduate essays. The tool is not the problem. The frequency is.

Consider the circularity. We write with em-dashes for centuries. We feed that writing into a model. The model reproduces the pattern. We then accuse ourselves of sounding like the model. At no point in this chain did the em-dash change. Only its social meaning did.

Ouroboros: a snake eating its own tail. Image from Corvid Research Blog.

And there is a real problem with frequency. Models use em-dashes with a regularity and distributional consistency that no individual human writer would match. A human writer has preferences. Some reach for em-dashes constantly; others avoid them entirely. A model trained on millions of writers produces text that reflects the aggregate, which means em-dashes appear at a rate that feels uncanny. They appear in the same syntactic slots, with the same parenthetical insertion pattern — “X — and this is the key point — is Y” — over and over.

The tell is not the em-dash itself. The tell is the monotony.

There is a feedback loop at work. Human writers remove em-dashes from their prose to avoid suspicion. That prose enters the next generation of training data. Future models, trained on text with fewer em-dashes, will produce fewer in turn. The mark disappears from the written record not because it failed, but because a false heuristic became a self-fulfilling prophecy. Some model providers have already responded to the backlash. In November 2025, OpenAI announced that ChatGPT would finally respect user instructions to avoid em-dashes, a request it had previously ignored. The mark remains in the default output, but the fact that its removal was treated as a product fix tells us everything about how far the stigmatisation has gone.

Stigmatisation by association

What is happening to the em-dash is a textbook case of what sociolinguists call stigmatisation. Labov (1972) demonstrated that linguistic features become socially marked not because of anything inherent to the feature, but because of their association with particular speaker groups. The dropped “h” in English is phonologically unremarkable. It became stigmatised because it was associated with working-class speech. The double negative is logically coherent and historically standard in English. It became stigmatised because prestige dialects abandoned it.

The em-dash is undergoing the same process, but in reverse. It was a prestige feature. It carried literary and intellectual connotations. Now it is being reassigned — not to a lower social class, but to a non-human source. The mechanism is identical: a feature acquires a social meaning that overrides its functional one.

Trudgill (2000) uses the term “sociolinguistic marker” for a variable that has acquired social significance and triggers evaluation by listeners or, in this case, readers. The em-dash has become a sociolinguistic marker of machine authorship, regardless of who actually produced the text. This is not rational evaluation. It is heuristic pattern matching, and the heuristic is producing false positives at an extraordinary rate.

Every journalist, academic, novelist, and essayist who has used em-dashes for their entire career is now subject to suspicion. The feature has been contaminated by association.

This is not just about punctuation. The stigmatisation of the em-dash is a symptom of a broader anxiety about authenticity in the age of generative AI. We do not yet have reliable tools for distinguishing human from machine text. So we reach for proxies. The em-dash is one. Overly fluent prose is another. The word “delve” has become suspect. So has “tapestry”. We are assembling a folk taxonomy of machine tells, and it is riddled with false positives. The result is not better detection. It is worse writing, as humans contort their prose to avoid triggering the heuristic.

What we are really detecting

If you want to identify AI-generated text, the em-dash is a poor indicator. What makes AI prose identifiable is not any single feature but a cluster of distributional properties. The vocabulary is slightly too even. The sentence length variation is slightly too regular. The hedging patterns recur with mechanical frequency. Certain phrasings — “It is worth noting that”, “This is particularly relevant”, “In conclusion” — appear in contexts where no human writer would bother with them.
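As a rough illustration of what "distributional properties" means in practice, the Python sketch below computes two crude measures for a passage: em-dashes per 1,000 words, and the coefficient of variation of sentence length. The function name `style_profile`, the naive sentence splitter, and the choice of measures are all assumptions made for this example. It is not a detector, and no threshold on these numbers would reliably separate human from machine text.

```python
import re
import statistics

EM_DASH = "\u2014"  # the em-dash character, written as a Unicode escape


def style_profile(text: str) -> dict:
    """Return a few crude distributional measures for a passage (illustrative only)."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    # Count a token as a word only if it contains a letter or digit,
    # so a free-standing dash is not counted as a word.
    lengths = [
        sum(1 for token in s.split() if any(ch.isalnum() for ch in token))
        for s in sentences
    ]
    words = sum(lengths)
    mean_len = statistics.mean(lengths) if lengths else 0.0
    # Population standard deviation, so a single sentence yields 0 rather than an error.
    spread = statistics.pstdev(lengths) if lengths else 0.0

    return {
        "words": words,
        "sentences": len(sentences),
        "em_dashes_per_1000_words": 1000 * text.count(EM_DASH) / words if words else 0.0,
        # Spread of sentence length relative to its mean; lower means more regular.
        "sentence_length_cv": spread / mean_len if mean_len else 0.0,
    }
```

Run over many passages by the same author, measures like these would be expected to wander from piece to piece. The argument here is that the wandering, not the value on any one page, is what a single-mark heuristic misses.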

Human writers are inconsistent. They have habits, tics, and blind spots. They overuse some constructions and avoid others for reasons that are often biographical rather than rational. A writer who grew up reading Didion will use em-dashes differently from one raised on Hemingway. My own punctuation was heavily influenced by … Louis-Ferdinand Céline… That inconsistency — shaped by reading history, education, temperament, and mood — is the signature of human authorship. Counting one punctuation mark is no substitute for detecting the absence of that inconsistency.

The AI detection tools that exist are unreliable, and their operators know it. Accusing someone of using AI because they deployed an em-dash is like accusing a singer of lip-syncing because they hit the right notes. The evidence points nowhere useful.

The real cost

The consequence of this false heuristic is that human writers are now self-censoring. They are removing em-dashes from their prose — not because the marks are wrong, but because they fear the accusation. This is a loss. The em-dash does things that other punctuation marks cannot do with the same economy and force. A parenthetical set off by em-dashes has a different rhythm, a different weight, a different relationship to the surrounding sentence than the same material enclosed in commas or brackets. Removing it impoverishes the toolkit.

Students are particularly vulnerable. A university essay containing em-dashes now risks being flagged by AI detection software that is itself unreliable and poorly understood by the academics who rely on it. The student who learned to use em-dashes from reading good prose is penalised for having absorbed a technique from exactly the sources their teachers told them to read. The irony is savage.

I say this as someone who rarely uses em-dashes in my own writing, partly as a stylistic preference, partly as a reflection of my first language. French uses the em-dash almost exclusively for dialogue, not for the mid-sentence parenthetical work it performs in English. It is a cultural difference, not a principle. But I recognise the difference between a stylistic choice and a prohibition based on guilt by association. My avoidance is a matter of preference and culture. The current climate is turning avoidance into a norm, and norms based on false premises deserve to be resisted.

There is something darkly comic about the situation. Writers trained for decades in the use of a punctuation mark are now being told they sound like machines — machines that learned the mark from writers exactly like them. The snake is eating its own tail.

The em-dash did nothing wrong. The machines borrowed it. That does not make it theirs.

References

Crystal, D. (2015) Making a Point: The Pernickety Story of English Punctuation. London: Profile Books.

Houston, K. (2013) Shady Characters: The Secret Life of Punctuation, Symbols & Other Typographical Marks. New York: W.W. Norton.

Labov, W. (1972) Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.

Parkes, M.B. (1993) Pause and Effect: An Introduction to the History of Punctuation in the West. Aldershot: Scolar Press. (2016 ebook, Taylor & Francis)

Trudgill, P. (2000) Sociolinguistics: An Introduction to Language and Society. 4th edn. London: Penguin.

Viktoria Verde, PhD

Brilliant work! Thanks. I've been really looking forward to it.

I loved your analogies, and indeed, the core absurdity is that the snake is eating its own tail. We trained the models on our best writing, and now we punish ourselves for sounding like what the models learned from us.

Your application of Labov's stigmatization framework is excellent. The em dash is undergoing a social reassignment, exactly the way dropped /h/ or double negatives did, except that the "undesirable group" isn't a social class but a nonhuman agent.

The poem was a perfect selection to illustrate the point. I can only imagine Emily Dickinson publishing nowadays. Her entire body of work would be AI-flagged :) There's a dark irony that even she might have appreciated.

What concerns me most is the feedback loop you describe. Human avoidance reshapes training data, which reshapes model output, which reshapes human norms. That's a case of folk linguistic ideology actively degrading the written language.

Kem-Laurin Lubin, PhD


I was just talking about the Em Dash this very morning with another prof, about how they seem so common these days in student "writing" 🤔 genAI


LinguisticallyYours' Substack
