Mapping the United Swears of America
Swearing varies a lot from place to place, even within the same country, in the same language. But how do we know who swears what, where, in the big picture? We turn to data – damn big data. With great computing power comes great cartography.
Jack Grieve, lecturer in forensic linguistics at Aston University in Birmingham, UK, has created a detailed set of maps of the US showing strong regional patterns of swearing preferences. The maps are based on an 8.9-billion-word corpus of geo-coded tweets collected by Diansheng Guo in 2013–14 and funded by Digging into Data.
The red–blue scale shows relative frequency. The frequency of a word in the tweets from a given county is divided by the total number of words from that county (which correlates strongly with population density). The result is then smoothed using spatial autocorrelation analysis, with Getis-Ord z-scores mapped to identify clusters. Alaska and Hawaii are not included.
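For the curious, here is a rough sketch of those two steps in Python: a per-county relative frequency, followed by a Getis-Ord Gi* z-score for each county. It is not Grieve's code. The county counts are made up, the function names are hypothetical, and the binary "adjacent counties plus the county itself" weights are an assumption, since the post does not say which spatial weighting was used.

```python
import numpy as np

def relative_frequency(word_counts, total_counts):
    """Occurrences of one word per county divided by all words from that county."""
    word_counts = np.asarray(word_counts, dtype=float)
    total_counts = np.asarray(total_counts, dtype=float)
    return word_counts / total_counts

def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for each location.

    values  : 1-D array with one value per county
    weights : n x n spatial weights matrix; row i marks the neighbours of
              county i, including county i itself (the "star" variant)
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)   # population standard deviation
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Toy data (invented): six counties in a row, each adjacent to its neighbours.
word_counts = [50, 45, 40, 5, 4, 6]      # uses of one swearword
total_counts = [1000] * 6                # all words tweeted per county

# Binary weights: 1 for the county itself and its immediate neighbours.
n = len(word_counts)
weights = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if abs(i - j) <= 1:
            weights[i, j] = 1.0

rel = relative_frequency(word_counts, total_counts)
z = getis_ord_gi_star(rel, weights)
```

Positive z-scores mark counties sitting in a cluster of high relative frequency (the red end of the scale), negative ones a cluster of low frequency (the blue end). In this toy row the first three counties come out positive and the last three negative.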
Polysemy – a word’s multiple meanings – has not been controlled in the graphs, so the hell map includes straight religious uses as well as sweary ones, the p*ssy map includes cat references, and so on. But the graphs are nonetheless highly suggestive of differential swearword (and minced oath) clustering in different parts of the country.
Hell, damn and bytch are especially popular in the south and southeast. Douche is relatively common in northern states. B*stard is beloved in Maine and New Hampshire, and those states – together with a band across southern Arizona, New Mexico, and Texas – are the areas of particular motherfukker favour. Crap is more popular inland, fukk along the coasts. Fukkboy – a rising star* – is also mainly a coastal thing, so far.
Here’s the full glorious set in alphabetical order:
As Grieve put it, ‘pretty much everyone’s swearing. We just don’t all prefer the same words’. You can see more word-maps on his research blog and various publications elsewhere on his website. He and colleagues have been measuring the 100,000 most common words in American English (as manifested in the tweet corpus), so additional maps will be appearing, and he tells me Diansheng is also collecting UK data.
For more on the method of spatial analysis used to create the maps, see for example Grieve’s ‘A regional analysis of contraction rate in written Standard American English’ (PDF), or ‘A statistical method for the identification and aggregation of regional linguistic variation’ (PDF) (co-written with Dirk Speelman and Dirk Geeraerts), both from 2011.
* Grieve’s presentation ‘Mapping lexical spread in American English’ (PDF) has data on the fastest growing words on Twitter in 2014, among other delights. Four of the top 10 are based on fukk. We’re becoming sweary asf.