Researchers find regional dialects come through on Twitter

Mass media has long been credited with, and blamed for, homogenizing language: regional dialects erode as we grow accustomed to the way the talking heads on TV pronounce words. Now, though, a paper from researchers at Carnegie Mellon University says that at least one form of mass media, Twitter, might be a haven for regional dialects online, even though tweets are written and not spoken.

Lead author Jacob Eisenstein and his coauthors used an automated method to examine a week’s worth of tweets: 380,000 messages from 9,500 Twitter users. Their goal was to present an efficient and unbiased method for identifying “geographically-aligned lexical variation” (read: “regional dialects”) from raw text.

The researchers looked at larger discussions on Twitter, such as those about sports or other events, and then looked within those conversations for dialect features that could tie users to a geographic location. While they are careful to point out that slang and certain pronunciations travel and cannot easily be tied to a single region, they did find that regional trends in spelling and usage were making it onto Twitter.

The researchers say this is because Twitter, unlike many other forms of mass media, tends to be conversational. Users tend to talk with people they know, people who may be located near them or who share their interests, and who are therefore likely to write in a similar way.

From the university’s press release about the paper:

“It might be a mistake to assume that the greater interconnectivity afforded by computer networks and sites such as Twitter will necessarily result in more homogeneity in language. The social circles maintained by social networks such as Twitter often are geographically focused, he noted. Also, many people use the Internet to seek out like-minded people with similar interests, rather than expose themselves to a broader range of ideas and experiences.”

I suggest you read the paper yourself. It's not too long and it's not a terrible read.