Beyond Words - Language Blog

Number of Languages Remains A Mystery

You’d think that one of the fundamental questions linguists would have sorted out by now is how many languages there are. I thought that too before I studied to become one. Quantifying languages is much more complicated than it initially sounds, with the result that we can never truly count all the world’s languages. Why is this the case?

A lot comes down to the age-old distinction of language vs. dialect. An oft-quoted line is used to illustrate that, far from being empirically measurable, the difference is political: “a language is a dialect with an army and a navy”. That is, what we call a language is just a dialect that has political, social and cultural power. To speakers and signers, government policy-makers and researchers interested in identity, this is significant.

Hindi and Urdu, for example, used to be much closer than they are today. Increasing political distance between their respective countries of India and Pakistan, and a desire among some speakers to express allegiance to Hinduism or Islam, have led the two varieties to adopt words from Sanskrit and Arabic respectively. Whether we call Hindi and Urdu one language with two dialects (Hindustani, a favorite of Gandhi) or two separate languages depends considerably more on one’s ideology about the language than on anything quantifiable. And the same story is repeated the world over: some dialects of Norwegian and Swedish are essentially identical in speech, yet they are classified as two separate languages.


Linguists have tried to put forth rigid criteria for distinguishing a language from a related dialect, the most common one being that two varieties (a neutral term favored by linguists) are dialects of one language if they share 70% mutual intelligibility. However, such thresholds are arbitrary measures that attempt to divide something that is fluid. Language varieties exist on a continuum with each other.

The continuum goes down even to the individual. Each person is unique in terms of their linguistic repertoire, having a unique vocabulary, range of languages, formal and informal registers that they are able to use, and even their own sounds. There is a word for this – idiolect – along with similar words for varieties and characteristics specific to ethnic groups (ethnolect) and socioeconomic classes (sociolect).

This variety in definition makes censuses very difficult. To again use India as an example, the government has some well-defined official, scheduled and regional languages, but beyond this the language names get complicated, with closely related varieties often being named after the village in which they are spoken. The job of sifting through the huge volume – in the thousands – of names of languages, assessing which should be classified as the same and which are different, falls on somebody who does not have much of an opportunity to ask locals to clarify. Further, people may discount their own abilities in a language variety if they do not consider it prestigious enough to be counted. Each government has its own way of assessing the languages spoken within its borders, often changing the wording of questions from one census to the next. Oh, and many languages are spoken across borders, often with different names depending on the country …

… I could go on forever. So how does an organization like Ethnologue come up with a precise figure like 7,097, from their latest (19th) edition?

They compromise. The language database admits that the criteria are open to debate, using the analogy of language being “a particle, a wave and field” to describe the myriad of ways that the language continuum can be theorized. They admit the social, political and cultural factors that go into defining a language and dialect, listing alternative names for specific varieties in their entries, and define the codes given to individual languages according to a combination of mutual intelligibility and user identity. They source their information from researchers, academics and governments.

Ultimately they know that their figure is an estimate, one that we can never be certain of and that is always in flux. So next time you see claims of a definitive number of languages over a large area, take it with a fistful of salt, and come back to this blog to remind yourself of why.

Paul Sutherland writes about endangered languages, sociolinguistics and related phenomena for ALTA Language Services. He is a linguist, photographer and writer with a passion for supporting endangered language communities. To this end, Paul has an MA in Language Documentation & Description from SOAS and has worked with groups including language archives, teaching material developers and UNESCO.