As humans, engaging in conversation comes so naturally to us that we tend to overlook the complex mechanics of the back-and-forth ‘dance’ we are constantly performing as interlocutors. You speak, I speak, usually starting my turn about 200 milliseconds (that’s 0.2 seconds) after you finish yours. I have to be careful, though, because if I start my turn too soon, it will be perceived as rude, like stepping on my dance partner’s toes. Then again, if I wait even a few milliseconds too long, I’m likely to create a pregnant pause in the conversation, comparable to a clumsy misstep that will leave both of us blushing. Thankfully, most people are much more coordinated conversationalists than they are dancers.
While turn-taking is generally accepted as a universal trait of human conversation, for decades the anthropological literature claimed that different cultures and languages had vastly different rules governing the specific timing of conversational turns. Nordic cultures, for example, were said to relish delays of minutes or even hours between one speaker’s turn and the next. As one report goes, “Two brothers of Häme (Finland) were on their way to work in the morning. One says, ‘It is here that I lost my knife.’ Coming back home in the evening, the other asks, ‘Your knife, did you say?’” In contrast, Jewish New Yorkers have been cited as having a “preference for simultaneous speech,” and in one Antiguan village, there is said to be “no regular requirement for two or more voices not to be going on at the same time.”
Such anecdotal evidence is often used to support the cultural variability hypothesis, which holds that the form of conversational turn-taking depends on culture and language. In 2009, PNAS published a research paper that tested the cultural variability hypothesis against its opposite, the universal system hypothesis, which suggests instead that ‘turn-taking is a universal system with minimum cultural variability.’
Testing the Universal System Hypothesis
Researchers compared ten languages from five continents, including Dutch, Korean, English, a Mexican indigenous language called Tzeltal, the northern Namibian language ‡Ākhoe Hai‖om, Japanese, and Yélî-Dnye, a language spoken by fewer than five thousand people in Papua New Guinea. These languages differ fundamentally in grammar and syntax, and the cultures of their speakers range from hunter-gatherer groups to urban metropolis dwellers. The results of the study? Across all ten languages, the evidence showed a preference for minimizing silence between conversational turns while avoiding overlapping talk. Although response times did vary slightly across languages, the results were a far cry from the anecdotes of daylong pauses in Nordic conversation or virtual simultaneity in Antiguan interlocution. Danish had the slowest average response time (469 milliseconds) and Japanese the fastest (7 milliseconds), yet the gap between the two ultimately clocked in at less than half a second, or about the time it takes to utter a single syllable.
What’s more, researchers found evidence for four additional factors that affect response time in similar ways across languages and cultures. Across all ten languages, speakers gave answers to questions significantly faster than they gave non-answers, such as ‘I don’t know’ or ‘I’m not sure.’ Similarly, speakers were quicker to affirm than to contradict, even when an affirmation took a negative form, as in the exchange ‘You’re not going to the party, are you?’ ‘No, I’m not.’ In every language, speakers were also quicker to respond with visual cues, such as nods, shrugs, head shakes, or, in the case of Yélî-Dnye, extended blinks and eyebrow flashes, than with words: whenever a question drew a visual response, that response came faster than a spoken one would have. Finally, in nine out of ten languages, when the speaker gazed directly at the recipient while asking a question, the recipient responded more quickly, likely because a direct gaze signals heightened expectation.
Don’t Believe Everything (You Think) You Hear (Or Don’t Hear)
These universal factors, combined with the extremely slight differences in the gaps between speakers’ turns, overwhelmingly support the universal system hypothesis. Why, then, all the tall tales about long, awkward minutes spent waiting for a response to ‘Would you like a cup of coffee?’ in Northern Sweden, or about the impossibility of getting a word in edgewise while strolling through Brooklyn? The PNAS researchers suggest that, although speakers across languages aim to minimize delay while avoiding overlap, differences in the overall tempo of social life or interactional pace could cause slight discrepancies in what is perceived as a ‘delay.’ And in fact, when response times were judged as ‘on time’ versus ‘delayed,’ subjectively on-time responses in Danish and Lao (202 and 203 milliseconds, respectively) were longer than those in Japanese and Tzeltal (36 and 83 milliseconds, respectively). However, the overall difference in what counts as ‘on time’ among these groups amounts to less than one-fifth of a second. Although the actual difference is extremely small, we are, in a sense, calibrated to the response time typical of our own language and culture, so we are likely to be hypersensitive to perturbations in response timing and to perceive subtle variation as much greater than it truly is. That, and a human penchant for storytelling, are likely the roots of the ‘huge silences’ we hear about in Nordic conversation.
What Does the Future Hold?
Researchers are now turning to new questions about conversational turn-taking, such as where exactly the habit came from. Some suggest that the turn-taking system may predate human language. Great apes, chimpanzees among them, take turns when gesturing to each other, and several kinds of monkeys take turns when calling. Marmosets even leave gaps of five to six seconds between turns, adjusting to match their partner if the exchange speeds up or slows down. Could conversational turn-taking have grown out of an ancient framework that we built upon when we first developed the capacity for speech? Given the evidence for a universal system of conversational turn-taking, this seems like a fascinating avenue to explore next.
Janet Barrow writes about the places where language meets history, culture, and politics. She studied Written Arts at Bard College, and her fiction has appeared in Easy Street and Adelaide Magazine. After two years in Lima, Peru, she recently moved to Chicago.