A cross-linguistic comparison of the evolved complexity of numeral systems
The ways in which languages keep track of quantities differ substantially (Bender & Beller, 2018), but the global diversity of these systems has barely been explored. Here we present some preliminary investigations into Numeralbank: a new global database of numeral systems containing ~186,000 number words from ~5300 languages. First, we show that there is a strong relationship between a number and the orthographic length of its lexeme, where the lexical forms for numbers below five are shortest, followed by the numbers below ten. Number words for multiples of base 10 (e.g. “twenty”) also tend to be short. Second, we develop a novel method for characterizing the complexity of numeral systems in these languages, and quantify and model their evolution over time. Finally, we use these data to test some broad-scale generalizations (e.g., Pagel & Meade, 2018) that number words are shorter and less ambiguous than other words.
Phylogenetic tree thinking is beginning to revolutionise studies of linguistic and cultural evolution. However, linguistic and cultural traits are easily transmitted horizontally ("borrowed") between cultures. Indeed, well over 95% of the words in the Oxford English Dictionary are not English in origin. A loud and persistent debate has centered on borrowing and whether it invalidates cultural phylogenies. Here, we use a natural model of linguistic evolution to simulate borrowing between languages. The results show that tree topologies constructed with Bayesian phylogenetic methods are relatively robust to the effects of realistic levels of borrowing. Inferences about time depth are slightly less robust.