Dr. Simon J. Greenhill

I research why and how people created all the amazing languages around us, and what they tell us about human prehistory.
I mainly use Bayesian phylogenetic methods to tackle these questions, and have investigated everything from how the Austronesian peoples settled the Pacific to modelling the co-evolution of linguistic structure. I have also built a number of large-scale databases to help answer these questions.
You can find me on Twitter or Mastodon, or at the University of Auckland.
Subgrouping language varieties within dialect continua poses challenges for the application of the comparative method of historical linguistics, and similar claims have been made for the use of Bayesian phylogenetic methods. In this article, we present the first Bayesian phylogenetic analysis of the Mixtecan language family of southern Mexico and show that the method produces valuable results and new insights with respect to subgrouping beyond what the comparative method and dialect geography have provided. Our findings reveal potential new subgroups that should be further investigated. We …
DOI: 10.1093/jole/lzad004
For a single species, human kinship organization is both remarkably diverse and strikingly organized. Kinship terminology is the structured vocabulary used to classify, refer to, and address relatives and family. Diversity in kinship terminology has been analyzed by anthropologists for over 150 years, although recurrent patterning across cultures remains incompletely explained. Despite the wealth of kinship data in the anthropological record, comparative studies of kinship terminology are hindered by data accessibility. Here we present Kinbank, a new database of 210,903 kinterms from a global …
DOI: 10.1371/journal.pone.0283218
Of approximately 7,000 languages around the world, only a handful have abundant computational resources. Extending the reach of language technologies to diverse, less-resourced languages is important for tackling the challenges of digital equity and inclusion. Here we introduce the Grambank typological database as a resource to support such efforts. To date, work that uses typological data to extend computational research to less-resourced languages has relied on cross-linguistic morphosyntax datasets that are sparsely populated, use categorical coding that can be difficult to interpret, and …
While global patterns of human genetic diversity are increasingly well characterized, the diversity of human languages remains less systematically described. Here, we outline the Grambank database. With over 400,000 data points and 2400 languages, Grambank is the largest comparative grammatical database available. The comprehensiveness of Grambank allows us to quantify the relative effects of genealogical inheritance and geographic proximity on the structural diversity of the world’s languages, evaluate constraints on linguistic diversity, and identify the world’s most unusual languages. An …
DOI: 10.1126/sciadv.adg6175