Dr. Simon J. Greenhill

I research why and how people created all the amazing languages around us, and what they tell us about human prehistory.
I use (mainly) Bayesian phylogenetic methods to tackle these questions and have investigated everything from how the Austronesian peoples settled the Pacific to modelling the co-evolution of linguistic structure. I have also built a number of large-scale databases to help answer these questions.
Currently I'm one of the editors of Language Dynamics and Change and on the editorial board of the Journal of Language Evolution.
I'm an Associate Professor in the School of Biological Sciences at the University of Auckland. Before that I was a senior scientist in the Department of Linguistic and Cultural Evolution at the Max Planck Institute for the Science of Human History in Jena, Germany, and at the ARC Centre of Excellence for the Dynamics of Language at the Australian National University.
Publications:
The Uto-Aztecan language family is one of the largest language families in the Americas. However, there has been considerable debate about its origin and how it spread. Here we use Bayesian phylogenetic methods to analyze lexical data from thirty-four Uto-Aztecan varieties and two Kiowa-Tanoan languages. We infer the age of Proto-Uto-Aztecan to be around 4,100 years (3,258–5,025 years) and identify the most likely homeland to be near what is now Southern California. We reconstruct the most probable subsistence strategy in the ancestral Uto-Aztecan society and infer no casual or intensive …
DOI: 10.1353/lan.0.0276
Although language-family specific traits which do not find direct counterparts outside a given language family are usually ignored in quantitative phylogenetic studies, scholars have made ample use of them in qualitative investigations, revealing their potential for identifying language relationships. An example of such a family specific trait are body-part expressions in Pano languages, which are often lexicalized forms, composed of bound roots (also called body-part prefixes in the literature) and non-productive derivative morphemes (called here body-part formatives). We use various …
DOI: 10.1098/rsfs.2022.0053
Nouns and verbs are known to differ in the types of grammatical information they encode. What is less well known is the relationship between verbal and nominal coding within and across languages. The equi-complexity hypothesis holds that all languages are equally complex overall, which entails trade-offs between coding in different domains. From a diachronic point of view, this hypothesis implies that the loss and gain of coding in different domains can be expected to balance each other out. In this study, we test to what extent such inverse coevolution can be observed in a sample of 244 …
DOI: 10.1515/lingvan-2021-0011
Human history is written in both our genes and our languages. The extent to which our biological and linguistic histories are congruent has been the subject of considerable debate, with clear examples of both matches and mismatches. To disentangle the patterns of demographic and cultural transmission, we need a global systematic assessment of matches and mismatches. Here, we assemble a genomic database (GeLaTo, or Genes and Languages Together) specifically curated to investigate genetic and linguistic diversity worldwide. We find that most populations in GeLaTo that speak languages of the same …
DOI: 10.1073/pnas.2122084119
The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed “break-away” geographical diffusion model, specially designed for modeling migrations, with “augmented” geographic information, to reconstruct the Bantu language family expansion. This Bayesian phylogeographic approach with augmented geographical data provides a powerful way of linking linguistic, archaeological, and …
DOI: 10.1073/pnas.2112853119
Projects:
Glottobank
Glottobank is an international research consortium established to document and understand the world’s linguistic diversity. We have established five global databases documenting variation in language structure (Grambank), lexicon (Lexibank), paradigm systems (Parabank), numerals (Numeralbank), and phonetic changes (Phonobank).
Database of Places, Language, Culture and Environment
From the foods we eat, to who we can marry, to the types of games we teach our children, the diversity of cultural practices in the world is astounding. Yet, our ability to visualize and understand this diversity is often limited by the ways it traditionally has been documented and shared: on a culture-by-culture basis, in locally-told stories or difficult-to-access books and articles. D-PLACE represents an attempt to bring together this dispersed corpus of information.
Trans-New Guinea Online
TransNewGuinea.org is a database of the Trans-New Guinea language family and friends. The Trans-New Guinea language family currently occupies most of the interior of New Guinea. This family is possibly the third largest in the world with 400 languages and is tentatively thought to have originated with root-crop agriculture around 10,000 years ago. However, vanishingly little is known about this family’s history.
POLLEX: Polynesian Lexicon Project Online
The Polynesian Lexicon Project Online is a large-scale comparative dictionary of Polynesian languages.
Austronesian Basic Vocabulary Database
The Austronesian Basic Vocabulary Database is the world’s largest cross-linguistic database of the Pacific. It contains ~300,000 lexical items from ~1,600 languages spoken throughout the Pacific region.