Project: Grambank: a database of structural (typological) features of language
Grambank is a database of structural (typological) features of language. It consists of 195 logically independent features (most of them binary) spanning all subdomains of morphosyntax. The Grambank feature questionnaire has been filled in, based on reference grammars, for 2,467 languages. The aim is to eventually cover as many as 3,500 languages. The database can be used to investigate deep language prehistory, the geographical distribution of features, language universals, and the functional interaction of structural features.
Publications from this Project:
Haynie H, Blasi DE, Skirgård H, Greenhill SJ, Atkinson QD, & Gray RD. Grambank’s Typological Advances Support Computational Research on Diverse Languages. In Beinborn L, Goswami K, Muradoğlu S, Sorokin A, Kumar R, Shcherbakov A, Ponti EM, Cotterell R & Vylomova E. Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP). Association for Computational Linguistics: Dubrovnik, Croatia.
Of approximately 7,000 languages around the world, only a handful have abundant computational resources. Extending the reach of language technologies to diverse, less-resourced languages is important for tackling the challenges of digital equity and inclusion. Here we introduce the Grambank typological database as a resource to support such efforts. To date, work that uses typological data to extend computational research to less-resourced languages has relied on cross-linguistic morphosyntax datasets that are sparsely populated, use categorical coding that can be difficult to interpret, and …
Skirgård H, ... Greenhill SJ, Atkinson QD, & Gray RD. 2023. Grambank reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss. Science Advances, 9, eadg6175.
While global patterns of human genetic diversity are increasingly well characterized, the diversity of human languages remains less systematically described. Here, we outline the Grambank database. With over 400,000 data points and 2400 languages, Grambank is the largest comparative grammatical database available. The comprehensiveness of Grambank allows us to quantify the relative effects of genealogical inheritance and geographic proximity on the structural diversity of the world’s languages, evaluate constraints on linguistic diversity, and identify the world’s most unusual languages. An … DOI: 10.1126/sciadv.adg6175
Shcherbakova O, Michaelis SM, Haynie HJ, Passmore S, Gast V, Gray RD, Greenhill SJ, Blasi DE, & Skirgård H. Preprint. Societies of strangers do not speak grammatically simpler languages.
Many recent proposals claim that languages adapt to their environments. The Linguistic Niche hypothesis claims that languages with numerous native speakers and substantial proportions of non-native speakers (societies of strangers) will tend to lose grammatical distinctions. In contrast, languages in small, isolated communities should maintain or expand their range of grammatical markers. Here, we test such claims using a new global dataset of grammatical structures: Grambank. We model the impact of the number of native speakers, the proportion of non-native speakers, the number of linguistic … DOI: 10.31235/osf.io/svfdx
Shcherbakova O, Gast V, Blasi DE, Skirgård H, Gray RD, & Greenhill SJ. 2022. A quantitative global test of the complexity trade-off hypothesis: the case of nominal and verbal grammatical marking. Linguistics Vanguard.
Nouns and verbs are known to differ in the types of grammatical information they encode. What is less well known is the relationship between verbal and nominal coding within and across languages. The equi-complexity hypothesis holds that all languages are equally complex overall, which entails trade-offs between coding in different domains. From a diachronic point of view, this hypothesis implies that the loss and gain of coding in different domains can be expected to balance each other out. In this study, we test to what extent such inverse coevolution can be observed in a sample of 244 … DOI: 10.1515/lingvan-2021-0011