Sequence Comparison in Computational Historical Linguistics: Phonetic Alignments and Cognate Detection with LingPy 2.6.

Abstract:

With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multilingual word lists is becoming increasingly time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up cognate detection. It also makes it possible to get a quick overview of data that has not yet been intensively studied by experts. LingPy is a Python library that provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only search automatically for cognates in lexical data, but also align the automatically identified words and output them in various formats designed to facilitate manual inspection. In this tutorial, we briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate, in concrete workflows, how automatic sequence comparison can be applied to multilingual word lists. The goal is to provide readers with all the information they need to (a) carry out cognate detection and alignment analyses in LingPy, (b) select the appropriate algorithm for a given task, (c) evaluate how well automatic cognate detection algorithms perform compared to experts, and (d) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with only basic knowledge of computing can follow all steps as well.