Linguistic Diversity On Africa: Clustering Methods Application On Language Typology Data

One of the main goals in language research is to understand the distribution of linguistic diversity and its underlying principles. Linguistic typology allows us to characterize diverse languages in terms of their linguistic features, which can be used to create a relatively comprehensive description of any language. An important theoretical issue concerning linguistic diversity is whether it should be considered stochastic (randomly determined) or deterministic (based on a set of principles governing it). The latter may depend on a set of constraints imposed from inside or outside the linguistic system, i.e., the language faculty in the narrow sense or the “interfaces”, that is, aspects of the perceptual, motor, and general cognitive systems. The debate on the nature of language variation has not been settled. Africa, which has the longest history of human settlement of any continent and is hypothesized to be the place of origin of Homo sapiens, has the deepest genealogical relations between languages, and these are yet to be comprehensively and systematically described. Using a sample of the language typology data available in the World Atlas of Language Structures (WALS) and in the repository of cross-linguistic phonological inventory data (PHOIBLE 2) for languages of the African continent, we address the following research objective: to investigate whether the typological diversity of languages in Africa can be characterized by clustering along two important structural divides: synthetic – analytic and tonal – non-tonal. Several methods, including latent class analysis and confirmatory factor analysis (CFA) models, hierarchical clustering, k-means family algorithms, and CART models, were applied, using feature networks focusing on relevant language domains, to classify the data according to these structural divides. These classification patterns can be further linked to several possible future interface hypotheses, leading to a better understanding of human language.

Linguistic typology, African languages, cluster analysis, structural equation modeling