Heggarty, Paul, 2005

Enigmas en el origen de las lenguas andinas: aplicando nuevas técnicas a las incógnitas por resolver

English title:  Enigmas in the origins of the Andean languages: applying new techniques to the unanswered questions

Revista Andina 40, 9-57 & 70-80


Abstract:  for Linguists

This summary is intended for specialists in (historical) linguistics.  Readers with a more general interest in the Andean languages, particularly their history and that of their speakers, should consult the separate summary for a general readership, which is more oriented to those interests.

*   *   *   *   *

This article outlines a new method for measuring and comparing how similar languages are in their lexical semantics, developed over the course of a three-year multidisciplinary research project into Quantitative Methods in Language Classification.  As a test case, we apply this method to our own extensive new data set on twenty varieties of Aymara and Quechua, to investigate fundamental and still unresolved issues in the historical linguistics of these, the two main surviving language families of the Andes.

We approach these questions also with the aid of phylogenetic analysis programmes drawn from the biological sciences.  Recent years have seen growing interest in how such analyses might valuably be applied to investigate the family trees not just of species and populations, but also of languages.  Much of the work in this new synthesis between genetics and historical linguistics has been undermined, however, by serious reservations about the ‘encodings’ used to convert language data into a format suitable as input to phylogenetic analyses, questioning whether they can really be considered meaningful measurements and representations of the relationships between real languages. 

Hence the need for a radical departure from the traditional method most widely used for this encoding, lexicostatistics.  We put forward a new method specifically designed to go beyond the many idealisations inherent in the all-or-nothing approach of lexicostatistics (not least its insistence on ‘one meaning one lexeme’) in order to model relationships in lexical semantics to a greater level of detail and sensitivity.  Furthermore, we attach particular importance to a thorny but crucial issue in lexical comparison, one that has all too often been brushed aside:  we propose an explicit and novel approach to the question of how to distinguish cognacy from borrowing as alternative explanations for correlations observed between languages. 

We obtain a matrix of measurements of how similar all of the languages covered are to each other.  This matrix contains within it what are often complex signals of the relationships between those languages.  We therefore use the latest phylogenetic analysis programmes, particularly NeighborNet, to synthesise those signals and to represent them graphically, in order to help us interpret what our quantifications of similarity really mean for the key questions about the history and divergence of the languages concerned.
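As a purely illustrative aid, the idea of a graded similarity matrix can be sketched in Python.  This is a minimal sketch under stated assumptions, not the measure developed in the article:  the variety names, meanings and cognate-class labels are invented toy data, and the Jaccard overlap is a stand-in for any graded per-meaning score that allows more than one lexeme per meaning.

```python
from itertools import combinations

# Hypothetical toy data (not from the article): for each variety, each meaning
# maps to the set of cognate-class labels of the lexemes recorded for it.
DATA = {
    "VarietyA": {"sun": {"s1"}, "water": {"w1", "w2"}, "stone": {"r1"}},
    "VarietyB": {"sun": {"s1"}, "water": {"w1"}, "stone": {"r2"}},
    "VarietyC": {"sun": {"s2"}, "water": {"w2"}, "stone": {"r2"}},
}


def meaning_overlap(a, b):
    """Jaccard overlap of two sets of cognate classes: a graded score in [0, 1].

    Assumption: Jaccard is used here only as a simple stand-in for a graded
    measure, in contrast to an all-or-nothing match on a single lexeme.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(v1, v2, data=DATA):
    """Mean per-meaning overlap across the meanings recorded in both varieties."""
    meanings = data[v1].keys() & data[v2].keys()
    total = sum(meaning_overlap(data[v1][m], data[v2][m]) for m in meanings)
    return total / len(meanings)


def similarity_matrix(data=DATA):
    """Return the variety names and a symmetric matrix of pairwise similarities."""
    names = sorted(data)
    matrix = {a: {b: 1.0 if a == b else 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = similarity(a, b, data)
    return names, matrix


def distance_matrix(data=DATA):
    """Convert similarities to distances (1 - similarity) for network analysis."""
    names, sim = similarity_matrix(data)
    return names, {a: {b: 1.0 - sim[a][b] for b in names} for a in names}


if __name__ == "__main__":
    names, dist = distance_matrix()
    for a in names:
        print(a, [round(dist[a][b], 3) for b in names])
```

A distance matrix of this kind is the sort of input that a network method such as NeighborNet takes;  the export step to any particular programme is not shown here.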

All these new tools can be expected to be of broad methodological interest to historical and comparative linguists, and to illustrate their potential we present a case study in which we apply them to a number of issues of precisely the types most commonly disputed in research into the history and classification of language families.  Together, the Aymara and Quechua families provide an ideal test-bed:  specialists have still not come to a consensus even on whether the two are ultimately related, nor on equally fundamental questions about the internal classification of the Quechua family.

Now that new methods are available, the time is ripe for recruiting them to contribute to a definitive resolution to these questions.  However, our more sensitive quantification method requires data sets in which comparisons in lexical semantics are taken to a level of detail beyond existing databases and dictionary resources for the Andean languages.  So we present also our own major new comparative database, collected in fieldwork in Ecuador, Peru and Bolivia, and downloadable from our website. 

Our results could hardly be of greater import for the historical linguistics of the Andean languages, in that they constitute powerful new data on the two most fundamental outstanding issues, strongly in favour of one particular resolution to each.  On the long‑running ‘Quechumara’ debate, our results clearly back the now majority stance that the two language families do not demonstrably stem from a common source, thanks to our novel approach to teasing apart the signals of common origin and intense contact that have for so long muddied the waters. 

As for the internal structure of the Quechua family, recent work has convincingly challenged the mostly morphological and phonological criteria on which the traditional classification of Quechua dialects has long been based.  Our new lexical data fully support that challenge, and indeed take it further still, by offering strong evidence that even the putative highest-order split between Quechua I (Central) and II (North/South) branches is a misleading idealisation.  We illustrate, then, how the results from our new analysis methods can in some scenarios emerge as clearly incompatible with a discrete two-way branching at a given stage in the history of a family, and argue on the contrary for a dialect continuum (in this case, the early history of Quechua).

The Andean languages allow us to illustrate also how our methods can offer new data and insights to help resolve other, more specific conundrums in historical linguistics, of a range of different types.  We close, for instance, with a discussion of what can and cannot reliably be inferred, from the methods we propose, on the vexed question of dating language splits. 



Abstract:  for General Readership

Many of the key questions in the (pre‑)history, archaeology and ethnology of the central Andes revolve around the origins and identity of, and the relationships between, the peoples who speak the two great surviving language families:  Aymara (also known as Jaqi or Aru) and Quechua.  In principle, the historical and comparative study of these languages themselves has great potential to shed light on the processes and stages by which these language families came to spread so widely throughout the Andean countries, when they did so, starting from which homelands, and thus which cultures and population migrations they are to be associated with.

Yet despite the four decades that have now passed since the first groundbreaking work on these issues, linguists have still not reached a consensus either on the most basic questions in the traditional classification of the many dialects within the Quechua family, or on whether it is ultimately related to Aymara or not.  Early attempts were dogged by the very limited research then available on those languages, and by serious methodological problems in linguistics.  Much has changed since the difficult early days of Andean linguistics, however, thanks to extensive research and debate over the years.  Beyond the Andes, too, the last few years have seen linguists and geneticists increasingly working together on new, much improved techniques for answering these questions.  So far they have been applied mostly to European and Australian language families;  the time is now ripe for a major new comparative study of the Andean languages.

This article presents the first major results from a three‑year research and fieldwork project covering (so far) twenty‑one varieties of Quechua, Aymara and Chipaya from Ecuador, Peru and Bolivia (our full data are also available for download from the supplementary information page on our website).  It uses new, more sensitive linguistic techniques to make detailed comparisons and measurements of how similar all these language varieties are in both their vocabulary (lexis) and their sound systems (phonetics).  It also applies new means of interpreting the results to help understand what they really mean for the key questions about the historical relationships between those languages and the contacts between the peoples who speak them.

This first article focuses on similarities and differences in vocabulary, and brings powerful new evidence that the two language families Quechua and Aymara do not demonstrably stem from a common source, despite intense contact between them.  The lexical data also support the general objections already raised to even the most basic classification of Quechua dialects:  the split between Quechua I (Central) and II (North/South) branches appears to be a misleading idealisation, and our results are more compatible with a more gradual spread and break-up of the family during its early history, with many forms of Quechua (among them several bound for imminent extinction) intermediate between the main two surviving QI and QII groups.

We also provide new data on other important questions in the classification and history of the Andean languages:  where the northern Peruvian varieties of Quechua fit into its family tree and history;  and for the Aymara family, whether the two Central Peruvian forms of Aymara, alias Jaqaru (Tupe) and Kawki (Cachuy), should be considered quite separate languages, or just very closely related varieties of the same language.  Finally, we also discuss what can and cannot reliably be inferred on the vexed question of dating the origins and phases of expansion of Quechua and Aymara.

