Supplementary Information



for the article: 


Heggarty, Paul, 2007

Linguistics for archaeologists:  principles, methods and the case of the Incas

Cambridge Archaeological Journal 17(3), 311-40



Note:    This article is the first of a pair of articles on ‘Linguistics for archaeologists’; 
for information on the second article and a download link, click here.



Torero’s (1970) Glottochronological Time-Depth Estimates for the Quechua Family


Figure 5 on page 335 of Heggarty (2007) presents a NeighborNet analysis of the relationships between 36 regional varieties of Quechua.  This NeighborNet is based directly on the comparative study of those varieties by Alfredo Torero (1970).  The original figures published by Torero were expressed in terms of decades elapsed since the date of ‘separation’ of each of the possible pairs of Quechua varieties in his data set, as calculated by the ‘glottochronological’ formula which assumes a constant rate of shared vocabulary ‘decay’ over time.  These figures from Torero (1970) are reproduced in Figure 1 below. 



Figure 1:  Torero’s (1970) ‘glottochronological’ estimates of time-spans elapsed
(expressed in number of decades prior to 1970)
since assumed separation dates for 36 regional varieties of Quechua





Lexicostatistical Counts of Vocabulary Overlap

Recovered from Torero’s Glottochronological Time-Depth Estimates


The underlying linguistic data from which Torero calculated these glottochronological time-depth estimates were his own lexicostatistical counts of overlap in ‘core vocabulary’ between these varieties, for the standard ‘Swadesh list’ of 100 basic word-meanings.  Unfortunately, Torero never published his data lists for the 36 Quechua varieties he compared, nor his basic lexicostatistical counts of overlap in the 100 words;  he published only the glottochronological time-depth estimates he calculated from them, given in Figure 1 above. 

To obtain these results he applied to his lexicostatistical overlap counts the standard ‘glottochronological’ formula, to convert them into the supposedly corresponding time-spans of separation.  This formula effects a logarithmic transformation on the original overlap counts, whereas the most suitable input data for NeighborNet analysis are not such transformed data but the original ‘raw’ distance measures between each pair of language varieties.  I therefore preferred to recover from Torero’s glottochronological time-depth figures the original overlap counts they were based on.  To do this, I simply applied the glottochronological formula in reverse, which yields the results in Figure 2 below.  It was these distance figures that I input to NeighborNet to produce my Figure 5.
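As a rough illustration of what reversing the formula involves, the sketch below uses the commonly cited form of the glottochronological formula, t = ln(c) / (2 ln(r)), where c is the proportion of shared core vocabulary, r the assumed retention rate per millennium, and t the time-depth in millennia.  The value r = 0.86 is the one usually quoted for the 100-meaning list, and is an assumption here:  the text above does not state which constant Torero used.  The function names are likewise hypothetical.

```python
import math

# ASSUMPTION: retention rate per millennium often quoted for the 100-meaning
# Swadesh list. The source does not say which constant Torero applied.
RETENTION_PER_MILLENNIUM = 0.86


def overlap_to_decades(shared, rate=RETENTION_PER_MILLENNIUM):
    """Forward formula: shared items out of 100 -> decades since 'separation'."""
    proportion = shared / 100.0
    millennia = math.log(proportion) / (2.0 * math.log(rate))
    return millennia * 100.0  # 1 millennium = 100 decades


def decades_to_overlap(decades, rate=RETENTION_PER_MILLENNIUM):
    """Reverse formula: decades since 'separation' -> shared items out of 100."""
    millennia = decades / 100.0
    return 100.0 * rate ** (2.0 * millennia)


if __name__ == "__main__":
    # Invented input for illustration only, not a figure from Torero (1970).
    print(round(decades_to_overlap(100), 2))
```

Under these assumptions, a time-span of 100 decades corresponds to an overlap of about 74 of the 100 meanings, and applying the forward formula to that overlap returns the original 100 decades.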


Figure 2:  Lexicostatistical counts of vocabulary overlap between Torero’s (1970) 36 regional varieties of Quechua,
as recovered by Heggarty from the time‑spans in Figure 1, by reversing the application of the glottochronological formula.



As I am at pains to stress on page 334, the core glottochronological assumption of a regular rate of change over time is now widely discredited in linguistics:  no Andean linguist (not even Torero himself) now assumes that the ratings in Figure 1 necessarily represent remotely accurate measures of the actual time-depths for any ‘splits’ within the Quechua language family. 

The point of my analysis, indeed, is precisely to undo the application of the glottochronological formula, so as to discard any untenable assumptions about fixed rates of language change over time and recover Torero’s original measures of simple overlap in core vocabulary.  A number of methodological objections apply even to these basic lexicostatistical measures, for the method certainly takes a rather blunt and inflexible approach to putting numbers on degree of language divergence (hence the new alternative method I proposed in Heggarty 2005).  Nonetheless, Torero’s figures can still provide a potentially useful guide to ‘orders of magnitude’ of divergence across his very wide sample of the regional variation within the Quechua family.




Definition of the ‘Central’ and ‘Southern’ Sub-Groups of Quechua


On page 335 of Heggarty (2007) I discuss two particular sub-sets of the 36 Quechua dialects in Torero’s data-set, which I term the ‘Central Quechua’ and ‘Southern Quechua’ sub-groups.  I go on to put forward ratings of the degree of diversity within each sub-set, in the form of mean divergence and standard deviation measures. 

The particular selection of varieties I included in each of these sub-sets is as follows, where my classification codes such as Q1a or Q2c are those that appear in the ‘classification’ column in Figures 1 and 2 above:

   ‘Central Quechua’ sub-set:  all 16 varieties with classification codes that start with Q1, except for Q1p, i.e. Pacaraos Quechua.  That is, this sub-set brings together my geographical sub-classifications of Quechua 1, namely:  Q1a (Ancash), Q1s (south-central) and Q1j (Junín). 

Pacaraos Quechua (Q1p) is explicitly excluded from this Central Quechua group since its classification is particularly uncertain;  indeed in the ‘traditional’ Quechua family tree in Cerrón‑Palomino (2003: 236), Pacaraos Quechua is split off from all other Central Quechua varieties in its own separate high-order sub-branch within Quechua 1.

   ‘Southern Quechua’ sub-set:  all 10 varieties with the classification code Q2c.
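The selection rule and the diversity ratings described above can be sketched roughly as follows.  This is only an illustration under stated assumptions:  the variety names and distance values are invented, the helper names are hypothetical, and the use of the population standard deviation (rather than the sample one) is a guess, since the text does not specify which was used.

```python
from itertools import combinations
from statistics import mean, pstdev


def select_varieties(codes, prefix, exclude=()):
    """Names of varieties whose classification code starts with `prefix`,
    skipping any code listed in `exclude` (e.g. Q1p for Pacaraos)."""
    return [name for name, code in codes.items()
            if code.startswith(prefix) and code not in exclude]


def subgroup_diversity(distances, members):
    """Mean and standard deviation of all pairwise distances among `members`.
    `distances` maps frozenset({a, b}) to the distance between a and b."""
    values = [distances[frozenset(pair)] for pair in combinations(members, 2)]
    # ASSUMPTION: population standard deviation; the source does not say.
    return mean(values), pstdev(values)


# Invented toy data, NOT Torero's figures.
toy_codes = {"v1": "Q1a", "v2": "Q1p", "v3": "Q2c", "v4": "Q1j", "v5": "Q1s"}
toy_distances = {
    frozenset({"v1", "v4"}): 10,
    frozenset({"v1", "v5"}): 20,
    frozenset({"v4", "v5"}): 30,
}

central = select_varieties(toy_codes, "Q1", exclude={"Q1p"})
central_mean, central_sd = subgroup_diversity(toy_distances, central)
```

The same two helpers would serve for the Southern sub-set by selecting on the prefix Q2c instead.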



An Illustration of Regional Divergence in Quechua:  Origins of the Words for ‘Where?’


In Southern Quechua (as spoken in Cuzco and Bolivia, for example) the word for ‘where’ is pronounced [majpi];  in Central Quechua (Ancash), however, it is so different as to be unrecognisable:  [meːʈʂoː]. 

The two words do in fact share the same root may· ‘which place?’ (pronounced not like English ‘may’, but like a very clipped ‘my’).  In this case, Southern Quechua preserves the original pronunciation, whereas in Ancash the original [aj] has coalesced into [eː], where the symbol [ː] denotes a distinctly long form of the [e] vowel.

In both areas this root is followed here by a locative suffix, equivalent to English ‘in’, to give the combined meaning ‘in which place?’, i.e. ‘where?’  Yet while Southern Quechua uses ·pi as its ‘in’ suffix, Central Quechua uses instead the entirely different suffix ·ĉaw.  Moreover, the letter spelt ĉ here denotes a retroflex [ʈʂ] pronunciation of ch, now entirely lost from Southern Quechua in any case, while in Central Quechua the original [aw] has also coalesced into [oː].  Hence modern Southern Quechua [majpi] vs. Central Quechua [meːʈʂoː], for the same meaning ‘where?’

The words for ‘here’ offer a similar illustration.  In this case the root is not may· ‘which place?’ but kay· ‘this (place)’, though the locative suffixes remain the same:  Southern Quechua ·pi vs. Central Quechua ·ĉaw.  These duly give the combined meaning ‘in this place’, i.e. ‘here’.  As an example of the regularity that is so typical of sound changes during language divergence, note that exactly parallel sound changes have again happened in the Ancash pronunciations of both the root and suffix:  original [aj] → [eː], and [aw] → [oː].  Hence Southern Quechua [kajpi] contrasts with Central Quechua [keːʈʂoː].
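The regularity of these two changes can be made concrete with a small sketch that applies them mechanically as ordered string replacements.  The input forms [majʈʂaw] and [kajʈʂaw] are assumed pre-change pronunciations, assembled here from the roots may· and kay· plus the suffix ·ĉaw as described above;  they are not attested forms quoted from any source, and the rule list and function name are hypothetical.

```python
# The two Ancash coalescences discussed above, as (old, new) pairs:
# original [aj] becomes long [e], and original [aw] becomes long [o].
ANCASH_CHANGES = [("aj", "eː"), ("aw", "oː")]


def apply_changes(form, changes):
    """Apply each sound change in turn to every matching stretch of `form`."""
    for old, new in changes:
        form = form.replace(old, new)
    return form


# ASSUMPTION: pre-change forms built from root + suffix, for illustration.
where_central = apply_changes("majʈʂaw", ANCASH_CHANGES)
here_central = apply_changes("kajʈʂaw", ANCASH_CHANGES)
print(where_central, here_central)
```

The same two rules, applied blindly to both words, give the Ancash forms cited in the text, which is what regular sound change predicts.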

To listen to how these ‘here’ words are actually pronounced in these and other Quechua-speaking regions of the Andes, click here.