Languages & Origins in Europe


How and Why Do We Select Our Cognate Lists for Each Family?


The dataset for our studies of how far languages differ in their phonetics takes the form of a list of 100 words (in practice we normally collect 110, in case we come across problems with certain words as we gather the data).

The comparison works by measuring the net degree of divergence in pronunciations of the same original word.  For example, we measure how different the pronunciation of the word right in English is from the pronunciation of the corresponding related word recht in German.  These two words are ‘related’ or cognate in the sense that both of them derive from what was once the same word (something like */rextaz/) in the single common ancestor language which gave rise to both English and German (‘Proto‑Germanic’). 

Our method works by comparing only words that are strictly cognate in this way.  A match in meaning alone is not enough:  words like English dog and German Hund both mean the same thing, but they do not go back to a common original phonetic form, so we cannot measure divergence in phonetics since that proto‑form (any phonetic relationships between dog and Hund are essentially arbitrary, and measuring them is not useful for our purposes).  So our method would compare German Hund not against English dog, but against its true English cognate hound. 

The list of words to compare, then, is chosen deliberately to ensure that we have 100 words which are all strictly cognate in all the languages and dialectal varieties across the particular language family in question.  So the particular 100 words in the Germanic list, for example, were chosen specifically because cognate forms of all of those words survive in all of the accents and varieties covered.
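
As a rough illustration of this selection criterion, the sketch below keeps only those candidate words that have a surviving cognate in every variety covered.  The data structure, the function name fully_attested and the two-variety toy data are assumptions made purely for illustration (built from the right/recht and hound/Hund examples above); this is not our actual selection procedure or dataset.

```python
# Hypothetical toy data: each candidate cognate set is labelled by an English
# gloss and maps each variety to its surviving cognate form, or None where
# that variety has no cognate of the word.
candidates = {
    "right": {"English": "right", "German": "recht"},
    "hound": {"English": "hound", "German": "Hund"},
    "dog":   {"English": "dog",   "German": None},  # no German cognate of 'dog'
}
varieties = ["English", "German"]


def fully_attested(candidates, varieties):
    """Return the labels of cognate sets that survive in every variety."""
    return [
        label
        for label, reflexes in candidates.items()
        if all(reflexes.get(variety) for variety in varieties)
    ]


selected = fully_attested(candidates, varieties)
print(selected)
```

Under these assumptions, dog is dropped because it has no cognate in one of the varieties, while right and hound are kept.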

What this means is that the list of the most suitable words to compare actually varies from one language family to the next:  all Germanic languages have straightforward cognates of house, brother and heart, for example, but the situation with the corresponding words in Romance is much more complex.  We have, though, endeavoured to base the lists on a core of as many cognates as possible that are present in all four of the families we are studying (including, for example, most of the numerals from one to ten).

What the words happen to mean is effectively immaterial, because the aim here is to produce a measure of difference in phonetics.  This must not be conflated with a measure of difference on the quite different level of lexical semantics – the question of whether, for a given meaning, two languages use words that are cognate or not, as English and German do for the meaning right (right is cognate with recht) but not for the meaning dog (dog is not cognate with Hund).  Clearly, it is interesting to measure differences between languages on this level of lexical semantics too, which is precisely what we do in another part of our research project.  But it is far better to keep the two levels entirely distinct, so that neither measurement is distorted by the other.

To repeat, then, what the words in the cognate lists happen to mean is effectively immaterial.  Rather, the cognate lists serve only as a sample of the phonetics of the languages being compared.  This does call for one other control on the selection of our words, though:  we must ensure that the list does not become phonetically unbalanced and unrepresentative by including too many or too few examples of particular sounds and sequences.  This, then, was a further criterion applied in selecting the appropriate words for the list.  To avoid over‑representing particular sounds, we keep the number of repeated grammatical suffixes covered down to a minimum, and cover instead a range of different endings and, where necessary, different genders and cases.
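
One simple way to picture such a balance check is to count how large a share of all the segments in a word list each individual sound accounts for, and to flag any sound above a chosen ceiling.  The sketch below does this under loud assumptions:  the function names, the threshold max_share and the toy segment lists are all hypothetical placeholders rather than real transcriptions or the criterion we actually applied.

```python
from collections import Counter


def segment_shares(words):
    """Return each segment's share of all segment tokens in the word list."""
    counts = Counter(segment for word in words for segment in word)
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def over_represented(words, max_share):
    """Return, in sorted order, segments whose share exceeds max_share."""
    shares = segment_shares(words)
    return sorted(seg for seg, share in shares.items() if share > max_share)


# Placeholder word list: each word is a list of rough segment symbols,
# not a real phonetic transcription.
words = [
    ["h", "a", "n", "d"],
    ["l", "a", "n", "d"],
    ["s", "a", "n", "d"],
    ["f", "u", "s"],
]

# With an (arbitrary) ceiling of 15%, the repeated sequence a-n-d is flagged.
print(over_represented(words, max_share=0.15))
```

In this toy list three of the four words share the same ending, so its segments dominate the sample;  replacing some of those words with others that end differently would bring the shares back into balance.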