Languages & Origins in Europe


Historical linguistics with prehistoric archaeology




Welcome to the website for the research project Languages & Origins in Europe,
based at the McDonald Institute for Archaeological Research at the University of Cambridge. 
Our project is funded by the Leverhulme Trust, and runs for three years from June 2006 to May 2009.

The researchers are the archaeologist Colin Renfrew (Lord Renfrew of Kaimsthorn)
and Paul Heggarty, a comparative/historical linguist.



Context, Scope, Methods, Objectives

A 1000-Word Summary of this Research Project



In this research project we focus on the four main language families of Europe – Romance, Germanic, Slavic and Celtic – and on the early relationships between their respective ancestor languages.  Our research method consists of a series of steps.

   1. First we collect our own dedicated data sets for each language family, ensuring particularly that we cover as broad a range as possible of the variation across each family, not just between its major languages but also at the dialect level. 

   2. The data sets are specifically collected as input for a number of new techniques developed within linguistics, to produce from them measurements of the degree of similarity/difference between the dialects and languages within each family, particularly in phonetics and lexis.  We also take this back a step further, applying the same analysis to how the families’ four ancestor languages relate to each other, as far as is possible within the limits of our knowledge of them.

   3. We then use these similarity ratings as input to computational methods for ‘phylogenetic analysis’, originally developed in the biological sciences.  These synthesise the signals present in the similarity ratings to produce corresponding graphical representations of the relationships between the languages and dialects within each family, and between their ancestors.

   4. Finally, we analyse and interpret these representations for what they might suggest for the early histories of each of the language families, with the crucial proviso that similarity by no means necessarily implies shared history.  These linguistic indications as to the prehistory and geography of the populations of Europe that spoke these languages can then be compared with scenarios worked out independently from archaeology and genetics, to see if significant correlations can be found to give a more consistent overall picture. 
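
As a rough illustration of how measurements of similarity/difference can be turned into input for phylogenetic analysis, the sketch below builds a small matrix of distances between four Romance languages.  It is emphatically not the project's own method: it assumes ordinary spelling as a crude stand-in for phonetic transcription, a plain normalised edit distance as the measure, and a list of just three word-meanings chosen purely for illustration.  The names `WORDS`, `edit_distance`, `language_distance` and `distance_matrix` are all hypothetical.

```python
from itertools import combinations

# Illustrative orthographic forms only (an assumption of this sketch);
# a real study would use phonetic transcriptions and many more meanings.
WORDS = {
    "Italian":    {"night": "notte", "moon": "luna", "milk": "latte"},
    "Spanish":    {"night": "noche", "moon": "luna", "milk": "leche"},
    "French":     {"night": "nuit",  "moon": "lune", "milk": "lait"},
    "Portuguese": {"night": "noite", "moon": "lua",  "milk": "leite"},
}


def edit_distance(s, t):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from s
                current[j - 1] + 1,          # insert b into s
                previous[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def language_distance(x, y):
    """Mean edit distance over shared meanings, each scaled to 0..1."""
    meanings = WORDS[x].keys() & WORDS[y].keys()
    scores = [
        edit_distance(WORDS[x][m], WORDS[y][m])
        / max(len(WORDS[x][m]), len(WORDS[y][m]))
        for m in meanings
    ]
    return sum(scores) / len(scores)


def distance_matrix(languages):
    """Symmetric nested-dict matrix with zeros on the diagonal."""
    matrix = {a: {a: 0.0} for a in languages}
    for a, b in combinations(languages, 2):
        matrix[a][b] = matrix[b][a] = language_distance(a, b)
    return matrix


matrix = distance_matrix(list(WORDS))
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A matrix of this shape is the kind of input that distance-based phylogenetic methods accept.  Even on this toy list Spanish comes out closer to Portuguese than to French, but with so few words, and spelling in place of sound, the figures themselves carry no weight.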


We are not, of course, the first researchers to use such methods to investigate these questions.  Previous studies have, however, come up against a range of obstacles and objections, and remain highly controversial, not least Gray & Atkinson (2003).  New methods have recently become available for both step 2 and step 3, and our project seeks to take advantage of this opportune moment to put both sets of methods to use together.  Doing so holds out the prospect of heading off many of the criticisms levelled at previous work in this field.

   Serious objections have been raised to the quality of the data sets that some past studies have used for step 2, such as Gray & Atkinson (2003) and particularly Forster & Toth (2003).  Step 2 entails analysing and in some way ‘encoding’ real language data to convert them into a format suitable as input for phylogenetic processing;  in many cases, however, this encoding has so radically edited and distorted the data that linguists have been left doubting whether the input to the phylogenetic analysis really constitutes a meaningful representation of the relationships between the actual languages concerned.  The problems can lie not only in misanalysis of language data by non-linguists, but also in the methods themselves:  traditional ‘lexicostatistical’ analysis, for example, may have been first proposed by linguists, but it has long been criticised by many of their peers for its simplistic approach that clearly misrepresents much of the real language data.

In our study, all language data will be analysed by historical/comparative linguists.  The data will be input, moreover, to new, dedicated methods for step 2, purposely designed for such linguistic data and applications.  These methods take analysis to a considerably greater level of detail than traditional methods such as lexicostatistics, as is necessary particularly for investigating similarity at the finer dialect level that is our focus.

   At step 3, previous researchers such as Ringe et al. (2002) and Gray & Atkinson (2003) have used phylogenetic methods able to draw only ‘family trees’.  In both these studies, however, patterns emerge which are problematic for the traditional idealised family tree analysis, in the relationships between the ancestor languages of certain sub-families within Indo-European – not least the main quartet in Europe of Romance, Germanic, Celtic and Slavic. 

It is this observation that motivates both our methodological approach and our focus on these four families.  We suggest that the ‘problematic’ patterns in fact come as no surprise, when one recalls that the family tree structure corresponds to only one of the two main processes by which languages typically diverge (radical splits).  Our research aims to redress the balance, by turning to the most recent methods for both step 2 and step 3, which are now able to measure and to represent graphically also the more complex cross-cutting relationships between languages that can result from the other main type of divergence, the ‘wave’ model, producing dialect continua. 
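
The contrast between tree-like and cross-cutting relationships can be made concrete with a standard check from phylogenetics, the four-point condition: for any four languages, the three ways of pairing them up give three sums of distances, and in a perfect family tree the two largest sums are equal.  The sketch below scores how far a quartet departs from that ideal, using a normalised measure of the kind often called a δ score in the phylogenetics literature.  It is offered only as an illustration, not as one of the methods used in this project; both distance tables are invented, the languages A to D are hypothetical, and so are the function names.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """0.0 for a perfectly tree-like quartet, up to 1.0 for maximal conflict."""
    largest, middle, smallest = sorted(
        [d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]],
        reverse=True,
    )
    if largest == smallest:
        return 0.0  # all three pairings agree: no conflicting signal
    return (largest - middle) / (largest - smallest)


def mean_delta(d):
    """Average the quartet score over every set of four languages."""
    scores = [quartet_delta(d, *q) for q in combinations(sorted(d), 4)]
    return sum(scores) / len(scores)


def symmetric(pairs):
    """Build a nested-dict distance matrix from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {a: 0})[b] = value
        d.setdefault(b, {b: 0})[a] = value
    return d


# Invented distances consistent with a clean split into (A, B) versus (C, D).
tree_like = symmetric({
    ("A", "B"): 2, ("C", "D"): 2,
    ("A", "C"): 4, ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4,
})

# Invented distances for a ring of neighbours: A is close to both B and D,
# and C is also close to both B and D, so no single split fits.
ring_like = symmetric({
    ("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("A", "D"): 1,
    ("A", "C"): 2, ("B", "D"): 2,
})

print(mean_delta(tree_like))  # 0.0
print(mean_delta(ring_like))  # 1.0
```

A score near zero means a family tree represents the distances well;  a high score signals the kind of cross-cutting pattern that a tree is forced to distort, and that network-style representations are designed to display.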

We look to these new, more flexible and more sensitive ‘dialect-level’ methods, then, as a means of testing a specific hypothesis that might explain the ‘problematic’ patterns in the interrelationships of the four main language families of Europe that emerge from both Ringe et al. (2002) and Gray & Atkinson (2003).  Perhaps their ancestor languages did not diverge from each other quite so sharply as traditional tree-like representations suggest;  rather, they may have emerged out of what were originally rather more complex dialect continuum relationships.  If so, there may be important implications for our view of their early history and geography, and for how we might correlate the linguistic scenario for the early populations of Europe with those based on archaeological and genetic data.  


Naturally, our analyses and conclusions will be submitted for publication in appropriate journals in the field, but there is also a more general way in which we intend that our output should help advance research in the new synthesis.  Previous work, and critical reactions to it, have revealed numerous problems of cross-disciplinary understanding between the different specialisms involved.  In particular, linguists have often pointed to what they see as too simplistic a vision of how languages change and diverge, often founded on mistaken analogies with phenomena in other disciplines (‘mutations’,  ‘evolution’,  ‘splits’, and so on).  In our output we shall seek to set out, to a wide audience from other disciplines, the linguistic fundamentals of these issues, as illustrated specifically for the languages of Europe.

Our full databases will be published in electronic form on this website.  Our lexical database for our list of basic word-meanings will be made available for download;  our phonetic data, meanwhile, will be presented in the form of instant-playback recordings that visitors to our pages can listen to online simply by gliding their mouse over the corresponding phonetic transcriptions.



For more details on all aspects of our project, please navigate around by clicking on the links in the website index column on the left of your screen.  The first section, for instance, links through to a fuller Presentation of our project, which looks into all the issues discussed in this summary in more depth.

Please note that grey links have relatively little content as yet, but these pages will steadily be expanded as the project progresses over the next three years.