Languages & Origins in Europe


Presentation of This Research Project




Research Context and Motivation

Research Questions:  Splits or Waves?  Trees or Networks?

Research Procedure and Timetable

The Question of Dating

Output A:  Publications & New Resources

Interdisciplinary Discord?

Output B:  Advancing the New Synthesis





Research Context and Motivation

Recent years have seen a surge of interest around the world in research in the multidisciplinary field that has become known as the new synthesis, where linguistics, genetics and archaeology intersect to seek a coherent picture of the distant origins of human populations.  The issue that has commanded most attention has been the long-standing conundrum of the Indo-Europeans, and within this our Languages and Origins project focuses on Europe and its four main language families: Romance, Germanic, Slavic and Celtic. 

One particular strand of the new synthesis that has come to the fore in recent years starts out from prior work in the biological sciences, particularly genetics, to develop computational methods for phylogenetic analysis.  These programmes take as their input data on the similarities and differences between species or populations, and process and synthesise those data in such a way as to produce graphical representations of the relationships between those species or populations, in the form of ‘family trees’ or more recently ‘network’ type diagrams.  Researchers in the new synthesis have now sought to apply these same methods to language data too.  A good illustration of this multidisciplinary trend is the controversial paper in Nature by Gray & Atkinson (2003):  specialists in evolutionary biology take linguistic data, and apply to them phylogenetic analysis methods originally developed for their own fields, to produce a ‘family-tree’ representation of the relationships between the Indo-European languages.  They also propose a calibration system to put dates on the split nodes within the tree, and from their results propose dates for the first Indo-European expansion that are in close accord with the ‘Anatolian farming dispersal’ thesis proposed by an archaeologist, in Renfrew’s (1987) Archaeology and Language.  These applications to language have not been problem-free, however;  indeed as we shall see below, it is a defining characteristic of new synthesis research that by no means is all necessarily well between the disciplines… 

The starting point for our own research is an observation of a common thread in the results from the two new synthesis studies of Indo-European that have had most impact in recent years.  In stark contrast to Gray & Atkinson (2003), Ringe et al. (2002) carried out a more linguistically-led study using very different linguistic and computational methodologies, applied to a different data set.  And yet, both studies came up against certain patterns in the relationships between the ancestors of the main four European language families which remain problematic for traditional ‘family tree’ representations of the history of Indo-European.  Our specific research aim is to test a hypothesis that might help explain those apparently problematic patterns. 





Research Questions:  Splits or Waves?  Trees or Networks?

Our basic contention runs as follows.  Linguists have identified two basic ways in which languages diverge.  In some instances, groups of speakers become entirely separated from each other and their original common language then develops in quite independent ways in each group, producing a neat branching in their family tree.  In other cases, however, a wide contiguous area that started out speaking the same original language is criss-crossed by a series of waves of different language changes which start out from different regions and cross-cut each other, resulting in the more complex relationships between language varieties that are typically found across a dialect continuum. 

This ‘wave model’ is just as natural and normal a pattern of language divergence as the branching tree, such that most authors of studies to reconstruct language histories do feel obliged to pay it at least preliminary lip-service.   Nonetheless, they still tend to shy away from the methodological consequences that it entails.  Work so far in the new synthesis has focused very heavily on models suited to representing the divergence of and relationships between languages in terms of trees alone.  This goes for Gray & Atkinson’s (2003) model, and for that of Ringe et al. (2002) who indeed set out to search specifically for a ‘perfect phylogeny’.  This is despite the fact that the family tree model is well known to be an abstraction and often a gross idealisation of actual language histories;  bearing this in mind, it comes as little surprise that a truly perfect phylogeny cannot be found.

The realities of language divergence call for us at least to redress the balance, by explicitly investigating alternative analyses that are also able to represent the more complex relationships between languages that can result from divergence by the wave model.  It is precisely this that Languages and Origins in Europe sets out to do.  Note that we do not say analyses that look at language divergence only in terms of the wave model – that would of course be to tip the balance too far the other way.  In practice, the relationships between real languages can show aspects of both patterns of divergence (and they can be complicated still further by other forces such as extensive language contact).  We need sensitive methods able to reflect both, and indeed combinations of both.

Language divergence by waves typically results in multi-dimensional cross-cutting relationships between the dialects and languages within a continuum, which are of greater complexity than those produced by clear-cut branching.  Research at this dialect level, then, calls for new and more sensitive analysis techniques.  Fortunately, the very latest methods, from both within linguistics and without, do now seem to be sensitive and flexible enough to help us investigate this further.  Combining these methods offers the potential for real progress in the new synthesis.
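
To illustrate in numerical terms what distinguishes tree-like from wave-like relationships, here is a minimal sketch in Python.  It is not one of the project's own methods.  It relies on the four-point condition on distances:  for any four varieties there are three ways of pairing them up, giving three sums of distances, and in a perfect tree the two largest sums are equal.  The 'delta' score below measures departure from that condition, from 0 (perfectly tree-like) to 1 (strongly cross-cutting).  The function names and both toy distance matrices are invented purely for illustration and do not represent real language data.

```python
from itertools import combinations


def quartet_delta(d, a, b, c, e):
    """Delta score for one quartet: 0.0 when its distances fit a tree exactly."""
    # The three possible pairings of four varieties, as sums of distances.
    sums = sorted([
        d[a][b] + d[c][e],
        d[a][c] + d[b][e],
        d[a][e] + d[b][c],
    ])
    smallest, middle, largest = sums
    if largest == smallest:
        # All three sums equal (a star-shaped tree): no conflicting signal.
        return 0.0
    # In a tree the two largest sums coincide, so this difference is zero.
    return (largest - middle) / (largest - smallest)


def mean_delta(d):
    """Average delta score over every quartet in a square distance matrix."""
    scores = [quartet_delta(d, *q) for q in combinations(range(len(d)), 4)]
    return sum(scores) / len(scores)


# Hypothetical data: four varieties related by a clean split, ((A,B),(C,D)).
tree_like = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Hypothetical data: four varieties in a ring A-B-C-D-A, as in a continuum
# where each variety is close to both neighbours and far from the opposite one.
wave_like = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(mean_delta(tree_like))  # 0.0
print(mean_delta(wave_like))  # 1.0
```

Real dialect data would be expected to fall somewhere between these two extremes, which is why methods able to represent both patterns are needed.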





Research Procedure and Timetable

Our research involves a number of successive stages.

First, these new methods need to be explored extensively on well-known languages, before we can then extrapolate them to cases where our data are more scarce and our knowledge less secure.  To this end, we shall conduct novel comparative studies, at the dialect level, for the Romance, Germanic, Slavic and Celtic language families.  

   We shall collect dedicated data sets (in phonetics, vocabulary and certain aspects of grammar) on their dialectal variation and ancestor languages, in consultation with specialists in a number of other institutions across Europe.  (For more details, see our Database pages.)

   From these data we shall produce measures of language similarity, using a number of novel methods designed by Paul Heggarty, particularly in phonetics and in lexical semantics.  (For more details, see our Methods pages.)

   These similarity measures in turn form input suited to the most recent ‘network-type’ phylogenetic analysis programmes developed in the biological sciences for reconstructing genealogical origins, particularly NeighborNet developed by Bryant & Moulton (2002).  (Again, for more details, see our Methods pages.)

All stages of the collection and analysis of language data will be by professional linguists, just as the methods for measuring language similarity have from the outset been ‘linguistically-led’ and were purposely designed for language data and research. 
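
As a rough illustration of the last two steps above, the following Python sketch shows how pairwise similarity scores might be arranged into the symmetric distance matrix that network-type programmes generally take as input.  It is only a sketch under stated assumptions:  it supposes similarity scores on a scale from 0 to 1 and takes distance simply as 1 minus similarity, which is not necessarily how the project's own measures are defined.  The variety labels, the scores and the function name are all hypothetical.

```python
def distance_matrix(varieties, similarity):
    """Build a symmetric distance matrix from pairwise similarity scores.

    `similarity` maps a frozenset of two variety names to a score in [0, 1].
    Distance is assumed here to be 1 - similarity (an illustrative choice).
    """
    n = len(varieties)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pair = frozenset((varieties[i], varieties[j]))
            dist = 1.0 - similarity[pair]
            # Fill both halves so the matrix is symmetric.
            matrix[i][j] = dist
            matrix[j][i] = dist
    return matrix


# Hypothetical varieties and invented similarity scores.
names = ["V1", "V2", "V3"]
scores = {
    frozenset(("V1", "V2")): 0.75,
    frozenset(("V1", "V3")): 0.5,
    frozenset(("V2", "V3")): 0.25,
}

for name, row in zip(names, distance_matrix(names, scores)):
    print(name, row)
```

Keying the scores by unordered pairs means each comparison is entered only once, whichever way round the two varieties are named.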


On the strength of the experience and lessons from exploring these new methods as applied to these four well-known language families, the next step in this research project will then be to apply them one stage further back:  to the respective ancestor languages of the four families.  The patterns that emerge in the relationships between them can then be analysed for the extent to which they reflect a discretely branching ‘tree’ history for these four ancestor languages relative to each other, or perhaps an earlier and more complex ‘continuum’ stage. 

Ultimately, we can attempt to look at whichever pattern emerges in the context of the prehistories of populations in Europe as determined independently on archaeological and/or genetic data, to see which may offer the best match with the linguistic scenario.  Particularly if the indications are indeed that the ancestors of some of the main European families may show more complex, cross-cutting relationships between each other than a simple branching tree, this may call for a revision of the best scenario for the origins of the populations that spoke them.





The Question of Dating

This brings us onto one of the issues on which there has been particularly animated debate within the new synthesis:  dating language families, not least the question of the time depth of Indo-European.  Two contrasting positions are represented by the proposals in:

   Renfrew’s (1987) Archaeology and Language, which suggests agriculture as the means by which the Indo-Europeans and their languages came to be so dominant over such a huge land area, and thus an origin in Anatolia around 9500 bp.  Gray & Atkinson (2003) claim that their findings support such a time-depth and location.

   Mallory’s (1989) In Search of the Indo-Europeans, which sets out counter-arguments in favour of more traditional locations and dates for Proto-Indo-European, somewhere in the southern steppes of Russia around 6500 bp (i.e. by a culture already using the plough and the wheel), expanding from there by a horse-borne ‘elite dominance’ model.

Renfrew’s position certainly goes against the traditionally assumed time-frame for Indo-European;  however one judges his ‘out-of-Anatolia’ hypothesis for Indo-European origins, though, it is to its merit that the ‘long chronology’ that it requires at least forced a real debate on the traditional claims by Indo-Europeanists for a supposed consensus around a split date no earlier than 6500 bp.  And it has duly emerged that the ‘linguistic palaeontology’ on which such claims are principally staked actually leaves many a linguist quite unconvinced that it can reliably rule out alternative and earlier scenarios.  In fact, all techniques proposed for absolute ‘linguistic’ datings have been found wanting, such that for many linguists the impressionistic dates traditionally put forward have been shown to be simply “a house of cards”, in the words of Dixon (1997: 49), echoed by Sims‑Williams (1998: 509) and Heggarty (2005).

It is true that many of the criticisms made by linguists about the method used by Gray & Atkinson (2003) are cogent ones, and they have also put forward linguistic objections to the geography of Renfrew’s hypothesis.  Specifically on the dating issue, though, linguists have failed to find any other technique of recruiting linguistic data to come up with some compelling, quantifiable support for the short chronology.  Indeed, after the discrediting of glottochronology and while objections continue to dog linguistic palaeontology, probably a majority of historical linguists nowadays instinctively look askance at any claim that linguistic data can be used for dating. 

What is left is the tone of many of the objections to articles such as Gray & Atkinson (2003), whose authors seem to start out from an a priori assumption that the long chronology “just can’t be right”, while struggling to find any compelling, quantifiable reason why not.  Beyond the techniques that other linguists do not hold to be reliable, their remaining grounds, ostensibly, are that the long chronology is simply too long to be “compatible” with the amount of linguistic divergence observed within Indo-European.  Yet in the absence of any reliable dating technique, the objections that it is “simply too hard to believe” that Indo-European could be “so old” end up being seen for what they are:  not objectively verifiable calculations, but simply professions of faith in a traditional creed, long after its fundamental tenets have been exposed as unsound.

In such a context, Languages and Origins in Europe approaches the issue with an open mind:  namely, that at our current state of knowledge, we are forced to admit that a priori either the short or the long chronology may be compatible with the time-depth of Indo-European, which remains an unknown quantity.  In any case, in this project we do not put forward or investigate any dating method;  we shall limit ourselves to what light our methods and our results might shed on the issue of maximum and minimum rates of change possible over time, as far as we can quantify them in our studies of the four main language families of Europe.  This should at least help define a most plausible span of chronologies apparently compatible with the amounts of change observable in Indo-European languages, though this span may well transpire to be broad enough to accommodate both the short and the long chronologies.  If so, it may be that the non-chronological aspects of our study of the early relationships between the main European language families are the ones that prove to have the most to contribute to the debate on the ‘out of Anatolia’ hypothesis and its rivals.





Output A:  Publications & New Resources

Our specific studies of the four main language families of Europe, then, will form the basis of a number of articles both on issues of linguistic methodology, and on what our results can contribute to our understanding of the early relationships between those families, and the wider Indo-European question.  These articles will be submitted to publications in both linguistics and the new synthesis.

We shall also make further use of our main research effort to collect our databases on dialect-level variation across the main four European language families.  We shall make those data easily available on this website, both to allow for peer review of our work and as a resource for other researchers.  Indeed as well as exploring and testing our new methods and trying to answer our research questions, we are keen to use our data to develop resources for other audiences and purposes.  In particular, we shall make our phonetic data available as comparative tables of our sound recordings, with instantaneous playback on your computer just by gliding your mouse over any of the transcriptions.  This enables the user to compare side by side the pronunciations of the same ‘shibboleth’ words in different dialects and languages of each family.  Click on the following links for an online preview of this resource for a selection of twenty common words in ten accent varieties of English;  and for our more extensive online database on the Sounds of the Andean Languages.





Interdisciplinary Discord?

Perhaps the single most important aspect of the research context in the new synthesis is the one to which we have already made reference above:  that despite their common interests in investigating the origins of human populations, all is not necessarily well between the different disciplines involved.  Some of the founding works of the new synthesis – such as Archaeology and Language by the archaeologist Renfrew (1987) and Genes, Peoples, and Languages by the population geneticist Cavalli-Sforza (2001) – have been much criticised by linguists.

Similarly, articles such as Gray & Atkinson (2003) have generally been received with considerable scepticism by most linguists, for whom the new synthesis remains dogged by serious problems of inter-disciplinary understanding.  They point to what they see as a failure by non-linguists to grasp certain fundamental aspects of language that set it apart from the natural sciences, whose numerical processing methods cannot therefore necessarily be transposed successfully to language.  It is true that the input to these methods has at times betrayed a simplistic, linguistically uninformed analysis, blind to the many pitfalls in the peculiar nature of language data. 

Secondly, many of the interpretations that non-linguists have drawn from their results, including the claims made for apparent correlations in linguistic and genetic patterns across given populations, are flawed by misconceptions of how real languages change and diverge, interact and converge, and simply replace each other.  Languages are potentially so informative of the history of the peoples that spoke them only because of their inherent susceptibility to social, historical and certain specifically linguistic factors, for which the popular superficial parallels with speciation and population genetics are in fact poor and misleading analogies. 





Output B:  Advancing the New Synthesis

In such a context, the other key objective of Languages and Origins in Europe is to foster a better understanding, within the new synthesis, of these essentially linguistic issues.  That is, after Archaeology and Language and Genes, Peoples, and Languages, we shall seek to set out the issues as seen from the linguistic viewpoint, though in publications intended for a wide audience from disciplines other than linguistics.  Specifically, we shall set out how linguists see the nature of the relationships between languages and peoples – or more accurately, populations – as illustrated specifically for Europe.  Our focus, then, will be not on the detailed mechanisms of change, but on their effects on how real languages relate to each other through time:  patterns of divergence (the tree vs. wave models), dialect fragmentation, standardisation, etc.  We shall address very specifically how the origins of populations as identified in genetics and archaeology can match, but also fail to match, the genealogy of their languages:  through processes of language contact, replacement, and death, governed by sociolinguistic forces of language, ethnicity and identity. 

We hope thus to contribute to a truer grasp of the relevant linguistic principles, to help fill the critical gaps in inter-disciplinary understanding that still so frustrate the new synthesis, and thereby to help researchers at this highly topical meeting-point of three complementary fields bring the insights of their various disciplines together more successfully than hitherto.  Indeed, our publications will themselves be the product of the multidisciplinary input, perspectives and guidance available at the McDonald Institute. 


We also envisage continuing the McDonald Institute’s tradition of playing an important facilitating role in the new synthesis by organising and hosting a symposium on our key issue of the early relationships between the language families of Europe, to be followed by publication based on the main contributions.



