Historical linguistics blog - even weekends

What is the relation between universal patterns, frequency of words and forms, and language evolution and change? This is a question that is very little researched.
I have decided to move the updating of this blog to even weekends instead of Thursdays. Thursday is very often an extremely busy day, with no time left to update or complete blogposts for publication.

In this blogpost I will continue the previous topic of principles of language change. In historical linguistics, the principle of the special status of the most frequent words and grammatical forms of a language is well known. The most frequent lexemes and grammatical categories are more resistant to change. Lexemes such as kinship words, body parts, numerals, fire, water, liver, and so forth typically preserve more archaic paradigms, which may resist change for millennia. The most frequent adverbials and particles even resist phonological erosion and change. The most frequent verbs, such as 'to be' or 'to become', are typically irregular, and archaic inflection patterns and archaic categories, such as tenses, modalities, and aspectual categories, survive in these verbal stems. On the other hand, less frequent words, such as various verbs, nouns, and adjectives, are much more often affected by analogy and other types of change that harmonize and simplify language structures, making them easier to memorize.

However, few studies investigate this from an evolutionary perspective, using phylogenetic methods. As shown by Pagel et al. (2007), there is a correlation between lexical substitution and frequency in basic vocabulary. The most frequent words generally have lower substitution rates.

Frequency is very important in explaining cross-linguistic universal patterns, among others in morphological marking hierarchies. More frequent categories, such as singular (in relation to plural), agent (in relation to object), and present (in relation to past), are unmarked in relation to their less frequent counterparts, which are marked. This theory, known as markedness theory (which has many exceptions across languages), can to a large degree be explained by frequency (Greenberg 1966, Croft 1993, 2003).

In a current study I wanted to investigate the correlation between frequency and change rates of grammar, focusing on the Indo-European family. I compiled a sample of grammatical categories of word order, nominal morphology, verbal morphology and tense, and organised the properties into hierarchical pairs according to the relations present < past, pronoun < noun, agent < object, and masculine/feminine < neuter, which are well-known, universal, hierarchical relations observed in a large number of languages. By means of an evolutionary model (performed by Chundra Cathcart), in which transition rates between property states over a tree were reconstructed, we extracted the average number of transitions (per 1000 years) for each grammatical property in our data.
When the results were split up into pairs of the marking hierarchy, as mentioned above, it turned out that the rates of change in the lower categories (i.e., the less frequent ones from a universal perspective) were higher. The rates of the higher categories (i.e., the more frequent ones from a universal perspective) were lower. The difference was statistically significant (p=>0.005). Even though this study is based on only one family (Indo-European), 149 languages and about 100 properties, it seems likely that frequency also affects language change in grammar. This explains why more frequent grammatical categories preserve more archaic patterns over time.
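To make the pairwise comparison concrete, here is a minimal Python sketch. All numbers are invented for illustration and are not results from the study. The post does not say which statistical test was used, so an exact paired sign-flip permutation test is used here purely as a stand-in, and the names `PAIRS`, `mean_difference` and `exact_sign_flip_p` are hypothetical.

```python
from itertools import product

# Hypothetical transition rates (changes per 1000 years), invented for illustration.
# Each value is (rate of the higher, more frequent category,
#                rate of the lower, less frequent category).
PAIRS = {
    "present < past": (0.12, 0.31),
    "pronoun < noun": (0.08, 0.22),
    "agent < object": (0.15, 0.27),
    "masculine/feminine < neuter": (0.10, 0.35),
}


def mean_difference(pairs):
    """Mean of (lower-category rate minus higher-category rate) over all pairs."""
    diffs = [low - high for high, low in pairs.values()]
    return sum(diffs) / len(diffs)


def exact_sign_flip_p(pairs):
    """One-sided exact paired permutation test.

    Under the null hypothesis the sign of each within-pair difference is
    arbitrary, so every combination of signs is enumerated and we count how
    often the mean difference is at least as large as the observed one.
    """
    diffs = [low - high for high, low in pairs.values()]
    observed = sum(diffs) / len(diffs)
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
        total += 1
        if permuted >= observed:
            hits += 1
    return hits / total


if __name__ == "__main__":
    print("mean difference:", mean_difference(PAIRS))
    print("one-sided p-value:", exact_sign_flip_p(PAIRS))
```

With only four pairs the smallest possible p-value of this test is 1/16, so a sample of the size described above (about 100 properties) would be needed before such a test could say anything firm.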

Text has been updated 2019-03-11
Marking hierarchies of grammatical properties observed in the literature. After (Bickel 2008; Comrie 1981; Croft 2003; Dixon 1979)
I am currently travelling, so this blogpost will only very briefly discuss the topic of my current research in grammar reconstruction: the role of marking hierarchies in language change.
The notion of marking hierarchies has its roots in the markedness theory of Roman Jakobson and implies that grammatical categories (e.g., singular - plural) typically stand in a mutual, hierarchical relation, where one of the categories is morphologically unmarked, whereas the other is morphologically marked. The unmarked category thus has a higher position within a hierarchy of grammatical properties (singular < plural). These grammatical relations are, according to some authors, general, or "universal", anchored in our inborn grammatical system. However, we know that this is a problematic notion: there is a substantial number of languages where the actual morphological marking contradicts the proposed markedness hierarchies. Further, not all languages have morphology. Morphological marking alone cannot be the identifier of marking hierarchies.
On the other hand, there is an obvious connection between the observed marking hierarchies and frequency. Superior categories, "unmarked" in traditional markedness theory, are more frequently used in speech and in text. Again, the definition may be problematic, since not all languages have corpora that enable a detailed study of category frequency. Also, marking hierarchies based on frequency may contradict marking hierarchies based on general observations of morphological marking.
My current study on grammar reconstruction, which I have been writing about in several blogposts, indicates a clear correlation between change rates and marking hierarchies: superior categories, which are more frequent in grammar and most likely to be unmarked grammatically, have substantially lower change rates (and a slower pace of change) than inferior categories, which have higher change rates (and a faster pace of change). I will follow up this topic in a coming blogpost.

Bickel, Balthasar (2008), 'On the scope of the referential hierarchy in the typology of grammatical relations', in Greville G. Corbett and Michael Noonan (eds.), Case and Grammatical Relations. Studies in honor of Bernard Comrie (Amsterdam - Philadelphia: John Benjamins), 191-210.
Comrie, Bernard (1981), Language universals and linguistic typology: syntax and morphology (Oxford: Blackwell).
Croft, William (2003), Typology and universals (Cambridge textbooks in linguistics; Cambridge: Cambridge University Press).
Dixon, Robert M. W. (1994), Ergativity [electronic resource] (Cambridge: Cambridge University Press).
--- (1997), The Rise and Fall of Languages [electronic resource].
--- (2010a), Basic linguistic theory. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
--- (2010b), Basic linguistic theory [electronic resource]. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
Tocharian B text THT 496, in cursive script, containing a literary poem, "Love letter". From CEToM database.
This blogpost will give an overview of my popular lecture earlier this week on the role of patterns in syntax, grammar and literature for the deciphering of ancient languages (link to the lecture below, in Swedish).

My own experience of deciphering ancient languages is basically restricted to Tocharian. Even so, Tocharian texts can be very difficult to understand, in particular if parallel texts in Sanskrit, Khotanese, or Uighur (the most frequent translation languages for Tocharian) are absent.

Deciphering ancient languages basically uses three instruments: script, language (lexicon and grammar), and literature. Reading the script is fundamental to understanding the content, and even at a stage where the content of a manuscript is known, there is often reason to go back to the manuscript and check the reading, which may open the way for new interpretations and a renewed understanding of the content of the text. In the case of Tocharian, the script (North-Turkestanic Brahmi script) is relatively well known, even though there are some Tocharian B texts in cursive script that are very complex and difficult to interpret. On the other hand, almost all Tocharian texts are fragmentary in some respect (burned, broken, etc.), which means that lacunae have to be completed and reconstructed. Part of this reconstruction is to interpret the characters at manuscript edges, which may be cut or damaged. This means that even if the script is known, the work of a philologist still involves a substantial amount of manuscript reading.

Interpreting lexicon and grammar may involve substantial problems if the language is not well known. In the case of Tocharian, the broken contexts, again, create large difficulties when we study syntax. Morphology is easier: paradigms can be established and reconstructed from forms found in texts, and few grammatical forms are missing in Tocharian. However, syntactic constructions require a larger corpus of complete sentences, and in a language such as Tocharian, it is often a problem to find enough complete sentences (that are not restored) for certain constructions, for instance in combination with a specific verb.
The lexicon has its own difficulties. In a language like Tocharian, the absence of close relatives is a problem (Tocharian descends immediately from the Indo-European proto-language). If an unknown word is found in a text, we may assume a meaning based on the meaning of a presumed cognate in another Indo-European language. However, the connection to the presumed cognate may be a complete mistake, and the meaning of the lexeme, as well as the etymology, may instead be something entirely different.

This brings us to the third category, literature. Besides script, literature is probably the most important of the instruments mentioned at the beginning of this text. The exact meaning of words, which forms the basis for a correct interpretation of a text, is highly dependent on the possibility of "proving" the content by a parallel or bilingual text. Most Tocharian texts are translations from Sanskrit, but besides that, Tocharian had its own literary tradition. Therefore, the exact source of a text can be difficult to trace. Some texts do not have any source texts at all. Since Tocharian, like any other literary language, is constrained by its literary tradition, the identification of parallel patterns in, e.g., Sanskrit literary sources is highly important to a proper understanding of the content and a correct translation of the lexical meanings and the syntax.

Link to a public lecture at Filosoficirkeln, Lund, about deciphering ancient languages.
This week's blogpost will continue the thread about grammatical reconstruction, with some thoughts on lineage versus areality in grammar change. 

In general, change of grammar is supposedly cyclic (or spiral-shaped according to some researchers): over time, typological organizations of features in systems recur or are re-established. We may look at this issue from both a long-term and a short-term perspective. One thing is the inherent possibility for a feature to be homologous (a similarity depends on inheritance only) or homoplastic (a similarity depends on internal or areal pressure, caused by various factors). Another thing is whether a similarity is caused by areal pressure or by lineage. A construction or a feature may be indicative of all of these processes. For instance, a feature like word order is by nature homoplastic (similarities in word order may be due to areal or internal pressure, such as change in the order of meaningful elements), but even then, a word order feature may be due to lineage: it has been inherited generation after generation, or it is a critical innovation restricted to a specific sub-branch of a tree. Take for instance the verb-initial order in Celtic languages: it is likely that this feature was caused by internal pressure in the verbal paradigm (McCone 1987). Because of this, verb-initiality is a feature which is restricted to the Celtic sub-branch and therefore a homologous innovation of this specific branch, not caused by areal pressure. The feature is entirely independent of other Eurasian verb-initiality. Another example is the Germanic have-perfect. It is a homoplastic typological feature (expressing the perfect by an auxiliary construction), which still uses the same cognate root as the auxiliary, the verb *haban. The process took place independently in all Germanic languages, due to parallel drift and possible areal pressure. As before, it is difficult to distinguish areality from lineage.

Very interesting is the process of Indo-European alignment change, from the proto-language to the daughter branches. It is quite evident that the reconstructed language bears morphological traces of a semantically based system, similar to active-stative systems, as has been suggested by several scholars (Bauer 2000). But does it mean that Proto-Indo-European was an active language? Probably not. This concerns the question of the stability of systems in general versus language-internal variation in tendencies towards other systems. Indo-European alignment took three pathways of change: towards ergativity in the South-East, neutral marking in the West, and a preservation of the ancient system in between (roughly). What is the areal pressure component here, which changes depend on internal processes in languages, and what is the role of the residual morphology? These are questions that remain to be answered.

McCone, Kim (1987), The early Irish verb (Kildare: Maynooth). 
Bauer, Brigitte (2000), Archaic syntax in Indo-European: the spread of transitivity in Latin and French (Trends in linguistics. Studies and monographs, 125; Berlin: Mouton de Gruyter).
Berthold Delbrück (1842-1922)
The current post is about something that I am involved in right now: the reconstruction of grammar. In comparative linguistics, grammar can be reconstructed to a proto-language on the basis of the forms and functions in daughter languages. For instance, if there is a dative case in several languages with a specific marker that can be reconstructed to the joint proto-language, and this form has the function of dative in all languages, then it is likely that the function of this marker was dative also in the proto-language. However, the reality is often much more complex than that. Often, the function of a marker is different in various daughter languages: in our case above, we may have genitive or ablative instead of dative, and since we don't know if a genitive is more likely to become a dative or the other way round, we cannot reconstruct the original, proto-language function of this specific marker. The problem is known as the "correspondence problem" and is a matter of controversy in syntactic reconstruction in general (Roberts 2007) (see picture below).
The issue is particularly prominent in the reconstruction of Proto-Indo-European syntax, where many categories of the ancient languages, such as Sanskrit, Tocharian, and Greek, are absent in Anatolian, which, on the other hand, has a high number of other categories considered to be highly archaic.

In recent years, scholars have tried to approach this problem by using evolutionary and phylogenetic methods (Maurits and Griffiths 2014, Dunn et al. 2017, Cathcart et al. 2018). The probability of presence of a specific feature at ancestral nodes is estimated, based on gains (0 -> 1) and losses (1 -> 0) of features over a reference tree (lexical or hand-crafted). As expected, the method requires some adjustments to give reliable and reproducible results. One of them is to treat grammatical properties as logically dependent (which is a very tricky and complex matter); the other is to use ancestry and clade constraints on trees, in order to avoid unnecessary noise in the results.

However, even if evolutionary and phylogenetic methods are much more sophisticated than traditional methods in terms of amounts of data and number of calculations, the programs rest on the same principle as the one underlying the correspondence problem. If most of the daughter languages have a specific property, then it is likely that this property was there also in the proto-language. If there is a rooted outgroup with another function, then the probability of presence of this function at the proto-language state is increased.
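The principle described above can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions, not the model used in the studies cited: a single binary feature, fixed gain and loss rates chosen by hand, a two-state continuous-time Markov chain, and Felsenstein's pruning algorithm on a small invented tree. The tree layout (dictionaries with `state`, `children` and `length`) and all function names are hypothetical.

```python
import math


def transition_probs(gain, loss, t):
    """2x2 matrix P[a][b]: probability of going from state a to state b
    along a branch of length t (0 = feature absent, 1 = feature present)."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # gain: 0 -> 1
    p10 = loss / total * (1.0 - decay)  # loss: 1 -> 0
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def partial_likelihoods(node, gain, loss):
    """Likelihood of the data below `node`, given node state 0 and state 1."""
    if "state" in node:  # leaf: an attested language with a known state
        return [1.0 if node["state"] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in node["children"]:
        child_like = partial_likelihoods(child, gain, loss)
        p = transition_probs(gain, loss, child["length"])
        for s in (0, 1):
            result[s] *= sum(p[s][c] * child_like[c] for c in (0, 1))
    return result


def root_presence_probability(tree, gain, loss):
    """Posterior probability that the feature was present at the root,
    using the stationary distribution of the chain as the root prior."""
    like = partial_likelihoods(tree, gain, loss)
    total = gain + loss
    prior = [loss / total, gain / total]
    joint = [prior[s] * like[s] for s in (0, 1)]
    return joint[1] / sum(joint)


# Invented toy tree: three ingroup languages that all have the feature,
# plus one outgroup language that either lacks it or has it.
ingroup = {"children": [{"state": 1, "length": 1.0} for _ in range(3)], "length": 1.0}
tree_outgroup_absent = {"children": [{"state": 0, "length": 2.0}, ingroup]}
tree_outgroup_present = {"children": [{"state": 1, "length": 2.0}, ingroup]}

if __name__ == "__main__":
    for name, tree in [("outgroup lacks feature", tree_outgroup_absent),
                       ("outgroup has feature", tree_outgroup_present)]:
        print(name, root_presence_probability(tree, gain=0.1, loss=0.1))
```

Comparing the two printed values shows the effect of the outgroup: the same majority of daughter languages yields a lower probability of presence at the root when the outgroup lacks the feature. In real analyses the rates are estimated from the data rather than fixed by hand.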

Currently, I am working with a dataset for Indo-European, for which the probabilities of grammatical features being present at the ancestral state of Proto-Indo-European are reconstructed (the statistics have been performed by Chundra Cathcart, University of Zurich). The results are astonishing: with very few exceptions, the program reconstructs high probabilities for the grammar features that were reconstructed to Proto-Indo-European by the Neogrammarians (Brugmann & Delbrück 1893, 1897, 1900). The reconstruction of Proto-Indo-European grammar by the Neogrammarians was done before the discovery of Hittite and Tocharian, which changed the preconditions for the typological reconstruction of the proto-language grammar to a high degree. Even though Tocharian and Anatolian are included in the data, this does not change the Neogrammarian reconstruction of Proto-Indo-European grammar. I will have reason to come back to this issue in further blogposts.

Brugmann, Karl and Delbrück, Berthold (1893), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 3, Vergleichende Syntax der indogermanischen Sprachen, T. 1 (Strassburg: Trübner).
--- (1897), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 4, Vergleichende Syntax der indogermanischen Sprachen, T. 2 (Strassburg: Trübner).
--- (1900), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 5, Vergleichende Syntax der indogermanischen Sprachen, T. 3 (Strassburg: Trübner).
Cathcart, Chundra, et al. (2018), 'Areal pressure in grammatical evolution.', Diachronica, 35 (1), 1-34.
Dunn, Michael, et al. (2017), 'Dative Sickness: A Phylogenetic Analysis of Argument Structure Evolution in Germanic', Language: Journal of the Linguistic Society of America, 93 (1), e1-e22.
Harris, Alice C. and Campbell, Lyle (1995), Historical syntax in cross-linguistic perspective (Cambridge studies in linguistics, 0068-676X ; 74; Cambridge: Cambridge Univ. Press).
Maurits, Luke and Griffiths, Thomas L. (2014), 'Tracing the roots of syntax with Bayesian phylogenetics', Proceedings of the National Academy of Sciences, 111(37), 13576-81.
Roberts, Ian G. (2007), Diachronic syntax (Oxford textbooks in linguistics; Oxford: Oxford University Press).
The principle of evolutionary reconstruction. Gains and losses are measured against a reference tree (lexical/hand-crafted), resulting in a probability of presence at ancestral nodes.
Representation of the correspondence problem. In the figure at the top, A is more likely than B, but in the figure below, B is more likely, despite A being more frequent. This principle is applied by evolutionary methods.
Currently, at least if you are in the northern hemisphere, the darkest time of the year is approaching. This is also when we celebrate one of our most awaited festivities, which in English goes by the name Christmas. How old is this custom? It is highly likely that a festival during the darkest time of the year, the winter solstice, has a very long history, predating the introduction of Christianity and probably going all the way back to Neolithic times, when the return of the sun was important for the preparation of the growing season. The festival has many forms in various cultures; among Jews it is represented by Chanukka, a feast of light, which is celebrated somewhat earlier than Christmas.

In some northern cultures, the winter solstice marks the beginning of the winter; in other, Central European cultures, the winter period begins earlier. In Indo-European languages, winter, the cold and rainy season, goes by the name of *ǵh(e)im- 'winter', also 'snow', a root that is found with the meaning 'cold season' in most languages, including Indo-Aryan. Germanic languages use another word for the cold season, *wintru-, which has two possible origins: either it is related to Latin unda 'wave', referring to 'the wet time of the year', or it is related to Gaulish vindo 'white', meaning 'the white time of the year'.

The festival that marks the winter solstice, 'Christmas', goes by different names in different languages. However, the symbols and the cultural habits show striking similarities between cultures. Important components of the festivities are, besides excessive eating and drinking and the giving of gifts, also the presence of death and the return of dead ancestors, the equality of humans, and a celebration of light. In ancient Rome and other parts of the Mediterranean, the winter solstice festival had the name Saturnalia, a festival devoted to the god of the earth, Saturnus. An important component of the festival, besides excessive eating, drinking, visiting of friends and giving of gifts, was that the slaves were supposed to sit and eat in the company of their masters. This is paralleled by the habit in northern cultures, where servants and house owners were supposed to eat together in the kitchen during Christmas.

The words for 'Christmas' are different in various languages. Even though we have little information about celebrations of the winter solstice in older culture without written sources, the words may give us important indications of the purpose of the feast.

Many Germanic languages have preserved an ancient and obscure word for the feast, jul: Swedish jul, Old Swedish iūl, Icelandic jól, Danish jul, Old English geohhol, géol, English yule, Gothic (fruma) jiuleis 'the month of Christmas'. From Proto-Norse, the word has also been borrowed into Finnish joulu and Estonian jõulud. The meaning of this word is uncertain, but there are two alternatives: either the word is derived from a root related to Old Icelandic él 'storm', referring to the time of winter storms, or it is derived from the Indo-European root *jek- 'speak out loud', which in many languages, such as Latin iocus 'joke', has the meaning of 'joke, amusement'.
The Indo-European word for 'winter', *ǵh(e)im-, recurs in Latvian Ziemassvētki.

Another group of words relates to meanings of 'holiness', such as German Weihnacht, Middle High German wīhenahten (known since the 12th century), meaning 'holy night', or to the word for 'God', in Slavic languages bȏgъ: Polish Boże Narodzenie, Bosnian Božić, Croatian and Serbian Božić, Macedonian Božiḱ. Lithuanian has preserved an ancient word in its term Kalėdos, which is from the name of the pagan god Koliada, who personifies the newborn winter.

An important set of Christmas words relates to meanings of 'new' and 'birth' or 'rebirth'. We have derivations of Latin natīvitas in Spanish Navidad, and of Latin nātalīs in French Noël, Portuguese Natal, Italian natale, borrowed into many languages, such as Marathi Nātāḷa or Turkish Noel, as well as Irish nollaig, Welsh nadolig, Scots-Gaelic nollaig (borrowed from Latin natalicia 'nativity'). Alternatively, we have Russian rozhdestvo, Belorussian roždiestvo, derived from ród 'birth' and borrowed into, e.g., Kazakh Rojdestvo, Uzbek Rojdestvo.

Another group - to which we count the English Christmas - refers immediately to the birth of Christ: Greek Χριστούγεννα, Dutch Kerstmis, Frisian Kryst, Luxembourgish Chrëschtdag, Albanian Krishtlindje. From English, the word has been borrowed into many languages, such as Hindi krisamas, Nepali Krisamasa, Malayalam krismas, Japanese kurisumasu, Samoan Kerisimasi, Tamil Kiṟistumas, Telugu Krismas, Swahili Krismasi, Thai Khris̄t̒mās̄, Xhosa Krisimesi, and so forth.

And with this little overview of Christmas words, I would like to wish you all a Merry Christmas!

-The text has been updated 2018-12-15-

Phylogenetic tree, where Tocharian is second to branch off, after Anatolian (by Chundra Cathcart).
This post is related to what I am currently busy with: preparing an introductory course on Tocharian. There is a long-debated dilemma in Tocharian studies, which concerns the position of Tocharian within the Indo-European language tree. Due to its status as a centum language, most scholars of the early 20th century regarded Tocharian as a western Indo-European language (together with Celtic, Germanic, Italic and so forth) rather than an eastern language. This view is not supported anymore, but the position of Tocharian still remains an enigma. Today, most scholars agree that Tocharian branched off from the Indo-European proto-language directly (and is thus not more closely related to any other branch). The disagreement among contemporary scholars is whether or not Tocharian branched off second, after Anatolian and before the other Indo-European branches. There are several arguments in favor of the second-to-branch-off theory. One argument is the occurrence of lexical archaisms in Tocharian, meaning that a handful of etymologies have preserved a more general meaning in Tocharian, whereas the other branches show a more specialized meaning. Examples are:
  • Toch. AB yäp- ‘enter’, Skt. yabh-, Greek oíphō, Russ. ebu ‘have intercourse’ < PIE *yebh- ‘enter’ (LIV:309). The original meaning of the verb is preserved in Tocharian.
  • TB kärweñe ‘stone, rock’, Skt. grāvan- ‘stone for pressing out soma’, Welsh breuan ‘handmill’, Old Ch. Slav. žrǔny ‘handmill’.
  • TB śrān-* ‘(adult) man’ < PIE *ģerh₂-ōn, Skt. járant- ‘old, fragile’, Gr. géront- ‘geriatric’, Oss. zärond ‘old’ < PIE *ģerh₂- ‘mature, grow’ (LIV:165). The meaning ‘old’, ‘geriatric’ is an innovation of the non-Tocharian languages.
The idea of lexical archaisms is not totally irrelevant; as I wrote in my previous blogpost, we know from statistical testing that specialization is more frequent than generalization.
The other argument is from phylogenetics. In phylogenetic trees, Tocharian consistently branches off second, after Anatolian. Again, this argument is based on lexical data, but from a completely different angle.
What about grammar? The arguments in favor of Tocharian being second to branch off are complicated, in particular since they depend on which type of system we reconstruct for Proto-Indo-European. Without going too much into detail, we have two types of reconstructions: one relatively simple system, more similar to Anatolian, from which the other branches developed their systems, and one more complex reconstruction, more similar to Sanskrit and Classical Greek, in which Anatolian lost most of its grammar. The position of Tocharian here is not clear. It is obvious that Tocharian rearranged and rebuilt most of its nominal - and partly also verbal - system, and this complicates the picture. The Tocharian reshaping of the system was partly done with morphological material which is found in the other branches, partly in Anatolian but also in Old Indic and Classical Greek.
The enigma waits to be solved.
An evolutionary reconstruction of meanings of the cognacy tree of Proto-Indo-European *st(e)h₂w-ro- ’big cattle’ (?) (by Harald Hammarström)
I have not shared anything in a month, since I have been on a 'road-trip', first to Arizona for the CES conference, and then to Beijing and Changsha (Hunan Province) for a lecture series on historical and evolutionary linguistics.
In Arizona, we (with Harald Hammarström and Sandra Cronhamn) presented some results of evolutionary semantic studies on culture vocabularies of our corpus, including data from Indo-European, Caucasian families, Turkic, Uralic, Basque and ancient Semitic (book of abstracts is found here). This study has two aspects: one being the causalities of change rates, the second directionality of semantic change.
In this post, I will focus on the first aspect, causalities of change rates. As our data, we used the 100-list of cultural words of farming, pastoralism, hunting, war, technology, and industry that we have in our database DiACL. We built an evolutionary model, in which we measured gain and loss rates of 21,874 meaning tokens (6,224 types) within cognate trees, contrasted against Glottolog reference trees. After adjustment for transition frequency, 3,442 meanings remained. The gain and loss rates (given as probabilities) were tested against various metrics. We have some preliminary results, but the issue is still being researched. Previous research on lexical change rates (e.g., Pagel et al., Nature 449; Vejdemo et al., PLOS 2016 11,1) has indicated a connection to word frequency (the more frequent a word is, the lower its change rate), as well as to age of acquisition, synonyms, arousal, imageability and average mutual information. However, this research has been performed on basic vocabulary only, and we expect most of these causalities to be less relevant to a vocabulary such as ours. Frequency, for instance, showed no correlation at all to our results. However, we found a negative correlation to borrowability, which is highly noteworthy: apparently, lexemes that are frequently borrowed have slower change rates. Further, we found a correlation to colexification tendency, as well as to cognacy productivity, which is to be expected (words that change their meaning often and which are diverse in geography are expected to have high change rates). Currently, we are testing various semantic properties of the lexemes, and this is where the interesting part begins: it is evident that inherent properties that are said to impact gender and classifiers, such as animacy, shape, mass/count etc., have no correlation to change rates. But cultural aspects, such as labour intensity, processability, and the possibility to control and change, do have an impact.
I am still testing various properties and aspects, and hopefully, results can soon be made ready for submission.   
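As a rough illustration of what testing change rates against a metric can look like, here is a small self-contained Python sketch of a Spearman rank correlation. The numbers are invented and do not come from our data, the variable names are hypothetical, and the post does not state which correlation measure was actually used; rank correlation is simply a common choice when the relation need not be linear.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented example values for five lexical meanings (not real data):
borrowability = [0.1, 0.4, 0.5, 0.8, 0.9]     # hypothetical borrowability score
change_rate = [0.30, 0.25, 0.28, 0.12, 0.10]  # hypothetical change rate

if __name__ == "__main__":
    print("Spearman rho:", spearman(borrowability, change_rate))
```

A negative coefficient, as in this invented example, corresponds to the pattern described above, where meanings that are borrowed more often show lower change rates.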

Today, I am busy preparing my talk for the Cultural Evolution Society Conference at ASU Tempe, Arizona, and will not have time for a blogpost. In two weeks I will return with an overview of our talk on semantic evolution!
Visit the CES 2018 homepage, and follow on twitter or facebook.
Read also our recent paper on the DiACL database on ancient language typology.

Drinking party among northern people according to Historia de gentibus septentrionalibus by Olaus Magnus (1555).

Liquids are not just vital to our survival; they also form a central part of our culture. Most human gatherings have drinking as their common denominator, be it water, wine, beer, tea, or coffee. This post is about ancient drinking and words for drinking in languages (coffee and tea will follow in a later post).
The two most vital liquids to humans, as well as to mammals in general, are water and milk. Water we drink all our lives; without water we cannot survive. Milk we drink during our first year; in this period, milk covers our entire need for nourishment. In many cultures, individuals continue to consume milk from cows, goats, or sheep, either as fresh milk or as cheese or yoghurt. In other cultures, milk is not a natural part of the diet later in life.
The words for water and milk are both highly conservative and belong to the basic vocabulary of languages. In Indo-European, both words can be reconstructed to the proto-language, and their forms have not changed much during the family's history. The Proto-Indo-European word for water, *wód-r-/wéd-n-, appears in its earliest attestation as Hittite watar, which is strikingly similar to English water several millennia later. The root for milk, Proto-Indo-European *h₂melǵ-, is not very different from the forms in Russian molokó, Tocharian B malkwer, Old Norse mjǫlk, or English milk. Fresh milk as a drink is most frequent in Europe and less frequent in other parts of Eurasia. The ability to drink milk, lactose tolerance, rests on a genetic mutation that goes back 6,000 years in Central Europe. The mutation is not unique to Europe: other independent epicentres are found in Saudi Arabia and Western Africa.
At least in Europe, there is a popular notion of 'drinking belts', which is sometimes used to generalize about the mentality of various peoples: typically the 'wine belt' and the 'beer belt', often also the 'vodka belt', and sometimes a 'milk drinking zone'. Beer and wine are both very ancient and central drinks in all of Eurasia. Another important drink is mead, which is tightly connected with beekeeping. Mead has lost its importance over the last millennia, probably due to the more efficient production of beer and wine. Vodka, whiskey, and other spirits have a short history: they are a result of distillation, which is a relatively modern process.
Of beer and wine, beer is the more archaic drink, and it appears in many lexical forms. The preparation of an intoxicating, fermented drink based on cereals was invented by the earliest Neolithic farmers in West Asia and Anatolia 10,000 years ago. With the preparation of beer came also the practice of the cultic feast: occasions where people worshipped the gods, ate, drank, sacrificed, and got (probably very) drunk. A common word for beer can be reconstructed for Indo-European as *h₂el-u-, but it has frequently been substituted (as in English beer). Most likely, the production of beer differed between cultures, with many local variants, and for this reason many languages replaced their beer words.
Wine has a different story. The production of wine is related to the farming of the domesticated grape, a practice that began in the Caucasian region about 7,000-8,000 years ago. The word for wine is also the same in all languages, and most likely it spread through the languages at an early stage, together with the invention of wine. Grapes cannot be grown in the northern parts of Eurasia, yet all languages have a word for wine. The ultimate source of the wine root is not clear. Often, Proto-Semitic or Proto-Kartvelian is believed to be the source of the word (PIE *woh₁i-no- ‘wine’ < PIE *weh₁-i- ‘to turn, wind’; Proto-Kartvelian *ɣwin- ‘wine’, Proto-North-West Caucasian *ωwə- ‘wine; alcoholic drink’, Proto-Dagestanian *ωun- ‘wine; one-year-old vine shoot’, also found in early Semitic languages, Old Testament Hebrew yayin, Ugaritic yn). The Indo-European root, on the other hand, is derived from a verb meaning ‘to wind’ (referring to the vine), which to some indicates an Indo-European origin. However, this derivation may be a secondary adaptation in Indo-European, so we cannot be certain about the origin of the word wine.
Cognacy map of words for WINE in modern (top) and ancient (bottom) languages. One root dominates almost the entire map.



You are welcome to visit the DiACL infrastructure and lab. All data are open access and free for everyone to use!