Historical linguistics blog - even weekends

Wordcloud of the texts from the blogposts of spring 2019.
The Swedish summer vacation is approaching, and I will go to Australia, among other things to attend the International Conference on Historical Linguistics in Canberra, 1-5 July. I will give two talks, one about the evolution and tendencies of gender assignment in Indo-European, and one about the evolution and change of alignment in Indo-European. After the summer break I will return and write more about these two topics in separate posts.
However, I will try (if time allows) to give an overview of some of the interesting talks from the ICHL conference. So stay tuned! Thanks to all readers, and have a nice summer!
Wordcloud of texts from the blogposts of autumn 2018.
Read the full post »
Variation in definiteness marking in modern Eurasian languages
This blogpost will briefly introduce a highly interesting phenomenon in the history of Eurasian languages, namely the emergence of definiteness. Most of the ancient attested Indo-European languages do not have definiteness marking, but the phenomenon appears relatively early on in several languages, in various forms. The emergence of the various types of definiteness marking does not seem to be areally caused; rather, most of the variants emerge through internal pressure and grammaticalization. In addition, definiteness is not restricted to the Indo-European languages but also occurs in various forms in Caucasian families, in Turkic, and in some Uralic languages.
There are several types of definiteness marking, which typically co-occur in languages. One type is a non-bound definite article (a separate word class), as in German or English:
das Haus
def house ‘the house’

Another type is a bound definite marker, as in Scandinavian (here Swedish):
hus-et
house-def ‘the house’

A further type is definiteness marked on the adjective, as in Swedish:
det stor-a hus-et
DEF large-DEF house-DEF
‘the large house’

Definite marking can be obligatory, either at the end or at the beginning of a Noun Phrase, as in Bulgarian:
xubava-ta kniga
nice-DEF book
‘the nice book’

The ancient Indo-European languages lack definiteness, and this state has been preserved in a huge area of predominantly Slavic and Indo-Aryan languages. The emergence of the various forms of definiteness began in ancient times, apparently independently and with large variation even within branches of the families, and escalated during the medieval period. A large part of the existing variation seems to be caused by parallel evolution. Still, the exact causes of the variation remain obscure.

Bauer, Brigitte. 2007. "The definite article in Indo-European. Emergence of a new grammatical category?" In Nominal Determination. Typology, context constraints, and historical emergence, edited by Elisabeth Stark, Elisabeth Leiss and Werner Abraham, 103-139. Amsterdam-Philadelphia: John Benjamins.

Variation in definiteness marking in historical Eurasian languages. For the legend, see the map of modern languages above.
Probability levels of different types of definiteness marking in protolanguages, based on an evolutionary test using the data of the DiACL database.
Read the full post »
The Tocharian A - Sanskrit bilingual A 387 (THT 1021). From https://www.univie.ac.at/tocharian/
Of the 3672 entries of the Tocharian A dictionary (Carling and Pinault to appear), 772 lemmas have been marked as “from Sanskrit”, which represents 21% of the entire vocabulary (of 1508 nouns, 338 are from Sanskrit, representing 22%). Of these 772 lemmas, 39 are marked as “via Middle Indic”, which represents 5% of the words borrowed from Sanskrit. Compared to Sanskrit loans, other source languages are marginal: there are 22 words marked as “from Middle Iranian”, 5 “from Chinese”, 10 “from Uighur”, 10 “from Prakrit”, and 4 “from Pali”.
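The proportions above can be rechecked with a few lines of Python. This is only a minimal sketch: the counts are those quoted in the paragraph, while the variable names and the helper `share` are my own illustrative choices and not part of the dictionary project.

```python
# Counts as quoted in the text (Carling and Pinault, to appear).
total_entries = 3672
from_sanskrit = 772
nouns_total = 1508
nouns_from_sanskrit = 338
via_middle_indic = 39

# Other source languages mentioned in the text.
other_sources = {
    "Middle Iranian": 22,
    "Chinese": 5,
    "Uighur": 10,
    "Prakrit": 10,
    "Pali": 4,
}


def share(part, whole):
    """Return part/whole as a percentage, rounded to the nearest integer."""
    return round(100 * part / whole)


sanskrit_share = share(from_sanskrit, total_entries)
noun_share = share(nouns_from_sanskrit, nouns_total)
middle_indic_share = share(via_middle_indic, from_sanskrit)
other_total = sum(other_sources.values())

print("from Sanskrit, % of all entries:", sanskrit_share)
print("nouns from Sanskrit, % of nouns:", noun_share)
print("via Middle Indic, % of Sanskrit loans:", middle_indic_share)
print("loans from all other listed sources:", other_total)
```

The rounded percentages agree with the figures given in the text, and the other listed source languages together account for far fewer words than Sanskrit alone.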
What does this imply? First, and foremost, of course, that Sanskrit, or rather Buddhist Hybrid Sanskrit, plays a fundamental role in Tocharian literature. “From Sanskrit” means that a word has been borrowed from Classical Sanskrit (Monier Williams 1899) or Buddhist Hybrid Sanskrit (Edgerton 1953, Bechert, Waldschmidt, and Bongard-Levin 1996) with no other change than an adaptation to the morphological system according to the languages’ rules for adapting loans (Krause and Thomas 1960). “From Prakrit” or “from Pali” means that the word can be traced back to a source attested in Pali or Prakrit texts, which apparently is much more unusual than the other way round.
So, what type of changes are we talking about when we define words as “via Middle Indic” instead of just “from Sanskrit”? (Note that the examples below are from Tocharian A: there are also similar patterns in Tocharian B (Carling 2005)). Let us look at a couple of examples.
Some of the words are almost identical to the Sanskrit word, with little change:
  •  A pāruṣak (n.) ‘name of a mythical garden’, via Middle Indic from Sanskrit pāruṣyaka- ‘n. of one of the groves of trāyastriṃśa gods’ (BHSD:343b), as in Pali phārusaka- ‘name of one of Indra's groves’ (PED:478b).
  •  A kās* ‘Kāśa, a species of grass’, via Middle Indic from Sanskrit kāśa- ‘a species of grass’ (MW:280b).
In other lexemes, the phonological changes are more far-reaching. They were either taken over from the Middle Indic source word or took place within Tocharian; which of the two remains unclear. Examples are:
  • A kurkal (n.) ‘bdellium, a medical ingredient’, via Middle Indic from Sanskrit gulgulu- ‘bdellium’ (MW:360b).
  • A klawe (n.m.) ‘die, throw of the die’, via Middle Indic from Sanskrit glaha-, originally ‘throw of the dice’, and individually ‘die’ (MW:374b).
  • A jar (n.m.) ‘topknot’, via Middle Indic from Sanskrit jaṭā- ‘the hair twisted together (as worn by ascetics)’ (MW:408a).
  • A tāpātriś (n.m.) ‘name of a class of gods’, via Middle Indic from Sanskrit trāya(s)-triṃśa- ‘name of a class of gods’, cf. Pali tāvatiṃsa (BHSD:257b).
  •  A patatam (adv.) ‘fortunate, gifted’, via Middle Indic from Sanskrit pradattam, neuter adv. from pradatta- ‘granted, bestowed, gifted’ (MW:679c).
  • A nātäk (n.m.) ‘lord’, via Middle Indic from Sanskrit nāthaka-, derived from Sanskrit nātha- ‘protector, patron, owner, lord’ (MW:534c).
This vocabulary, both in Tocharian A and B (which has a larger vocabulary), is very interesting. The lexemes were apparently not borrowed from the literary standard of Prakrit and Pali or from Buddhist Hybrid Sanskrit directly. Rather, they were borrowed from one or several local Indo-Aryan dialects, which became extinct, but which may be part of a general change in Middle Indo-Aryan leading to the dialectal diversity of Modern Indo-Aryan languages.
In addition, the boundaries between Indo-Aryan and Iranian in some of these lexemes are not sharp: the words may have been borrowed from Iranian, but since Indo-Aryan is much better attested (via Classical Sanskrit), an Indo-Aryan source becomes more likely.
A systematization of the sound changes in these words would likely add to our knowledge of the sound changes in Middle Indo-Aryan that lead to Modern Indo-Aryan. It would also help us to tease apart Iranian from Indo-Aryan borrowings in Tocharian.

Bechert, Heinz, Ernst Waldschmidt, and Grigorij Maksimovic Bongard-Levin. 1996. Sanskrit-Wörterbuch der buddhistischen Texte aus den Turfan-Funden. Beih. 6, Sanskrit-Texte aus dem buddhistischen Kanon: Neuentdeckungen und Neueditionen, 3. Folge. Göttingen: Vandenhoeck und Ruprecht.
Carling, Gerd. 2005. "Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language." In Proceedings of the Sixteenth Annual UCLA Indo-European Conference, edited by Karlene Jones-Bley, Martin E. Huld, Angela Vella Volpe and Miriam Robbins Dexter, 47-70. Washington: Institute for the Study of Man.
Carling, Gerd, and Georges-Jean Pinault. to appear. A Dictionary and Thesaurus of Tocharian A. Wiesbaden: Harrassowitz.
Edgerton, Franklin. 1953. Buddhist hybrid Sanskrit grammar and dictionary, William Dwight Whitney linguistic series: Yale U.P.; Oxford U.P.
Krause, Wolfgang, and Werner Thomas. 1960. Tocharisches Elementarbuch. B. 1, Grammatik. Heidelberg.
Monier Williams, Monier. 1899. A Sanskrit-English Dictionary : Etymologically and philologically arranged with special Reference to Cognate Indo-European Languages. Oxford: At the Clarendon Press.
Read the full post »
Sogdian manuscript from Dunhuang Mogao (Source: http://idp.bl.uk)
In the first centuries BCE, the impact of Iranian on Tocharian becomes important. We know this from the relatively large number of loanwords in Tocharian from various Iranian languages, beginning with one or several unknown Old Iranian dialects (which are not Avestan or Old Persian) and continuing with loans from various known Middle Iranian languages, such as Khotanese, Sogdian, and Bactrian. As usual with loans, an exact match for the source word is seldom found, meaning that the exact source language cannot be identified.
Iranian loans in Tocharian are interesting from the viewpoint of their semantic domains, which are indicative of the cultural impact of the Iranians on the Tocharians in Central Asia.

A majority of the words refer to administrative concepts, e.g., titles or specific concepts of merchandise or administration, indicating that the Iranians influenced the Tocharians by imposing an administrative infrastructure. Examples are:
  • Tocharian B waipecce 'possession', from Old Iranian, Avestan xʷaēpaiθya- 'own'.
  • Tocharian B waipte 'separately, apart' < Common Tocharian *wai-pätæ, borrowed probably from an adjective, Old Iranian *hwai-pati in the sense of 'independent, oneself'.
  • Tocharian A pärko, B pärkau 'advantage, profit, interest' < Common Toch. *pärkāwV, borrowed from Old Bactrian, Bactrian φρογαοο 'profit', Old Iranian *fragāwa-, Sogdian prγ'w, βry'w, Parthian frg'w 'treasure'.
  • Tocharian A pare, B peri 'debt' < Common Tocharian *pæräī is borrowed from Old Bactrian *pāra > Bactrian paro 'debt, obligation, loan, amount, due'.
  • Tocharian A  āpṣātrik* ‘citizen of a borough or market-town’, borrowed from Old Iranian *αβþαρο < *api-xšaθra- ‘borough, sub-district (of a city)’.
Other words clearly refer to military concepts, such as values or terms for weapons:
  • Tocharian B tsain 'arrow' from an Old Iranian *dzaina-, Avestan zaēna- 'weapon'.
  • Tocharian A āmāṃ B amāṃ ‘pride, arrogance’, loan from Middle Iranian, cf. Buddhist Sogdian ’’m’n ‘power’.
  • Tocharian A āṣāṃ B aṣāṃ ‘worthy’, borrowed from Middle Iranian, cf. Khotanese āṣaṇa- ‘worthy’.
  • Tocharian A āṣānik B āṣānike ‘venerable, worthy of respect’, loan from Middle Iranian, with the same source as A āṣāṃ B aṣāṃ.
  • A senik ‘care, pledge, guarantee’, from Middle Iranian *zēnik (Khot. ysīnīta, Sogd. zynyh, Kroraina Prakrit jheniya-).
A number of words refer to farming and the household. Examples are:
  • Tocharian AB ās ‘she-goat’, borrowed from Middle Iranian.
  • Tocharian A kātak* B kattāke ‘master of the house, householder’, from Common Tocharian *kāttākǝ borrowed from Middle Iranian, cf. Khotanese ggāṭhaa, itself borrowed from Middle Indic, cf. Gāndhārī Prakrit *ghahaṭha, from Sanskrit gṛhastha-.
  • Tocharian A miṣi B miṣṣe, miṣṣi ‘field’, borrowed from Khotanese mäṣṣa, miṣṣa ‘field for seed’.
A small number of words are Buddhist terms (normally, the impact of Sanskrit on both Tocharian languages is enormous here). Examples are:
  • Tocharian A pissaṅk ‘community of monks’, from Middle Iranian from Skt. bhikṣusaṃgha-  ‘Mönchsgemeinde, Mönchsorden’ (SWTF III:298b), cf. Khotanese bisaṃga-.
Finally, we have a group of words referring to plants and ingredients which are foreign to the Tocharian flora (here too, Sanskrit loans are much more common). Examples are:
  • Tocharian A kārāś B karāśe*, via Tocharian B from Khotanese karāśśa ‘climbing plant’.
  • Tocharian A kuñcit B kwäñcit, kuñcit, from Khotanese kuṃjsata- ‘sesame’.

In conclusion, the Iranian impact on Tocharian is mainly pre-Buddhist, concerning concepts of administration, warfare, and farming. With the change to Buddhism, the impact of Old and Middle Indo-Aryan becomes completely dominant in both Tocharian languages.
The words have been extracted from these sources:
Carling, Gerd (to appear). A Dictionary and Thesaurus of Tocharian A. Complete Edition. In collaboration with Georges-Jean Pinault. Wiesbaden: Harrassowitz (610p.).
Carling, Gerd (2005). Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language. In: Jones-Bley, Karlene, Huld, Martin E., Volpe, Angela Vella,  Dexter, Miriam Robbins Proceedings of the Sixteenth Annual UCLA Indo-European Conference. Journal of Indo-European Studies. Monograph Series 50, 47-70.
These sources have many references to works by, e.g., Georges-Jean Pinault, K T Schmidt, Werner Winter, Nicholas Sims-Williams, Harold Bailey, L Isebaert, Jörundur Hilmarsson.
Read the full post »
Words for Easter in Eurasian languages, defined by their meaning.
This week, also known as the Holy Week, is part of the holiday that in English goes by the name of Easter. Easter, which is celebrated throughout all of the Judaeo-Christian world, is one of the most important festivities of the year, marking the beginning of spring or summer and the resurrection of Christ. Like most Christian holidays, the roots of Easter go back to pagan times. In Northern Europe in particular, many of the mysterious customs of an ancient spring festival have survived until today. Children chase an invisible Easter hare, which leaves candy-filled eggs in the grass. Birch twigs are gathered, taken indoors, and decorated with painted eggs and feathers. Children also dress up as witches or 'easterhubbies' (the difference is whether you wear a scarf or a hat), paint their faces with red dots, and go from door to door asking for candy. Afterwards, they are supposed to fly on their brooms to Brocken. Fires and fireworks are lit, and, most importantly, enormous quantities of egg, fish, meat, and candy are consumed.

So, which terms do we use for this festival? Most languages have a form of the Greek (via Latin) word paskha, itself borrowed from Aramaic (Hebrew Pesach), meaning 'passover'. The West Germanic terms, such as English Easter and German Ostern, go back to a Common Germanic goddess of spring, Old English Eastre, which is identical to the Indo-European goddess of dawn *h2éus-ōs (Sanskrit uṣās, Latin aurōra). Other languages have words that in various ways relate to the basically biblical rituals of Easter, including 'sacrificial animal', 'taking of the meat', 'resurrection', 'great day' or 'great night', or 'liberation'.

Just as with the Christmas words (see http://www.gerdcarling.se/i/a32842142/2018/12/), the map of meanings of Easter unveils important information about various cultural spheres, as well as exceptions in the form of islands of different usage.

With this little etymological overview I would like to wish you all a Happy Easter!

Lubotsky, Alexander. Brill Online Dictionaries: Indo-European Etymological Dictionaries Online (https://dictionaries-brillonline-com.ludwig.lub.lu.se/iedo). Accessed 2019-04-17.
Troels-Lund 1932. Dagligt liv i Norden på 1500-talet. VII Årets fester. Stockholm: Bonniers.
Andersson et al 1968. Kulturhistoriskt Lexikon för Nordisk Medeltid XIII. Malmö: Allhems förlag.

I thank Ante Petrović for assistance with compiling/checking data for the Easter map.

Wikipedia has an excellent overview of names of Easter: https://en.wikipedia.org/wiki/Names_of_Easter
Read the full post »
Heatmap of frequency of source and target language of loan events in our data, defined by language power and population size (from 1-5). Graph by Johan Frid.

In the previous blogpost, I started a compilation of safe loans from and into Tocharian. I will continue this work in the next post. In this post, I will talk about loan directionality, since I am currently completing a paper (with several co-authors) on lexical borrowability in Eurasian languages. I want to say a few words about this project.

We have compiled and extracted all loan events in the lexical database, and tested various statistical measures on this data. Worth noticing is the directionality of loans in relation to language power, as well as the different source languages of the families. As I have described in recent posts, our lexical data set compiles culture concepts, i.e., words for farming, technology, hunting, and war, which have a presumed age that goes back at least to the Chalcolithic. This means that this vocabulary is not representative of the entire lexicon, only of these specific domains. The loans are also spread over long periods, reaching back at least to antiquity. If we look at the source languages, we notice that they differ between families. In Indo-European, Latin is most frequent, followed by Middle Low German, French, Old French, Slavic, and Classical Greek. In Caucasian, Turkic languages dominate, followed by Persian, Georgian, and Arabic. In Uralic, Scandinavian languages dominate, which is mainly due to the fact that Fenno-Ugric languages dominate in our data (see pictures below).

The correlation between loan directionality on the one hand and language power and population size on the other is also noteworthy. We define the power of languages by a quantitative rank based on several features, including literary power, economic power and population size. This we plot against the occurrence as source and target language in loan events (see graph above). All languages are equally likely to be target languages, but the most powerful languages are more likely to be source languages. This is a significant correlation. The most frequent loan event is from a very powerful language to a very weak one. The second most frequent loan event is from a medium powerful language to a weak one. The third most frequent loan event is from a medium powerful language to another medium powerful language. In scrutinizing the data, we observe that this last type of loan event is almost entirely restricted to the Middle Ages, which is also an interesting result. Inequality between languages seems to be specific to the antique and modern periods, whereas language contact in the Middle Ages was more evenly distributed between languages of equal power.
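To make the tallying behind such a heatmap concrete, here is a minimal sketch in Python. It is not the procedure used in the paper: the loan events below are invented for illustration, the function names are hypothetical, and I assume that rank 1 stands for the weakest and rank 5 for the most powerful languages (the direction of the scale is an assumption).

```python
from collections import Counter

# Hypothetical loan events as (source_power, target_power) pairs.
# Assumption: ranks run from 1 (weak) to 5 (very powerful).
loan_events = [
    (5, 1), (5, 1), (5, 1), (5, 1),
    (3, 1), (3, 1), (3, 1),
    (3, 3), (3, 3),
    (5, 2),
]


def tally(events):
    """Count how often each (source_power, target_power) pair occurs."""
    return Counter(events)


def as_matrix(counts, levels=range(1, 6)):
    """Arrange counts as rows = source power, columns = target power."""
    return [[counts.get((s, t), 0) for t in levels] for s in levels]


counts = tally(loan_events)
matrix = as_matrix(counts)
ranking = counts.most_common(3)

for row in matrix:
    print(row)
print("most frequent loan directions:", ranking)
```

Each cell of `matrix` corresponds to one cell of a heatmap of this kind; with real data, the counts would come from the loan events extracted from the lexical database.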

Graph illustrating the most frequent source languages in Indo-European (top), Caucasian (middle), and Uralic (bottom) families.
Read the full post »
European-looking farmers or traders in a Chinese tomb from 2nd c. BCE, Hunan province. From Hunan Provincial Museum. Photo: Gerd Carling

I was asked by my friend and colleague Victor Mair (University of Pennsylvania) to come up with my 'safe list' of loans from and into Tocharian. This is a very interesting and challenging topic, which I will continue working upon in a couple of coming posts. First, I will start with the most tricky one: Tocharian loan contacts with Chinese.
Establishing Tocharian loans from and into Chinese is particularly complex for two reasons: first, the reconstruction of Chinese phonology at various stages of Chinese prehistory is connected with many uncertainties and a large amount of debate, and second, the reconstruction of Tocharian phonology is particularly tricky and complex. The fundamental question is: how can we be certain that a specific word was borrowed at a certain stage from one reconstructed language into another? The prehistory of both languages can be stratified into various stages: Pre-Proto- and Proto-Chinese, Old Chinese, and (Early and Late) Middle Chinese on the one hand, and Pre-Proto- and Proto-Tocharian, Common Tocharian, Pre-A and Pre-B, and Tocharian A and B on the other. Beyond that, we have the proto-languages Proto-Indo-European and Proto-Sino-Tibetan, which can be further stratified into stages on their way to Proto-Chinese and Proto-Tocharian.
How can we know that a word that obviously looks as if it was borrowed from Indo-European was borrowed from Tocharian? The answer is that we have to show that specific Tocharian sound changes have taken place in the specific borrowed lexeme. These changes also have to be identified in the target language from the corresponding period. The process is very tricky, and the result is very few certain loans, a larger number of uncertain loans, and a huge number of highly uncertain ones.

Tocharian loans from Old Chinese (before the 2nd c. BCE)
Toch. AB klu ‘rice’ was borrowed from Old Chinese: Mod. Ch. dào, Mid. Ch. *dawX, Old Chin. *C-luu-? ‘rice, rice-paddy’ (GSR 1078). In Middle Chinese, the initial cluster OChin. *gl- was simplified to d-. 
Toch. B rapaññe ‘of the last month of the year’ (LP 12 a2 rapaññe meṃne ikäṃ-wine ‘on the second day of the month rapaññe’), an adjective formed on a noun *rāp, from Old Chinese: Mod. Ch. là, Mid. Ch. *lap, Old Ch. *raap (GSR 637j) ‘winter sacrifice’. It is likely that an earlier meaning of the Chinese word is reflected in Tocharian.
Toch. A ri B rīye 'town' < Common Toch. *riye matches the Old Chinese reconstruction of Mod. Ch. lĭ, Mid. Ch. *liX, Old Ch. *r̯ǝ-? (GSR 978a) ‘walled city’. The word may also be a Tocharian loan in Old Chinese.
Further loans include Toch. A truṅk Toch. B troṅk 'cave'.

Tocharian loans from Early Middle Chinese (possibly 3rd-4th c. CE)
TA ṣoṣtäṅk ‘tax collector, banker’ (Skt. śreṣṭhin-) corresponds to Niya ṣoṭhaṃga ‘tax collector’, Bactr. σωταγγο < *šoštaṅgV. A possible source is Mod. Ch. shōucáng, Mid. Ch. *syuw+dzang, Old. Ch. *xiw-N-s-(h)raŋ (GSR 1103a+727g´) ‘receive, accept, gather’ + ‘conceal, store’.
TA ṣukṣ ‘(smaller) village’, TB kwaṣo* ‘village’. Parallel Mod. Ch. sù, Mid. Ch. *sjuwk, Old Ch. *suk (GSR 1029a) ‘lodge, mansion’. Itō & Takashima (1996:401) reconstruct Old Ch. *sjәkw-s with a final *-s (that has a function of localisation and production of nomina actionis etc.).
Toch. A āṅk* ‘seal, stamp’, Mod. Ch. yìn, Mid. Ch. *ʔjinH, Old Ch. *ʔin-s (GSR 1251f), *ʔi̯əɳ (Takashima) ‘seal, stamp’.

Further loans include:
Toch. B cāk, tau '(dry measures)', Toch. B cāne 'money', Toch. B śakuse 'brandy', Toch. B ṣaṅk '(measure of volume)', TA yāmutsi TB yāmuttsi 'waterfowl' < 'parrot', Toch. B ṣitsok 'millet alcohol', Toch. B ṣipāṅkiñc 'abacus', Toch. A Toch. B cok 'lamp', Toch. A lyäk Toch. B lyak 'thief', Toch. A < Toch. B tseṃ 'blue', Toch. A nkiñc Toch. B ñkante 'silver'.

These words give important indications of the impact of Chinese culture on Tocharian. I will continue on this track in coming posts.

Carling, Gerd. (2005). Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language. In: Jones-Bley, Karlene, Huld, Martin E., Volpe, Angela Vella, Dexter, Miriam Robbins Proceedings of the Sixteenth Annual UCLA Indo-European Conference. Journal of Indo-European Studies. Monograph Series (Institute for the Study of Man) 50, 47-70.
Kim, Ronald. (1999). Observations on the absolute and relative chronology of Tocharian loanwords and sound changes. Tocharian and Indo-European studies, 8, p. 111–138.
Lubotsky, Alexander, & Starostin, Sergei. (2003). Turkic and Chinese loan words in Tocharian.
Židek, Jan. (2017). Tocharian Loanwords in Chinese [Dissertation]. Praha: Univerzita Karlova.

Read the full post »
What is the relation between universal patterns, frequency of words and forms, and language evolution and change? This is a question that is very little researched.
I have decided to move the updating of this blog to even weekends instead of Thursdays. Thursday is very often an extremely busy day, with no time left to update or complete blogposts for publication.

In this blogpost I will continue the previous topic of principles of language change. In historical linguistics, the principle of the particular status of the most frequent words and grammatical forms of a language is well known. The most frequent lexemes and grammatical categories are more resistant to change. Lexemes such as kinship words, body parts, numerals, fire, water, liver, and so forth, typically preserve more archaic paradigms, which may resist change for millennia. The most frequent adverbials and particles even resist phonological erosion and change. The most frequent verbs, such as 'to be' or 'to become', are typically irregular, and archaic inflection patterns and archaic categories, such as tenses, modalities, and aspectual categories, survive in these verbal stems. On the other hand, less frequent words, such as various verbs, nouns, and adjectives, are much more frequently affected by analogy and other types of changes that harmonize and simplify language structures, making them easier to memorize.

However, few studies investigate this from an evolutionary perspective, using phylogenetic methods. As shown by Pagel et al (2007) there is a correlation between lexical substitution and frequency in basic vocabulary. The most frequent words have generally lower substitution rates.

Frequency is very important in explaining cross-linguistic universal patterns, among other things in morphological marking hierarchies in languages. More frequent categories, such as singular (in relation to plural), agent (in relation to object), and present (in relation to past), are unmarked in relation to their less frequent counterparts, which are marked. This theory, known as the markedness theory (which has a lot of exceptions in languages), can to a large degree be explained by frequency (Greenberg 1966, Croft 1993, 2003).

In a current study I wanted to investigate the correlation between frequency and change rates of grammar, focusing on the Indo-European family. I compiled a sample of grammatical categories of word order, nominal morphology, verbal morphology and tense and organised the properties into hierarchical pairs according to the properties of present < past, pronoun < noun, agent < object, and masculine/feminine < neuter, which are well-known, universal, hierarchical relations, observed in a large number of languages. By means of an evolutionary model (performed by Chundra Cathcart), where transition rates between property states over a tree were reconstructed, we extracted the average number of transitions (per 1000 years) between each grammatical property in our data.
When the results were split up into pairs of marking hierarchy, as mentioned above, it turned out that the rates of change in the lower categories (i.e., the less frequent ones from a universal perspective) were higher. The rates of the higher categories (i.e., the more frequent ones from a universal perspective) were lower. The difference was statistically significant (p=>0.005). Even if this study is based on one family (Indo-European), 149 languages and about 100 properties only, it seems likely that frequency impacts language change also in the grammar. This explains why more frequent grammatical categories preserve more archaic patterns over time.
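The comparison step described above, once the transition rates have been estimated, can be sketched in Python as follows. This does not reproduce the evolutionary model itself, and the rate values are invented purely for illustration; only the four hierarchical pairs are taken from the text, and all variable names are hypothetical.

```python
from statistics import mean

# Invented transition rates (transitions per 1000 years), for illustration only.
# Each pair is (rate of higher category, rate of lower category).
rates = {
    ("present", "past"): (0.12, 0.31),
    ("pronoun", "noun"): (0.08, 0.22),
    ("agent", "object"): (0.15, 0.27),
    ("masculine/feminine", "neuter"): (0.10, 0.35),
}

higher = [pair[0] for pair in rates.values()]
lower = [pair[1] for pair in rates.values()]

mean_higher = mean(higher)
mean_lower = mean(lower)

# Number of pairs in which the higher (more frequent) category changes more slowly.
pairs_in_expected_direction = sum(h < l for h, l in rates.values())

print("mean rate, higher categories:", round(mean_higher, 4))
print("mean rate, lower categories:", round(mean_lower, 4))
print("pairs in expected direction:", pairs_in_expected_direction, "of", len(rates))
```

With real estimates, a significance test on the paired differences would follow; the sketch stops at the descriptive comparison.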

Text has been updated 2019-03-11
Read the full post »
Marking hierarchies of grammatical properties observed in the literature. After (Bickel 2008; Comrie 1981; Croft 2003; Dixon 1979)
I am currently travelling, so this blogpost will only very briefly discuss the topic of my current research in grammar reconstruction: the role of marking hierarchies in language change.
The notion of marking hierarchies has its roots in the markedness theory of Roman Jakobson and implies that grammatical categories (e.g., singular - plural) typically are in a mutual, hierarchical relation, where one of the categories is morphologically unmarked, whereas the other is morphologically marked. The unmarked category thus has a higher position within a hierarchy of grammatical properties (singular < plural). These grammatical relations are, according to some authors, general, or "universal", anchored in our in-born grammatical system. However, we know that this is a problematic notion: there is a substantial number of languages where the actual morphological marking contradicts the proposed markedness hierarchies. Further, not all languages have morphology. Morphological marking alone cannot be the identifier of marking hierarchies.
On the other hand, there is an obvious connection between the observed marking hierarchies and frequency. Superior categories, "unmarked" in the traditional markedness theory, are more frequently used in speech and in text. Again, the definition may be problematic, since not all languages have corpora that enable a detailed study of category frequency. Also, marking hierarchies based on frequency may contradict marking hierarchies based on general observations of morphological marking.
My current study on grammar reconstruction, which I have been writing about in several blogposts, indicates a clear correlation between change rates and marking hierarchies: superior categories, which are more frequent in grammar and most likely to be unmarked grammatically, have substantially lower change rates (and a slower pace of change) than inferior categories, which have higher change rates (and a faster pace of change). I will follow up on this topic in a coming blogpost.

Bickel, Balthasar (2008), 'On the scope of the referential hierarchy in the typology of grammatical relations', in Greville G. Corbett and Michael Noonan (eds.), Case and Grammatical Relations. Studies in honor of Bernard Comrie (Amsterdam - Philadelphia: John Benjamins), 191-210.
Croft, William (2003), Typology and universals (Cambridge textbooks in linguistics; Cambridge: Cambridge Univ. Press).
Comrie, Bernard (1981), Language universals and linguistic typology : syntax and morphology (Oxford: Blackwell).
Dixon, Robert M V (1994), Ergativity [Electronic resource] (Cambridge: Cambridge University Press).
--- (1997), The Rise and Fall of Languages [Electronic resource].
--- (2010a), Basic linguistic theory. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
--- (2010b), Basic linguistic theory [Electronic resource]. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
Read the full post »
Tocharian B text THT 496, in cursive script, containing a literary poem, "Love letter". From CEToM database.
This blogpost will give an overview of my popular lecture earlier this week on the role of patterns in syntax, grammar and literature for the deciphering of ancient languages (link to the lecture below, in Swedish).

My own experience of deciphering ancient languages is basically restricted to Tocharian. On the other hand, Tocharian texts can be very difficult to understand, in particular if parallel texts in Sanskrit, Khotanese, or Uighur (the most frequent translation languages for Tocharian) are absent.

Deciphering of ancient languages basically uses three instruments: script, language (lexicon and grammar), and literature. Reading the script is fundamental to understanding the content, and even in a phase where the content of a manuscript is known, there is often reason to go back to the manuscript and check the reading, which may open up new interpretations and a renewed understanding of the content of the text. In the case of Tocharian, the script (North-Turkestanic Brahmi script) is relatively well known, even though there are some Tocharian B texts in cursive script that are very complex and difficult to interpret. On the other hand, almost all Tocharian texts are fragmentary in some respect (burned, broken, etc.), which means that lacunae have to be completed and reconstructed. Part of this reconstruction is to interpret the characters at manuscript edges, which may be cut or damaged. This means that even if the script is known, the work of a philologist still involves a substantial amount of manuscript reading.

Interpreting lexicon and grammar may imply substantial problems, if the language is not well known. In the case of Tocharian, the broken contexts, again, create large difficulties when we study syntax. Morphology is easier: paradigms can be established and reconstructed from forms found in texts, and there are few missing forms in the context of grammar forms in Tocharian. However, syntactic constructions require a larger corpus of complete sentences, and in a language such as Tocharian, there are often problems of finding enough complete sentences (that are not restored) for certain constructions, for instance in combination with a specific verb.
The lexicon has its own difficulties. In a language like Tocharian, the absence of close relatives is a problem (Tocharian descends immediately from the Indo-European proto-language). If an unknown word is found in a text, we may assume a meaning based on the meaning of a presumed cognate in another Indo-European language. However, the connection to the presumed cognate may be a complete mistake, and the meaning of the lexeme, as well as its etymology, may instead be something entirely different.

This brings us to the third category, literature. Besides script, literature is probably the most important of the instruments mentioned at the beginning of this text. The exact meaning of words, which forms the basis for a correct interpretation of a text, is highly dependent on the possibility of "proving" the content by a parallel or bilingual text. Most Tocharian texts are translations from Sanskrit, but besides that, Tocharian had its own literary tradition. Therefore, the exact source of a text can be difficult to trace. Some texts do not have any source texts at all. Since Tocharian, like any other literary language, is constrained by its literary tradition, the identification of parallel patterns in, e.g., Sanskrit literary sources is highly important to a proper understanding of the content and a correct translation of the lexical meanings and the syntax.

Link to a public lecture at Filosoficirkeln, Lund, about deciphering ancient languages.
Read the full post »

You are welcome to visit the DiACL infrastructure and lab. All data is open access and free for everyone to use!