Historical linguistics blog - even weekends

Continuing my blogposts about gender, I will say a few words about gender stability. Over time, words often change their gender. This is well known: for instance, in Germanic languages the words for 'sun' and 'moon' are feminine and masculine respectively (as in German die Sonne and der Mond), whereas in other branches of Indo-European the situation is the reverse (Italian sole, masculine, 'sun' and luna, feminine, 'moon').
The important and interesting thing here is to investigate the reasons for gender stability or instability. Are they connected to a specific gender? Or are they connected to specific words? Or is gender stability a matter of frequency? There are still very few, if any, studies that look at gender stability using large-scale data sets.
If we first consider the issue of gender instability in our culture data set for Indo-European, we notice that there is little difference between the genders when it comes to stability in cognates. We distinguished three classes: cognates with more than 90% same gender (stable class), cognates with between 50% and 90% same gender (dominant class), and cognates with under 50% same gender (change class). We notice that all three genders (masculine, feminine, and neuter) have approximately the same distribution across the classes stable, dominant, and change (see picture below). The masculine is slightly overrepresented in the stable group, the feminine in the dominant group, and the neuter in the change group, meaning that the masculine is the most stable, the feminine a bit less stable, and the neuter the most unstable. However, the differences are small.
What is more interesting, though, and probably also promising for future research on gender stability, is that there is large variation in the stability of different semantic classes. Crops, metals, trees, vegetables, and products are all highly stable, whereas drink & drugs, small cattle, tillage, etc. are highly unstable. Whether there is a connection to general frequency remains to be tested for the entire Indo-European family, but a study on gender in Scandinavian languages only (Van Epps, Carling & Sapir 2019) found a correlation between frequency and gender instability.
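The three-way classification can be sketched in a few lines of Python. This is only an illustration, under the assumption that "same gender" refers to the share of the most frequent gender within a cognate set; the function name `stability_class` and the toy gender lists are hypothetical and not taken from our data set.

```python
from collections import Counter


def stability_class(genders):
    """Classify one cognate set by the share of its most frequent gender.

    Assumption (not stated in the post): "same gender" is read as the
    proportion of reflexes that carry the most common gender.
    """
    counts = Counter(genders)
    top_gender, top_count = counts.most_common(1)[0]
    share = top_count / len(genders)
    if share > 0.9:        # more than 90% same gender
        label = "stable"
    elif share >= 0.5:     # between 50% and 90% same gender
        label = "dominant"
    else:                  # under 50% same gender
        label = "change"
    return top_gender, share, label


# Hypothetical cognate sets, one gender label per attested reflex.
toy_sets = {
    "set_1": ["m"] * 10,
    "set_2": ["f"] * 7 + ["n"] * 3,
    "set_3": ["m", "m", "f", "f", "n"],
}
results = {name: stability_class(g) for name, g in toy_sets.items()}
for name, (gender, share, label) in results.items():
    print(name, gender, f"{share:.0%}", label)
```

How the boundary values (exactly 50% or 90%) are treated is a choice made here for the sketch; the post does not specify it.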

Van Epps Briana, Gerd Carling & Yair Sapir to appear. “Gender assignment in six North Scandinavian languages: Patterns of variation and change”, to appear in a journal.
 

Heatmap of frequency of occurrence of various semantic classes in the different categories stable (>90% same gender), dominant (90-50% same gender) and change (<50% same gender)
Distributions of the genders masculine, feminine and neuter for the classes stable (>90% same gender), dominant (90-50% same gender) and change (<50% same gender)
Heatmap of the genders alternans, commune, feminine, masculine, and neuter of semantic classes of lexemes in Indo-European lexical data (104 concepts, 105 languages).




This week's blog post will deal with a complex topic: gender assignment.
As I have described in a previous post, gender involves a classification of nominal entities in language. Gender can generally be defined as classes of nouns which are reflected in the behaviour of associated words (Corbett 1991: 1). That is, gender is indicated by agreement of various elements. Gendered languages vary in the number of genders they have, and they vary with respect to assignment, that is, how individual lexical items receive a gender (Audring 2014, 2017). Some languages assign gender based on semantic principles (semantic assignment systems), in which gender reflects categories such as biological sex or animacy. Other languages have formal assignment systems, which can be divided into morphological and phonological assignment (Corbett 1991: 7-8). Thus, gender assignment may be guided by semantic qualities (e.g., male/female, level of abstractness, shape), by morphological criteria (e.g., stem formation, inflection class, derivational suffixes), or by phonological criteria (e.g., word-final vowels or consonants). Languages may use semantic factors only, or a combination of semantic and formal factors, but all gender languages have a semantic core (Corbett 1991: 8).

When looking at gender assignment in Indo-European culture vocabulary (the 100-culture list of our database, consisting of 8,500 gender- and cognacy-coded lexical items), some interesting tendencies emerge. We cannot investigate the phonological and morphological assignment principles on the data in its current shape (words in languages have not been coded for morphology or phonology), but many other interesting tendencies can be extracted from the data.
First, the total distribution of genders of lexical items in the data is straightforward as masculine<feminine<neuter<alternans (see below). This is also reflected in the timeline of evolution of genders (see below), where we see that the masculine dominates in the early period, but weakens during the antique period and then regains strength during the first and in particular the second millennium CE, at the expense of the feminine and in particular the neuter.
We code all concepts for various semantic properties listed in the literature as important for gender assignment, such as animacy, collectiveness, countability, sexus, concreteness, and form/shape. In addition, we divide gender by different concept classes, which we derive from patterns of colexification and semantic change in the data.
We find that animate concepts (animals in our data) are significantly associated with the masculine gender (we compile both male and female forms of animals, but the overrepresentation of masculine for the general terms is important in the data). Further, we find that collectives as well as concepts coded as materials are significantly associated with the neuter gender. Our data does not contain abstract nouns, but surprisingly, we find that sharp and sticking implements are significantly associated with the feminine gender.
These tendencies for semantic properties underlie the overrepresentation of particular genders in certain semantic classes, which can be seen in the heatmap of gender distribution in relation to different classes above. In this heatmap, which divides concepts into classes, we can observe that neuter is overrepresented for metals and materials and for drink and drugs, masculine is overrepresented for all animals, and feminine is overrepresented for weapons, trees and insects (honeybee). This indicates that assignment is not just caused by semantic property; it is very likely also caused by semantic class, but more research and data are required to confirm this assumption.

Audring, Jenny (2014), 'Gender as a complex feature', Language Sciences, 43, 5-17.
--- (2017), 'Calibrating complexity: How complex is a gender system?', Language Sciences, 60, 53-68.
Carling, Gerd (2019), Mouton Atlas of Languages and Cultures. Vol. 1: Europe, Caucasus, Western and Southern Asia (Berlin - New York: Mouton de Gruyter).
Corbett, Greville G. (1991), Gender (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
--- (2014), The expression of gender [Electronic resource] (Berlin: De Gruyter Mouton).
Corbett, Greville G. and Fraser, Norman M. (2000), 'Gender assignment: a typology and a model', in Gunter Senft (ed.), Systems of Nominal Classification (Cambridge: Cambridge University Press), 293-325.
Corbett, Greville G. and Fedden, Sebastian (2016), 'Canonical Gender', Journal of Linguistics, 52 (3), 495-531.
Van Epps, Briana 2019. Sociolinguistic, comparative and historical perspectives on Scandinavian gender: With focus on Jamtlandic. PhD dissertation, Lund.
 
Distribution of the genders alternans, commune, neuter, feminine, and masculine in the dataset (lexemes of 104 concepts in 105 Indo-European languages)
Timeline of gender distribution in the lexical dataset (by Briana Van Epps).
MCA (Multiple Correspondence Analysis) plot of the typological gender data. Graph by Marc Tang.
Evolutionary reconstruction of gender in Indo-European is a highly interesting field. The subject is a perfect testbed for how well evolutionary methods generally work. The core issue is that the system that we reconstruct to Proto-Indo-European, a system with a commune/neuter distinction, which has developed into a sexus-based system (masculine/feminine/neuter) in most daughter branches, is preserved only in Anatolian (Hittite, Luwian), the oldest attested Indo-European branch. However, in Scandinavian and Dutch/Frisian, a commune/neuter system has re-emerged as a merger of a previous three-gender system. Therefore, on the surface, Anatolian and Scandinavian are similar, as we see from the MCA plot above, which indicates the synchronic similarities of Indo-European gender systems based on attested languages. However, the similarity between Scandinavian, Frisian/Dutch and Hittite/Luwian is an illusion, or - to use evolutionary terminology - an example of homoplasy. The background and the functionality of the different systems are completely different. How can we make evolutionary methods account for this difference in the reconstruction?
This is where we can test how well different models perform. Experiments (performed by our colleagues Chundra Cathcart, Harald Hammarström, and Marc Tang) indicate that the results of an evolutionary reconstruction are similar to the model of a comparative reconstruction (even if the method, of course, is completely different). What we want the evolutionary reconstruction to produce is a high probability of masculine/neuter at the root (i.e., Proto-Indo-European) and a lower probability of a feminine.
In experimenting with the data and different models, we find that the most important thing is the shape of the tree. For Indo-European, we get different results if we use a branched vs. non-branched tree, if we use an Indo-Anatolian vs. non-Indo-Anatolian tree, and if we use ancestry constraints vs. no ancestry constraints (with ancestry constraints, ancestral languages are situated on the branches of the tree and are not 'cousins' of the living languages). As for the model, we get different results depending on whether we use a Markov Chain Monte Carlo model or a Dollo model. A Markov Chain Monte Carlo model basically constructs a chain that has a desired distribution as its equilibrium distribution, so that one can obtain a sample of the desired distribution by recording states from the chain. A Dollo model has as its precondition that a system never returns exactly to its previous state, but keeps a trace of the intermediate stages through which it passes. A Dollo model with an Indo-Anatolian tree produces a reconstruction which looks quite similar to Anatolian. However, more experimenting needs to be performed: obviously, it is necessary to have a correct tree of a family before an evolutionary reconstruction can be performed. But some models of reconstruction may be better than others, depending on how they deal with the problem of homoplasy and parallel drift.
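To make the Markov Chain Monte Carlo idea concrete, here is a minimal sketch of a Metropolis sampler over three discrete states. It is not the model used in the experiments mentioned above; the state names, the target probabilities and the function `metropolis_sample` are invented purely for illustration.

```python
import random


def metropolis_sample(target, n_steps, seed=0):
    """Record states from a chain whose equilibrium distribution is `target`.

    `target` maps each state to its desired probability. Proposals are drawn
    uniformly from all states, which is a symmetric proposal, so the plain
    Metropolis acceptance ratio applies.
    """
    rng = random.Random(seed)
    states = list(target)
    current = states[0]
    samples = []
    for _ in range(n_steps):
        proposal = rng.choice(states)
        accept_prob = min(1.0, target[proposal] / target[current])
        if rng.random() < accept_prob:
            current = proposal
        samples.append(current)  # the state is recorded at every step
    return samples


# Hypothetical target distribution over three abstract states.
target = {"state_a": 0.6, "state_b": 0.3, "state_c": 0.1}
samples = metropolis_sample(target, 50_000)
freqs = {s: samples.count(s) / len(samples) for s in target}
print(freqs)
```

With enough steps, the recorded frequencies should approach the target probabilities, which is the sense in which the chain "has the desired distribution as its equilibrium distribution".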

 
The distribution of nominal gender systems in Indo-European languages.



I am taking up this blog again after a summer intermission. During the summer, I have been at the 24th International Conference on Historical Linguistics in Canberra and at the 52nd Annual Meeting of the Societas Linguistica Europaea in Leipzig. In both places I talked about one specific topic, which has attracted my interest recently: gender evolution and gender assignment, specifically in Indo-European.
In a couple of coming blogposts, I will talk specifically about this issue. The first post will deal with the morphosyntactic reconstruction of the Indo-European gender system.

First of all, how do we define gender? The typical way in which this is done is to use the definition of agreement, which is visible on an agreeing article, adjective or verb. Normally, the gender system of a language is described in grammars and reflected in the dictionary of the language. However, this definition does not work for pronominal gender, which is trickier. For defining pronominal gender, it is necessary to look at the occurrence of gendered forms in pronominal systems.

Gender is prototypically a property of nouns, and once the gender has been identified for all nouns in a language, an important issue is to try to define the underlying causes for gender assignment. There is plenty of research on this issue, both from a general typological perspective and with respect to individual languages. According to the canonical gender literature (Corbett 1991, 2013, Corbett and Fraser 2000), there are three basic principles according to which gender is assigned in languages. These are phonological, morphological and semantic. A fundamental problem is that these rules typically compete in languages.

What is the situation in Indo-European?

  • Most languages have gender (masculine, feminine, neuter).
  • No language has ”purely” phonological, morphological, or semantic assignment.
  • Diachrony apparently plays a role: many languages inherit larger or smaller parts of their gender system and gender assignment on nouns.
  • Most languages have competing rules for assignment.


The next issue is the reconstruction of Indo-European gender. For the reconstruction of the Indo-European gender system, based on a morphological reconstruction of systems in the various branches, there are three proposals in the literature. The option suggested by Hermann Hirt in the 1930s (Hirt 1934, 1937) was that Indo-European had no gender, and that a three-gender system developed later by means of grammaticalization. The reconstruction of Delbrück and Brugmann (Brugmann & Delbrück 1893, 1897, 1900) contained three genders, as in Sanskrit, Classical Greek and Latin, a system which was later either preserved or collapsed into a masculine-feminine or a common-neuter system. However, Brugmann and Delbrück were uncertain about the feminine gender, basically due to the formal correspondence in the reconstructed state of the feminine and the neuter (the -h2- suffix). Based on this formal similarity between the collective/neuter and the feminine, as well as the shape of the Anatolian system with a commune and a collective/neuter, later Indo-European scholars agree that Indo-European had a two-gender animate-inanimate system (which is reflected in the Anatolian system), which later developed into a sex-based gender system with an additional collective gender, the neuter (see Table 1) (Luraghi 2011, Matasović 2004).
Basically, the model of Hirt implies that gender evolved by grammaticalization, the Delbrück model that the three-gender system of Indo-European either remained or collapsed. However, we must remember that both these models were constructed before the discovery of Anatolian.
The mainstream model is based on an idea of a typological evolution of the gender systems, which moves from an animate-inanimate to a sexus-based system, which retains the difference between animacy in the masculine-feminine and the difference between abstract and concrete in the feminine-neuter (Table 1).

In brief, the mainstream model supposes that there is:

  • Trace of the old system in languages
  • Emergence of human~non-human distinction after the proto-language
  • Emergence of an abstract~concrete distinction of non-human gender after the proto-language
  • Later mapping into a sexus-based system with retention of the concrete inanimate (neuter)
  • Continuation of the ancient assignment principles in various languages

Table 1. The developmental phases of the Indo-European gender system according to the mainstream model (after Luraghi 2009).
Stage 1: ANIMATE | INANIMATE
Stage 2: HUMAN | ABSTRACT | CONCRETE
Stage 3: MASCULINE/FEMININE | FEMININE | NEUTER



The next step in this process is to find out what happens if an evolutionary model is used for the reconstruction (Cathcart, Carling et al 2018, Carling 2019). Gender reconstruction is an important question for evolutionary models, since the system reconstructed to Proto-Indo-European has been changed in most living languages (see Table 1).

I will discuss this issue in the next blogpost.


REFERENCES:
Brugmann, Karl and Delbrück, Berthold (1893), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 3, Vergleichende Syntax der indogermanischen Sprachen, T. 1 (Strassburg: Trübner).
--- (1897), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 4, Vergleichende Syntax der indogermanischen Sprachen, T. 2 (Strassburg: Trübner).
--- (1900), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 5, Vergleichende Syntax der indogermanischen Sprachen, T. 3 (Strassburg: Trübner).
Carling, Gerd (2019), Mouton Atlas of Languages and Cultures. Vol. 1: Europe, Caucasus, Western and Southern Asia (Berlin - New York: Mouton de Gruyter).
Cathcart, Chundra, et al. (2018), 'Areal pressure in grammatical evolution.', Diachronica, 35 (1), 1-34.
Corbett, Greville G. (1991), Gender (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
Corbett, Greville G. (2013), 'Gender typology', The Expression of Gender.
Corbett, Greville G. and Fraser, Norman M. (2000), 'Gender assignment: a typology and a model', in Gunter Senft (ed.), Systems of Nominal Classification (Cambridge: Cambridge University Press), 293-325.
Hirt, Hermann Alfred (1934), Indogermanische Grammatik. T. 6, Syntax, 1 : syntaktische Verwendung der Kasus und der Verbalformen (Heidelberg: Carl Winter).
Luraghi, Silvia (2011), 'The origin of the Proto-Indo-European gender system: Typological considerations', Folia Linguistica, 45 (2), 435-64.
Matasović, Ranko (2004), Gender in Indo-European (Heidelberg: Winter).

 

Pronominal gender systems in Indo-European languages.
Wordcloud of the texts from the blogposts of spring 2019.
The Swedish summer vacation is approaching, and I will go to Australia, among other things to attend the International Conference on Historical Linguistics in Canberra, 1-5 July. I will give two talks, one about the evolution and tendencies of gender assignment in Indo-European, and one about the evolution and change of alignment in Indo-European. After the summer intermission I will return and write more about these two topics in different posts.
However, I will try (if I have time and possibility) to make an overview of some of the interesting talks from the ICHL conference. Therefore, stay tuned! Thanks to all readers and have a nice summer!
Wordcloud of texts from the blogposts of autumn 2018.
Variation in definiteness marking in modern Eurasian languages
This blogpost will briefly introduce a highly interesting phenomenon in the history of Eurasian languages, namely the emergence of definiteness. Most ancient attested Indo-European languages do not have definiteness marking, but the phenomenon appears relatively early on in several languages, in various forms. The emergence of the various types of definiteness marking does not seem to be areally caused; rather, most of the variants emerge through internal pressure and grammaticalization. In addition, definiteness is not restricted to the Indo-European languages but occurs also in various forms in Caucasian families, in Turkic, as well as in some Uralic languages.
There are several types of definiteness marking, which typically co-occur in languages. One type is to have a non-bound definite article (as a special word class), as in German or English:
das Haus
DEF house
‘the house’

Another type is a bound definite marker, as in Scandinavian:
hus-et
house-DEF
‘the house’

A third type is definiteness marked on the adjective, as in Swedish:
det stor-a hus-et
DEF large-DEF house-DEF
‘the large house’

Definite marking can be obligatory, either at the end or at the beginning of a Noun Phrase, as in Bulgarian:
xubava-ta kniga
nice-DEF book
‘the nice book’

The ancient Indo-European languages lack definiteness marking, and this state has been preserved in a huge area of predominantly Slavic and Indo-Aryan languages. The emergence of the various forms of definiteness began already in ancient times, apparently independently and with large variation even within branches of the families, and escalated during the medieval period. A large part of the existing variation seems to be caused by parallel evolution. Still, the exact causes for the variation remain obscure.

Sources:
Bauer, Brigitte. 2007. "The definite article in Indo-European. Emergence of a new grammatical category?" In Nominal Determination. Typology, context constraints, and historical emergence, edited by Elisabeth Stark, Elisabeth Leiss and Werner Abraham, 103-139. Amsterdam-Philadelphia: John Benjamins.



 
Variation in definiteness marking in historical Eurasian languages. For the legend, see the map of modern languages above.
Probability levels of different types of definiteness marking in protolanguages, based on an evolutionary test using the data of the DiACL database.
The Tocharian A - Sanskrit bilingual A 387 (THT 1021). From https://www.univie.ac.at/tocharian/
Of the 3672 entries of the Tocharian A dictionary (Carling and Pinault to appear), 772 lemmas have been marked as “from Sanskrit”, which represents 21% of the entire vocabulary (of 1508 nouns, 338 are from Sanskrit, representing 22%). Of these 772 lemmas, 39 are marked as “via Middle Indic”, which represents 5% of the words borrowed from Sanskrit. Compared to Sanskrit loans, other source languages are marginal: there are 22 words marked as “from Middle Iranian”, 5 “from Chinese”, 10 “from Uighur”, 10 “from Prakrit”, and 4 “from Pali”.
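For readers who want to check the proportions, the percentages follow directly from the counts given above. The snippet below is just that arithmetic; the helper name `share_percent` is made up for the illustration.

```python
def share_percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


# Counts as reported for the Tocharian A dictionary in the text above.
sanskrit_of_all = share_percent(772, 3672)     # lemmas marked "from Sanskrit"
sanskrit_of_nouns = share_percent(338, 1508)   # nouns marked "from Sanskrit"
middle_indic = share_percent(39, 772)          # "via Middle Indic" among Sanskrit loans

for label, value in [
    ("from Sanskrit, all entries", sanskrit_of_all),
    ("from Sanskrit, nouns", sanskrit_of_nouns),
    ("via Middle Indic, of Sanskrit loans", middle_indic),
]:
    print(f"{label}: {value:.1f}%")
```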
What does this imply? First, and foremost, of course, that Sanskrit, or rather Buddhist Hybrid Sanskrit, plays a fundamental role in Tocharian literature. “From Sanskrit” means that a word has been borrowed from Classical Sanskrit (Monier Williams 1899) or Buddhist Hybrid Sanskrit (Edgerton 1953, Bechert, Waldschmidt, and Bongard-Levin 1996) with no other change than an adaptation to the morphological system according to the languages’ rules for adapting loans (Krause and Thomas 1960). “From Prakrit” or “from Pali” means that the word can be traced back to a source attested in Pali or Prakrit texts, which apparently is much more unusual than the other way round.
So, what type of changes are we talking about when we define words as “via Middle Indic” instead of just “from Sanskrit”? (Note that the examples below are from Tocharian A: there are also similar patterns in Tocharian B (Carling 2005)). Let us look at a couple of examples.
Some of the words are almost identical to the Sanskrit word, with little change:
  •  A pāruṣak (n.) ‘name of a mythical garden’, via Middle Indic from Sanskrit pāruṣyaka- ‘n. of one of the groves of trāyastriṃśa gods’ (BHSD:343b), as in Pali phārusaka- ‘name of one of Indra's groves’ (PED:478b).
  •  A kās* ‘Kāśa, a species of grass’, via Middle Indic from Sanskrit kāśa- ‘a species of grass’ (MW:280b).
 
In other lexemes, there are more far-reaching phonological changes, which were either taken over from the Middle Indic source word or took place in Tocharian. This remains unclear. Examples are:
  • A kurkal (n.) ‘bdellium, a medical ingredient’, via Middle Indic from Sanskrit gulgulu- ‘bdellium’ (MW:360b).
  • A klawe (n.m.) ‘die, throw of the die’, via Middle Indic from Sanskrit glaha-, originally ‘throw of the dice’, and individually ‘die’ (MW:374b).
  • A jar (n.m.) ‘topknot’, via Middle Indic from Sanskrit jaṭā- ‘the hair twisted together (as worn by ascetics)’ (MW:408a).
  • A tāpātriś (n.m.) ‘name of a class of gods’, via Middle Indic from Sanskrit trāya(s)-triṃśa- ‘name of a class of gods’, cf. Pali tāvatiṃsa (BHSD:257b).
  •  A patatam (adv.) ‘fortunate, gifted’, via Middle Indic from Sanskrit pradattam, neuter adv. from pradatta- ‘granted, bestowed, gifted’ (MW:679c).
  • A nātäk (n.m.) ‘lord’, via Middle Indic from Sanskrit nāthaka-, derived from Sanskrit nātha- ‘protector, patron, owner, lord’ (MW:534c).
This vocabulary, both in Tocharian A and B (which has a larger vocabulary), is very interesting. The lexemes were apparently not borrowed from the literary standard of Prakrit and Pali or from Buddhist Hybrid Sanskrit directly. Rather, they were borrowed from one or several local Indo-Aryan dialects, which became extinct, but which may be part of a general change in Middle Indo-Aryan leading to the dialectal diversity of Modern Indo-Aryan languages.
In addition, the boundaries between Indo-Aryan and Iranian in some of these lexemes are not sharp: the words may have been borrowed from Iranian, but since Indo-Aryan is much better attested (via Classical Sanskrit), an Indo-Aryan source becomes more likely.
 
A systematization of sound changes in these words would likely add knowledge to the evolution of sound changes in Middle Indo-Aryan leading to Modern Indo-Aryan. This will also help us to tease apart Iranian from Indo-Aryan borrowings in Tocharian.

Bechert, Heinz, Ernst Waldschmidt, and Grigorij Maksimovic Bongard-Levin. 1996. Sanskrit-Wörterbuch der buddhistischen Texte aus den Turfan-Funden. Beih. 6, Sanskrit-Texte aus dem buddhistischen Kanon: Neuentdeckungen und Neueditionen, 3. Folge. Göttingen: Vandenhoeck und Ruprecht.
Carling, Gerd. 2005. "Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language." In Proceedings of the Sixteenth Annual UCLA Indo-European Conference, edited by Karlene Jones-Bley, Martin E. Huld, Angela Vella Volpe and Miriam Robbins Dexter, 47-70. Washington: Institute of Man.
Carling, Gerd, and Georges-Jean Pinault. to appear. A Dictionary and Thesaurus of Tocharian A. Wiesbaden: Harrassowitz.
Edgerton, Franklin. 1953. Buddhist hybrid Sanskrit grammar and dictionary, William Dwight Whitney linguistic series: Yale U.P.; Oxford U.P.
Krause, Wolfgang, and Werner Thomas. 1960. Tocharisches Elementarbuch. B. 1, Grammatik. Heidelberg.
Monier Williams, Monier. 1899. A Sanskrit-English Dictionary : Etymologically and philologically arranged with special Reference to Cognate Indo-European Languages. Oxford: At the Clarendon Press.
 
 
Sogdian manuscript from Dunhuang Mogao (Source: http://idp.bl.uk)
During the period of the first centuries BCE, the impact of Iranian becomes important in Tocharian. This is something we know from the relatively large number of loanwords in Tocharian from various Iranian languages, beginning with one or several unknown Old Iranian dialects (which are not Avestan or Old Persian) and continuing with loans from various known Middle Iranian languages, such as Khotanese, Sogdian, and Bactrian. As usual with loans, the exact match of the source word is seldom found, meaning that the exact source language cannot be identified.
Iranian loans in Tocharian are interesting from the viewpoint of their semantic domains, which are indicative of the cultural impact of the Iranians on the Tocharians in Central Asia.

A majority of the words refer to administrative concepts, e.g., titles or specific concepts of merchandise or administration, indicating that the Iranians influenced the Tocharians by imposing an administrative infrastructure. Examples are:
  • Tocharian B waipecce 'possession', from Old Iranian, Avestan xʷaēpaiθya- 'own'
  • Tocharian B waipte 'separately, apart' < Common Tocharian *wai-pätæ, borrowed probably from an adjective, Old Iranian *hwai-pati in the sense of 'independent, oneself’.
  • Tocharian A pärko, B pärkau 'advantage, profit, interest' < Common Toch. *pärkāwV, borrowed from Old Bactrian, Bactrian φρογαοο 'profit', Old Iranian *fragāwa-, Sogdian prγ'w, βry'w, Parthian frg'w 'treasure'.
  • Tocharian A pare, B peri 'debt' < Common Tocharian *pæräī is borrowed from Old Bactrian *pāra > Bactrian paro 'debt, obligation, loan, amount, due'.
  • Tocharian A  āpṣātrik* ‘citizen of a borough or market-town’, borrowed from Old Iranian *αβþαρο < *api-xšaθra- ‘borough, sub-district (of a city)’.
 
Other words clearly refer to military concepts, such as values or terms for weapons:
  • Tocharian B tsain 'arrow' from an Old Iranian *dzaina-, Avestan zaēna- 'weapon'.
  • Tocharian A āmāṃ B amāṃ ‘pride, arrogance’, loan from Middle Iranian, cf. Buddhist Sogdian ’’m’n ‘power’.
  • Tocharian A āṣāṃ B aṣāṃ ‘worthy’, borrowed from Middle Iranian, cf. Khotanese āṣaṇa- ‘worthy’.
  • Tocharian A āṣānik B āṣānike ‘venerable, worthy of respect’, loan from Middle Iranian, with the same source as A āṣāṃ B aṣāṃ
  • Tocharian A senik ‘care, pledge, guarantee’, from Middle Iranian *zēnik (Khot. ysīnīta, Sogd. zynyh, Kroraina Prakrit jheniya-)
 
A number of words refer to farming and the household. Examples are:
  • Tocharian AB ās ‘she-goat’, borrowed from Middle Iranian.
  • Tocharian A kātak* B kattāke ‘master of the house, householder’, from Common Tocharian *kāttākǝ borrowed from Middle Iranian, cf. Khotanese ggāṭhaa, itself borrowed from Middle Indic, cf. Gāndhārī Prakrit *ghahaṭha, from Sanskrit gṛhastha-.
  • Tocharian A miṣi B miṣṣe, miṣṣi ‘field’, borrowed from Khotanese mäṣṣa, miṣṣa ‘field for seed’.
 
A small number of words are Buddhist terms (normally, the impact of Sanskrit is enormous on both Tocharian languages here). Examples are:
  • Tocharian A pissaṅk ‘community of monks’, from Middle Iranian from Skt. bhikṣusaṃgha-  ‘Mönchsgemeinde, Mönchsorden’ (SWTF III:298b), cf. Khotanese bisaṃga-.
 
Finally, we have a group of words referring to plants and ingredients which are unfamiliar to the Tocharian flora (also here, Sanskrit loans are much more common). Examples are:
  • Tocharian A kārāś B karāśe*, via Tocharian B from Khotanese karāśśa ‘climbing plant’.
  • Tocharian A kuñcit B kwäñcit, kuñcit, from Khotanese kuṃjsata- ‘sesame’.

In conclusion, the Iranian impact on Tocharian is mainly pre-Buddhist and concerns concepts of administration, warfare, and farming. With the change to Buddhism, the impact of Old and Middle Aryan becomes completely dominant in both Tocharian languages.
 
The words have been extracted from these sources:
Carling, Gerd (to appear). A Dictionary and Thesaurus of Tocharian A. Complete Edition. In collaboration with Georges-Jean Pinault. Wiesbaden: Harrassowitz (610 p.).
Carling, Gerd (2005). Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language. In: Jones-Bley, Karlene, Huld, Martin E., Volpe, Angela Vella, Dexter, Miriam Robbins (eds.), Proceedings of the Sixteenth Annual UCLA Indo-European Conference. Journal of Indo-European Studies. Monograph Series 50, 47-70.
 
These sources have many references to works by, e.g., Georges-Jean Pinault, K T Schmidt, Werner Winter, Nicholas Sims-Williams, Harold Bailey, L Isebaert, Jörundur Hilmarsson.
Words for Easter in Eurasian languages, defined by their meaning.
This week, also known as the Holy Week, is part of the holiday that in English goes by the name of Easter. Easter, which is celebrated throughout the Judaeo-Christian world, is one of the most important festivities of the year, marking the beginning of spring or summer and the resurrection of Christ. Like most Christian holidays, Easter has roots that go back to pagan times. In Northern Europe in particular, many of the mysterious customs of an ancient spring festival have survived until today. Children chase an invisible Easter hare, which leaves candy-filled eggs in the grass. Birch twigs are gathered, taken indoors, and decorated with painted eggs and feathers. Children also dress as witches or 'easterhubbies' (the difference is whether you wear a scarf or a hat), paint their faces with red dots, and go from door to door asking for candy. Afterwards, they are supposed to fly on their brooms to Brocken. Fires and fireworks are lit, and, most importantly, enormous quantities of eggs, fish, meat, and candy are consumed.

So, which terms do we use for this festival? Most languages have a form of the Greek (via Latin) word paskha, itself borrowed from Aramaic (Hebrew Pesach), meaning 'passover'. The West Germanic terms, such as English Easter and German Ostern, go back to a Common Germanic goddess of spring, Old English Eastre, who is identical to the Indo-European goddess of dawn *h2éus-ōs (Sanskrit uṣās, Latin aurōra). Other languages have words that in various ways relate to the basically biblical rituals of Easter, including 'sacrificial animal', 'taking of the meat', 'resurrection', 'great day' or 'great night', or 'liberation'.

Just as with the Christmas words (see http://www.gerdcarling.se/i/a32842142/2018/12/), the map of meanings of Easter unveils important information about various cultural spheres, as well as exceptions in the form of islands of different usage.

With this little etymological overview I would like to wish you all a Happy Easter!

Sources:
Lubotsky, Alexander. Brill Online Dictionaries: Indo-European Etymological Dictionaries Online (https://dictionaries-brillonline-com.ludwig.lub.lu.se/iedo). Accessed 2019-04-17.
Troels-Lund 1932. Dagligt liv i Norden på 1500-talet. VII Årets fester. Stockholm: Bonniers.
Andersson et al 1968. Kulturhistoriskt Lexikon för Nordisk Medeltid XIII. Malmö: Allhems förlag.

I thank Ante Petrović for assistance with compiling/checking data for the Easter map.

Wikipedia has an excellent overview of names of Easter: https://en.wikipedia.org/wiki/Names_of_Easter
Heatmap of frequency of source and target language of loan events in our data, defined by language power and population size (from 1-5). Graph by Johan Frid.

In the previous blogpost, I started a compilation of safe loans from and into Tocharian. I will continue this work in the next post. In this post, I will talk about loan directionality, since I am currently completing a paper (with several co-authors) on lexical borrowability in Eurasian languages. I want to say a few words about this project.

We have compiled and extracted all loan events in the lexical database, and tested various statistical measures on this data. Worth noting are the directionality of loans in relation to language power, and the different source languages of the families. As I have described in recent posts, our lexical data set covers culture concepts, i.e., words for farming, technology, hunting, and war, with a presumed age that goes back at least to the Chalcolithic. This means that the vocabulary is not representative of the entire lexicon, only of these specific domains. The loans also extend over long periods, at least back to antiquity. If we look at the source languages, we notice that they differ between families. In Indo-European, Latin is most frequent, followed by Middle Low German, French, Old French, Slavic, and Classical Greek. In Caucasian, Turkic languages dominate, followed by Persian, Georgian, and Arabic. In Uralic, Scandinavian languages dominate, which is mainly due to the fact that Fenno-Ugric languages dominate in our Uralic data (see pictures below).

The correlation between loan directionality and language power and population size is also noteworthy. We define the power of languages by a quantitative rank based on several features, including literary power, economic power, and population size. We plot this rank against the occurrence of each language as source and target language in loan events (see graph above). All languages are equally likely to be target languages, but the most powerful languages are more likely to be source languages. This is a significant correlation. The most frequent loan event is from a very powerful language to a very weak one. The second most frequent loan event is from a medium powerful language to a weak one. The third most frequent loan event is from a medium powerful language to another medium powerful language. In scrutinizing the data, we observe that this third type of loan event is almost entirely restricted to the Middle Ages, which is also an interesting result. Inequality between languages seems to be specific to the antique and modern periods, whereas language contact in the Middle Ages was more evenly distributed between languages of equal power.
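
For readers who want to try this kind of tabulation themselves, here is a minimal sketch in Python of how loan events can be counted by the power rank of the source and target language, which is the table that a heatmap like the one above visualizes. It is not the code used in our project: the language names, the power ranks, and the loan events are all made-up placeholders, and the function name power_matrix is hypothetical.

```python
from collections import Counter

# Hypothetical power ranks (1 = weakest, 5 = most powerful).
# The language names are placeholders, not languages from the database.
power = {"lang_a": 5, "lang_b": 3, "lang_c": 1, "lang_d": 3}

# Hypothetical loan events as (source language, target language) pairs.
loan_events = [
    ("lang_a", "lang_c"),
    ("lang_a", "lang_c"),
    ("lang_b", "lang_c"),
    ("lang_b", "lang_d"),
    ("lang_a", "lang_b"),
]

def power_matrix(events, ranks, levels=5):
    """Count loan events per (source rank, target rank) cell."""
    counts = Counter((ranks[src], ranks[tgt]) for src, tgt in events)
    # Row index = source rank - 1, column index = target rank - 1.
    return [
        [counts[(s, t)] for t in range(1, levels + 1)]
        for s in range(1, levels + 1)
    ]

matrix = power_matrix(loan_events, power)
for source_rank, row in enumerate(matrix, start=1):
    print(source_rank, row)
```

Each row is a source rank and each column a target rank, so a large count in the last row and first column corresponds to loans from a very powerful language into a very weak one.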

Graph illustrating the most frequent source languages in Indo-European (top), Caucasian (middle), and Uralic (bottom) families.

LAB/infrastructure

You are welcome to visit the DiACL infrastructure and lab. All data is open access and free for everyone to use!