Historical linguistics blog - even weekends

2019

What is the relation between universal patterns, frequency of words and forms, and language evolution and change? This is a question that is very little researched.
I have decided to move the updating of this blog to even weekends instead of Thursdays. Thursday is very often an extremely busy day, with no time left to update or complete blogposts for publication.

In this blogpost I will continue the previous topic of principles of language change. In historical linguistics, the principle of the particular status of the most frequent words and grammatical forms of a language is well known. The most frequent lexemes and grammatical categories are more resistant to change. Lexemes such as kinship words, body parts, numerals, fire, water, liver, and so forth typically preserve more archaic paradigms, which may resist change for millennia. The most frequent adverbials and particles even resist phonological erosion and change. The most frequent verbs, such as 'to be' or 'to become', are typically irregular, and archaic inflection patterns and archaic categories, such as tenses, modalities, and aspectual categories, survive in these verbal stems. On the other hand, less frequent words, such as various verbs, nouns, and adjectives, are much more often affected by analogy and other types of change that harmonize and simplify language structures, making them easier to memorize.

However, few studies investigate this from an evolutionary perspective, using phylogenetic methods. As shown by Pagel et al. (2007), there is a correlation between lexical substitution and frequency in basic vocabulary. The most frequent words generally have lower substitution rates.

Frequency is very important in explaining cross-linguistic universal patterns, among others the morphological marking hierarchies found in languages. More frequent categories, such as singular (in relation to plural), agent (in relation to object), and present (in relation to past), are unmarked in relation to their less frequent counterparts, which are marked. This theory, known as markedness theory (which has many exceptions in individual languages), can to a large degree be explained by frequency (Greenberg 1966, Croft 1993, 2003).

In a current study I wanted to investigate the correlation between frequency and rates of change in grammar, focusing on the Indo-European family. I compiled a sample of grammatical categories of word order, nominal morphology, verbal morphology and tense, and organised the properties into hierarchical pairs according to the relations present < past, pronoun < noun, agent < object, and masculine/feminine < neuter, which are well-known, universal, hierarchical relations observed in a large number of languages. By means of an evolutionary model (performed by Chundra Cathcart), in which transition rates between property states over a tree were reconstructed, we extracted the average number of transitions (per 1000 years) between each grammatical property in our data.
When the results were split up into pairs of marking hierarchy, as mentioned above, it turned out that the rates of change in the lower categories (i.e., the less frequent ones from a universal perspective) were higher. The rates of the higher categories (i.e., the more frequent ones from a universal perspective) were lower. The difference was statistically significant (p ≤ 0.005). Even if this study is based on only one family (Indo-European), 149 languages, and about 100 properties, it seems likely that frequency also affects language change in grammar. This would explain why more frequent grammatical categories preserve more archaic patterns over time.
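The comparison described above can be illustrated with a small sketch. The post does not say which statistical test was used, so the exact sign-flip permutation test below is an assumption, and the four rate values are invented placeholders, not results from the study.

```python
from itertools import product

# Hypothetical transition rates (changes per 1000 years) for each hierarchy
# pair, given as (higher-ranked category, lower-ranked category).
# These numbers are invented for illustration; they are NOT the study's data.
toy_rates = {
    "present < past": (0.10, 0.25),
    "pronoun < noun": (0.08, 0.20),
    "agent < object": (0.12, 0.18),
    "masculine/feminine < neuter": (0.15, 0.30),
}


def sign_flip_test(rates):
    """Exact one-sided sign-flip permutation test on paired differences.

    Returns the observed mean difference (lower minus higher category) and
    the share of sign assignments whose mean is at least as large.
    """
    diffs = [lower - higher for higher, lower in rates.values()]
    observed = sum(diffs) / len(diffs)
    hits = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to be
    # positive or negative, so every sign pattern is enumerated.
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
        total += 1
    return observed, hits / total


mean_diff, p_value = sign_flip_test(toy_rates)
print(f"mean difference: {mean_diff:.3f}, one-sided p: {p_value:.4f}")
```

With only four pairs the smallest attainable one-sided p-value is 1/16 = 0.0625, so a toy sample of this size could never reach a threshold such as 0.005; a real analysis would need many more paired properties.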

Text has been updated 2019-03-11
Marking hierarchies of grammatical properties observed in the literature. After (Bickel 2008; Comrie 1981; Croft 2003; Dixon 1979)
I am currently travelling, so this blogpost will only very briefly discuss the topic of my current research in grammar reconstruction: the role of marking hierarchies in language change.
The notion of marking hierarchies has its roots in the markedness theory of Roman Jakobson and implies that grammatical categories (e.g., singular - plural) typically stand in a mutual, hierarchical relation, where one of the categories is morphologically unmarked, whereas the other is morphologically marked. The unmarked category thus has a higher position within a hierarchy of grammatical properties (singular < plural). These grammatical relations are, according to some authors, general, or "universal", anchored in our inborn grammatical system. However, we know that this is a problematic notion: there is a substantial number of languages where the actual morphological marking contradicts the proposed markedness hierarchies. Further, not all languages have morphology. Morphological marking alone cannot be the identifier of marking hierarchies.
On the other hand, there is an obvious connection between the observed marking hierarchies and frequency. Superior categories, "unmarked" in traditional markedness theory, are more frequently used in speech and in text. Again, the definition may be problematic, since not all languages have corpora that enable a detailed study of category frequency. Also, marking hierarchies based on frequency may contradict marking hierarchies based on general observations of morphological marking.
My current study on grammar reconstruction, which I have been writing about in several blogposts, indicates a clear correlation between change rates and marking hierarchies: superior categories, which are more frequent in grammar and most likely to be grammatically unmarked, have substantially lower change rates (and a slower pace of change) than inferior categories, which have higher change rates (and a faster pace of change). I will follow up on this topic in a coming blogpost.

References
Bickel, Balthasar (2008), 'On the scope of the referential hierarchy in the typology of grammatical relations', in Greville G. Corbett and Michael Noonan (eds.), Case and Grammatical Relations: Studies in honor of Bernard Comrie (Amsterdam - Philadelphia: John Benjamins), 191-210.
Comrie, Bernard (1981), Language universals and linguistic typology: syntax and morphology (Oxford: Blackwell).
Croft, William (2003), Typology and universals (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
Dixon, Robert M. W. (1994), Ergativity [Electronic resource] (Cambridge: Cambridge University Press).
--- (1997), The Rise and Fall of Languages [Electronic resource].
--- (2010a), Basic linguistic theory. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
--- (2010b), Basic linguistic theory [Electronic resource]. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
Tocharian B text THT 496, in cursive script, containing a literary poem, "Love letter". From CEToM database.
This blogpost will give an overview of my popular lecture earlier this week on the role of patterns in syntax, grammar and literature for the deciphering of ancient languages (link to the lecture below, in Swedish).

My own experience of deciphering ancient languages is basically restricted to Tocharian. Even so, Tocharian texts can be very difficult to understand, in particular if parallel texts in Sanskrit, Khotanese, or Uighur (the most frequent translation languages for Tocharian) are absent.

Deciphering of ancient languages basically uses three instruments: script, language (lexicon and grammar), and literature. Reading the script is fundamental to understanding the content, and even at a stage where the content of a manuscript is known, there is often reason to go back to the manuscript and check the reading, which may open up new interpretations and a renewed understanding of the content of the text. In the case of Tocharian, the script (North-Turkestanic Brahmi script) is relatively well known, even though there are some Tocharian B texts in cursive script that are very complex and difficult to interpret. On the other hand, almost all Tocharian texts are fragmentary in some respect (burned, broken, etc.), which means that lacunae have to be filled in and reconstructed. Part of this reconstruction is to interpret the characters at manuscript edges, which may be cut or damaged. This means that even if the script is known, the work of a philologist still involves a substantial amount of manuscript reading.

Interpreting lexicon and grammar may involve substantial problems if the language is not well known. In the case of Tocharian, the broken contexts, again, create large difficulties when we study syntax. Morphology is easier: paradigms can be established and reconstructed from forms found in texts, and few forms are missing from the Tocharian paradigms. However, syntactic constructions require a larger corpus of complete sentences, and in a language such as Tocharian, it is often a problem to find enough complete sentences (that are not restored) for certain constructions, for instance in combination with a specific verb.
The lexicon has its own difficulties. In a language like Tocharian, the absence of close relatives is a problem (Tocharian descends immediately from the Indo-European proto-language). If an unknown word is found in a text, we may assume a meaning based on the meaning of a presumed cognate in another Indo-European language. However, the connection to the presumed cognate may be a complete mistake, and the meaning of the lexeme, as well as its etymology, may be something entirely different.

This brings us to the third category, literature. Besides script, literature is probably the most important of the instruments mentioned at the beginning of this text. The exact meaning of words, which forms the basis for a correct interpretation of a text, is highly dependent on the possibility of "proving" the content by a parallel or bilingual text. Most Tocharian texts are translations from Sanskrit, but besides that, Tocharian had its own literary tradition. Therefore, the exact source of a text can be difficult to trace. Some texts do not have any source texts at all. Since Tocharian, like any other literary language, is constrained by its literary tradition, the identification of parallel patterns in, e.g., Sanskrit literary sources is highly important to a proper understanding of the content and a correct translation of the lexical meanings and the syntax.

Link to a public lecture at Filosoficirkeln, Lund, about deciphering ancient languages.
This week's blogpost will continue the thread about grammatical reconstruction, with some thoughts on lineage versus areality in grammar change. 

In general, change in grammar is supposedly cyclic (or spiral-shaped, according to some researchers): over time, typological organizations of features in systems recur or are re-established. We may look at this issue from both a long-term and a short-term perspective. One issue is whether a feature is inherently likely to be homologous (a similarity depends on inheritance only) or homoplastic (a similarity depends on internal or areal pressure, caused by various factors). Another is whether a similarity is caused by areal pressure or by lineage. A construction or a feature may be indicative of all of these processes. For instance, a feature like word order is by nature homoplastic (similarities in word order may be due to areal or internal pressure, such as change in the order of meaningful elements), but even then, a word order feature may be due to lineage: it has been inherited generation after generation, or it is a critical innovation restricted to a specific sub-branch of a tree. Take for instance the verb-initial order in Celtic languages: it is likely that this feature was caused by internal pressure in the verbal paradigm (McCone 1987). Because of this, verb-initiality is a feature which is restricted to the Celtic sub-branch and therefore a homologous innovation of this specific branch, not caused by areal pressure. The feature is entirely independent of other Eurasian verb-initiality. Another example is the Germanic have-perfect. It is a homoplastic typological feature (expressing perfect by an auxiliary construction), which still uses the same cognate root as the auxiliary, the verb *haban. The process took place independently in all Germanic languages, due to parallel drift and possible areal pressure. As before, it is difficult to distinguish areality from lineage.

Very interesting is the process of Indo-European alignment change, from the proto-language to the daughter branches. It is quite evident that the reconstructed language bears morphological traces of a semantically based system, similar to active-stative systems, as has been suggested by several scholars (Bauer 2000). But does this mean that Proto-Indo-European was an active language? Probably not. This concerns the question of the stability of systems in general versus language-internal variation in tendencies towards other systems. Indo-European alignment took three pathways of change: towards ergativity in the South-East, neutral marking in the West, and a preservation of the ancient system in between (roughly). What is the areal pressure component here, which changes depend on internal processes in languages, and what is the role of the residual morphology? These are questions that remain to be answered.

McCone, Kim (1987), The early Irish verb (Kildare: Maynooth). 
Bauer, Brigitte (2000), Archaic syntax in Indo-European: the spread of transitivity in Latin and French (Trends in Linguistics. Studies and Monographs, 125; Berlin: Mouton de Gruyter).
Berthold Delbrück (1842-1922)
The current post is about something that I am involved in right now: the reconstruction of grammar. In comparative linguistics, grammar can be reconstructed to a proto-language on the basis of the forms and functions in daughter languages. For instance, if there is a dative case in several languages with a specific marker that can be reconstructed to the joint proto-language, and this form has the function of dative in all languages, then it is likely that the function of this marker was dative in the proto-language as well. However, the reality is often much more complex than that. Often, the function of a marker differs between daughter languages: in our case above, we may have genitive or ablative instead of dative, and since we don't know if a genitive is more likely to become a dative or the other way round, we cannot reconstruct the original, proto-language function of this specific marker. The problem is known as the "correspondence problem" and is a matter of controversy in syntactic reconstruction in general (Roberts 2007) (see picture below).
The issue is particularly prominent in the reconstruction of Proto-Indo-European syntax, where many categories of the ancient languages, such as Sanskrit, Tocharian, and Greek, are absent in Anatolian, which, on the other hand, has a high number of other categories considered to be highly archaic.

In recent years, scholars have tried to approach this problem by using evolutionary and phylogenetic methods (Maurits and Griffiths 2014, Dunn et al. 2017, Cathcart et al. 2018). The probability of presence of a specific feature at ancestral nodes is estimated, based on gains (0 -> 1) and losses (1 -> 0) of features over a reference tree (lexical or hand-crafted). As expected, the method requires some adjustment to get reliable and reproducible results. One adjustment is to treat grammatical properties as logically dependent (which is a very tricky and complex matter); another is to use ancestry and clade constraints on trees, in order to avoid unnecessary noise in the results.
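As a rough illustration of how such a reconstruction might work, the sketch below assumes a two-state (absent/present) continuous-time Markov model with one gain rate and one loss rate, and computes the probability of presence at the root with Felsenstein's pruning algorithm. The tree, branch lengths, and rates are hypothetical; the post does not specify the actual model or software used in the cited studies.

```python
import math


def transition_probs(gain, loss, t):
    """2x2 matrix P[i][j]: probability of going from state i to j in time t.

    State 0 = feature absent, state 1 = feature present.
    `gain` is the 0 -> 1 rate and `loss` the 1 -> 0 rate.
    """
    total = gain + loss
    pi1 = gain / total  # stationary probability of presence
    pi0 = loss / total  # stationary probability of absence
    decay = math.exp(-total * t)
    return [
        [pi0 + pi1 * decay, pi1 * (1 - decay)],
        [pi0 * (1 - decay), pi1 + pi0 * decay],
    ]


def partial_likelihood(node, gain, loss):
    """Likelihood of the data below `node` for each possible state at `node`.

    A leaf is an int (observed 0 or 1); an internal node is a list of
    (child, branch_length) tuples.
    """
    if isinstance(node, int):
        return [1.0 if node == state else 0.0 for state in (0, 1)]
    like = [1.0, 1.0]
    for child, length in node:
        p = transition_probs(gain, loss, length)
        child_like = partial_likelihood(child, gain, loss)
        for state in (0, 1):
            like[state] *= (p[state][0] * child_like[0]
                            + p[state][1] * child_like[1])
    return like


def root_presence_probability(tree, gain, loss):
    """Posterior probability that the feature was present at the root,
    using the stationary distribution as the root prior (an assumption)."""
    like = partial_likelihood(tree, gain, loss)
    prior1 = gain / (gain + loss)
    weighted1 = prior1 * like[1]
    weighted0 = (1 - prior1) * like[0]
    return weighted1 / (weighted1 + weighted0)


# Hypothetical tree: four languages, three with the feature (1), one without.
# Branch lengths are in units of 1000 years.
toy_tree = [([(1, 1.0), (1, 1.0)], 1.0), ([(1, 1.0), (0, 1.0)], 1.0)]
print(root_presence_probability(toy_tree, gain=0.2, loss=0.2))
```

With equal gain and loss rates, the three languages that have the feature pull the root probability above 0.5. Changing the rates or the branch lengths changes how strongly the daughter languages constrain the root.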

However, even if evolutionary and phylogenetic methods are much more sophisticated than traditional methods in terms of amounts of data and number of calculations, the programs rest on the same principle as that seen in the correspondence problem. If most of the daughter languages have a specific property, then it is likely that this property was also there in the proto-language. If there is a rooted outgroup with another function, then the probability that this function was present at the proto-language stage is increased.
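The outgroup effect can be made concrete with Fitch parsimony, a much simpler technique than the probabilistic models discussed in this post and used here only as a stand-in. The two toy trees and the function labels A and B are hypothetical.

```python
def fitch_root_states(node):
    """Bottom-up pass of Fitch parsimony on a binary tree.

    A leaf is a string (the observed function); an internal node is a
    tuple of exactly two children. Returns the set of most parsimonious
    states at `node`.
    """
    if isinstance(node, str):
        return {node}
    left, right = (fitch_root_states(child) for child in node)
    common = left & right
    # Keep the shared states if there are any, otherwise keep all candidates.
    return common if common else left | right


# A is more frequent and nothing in the tree shape speaks against it.
majority_tree = (("A", "A"), ("A", "B"))

# A is still more frequent (3 vs 2), but B is found in the outgroup and in
# the first lineage to branch off inside the ingroup.
outgroup_tree = ("B", ("B", ("A", ("A", "A"))))

print(fitch_root_states(majority_tree))  # {'A'}
print(fitch_root_states(outgroup_tree))  # {'B'}
```

In the second tree, a single change from B to A on the branch leading to the three A languages explains the data, whereas reconstructing A at the root would require two separate changes to B.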

Currently, I am working with a dataset for Indo-European, which is used to reconstruct the probabilities that grammatical features were present at the ancestral state of Proto-Indo-European (the statistical analysis was performed by Chundra Cathcart, University of Zurich). The results are astonishing: with very few exceptions, the program reconstructs high probabilities for the grammatical features that were reconstructed for Proto-Indo-European by the Neogrammarians (Brugmann & Delbrück 1893, 1897, 1900). The reconstruction of Proto-Indo-European grammar by the Neogrammarians was done before the discovery of Hittite and Tocharian, which changed the preconditions for the typological reconstruction of the proto-language grammar to a high degree. Even though Tocharian and Anatolian are included in the data, this does not change the Neogrammarian reconstruction of Proto-Indo-European grammar. I will have reason to come back to this issue in further blogposts.

References:
Brugmann, Karl and Delbrück, Berthold (1893), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 3, Vergleichende Syntax der indogermanischen Sprachen, T. 1 (Strassburg: Trübner).
--- (1897), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 4, Vergleichende Syntax der indogermanischen Sprachen, T. 2 (Strassburg: Trübner).
--- (1900), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 5, Vergleichende Syntax der indogermanischen Sprachen, T. 3 (Strassburg: Trübner).
Cathcart, Chundra, et al. (2018), 'Areal pressure in grammatical evolution', Diachronica, 35 (1), 1-34.
Dunn, Michael, et al. (2017), 'Dative Sickness: A Phylogenetic Analysis of Argument Structure Evolution in Germanic', Language: Journal of the Linguistic Society of America, 93 (1), e1-e22.
Harris, Alice C. and Campbell, Lyle (1995), Historical syntax in cross-linguistic perspective (Cambridge Studies in Linguistics, 74; Cambridge: Cambridge University Press).
Maurits, Luke and Griffiths, Thomas L. (2014), 'Tracing the roots of syntax with Bayesian phylogenetics', Proceedings of the National Academy of Sciences, 111(37), 13576-81.
Roberts, Ian G. (2007), Diachronic syntax (Oxford Textbooks in Linguistics; Oxford: Oxford University Press).
The principle of evolutionary reconstruction. Gains and losses are measured against a reference tree (lexical/hand-crafted), resulting in a probability of presence at ancestral nodes.
Representation of the correspondence problem. In the figure at the top, A is more likely than B, but in the figure below, B is more likely, despite A being more frequent. This principle is applied by evolutionary methods.

LAB/infrastructure

You are welcome to visit the DiACL infrastructure and lab. All data are open access and free for everyone to use!