linguistics blog

2019 > 09

Heatmap of the genders alternans, commune, feminine, masculine, and neuter of semantic classes of lexemes in Indo-European lexical data (104 concepts, 105 languages).




This week's blog post will deal with a complex topic: gender assignment.
As I have described in a previous post, gender involves a classification of nominal entities in language. Gender can generally be defined as classes of nouns which are reflected in the behaviour of associated words (Corbett 1991: 1). That is, gender is indicated by agreement of various elements. Gendered languages differ in the number of genders they have, and they also differ with respect to assignment, that is, how individual lexical items receive a gender (Audring 2014, 2017). Some languages assign gender based on semantic principles (semantic assignment systems), in which gender reflects categories such as biological sex or animacy. Other languages have formal assignment systems, which can be divided into morphological and phonological assignment (Corbett 1991: 7-8). Thus, gender assignment may be guided by semantic qualities (e.g., male/female, level of abstractness, shape), by morphological criteria (e.g., stem formation, inflection class, derivational suffixes), or by phonological criteria (e.g., word-final vowels or consonants). Languages may use semantic factors only, or a combination of semantic and formal factors, but all gender languages have a semantic core (Corbett 1991: 8).

When looking at gender assignment in Indo-European culture vocabulary (the 100-culture list of our database, consisting of 8,500 gender- and cognacy-coded lexical items), some interesting tendencies emerge. We cannot investigate phonological and morphological assignment principles with the data in their current form (the words have not been coded for morphology or phonology), but several semantic tendencies can be extracted from the data.
First, the overall distribution of genders across lexical items in the data is straightforward, in descending order of frequency: masculine > feminine > neuter > alternans (see below). This is also reflected in the timeline of the evolution of genders (see below), where we see that the masculine dominates in the early period, weakens during antiquity, and then regains strength during the first and, in particular, the second millennium CE, at the expense of the feminine and, in particular, the neuter.
We code all concepts for various semantic properties listed in the literature as important for gender assignment, such as animacy, collectiveness, countability, biological sex, concreteness, and form/shape. In addition, we divide the concepts into classes, which we derive from patterns of colexification and semantic change in the data.
We find that animate concepts (animals in our data) are significantly associated with the masculine gender (we include both male and female terms for animals, but the masculine is notably overrepresented among the general terms). Further, we find that collectives, as well as concepts coded as materials, are significantly associated with the neuter gender. Our data does not contain abstract nouns, but, surprisingly, we find that sharp and pointed implements are significantly associated with the feminine gender.
These tendencies for semantic properties underlie the overrepresentation of particular genders in certain semantic classes, which can be seen in the heatmap of gender distribution by class above. In this heatmap, which divides concepts into classes, we can observe that the neuter is overrepresented for metals and materials and for drink and drugs, the masculine is overrepresented for all animals, and the feminine is overrepresented for weapons, trees, and insects (honeybee). This indicates that assignment is driven not only by semantic property but very likely also by semantic class, although more research and data are required to test this assumption.
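To make the notion of overrepresentation concrete, here is a minimal sketch of one common way to quantify it: Pearson residuals of a class-by-gender contingency table, where a clearly positive residual means a gender occurs more often in a class than the row and column totals alone would predict. The counts, the class labels, and the function name `pearson_residuals` are invented for illustration; they are not taken from our dataset, and this is not necessarily the procedure behind the heatmap.

```python
# Toy counts only: rows are hypothetical semantic classes, columns are genders.
# These numbers are invented for illustration and are NOT taken from the dataset.
counts = {
    "animals":   {"masculine": 60, "feminine": 25, "neuter": 15},
    "materials": {"masculine": 20, "feminine": 20, "neuter": 60},
    "weapons":   {"masculine": 25, "feminine": 55, "neuter": 20},
}


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell.

    The expected count assumes that class and gender are independent.
    Assumes every row and column total is non-zero.
    """
    genders = sorted({g for row in table.values() for g in row})
    row_totals = {cls: sum(row.values()) for cls, row in table.items()}
    col_totals = {g: sum(row.get(g, 0) for row in table.values()) for g in genders}
    grand_total = sum(row_totals.values())

    residuals = {}
    for cls, row in table.items():
        residuals[cls] = {}
        for g in genders:
            expected = row_totals[cls] * col_totals[g] / grand_total
            residuals[cls][g] = (row.get(g, 0) - expected) / expected ** 0.5
    return residuals


residuals = pearson_residuals(counts)
for cls, row in residuals.items():
    print(cls, {g: round(r, 2) for g, r in row.items()})
```

Residuals well above zero mark the cells that would appear as "hot" in a heatmap of this kind; residuals well below zero mark underrepresented combinations.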

Audring, Jenny (2014), 'Gender as a complex feature', Language Sciences, 43, 5-17.
--- (2017), 'Calibrating complexity: How complex is a gender system?', Language Sciences, 60, 53-68.
Carling, Gerd (2019), Mouton Atlas of Languages and Cultures. Vol. 1: Europe, Caucasus, Western and Southern Asia (Berlin - New York: Mouton de Gruyter).
Corbett, Greville G. (1991), Gender (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
--- (2014), The expression of gender [electronic resource] (Berlin: De Gruyter Mouton).
Corbett, Greville G. and Fraser, Norman M. (2000), 'Gender assignment: a typology and a model', in Gunter Senft (ed.), Systems of Nominal Classification (Cambridge: Cambridge University Press), 293-325.
Corbett, Greville G. and Fedden, Sebastian (2016), 'Canonical Gender', Journal of Linguistics, 52 (3), 495-531.
Van Epps, Briana (2019), 'Sociolinguistic, comparative and historical perspectives on Scandinavian gender: With focus on Jamtlandic', PhD dissertation, Lund.
 
Distribution of the genders alternans, commune, neuter, feminine, and masculine in the dataset (lexemes of 104 concepts in 105 Indo-European languages)
Timeline of gender distribution in the lexical dataset (by Briana Van Epps).
MCA (Multiple Correspondence Analysis) plot of the typological gender data. Graph by Marc Tang.
Evolutionary reconstruction of gender in Indo-European is a highly interesting field. The subject is a perfect testbed for how well evolutionary methods work in general. The core issue is this: the system that we reconstruct for Proto-Indo-European had a commune/neuter distinction. In most daughter branches it developed into a sex-based system (masculine/feminine/neuter), and it is preserved only in Anatolian (Hittite, Luwian), the oldest attested Indo-European branch. However, in Scandinavian and Dutch/Frisian, a commune/neuter system has re-emerged through a merger within a previous three-gender system. Therefore, on the surface, Anatolian and Scandinavian are similar, as we see from the MCA plot above, which shows the synchronic similarities of Indo-European gender systems based on attested languages. However, the similarity between Scandinavian, Frisian/Dutch, and Hittite/Luwian is an illusion or, to use evolutionary terminology, an example of homoplasy. The background and the functionality of the different systems are completely different. How can we make evolutionary methods account for this difference in the reconstruction?
This is where we can test how well different models perform. Experiments (performed by our colleagues Chundra Cathcart, Harald Hammarström, and Marc Tang) indicate that the results of an evolutionary reconstruction are similar to those of a comparative reconstruction (even if the method, of course, is completely different). What we want the evolutionary reconstruction to produce is a high probability of masculine/neuter at the root (i.e., Proto-Indo-European) and a lower probability of a feminine.
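As a rough illustration of what a "probability at the root" means, the sketch below computes the posterior probability of a binary trait at the root of a tiny tree using Felsenstein's pruning algorithm under a symmetric two-state Markov model. Everything specific here is an assumption made for the example: the trait (1 = a feminine gender is present, 0 = absent), the tree shape, the branch lengths, the rate, and the helper names `tip`, `toy_tree`, and `root_posterior`. It is not the model or the data used in the experiments mentioned above.

```python
import math


def transition_prob(same, rate, t):
    """Symmetric two-state Markov model: probability of being in the same
    state (same=True) or the other state (same=False) after time t."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 + 0.5 * decay if same else 0.5 - 0.5 * decay


def partial_likelihoods(node, rate):
    """Felsenstein pruning: for each possible state at `node`, the likelihood
    of the observed tip states below it."""
    if "state" in node:  # a tip with an observed state
        return [1.0 if s == node["state"] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node["children"]:
        child_like = partial_likelihoods(child, rate)
        for parent_state in (0, 1):
            result[parent_state] *= sum(
                transition_prob(parent_state == child_state, rate, branch_length)
                * child_like[child_state]
                for child_state in (0, 1)
            )
    return result


def root_posterior(tree, rate, prior=(0.5, 0.5)):
    """Posterior probability of state 0 and state 1 at the root."""
    likelihoods = partial_likelihoods(tree, rate)
    weighted = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


def tip(state):
    return {"state": state}


# Hypothetical tree: one early-branching tip without a feminine (Anatolian-like),
# and a clade in which one tip has lost the feminine again (Scandinavian-like).
# Shape, branch lengths, and rate are invented for illustration.
toy_tree = {"children": [
    (tip(0), 1.0),
    ({"children": [
        ({"children": [(tip(1), 0.5), (tip(1), 0.5)]}, 0.5),
        ({"children": [(tip(1), 0.7), (tip(0), 0.7)]}, 0.3),
    ]}, 0.2),
]}

posterior = root_posterior(toy_tree, rate=0.5)
print("P(no feminine at root) =", round(posterior[0], 3))
print("P(feminine at root)    =", round(posterior[1], 3))
```

Changing the tree shape or the branch lengths in `toy_tree` changes the root posterior, which is a small-scale version of the sensitivity to the tree discussed in this post.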
In experimenting with the data and different models, we find that the most important thing is the shape of the tree. For Indo-European, we get different results depending on whether we use a branched or a non-branched tree, an Indo-Anatolian or a non-Indo-Anatolian tree, and ancestry constraints or no ancestry constraints (with ancestry constraints, ancestral languages are placed on the branches leading to their descendants rather than being treated as 'cousins' of the living languages). As for the model, we get different results depending on whether we use a Markov Chain Monte Carlo approach or a Dollo model. Markov Chain Monte Carlo basically constructs a chain that has a desired distribution as its equilibrium distribution, so that one can obtain a sample of the desired distribution by recording states from the chain. A Dollo model has as its precondition that a system never returns exactly to its previous state, but keeps a trace of the intermediate stages through which it passes. A Dollo model with an Indo-Anatolian tree produces a reconstruction that closely resembles Anatolian. However, more experimenting needs to be performed: obviously, it is necessary to have a correct tree of a family before an evolutionary reconstruction can be performed. Still, some models of reconstruction may be better than others, depending on how they deal with the problem of homoplasy and parallel drift.
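The description of Markov Chain Monte Carlo above can be illustrated with a minimal Metropolis sampler. The sketch builds a chain over three labels whose equilibrium distribution is a chosen target and then estimates that target by recording the visited states. The target weights, the labels, and the function name `metropolis_sample` are invented for the example; in real phylogenetic work the chain moves over trees and model parameters rather than over gender labels.

```python
import random
from collections import Counter


def metropolis_sample(target, n_steps, seed=0):
    """Metropolis sampler over discrete states with a uniform (symmetric) proposal.

    `target` maps each state to a positive weight. Returns the fraction of
    steps the chain spent in each state.
    """
    rng = random.Random(seed)
    states = list(target)
    current = states[0]
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(states)
        # Accept with probability min(1, target[proposal] / target[current]).
        if rng.random() < target[proposal] / target[current]:
            current = proposal
        visits[current] += 1  # record the state of the chain at every step
    return {s: visits[s] / n_steps for s in states}


# Hypothetical target distribution; the weights are invented for illustration.
target = {"commune": 0.5, "neuter": 0.3, "feminine": 0.2}
estimate = metropolis_sample(target, n_steps=50_000)
for state in target:
    print(state, "target:", target[state], "estimate:", round(estimate[state], 3))
```

With enough steps, the recorded frequencies approach the target weights, which is the sense in which the chain has the desired distribution as its equilibrium distribution.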

 


LAB/infrastructure

You are welcome to visit the DiACL infrastructure and lab. All data is open access and free for everyone to use!