Exploring the relation between semantic complexity and quantifier distribution in large corpora

Jakub Szymanik, Camilo Thorne

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

In this paper we study whether semantic complexity influences the distribution of generalized quantifiers in a large English corpus derived from Wikipedia. We take the minimal computational device recognizing a generalized quantifier as the core measure of its semantic complexity. We consider quantifiers belonging to three increasingly complex classes: Aristotelian (recognizable by 2-state acyclic finite automata), counting (recognizable by (k+2)-state finite automata), and proportional (recognizable by pushdown automata). Using regression analysis, we show that semantic complexity is a statistically significant factor, explaining 27.29% of frequency variation. We compare this impact to that of other known sources of complexity, both semantic (quantifier monotonicity and the comparative/superlative distinction) and superficial (e.g., the length of quantifier surface forms). In general, we observe that the more complex a quantifier is, the less frequent it is.
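The three complexity classes named in the abstract can be illustrated with a minimal sketch. It assumes the usual encoding from the semantic-automata literature, in which a sentence "Q A are B" is turned into a binary word with one symbol per element of A (1 if the element is also in B, 0 otherwise). The function names (`encode`, `run_dfa`, `every`, `some`, `exactly`, `most`) are illustrative choices and are not taken from the paper, and the stack-based recognizer for "most" is only one simple way to simulate a pushdown automaton.

```python
def encode(a, b):
    """Encode a model as a binary word: one symbol per element of A,
    1 if the element is also in B, 0 otherwise (assumed encoding)."""
    return [1 if x in b else 0 for x in a]


def run_dfa(transitions, start, accepting, word):
    """Run a deterministic finite automaton given as a transition table."""
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


def every(word):
    # Aristotelian, 2 states: 0 = no counterexample seen (accepting),
    # 1 = a 0 was read (rejecting sink).
    t = {(0, 1): 0, (0, 0): 1, (1, 0): 1, (1, 1): 1}
    return run_dfa(t, 0, {0}, word)


def some(word):
    # Aristotelian, 2 states: 0 = no witness seen, 1 = a 1 was read (accepting sink).
    t = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
    return run_dfa(t, 0, {1}, word)


def exactly(k, word):
    # Counting, k+2 states: states 0..k count the 1s read so far,
    # state k+1 is a rejecting sink reached after more than k ones.
    t = {}
    for s in range(k + 2):
        t[(s, 0)] = s
        t[(s, 1)] = min(s + 1, k + 1)
    return run_dfa(t, 0, {k}, word)


def most(word):
    # Proportional: no fixed number of states suffices, so a stack is used.
    # Opposite symbols cancel; the stack always holds copies of one symbol.
    stack = []
    for symbol in word:
        if stack and stack[-1] != symbol:
            stack.pop()
        else:
            stack.append(symbol)
    # Accept when 1s outnumber 0s, i.e. surplus 1s remain on the stack.
    return bool(stack) and stack[-1] == 1


if __name__ == "__main__":
    students = {"ann", "bob", "cy"}
    passed = {"ann", "bob"}
    word = encode(students, passed)
    print(every(word), some(word), exactly(2, word), most(word))
```

In this sketch the two Aristotelian recognizers need only two states each, `exactly(k, ...)` needs a number of states that grows with k, and `most` needs unbounded memory, which mirrors the ordering of the three classes described above.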

Original language: English
Pages (from-to): 80-93
Number of pages: 14
Journal: Language Sciences
Volume: 60
State: Published - Mar 1 2017
Externally published: Yes

Keywords

  • Analysis of deviance
  • Corpus analysis
  • Generalized linear regression models
  • Generalized quantifiers
  • Semantic complexity
