Computational Models

  1. Bingo
  2. Phonaesthemes
  3. Truncation

1. Bingo

I am bad at bingo, and wanted to model how long a game of bingo lasts so I could know how much more pain I would have to endure before the next round.
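
A minimal sketch of how game length could be estimated by simulation. The rules here are an assumption, since the variant is not specified above: standard 75-ball bingo, a 5x5 card with a free centre, and a game that ends on the first completed row, column, or diagonal. All function names are hypothetical.

```python
import random

FREE = 0  # marker for the free centre square (assumed rule)


def make_card(rng):
    """Build a 5x5 card as card[col][row]; column c draws from c*15+1 .. c*15+15."""
    card = [rng.sample(range(col * 15 + 1, col * 15 + 16), 5) for col in range(5)]
    card[2][2] = FREE
    return card


def has_line(card, called):
    """True if any row, column, or diagonal is fully covered by called numbers."""
    cols = card
    rows = [[card[c][r] for c in range(5)] for r in range(5)]
    diags = [[card[i][i] for i in range(5)], [card[i][4 - i] for i in range(5)]]
    return any(all(x in called for x in line) for line in cols + rows + diags)


def simulate_game(num_players, rng):
    """Return the number of balls called before the first card wins."""
    cards = [make_card(rng) for _ in range(num_players)]
    order = list(range(1, 76))
    rng.shuffle(order)
    called = {FREE}  # the free square counts as already covered
    for n_calls, ball in enumerate(order, start=1):
        called.add(ball)
        if any(has_line(card, called) for card in cards):
            return n_calls
    return len(order)


def mean_game_length(num_players, trials=1000, seed=0):
    """Average number of calls per game over repeated simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(num_players, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for players in (1, 10, 50):
        print(players, "players:", mean_game_length(players))
```

More players means more cards racing for a line, so the average game should get shorter as the room fills up.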


2. Phonaestheme network visualizer [prototype]

[available offline here]

This d3 visualization is a web-friendly interface for part of my dissertation research on phonaesthemes (e.g. gl-, sn-, sl-), often discussed under the heading of sound symbolism, and their relationship to prototypical morphemes (specifically prefixes in this case). There is a general sense that, like conventional morphemes (e.g. re-, un-), phonaesthemes have some connection between phonological form and semantic meaning; however, they do not seem to be as consistent or morphologically productive as conventional morphemes.

In my research, I propose that, contra standard views in the literature, there is no fundamental difference between morphemes and phonaesthemes. I claim that the same decompositional algorithms needed to learn conventional morphemes can also find phonaesthemes, though the latter have a weaker association between form and meaning.

To model this, I constructed semantic networks using a thesaurus (the Project Gutenberg Etext of Moby Thesaurus II by Grady Ward): words that share more synonyms in the thesaurus are considered more semantically related to each other. Each graph collects the words beginning with a given two-letter string (some are attested phonaesthemes, some are conventional prefixes, many are junk). Each node is a word, and two words are connected by an edge if they are related to each other (edge weight not currently displayed). A quick pass at community detection was run using a NetworkX module for Python.
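
The construction can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the visualizer: the tiny thesaurus is invented, `build_network` and `min_shared` are hypothetical names, and `greedy_modularity_communities` is just one of several NetworkX community routines, picked here because the specific module used is not named above.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def build_network(synonyms, prefix, min_shared=1):
    """Link words starting with `prefix` that share at least `min_shared` synonyms.

    `synonyms` maps each word to the set of its thesaurus entries.
    The edge weight is the number of shared synonyms.
    """
    words = [w for w in synonyms if w.startswith(prefix)]
    graph = nx.Graph()
    graph.add_nodes_from(words)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            shared = len(synonyms[w1] & synonyms[w2])
            if shared >= min_shared:
                graph.add_edge(w1, w2, weight=shared)
    return graph


# Invented toy entries, standing in for the real thesaurus.
toy_thesaurus = {
    "glow": {"shine", "gleam", "light"},
    "gleam": {"shine", "glow", "light"},
    "glitter": {"shine", "sparkle"},
    "glue": {"paste", "stick"},
    "glum": {"sad", "morose"},
    "gloomy": {"sad", "dark"},
}

graph = build_network(toy_thesaurus, "gl")
communities = greedy_modularity_communities(graph, weight="weight")

if __name__ == "__main__":
    for community in communities:
        print(sorted(community))
```

In this toy example the light-related words end up linked to each other, while a word like "glue" shares no synonyms with the rest and stays unconnected.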

The graphs show a network of relatedness among words that share a common initial string. The prediction is for conventional morphemes (e.g. un-, re-) to have larger, more related networks, indicating greater semantic relatedness among words beginning with those prefixes, junk strings to have relatively unrelated networks, and phonaesthemes (e.g. gl-) to lie somewhere in the middle. A morphological learner could then associate the shared phonological form, with varying strength, with a dominant community within the semantic network of words that share that form.
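
One way to put numbers on this prediction, offered only as a sketch: the two measures below (edge density, and the share of words falling in the largest community) are my own stand-ins for "relatedness" and "association strength", not measures taken from the research itself. `relatedness_summary` is a hypothetical name, and the two small graphs are made up.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def relatedness_summary(graph):
    """Summarize how tightly a prefix network hangs together.

    density: fraction of possible word pairs that are linked.
    dominant_share: fraction of words in the largest detected community.
    """
    n = graph.number_of_nodes()
    if n == 0:
        return {"density": 0.0, "dominant_share": 0.0}
    if graph.number_of_edges() == 0:
        # No links at all: every word is its own community.
        return {"density": 0.0, "dominant_share": 1 / n}
    communities = greedy_modularity_communities(graph)
    largest = max(communities, key=len)
    return {"density": nx.density(graph), "dominant_share": len(largest) / n}


# Made-up extremes: a fully linked prefix network and a sparse one.
tight = nx.complete_graph(["undo", "untie", "unlock", "unwrap"])
loose = nx.Graph()
loose.add_nodes_from(["glow", "gleam", "glue", "glum"])
loose.add_edge("glow", "gleam")

if __name__ == "__main__":
    print("tight:", relatedness_summary(tight))
    print("loose:", relatedness_summary(loose))
```

Under these stand-in measures, a conventional prefix would be expected to score high on both, a junk string low on both, and a phonaestheme somewhere in between.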


3. (interactive) Truncated stem predictor [coming soon]

[Project code & materials available on my collaborator’s github]

This is based on my research on predicting truncated stems in Brazilian Portuguese (vagabunda → vagaba). The truncated stem is predicted by a model that optimizes deletability of uninformative right-edge material and informativity of left-edge material; i.e., delete as much as you can without preventing recovery of the original word. The model is driven by empirical data and word frequencies rather than strict phonological theory.
The interactive predictor will be an app that predicts a truncated stem of a given word input.
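
The "delete as much as you can without preventing recovery" idea can be illustrated with a toy stand-in. This is not the model from the research: it simply picks the shortest left-edge string from which the original word is still the most likely guess, given word frequencies. The function names, the 0.9 threshold, and the frequencies are all invented for illustration.

```python
def recovery_probability(prefix, word, freqs):
    """Probability of guessing `word` from `prefix`, weighting candidates by frequency."""
    total = sum(f for w, f in freqs.items() if w.startswith(prefix))
    return freqs.get(word, 0) / total if total else 0.0


def shortest_recoverable_prefix(word, freqs, threshold=0.9, min_len=2):
    """Return the shortest prefix whose recovery probability reaches `threshold`.

    Everything to the right of that prefix is treated as deletable.
    Falls back to the full word if no shorter prefix is informative enough.
    """
    for length in range(min_len, len(word) + 1):
        prefix = word[:length]
        if recovery_probability(prefix, word, freqs) >= threshold:
            return prefix
    return word


# Invented frequencies for a handful of words sharing a left edge.
TOY_FREQS = {"vagabunda": 40, "vagar": 30, "vaga": 60, "vagem": 20}

if __name__ == "__main__":
    print(shortest_recoverable_prefix("vagabunda", TOY_FREQS))  # prints "vagab"
```

With these toy numbers the cut falls after "vagab", the point where no other word in the list still competes. The attested form vagaba also carries a final vowel, which this sketch does not attempt to model.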