Description
Task: Implement a glottochronology tool.
Theory and implementation:
The tool uses the 100-word Swadesh list, a set of fundamental words given in Russian, to compare languages and calculate how closely they are related (their distance). The relationship is based on etymological links between Swadesh words within each pair of dictionaries.
The resulting distance is calculated using the following formula:
distance = sqrt( ln(linked_words / total_words) / -0.1 / sqrt(linked_words / total_words) ), where:
- total_words is the total number of matching Swadesh words in the pair of dictionaries
- linked_words is the number of those matching words that are etymologically linked
The maximum distance given by this formula is 21.46, reached when linked_words / total_words == 1/100. The minimum possible distance is zero, reached when every matching word is linked.
A hard-coded value, distance == 25, is used when linked_words and/or total_words are zero. A large distance indicates a weak relationship; a small distance indicates that the dictionaries, and the corresponding languages or dialects, are close.
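As a rough illustration, the formula and the fallback value can be sketched in Python as below. The function name `glottochronology_distance` and the constant name are hypothetical, and clamping ratios of 1 or more to zero is an assumption made here to keep the square root well defined; it is not stated in the description.

```python
import math

# Hard-coded fallback from the description, used when either count is zero.
FALLBACK_DISTANCE = 25.0


def glottochronology_distance(linked_words: int, total_words: int) -> float:
    """Distance between two dictionaries, following the formula above."""
    if linked_words <= 0 or total_words <= 0:
        return FALLBACK_DISTANCE
    ratio = linked_words / total_words
    if ratio >= 1.0:
        # Assumption: all matching words are linked, so the distance is zero.
        return 0.0
    # sqrt( ln(ratio) / -0.1 / sqrt(ratio) )
    return math.sqrt(math.log(ratio) / -0.1 / math.sqrt(ratio))


if __name__ == "__main__":
    for linked, total in [(1, 100), (50, 100), (100, 100), (0, 80)]:
        print(linked, total, round(glottochronology_distance(linked, total), 2))
```

With linked_words = 1 and total_words = 100 this reproduces the maximum of about 21.46 quoted above.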
Result:
a) 2-D constellation, where the dots are dictionaries and the distances between them reflect the results of the formula.
b) 3-D constellation. It has the same meaning as the 2-D one.
c) Table of pairwise distances. It shows the calculated distance for every pair of dictionaries.
d) Table of cognates. Each row presents one etymological group. Every value has the form:
Swadesh_word [phonological_transcription] original_translation_from_dictionary
A table cell can contain more than one such item (i.e. synonyms).
e) Table of single Swadesh words by dictionary. These words have no cognates in table (d).
f) A link to an xlsx file with tables (c), (d), and (e) in the corresponding worksheets.
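To make the value format of table (d) concrete, here is a minimal Python sketch that builds one cell from several entries. The function names, the "; " separator between synonyms, and the placeholder strings are all assumptions for illustration; the description does not say how multiple items in a cell are separated.

```python
def format_entry(swadesh_word: str, transcription: str, translation: str) -> str:
    """One item: Swadesh_word [phonological_transcription] original_translation."""
    return f"{swadesh_word} [{transcription}] {translation}"


def format_cell(entries, separator: str = "; ") -> str:
    """Join several items (synonyms) into one table cell.

    The separator is a hypothetical choice, not taken from the tool.
    """
    return separator.join(format_entry(*entry) for entry in entries)


if __name__ == "__main__":
    # Placeholder data, not real dictionary content.
    cell = format_cell([
        ("вода", "transcription_1", "translation_1"),
        ("вода", "transcription_2", "translation_2"),
    ])
    print(cell)
```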
Limitations:
Any limitations that were applied are noted at the bottom of the modal window. These can be:
g) Hidden tables. If the calculated output is too large, some tables may be hidden. The limit is 1M HTML characters for the total size of the tables.
h) Not all input dictionaries were processed. An input dictionary with fewer than 50 Swadesh words is not processed by the tool.
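The two limits above could be checked along the following lines. This is only a sketch: the function and constant names are hypothetical, the input shapes (a mapping from dictionary name to Swadesh word count, and a list of rendered HTML strings) are assumptions, and whether a total of exactly 1M characters still fits is also assumed.

```python
MIN_SWADESH_WORDS = 50        # dictionaries with fewer words are skipped
MAX_TABLES_HTML = 1_000_000   # limit on the total size of the tables


def split_by_coverage(swadesh_counts: dict, minimum: int = MIN_SWADESH_WORDS):
    """Return (processed, skipped) dictionary names, preserving input order."""
    processed = [name for name, count in swadesh_counts.items() if count >= minimum]
    skipped = [name for name, count in swadesh_counts.items() if count < minimum]
    return processed, skipped


def tables_fit(html_tables, limit: int = MAX_TABLES_HTML) -> bool:
    """True if the combined length of all rendered tables is within the limit."""
    return sum(len(table) for table in html_tables) <= limit


if __name__ == "__main__":
    processed, skipped = split_by_coverage({"dict_a": 97, "dict_b": 49, "dict_c": 50})
    print("processed:", processed)
    print("skipped:", skipped)
    print("tables fit:", tables_fit(["<table></table>"]))
```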