Glottochronology #1435

@vmonakhov

Description

Task: Implement a glottochronology tool.

Theory and implementation:

The tool uses the 100-word Swadesh list, a list of basic vocabulary (given here in Russian) that makes it possible to compare any languages and calculate their relationship (i.e. distance). The relationship is based on etymological links between Swadesh words within each pair of dictionaries.

The resulting distance is calculated using the following formula:
distance = sqrt( ln(linked_words / total_words) / -0.1 / sqrt(linked_words / total_words) ), where:

  1. total_words is the total number of matching Swadesh words in the pair of dictionaries
  2. linked_words is the number of etymologically linked words among those in (1)

The maximal distance given by the formula above is 21.46, reached when linked_words/total_words == 1/100. The minimal possible distance is zero, reached when linked_words == total_words.
A hard-coded value distance == 25 is used when linked_words and/or total_words are zero. A large distance indicates a weak relationship; a small distance indicates that the dictionaries, and the corresponding languages or dialects, are close.
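
The formula and the fallback value can be sketched in Python as follows. This is only an illustration of the description above, not the tool's actual code; the names `glottochronology_distance` and `NO_DATA_DISTANCE` are hypothetical, and the sketch assumes that linked_words never exceeds total_words.

```python
import math

# Hard-coded fallback from the description, used when either count is zero.
NO_DATA_DISTANCE = 25.0  # hypothetical constant name


def glottochronology_distance(linked_words: int, total_words: int) -> float:
    """Distance between two dictionaries per the formula in this issue.

    Assumes 0 <= linked_words <= total_words (linked words are a subset
    of the matching Swadesh words).
    """
    if linked_words <= 0 or total_words <= 0:
        return NO_DATA_DISTANCE
    ratio = linked_words / total_words
    # sqrt( ln(ratio) / -0.1 / sqrt(ratio) )
    return math.sqrt(math.log(ratio) / -0.1 / math.sqrt(ratio))


if __name__ == "__main__":
    print(round(glottochronology_distance(1, 100), 2))  # 21.46
    print(glottochronology_distance(0, 100))            # 25.0
```

With 1 linked word out of 100 the sketch reproduces the maximum of 21.46 stated above, and with equal counts the distance is zero.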

Result:

a) 2-d constellation, where the dots are the dictionaries and the distances between them correspond to the results of the formula.
b) 3-d constellation. It has the same meaning as the 2-d one.
c) Table of each-to-each distances. It shows the calculated distance for every pair of dictionaries.
d) Table of cognates. Each row presents one etymological group. Every value has the form:
Swadesh_word [phonological_transcription] original_translation_from_dictionary
A cell of the table can contain more than one such item (i.e. synonyms).
e) Table of single Swadesh words by dictionary. These words have no cognates in table (d).
f) A link to an xlsx file with tables (c), (d) and (e) in separate worksheets.
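
As a rough sketch of how the each-to-each table (c) could be assembled from per-pair counts: the input layout (`pair_counts` keyed by a frozenset of two dictionary names), the function names, and the zero diagonal are all assumptions made for illustration, not details taken from the tool.

```python
import math
from itertools import combinations


def distance(linked_words: int, total_words: int) -> float:
    """Formula from this issue, with the hard-coded 25 for zero counts."""
    if linked_words <= 0 or total_words <= 0:
        return 25.0
    ratio = linked_words / total_words
    return math.sqrt(math.log(ratio) / -0.1 / math.sqrt(ratio))


def distance_table(names, pair_counts):
    """Build a symmetric each-to-each table.

    pair_counts (hypothetical layout) maps frozenset({a, b}) to a tuple
    (linked_words, total_words); missing pairs are treated as (0, 0).
    """
    # Assumption: a dictionary is at distance 0 from itself.
    table = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        linked, total = pair_counts.get(frozenset((a, b)), (0, 0))
        table[a][b] = table[b][a] = distance(linked, total)
    return table


if __name__ == "__main__":
    counts = {frozenset(("A", "B")): (80, 100)}
    for row in distance_table(["A", "B", "C"], counts).items():
        print(row)
```

Pairs without any matching words fall back to 25, so they appear as the most distant entries in the table.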

Limitations:

Any limitations that were applied are noted at the bottom of the modal window. These can be:

g) Hidden tables. If the calculated output is too large, some tables can be hidden. The limit is 1M HTML characters for the combined size of the tables.
h) Not all input dictionaries were processed. If an input dictionary has fewer than 50 Swadesh words, it is not processed by the tool.
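
The two limits above could be applied roughly as in the sketch below. All names are hypothetical, "1M" is assumed to mean 1,000,000 characters, and the rule for choosing which tables to hide (keep tables in the given order while the running total fits) is an assumption, since the description does not say which tables are hidden.

```python
MIN_SWADESH_WORDS = 50             # from limitation (h)
MAX_TABLES_HTML_CHARS = 1_000_000  # from limitation (g); assumes 1M = 10**6


def select_dictionaries(swadesh_counts):
    """Split dictionaries into processed and skipped by Swadesh word count.

    swadesh_counts (hypothetical layout) maps dictionary name -> number
    of Swadesh words found in it.
    """
    processed = [n for n, c in swadesh_counts.items() if c >= MIN_SWADESH_WORDS]
    skipped = [n for n, c in swadesh_counts.items() if c < MIN_SWADESH_WORDS]
    return processed, skipped


def split_tables(tables_html, limit=MAX_TABLES_HTML_CHARS):
    """Decide which tables to show so their combined HTML stays within limit.

    Assumed policy: walk the tables in order and hide any table that
    would push the running total over the limit.
    """
    shown, hidden, used = [], [], 0
    for name, html in tables_html.items():
        if used + len(html) <= limit:
            shown.append(name)
            used += len(html)
        else:
            hidden.append(name)
    return shown, hidden


if __name__ == "__main__":
    print(select_dictionaries({"dict_1": 97, "dict_2": 42}))
    print(split_tables({"c": "<table/>", "d": "<table/>" * 3}, limit=10))
```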
