Nikis14/Rus_summarizer

Rus_summarizer. Text summarization tools for Russian language

This repository contains algorithms for extractive summarization of texts in Russian.

The thesis and presentation are available in the description folder.

The algorithms are based on two approaches:

  1. TextRank.
  2. Sentence clustering using K-Means.
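
The two approaches can be illustrated with a minimal, dependency-free sketch. It assumes each sentence has already been turned into a numeric vector; the function names, the damping factor, the iteration counts, and the "first k vectors" initialization are illustrative assumptions, not the code from this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank_scores(vectors, damping=0.85, iterations=50):
    """PageRank-style scores on a graph weighted by sentence similarity."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge j -> i.
            rank = sum(sim[j][i] / out_weight[j] * scores[j]
                       for j in range(n) if out_weight[j] > 0)
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_representatives(vectors, k, iterations=20):
    """Lloyd's algorithm; returns the index of the sentence nearest each centroid.

    Assumes 1 <= k <= len(vectors). Centroids start at the first k vectors,
    a deterministic simplification chosen only to keep the sketch reproducible.
    """
    centroids = [list(vectors[i]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    representatives = {
        min(range(len(vectors)), key=lambda i: squared_distance(vectors[i], centroids[c]))
        for c in range(k)
    }
    return sorted(representatives)
```

In this sketch, a TextRank summary would keep the top-scoring sentences, while the clustering summary would keep one representative sentence per cluster.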

Several text feature extraction models were studied:

  1. Bag of words + TF-IDF.
  2. FastText (pretrained model from DeepPavlov lib).
  3. RuBERT (pretrained model from DeepPavlov lib).
  4. RuSBERT (pretrained model from DeepPavlov lib).
  5. MlSBERT (self-trained model using Sentence BERT for English).

The research found that the best summarization algorithm is "Mixed", which is based on the union of the TextRank algorithm and MlSBERT_KMeans.
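
One plausible reading of "union" is a set union of the sentences selected by each method, emitted in their original order. The sketch below assumes exactly that; the function name `mixed_summary` and its index-based interface are hypothetical, and the repository may combine the two methods differently.

```python
def mixed_summary(sentences, textrank_indices, kmeans_indices):
    """Combine two extractive selections into one summary.

    Both index lists refer to positions in `sentences`. Duplicates are
    dropped and the original document order is restored.
    """
    chosen = sorted(set(textrank_indices) | set(kmeans_indices))
    return [sentences[i] for i in chosen]
```

Under this reading, a sentence picked by both methods appears once, and the summary can be longer than either individual selection.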

All algorithms are in the folder "src/Rus_summarizers".
