chinmaySinghal/Wikipedia-topic-classification

 
 

topic-modeling

Inferring the topics of Wikipedia articles in different languages.
Capstone project, Fall 2019.
Ranked among the top 3 capstone posters out of 36 teams.

Report | Poster

Research directions

  • Improving the architecture of the currently deployed model for English articles.
    • bag-of-words models with fastText embeddings
    • LSTM, LSTM with self-attention, LSTM with IDF-weighted self-attention, transformer
  • Transferring the model to articles in other languages (Hindi, Russian).
    • using fastText multilingual word embeddings, we compare a model trained only on English articles with a model trained on several languages simultaneously.
  • Exploring language-agnostic models based on links between articles.
    • bag-of-words model
    • graph CNN model (GraphSAGE)
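
As a rough illustration of the bag-of-words direction and of why aligned multilingual embeddings allow cross-lingual transfer, here is a minimal sketch in plain Python. It is not the project's code: the toy 3-dimensional vectors stand in for fastText embeddings, the topic names and classifier weights are made up, and the per-topic sigmoid (multi-label output) is an assumption. The function names `embed_document` and `predict_topics` are hypothetical.

```python
import math

# Toy vectors standing in for aligned multilingual fastText embeddings.
# Values are invented for illustration; real embeddings have ~300 dimensions.
EMBEDDINGS = {
    "football": [0.9, 0.1, 0.0],
    "футбол": [0.8, 0.2, 0.0],      # Russian for "football"
    "match": [0.7, 0.0, 0.1],
    "galaxy": [0.0, 0.1, 0.9],
    "галактика": [0.1, 0.0, 0.8],   # Russian for "galaxy"
    "star": [0.1, 0.2, 0.8],
}
DIM = 3

# Hypothetical linear classifier: one (weights, bias) pair per topic,
# imagined as having been fitted on English articles only.
TOPIC_WEIGHTS = {
    "Sports": ([4.0, 0.0, -4.0], 0.0),
    "Space": ([-4.0, 0.0, 4.0], 0.0),
}


def embed_document(tokens):
    """Average the vectors of known tokens; word order is ignored."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def predict_topics(tokens, threshold=0.5):
    """Return topics whose sigmoid score exceeds the threshold."""
    doc = embed_document(tokens)
    predicted = {}
    for topic, (weights, bias) in TOPIC_WEIGHTS.items():
        logit = sum(w * x for w, x in zip(weights, doc)) + bias
        score = 1.0 / (1.0 + math.exp(-logit))
        if score > threshold:
            predicted[topic] = score
    return predicted


# English input, then Russian input scored by the same classifier.
print(predict_topics(["football", "match"]))
print(predict_topics(["галактика"]))
```

Because the Russian and English words sit close together in the shared vector space, the averaged document vectors land in the same region and the classifier applies to both languages without change. The same averaging scheme could be applied to link-based models by looking up embeddings of linked articles instead of words, though that variant is not shown here.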

Poster

Created by Marina Zavalina, Peeyush Jain, Sarthak Agarwal, Chinmay Singhal in Fall 2019.
Advisors: Isaac Johnson (Wikimedia Foundation), Anastasios Noulas (NYU CDS).
Project for DS-GA 1006, NYU Center for Data Science.

Languages

  • Jupyter Notebook 96.2%
  • Python 3.8%