Inferring the topics of Wikipedia articles in different languages.
Capstone project, Fall 2019.
Awarded one of the top 3 capstone posters among 36 teams.
- Improving the architecture of the currently deployed model for English articles (see the first sketch after this list).
- bag-of-words models with fastText embeddings
- LSTM, LSTM with self-attention, LSTM with IDF self-attention weights, transformer
- Transferring the model to articles in other languages (Hindi, Russian); see the second sketch after this list.
- using fastText multilingual word embeddings, we compare a model trained only on English articles with a model trained on several languages simultaneously
- Exploring language-agnostic models based on links between articles (see the third sketch after this list).
- bag-of-words model
- graph CNN model (GraphSAGE)
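
A minimal sketch of the LSTM with self-attention variant from the first set of experiments, assuming inputs are pre-computed fastText word vectors; the embedding size, hidden size, and number of topic labels are placeholders, not the deployed configuration.

```python
# Sketch only: LSTM over fastText word vectors with a learned self-attention
# pooling layer. All dimensions and the topic count are illustrative.
import torch
import torch.nn as nn

class LSTMSelfAttention(nn.Module):
    def __init__(self, embed_dim=300, hidden_dim=128, num_topics=64):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)          # one attention score per token
        self.classifier = nn.Linear(2 * hidden_dim, num_topics)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim) fastText vectors for article tokens
        hidden, _ = self.lstm(embeddings)                  # (batch, seq_len, 2 * hidden_dim)
        weights = torch.softmax(self.attn(hidden), dim=1)  # self-attention weights over tokens
        pooled = (weights * hidden).sum(dim=1)             # attention-weighted article vector
        return self.classifier(pooled)                     # one logit per topic label

model = LSTMSelfAttention()
logits = model(torch.randn(2, 50, 300))                    # toy batch: 2 articles, 50 tokens each
```

The IDF-weighted variant listed above presumably substitutes fixed IDF weights for the learned attention scores in the same pooling step.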
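A minimal sketch of the cross-lingual transfer setup, assuming the published aligned fastText vector files and a scikit-learn logistic regression as an illustrative classifier (not the project's actual model): articles become averaged word vectors in the shared embedding space, the classifier is fit on English articles only, and is then applied unchanged to Hindi or Russian articles.

```python
# Sketch only: zero-shot transfer with aligned fastText embeddings.
# File names, toy articles, and the classifier are assumptions for illustration.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

def article_vector(tokens, vectors):
    """Bag-of-words representation: mean of the fastText vectors of known tokens."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(vectors.vector_size)

def featurize(articles, vectors):
    """One averaged embedding per tokenized article."""
    return np.stack([article_vector(tokens, vectors) for tokens in articles])

# Aligned embeddings live in one shared space, so a classifier trained on
# English features can score Hindi (or Russian) features without retraining.
en_vectors = KeyedVectors.load_word2vec_format("wiki.en.align.vec")
hi_vectors = KeyedVectors.load_word2vec_format("wiki.hi.align.vec")

english_articles = [["solar", "system", "planet"], ["football", "league", "cup"]]  # toy tokenized articles
english_labels = [0, 1]                                                            # toy topic ids
hindi_articles = [["सौर", "मंडल", "ग्रह"]]                                          # toy tokenized Hindi article

clf = LogisticRegression(max_iter=1000)
clf.fit(featurize(english_articles, en_vectors), english_labels)
print(clf.predict(featurize(hindi_articles, hi_vectors)))
```

In the multilingual training variant, the training set would include featurized articles from several languages at once rather than English alone.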
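A minimal sketch of the link-based GraphSAGE idea, using PyTorch Geometric's SAGEConv: article nodes, inter-article links as edges, and one logit per topic as output. The node feature size and topic count are placeholders, and torch_geometric is assumed to be installed.

```python
# Sketch only: two-layer GraphSAGE over the article link graph.
# Feature and label dimensions are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class LinkGraphSAGE(torch.nn.Module):
    def __init__(self, in_dim=128, hidden_dim=64, num_topics=64):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, num_topics)

    def forward(self, x, edge_index):
        # x: (num_articles, in_dim) node features; edge_index: (2, num_links) article links
        h = F.relu(self.conv1(x, edge_index))  # aggregate each article's linked neighbours
        return self.conv2(h, edge_index)       # second hop, then one logit per topic

# Toy graph: 4 articles with links 0->1, 1->2, 2->3.
x = torch.randn(4, 128)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
logits = LinkGraphSAGE()(x, edge_index)        # (4, num_topics)
```

Because the model sees only links between articles, the same approach applies to any language edition without relying on the article text.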
Created by Marina Zavalina, Peeyush Jain, Sarthak Agarwal, and Chinmay Singhal in Fall 2019.
Advisors: Isaac Johnson (Wikimedia Foundation), Anastasios Noulas (NYU CDS).
Project for DS-GA 1006, NYU Center for Data Science.
