KPOP Analysis

This project aims to apply NLP techniques used to analyze text to the world of popular music, specifically in the Korean language.

Version 1.0: aims to identify the top-n phrases and words used by the most popular Korean pop songs at any given time. The first iteration of the project was instantiated in July 2019, and the dataset consisted of the lyrics of the Top 100 most viewed Korean popular songs on YouTube.

Methodology

The project used web scraping technology to grab the lyrics and convert them into text files. The data sanitization process included removing all non-essential text, such as verse identifiers and non-Korean words. Furthermore, NLP preprocessing included tokenizing the text and removing stopwords and non-needed tokens such as punctuation.

The NLP performed included performing basic counting of tokens and a frequency distribution of the top-N tokens. The N for the project was chosen at random (n=40), and the results should be posted below.

Results

Upcoming

The later versions of this project aim to implement more features of NLP analysis such as part-of-speech tagging, stemming and lemmatization, and several forms of analysis such as LSA (latent semantic analysis) and LDA (latent dirichlet allocation). In the future, we hope to build a system that may be capable of automatically generating lyrics to songs in Korean based on the data we collect and analyze here.

We hope to understand the patterns and structures of music that tends to be popular to identify the particular features of any given track that are the cause of its popularity. This project could serve as a guide to differentiate songs based on popularity.

Contact

All questions, suggestions, code reviews, and feature suggestions are welcome! Please contact me at the following email: amoghdambal@utexas.edu

Disclaimer

This project is not the work of professional developers. The team behind this project consists of students, attempting to learn and polish their CS skills, especially in the area of NLP. Mistakes will be made accordingly, and we ask for any and all polite and helpful corrections so we can improve more quickly and provide more accurate, up-to-date, and more powerful analysis. Thank you :)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.idea		.idea
data		data
img		img
.DS_Store		.DS_Store
README.md		README.md
get_lyrics.py		get_lyrics.py
html_manager.py		html_manager.py
main.py		main.py
nlp.py		nlp.py
preprocessing.py		preprocessing.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KPOP Analysis

Methodology

Results

Upcoming

Contact

Disclaimer

About

Uh oh!

Releases

Packages

Languages

amogh-dambal/kpop_freq

Folders and files

Latest commit

History

Repository files navigation

KPOP Analysis

Methodology

Results

Upcoming

Contact

Disclaimer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages