DSC160 Data Science and the Arts - Final Project - Generative Arts - Spring 2020
Project Team Members:
- Byungheon Jeong, [email protected]
- Sabrina Ho, [email protected]
- Xinrui Zhan, [email protected]
- Yiheng Ye, [email protected]
With the number of confirmed COVID-19 cases increasing exponentially, we want to convey how badly COVID-19 has affected the United States. Numerical and graphical representations often lose their descriptive power when they are seen too often and the numbers feel detached from daily life, so we decided to represent the confirmed COVID-19 cases across the United States in the form of music. We started by generating audio from time series data pulled from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, then applied style transfer to turn the audio into music. With this, we hope to output music that conveys the growth of confirmed COVID-19 cases since the outbreak began.
Some challenges that might arise: we are not completely confident in the appropriateness of music generated by an LSTM neural network, and we do not have a large enough dataset to build a proper neural network ourselves. Moreover, we are afraid our machines are not powerful enough to run such a network. Last but not least, our research turned up only a limited number of audio style transfer techniques, so we might not be able to produce the music we want with them.
- Time series data: confirmed US COVID-19 cases up to June 4th, acquired from the CSSE at JHU. The dataset contains dates, locations (state, longitude, latitude), and counts of confirmed, recovered, and death cases; a loading sketch follows this list. (link: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv)
- Location data: extracted from the time series data above for further analysis and for producing our final audio.
- RNN: we explored the RNN models in Magenta and tried several of the provided examples (Melody RNN, Music VAE), but found they could not meet our need for diverse music representing different areas given limited time series data. Additionally, some of our local computers could not afford the computational cost, and DataHub had a hard time with the Magenta library. We therefore decided to add musical styles manually.
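For reference, a minimal sketch of loading the time series and differencing it into daily new cases with pandas; the 11-column metadata offset follows the JHU file layout, and the statewide New York aggregation is illustrative rather than the exact code in our notebooks:

```python
import pandas as pd

# JHU CSSE confirmed-case time series for the US (cumulative counts).
url = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
       "csse_covid_19_data/csse_covid_19_time_series/"
       "time_series_covid19_confirmed_US.csv")
confirmed = pd.read_csv(url)

# The first 11 columns are metadata (FIPS, county, state, lat/long, ...);
# every later column is a date holding the cumulative confirmed count.
date_cols = confirmed.columns[11:]

# Example: statewide daily new cases for New York.
ny_cumulative = confirmed.loc[confirmed["Province_State"] == "New York", date_cols].sum()
ny_daily = ny_cumulative.diff().fillna(0).clip(lower=0).astype(int)
print(ny_daily.tail())
```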
- Data code: we downloaded the US COVID-19 data up to 2020/6/4 from the CSSEGISandData repository.
- Preprocessing code: raw_process.ipynb splits the original dataset into three parts: time_series_data.csv, location.csv, and other_information.csv (a sketch of the split follows).
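A minimal sketch of that three-way split, assuming the standard JHU column layout; the exact columns kept in each file may differ from raw_process.ipynb:

```python
import pandas as pd

raw = pd.read_csv("time_series_covid19_confirmed_US.csv")

# Columns follow the JHU layout: metadata first, then one column per date.
location_cols = ["Admin2", "Province_State", "Lat", "Long_"]
date_cols = list(raw.columns[11:])

raw[date_cols].to_csv("time_series_data.csv", index=False)      # cumulative counts per day
raw[location_cols].to_csv("location.csv", index=False)          # geographic information
raw.drop(columns=date_cols + location_cols).to_csv(
    "other_information.csv", index=False)                       # remaining metadata
```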
We do not have a training model here, as traditional neural networks could not satisfy our need for various styles of music in a less-structured form, so we decided to generate our music manually.
Generative methods (music):
- Miditime: 1. We use the Python miditime library to turn the daily new cases in different areas into sound at 120 bpm on grand piano, giving cities in the same state different base octaves and scales according to their geographic location (a pipeline sketch follows this list). The transformation notebooks for New York City and the other areas are: miditime-New York's tone.ipynb, miditime-New York State.ipynb, miditime-Calimix.ipynb, and miditime-Texas.ipynb. 2. We then use miditime to render total US daily case growth: Midi representation of US County COVID-19 growth.
- Mido: we use the Python Mido library to change the channels and instruments of the generated MIDI objects to make our own music. The notebooks are: Change_midi_instrument.ipynb and Trumpet Style Transfer.
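A minimal sketch of the two-step pipeline, following the usage patterns in the miditime and mido documentation; the input series, scaling bounds, and filenames are illustrative, not the exact code in our notebooks:

```python
from miditime.miditime import MIDITime
from mido import MidiFile, Message

# --- Step 1 (miditime): map daily new cases to notes at 120 bpm, one beat per day.
daily_increase = [0, 1, 3, 10, 45, 120, 300, 512]    # illustrative series
mymidi = MIDITime(120, "city.mid", 5, 5, 1)          # tempo, outfile, secs/year, base octave, octave range

c_major = ["C", "D", "E", "F", "G", "A", "B"]
notes = []
for day, cases in enumerate(daily_increase):
    pct = mymidi.linear_scale_pct(0, max(daily_increase), cases)
    note = mymidi.scale_to_note(pct, c_major)
    notes.append([day, mymidi.note_to_midi_pitch(note), 100, 1])  # [beat, pitch, velocity, duration]
mymidi.add_track(notes)
mymidi.save_midi()

# --- Step 2 (mido): reassign the instrument (General MIDI program 40 = violin, 0-indexed).
mid = MidiFile("city.mid")
for track in mid.tracks:
    track.insert(0, Message("program_change", program=40, channel=0, time=0))
mid.save("city_violin.mid")
```

Giving each city a different base octave (the fourth MIDITime constructor argument) is what produces the north-to-south pitch layering described in the results.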
- We first extracted the target cities' map outlines from Google Images. We used a basic outline detection algorithm and turned all pixels inside each outline gray for a better viewing experience.
- For each date, we randomly selected num (the number of confirmed cases on that day) pixels inside the outline and colored them (120, 20, 20) red.
- For each city, we have 133 images in total. Since the corresponding MIDI is a mix of four cities, we concatenate the four cities' images into one.
- We turn the list of images into a GIF whose FPS matches our MIDI tempo, so if the GIF and the MIDI are played simultaneously, each image corresponds to one pitch.
- The code is in Generate Gif.ipynb; a sketch of the frame generation follows.
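A minimal sketch of the frame generation, assuming a grayscale outline image whose interior pixels are already gray; the filenames, the interior-mask construction, and coloring each frame fresh (rather than cumulatively) are simplifying assumptions:

```python
import numpy as np
import imageio
from PIL import Image

rng = np.random.default_rng(0)
daily_increase = [0, 1, 3, 10, 45, 120]              # illustrative; one value per date

# Grayscale outline image: interior pixels were set to gray in the outline step.
base = np.array(Image.open("new_york_outline.png").convert("L"))
inside = np.argwhere(base > 0)                       # coordinates of interior pixels
canvas = np.stack([base] * 3, axis=-1)               # RGB copy of the gray map

frames = []
for cases in daily_increase:
    frame = canvas.copy()
    picks = rng.choice(len(inside), size=min(cases, len(inside)), replace=False)
    ys, xs = inside[picks].T
    frame[ys, xs] = (120, 20, 20)                    # color the selected pixels red
    frames.append(frame)

# At 120 bpm with one note per day the MIDI plays 2 notes per second,
# so fps=2 keeps each frame aligned with one pitch.
imageio.mimsave("new_york.gif", frames, fps=2)
```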
Music Output:
a) New York overall cases trend: new_york_overall.mid. Made from the daily cumulative totals at 120 bpm on grand piano; the pitch climbs higher and higher as cases increase over time.
b) New York daily increase: new_york_daily.mid. Made from the daily new cases at 120 bpm on grand piano; it gets gradually higher, with variations in its second part.
c) New York State daily increase (except New York City): at 120 bpm on grand piano, we made 4 MIDI sounds representing 4 different cities in New York State: AL.mid for Albany, SA.mid for Saratoga, ES.mid for Essex, and WY.mid for Wyoming. Each MIDI starts at a different base octave and scale based on its geographic location (northern cities get higher base octaves): AL.mid is C4 with C natural minor, WY.mid is C5 with C major, SA.mid is C6 with C harmonic minor, and ES.mid is C7 with C melodic minor. After making the 4 MIDIs, we mixed them together into a representation of New York State, NYmix.mid (see the mixing sketch after this list). Then we adjusted the instruments for each city to add our own styles, giving NYmix_various.mid. The new instruments are music box for Albany, violin for Wyoming, choir aahs for Saratoga, and synth choir for Essex.
d) California daily increase: at 120 bpm on grand piano, we made 4 MIDI sounds representing 4 different cities in California: SD.mid for San Diego, LA.mid for Los Angeles, SC.mid for Sacramento, and SF.mid for San Francisco. Each MIDI starts at a different base octave and scale based on its geographic location (northern cities get higher base octaves): SD.mid is C4 with C natural minor, LA.mid is C5 with C major, SF.mid is C6 with C harmonic minor, and SC.mid is C7 with C melodic minor. After making the 4 MIDIs, we mixed them together into a representation of California, calimix.mid. Then we adjusted the instruments for each city to add our own styles, giving calimix_various.mid. The new instruments are trumpet for San Diego, violin for Sacramento, church organ for San Francisco, and soundtrack for Los Angeles.
e) Texas daily increase: at 120 bpm on grand piano, we made 4 MIDI sounds representing 4 different cities in Texas: AU.mid for Austin, DA.mid for Dallas, EI.mid for El Paso, and HS.mid for Houston. Each MIDI starts at a different base octave and scale based on its geographic location (northern cities get higher base octaves): HS.mid is C4 with C natural minor, AU.mid is C5 with C major, EI.mid is C6 with C harmonic minor, and DA.mid is C7 with C melodic minor. After making the 4 MIDIs, we mixed them together into a representation of Texas, Texas.mid. Then we adjusted the instruments for each city to add our own styles, giving Texas_various.mid. The new instruments are electric guitar (clean) for Austin, electric bass (finger) for Dallas, electric piano for El Paso, and synth drum for Houston.
f) We then looked at the virus cases per county across the United States. Whenever cases increase, we assign a note by scaling the increase between 1 and the maximum one-day increase in the United States; the note pitch decreases as the case velocity increases. As the number of counties with rising case counts grows, the intensity of the sound increases. We then applied a style transfer to that MIDI file; the trumpet sounded quite good aesthetically, so we used it. The final result is trumpet.mid, and the original piano file is myfile.mid.
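The state mixes above (NYmix.mid, calimix.mid, Texas.mid) all reduce to the same operation: stacking four per-city MIDI files into one multi-track file. A minimal sketch with mido, using the New York State filenames; giving each city its own channel is an assumption that lets the per-city instrument changes apply independently:

```python
from mido import MidiFile

city_files = ["AL.mid", "WY.mid", "SA.mid", "ES.mid"]   # New York State cities
mix = MidiFile(type=1)                                   # type 1: tracks play simultaneously

for channel, path in enumerate(city_files):
    src = MidiFile(path)
    mix.ticks_per_beat = src.ticks_per_beat              # assumes all four share one resolution
    for track in src.tracks:
        for msg in track:
            if hasattr(msg, "channel"):                  # meta/sysex messages carry no channel
                msg.channel = channel
        mix.tracks.append(track)

mix.save("NYmix.mid")
```

For the inverted mapping in f), the miditime scaling from the earlier sketch applies with the percentage flipped (1 - pct) before scale_to_note, which makes the pitch fall as the one-day increase grows.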
GIF output:
a) We have two GIF outputs; new_york.gif represents only the New York City data and corresponds to the MIDI new_york_daily.mid.
b) new_york_state.gif corresponds to NYmix.mid.
c) The three other mixed MIDIs have no corresponding GIFs; the generation code is packaged in Generate Gif.ipynb, but since this project focuses mainly on sound generation, we did not include them in the results.
By making MIDI sounds from the COVID-19 data of different American cities and mixing them by state, we created a series of musical artworks that give people a unique insight into this pandemic and help us remember its victims and the event as a whole. The MIDI files fall into 3 categories: 1. state MIDIs mixed from multiple cities (calimix.mid, NYmix.mid, Texas.mid, and their corresponding ~_various.mid versions); 2. city MIDIs, 4 cities for each state plus 2 MIDIs for New York City; 3. a total representation of daily new cases in the US. In general, the MIDIs are repetitive in their first half, reflecting the quiet first two months of the American COVID-19 pandemic with a dull but nervous melody. After that, dissonance appears in the MIDIs made from daily increase data, while the MIDI made from New York City's overall case trend climbs higher and higher. By the end, the notes in the state-wide and nation-wide MIDIs (categories 1 and 3) are jumping out from everywhere, creating a chaotic, disordered feeling. The overall-cases MIDI keeps rising while the city daily-increase MIDIs get madder and madder. We worried at first about making meaningless sounds, but the steady overall trend and the growing madness in our music reflect this pandemic well, and our MIDIs turned out to be potentially impactful.
The whole idea and process of making this music is quite innovative compared with the traditional approach, since it works from a data science perspective instead of a traditional artistic one. When artists want to express their understanding of certain events, they observe those events, convert the observations into their own feelings, and eventually make artworks based on those feelings. Our project, however, goes directly from observation to artwork first and then adjusts the work with our feelings, which provides more objectivity. Additionally, although MIDI and programmatic MIDI generation are not rare in modern music production, making sounds by scaling statistical data and applying style adjustments in code is still novel from a technical perspective.
By representing COVID-19's confirmed cases across the United States in the form of music, we believe we can deliver the message of how dangerous the virus is and how we got from the beginning to today's pandemic. Music is different from cold numbers; it goes beyond representing information, because the ideas and emotions music carries let it deliver feelings as well as facts. There are limitations, however: disorganized music is just a mixture of pitches, which can grate on listeners instead of conveying the feelings we intended. Aware of this problem, we tried to make our music more diverse by shifting instruments. In Texas_various.mid we added drum sounds in the background to let people feel the intense beat of the pandemic, and the soundtrack in calimix_various.mid gives an eerie, frightening feeling about California's outbreak. New York City and New York State are the epicenter of this pandemic in America, so we placed much more emphasis on the diversity of delivering their data: we provide both overall case trends and daily increase MIDIs, added 2 choir melodies to create a hollow feeling, and made GIFs to support people with traditional visualization. Finally, to give an overall sense of how this pandemic looks, we made a MIDI that includes every county's daily increase data and plays it on trumpet. That MIDI, generated from all US county case increases, is a more overarching representation of COVID-19's growth. A surprising result was the amount of time between the first outbreak and the general start of the pandemic; it was quite interesting and chilling to witness the birth of something truly terrible. We hope these efforts turn abstract, chaotic sounds into more accessible expressions of the ongoing COVID-19 event.
"A dust of the time becomes a mountain when it lands on an individual's shoulder." Behind the cold data, we see lifes that are just like us. As the pandemic grows, people may get used to the tens of thousands daily increase of new cases. However, we want to give dead data with life, and the cold charts with temperature. Therefore, we bring out this suite of COVID-19 data with the addition of gifs for the New York area. Now you can hear the numbers and see the pandemic developing on this land. By sharing this direct impact to the ones who hear it, we want to ignite people's concern about this ongoing pandemic again and memorize the whole event better. We wish people can remember, if once forget, their sympathy towards those victims when hearing the choirs in NYmix_various.mid, and understand the urgency of the pandemic when the drum is beating in Texas_various.mid. After all, there is something we cannot forget during this time, and our project is intended to help us memorize some of those things, not only the data. That's not only for the people who live in the current, but also for the future.
Throughout the project, we found that techniques for audio style transfer are limited, leaving us little to try in generating better music. In the future we hope to explore more techniques and tools for style-transferring audio into music to better express these statistics. Moreover, as mentioned earlier, technical issues kept us from running an RNN for further analysis and exploration. To expand this project, we not only want to explore new techniques and tools; we also wish to collect more data so we can build and train our own neural network. We hope to apply the concept of this project to other events and pandemics throughout history. Last but not least, we want to use data from all over the world so everyone can understand and remember what COVID-19 has done, and is doing, to us.
- Byungheon Jeong: acquired some of the initial data; made the MIDI covering all US counties and ran it through the style transfer made by Yiheng.
- Sabrina Ho: made the presentation slides and recorded the final presentation; wrote the abstract, parts of the data and model section, parts of the code section, and the discussion section.
- Xinrui Zhan: made the GIFs presented in the results section; wrote parts of the code, results, and discussion sections.
- Yiheng Ye: did the data preprocessing; wrote parts of the Data and Model, Code, Results, and Discussion sections of the report; made most of the MIDI sounds for this project (those described in paragraphs a-e of the music results).
Libraries and packages:
- Miditime: this library plays time series data as music.
- Installation: pip install miditime
- Mido: this library works with MIDI files.
- Installation: pip install mido
This project can be run on any platform; no specific platform is required.