
Project Title

DSC160 Data Science and the Arts - Final Project - Generative Arts - Spring 2020

Project Team Members:

Abstract

Generative music is not a new idea, and has been around since at least 1989. However, the use of neural networks for this purpose was not popularized until recently (Yang et al., 2017). This project uses neural networks to generate music from MIDI jazz files. This presents many challenges: melody can be generated, but larger-scale structure is much harder to produce because of its variability. Our goal was to compose music with neural nets, attempting to generate melody from MIDI files scraped from the internet.

Right now even the best computer-generated music is not good enough to be considered an actual source of entertainment, and the examples that come close are usually heavily stylized compositions. In the future, however, there is a chance that machine-produced entertainment will rival that of human origin, and we tried to see how close we could get to that point with current algorithms and technology.

Data and Model

(10 points)

Models:

  • Our WaveNet Adaptation (we tested three different versions). We adapted the traditional WaveNet architecture, which uses raw waveform data, and changed it to accept sequences of integers representing notes. The model uses 1D dilated causal convolutional layers. The most important aspect is the dilation, which compensates for the small receptive field of plain convolutions by growing it exponentially across the hidden layers (see the sketch below).

Figure: WaveNet architecture, showing dilated convolutions.
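A minimal sketch of this idea, assuming a Keras-style setup in which notes are integer IDs and the model predicts the next note from a fixed-length window; the vocabulary size, window length, and layer sizes here are illustrative, not the exact values we used.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 300   # distinct notes/chords after preprocessing (assumed value)
SEQ_LEN = 32       # length of the input note window (assumed value)

def build_wavenet_like(vocab_size=VOCAB_SIZE, seq_len=SEQ_LEN):
    inputs = layers.Input(shape=(seq_len,))
    # Integer note IDs are embedded instead of using raw waveform samples.
    x = layers.Embedding(vocab_size, 100)(inputs)
    # Stacked causal 1D convolutions; doubling the dilation rate grows the
    # receptive field exponentially, which is the key WaveNet trick.
    for dilation in (1, 2, 4, 8):
        x = layers.Conv1D(64, kernel_size=3, padding="causal",
                          dilation_rate=dilation, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    # Softmax over the note vocabulary: predict the next note in the sequence.
    outputs = layers.Dense(vocab_size, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model
```

Generation then proceeds autoregressively: predict the next note, append it to the window, and slide the window forward.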

  • GANs. The idea of a GAN is to have a generator model creating data and a discriminator model classifying it as real or fake, with the two trying to outsmart each other. This speaks directly to our goal of making the music human-like: real/fake classification is both the means and the end. GANs have been shown to produce incredible results in image generation; however, traditional GANs struggle with the directionality of sequential data, which is central to music generation. The discriminator therefore has to rely on sequential neural networks, usually LSTMs (a minimal sketch of this pairing follows the figure below). We did not get any satisfying results with simpler GAN models, so our theory was that the generator does not get enough attention. An example of a well-built GAN model is MuseGAN, which utilizes three different approaches to note generation as well as a layer of bar generation. Sadly, we could not train it on our data, as its preprocessing is tied very specifically to the dataset its creators used.

Figure: MuseGAN structure.
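A hedged sketch of the generator/discriminator pairing described above, assuming fixed-length sequences of note values (as in the midi-lstm-gan notebook style); sequence length, latent size, and layer sizes are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 100     # notes per training sequence (assumed value)
LATENT_DIM = 128  # size of the random noise vector fed to the generator (assumed value)

def build_generator():
    # Maps a noise vector to a sequence of note values, one per timestep.
    return models.Sequential([
        layers.Dense(256, activation="relu", input_shape=(LATENT_DIM,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(SEQ_LEN, activation="tanh"),
        layers.Reshape((SEQ_LEN, 1)),
    ])

def build_discriminator():
    # The discriminator is sequential (LSTM-based) so it can judge the
    # *order* of notes, not just their overall distribution.
    return models.Sequential([
        layers.LSTM(256, input_shape=(SEQ_LEN, 1), return_sequences=True),
        layers.LSTM(128),
        layers.Dense(1, activation="sigmoid"),  # real vs. fake
    ])

generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(loss="binary_crossentropy", optimizer="adam")

# Combined model: the generator is trained to fool the (frozen) discriminator.
discriminator.trainable = False
noise = layers.Input(shape=(LATENT_DIM,))
gan = models.Model(noise, discriminator(generator(noise)))
gan.compile(loss="binary_crossentropy", optimizer="adam")
```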

  • Performance RNN. Magenta's Performance RNN uses an LSTM (Long Short-Term Memory) recurrent neural network. The point of an LSTM is to retain memory from previous training steps and use it later down the line, something that some other neural networks lack. Performance RNN models notes in a way similar to a MIDI file: each note is represented by note-on and note-off events, time-shift events, and a velocity event. Together these events capture a note and its particular dynamics (an illustrative encoding follows below).
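The snippet below is illustrative only (it is not Magenta's actual API): it shows roughly how a single MIDI note maps onto Performance RNN's event vocabulary of velocity, note-on, time-shift, and note-off events. The function name and time step are hypothetical.

```python
def note_to_events(pitch, velocity, start, end, time_step=0.01):
    """Encode one note as a list of (event_type, value) pairs.
    Times are in seconds; time shifts are quantized to `time_step`."""
    events = []
    events.append(("VELOCITY", velocity))          # set the dynamic level
    events.append(("NOTE_ON", pitch))              # start the note
    duration_steps = round((end - start) / time_step)
    events.append(("TIME_SHIFT", duration_steps))  # advance time while it sounds
    events.append(("NOTE_OFF", pitch))             # release the note
    return events

# Example: middle C held for half a second at velocity 80.
print(note_to_events(pitch=60, velocity=80, start=0.0, end=0.5))
# [('VELOCITY', 80), ('NOTE_ON', 60), ('TIME_SHIFT', 50), ('NOTE_OFF', 60)]
```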

Training Data:

Code

(20 points)

WaveNet:

  • Scraping video game music. Crawls a starting page for further download links found on it.
  • Base Model. First iteration of the model; covers the processing of MIDI files, then runs the baseline WaveNet model, originally run on the Maestro dataset.
  • WaveNet v2. Second iteration of the model; this time it adds removal of notes occurring fewer than X times (see the filtering sketch after this list) and changes the model's hyperparameters in an attempt to fix the generative process. Trained on video game music.
  • WaveNet Mini. A failed experiment in which we tried to see whether overfitting on a very small dataset would still produce decent results.
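A minimal sketch of the rare-note filtering step mentioned in WaveNet v2, assuming the notes/chords have already been extracted (e.g. with music21) into a flat list of strings; the threshold is an illustrative choice, not the exact X we used.

```python
from collections import Counter

def drop_rare_notes(notes, min_count=50):
    """Remove notes/chords that appear fewer than `min_count` times in the corpus."""
    counts = Counter(notes)
    keep = {n for n, c in counts.items() if c >= min_count}
    return [n for n in notes if n in keep]

# Toy example with a low threshold just to show the behavior.
print(drop_rare_notes(["C4", "C4", "E4", "G#7"], min_count=2))  # ['C4', 'C4']
```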

GANs:

  • Models. The two GAN models are in the GANs.zip archive. The Untitled notebook in the midi-lstm-gan subfolder can be run to train on the Maestro dataset.

Magenta's performance_rnn:

  • PerformanceRNN. The Magenta environment was set up according to their installation instructions here; this code was then run once the environment was complete.

Results

(30 points)

WaveNet:

  • Version 1 (maestro) Sample. In this version we can hear a melody forming; at the time, this was the first sample that was not just the same note over and over again. To remedy this we wanted to try a new dataset and move the model away from its baseline, which is where the next sample comes in.
  • Version 1 (schubert) failed sample. From trying the model trained on a very small dataset named schubert; not much good came from this model, with barely any pleasant melody in the samples.
  • Version 2 (video game) Sample. With this sample we can hear the heavy video game music influence, as it sounds somewhat similar to Final Fantasy title screen music (about 20 of the roughly 800 data samples were from Final Fantasy). This was a definite improvement over the base model, possibly due to the changes to the generative process from tuning the hyperparameters linked from this code notebook.

Muse_GAN:

  • midi-lstm-gan after 3000 epochs. A constant stream of almost-random notes; this model seemed to perform the worst of all of them, the main reason being that the music21 MIDI parser failed to parse the Maestro files correctly.
  • midi-lstm-gan loss per epoch graph. This graph shows the discriminator and generator losses converging at a somewhat high value (the x-axis is in hundreds of epochs), most likely signifying an underfit model.

Magenta's performance_rnn:

  • first run after 300 steps, on a fraction of the dataset. An incoherent stream of notes following the seed stream of notes. Not only was the model trained on only a fraction of the dataset, it was trained for only 300 steps. Like a monkey slamming its hands on the keyboard.

  • second run after 3000 steps, on the same dataset. Still incoherent, but the timing is a little more recognizably musical. The monkey listened to 48 hours' worth of Mozart, and now thinks itself an artist.

  • third run using Magenta's pre-trained model. Nothing close to a real song, but the timing and musicality are close. The note progression needs work, but it is much better. Our monkey has been going to classes for the past 3 months, and this is its first recital.

Discussion

Advancements in the field of generative art have been quite spectacular. With music, the best advancements were made by applying specific models to specific datasets, and, as our experience shows, the models malfunction when presented with music that has less structure or comes from different genres. The reason, we believe, is that our models ended up underfit. The algorithms tried to extract patterns from many (but not too many) very different samples, and, limited by the memory and power of our PCs, we were definitely not feeding them enough data for any clear patterns to solidify. Of course, one solution would be to give the models even less data, which would cause overfitting. For this task specifically, overfitting would mean the algorithms would produce music very similar to the chosen set, which would make it sound good; however, would it even be new music at that point? Imitating and following rules are tasks that computers excel at anyway, so there is no point in having nice-sounding results if they come from overfitting. To avoid both underfitting and overfitting, we would have had to feed the models far more data than we possibly could.

Trying different models shed some light on the performance issues and advantages of certain underlying neural nets. The three most popular algorithms in music generation turned out to be LSTMs, GANs, and encoder-decoder networks. LSTMs seem to be the most widely used, as they capture both short- and long-term dependencies very well; this matters for music, where there needs to be consistency not only within each bar, but also between bars within phrases, as well as between the phrases themselves. Furthermore, music often relies on set verse-chorus structures, and, hopefully, LSTMs can take care of that as well. GANs and encoder-decoders usually play secondary roles in music generation, but they are also quite useful and important for more advanced models.

So what would an ideal generative music model look like? The ideal model would encompass all possible variations of music, and it is hard to tell what we are currently lacking. On the model side, it would be perfect if we could utilize some kind of “music theory of everything”, although the point of applying neural networks to this task is to let the models figure that theory out. The problem, therefore, is on the data side: in the data itself and the way it is processed. Human listeners rely on more than note information for music appreciation; for example, we expect different instruments to play different parts, different genres to have different structures, and so on. On top of that, the idea of “enjoyment” is unquantifiable, so there seems to be no real measure of how well a model is doing without a human supervisor, and even then the opinion is subjective. Ideally, we would have compact data that encompasses a lot of information, including (but not limited to): note info (MIDI covers that pretty nicely), bar info, instrument info, genre info, and maybe even some type of sentiment info. The good news is that all of these measurements are achievable to some capacity, and a GAN discriminator could assume the role of an objective human supervisor, so we believe it would be possible to create a near-perfect dataset and train a near-perfect model that could create catchy, human-like music for everyone to enjoy.

Team Roles

  • Nikolas Racelis Russell: WaveNet model
  • Iakov Vasilyev: Muse_GAN model
  • Cameron Shaw: Magenta performance_RNN model

Technical Notes and Dependencies

For WaveNet and GANs:

  • Tensorflow-gpu (latest version as of 6/11/20)
  • music21
  • scikit-learn
  • numpy

For Magenta (requires many different packages, but mainly):

  • tensorflow

Most dependencies are taken care of after running the environment setup.

Reference
