Implementation of Deep Belief Networks #3
Conversation
@@ -0,0 +1,9 @@
```julia
using HDF5, Boltzmann

f = h5open("/Users/jfsantos/.julia/v0.3/Mocha/examples/mnist/data/train.hdf5")
```
There's a separate package for this dataset - MNIST.jl - so there's no need to read the data manually. The only possible detail is that you may need to scale `X` to the range [0..1], since RBMs behave really badly with values larger than 1.
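A minimal sketch of that suggestion, assuming MNIST.jl's `traindata()` returns the 784x60000 feature matrix with raw 0..255 pixel values (the labels are not needed here):

```julia
using MNIST

# traindata() gives the feature matrix and the label vector; labels are
# not needed for unsupervised RBM training.
X, _ = traindata()

# Pixels come in as 0..255, so rescale into [0, 1] before fitting.
X = X ./ 255.0
```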
Sure, I can do that. The HDF5 file generated by the referenced script is already normalized to the correct range.
Thanks for contributing it! DBNs are a logical continuation of RBMs, but I never had time to implement them. A few details that I was thinking of:
Regarding 1, I think it is definitely important; we can write an improved constructor.

I am trying to contribute to Mocha as well, and was thinking about adding replaceable backends to your RBM implementations. Basically, the compute-intensive functions from the layers get a Backend instance as an argument and dispatch depending on the type of this argument (e.g., you will have […]).

It would be interesting to add some integration with Mocha, even though their "philosophies" are a bit different (which makes sense, as training algorithms for RBMs and belief networks are a bit different from those used for feed-forward nets). We could start by automating the process of performing unsupervised training of a DBN and then converting it to an MLP for supervised fine-tuning. This is exactly what I am doing now for my project, so I'll see if I can come up with a draft implementation.
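A rough sketch (in current Julia syntax) of what that replaceable-backend idea could look like; the type and function names here are hypothetical and mirror Mocha's backend design, none of this exists in Boltzmann.jl yet:

```julia
# Hypothetical backend types; Mocha.jl has CPUBackend and GPUBackend,
# but these here are just stand-ins for the idea.
abstract type Backend end
struct CPUBackend <: Backend end
struct GPUBackend <: Backend end

# Minimal stand-in for an RBM that just holds a weight matrix.
mutable struct ToyRBM
    W::Matrix{Float64}
end

# Compute-intensive functions take the backend as an argument and
# dispatch on its type.
update_weights!(::CPUBackend, rbm::ToyRBM, dW) = (rbm.W .+= dW; rbm)
update_weights!(::GPUBackend, rbm::ToyRBM, dW) =
    error("the GPU path would launch CUDA kernels here")

rbm = ToyRBM(zeros(3, 3))
update_weights!(CPUBackend(), rbm, ones(3, 3))  # picks the CPU method
```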
@jfsantos If you don't mind, I changed the test to use the MNIST package instead of loading the file from the Mocha directory.
Sure, I think that is the way to go, as MNIST.jl already includes the data and does not require manually running a script like Mocha does.
Hi, I'm the author of Mocha. I agree that some integration between the two packages would be really nice for the community. The immediate thing I can think of is to use Boltzmann.jl to initialize weights for a DNN that then gets fine-tuned in Mocha.jl. This should be relatively straightforward if you export the trained weights to an HDF5 file and ask Mocha to load those weights as the initialization; Mocha already uses this kind of mechanism to load models trained by Caffe. The HDF5 file Mocha reads has a simple format, documented here: http://mochajl.readthedocs.org/en/latest/user-guide/tools/import-caffe-model.html#mocha-s-hdf5-snapshot-format Of course, we could discuss the data format if needed. :)
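A minimal sketch of such an export using HDF5.jl; the dataset paths below are placeholders, and the names and layout Mocha actually expects are described in the snapshot-format document linked above:

```julia
using HDF5

# Stand-ins for trained parameters of one layer.
W = randn(784, 100)
b = zeros(100)

# Placeholder dataset paths; replace with whatever the Mocha snapshot
# format prescribes for the corresponding layer.
h5open("dbn_init.hdf5", "w") do f
    write(f, "ip1/weight", W)
    write(f, "ip1/bias", b)
end
```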
@pluskid I believe HDF5 will work fine. I'll have a long weekend starting Thursday to spend on (finally) learning Mocha, and I'll try to implement this kind of exporting. Meanwhile, is there an example of converting Julia arrays to a Mocha-compatible 4D tensor?
@dfdx Starting with the latest version (v0.0.5), Mocha actually supports ND-tensors, and an ND-tensor (Blob) is essentially a (shallow wrapper of a) Julia array. For example, the weight blob of an InnerProduct layer is a 2D-tensor (matrix) of shape P-by-Q, where P is the input dimension and Q is the target dimension. So essentially the weight blob is just an ordinary P-by-Q Julia matrix.

If you are interested, there is a bit of documentation about Blobs (ND-tensors) in Mocha: http://mochajl.readthedocs.org/en/latest/dev-guide/blob.html
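To make the shape convention concrete, a plain-Julia illustration (no actual Mocha calls; the sizes are arbitrary):

```julia
# An InnerProduct weight blob for 784 inputs and 100 outputs wraps a
# P-by-Q Julia matrix like this one:
P, Q = 784, 100
W = randn(P, Q)
size(W)  # (784, 100)

# A 4D tensor (e.g. width x height x channels x batch) is likewise
# just a plain Julia array:
T = reshape(W, 28, 28, 1, 100)
size(T)  # (28, 28, 1, 100)
```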
I've added export to Mocha as part of the DBN redesign.
This is a cleaned-up version of my DBN implementation using the RBMs in Boltzmann.jl. For now it is a really simple extension: it just adds a new type `DBN` and a function to fit it, as well as a helper function to compute the means of the hiddens at a given layer. The user can only change the type of the first RBM, because in most of the applications I've seen all the upper layers are Bernoulli RBMs, but this can easily be changed if needed. I also added an example that uses the MNIST dataset to train it. The HDF5 file is generated by this script from the Mocha.jl package. I think we could test it with a simpler dataset and add that test in a separate folder (e.g., `examples/`).
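For a sense of the intended interface, a short usage sketch; `DBN`, its constructor arguments, and `mean_hiddens` are illustrative guesses based on the description above, not the final API:

```julia
using Boltzmann, MNIST

X, _ = traindata()
X = X ./ 255.0                    # scale inputs to [0, 1] for the RBMs

# Hypothetical constructor: configurable first-layer RBM type, Bernoulli
# RBMs for the upper layers, with the given layer sizes.
dbn = DBN([784, 256, 100], first=GRBM)

fit(dbn, X)                       # greedy layer-wise unsupervised training
H2 = mean_hiddens(dbn, X, 2)      # mean hidden activations at layer 2
```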