Implementation of Deep Belief Networks #3

Merged: 2 commits into dfdx:master on Dec 16, 2014

Conversation

jfsantos
Contributor

This is a cleaned-up version of my DBN implementation using the RBMs in Boltzmann.jl. For now it is a really simple extension, as it just adds a new type DBN and a function to fit it, as well as a helper function to compute the mean of the hiddens at a given layer. The user can only change the type of the first RBM because, in most of the applications I've seen, all the upper layers are Bernoulli RBMs, but this can easily be changed if needed.
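Roughly, the shape of that addition looks like the sketch below. The names and signatures are illustrative rather than the exact code in this PR, and it assumes Boltzmann.jl's RBM type plus a helper such as mean_hiddens(rbm, X) returning the hidden-unit means of a single RBM (the real helper may be named differently):

```julia
# Illustrative sketch of the DBN extension -- not necessarily the exact code in this PR.
type DBN
    layers::Vector{RBM}    # first layer may be Gaussian or Bernoulli; upper layers Bernoulli
end

# Greedy layer-wise training: fit each RBM on the hidden means of the layer below.
function fit(dbn::DBN, X::Matrix{Float64})
    input = X
    for rbm in dbn.layers
        fit(rbm, input)
        input = mean_hiddens(rbm, input)   # assumed helper: hidden-unit means for one RBM
    end
    dbn
end

# Helper: mean hidden activations at layer k for input X.
function means_at_layer(dbn::DBN, X::Matrix{Float64}, k::Int)
    input = X
    for i in 1:k
        input = mean_hiddens(dbn.layers[i], input)
    end
    input
end
```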

I also added an example that uses the MNIST dataset to train it. The HDF5 file is generated by this script from the Mocha.jl package. I think we could test it with a simpler dataset and add that test in a separate folder (e.g., examples/).

@@ -0,0 +1,9 @@
using HDF5, Boltzmann

f = h5open("/Users/jfsantos/.julia/v0.3/Mocha/examples/mnist/data/train.hdf5")
dfdx (Owner)

There's a separate package for this dataset - MNIST.jl - so there's no need to read the data manually. The only possible detail is that you may need to scale X to the range [0..1], since RBMs behave really badly with values larger than 1.
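In code, that suggestion might look roughly like this. It is only a sketch: it assumes traindata() from MNIST.jl returns the images as a features-by-examples matrix with raw 0-255 pixel values, and it uses Boltzmann.jl's BernoulliRBM as an example:

```julia
# Sketch only: load MNIST via MNIST.jl and rescale to [0, 1] before fitting.
# Assumes traindata() returns (features, labels) with pixel values in 0..255.
using MNIST, Boltzmann

X, labels = traindata()
X = X ./ maximum(X)                  # scale to [0, 1]; RBMs misbehave on larger values

rbm = BernoulliRBM(size(X, 1), 100)  # visible units = number of pixels, 100 hidden units
fit(rbm, X)
```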

jfsantos (Contributor, Author)

Sure, I can do that. The HDF5 file generated by the referenced script is already normalized to the correct range.

@dfdx
Owner

dfdx commented Dec 13, 2014

Thanks for contributing it! DBNs are a logical continuation of RBMs, but I never had time to implement them. A few details I was thinking of:

  1. We should be able to pass parameters to RBM constructors (and possibly to individual fit() calls on them). Probably the easiest way to achieve this would be to pass initialized layers, similar to Pipeline from scikit-learn (see the sketch after this list).
  2. Though this package is not intended to be superseded or replaced by Mocha.jl (e.g. I have successfully used a pure RBM on sparse data for a recommendation engine, which Mocha is really not designed for), some integration with it is very welcome. I especially like their replaceable backends, which simplify writing code for CPU and GPU a lot. On the other hand, as far as I know, they still lack belief networks, and we can fix that. Right now I'm busier with some classification algorithm packages, but taking a closer look at Mocha is definitely on my TODO list.
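For item 1, a hypothetical sketch of what passing pre-initialized layers could look like (the names here are illustrative, not actual Boltzmann.jl code):

```julia
# Hypothetical interface sketch, in the spirit of scikit-learn's Pipeline:
# the user constructs each RBM with its own parameters and hands the list to the DBN.
layers = [GRBM(784, 256),            # e.g. Gaussian visible units for real-valued input
          BernoulliRBM(256, 128),
          BernoulliRBM(128, 64)]
dbn = DBN(layers)
fit(dbn, X)                          # greedy layer-wise training, one RBM at a time
```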

@jfsantos
Contributor Author

Regarding 1, I think it is definitely important. We can write an improved constructor and fit functions to do this. The fit function for DBNs could take a list of arguments to be passed to each layer's fit call, for example.

I am trying to contribute to Mocha as well, and was thinking about adding replaceable backends to your RBM implementations. Basically, the compute-intensive functions from layers take a Backend instance as an argument and dispatch on the type of this argument (e.g., you have forward(b::CPUBackend, layer, X) and forward(b::GPUBackend, layer, X)). We could do pretty much the same thing for RBMs.
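A minimal sketch of that dispatch idea (the Backend types below are stand-ins, not Mocha's actual definitions, and the RBM fields W and hbias are assumptions):

```julia
# Sketch of dispatch-on-backend; the types here are illustrative stand-ins,
# not Mocha.jl's real Backend hierarchy.
abstract Backend
type CPUBackend <: Backend end
type GPUBackend <: Backend end

# The compute-heavy step is written once per backend; training code never branches.
forward(b::CPUBackend, rbm, X) = rbm.W * X .+ rbm.hbias   # plain BLAS path (assumes W is n_hid x n_vis)
forward(b::GPUBackend, rbm, X) = error("GPU kernel would go here")

# fit(rbm, X; backend=CPUBackend()) would then just call forward(backend, rbm, X).
```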

It would be interesting to add some integration with Mocha, even though their "philosophies" are a bit different (which makes sense, as training algorithms for RBMs and belief networks differ from those used for feed-forward nets). We could start by automating the process of performing unsupervised training of a DBN and then converting it to an MLP for supervised fine-tuning. This is exactly what I am doing now for my project, so I'll see if I can come up with a draft implementation.

dfdx added a commit that referenced this pull request on Dec 16, 2014
Implementation of Deep Belief Networks
dfdx merged commit 0e925aa into dfdx:master on Dec 16, 2014
@dfdx
Owner

dfdx commented Dec 16, 2014

@jfsantos If you don't mind, I changed the test to use the MNIST package instead of loading the file from the Mocha directory.

@jfsantos
Contributor Author

Sure, I think that is the way to go, as MNIST.jl already includes the data and does not require manually running a script as Mocha does.


jfsantos mentioned this pull request on Dec 20, 2014
@pluskid

pluskid commented Dec 21, 2014

Hi, I'm the author of Mocha. I agree that some integration of the two packages would be really nice for the community. For example, the immediate thing I can think of is using Boltzmann.jl to initialize weights for a DNN that then gets fine-tuned in Mocha.jl. This should be relatively straightforward if you export the trained weights to an HDF5 file and ask Mocha to load those weights as the initialization. Mocha already uses this kind of mechanism to load models trained by Caffe. The HDF5 file Mocha reads has a simple format; see here: http://mochajl.readthedocs.org/en/latest/user-guide/tools/import-caffe-model.html#mocha-s-hdf5-snapshot-format

Of course, we can discuss the data format if needed. :)
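On the Boltzmann.jl side, the export could be as simple as the sketch below with HDF5.jl. The dataset names and the RBM fields (W, hbias) are placeholders; the actual names and layout would have to follow Mocha's snapshot format linked above:

```julia
# Rough sketch of dumping trained DBN weights with HDF5.jl.
# Dataset names are placeholders -- the real ones must match Mocha's HDF5 snapshot format.
using HDF5

function save_for_mocha(path::String, dbn)
    h5open(path, "w") do f
        for (i, rbm) in enumerate(dbn.layers)
            write(f, "weights_$i", rbm.W)     # assumes the RBM stores its weights in W
            write(f, "bias_$i", rbm.hbias)    # and its hidden biases in hbias
        end
    end
end
```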

@dfdx
Owner

dfdx commented Dec 22, 2014

@pluskid I believe HDF5 will work fine. I'll have a long weekend starting Thursday to spend on (finally) learning Mocha and trying to implement this kind of export. Meanwhile, is there an example of converting Julia arrays to a Mocha-compatible 4D tensor?

@pluskid

pluskid commented Dec 23, 2014

@dfdx Starting with the latest version (v0.0.5), Mocha actually supports ND-tensors. An ND-tensor (Blob) is essentially a (shallow wrapper of a) Julia array Array{Float64, N} (or Float32). So if you have a Julia array and want to save it to an HDF5 file that Mocha can read, no conversion is needed, except that Mocha only supports Float32 or Float64, because BLAS only supports those.

For example, the weight blob of an InnerProduct layer is a 2D tensor (matrix) of shape P-by-Q, where P is the input dimension and Q is the target dimension. So essentially rand(Float64, (P, Q)) could be a valid initialization for the weight parameters.
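As a concrete, purely illustrative example of the shapes involved:

```julia
# Illustrative only: an InnerProduct-style weight matrix of shape (P, Q),
# kept as a plain Float64 Julia array -- exactly what a Mocha Blob wraps.
P, Q = 784, 100              # input dimension, target (output) dimension
W = rand(Float64, (P, Q))    # random but shape-valid weight initialization
size(W)                      # -> (784, 100); no conversion needed before saving to HDF5
```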

If you are interested, there is a bit of documentation about Blobs (ND-tensors) in Mocha: http://mochajl.readthedocs.org/en/latest/dev-guide/blob.html

@dfdx
Owner

dfdx commented Jan 3, 2015

I've added export to Mocha as part of the DBN redesign.
