jmoralez
Contributor

Hi, thanks for writing this awesome package. It really helped me grasp the idea of the collapsed Gibbs sampler. Here's my attempt to give back.

The current implementation of the initial assignments in LDA iterates through the document-term matrix by rows and does not take its sparse structure into account, which makes it very slow in some circumstances (~50 minutes for an 800,000 x 20,000 case). I've modified the loop to exploit the sparsity by iterating over only the non-zero rows of each column, which gives a substantial improvement (the 800,000 x 20,000 case goes down to ~2 minutes).
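For readers unfamiliar with the idea, here is a rough sketch of what "iterating through the non-zero rows of each column" can look like for a matrix in compressed sparse column (CSC) form. It is written in plain Python for illustration and is not the Julia code from this PR. The names `iter_stored_entries` and `init_topic_counts`, the 0-based indexing, and the uniform random topic draw are all assumptions made for the example.

```python
import random


def iter_stored_entries(colptr, rowval, nzval):
    """Yield (row, col, value) for every stored entry of a CSC matrix.

    colptr[j]:colptr[j + 1] is the slice of rowval/nzval that belongs to
    column j, so empty cells are never visited (0-based indices here).
    """
    n_cols = len(colptr) - 1
    for col in range(n_cols):
        for k in range(colptr[col], colptr[col + 1]):
            yield rowval[k], col, nzval[k]


def init_topic_counts(colptr, rowval, nzval, n_rows, n_topics, seed=0):
    """Hypothetical initialisation: give each counted occurrence a random topic.

    Returns per-row and per-column topic count tables. The work done is
    proportional to the number of stored entries (plus their counts),
    not to n_rows * n_cols.
    """
    rng = random.Random(seed)
    n_cols = len(colptr) - 1
    row_topic = [[0] * n_topics for _ in range(n_rows)]
    col_topic = [[0] * n_topics for _ in range(n_cols)]
    for row, col, count in iter_stored_entries(colptr, rowval, nzval):
        for _ in range(count):
            topic = rng.randrange(n_topics)  # assumed: uniform random draw
            row_topic[row][topic] += 1
            col_topic[col][topic] += 1
    return row_topic, col_topic


# A small 3 x 4 example matrix in CSC form:
# [[2, 0, 0, 1],
#  [0, 0, 3, 0],
#  [0, 1, 0, 0]]
colptr = [0, 1, 2, 3, 4]
rowval = [0, 2, 1, 0]
nzval = [2, 1, 3, 1]

row_topic, col_topic = init_topic_counts(colptr, rowval, nzval, n_rows=3, n_topics=2)
```

Only the 4 stored entries are visited here, rather than all 12 cells. At the sizes mentioned above, most cells of a document-term matrix are typically empty, which is presumably where the speedup comes from.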

aviks merged commit ec4c1e8 into JuliaText:master on Jul 7, 2020
jmoralez deleted the lda_init branch on July 9, 2020 04:12