Modify loop in initial assignments of lda to use sparse structure. #213

jmoralez · 2020-06-22T03:22:55Z

Hi, thanks for writing this awesome package. It really helped me grasp the idea of the collapsed Gibbs sampler. Here's my attempt to give back to it.

The current implementation of the initial assignments for LDA iterates through the document-term matrix row by row without taking its sparse structure into account, which makes it very slow in some circumstances (~50 minutes for an 800,000 x 20,000 case). I've modified the loop to exploit the sparse structure of the matrix by iterating only through the non-zero rows of each column. This achieves a substantial improvement (the 800,000 x 20,000 case goes down to ~2 minutes).
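To illustrate the idea, here is a minimal sketch in plain Python of visiting only the stored entries of a matrix kept in compressed sparse column (CSC) form. It is not the code from this PR, which is written in Julia. The helper names `csc_from_dense` and `init_assignments`, the two count tables, and the uniform random topic draw are all assumptions made for the example.

```python
import random


def csc_from_dense(rows):
    """Build CSC arrays (colptr, rowval, nzval) from a dense list of rows."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    colptr, rowval, nzval = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if rows[i][j] != 0:
                rowval.append(i)         # row index of a stored entry
                nzval.append(rows[i][j])  # its value (a term count)
        colptr.append(len(rowval))        # where the next column starts
    return colptr, rowval, nzval


def init_assignments(colptr, rowval, nzval, n_docs, n_topics, rng):
    """Assign a random topic to every token, touching only non-zero entries.

    Hypothetical helper: the count tables below are an assumption about
    what an LDA initialisation needs, not the package's data structures.
    """
    n_terms = len(colptr) - 1
    doc_topic = [[0] * n_topics for _ in range(n_docs)]
    topic_term = [[0] * n_terms for _ in range(n_topics)]
    for term in range(n_terms):
        # Only the stored (non-zero) rows of this column are visited.
        for k in range(colptr[term], colptr[term + 1]):
            doc, count = rowval[k], nzval[k]
            for _ in range(count):
                topic = rng.randrange(n_topics)
                doc_topic[doc][topic] += 1
                topic_term[topic][term] += 1
    return doc_topic, topic_term


# Tiny document-term matrix: 3 documents (rows) x 4 terms (columns).
dtm = [
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]
colptr, rowval, nzval = csc_from_dense(dtm)
doc_topic, topic_term = init_assignments(
    colptr, rowval, nzval, n_docs=len(dtm), n_topics=2, rng=random.Random(0)
)
```

In this sketch the inner loops run once per stored entry (plus once per token), so the work scales with the number of non-zeros rather than with documents x terms, which is where the speed-up on a large, mostly empty matrix would come from.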

modify loop in initial assignments of lda to use sparse structure.

8d2967d

aviks merged commit ec4c1e8 into JuliaText:master Jul 7, 2020

jmoralez deleted the lda_init branch July 9, 2020 04:12
