fix(dgraph): Fix out of order issues with split keys in bulk loader. #6083
Conversation
func (r *reducer) createTmpBadger() *badger.DB {
	tmpDir, err := ioutil.TempDir(r.opt.TmpDir, "split")
	x.Check(err)
	db := r.createBadgerInternal(tmpDir)
createBadgerInternal creates a badger instance with compression enabled. I don't think we need compression for the temporary DB.
parasssh
left a comment
Looks good, but get Balaji to review as well.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @balajijinnah, @manishrjain, @martinmr, @parasssh, and @vvbalaji-dgraph)
martinmr
left a comment
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @vvbalaji-dgraph)
dgraph/cmd/bulk/reduce.go, line 151 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
createBadgerInternal creates a badger instance with compression enabled. I don't think we need compression for the temporary DB.
Done.
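For reference, a minimal sketch of what the resulting change could look like: opening the temporary Badger instance with compression disabled. This assumes Badger v2's managed mode; the helper name openTmpBadger and the option choices beyond compression are illustrative, not the PR's exact code.

package main

import (
	"io/ioutil"
	"log"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// openTmpBadger opens a throwaway Badger instance for split keys.
// Compression is disabled: the data is short-lived and copied out
// again later, so compressing it only costs CPU.
func openTmpBadger(tmpRoot string) (*badger.DB, error) {
	dir, err := ioutil.TempDir(tmpRoot, "split")
	if err != nil {
		return nil, err
	}
	opts := badger.DefaultOptions(dir).
		WithSyncWrites(false).        // temporary data, durability not needed
		WithCompression(options.None) // the reviewer's point: skip compression
	return badger.OpenManaged(opts)
}

func main() {
	db, err := openTmpBadger("")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}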
manishrjain
left a comment
Didn't check the logic carefully, but looks alright overall.
Reviewed 2 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @martinmr, and @vvbalaji-dgraph)
dgraph/cmd/bulk/reduce.go, line 86 at r2 (raw file):
splitWriter := tmpDb.NewManagedWriteBatch()
ci := &countIndexer{reducer: r, writer: writer, splitWriter: splitWriter, tmpDb: tmpDb}
move this to multiple lines.
dgraph/cmd/bulk/reduce.go, line 428 at r2 (raw file):
// value log from growing over the allowed limit.
if splitBatchLen >= maxSplitBatchLen {
	x.Check(writer.Flush())
Do you need to explicitly call flush? I don't understand this -- so maybe think carefully here.
martinmr
left a comment
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @vvbalaji-dgraph)
dgraph/cmd/bulk/reduce.go, line 86 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
move this to multiple lines.
Done.
dgraph/cmd/bulk/reduce.go, line 428 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Do you need to explicitly call flush? I don't understand this -- so maybe think carefully here.
I cannot use a single WriteBatch or else the value log will grow too big, so I am using multiple ones instead. According to the Badger comments, Flush needs to be called to ensure that any pending writes are written to Badger. Since the next line creates a new writer, the current one should be flushed here.
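A minimal sketch of the rotation described above, assuming Badger v2's managed mode; writeSplits, maxSplitBatchLen, and the entries slice are illustrative stand-ins for the bulk loader's actual variables:

package bulk // hypothetical package name

import (
	badger "github.com/dgraph-io/badger/v2"
)

// writeSplits flushes the current WriteBatch once it holds
// maxSplitBatchLen entries and then starts a fresh one, so the
// value log backing the pending writes cannot grow unbounded.
func writeSplits(db *badger.DB, entries []*badger.Entry, ts uint64) error {
	const maxSplitBatchLen = 10000 // illustrative limit

	wb := db.NewManagedWriteBatch()
	batchLen := 0
	for _, e := range entries {
		if err := wb.SetEntryAt(e, ts); err != nil {
			return err
		}
		batchLen++
		if batchLen >= maxSplitBatchLen {
			// Flush commits everything buffered in this batch; without
			// it, pending writes would be lost when the batch is dropped.
			if err := wb.Flush(); err != nil {
				return err
			}
			wb = db.NewManagedWriteBatch()
			batchLen = 0
		}
	}
	return wb.Flush() // flush the final, partially filled batch
}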
Split keys for indexes can cause out-of-order issues due to the variable length of the term inside the key. To fix the issue, the split keys are written to a temporary DB first (using a WriteBatch to avoid the out-of-order issues) and then copied to the main p directory.
Related to DGRAPH-1897
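To make the description concrete, a hedged end-to-end sketch of the copy step, assuming Badger v2 in managed mode; copySplits and readTs are illustrative names, not the PR's actual identifiers:

package bulk // hypothetical package name

import (
	badger "github.com/dgraph-io/badger/v2"
)

// copySplits reads the split keys back out of the temporary DB, which
// returns them in sorted key order, and writes them into the main
// (p directory) DB. Routing the splits through a temporary DB first
// lets Badger do the sorting, sidestepping the out-of-order writes
// caused by the variable-length term embedded in index keys.
func copySplits(tmpDb, mainDb *badger.DB, readTs uint64) error {
	wb := mainDb.NewManagedWriteBatch()
	defer wb.Cancel() // no-op if Flush has already succeeded

	txn := tmpDb.NewTransactionAt(readTs, false)
	defer txn.Discard()

	it := txn.NewIterator(badger.DefaultIteratorOptions)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		val, err := item.ValueCopy(nil)
		if err != nil {
			return err
		}
		entry := badger.NewEntry(item.KeyCopy(nil), val)
		if err := wb.SetEntryAt(entry, item.Version()); err != nil {
			return err
		}
	}
	return wb.Flush()
}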