
fix(dgraph): Fix out of order issues with split keys in bulk loader.#6083

Merged
martinmr merged 16 commits into master from martinmr/bulk-split
Aug 3, 2020

Conversation

@martinmr
Contributor

@martinmr martinmr commented Jul 28, 2020

Split keys for indexes can cause out-of-order issues due to the variable length of the term inside the key.
To fix this, the split keys are first written to a temporary DB (using a WriteBatch to avoid the out-of-order
issues) and then copied to the main p directory.

Related to DGRAPH-1897



@github-actions github-actions Bot added the area/bulk-loader Issues related to bulk loading. label Jul 28, 2020
@martinmr martinmr changed the title fix(dgraph): batch and write split keys in different streams during bulk load. fix(dgraph): Fix out of order issues with split keys in bulk loader. Jul 30, 2020
@martinmr martinmr requested review from a team, ashish-goswami, parasssh and poonai July 30, 2020 18:12
@martinmr martinmr marked this pull request as ready for review July 30, 2020 18:13
Comment thread dgraph/cmd/bulk/reduce.go Outdated
func (r *reducer) createTmpBadger() *badger.DB {
tmpDir, err := ioutil.TempDir(r.opt.TmpDir, "split")
x.Check(err)
db := r.createBadgerInternal(tmpDir)
Contributor


`createBadgerInternal` creates a badger instance with compression enabled. I don't think we need compression for the temporary DB.

Contributor

@parasssh parasssh left a comment


:lgtm: but get Balaji to review as well.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @balajijinnah, @manishrjain, @martinmr, @parasssh, and @vvbalaji-dgraph)

Contributor Author

@martinmr martinmr left a comment


Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @vvbalaji-dgraph)


dgraph/cmd/bulk/reduce.go, line 151 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

`createBadgerInternal` creates a badger instance with compression enabled. I don't think we need compression for the temporary DB.

Done.

Contributor

@manishrjain manishrjain left a comment


Didn't check the logic carefully, but looks alright overall.

Reviewed 2 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @martinmr, and @vvbalaji-dgraph)


dgraph/cmd/bulk/reduce.go, line 86 at r2 (raw file):

			splitWriter := tmpDb.NewManagedWriteBatch()

			ci := &countIndexer{reducer: r, writer: writer, splitWriter: splitWriter, tmpDb: tmpDb}

move this to multiple lines.


dgraph/cmd/bulk/reduce.go, line 428 at r2 (raw file):

		// value log from growing over the allowed limit.
		if splitBatchLen >= maxSplitBatchLen {
			x.Check(writer.Flush())

Do you need to explicitly call flush? I don't understand this -- so maybe think carefully here.

Contributor Author

@martinmr martinmr left a comment


Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @vvbalaji-dgraph)


dgraph/cmd/bulk/reduce.go, line 86 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

move this to multiple lines.

Done.


dgraph/cmd/bulk/reduce.go, line 428 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do you need to explicitly call flush? I don't understand this -- so maybe think carefully here.

I cannot use a single WriteBatch or the value log will grow too big, so I am using multiple ones instead. According to the badger comments, Flush needs to be called to ensure that any pending writes are written to badger. Since the next line creates a new writer, the current writer should be flushed here.

@martinmr martinmr merged commit 2a3b85c into master Aug 3, 2020
@martinmr martinmr deleted the martinmr/bulk-split branch August 3, 2020 16:13
martinmr added a commit that referenced this pull request Aug 3, 2020
…6083)

Split keys for indexes can cause out-of-order issues due to the variable length of the term inside the key.
To fix the issue, the split keys are written to a temporary DB first (using the writebatch to avoid the out
of order issues) and then copied to the main p directory.

Related to DGRAPH-1897

(cherry picked from commit 2a3b85c)
martinmr added a commit that referenced this pull request Aug 3, 2020
(cherry picked from commit 2a3b85c)

Labels

area/bulk-loader Issues related to bulk loading.

4 participants