Skip to content

Add new option in debug tool to get a histogram of key and value sizes.#2844

Merged
martinmr merged 3 commits intomasterfrom
martinmr/histogram
Dec 27, 2018
Merged

Add new option in debug tool to get a histogram of key and value sizes.#2844
martinmr merged 3 commits intomasterfrom
martinmr/histogram

Conversation

@martinmr
Copy link
Copy Markdown
Contributor

@martinmr martinmr commented Dec 25, 2018

The new option can be accessed via the --histogram flag. It will print a
histogram of the key and value sizes, as well as the min, max, mean and
standard deviation of their distributions.


This change is Reviewable

@martinmr martinmr requested a review from manishrjain December 25, 2018 02:41
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Dec 25, 2018

CLA assistant check
All committers have signed the CLA.

@martinmr
Copy link
Copy Markdown
Contributor Author

Example output below (fyi, it looks justified in the terminal but not when I pasted it here):

Opening DB: ./p
badger2018/12/25 02:41:37 INFO: Replaying file id: 2 at offset: 60334520
badger2018/12/25 02:41:37 INFO: Replay took: 21.41µs
prefix =
Found 1036362 keys

Histogram of key sizes (in bytes)
Min value: 6
Max value: 33
Mean: 23.51
Standard deviation: 5.69
Range Count
[ 0, 32) 910747
[ 32, 64) 125615

Histogram of value sizes (in bytes)
Min value: 0
Max value: 2432
Mean: 37.71
Standard deviation: 45.32
Range Count
[ 0, 32) 635819
[ 32, 64) 261183
[ 64, 128) 131757
[ 128, 256) 5804
[ 256, 512) 850
[ 512, 1024) 690
[ 1024, 2048) 250
[ 2048, 4096) 9

Copy link
Copy Markdown
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 8 unresolved discussions (waiting on @martinmr and @manishrjain)


dgraph/cmd/debug/run.go, line 82 at r1 (raw file):

	flag.BoolVarP(&opt.keyHistory, "history", "y", false, "Show all versions of a key.")
	flag.StringVarP(&opt.pdir, "postings", "p", "", "Directory where posting lists are stored.")
	flag.BoolVar(&opt.sizeHistogram, "histogram", false, "Show a histogram of the key and value sizes.")

100 chars.


dgraph/cmd/debug/run.go, line 530 at r1 (raw file):

// Return a new instance of HistogramData with properly initialized fields.
func NewHistogramData(bounds []float64) *HistogramData {
	histogram := new(HistogramData)

Can also be written as:

return &HistogramData{
Bounds: bounds,
Min: math.MaxInt64,
}

Slightly easier to read.


dgraph/cmd/debug/run.go, line 539 at r1 (raw file):

// Update the Min and Max fields if value is less than or greater than the
// current values.
func (histogram *HistogramData) UpdateMinMax(value int64) {

Also, find the right bucket, and +1 it.


dgraph/cmd/debug/run.go, line 555 at r1 (raw file):

// Simple exporter function that copies the view data to the exporter.
func (exporter HistogramExporter) ExportView(viewData *view.Data) {

Remove all this.


dgraph/cmd/debug/run.go, line 611 at r1 (raw file):

func sizeHistogram(db *badger.DB) {
	txn := db.NewTransactionAt(math.MaxUint64, false)

Use opt.readTs instead of maxuint64.


dgraph/cmd/debug/run.go, line 628 at r1 (raw file):

	keySizeMeasure := stats.Int64("debug/key_sizes",
		"The size of the keys in bytes", "1")
	keySizeView := &view.View{

Get rid of the opencensus stuff.


dgraph/cmd/debug/run.go, line 673 at r1 (raw file):

		keySize := int64(len(item.Key()))
		valueSize := item.ValueSize()
		keySizeHistogram.UpdateMinMax(keySize)

.Update


dgraph/cmd/debug/run.go, line 684 at r1 (raw file):

	// printing.
	sleepDuration, _ := time.ParseDuration("500ms")
	time.Sleep(sleepDuration)

time.Sleep(500*time.Millisecond), or better still time.Sleep(time.Second).

Comment thread dgraph/cmd/debug/run.go Outdated
var (
Debug x.SubCommand
opt flagOptions
keySizeViewName = "debug/key_size_histogram"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keySizeViewName is unused (from varcheck)

Comment thread dgraph/cmd/debug/run.go Outdated
Debug x.SubCommand
opt flagOptions
keySizeViewName = "debug/key_size_histogram"
valueSizeViewName = "debug/value_size_histogram"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valueSizeViewName is unused (from varcheck)

Copy link
Copy Markdown
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 82 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.


dgraph/cmd/debug/run.go, line 530 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Can also be written as:

return &HistogramData{
Bounds: bounds,
Min: math.MaxInt64,
}

Slightly easier to read.

Done.


dgraph/cmd/debug/run.go, line 539 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Also, find the right bucket, and +1 it.

Done.


dgraph/cmd/debug/run.go, line 555 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove all this.

Done.


dgraph/cmd/debug/run.go, line 611 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Use opt.readTs instead of maxuint64.

Done.


dgraph/cmd/debug/run.go, line 628 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Get rid of the opencensus stuff.

Done.


dgraph/cmd/debug/run.go, line 673 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

.Update

Done.


dgraph/cmd/debug/run.go, line 684 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

time.Sleep(500*time.Millisecond), or better still time.Sleep(time.Second).

Done . Got rid of all the opencensus stuff so this is no longer needed.


dgraph/cmd/debug/run.go, line 43 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

keySizeViewName is unused (from varcheck)

Done.


dgraph/cmd/debug/run.go, line 44 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

valueSizeViewName is unused (from varcheck)

Done.

Comment thread dgraph/cmd/debug/run.go Outdated

var opt flagOptions
var (
Debug x.SubCommand
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is not goimports-ed (from goimports)

@martinmr
Copy link
Copy Markdown
Contributor Author

I compared the output with the previous version using opencensus to verify that they are the same.

Copy link
Copy Markdown
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 11 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 41 at r3 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not goimports-ed (from goimports)

This was due to a formatting issue. I formatted the file and the warning went away.

Copy link
Copy Markdown
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 11 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 41 at r3 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

This was due to a formatting issue. I formatted the file and the warning went away.

Done.

The new option can be accessed via the --histogram flag. It will print a
histogram of the key and value sizes, as well as the min, max, mean and
standard deviation of their distributions.
Copy link
Copy Markdown
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: Looks good. Got a few comments. Address them and feel free to merge.

Reviewed 1 of 1 files at r5.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @manishrjain, @golangcibot, and @martinmr)


dgraph/cmd/debug/run.go, line 562 at r5 (raw file):

	fmt.Printf("Min value: %d\n", histogram.Min)
	fmt.Printf("Max value: %d\n", histogram.Max)
	fmt.Printf("Mean: %.2f\n", histogram.Mean)

calculate the mean here.


dgraph/cmd/debug/run.go, line 617 at r5 (raw file):

		item := itr.Item()

		keySize := int64(len(item.Key()))

keySize := ...
keyHistogram.Update(keySize)

valSize := ...
valHistogram.Update(valSize)

Define the variable as close as possible to usage.


dgraph/cmd/debug/run.go, line 620 at r5 (raw file):

		valueSize := item.ValueSize()
		keySizeHistogram.Update(keySize)
		valueSizeHistogram.Update(valueSize)

valueSizeHistogra

Copy link
Copy Markdown
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @manishrjain and @golangcibot)


dgraph/cmd/debug/run.go, line 562 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

calculate the mean here.

Done.


dgraph/cmd/debug/run.go, line 617 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

keySize := ...
keyHistogram.Update(keySize)

valSize := ...
valHistogram.Update(valSize)

Define the variable as close as possible to usage.

Done.


dgraph/cmd/debug/run.go, line 620 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

valueSizeHistogra

Done.

@martinmr martinmr merged commit 320fb1c into master Dec 27, 2018
@martinmr martinmr deleted the martinmr/histogram branch December 27, 2018 23:53
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
…s. (dgraph-io#2844)

The new option can be accessed via the --histogram flag. It will print a
histogram of the key and value sizes, as well as the min, max, mean and
standard deviation of their distributions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants