Improve Index decoder implementation #893

erizocosmico · 2018-07-18T09:25:27Z

As described in https://docs.google.com/document/d/1I2Vte4LhMdpx0kvlrg-k6hNA-0kUGtUkC1-kfwbCmeI

Index will be an interface with the following methods:

type  Entry  struct {
    Hash plumbing.Hash
    Offset int64
}

type  EntryIter  interface {
    Next() (Entry, error)
    Close() error
}

type  Index  interface {
    Contains(h plumbing.Hash) (bool, error)
    FindOffset(h plumbing.Hash) (int64, error)
    FindCRC32(h plumbing.Hash) (uint32, error)
    Count() (int64, error)
    Entries() (EntryIter, error)
}

First Index implementation will be MemoryIndex and it must be just a readFull from the filesystem and load into memory as it is. JGit is doing this. Just as byte arrays.

Entries() method can be used to iterate over packfile objects. A good use case for it is per example to get all the commits into the packfile. We can seek over object headers without the need to read again all the file content (something that we are doing right now).

Index must be loaded in a lazy way.

Implement new index
Adapt index users to the new API
Add benchmarks

erizocosmico · 2018-07-19T07:49:55Z

So, right now I have a more or less working implementation of the index that does the same as JGit's. With the new way of doing things, and the new overall design as specified in the design document, it does not make much sense to make Index read-write, but read only.

As I assume, the writing of idxfiles will be done by the Packfile parser with a different encoder for that usecase. This will leave the repo in a sort of broken state, since there is no real replacement for that functionality working right now, etc.

Should we do a shared branch for working on that design document until all the pieces work together? Otherwise, PR this changes will leave everything partly broken.

WDYT? @ajnavarro @jfontan

erizocosmico · 2018-07-19T08:18:14Z

Decoding a packfile index now:

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     31445 ns/op	   30144 B/op	     613 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.902s

Decoding a packfile index before:

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     26568 ns/op	   11848 B/op	     620 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.692s

After this changes decoding is slower and uses way more memory.

UPDATE: after a few tweaks the speed is almost the same:

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     27012 ns/op	   30144 B/op	     613 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.638s

UPDATE 2: now it's slightly faster (by a few ns). I did a version with a more compact memory usage (even though it's still bigger)

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     26554 ns/op	   17680 B/op	     631 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.607s

jfontan · 2018-07-19T08:29:17Z

@erizocosmico I think it's a good idea having this branch for development. I can PR to it as soon as I have my part working.

Also it's weird that reading this way is slower. Is it just reading the index in memory?

erizocosmico · 2018-07-19T09:46:22Z

Now it's faster. The only thing I did to reduce the speed was allocating less memory, so I guess it was because it spent a lot of time allocating.

erizocosmico · 2018-07-19T11:03:32Z

@ajnavarro Packfile decoder used to get hashes by offset using the packfile.Index (the one with the maps). Is that not needed in the new design or is something we should add as a method to the Index interface?

ajnavarro · 2018-08-27T14:31:20Z

@erizocosmico added that functionality on #927

can we close this issue now?

erizocosmico · 2018-08-27T14:32:05Z

Yup

erizocosmico added enhancement performance labels Jul 18, 2018

erizocosmico self-assigned this Jul 18, 2018

erizocosmico mentioned this issue Jul 19, 2018

plumbing/format/idxfile: add new Index and MemoryIndex #896

Merged

erizocosmico closed this as completed Aug 27, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Index decoder implementation #893

Improve Index decoder implementation #893

erizocosmico commented Jul 18, 2018 •

edited

Loading

erizocosmico commented Jul 19, 2018

Uh oh!

erizocosmico commented Jul 19, 2018 •

edited

Loading

Uh oh!

jfontan commented Jul 19, 2018

Uh oh!

erizocosmico commented Jul 19, 2018

Uh oh!

erizocosmico commented Jul 19, 2018

Uh oh!

ajnavarro commented Aug 27, 2018

Uh oh!

erizocosmico commented Aug 27, 2018

Uh oh!

Improve Index decoder implementation #893

Improve Index decoder implementation #893

Comments

erizocosmico commented Jul 18, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

erizocosmico commented Jul 19, 2018

Uh oh!

erizocosmico commented Jul 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jfontan commented Jul 19, 2018

Uh oh!

erizocosmico commented Jul 19, 2018

Uh oh!

erizocosmico commented Jul 19, 2018

Uh oh!

ajnavarro commented Aug 27, 2018

Uh oh!

erizocosmico commented Aug 27, 2018

Uh oh!

erizocosmico commented Jul 18, 2018 •

edited

Loading

erizocosmico commented Jul 19, 2018 •

edited

Loading