Skip to content
This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Improve Index decoder implementation #893

Closed
3 tasks done
erizocosmico opened this issue Jul 18, 2018 · 7 comments
Closed
3 tasks done

Improve Index decoder implementation #893

erizocosmico opened this issue Jul 18, 2018 · 7 comments

Comments

@erizocosmico
Copy link
Contributor

erizocosmico commented Jul 18, 2018

As described in https://docs.google.com/document/d/1I2Vte4LhMdpx0kvlrg-k6hNA-0kUGtUkC1-kfwbCmeI

Index will be an interface with the following methods:

type  Entry  struct {
    Hash plumbing.Hash
    Offset int64
}

type  EntryIter  interface {
    Next() (Entry, error)
    Close() error
}

type  Index  interface {
    Contains(h plumbing.Hash) (bool, error)
    FindOffset(h plumbing.Hash) (int64, error)
    FindCRC32(h plumbing.Hash) (uint32, error)
    Count() (int64, error)
    Entries() (EntryIter, error)
}

First Index implementation will be MemoryIndex and it must be just a readFull from the filesystem and load into memory as it is. JGit is doing this. Just as byte arrays.

Entries() method can be used to iterate over packfile objects. A good use case for it is per example to get all the commits into the packfile. We can seek over object headers without the need to read again all the file content (something that we are doing right now).

Index must be loaded in a lazy way.

  • Implement new index
  • Adapt index users to the new API
  • Add benchmarks
@erizocosmico
Copy link
Contributor Author

So, right now I have a more or less working implementation of the index that does the same as JGit's. With the new way of doing things, and the new overall design as specified in the design document, it does not make much sense to make Index read-write, but read only.

As I assume, the writing of idxfiles will be done by the Packfile parser with a different encoder for that usecase. This will leave the repo in a sort of broken state, since there is no real replacement for that functionality working right now, etc.

Should we do a shared branch for working on that design document until all the pieces work together? Otherwise, PR this changes will leave everything partly broken.

WDYT? @ajnavarro @jfontan

@erizocosmico
Copy link
Contributor Author

erizocosmico commented Jul 19, 2018

Decoding a packfile index now:

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     31445 ns/op	   30144 B/op	     613 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.902s

Decoding a packfile index before:

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     26568 ns/op	   11848 B/op	     620 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.692s

After this changes decoding is slower and uses way more memory.

UPDATE: after a few tweaks the speed is almost the same:

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     27012 ns/op	   30144 B/op	     613 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.638s

UPDATE 2: now it's slightly faster (by a few ns). I did a version with a more compact memory usage (even though it's still bigger)

pkg: gopkg.in/src-d/go-git.v4/plumbing/format/idxfile
BenchmarkDecode-4   	   50000	     26554 ns/op	   17680 B/op	     631 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/idxfile	1.607s

@jfontan
Copy link
Contributor

jfontan commented Jul 19, 2018

@erizocosmico I think it's a good idea having this branch for development. I can PR to it as soon as I have my part working.

Also it's weird that reading this way is slower. Is it just reading the index in memory?

@erizocosmico
Copy link
Contributor Author

Now it's faster. The only thing I did to reduce the speed was allocating less memory, so I guess it was because it spent a lot of time allocating.

@erizocosmico
Copy link
Contributor Author

@ajnavarro Packfile decoder used to get hashes by offset using the packfile.Index (the one with the maps). Is that not needed in the new design or is something we should add as a method to the Index interface?

@ajnavarro
Copy link
Contributor

@erizocosmico added that functionality on #927

can we close this issue now?

@erizocosmico
Copy link
Contributor Author

Yup

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants