This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Improve packfile reading performance #906

Merged
merged 37 commits into from
Aug 14, 2018

Conversation

erizocosmico
Contributor

@erizocosmico erizocosmico commented Jul 30, 2018

erizocosmico and others added 24 commits July 19, 2018 15:20
plumbing/format/idxfile: add new Index and MemoryIndex
In one case it disables the cache and the other disables lookup when
the scanner is not seekable. Could be added back later.

Signed-off-by: Javi Fontan <[email protected]>
It's still not complete:

* 64 bit offsets
* IdxChecksum

Signed-off-by: Javi Fontan <[email protected]>
This functionality may be moved elsewhere in the future but is needed
now to fit filesystem.ObjectStorage and the new index.

Signed-off-by: Javi Fontan <[email protected]>
Index is also automatically generated when OnFooter is called.

Signed-off-by: Javi Fontan <[email protected]>
Now dotgit.PackWriter uses the new packfile.Parser and index.

Signed-off-by: Javi Fontan <[email protected]>
 plumbing: packfile, new Packfile representation
@erizocosmico erizocosmico requested a review from ajnavarro July 30, 2018 09:54
}

d.offsetToHash[h.Offset] = obj.Hash()
Contributor

Why do we need that if we have the index?

Contributor Author

You may not have the index built
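
The reply above implies that offsets are recorded during decoding because a prebuilt index is not always available. A minimal sketch of that fallback follows. The `offsetIndex` interface, the `resolver` type, and the string hashes are hypothetical stand-ins for illustration, not go-git's API:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel for an unknown offset.
var errNotFound = errors.New("offset not found")

// offsetIndex is a hypothetical stand-in for a prebuilt packfile index.
type offsetIndex interface {
	FindHash(offset int64) (string, error)
}

// resolver prefers the index when one exists and otherwise falls back to
// the offsets recorded while objects were being decoded.
type resolver struct {
	idx          offsetIndex      // nil when no index has been built
	offsetToHash map[int64]string // filled in during decoding
}

// record remembers which hash was decoded at a given offset.
func (r *resolver) record(offset int64, hash string) {
	r.offsetToHash[offset] = hash
}

// findHash looks up the hash of the object stored at offset.
func (r *resolver) findHash(offset int64) (string, error) {
	if r.idx != nil {
		return r.idx.FindHash(offset)
	}
	if h, ok := r.offsetToHash[offset]; ok {
		return h, nil
	}
	return "", errNotFound
}

func main() {
	r := &resolver{offsetToHash: make(map[int64]string)}
	r.record(12, "hash-at-12")
	h, err := r.findHash(12)
	fmt.Println(h, err)
}
```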

@@ -47,6 +46,7 @@ func (s *ReaderSuite) TestDecode(c *C) {
})
}

/*
Contributor

should that be uncommented?

Contributor Author

uh, yeah

var base plumbing.EncodedObject
var ok bool
hash, err := p.FindHash(offset)
if err == nil {
Contributor

error not handled.

Contributor Author

yeah, that's intentional


func (s *PackfileSuite) TestContent(c *C) {
storer := memory.NewObjectStorage()
decoder, err := NewDecoder(NewScanner(s.f.Packfile()), storer)
Contributor

Why are we using the decoder here?

Contributor Author

To fill the memory storage and check that Packfile decodes objects the same way the decoder does.

@erizocosmico
Contributor Author

Windows tests seem to be failing because of an access-denied error when removing fixtures 🤷‍♀️

@erizocosmico
Contributor Author

Just added a benchmark for Parser.

Before, using Decoder:

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/plumbing/format/packfile -bench ^BenchmarkDecode$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/plumbing/format/packfile
BenchmarkDecode/https://github.com/git-fixtures/root-references.git-4         	     500	   3168400 ns/op	 1106147 B/op	    1211 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/basic.git-4                   	     500	   2575930 ns/op	 1062023 B/op	     612 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/basic.git#01-4                	     500	   2612729 ns/op	 1052947 B/op	     583 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/basic.git#02-4                	     500	   2377921 ns/op	 1054738 B/op	     529 allocs/op
BenchmarkDecode/https://github.com/src-d/go-git.git-4                         	       2	 500670936 ns/op	190135260 B/op	   65837 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/tags.git-4                    	   10000	    164102 ns/op	  117410 B/op	     162 allocs/op
BenchmarkDecode/https://github.com/spinnaker/spinnaker.git-4                  	      10	 142586155 ns/op	64122908 B/op	  100138 allocs/op
BenchmarkDecode/https://github.com/jamesob/desk.git-4                         	     100	  18939530 ns/op	 6878197 B/op	   11440 allocs/op
BenchmarkDecode/https://github.com/cpcs499/Final_Pres_P.git-4                 	   30000	     58652 ns/op	   52102 B/op	      64 allocs/op
BenchmarkDecode/https://github.com/github/gem-builder.git-4                   	    1000	   1955303 ns/op	  669316 B/op	    1758 allocs/op
BenchmarkDecode/https://github.com/githubtraining/example-branches.git-4      	    3000	    505693 ns/op	  154663 B/op	     545 allocs/op
BenchmarkDecode/https://github.com/rumpkernel/rumprun-xen.git-4               	      10	 189519715 ns/op	94689960 B/op	   67404 allocs/op
BenchmarkDecode/https://github.com/mcuadros/skeetr.git-4                      	     300	   5271772 ns/op	 1265433 B/op	    5437 allocs/op
BenchmarkDecode/https://github.com/dezfowler/LiteMock.git-4                   	     200	   6750296 ns/op	 3134396 B/op	    1238 allocs/op
BenchmarkDecode/https://github.com/tyba/storable.git-4                        	      50	  25293456 ns/op	14527944 B/op	   24515 allocs/op
BenchmarkDecode/https://github.com/toqueteos/ts3.git-4                        	     500	   2788570 ns/op	  633714 B/op	    2402 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/packfile	28.533s

Now, using Parser:

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/plumbing/format/packfile -bench ^BenchmarkParse$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/plumbing/format/packfile
BenchmarkParse/https://github.com/git-fixtures/root-references.git-4         	     300	   5304848 ns/op	 1575867 B/op	    1737 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git-4                   	     300	   4469931 ns/op	 1519526 B/op	     925 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#01-4                	     300	   4508625 ns/op	 1517858 B/op	     897 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#02-4                	     300	   4321764 ns/op	 1506340 B/op	     786 allocs/op
BenchmarkParse/https://github.com/src-d/go-git.git-4                         	       2	 912509994 ns/op	146944992 B/op	  110160 allocs/op
BenchmarkParse/https://github.com/git-fixtures/tags.git-4                    	    5000	    275472 ns/op	  121125 B/op	     200 allocs/op
BenchmarkParse/https://github.com/spinnaker/spinnaker.git-4                  	       5	 275443394 ns/op	50149379 B/op	  167296 allocs/op
BenchmarkParse/https://github.com/jamesob/desk.git-4                         	      50	  38003653 ns/op	 6258682 B/op	   19182 allocs/op
BenchmarkParse/https://github.com/cpcs499/Final_Pres_P.git-4                 	   10000	    103859 ns/op	   54202 B/op	      71 allocs/op
BenchmarkParse/https://github.com/github/gem-builder.git-4                   	     500	   3510747 ns/op	  635228 B/op	    2817 allocs/op
BenchmarkParse/https://github.com/githubtraining/example-branches.git-4      	    2000	    945524 ns/op	  163489 B/op	     771 allocs/op
BenchmarkParse/https://github.com/rumpkernel/rumprun-xen.git-4               	       5	 270819780 ns/op	63493952 B/op	  108893 allocs/op
BenchmarkParse/https://github.com/mcuadros/skeetr.git-4                      	     100	  10074921 ns/op	 1290210 B/op	    8008 allocs/op
BenchmarkParse/https://github.com/dezfowler/LiteMock.git-4                   	     100	  12084934 ns/op	 5119299 B/op	    1780 allocs/op
BenchmarkParse/https://github.com/tyba/storable.git-4                        	      30	  50910581 ns/op	10763067 B/op	   40452 allocs/op
BenchmarkParse/https://github.com/toqueteos/ts3.git-4                        	     300	   5518369 ns/op	  571485 B/op	    3662 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/packfile	29.596s

As we can see, Parser is slower. Like, way slower: roughly 1.4x to 2x the time per operation on these fixtures.

@erizocosmico
Contributor Author

erizocosmico commented Aug 9, 2018

New results after some optimizations. Now it's faster than Decoder on most fixtures (and within a few percent on the rest) and uses less memory on all but one.

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/plumbing/format/packfile -bench ^BenchmarkParse$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/plumbing/format/packfile
BenchmarkParse/https://github.com/git-fixtures/root-references.git-4         	     500	   2856689 ns/op	  969763 B/op	    1101 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git-4                   	     500	   2459174 ns/op	  931841 B/op	     581 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#01-4                	     500	   2442815 ns/op	  931101 B/op	     569 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#02-4                	     500	   2386935 ns/op	  928019 B/op	     510 allocs/op
BenchmarkParse/https://github.com/src-d/go-git.git-4                         	       3	 494443584 ns/op	79420842 B/op	   61125 allocs/op
BenchmarkParse/https://github.com/git-fixtures/tags.git-4                    	   10000	    160784 ns/op	   57800 B/op	     144 allocs/op
BenchmarkParse/https://github.com/spinnaker/spinnaker.git-4                  	      10	 145734617 ns/op	25500504 B/op	   98021 allocs/op
BenchmarkParse/https://github.com/jamesob/desk.git-4                         	     100	  19938104 ns/op	 3637961 B/op	   11293 allocs/op
BenchmarkParse/https://github.com/cpcs499/Final_Pres_P.git-4                 	   30000	     59185 ns/op	   53039 B/op	      63 allocs/op
BenchmarkParse/https://github.com/github/gem-builder.git-4                   	    1000	   1842201 ns/op	  304668 B/op	    1604 allocs/op
BenchmarkParse/https://github.com/githubtraining/example-branches.git-4      	    3000	    509322 ns/op	   80109 B/op	     479 allocs/op
BenchmarkParse/https://github.com/rumpkernel/rumprun-xen.git-4               	      10	 138410973 ns/op	28254898 B/op	   62946 allocs/op
BenchmarkParse/https://github.com/mcuadros/skeetr.git-4                      	     300	   4908787 ns/op	  628064 B/op	    4820 allocs/op
BenchmarkParse/https://github.com/dezfowler/LiteMock.git-4                   	     200	   6455751 ns/op	 2985541 B/op	    1039 allocs/op
BenchmarkParse/https://github.com/tyba/storable.git-4                        	      50	  26960985 ns/op	 5395440 B/op	   22986 allocs/op
BenchmarkParse/https://github.com/toqueteos/ts3.git-4                        	     500	   2467822 ns/op	  291987 B/op	    2143 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/packfile	28.723s

@smola
Collaborator

smola commented Aug 9, 2018

We might want to rename DiskObject to FileSystemObject or FSObject, since disk is not really the only thing that can back an FS.

@smola
Collaborator

smola commented Aug 9, 2018

@erizocosmico Windows issue is likely to be related to a packfile not being closed before the test ends.
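
For context, Windows typically refuses to delete a file that still has an open handle, which would surface as an access-denied error during fixture cleanup. A minimal sketch of the close-then-remove ordering, using only the standard library (the file name and the `writeAndCleanup` helper are made up for the example):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAndCleanup creates a scratch directory with a fake packfile in it,
// closes the file, and only then removes the directory.
func writeAndCleanup() error {
	dir, err := os.MkdirTemp("", "fixture")
	if err != nil {
		return err
	}

	f, err := os.Create(filepath.Join(dir, "pack-example.pack"))
	if err != nil {
		os.RemoveAll(dir)
		return err
	}
	if _, err := f.Write([]byte("PACK")); err != nil {
		f.Close()
		os.RemoveAll(dir)
		return err
	}

	// Close before removing: an open handle can make the removal fail
	// on Windows even though it succeeds on Unix-like systems.
	if err := f.Close(); err != nil {
		os.RemoveAll(dir)
		return err
	}
	return os.RemoveAll(dir)
}

func main() {
	fmt.Println(writeAndCleanup())
}
```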

@erizocosmico
Contributor Author

erizocosmico commented Aug 9, 2018

Added benchmarks for PackfileIter.

Before:

go test -benchmem -run='^$' gopkg.in/src-d/go-git.v4/storage/filesystem -bench '^BenchmarkPackfileIter$'  -v
goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/storage/filesystem
BenchmarkPackfileIter/https://github.com/git-fixtures/root-references.git-4     200     9890771 ns/op    1417207 B/op    5455 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git-4               200     8537152 ns/op    1303230 B/op    3957 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#01-4            200     8423076 ns/op    1282282 B/op    3905 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#02-4            200     8468932 ns/op    1269857 B/op    3715 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#03-4            200     8487332 ns/op    1293435 B/op    3898 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#04-4            200     9250733 ns/op    1293650 B/op    3898 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#05-4            100    10876467 ns/op    1303022 B/op    3915 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#06-4            200     9743280 ns/op    1293033 B/op    3897 allocs/op
BenchmarkPackfileIter/https://github.com/src-d/go-git.git-4                       1  1948470823 ns/op  216805032 B/op  204000 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/tags.git-4               2000      972080 ns/op     293142 B/op    2735 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/empty.git-4             10000      105937 ns/op      25267 B/op      49 allocs/op
PASS
ok      gopkg.in/src-d/go-git.v4/storage/filesystem     24.368s

Now:

go test -benchmem -run='^$' gopkg.in/src-d/go-git.v4/storage/filesystem -bench '^BenchmarkPackfileIter$'  -v
goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/storage/filesystem
BenchmarkPackfileIter/https://github.com/git-fixtures/root-references.git-4     300     5015342 ns/op     568647 B/op    7988 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git-4               500     2899767 ns/op     420181 B/op    5079 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#01-4            500     2885262 ns/op     426862 B/op    5256 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#02-4            500     2445589 ns/op     407056 B/op    4765 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#03-4            500     3050457 ns/op     421045 B/op    5071 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#04-4            500     2930551 ns/op     421278 B/op    5070 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#05-4            500     2949758 ns/op     421219 B/op    5070 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#06-4            500     2936594 ns/op     421052 B/op    5071 allocs/op
BenchmarkPackfileIter/https://github.com/src-d/go-git.git-4                      10   164301716 ns/op    9509232 B/op  183202 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/tags.git-4               2000      986470 ns/op     313573 B/op    3010 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/empty.git-4             20000       80880 ns/op      25241 B/op      49 allocs/op
PASS
ok      gopkg.in/src-d/go-git.v4/storage/filesystem     23.787s

Now, reading object content:

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/storage/filesystem -bench ^BenchmarkPackfileIter$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/storage/filesystem
BenchmarkPackfileIter/https://github.com/git-fixtures/root-references.git-4         	     200	   6394984 ns/op	 2650093 B/op	    9042 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git-4                   	     300	   4292505 ns/op	 2389910 B/op	    5628 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#01-4                	     300	   4241022 ns/op	 2363856 B/op	    5753 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#02-4                	     500	   3971266 ns/op	 2289756 B/op	    5249 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#03-4                	     300	   4344217 ns/op	 2382388 B/op	    5574 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#04-4                	     300	   4342889 ns/op	 2384563 B/op	    5574 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#05-4                	     300	   4382594 ns/op	 2379835 B/op	    5573 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#06-4                	     300	   4277826 ns/op	 2384934 B/op	    5574 allocs/op
BenchmarkPackfileIter/https://github.com/src-d/go-git.git-4                         	       2	 765374614 ns/op	263455592 B/op	  254550 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/tags.git-4                    	    2000	    909028 ns/op	  392417 B/op	    3112 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/empty.git-4                   	   20000	     83715 ns/op	   25241 B/op	      49 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/storage/filesystem	22.198s

@@ -460,6 +460,8 @@ type objectInfo struct {
Parent *objectInfo
Children []*objectInfo
SHA1 plumbing.Hash

Content []byte
Contributor

I think with an LRU cache for content by offset we can avoid using a lot of memory.

Contributor Author

We'd still need to read objects multiple times, which is why it was slower before.
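
For reference, the kind of cache suggested in this thread could look roughly like the sketch below: a byte-bounded LRU keyed by pack offset. The `offsetLRU` type and its methods are hypothetical and written for this example only; this is not what the PR implements.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached object body together with its pack offset.
type entry struct {
	offset  int64
	content []byte
}

// offsetLRU is a hypothetical byte-bounded LRU cache keyed by pack offset.
type offsetLRU struct {
	maxBytes int
	used     int
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // offset -> element in order
}

func newOffsetLRU(maxBytes int) *offsetLRU {
	return &offsetLRU{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    make(map[int64]*list.Element),
	}
}

// Get returns the cached content and marks it as recently used.
func (c *offsetLRU) Get(offset int64) ([]byte, bool) {
	el, ok := c.items[offset]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).content, true
}

// Put stores content and evicts old entries until the budget is met.
func (c *offsetLRU) Put(offset int64, content []byte) {
	if el, ok := c.items[offset]; ok {
		e := el.Value.(*entry)
		c.used += len(content) - len(e.content)
		e.content = content
		c.order.MoveToFront(el)
	} else {
		c.items[offset] = c.order.PushFront(&entry{offset, content})
		c.used += len(content)
	}
	// Evict least recently used entries while over budget.
	for c.used > c.maxBytes && c.order.Len() > 0 {
		e := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, e.offset)
		c.used -= len(e.content)
	}
}

func main() {
	c := newOffsetLRU(8)
	c.Put(0, []byte("aaaa"))
	c.Put(10, []byte("bbbb"))
	c.Get(0)                  // offset 0 becomes most recently used
	c.Put(20, []byte("cccc")) // over budget, so offset 10 is evicted
	_, ok := c.Get(10)
	fmt.Println(ok)
}
```

Every miss in such a cache means reading and inflating the object again, which is the trade-off the reply points at: memory is bounded, but evicted bases have to be re-read.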

@erizocosmico
Contributor Author

Renamed DiskObject to FSObject, and I'm fixing the unclosed packfile issue.

@erizocosmico
Contributor Author

Tests should be passing now.
Also, made FSObject auto-manage the file instance so the objects can be used after the packfile is closed.
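
One way an object can stay usable after its packfile handle is closed is to remember only the path, offset, and size, and open the file on demand for each read. The sketch below illustrates that idea with the standard library; the `fsObject` and `fileSection` types are hypothetical and are not the PR's actual FSObject:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// fsObject is a hypothetical object that remembers where its bytes live
// instead of holding on to an open file handle.
type fsObject struct {
	path   string
	offset int64
	size   int64
}

// fileSection reads a slice of a file and closes the file when done.
type fileSection struct {
	io.Reader
	f *os.File
}

func (s fileSection) Close() error { return s.f.Close() }

// Reader opens the file on demand, so it works even after whoever
// created the object has closed its own handle.
func (o *fsObject) Reader() (io.ReadCloser, error) {
	f, err := os.Open(o.path)
	if err != nil {
		return nil, err
	}
	return fileSection{Reader: io.NewSectionReader(f, o.offset, o.size), f: f}, nil
}

// demo writes a scratch file, closes it, and then reads through the object.
func demo() (string, error) {
	f, err := os.CreateTemp("", "pack-*.bin")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("headerCONTENTtrailer"); err != nil {
		f.Close()
		return "", err
	}
	obj := &fsObject{path: f.Name(), offset: 6, size: 7}
	if err := f.Close(); err != nil {
		return "", err
	}

	// The original handle is closed; the object reopens the file itself.
	r, err := obj.Reader()
	if err != nil {
		return "", err
	}
	defer r.Close()
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	fmt.Println(demo())
}
```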

@erizocosmico
Contributor Author

I think it can be reviewed now

@@ -17,6 +17,11 @@ var (
ErrMalformedIdxFile = errors.New("Malformed IDX file")
)

const (
fanout = 256
Contributor

The previous fanout was 255. Was that an error?
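
As background for the question: a Git idx fanout table has 256 entries, one per possible value of a hash's first byte, and each entry holds the cumulative number of objects whose first byte is less than or equal to that value. A table with only 255 entries would leave no slot for 0xff. A minimal sketch of how such a table is built and queried (the `buildFanout` and `bucket` helpers are illustrative names, not go-git functions):

```go
package main

import "fmt"

// fanoutSize is one entry per possible value of a hash's first byte.
const fanoutSize = 256

// buildFanout returns cumulative counts: fanout[b] is the number of
// hashes whose first byte is <= b.
func buildFanout(firstBytes []byte) [fanoutSize]uint32 {
	var fanout [fanoutSize]uint32
	for _, b := range firstBytes {
		fanout[b]++
	}
	for i := 1; i < fanoutSize; i++ {
		fanout[i] += fanout[i-1]
	}
	return fanout
}

// bucket returns the half-open range [lo, hi) of positions in the sorted
// hash list that hold hashes starting with byte b.
func bucket(fanout [fanoutSize]uint32, b byte) (lo, hi uint32) {
	if b > 0 {
		lo = fanout[b-1]
	}
	return lo, fanout[b]
}

func main() {
	fanout := buildFanout([]byte{0x00, 0x00, 0x7f, 0xff})
	lo, hi := bucket(fanout, 0xff)
	fmt.Println(lo, hi, fanout[255])
}
```

The last entry, `fanout[255]`, equals the total number of objects, which is why readers of the format rely on all 256 slots being present.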

Contributor Author

Version uint32
Fanout [256]uint32
// FanoutMapping maps the position in the fanout table to the position
// in the Names, Offset32 and Crc32 slices. This improves the memory
Contributor

CRC32

FanoutMapping [256]int
Names [][]byte
Offset32 [][]byte
Crc32 [][]byte
Contributor

CRC32

Hash plumbing.Hash
CRC32 uint32
Offset uint64
func (idx *MemoryIndex) findHashIndex(h plumbing.Hash) int {
Contributor

In Go, it is recommended to return a second boolean value:

findHashIndex(h plumbing.Hash) (int, bool)
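
A sketch of what the suggested signature could look like in practice, using a binary search over a sorted slice of hashes. This is a standalone illustration with a simplified `[][]byte` input, not the MemoryIndex implementation from the PR:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// findHashIndex reports the position of h in the sorted slice names and
// whether it was found, instead of signalling absence with a sentinel
// value such as -1.
func findHashIndex(names [][]byte, h []byte) (int, bool) {
	i := sort.Search(len(names), func(i int) bool {
		return bytes.Compare(names[i], h) >= 0
	})
	if i < len(names) && bytes.Equal(names[i], h) {
		return i, true
	}
	return 0, false
}

func main() {
	names := [][]byte{{0x01}, {0x7f}, {0xff}}
	if i, ok := findHashIndex(names, []byte{0x7f}); ok {
		fmt.Println("found at", i)
	}
	if _, ok := findHashIndex(names, []byte{0x02}); !ok {
		fmt.Println("not found")
	}
}
```

With the boolean, callers use the familiar `if i, ok := ...; ok` form and cannot mistake a sentinel for a valid index.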

return idx.getCrc32(k, i)
}

func (idx *MemoryIndex) getCrc32(firstLevel, secondLevel int) (uint32, error) {
Contributor

getCRC32

firstLevel, secondLevel int
}

func (i *idxfileEntryIter) Next() (*Entry, error) {
Contributor

this function can be split in two

Contributor Author

Which part do you mean? I don't see any way to split this

}
}

if delta && !p.scanner.IsSeekable {
Contributor

Shouldn't we use the storer to retrieve objects instead of storing all deltas in memory?

Contributor Author

Some storers, such as the MemoryStorage, do not allow deltas to be stored inside them.

5 participants