Improve packfile reading performance #906

erizocosmico · 2018-07-30T09:54:48Z

More info at: https://docs.google.com/document/d/1I2Vte4LhMdpx0kvlrg-k6hNA-0kUGtUkC1-kfwbCmeI

Signed-off-by: Miguel Molina <[email protected]>

plumbing/format/idxfile: add new Index and MemoryIndex

Signed-off-by: Javi Fontan <[email protected]>

In one case it disables the cache, and in the other it disables lookup when the scanner is not seekable. Both could be added back later.

Signed-off-by: Javi Fontan <[email protected]>

It's still not complete:

* 64 bit offsets
* IdxChecksum

Signed-off-by: Javi Fontan <[email protected]>

Signed-off-by: Javi Fontan <[email protected]>

This functionality may be moved elsewhere in the future but is needed now to fit filesystem.ObjectStorage and the new index. Signed-off-by: Javi Fontan <[email protected]>

Index is also automatically generated when OnFooter is called. Signed-off-by: Javi Fontan <[email protected]>

Now dotgit.PackWriter uses the new packfile.Parser and index. Signed-off-by: Javi Fontan <[email protected]>

Feature/new packfile parser

Signed-off-by: Miguel Molina <[email protected]>

Signed-off-by: Javi Fontan <[email protected]>

Bugfixes and IndexStorage

Signed-off-by: Miguel Molina <[email protected]>

plumbing: packfile, new Packfile representation

Signed-off-by: Javi Fontan <[email protected]>

Tests and indexes in packfile decoder

Signed-off-by: Miguel Molina <[email protected]>

ajnavarro · 2018-07-31T14:46:28Z

plumbing/format/packfile/decoder.go

 	}

+	d.offsetToHash[h.Offset] = obj.Hash()


Why do we need that if we already have the index?

You may not have the index built

ajnavarro · 2018-07-31T14:47:20Z

plumbing/format/packfile/decoder_test.go

@@ -47,6 +46,7 @@ func (s *ReaderSuite) TestDecode(c *C) {
 	})
 }

+/*


Should that be uncommented?

ajnavarro · 2018-07-31T14:50:31Z

plumbing/format/packfile/packfile.go

+	var base plumbing.EncodedObject
+	var ok bool
+	hash, err := p.FindHash(offset)
+	if err == nil {


The error is not handled here.

Yeah, that's intentional.
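A minimal sketch of the pattern discussed here, assuming the intent is "use the hash lookup as a shortcut when it works, and fall back to reading at the offset when it does not". The `resolver` type and its `findHash`, `decodeAt`, and `baseAt` methods are hypothetical names for illustration, not the go-git `Packfile` API.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoIndex = errors.New("offset not found in index")

// resolver is a hypothetical stand-in, not the go-git Packfile type.
type resolver struct {
	offsetToHash map[int64]string  // may be empty when no index has been built
	cache        map[string]string // hash -> already decoded object
	decodes      int               // how many times the slow path ran
}

// findHash maps a packfile offset to a hash, failing if the offset is unknown.
func (r *resolver) findHash(offset int64) (string, error) {
	h, ok := r.offsetToHash[offset]
	if !ok {
		return "", errNoIndex
	}
	return h, nil
}

// decodeAt stands in for the slow path that reads the object from the pack.
func (r *resolver) decodeAt(offset int64) string {
	r.decodes++
	return fmt.Sprintf("object@%d", offset)
}

// baseAt consults the cache only when the hash is known. A failed hash
// lookup is deliberately not returned, because the slow path still works.
func (r *resolver) baseAt(offset int64) string {
	if hash, err := r.findHash(offset); err == nil {
		if obj, ok := r.cache[hash]; ok {
			return obj
		}
	}
	return r.decodeAt(offset)
}

func main() {
	r := &resolver{
		offsetToHash: map[int64]string{12: "abc"},
		cache:        map[string]string{"abc": "cached"},
	}
	fmt.Println(r.baseAt(12)) // hash known and cached
	fmt.Println(r.baseAt(99)) // hash unknown: falls back to decoding
}
```

In a sketch like this, a short comment next to the `err == nil` check is usually enough to tell a reviewer that ignoring the error is on purpose.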

ajnavarro · 2018-07-31T14:55:30Z

plumbing/format/packfile/packfile_test.go

+
+func (s *PackfileSuite) TestContent(c *C) {
+	storer := memory.NewObjectStorage()
+	decoder, err := NewDecoder(NewScanner(s.f.Packfile()), storer)


Why are we using the decoder here?

To fill the memory storage and check that Packfile decodes objects the same way the decoder does.

erizocosmico · 2018-08-08T15:09:21Z

Windows tests seem to be failing because of an "access denied" error when removing fixtures 🤷‍♀️

Signed-off-by: Miguel Molina <[email protected]>

erizocosmico · 2018-08-09T07:24:54Z

Just added a benchmark for Parser.

Before, using Decoder:

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/plumbing/format/packfile -bench ^BenchmarkDecode$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/plumbing/format/packfile
BenchmarkDecode/https://github.com/git-fixtures/root-references.git-4         	     500	   3168400 ns/op	 1106147 B/op	    1211 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/basic.git-4                   	     500	   2575930 ns/op	 1062023 B/op	     612 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/basic.git#01-4                	     500	   2612729 ns/op	 1052947 B/op	     583 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/basic.git#02-4                	     500	   2377921 ns/op	 1054738 B/op	     529 allocs/op
BenchmarkDecode/https://github.com/src-d/go-git.git-4                         	       2	 500670936 ns/op	190135260 B/op	   65837 allocs/op
BenchmarkDecode/https://github.com/git-fixtures/tags.git-4                    	   10000	    164102 ns/op	  117410 B/op	     162 allocs/op
BenchmarkDecode/https://github.com/spinnaker/spinnaker.git-4                  	      10	 142586155 ns/op	64122908 B/op	  100138 allocs/op
BenchmarkDecode/https://github.com/jamesob/desk.git-4                         	     100	  18939530 ns/op	 6878197 B/op	   11440 allocs/op
BenchmarkDecode/https://github.com/cpcs499/Final_Pres_P.git-4                 	   30000	     58652 ns/op	   52102 B/op	      64 allocs/op
BenchmarkDecode/https://github.com/github/gem-builder.git-4                   	    1000	   1955303 ns/op	  669316 B/op	    1758 allocs/op
BenchmarkDecode/https://github.com/githubtraining/example-branches.git-4      	    3000	    505693 ns/op	  154663 B/op	     545 allocs/op
BenchmarkDecode/https://github.com/rumpkernel/rumprun-xen.git-4               	      10	 189519715 ns/op	94689960 B/op	   67404 allocs/op
BenchmarkDecode/https://github.com/mcuadros/skeetr.git-4                      	     300	   5271772 ns/op	 1265433 B/op	    5437 allocs/op
BenchmarkDecode/https://github.com/dezfowler/LiteMock.git-4                   	     200	   6750296 ns/op	 3134396 B/op	    1238 allocs/op
BenchmarkDecode/https://github.com/tyba/storable.git-4                        	      50	  25293456 ns/op	14527944 B/op	   24515 allocs/op
BenchmarkDecode/https://github.com/toqueteos/ts3.git-4                        	     500	   2788570 ns/op	  633714 B/op	    2402 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/packfile	28.533s

Now, using Parser:

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/plumbing/format/packfile -bench ^BenchmarkParse$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/plumbing/format/packfile
BenchmarkParse/https://github.com/git-fixtures/root-references.git-4         	     300	   5304848 ns/op	 1575867 B/op	    1737 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git-4                   	     300	   4469931 ns/op	 1519526 B/op	     925 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#01-4                	     300	   4508625 ns/op	 1517858 B/op	     897 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#02-4                	     300	   4321764 ns/op	 1506340 B/op	     786 allocs/op
BenchmarkParse/https://github.com/src-d/go-git.git-4                         	       2	 912509994 ns/op	146944992 B/op	  110160 allocs/op
BenchmarkParse/https://github.com/git-fixtures/tags.git-4                    	    5000	    275472 ns/op	  121125 B/op	     200 allocs/op
BenchmarkParse/https://github.com/spinnaker/spinnaker.git-4                  	       5	 275443394 ns/op	50149379 B/op	  167296 allocs/op
BenchmarkParse/https://github.com/jamesob/desk.git-4                         	      50	  38003653 ns/op	 6258682 B/op	   19182 allocs/op
BenchmarkParse/https://github.com/cpcs499/Final_Pres_P.git-4                 	   10000	    103859 ns/op	   54202 B/op	      71 allocs/op
BenchmarkParse/https://github.com/github/gem-builder.git-4                   	     500	   3510747 ns/op	  635228 B/op	    2817 allocs/op
BenchmarkParse/https://github.com/githubtraining/example-branches.git-4      	    2000	    945524 ns/op	  163489 B/op	     771 allocs/op
BenchmarkParse/https://github.com/rumpkernel/rumprun-xen.git-4               	       5	 270819780 ns/op	63493952 B/op	  108893 allocs/op
BenchmarkParse/https://github.com/mcuadros/skeetr.git-4                      	     100	  10074921 ns/op	 1290210 B/op	    8008 allocs/op
BenchmarkParse/https://github.com/dezfowler/LiteMock.git-4                   	     100	  12084934 ns/op	 5119299 B/op	    1780 allocs/op
BenchmarkParse/https://github.com/tyba/storable.git-4                        	      30	  50910581 ns/op	10763067 B/op	   40452 allocs/op
BenchmarkParse/https://github.com/toqueteos/ts3.git-4                        	     300	   5518369 ns/op	  571485 B/op	    3662 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/packfile	29.596s

As we can see, Parser is slower. Like, way slower: roughly 1.4x to 2x the ns/op of Decoder across these fixtures.

erizocosmico · 2018-08-09T08:54:22Z

New results after some optimizations: it's now faster than Decoder on most fixtures (and within a few percent on the rest), and it uses less memory on almost all of them.

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/plumbing/format/packfile -bench ^BenchmarkParse$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/plumbing/format/packfile
BenchmarkParse/https://github.com/git-fixtures/root-references.git-4         	     500	   2856689 ns/op	  969763 B/op	    1101 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git-4                   	     500	   2459174 ns/op	  931841 B/op	     581 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#01-4                	     500	   2442815 ns/op	  931101 B/op	     569 allocs/op
BenchmarkParse/https://github.com/git-fixtures/basic.git#02-4                	     500	   2386935 ns/op	  928019 B/op	     510 allocs/op
BenchmarkParse/https://github.com/src-d/go-git.git-4                         	       3	 494443584 ns/op	79420842 B/op	   61125 allocs/op
BenchmarkParse/https://github.com/git-fixtures/tags.git-4                    	   10000	    160784 ns/op	   57800 B/op	     144 allocs/op
BenchmarkParse/https://github.com/spinnaker/spinnaker.git-4                  	      10	 145734617 ns/op	25500504 B/op	   98021 allocs/op
BenchmarkParse/https://github.com/jamesob/desk.git-4                         	     100	  19938104 ns/op	 3637961 B/op	   11293 allocs/op
BenchmarkParse/https://github.com/cpcs499/Final_Pres_P.git-4                 	   30000	     59185 ns/op	   53039 B/op	      63 allocs/op
BenchmarkParse/https://github.com/github/gem-builder.git-4                   	    1000	   1842201 ns/op	  304668 B/op	    1604 allocs/op
BenchmarkParse/https://github.com/githubtraining/example-branches.git-4      	    3000	    509322 ns/op	   80109 B/op	     479 allocs/op
BenchmarkParse/https://github.com/rumpkernel/rumprun-xen.git-4               	      10	 138410973 ns/op	28254898 B/op	   62946 allocs/op
BenchmarkParse/https://github.com/mcuadros/skeetr.git-4                      	     300	   4908787 ns/op	  628064 B/op	    4820 allocs/op
BenchmarkParse/https://github.com/dezfowler/LiteMock.git-4                   	     200	   6455751 ns/op	 2985541 B/op	    1039 allocs/op
BenchmarkParse/https://github.com/tyba/storable.git-4                        	      50	  26960985 ns/op	 5395440 B/op	   22986 allocs/op
BenchmarkParse/https://github.com/toqueteos/ts3.git-4                        	     500	   2467822 ns/op	  291987 B/op	    2143 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/plumbing/format/packfile	28.723s

Signed-off-by: Miguel Molina <[email protected]>

smola · 2018-08-09T10:11:40Z

We might want to rename DiskObject to FileSystemObject or FSObject, since disk is not really the only thing that can back an FS.

smola · 2018-08-09T10:14:13Z

@erizocosmico The Windows issue is likely related to a packfile not being closed before the test ends.

Signed-off-by: Miguel Molina <[email protected]>

erizocosmico · 2018-08-09T10:17:49Z

Added benchmarks for PackfileIter.

Before:

go test -benchmem -run='^$' gopkg.in/src-d/go-git.v4/storage/filesystem -bench '^BenchmarkPackfileIter$'  -v
goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/storage/filesystem
BenchmarkPackfileIter/https://github.com/git-fixtures/root-references.git-4         	     200	   9890771 ns/op	 1417207 B/op	    5455 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git-4                   	     200	   8537152 ns/op	 1303230 B/op	    3957 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#01-4                	     200	   8423076 ns/op	 1282282 B/op	    3905 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#02-4                	     200	   8468932 ns/op	 1269857 B/op	    3715 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#03-4                	     200	   8487332 ns/op	 1293435 B/op	    3898 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#04-4                	     200	   9250733 ns/op	 1293650 B/op	    3898 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#05-4                	     100	  10876467 ns/op	 1303022 B/op	    3915 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#06-4                	     200	   9743280 ns/op	 1293033 B/op	    3897 allocs/op
BenchmarkPackfileIter/https://github.com/src-d/go-git.git-4                         	       1	1948470823 ns/op	216805032 B/op	  204000 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/tags.git-4                    	    2000	    972080 ns/op	  293142 B/op	    2735 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/empty.git-4                   	   10000	    105937 ns/op	   25267 B/op	      49 allocs/op
PASS
ok      gopkg.in/src-d/go-git.v4/storage/filesystem     24.368s

Now:

go test -benchmem -run='^$' gopkg.in/src-d/go-git.v4/storage/filesystem -bench '^BenchmarkPackfileIter$'  -v
goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/storage/filesystem
BenchmarkPackfileIter/https://github.com/git-fixtures/root-references.git-4         	     300	   5015342 ns/op	  568647 B/op	    7988 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git-4                   	     500	   2899767 ns/op	  420181 B/op	    5079 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#01-4                	     500	   2885262 ns/op	  426862 B/op	    5256 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#02-4                	     500	   2445589 ns/op	  407056 B/op	    4765 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#03-4                	     500	   3050457 ns/op	  421045 B/op	    5071 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#04-4                	     500	   2930551 ns/op	  421278 B/op	    5070 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#05-4                	     500	   2949758 ns/op	  421219 B/op	    5070 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#06-4                	     500	   2936594 ns/op	  421052 B/op	    5071 allocs/op
BenchmarkPackfileIter/https://github.com/src-d/go-git.git-4                         	      10	 164301716 ns/op	 9509232 B/op	  183202 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/tags.git-4                    	    2000	    986470 ns/op	  313573 B/op	    3010 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/empty.git-4                   	   20000	     80880 ns/op	   25241 B/op	      49 allocs/op
PASS
ok      gopkg.in/src-d/go-git.v4/storage/filesystem     23.787s

Now, reading object content:

go test -benchmem -run=^$ gopkg.in/src-d/go-git.v4/storage/filesystem -bench ^BenchmarkPackfileIter$

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-git.v4/storage/filesystem
BenchmarkPackfileIter/https://github.com/git-fixtures/root-references.git-4         	     200	   6394984 ns/op	 2650093 B/op	    9042 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git-4                   	     300	   4292505 ns/op	 2389910 B/op	    5628 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#01-4                	     300	   4241022 ns/op	 2363856 B/op	    5753 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#02-4                	     500	   3971266 ns/op	 2289756 B/op	    5249 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#03-4                	     300	   4344217 ns/op	 2382388 B/op	    5574 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#04-4                	     300	   4342889 ns/op	 2384563 B/op	    5574 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#05-4                	     300	   4382594 ns/op	 2379835 B/op	    5573 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/basic.git#06-4                	     300	   4277826 ns/op	 2384934 B/op	    5574 allocs/op
BenchmarkPackfileIter/https://github.com/src-d/go-git.git-4                         	       2	 765374614 ns/op	263455592 B/op	  254550 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/tags.git-4                    	    2000	    909028 ns/op	  392417 B/op	    3112 allocs/op
BenchmarkPackfileIter/https://github.com/git-fixtures/empty.git-4                   	   20000	     83715 ns/op	   25241 B/op	      49 allocs/op
PASS
ok  	gopkg.in/src-d/go-git.v4/storage/filesystem	22.198s

Signed-off-by: Miguel Molina <[email protected]>

ajnavarro · 2018-08-09T10:19:38Z

plumbing/format/packfile/parser.go

@@ -460,6 +460,8 @@ type objectInfo struct {
 	Parent   *objectInfo
 	Children []*objectInfo
 	SHA1     plumbing.Hash
+
+	Content []byte


I think with an LRU cache for content by offset we can avoid using a lot of memory.

We'd still need to read objects more than once, which is why it was slower before.
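For reference, the LRU idea could be sketched roughly as below: content is cached by packfile offset under a byte budget, and the least recently used entries are evicted first. This is only an illustration of the trade-off discussed here (an evicted entry has to be read from the pack again on the next access). The `contentLRU` type and its methods are hypothetical and are not go-git's cache API.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	offset  int64
	content []byte
}

// contentLRU caches object content by packfile offset and evicts the least
// recently used entries once the byte budget is exceeded.
type contentLRU struct {
	maxBytes int
	curBytes int
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // offset -> element in order
}

func newContentLRU(maxBytes int) *contentLRU {
	return &contentLRU{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    make(map[int64]*list.Element),
	}
}

// Get returns the cached content and marks the entry as recently used.
func (c *contentLRU) Get(offset int64) ([]byte, bool) {
	el, ok := c.items[offset]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).content, true
}

// Put stores content and evicts old entries until the budget is respected.
// An entry larger than the whole budget is evicted right away.
func (c *contentLRU) Put(offset int64, content []byte) {
	if el, ok := c.items[offset]; ok {
		e := el.Value.(*entry)
		c.curBytes += len(content) - len(e.content)
		e.content = content
		c.order.MoveToFront(el)
	} else {
		c.items[offset] = c.order.PushFront(&entry{offset: offset, content: content})
		c.curBytes += len(content)
	}
	for c.curBytes > c.maxBytes && c.order.Len() > 0 {
		e := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, e.offset)
		c.curBytes -= len(e.content)
	}
}

func main() {
	cache := newContentLRU(8)
	cache.Put(12, []byte("aaaa"))
	cache.Put(40, []byte("bbbb"))
	cache.Get(12)                 // touch offset 12 so 40 becomes the oldest
	cache.Put(99, []byte("cccc")) // over budget: evicts offset 40
	_, ok := cache.Get(40)
	fmt.Println(ok) // false: offset 40 would have to be read again
}
```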

Signed-off-by: Miguel Molina <[email protected]>

erizocosmico · 2018-08-09T10:40:57Z

Renamed DiskObject to FSObject, and I'm fixing the packfile that isn't being closed.

Signed-off-by: Miguel Molina <[email protected]>

erizocosmico · 2018-08-09T14:53:42Z

Tests should be passing now.
Also, made FSObject auto-manage the file instance so the objects can be used after the packfile is closed.
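One way to picture an object that manages its own file handle is sketched below. It stores the path and location of its data instead of an open file, and opens the file only when a reader is requested. This is an assumption about the general approach, not the actual FSObject implementation: `lazyObject`, `fileSection`, and `demo` are hypothetical names, and the sketch reads raw bytes rather than decoding packfile entries. It uses `os.CreateTemp` and `io.ReadAll`, so it assumes Go 1.16 or later.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// lazyObject is a hypothetical stand-in for a file-backed object: it stores
// where the data lives rather than an open file handle.
type lazyObject struct {
	path   string
	offset int64
	size   int64
}

// fileSection reads one section of a file and closes the file when done.
type fileSection struct {
	io.Reader
	f *os.File
}

func (s fileSection) Close() error { return s.f.Close() }

// Reader opens the file on demand, so it keeps working after whoever
// created the object has closed its own handle.
func (o lazyObject) Reader() (io.ReadCloser, error) {
	f, err := os.Open(o.path)
	if err != nil {
		return nil, err
	}
	return fileSection{Reader: io.NewSectionReader(f, o.offset, o.size), f: f}, nil
}

// demo writes a fake "packfile", closes it, and then reads an object from it.
func demo() (string, error) {
	f, err := os.CreateTemp("", "pack-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name()) // runs after the reader below is closed
	if _, err := f.WriteString("headerPAYLOADtrailer"); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil { // the original handle is gone now
		return "", err
	}

	obj := lazyObject{path: f.Name(), offset: 6, size: 7}
	r, err := obj.Reader()
	if err != nil {
		return "", err
	}
	defer r.Close()
	content, err := io.ReadAll(r)
	return string(content), err
}

func main() {
	fmt.Println(demo())
}
```

Closing each reader before the file is removed also matters for the Windows failure mentioned earlier in the thread, since a file that is still open cannot be deleted there.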

Signed-off-by: Miguel Molina <[email protected]>

erizocosmico · 2018-08-10T08:38:05Z

I think it can be reviewed now

mcuadros · 2018-08-10T09:35:58Z

plumbing/format/idxfile/decoder.go

@@ -17,6 +17,11 @@ var (
 	ErrMalformedIdxFile = errors.New("Malformed IDX file")
 )

+const (
+	fanout         = 256


The previous fanout was 255; was that an error?

The real fanout size is 256. We were using 255 because the last element is the object count. JGit uses 256 as well: https://github.com/eclipse/jgit/blob/6d370d837c5faa7caff2e6e3e4723b887f2fbdca/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/PackIndexV2.java#L67
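To make the 256-entry layout concrete, here is a small sketch of a cumulative fanout table. Entry `i` holds the number of objects whose hash starts with a byte less than or equal to `i`, so entry 255 is the total object count. `buildFanout` and `bucketRange` are hypothetical helpers written for this illustration, not functions from the idxfile package.

```go
package main

import "fmt"

const fanoutSize = 256 // one entry per possible first byte of a hash

// buildFanout returns the cumulative fanout table for a list of hash first
// bytes: entry i is the number of hashes whose first byte is <= i.
func buildFanout(firstBytes []byte) [fanoutSize]uint32 {
	var fanout [fanoutSize]uint32
	for _, b := range firstBytes {
		fanout[b]++
	}
	for i := 1; i < fanoutSize; i++ {
		fanout[i] += fanout[i-1]
	}
	return fanout
}

// bucketRange returns the half-open range [lo, hi) of positions, in the
// sorted hash list, of the hashes that start with byte b.
func bucketRange(fanout [fanoutSize]uint32, b byte) (lo, hi uint32) {
	if b > 0 {
		lo = fanout[b-1]
	}
	return lo, fanout[b]
}

func main() {
	fanout := buildFanout([]byte{0x00, 0x00, 0x1f, 0xff})
	lo, hi := bucketRange(fanout, 0x1f)
	fmt.Println(lo, hi)      // positions of hashes starting with 0x1f
	fmt.Println(fanout[255]) // last entry: total object count
}
```

Reading only 255 entries and then reading the count separately consumes the same bytes, which is presumably why the old constant worked; treating all 256 entries as the table matches the format more directly.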

mcuadros · 2018-08-10T09:42:45Z