-
Notifications
You must be signed in to change notification settings - Fork 534
Conversation
Signed-off-by: Miguel Molina <[email protected]>
plumbing/format/idxfile: add new Index and MemoryIndex
Signed-off-by: Javi Fontan <[email protected]>
In one case it disables the cache and the other disables lookup when the scanner is not seekable. Could be added back later. Signed-off-by: Javi Fontan <[email protected]>
It's still not complete: * 64 bit offsets * IdxChecksum Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
This functionality may be moved elsewhere in the future but is needed now to fit filesystem.ObjectStorage and the new index. Signed-off-by: Javi Fontan <[email protected]>
Index is also automatically generated when OnFooter is called. Signed-off-by: Javi Fontan <[email protected]>
Now dotgit.PackWriter uses the new packfile.Parser and index. Signed-off-by: Javi Fontan <[email protected]>
Feature/new packfile parser
Signed-off-by: Miguel Molina <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Bugfixes and IndexStorage
Signed-off-by: Miguel Molina <[email protected]>
plumbing: packfile, new Packfile representation
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Signed-off-by: Javi Fontan <[email protected]>
Tests and indexes in packfile decoder
Signed-off-by: Miguel Molina <[email protected]>
plumbing/format/packfile/decoder.go
Outdated
} | ||
|
||
d.offsetToHash[h.Offset] = obj.Hash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need that having the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may not have the index built
@@ -47,6 +46,7 @@ func (s *ReaderSuite) TestDecode(c *C) { | |||
}) | |||
} | |||
|
|||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should that be uncommented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh, yeah
var base plumbing.EncodedObject | ||
var ok bool | ||
hash, err := p.FindHash(offset) | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error not handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's intentional
|
||
func (s *PackfileSuite) TestContent(c *C) { | ||
storer := memory.NewObjectStorage() | ||
decoder, err := NewDecoder(NewScanner(s.f.Packfile()), storer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we are using decoder here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fill the memory storage and check if Packfile decodes objects correctly as decoder does
Windows tests seem to be failing because of an access denied removing fixtures 🤷♀️ |
Signed-off-by: Miguel Molina <[email protected]>
Just added a benchmark for Before, using
Now, using
As we can see, Parser is slower. Like, way slower. |
New results after some optimizations, now it's faster and uses less memory than decoder.
|
Signed-off-by: Miguel Molina <[email protected]>
We might want to rename |
@erizocosmico Windows issue is likely to be related to a packfile not being closed before the test ends. |
Signed-off-by: Miguel Molina <[email protected]>
Added benchmarks for PackfileIter. Before:
Now:
Now, reading object content:
|
Signed-off-by: Miguel Molina <[email protected]>
fa697fa
to
65dc4f9
Compare
plumbing/format/packfile/parser.go
Outdated
@@ -460,6 +460,8 @@ type objectInfo struct { | |||
Parent *objectInfo | |||
Children []*objectInfo | |||
SHA1 plumbing.Hash | |||
|
|||
Content []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with an LRU cache for content by offset we can avoid using a lot of memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd still need to read objects more times, which is why it was slower before
Signed-off-by: Miguel Molina <[email protected]>
Renamed DiskObject to FSObject and I'm fixing the not closed packfile thing |
Signed-off-by: Miguel Molina <[email protected]>
24dc247
to
d93b386
Compare
Signed-off-by: Miguel Molina <[email protected]>
Tests should be passing now. |
Signed-off-by: Miguel Molina <[email protected]>
I think it can be reviewed now |
@@ -17,6 +17,11 @@ var ( | |||
ErrMalformedIdxFile = errors.New("Malformed IDX file") | |||
) | |||
|
|||
const ( | |||
fanout = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous fanout was 255, was an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Real fanout size is 256, we were using 255 because the last element is the object count. JGit uses 256 as well https://github.com/eclipse/jgit/blob/6d370d837c5faa7caff2e6e3e4723b887f2fbdca/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/PackIndexV2.java#L67
plumbing/format/idxfile/idxfile.go
Outdated
Version uint32 | ||
Fanout [256]uint32 | ||
// FanoutMapping maps the position in the fanout table to the position | ||
// in the Names, Offset32 and Crc32 slices. This improves the memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRC32
plumbing/format/idxfile/idxfile.go
Outdated
FanoutMapping [256]int | ||
Names [][]byte | ||
Offset32 [][]byte | ||
Crc32 [][]byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRC32
plumbing/format/idxfile/idxfile.go
Outdated
Hash plumbing.Hash | ||
CRC32 uint32 | ||
Offset uint64 | ||
func (idx *MemoryIndex) findHashIndex(h plumbing.Hash) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In go is recommended return a second boolean value
findHashIndex(h plumbing.Hash) (int, bool)
plumbing/format/idxfile/idxfile.go
Outdated
return idx.getCrc32(k, i) | ||
} | ||
|
||
func (idx *MemoryIndex) getCrc32(firstLevel, secondLevel int) (uint32, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getCRC32
firstLevel, secondLevel int | ||
} | ||
|
||
func (i *idxfileEntryIter) Next() (*Entry, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function can be split in two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part do you mean? I don't see any way to split this
Signed-off-by: Miguel Molina <[email protected]>
} | ||
} | ||
|
||
if delta && !p.scanner.IsSeekable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use the storer to retrieve objects instead of storing all deltas in memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some storers, such as the MemoryStorage, do not allow deltas stored inside them
More info at: https://docs.google.com/document/d/1I2Vte4LhMdpx0kvlrg-k6hNA-0kUGtUkC1-kfwbCmeI