storage/repository: add new functions for garbage collection #669

strib · 2017-11-29T19:16:45Z

This PR lays the groundwork for garbage collection. It adds separate functions for each of the major pieces of garbage collection:

ReferenceStorer.PackRefs() packs all existing refs into a single packed-refs file, and deletes all the other loose refs. Right now the dotgit implementation is equivalent to git pack-refs --all.
Repository.Prune() deletes loose, unreferenced objects that are older than a certain specified time.
Repository.RepackObjects() re-packs all referenced objects (including those in packfiles) into a single packfile, and then deletes all other packfiles older than a certain specified time.

Note that for object walking, we've included an alternative to the existing revlist implementation that is more memory-efficient for large repos.

I know the PR is light on tests, but I wanted to get it up quickly to see what the feedback would be. We've written app-specific tests outside of this repo and are working on more general tests with large repos. Please let me know what types of unit tests you'd like to see added when you review. Thanks!

Currently this implementation is only valid for kbfsgit, since it assumes some things about the filesystem not being updated during the packing, and about conflict resolution rules. In the future, it would be nice to replace this with a more general one, and move this kbfsgit-optimized implementation into kbfsgit. Issue: KBFS-2517

mcuadros · 2017-11-29T19:31:08Z

plumbing/storer/object.go

+	// loose object (that is not in a pack file). Some
+	// implementations (e.g. without loose objects)
+	// always return an error.
+	LooseObjectTime(plumbing.Hash) (time.Time, error)


To avoid forcing every storer to implement these more complex methods, I'd rather add them to a new interface, such as LooseObjectStorer or similar, that includes all the related methods. The caller can then check whether the current storer implements it and, if not, return a not-supported error.

This is similar to the Transactioner interface.

mcuadros · 2017-11-29T19:31:42Z

prune.go

+		}
+		return opt.Handler(hash)
+	})
+	if err != nil {


just return the err from the for each

mcuadros · 2017-11-29T19:33:27Z

storage/filesystem/object.go

+
+func (s *ObjectStorage) ForEachObjectHash(fun func(plumbing.Hash) error) error {
+	err := s.dir.ForEachObjectHash(fun)
+	if err != nil {


if err == nil { return nil }

mcuadros · 2017-11-29T19:34:03Z

storage/memory/storage.go

@@ -114,6 +115,14 @@ func (o *ObjectStorage) SetEncodedObject(obj plumbing.EncodedObject) (plumbing.H
 	return h, nil
 }

+func (o *ObjectStorage) HasEncodedObject(h plumbing.Hash) (err error) {
+	_, ok := o.Objects[h]
+	if !ok {


if _, ok := o.Objects[h]; !ok {

mcuadros · 2017-11-29T19:35:39Z

In general this looks awesome; the only detail is the interfaces. And, of course, more tests.

It also looks like the tests are failing on Windows:
https://ci.appveyor.com/project/mcuadros/go-git/build/645

Maybe you are not properly closing some connections.

strib · 2017-11-30T00:05:33Z

I've addressed your comments and added some more tests @mcuadros. Please take another look, thanks!

mcuadros · 2017-11-30T00:09:39Z

@strib https://travis-ci.org/src-d/go-git/jobs/309254658 tests are failing

strib · 2017-11-30T00:13:07Z

Oops sorry. I pushed a commit but that's not the only fix. One sec...

strib · 2017-11-30T00:14:54Z

Ok should be fixed now. Sorry, I made some last minute changes and thought I had run the tests again, but I guess I messed it up somehow. Ready for a look now.

mcuadros · 2017-11-30T00:29:44Z

Windows tests still failing

strib · 2017-11-30T00:31:09Z

Windows tests still failing

Is that my fault? It's happening on a lot of other tests too that I didn't touch. Looks like maybe some concurrency thing with appveyor.

strib · 2017-11-30T00:32:30Z

Oh, it's a packed-refs thing, hrm weird. Not getting unlocked somehow. Will look into it.

mcuadros · 2017-11-30T02:58:16Z

storage/filesystem/internal/dotgit/dotgit.go

+// packed, plus all tags.
+func (d *DotGit) PackRefs() (err error) {
+	// Lock packed-refs, and create it if it doesn't exist yet.
+	f, err := d.fs.OpenFile(packedRefsPath, os.O_RDWR|os.O_CREATE, 0600)


OK, I found why the Windows tests are failing for this operation. Since you open the file and keep it open until the end of this function, the Rename at line 748 fails.

Why are you keeping a lock on the file instead of writing directly to it? Right now you are writing to a temporary file and then replacing the original with it.

Maybe we can simplify this code and just lock the file and write over it, no?

Ugh Windows is the worst. I really want to do a rename so that the operation is atomic, and if there's a crash we don't risk having a partially-written packed-refs file.

I'm done working for the night but I'll try to come up with a solution in the morning.

strib · 2017-11-30T22:41:19Z

@mcuadros: ok, Windows passes now. My solution was to write directly to the locked file if we're using a storage layer that doesn't support renaming over a locked file (i.e., the Windows filesystem).

Not sure what's up with the travis failures but I don't think they're related. Please take another look, thanks!

strib and others added 13 commits November 29, 2017 10:32

dotgit: fix up PackRefs comment for upstreaming

ae2168c

filesystem: todo comment about "all" param

3447303

Issue: KBFS-2517

dotgit: during rewriting, re-open packed-refs after locking

d501611

The file could have been completely replaced while waiting for the lock, so we need to re-open, otherwise we might be reading a stale file that has already been deleted/overwritten.

dotgit: use bufio for PackRefs

a6202ca

Suggested by taruti. Issue: #13

First pass of prune design

ac1914e

Address CI and move code around

3f0b1ff

Support for repacking objects

fae4389

Make object repacking more configurable

d96582a

Use Storer.Config pack window when repacking objects

9dcb096

Make prune object walker generic

f28e447

Use object walker in repacking code

2de4f03

plumbing: add HasEncodedObject method to Storer

aa092f5

This allows the user to check whether an object exists, without reading all the object data from storage. Issue: KBFS-2445

strib requested a review from mcuadros November 29, 2017 19:16

mcuadros requested a review from erizocosmico November 29, 2017 19:27

mcuadros suggested changes Nov 29, 2017

strib added 3 commits November 29, 2017 13:58

storage: some minor code cleanup

b18457d

Suggested by mcuadros. Issue: src-d#669

storer: separate loose and packed object mgmt into optional ifaces

4c15695

Suggested by mcuadros.

repository: add tests for pruning and object re-packing

88acc31

Also, object re-packing should clean up any loose objects that were packed.

repository: oops, fix the prune test

c2e6b5d

strib force-pushed the strib/gh-gc branch from 5193831 to c2e6b5d Compare November 30, 2017 00:14

dotgit: open+lock packed-refs file until it doesn't change

5a6cc4e

Windows doesn't like it when we re-open a file we already have locked.

mcuadros suggested changes Nov 30, 2017

strib force-pushed the strib/gh-gc branch 5 times, most recently from 51dd7a5 to 2e4fa73 Compare November 30, 2017 22:05

dotgit: rewrite packed-refs while holding lock

d532648

Windows file system doesn't let us rename over a file while holding that file's lock, so use rewrite as a last resort. It could result in a partially-written file, if there's a failure at the wrong time.

strib force-pushed the strib/gh-gc branch from 2e4fa73 to d532648 Compare November 30, 2017 22:14

mcuadros approved these changes Nov 30, 2017

mcuadros merged commit b0f6b47 into src-d:master Nov 30, 2017
