Skip to content
This repository was archived by the owner on Sep 11, 2020. It is now read-only.

storage/repository: add new functions for garbage collection #669

Merged
merged 19 commits into from
Nov 30, 2017

Conversation

strib
Copy link
Contributor

@strib strib commented Nov 29, 2017

This PR lays the groundwork for garbage collection. It adds separate functions for each of the major pieces of garbage collection:

  • ReferenceStorer.PackRefs() packs all existing refs into a single packed-refs file, and deletes all the other loose refs. Right now the dotgit implementation is equivalent to git pack-refs --all.
  • Repository.Prune() deletes loose, unreferenced objects that are older than a certain specified time.
  • Repository.RepackObjects() re-packs all referenced objects (including those in packfiles) into a single packfile, and then deletes all other packfiles older than a certain specified time.

Note that for object walking, we've included an alternative implementation from the existing revlist implementation, that is more memory-efficient for large repos.

I know the PR is light on tests, but I wanted to get it up quickly to see what the feedback would be. We've written app-specific tests outside of this repo and are working on more general tests with large repos. Please let me know what types of unit tests you'd like to see added when you review. Thanks!

strib and others added 13 commits November 29, 2017 10:32
Currently this implementation is only valid for kbfsgit, since it
assumes some things about the filesystem not being updated during the
packing, and about conflict resolution rules.  In the future, it would
be nice to replace this with a more general one, and move this
kbfsgit-optimized implementation into kbfsgit.

Issue: KBFS-2517
The file could have been completely replaced while waiting for the
lock, so we need to re-open, otherwise we might be reading a stale
file that has already been deleted/overwritten.
Suggested by taruti.

Issue: #13
This allows the user to check whether an object exists, without
reading all the object data from storage.

Issue: KBFS-2445
@strib strib requested a review from mcuadros November 29, 2017 19:16
// loose object (that is not in a pack file). Some
// implementations (e.g. without loose objects)
// always return an error.
LooseObjectTime(plumbing.Hash) (time.Time, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For avoid force the implementation of complex storers, I rather add this methods to a new interface, like,
LooseObjectStorer or similar where includes all the related methods, and check if the current storer implement it, and if not, return a not supported error.

Similar to Transactioner interface.

prune.go Outdated
}
return opt.Handler(hash)
})
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return the err from the for each


func (s *ObjectStorage) ForEachObjectHash(fun func(plumbing.Hash) error) error {
err := s.dir.ForEachObjectHash(fun)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if err == nil {
return nil
}

@@ -114,6 +115,14 @@ func (o *ObjectStorage) SetEncodedObject(obj plumbing.EncodedObject) (plumbing.H
return h, nil
}

func (o *ObjectStorage) HasEncodedObject(h plumbing.Hash) (err error) {
_, ok := o.Objects[h]
if !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if _, ok := o.Objects[h]: !ok {

@mcuadros
Copy link
Contributor

mcuadros commented Nov 29, 2017

In general looks awesome, just the detail of the interfaces. And of course more test.

Also looks like the test are failling on Windows:
https://ci.appveyor.com/project/mcuadros/go-git/build/645

Maybe you are not closing properly some connections.

Suggested by mcuadros.

Issue: src-d#669
Also, object re-packing should clean up any loose objects that were
packed.
@strib
Copy link
Contributor Author

strib commented Nov 30, 2017

I've addressed your comments and added some more tests @mcuadros. Please take another look, thanks!

@mcuadros
Copy link
Contributor

@strib
Copy link
Contributor Author

strib commented Nov 30, 2017

Oops sorry. I pushed a commit but that's not the only fix. One sec...

@strib
Copy link
Contributor Author

strib commented Nov 30, 2017

Ok should be fixed now. Sorry, I made some last minute changes and thought I had run the tests again, but I guess I messed it up somehow. Ready for a look now.

@mcuadros
Copy link
Contributor

Windows test still failling

@strib
Copy link
Contributor Author

strib commented Nov 30, 2017

Windows test still failling

Is that my fault? It's happening on a lot of other tests too that I didn't touch. Looks like maybe some concurrency thing with appveyor.

@strib
Copy link
Contributor Author

strib commented Nov 30, 2017

Oh, it's a packed-refs thing, hrm weird. Not getting unlocked somehow. Will look into it.

Windows doesn't like it when we re-open a file we already have locked.
// packed, plus all tags.
func (d *DotGit) PackRefs() (err error) {
// Lock packed-refs, and create it if it doesn't exist yet.
f, err := d.fs.OpenFile(packedRefsPath, os.O_RDWR|os.O_CREATE, 0600)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I found why Windows tests are failling for this operation. Since you are open a file and keeping it open, until the end of this function, the Rename at 748 line fails.

Why you are keep a Lock over the file and you are not writting directly on it? Instead of this are you writting in a temporal file and the replacing it.

Maybe we can simplify this code and just lock the file and writter over it, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh Windows is the worst. I really want to do a rename so that the operation is atomic, and if there's a crash we don't risk having a partially-written packed-refs file.

I'm done working for the night but I'll try to come up with a solution in the morning.

@strib strib force-pushed the strib/gh-gc branch 5 times, most recently from 51dd7a5 to 2e4fa73 Compare November 30, 2017 22:05
Windows file system doesn't let us rename over a file while holding
that file's lock, so use rewrite as a last resort.  It could result in
a partially-written file, if there's a failure at the wrong time.
@strib
Copy link
Contributor Author

strib commented Nov 30, 2017

@mcuadros: ok, Windows passes now. My solution was to write directly to the locked file if we're using a storage layer that doesn't support renaming over a locked file (i.e., the Windows filesystem).

Not sure what's up with the travis failures but I don't think they're related. Please take another look, thanks!

@mcuadros mcuadros merged commit b0f6b47 into src-d:master Nov 30, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants