-
Notifications
You must be signed in to change notification settings - Fork 534
storage/repository: add new functions for garbage collection #669
Conversation
Currently this implementation is only valid for kbfsgit, since it assumes some things about the filesystem not being updated during the packing, and about conflict resolution rules. In the future, it would be nice to replace this with a more general one, and move this kbfsgit-optimized implementation into kbfsgit. Issue: KBFS-2517
Issue: KBFS-2517
The file could have been completely replaced while waiting for the lock, so we need to re-open, otherwise we might be reading a stale file that has already been deleted/overwritten.
Suggested by taruti. Issue: #13
This allows the user to check whether an object exists, without reading all the object data from storage. Issue: KBFS-2445
// loose object (that is not in a pack file). Some | ||
// implementations (e.g. without loose objects) | ||
// always return an error. | ||
LooseObjectTime(plumbing.Hash) (time.Time, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For avoid force the implementation of complex storers, I rather add this methods to a new interface, like,
LooseObjectStorer
or similar where includes all the related methods, and check if the current storer implement it, and if not, return a not supported error.
Similar to Transactioner
interface.
prune.go
Outdated
} | ||
return opt.Handler(hash) | ||
}) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just return the err from the for each
storage/filesystem/object.go
Outdated
|
||
func (s *ObjectStorage) ForEachObjectHash(fun func(plumbing.Hash) error) error { | ||
err := s.dir.ForEachObjectHash(fun) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err == nil {
return nil
}
storage/memory/storage.go
Outdated
@@ -114,6 +115,14 @@ func (o *ObjectStorage) SetEncodedObject(obj plumbing.EncodedObject) (plumbing.H | |||
return h, nil | |||
} | |||
|
|||
func (o *ObjectStorage) HasEncodedObject(h plumbing.Hash) (err error) { | |||
_, ok := o.Objects[h] | |||
if !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if _, ok := o.Objects[h]: !ok {
In general looks awesome, just the detail of the interfaces. And of course more test. Also looks like the test are failling on Windows: Maybe you are not closing properly some connections. |
Suggested by mcuadros. Issue: src-d#669
Suggested by mcuadros.
Also, object re-packing should clean up any loose objects that were packed.
I've addressed your comments and added some more tests @mcuadros. Please take another look, thanks! |
@strib https://travis-ci.org/src-d/go-git/jobs/309254658 test are failling |
Oops sorry. I pushed a commit but that's not the only fix. One sec... |
Ok should be fixed now. Sorry, I made some last minute changes and thought I had run the tests again, but I guess I messed it up somehow. Ready for a look now. |
Windows test still failling |
Is that my fault? It's happening on a lot of other tests too that I didn't touch. Looks like maybe some concurrency thing with appveyor. |
Oh, it's a packed-refs thing, hrm weird. Not getting unlocked somehow. Will look into it. |
Windows doesn't like it when we re-open a file we already have locked.
// packed, plus all tags. | ||
func (d *DotGit) PackRefs() (err error) { | ||
// Lock packed-refs, and create it if it doesn't exist yet. | ||
f, err := d.fs.OpenFile(packedRefsPath, os.O_RDWR|os.O_CREATE, 0600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I found why Windows tests are failling for this operation. Since you are open a file and keeping it open, until the end of this function, the Rename at 748 line fails.
Why you are keep a Lock over the file and you are not writting directly on it? Instead of this are you writting in a temporal file and the replacing it.
Maybe we can simplify this code and just lock the file and writter over it, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh Windows is the worst. I really want to do a rename so that the operation is atomic, and if there's a crash we don't risk having a partially-written packed-refs file.
I'm done working for the night but I'll try to come up with a solution in the morning.
51dd7a5
to
2e4fa73
Compare
Windows file system doesn't let us rename over a file while holding that file's lock, so use rewrite as a last resort. It could result in a partially-written file, if there's a failure at the wrong time.
@mcuadros: ok, Windows passes now. My solution was to write directly to the locked file if we're using a storage layer that doesn't support renaming over a locked file (i.e., the Windows filesystem). Not sure what's up with the travis failures but I don't think they're related. Please take another look, thanks! |
This PR lays the groundwork for garbage collection. It adds separate functions for each of the major pieces of garbage collection:
ReferenceStorer.PackRefs()
packs all existing refs into a single packed-refs file, and deletes all the other loose refs. Right now thedotgit
implementation is equivalent togit pack-refs --all
.Repository.Prune()
deletes loose, unreferenced objects that are older than a certain specified time.Repository.RepackObjects()
re-packs all referenced objects (including those in packfiles) into a single packfile, and then deletes all other packfiles older than a certain specified time.Note that for object walking, we've included an alternative implementation from the existing
revlist
implementation, that is more memory-efficient for large repos.I know the PR is light on tests, but I wanted to get it up quickly to see what the feedback would be. We've written app-specific tests outside of this repo and are working on more general tests with large repos. Please let me know what types of unit tests you'd like to see added when you review. Thanks!