This repository was archived by the owner on Sep 11, 2020. It is now read-only.

plumbing: object, add APIs for traversing over commit graphs #1132

Merged
merged 19 commits into from
May 14, 2019
Changes from 11 commits
97 changes: 97 additions & 0 deletions plumbing/object/commitnode.go
@@ -0,0 +1,97 @@
package object

import (
	"io"
	"time"

	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/storer"
)

// CommitNode is a generic interface encapsulating a lightweight commit object
// retrieved from a CommitNodeIndex.
type CommitNode interface {
Contributor

Is it possible to get the generation of the commit? This information is interesting to compare two commits or to stop walking at a certain level.

Contributor Author

It is available at the format level, but I didn't propagate it here and it is not exposed through the API at this level. I will look into it. One thing to consider, though, is that the Git folks are looking into replacing generation numbers with a "corrected commit date". I will link to the relevant discussion once I get back to my computer...

Contributor

Contributor Author

Yep, V3 was the one I most recently read about. There's no final decision about any particular approach, and I didn't feel confident about designing an API around something that is likely to change.

Contributor Author

Looks like there are three things we can expose with a reasonable level of forward compatibility with the new proposals:

  • A Generation() uint64 API on CommitNode that upholds the property that if A.Generation() < B.Generation() then B is not reachable from A. The current format limits the generation to a 30-bit integer; extending it to 64 bits should let the same API work with any of the other proposals (except maybe the FELINE version), including the 34-bit corrected commit dates.
  • A high-level API like IsUnreachableFrom(a, b CommitNode). The idea is to abstract the concept of generations away from the user and expose only the most useful underlying property. This may need some thinking, since there are basically three states: a) definitely unreachable (a property derived from the generations); b) definitely reachable (not available from the commit-graph, but perhaps something that is stored in a reachability bitmap for some subset of commits?); c) we don't know.
  • High-level API for priority heaps for traversing the graph in topological order by reusing the Generation property without directly exposing it.

Thoughts? I don't feel like implementing the last one since I currently have no use case for it, but maybe it would be useful for merge-base or something of that sort.
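
For illustration only, the second option could look roughly like the sketch below. The names Reachability, CheckReachable, generationNode and fixedGeneration are hypothetical and not part of this PR; the sketch only assumes the property stated for Generation() above, so generations alone can produce "definitely unreachable" or "unknown", never "definitely reachable".

```go
package main

// generationNode is a hypothetical, minimal stand-in for the part of
// CommitNode that this sketch needs.
type generationNode interface {
	Generation() uint64
}

// Reachability is a hypothetical three-state answer.
type Reachability int

const (
	// ReachabilityUnknown means generations alone cannot decide.
	ReachabilityUnknown Reachability = iota
	// DefinitelyUnreachable means target cannot be reached from the start node.
	DefinitelyUnreachable
	// DefinitelyReachable would need another data source (for example a
	// reachability bitmap); this sketch never returns it.
	DefinitelyReachable
)

// CheckReachable reports what generation numbers alone say about whether
// target is reachable from the node "from".
func CheckReachable(from, target generationNode) Reachability {
	if from.Generation() < target.Generation() {
		return DefinitelyUnreachable
	}
	return ReachabilityUnknown
}

// fixedGeneration is a test double that returns a fixed generation.
type fixedGeneration uint64

func (g fixedGeneration) Generation() uint64 { return uint64(g) }
```

Two nodes that report the same value (for example all zeros from an older commit-graph file) yield ReachabilityUnknown, which corresponds to state c) above.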

Contributor Author

@filipnavara, Apr 29, 2019

I didn't feel confident enough to establish an API shape for two of the above options. I've added the Generation() method to CommitNode in a fashion similar to what git uses internally. It should be fairly easy to expand on it later.
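
As a rough sketch of how a caller might use a generation value to stop walking early, the code below prunes an ancestry search whenever a visited node's generation is lower than the target's. The node type and isAncestor function are hypothetical in-memory stand-ins, not the API added by this PR.

```go
package main

// node is a hypothetical in-memory stand-in for CommitNode.
type node struct {
	id         string
	generation uint64
	parents    []*node
}

// isAncestor reports whether target can be reached from start by following
// parent links. A branch is abandoned as soon as its generation drops below
// the target's, because nothing below that point can reach the target.
func isAncestor(start, target *node) bool {
	seen := map[*node]bool{}
	stack := []*node{start}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if n == target {
			return true
		}
		// Prune only on a strict "less than", so that an all-zero
		// generation file degrades to a plain walk instead of a wrong answer.
		if seen[n] || n.generation < target.generation {
			continue
		}
		seen[n] = true
		stack = append(stack, n.parents...)
	}
	return false
}
```

With real generation numbers the walk never descends below the target's level; with all-zero generations nothing is pruned and the result is still correct.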

	// ID returns the Commit object id referenced by the commit graph node.
	ID() plumbing.Hash
	// Tree returns the Tree referenced by the commit graph node.
	Tree() (*Tree, error)
	// CommitTime returns the Committer.When time of the Commit referenced by the commit graph node.
	CommitTime() time.Time
	// NumParents returns the number of parents in a commit.
	NumParents() int
	// ParentNodes returns a CommitNodeIter for the parents of the specified node.
	ParentNodes() CommitNodeIter
	// ParentNode returns the ith parent of a commit.
	ParentNode(i int) (CommitNode, error)
	// ParentHashes returns the hashes of the parent commits for the specified node.
	ParentHashes() []plumbing.Hash
	// Generation returns the generation of the commit for reachability analysis.
	// Objects with a newer (higher) generation are not reachable from objects
	// with an older (lower) generation.
	Generation() uint64
	// Commit returns the full commit object from the node.
	Commit() (*Commit, error)
}

// CommitNodeIndex is a generic interface encapsulating an index of CommitNode objects.
type CommitNodeIndex interface {
	// Get returns a commit node from a commit hash.
	Get(hash plumbing.Hash) (CommitNode, error)
}

// CommitNodeIter is a generic closable interface for iterating over commit nodes.
type CommitNodeIter interface {
	Next() (CommitNode, error)
	ForEach(func(CommitNode) error) error
	Close()
}

// parentCommitNodeIter provides an iterator for parent commits from associated CommitNodeIndex.
type parentCommitNodeIter struct {
	node CommitNode
	i    int
}

func newParentgraphCommitNodeIter(node CommitNode) CommitNodeIter {
	return &parentCommitNodeIter{node, 0}
}

// Next moves the iterator to the next commit and returns a pointer to it. If
// there are no more commits, it returns io.EOF.
func (iter *parentCommitNodeIter) Next() (CommitNode, error) {
	obj, err := iter.node.ParentNode(iter.i)
	if err == ErrParentNotFound {
		return nil, io.EOF
	}
	if err == nil {
		iter.i++
	}

	return obj, err
}

// ForEach calls the cb function for each commit contained in this iter until
// an error happens or the end of the iter is reached. If ErrStop is returned
// by cb, the iteration is stopped but no error is returned. The iterator is closed.
func (iter *parentCommitNodeIter) ForEach(cb func(CommitNode) error) error {
	for {
		obj, err := iter.Next()
		if err != nil {
			if err == io.EOF {
				return nil
			}

			return err
		}

		if err := cb(obj); err != nil {
			if err == storer.ErrStop {
				return nil
			}

			return err
		}
	}
}

// Close is a no-op; this iterator holds no resources.
func (iter *parentCommitNodeIter) Close() {
}
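
The iterator contract used above (Next returns io.EOF at the end, and ForEach treats a stop sentinel from the callback as a clean early exit) can be sketched in isolation as follows. This is a minimal, self-contained illustration: sliceIter, errStop and firstN are hypothetical names, with errStop standing in for storer.ErrStop.

```go
package main

import (
	"errors"
	"io"
)

// errStop is a local stand-in for storer.ErrStop (an assumption of this sketch).
var errStop = errors.New("stop iter")

// sliceIter is a hypothetical iterator over strings that follows the same
// Next/ForEach contract as parentCommitNodeIter.
type sliceIter struct {
	items []string
	i     int
}

// Next returns the next item, or io.EOF when the items are exhausted.
func (it *sliceIter) Next() (string, error) {
	if it.i >= len(it.items) {
		return "", io.EOF
	}
	v := it.items[it.i]
	it.i++
	return v, nil
}

// ForEach calls cb for every remaining item. io.EOF and errStop both end
// the loop without reporting an error; any other error is returned.
func (it *sliceIter) ForEach(cb func(string) error) error {
	for {
		v, err := it.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := cb(v); err != nil {
			if err == errStop {
				return nil
			}
			return err
		}
	}
}

// firstN collects at most n items by returning errStop from the callback.
func firstN(it *sliceIter, n int) ([]string, error) {
	var out []string
	err := it.ForEach(func(v string) error {
		if len(out) == n {
			return errStop
		}
		out = append(out, v)
		return nil
	})
	return out, err
}
```

A caller that only needs the first few parents can stop this way without having to tell an early exit apart from a real failure.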
130 changes: 130 additions & 0 deletions plumbing/object/commitnode_graph.go
@@ -0,0 +1,130 @@
package object

import (
	"fmt"
	"time"

	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/format/commitgraph"
	"gopkg.in/src-d/go-git.v4/plumbing/storer"
)

// graphCommitNode is a reduced representation of Commit as presented in the commit
// graph file (commitgraph.Node). It is merely useful as an optimization for walking
// the commit graphs.
//
// graphCommitNode implements the CommitNode interface.
type graphCommitNode struct {
	// Hash for the Commit object
	hash plumbing.Hash
	// Index of the node in the commit graph file
	index int

	commitData *commitgraph.CommitData
	gci        *graphCommitNodeIndex
}

// graphCommitNodeIndex is an index that can load CommitNode objects from both the commit
// graph files and the object store.
//
// graphCommitNodeIndex implements the CommitNodeIndex interface
type graphCommitNodeIndex struct {
	commitGraph commitgraph.Index
	s           storer.EncodedObjectStorer
}

// NewGraphCommitNodeIndex returns CommitNodeIndex implementation that uses commit-graph
// files as backing storage and falls back to object storage when necessary
func NewGraphCommitNodeIndex(commitGraph commitgraph.Index, s storer.EncodedObjectStorer) CommitNodeIndex {
	return &graphCommitNodeIndex{commitGraph, s}
}

// Get returns the commit node for hash. It consults the commit graph first and
// falls back to loading the full commit from the object store when the hash
// cannot be resolved through the graph.
func (gci *graphCommitNodeIndex) Get(hash plumbing.Hash) (CommitNode, error) {
	// Check the commit graph first
	parentIndex, err := gci.commitGraph.GetIndexByHash(hash)
	if err == nil {
		parent, err := gci.commitGraph.GetCommitDataByIndex(parentIndex)
		if err != nil {
			return nil, err
		}

		return &graphCommitNode{
			hash:       hash,
			index:      parentIndex,
			commitData: parent,
			gci:        gci,
		}, nil
	}

	// Fallback to loading full commit object
	commit, err := GetCommit(gci.s, hash)
	if err != nil {
		return nil, err
	}

	return &objectCommitNode{
		nodeIndex: gci,
		commit:    commit,
	}, nil
}

func (c *graphCommitNode) ID() plumbing.Hash {
	return c.hash
}

func (c *graphCommitNode) Tree() (*Tree, error) {
	return GetTree(c.gci.s, c.commitData.TreeHash)
}

func (c *graphCommitNode) CommitTime() time.Time {
	return c.commitData.When
}

func (c *graphCommitNode) NumParents() int {
	return len(c.commitData.ParentIndexes)
}

func (c *graphCommitNode) ParentNodes() CommitNodeIter {
	return newParentgraphCommitNodeIter(c)
}

func (c *graphCommitNode) ParentNode(i int) (CommitNode, error) {
	if i < 0 || i >= len(c.commitData.ParentIndexes) {
		return nil, ErrParentNotFound
	}

	parent, err := c.gci.commitGraph.GetCommitDataByIndex(c.commitData.ParentIndexes[i])
	if err != nil {
		return nil, err
	}

	return &graphCommitNode{
		hash:       c.commitData.ParentHashes[i],
		index:      c.commitData.ParentIndexes[i],
		commitData: parent,
		gci:        c.gci,
	}, nil
}

func (c *graphCommitNode) ParentHashes() []plumbing.Hash {
	return c.commitData.ParentHashes
}

func (c *graphCommitNode) Generation() uint64 {
	// If the commit-graph file was generated with an older Git version that
	// set the generation to zero for every commit, the generation assumption
	// is still valid. It is just less useful.
	return uint64(c.commitData.Generation)
}

func (c *graphCommitNode) Commit() (*Commit, error) {
	return GetCommit(c.gci.s, c.hash)
}

func (c *graphCommitNode) String() string {
	return fmt.Sprintf(
		"%s %s\nDate: %s",
		plumbing.CommitObject, c.ID(),
		c.CommitTime().Format(DateFormat),
	)
}
89 changes: 89 additions & 0 deletions plumbing/object/commitnode_object.go
@@ -0,0 +1,89 @@
package object

import (
	"math"
	"time"

	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/storer"
)

// objectCommitNode is a representation of Commit as presented in the Git object format.
//
// objectCommitNode implements the CommitNode interface.
type objectCommitNode struct {
	nodeIndex CommitNodeIndex
	commit    *Commit
}

// NewObjectCommitNodeIndex returns CommitNodeIndex implementation that uses
// only object storage to load the nodes
func NewObjectCommitNodeIndex(s storer.EncodedObjectStorer) CommitNodeIndex {
	return &objectCommitNodeIndex{s}
}

func (oci *objectCommitNodeIndex) Get(hash plumbing.Hash) (CommitNode, error) {
	commit, err := GetCommit(oci.s, hash)
	if err != nil {
		return nil, err
	}

	return &objectCommitNode{
		nodeIndex: oci,
		commit:    commit,
	}, nil
}

// objectCommitNodeIndex is an index that can load CommitNode objects only from the
// object store.
//
// objectCommitNodeIndex implements the CommitNodeIndex interface
type objectCommitNodeIndex struct {
	s storer.EncodedObjectStorer
}

func (c *objectCommitNode) CommitTime() time.Time {
	return c.commit.Committer.When
}

func (c *objectCommitNode) ID() plumbing.Hash {
	return c.commit.ID()
}

func (c *objectCommitNode) Tree() (*Tree, error) {
	return c.commit.Tree()
}

func (c *objectCommitNode) NumParents() int {
	return c.commit.NumParents()
}

func (c *objectCommitNode) ParentNodes() CommitNodeIter {
	return newParentgraphCommitNodeIter(c)
}

func (c *objectCommitNode) ParentNode(i int) (CommitNode, error) {
	if i < 0 || i >= len(c.commit.ParentHashes) {
		return nil, ErrParentNotFound
	}

	// Note: It's necessary to go through CommitNodeIndex here to ensure
	// that if the commit-graph file covers only part of the history we
	// start using it when that part is reached.
	return c.nodeIndex.Get(c.commit.ParentHashes[i])
}

func (c *objectCommitNode) ParentHashes() []plumbing.Hash {
	return c.commit.ParentHashes
}

func (c *objectCommitNode) Generation() uint64 {
	// Commit nodes representing objects outside of the commit graph can never
	// be reached by objects from the commit-graph thus we return the highest
	// possible value.
	return math.MaxUint64
}

func (c *objectCommitNode) Commit() (*Commit, error) {
	return c.commit, nil
}
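
The note in ParentNode about going through the index is easier to see in a toy model. The sketch below is an illustration under stated assumptions, not the PR's implementation: lookup, hybridIndex, get and walkFirstParents are hypothetical names, and two maps stand in for a commit-graph file that covers only part of the history and for the object store that covers all of it.

```go
package main

import "errors"

// errNotFound is a hypothetical error for ids missing from both sources.
var errNotFound = errors.New("commit not found")

// lookup maps a commit id to the ids of its parents.
type lookup map[string][]string

// hybridIndex is a toy model of an index that prefers the commit graph.
type hybridIndex struct {
	graph   lookup // stand-in for the commit-graph file (partial history)
	objects lookup // stand-in for the object store (full history)
}

// get returns the parents of id and whether the commit graph served them.
func (h hybridIndex) get(id string) (parents []string, fromGraph bool, err error) {
	if p, ok := h.graph[id]; ok {
		return p, true, nil
	}
	if p, ok := h.objects[id]; ok {
		return p, false, nil
	}
	return nil, false, errNotFound
}

// walkFirstParents follows first parents from start and records, for each
// commit, which source answered the lookup.
func walkFirstParents(h hybridIndex, start string) ([]string, error) {
	var trace []string
	id := start
	for {
		parents, fromGraph, err := h.get(id)
		if err != nil {
			return trace, err
		}
		src := "object"
		if fromGraph {
			src = "graph"
		}
		trace = append(trace, id+":"+src)
		if len(parents) == 0 {
			return trace, nil
		}
		// Every step goes back through the index, so the walk switches
		// to the graph as soon as it reaches a covered commit.
		id = parents[0]
	}
}
```

Because every parent lookup goes back through the index, a walk that starts at a commit newer than the graph switches to graph-backed lookups as soon as it reaches covered history.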