gitbase: implement sql.Indexable on all tables #298

Merged: 13 commits into src-d:feature/indexable on Jun 13, 2018

Conversation

Contributor

@erizocosmico erizocosmico commented Jun 5, 2018

Most of the content of #279 did not survive, because it assumed the index would be generic across all tables, when in reality almost every table has its own index storage logic.

This implementation differs a lot from what was initially planned in #295; the way the indexes work made that approach impossible.
A lot of the logic needed here also lives inside go-git, but it is not exposed. If it is ever exposed, we might want to refactor some of this code.

To sum it up, the implementation looks like this:

  • Every table has its own key-value iterator with its own index key, because most tables need different data stored in the key so that it can be converted to rows efficiently once the key is retrieved from the index.
  • There is some common logic to find the packfiles of a given repository and read from them the offsets of the objects that will be indexed.
  • ref_commits, commit_trees and commit_blobs share common logic, rowKeyValueIter and rowIndexIter, since their rows are stored as is. The other tables had no obvious common logic, so instead of coming up with an abstraction that might be a PITA in the future, I preferred more tedious and perhaps slightly repetitive implementations for the other iterators (though no two implementations are exactly identical).
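
As a rough illustration of the "stored as is" case, here is a minimal sketch of an iterator that yields, for each row, the values of the indexed columns together with the whole row encoded as the index key. The `Row` type is a stand-in for the engine's row type, the field names are hypothetical, and the use of `encoding/gob` is an assumption made for the example, not necessarily what this PR does.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"io"
)

// Row is a stand-in for the SQL engine's row type (assumption).
type Row []interface{}

// rowKeyValueIter walks a fixed set of rows. For each one it returns the
// values of the indexed columns and the full row encoded as the index key.
type rowKeyValueIter struct {
	rows    []Row
	pos     int
	columns []int // positions of the indexed columns inside each row
}

// Next returns io.EOF once every row has been visited.
func (i *rowKeyValueIter) Next() ([]interface{}, []byte, error) {
	if i.pos >= len(i.rows) {
		return nil, nil, io.EOF
	}
	row := i.rows[i.pos]
	i.pos++

	values := make([]interface{}, len(i.columns))
	for n, c := range i.columns {
		values[n] = row[c]
	}

	key, err := encodeRow(row)
	if err != nil {
		return nil, nil, err
	}
	return values, key, nil
}

// encodeRow serializes the row so it can be stored as the index key.
func encodeRow(row Row) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(row); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeRow turns a key read back from the index into a row again.
func decodeRow(key []byte) (Row, error) {
	var row Row
	err := gob.NewDecoder(bytes.NewReader(key)).Decode(&row)
	return row, err
}
```

Because the key already contains the whole row, reading from the index only needs `decodeRow`; no repository access is required for these tables.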

@erizocosmico erizocosmico requested a review from ajnavarro June 5, 2018 14:34
@ajnavarro
Contributor

Related: src-d/go-git#854

commits.go Outdated
@@ -261,3 +294,107 @@ func (i *commitsByHashIter) nextList() (*object.Commit, error) {
		return commit, nil
	}
}

type commitIndexKey struct {
Contributor

Could we have a generic ObjectIndexKey struct for all the objects that are in the packfile?
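
A minimal sketch of what such a shared key could look like, assuming an object can be located from its repository, the packfile that holds it, and its offset inside that packfile. The struct name, its fields and the `encoding/gob` serialization are all hypothetical; the thread does not show the final shape.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// objectIndexKey is a hypothetical key shared by every table whose rows are
// built from objects stored in a packfile.
type objectIndexKey struct {
	Repository string // repository identifier
	Packfile   string // identifier of the packfile holding the object (assumed)
	Offset     int64  // offset of the object inside that packfile
}

// encode serializes the key into the bytes stored in the index.
func (k *objectIndexKey) encode() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(k); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeObjectIndexKey rebuilds a key from the bytes read from the index.
func decodeObjectIndexKey(data []byte) (*objectIndexKey, error) {
	var k objectIndexKey
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&k); err != nil {
		return nil, err
	}
	return &k, nil
}
```

With one key type, each table would only differ in how it converts the object found at that offset into a row.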

@erizocosmico
Contributor Author

erizocosmico commented Jun 5, 2018 via email

@erizocosmico
Contributor Author

Added the tests and some fixes, along with the unification of the index keys where it was possible. It can be reviewed now.

@ajnavarro
Contributor

go-git 4.4.1 has the dotgit logic outside the internal folder, so we can use it now by updating the dependency.

@erizocosmico
Contributor Author

@ajnavarro Since this is against a feature branch and won't go straight to master, what do you think about reviewing and merging this one first, and then in a follow-up PR I update the dependency and replace the manual implementation with the dotgit one?

packfiles.go Outdated
}, nil
}
}

func (i *objectIter) Close() error {
if i.packObjects != nil {
/*if i.packObjects != nil {
Contributor

leftover?

Contributor Author

yep

Contributor Author

Fixed

@erizocosmico
Contributor Author

Fixed the errors in Travis.

User string `short:"u" long:"user" default:"root" description:"User name used for connection"`
Password string `short:"P" long:"password" default:"" description:"Password used for connection"`
PilosaURL string `long:"pilosa" default:"http://localhost:10101" description:"URL to your pilosa server"`
IndexDir string `short:"i" long:"index" default:"/var/gitbase/index" description:"Directory where the gitbase indexes information will be persisted."`
Contributor

Maybe we should change the default path. What do you think, @jfontan?

Contributor Author

What path should we change it to?

Contributor

@jfontan jfontan Jun 13, 2018

Software usually does not keep its data directly in /var but in one of its subdirectories. MySQL, for example, keeps its database files in /var/lib/mysql, so /var/lib/gitbase/index seems OK.

@@ -104,6 +110,18 @@ func (c *Server) buildDatabase() error {
c.engine.Catalog.RegisterFunctions(function.Functions)
logrus.Debug("registered all available functions in catalog")

if err := os.MkdirAll(c.IndexDir, 0755); err != nil {
Contributor

Should we add some logging for these new steps?
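
One way the directory step could be logged is sketched below. The helper name `ensureIndexDir` is hypothetical, and the standard `log` package is used only to keep the example self-contained; the server code in the diff logs through logrus.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// ensureIndexDir is a hypothetical helper: it creates dir and any missing
// parents, logging before and after so the step is visible in the output.
func ensureIndexDir(dir string) error {
	log.Printf("creating index directory if missing: %s", dir)

	// MkdirAll returns nil when dir already exists as a directory.
	if err := os.MkdirAll(dir, 0755); err != nil {
		return fmt.Errorf("unable to create index directory %q: %v", dir, err)
	}

	log.Printf("index directory ready: %s", dir)
	return nil
}
```

Calling it twice with the same path is safe, since the second call finds the directory already in place.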

}
}

func (i *filesKeyValueIter) Close() error {
Contributor

We should call this in the tests too.


var iter sql.RowIter = &rowIndexIter{index}

if len(filters) > 0 {
Contributor

Can we test this with some filter?
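
To make the request concrete, here is a self-contained sketch of an index-backed iterator wrapped by filters, in the spirit of the `if len(filters) > 0` branch above. The `Row`, `RowIter`, `sliceIter`, `filterIter` and `withFilters` names are illustrative stand-ins, not the engine's actual types, and filters are modelled as plain predicate functions.

```go
package main

import "io"

// Row and RowIter are simplified stand-ins for the engine's types (assumption).
type Row []interface{}

type RowIter interface {
	Next() (Row, error)
	Close() error
}

// sliceIter plays the role of an iterator backed by index results.
type sliceIter struct {
	rows   []Row
	pos    int
	closed bool
}

func (i *sliceIter) Next() (Row, error) {
	if i.pos >= len(i.rows) {
		return nil, io.EOF
	}
	row := i.rows[i.pos]
	i.pos++
	return row, nil
}

func (i *sliceIter) Close() error {
	i.closed = true
	return nil
}

// filterIter skips every row of child for which keep returns false.
type filterIter struct {
	child RowIter
	keep  func(Row) bool
}

func (i *filterIter) Next() (Row, error) {
	for {
		row, err := i.child.Next()
		if err != nil {
			return nil, err
		}
		if i.keep(row) {
			return row, nil
		}
	}
}

// Close propagates to the wrapped iterator so its resources are released.
func (i *filterIter) Close() error { return i.child.Close() }

// withFilters wraps iter once per filter; it returns iter unchanged when
// there are no filters.
func withFilters(iter RowIter, filters []func(Row) bool) RowIter {
	for _, f := range filters {
		iter = &filterIter{child: iter, keep: f}
	}
	return iter
}
```

A test along these lines would feed a few rows through `withFilters`, assert that only the matching rows come out, and check that `Close` reaches the underlying iterator.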

@ajnavarro
Contributor

Check codecov: it appears that squash_join is not tested at all.

@erizocosmico
Contributor Author

Fixed the coverage of the squash tables.

@erizocosmico
Contributor Author

Changed the default index directory

remotes.go Outdated
columns, filters []sql.Expression,
index sql.IndexValueIter,
) (sql.RowIter, error) {
span, ctx := ctx.Span("gitbase.ReferencesTable.WithProjectFiltersAndIndex")
Contributor

Shouldn't this be "gitbase.RemotesTable.WithProjectFiltersAndIndex"?

Contributor Author

good catch

Contributor Author

Fixed

@erizocosmico
Contributor Author

Everything is fixed now. I also upgraded go-mysql-server and added engine initialization to the server command.

@erizocosmico erizocosmico merged commit eb3e95b into src-d:feature/indexable Jun 13, 2018
@erizocosmico erizocosmico deleted the feature/indexable-tables branch June 13, 2018 13:55

3 participants