DB Migrations #221

Closed · wants to merge 4 commits
35 changes: 35 additions & 0 deletions cli/migrations/README.md
@@ -0,0 +1,35 @@
### Disclaimer:

These scripts are <u>**not real migrations**</u>. There are no up/down rules. These scripts are not production ready, and their idempotence is not guaranteed at all. The internal DB is expected to be SQLite.

These scripts were needed to migrate the database between different states.

- [Vacuum database](command-vacuum.go)
```shell
# rebuilds the database to defragment it
go run cli/migrations/*.go vacuum internal.db
```

- [Add UAST nullable columns](command-UAST-add-columns.go)
```shell
# prepares the current "internal.db". Adds "uast_a" and "uast_b" nullable columns
go run cli/migrations/*.go uast-add-cols internal.db
```

- [Add UASTs to a database, reading from another database](command-UAST-import-from-source-db.go)
```shell
# import UASTs into "internal.db" reading from "source.db"
go run cli/migrations/*.go uast-import internal.db source.db
```

- [Remove diff column from a database](command-diff-remove-column.go)
```shell
# prepares the current "internal.db". Removes the "diff" column
go run cli/migrations/*.go diff-rm-col internal.db
```

- [Remove features table from a database](command-features-drop-table.go)
```shell
# drops the 'features' table from the current "internal.db"
go run cli/migrations/*.go features-drop-table internal.db
```
59 changes: 59 additions & 0 deletions cli/migrations/command-UAST-add-columns.go
@@ -0,0 +1,59 @@
package main
@bzz (Contributor) commented on Mar 21, 2018:

Wow, +325 LOC of "not real migrations" to add 2 columns to 1 table? That seems like a LOT... sorry if I have misunderstood the purpose of this; please do not hesitate to clarify it here.

Do you think there might be a way to avoid having this complexity, and rather create a separate issue for applying a real migration tool instead?

As discussed IRL (I'm sorry if that was not articulated well enough), I think it's nice to have the DDL SQL statements in a small .sql file to add the new columns. From my side, having a custom mechanism implemented to handle this task is 🙅‍♂️, as at least for now it seems like a lot of overhead (that needs to be maintained) to have custom code that needs to be `go run`, etc.
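
For illustration, a minimal sketch of that alternative, assuming a hypothetical small .sql file (or the equivalent statements fed straight to the stock `sqlite3` CLI, as below) containing the same two ALTER TABLE statements this PR runs from Go:

```shell
# hypothetical one-off migration; the statements mirror the addUastQuery constant below
sqlite3 internal.db <<'SQL'
ALTER TABLE file_pairs ADD COLUMN uast_a BLOB;
ALTER TABLE file_pairs ADD COLUMN uast_b BLOB;
SQL
```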

The PR author (Contributor) replied:

There are two scripts.

The first one adds the new columns. It's 58 lines because it wraps the query in a command to run it; most of the lines are command documentation.

The second one is 176 lines to assign UASTs to existing FilePairs. It is needed because I couldn't guarantee that the provided table with UASTs is the same as the current one, so that it could simply be replaced; quite the opposite: I could guarantee that it is different, with different IDs and fewer columns. Transforming the provided one into the current one would require more work, and it would not be reusable.
Because of that, the command just looks for files without a UAST and assigns one to them by looking them up in the source table.

About having separate commands instead of a bunch of queries: it seems easier to test and reproduce the process, shortening the time needed for the migration itself.
The commands themselves can be reused for creating more commands and keeping them consistent with each other. I copied the idea from Borges, which provides a clean way to pack its commands.

About the maintenance overhead, I cannot see it, since no maintenance will be needed (the same as for the other CLI scripts in CAT, which are neither supported nor production ready).
If you are really concerned about having unmaintained code in the repo, we can delete all these scripts tomorrow, or as soon as we finish this task. Anyway, I think it's better to validate that the modifications to be made are what we expect.

About creating an issue to add a migration tool, I agree on that, but as we agreed, it is something that should maybe be done in the future.


import (
"fmt"
"log"

"github.com/src-d/code-annotation/server/dbutil"
)

type uastColsCmd struct {
commandDesc
Args struct {
InternalDBPath string `description:"filepath to the internal SQLite database"`
} `positional-args:"yes" required:"yes"`
}

var uastColsOpts = uastColsCmd{
commandDesc: commandDesc{
name: "uast-add-cols",
shortDesc: "Add uast BLOB columns",
longDesc: "Adds 'uast_a' and 'uast_b' BLOB columns to the sqlite://InternalDBPath database",
},
}

// queries
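// the %s placeholder in addUastQuery is filled with the file side ('a' or 'b') via fmt.Sprintf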
const (
leftFile = "a"
rightFile = "b"

addUastQuery = "ALTER TABLE file_pairs ADD COLUMN uast_%s BLOB"
)

func (c *uastColsCmd) Execute(args []string) error {
internalDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.InternalDBPath), false)
if err != nil {
log.Fatal(err)
}

defer internalDb.Close()

if err := addColumn(internalDb, leftFile); err != nil {
log.Fatal(err)
}

if err := addColumn(internalDb, rightFile); err != nil {
log.Fatal(err)
}

log.Println("New BLOB columns 'uast_a' and 'uast_b' were added")
return nil
}

func addColumn(db dbutil.DB, side string) error {
if _, err := db.Exec(fmt.Sprintf(addUastQuery, side)); err != nil {
return err
}

return nil
}
181 changes: 181 additions & 0 deletions cli/migrations/command-UAST-import-from-source-db.go
@@ -0,0 +1,181 @@
package main

import (
"fmt"
"log"

"github.com/src-d/code-annotation/server/dbutil"
)

type uastImportCmd struct {
commandDesc
Args struct {
InternalDBPath string `description:"filepath to the internal SQLite database"`
SourceDBPath string `description:"filepath to the SQLite database containing the UAST to import"`
} `positional-args:"yes" required:"yes"`
}

var uastImportOpts = uastImportCmd{
commandDesc: commandDesc{
name: "uast-import",
shortDesc: "Import UASTs",
longDesc: "Adds UASTs to the sqlite://InternalDBPath database reading from sqlite://SourceDBPath database",
},
}

// queries
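// the %s placeholders below are filled with the file side ('a' or 'b') via fmt.Sprintf before the queries are executed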
const (
filePairsWithoutUastQuery = `SELECT
blob_id_a, uast_a IS NOT NULL as hasUastA,
blob_id_b, uast_b IS NOT NULL as hasUastB
FROM file_pairs
WHERE hasUastA = 0 or hasUastB = 0`
uastByBlobIDQuery = `SELECT uast_%s
FROM files
WHERE blob_id_%s = CAST($1 AS BLOB)
LIMIT 1`
updateByBlobIDQuery = `UPDATE file_pairs
SET uast_%s = $1
WHERE blob_id_%s = $2 and uast_%s IS NULL`
indexAddBlobID = `CREATE INDEX blob_id_%s ON file_pairs (blob_id_%s);`
)

// command messages
const (
uastImportMsgSuccess = `UASTs added into the internal DB:
- imported UASTs: %d
- edited rows: %d`
uastImportMsgError = `Some rows could not be properly updated:
- UAST not inserted: %d
- FilePair read errors: %d`
)

type file struct {
side string
blobID string
}

func (c *uastImportCmd) Execute(args []string) error {
internalDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.InternalDBPath), false)
if err != nil {
log.Fatal(err)
}

defer internalDb.Close()

sourceDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.SourceDBPath), false)
if err != nil {
log.Fatal(err)
}

defer sourceDb.Close()

if err := fixSourceDb(sourceDb); err != nil {
log.Println(err)
}

files, fileReadfailures := getFilesToUpdate(internalDb)
log.Printf("Found %d blobs without UAST", len(files))

var rowsEdited, uastFails, uastsImported int64
for _, file := range files {
affectedRows, err := importUastForBlobID(internalDb, sourceDb, file.side, file.blobID)
if err != nil {
log.Println(err)
uastFails++
continue
}

rowsEdited += affectedRows
uastsImported++
}

log.Printf(uastImportMsgSuccess, uastsImported, rowsEdited)

if fileReadfailures+uastFails > 0 {
log.Fatal(fmt.Sprintf(uastImportMsgError, uastFails, fileReadfailures))
}

return nil
}

type files map[string]file

func (f *files) add(blobID string, side string, ignore bool) {
if ignore {
return
}

if _, ok := (*f)[blobID+"_"+side]; !ok {
(*f)[blobID+"_"+side] = file{side: side, blobID: blobID}
}
}

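// getFilesToUpdate returns the set of (blob ID, side) pairs that are missing a UAST in the
// internal DB, together with the number of rows that could not be read.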
func getFilesToUpdate(internalDb dbutil.DB) (map[string]file, int64) {
rows, err := internalDb.Query(filePairsWithoutUastQuery)
if err != nil {
log.Fatal(err)
}

defer rows.Close()

filesToImport := files{}
var failures int64
for rows.Next() {
var blobIDA, blobIDB string
var hasUastA, hasUastB int
err := rows.Scan(&blobIDA, &hasUastA, &blobIDB, &hasUastB)
if err != nil {
log.Printf("Failed to read row from internal DB\nerror: %v\n", err)
failures++
continue
}

filesToImport.add(blobIDA, leftFile, hasUastA == 1)
filesToImport.add(blobIDB, rightFile, hasUastB == 1)
}

return filesToImport, failures
}

func importUastForBlobID(internalDb dbutil.DB, sourceDb dbutil.DB, side string, blobID string) (int64, error) {
uast, err := getUastByBlobID(sourceDb, side, blobID)
if err != nil {
return 0, fmt.Errorf("uast_%s could not be retrieved for blobID#%s; %s", side, blobID, err)
}

return setUastToBlobID(internalDb, side, blobID, uast)
}

func getUastByBlobID(sourceDb dbutil.DB, side string, blobID string) ([]byte, error) {
var uast []byte
if err := sourceDb.QueryRow(fmt.Sprintf(uastByBlobIDQuery, side, side), blobID).Scan(&uast); err != nil {
return nil, err
}

return uast, nil
}

func setUastToBlobID(internalDb dbutil.DB, side string, blobID string, uast []byte) (int64, error) {
res, err := internalDb.Exec(fmt.Sprintf(updateByBlobIDQuery, side, side, side), uast, blobID)
if err != nil {
return 0, fmt.Errorf("uast_%s could not be saved for blobID#%s; %s", side, blobID, err)
}

rows, _ := res.RowsAffected()
if rows == 0 {
return 0, fmt.Errorf("no uast_%s to be imported for blobID#%s", side, blobID)
}

return rows, nil
}

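// fixSourceDb creates indexes over the blob_id_a and blob_id_b columns of the source
// database's file_pairs table, presumably to speed up the lookups done during the import.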
func fixSourceDb(sourceDb dbutil.DB) error {
if _, err := sourceDb.Exec(fmt.Sprintf(indexAddBlobID, leftFile, leftFile)); err != nil {
return fmt.Errorf("can not create index over blob_id_%s; %s", leftFile, err)
}

if _, err := sourceDb.Exec(fmt.Sprintf(indexAddBlobID, rightFile, rightFile)); err != nil {
return fmt.Errorf("can not create index over blob_id_%s; %s", rightFile, err)
}

return nil
}
120 changes: 120 additions & 0 deletions cli/migrations/command-diff-remove-column.go
@@ -0,0 +1,120 @@
package main

import (
"database/sql"
"fmt"
"log"

"github.com/src-d/code-annotation/server/dbutil"
)

type diffRmColCmd struct {
commandDesc
Args struct {
InternalDBPath string `description:"filepath to the internal SQLite database"`
} `positional-args:"yes" required:"yes"`
}

var diffRmColOpts = diffRmColCmd{
commandDesc: commandDesc{
name: "diff-rm-col",
shortDesc: "Remove diff column",
longDesc: "Remove diff column from sqlite://InternalDBPath database",
},
}

// queries
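// SQLite (at least the versions targeted here) has no ALTER TABLE ... DROP COLUMN, so the 'diff'
// column is removed by copying the data into a temporary table that lacks it, dropping the old
// table and renaming the temporary one.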
const (
filePairsTable = "file_pairs"
tmpTable = "file_pairs_tmp"

cols = `id,
blob_id_a, repository_id_a, commit_hash_a, path_a, content_a, hash_a, uast_a,
blob_id_b, repository_id_b, commit_hash_b, path_b, content_b, hash_b, uast_b,
score, experiment_id`

createTmpTableQuery = `CREATE TABLE IF NOT EXISTS ` + tmpTable + ` (
id INTEGER,
blob_id_a TEXT, repository_id_a TEXT, commit_hash_a TEXT, path_a TEXT, content_a TEXT, hash_a TEXT,
blob_id_b TEXT, repository_id_b TEXT, commit_hash_b TEXT, path_b TEXT, content_b TEXT, hash_b TEXT,
score DOUBLE PRECISION, experiment_id INTEGER,
uast_a BLOB, uast_b BLOB,
PRIMARY KEY (id),
FOREIGN KEY(experiment_id) REFERENCES experiments(id))`

fillTmpTableQuery = "INSERT INTO " + tmpTable + "(" + cols + ") SELECT " + cols + " FROM " + filePairsTable
disableIndexQuery = "PRAGMA foreign_keys=OFF"
dropOldTableQuery = "DROP TABLE " + filePairsTable
renameTmpTableQuery = "ALTER TABLE " + tmpTable + " RENAME TO " + filePairsTable
checkIndexQuery = "PRAGMA foreign_key_check"
enableIndexQuery = "PRAGMA foreign_keys=ON"
)

func (c *diffRmColCmd) Execute(args []string) error {
internalDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.InternalDBPath), false)
if err != nil {
log.Fatal(err)
}

defer internalDb.Close()

queries := []string{
createTmpTableQuery,
fillTmpTableQuery,
disableIndexQuery,
dropOldTableQuery,
renameTmpTableQuery,
enableIndexQuery,
}

if err := execQueries(internalDb, queries); err != nil {
log.Fatal(err)
}

log.Println("Deleted 'diff' column")
return nil
}

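// execQueries runs all queries inside a single transaction and rolls it back if any of them,
// or the final foreign-key check, fails.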
func execQueries(db dbutil.DB, queries []string) (err error) {
tx, err := db.Begin()
if err != nil {
return err
}

defer func() {
if err != nil {
if rollbackErr := tx.Rollback(); rollbackErr != nil {
err = fmt.Errorf("Error running migration; %s \nThe rollback failed; %s", err, rollbackErr)
}
}
}()

for _, query := range queries {
if _, err := tx.Exec(query); err != nil {
return err
}
}

if err := ensureForeignKeys(tx); err != nil {
return err
}

return tx.Commit()
}

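// ensureForeignKeys returns an error if PRAGMA foreign_key_check reports any violated constraint.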
func ensureForeignKeys(tx queryer) error {
rows, err := tx.Query(checkIndexQuery)
if err != nil {
return err
}

defer rows.Close()

if rows.Next() {
return fmt.Errorf("Foreign key constraints were violated")
}

return nil
}

type queryer interface {
Query(string, ...interface{}) (*sql.Rows, error)
}