-
Notifications
You must be signed in to change notification settings - Fork 26
DB Migrations #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
DB Migrations #221
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
1891732
Add migrations to use uast nullable cols in FilePairs
dpordomingo 82c4514
Add migrations to remove diff column
dpordomingo ff30b2c
Add migrations to vacuum internal database
dpordomingo 3f013f3
Add migrations to remove features table
dpordomingo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
### Disclaimer: | ||
|
||
These scripts are <u>**not real migrations**</u>. There is no up/down rules. These scripts are not production ready. The idempotence of these scripts is not guaranteed at all. The internal DB should be SQLite. | ||
|
||
These scripts were needed to migrate the Database between different states. | ||
|
||
- [Vacuum database](command-vacuum.go) | ||
```shell | ||
# rebuilds the database to defragment it | ||
go run cli/migrations/*.go vacuum internal.db | ||
``` | ||
|
||
- [Add UAST nullable columns](command-UAST-add-columns.go) | ||
```shell | ||
# prepares the current "internal.db". Adds "uast_a" and "uast_b" nulable cols | ||
go run cli/migrations/*.go uast-add-cols internal.db | ||
``` | ||
|
||
- [Add UAST to a database, reading from other Database](command-UAST-import-from-source-db.go) | ||
```shell | ||
# import UASTs into "internal.db" reading from "source.db" | ||
go run cli/migrations/*.go uast-import internal.db source.db | ||
``` | ||
|
||
- [Remove diff column from a database](command-diff-remove-column.go) | ||
```shell | ||
# prepares the current "internal.db". Remove "diff" col | ||
go run cli/migrations/*.go diff-rm-col internal.db | ||
``` | ||
|
||
- [Remove features table from a database](command-features-drop-table.go) | ||
```shell | ||
# drops the 'features' table from the current "internal.db" | ||
go run cli/migrations/*.go features-drop-table internal.db | ||
``` |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
package main | ||
|
||
import ( | ||
"fmt" | ||
"log" | ||
|
||
"github.com/src-d/code-annotation/server/dbutil" | ||
) | ||
|
||
type uastColsCmd struct { | ||
commandDesc | ||
Args struct { | ||
InternalDBPath string `description:"filepath to the internal SQLite database"` | ||
} `positional-args:"yes" required:"yes"` | ||
} | ||
|
||
var uastColsOpts = uastColsCmd{ | ||
commandDesc: commandDesc{ | ||
name: "uast-add-cols", | ||
shortDesc: "Add uast BLOB columns", | ||
longDesc: "Adds 'uast_a' and 'uast_b' BLOB columns to the sqlite://InternalDBPath database", | ||
}, | ||
} | ||
|
||
// queries | ||
const ( | ||
leftFile = "a" | ||
rightFile = "b" | ||
|
||
addUastQuery = "ALTER TABLE file_pairs ADD COLUMN uast_%s BLOB" | ||
) | ||
|
||
func (c *uastColsCmd) Execute(args []string) error { | ||
internalDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.InternalDBPath), false) | ||
if err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
defer internalDb.Close() | ||
|
||
if err := addColumn(internalDb, leftFile); err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
if err := addColumn(internalDb, rightFile); err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
log.Println("New BLOB columns 'uast_a' and 'uast_b' were added") | ||
return nil | ||
} | ||
|
||
func addColumn(db dbutil.DB, side string) error { | ||
if _, err := db.Exec(fmt.Sprintf(addUastQuery, side)); err != nil { | ||
return err | ||
} | ||
|
||
return nil | ||
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
package main | ||
|
||
import ( | ||
"fmt" | ||
"log" | ||
|
||
"github.com/src-d/code-annotation/server/dbutil" | ||
) | ||
|
||
type uastImportCmd struct { | ||
commandDesc | ||
Args struct { | ||
InternalDBPath string `description:"filepath to the internal SQLite database"` | ||
SourceDBPath string `description:"filepath to the SQLite database containing the UAST to import"` | ||
} `positional-args:"yes" required:"yes"` | ||
} | ||
|
||
var uastImportOpts = uastImportCmd{ | ||
commandDesc: commandDesc{ | ||
name: "uast-import", | ||
shortDesc: "Import UASTs", | ||
longDesc: "Adds UASTs to the sqlite://InternalDBPath database reading from sqlite://SourceDBPath database", | ||
}, | ||
} | ||
|
||
// queries | ||
const ( | ||
filePairsWithoutUastQuery = `SELECT | ||
blob_id_a, uast_a IS NOT NULL as hasUastA, | ||
blob_id_b, uast_b IS NOT NULL as hasUastB | ||
FROM file_pairs | ||
WHERE hasUastA = 0 or hasUastB = 0` | ||
uastByBlobIDQuery = `SELECT uast_%s | ||
FROM files | ||
WHERE blob_id_%s = CAST($1 AS BLOB) | ||
LIMIT 1` | ||
updateByBlobIDQuery = `UPDATE file_pairs | ||
SET uast_%s = $1 | ||
WHERE blob_id_%s = $2 and uast_%s IS NULL` | ||
indexAddBlobID = `CREATE INDEX blob_id_%s ON file_pairs (blob_id_%s);` | ||
) | ||
|
||
// command messages | ||
const ( | ||
uastImportMsgSuccess = `UASTs added into the internal DB: | ||
- imported UASTs: %d | ||
- edited rows: %d` | ||
uastImportMsgError = `Some rows could not be properly updated: | ||
- UAST not inserted: %d | ||
- FilePair read errors: %d` | ||
) | ||
|
||
type file struct { | ||
side string | ||
blobID string | ||
} | ||
|
||
func (c *uastImportCmd) Execute(args []string) error { | ||
internalDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.InternalDBPath), false) | ||
if err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
defer internalDb.Close() | ||
|
||
sourceDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.SourceDBPath), false) | ||
if err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
defer sourceDb.Close() | ||
|
||
fixSourceDb(sourceDb) | ||
|
||
files, fileReadfailures := getFilesToUpdate(internalDb) | ||
log.Printf("Found %d blobs without UAST", len(files)) | ||
|
||
var rowsEdited, uastFails, uastsImported int64 | ||
for _, file := range files { | ||
affectedRows, err := importUastForBlobID(internalDb, sourceDb, file.side, file.blobID) | ||
if err != nil { | ||
log.Println(err) | ||
uastFails++ | ||
continue | ||
} | ||
|
||
rowsEdited += affectedRows | ||
uastsImported++ | ||
} | ||
|
||
log.Printf(uastImportMsgSuccess, uastsImported, rowsEdited) | ||
|
||
if fileReadfailures+uastFails > 0 { | ||
log.Fatal(fmt.Sprintf(uastImportMsgError, uastFails, fileReadfailures)) | ||
} | ||
|
||
return nil | ||
} | ||
|
||
type files map[string]file | ||
|
||
func (f *files) add(blobID string, side string, ignore bool) { | ||
if ignore { | ||
return | ||
} | ||
|
||
if _, ok := (*f)[blobID+"_"+side]; !ok { | ||
(*f)[blobID+"_"+side] = file{side: side, blobID: blobID} | ||
} | ||
} | ||
|
||
func getFilesToUpdate(internalDb dbutil.DB) (map[string]file, int64) { | ||
rows, err := internalDb.Query(filePairsWithoutUastQuery) | ||
if err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
defer rows.Close() | ||
|
||
filesToImport := files{} | ||
var failures int64 | ||
for rows.Next() { | ||
var blobIDA, blobIDB string | ||
var hasUastA, hasUastB int | ||
err := rows.Scan(&blobIDA, &hasUastA, &blobIDB, &hasUastB) | ||
if err != nil { | ||
log.Printf("Failed to read row from internal DB\nerror: %v\n", err) | ||
failures++ | ||
continue | ||
} | ||
|
||
filesToImport.add(blobIDA, leftFile, hasUastA == 1) | ||
filesToImport.add(blobIDB, rightFile, hasUastB == 1) | ||
} | ||
|
||
return filesToImport, failures | ||
} | ||
|
||
func importUastForBlobID(internalDb dbutil.DB, sourceDb dbutil.DB, side string, blobID string) (int64, error) { | ||
uast, err := getUastByBlobID(sourceDb, side, blobID) | ||
if err != nil { | ||
return 0, fmt.Errorf("uast_%s could not be retrieved for blobID#%s; %s", side, blobID, err) | ||
} | ||
|
||
return setUastToBlobID(internalDb, side, blobID, uast) | ||
} | ||
|
||
func getUastByBlobID(sourceDb dbutil.DB, side string, blobID string) ([]byte, error) { | ||
var uast []byte | ||
if err := sourceDb.QueryRow(fmt.Sprintf(uastByBlobIDQuery, side, side), blobID).Scan(&uast); err != nil { | ||
return nil, err | ||
} | ||
|
||
return uast, nil | ||
} | ||
|
||
func setUastToBlobID(internalDb dbutil.DB, side string, blobID string, uast []byte) (int64, error) { | ||
res, err := internalDb.Exec(fmt.Sprintf(updateByBlobIDQuery, side, side, side), uast, blobID) | ||
if err != nil { | ||
return 0, fmt.Errorf("uast_%s could not be saved for blobID#%s; %s", side, blobID, err) | ||
} | ||
|
||
rows, _ := res.RowsAffected() | ||
if rows == 0 { | ||
return 0, fmt.Errorf("no uast_%s to be imported for blobID#%s", side, blobID) | ||
} | ||
|
||
return rows, nil | ||
} | ||
|
||
func fixSourceDb(sourceDb dbutil.DB) error { | ||
if _, err := sourceDb.Exec(fmt.Sprintf(indexAddBlobID, leftFile, leftFile)); err != nil { | ||
return fmt.Errorf("can not create index over blob_id_%s; %s", leftFile, err) | ||
} | ||
|
||
if _, err := sourceDb.Exec(fmt.Sprintf(indexAddBlobID, rightFile, rightFile)); err != nil { | ||
return fmt.Errorf("can not create index over blob_id_%s; %s", rightFile, err) | ||
} | ||
|
||
return nil | ||
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
package main | ||
|
||
import ( | ||
"database/sql" | ||
"fmt" | ||
"log" | ||
|
||
"github.com/src-d/code-annotation/server/dbutil" | ||
) | ||
|
||
type diffRmColCmd struct { | ||
commandDesc | ||
Args struct { | ||
InternalDBPath string `description:"filepath to the internal SQLite database"` | ||
} `positional-args:"yes" required:"yes"` | ||
} | ||
|
||
var diffRmColOpts = diffRmColCmd{ | ||
commandDesc: commandDesc{ | ||
name: "diff-rm-col", | ||
shortDesc: "Remove diff column", | ||
longDesc: "Remove diff column from sqlite://InternalDBPath database", | ||
}, | ||
} | ||
|
||
// queries | ||
const ( | ||
fileFairsTable = "file_pairs" | ||
tmpTable = "file_pairs_tmp" | ||
|
||
cols = `id, | ||
blob_id_a, repository_id_a, commit_hash_a, path_a, content_a, hash_a, uast_a, | ||
blob_id_b, repository_id_b, commit_hash_b, path_b, content_b, hash_b, uast_b, | ||
score, experiment_id` | ||
|
||
createTmpTableQuery = `CREATE TABLE IF NOT EXISTS ` + tmpTable + ` ( | ||
id INTEGER, | ||
blob_id_a TEXT, repository_id_a TEXT, commit_hash_a TEXT, path_a TEXT, content_a TEXT, hash_a TEXT, | ||
blob_id_b TEXT, repository_id_b TEXT, commit_hash_b TEXT, path_b TEXT, content_b TEXT, hash_b TEXT, | ||
score DOUBLE PRECISION, experiment_id INTEGER, | ||
uast_a BLOB, uast_b BLOB, | ||
PRIMARY KEY (id), | ||
FOREIGN KEY(experiment_id) REFERENCES experiments(id))` | ||
|
||
fillTmpTableQuery = "INSERT INTO " + tmpTable + "(" + cols + ") SELECT " + cols + " FROM " + fileFairsTable | ||
disableIndexQuery = "PRAGMA foreign_keys=OFF" | ||
dropOldTableQuery = "DROP TABLE " + fileFairsTable | ||
renameTmpTableQuery = "ALTER TABLE " + tmpTable + " RENAME TO " + fileFairsTable | ||
checkIndexQuery = "PRAGMA foreign_key_check" | ||
enableIndexQuery = "PRAGMA foreign_keys=ON" | ||
) | ||
|
||
func (c *diffRmColCmd) Execute(args []string) error { | ||
internalDb, err := dbutil.Open(fmt.Sprintf(sqliteDSN, c.Args.InternalDBPath), false) | ||
if err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
defer internalDb.Close() | ||
|
||
queries := []string{ | ||
createTmpTableQuery, | ||
fillTmpTableQuery, | ||
disableIndexQuery, | ||
dropOldTableQuery, | ||
renameTmpTableQuery, | ||
enableIndexQuery, | ||
} | ||
|
||
if err := execQueries(internalDb, queries); err != nil { | ||
log.Fatal(err) | ||
} | ||
|
||
log.Println("Deleted 'diff' column") | ||
return nil | ||
} | ||
|
||
func execQueries(db dbutil.DB, queries []string) (err error) { | ||
tx, err := db.Begin() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
defer func() { | ||
if err != nil { | ||
if rollbackErr := tx.Rollback(); rollbackErr != nil { | ||
err = fmt.Errorf("Error running migration; %s \nThe rollback failed; %s", err, rollbackErr) | ||
} | ||
} | ||
}() | ||
|
||
for _, query := range queries { | ||
if _, err := tx.Exec(query); err != nil { | ||
return err | ||
} | ||
} | ||
|
||
if err := ensureForeignKeys(tx); err != nil { | ||
return err | ||
} | ||
|
||
return tx.Commit() | ||
} | ||
|
||
func ensureForeignKeys(tx queryer) error { | ||
rows, err := tx.Query(checkIndexQuery) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
if rows.Next() { | ||
return fmt.Errorf("Foreign key constraints were violated") | ||
} | ||
|
||
return nil | ||
} | ||
|
||
type queryer interface { | ||
Query(string, ...interface{}) (*sql.Rows, error) | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow,
+325 LOC
for not a real migration to add 2 columns to 1 table? That seems a LOT.... sorry if I have misunderstood the purpose of this, please, do not hesitate to clarify that here.Do you think there might be a way to avoid having this complexity, but rather create a separate issue for applying a real migration tools instead?
As discussed IRL (I'm sorry if that was not articulated well enough) I think it's nice to have a DML SQL command in a small .sql file, to add a new columns. From my side, having a custom mechanism implemented to handle this task is 🙅♂️, as at least for now it seems like a lot of overhead (that needs to be maintained) to have that custom code that needs to be
go run
, etc.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two scripts.
First one adds the new columns. It's 58 cols because of having a command tu run the query. Most of the lines are command documentation.
Second one is 176 lines to assign uast to existent FilePairs. It is needed because I couldn't guarantee that provided table with UAST is the same that current one so it could be replaced; even more: I could guarantee that it is different, with different Id's and less cols. Transforming the provided one into the current one may require more work, and it is would not be reusable.
Because of that the command just look for Files without UAST, and assign them looking for them in the source table.
About having separated commands instead of a bunch of queies: it seems easier to test and reproduce the process making shorter the time needed for the migration itself.
The commands itself can be reused for creating more commands and being consistent between them. I copied the idea from Borges that provided a clan way to pack it's commands.
About the maintenance overhead, I can not see how since it won't be needed (same way that for other cli scripts in CAT, that are not supported nor prod ready)
If you really concern about having unmaintained code in the repo we can delete all these scripts tomorrow or as soon as we finish this task. Anyway I think it's better to validate that the modifications to be done are what we expect.
About creating an issue to add a migration tool, I agree on that, but as we agreed, it is something that should be maybe done in the future.