
Online DDL: stale migration does not update completed_timestamp, leading to uncollected garbage #8499

@shlomi-noach


The following migration was found to be stale:

*************************** 1. row ***************************
                 id: 194
     migration_uuid: 7c35182a_e3fb_11eb_a192_4e9c3de84043
           keyspace: redacted
              shard: -
       mysql_schema: redacted
        mysql_table: usage
migration_statement: alter table `redacted` modify column redacted bigint default null
           strategy: gh-ost
            options: 
    added_timestamp: 2021-07-13 16:58:02
requested_timestamp: 0000-00-00 00:00:00
    ready_timestamp: 2021-07-14 03:35:35
  started_timestamp: 2021-07-14 03:35:35
 liveness_timestamp: 2021-07-14 03:36:36
completed_timestamp: NULL
  cleanup_timestamp: NULL
   migration_status: failed
           log_path: redacted:/tmp/online-ddl-7c35182a_e3fb_11eb_a192_4e9c3de84043-168931677
          artifacts: _7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_del,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_del,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_del,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_del,
            retries: 1
             tablet: redacted
     tablet_failure: 1
           progress: 0.520303
  migration_context: redacted:ccc3e80c-9797-477e-bd43-c7ee8f6ed02f
         ddl_action: alter
            message: stale migration
        eta_seconds: 11476
        rows_copied: 2065000
         table_rows: 0

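The migration was correctly marked as failed with the message "stale migration". Notice, however, that completed_timestamp remains NULL. Garbage collection of a migration's artifacts only runs 24 hours after its completed_timestamp, so with a NULL value the artifacts of this migration are never collected.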

We need to:

  1. Update completed_timestamp when analyzing a stale migration, and
  2. For existing migrations already in this state, set completed_timestamp wherever it is NULL on a failed migration.
