Closed
Description
Currently, the query
describe table commits;
produces:
+---------------------+----------+
| name                | type     |
+---------------------+----------+
| repository_id       | TEXT     |
| commit_hash         | TEXT     |
| commit_author_name  | TEXT     |
| commit_author_email | TEXT     |
| commit_author_when  | DATETIME |
| committer_name      | TEXT     |
| committer_email     | TEXT     |
| committer_when      | DATETIME |
| commit_message      | TEXT     |
| tree_hash           | TEXT     |
| commit_parents      | JSON     |
+---------------------+----------+
This is technically correct, but columns like repository_id, commit_hash, commit_author_name, ... would be better as VARCHAR.
Semantically, TEXT implies a large chunk of text whose values are therefore mostly unique, while VARCHAR holds a reasonably short string whose values often repeat.
No wonder people rely on this distinction. For example, Superset makes columns groupable and filterable when they are VARCHAR but does not do so for TEXT.
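To illustrate why the reported type matters, here is a minimal sketch of how a BI tool could decide whether a column is groupable based only on its SQL type. It is a hypothetical illustration, not Superset's actual code: the names GROUPABLE_TYPES and is_groupable and the choice of types in the set are assumptions made for this example.

```python
# Hypothetical illustration only: this is NOT Superset's actual implementation.
# Assumed set of "short string" types that a tool might treat as groupable.
GROUPABLE_TYPES = {"VARCHAR", "CHAR"}


def is_groupable(column_type: str) -> bool:
    """Return True when the base SQL type is in the assumed groupable set."""
    # Strip a length suffix such as "(255)" and normalize case.
    base = column_type.split("(", 1)[0].strip().upper()
    return base in GROUPABLE_TYPES


# A subset of the column types currently reported for the commits table.
commits_schema = {
    "repository_id": "TEXT",
    "commit_hash": "TEXT",
    "commit_author_when": "DATETIME",
}

# Under the assumed rule, none of the TEXT columns qualify for grouping.
groupable = [name for name, typ in commits_schema.items() if is_groupable(typ)]
print(groupable)
```

Under a rule like this, reporting repository_id or commit_hash as VARCHAR instead of TEXT would be enough for them to be offered as grouping and filtering candidates.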
A source{d} product has a bug because of this, and the change might be useful in other use cases too.