
Better types for table columns (VARCHAR instead of TEXT) #824

Closed
@smacker

Currently, the query

describe table commits;

produces

+---------------------+----------+
| name                | type     |
+---------------------+----------+
| repository_id       | TEXT     |
| commit_hash         | TEXT     |
| commit_author_name  | TEXT     |
| commit_author_email | TEXT     |
| commit_author_when  | DATETIME |
| committer_name      | TEXT     |
| committer_email     | TEXT     |
| committer_when      | DATETIME |
| commit_message      | TEXT     |
| tree_hash           | TEXT     |
| commit_parents      | JSON     |
+---------------------+----------+

This is technically correct, but columns like repository_id, commit_hash, commit_author_name, ... would be better typed as VARCHAR.

Semantically, TEXT implies a large chunk of text whose values are therefore mostly unique, while VARCHAR holds a reasonable amount of text whose values often repeat.
Tools rely on this distinction. For example, Superset makes columns groupable and filterable when they are VARCHAR but not when they are TEXT.
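
To illustrate why the reported type matters, here is a minimal sketch of how a BI tool might decide whether a column can be grouped or filtered. This is not Superset's actual code: the `GROUPABLE_TYPES` set, the `is_groupable` and `groupable_columns` helpers, and the `VARCHAR(255)` length are all assumptions made for illustration. Only the column names and current types come from the `describe table` output above.

```python
# Hypothetical sketch of type-based column classification.
# Not Superset's real implementation; names and rules are assumptions.

# Assumption: only short string types are offered for grouping/filtering.
GROUPABLE_TYPES = {"VARCHAR", "CHAR"}


def is_groupable(sql_type: str) -> bool:
    """Return True if the column type is treated as groupable."""
    # Normalize case and drop any length suffix: "varchar(255)" -> "VARCHAR".
    base = sql_type.strip().upper().split("(", 1)[0]
    return base in GROUPABLE_TYPES


def groupable_columns(columns: dict) -> list:
    """Return the names of columns whose type is groupable, in order."""
    return [name for name, sql_type in columns.items() if is_groupable(sql_type)]


# A few columns with the types currently reported by `describe table commits`.
CURRENT_COLUMNS = {
    "repository_id": "TEXT",
    "commit_hash": "TEXT",
    "commit_author_when": "DATETIME",
    "commit_message": "TEXT",
}

# The same columns with short identifiers typed as VARCHAR.
# The length 255 is an assumed placeholder, not a value from this issue.
PROPOSED_COLUMNS = {
    "repository_id": "VARCHAR(255)",
    "commit_hash": "VARCHAR(255)",
    "commit_author_when": "DATETIME",
    "commit_message": "TEXT",
}

print(groupable_columns(CURRENT_COLUMNS))
print(groupable_columns(PROPOSED_COLUMNS))
```

Under these assumed rules, none of the columns is groupable with the current types, while repository_id and commit_hash become groupable once they are reported as VARCHAR. commit_message stays TEXT because it really is a large, mostly unique chunk of text.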

A Source{d} product has a bug because of this, and the change might be useful in other use cases too.

Labels

enhancement (New feature or request)
