[Iceberg] Support bucket transform for TimeType #24829
steveburnett left a comment
LGTM! (docs)
Pulled the branch and did a local doc build. Thanks!
Thanks for the release note entry! Minor nits:
```java
private static Block bucketTime(Block block, int count)
{
    return bucketBlock(block, count, position -> bucketHash(MILLISECONDS.toMicros(TIME.getLong(block, position))));
}
```
Just to confirm my understanding: we need the MILLISECONDS.toMicros conversion here because the bucketing has to be compatible with the bucketing of other systems? Is there anything else stopping us from calling bucketHash directly on TIME.getLong(...)?
Compatibility with the bucketing of other systems is one reason, but the more important one is compatibility with Iceberg's file planning when a filter is present. Suppose we query the partitioned table with a filter on a TimeType column:
```sql
select * from test_bucket_transform_on_time where a = time '01:02:03.123';
```
We first pass the predicate a = time '01:02:03.123' to Iceberg's file planning. Iceberg uses this time value internally to calculate the partition value and then scans only the matching partitions. If the partition values we calculated when writing differ from the ones Iceberg calculates, it will not find the corresponding data. So we have to make sure the partition value is calculated exactly as the Iceberg library does it, that is, from the value in microseconds.
So if we called bucketHash directly on TIME.getLong(...), the example query above would return an empty result.
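To make the compatibility argument concrete, here is a minimal standalone sketch of the bucket function as the Iceberg spec describes it for time values: a 32-bit Murmur3 x86 hash (seed 0) of the microseconds-since-midnight value, encoded as an 8-byte little-endian long, followed by `(hash & Integer.MAX_VALUE) % N`. The class `IcebergTimeBucket` and its methods are hypothetical; they are not Presto's `bucketHash`/`bucketBlock` or Iceberg's own API. The assumption that Presto's `TIME` value is milliseconds since midnight is taken from the `MILLISECONDS.toMicros` call in the snippet under review. The hash can be checked against the hash test vectors listed in the Iceberg spec.

```java
import java.time.LocalTime;
import java.util.concurrent.TimeUnit;

// Hypothetical standalone helper, not Presto's or Iceberg's actual API.
public final class IcebergTimeBucket {
    private IcebergTimeBucket() {}

    // 32-bit Murmur3 (x86 variant, seed 0) over the 8 little-endian bytes of a long.
    static int murmur3HashLong(long value) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = 0; // seed 0
        // Little-endian byte order: the low 32 bits form the first 4-byte block.
        int[] blocks = {(int) value, (int) (value >>> 32)};
        for (int k1 : blocks) {
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // Finalization: mix in the input length (8 bytes), then avalanche.
        h1 ^= Long.BYTES;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Microseconds since midnight, the unit the Iceberg spec hashes for time values.
    static long microsOfDay(LocalTime time) {
        return TimeUnit.NANOSECONDS.toMicros(time.toNanoOfDay());
    }

    // Bucket index: clear the sign bit of the hash, then reduce modulo the bucket count.
    static int bucketTimeMicros(long microsOfDay, int numBuckets) {
        return (murmur3HashLong(microsOfDay) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        long millis = 3_723_123L; // time '01:02:03.123' as milliseconds since midnight
        long micros = TimeUnit.MILLISECONDS.toMicros(millis); // the scaling bucketTime applies
        System.out.println(micros == microsOfDay(LocalTime.parse("01:02:03.123"))); // true
        // Hashing the raw millis value feeds a different input to the hash, so it
        // can land in a different bucket than the one Iceberg's planner computes.
        System.out.println("bucket(micros) = " + bucketTimeMicros(micros, 16));
        System.out.println("bucket(millis) = " + bucketTimeMicros(millis, 16));
    }
}
```

Because the planner hashes the predicate literal in microseconds, a writer that hashed the millisecond value could place rows in buckets the planner never scans, which matches the empty result described in the reply.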
This was roughly my thinking. Thanks for confirming.
Description
The Iceberg specification allows bucket transforms on columns of type TimeType. This PR adds support for bucket transforms on TimeType columns in Presto.
Motivation and Context
Support as many types as possible for partition transforms, as defined by the Iceberg specification.
Impact
Bucket transforms can now be used on TimeType columns in Iceberg tables.
Test Plan
Contributor checklist
Release Notes