Kafka Connect: Support VARIANT when record convert #15283
seokyun-ha-toss wants to merge 19 commits into apache:main
…ersion methods for nested structures
Hi @seokyun-ha-toss,
Hi @alexkot1394, thanks for reaching out. I stepped away for a bit to validate my changes in our production environment. I took a look at your PR as well. From what I see, it treats Variant as a string primitive, whereas I believe it should be handled recursively to properly construct the Variant structure. I’d prefer to continue working on this PR and incorporate the review comments as soon as possible. Thanks again!
…Set for uniqueness
… for null, primitive types, lists, maps, and mixed types
Hello, @danielcweeks, @emkornfield, and @alexkot1394, I'm ready to get a review on this. I've tested this for a month in production, and it works well! I can query the output Iceberg tables with Variant columns using Spark and Snowflake. Please take a look at the latest updates. Thanks!
👋 Hi @seokyun-ha-toss, thanks for introducing this change! Is there a reason
Good catch! I missed the case of
Hello, @brandonstanleyappfolio. I've added Thanks for pointing this out!
@seokyun-ha-toss Thank you - I’ve tested the changes locally, and everything is working as expected!
@danielcweeks @emkornfield Gentle ping: would appreciate your review when you have a moment. Thank you!
(key, val) -> {
  if (key != null && key instanceof String) {
    object.put((String) key, objectToVariantValue(val, metadata));
  }
Seems like we should throw here if the key is null or is not an instance of String?
(One way would be checking that the ShreddedObject size is the same?)
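The suggestion above (fail instead of silently skipping a key) could look roughly like the following. This is only a sketch: `StringKeyCheck` and `requireStringKeys` are hypothetical names that do not appear in the PR, and the check is shown on plain Java maps so that it stays independent of the Iceberg Variant classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: rejects keys that would otherwise be dropped silently.
final class StringKeyCheck {
  private StringKeyCheck() {}

  // Returns a copy of the map with String keys, or throws on the first null or non-String key.
  static Map<String, Object> requireStringKeys(Map<?, ?> map) {
    Map<String, Object> checked = new LinkedHashMap<>();
    for (Map.Entry<?, ?> entry : map.entrySet()) {
      Object key = entry.getKey();
      if (!(key instanceof String)) {
        // instanceof is false for null, so this branch covers both null and non-String keys
        throw new IllegalArgumentException(
            "Cannot use map key as a Variant field name: "
                + (key == null ? "null" : key.getClass().getName()));
      }
      checked.put((String) key, entry.getValue());
    }
    return checked;
  }
}
```

A test for the null-key case would need a `HashMap` (or similar) as input, since Guava's `ImmutableMap` rejects null keys at construction time.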
}
if (value instanceof Map) {
  Map<?, ?> map = (Map<?, ?>) value;
  ShreddedObject object = Variants.object(metadata);
Is there a non-shredded version of Object? Any performance differences?
  assertThat(variant.value().asPrimitive().get()).isEqualTo(true);
}

@Test
Does this test add any value over the test below?
@Test
public void testConvertVariantValueFromMap() {
  Variant variant = variantConverter().convertVariantValue(ImmutableMap.of("hello", 1));
Should we have multiple value types? Also, we should make sure we test the case where the key is null or not a string.
}

@Test
public void testConvertVariantValueFromMixedNested() {
Should we test nested values within a struct?
emkornfield left a comment
My primary concern is potentially dropping keys that are null or not a string; we should be defensive against data loss.
}
if (value instanceof Short) {
  return Variants.of((Short) value);
}
Should we also consider Dates?
I inferred from the PR description that these are derived from JSON, which doesn't have dates. But this is a good clarifying question.
That makes sense! I can provide some context around my use case if that helps.
I am loading Confluent Avro-serialized data that contains timestamp fields. When those fields are deserialized by the connector, they are converted to java.util.Date, which this PR does not currently handle. I added the following code to RecordConverter.java to resolve the issue:
if (value instanceof Date) {
  return Variants.of(((Date) value).getTime());
}
I also see that @seokyun-ha-toss added support for dates here!
  return Variants.ofTime(DateTimeUtil.microsFromTime((LocalTime) value));
}
if (value instanceof Date) {
  int days = (int) (((Date) value).getTime() / 1000 / 60 / 60 / 24);
The java.util.Date class is poorly named; I think we should map this to Timestamp TZ so we don't lose precision. See https://docs.oracle.com/javase/8/docs/api/java/util/Date.html for details.
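To make the precision point concrete: a `java.util.Date` holds milliseconds since the Unix epoch (UTC), so reducing it to a day count discards the time of day. The sketch below compares the two mappings using only the JDK. `DateConversionSketch` and its method names are hypothetical, and the assumption that the target timestamp type stores epoch microseconds should be checked against the Iceberg Variant API before relying on it.

```java
import java.util.Date;

// Hypothetical sketch, not the PR's code: two ways to reduce a java.util.Date to a number.
final class DateConversionSketch {
  private DateConversionSketch() {}

  // Keeps the full instant: epoch milliseconds scaled to epoch microseconds.
  static long epochMicros(Date date) {
    return Math.multiplyExact(date.getTime(), 1000L);
  }

  // Keeps only the UTC calendar day. floorDiv rounds toward negative infinity,
  // so instants before 1970 land on the correct day; plain '/' truncates toward zero.
  static int epochDays(Date date) {
    return Math.toIntExact(Math.floorDiv(date.getTime(), 86_400_000L));
  }
}
```

For an instant one millisecond before the epoch, `epochDays` returns -1, whereas the chained integer division in the snippet above would yield 0.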
That’s a good point. Do we need to add a check to determine the exact logical type?
I’ve opened this PR against @seokyun-ha-toss's branch. Let me know your thoughts @seokyun-ha-toss @emkornfield!
Wow, really appreciate it @brandonstanleyappfolio, thanks! I'll take a look!
Sadly, I don't have much time to work on this these days. I'll resume this work as soon as possible. Thanks, everyone! 🙏
Summary
Add support for converting arbitrary Java objects (e.g. Map<String, Object>, lists, primitives) into the Iceberg Variant type in the Kafka Connect RecordConverter. Nested maps and lists are converted recursively so that structures like {"user": {"name": "alice", "address": {"city": "Seoul", "zip": "12345"}}} are correctly represented as a single Variant.
Motivation
Kafka Connect payloads often come as schema-less or JSON-like maps. To write them into Iceberg tables with a Variant column, the connector must convert these Java objects into the Variant format (metadata + value) and support nested maps/arrays without losing structure or key names.
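As an illustration of the "metadata" half of that conversion, the sketch below walks a nested map/list structure and gathers every field name it encounters. It uses only JDK collections. `VariantKeySketch` and `collectKeys` are hypothetical names, and the use of a sorted set is an assumption rather than a description of what the PR does.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: gather the distinct field names of a nested JSON-like value.
final class VariantKeySketch {
  private VariantKeySketch() {}

  static SortedSet<String> collectKeys(Object value) {
    SortedSet<String> keys = new TreeSet<>();
    collect(value, keys);
    return keys;
  }

  private static void collect(Object value, SortedSet<String> keys) {
    if (value instanceof Map) {
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        if (!(entry.getKey() instanceof String)) {
          // Fail loudly rather than losing a field name.
          throw new IllegalArgumentException("Non-String map key: " + entry.getKey());
        }
        keys.add((String) entry.getKey());
        collect(entry.getValue(), keys); // recurse into nested maps and lists
      }
    } else if (value instanceof List) {
      for (Object element : (List<?>) value) {
        collect(element, keys);
      }
    }
    // Primitives contribute no field names.
  }
}
```

A set is used so that a name repeated at several nesting levels is recorded once.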
Behaviour
{"a": 1, "b": "x"} ["a", "b"], one ShreddedObject with two fields.
{"user": {"name": "alice", "address": {"city": "Seoul", "zip": "12345"}}}
Variant.from(ByteBuffer) where appropriate.
Relates
Thanks, Good Day!