
Kafka Connect: Support VARIANT when record convert#15283

Open
seokyun-ha-toss wants to merge 19 commits into apache:main from seokyun-ha-toss:support-variant-for-sink-connector

Conversation

@seokyun-ha-toss

Summary

Add support for converting arbitrary Java objects (e.g. `Map<String, Object>`, lists, primitives) into the Iceberg Variant type in the Kafka Connect `RecordConverter`. Nested maps and lists are converted recursively, so structures like `{"user": {"name": "alice", "address": {"city": "Seoul", "zip": "12345"}}}` are correctly represented as a single Variant.

Motivation

Kafka Connect payloads often come as schema-less or JSON-like maps. To write them into Iceberg tables with a Variant column, the connector must convert these Java objects into the Variant format (metadata + value) and support nested maps/arrays without losing structure or key names.

Behaviour

| Input | Result |
| --- | --- |
| Primitives (`String`, `int`, `long`, `boolean`, etc.) | Single (empty) metadata + corresponding Variant primitive. |
| Flat map, e.g. `{"a": 1, "b": "x"}` | One metadata with keys `["a", "b"]`, one `ShreddedObject` with two fields. |
| Nested map, e.g. `{"user": {"name": "alice", "address": {"city": "Seoul", "zip": "12345"}}}` | One shared metadata for all keys; root and nested objects as `ShreddedObject`s with consistent field IDs. |
| Lists | Converted to `VariantArray` with elements converted recursively. |
| Already `Variant` / `ByteBuffer` | Pass-through, or `Variant.from(ByteBuffer)` where appropriate. |
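The "one shared metadata for all keys" row can be illustrated with a small, self-contained sketch (plain JDK only — no Iceberg classes, and `collectKeys` is a hypothetical helper, not code from this PR): the converter walks the nested structure once, and every map key at any depth lands in a single sorted key dictionary.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class MetadataKeys {
  // Recursively collect every map key from a nested structure, mirroring how
  // the converter builds one shared Variant metadata dictionary for all keys.
  static void collectKeys(Object value, SortedSet<String> keys) {
    if (value instanceof Map) {
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        keys.add((String) entry.getKey());
        collectKeys(entry.getValue(), keys);
      }
    } else if (value instanceof List) {
      for (Object element : (List<?>) value) {
        collectKeys(element, keys);
      }
    }
    // primitives contribute no keys
  }

  public static void main(String[] args) {
    Map<String, Object> nested =
        Map.of("user", Map.of(
            "name", "alice",
            "address", Map.of("city", "Seoul", "zip", "12345")));
    SortedSet<String> keys = new TreeSet<>();
    collectKeys(nested, keys);
    System.out.println(keys); // [address, city, name, user, zip]
  }
}
```

Sorting the dictionary is what lets the root object and the nested objects share consistent field IDs: each key's ID is simply its position in the shared sorted dictionary.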

Relates

Thanks, Good Day!

@alexkot1394

Hi @seokyun-ha-toss,
I've opened a PR that's trying to solve the same issue: #15498
I'm happy to help with your PR to get this change into main. Let me know if you want to collaborate on your PR or where you're at with the review comments above.

@seokyun-ha-toss
Author

> Hi @seokyun-ha-toss, I've opened a PR that's trying to solve the same issue: #15498 I'm happy to help with your PR to get this change into main. Let me know if you want to collaborate on your PR or where you're at with the review comments above.

Hi @alexkot1394, thanks for reaching out.

I stepped away for a bit to validate my changes in our production environment.

I took a look at your PR as well. From what I see, it treats Variant as a string primitive, whereas I believe it should be handled recursively to properly construct the Variant structure.

I’d prefer to continue working on this PR and incorporate the review comments as soon as possible.

Thanks again!

@seokyun-ha-toss
Author

Hello, @danielcweeks, @emkornfield, and @alexkot1394, I'm ready to get a review on this. I've tested this for a month in production, and it works well! I can query the output Iceberg tables with Variant columns using Spark and Snowflake.

Please take a look at the latest updates. Thanks!

@brandonstanleyappfolio

brandonstanleyappfolio commented Mar 17, 2026

👋 Hi @seokyun-ha-toss, thanks for introducing this change! Is there a reason Struct types aren't included in this PR?

@seokyun-ha-toss
Author

> 👋 Hi @seokyun-ha-toss, thanks for introducing this change! Is there a reason Struct types aren't included in this PR?

Good catch! I missed the case of org.apache.kafka.connect.data.Struct. I will handle it as well. Thanks!

@seokyun-ha-toss
Author

Hello, @brandonstanleyappfolio. I've added Struct type handling and unit tests in the following two commits:

Thanks for pointing this out!

@brandonstanleyappfolio

@seokyun-ha-toss Thank you - I’ve tested the changes locally, and everything is working as expected!

@seokyun-ha-toss
Author

@danielcweeks @emkornfield Gentle ping — would appreciate your review when you have a moment. Thank you!

```java
(key, val) -> {
  if (key != null && key instanceof String) {
    object.put((String) key, objectToVariantValue(val, metadata));
  }
```
Contributor

Seems like we should throw here if the key is null or isn't an instance of String?
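A defensive version of that check might look like the following self-contained sketch (the `requireStringKey` helper is hypothetical, not from the PR): fail loudly instead of silently skipping entries with unusable keys.

```java
import java.util.Map;

public class KeyCheck {
  // Hypothetical defensive helper: reject keys that cannot become Variant
  // object field names, rather than dropping the entry and losing data.
  // Note: `x instanceof String` is already false for null, so a separate
  // null check (as in the snippet above) is redundant.
  static String requireStringKey(Object key) {
    if (!(key instanceof String)) {
      throw new IllegalArgumentException(
          "Variant object keys must be non-null strings, got: " + key);
    }
    return (String) key;
  }

  public static void main(String[] args) {
    System.out.println(requireStringKey("user"));
    try {
      requireStringKey(null); // throws instead of silently dropping the entry
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```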

Contributor

(One way would be checking that the ShreddedObject size is the same as the input map size?)

```java
}
if (value instanceof Map) {
  Map<?, ?> map = (Map<?, ?>) value;
  ShreddedObject object = Variants.object(metadata);
```
Contributor

Is there a non-shredded version of Object? Any performance differences?

```java
  assertThat(variant.value().asPrimitive().get()).isEqualTo(true);
}

@Test
```
Contributor

Does this test add any value over the test below?


```java
@Test
public void testConvertVariantValueFromMap() {
  Variant variant = variantConverter().convertVariantValue(ImmutableMap.of("hello", 1));
```
Contributor

should we have multiple value types? Also we should make sure we test the case when the key is null or not a string.

```java
}

@Test
public void testConvertVariantValueFromMixedNested() {
```
Contributor

Should we test nested values within a struct?

Contributor

@emkornfield left a comment

My primary concern is potentially dropping keys that are null or not a string, and being defensive against data loss.

```java
}
if (value instanceof Short) {
  return Variants.of((Short) value);
}
```

Should we also consider Dates?

Contributor

I interpreted from the PR description that these are derived from JSON, which doesn't have dates? But this is a good clarifying question.

@brandonstanleyappfolio Mar 30, 2026

That makes sense! I can provide some context around my use case if that helps.

I am loading Confluent Avro-serialized data that contains timestamp fields. When those fields are deserialized by the connector, they are converted to java.util.Date, which this PR does not currently handle. I added the following code to RecordConverter.java to resolve the issue:

```java
if (value instanceof Date) {
  return Variants.of(((Date) value).getTime());
}
```

I also see that @seokyun-ha-toss added support for dates here!

```java
  return Variants.ofTime(DateTimeUtil.microsFromTime((LocalTime) value));
}
if (value instanceof Date) {
  int days = (int) (((Date) value).getTime() / 1000 / 60 / 60 / 24);
```
Contributor

The java.util.Date class is poorly named; I think we should map this to timestamp with time zone so we don't lose precision. See https://docs.oracle.com/javase/8/docs/api/java/util/Date.html for details.
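To make the precision concern concrete, here is a JDK-only sketch (the constant is just an arbitrary instant with a non-midnight time component, not data from the PR) contrasting the whole-days truncation with keeping the full epoch value:

```java
import java.util.Date;

public class DatePrecision {
  public static void main(String[] args) {
    long millis = 1704164645678L; // an instant with a sub-day time component
    Date date = new Date(millis);

    // Truncating to whole days (as in the snippet above) drops the time of day:
    int days = (int) (date.getTime() / 1000 / 60 / 60 / 24);
    long roundTrippedMillis = (long) days * 24 * 60 * 60 * 1000;
    System.out.println((millis - roundTrippedMillis) != 0); // true: precision lost

    // Mapping to a timestamp (epoch micros) keeps full millisecond precision:
    long micros = date.getTime() * 1000;
    System.out.println(micros / 1000 == millis); // true
  }
}
```

Since `java.util.Date` holds a millisecond-precision instant rather than a calendar day, the days division silently discards up to 24 hours of information per value.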


That’s a good point. Do we need to add a check to determine the exact logical type?
I’ve opened this PR against @seokyun-ha-toss's branch. Let me know your thoughts @seokyun-ha-toss @emkornfield!

Author

Wow, really appreciate it @brandonstanleyappfolio, thanks! I'll take a look!

Sadly, I don't have much time to work on this these days. I'll resume this work as soon as possible. Thanks, everyone! 🙏

