Skip to content

Conversation

@zhztheplayer
Copy link
Collaborator

@zhztheplayer zhztheplayer commented Aug 18, 2025

When reading data, either in dwrf or parquet, assuming we have data written with the following schema:

auto dataColumnType = ROW({"id", "amount"}, {BIGINT(), BIGINT()});
auto dataRowType = ROW({"order"}, {dataColumnType});

Meanwhile, we request the data with the following schema, which has a non-existing date: TIMESTAMP included:

auto requestedColumnType = ROW({"id", "date", "amount"}, {BIGINT(), TIMESTAMP(), BIGINT()});
auto requestedRowType = ROW({"order"}, {requestedColumnType});

Then, we request the following subfields by adding them to the column handle of order:

std::vector<common::Subfield> requiredSubfields;
requiredSubfields.emplace_back("order.id"); // Requires order.id
requiredSubfields.emplace_back("order.amount"); // Requires order.amount
connector::ColumnHandleMap assignments;
assignments["e"] = std::make_shared<HiveColumnHandle>(
  "order",
  HiveColumnHandle::ColumnType::kRegular,
  requestedColumnType,
  requestedColumnType,
  std::move(requiredSubfields));

Before the fix, Velox's answer is (wrong nulls for order.amount):

order.id order.date order.amount
1 null null
2 null null
3 null null

After the fix, Velox's answer is (correct):

order.id order.date order.amount
1 null 100
2 null 150
3 null 200

@netlify
Copy link

netlify bot commented Aug 18, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 78c70de
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/68a35bab4c54be0008640631

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 18, 2025
@zhztheplayer
Copy link
Collaborator Author

@Yuhta @rui-mo this small fix is independent to #5962, but will allow me to get rid of a similar issue in the new Gluten delta reader in advance.

zhztheplayer added a commit to zhztheplayer/velox4j that referenced this pull request Aug 18, 2025
zhztheplayer added a commit to zhztheplayer/velox4j that referenced this pull request Aug 18, 2025
zhztheplayer added a commit to boostscale/velox4j that referenced this pull request Aug 18, 2025
@Yuhta
Copy link
Contributor

Yuhta commented Aug 19, 2025

@zhztheplayer Adding a subfield in the middle of struct is not allowed, the new non-existing subfield must be appended to the end of the struct

@zhztheplayer
Copy link
Collaborator Author

zhztheplayer commented Aug 20, 2025

the new non-existing subfield must be appended to the end of the struct

@Yuhta Shall we consider this as Presto's principle? Because Spark doesn't restrict on positions of the non-existing fields. We can have more tests from Velox side to guard Spark's use cases if needed.

@Yuhta
Copy link
Contributor

Yuhta commented Aug 20, 2025

@zhztheplayer This is not a compute engine property, it's a data warehouse property. Currently Velox only supports this positional schema evolution model; to support the name based one, it's a lot of work and testing to make sure both cases works. I would suggest starting by enhance table evolution fuzzer to add the name based mode, and see where are the gaps.

@zhztheplayer
Copy link
Collaborator Author

@zhztheplayer This is not a compute engine property, it's a data warehouse property. Currently Velox only supports this positional schema evolution model; to support the name based one, it's a lot of work and testing to make sure both cases works. I would suggest starting by enhance table evolution fuzzer to add the name based mode, and see where are the gaps.

Thanks and understood. I'll revisit the suggestion soon.

@stale
Copy link

stale bot commented Nov 24, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Nov 24, 2025
@stale stale bot closed this Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants