feat: Scala 2.13#1294

Merged
tchow-zlai merged 6 commits into main from
tchow/serverless-classpath
Nov 25, 2025

Conversation

@tchow-zlai
Collaborator

@tchow-zlai tchow-zlai commented Nov 14, 2025

Summary

  • Rewriting some code to be cross-compatible with Scala 2.12 and 2.13
  • Implementing Mill modules to enable cross-compilation
  • Wiring Mill actions into our GitHub workflows so that builds and publishing run through Mill

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Infrastructure

    • Added cross-build support for Scala 2.12.18 & 2.13.17, GCS publish helpers, and updated CI workflows to build/test/publish versioned artifacts.
  • API

    • Normalized public collection types to scala.Seq; expanded StructType with iteration, field-lookup and casting helpers; a handful of public method signatures adjusted.
  • Chores

    • Widespread import cleanups, collection materialization fixes, Java/Scala interop refinements, minor logging/config tweaks, and test updates.


@coderabbitai
Contributor

coderabbitai bot commented Nov 14, 2025

Walkthrough

Normalized collection conversions and Seq usage, added StructType Seq helpers, adjusted Java↔Scala fetcher interop, refined BigTable/KVStore materialization, introduced Mill cross-build + GCS publishing, and updated CI workflows to run versioned Mill targets and publish versioned artifacts.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Collection import & Seq cleanup | **/src/**/**/*.scala, **/src/**/**/test/**/*.scala | Removed many unused scala.collection.Seq imports; tightened imports to scala.collection.mutable or fully-qualified scala.Seq where required. |
| Collection conversion & materialization | **/SparkExpressionEval*.scala, **/JoinSourceQueryFunction.scala, **/Join.scala, **/SourceJob.scala, **/TileCodec.scala, **/TableDependencies.scala, **/MonolithJoinPlanner.scala, **/.../QueryUtils.scala | Replaced .asScala with .toScala where custom converters used, added .toSeq to materialize immutable Seq at many call sites, and qualified public signatures to scala.Seq where needed. |
| ColumnAggregator iterator handling | aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala | Imported ai.chronon.api.ScalaJavaConversions.IteratorOps; VectorDispatcher.toInputIterator now explicitly supports java.util.ArrayList via .toScala, handles scala.collection.Seq[Any], filters nulls, maps to typed inputs, and throws on unsupported input types. |
| StructType API additions | api/src/main/scala/ai/chronon/api/DataType.scala | StructType now extends Seq[StructField]; unpack returns Seq[(String,DataType)]; added iterator, stringPrefix, typeOf, castArr, and cast helpers. |
| Java↔Scala interop & fetchers | online/src/main/scala/ai/chronon/online/Api.scala, online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java, online/src/main/java/ai/chronon/online/JavaFetcher.java, spark/src/main/scala/.../MockApi.scala | Converted ExternalSourceHandler to a trait with scala.Seq signatures; JavaFetcher removed Seq helper and now works with Java List and explicit conversions; updated Java/Scala handler implementations and tests. |
| Fetcher, metadata & model transforms | online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala, online/.../MetadataStore.scala, online/.../ModelTransformsFetcher.scala | Request/response APIs now use scala.Seq; replaced deprecated mapValues/filterKeys; mutable accumulators converted to .toSeq before returning. |
| KV stores & BigTable tweaks | spark/.../InMemoryKvStore.scala, online/.../HTTPKVStore.scala, cloud_gcp/.../BigTableKVStoreImpl.scala, cloud_gcp/.../BigTableMetricsKvStore.scala | Public signatures use scala.Seq[...]; internal collections finalized with .toSeq; byte extraction normalized via .toByte. |
| cloud_gcp runtime tweaks & logging | cloud_gcp/src/main/scala/.../DataprocServerlessSubmitter.scala, cloud_gcp/.../DelegatingBigQueryMetastoreCatalog.scala | Removed some log lines, bumped RuntimeConfig version, added commented wiring for peripherals, introduced SLF4J logger field in BigQuery catalog. |
| Mill cross-builds & GCS publishing | build.mill, many */package.mill (aggregator, api, cloud_aws, cloud_gcp, flink, flink_connectors, online, service, service_commons, spark, python, etc.) | Introduced Cross[...] per-module cross-builds, default Scala versions (2.12.18, 2.13.17), added GcsPublishModule, pomSettingsFor, publishVersion behavior, and updated test wiring to use crossValue/versioned targets. |
| GitHub Actions → versioned workflows | .github/workflows/*.yaml | Added reusable workflows and new CI workflows invoking Mill targets parameterized by SCALA_VERSION; artifact upload paths now reference versioned output directories. |
| Misc runtime & tests | flink/..., spark/..., many tests | Various small fixes: array equality in tests switched to .toSeq, some import removals, small type tweaks to Seq qualifiers, and test updates to use scala.Seq/.toSeq. |
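
The `.toSeq` and `mapValues`/`filterKeys` changes summarized above follow from differences between the two standard libraries: on 2.12 `scala.Seq` aliases `scala.collection.Seq`, while on 2.13 it aliases `scala.collection.immutable.Seq`, so a mutable buffer no longer conforms to it. A minimal sketch of the pattern, using hypothetical object and method names that are not taken from the PR:

```scala
import scala.collection.mutable

// Hypothetical helper names; only the pattern mirrors the PR summary.
object SeqMaterialization {

  // Accumulate in a mutable buffer, then return a scala.Seq.
  // On 2.13 returning `buf` directly would not type-check, because
  // scala.Seq is immutable.Seq there; `.toSeq` compiles on both versions.
  def evens(xs: Iterable[Int]): scala.Seq[Int] = {
    val buf = mutable.ArrayBuffer.empty[Int]
    xs.foreach(x => if (x % 2 == 0) buf += x)
    buf.toSeq
  }

  // Strict replacements for mapValues / filterKeys, which are deprecated
  // on 2.13 and return a lazy view there instead of a strict Map.
  def doubled(m: Map[String, Int]): Map[String, Int] =
    m.map { case (k, v) => k -> v * 2 }

  def withoutPrivateKeys(m: Map[String, Int]): Map[String, Int] =
    m.filter { case (k, _) => !k.startsWith("_") }
}
```

Both replacements compile unchanged on 2.12 and 2.13, which is presumably why this style suits a cross-built codebase.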

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub Actions
    participant WF as Reusable Workflow
    participant Mill as Mill
    participant Mod as Cross Module
    Note over GH,WF: caller provides scala-version / SCALA_VERSION
    GH->>WF: workflow_call(scala-version)
    WF->>Mill: run targets (module[SCALA_VERSION].assembly/test)
    Mill->>Mod: build/test for crossValue
    Mod-->>Mill: artifact / test results
    Mill-->>GH: upload artifacts to versioned paths (out/<module>/<SCALA_VERSION>/...)
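
For readers unfamiliar with Mill cross modules, the `module[SCALA_VERSION]` targets in the diagram above can be illustrated with a minimal build definition. This is a sketch only: it uses Mill's built-in `CrossScalaModule` rather than the explicit `Cross.Module[String]` wiring the PR describes, and the module name `demo` is hypothetical.

```scala
// build.mill sketch (newer Mill versions may also require a `package build` header).
import mill._, scalalib._

// One module instance per Scala version; the name `demo` is hypothetical.
object demo extends Cross[DemoModule]("2.12.18", "2.13.17")

// CrossScalaModule sets scalaVersion from the cross value and adds
// version-specific source directories such as src-2.12 and src-2.13.
trait DemoModule extends CrossScalaModule
```

With a definition like this, a single version would be selected on the command line as `mill 'demo[2.13.17].compile'`.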
sequenceDiagram
    participant Java as Java caller
    participant JavaFetcher as JavaFetcher
    participant Trait as ExternalSourceHandler (trait)
    participant ScalaImpl as Scala fetcher impl
    Java->>JavaFetcher: call with Java List<Req>
    JavaFetcher->>Trait: convert Java List -> scala.Seq, invoke fetch
    Trait->>ScalaImpl: fetch(requests: scala.Seq[Request])
    ScalaImpl-->>Trait: Future[scala.Seq[Response]]
    Trait-->>JavaFetcher: convert scala.Seq -> Java List
    JavaFetcher-->>Java: CompletableFuture<List<Response>>
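
The conversion steps at the Java boundary in the diagram above can be sketched as follows. This is an illustration under assumptions: it uses `scala.collection.JavaConverters`, which is available on both 2.12 and 2.13 (deprecated on 2.13), whereas the PR relies on Chronon's own `ScalaJavaConversions`; the object and method names are hypothetical, and the `Future` to `CompletableFuture` step is left out.

```scala
import java.util.{List => JList}
// Available on 2.12 and 2.13; deprecated on 2.13 in favor of
// scala.jdk.CollectionConverters.
import scala.collection.JavaConverters._

// Hypothetical names; not the PR's JavaFetcher or ExternalSourceHandler.
object InteropSketch {

  // Java List -> scala.Seq. On 2.13 `.toSeq` copies into an immutable Seq.
  def toScalaSeq[T](in: JList[T]): scala.Seq[T] = in.asScala.toSeq

  // scala.Seq -> Java List, exposed as a wrapper around the Scala collection.
  def toJavaList[T](in: scala.Seq[T]): JList[T] = in.asJava

  // Java-facing entry point: convert in, call the Scala function, convert out.
  def handle(requests: JList[String])(
      f: scala.Seq[String] => scala.Seq[String]): JList[String] =
    toJavaList(f(toScalaSeq(requests)))
}
```

Keeping both conversions at the boundary means the Scala side only ever sees `scala.Seq`, which matches the signature change described in the walkthrough.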

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Attention points:
    • Mill cross-build refactor and crossValue propagation across many package.mill files.
    • Java↔Scala interop and signature shifts (JavaFetcher / JavaExternalSourceHandler / MockApi).
    • StructType signature change and new helper methods in DataType.scala.
    • Collection conversion sites in joins / kv-store / batch flows (materialization and type-qualification).
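
To make the StructType attention point concrete, here is a minimal sketch of a type that extends `Seq` of its own fields. The class and field names are hypothetical stand-ins, and the `typeOf` signature is an assumption, not the one in DataType.scala.

```scala
// Hypothetical stand-ins; not Chronon's actual StructField / StructType.
final case class FieldSketch(name: String, dataType: String)

final class StructSketch(val name: String, val fields: Array[FieldSketch])
    extends scala.collection.immutable.Seq[FieldSketch] {

  // The three members a Seq implementation must provide on 2.12 and 2.13.
  override def apply(idx: Int): FieldSketch = fields(idx)
  override def length: Int = fields.length
  override def iterator: Iterator[FieldSketch] = fields.iterator

  // Assumed shape of a field-lookup helper.
  def typeOf(fieldName: String): Option[String] =
    fields.find(_.name == fieldName).map(_.dataType)
}
```

Because the type is itself a `Seq`, callers can use `map`, `filter`, and similar operations on it directly, which is one reason such a signature change deserves review attention.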

Poem

Seqs trimmed, converters tuned, builds now cross the sea,
Java Lists and Scala traits waltz in tidy harmony.
Artifacts find versioned homes, tests run calm and clear,
Cleaner imports, safer casts — a leaner tree this year.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title 'feat: Scala 2.13' clearly and concisely identifies the main feature: adding Scala 2.13 support through cross-compilation efforts. |
| Description check | ✅ Passed | Description covers the core changes (cross-compatibility rewriting, Mill module implementation, GitHub workflow integration) but all checklist items remain unchecked despite the work described. |
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tchow/serverless-classpath

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ff8208 and 22a5166.

📒 Files selected for processing (1)
  • cloud_gcp/package.mill (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • cloud_gcp/package.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: build
  • GitHub Check: python_lint
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
cloud_gcp/package.mill (3)

2-2: Cross module wiring and test Scala version look consistent

object package as Cross[CloudGcpModule](scalaVersions) with CloudGcpModule, the explicit Cross.Module[String] on CloudGcpModule, and test.scalaVersion = crossValue / cross-aware moduleDeps all align with the cross-build pattern you’re using elsewhere. The added mill.api._ import is appropriate for Task usage here. Just ensure build.Constants.defaultScalaVersion is included in build.Constants.scalaVersions so the default, bracket-less cloud_gcp.test points at a valid cross.

Also applies to: 7-15, 60-64
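
The membership check suggested above could be enforced in the build constants themselves. A minimal sketch, assuming a constants object shaped like `build.Constants`; the object name, the values, and the choice of default version are assumptions for illustration:

```scala
// Hypothetical mirror of build.Constants; the values and which version is
// the default are assumptions for illustration.
object ConstantsSketch {
  val scalaVersions: Seq[String] = Seq("2.12.18", "2.13.17")
  val defaultScalaVersion: String = "2.12.18"

  // Runs when the object is first initialized and fails fast if the
  // default is not one of the cross values.
  require(
    scalaVersions.contains(defaultScalaVersion),
    s"defaultScalaVersion $defaultScalaVersion is not one of ${scalaVersions.mkString(", ")}"
  )
}
```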


25-29: Shading removal avoids BigQuery relocation breakage; re-check Dataproc classpath

Commenting out the assemblyRules relocation for org.apache.iceberg.gcp.bigquery should resolve the earlier runtime mismatch where code imported the unshaded package. Please re-verify that Dataproc serverless no longer hits the original classpath conflict that motivated this relocation.


57-57: Jackson jsr310 module pinning matches your exclusion strategy

Adding jackson-datatype-jsr310:2.15.2 and forcing its version keeps your Jackson stack consistent with the excludeJackson helper and the other pinned Jackson artifacts. Double-check that other modules use the same Jackson versions to avoid subtle serialization issues across JVM boundaries.



@tchow-zlai tchow-zlai force-pushed the tchow/serverless-classpath branch from 0b5d10e to 1015074 on November 14, 2025 19:26
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/src/main/scala/ai/chronon/api/DataType.scala (1)

244-261: cast() method lacks test coverage; castArr() is well-tested.

The castArr method has solid coverage in AvroConversionsListTest.scala with tests for mixed types, empty lists, and nulls. However, the cast method (returning Map[String, AnyRef]) has no dedicated tests. Add tests for cast covering missing keys, null values, and type conversions via DataType.castTo.

🧹 Nitpick comments (6)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (1)

27-27: Logger declared but never used.

The logger field and its import are unused. Consider removing them, or add log statements if logging was intended.

-import org.slf4j.LoggerFactory
-
-  @transient private lazy val logger = LoggerFactory.getLogger(getClass)
-

Also applies to: 47-48

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2)

113-113: Hardcoded cluster path.

The Dataproc cluster path is hardcoded. If this config is enabled in the future, make it configurable via constructor parameters or environment variables.


133-133: Commented-out code.

Either enable this code, remove it, or add a TODO explaining why it is disabled.

spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)

68-72: Consider using logger instead of println

DfWithStats.prunePartitions logs via println; switching to the existing logger for this diagnostic would keep logging consistent and easier to control in production.

api/src/main/scala/ai/chronon/api/DataType.scala (1)

240-242: Add tests for typeOf method.

Verification confirms typeOf is a public method actively used in OnlineDerivationUtil.scala (lines 120, 125) but has no test coverage. Consider adding tests to the DataType test suite.

spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (1)

150-150: Consider adding .toSeq for consistency.

While mutable.Buffer is a valid Seq, adding .toSeq here would match the pattern used at line 189 and provide an immutable collection.

-    val sources = groupByConf.sources.toScala
+    val sources = groupByConf.sources.toScala.toSeq
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea2ba74 and 1015074.

📒 Files selected for processing (102)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
  • build.mill (1 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
  • cloud_gcp/package.mill (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (1 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (43)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
🧰 Additional context used
🧠 Learnings (26)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • build.mill
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • cloud_gcp/package.mill
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
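
To make the `grouped` learning above concrete, here is a minimal sketch of lazy batch processing. `GroupedBatches` and `batchSums` are hypothetical names used only for illustration; they are not taken from the Chronon codebase.

```scala
object GroupedBatches {
  // `grouped` on an Iterator yields another Iterator, so only one batch
  // of `batchSize` records is materialized at a time.
  def batchSums(records: Iterator[Int], batchSize: Int): Iterator[Int] =
    records.grouped(batchSize).map(_.sum)
}
```

Because the result is itself an iterator, nothing is computed until the caller consumes it, for example with `foreach` or `toList`.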
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.

Applied to files:

  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
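
One way to follow the `Await.result` learning above is to compose futures and hand the combined `Future` back to the caller. The sketch below assumes a hypothetical `upload` step that stands in for a real KV store write; `AsyncUpload` and both method names are illustrative only.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AsyncUpload {
  // Hypothetical upload step: reports how many records it "wrote".
  def upload(batch: Seq[Int])(implicit ec: ExecutionContext): Future[Int] =
    Future(batch.size)

  // Combines the per-batch futures without blocking the calling thread;
  // the caller decides if and where to wait for the result.
  def uploadAll(batches: Seq[Seq[Int]])(implicit ec: ExecutionContext): Future[Int] =
    Future.sequence(batches.map(b => upload(b))).map(_.sum)
}
```

Blocking with `Await.result` is then confined to the outermost edge, such as a test or a `main` method.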
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2025-09-26T01:29:32.984Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1182
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala:169-191
Timestamp: 2025-09-26T01:29:32.984Z
Learning: In BigTableMetricsKvStore, schema keys (ending with "#schema") are always queried alone in multiGet requests and never mixed with regular metric keys, so there's no concern about timestamp filter leakage between different key types within the same query.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-09-24T00:22:38.257Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1169
File: spark/src/main/scala/ai/chronon/spark/batch/iceberg/IcebergPartitionStatsExtractor.scala:79-94
Timestamp: 2025-09-24T00:22:38.257Z
Learning: BigTableMetricsKvStore defaults tsMillis to System.currentTimeMillis() when not provided in PutRequest, so PutRequests without explicit timestamps are handled correctly by the KV store implementation.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-04-21T15:10:40.819Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-17T01:02:40.431Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In `DynamoDBKVStoreImpl.scala`, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic `Exception` type in methods like `extractListValues`.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-07T15:21:50.787Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-07T15:21:50.787Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2025-05-02T16:19:11.001Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
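
The quoting pitfall described in the learning above can be illustrated with a small sketch. `MetricTags` and `Dataset` are hypothetical names and do not reflect the actual metrics API.

```scala
object MetricTags {
  final case class Dataset(name: String)

  // Correct: the expression is evaluated, so the tag carries the real name.
  def evaluatedTags(ds: Dataset): Map[String, String] =
    Map("dataset" -> ds.name)

  // The mistake: the quotes turn the expression into a literal string.
  def literalTags(ds: Dataset): Map[String, String] =
    Map("dataset" -> "ds.name")
}
```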
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.

Applied to files:

  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
📚 Learning: 2024-10-31T18:27:44.973Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
📚 Learning: 2024-10-15T15:33:22.265Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
🧬 Code graph analysis (17)
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (1)
flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (1)
  • ProjectedEvent (95-95)
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)
spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (4)
  • StagingQuery (16-94)
  • StagingQuery (114-183)
  • from (152-164)
  • compute (84-93)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
  • fetch (30-34)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)
  • Fetcher (55-147)
  • Fetcher (152-685)
online/src/main/scala/ai/chronon/online/TileCodec.scala (3)
spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1)
  • buildRowAggregator (134-153)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • unWindowed (283-301)
  • unpack (257-279)
api/src/main/scala/ai/chronon/api/DataType.scala (1)
  • unpack (235-235)
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • query (395-403)
  • timeWheres (725-730)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
  • multiGet (108-118)
  • multiPut (308-359)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (36-53)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  • applyMapping (796-799)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
  • multiGet (108-118)
  • multiPut (308-359)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
  • multiGet (115-156)
  • multiPut (199-219)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (36-53)
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
  • runTimestampChecks (459-488)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
  • inputDf (479-518)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • table (439-439)
  • rightToLeft (838-848)
spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  • keyColumns (1004-1009)
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (1)
api/src/main/scala/ai/chronon/api/ColorPrinter.scala (1)
  • red (17-17)
spark/src/main/scala/ai/chronon/spark/Driver.scala (2)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
  • run (541-569)
spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (1)
  • run (28-80)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
  • JoinUtils (38-431)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
api/src/main/scala/ai/chronon/api/Row.scala (1)
  • getAs (34-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: flink_tests
  • GitHub Check: online_tests
  • GitHub Check: batch_tests
  • GitHub Check: stats_tests
  • GitHub Check: udafs_tests
  • GitHub Check: streaming_tests
  • GitHub Check: kv_store_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: build
  • GitHub Check: enforce_triggered_workflows

def fromJoinSources(sources: java.util.List[api.Source]): Seq[TableDependency] = {
Option(sources)
-.map(_.asScala)
+.map(_.toScala.toSeq)
Contributor


🛠️ Refactor suggestion | 🟠 Major

Inconsistent conversion pattern.

Line 13 uses .asScala.toSeq for the same Java→Scala List conversion. Update line 13 to match this pattern for consistency.

#!/bin/bash
# Verify line 13 usage
rg -n "tableDependencies.*asScala" api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
🤖 Prompt for AI Agents
In api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala around line
131 (and also update the earlier occurrence at line 13), the Java→Scala List
conversion uses .toScala.toSeq which is inconsistent with the .asScala.toSeq
pattern used elsewhere; change the .toScala.toSeq call(s) to .asScala.toSeq so
both conversions use the same .asScala.toSeq pattern for consistency.
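
For reference, a null-safe Java-to-Scala list conversion in the style discussed above might look like the following sketch. It assumes Scala 2.13's `scala.jdk.CollectionConverters` (on 2.12 this import needs a compatibility layer such as scala-collection-compat); `JavaListConversion` and `toScalaSeq` are hypothetical names, not helpers from this PR.

```scala
import scala.jdk.CollectionConverters._

object JavaListConversion {
  // Option(...) absorbs a null Java list; .toSeq materializes the wrapper
  // returned by .asScala into a Scala Seq.
  def toScalaSeq[A](xs: java.util.List[A]): Seq[A] =
    Option(xs).map(_.asScala.toSeq).getOrElse(Seq.empty)
}
```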

build.mill Outdated
Comment on lines +17 to +18
// val scalaVersion = "2.12.18"
val scalaVersion = "2.13.17"
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Scala 2.13.17 upgrade thoroughly.

This is a major version upgrade from 2.12.18 to 2.13.17 with significant breaking changes, particularly in the collections library. The PR checklist shows no tests, CI coverage, integration testing, or documentation updates.


🏁 Script executed:

#!/bin/bash
# Verify that the build succeeds with Scala 2.13 and all tests pass

echo "=== Checking if build compiles with Scala 2.13.17 ==="
mill all __.compile

echo "=== Checking if tests pass ==="
mill all __.test

echo "=== Checking for binary compatibility issues ==="
# Look for common Scala 2.13 migration issues
rg -n "JavaConverters" --type scala | head -20
rg -n "scala.collection.Seq" --type scala | head -20

Length of output: 2504



🏁 Script executed:

#!/bin/bash
# Detailed analysis of Scala 2.13 migration status

echo "=== Count of deprecated JavaConverters imports ==="
rg -c "scala.collection.JavaConverters" --type scala | wc -l
rg "scala.collection.JavaConverters" --type scala | wc -l

echo ""
echo "=== Check for new scala.jdk.CollectionConverters usage ==="
rg "scala.jdk.CollectionConverters" --type scala | wc -l

echo ""
echo "=== Affected files count ==="
rg "scala.collection.JavaConverters" --type scala --files-with-matches | wc -l

echo ""
echo "=== Check build.mill for scala version lock-in ==="
cat build.mill | grep -A 20 "scalaVersion"

Length of output: 2244


🏁 Script executed:

#!/bin/bash
# Search for Scala 2.13 migration guide info
# Verify the breaking change status
echo "Scala 2.13 collections migration is a known breaking change"

Length of output: 59


Migrate all deprecated JavaConverters imports before upgrading to Scala 2.13.17.

Found 42 scala.collection.JavaConverters imports across 41 files that must be updated to scala.jdk.CollectionConverters. While 33 files have already been partially migrated, the remaining imports will produce deprecation warnings on Scala 2.13, which fail the build when warnings are treated as errors (for example with -Xfatal-warnings). Complete the migration before merging this version upgrade.

Affected files include: spark/src/main/scala/ai/chronon/spark/Driver.scala, spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala, spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala, spark/src/main/scala/ai/chronon/spark/JoinBase.scala, spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala, online/src/main/scala/ai/chronon/online/TileCodec.scala, online/src/main/scala/ai/chronon/online/TopicChecker.scala, online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala, and others.

🤖 Prompt for AI Agents
In build.mill around lines 17-18, you're about to upgrade Scala to 2.13.17 but
there are 42 imports of scala.collection.JavaConverters across ~41 files that
must be migrated to scala.jdk.CollectionConverters; update each import
(including wildcard forms) to import scala.jdk.CollectionConverters._, keep
using .asScala/.asJava extension methods as-is, ensure any partially-migrated
files now use the new import consistently, run a project-wide search-and-replace
for "scala.collection.JavaConverters" → "scala.jdk.CollectionConverters" and
verify compilation with the -deprecation flag so no remaining deprecated imports
remain.
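
Beyond the import rename, the migration changes what `.toSeq` does after `.asScala`. The sketch below assumes Scala 2.13, where `.asScala` returns a live wrapper over the Java collection and `.toSeq` copies it into an immutable `Seq`. `ConverterMigration` and `snapshotAndView` are hypothetical names for illustration.

```scala
import scala.collection.mutable
import scala.jdk.CollectionConverters._

object ConverterMigration {
  def snapshotAndView(): (Seq[String], mutable.Buffer[String]) = {
    val javaList = new java.util.ArrayList[String]()
    javaList.add("a")
    val view = javaList.asScala // live wrapper backed by javaList
    val snapshot = view.toSeq   // immutable copy on Scala 2.13
    javaList.add("b")           // visible through the view, not the snapshot
    (snapshot, view)
  }
}
```

On Scala 2.12, `toSeq` on the wrapper can return the wrapper itself, so code that relies on a stable copy should materialize explicitly (for example with `.toList`) when cross-building.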

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)

108-117: Parameterize hardcoded cluster reference.

Line 114 hardcodes project ID and cluster name. Use class fields projectId and region instead, or make cluster configurable.

-//          .setDataprocCluster("projects/canary-443022/regions/us-central1/clusters/zipline-canary-cluster")
+//          .setDataprocCluster(s"projects/$projectId/regions/$region/clusters/${sparkHistoryClusterName}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b129d4 and 36d5abf.

📒 Files selected for processing (2)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: build
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: online_tests
  • GitHub Check: streaming_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: stats_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: kv_store_tests
  • GitHub Check: groupby_tests
  • GitHub Check: udafs_tests
  • GitHub Check: flink_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: batch_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2)

134-134: Consistent scaffolding approach.

Appropriately commented to match the peripheral config definition above.


106-106: RuntimeConfig version "2.3" is valid and supports Scala 2.13.

Version 2.3 is GA and includes Scala 2.13. The change at line 106 is correct.

Contributor

@piyush-zlai piyush-zlai left a comment


LGTM - can stamp when we've got the dual build + publish going

-.setVersion("1.2")
+.setVersion("2.3")

// Add PeripheralsConfig if needed. This should be set in EnvironmentConfig below.
Contributor


revert? Or delete?

Collaborator Author


It's useful as a reference. It's not well documented; I had to reverse engineer this from CLI behavior.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
api/package.mill (1)

7-9: Same double-cross pattern for api tests

Here api is already Cross[ApiModule](scalaVersions), and object test is another Cross[TestModule](scalaVersions); same concern as cloud_gcp about unnecessary extra cross combinations and potential version mismatches. If you don’t need mixed pairs, collapsing to a single Scala-axis for tests would be simpler.

Also applies to: 47-51

online/package.mill (1)

38-41: online test is also double-crossed

The online module is Cross[OnlineModule](scalaVersions) and object test is Cross[TestModule](scalaVersions), mirroring the double-cross test pattern noted for cloud_gcp/api. Simplifying to a single Scala cross-axis for tests would avoid redundant combinations unless you explicitly need mixed-version pairs.

service/package.mill (1)

7-9: Redundant Cross.Module[String] on ServiceModule

As with OnlineModule, ServiceModule inherits Cross.Module[String] even though build.BaseModule already extends CrossScalaModule; you can likely drop Cross.Module[String] here and rely on the CrossScalaModule machinery alone.

🧹 Nitpick comments (6)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)

64-72: Misleading name and duplicate implementation.

JavaPlusOneExternalHandler no longer uses Java types and duplicates PlusOneExternalHandler exactly. Consider renaming or removing if not needed for distinct test scenarios.

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

22-22: Unify Seq usage for clarity

ContextualHandler.fetch uses scala.Seq while fetchRequests/responsesByNameF use bare Seq; all compile, but it’s a bit inconsistent. Consider standardizing on scala.Seq (or a single imported alias) throughout this file for readability and future maintenance.

Also applies to: 30-30
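To illustrate why the spelling matters in a cross-built file, here is a minimal sketch (the object and method names are hypothetical, not from the PR). `scala.Seq` aliases `scala.collection.Seq` on 2.12 and `scala.collection.immutable.Seq` on 2.13, so a signature written against `scala.Seq` plus an explicit `.toSeq` at the call site should compile on both versions, assuming no import in the file shadows `Seq`.

```scala
import scala.collection.mutable

object SeqAliasSketch {
  // scala.Seq means scala.collection.Seq on 2.12 and
  // scala.collection.immutable.Seq on 2.13.
  def total(xs: scala.Seq[Int]): Int = xs.sum

  // A mutable buffer is not an immutable Seq on 2.13, so materialize it
  // with .toSeq; the result satisfies scala.Seq on both versions.
  def fromBuffer(buf: mutable.ArrayBuffer[Int]): scala.Seq[Int] = buf.toSeq
}
```

Using one spelling throughout a file makes it easier to spot the places where such a materialization is actually required.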

online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1)

34-56: Re-add @Override on fetch for interface contract safety

Now that this is implements ExternalSourceHandler, annotating fetch with @Override will ensure the compiler enforces the exact trait signature going forward.

-    //java friendly method
-    public abstract CompletableFuture<java.util.List<JavaResponse>> fetchJava(java.util.List<JavaRequest> requests);
-
-    public Future<Seq<Fetcher.Response>> fetch(Seq<Fetcher.Request> requests) {
+    //java friendly method
+    public abstract CompletableFuture<java.util.List<JavaResponse>> fetchJava(java.util.List<JavaRequest> requests);
+
+    @Override
+    public Future<Seq<Fetcher.Response>> fetch(Seq<Fetcher.Request> requests) {
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)

143-151: convertJavaRequestList logic is sound; naming nit only

Building a java.util.List<Fetcher.Request> and instrumenting off the original Java requests is correct; you might optionally rename scalaRequests to requestsScalaSeq/scalaRequestList to avoid confusion now that it’s a Java List.

cloud_gcp/package.mill (1)

8-12: Avoid double-crossing cloud_gcp tests unless you really need it

cloud_gcp is already Cross[CloudGcpModule](scalaVersions), and object test is a second Cross[TestModule](scalaVersions), so you get 2×2 Scala-version combinations with moduleDeps keyed only by the inner crossValue. Unless you intentionally want mixed pairs (e.g. cloud_gcp[2.12].test[2.13] depending on cloud_gcp(2.13)), consider simplifying to a single axis (e.g. object test extends TestModule) so each Scala version is tested once.

Also applies to: 22-25, 54-79
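The combinatorics behind this concern can be sketched in plain Scala (this is not Mill code; the object and value names are hypothetical). Crossing the same version list twice yields every outer/inner pair, including the mixed pairs, while a single axis yields one matched pair per version.

```scala
object CrossAxesSketch {
  val scalaVersions: Seq[String] = Seq("2.12.18", "2.13.17")

  // Double cross: every outer version paired with every inner version.
  val doubleCrossed: Seq[(String, String)] =
    for {
      outer <- scalaVersions
      inner <- scalaVersions
    } yield (outer, inner)

  // Single axis: the module and its tests always share one version.
  val singleAxis: Seq[(String, String)] = scalaVersions.map(v => (v, v))

  // The pairs that only exist because of the second cross.
  val mixedPairs: Seq[(String, String)] =
    doubleCrossed.filter { case (outer, inner) => outer != inner }
}
```

With two versions, the double cross gives four combinations, two of which are mixed-version pairs.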

online/package.mill (1)

8-9: OnlineModule probably doesn’t need to extend Cross.Module[String]

Because build.BaseModule already extends CrossScalaModule (which is designed for use with Cross[...]), inheriting Cross.Module[String] again in OnlineModule is likely redundant; dropping it would simplify the hierarchy without changing behaviour.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36d5abf and 3af26eb.

📒 Files selected for processing (16)
  • aggregator/package.mill (2 hunks)
  • api/package.mill (2 hunks)
  • build.mill (4 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • flink/package.mill (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (1 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
  • service/package.mill (2 hunks)
  • service_commons/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • online/src/main/scala/ai/chronon/online/Api.scala
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • cloud_gcp/package.mill
  • cloud_aws/package.mill
  • spark/package.mill
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • build.mill
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • online/src/main/java/ai/chronon/online/JavaFetcher.java
🧬 Code graph analysis (2)
online/src/main/java/ai/chronon/online/JavaFetcher.java (3)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  • fetchGroupBys (182-184)
  • fetchJoin (186-233)
  • fetchJoinV2 (273-315)
  • FeaturesResponseType (50-53)
  • AvroString (80-80)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
  • listJoins (166-218)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  • fetch (48-53)
  • fetch (57-61)
  • fetch (65-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: udafs_tests
  • GitHub Check: spark_tests
🔇 Additional comments (14)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)

19-27: LGTM on import updates.

Import consolidation and explicit Scala 2.13 types look good.


47-54: LGTM on Scala 2.13 collection types.

Signature correctly uses scala.Seq per the migration.


56-62: LGTM on handler type migration.

Correctly extends ExternalSourceHandler and uses scala.Seq.

online/src/main/java/ai/chronon/online/JavaFetcher.java (3)

125-141: Manual Seq→List conversion in convertResponsesWithTs looks correct

Iterating resps.responses().toList() with a scala.collection.Iterator into a pre-sized ArrayList preserves semantics and is efficient enough; metrics wiring is unchanged.
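For comparison, the same pre-sized, iterator-driven copy can be sketched on the Scala side (a hypothetical helper, not the code in JavaFetcher.java). It relies only on `iterator`, `hasNext`, and `next`, which exist on both Scala 2.12 and 2.13 sequences.

```scala
object ManualConversionSketch {
  // Copy a Scala Seq into a pre-sized java.util.ArrayList, preserving order.
  def toJavaList[T](xs: scala.Seq[T]): java.util.List[T] = {
    val out = new java.util.ArrayList[T](xs.size)
    val it = xs.iterator
    while (it.hasNext) {
      out.add(it.next())
    }
    out
  }
}
```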


153-182: Java↔Scala conversions for fetch methods look consistent

fetchGroupBys, fetchJoin, and fetchJoinBase64Avro now uniformly:

  • convert List<JavaRequest> → List<Fetcher.Request> → Scala Seq via ScalaJavaConversions.toScala
  • call the Scala fetcher
  • map back via convertResponsesWithTs

Types and control flow align with the Scala Fetcher API.


184-188: listJoins conversion path is clean

Wrapping metadataStore().listJoins with FutureConverters.toJava and ScalaJavaConversions::toJava is a straightforward Scala Seq → Java List<String> bridge; no issues.

service_commons/package.mill (1)

6-9: Crossed service_commons wiring looks good

Cross[ServiceCommonsModule](scalaVersions) and the crossValue-based moduleDeps for main and test modules are consistent and look correct.

Also applies to: 21-23

build.mill (1)

1-1: Scala cross-version setup looks appropriate

Using Constants.scalaVersions = Seq("2.12.18", "2.13.17") together with BaseModule/BaseTestModule extending CrossScalaModule is a sound basis for the new Cross[...] modules; just ensure the full cross build (__.compile/__.test across both versions) passes in CI before merging.

Also applies to: 17-21, 115-115, 147-148

service/package.mill (1)

48-55: Service JUnit4 test wiring looks fine

The JUnit4 test module correctly depends on build.service(crossValue) and just layers its own test dependencies on top of super.mvnDeps(), which is a reasonable pattern.

flink/package.mill (1)

2-2: Flink cross-build wiring LGTM

Import switch to mill.api._, Cross[FlinkModule] plus crossValue-based moduleDeps, and the Cross-based test module all look consistent with the new pattern; nothing blocking here.

Also applies to: 6-8, 11-11, 41-43

spark/package.mill (1)

2-2: Spark cross-module + test wiring looks good

Cross[SparkModule]/TestModule, crossValue-based moduleDeps, and reuse of the common test fork args/framework are coherent and align with the other modules.

Also applies to: 7-10, 27-30, 31-31, 33-33

cloud_aws/package.mill (1)

2-2: Cloud AWS cross-build + tests LGTM

Cross[CloudAwsModule]/TestModule, crossValue-based moduleDeps, and the test-only container deps are wired sensibly and match the pattern used elsewhere.

Also applies to: 6-9, 24-30

flink_connectors/package.mill (1)

2-2: Verify google-cloud-pubsub version alignment across modules

Version skew confirmed: flink_connectors main uses 1.141.2 while test uses 1.114.7 (27 versions behind); cloud_gcp uses 1.134.2. Test exclusions also differ (main excludes jackson-core + jackson-databind; test excludes only jackson-core). Confirm whether version differences are intentional (e.g., test compatibility constraints) or should be aligned to avoid classpath conflicts.

aggregator/package.mill (1)

5-9: Review comment is incorrect

TestModule can access crossValue through build.BaseTestModule, which extends Mill's CrossScalaModule—a framework class that provides Cross.Module support. No explicit Cross.Module[String] mixin is needed; the code compiles and works as designed.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
cloud_aws/package.mill (1)

12-22: Consider centralizing AWS SDK version

The repeated "2.30.13" across AWS SDK deps can be hoisted into build.Constants (e.g., awsSdkVersion) to simplify upgrades and keep versions in sync.
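A minimal sketch of the idea in plain Scala, assuming a hypothetical `VersionsSketch` object standing in for build.Constants and plain coordinate strings rather than Mill's dependency syntax:

```scala
object VersionsSketch {
  // Single source of truth for the AWS SDK version (value taken from the review comment).
  val awsSdkVersion: String = "2.30.13"

  // Build a Maven coordinate string for an AWS SDK v2 artifact.
  // The helper name and the use of plain strings are assumptions for illustration.
  def awsSdk(artifact: String): String =
    s"software.amazon.awssdk:$artifact:$awsSdkVersion"
}
```

An upgrade then touches one constant instead of every dependency line.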

flink_connectors/package.mill (1)

23-24: Align Netty HTTP/2 version with other modules

Here netty-codec-http2 is pinned to 4.1.124.Final, while flink/package.mill uses 4.1.125.Final. Consider aligning these to a single version to avoid pulling multiple Netty variants onto the classpath.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32bfb23 and e55d4aa.

📒 Files selected for processing (9)
  • aggregator/package.mill (2 hunks)
  • api/package.mill (2 hunks)
  • build.mill (3 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • flink/package.mill (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • spark/package.mill
  • cloud_aws/package.mill
  • cloud_gcp/package.mill
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • build.mill
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: stats_tests
  • GitHub Check: udafs_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: batch_tests
  • GitHub Check: analyzer_tests
🔇 Additional comments (9)
build.mill (2)

17-17: Shared scalaVersions constant is a good cross-build anchor.

Defining Constants.scalaVersions = Seq("2.12.18", "2.13.17") centralizes the axes and lines up with the Cross modules in the packages; looks good—just confirm there’s no remaining code still reading an old single scalaVersion constant.


111-141: All modules extending BaseModule are properly configured via Cross[...]—verification complete.

All three modules that mix in build.BaseModule are already correctly wrapped in Cross[...](build.Constants.scalaVersions):

  • service_commons/package.mill: object package extends Cross[ServiceCommonsModule](build.Constants.scalaVersions)
  • cloud_gcp/package.mill: object package extends Cross[CloudGcpModule](build.Constants.scalaVersions)
  • api/package.mill: object package extends Cross[ApiModule](build.Constants.scalaVersions)

Test modules use appropriate base traits (BaseTestModule/BaseJUnitTestModule) with def scalaVersion = crossValue. The switch to CrossScalaModule is sound and all users are correctly configured.

api/package.mill (1)

7-9: API cross-module and test wiring look correct.

Cross[ApiModule](build.Constants.scalaVersions) plus test's scalaVersion = crossValue and moduleDeps = Seq(build.api(crossValue)) give a clean 2.12.18/2.13.17 split with matching deps. Verify both variants compile and tests run across Scala versions.

Also applies to: 47-49

spark/package.mill (1)

7-11: Cross-module wiring verified—code is correct.

All referenced modules (aggregator, api, online) exist; crossValue propagates consistently through moduleDeps and test configuration. No issues.

online/package.mill (1)

6-10: Cross-module wiring verified—no structural issues found.

All referenced symbols are properly defined: Constants in build.mill, BaseModule trait at build.mill:112, and cross-aware aggregator/api modules exist. OnlineModule follows the same pattern as SparkModule, FlinkModule, and others—Cross.Module[String] with crossValue-parameterized dependencies, scalaVersion() for version-specific deps, and test module with scalaVersion = crossValue. Structure is sound.

aggregator/package.mill (1)

6-9: Aggregator cross-module setup is correct.

Code inspection confirms:

  • build.Constants.scalaVersions = Seq("2.12.18", "2.13.17") exists
  • All modules declare package build.X + Cross[Module](scalaVersions) pattern
  • build.api(crossValue) and build.aggregator(crossValue) resolve via Mill's Cross convention
  • Version threading in tests (scalaVersion = crossValue) is consistent
  • No circular dependencies or missing references

Pattern is applied uniformly across all modules.

flink/package.mill (1)

6-12: Flink cross-module + test wiring looks good

FlinkModule correctly extends Cross.Module[String], and crossValue usage in moduleDeps and test (scalaVersion/moduleDeps) matches the pattern used elsewhere. No changes requested.

Also applies to: 41-44

cloud_aws/package.mill (1)

6-10: CloudAws cross-module + tests are wired correctly

CloudAwsModule extending Cross.Module[String] and using crossValue in moduleDeps and test (scalaVersion/moduleDeps) looks consistent and compile-safe.

Also applies to: 24-28

flink_connectors/package.mill (1)

6-12: Flink connectors cross-module + tests look consistent

FlinkConnectorsModule extending Cross.Module[String] and using crossValue in moduleDeps and test wiring is consistent with the other modules; no functional issues spotted.

Also applies to: 26-29

@tchow-zlai tchow-zlai force-pushed the tchow/serverless-classpath branch from 607c712 to 62fc7f2 Compare November 18, 2025 21:50
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)

17-17: Optional: Remove redundant .toSeq calls.

Seq(...).toSeq is redundant—Seq(...) already returns a Seq.

Example for line 17:

-    ).toSeq.toDF("id", "name", "value", "timestamp")
+    ).toDF("id", "name", "value", "timestamp")

Also applies to: 23-23, 95-95, 101-101, 243-243, 249-249
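A small Spark-free sketch of the redundancy (names are hypothetical). It assumes the default `Seq` is in scope; if a file imports `scala.collection.Seq`, a `.toSeq` on 2.13 can still change the static type, which may be why the calls were added.

```scala
object RedundantToSeqSketch {
  // Seq(...) already builds a Seq, so .toSeq adds nothing here.
  val rows: Seq[(Int, String)] = Seq((1, "a"), (2, "b"))
  val sameRows: Seq[(Int, String)] = rows.toSeq
}
```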

spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (1)

189-189: Conversion correct; consider reusing sources variable.

The .toScala.toSeq conversion is correct. Optionally, reuse the sources variable from line 150 to avoid redundant conversion: runTimestampChecks(sources.toSeq, groupBy.inputDf).

online/src/main/java/ai/chronon/online/JavaFetcher.java (1)

130-135: Consider using ScalaJavaConversions utility for consistency.

Direct Scala Iterator usage from Java may have Scala 2.13 compatibility issues. The ScalaJavaConversions.toJava() method (shown in relevant snippets) handles this conversion safely.

Apply this diff:

-        scala.collection.immutable.List<T> scalaList = resps.responses().toList();
-        List<JavaResponse> jResps = new ArrayList<>(scalaList.size());
-        Iterator<T> it = scalaList.iterator();
-        while (it.hasNext()) {
-            jResps.add(new JavaResponse(it.next()));
-        }
+        List<T> scalaList = ScalaJavaConversions.toJava(resps.responses());
+        List<JavaResponse> jResps = new ArrayList<>(scalaList.size());
+        for (T resp : scalaList) {
+            jResps.add(new JavaResponse(resp));
+        }
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)

109-109: Potentially redundant .toMap call.

Since tableProps is already typed as Map[String, String], the .toMap conversion may be unnecessary unless the save method requires a specific collection type.
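As a rough illustration of when `.toMap` matters (the `save` signature below is hypothetical, not the one in JoinPartJob.scala): on an immutable `Map` the call returns an equal map, whereas on a mutable or generic `scala.collection.Map` it is what produces the immutable `Map` a parameter may require.

```scala
import scala.collection.mutable

object ToMapSketch {
  // Hypothetical stand-in for a method that requires an immutable Map.
  def save(props: Map[String, String]): Int = props.size

  // Already immutable: .toMap yields an equal map, so the call is redundant.
  val immutableProps: Map[String, String] = Map("owner" -> "chronon")
  val redundant: Map[String, String] = immutableProps.toMap

  // Mutable source: .toMap is needed to satisfy save's parameter type.
  val mutableProps: mutable.Map[String, String] = mutable.Map("owner" -> "chronon")
  val saved: Int = save(mutableProps.toMap)
}
```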

aggregator/package.mill (1)

5-9: Aggregator Cross module wiring + scalaVersion concern

CrossAggregatorModule with moduleDeps = Seq(build.api(crossValue)) looks correct for cross-Scala wiring. One thing to double-check: where scalaVersion for the main AggregatorModule actually comes from. If build.BaseModule does not already derive it from crossValue, you may want an explicit:

 trait AggregatorModule extends Cross.Module[String] with build.BaseModule {
+  def scalaVersion = crossValue
   def moduleDeps = Seq(build.api(crossValue))
 }

to ensure each cross variant compiles against the intended Scala version.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 607c712 and 62fc7f2.

📒 Files selected for processing (114)
  • .github/workflows/test_grype.yaml (3 hunks)
  • .github/workflows/test_scala_2_12_non_spark.yaml (9 hunks)
  • .github/workflows/test_scala_2_12_spark.yaml (11 hunks)
  • aggregator/package.mill (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
  • api/package.mill (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
  • build.mill (3 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
  • flink/package.mill (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (1 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
  • service/package.mill (2 hunks)
  • service_commons/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (43)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
✅ Files skipped from review due to trivial changes (2)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
🚧 Files skipped from review as they are similar to previous changes (37)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • api/package.mill
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • online/package.mill
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
🧰 Additional context used
🧠 Learnings (25)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • .github/workflows/test_grype.yaml
  • build.mill
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • cloud_gcp/package.mill
  • .github/workflows/test_scala_2_12_spark.yaml
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • .github/workflows/test_scala_2_12_non_spark.yaml
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-01-15T21:00:35.574Z
Learnt from: david-zlai
Repo: zipline-ai/chronon PR: 222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • .github/workflows/test_grype.yaml
  • cloud_gcp/package.mill
  • spark/package.mill
  • cloud_aws/package.mill
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2025-09-26T01:29:32.984Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1182
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala:169-191
Timestamp: 2025-09-26T01:29:32.984Z
Learning: In BigTableMetricsKvStore, schema keys (ending with "#schema") are always queried alone in multiGet requests and never mixed with regular metric keys, so there's no concern about timestamp filter leakage between different key types within the same query.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • .github/workflows/test_scala_2_12_spark.yaml
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-09-24T00:22:38.257Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1169
File: spark/src/main/scala/ai/chronon/spark/batch/iceberg/IcebergPartitionStatsExtractor.scala:79-94
Timestamp: 2025-09-24T00:22:38.257Z
Learning: BigTableMetricsKvStore defaults tsMillis to System.currentTimeMillis() when not provided in PutRequest, so PutRequests without explicit timestamps are handled correctly by the KV store implementation.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2025-04-21T15:10:40.819Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-17T01:02:40.431Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In `DynamoDBKVStoreImpl.scala`, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic `Exception` type in methods like `extractListValues`.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2025-05-02T16:19:11.001Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2025-05-25T15:57:30.687Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
🧬 Code graph analysis (10)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (4)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
  • GroupBy (141-167)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
  • runTimestampChecks (459-488)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
  • inputDf (479-518)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • table (457-457)
  • rightToLeft (856-866)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  • fetch (48-53)
  • fetch (57-61)
  • fetch (65-71)
online/src/main/scala/ai/chronon/online/Api.scala (2)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
  • fetch (30-34)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  • fetch (48-53)
  • fetch (57-61)
  • fetch (65-71)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
  • multiGet (108-118)
  • multiPut (308-359)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (36-53)
online/src/main/java/ai/chronon/online/JavaFetcher.java (3)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  • fetchGroupBys (182-184)
  • fetchJoin (186-233)
  • fetchJoinV2 (273-315)
  • FeaturesResponseType (50-53)
  • AvroString (80-80)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
  • listJoins (166-218)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
  • JoinUtils (38-431)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  • applyMapping (814-817)
spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  • keyColumns (1022-1027)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  • online (149-149)
  • online (176-180)
  • Fetcher (55-147)
  • Fetcher (152-685)
  • Response (68-68)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
  • fetch (30-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: flink_tests
  • GitHub Check: online_tests
  • GitHub Check: streaming_tests
  • GitHub Check: stats_tests
  • GitHub Check: kv_store_tests
  • GitHub Check: udafs_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: build
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (57)
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1)

31-31: Seq import cleanup is correct for Scala 2.13

Using only immutable and mutable while relying on the default scala.Seq alias is idiomatic and matches the file’s usages (immutable.Seq, mutable.ListBuffer, and plain Seq). The alias lives in the scala package object rather than in Predef. No issues.
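A minimal sketch of why the bare Seq keeps working, assuming Scala 2.13. The object and value names are illustrative and are not taken from BootstrapInfo.scala. On 2.13 the unqualified Seq is an alias for scala.collection.immutable.Seq, so the two types are interchangeable without a conversion:

```scala
import scala.collection.{immutable, mutable}

object SeqAliasCheck {
  // Compiles on Scala 2.13 only: there the unqualified `Seq` is immutable.Seq.
  // On 2.12 `Seq` aliases scala.collection.Seq, so this evidence is not found.
  implicitly[Seq[Int] =:= immutable.Seq[Int]]

  // A mutable buffer, similar to the ListBuffer usage mentioned in the review.
  val buffer: mutable.ListBuffer[Int] = mutable.ListBuffer(1, 2, 3)

  // `toList` yields an immutable List, which is already a plain `Seq`.
  val plain: Seq[Int] = buffer.toList

  // No conversion needed on 2.13: `Seq` and `immutable.Seq` are the same type.
  val explicit: immutable.Seq[Int] = plain

  def main(args: Array[String]): Unit =
    println(explicit == Seq(1, 2, 3))
}
```

Dropping the `implicitly` line should let the rest compile on 2.12 as well, since a List is a valid value for both aliases.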

spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1)

25-25: LGTM—redundant import removed.

The default scala.Seq alias (defined in the scala package object, not in Predef) covers all usages.

flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1)

118-118: LGTM! Consistent collection handling.

The .toSeq addition makes the type explicit and aligns with similar conversions in this method (lines 114, 117).

spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (1)

220-220: scala.Seq for keyColumns is consistent with 2.13 collection usage

Using scala.Seq here is type-safe, aligns with the updated collection conventions in this PR, and keeps these tests consistent with the rest of the GroupBy APIs.

Also applies to: 242-242

spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1)

29-29: LGTM – Scala 2.13 migration.

Removing the explicit scala.collection.Seq import to rely on the default scala.Seq alias is correct for Scala 2.13 compatibility.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)

51-76: LGTM - Scala 2.13 collection type migration.

The signature change to scala.Seq and the .toSeq conversion on line 72 correctly align with Scala 2.13's collection library changes.


89-100: LGTM - Consistent with collection type normalization.

The scala.Seq signature change maintains consistency with the codebase-wide Scala 2.13 migration.

spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)

57-58: LGTM: getSeq[Row] migration.

The switch from getAs[Seq[Row]] to getSeq[Row] is correct for Scala 2.13 and aligns with Spark's typed accessor methods.

Also applies to: 74-75, 82-83, 126-127, 134-135, 142-143, 150-151, 221-222, 230-231, 270-271

api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)

47-50: Signature change is fully compatible across all call sites.

Verified: all 5 call sites pass Seq[String] or immutable compositions (baseWheres, whereClauses, .toScala). Type migration to Scala 2.13 semantics is correct.

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)

46-46: Scala 2.13 import pattern correct.

Removing the explicit Seq import is appropriate: on Scala 2.13 the unqualified Seq resolves to the scala.Seq alias, which points at scala.collection.immutable.Seq. No mutable.Seq references found in the file. All usages are compatible with immutable sequences.

Note: AI summary is incorrect; Seq resolves via the default scala.Seq alias, not the mutable package.

Verify compilation and tests pass before merging.

spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3)

16-21: LGTM: Import updates aligned with Scala 2.13 migration.

The new ai.chronon.eval._ import and consolidated imports are correct.


196-196: LGTM: Correct conversion for Scala 2.13.

The .toScala.map conversion is appropriate.


262-262: LGTM: Correct Java-to-Scala Map conversion.

The .toScala.toMap conversion properly handles the Java Map to Scala immutable Map transformation.
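A hedged sketch of the same Java-to-Scala Map conversion. The `.toScala` helper belongs to the project's ScalaJavaConversions and its internals are not shown here, so this example substitutes the standard library's `asScala` (Scala 2.13, `scala.jdk.CollectionConverters`) as a stand-in. All names below are illustrative:

```scala
import java.util.{HashMap => JHashMap}
import scala.collection.mutable
import scala.jdk.CollectionConverters._

object JavaMapConversion {
  private val javaMap = new JHashMap[String, String]()
  javaMap.put("table", "events")

  // `asScala` wraps the Java map; the wrapper is a live, mutable view.
  val view: mutable.Map[String, String] = javaMap.asScala

  // `toMap` copies the entries into an immutable Scala Map.
  val frozen: Map[String, String] = view.toMap

  // A later write to the Java map shows up in the view but not in the copy.
  javaMap.put("partition", "ds")

  def main(args: Array[String]): Unit = {
    println(view.size)
    println(frozen.size)
  }
}
```

The trailing `toMap` is what gives callers an immutable Map that is detached from the Java object.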

api/src/main/scala/ai/chronon/api/DataType.scala (2)

241-241: Clarify stringPrefix override purpose.

Overriding stringPrefix affects toString output for the case class. Is this override necessary for Scala 2.13 compatibility or debugging?


234-261: StructType Seq mixin is safe—no breaking changes detected.

The grep search found zero direct equality comparisons on StructType instances. All usages are pattern matches (case StructType(...)), which rely on case class unapply extraction, not equals(). Extending Seq[StructField] changes element-based equality semantics (correct, since StructFields define the struct), but this doesn't break existing code. No StructType instances are used as Map keys or Set elements. The 2.13 migration correctly uses scala.collection.immutable.Seq, making equality behavior appropriate.
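To make the equality point concrete, here is a small sketch with hypothetical `Field` and `Struct` types (not the real StructType or StructField). It relies on the language rule that a case class does not synthesize `equals` when a base class other than AnyRef already provides a concrete one, which is the case for Seq:

```scala
object SeqBackedCaseClass {
  final case class Field(name: String)

  // Hypothetical stand-in for a case class that also mixes in Seq.
  final case class Struct(name: String, fields: Array[Field]) extends Seq[Field] {
    override def apply(idx: Int): Field = fields(idx)
    override def length: Int = fields.length
    override def iterator: Iterator[Field] = fields.iterator
  }

  val a: Struct = Struct("a", Array(Field("x"), Field("y")))
  val b: Struct = Struct("b", Array(Field("x"), Field("y")))

  // Seq already defines equals, so the case-class version is not generated;
  // comparison is element by element and ignores `name`.
  val sameElementsEqual: Boolean = a == b

  // Pattern matching still goes through the synthesized unapply.
  val extractedName: String = a match { case Struct(n, _) => n }

  def main(args: Array[String]): Unit = {
    println(sameElementsEqual)
    println(extractedName)
  }
}
```

In this sketch two structs with different names but equal fields compare as equal, which is why the absence of equality checks, Map keys, and Set elements on StructType matters for the conclusion above.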

online/src/main/java/ai/chronon/online/JavaFetcher.java (6)

143-151: LGTM!

Clean refactor to Java collections improves type clarity and eliminates unnecessary Scala collection usage.


156-159: LGTM!

Proper use of ScalaJavaConversions.toScala() for Scala 2.13 compatibility.


167-169: LGTM!

Consistent conversion pattern applied correctly.


177-179: LGTM!

Conversion applied consistently across all fetch methods.


187-187: LGTM!

Proper Scala-to-Java conversion using the utility method.


125-141: Public toJavaResponses removal verified—no internal usages found.

Comprehensive search across all Java, Scala, and test files confirms the removed method has no callers within the codebase. Safe to proceed.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3)

19-20: Import of mutable only is appropriate

Using scala.collection.mutable without importing Seq keeps local ArrayBuffer usage explicit while leaving Seq to the standard alias; no issues here.


137-148: timedValues.toSeq matches KVStore expectations

Converting the flatMap result to a Seq before wrapping in GetResponse keeps the response type consistent and avoids exposing a mutable buffer.


232-237: listValues.toSeq keeps list response immutable

Materializing to Seq here aligns with KVStore.ListResponse’s expectations and avoids leaking the underlying mutable collection.
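A minimal sketch of the materialization behaviour, assuming Scala 2.13; the buffer contents and names are made up for illustration. On 2.13, `toSeq` on an ArrayBuffer copies into an immutable Seq, so later writes to the buffer are not visible to whoever holds the result:

```scala
import scala.collection.mutable

object ToSeqSnapshot {
  // Stand-in for a buffer that a store implementation fills while reading rows.
  private val buffer = mutable.ArrayBuffer("v1", "v2")

  // On Scala 2.13 `toSeq` copies the elements into an immutable Seq.
  val snapshot: Seq[String] = buffer.toSeq

  // Mutating the buffer afterwards does not change the snapshot on 2.13.
  buffer += "v3"

  val snapshotSize: Int = snapshot.size
  val bufferSize: Int = buffer.size

  def main(args: Array[String]): Unit = {
    println(snapshotSize)
    println(bufferSize)
  }
}
```

On 2.12, where Seq is scala.collection.Seq, `toSeq` on a buffer may return the buffer itself, so this isolation should not be assumed for the 2.12 build.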

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (3)

21-27: Mutable import + converters look good

Importing scala.collection.mutable alongside scala.jdk.CollectionConverters._ matches the use of ArrayBuffer and Java→Scala collection interop here.


147-157: Correct Java byte list → Seq[Byte] conversion

tileKey.keyBytes.asScala.map(_.toByte).toSeq cleanly converts the boxed Java bytes into an immutable Seq[Byte] for both multi-get and multi-put streaming branches; this matches the helper signatures.

Also applies to: 321-329
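The conversion can be sketched in isolation as follows, assuming Scala 2.13 and a `java.util.List[java.lang.Byte]` input. The helper name `toScalaBytes` and the sample data are hypothetical; only the `asScala.map(_.toByte).toSeq` chain comes from the review comment:

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.jdk.CollectionConverters._

object KeyBytesConversion {
  // Unboxes each java.lang.Byte and materializes the result as a Seq[Byte].
  def toScalaBytes(keyBytes: JList[java.lang.Byte]): Seq[Byte] =
    keyBytes.asScala.map(_.toByte).toSeq

  // Sample input standing in for a key's boxed byte list.
  val javaBytes: JList[java.lang.Byte] = {
    val list = new JArrayList[java.lang.Byte]()
    list.add(java.lang.Byte.valueOf(1.toByte))
    list.add(java.lang.Byte.valueOf(2.toByte))
    list.add(java.lang.Byte.valueOf(3.toByte))
    list
  }

  val scalaBytes: Seq[Byte] = toScalaBytes(javaBytes)

  def main(args: Array[String]): Unit =
    println(scalaBytes.length)
}
```

Without the final `toSeq`, the intermediate result is a mutable Buffer, which is not accepted where an immutable Seq[Byte] is required on 2.13.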


186-193: toSeq on timedValues/listValues normalizes KVStore outputs

Adding .toSeq after the flatMap chains ensures both multi-get and list return Seq[...] rather than mutable buffers, consistent with the KVStore interface.

Also applies to: 281-286

spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (2)

12-12: LGTM - Import cleanup for Scala 2.13 compatibility.

Explicit imports align with the PR's collection normalization strategy.

Also applies to: 14-14, 19-19


22-25: Verified: All JoinPartJobContext instantiations correctly updated.

The codebase contains 2 instantiations of JoinPartJobContext (in Join.scala:324 and JoinPartJob.scala:72-75), both correctly using the new 4-parameter signature. No stale calls remain.

service_commons/package.mill (1)

2-9: LGTM: Cross-building pattern correctly implemented.

The module now supports multiple Scala versions via Cross and crossValue. Pattern is consistent with other modules in this PR.

service/package.mill (1)

2-8: LGTM: Cross-building correctly configured.

Pattern matches service_commons and other modules.

.github/workflows/test_scala_2_12_non_spark.yaml (1)

23-24: LGTM: Centralized version management.

The SCALA_VERSION env var enables easy version switching and is correctly applied to all test targets.

build.mill (1)

112-112: LGTM: CrossScalaModule enables cross-building.

This correctly implements the cross-version build infrastructure.

.github/workflows/test_scala_2_12_spark.yaml (1)

23-24: LGTM: Version parameterization matches non-Spark workflow.

All Spark test targets correctly reference the versioned modules.

cloud_gcp/package.mill (1)

8-11: LGTM: Cross-building correctly configured.

CloudGcpModule properly extends BaseModule and uses crossValue. Pattern is consistent with other modules.

.github/workflows/test_grype.yaml (1)

27-28: LGTM: Grype workflow correctly parameterized.

All build and artifact paths consistently use SCALA_VERSION, enabling cross-version security scanning.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)

106-106: No action needed. Version 2.3 is supported and stable.

The version bump from 1.2 to 2.3 is valid—both are LTS versions supported by Dataproc Serverless. Version 2.3 is listed as LTS and fully supported.

flink/package.mill (3)

2-2: Import switch to mill.api._ looks correct

Matches Mill’s recommended imports and other modules in this PR.


6-12: Flink Cross module wiring is consistent

object package + trait FlinkModule with crossValue-aware moduleDeps aligns with the cross-Scala pattern used elsewhere and keeps dependent modules on the same Scala line.


41-44: Test module correctly follows crossValue

Using scalaVersion = crossValue and moduleDeps = Seq(build.flink(crossValue)) should give per-Scala-version Flink tests wired to the right implementation.

spark/package.mill (3)

2-2: mill.api._ import change is fine

Consistent with other modules and Mill usage.


7-11: Spark Cross module graph looks sane

SparkModule as Cross.Module[String] with crossValue-based moduleDeps is aligned with the new cross-building scheme across the repo.


27-31: Spark test wiring matches cross setup

scalaVersion = crossValue plus build.spark(crossValue) / build.api(crossValue).test / build.aggregator(crossValue).test should keep tests version-aligned with their targets.

cloud_aws/package.mill (3)

2-2: API import change is aligned

Using mill.api._ is consistent with other package.mill updates.


6-10: CloudAwsModule Cross wiring appears correct

CrossCloudAwsModule with crossValue-aware moduleDeps (spark, aggregator, api, online) is coherent with the rest of the build graph.


24-31: AWS test module is properly cross-versioned

scalaVersion = crossValue, moduleDeps = Seq(build.cloud_aws(crossValue), build.spark(crossValue).test) and local test-only mvnDeps look reasonable for per-Scala AWS integration tests.

aggregator/package.mill (2)

2-2: Aggregator import update is OK

mill.api._ matches the rest of the build refactor.


21-25: Aggregator test module is cross-aligned

Using scalaVersion = crossValue and depending on build.aggregator(crossValue) plus build.api(crossValue).test should keep tests and implementation on the same Scala line.

flink_connectors/package.mill (3)

2-2: Flink connectors import change is fine

mill.api._ usage is consistent with the rest of the build files.


6-12: FlinkConnectors Cross wiring looks coherent

CrossFlinkConnectorsModule with crossValue-aware moduleDeps (aggregator, api, online, flink) matches the intended multi-module cross graph.


26-32: Flink connectors tests correctly track crossValue

scalaVersion = crossValue and moduleDeps = Seq(build.flink_connectors(crossValue)) should give per-Scala-version coverage for this module.

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)

45-45: External-request construction / dedup looks sound

The added mutable import and the new externalToJoinRequests flatMap keep the same behavior (valid requests only, KeyMissingException routed via Right, dedup based on Request) and integrate cleanly with the existing validExternalRequestToJoinRequestMap / response wiring.

Also applies to: 543-566

spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)

19-21: Import adjustments are consistent with new handler usage

Using ai.chronon.online._ and consolidated Spark SQL imports matches the new ExternalSourceHandler/ExternalSourceRegistry references and keeps this mock compact.

Also applies to: 27-27


47-53: PlusOneExternalHandler’s scala.Seq-based fetch matches the new API

The override to fetch(requests: scala.Seq[Fetcher.Request]): Future[scala.Seq[Fetcher.Response]] and the simple plus-one transformation over req.keys align with the updated ExternalSourceHandler contract and should behave as before.

Please confirm this compiles against the updated ExternalSourceHandler trait and its callers under Scala 2.13.


56-72: AlwaysFailsHandler / JavaPlusOneExternalHandler aligned with ExternalSourceHandler

Both handlers now extend ExternalSourceHandler with the scala.Seq-based fetch signature and preserve their prior semantics (always-failing and plus-one variants), keeping the mock registry behavior intact.

Double-check any tests or docs that specifically referenced JavaExternalSourceHandler here, to ensure expectations around “java_plus_one” remain correct.

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

22-24: ContextualHandler updated cleanly to scala.Seq-based fetch

Switching ContextualHandler.fetch to scala.Seq[Request] / Future[scala.Seq[Response]] and keeping handlerMap usage unchanged fits the new ExternalSourceHandler contract without altering behavior.

Please ensure all ExternalSourceHandler implementations (including this one) are compiled/run together to catch any remaining signature mismatches.

Also applies to: 29-35

online/src/main/scala/ai/chronon/online/Api.scala (1)

200-211: ExternalSourceHandler trait + scala.Seq signature look consistent

Promoting ExternalSourceHandler to a trait with a shared implicit ExecutionContext and a fetch(requests: scala.Seq[Fetcher.Request]): Future[scala.Seq[Fetcher.Response]] signature matches the downstream registry and mock handlers, and is appropriate for Scala 2.13 collection usage.

Please confirm JavaExternalSourceHandler (and any Java implementers) still implement this trait correctly and that cross-language tests pass.

build.mill Outdated
Comment on lines +17 to +18
val scalaVersions = Seq("2.12.18", "2.13.17")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify JavaConverters migration is complete before Scala 2.13 upgrade.

Scala 2.13 deprecates scala.collection.JavaConverters. A previous review found 42 imports that need migration to scala.jdk.CollectionConverters.


🏁 Script executed:

#!/bin/bash
# Check for remaining deprecated JavaConverters imports
echo "=== Checking for deprecated JavaConverters imports ==="
rg -n "scala\.collection\.JavaConverters" --type scala | wc -l

echo ""
echo "=== Checking for new CollectionConverters usage ==="
rg -n "scala\.jdk\.CollectionConverters" --type scala | wc -l

echo ""
echo "=== Files still using JavaConverters ==="
rg "scala\.collection\.JavaConverters" --type scala -l

Length of output: 2940


Complete JavaConverters → CollectionConverters migration before deploying Scala 2.13 support.

41 deprecated scala.collection.JavaConverters imports remain across 39 files. Under Scala 2.13 these imports still compile but emit deprecation warnings, which fail only in builds that treat warnings as errors; migrate them to scala.jdk.CollectionConverters.

Affected files:

  • spark/src/main/scala: JoinBase.scala, Driver.scala, streaming/GroupBy.scala, batch/ModularMonolith.scala, batch/iceberg/IcebergPartitionStatsExtractor.scala, batch/BatchNodeRunner.scala
  • online/src/main/scala: AvroConversions.scala, TileCodec.scala, TopicChecker.scala, fetcher/Fetcher.scala
  • api/src/main/scala: Row.scala, planner/*.scala (5 files), StagingQueryPlanner.scala
  • flink/src/main/scala: Test utils
  • cloud_aws/src/main/scala: EmrSubmitter.scala
  • cloud_gcp/src/test/scala: BigQueryCatalogTest.scala
  • aggregator/src/test/scala: Test files
🤖 Prompt for AI Agents
In build.mill around lines 17-18, the project enables Scala 2.13 but many files
still import scala.collection.JavaConverters, which is deprecated in 2.13 and
produces deprecation warnings; replace all scala.collection.JavaConverters
imports with scala.jdk.CollectionConverters and update usages accordingly
(import scala.jdk.CollectionConverters._ and keep using .asScala/.asJava, adding
explicit .toSeq/.toList materialization where needed), scan the listed files
(and the 39 affected files) for any JavaConverters.* references and migrate
them, then recompile and run tests to ensure no deprecated imports remain
before deploying Scala 2.13 support.
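
For reference, a minimal sketch of what a migrated call site might look like, assuming Scala 2.13 (on 2.12 a cross-built project would typically need a compatibility shim such as scala-collection-compat to provide the same import). `ConvertersSketch`, `toScala`, and `toJava` are hypothetical names.

```scala
// Only the import changes; the .asScala / .asJava call sites stay the same.
import scala.jdk.CollectionConverters._

object ConvertersSketch {
  // Java -> Scala: wrap, then materialize into an immutable Seq.
  def toScala(javaList: java.util.List[String]): Seq[String] = javaList.asScala.toSeq

  // Scala -> Java: expose the Seq through a java.util.List wrapper.
  def toJava(xs: Seq[String]): java.util.List[String] = xs.asJava
}
```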

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8f4b93 and bec697e.

📒 Files selected for processing (1)
  • .github/workflows/push_to_canary.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • .github/workflows/push_to_canary.yaml
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • .github/workflows/push_to_canary.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: build
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
.github/workflows/push_to_canary.yaml (1)

15-15: ✓ Environment variable introduced consistently.

The SCALA_VERSION is properly defined once and used throughout the workflow for Mill targets and artifact paths.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
build.mill (1)

18-19: Scala 2.13 upgrade blocked by JavaConverters migration.

🧹 Nitpick comments (1)
build.mill (1)

223-324: Refactor: Extract common publishing logic and use constant for default version.

publishToGcs and publishToS3 duplicate ~90% of code, and the module list appears 3 times. The default scalaVersion = "2.12.18" in buildAllAssemblies should reference Constants.scalaVersions.

Extract shared logic:

private def publishJars(
  scalaVersion: String,
  version: String,
  bucket: String,
  uploadFn: (os.Path, String) => Unit
) = {
  val commit = os.proc("git", "rev-parse", "HEAD").call().out.text().trim
  val branch = os.proc("git", "rev-parse", "--abbrev-ref", "HEAD").call().out.text().trim
  val date = Instant.now().toString

  val modules = Seq(
    ("flink", "flink_assembly_deploy.jar"),
    ("cloud_gcp", "cloud_gcp_lib_deploy.jar"),
    ("cloud_aws", "cloud_aws_lib_deploy.jar"),
    ("service", "service_assembly_deploy.jar"),
    ("flink_connectors", "connectors_pubsub_deploy.jar")
  )

  modules.foreach { case (moduleName, deployName) =>
    val jarPath = os.pwd / "out" / moduleName / scalaVersion / "assembly.dest" / "out.jar"
    if (!os.exists(jarPath)) {
      throw new Exception(s"Assembly JAR not found at $jarPath. Run './mill ${moduleName}[$scalaVersion].assembly' first.")
    }
    uploadFn(jarPath, s"$bucket/release/$version/jars/$deployName")
  }
}

Then simplify:

-def buildAllAssemblies(scalaVersion: String = "2.12.18") = Task.Command {
+def buildAllAssemblies(scalaVersion: String = Constants.scalaVersions.head) = Task.Command {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bec697e and 11f9650.

📒 Files selected for processing (2)
  • api/package.mill (2 hunks)
  • build.mill (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • build.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: build
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (8)
build.mill (4)

1-4: LGTM!

Necessary imports for cross-building and publishing support.


24-41: LGTM!

Standard publishing configuration follows Maven conventions.


131-131: LGTM!

CrossScalaModule enables multi-version support correctly.


160-172: LGTM!

Standard PublishModule implementation.

api/package.mill (4)

3-9: LGTM!

Cross-building setup follows the standard pattern.


12-24: LGTM!

PublishModule integration correct for thriftBase.


28-40: LGTM!

PublishModule integration correct for thriftJava.


54-55: LGTM!

Correct crossValue usage for cross-version testing.


// Create table dependencies for all GroupBy parts (both direct GroupBy deps and upstream join deps)
-val allDeps = Option(join.joinParts).map(_.asScala).getOrElse(Seq.empty).flatMap { joinPart =>
+val allDeps = Option(join.joinParts).map(_.asScala.toSeq).getOrElse(Seq.empty).flatMap { joinPart =>

Suggested change
-val allDeps = Option(join.joinParts).map(_.asScala.toSeq).getOrElse(Seq.empty).flatMap { joinPart =>
+val allDeps = Option(join.joinParts).map(_.toScala).getOrElse(Seq.empty).flatMap { joinPart =>

def fromJoinSources(sources: java.util.List[api.Source]): Seq[TableDependency] = {
Option(sources)
-  .map(_.asScala)
+  .map(_.toScala.toSeq)

Suggested change
-  .map(_.toScala.toSeq)
+  .map(_.toScala)

@tchow-zlai tchow-zlai force-pushed the tchow/serverless-classpath branch from ed3c3c5 to 584e98b Compare November 24, 2025 19:28
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)

18-37: Parameter name mismatch in override signature.

The multiPut method at line 18 uses putRequests but the parent KVStore trait (Api.scala:74) declares keyValueDatasets. This breaks named parameter compatibility.

Change parameter name from putRequests to keyValueDatasets to match the trait definition.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)

51-76: multiGet: null path creates Failure instead of Success(empty)

When a key has no entries, valueSeries is null. The current code, Try { null }.map(_.toSeq), first produces Success(null); the mapped function then calls null.toSeq, which throws an NPE that Try.map turns into a Failure. This breaks callers expecting Success(Seq.empty) for missing keys.

Both GroupByFetcher.scala:192 and DriftStore.scala:129 extract response.values directly and process the Try; they assume missing keys succeed as empty, not fail.

Apply the suggested fix to normalize null before .toSeq:

-        val values = Try {
-          val valueSeries = database
-            .get(req.dataset) // table
-            .get(encode(req.keyBytes)) // values of key
-
-          if (valueSeries == null)
-            null
-          else
-            valueSeries
-              .filter { case (version, _) =>
-                req.startTsMillis.forall(version >= _) && req.endTsMillis.forall(version <= _)
-              } // filter version
-              .map { case (version, bytes) => TimedValue(bytes, version) }
-        }.map(_.toSeq)
+        val values = Try {
+          val valueSeries = database
+            .get(req.dataset) // table
+            .get(encode(req.keyBytes)) // values of key
+
+          if (valueSeries == null)
+            null
+          else
+            valueSeries
+              .filter { case (version, _) =>
+                req.startTsMillis.forall(version >= _) && req.endTsMillis.forall(version <= _)
+              } // filter version
+              .map { case (version, bytes) => TimedValue(bytes, version) }
+        }.map(buf => Option(buf).map(_.toSeq).getOrElse(Seq.empty))
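
The difference between the two variants can be reproduced in isolation. The sketch below is a simplified stand-in, assuming a plain `java.util.Map` whose `get` returns null for absent keys; `NullSafeTrySketch`, `lookup`, `unsafe`, and `safe` are hypothetical names rather than the store's real API.

```scala
import scala.util.Try

object NullSafeTrySketch {
  // Stand-in for the store lookup: java.util.Map#get returns null when the key is absent.
  def lookup(store: java.util.Map[String, List[Int]], key: String): List[Int] =
    store.get(key)

  // Mirrors the original shape: null.toSeq throws inside map, so the Try becomes a Failure.
  def unsafe(store: java.util.Map[String, List[Int]], key: String): Try[Seq[Int]] =
    Try(lookup(store, key)).map(_.toSeq)

  // Mirrors the suggested fix: Option(null) is None, so a missing key yields an empty Seq.
  def safe(store: java.util.Map[String, List[Int]], key: String): Try[Seq[Int]] =
    Try(lookup(store, key)).map(v => Option(v).map(_.toSeq).getOrElse(Seq.empty))
}
```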
♻️ Duplicate comments (1)
build.mill (1)

18-19: Re-confirm JavaConverters → CollectionConverters migration before finalizing 2.13.17.

Given prior findings of many scala.collection.JavaConverters imports, please ensure all have been migrated to scala.jdk.CollectionConverters and the project compiles cleanly under 2.13.17.

🧹 Nitpick comments (9)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

53-56: Standardize remaining Seq references for consistency.

Lines 53 and 56 use unqualified Seq, while line 30 uses scala.Seq. Consider qualifying these for consistency with the migration pattern.

-  def fetchRequests(requests: Seq[Request], context: Context)(implicit ec: ExecutionContext): Future[Seq[Response]] = {
+  def fetchRequests(requests: scala.Seq[Request], context: Context)(implicit ec: ExecutionContext): Future[scala.Seq[Response]] = {
     val startTime = System.currentTimeMillis()
     // we make issue one batch request per external source and flatten out it later
-    val responsesByNameF: List[Future[Seq[Response]]] = requests
+    val responsesByNameF: List[Future[scala.Seq[Response]]] = requests
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)

167-167: LGTM! Removed redundant conversions.

Correctly removed unnecessary .toSeq calls since Seq(...) already returns Seq and the data variables are already sequences.

For consistency, consider also removing redundant .toSeq in lines 13-17, 19-23, 91-95, 97-101, 239-243, 245-249 where Seq(...).toSeq.toDF(...) can be simplified to Seq(...).toDF(...).

Also applies to: 177-177, 188-188, 198-198

spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)

64-72: Consider renaming for clarity.

The handler now extends ExternalSourceHandler like the others, making the "Java" prefix misleading. Consider renaming it to better reflect its purpose (e.g., PlusOneExternalHandlerV2), or removing it if it duplicates PlusOneExternalHandler.

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)

544-565: Consider consolidating duplicate cache lookups.

The refactoring correctly uses .toSeq for Scala 2.13 compatibility. However, metadataStore.getJoinConf(joinRequest.name) is retrieved twice (lines 545 and 550-554). You could store the result after the refresh check to avoid the duplicate lookup.

Example consolidation:

val externalToJoinRequests: Seq[ExternalToJoinRequest] = validRequests.flatMap { joinRequest =>
  var joinConfTry = metadataStore.getJoinConf(joinRequest.name)
  if (joinConfTry.isFailure) {
    metadataStore.getJoinConf.refresh(joinRequest.name)
    joinConfTry = metadataStore.getJoinConf(joinRequest.name)
  }
  val parts = joinConfTry.get.join.onlineExternalParts

  parts.iterator().asScala.map { part =>
    val externalRequest = Try(part.applyMapping(joinRequest.keys)) match {
      case Success(mappedKeys)                     => Left(Request(part.source.metadata.name, mappedKeys))
      case Failure(exception: KeyMissingException) => Right(exception)
      case Failure(otherException)                 => throw otherException
    }
    ExternalToJoinRequest(externalRequest, joinRequest, part)
  }
}.toSeq
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)

108-115: Be aware of shallower semantics vs .deep for nested arrays

arr1.toSeq == arr2.toSeq is fine for 1‑D arrays, but if you ever compare arrays-of-arrays it will no longer recurse like .deep did. If you want version‑independent deep behavior, consider an explicit recursive branch:

-    case (arr1: Array[_], arr2: Array[_]) => arr1.toSeq == arr2.toSeq
+    case (arr1: Array[_], arr2: Array[_]) =>
+      arr1.length == arr2.length &&
+        arr1.indices.forall(i => deepEquals(arr1(i), arr2(i)))
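
To make the shallow-versus-deep distinction concrete, here is a self-contained sketch of a recursive comparison in the spirit of the suggestion above. It is not the test's actual helper; `DeepEqualsSketch` is a hypothetical name, and the fallback branch simply uses `==`.

```scala
object DeepEqualsSketch {
  // Recurse into arrays element by element; fall back to == for everything else.
  def deepEquals(a: Any, b: Any): Boolean = (a, b) match {
    case (x: Array[_], y: Array[_]) =>
      x.length == y.length && (0 until x.length).forall(i => deepEquals(x(i), y(i)))
    case _ => a == b
  }

  def main(args: Array[String]): Unit = {
    val left = Array(Array(1, 2), Array(3))
    val right = Array(Array(1, 2), Array(3))
    // toSeq compares the inner arrays by reference, so the shallow check reports a mismatch.
    println(s"shallow=${left.toSeq == right.toSeq} deep=${deepEquals(left, right)}")
  }
}
```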
cloud_gcp/package.mill (1)

22-25: Verify BigQuery relocation doesn’t conflict with usage.

The relocate of org.apache.iceberg.gcp.bigquery.** is fine for shading, but please confirm there are no reflection/string FQN usages and no parallel unshaded iceberg-bigquery jars on the runtime classpath.

service/package.mill (1)

6-9: Service cross-module and test wiring look correct.

moduleDeps and test.moduleDeps correctly target the crossValue variants; the additional Cross.Module[String] on ServiceModule is redundant but not harmful.

Also applies to: 48-51

api/package.mill (1)

23-25: Avoid duplicating the CHRONON_VERSION default.

publishVersion here and in BaseModule both hardcode "0.0.32"; consider a single constant/helper (e.g., in Constants) to keep the default release version in sync.

Also applies to: 42-44, 50-51

spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)

89-100: multiPut: Future.successful would be cheaper and clearer

multiPut does only in-memory work and immediately has the full result; wrapping it in Future { ... } just schedules an unnecessary task on the ExecutionContext.

You can simplify to:

   override def multiPut(putRequests: scala.Seq[KVStore.PutRequest]): Future[scala.Seq[Boolean]] = {
     val result = putRequests.map { case PutRequest(keyBytes, valueBytes, dataset, millis) =>
       val table = database.get(dataset)
       val key = encode(keyBytes)
       table.compute(key, putFunc(millis.getOrElse(System.currentTimeMillis()) -> valueBytes))
       true
     }
-
-    Future {
-      result
-    }
+    Future.successful(result)
   }

Behavior stays the same while avoiding extra thread-pool work.
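
A small standalone sketch of the same idea, assuming the work is already finished by the time the Future is created. `CompletedFutureSketch` and `putAll` are illustrative names, and the `map(_ => true)` body is a placeholder for the real in-memory writes.

```scala
import scala.concurrent.Future

object CompletedFutureSketch {
  def putAll(values: Seq[Int]): Future[Seq[Boolean]] = {
    // The work happens synchronously on the calling thread.
    val result = values.map(_ => true)
    // Future.successful wraps the finished value; no ExecutionContext is required.
    Future.successful(result)
  }
}
```

Because the returned Future is already completed, callers can inspect `value` immediately without blocking.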

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed3c3c5 and 584e98b.

📒 Files selected for processing (118)
  • .github/workflows/push_to_canary.yaml (2 hunks)
  • .github/workflows/reusable_non_spark_tests.yaml (9 hunks)
  • .github/workflows/reusable_spark_tests.yaml (11 hunks)
  • .github/workflows/test_grype.yaml (3 hunks)
  • .github/workflows/test_non_spark.yaml (1 hunks)
  • .github/workflows/test_spark.yaml (1 hunks)
  • aggregator/package.mill (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
  • api/package.mill (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (3 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
  • build.mill (5 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
  • flink/package.mill (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (1 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
  • python/package.mill (1 hunks)
  • service/package.mill (2 hunks)
  • service_commons/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (43)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
✅ Files skipped from review due to trivial changes (4)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
🚧 Files skipped from review as they are similar to previous changes (38)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • .github/workflows/push_to_canary.yaml
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • .github/workflows/test_non_spark.yaml
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
  • .github/workflows/reusable_spark_tests.yaml
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • .github/workflows/test_spark.yaml
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • .github/workflows/reusable_non_spark_tests.yaml
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • aggregator/package.mill
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • flink/package.mill
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
🧰 Additional context used
🧠 Learnings (19)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • cloud_gcp/package.mill
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • .github/workflows/test_grype.yaml
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • build.mill
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-05-25T15:57:30.687Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • cloud_gcp/package.mill
  • .github/workflows/test_grype.yaml
  • spark/package.mill
  • cloud_aws/package.mill
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.

Applied to files:

  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.

Applied to files:

  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.

Applied to files:

  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
🧬 Code graph analysis (7)
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)
flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala (1)
  • DeserializationSchemaBuilder (31-62)
flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (1)
  • ProjectedEvent (95-95)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (36-53)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  • fetch (48-53)
  • fetch (57-61)
  • fetch (65-71)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
  • multiGet (108-118)
  • multiPut (308-359)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  • online (149-149)
  • online (176-180)
  • Fetcher (55-147)
  • Fetcher (152-685)
  • Response (68-68)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)
  • empty (78-78)
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: python_lint
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (41)
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1)

47-47: LGTM! Type annotation aligns with Scala 2.13.

The change from scala.collection.Seq to scala.Seq is correct and consistent with the broader Scala 2.13 migration.
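
For context on the alias change this comment refers to, here is a minimal sketch, assuming Scala 2.13, where scala.Seq is an alias for scala.collection.immutable.Seq (on 2.12 it is an alias for scala.collection.Seq). The object and method names are hypothetical and are not taken from FetcherMetadataTest.

```scala
import scala.collection.mutable

// Hypothetical names, for illustration only.
object SeqAliasSketch {
  // Accepts any sequence, mutable or immutable, on both 2.12 and 2.13.
  def sumGeneral(xs: scala.collection.Seq[Int]): Int = xs.sum

  // On 2.13 this accepts only immutable sequences; on 2.12 scala.Seq is
  // still scala.collection.Seq, so the same signature is more permissive.
  def sumDefault(xs: scala.Seq[Int]): Int = xs.sum

  val buffer: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer(1, 2, 3)

  val viaGeneral: Int = sumGeneral(buffer)
  // Passing `buffer` directly would not compile on 2.13; .toSeq works on both.
  val viaDefault: Int = sumDefault(buffer.toSeq)
}
```

A signature written against scala.Seq therefore means slightly different things on the two versions, which is why call sites holding mutable collections need an explicit conversion when cross-building.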

.github/workflows/test_grype.yaml (4)

27-28: Environment variable properly defines versioned build target.


47-47: Versioned Mill targets correctly parameterized.

The build commands consistently use the SCALA_VERSION variable with proper Mill cross-build syntax.

Also applies to: 50-50, 53-53, 56-56


78-100: Artifact upload paths consistently versioned.

All four JARs are properly uploaded from versioned output directories.


58-63: The Dockerfile contains no flink references, confirming that the service and cloud_gcp JARs are the only artifacts needed for the Docker build. The workflow file correctly copies only these two JARs to build_output.

flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)

4-4: LGTM! Unused import removed.

The concrete schema class is accessed via the builder, so the direct import is unnecessary.


75-75: LGTM! Proper Scala 2.13 collection handling.

Explicit .toSeq materialization aligns with 2.13 best practices.
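
As a rough illustration of why explicit .toSeq calls show up in a cross-build, the sketch below uses made-up data rather than the code at line 75. It assumes only the standard library and compiles on both 2.12 and 2.13.

```scala
// Hypothetical names and data, for illustration only.
object ToSeqSketch {
  val fieldIndex: Map[String, Int] = Map("id" -> 0, "name" -> 1)

  // `keys` is an Iterable, not a Seq, so a Seq-typed value needs .toSeq.
  val names: Seq[String] = fieldIndex.keys.toSeq.sorted

  // Arrays convert to Seq implicitly, but on 2.13 that conversion copies and
  // is deprecated; the explicit call states the intent on both versions.
  val ids: Seq[Int] = Array(3, 1, 2).toSeq
}
```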

online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1)

78-78: Seq qualifier change aligns with Scala 2.13 immutable defaults.

The change from collection.Seq to scala.Seq is correct. CatalystUtil accepts Seq[(String, String)], and scala.Seq (immutable) is fully compatible with the unqualified Seq parameter type in Scala 2.13.

spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)

35-35: LGTM—correct for Scala 2.13 migration.

Removing the explicit scala.collection.Seq import lets unqualified Seq resolve to the default scala.Seq alias, which is immutable.Seq in 2.13 and collection.Seq in 2.12. All usages in the file are compatible.

python/package.mill (1)

46-48: The hard-coded "0.0.32" fallback is used consistently across all package files (build.mill, python/package.mill, api/package.mill). The ast-grep search found no build.Constants.chrononVersion object, confirming it doesn't exist. This is an intentional, consistent pattern—not a maintenance issue.

Likely an incorrect or invalid review comment.

flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1)

20-20: LGTM—correct for Scala 2.13.

In Scala 2.13, unqualified Seq resolves to the scala.Seq alias, which points to immutable.Seq, so no explicit import is needed. All usages in this file remain compatible.

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

22-22: LGTM—consistent with Scala 2.13 standardization.

The import cleanup and explicit scala.Seq qualification align with the cross-version migration pattern.

Also applies to: 30-30

flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (1)

80-80: LGTM! Collection conversion is correct and consistent.

The .asScala.toSeq pattern correctly materializes Java collections as immutable Seqs, aligning with the PR's normalization effort.

Also applies to: 115-115
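
The difference between the wrapper returned by .asScala and the materialized result of .asScala.toSeq can be sketched as follows. This assumes Scala 2.13 and the standard scala.jdk.CollectionConverters import; the PR itself may route conversions through its own ScalaJavaConversions helper, and the names below are hypothetical.

```scala
import java.util.{ArrayList => JArrayList}
import scala.collection.mutable
import scala.jdk.CollectionConverters._ // 2.13 only; an assumption of this sketch

// Hypothetical names, for illustration only.
object AsScalaSketch {
  val javaList = new JArrayList[String]()
  javaList.add("a")
  javaList.add("b")

  // A live view over the Java list: later changes to javaList are visible.
  val wrapped: mutable.Buffer[String] = javaList.asScala

  // On 2.13, .toSeq copies the elements into an immutable Seq.
  val snapshot: Seq[String] = javaList.asScala.toSeq

  javaList.add("c")
}
```

After the final add, wrapped reflects three elements while snapshot still holds the two that existed when it was taken.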

online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (2)

39-56: Implementation is correct.

The conversion logic properly handles Scala–Java interop for requests/responses and futures.


34-34: Inheritance change is correct and safe.

JavaExternalSourceHandler is the only Java implementation of ExternalSourceHandler in the codebase—no subclasses exist to be affected by the inheritance change.

spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)

57-58: LGTM! Cleaner API usage.

Replacing getAs[Seq[Row]] with getSeq[Row] throughout is cleaner and aligns with Scala 2.13 normalization.

Also applies to: 74-75, 82-83, 126-127, 134-135, 142-143, 150-151, 221-222, 230-231, 270-271

online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)

16-16: Signature change correct; parent trait signatures are compatible.

The explicit scala.Seq and KVStore.GetRequest qualifications match the parent trait's unqualified Seq[GetRequest] (resolved identically in Scala 2.13). The override is valid.

spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)

19-19: Import narrowing to scala.collection.Map looks correct

Dropping scala.collection.Seq keeps Seq using the default immutable alias while preserving existing Map usages; no behavior change observed.

spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)

19-19: LGTM!

Import consolidations are clean and appropriate for the Scala 2.13 migration.

Also applies to: 21-21, 27-27


56-62: LGTM!

Migration from JavaExternalSourceHandler to ExternalSourceHandler is correct. The fetch implementation properly returns failures wrapped in Future. Same ExecutionContext verification from line 48 applies here.


48-53: ExecutionContext is available via ExternalSourceHandler.

The Future() call at line 49 will resolve the implicit ExecutionContext from ExternalSourceHandler (which PlusOneExternalHandler extends). Line 209 of Api.scala defines it as implicit lazy val executionContext: ExecutionContext = ExternalSourceHandler.executor. The signature change to scala.Seq is correct for Scala 2.13. No changes needed.
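
The resolution described above can be sketched in isolation. HandlerBase and PlusOneHandler below are hypothetical stand-ins for ExternalSourceHandler and PlusOneExternalHandler, and ExecutionContext.global replaces the real executor purely for illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ImplicitEcSketch {
  // Hypothetical base trait: exposes an implicit ExecutionContext to all subclasses.
  trait HandlerBase {
    implicit lazy val executionContext: ExecutionContext = ExecutionContext.global
    def fetch(requests: Seq[Int]): Future[Seq[Int]]
  }

  class PlusOneHandler extends HandlerBase {
    // Future(...) requires an implicit ExecutionContext; the inherited member satisfies it,
    // so no import or extra parameter is needed here.
    override def fetch(requests: Seq[Int]): Future[Seq[Int]] =
      Future(requests.map(_ + 1))
  }

  // Blocks for the result; only for demonstration purposes.
  def run(): Seq[Int] =
    Await.result(new PlusOneHandler().fetch(Seq(1, 2, 3)), 5.seconds)
}
```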

api/src/main/scala/ai/chronon/api/DataType.scala (1)

234-235: No issues found. Change is correct for Scala 2.13.

StructType is inherently immutable (Array-backed with read-only interface: apply, length, iterator). The change from scala.collection.Seq to unqualified Seq correctly resolves to scala.collection.immutable.Seq in Scala 2.13. No code attempts mutable operations, and existing tests verify serialization compatibility.
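
A miniature of the same shape, as a sketch only: an Array-backed class that extends the unqualified Seq and implements just apply, length and iterator. Field, MiniStruct and typeOf are hypothetical names and do not reproduce the actual StructType in DataType.scala.

```scala
object StructSeqSketch {
  final case class Field(name: String, fieldType: String)

  // Hypothetical Array-backed struct exposed as a read-only Seq.
  // Unqualified Seq is immutable.Seq on 2.13 and collection.Seq on 2.12;
  // both require only these three members.
  final class MiniStruct(val name: String, fields: Array[Field]) extends Seq[Field] {
    override def apply(i: Int): Field = fields(i)
    override def length: Int = fields.length
    override def iterator: Iterator[Field] = fields.iterator

    // Hypothetical field-lookup helper.
    def typeOf(fieldName: String): Option[String] =
      fields.find(_.name == fieldName).map(_.fieldType)
  }

  val struct = new MiniStruct("user", Array(Field("id", "long"), Field("name", "string")))

  // Standard Seq operations come for free from the three members above.
  val names: Seq[String] = struct.map(_.name)
}
```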

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)

45-45: LGTM - Scala 2.13 compatibility.

The explicit mutable collections import is required for Scala 2.13.

spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2)

66-66: LGTM: Proper Java-Scala conversion.

The .asScala.toSeq correctly materializes the Java collection into a Scala Seq[String], aligning with the updated QueryUtils.build signature.


80-80: LGTM: Consistent sequence concatenation.

Using ++ Seq(...) for the final element maintains consistency with the preceding concatenations.

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)

35-35: LGTM: Import cleanup for Scala 2.13.

Removing Seq from the scala.collection import aligns with the Scala 2.13 migration.


294-304: Return type change verified as safe across all call sites.

All seven call sites in JoinBase.scala and VersioningTest.scala use only immutable-Seq-compatible operations (.isEmpty, .foreach, .size, equality comparison). The signature change from scala.collection.Seq[String] to scala.Seq[String] introduces no breaking changes.

api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)

49-49: All callers verified—parameter type change is safe.

The whereClauses method (DataRange.scala:78) returns Seq[String] (immutable). All five call sites pass compatible immutable sequences:

  • effectiveRange.whereClauses → Seq[String]
  • .toScala and .asScala.toSeq → immutable
  • Concatenations (++, :+) preserve immutability

No breaking changes for these call sites. The narrowed parameter type would only reject non-immutable sequences, and none of the callers pass one.

online/src/main/scala/ai/chronon/online/TileCodec.scala (3)

31-33: Import cleanup looks good

Dropping the explicit Seq import and keeping mutable only is consistent with relying on the default Seq alias and keeps imports minimal.


35-42: Materializing unWindowed results as Seq is reasonable

Using .toSeq on flatMap(_.unWindowed) gives RowAggregator a concrete sequence and avoids surprises from Java–Scala view types; overhead is minor for this setup path.


44-51: Same toSeq rationale for windowed aggregations

The .toSeq on flatMap(_.unpack) similarly normalizes the collection passed to RowAggregator; behavior remains the same with clearer types.

service_commons/package.mill (1)

6-9: Cross wiring for service_commons and tests looks consistent.

moduleDeps and test.moduleDeps correctly use crossValue and align with other modules’ cross setup.

Also applies to: 21-23

spark/package.mill (1)

7-11: Spark cross-module + test setup is consistent and sound.

Cross wiring and test deps align with the new scalaVersions scheme and other modules.

Also applies to: 27-35

cloud_aws/package.mill (1)

6-9: Cloud AWS cross wiring and tests are correctly updated.

Module and test dependencies use the appropriate crossValue variants and look consistent with the rest of the build.

Also applies to: 24-31

online/package.mill (2)

2-9: Cross-Scala wiring for online module looks consistent

Cross[OnlineModule](build.Constants.scalaVersions) plus OnlineModule extending Cross.Module[String] with build.BaseModule and using crossValue in moduleDeps all look aligned with the broader cross-build pattern; just make sure scalaVersion for this module is effectively derived from crossValue so the scala-reflect / scala-compiler mvn lines pick the correct 2.12 vs 2.13 artifacts.


38-41: Test module cross-wiring looks correct

scalaVersion = crossValue and moduleDeps = Seq(build.online(crossValue), build.api(crossValue).test) give you per-Scala-version tests hooked to the matching main and API test modules; this looks good and consistent with the new cross layout.
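
A stripped-down sketch of the cross-wiring pattern discussed for the package.mill files, assuming Mill's stock CrossScalaModule rather than this repo's build.BaseModule. Module names are illustrative, and test modules and dependencies are omitted.

```scala
// build.mill (sketch only; not the repo's actual build definition)
package build
import mill._, scalalib._

val scalaVersions = Seq("2.12.18", "2.13.17")

object api extends Cross[ApiModule](scalaVersions)
trait ApiModule extends CrossScalaModule

object online extends Cross[OnlineModule](scalaVersions)
trait OnlineModule extends CrossScalaModule {
  // crossScalaVersion is the version this instance was created with,
  // so online for a given version depends on api for the same version.
  def moduleDeps = Seq(api(crossScalaVersion))
}
```

With CrossScalaModule, scalaVersion is derived from the cross value, which is the property the comments above ask to confirm for the repo's own base module.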

flink_connectors/package.mill (2)

2-12: Flink connectors Cross module setup looks good

The move to Cross[FlinkConnectorsModule](build.Constants.scalaVersions) with FlinkConnectorsModule extending Cross.Module[String] and wiring moduleDeps via crossValue (aggregator/api/online/flink) looks coherent and should give per-Scala builds for this stack; just confirm that the underlying scalaVersion used by this module is tied to crossValue as intended.


26-28: Flink connectors test module correctly follows cross-Scala pattern

Setting scalaVersion = crossValue and moduleDeps = Seq(build.flink_connectors(crossValue)) is the expected pattern for per-Scala tests of this module and matches the rest of the Cross setup.

spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (2)

11-26: parseSkewKeys: .asScala.toSeq looks correct

Using list.asScala.toSeq to back Map[String, Seq[String]] is the right normalization for Scala 2.13 and matches the method’s declared return type. No further issues here.


41-47: allFilters.toSeq.toJava is a safe interop normalization

Materializing allFilters to a Seq before toJava avoids collection-type quirks in 2.13 and keeps the Java-facing API deterministic. Change looks good.

spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)

11-11: StagingQuery alias + setups .asScala.toSeq are consistent with 2.13 cleanup

Using StagingQueryUtil as an alias keeps the call site explicit without changing behavior, and normalizing setups via Option(...).map(_.asScala.toSeq).getOrElse(Seq.empty) matches how other Java collections are handled in this file. Looks good.

Also applies to: 173-178

@tchow-zlai tchow-zlai force-pushed the tchow/serverless-classpath branch from 520c8fc to 4482794 Compare November 24, 2025 22:49
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
build.mill (1)

18-19: Scala 2.13 upgrade blocked by deprecated imports.

Past reviews identified 41 scala.collection.JavaConverters imports that must be migrated to scala.jdk.CollectionConverters before this upgrade.
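
For context on the import in question, a small sketch using the older converters. scala.collection.JavaConverters exists on 2.12 and is still present, though deprecated, on 2.13, while scala.jdk.CollectionConverters exists only from 2.13 onward. Whether the deprecation actually blocks the build depends on compiler flags such as fatal warnings, which is an assumption not stated in this thread.

```scala
import java.util.Arrays
import scala.collection.JavaConverters._ // deprecated on 2.13, but compiles on 2.12 and 2.13

object ConvertersSketch {
  // Same call shape as with scala.jdk.CollectionConverters; only the import differs.
  val names: Seq[String] = Arrays.asList("a", "b").asScala.toSeq
}
```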

🧹 Nitpick comments (3)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)

64-72: JavaPlusOneExternalHandler is now identical to PlusOneExternalHandler.

After migration, both handlers share the same base class and implementation. Consider consolidating or documenting why both are needed in the test mock.

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

22-22: Seq import removal narrows fetchRequests’ param type

Dropping scala.collection.Seq means Seq[Request] now aliases scala.collection.immutable.Seq[Request]; please confirm no callers pass mutable.Seq here, and consider switching the signature (and related Seq usages) to scala.Seq for clarity/consistency with the rest of the PR.
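
A minimal sketch of the narrowing in question. The fetchRequests signature here is a simplified, hypothetical stand-in that takes strings instead of Request.

```scala
import scala.collection.mutable

object SeqNarrowingSketch {
  // With no scala.collection.Seq import, Seq is scala.Seq:
  // immutable.Seq on 2.13, collection.Seq on 2.12.
  def fetchRequests(requests: Seq[String]): Int = requests.size

  val buffered: mutable.ArrayBuffer[String] = mutable.ArrayBuffer("a", "b")

  // fetchRequests(buffered) compiles on 2.12 but is a type mismatch on 2.13,
  // so a cross-built caller converts explicitly.
  val count: Int = fetchRequests(buffered.toSeq)
}
```

Any caller that still passes a mutable sequence would surface as a compile error on the 2.13 build, so a clean cross-compile is itself a reasonable check.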

service_commons/package.mill (1)

6-9: Cross-building works but pattern inconsistent.

ServiceCommonsModule only extends build.BaseModule (like CloudGcpModule), while FlinkModule and CloudAwsModule explicitly extend Cross.Module[String]. Both approaches work since BaseModule extends CrossScalaModule, but consistency would improve clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 520c8fc and 4482794.

📒 Files selected for processing (119)
  • .github/workflows/push_to_canary.yaml (2 hunks)
  • .github/workflows/reusable_non_spark_tests.yaml (9 hunks)
  • .github/workflows/reusable_spark_tests.yaml (11 hunks)
  • .github/workflows/test_grype.yaml (3 hunks)
  • .github/workflows/test_non_spark.yaml (1 hunks)
  • .github/workflows/test_spark.yaml (1 hunks)
  • aggregator/package.mill (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
  • api/package.mill (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (3 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
  • build.mill (5 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
  • flink/package.mill (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
  • python/package.mill (1 hunks)
  • service/package.mill (2 hunks)
  • service_commons/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (43)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
🚧 Files skipped from review as they are similar to previous changes (44)
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • .github/workflows/reusable_spark_tests.yaml
  • online/src/main/scala/ai/chronon/online/Api.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • .github/workflows/test_spark.yaml
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • .github/workflows/test_non_spark.yaml
  • .github/workflows/test_grype.yaml
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • online/package.mill
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • .github/workflows/push_to_canary.yaml
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
  • python/package.mill
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
  • api/package.mill
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
🧰 Additional context used
🧠 Learnings (23)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • cloud_gcp/package.mill
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • build.mill
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.

Applied to files:

  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • cloud_gcp/package.mill
  • spark/package.mill
  • cloud_aws/package.mill
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2025-05-02T16:19:11.001Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-31T18:27:44.973Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-15T15:33:22.265Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-01-15T21:00:35.574Z
Learnt from: david-zlai
Repo: zipline-ai/chronon PR: 222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
🧬 Code graph analysis (10)
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (4)
api/src/main/java/ai/chronon/api/thrift/Option.java (1)
  • Option (25-143)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
online/src/main/scala/ai/chronon/online/JoinCodec.scala (1)
  • toMap (51-51)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  • query (413-421)
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (2)
  • ScalaJavaConversions (5-96)
  • IteratorOps (50-54)
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (4)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
  • toChrononSchema (53-55)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
  • GroupBy (141-167)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
  • runTimestampChecks (459-488)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
  • inputDf (479-518)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)
  • empty (78-78)
online/src/main/scala/ai/chronon/online/TileCodec.scala (2)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • unWindowed (301-319)
  • unpack (275-297)
api/src/main/scala/ai/chronon/api/DataType.scala (1)
  • unpack (235-235)
online/src/main/java/ai/chronon/online/JavaFetcher.java (4)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)
  • fetchGroupBys (183-185)
  • fetchJoin (187-234)
online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (1)
  • fetchGroupBys (150-266)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
  • listJoins (166-218)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  • fetch (48-53)
  • fetch (57-61)
  • fetch (65-71)
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)
flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala (1)
  • DeserializationSchemaBuilder (31-62)
flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (1)
  • ProjectedEvent (95-95)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  • online (149-149)
  • online (177-181)
  • Fetcher (55-147)
  • Fetcher (152-759)
  • Response (68-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: build
  • GitHub Check: python_lint
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (61)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2)

31-35: Import narrowing to scala.collection.mutable is correct

Using SortedMap from scala.collection.immutable and relying on the default scala.Seq alias (from the scala package object) while importing only mutable is idiomatic and avoids the redundant Seq import. No behavior change.


192-213: Returning newAcc.toSeq makes listJoins API safely immutable

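Switching from returning the mutable ArrayBuffer to newAcc.toSeq gives callers an immutable Seq[String] on Scala 2.13, preventing accidental external mutation and aligning with the Future[Seq[String]] signature. On 2.12, toSeq on an ArrayBuffer returns the buffer itself typed as Seq, so the change has no runtime effect there.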
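
A minimal sketch of the snapshot behavior, assuming Scala 2.13 collections. `ListJoinsSnapshot` and `demo` are hypothetical names for illustration and are not part of MetadataStore:

```scala
import scala.collection.mutable.ArrayBuffer

object ListJoinsSnapshot {
  // Hypothetical stand-in for the accumulator that listJoins builds up.
  def demo(): (Seq[String], Int) = {
    val acc = ArrayBuffer("join_a")
    // On 2.13, toSeq copies the buffer into an immutable Seq.
    val snapshot: Seq[String] = acc.toSeq
    // Later mutation of the buffer should not leak into the snapshot on 2.13.
    acc += "join_b"
    (snapshot, acc.size)
  }
}
```

On 2.12 the same code would return the buffer itself, so the snapshot would observe the later append.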

spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)

66-81: Seq normalization looks good

Normalizing query.wheres to Seq[String] via .asScala.toSeq and building baseWheres ++ timeWheres ++ Seq(s"($keyWhereOption)") is type-safe, preserves ordering, and matches the updated QueryUtils.build API.
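
The normalization described above can be sketched roughly as follows, assuming Scala 2.13 and the standard `scala.jdk.CollectionConverters`. `WhereClauses.combine` is a hypothetical helper, and the null handling is an added assumption rather than something taken from the PR:

```scala
import scala.jdk.CollectionConverters._

object WhereClauses {
  // Hypothetical helper mirroring the shape described in the review; not the Chronon code.
  def combine(javaWheres: java.util.List[String],
              timeWheres: Seq[String],
              keyWhere: String): Seq[String] = {
    // Thrift-style getters may return null, so guard before converting (assumption).
    val baseWheres: Seq[String] =
      Option(javaWheres).map(_.asScala.toSeq).getOrElse(Seq.empty)
    // Concatenation preserves ordering: base filters, then time filters, then the key filter.
    baseWheres ++ timeWheres ++ Seq(s"($keyWhere)")
  }
}
```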

spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1)

28-28: LGTM! Clean removal of unused import.

Aligns with Scala 2.13 migration.

flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)

4-4: Import update for ProjectedEvent looks correct

ProjectedEvent is used for resultList and matches the type returned by buildSourceProjectionDeserSchema; cleaning up the unused schema import is fine.


75-75: Explicit .toSeq keeps behavior and clarifies return type

Converting resultList.asScala to Seq before map(_.fields) is behaviorally equivalent here and matches the Seq[Map[String, Any]] return type.

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)

46-46: LGTM!

Import cleanup for Scala 2.13 compatibility. Seq resolves via the scala package alias.

spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1)

54-54: LGTM!

Import cleanup consistent with Scala 2.13 upgrade pattern.

spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1)

41-41: LGTM!

Unused import removal aligns with Scala 2.13 migration.

spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)

19-27: Import changes look good for Scala 2.13 compatibility.

Wildcard import on line 21 brings in ExternalSourceHandler used by the handler classes below.


47-54: LGTM!

Signature update to scala.Seq is correct for Scala 2.13 compatibility.


56-62: LGTM!

Handler correctly migrated from JavaExternalSourceHandler to unified ExternalSourceHandler trait.

spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3)

11-11: Explicit scala.collection.Map import is fine

Keeps signatures consistent with generic Map while preserving behavior.


24-26: list.asScala.toSeq is a safer materialization

Avoids exposing a mutable view and makes the Seq[String] contract explicit; good for 2.13 collections.


41-47: allFilters.toSeq.toJava improves Java interop robustness

Ensures setWheres always sees a proper Seq before conversion; matches the Scala/Java conversion intent.

online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1)

24-26: No actionable issues found; import change is benign

The 47 call sites across the codebase pass api.StructType objects (using the second overload) or construct sequences that resolve correctly. Removing import scala.collection.Seq leaves the signature unchanged on 2.12, where scala.Seq aliases scala.collection.Seq, but tightens it on 2.13, where scala.Seq aliases scala.collection.immutable.Seq and no longer accepts mutable sequences. No call sites pass explicit mutable.Seq instances, so all current usage patterns remain valid.
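
As an illustration of how the Seq alias differs between the two Scala versions, here is a small sketch. `SeqAliasDemo` and its methods are hypothetical and only meant to show which parameter type accepts a mutable buffer:

```scala
import scala.collection.mutable.ArrayBuffer

object SeqAliasDemo {
  // scala.collection.Seq accepts mutable and immutable sequences on both 2.12 and 2.13.
  def sizeOfAny(xs: scala.collection.Seq[Int]): Int = xs.size

  // scala.Seq is collection.Seq on 2.12 but immutable.Seq on 2.13.
  def sizeOfDefault(xs: scala.Seq[Int]): Int = xs.size

  val buffer: ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
  val viaAny: Int = sizeOfAny(buffer)
  // Passing `buffer` directly would not compile on 2.13, hence the explicit toSeq.
  val viaDefault: Int = sizeOfDefault(buffer.toSeq)
}
```

This is why call sites that hand over buffers need a `.toSeq` once the explicit import is gone.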

spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3)

16-21: LGTM!

Import changes align with Scala 2.13 migration pattern. The ai.chronon.eval._ wildcard import brings in the eval result types used throughout this file.


189-196: LGTM!

Consistent with the toScala pattern used elsewhere (e.g., GroupBy.scala line 488). The .toSeq ensures type compatibility with runTimestampChecks signature.


260-268: LGTM!

Correctly migrates Java Map → Scala Map conversion using JMapOps.toScala. The .toMap ensures immutable Map[String, String] for checkKeySchema.
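
A rough sketch of the same idea using the standard library converters instead of the project's JMapOps.toScala (a deliberate swap for illustration, assuming Scala 2.13). `KeyMappingDemo` is a hypothetical name:

```scala
import scala.jdk.CollectionConverters._

object KeyMappingDemo {
  // asScala yields a mutable wrapper over the Java map; toMap copies it
  // into an immutable Map[String, String].
  def toImmutable(jm: java.util.Map[String, String]): Map[String, String] =
    jm.asScala.toMap
}
```

Because `toMap` copies, later writes to the Java map are not visible through the returned Scala map.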

online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (1)

270-270: LGTM - Scala 2.13 migration.

Replacing deprecated mapValues with eager map is correct.

aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2)

22-22: LGTM: Necessary import for Scala 2.13 migration.

The import enables consistent Java-to-Scala iterator conversion.

Based on learnings, ScalaJavaConversions imports are correct for this migration.


87-97: LGTM: Clean Scala 2.13 collection conversion migration.

Changes normalize Java-Scala interop and improve error handling. The explicit scala.collection.Seq[Any] type correctly handles both mutable and immutable sequences. The fallback case provides clearer error messages than the previous MatchError.
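
The fallback pattern mentioned here might look something like the sketch below, assuming Scala 2.13. `ToIteratorDemo.toIterator` is a hypothetical helper and the error message is invented for illustration; the actual ColumnAggregator code may differ:

```scala
import scala.jdk.CollectionConverters._

object ToIteratorDemo {
  // Hypothetical helper: normalizes supported inputs to a Scala Iterator.
  def toIterator(input: Any): Iterator[Any] = input match {
    case null                         => Iterator.empty
    // collection.Seq matches both mutable and immutable Scala sequences.
    case seq: scala.collection.Seq[_] => seq.iterator
    case list: java.util.List[_]      => list.iterator().asScala
    // An explicit fallback gives a descriptive error instead of a bare MatchError.
    case other =>
      throw new IllegalArgumentException(
        s"Unsupported collection type: ${other.getClass.getName}")
  }
}
```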

spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1)

38-38: Import removal aligns with Scala 2.13.

Removing the scala.collection.Seq import is correct: the default scala.Seq alias from the scala package suffices.

online/src/main/scala/ai/chronon/online/TileCodec.scala (3)

32-32: LGTM! Scala 2.13 import cleanup.

Removing Seq from explicit imports is correct, since it is available by default through the scala package.


40-41: LGTM! Correct materialization for Scala 2.13.

The .toSeq materializes the collection after .asScala, which is required in Scala 2.13.


49-50: LGTM! Consistent with buildRowAggregator.

Same correct pattern: materialize with .toSeq after the flatMap.

.github/workflows/reusable_non_spark_tests.yaml (2)

1-12: Workflow conversion to reusable pattern with version input looks solid.

The transition from trigger-based to workflow_call with a required scala-version input is clean. Environment variable binding and description are appropriate.


34-34: Mill cross-version syntax verified—no issues found.

The escaping is correct and all modules use the proper Cross[Module](versions) pattern. At runtime, \[ and \] in the workflow resolve to literal [ and ], producing valid Mill commands like ./mill flink[2.13.17].test.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)

106-106: Version 2.3 is valid and supported—confirm YARN resource calculator impact.

Version 2.3 changes YARN's default resource calculator from DefaultResourceCalculator to DominantResourceCalculator, which impacts autoscaling behavior. This is a significant internal change from the deprecated 1.2. Verify that existing job configurations and autoscaling policies work as expected with this new calculator.

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

30-34: ContextualHandler.fetch signature looks consistent

Using scala.Seq in the param and return types aligns this override with the updated fetcher APIs elsewhere; the body remains behaviorally unchanged.

online/src/main/java/ai/chronon/online/JavaFetcher.java (3)

137-142: Response conversion loop looks correct.

Building jResps from the Scala List via an iterator is straightforward and preserves ordering; no issues seen here.


163-167: Fetcher entrypoints now consistently use Java List + ScalaJavaConversions.toScala.

fetchGroupBys, fetchJoin, and fetchJoinBase64Avro all follow the same pattern of converting Java requests to a List<Fetcher.Request> and only then to Scala Seq at the boundary; this looks clean and consistent.

Also applies to: 171-177, 181-187


205-205: listJoins Java-facing conversion looks good.

The FutureConverters + ScalaJavaConversions::toJava pipeline cleanly exposes Future[Seq[String]] as CompletableFuture<List<String>> without extra boilerplate.

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)

35-35: LGTM - Scala 2.13 import cleanup.

Correct to remove Seq import since code now uses explicit scala.Seq.


294-294: Return type change verified as safe. All 6 callers use only polymorphic Seq methods (.isEmpty, .foreach, .size, comparisons) compatible with immutable Seq.

api/src/main/scala/ai/chronon/api/DataType.scala (5)

237-241: LGTM!

Standard Seq trait implementations correctly delegate to the underlying array.


242-242: LGTM!

New helper method correctly finds field type by name.
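
For readers unfamiliar with the pattern, a class can behave as a Seq by delegating three members to a backing array. The sketch below is a simplified, hypothetical stand-in (`Fields`, `Field`, and `typeOf` are illustrative names, and returning an Option from the lookup is an assumption), not the actual StructType:

```scala
object FieldSeqDemo {
  final case class Field(name: String, typeName: String)

  // Simplified stand-in for a struct type backed by an array of fields.
  final class Fields(underlying: Array[Field]) extends scala.Seq[Field] {
    // The three members Seq requires, each delegating to the array.
    override def length: Int = underlying.length
    override def apply(idx: Int): Field = underlying(idx)
    override def iterator: Iterator[Field] = underlying.iterator

    // Field lookup by name; None when the name is unknown.
    def typeOf(name: String): Option[String] =
      underlying.find(_.name == name).map(_.typeName)
  }
}
```

With these in place, standard operations such as `map` and `foreach` work on the wrapper directly.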


244-251: LGTM!

Correctly casts Map values to Array based on struct schema. Missing fields default to null.
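
A minimal sketch of the null-defaulting behavior described above, ignoring any per-field type coercion. `CastArrDemo` and its simplified `Field` are hypothetical and do not reproduce the real castArr signature:

```scala
object CastArrDemo {
  final case class Field(name: String)

  // Lays out map values in schema order; fields absent from the map become null.
  def castArr(fields: Seq[Field], values: Map[String, AnyRef]): Array[AnyRef] =
    fields.map(f => values.getOrElse(f.name, null)).toArray
}
```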


253-260: LGTM!

Correctly casts Map values with type coercion. Consistent with castArr behavior.


234-235: Seq type change verified compatible with codebase.

The search found no incompatible usages. All StructType instances in the codebase use read-only operations (.fields, .map, .iterator). The change to Seq (immutable in Scala 2.13) aligns with the implementation's existing contract—only providing immutable operations. Internal code handles this correctly.

Note: This is a public API breaking change for external consumers. Consider documenting the Scala 2.13 migration requirement if applicable.

spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1)

5-7: Seq import removal looks safe

Using unqualified Seq in multiGet/multiPut is fine: the scala package keeps a Seq alias in scope on both 2.12 and 2.13, so dropping scala.collection.Seq here should not affect compilation. The Future usage can continue to rely on the implicit ExecutionContext provided by KVStore. Based on learnings, this matches how other KVStore implementations are written.

Also applies to: 17-21

online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (2)

16-16: Method not implemented.

multiGet returns ???. If this method is called, it will throw NotImplementedError.

⛔ Skipped due to learnings
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

18-37: Code is correct. Signatures match.

The trait uses unqualified Seq[Boolean], and the override uses scala.Seq[Boolean]—both resolve to the same type in Scala 2.13+. The implementation's unqualified Seq return values align with the trait's return type. No type mismatch exists.

build.mill (2)

314-324: Publishing configuration looks good.

Standard POM settings and version management from environment variable.


132-280: GCS publishing — gcloud present in CI; mill tasks defined but not wired

The workflows .github/workflows/push_to_canary.yaml and .github/workflows/publish_release.yaml configure google-github-actions/auth@v2 and google-github-actions/setup-gcloud@v2 and already run gcloud storage cp/objects update, so gcloud is available and authenticated in CI. build.mill adds GcsPublishModule (trait at build.mill:133) with publishGcsAssembly() (≈line 182) and publishGcsJar() (≈line 233), and mixes it into BaseModule (≈line 283). However, no workflow or other file invokes those Mill tasks. If CI should call them, add a Mill run step in the appropriate job, and ensure auth and setup-gcloud are present in that job.

cloud_gcp/package.mill (2)

8-11: Cross-building wired correctly.

Module extends Cross[CloudGcpModule] and uses crossValue for dependencies.


57-59: Test cross-wiring looks good.

Test module correctly references crossValue variants.

flink/package.mill (2)

6-11: Cross-building implementation correct.

Follows standard pattern with Cross.Module[String] and crossValue-aware dependencies.


42-43: Test wiring correct.

aggregator/package.mill (2)

6-9: Cross-building correct.


22-23: Test wiring correct.

flink_connectors/package.mill (2)

6-11: Cross-building correct.


27-28: Test wiring correct.

service_commons/package.mill (1)

22-23: Test wiring correct.

spark/package.mill (2)

7-10: Cross-building correct.


28-30: Test wiring correct.

cloud_aws/package.mill (2)

6-9: Cross-building correct.


25-28: Test wiring correct.

service/package.mill (3)

2-2: Import looks correct for Mill cross-building.

mill.api._ provides the necessary cross-building utilities.


6-8: Cross-building setup looks good.

Standard Mill pattern: Cross[ServiceModule] with Cross.Module[String] trait, and crossValue properly wiring version-specific dependencies.


48-55: Test module wiring is correct.

Using crossValue for module dependency and super.mvnDeps() to properly inherit base test dependencies from BaseJUnit4TestModule.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)

145-194: Java→Scala request conversion and callsites now align with Fetcher APIs.

convertJavaRequestList returning List<Fetcher.Request> and wrapping with ScalaJavaConversions.toScala(...) at all callsites fixes the earlier type mismatch; flow is consistent across groupBy/join/model-transforms paths.

Minor nit: you could pre-size scalaRequests via new ArrayList<>(requests.size()) to avoid internal resizing.

🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)

618-641: External‑request dedup refactor preserves behavior; slight DRY opportunity.

Using validRequests.flatMap plus parts.iterator().asScala.map(...) is clearer and still relies on cached getJoinConf data. You could optionally reuse the initial joinConf instead of calling metadataStore.getJoinConf(joinRequest.name) again when computing parts, but as written it’s functionally correct.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4482794 and 01c074c.

📒 Files selected for processing (3)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
🧬 Code graph analysis (2)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  • ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (1)
  • fetchModelTransforms (67-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: python_tests
  • GitHub Check: python_lint
  • GitHub Check: build
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (6)
online/src/main/java/ai/chronon/online/JavaFetcher.java (2)

127-143: Response conversion from Scala future looks correct.

Manual iteration over resps.responses().toList() into a pre-sized ArrayList is fine and keeps metrics instrumentation unchanged.


196-200: listJoins Scala→Java conversion is straightforward.

Converting the Scala Seq from metadataStore.listJoins to java.util.List via ScalaJavaConversions.toJava is appropriate here.

online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (3)

67-87: fetchModelTransforms scala.Seq migration looks consistent.

Using scala.Seq[Request]/Future[scala.Seq[Response]] at the API boundary while keeping internal logic with Seq.empty and requests.map(...) is fine and aligns with the project-wide scala.Seq normalization.


270-275: Safe conversion of SQL results to Map[String, AnyRef].

Using results.head.map { case (k, v) => k -> v.asInstanceOf[AnyRef] } under results.nonEmpty is a clean replacement for mapValues and keeps the expected key/value types.


277-285: Passthrough field filtering is correct and avoids deprecated APIs.

baseData.filter { case (k, _) => fieldsToPassthrough.contains(k) } with a precomputed Set preserves semantics of the old filterKeys while being 2.13‑friendly.
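
The same idea can be sketched in isolation. The object and method names below are hypothetical, and only the filter expression mirrors the one quoted above:

```scala
object PassthroughSketch {
  // Keeps only the entries whose key is listed in `fields`.
  // Precomputing the Set keeps each membership check cheap, and
  // `filter` with a case pattern compiles unchanged on 2.12 and 2.13.
  def passthrough(baseData: Map[String, AnyRef], fields: Seq[String]): Map[String, AnyRef] = {
    val fieldsToPassthrough = fields.toSet
    baseData.filter { case (k, _) => fieldsToPassthrough.contains(k) }
  }
}
```

Unlike filterKeys on 2.13, this returns a strict Map rather than a view, so later reads do not re-run the predicate.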

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)

43-48: Model‑transforms path and scala.Seq usage look sound.

Importing scala.collection.mutable matches the later LinkedHashMap/HashMap usages.

fetchModelTransforms now:

  • Accepts scala.Seq[Request],
  • Groups by req.name, runs fetchModelTransformsWithConf per group,
  • Pairs each response with its original index and, after Future.sequence, flattens and sorts by index.

This preserves ordering and cleanly handles both inline and metadata‑driven modelTransforms configs; fetchModelTransformsWithConf correctly reuses the same ModelTransformsFetcher for join‑source and direct paths.

Also applies to: 236-308
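
The group, fetch, and re-sort flow described above can be sketched roughly as follows. This is not the Fetcher implementation: Request, Response, and fetchGroup are simplified hypothetical stand-ins, and the sketch assumes each per-group fetch returns its responses in the same order as its requests.

```scala
import scala.concurrent.{ExecutionContext, Future}

object OrderPreservingFetchSketch {
  final case class Request(name: String, key: String)
  final case class Response(request: Request, value: String)

  // Hypothetical per-group fetch; assumed to preserve the order of `reqs`.
  def fetchGroup(name: String, reqs: Seq[Request])(implicit ec: ExecutionContext): Future[Seq[Response]] =
    Future(reqs.map(r => Response(r, s"$name:${r.key}")))

  def fetchAll(requests: Seq[Request])(implicit ec: ExecutionContext): Future[Seq[Response]] = {
    // Remember each request's original position before grouping.
    val indexed = requests.zipWithIndex
    val perGroup: Seq[Future[Seq[(Response, Int)]]] =
      indexed.groupBy { case (req, _) => req.name }.toSeq.map { case (name, group) =>
        val (reqs, indices) = group.unzip
        // Pair every response with the index of the request that produced it.
        fetchGroup(name, reqs).map(_.zip(indices))
      }
    // Wait for all groups, then flatten and restore the caller's ordering.
    Future.sequence(perGroup).map(_.flatten.sortBy(_._2).map(_._1))
  }
}
```

Because groupBy does not guarantee any ordering across groups, the final sort by index is what keeps responses aligned with the input sequence.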

@tchow-zlai tchow-zlai force-pushed the tchow/serverless-classpath branch from 01c074c to 95972c2 Compare November 24, 2025 23:08
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)

145-199: Minor perf/naming tweaks for request conversion and callsites.

  • In convertJavaRequestList, consider pre-sizing the list to avoid growth reallocations:
-    List<Fetcher.Request> scalaRequests = new ArrayList<>();
+    List<Fetcher.Request> scalaRequests = new ArrayList<>(requests.size());
  • scalaRequests is now a Java List; optionally rename (e.g. requestsList) for clarity.
  • The fetchGroupBys / fetchJoin / fetchJoinBase64Avro / fetchModelTransforms / listJoins usages of this helper look consistent and correct.
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1)

13-13: Remove redundant .toSeq after .toScala.

Since toScala on java.util.List[T] already returns Seq[T] (per ScalaJavaConversions.scala:15), calling .toSeq is unnecessary. Change both occurrences to just .toScala:

  • Line 13: .map(_.toScala) instead of .map(_.toScala.toSeq)
  • Line 131: .map(_.toScala) instead of .map(_.toScala.toSeq)

Also applies to: 131-131

build.mill (1)

18-18: Scala 2.13 enablement: keep in sync with JavaConverters migration review.

scalaVersions now includes 2.13.17; please only land this once the JavaConverters → CollectionConverters migration and 2.13 compilation/tests called out in the earlier review are fully green.

🧹 Nitpick comments (7)
.github/workflows/reusable_non_spark_tests.yaml (1)

34-34: Remove unnecessary bracket escaping from mill commands.

The backslash escaping of brackets (\[ and \]) is non-idiomatic in GitHub Actions run steps. Brackets don't require escaping in this context and can be used directly in the command string.

Apply this diff to all mill test commands (shown for flink, repeat for other jobs):

      - name: Run Flink tests
        run: |
-          ./mill flink\[${{ env.SCALA_VERSION }}\].test
+          ./mill flink[${{ env.SCALA_VERSION }}].test

Alternatively, if you prefer explicit quoting for clarity:

      - name: Run Flink tests
        run: |
-          ./mill flink\[${{ env.SCALA_VERSION }}\].test
+          ./mill "flink[${{ env.SCALA_VERSION }}].test"

Also applies to: 55-55, 76-76, 97-97, 118-118, 139-139, 160-160, 181-181

aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)

87-97: Broaden Java input type match to avoid brittle behavior

Right now only java.util.ArrayList[Any] is accepted; any other java.util.List (e.g., LinkedList, unmodifiable lists) will now hit the IllegalArgumentException. Unless callers are guaranteed to always send ArrayList, it’s safer to match util.List[_] instead, and to avoid Any in the pattern:

-    val anyIterator = inputVal match {
-      case inputList: util.ArrayList[Any]      => inputList.iterator().toScala
-      case inputSeq: scala.collection.Seq[Any] => inputSeq.iterator
+    val anyIterator = inputVal match {
+      case inputList: util.List[_]                 => inputList.iterator().toScala
+      case inputSeq: scala.collection.Seq[_]       => inputSeq.iterator
       case other =>
         throw new IllegalArgumentException(s"Unsupported input type: ${other.getClass.getName}")
     }

This keeps the fail-fast behavior while being more tolerant of upstream Java collections and avoiding unchecked-type warnings.
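
A standalone sketch of the broadened match, under the assumption that the goal is simply to obtain an Iterator[Any] from either collection family. The object and method names are hypothetical, and the Java iterator is wrapped by hand here instead of using the project's toScala helper, so the snippet carries no converter import:

```scala
object ColumnInputSketch {
  // Accepts any java.util.List or Scala Seq and exposes its elements as an Iterator[Any].
  // Anything else still fails fast, as in the reviewed code.
  def toIterator(inputVal: Any): Iterator[Any] = inputVal match {
    case inputList: java.util.List[_] =>
      val underlying = inputList.iterator()
      new Iterator[Any] {
        def hasNext: Boolean = underlying.hasNext
        def next(): Any = underlying.next()
      }
    case inputSeq: scala.collection.Seq[_] => inputSeq.iterator
    case other =>
      // Note: a null input would raise a NullPointerException on getClass here.
      throw new IllegalArgumentException(s"Unsupported input type: ${other.getClass.getName}")
  }
}
```

Matching on the wildcard types avoids the unchecked warning that a pattern such as util.ArrayList[Any] triggers under erasure.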

spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)

80-80: Change is correct; :+ would be more concise.

Functionally equivalent, though :+ s"($keyWhereOption)" is the Scala idiom for single-element append. Acceptable given the PR's standardization goals.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)

108-135: PeripheralsConfig example is fine; note env-specific bits for future enablement

Keeping the PeripheralsConfig / SparkHistoryServer example commented out as inline docs is reasonable and avoids unused-code issues. When you eventually enable it, consider wiring the cluster ID via config instead of a hard-coded canary path.

spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)

169-179: StagingQueryUtil usage and setups conversion look correct; tiny DRY nit

StagingQueryUtil.from(...) + Option(...).asScala.toSeq.getOrElse(Seq.empty) is null‑safe and matches other Java→Scala conversions in this file. You could very slightly DRY by reusing stagingQueryConf for setups instead of stagingQuery.stagingQuery, but current code is fine.

build.mill (2)

31-38: Avoid duplicating PomSettings configuration.

BaseModule.pomSettings is effectively Constants.pomSettingsFor(artifactName()); consider delegating to that helper to keep POM metadata in one place.

Also applies to: 313-321


127-130: GCS publish tasks rely on git, gcloud, and CHRONON_VERSION env.

The GcsPublishModule helpers look solid, but they fail hard if git or gcloud is missing, and they silently fall back to version 0.0.32 when CHRONON_VERSION is unset. Make sure your release/CI docs or configs call this out, or disable publishing via publishToGcsEnabled where needed.

Also applies to: 132-279, 323-325
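
One way to make the version fallback explicit is sketched below. This is not the code in build.mill: the object, the method, and the strict flag are hypothetical, and only the CHRONON_VERSION variable name and the 0.0.32 default come from the comment above.

```scala
object VersionSketch {
  // Default mentioned in the review comment.
  val DefaultVersion = "0.0.32"

  // Resolves the publish version from an environment map.
  // With strict = true (e.g. for release jobs) a missing or blank value is an error;
  // otherwise the default is used, which suits local builds.
  def resolveVersion(env: Map[String, String], strict: Boolean): String =
    env.get("CHRONON_VERSION").map(_.trim).filter(_.nonEmpty) match {
      case Some(version) => version
      case None if strict =>
        throw new IllegalStateException("CHRONON_VERSION must be set when publishing")
      case None => DefaultVersion
    }
}
```

Passing the environment as a Map (for example sys.env) keeps the helper easy to exercise without mutating process state.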

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01c074c and 95972c2.

📒 Files selected for processing (120)
  • .github/workflows/push_to_canary.yaml (2 hunks)
  • .github/workflows/reusable_non_spark_tests.yaml (9 hunks)
  • .github/workflows/reusable_spark_tests.yaml (11 hunks)
  • .github/workflows/test_grype.yaml (3 hunks)
  • .github/workflows/test_non_spark.yaml (1 hunks)
  • .github/workflows/test_spark.yaml (1 hunks)
  • aggregator/package.mill (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
  • api/package.mill (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (3 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
  • build.mill (5 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (0 hunks)
  • flink/package.mill (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
  • python/package.mill (1 hunks)
  • service/package.mill (2 hunks)
  • service_commons/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (44)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala
  • spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
✅ Files skipped from review due to trivial changes (2)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
🚧 Files skipped from review as they are similar to previous changes (39)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • cloud_gcp/package.mill
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala
  • spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • python/package.mill
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • .github/workflows/test_non_spark.yaml
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • flink/package.mill
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • api/package.mill
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • .github/workflows/test_spark.yaml
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
  • spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
🧰 Additional context used
🧠 Learnings (24)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.

Applied to files:

  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • .github/workflows/test_grype.yaml
  • .github/workflows/push_to_canary.yaml
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
  • service_commons/package.mill
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • build.mill
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.

Applied to files:

  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • .github/workflows/reusable_spark_tests.yaml
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • .github/workflows/test_grype.yaml
  • .github/workflows/push_to_canary.yaml
  • spark/package.mill
  • cloud_aws/package.mill
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-01-15T21:00:35.574Z
Learnt from: david-zlai
Repo: zipline-ai/chronon PR: 222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
📚 Learning: 2025-09-26T01:29:32.984Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1182
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala:169-191
Timestamp: 2025-09-26T01:29:32.984Z
Learning: In BigTableMetricsKvStore, schema keys (ending with "#schema") are always queried alone in multiGet requests and never mixed with regular metric keys, so there's no concern about timestamp filter leakage between different key types within the same query.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-04-21T15:10:40.819Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-09-24T00:22:38.257Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1169
File: spark/src/main/scala/ai/chronon/spark/batch/iceberg/IcebergPartitionStatsExtractor.scala:79-94
Timestamp: 2025-09-24T00:22:38.257Z
Learning: BigTableMetricsKvStore defaults tsMillis to System.currentTimeMillis() when not provided in PutRequest, so PutRequests without explicit timestamps are handled correctly by the KV store implementation.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2024-10-17T01:02:40.431Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In `DynamoDBKVStoreImpl.scala`, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic `Exception` type in methods like `extractListValues`.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-05-19T17:50:44.503Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 789
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala:531-533
Timestamp: 2025-05-19T17:50:44.503Z
Learning: Scala 2.12 doesn't have `toIntOption`. For safely parsing strings to integers in Scala 2.12, use `scala.util.Try(string.toInt).getOrElse(...)` or check with regex using `string.matches("\\d+")` before calling `toInt`.

Applied to files:

  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
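
The two fallbacks named in this learning can be sketched as follows. This is a minimal illustration, assuming a hypothetical `SafeParse` helper object that is not part of the Chronon codebase; it uses only standard-library calls that exist in both Scala 2.12 and 2.13.

```scala
import scala.util.Try

object SafeParse {
  // Try-based parse: any NumberFormatException falls back to the default.
  def toIntOrElse(s: String, default: Int): Int =
    Try(s.toInt).getOrElse(default)

  // Regex pre-check variant: only plain digit strings are parsed.
  // The Try still guards against overflow on very long digit strings.
  def digitsToInt(s: String): Option[Int] =
    if (s.matches("\\d+")) Try(s.toInt).toOption else None
}
```

The regex variant rejects signed input such as "-3", so it suits cases where only non-negative values are expected.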
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Applied to files:

  • online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
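
A rough sketch of the non-blocking style this learning recommends. The `AsyncUpload` object and its `put` method are hypothetical stand-ins, not the real `SummaryUploader` or KV store API; the point is only that results are composed with `Future.sequence` and `map` rather than awaited.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AsyncUpload {
  // Hypothetical stand-in for a KV store write; succeeds for non-empty keys.
  def put(key: String)(implicit ec: ExecutionContext): Future[Boolean] =
    Future(key.nonEmpty)

  // Compose instead of blocking: the caller receives a Future with the
  // number of successful writes once every put has completed.
  def uploadAll(keys: Seq[String])(implicit ec: ExecutionContext): Future[Int] =
    Future.sequence(keys.map(put(_))).map(_.count(identity))
}
```

Blocking with `Await.result` then stays confined to test code or the outermost entry point.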
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.

Applied to files:

  • spark/src/main/scala/ai/chronon/spark/Join.scala
🧬 Code graph analysis (8)
online/src/main/scala/ai/chronon/online/TileCodec.scala (3)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • unWindowed (301-319)
  • unpack (275-297)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
api/src/main/scala/ai/chronon/api/DataType.scala (1)
  • unpack (235-235)
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2)
spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (3)
  • StagingQuery (16-94)
  • StagingQuery (114-183)
  • from (152-164)
api/src/main/scala/ai/chronon/api/Extensions.scala (3)
  • from (327-369)
  • setups (574-582)
  • setups (1127-1131)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  • online (149-149)
  • online (177-181)
  • Fetcher (55-147)
  • Fetcher (152-760)
  • Response (68-68)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
  • fetch (30-34)
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (2)
  • ScalaJavaConversions (5-96)
  • IteratorOps (50-54)
online/src/main/scala/ai/chronon/online/Api.scala (2)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
  • fetch (30-34)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  • fetch (48-53)
  • fetch (57-61)
  • fetch (65-71)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  • toSeq (29-29)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
  • multiGet (108-118)
  • multiPut (308-359)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
  • multiGet (115-156)
  • multiPut (199-219)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (36-53)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)
online/src/main/scala/ai/chronon/online/Api.scala (1)
  • KVStore (36-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
  • GitHub Check: build
  • GitHub Check: python_tests
  • GitHub Check: python_lint
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
🔇 Additional comments (48)
.github/workflows/push_to_canary.yaml (3)

15-15: Verify SCALA_VERSION aligns with PR intent.

PR title indicates Scala 2.13 migration, but SCALA_VERSION is hardcoded to 2.12.18. Confirm whether this is intentional (e.g., maintaining backward compatibility as default) or if it should be updated to a 2.13.x version.


84-88: Versioned Mill targets are properly configured.

The escape sequences for ${{ env.SCALA_VERSION }} are correct, and all five modules (flink, cloud_gcp, cloud_aws, service, flink_connectors) consistently use the versioned target syntax.


94-94: Artifact upload paths consistently use versioned directory structure.

All five artifact uploads have been updated to include the ${{ env.SCALA_VERSION }} subdirectory in the path. The versioned paths are uniform across flink, cloud_aws, cloud_gcp, service, and flink_connectors modules.

Also applies to: 100-100, 106-106, 112-112, 118-118

.github/workflows/reusable_spark_tests.yaml (2)

1-12: Reusable workflow refactoring looks solid.

Clean conversion to support dynamic Scala version selection. Input parameter is properly required and documented; environment variable centralizes version management.


35-35: All test jobs consistently updated to use versioned mill selector.

Each job now parameterizes the Scala version via spark[${{ env.SCALA_VERSION }}], enabling cross-version testing. Bracket escaping is correct.

Also applies to: 57-57, 79-79, 101-101, 123-123, 145-145, 167-167, 189-189, 211-211, 233-233

online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1)

215-221: GenericData.Array wildcard pattern is appropriate

Matching on GenericData.Array[_] correctly generalizes the element type, avoiding over-specific type params while preserving the existing iteration behavior.

spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2)

34-35: Import cleanup aligns with scala.Seq usage.

Dropping scala.collection.Seq in favor of scala.collection.mutable keeps Seq as immutable scala.Seq, matching the rest of the file and the 2.13 migration direction.


547-566: Correctly convert mutable buffer to immutable Seq for exception.

Using inconsistentFields.toSeq matches the IncompatibleSchemaException signature (Seq[(String, DataType, DataType)]) and avoids leaking a mutable buffer.

spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1)

31-31: Import cleanup matches 2.13 collections usage

Using scala.collection.{immutable, mutable} and dropping scala.collection.Seq is correct here: immutable.Seq/mutable.ListBuffer stay explicit, and unqualified Seq resolves to the scala.Seq alias (defined in the scala package object, not Predef) without changing behavior.

.github/workflows/reusable_non_spark_tests.yaml (1)

1-12: ✓ Workflow call pattern correctly structured.

The conversion to workflow_call with parametrized scala-version input is well-designed for cross-version testing. Naming and structure follow GitHub Actions conventions.

spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1)

29-29: LGTM!

Correct Scala 2.13 migration: removing the explicit scala.collection.Seq import lets unqualified Seq resolve to the scala.Seq alias, which points to immutable.Seq in 2.13 and to scala.collection.Seq in 2.12.

spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1)

39-44: Redundant Seq import removal looks safe

Dropping the explicit scala.collection.Seq import means Seq now resolves to the scala.Seq alias (immutable.Seq in 2.13), but all usages here are test-local and use immutable collections anyway, so behavior and test semantics remain unchanged while aligning with the wider 2.13 cleanup.

flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1)

22-22: Import cleanup LGTM

Removing the unused scala.collection.Seq import here (leaving only the blank separator line) keeps the test imports minimal and has no behavioral impact.

flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1)

20-20: Seq import removal is appropriate

Keeping JavaConverters for .asScala while dropping the redundant scala.collection.Seq import matches the collection-normalization effort and doesn’t affect behavior.

aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)

22-22: IteratorOps import is appropriate here

Using ScalaJavaConversions.IteratorOps to wrap java.util.Iterator → Scala Iterator is consistent with the shared conversion helpers and keeps 2.13 Java/Scala interop localized. Looks good.

spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)

66-66: LGTM – explicit .toSeq conversion.

The .asScala.toSeq ensures an immutable Seq[String], which is correct for Scala 2.13 compatibility.
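
The conversion pattern praised here can be sketched in isolation. The `Interop.keyColumns` helper is hypothetical; the example imports `scala.collection.JavaConverters._` because that import compiles on both 2.12 and 2.13 (it is deprecated on 2.13 only).

```scala
import java.util.{List => JList}
import scala.collection.JavaConverters._ // deprecated in 2.13, but compiles on 2.12 and 2.13

object Interop {
  // `.asScala` wraps the Java list as a mutable Buffer view;
  // `.toSeq` converts that view to whatever scala.Seq means for the
  // Scala version in use (immutable.Seq on 2.13).
  def keyColumns(javaKeys: JList[String]): Seq[String] =
    javaKeys.asScala.toSeq
}
```

Without the `.toSeq`, the 2.13 compiler rejects the method because a mutable `Buffer` is not a `scala.Seq` there.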

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3)

19-19: LGTM: Import simplification for Scala 2.13.

Removing the explicit Seq import aligns with cross-version compatibility, as the code now relies on the default scala.Seq alias and uses explicit .toSeq conversions where needed.


147-147: LGTM: Explicit Seq conversion for cross-version compatibility.

The .toSeq ensures the collection is explicitly converted to Seq[KVStore.TimedValue], which is essential for Scala 2.13 compatibility.


236-236: LGTM: Explicit Seq conversion for cross-version compatibility.

The .toSeq ensures the collection is explicitly converted to Seq[KVStore.ListValue], which is essential for Scala 2.13 compatibility.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)

104-107: RuntimeConfig version bump looks safe

Only the Dataproc runtime version string changes; no other logic around RuntimeConfig is modified. From this file’s perspective, the change is isolated and looks fine.

spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)

11-11: Alias import clarifies which StagingQuery is used

Good call aliasing ai.chronon.spark.batch.StagingQuery so the API StagingQuery name stays unambiguous in this file.

spark/src/main/scala/ai/chronon/spark/Join.scala (4)

34-34: LGTM! Import cleanup.

Removed unused Seq from mutable import, aligning with explicit scala.Seq usage elsewhere.


149-150: LGTM! Explicit Seq conversion.

The .toSeq conversion ensures type consistency for Scala 2.13.


208-208: LGTM! Consistent Seq conversion.

Explicit .toSeq ensures the return type matches the method signature.


329-329: LGTM! Java collection conversion.

The list.asScala.toSeq pattern correctly converts Java collections to Scala Seq.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)

51-76: LGTM! API signature updates.

Method signatures correctly updated to use scala.Seq for Scala 2.13 compatibility. The .toSeq conversion at line 72 ensures proper type alignment.


89-100: LGTM! Consistent signature update.

multiPut signature updated to match the pattern established in multiGet.

online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)

16-37: LGTM! KVStore signature alignment.

Method signatures updated to use scala.Seq, consistent with the KVStore trait and other implementations.

api/src/main/scala/ai/chronon/api/DataType.scala (1)

234-261: LGTM! StructType now Seq-compatible.

The signature updates and new methods (iterator, stringPrefix, typeOf, castArr, cast) properly extend StructType to behave as a Seq[StructField], improving ergonomics while maintaining backwards compatibility.
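
To illustrate what "behaves as a Seq[StructField]" involves, here is a simplified sketch. `Field` and `MiniStruct` are hypothetical stand-ins, not the actual `StructField`/`StructType` definitions in DataType.scala; the sketch only shows that implementing `apply`, `length` and `iterator` is enough to extend `scala.Seq` on both 2.12 and 2.13.

```scala
object StructSketch {
  // Simplified stand-in: the real field type carries a DataType, not a String.
  final case class Field(name: String, typeName: String)

  class MiniStruct(val name: String, fields: Array[Field]) extends scala.Seq[Field] {
    // The three abstract members Seq requires on both Scala versions.
    override def apply(idx: Int): Field = fields(idx)
    override def length: Int = fields.length
    override def iterator: Iterator[Field] = fields.iterator

    // Lookup by field name, in the spirit of the typeOf helper mentioned above.
    def typeOf(fieldName: String): Option[String] =
      fields.find(_.name == fieldName).map(_.typeName)
  }
}
```

Callers can then use ordinary collection operations such as `map` or `filter` directly on the struct.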

online/src/main/scala/ai/chronon/online/Api.scala (1)

208-211: LGTM! ExternalSourceHandler trait migration.

The shift from abstract class to trait with scala.Seq signatures aligns with Scala 2.13 patterns and maintains Java interop (traits compile to interfaces).

online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1)

34-56: LGTM! Java interop update.

Correctly changed from extends to implements since ExternalSourceHandler is now a Scala trait (compiles to Java interface).

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)

22-34: LGTM! Signature alignment.

Import cleanup and fetch signature updated to use scala.Seq, consistent with the ExternalSourceHandler trait changes.

spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)

19-27: LGTM! Import consolidation.

Grouped imports for better readability.


47-72: LGTM! Mock handler migration.

All mock external handlers correctly updated to extend the trait-based ExternalSourceHandler with scala.Seq signatures. The implementations properly return Future[scala.Seq[Fetcher.Response]].

online/src/main/scala/ai/chronon/online/TileCodec.scala (2)

32-32: LGTM—Seq import cleanup aligns with Scala 2.13 defaults.

Removing the explicit Seq import lets the code use the scala.Seq alias, which points to scala.collection.immutable.Seq in Scala 2.13.


40-40: LGTM—explicit .toSeq ensures proper materialization.

The .toSeq calls guarantee a concrete Seq type after flatMap, avoiding Scala 2.13 collection wrapper issues.

Also applies to: 49-49

.github/workflows/test_grype.yaml (4)

27-28: Clean SCALA_VERSION environment variable setup.

Centralizing the Scala version as an env var is a solid approach for cross-build orchestration.


47-47: Versioned mill targets consistently applied.

All four module builds use the [${{ env.SCALA_VERSION }}] syntax uniformly.

Also applies to: 50-50, 53-53, 56-56


62-63: Versioned artifact paths are consistent.

All Mill output paths follow the pattern out/<module>/${{ env.SCALA_VERSION }}/assembly.dest/out.jar and are mirrored in uploads and scans. Paths properly flow through artifact upload → download → Grype scan stages.

Also applies to: 82-82, 88-88, 94-94, 100-100


52-56: Original concern about Flink JARs is incorrect; actual issue is missing cloud_aws build.

Verification confirms the Dockerfile does not reference Flink JARs, so the Flink builds at lines 52–56 are unused by the Docker image. However, the Dockerfile expects build_output/cloud_aws_lib_deploy.jar (for CLOUD_AWS_JAR_PATH), but the workflow has no build step for cloud_aws. Add the missing build and copy step:

      - name: Build Cloud AWS JAR
        run: ./mill cloud_aws\[${{ env.SCALA_VERSION }}\].assembly
      
      ...then in "Prepare build artifacts":
      cp out/cloud_aws/${{ env.SCALA_VERSION }}/assembly.dest/out.jar build_output/cloud_aws_lib_deploy.jar
⛔ Skipped due to learnings
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
online/src/main/java/ai/chronon/online/JavaFetcher.java (2)

97-120: Builder section: no behavioral change.

Spacing around builder methods only; fluent API and semantics unchanged.


127-143: Response conversion path looks correct.

Creating a pre-sized ArrayList and filling it via the iterator over resps.responses() is simple and safe; no issues spotted.

spark/package.mill (1)

7-10: LGTM!

Cross-build pattern correctly implemented with Cross.Module[String] trait and crossValue wiring.
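
For readers unfamiliar with the pattern, a minimal Mill build sketch is shown below. It is an assumption-laden illustration, not the PR's actual build files: the module names `api` and `online` and the version list mirror the review, while the trait names and the single-file layout are made up (the PR splits modules into per-directory `package.mill` files and uses its own `BaseModule`).

```scala
// build.mill (illustrative sketch only)
package build

import mill._
import mill.scalalib._

val scalaVersions = Seq("2.12.18", "2.13.17")

// One module instance is created per entry in scalaVersions.
object api extends Cross[ApiModule](scalaVersions)
trait ApiModule extends Cross.Module[String] with ScalaModule {
  // crossValue is the Scala version this instance was created for.
  def scalaVersion = crossValue
}

object online extends Cross[OnlineModule](scalaVersions)
trait OnlineModule extends Cross.Module[String] with ScalaModule {
  def scalaVersion = crossValue
  // Depend on the api instance built for the same Scala version.
  def moduleDeps = Seq(api(crossValue))
}
```

With a layout like this, a single version is selected with a bracketed target such as `./mill 'online[2.13.17].compile'`, which matches the versioned targets used in the workflows above.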

service/package.mill (1)

6-8: LGTM!

Cross-version pattern correctly applied.

aggregator/package.mill (1)

6-9: LGTM!

Cross-build wiring is consistent with other modules.

online/package.mill (1)

6-9: LGTM!

Cross-version module correctly configured.

cloud_aws/package.mill (1)

2-9: Cross wiring for cloud_aws looks good.

Cross[CloudAwsModule], crossValue-based moduleDeps, and test.scalaVersion/moduleDeps are consistent; no blockers here.

Also applies to: 24-28

flink_connectors/package.mill (1)

2-12: Cross-version wiring for flink_connectors looks consistent.

Cross module, crossValue-based deps, and test.scalaVersion/moduleDeps line up with the new BaseModule/BaseTestModule pattern.

Also applies to: 26-29

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
service/package.mill (1)

51-58: Add def scalaVersion = crossValue to service test module.

The service test module (lines 51-59 in service/package.mill) uses crossValue in moduleDeps but lacks the explicit scalaVersion = crossValue definition. BaseJUnit4TestModule doesn't provide this—all other cross-compiled test modules (spark, flink, online, flink_connectors, cloud_gcp, cloud_aws, api, aggregator) explicitly set it in their test blocks. Add the missing line to maintain consistency.

🧹 Nitpick comments (1)
build.mill (1)

183-229: Add pre-flight checks for gcloud availability.

Shell commands assume gcloud is installed and authenticated. Consider verifying availability before attempting upload to provide clearer errors.

   def publishGcsAssembly() = Task.Command {
     if (!publishToGcsEnabled) {
       println(s"GCS publishing disabled for ${artifactName()}")
       ()
     } else {
+      // Verify gcloud is available
+      try {
+        os.proc("gcloud", "version").call(stdout = os.Pipe, stderr = os.Pipe)
+      } catch {
+        case e: Exception =>
+          throw new Exception("gcloud CLI not found. Install and authenticate gcloud before publishing.", e)
+      }
+
       // Get the assembly jar path
       val jarPathRef = assembly()

Also applies to: 234-280

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95972c2 and cbda7d4.

📒 Files selected for processing (11)
  • aggregator/package.mill (2 hunks)
  • api/package.mill (2 hunks)
  • build.mill (5 hunks)
  • cloud_aws/package.mill (2 hunks)
  • cloud_gcp/package.mill (3 hunks)
  • flink/package.mill (2 hunks)
  • flink_connectors/package.mill (2 hunks)
  • online/package.mill (2 hunks)
  • service/package.mill (2 hunks)
  • service_commons/package.mill (2 hunks)
  • spark/package.mill (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • aggregator/package.mill
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • spark/package.mill
  • cloud_aws/package.mill
  • cloud_gcp/package.mill
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • service_commons/package.mill
  • build.mill
  • cloud_gcp/package.mill
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: build
  • GitHub Check: python_lint
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (10)
spark/package.mill (2)

6-12: LGTM!

Cross-version module wiring is clean and consistent with the pattern used across other modules.


29-32: LGTM!

Test module properly wires scalaVersion and moduleDeps using crossValue.

cloud_gcp/package.mill (2)

7-14: LGTM!

Cross-version wiring follows established pattern.


59-62: LGTM!

Test module correctly uses crossValue for version and dependency wiring.

service/package.mill (1)

5-11: LGTM!

Cross-version module pattern is consistent.

flink/package.mill (2)

5-13: LGTM!

Cross-version wiring is clean.


43-45: LGTM!

Test module properly configured with crossValue.

online/package.mill (2)

5-11: LGTM!

Cross-version module structure is consistent with the rest of the codebase.


40-43: LGTM!

Test module wiring correct.

api/package.mill (1)

25-27: CHRONON_VERSION env var is never set in publishing workflows—Thrift artifacts will always publish as "0.0.32".

Verification found:

  • publish_release.yaml and push_to_canary.yaml set VERSION and pass ZIPLINE_VERSION to mill
  • api/package.mill lines 25-27 and 44-46 read CHRONON_VERSION (not set)
  • The Thrift publishVersion tasks fall back to hardcoded default "0.0.32"
  • Inconsistent env var naming: VERSION (workflows) vs CHRONON_VERSION (code) vs ZIPLINE_VERSION (python wheel)

Either export CHRONON_VERSION in workflows or align the env var name across all artifact types.
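
The mismatch can be reproduced with a small sketch. `PublishVersion.resolve` is a hypothetical helper that mimics the lookup described above (read `CHRONON_VERSION`, fall back to "0.0.32"); it is not the code in api/package.mill, and the environment is passed in as a map so the behavior can be checked without touching the real process environment.

```scala
object PublishVersion {
  // Default quoted in the review comment above.
  val DefaultVersion = "0.0.32"

  // Looks up CHRONON_VERSION only; any other variable name is ignored.
  def resolve(env: Map[String, String] = sys.env): String =
    env.get("CHRONON_VERSION").filter(_.nonEmpty).getOrElse(DefaultVersion)
}
```

A workflow that exports only `VERSION` therefore ends up with the default, which is the behavior the comment flags.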

Comment on lines +18 to +19
val defaultScalaVersion = "2.12.18"
val scalaVersions = Seq(defaultScalaVersion, "2.13.17")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify JavaConverters migration is complete for Scala 2.13.

Previous reviews found 41 deprecated scala.collection.JavaConverters imports. On Scala 2.13 these still compile but emit deprecation warnings; scala.jdk.CollectionConverters is the 2.13 replacement.


🏁 Script executed:

#!/bin/bash
# Check for remaining deprecated JavaConverters imports
echo "=== Deprecated JavaConverters count ==="
rg -c "scala\.collection\.JavaConverters" --type scala | wc -l

echo "=== New CollectionConverters count ==="
rg -c "scala\.jdk\.CollectionConverters" --type scala | wc -l

echo "=== Files still using JavaConverters ==="
rg "scala\.collection\.JavaConverters" --type scala -l | head -20

Length of output: 1464


Migration to scala.jdk.CollectionConverters is incomplete for Scala 2.13 support.

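Verification confirms 41 files still use the deprecated scala.collection.JavaConverters import. With Scala 2.13.17 added at line 19, these compile with deprecation warnings on 2.13 and fail the build only if warnings are treated as errors. Note that scala.jdk.CollectionConverters is not part of the 2.12 standard library, so a cross-built migration needs scala-collection-compat or version-specific sources. Occurrences by module: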

  • spark/src/main/scala/ (11 files)
  • spark/src/test/scala/ (5 files)
  • online/src/main/scala/ (6 files)
  • online/src/test/scala/ (3 files)
  • flink/src/test/scala/ (1 file)
  • other modules (15 files)

@tchow-zlai tchow-zlai added this pull request to the merge queue Nov 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 25, 2025
@tchow-zlai tchow-zlai added this pull request to the merge queue Nov 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 25, 2025
@tchow-zlai tchow-zlai enabled auto-merge November 25, 2025 03:24
@tchow-zlai tchow-zlai added this pull request to the merge queue Nov 25, 2025
Merged via the queue into main with commit 170b4a6 Nov 25, 2025
45 checks passed
@tchow-zlai tchow-zlai deleted the tchow/serverless-classpath branch November 25, 2025 03:51