Walkthrough

Normalized collection conversions and Seq usage, added StructType Seq helpers, adjusted Java↔Scala fetcher interop, refined BigTable/KVStore materialization, introduced a Mill cross-build with GCS publishing, and updated CI workflows to run versioned Mill targets and publish versioned artifacts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant WF as Reusable Workflow
    participant Mill as Mill
    participant Mod as Cross Module
    Note over GH,WF: caller provides scala-version / SCALA_VERSION
    GH->>WF: workflow_call(scala-version)
    WF->>Mill: run targets (module[SCALA_VERSION].assembly/test)
    Mill->>Mod: build/test for crossValue
    Mod-->>Mill: artifact / test results
    Mill-->>GH: upload artifacts to versioned paths (out/<module>/<SCALA_VERSION>/...)
```
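The cross-build shape behind the `module[SCALA_VERSION].assembly` targets can be sketched in Mill; the module name and Scala versions below are illustrative assumptions, not the repo's actual `build.mill`:

```scala
// build.mill sketch -- module name and version list are assumptions
package build
import mill._, scalalib._

// One cross module per library; crossValue becomes the Scala version
object online extends Cross[OnlineModule]("2.12.18", "2.13.16")
trait OnlineModule extends CrossScalaModule {
  // crossScalaVersion is derived from crossValue; version-specific
  // sources and dependencies resolve per cross instance
}

// CI then invokes versioned targets, e.g.:
//   ./mill "online[2.13.16].assembly"
//   ./mill "online[2.12.18].test"
```

Each cross instance produces its own output tree under `out/`, which is what makes the versioned artifact paths in the diagram possible.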
```mermaid
sequenceDiagram
    participant Java as Java caller
    participant JavaFetcher as JavaFetcher
    participant Trait as ExternalSourceHandler (trait)
    participant ScalaImpl as Scala fetcher impl
    Java->>JavaFetcher: call with Java List<Req>
    JavaFetcher->>Trait: convert Java List -> scala.Seq, invoke fetch
    Trait->>ScalaImpl: fetch(requests: scala.Seq[Request])
    ScalaImpl-->>Trait: Future[scala.Seq[Response]]
    Trait-->>JavaFetcher: convert scala.Seq -> Java List
    JavaFetcher-->>Java: CompletableFuture<List<Response>>
```
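The conversion hops in this diagram follow the standard `scala.jdk` interop pattern. A minimal self-contained sketch — `Request`/`Response` are stub types here, not Chronon's actual classes:

```scala
import java.util.concurrent.CompletableFuture
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.CollectionConverters._
import scala.jdk.javaapi.FutureConverters

// Stub types standing in for the fetcher's request/response classes
case class Request(name: String)
case class Response(values: Seq[String])

// Scala-side handler: works purely in scala.Seq
def fetch(requests: Seq[Request])(implicit ec: ExecutionContext): Future[Seq[Response]] =
  Future.successful(requests.map(r => Response(Seq(r.name))))

// Java-facing wrapper: java.util.List in, CompletableFuture[java.util.List] out
def fetchJava(requests: java.util.List[Request])(implicit
    ec: ExecutionContext): CompletableFuture[java.util.List[Response]] =
  FutureConverters
    .toJava(fetch(requests.asScala.toSeq).map(_.asJava))
    .toCompletableFuture
```

Note the `.toSeq` after `asScala`: on Scala 2.13, `asScala` yields a `mutable.Buffer`, which no longer satisfies `scala.Seq` (now an alias for `immutable.Seq`) — the same cross-version concern driving the `Seq` normalization in this PR.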
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
📚 Learning: 2024-12-03T04:04:33.809Z — Applied to files:
📚 Learning: 2025-02-22T20:30:28.381Z — Applied to files:
📚 Learning: 2024-10-17T01:09:24.653Z — Applied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
🔇 Additional comments (3)
Branch updated from 0b5d10e to 1015074 (compare)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/src/main/scala/ai/chronon/api/DataType.scala (1)
244-261: `cast()` method lacks test coverage; `castArr()` is well-tested.

The `castArr` method has solid coverage in `AvroConversionsListTest.scala`, with tests for mixed types, empty lists, and nulls. However, the `cast` method (returning `Map[String, AnyRef]`) has no dedicated tests. Add tests for `cast` covering missing keys, null values, and type conversions via `DataType.castTo`.
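A test along the suggested lines might look like the sketch below. The exact signature of `cast` is an assumption, and `castTo` here is a toy stand-in for `DataType.castTo`, kept only so the cases are runnable in isolation:

```scala
// Illustrative only: castTo is a stand-in, not Chronon's DataType.castTo
def castTo(value: AnyRef, dataType: String): AnyRef = (value, dataType) match {
  case (null, _)           => null
  case (s: String, "long") => java.lang.Long.valueOf(s.toLong)
  case (v, _)              => v
}

// Assumed shape of cast: build a row keyed by the schema, casting each value
def cast(row: Map[String, AnyRef], schema: Map[String, String]): Map[String, AnyRef] =
  schema.map { case (key, dt) => key -> castTo(row.getOrElse(key, null), dt) }

// The three cases the review asks to cover:
assert(cast(Map.empty, Map("a" -> "long"))("a") == null)          // missing key
assert(cast(Map("a" -> null), Map("a" -> "long"))("a") == null)   // null value
assert(cast(Map("a" -> "42"), Map("a" -> "long"))("a") == 42L)    // type conversion
```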
🧹 Nitpick comments (6)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (1)
27-27: Logger declared but never used.

The logger field and its import are unused. Remove them, or add log statements if logging was intended.

```diff
-import org.slf4j.LoggerFactory
-
-  @transient private lazy val logger = LoggerFactory.getLogger(getClass)
```

Also applies to: 47-48
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2)
113-113: Hardcoded cluster path.

The Dataproc cluster path is hardcoded. If this config is enabled in the future, make it configurable via constructor parameters or environment variables.

133-133: Commented-out code.

Either enable the code, remove it, or document with a TODO why it is disabled.
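One way to make such a value configurable — a constructor parameter falling back to an environment variable, then a default — is sketched below. The class and variable names are hypothetical, not the actual Dataproc submitter's API:

```scala
// Hypothetical sketch: configurable value with constructor override + env fallback
class ServerlessSubmitter(
    clusterPath: Option[String] = None,
    env: Map[String, String] = sys.env // injectable for tests
) {
  private val defaultClusterPath =
    "projects/my-project/regions/us-central1/clusters/default"

  // Precedence: explicit constructor arg > env var > hardcoded default
  val resolvedClusterPath: String =
    clusterPath
      .orElse(env.get("DATAPROC_CLUSTER_PATH"))
      .getOrElse(defaultClusterPath)
}
```

Passing the environment as a constructor default keeps the resolution order testable without mutating real process state.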
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
68-72: Consider using logger instead of println.

`DfWithStats.prunePartitions` logs via `println`; switching to the existing logger for this diagnostic would keep logging consistent and easier to control in production.
api/src/main/scala/ai/chronon/api/DataType.scala (1)
240-242: Add tests for `typeOf` method.

Verification confirms `typeOf` is a public method actively used in `OnlineDerivationUtil.scala` (lines 120, 125) but has no test coverage. Consider adding tests to the DataType test suite.

spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (1)
150-150: Consider adding `.toSeq` for consistency.

While `mutable.Buffer` is a valid `Seq`, adding `.toSeq` here would match the pattern used at line 189 and provide an immutable collection.

```diff
- val sources = groupByConf.sources.toScala
+ val sources = groupByConf.sources.toScala.toSeq
```
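The distinction matters more under the 2.13 cross-build this PR introduces, where `scala.Seq` aliases `immutable.Seq`. A quick self-contained illustration of what `.toSeq` buys:

```scala
import scala.collection.mutable

val buf = mutable.Buffer(1, 2, 3)

// On 2.13+, toSeq copies into an immutable Seq
val snapshot: scala.collection.immutable.Seq[Int] = buf.toSeq

buf += 4 // later mutation does not leak into the snapshot
assert(snapshot == Seq(1, 2, 3))
assert(buf.size == 4)
```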
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (102)
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
- api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (1 hunks)
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1 hunks)
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
- build.mill (1 hunks)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
- cloud_gcp/package.mill (2 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
- online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
- online/src/main/java/ai/chronon/online/JavaFetcher.java (1 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (43)
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
- api/src/main/scala/ai/chronon/api/Extensions.scala
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
- api/src/main/scala/ai/chronon/api/Builders.scala
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
🧰 Additional context used
🧠 Learnings (26)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
- build.mill
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- api/src/main/scala/ai/chronon/api/QueryUtils.scala
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- cloud_gcp/package.mill
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/main/scala/ai/chronon/spark/Extensions.scala
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- online/src/main/java/ai/chronon/online/JavaFetcher.java
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
- online/src/main/scala/ai/chronon/online/Api.scala
- online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
- online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
- spark/src/main/scala/ai/chronon/spark/Join.scala
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
- spark/src/main/scala/ai/chronon/spark/Extensions.scala
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
- online/src/main/scala/ai/chronon/online/Api.scala
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
- spark/src/main/scala/ai/chronon/spark/Join.scala
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
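The `grouped` behavior this learning describes is easy to see with plain collections — on an `Iterator`, `grouped` is itself lazy and only pulls elements as batches are requested:

```scala
// Count how many source elements have actually been produced
var pulled = 0
val records = Iterator.tabulate(10) { i => pulled += 1; i }

val batches = records.grouped(4) // lazy: nothing consumed yet
assert(pulled == 0)

val first = batches.next()       // pulls exactly one batch of 4
assert(first == Seq(0, 1, 2, 3))
assert(pulled == 4)
```

This is why batch uploads can stream record groups without materializing the whole collection in memory.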
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/Driver.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Applied to files:
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
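A minimal illustration of the pattern this learning describes — referencing `ExecutionContext.Implicits.global` at call time rather than capturing an implicit `ExecutionContext` as a field, so the class stays cleanly serializable (`UploadTask` is a hypothetical name, not a Chronon class):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Serializable task: no ExecutionContext field to drag through serialization
class UploadTask(val payload: Seq[Int]) extends Serializable {
  def run(): Future[Int] = {
    // Resolved at call time on the executor, not captured at construction
    import scala.concurrent.ExecutionContext.Implicits.global
    Future(payload.sum)
  }
}

// Await used here only to observe the result in a demo
val result = Await.result(new UploadTask(Seq(1, 2, 3)).run(), 5.seconds)
assert(result == 6)
```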
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
- spark/src/main/scala/ai/chronon/spark/Driver.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- spark/src/main/scala/ai/chronon/spark/Join.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.
Applied to files:
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
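The non-blocking alternative this learning recommends is to compose `Future`s and hand the caller a single future, rather than blocking with `Await.result` at each step; a sketch with a toy `writeBatch`:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Toy stand-in for an async write returning a row count
def writeBatch(batch: Seq[Int]): Future[Int] = Future(batch.sum)

// Blocking (discouraged):   Await.result(writeBatch(batch), timeout) per batch
// Non-blocking: compose all batch writes into one Future and let the
// caller decide when -- or whether -- to block
val total: Future[Int] =
  Future.traverse(Seq(Seq(1, 2), Seq(3, 4)))(writeBatch).map(_.sum)

total.foreach(sum => println(s"wrote $sum rows"))
```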
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2025-09-26T01:29:32.984Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1182
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala:169-191
Timestamp: 2025-09-26T01:29:32.984Z
Learning: In BigTableMetricsKvStore, schema keys (ending with "#schema") are always queried alone in multiGet requests and never mixed with regular metric keys, so there's no concern about timestamp filter leakage between different key types within the same query.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-09-24T00:22:38.257Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1169
File: spark/src/main/scala/ai/chronon/spark/batch/iceberg/IcebergPartitionStatsExtractor.scala:79-94
Timestamp: 2025-09-24T00:22:38.257Z
Learning: BigTableMetricsKvStore defaults tsMillis to System.currentTimeMillis() when not provided in PutRequest, so PutRequests without explicit timestamps are handled correctly by the KV store implementation.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-04-21T15:10:40.819Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.
Applied to files:
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-17T01:02:40.431Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In `DynamoDBKVStoreImpl.scala`, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic `Exception` type in methods like `extractListValues`.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Applied to files:
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-07T15:21:50.787Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-07T15:21:50.787Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/Driver.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
spark/src/main/scala/ai/chronon/spark/Driver.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2025-05-02T16:19:11.001Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.
Applied to files:
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
online/src/main/java/ai/chronon/online/JavaFetcher.java
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.
Applied to files:
flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
📚 Learning: 2024-10-31T18:27:44.973Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.
Applied to files:
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
spark/src/main/scala/ai/chronon/spark/Driver.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
📚 Learning: 2024-10-15T15:33:22.265Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Applied to files:
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
🧬 Code graph analysis (17)
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
toSeq(29-29)
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
toSeq(29-29)
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (1)
flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (1)
ProjectedEvent(95-95)
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)
spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (4)
StagingQuery(16-94)
StagingQuery(114-183)
from(152-164)
compute(84-93)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
fetch(30-34)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)
Fetcher(55-147)
Fetcher(152-685)
online/src/main/scala/ai/chronon/online/TileCodec.scala (3)
spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1)
buildRowAggregator(134-153)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
unWindowed(283-301)
unpack(257-279)
api/src/main/scala/ai/chronon/api/DataType.scala (1)
unpack(235-235)
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
query(395-403)
timeWheres(725-730)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
multiGet(108-118)
multiPut(308-359)
online/src/main/scala/ai/chronon/online/Api.scala (1)
KVStore(36-53)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
applyMapping(796-799)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
toSeq(29-29)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
multiGet(108-118)
multiPut(308-359)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
multiGet(115-156)
multiPut(199-219)
online/src/main/scala/ai/chronon/online/Api.scala (1)
KVStore(36-53)
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
runTimestampChecks(459-488)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
inputDf(479-518)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
table(439-439)
rightToLeft(838-848)
spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
keyColumns(1004-1009)
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (1)
api/src/main/scala/ai/chronon/api/ColorPrinter.scala (1)
red(17-17)
spark/src/main/scala/ai/chronon/spark/Driver.scala (2)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
run(541-569)
spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (1)
run(28-80)
run(28-80)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
JoinUtils(38-431)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
api/src/main/scala/ai/chronon/api/Row.scala (1)
getAs(34-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: flink_tests
- GitHub Check: online_tests
- GitHub Check: batch_tests
- GitHub Check: stats_tests
- GitHub Check: udafs_tests
- GitHub Check: streaming_tests
- GitHub Check: kv_store_tests
- GitHub Check: join_tests
- GitHub Check: groupby_tests
- GitHub Check: fetcher_tests
- GitHub Check: spark_tests
- GitHub Check: analyzer_tests
- GitHub Check: build
- GitHub Check: enforce_triggered_workflows
def fromJoinSources(sources: java.util.List[api.Source]): Seq[TableDependency] = {
  Option(sources)
    .map(_.asScala)
    .map(_.toScala.toSeq)
🛠️ Refactor suggestion | 🟠 Major
Inconsistent conversion pattern.
Line 13 uses .asScala.toSeq for the same Java→Scala List conversion. Update line 13 to match this pattern for consistency.
#!/bin/bash
# Verify line 13 usage
rg -n "tableDependencies.*asScala" api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
🤖 Prompt for AI Agents
In api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala around line
131 (and also update the earlier occurrence at line 13), the Java→Scala List
conversion uses .toScala.toSeq which is inconsistent with the .asScala.toSeq
pattern used elsewhere; change the .toScala.toSeq call(s) to .asScala.toSeq so
both conversions use the same .asScala.toSeq pattern for consistency.
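As a standalone illustration of the `.asScala.toSeq` pattern recommended above (hypothetical source names; assumes Scala 2.13, where the converters live in `scala.jdk.CollectionConverters`):

```scala
import scala.jdk.CollectionConverters._

object ConversionPatternDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical Java list standing in for java.util.List[api.Source].
    val javaSources = java.util.Arrays.asList("events.source_a", "events.source_b")
    // .asScala yields a mutable Buffer view over the Java list; .toSeq copies
    // it into an immutable Seq so later mutation of the Java list cannot leak.
    val deps: Seq[String] =
      Option(javaSources).map(_.asScala.toSeq).getOrElse(Seq.empty)
    assert(deps == Seq("events.source_a", "events.source_b"))
    println(deps.mkString(", "))
  }
}
```

Using one idiom (`.asScala.toSeq`) everywhere keeps the immutability guarantee uniform across call sites.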
build.mill (outdated)
// val scalaVersion = "2.12.18"
val scalaVersion = "2.13.17"
🧩 Analysis chain
Verify Scala 2.13.17 upgrade thoroughly.
This is a major version upgrade from 2.12.18 to 2.13.17 with significant breaking changes, particularly in the collections library. The PR checklist shows no tests, CI coverage, integration testing, or documentation updates.
🏁 Script executed:
#!/bin/bash
# Verify that the build succeeds with Scala 2.13 and all tests pass
echo "=== Checking if build compiles with Scala 2.13.17 ==="
mill all __.compile
echo "=== Checking if tests pass ==="
mill all __.test
echo "=== Checking for binary compatibility issues ==="
# Look for common Scala 2.13 migration issues
rg -n "JavaConverters" --type scala | head -20
rg -n "scala.collection.Seq" --type scala | head -20
Length of output: 2504
🏁 Script executed:
#!/bin/bash
# Detailed analysis of Scala 2.13 migration status
echo "=== Count of deprecated JavaConverters imports ==="
rg -c "scala.collection.JavaConverters" --type scala | wc -l
rg "scala.collection.JavaConverters" --type scala | wc -l
echo ""
echo "=== Check for new scala.jdk.CollectionConverters usage ==="
rg "scala.jdk.CollectionConverters" --type scala | wc -l
echo ""
echo "=== Affected files count ==="
rg "scala.collection.JavaConverters" --type scala --files-with-matches | wc -l
echo ""
echo "=== Check build.mill for scala version lock-in ==="
cat build.mill | grep -A 20 "scalaVersion"
Length of output: 2244
🏁 Script executed:
#!/bin/bash
# Search for Scala 2.13 migration guide info
# Verify the breaking change status
echo "Scala 2.13 collections migration is a known breaking change"
Length of output: 59
Migrate all deprecated JavaConverters imports before upgrading to Scala 2.13.17.
Found 42 scala.collection.JavaConverters imports across 41 files that must be updated to scala.jdk.CollectionConverters. While 33 files have already been partially migrated, the remaining imports will cause compilation failures with the -deprecation flag. Complete the migration before merging this version upgrade.
Affected files include: spark/src/main/scala/ai/chronon/spark/Driver.scala, spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala, spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala, spark/src/main/scala/ai/chronon/spark/JoinBase.scala, spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala, online/src/main/scala/ai/chronon/online/TileCodec.scala, online/src/main/scala/ai/chronon/online/TopicChecker.scala, online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala, and others.
🤖 Prompt for AI Agents
In build.mill around lines 17-18, you're about to upgrade Scala to 2.13.17 but
there are 42 imports of scala.collection.JavaConverters across ~41 files that
must be migrated to scala.jdk.CollectionConverters; update each import
(including wildcard forms) to import scala.jdk.CollectionConverters._, keep
using .asScala/.asJava extension methods as-is, ensure any partially-migrated
files now use the new import consistently, run a project-wide search-and-replace
for "scala.collection.JavaConverters" → "scala.jdk.CollectionConverters" and
verify compilation with the -deprecation flag so no remaining deprecated imports
remain.
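The migration itself is mechanical; a minimal before/after sketch (assumes Scala 2.13, where both `.asScala` and `.asJava` come from `scala.jdk.CollectionConverters`; for a 2.12/2.13 cross-build like this PR's, the `scala-collection-compat` library backports the same import to 2.12):

```scala
// Scala 2.12 code typically imported the now-deprecated converters:
//   import scala.collection.JavaConverters._
// On 2.13 the same .asScala / .asJava extension methods live here:
import scala.jdk.CollectionConverters._

object ConvertersMigrationDemo {
  def main(args: Array[String]): Unit = {
    val javaMap = new java.util.HashMap[String, Int]()
    javaMap.put("a", 1)
    // Java -> Scala: asScala gives a mutable wrapper, toMap makes it immutable.
    val scalaMap: Map[String, Int] = javaMap.asScala.toMap
    // Scala -> Java: asJava wraps without copying.
    val backToJava: java.util.List[String] = List("x", "y").asJava
    assert(scalaMap == Map("a" -> 1))
    assert(backToJava.size == 2)
    println("migration ok")
  }
}
```

Because the extension-method names are unchanged, a project-wide rewrite of the import line is usually the only edit each file needs.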
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (outdated; resolved)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (outdated; resolved)
Actionable comments posted: 0
🧹 Nitpick comments (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)
108-117: Parameterize hardcoded cluster reference.

Line 114 hardcodes the project ID and cluster name. Use the class fields `projectId` and `region` instead, or make the cluster configurable.

-// .setDataprocCluster("projects/canary-443022/regions/us-central1/clusters/zipline-canary-cluster")
+// .setDataprocCluster(s"projects/$projectId/regions/$region/clusters/${sparkHistoryClusterName}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: build
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: analyzer_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: api_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: online_tests
- GitHub Check: streaming_tests
- GitHub Check: fetcher_tests
- GitHub Check: stats_tests
- GitHub Check: aggregator_tests
- GitHub Check: kv_store_tests
- GitHub Check: groupby_tests
- GitHub Check: udafs_tests
- GitHub Check: flink_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: batch_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2)
134-134: Consistent scaffolding approach.

Appropriately commented to match the peripheral config definition above.
106-106: RuntimeConfig version "2.3" is valid and supports Scala 2.13.

Version 2.3 is GA and includes Scala 2.13. The change at line 106 is correct.
piyush-zlai
left a comment
LGTM - can stamp when we've got the dual build + publish going
-.setVersion("1.2")
+.setVersion("2.3")

// Add PeripheralsConfig if needed. This should be set in EnvironmentConfig below.
It's useful as a reference; it's not well documented, so I had to reverse-engineer this from CLI behavior.
Actionable comments posted: 0
♻️ Duplicate comments (3)
api/package.mill (1)
7-9: Same double-cross pattern for api tests

Here `api` is already `Cross[ApiModule](scalaVersions)`, and `object test` is another `Cross[TestModule](scalaVersions)`; same concern as `cloud_gcp` about unnecessary extra cross combinations and potential version mismatches. If you don't need mixed pairs, collapsing to a single Scala axis for tests would be simpler.

Also applies to: 47-51
online/package.mill (1)
38-41: online test is also double-crossed

The `online` module is `Cross[OnlineModule](scalaVersions)` and `object test` is `Cross[TestModule](scalaVersions)`, mirroring the double-cross test pattern noted for `cloud_gcp`/`api`. Simplifying to a single Scala cross-axis for tests would avoid redundant combinations unless you explicitly need mixed-version pairs.

service/package.mill (1)
7-9: Redundant `Cross.Module[String]` on ServiceModule

As with `OnlineModule`, `ServiceModule` inherits `Cross.Module[String]` even though `build.BaseModule` already extends `CrossScalaModule`; you can likely drop `Cross.Module[String]` here and rely on the CrossScalaModule machinery alone.
🧹 Nitpick comments (6)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)
64-72: Misleading name and duplicate implementation.

`JavaPlusOneExternalHandler` no longer uses Java types and duplicates `PlusOneExternalHandler` exactly. Consider renaming or removing it if not needed for distinct test scenarios.

online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
22-22: Unify `Seq` usage for clarity

`ContextualHandler.fetch` uses `scala.Seq` while `fetchRequests`/`responsesByNameF` use bare `Seq`; all compile, but it's a bit inconsistent. Consider standardizing on `scala.Seq` (or a single imported alias) throughout this file for readability and future maintenance.

Also applies to: 30-30
online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1)
34-56: Re-add `@Override` on `fetch` for interface contract safety

Now that this is `implements ExternalSourceHandler`, annotating `fetch` with `@Override` will ensure the compiler enforces the exact trait signature going forward.

-  //java friendly method
-  public abstract CompletableFuture<java.util.List<JavaResponse>> fetchJava(java.util.List<JavaRequest> requests);
-
-  public Future<Seq<Fetcher.Response>> fetch(Seq<Fetcher.Request> requests) {
+  //java friendly method
+  public abstract CompletableFuture<java.util.List<JavaResponse>> fetchJava(java.util.List<JavaRequest> requests);
+
+  @Override
+  public Future<Seq<Fetcher.Response>> fetch(Seq<Fetcher.Request> requests) {
143-151: `convertJavaRequestList` logic is sound; naming nit only

Building a `java.util.List<Fetcher.Request>` and instrumenting off the original Java requests is correct; you might optionally rename `scalaRequests` to `requestsScalaSeq`/`scalaRequestList` to avoid confusion now that it's a Java `List`.

cloud_gcp/package.mill (1)
8-12: Avoid double-crossing cloud_gcp tests unless you really need it

`cloud_gcp` is already `Cross[CloudGcpModule](scalaVersions)`, and `object test` is a second `Cross[TestModule](scalaVersions)`, so you get 2×2 Scala-version combinations with `moduleDeps` keyed only by the inner `crossValue`. Unless you intentionally want mixed pairs (e.g. `cloud_gcp[2.12].test[2.13]` depending on `cloud_gcp(2.13)`), consider simplifying to a single axis (e.g. `object test extends TestModule`) so each Scala version is tested once.

Also applies to: 22-25, 54-79
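A single-axis layout along those lines might look like the sketch below (module and trait names mirror the PR's pattern, but the exact definitions are assumptions):

```scala
// package.mill (sketch) -- one Scala axis; each cloud_gcp[v] owns one test module.
object cloud_gcp extends Cross[CloudGcpModule](scalaVersions)

trait CloudGcpModule extends build.BaseModule {
  def moduleDeps = Seq(build.online(crossValue))

  // A plain inner `object test` (instead of a second Cross[TestModule]) shares
  // the enclosing module's crossValue, so cloud_gcp["2.13.17"].test always
  // compiles and runs against 2.13.17 -- no mixed-version pairs are generated.
  object test extends build.BaseTestModule {
    def moduleDeps = Seq(cloud_gcp(crossValue))
  }
}
```

This keeps the target graph at one test module per Scala version rather than the 2×2 grid produced by crossing both the main and test modules independently.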
online/package.mill (1)
8-9: `OnlineModule` probably doesn't need to extend `Cross.Module[String]`

Because `build.BaseModule` already extends `CrossScalaModule` (which is designed for use with `Cross[...]`), inheriting `Cross.Module[String]` again in `OnlineModule` is likely redundant; dropping it would simplify the hierarchy without changing behaviour.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
aggregator/package.mill (2 hunks)
api/package.mill (2 hunks)
build.mill (4 hunks)
cloud_aws/package.mill (2 hunks)
cloud_gcp/package.mill (3 hunks)
flink/package.mill (2 hunks)
flink_connectors/package.mill (2 hunks)
online/package.mill (2 hunks)
online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunk)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1 hunk)
online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunk)
service/package.mill (2 hunks)
service_commons/package.mill (2 hunks)
spark/package.mill (3 hunks)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- online/src/main/scala/ai/chronon/online/Api.scala
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
cloud_gcp/package.mill
cloud_aws/package.mill
spark/package.mill
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
build.mill
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
online/src/main/java/ai/chronon/online/JavaFetcher.java
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
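As a quick standalone illustration of the point in this learning (not project code), `grouped` on a collection returns an `Iterator` of batches, so batches are produced one at a time:

```scala
object GroupedDemo {
  def main(args: Array[String]): Unit = {
    val records = (1 to 10).toList
    // grouped returns an Iterator[List[Int]]; batches are yielded lazily,
    // so all batches are never held in memory at once.
    val batches: Iterator[List[Int]] = records.grouped(4)
    val sizes = batches.map(_.size).toList
    assert(sizes == List(4, 4, 2))
    println(sizes.mkString(","))
  }
}
```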
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
online/src/main/java/ai/chronon/online/JavaFetcher.java
🧬 Code graph analysis (2)
online/src/main/java/ai/chronon/online/JavaFetcher.java (3)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
fetchGroupBys(182-184)
fetchJoin(186-233)
fetchJoinV2(273-315)
FeaturesResponseType(50-53)
AvroString(80-80)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
ScalaJavaConversions(5-96)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
listJoins(166-218)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
fetch(48-53)
fetch(57-61)
fetch(65-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: streaming_tests
- GitHub Check: groupby_tests
- GitHub Check: udafs_tests
- GitHub Check: spark_tests
🔇 Additional comments (14)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
19-27: LGTM on import updates.

Import consolidation and explicit Scala 2.13 types look good.
47-54: LGTM on Scala 2.13 collection types.

Signature correctly uses `scala.Seq` per the migration.
56-62: LGTM on handler type migration.

Correctly extends `ExternalSourceHandler` and uses `scala.Seq`.

online/src/main/java/ai/chronon/online/JavaFetcher.java (3)
125-141: Manual Seq→List conversion in `convertResponsesWithTs` looks correct

Iterating `resps.responses().toList()` with a `scala.collection.Iterator` into a pre-sized `ArrayList` preserves semantics and is efficient enough; metrics wiring is unchanged.
153-182: Java↔Scala conversions for fetch methods look consistent

`fetchGroupBys`, `fetchJoin`, and `fetchJoinBase64Avro` now uniformly:

- convert `List<JavaRequest>` → `List<Fetcher.Request>` → Scala `Seq` via `ScalaJavaConversions.toScala`
- call the Scala fetcher
- map back via `convertResponsesWithTs`

Types and control flow align with the Scala `Fetcher` API.
184-188: `listJoins` conversion path is clean

Wrapping `metadataStore().listJoins` with `FutureConverters.toJava` and `ScalaJavaConversions::toJava` is a straightforward Scala `Seq` → Java `List<String>` bridge; no issues.

service_commons/package.mill (1)
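The same Seq→List and Future→CompletionStage bridging can be sketched in isolation (standalone example with made-up data; uses Scala 2.13's `scala.jdk` converters rather than the project's `ScalaJavaConversions` helpers):

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.CollectionConverters._
import scala.jdk.FutureConverters._

object ListJoinsBridgeDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  def main(args: Array[String]): Unit = {
    // Stand-in for a Scala API returning Future[Seq[String]] (hypothetical data).
    val listJoins: Future[Seq[String]] = Future.successful(Seq("join_a", "join_b"))
    // Bridge to Java: Seq -> java.util.List, then Future -> CompletionStage.
    val javaFuture: java.util.concurrent.CompletionStage[java.util.List[String]] =
      listJoins.map(_.asJava).asJava
    val result = javaFuture.toCompletableFuture.join()
    assert(result.size() == 2)
    assert(result.get(0) == "join_a")
    println(result)
  }
}
```

Converting the collection inside `map` before bridging the `Future` keeps the Java caller entirely free of Scala types.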
6-9: Crossed service_commons wiring looks good

`Cross[ServiceCommonsModule](scalaVersions)` and the crossValue-based `moduleDeps` for main and test modules are consistent and look correct.

Also applies to: 21-23
build.mill (1)
1-1: Scala cross-version setup looks appropriate

Using `Constants.scalaVersions = Seq("2.12.18", "2.13.17")` together with `BaseModule`/`BaseTestModule` extending `CrossScalaModule` is a sound basis for the new `Cross[...]` modules; just ensure the full cross build (`__.compile`/`__.test` across both versions) passes in CI before merging.

Also applies to: 17-21, 115-115, 147-148
service/package.mill (1)
48-55: Service JUnit4 test wiring looks fine

The JUnit4 `test` module correctly depends on `build.service(crossValue)` and just layers its own test dependencies on top of `super.mvnDeps()`, which is a reasonable pattern.

flink/package.mill (1)
2-2: Flink cross-build wiring LGTM

Import switch to `mill.api._`, `Cross[FlinkModule]` plus crossValue-based `moduleDeps`, and the Cross-based `test` module all look consistent with the new pattern; nothing blocking here.

Also applies to: 6-8, 11-11, 41-43
spark/package.mill (1)
2-2: Spark cross-module + test wiring looks good

`Cross[SparkModule]`/`TestModule`, crossValue-based `moduleDeps`, and reuse of the common test fork args/framework are coherent and align with the other modules.

Also applies to: 27-30, 31-31, 33-33
cloud_aws/package.mill (1)
2-2: Cloud AWS cross-build + tests LGTM

`Cross[CloudAwsModule]`/`TestModule`, crossValue-based `moduleDeps`, and the test-only container deps are wired sensibly and match the pattern used elsewhere.

Also applies to: 6-9, 24-30
flink_connectors/package.mill (1)
2-2: Verify google-cloud-pubsub version alignment across modules

Version skew confirmed: `flink_connectors` main uses 1.141.2 while test uses 1.114.7 (27 versions behind); `cloud_gcp` uses 1.134.2. Test exclusions also differ (main excludes jackson-core + jackson-databind; test excludes only jackson-core). Confirm whether version differences are intentional (e.g., test compatibility constraints) or should be aligned to avoid classpath conflicts.

aggregator/package.mill (1)
5-9: Review comment is incorrect

`TestModule` can access `crossValue` through `build.BaseTestModule`, which extends Mill's `CrossScalaModule`, a framework class that provides `Cross.Module` support. No explicit `Cross.Module[String]` mixin is needed; the code compiles and works as designed.

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
cloud_aws/package.mill (1)
12-22: Consider centralizing AWS SDK version

The repeated "2.30.13" across AWS SDK deps can be hoisted into `build.Constants` (e.g., `awsSdkVersion`) to simplify upgrades and keep versions in sync.

flink_connectors/package.mill (1)
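A sketch of the suggested hoisting (the constant name `awsSdkVersion` and the specific artifact coordinates are assumptions for illustration):

```scala
// build.mill (sketch): hoist the repeated AWS SDK version next to scalaVersions.
object Constants {
  val scalaVersions = Seq("2.12.18", "2.13.17")
  val awsSdkVersion = "2.30.13" // hypothetical constant name
}

// cloud_aws/package.mill (sketch): every AWS dep reads the shared constant,
// so a single edit bumps the whole SDK line in lockstep.
def mvnDeps = super.mvnDeps() ++ Seq(
  mvn"software.amazon.awssdk:dynamodb:${build.Constants.awsSdkVersion}",
  mvn"software.amazon.awssdk:s3:${build.Constants.awsSdkVersion}"
)
```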
23-24: Align Netty HTTP/2 version with other modules

Here `netty-codec-http2` is pinned to 4.1.124.Final, while `flink/package.mill` uses 4.1.125.Final. Consider aligning these to a single version to avoid pulling multiple Netty variants onto the classpath.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
aggregator/package.mill (2 hunks)
api/package.mill (2 hunks)
build.mill (3 hunks)
cloud_aws/package.mill (2 hunks)
cloud_gcp/package.mill (3 hunks)
flink/package.mill (2 hunks)
flink_connectors/package.mill (2 hunks)
online/package.mill (2 hunks)
spark/package.mill (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
- `spark/package.mill`
- `cloud_aws/package.mill`
- `cloud_gcp/package.mill`
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
build.mill
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: stats_tests
- GitHub Check: udafs_tests
- GitHub Check: fetcher_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: batch_tests
- GitHub Check: analyzer_tests
🔇 Additional comments (9)
build.mill (2)
17-17: The shared `scalaVersions` constant is a good cross-build anchor.
Defining `Constants.scalaVersions = Seq("2.12.18", "2.13.17")` centralizes the cross axes and lines up with the Cross modules in the packages. Looks good; just confirm there's no remaining code still reading an old single `scalaVersion` constant.
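For reference, the cross-build shape being endorsed here, sketched under the assumption of Mill's standard `Cross`/`CrossScalaModule` API (trait and module names are illustrative, not the repo's exact code):

```scala
// build.mill (sketch)
object Constants {
  val scalaVersions = Seq("2.12.18", "2.13.17")
}

// CrossScalaModule wires scalaVersion from the cross axis, so each variant
// of a module compiles against its own Scala version.
trait BaseModule extends CrossScalaModule {
  def scalaVersion = crossValue
}

// One instance of the module is materialized per entry in scalaVersions,
// addressable as api("2.12.18") / api("2.13.17").
trait ApiModule extends Cross.Module[String] with BaseModule
object api extends Cross[ApiModule](Constants.scalaVersions)
```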
111-141: All modules extending BaseModule are properly configured via `Cross[...]`; verification complete.
All three modules that mix in `build.BaseModule` are already correctly wrapped in `Cross[...](build.Constants.scalaVersions)`:
- `service_commons/package.mill`: `object package extends Cross[ServiceCommonsModule](build.Constants.scalaVersions)`
- `cloud_gcp/package.mill`: `object package extends Cross[CloudGcpModule](build.Constants.scalaVersions)`
- `api/package.mill`: `object package extends Cross[ApiModule](build.Constants.scalaVersions)`

Test modules use the appropriate base traits (`BaseTestModule`/`BaseJUnitTestModule`) with `def scalaVersion = crossValue`. The switch to `CrossScalaModule` is sound and all users are correctly configured.

api/package.mill (1)
7-9: API cross-module and test wiring look correct.
`Cross[ApiModule](build.Constants.scalaVersions)` plus the test module's `scalaVersion = crossValue` and `moduleDeps = Seq(build.api(crossValue))` give a clean 2.12.18/2.13.17 split with matching deps. Verify both variants compile and tests run across Scala versions.

Also applies to: 47-49
spark/package.mill (1)
7-11: Cross-module wiring verified; code is correct.
All referenced modules (aggregator, api, online) exist; `crossValue` propagates consistently through `moduleDeps` and the test configuration. No issues.
crossValuepropagates consistently throughmoduleDepsand test configuration. No issues.online/package.mill (1)
6-10: Cross-module wiring verified—no structural issues found.All referenced symbols are properly defined:
Constantsinbuild.mill,BaseModuletrait atbuild.mill:112, and cross-awareaggregator/apimodules exist.OnlineModulefollows the same pattern asSparkModule,FlinkModule, and others—Cross.Module[String]withcrossValue-parameterized dependencies,scalaVersion()for version-specific deps, and test module withscalaVersion = crossValue. Structure is sound.aggregator/package.mill (1)
6-9: Aggregator cross-module setup is correct.
Code inspection confirms:
- `build.Constants.scalaVersions = Seq("2.12.18", "2.13.17")` exists
- All modules declare the `package build.X` + `Cross[Module](scalaVersions)` pattern
- `build.api(crossValue)` and `build.aggregator(crossValue)` resolve via Mill's Cross convention
- Version threading in tests (`scalaVersion = crossValue`) is consistent
- No circular dependencies or missing references

The pattern is applied uniformly across all modules.
flink/package.mill (1)
6-12: Flink cross-module + test wiring looks good.
`FlinkModule` correctly extends `Cross.Module[String]`, and `crossValue` usage in `moduleDeps` and the test module (`scalaVersion`/`moduleDeps`) matches the pattern used elsewhere. No changes requested.

Also applies to: 41-44
cloud_aws/package.mill (1)
6-10: CloudAws cross-module + tests are wired correctly.
`CloudAwsModule` extending `Cross.Module[String]` and using `crossValue` in `moduleDeps` and the test module (`scalaVersion`/`moduleDeps`) looks consistent and compile-safe.

Also applies to: 24-28
flink_connectors/package.mill (1)
6-12: Flink connectors cross-module + tests look consistent.
`FlinkConnectorsModule` extending `Cross.Module[String]` and using `crossValue` in `moduleDeps` and the test wiring is consistent with the other modules; no functional issues spotted.

Also applies to: 26-29
607c712 to 62fc7f2
Actionable comments posted: 2
🧹 Nitpick comments (5)
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)
17-17: Optional: remove redundant `.toSeq` calls.
`Seq(...).toSeq` is redundant; `Seq(...)` already returns a Seq.
Example for line 17:
-  ).toSeq.toDF("id", "name", "value", "timestamp")
+  ).toDF("id", "name", "value", "timestamp")

Also applies to: 23-23, 95-95, 101-101, 243-243, 249-249
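The redundancy is easy to check directly; a small self-contained sketch (on 2.13, `toSeq` on an immutable `Seq` simply returns an equal `Seq`, so the call is a no-op):

```scala
// Demo of the nitpick: .toSeq on something that is already an immutable Seq
// changes neither the type nor the contents.
object ToSeqDemo {
  def main(args: Array[String]): Unit = {
    val rows = Seq(("a", 1), ("b", 2))
    val same = rows.toSeq
    assert(same == rows) // identical contents; the call adds nothing
    println(same == rows) // true
  }
}
```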
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (1)
189-189: Conversion correct; consider reusing the `sources` variable.
The `.toScala.toSeq` conversion is correct. Optionally, reuse the `sources` variable from line 150 to avoid a redundant conversion: `runTimestampChecks(sources.toSeq, groupBy.inputDf)`.

online/src/main/java/ai/chronon/online/JavaFetcher.java (1)
130-135: Consider using the ScalaJavaConversions utility for consistency.
Direct Scala `Iterator` usage from Java may have Scala 2.13 compatibility issues. The `ScalaJavaConversions.toJava()` method (shown in the relevant snippets) handles this conversion safely.
Apply this diff:
-    scala.collection.immutable.List<T> scalaList = resps.responses().toList();
-    List<JavaResponse> jResps = new ArrayList<>(scalaList.size());
-    Iterator<T> it = scalaList.iterator();
-    while (it.hasNext()) {
-      jResps.add(new JavaResponse(it.next()));
-    }
+    List<T> scalaList = ScalaJavaConversions.toJava(resps.responses());
+    List<JavaResponse> jResps = new ArrayList<>(scalaList.size());
+    for (T resp : scalaList) {
+      jResps.add(new JavaResponse(resp));
+    }

spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
109-109: Potentially redundant `.toMap` call.
Since `tableProps` is already typed as `Map[String, String]`, the `.toMap` conversion may be unnecessary unless the `save` method requires a specific collection type.

aggregator/package.mill (1)
5-9: Aggregator Cross module wiring + scalaVersion concern.
CrossAggregatorModule with `moduleDeps = Seq(build.api(crossValue))` looks correct for cross-Scala wiring. One thing to double-check is where `scalaVersion` for the main AggregatorModule actually comes from. If `build.BaseModule` does not already derive it from `crossValue`, you may want an explicit:
 trait AggregatorModule extends Cross.Module[String] with build.BaseModule {
+  def scalaVersion = crossValue
   def moduleDeps = Seq(build.api(crossValue))
 }
to ensure each cross variant compiles against the intended Scala version.
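Stepping back, the Seq-to-`java.util.List` conversion that the JavaFetcher comment above relies on can be sketched as follows. This assumes Scala 2.13's `scala.jdk.CollectionConverters` (on 2.12 the equivalent converters live in `scala.collection.JavaConverters`); the helper name `toJava` mirrors the repo's `ScalaJavaConversions`, but this is an illustrative stand-in, not the actual implementation:

```scala
import scala.jdk.CollectionConverters._

object SeqToJavaDemo {
  // Wraps a Scala Seq as a java.util.List view: no element-by-element copy,
  // and no Scala Iterator usage on the Java side.
  def toJava[T](xs: Seq[T]): java.util.List[T] = xs.asJava

  def main(args: Array[String]): Unit = {
    val jList = toJava(Seq("a", "b", "c"))
    println(jList.size()) // 3
    println(jList.get(0)) // a
  }
}
```

Because the wrapper lives in Scala code compiled per cross-version, Java callers like `JavaFetcher` stay insulated from the 2.12/2.13 collection API split.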
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (114)
- `.github/workflows/test_grype.yaml` (3 hunks)
- `.github/workflows/test_scala_2_12_non_spark.yaml` (9 hunks)
- `.github/workflows/test_scala_2_12_spark.yaml` (11 hunks)
- `aggregator/package.mill` (2 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala` (2 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala` (0 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala` (0 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala` (0 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala` (1 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala` (0 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala` (0 hunks)
- `aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala` (0 hunks)
- `aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala` (0 hunks)
- `aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala` (0 hunks)
- `aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala` (0 hunks)
- `api/package.mill` (2 hunks)
- `api/src/main/scala/ai/chronon/api/Builders.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/CollectionExtensions.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/DataType.scala` (1 hunks)
- `api/src/main/scala/ai/chronon/api/Extensions.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/QueryUtils.scala` (1 hunks)
- `api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala` (0 hunks)
- `api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala` (1 hunks)
- `api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala` (1 hunks)
- `api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala` (0 hunks)
- `api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala` (0 hunks)
- `build.mill` (3 hunks)
- `cloud_aws/package.mill` (2 hunks)
- `cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala` (0 hunks)
- `cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala` (0 hunks)
- `cloud_gcp/package.mill` (3 hunks)
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala` (5 hunks)
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala` (3 hunks)
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala` (2 hunks)
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala` (3 hunks)
- `flink/package.mill` (2 hunks)
- `flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala` (0 hunks)
- `flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala` (0 hunks)
- `flink/src/main/scala/ai/chronon/flink/FlinkJob.scala` (0 hunks)
- `flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala` (2 hunks)
- `flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala` (0 hunks)
- `flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala` (0 hunks)
- `flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala` (1 hunks)
- `flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala` (0 hunks)
- `flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala` (0 hunks)
- `flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala` (1 hunks)
- `flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala` (1 hunks)
- `flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala` (1 hunks)
- `flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala` (0 hunks)
- `flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala` (2 hunks)
- `flink_connectors/package.mill` (2 hunks)
- `online/package.mill` (2 hunks)
- `online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java` (1 hunks)
- `online/src/main/java/ai/chronon/online/JavaFetcher.java` (1 hunks)
- `online/src/main/scala/ai/chronon/online/Api.scala` (2 hunks)
- `online/src/main/scala/ai/chronon/online/CatalystUtil.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala` (1 hunks)
- `online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala` (1 hunks)
- `online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/HTTPKVStore.scala` (1 hunks)
- `online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/TileCodec.scala` (2 hunks)
- `online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala` (2 hunks)
- `online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala` (0 hunks)
- `online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala` (2 hunks)
- `online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala` (1 hunks)
- `online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala` (1 hunks)
- `online/src/main/scala/ai/chronon/online/stats/DriftStore.scala` (0 hunks)
- `online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala` (1 hunks)
- `online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala` (1 hunks)
- `service/package.mill` (2 hunks)
- `service_commons/package.mill` (2 hunks)
- `spark/package.mill` (3 hunks)
- `spark/src/main/scala/ai/chronon/spark/Analyzer.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/Driver.scala` (3 hunks)
- `spark/src/main/scala/ai/chronon/spark/Extensions.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/GroupBy.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/Join.scala` (4 hunks)
- `spark/src/main/scala/ai/chronon/spark/JoinBase.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/JoinUtils.scala` (3 hunks)
- `spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala` (2 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/Eval.scala` (3 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala` (1 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala` (3 hunks)
- `spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala` (2 hunks)
- `spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala` (0 hunks)
- `spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala` (2 hunks)
- `spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala` (3 hunks)
- `spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala` (2 hunks)
- `spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala` (2 hunks)
- `spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala` (9 hunks)
- `spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala` (1 hunks)
💤 Files with no reviewable changes (43)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
- api/src/main/scala/ai/chronon/api/Builders.scala
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
- api/src/main/scala/ai/chronon/api/Extensions.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
✅ Files skipped from review due to trivial changes (2)
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
🚧 Files skipped from review as they are similar to previous changes (37)
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- api/package.mill
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
- api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- spark/src/main/scala/ai/chronon/spark/Extensions.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/main/scala/ai/chronon/spark/Join.scala
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
- online/package.mill
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
- online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- spark/src/main/scala/ai/chronon/spark/Driver.scala
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
🧰 Additional context used
🧠 Learnings (25)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
- `spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala`
- `.github/workflows/test_grype.yaml`
- `build.mill`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala`
- `spark/src/main/scala/ai/chronon/spark/batch/Eval.scala`
- `api/src/main/scala/ai/chronon/api/QueryUtils.scala`
- `online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala`
- `cloud_gcp/package.mill`
- `.github/workflows/test_scala_2_12_spark.yaml`
- `online/src/main/java/ai/chronon/online/JavaFetcher.java`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala`
- `spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala`
- `spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala`
- `.github/workflows/test_scala_2_12_non_spark.yaml`
- `spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala`
- `spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala`
- `flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala`
- `spark/src/main/scala/ai/chronon/spark/GroupBy.scala`
- `spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala`
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.
Applied to files:
- `spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala`
- `spark/src/main/scala/ai/chronon/spark/batch/Eval.scala`
- `spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala`
- `spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala`
- `spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala`
- `spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala`
- `flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala`
- `spark/src/main/scala/ai/chronon/spark/GroupBy.scala`
- `spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala`
📚 Learning: 2025-01-15T21:00:35.574Z
Learnt from: david-zlai
Repo: zipline-ai/chronon PR: 222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
- `.github/workflows/test_grype.yaml`
- `cloud_gcp/package.mill`
- `spark/package.mill`
- `cloud_aws/package.mill`
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Applied to files:
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala`
📚 Learning: 2025-09-26T01:29:32.984Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1182
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala:169-191
Timestamp: 2025-09-26T01:29:32.984Z
Learning: In BigTableMetricsKvStore, schema keys (ending with "#schema") are always queried alone in multiGet requests and never mixed with regular metric keys, so there's no concern about timestamp filter leakage between different key types within the same query.
Applied to files:
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala`
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Applied to files:
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala`
- `.github/workflows/test_scala_2_12_spark.yaml`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala`
📚 Learning: 2025-09-24T00:22:38.257Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1169
File: spark/src/main/scala/ai/chronon/spark/batch/iceberg/IcebergPartitionStatsExtractor.scala:79-94
Timestamp: 2025-09-24T00:22:38.257Z
Learning: BigTableMetricsKvStore defaults tsMillis to System.currentTimeMillis() when not provided in PutRequest, so PutRequests without explicit timestamps are handled correctly by the KV store implementation.
Applied to files:
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala`
📚 Learning: 2025-04-21T15:10:40.819Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.
Applied to files:
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala`
- `cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala`
📚 Learning: 2024-10-17T01:02:40.431Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In `DynamoDBKVStoreImpl.scala`, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic `Exception` type in methods like `extractListValues`.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.
Applied to files:
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2025-05-02T16:19:11.001Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.
Applied to files:
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
- `online/src/main/java/ai/chronon/online/JavaFetcher.java`
- `spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala`
- `spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala`
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
online/src/main/java/ai/chronon/online/JavaFetcher.java
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.
Applied to files:
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Applied to files:
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Applied to files:
spark/src/main/scala/ai/chronon/spark/GroupBy.scala
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Applied to files:
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2025-05-25T15:57:30.687Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.
Applied to files:
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
🧬 Code graph analysis (10)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
toSeq(29-29)
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (4)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
GroupBy (141-167)
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
runTimestampChecks (459-488)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
inputDf (479-518)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
table (457-457), rightToLeft (856-866)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
fetch (48-53), fetch (57-61), fetch (65-71)
online/src/main/scala/ai/chronon/online/Api.scala (2)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
fetch (30-34)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
fetch (48-53), fetch (57-61), fetch (65-71)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
multiGet (108-118), multiPut (308-359)
online/src/main/scala/ai/chronon/online/Api.scala (1)
KVStore (36-53)
online/src/main/java/ai/chronon/online/JavaFetcher.java (3)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
fetchGroupBys (182-184), fetchJoin (186-233), fetchJoinV2 (273-315), FeaturesResponseType (50-53), AvroString (80-80)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
listJoins (166-218)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
JoinUtils(38-431)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
applyMapping(814-817)
spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
keyColumns(1022-1027)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
ScalaJavaConversions (5-96)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
online (149-149), online (176-180), Fetcher (55-147), Fetcher (152-685), Response (68-68)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
fetch (30-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: flink_tests
- GitHub Check: online_tests
- GitHub Check: streaming_tests
- GitHub Check: stats_tests
- GitHub Check: kv_store_tests
- GitHub Check: udafs_tests
- GitHub Check: analyzer_tests
- GitHub Check: fetcher_tests
- GitHub Check: batch_tests
- GitHub Check: spark_tests
- GitHub Check: join_tests
- GitHub Check: groupby_tests
- GitHub Check: build
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (57)
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1)
31-31: Seq import cleanup is correct for Scala 2.13

Using only `immutable` and `mutable` while relying on Predef's `Seq` is idiomatic and matches the file's usages (`immutable.Seq`, `mutable.ListBuffer`, and plain `Seq`). No issues.

spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1)
25-25: LGTM: redundant import removed.

`Seq` from `scala.Predef` covers all usages.

flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1)
118-118: LGTM! Consistent collection handling.

The `.toSeq` addition makes the type explicit and aligns with similar conversions in this method (lines 114, 117).

spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (1)
220-220: scala.Seq for keyColumns is consistent with 2.13 collection usage

Using `scala.Seq` here is type-safe, aligns with the updated collection conventions in this PR, and keeps these tests consistent with the rest of the GroupBy APIs.

Also applies to: 242-242

spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1)
29-29: LGTM: Scala 2.13 migration.

Removing the explicit `scala.collection.Seq` import to rely on Predef's default is correct for Scala 2.13 compatibility.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)
51-76: LGTM: Scala 2.13 collection type migration.

The signature change to `scala.Seq` and the `.toSeq` conversion on line 72 correctly align with Scala 2.13's collection library changes.

89-100: LGTM: Consistent with collection type normalization.

The `scala.Seq` signature change maintains consistency with the codebase-wide Scala 2.13 migration.

spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)
57-58: LGTM: `getSeq[Row]` migration.

The switch from `getAs[Seq[Row]]` to `getSeq[Row]` is correct for Scala 2.13 and aligns with Spark's typed accessor methods.

Also applies to: 74-75, 82-83, 126-127, 134-135, 142-143, 150-151, 221-222, 230-231, 270-271
api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)
47-50: Signature change is fully compatible across all call sites.

Verified: all 5 call sites pass `Seq[String]` or immutable compositions (`baseWheres`, `whereClauses`, `.toScala`). Type migration to Scala 2.13 semantics is correct.

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
46-46: Scala 2.13 import pattern correct.

Removing the explicit `Seq` import is appropriate: Predef now provides `scala.collection.immutable.Seq`. No `mutable.Seq` references were found in the file, and all usages are compatible with immutable sequences.

Note: the AI summary is incorrect; `Seq` resolves via Predef, not the mutable package.
Verify compilation and tests pass before merging.
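To make the Predef point above concrete, here is a minimal, self-contained sketch (a hypothetical demo object, not repo code): in Scala 2.13 the default `Seq` from `scala.Predef` aliases `scala.collection.immutable.Seq`, which is why the explicit `import scala.collection.Seq` could be dropped.

```scala
// Illustration only: in 2.13, Predef's Seq is scala.collection.immutable.Seq,
// so an explicit `import scala.collection.Seq` is redundant.
object PredefSeqDemo {
  def makeSeq: Seq[Int] = Seq(1, 2, 3)

  // True on 2.13 because the default Seq is already the immutable one.
  def isImmutable: Boolean =
    makeSeq.isInstanceOf[scala.collection.immutable.Seq[_]]
}
```

This also explains why the 2.12-era `import scala.collection.Seq` changes semantics: it would widen signatures to the mutable-compatible root type.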
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3)
16-21: LGTM: Import updates aligned with Scala 2.13 migration.

The new `ai.chronon.eval._` import and consolidated imports are correct.

196-196: LGTM: Correct conversion for Scala 2.13.

The `.toScala.map` conversion is appropriate.

262-262: LGTM: Correct Java-to-Scala Map conversion.

The `.toScala.toMap` conversion properly handles the Java Map to Scala immutable Map transformation.

api/src/main/scala/ai/chronon/api/DataType.scala (2)
241-241: Clarify stringPrefix override purpose.

Overriding `stringPrefix` affects toString output for the case class. Is this override necessary for Scala 2.13 compatibility or for debugging?

234-261: StructType Seq mixin is safe: no breaking changes detected.

The grep search found zero direct equality comparisons on StructType instances. All usages are pattern matches (`case StructType(...)`), which rely on case-class unapply extraction, not equals(). Extending `Seq[StructField]` changes equality to element-based semantics (correct, since StructFields define the struct), but this doesn't break existing code. No StructType instances are used as Map keys or Set elements. The 2.13 migration correctly uses `scala.collection.immutable.Seq`, making the equality behavior appropriate.

online/src/main/java/ai/chronon/online/JavaFetcher.java (6)
143-151: LGTM! Clean refactor to Java collections improves type clarity and eliminates unnecessary Scala collection usage.

156-159: LGTM! Proper use of `ScalaJavaConversions.toScala()` for Scala 2.13 compatibility.

167-169: LGTM! Consistent conversion pattern applied correctly.

177-179: LGTM! Conversion applied consistently across all fetch methods.

187-187: LGTM! Proper Scala-to-Java conversion using the utility method.

125-141: Public `toJavaResponses` removal verified: no internal usages found.

A comprehensive search across all Java, Scala, and test files confirms the removed method has no callers within the codebase. Safe to proceed.
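Returning to the DataType.scala Seq-mixin point above, a simplified, hypothetical stand-in (not chronon's actual StructType/StructField) shows the equality semantics the review audited: a case class mixing in `Seq` inherits `Seq`'s element-based `equals` (the compiler does not synthesize `equals` when a concrete one is inherited), while `case Struct(...)` pattern matches still go through the case-class extractor.

```scala
// Hypothetical stand-in types, for illustration only.
case class Field(name: String)

case class Struct(name: String, fields: List[Field])
    extends scala.collection.immutable.Seq[Field] {
  // The three abstract members immutable.Seq requires:
  def apply(i: Int): Field = fields(i)
  def length: Int = fields.length
  def iterator: Iterator[Field] = fields.iterator
}
```

Two structs with identical fields but different names compare equal under Seq's element equality, which is exactly why direct `==` comparisons and use as Map/Set keys are what the review checked for.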
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3)
19-20: Import of `mutable` only is appropriate

Using `scala.collection.mutable` without importing `Seq` keeps local `ArrayBuffer` usage explicit while leaving `Seq` to the standard alias; no issues here.

137-148: `timedValues.toSeq` matches KVStore expectations

Converting the `flatMap` result to a `Seq` before wrapping in `GetResponse` keeps the response type consistent and avoids exposing a mutable buffer.

232-237: `listValues.toSeq` keeps the list response immutable

Materializing to `Seq` here aligns with `KVStore.ListResponse`'s expectations and avoids leaking the underlying mutable collection.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (3)
21-27: Mutable import + converters look good

Importing `scala.collection.mutable` alongside `scala.jdk.CollectionConverters._` matches the use of `ArrayBuffer` and the Java-to-Scala collection interop here.

147-157: Correct Java byte list to `Seq[Byte]` conversion

`tileKey.keyBytes.asScala.map(_.toByte).toSeq` cleanly converts the boxed Java bytes into an immutable `Seq[Byte]` for both the multi-get and multi-put streaming branches; this matches the helper signatures.

Also applies to: 321-329

186-193: `toSeq` on `timedValues`/`listValues` normalizes KVStore outputs

Adding `.toSeq` after the `flatMap` chains ensures both multi-get and list return `Seq[...]` rather than mutable buffers, consistent with the KVStore interface.

Also applies to: 281-286
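The buffer-materialization pattern these comments describe can be sketched as follows (an illustrative helper with a hypothetical name, not the repo's code):

```scala
// Accumulate in a mutable buffer, then materialize with `.toSeq` so the
// buffer is never leaked through the (immutable in 2.13) Seq interface.
object ToSeqDemo {
  import scala.collection.mutable

  def collectValues(raw: Iterable[Int]): Seq[Int] = {
    val buf = mutable.ArrayBuffer.empty[Int]
    raw.foreach(buf += _)
    buf.toSeq // returning `buf` directly would not typecheck against scala.Seq in 2.13
  }
}
```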
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (2)
12-12: LGTM: Import cleanup for Scala 2.13 compatibility.

Explicit imports align with the PR's collection normalization strategy.

Also applies to: 14-14, 19-19

22-25: Verified: all JoinPartJobContext instantiations correctly updated.

The codebase contains 2 instantiations of JoinPartJobContext (in Join.scala:324 and JoinPartJob.scala:72-75), both correctly using the new 4-parameter signature. No stale calls remain.
service_commons/package.mill (1)
2-9: LGTM: Cross-building pattern correctly implemented. The module now supports multiple Scala versions via `Cross` and `crossValue`. The pattern is consistent with other modules in this PR.
service/package.mill (1)
2-8: LGTM: Cross-building correctly configured. The pattern matches service_commons and other modules.
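The cross-building pattern these comments reference can be sketched as follows (module and artifact names are illustrative, not this repo's; assumes Mill's standard `Cross`/`CrossScalaModule` API):

```scala
// package.mill-style sketch, for illustration only.
package build
import mill._, scalalib._

object example extends Cross[ExampleModule]("2.12.18", "2.13.17")

trait ExampleModule extends CrossScalaModule {
  // `crossValue` is this instance's Scala version string, so downstream wiring
  // such as moduleDeps = Seq(build.api(crossValue)) keeps dependents on the
  // same Scala line. Targets then resolve as example[2.12.18].assembly, etc.
  object test extends ScalaTests with TestModule.ScalaTest
  // (test-framework mvnDeps omitted in this sketch)
}
```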
.github/workflows/test_scala_2_12_non_spark.yaml (1)
23-24: LGTM: Centralized version management. The SCALA_VERSION env var enables easy version switching and is correctly applied to all test targets.
build.mill (1)
112-112: LGTM: CrossScalaModule enables cross-building. This correctly implements the cross-version build infrastructure.
.github/workflows/test_scala_2_12_spark.yaml (1)
23-24: LGTM: Version parameterization matches the non-Spark workflow. All Spark test targets correctly reference the versioned modules.
cloud_gcp/package.mill (1)
8-11: LGTM: Cross-building correctly configured. CloudGcpModule properly extends BaseModule and uses `crossValue`. The pattern is consistent with other modules.
.github/workflows/test_grype.yaml (1)
27-28: LGTM: Grype workflow correctly parameterized. All build and artifact paths consistently use SCALA_VERSION, enabling cross-version security scanning.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)
106-106: No action needed. Version 2.3 is supported and stable. The version bump from 1.2 to 2.3 is valid; both are LTS versions supported by Dataproc Serverless, and 2.3 is listed as LTS and fully supported.
flink/package.mill (3)
2-2: Import switch to `mill.api._` looks correct

Matches Mill's recommended imports and other modules in this PR.

6-12: Flink Cross module wiring is consistent

`object package` + `trait FlinkModule` with `crossValue`-aware `moduleDeps` aligns with the cross-Scala pattern used elsewhere and keeps dependent modules on the same Scala line.

41-44: Test module correctly follows crossValue

Using `scalaVersion = crossValue` and `moduleDeps = Seq(build.flink(crossValue))` should give per-Scala-version Flink tests wired to the right implementation.

spark/package.mill (3)
2-2: `mill.api._` import change is fine

Consistent with other modules and Mill usage.

7-11: Spark Cross module graph looks sane

`SparkModule` as `Cross.Module[String]` with `crossValue`-based `moduleDeps` is aligned with the new cross-building scheme across the repo.

27-31: Spark test wiring matches the cross setup

`scalaVersion = crossValue` plus `build.spark(crossValue)` / `build.api(crossValue).test` / `build.aggregator(crossValue).test` should keep tests version-aligned with their targets.

cloud_aws/package.mill (3)
2-2: API import change is aligned

Using `mill.api._` is consistent with other package.mill updates.

6-10: CloudAwsModule Cross wiring appears correct

The Cross CloudAwsModule with `crossValue`-aware `moduleDeps` (spark, aggregator, api, online) is coherent with the rest of the build graph.

24-31: AWS test module is properly cross-versioned

`scalaVersion = crossValue`, `moduleDeps = Seq(build.cloud_aws(crossValue), build.spark(crossValue).test)` and the local test-only mvnDeps look reasonable for per-Scala AWS integration tests.

aggregator/package.mill (2)
2-2: Aggregator import update is OK

`mill.api._` matches the rest of the build refactor.

21-25: Aggregator test module is cross-aligned

Using `scalaVersion = crossValue` and depending on `build.aggregator(crossValue)` plus `build.api(crossValue).test` should keep tests and implementation on the same Scala line.

flink_connectors/package.mill (3)
2-2: Flink connectors import change is fine

`mill.api._` usage is consistent with the rest of the build files.

6-12: FlinkConnectors Cross wiring looks coherent

The Cross FlinkConnectorsModule with `crossValue`-aware `moduleDeps` (aggregator, api, online, flink) matches the intended multi-module cross graph.

26-32: Flink connectors tests correctly track crossValue

`scalaVersion = crossValue` and `moduleDeps = Seq(build.flink_connectors(crossValue))` should give per-Scala-version coverage for this module.

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
45-45: External-request construction / dedup looks sound

The added `mutable` import and the new `externalToJoinRequests` flatMap keep the same behavior (valid requests only, KeyMissingException routed via Right, dedup based on `Request`) and integrate cleanly with the existing `validExternalRequestToJoinRequestMap` / response wiring.

Also applies to: 543-566
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
19-21: Import adjustments are consistent with new handler usage

Using `ai.chronon.online._` and consolidated Spark SQL imports matches the new ExternalSourceHandler/ExternalSourceRegistry references and keeps this mock compact.

Also applies to: 27-27

47-53: PlusOneExternalHandler's scala.Seq-based fetch matches the new API

The override to `fetch(requests: scala.Seq[Fetcher.Request]): Future[scala.Seq[Fetcher.Response]]` and the simple plus-one transformation over `req.keys` align with the updated ExternalSourceHandler contract and should behave as before.
56-72: AlwaysFailsHandler / JavaPlusOneExternalHandler aligned with ExternalSourceHandlerBoth handlers now extend ExternalSourceHandler with the scala.Seq-based fetch signature and preserve their prior semantics (always-failing and plus-one variants), keeping the mock registry behavior intact.
Double-check any tests or docs that specifically referenced JavaExternalSourceHandler here, to ensure expectations around “java_plus_one” remain correct.
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
22-24: ContextualHandler updated cleanly to scala.Seq-based fetch

Switching ContextualHandler.fetch to `scala.Seq[Request]` / `Future[scala.Seq[Response]]` and keeping `handlerMap` usage unchanged fits the new ExternalSourceHandler contract without altering behavior.

Please ensure all ExternalSourceHandler implementations (including this one) are compiled/run together to catch any remaining signature mismatches.
Also applies to: 29-35
online/src/main/scala/ai/chronon/online/Api.scala (1)
200-211: ExternalSourceHandler trait + scala.Seq signature look consistent

Promoting ExternalSourceHandler to a trait with a shared implicit ExecutionContext and a `fetch(requests: scala.Seq[Fetcher.Request]): Future[scala.Seq[Fetcher.Response]]` signature matches the downstream registry and mock handlers, and is appropriate for Scala 2.13 collection usage.

Please confirm JavaExternalSourceHandler (and any Java implementers) still implement this trait correctly and that cross-language tests pass.
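As a rough sketch of the trait shape being reviewed, with simplified stand-in `Request`/`Response` types (not the real `ai.chronon.online.Fetcher` ones):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified stand-ins for Fetcher.Request / Fetcher.Response.
case class Request(name: String, keys: Map[String, Int])
case class Response(request: Request, values: Map[String, Int])

// Trait with a shared ExecutionContext and a scala.Seq-based fetch,
// mirroring the shape described above.
trait HandlerSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global
  def fetch(requests: scala.Seq[Request]): Future[scala.Seq[Response]]
}

// A "plus one" implementation in the spirit of the mock handlers.
object PlusOneHandler extends HandlerSketch {
  def fetch(requests: scala.Seq[Request]): Future[scala.Seq[Response]] =
    Future(requests.map(r => Response(r, r.keys.map { case (k, v) => k -> (v + 1) })))
}
```

The trait form (rather than an abstract class) lets both Scala objects and Java classes mix it in while sharing one execution-context definition.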
build.mill
Outdated
val scalaVersions = Seq("2.12.18", "2.13.17")
🧩 Analysis chain
Verify JavaConverters migration is complete before Scala 2.13 upgrade.
Scala 2.13 deprecates scala.collection.JavaConverters. A previous review found 42 imports that need migration to scala.jdk.CollectionConverters.
🏁 Script executed:
#!/bin/bash
# Check for remaining deprecated JavaConverters imports
echo "=== Checking for deprecated JavaConverters imports ==="
rg -n "scala\.collection\.JavaConverters" --type scala | wc -l
echo ""
echo "=== Checking for new CollectionConverters usage ==="
rg -n "scala\.jdk\.CollectionConverters" --type scala | wc -l
echo ""
echo "=== Files still using JavaConverters ==="
rg "scala\.collection\.JavaConverters" --type scala -l

Length of output: 2940
Complete JavaConverters → CollectionConverters migration before deploying Scala 2.13 support.
41 deprecated scala.collection.JavaConverters imports remain across 39 files. These are deprecated on Scala 2.13 (and break the build under fatal warnings), so migrate them to scala.jdk.CollectionConverters before shipping 2.13 support.
Affected files:
- spark/src/main/scala: JoinBase.scala, Driver.scala, streaming/GroupBy.scala, batch/ModularMonolith.scala, batch/iceberg/IcebergPartitionStatsExtractor.scala, batch/BatchNodeRunner.scala
- online/src/main/scala: AvroConversions.scala, TileCodec.scala, TopicChecker.scala, fetcher/Fetcher.scala
- api/src/main/scala: Row.scala, planner/*.scala (5 files), StagingQueryPlanner.scala
- flink/src/main/scala: Test utils
- cloud_aws/src/main/scala: EmrSubmitter.scala
- cloud_gcp/src/test/scala: BigQueryCatalogTest.scala
- aggregator/src/test/scala: Test files
🤖 Prompt for AI Agents
In build.mill around lines 17-18, the project enables Scala 2.13 but many files
still import scala.collection.JavaConverters (deprecated) which will break at
runtime; replace all scala.collection.JavaConverters imports with
scala.jdk.CollectionConverters and update usages accordingly (import
scala.jdk.CollectionConverters._ and keep using .asScala/.asJava or use explicit
.asScala/toList conversions where needed), scan the listed files (and the 39
affected files) for any JavaConverters.* references and migrate them, recompile
and run tests to ensure no remaining deprecated imports or runtime failures
before deploying Scala 2.13 support.
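The migration itself is mechanical; a self-contained sketch (`scala.jdk.CollectionConverters` ships in the 2.13 stdlib, and is available on 2.12 via the scala-collection-compat library; the demo object is hypothetical):

```scala
// Before (deprecated on 2.13): import scala.collection.JavaConverters._
// After: the import below; .asScala / .asJava call sites stay unchanged.
import scala.jdk.CollectionConverters._

object ConvertersDemo {
  // Java list -> immutable Scala Seq (`.toSeq` materializes for 2.13 APIs).
  def toScalaSeq(xs: java.util.List[String]): Seq[String] = xs.asScala.toSeq

  // Scala Seq -> Java list view.
  def toJavaList(xs: Seq[String]): java.util.List[String] = xs.asJava
}
```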
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/push_to_canary.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
.github/workflows/push_to_canary.yaml
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
.github/workflows/push_to_canary.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
- GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
- GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: build
- GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
.github/workflows/push_to_canary.yaml (1)
15-15: ✓ Environment variable introduced consistently.

The `SCALA_VERSION` is properly defined once and used throughout the workflow for Mill targets and artifact paths.
Actionable comments posted: 0
♻️ Duplicate comments (1)
build.mill (1)
18-19: Scala 2.13 upgrade blocked by JavaConverters migration.
🧹 Nitpick comments (1)
build.mill (1)
223-324: Refactor: Extract common publishing logic and use constant for default version.
`publishToGcs` and `publishToS3` duplicate ~90% of their code, and the module list appears 3 times. The default `scalaVersion = "2.12.18"` in `buildAllAssemblies` should reference `Constants.scalaVersions`.

Extract shared logic:

```scala
private def publishJars(
    scalaVersion: String,
    version: String,
    bucket: String,
    uploadFn: (os.Path, String) => Unit
) = {
  val commit = os.proc("git", "rev-parse", "HEAD").call().out.text().trim
  val branch = os.proc("git", "rev-parse", "--abbrev-ref", "HEAD").call().out.text().trim
  val date = Instant.now().toString
  val modules = Seq(
    ("flink", "flink_assembly_deploy.jar"),
    ("cloud_gcp", "cloud_gcp_lib_deploy.jar"),
    ("cloud_aws", "cloud_aws_lib_deploy.jar"),
    ("service", "service_assembly_deploy.jar"),
    ("flink_connectors", "connectors_pubsub_deploy.jar")
  )
  modules.foreach { case (moduleName, deployName) =>
    val jarPath = os.pwd / "out" / moduleName / scalaVersion / "assembly.dest" / "out.jar"
    if (!os.exists(jarPath)) {
      throw new Exception(
        s"Assembly JAR not found at $jarPath. Run './mill ${moduleName}[$scalaVersion].assembly' first.")
    }
    uploadFn(jarPath, s"$bucket/release/$version/jars/$deployName")
  }
}
```

Then simplify:

```diff
-def buildAllAssemblies(scalaVersion: String = "2.12.18") = Task.Command {
+def buildAllAssemblies(scalaVersion: String = Constants.scalaVersions.head) = Task.Command {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
api/package.mill(2 hunks)build.mill(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
build.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: build
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (8)
build.mill (4)
1-4: LGTM! Necessary imports for cross-building and publishing support.
24-41: LGTM! Standard publishing configuration follows Maven conventions.
131-131: LGTM! CrossScalaModule enables multi-version support correctly.
160-172: LGTM! Standard PublishModule implementation.
api/package.mill (4)
3-9: LGTM! Cross-building setup follows the standard pattern.
12-24: LGTM! PublishModule integration correct for thriftBase.
28-40: LGTM! PublishModule integration correct for thriftJava.
54-55: LGTM! Correct `crossValue` usage for cross-version testing.
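For context, the cross-build shape these comments refer to looks roughly like this in Mill (a hedged sketch; the module name, Scala versions, and dependency are illustrative, not the repo's actual build):

```scala
import mill._, scalalib._

// One variant per Scala version; targets become example[2.12.18].assembly, example[2.13.17].test, etc.
object example extends Cross[ExampleModule]("2.12.18", "2.13.17")

trait ExampleModule extends CrossScalaModule {
  // crossValue holds this variant's Scala version; CrossScalaModule wires it into scalaVersion
  def ivyDeps = Agg(
    ivy"org.scala-lang.modules::scala-collection-compat:2.12.0"
  )
}
```

The `crossValue` seen in the test wiring is exactly this per-variant version string, which is why referencing it keeps test modules aligned with their parent variant.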
```diff
 // Create table dependencies for all GroupBy parts (both direct GroupBy deps and upstream join deps)
-val allDeps = Option(join.joinParts).map(_.asScala).getOrElse(Seq.empty).flatMap { joinPart =>
+val allDeps = Option(join.joinParts).map(_.asScala.toSeq).getOrElse(Seq.empty).flatMap { joinPart =>
```

Suggested change:

```diff
-val allDeps = Option(join.joinParts).map(_.asScala.toSeq).getOrElse(Seq.empty).flatMap { joinPart =>
+val allDeps = Option(join.joinParts).map(_.toScala).getOrElse(Seq.empty).flatMap { joinPart =>
```

```diff
 def fromJoinSources(sources: java.util.List[api.Source]): Seq[TableDependency] = {
   Option(sources)
-    .map(_.asScala)
+    .map(_.toScala.toSeq)
```

Suggested change:

```diff
-    .map(_.toScala.toSeq)
+    .map(_.toScala)
```
Force-pushed from ed3c3c5 to 584e98b
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)
18-37: Parameter name mismatch in override signature. The `multiPut` method at line 18 uses `putRequests`, but the parent `KVStore` trait (Api.scala:74) declares `keyValueDatasets`. This breaks named-parameter compatibility. Change the parameter name from `putRequests` to `keyValueDatasets` to match the trait definition.
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)
51-76: multiGet: null path creates Failure instead of Success(empty). When a key has no entries, `valueSeries` is `null`. The current code `Try { null }.map(_.toSeq)` executes `null.toSeq` inside the Try → NPE → `Failure`, breaking callers that expect `Success(Seq.empty)` for missing keys. Both GroupByFetcher.scala:192 and DriftStore.scala:129 extract `response.values` directly and process the Try; they assume missing keys succeed as empty, not fail.
Apply the suggested fix to normalize null before `.toSeq`:
```diff
 val values = Try {
   val valueSeries = database
     .get(req.dataset) // table
     .get(encode(req.keyBytes)) // values of key

   if (valueSeries == null)
     null
   else
     valueSeries
       .filter { case (version, _) =>
         req.startTsMillis.forall(version >= _) && req.endTsMillis.forall(version <= _)
       } // filter version
       .map { case (version, bytes) => TimedValue(bytes, version) }
-}.map(_.toSeq)
+}.map(buf => Option(buf).map(_.toSeq).getOrElse(Seq.empty))
```
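To make the failure mode concrete, here is a small standalone sketch of the `Try`/null interaction described above (not the project's code, just the relevant semantics):

```scala
import scala.util.{Failure, Success, Try}

// Simulates the "missing key" case: the lookup returned null
val raw: Seq[Int] = null

// .map runs inside the Try, so null.toSeq throws an NPE that Try captures as a Failure
val broken: Try[Seq[Int]] = Try(raw).map(_.toSeq)
assert(broken.isFailure)

// Normalizing null to an empty Seq keeps missing keys as a Success
val fixed: Try[Seq[Int]] = Try(raw).map(buf => Option(buf).map(_.toSeq).getOrElse(Seq.empty))
assert(fixed == Success(Seq.empty[Int]))
```

The key point: `Try(raw)` itself succeeds (it is `Success(null)`), and only the subsequent `.map` call trips the NPE, which is why the failure surfaces to callers as a `Failure` rather than an exception at construction.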
♻️ Duplicate comments (1)
build.mill (1)
18-19: Re-confirm JavaConverters → CollectionConverters migration before finalizing 2.13.17. Given prior findings of many `scala.collection.JavaConverters` imports, please ensure all have been migrated to `scala.jdk.CollectionConverters` and the project compiles cleanly under 2.13.17.
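For reference, a minimal sketch of the migration target; the only change is the import, since `asScala`/`asJava` keep their names:

```scala
// Scala 2.13 deprecates scala.collection.JavaConverters; scala.jdk.CollectionConverters
// provides the same asScala / asJava extension methods
import scala.jdk.CollectionConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")
javaList.add("b")

// Java -> Scala: asScala yields a mutable Buffer view; .toSeq snapshots it as an immutable Seq
val scalaSeq: Seq[String] = javaList.asScala.toSeq
assert(scalaSeq == Seq("a", "b"))

// Scala -> Java
val backToJava: java.util.List[String] = scalaSeq.asJava
assert(backToJava.size == 2)
```

The trailing `.toSeq` matters under 2.13 because `asScala` on a Java `List` returns a mutable `Buffer`, while most of the normalized APIs in this PR expect an immutable `Seq`.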
🧹 Nitpick comments (9)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
53-56: Standardize remaining `Seq` references for consistency. Lines 53 and 56 use unqualified `Seq`, while line 30 uses `scala.Seq`. Consider qualifying these for consistency with the migration pattern.
```diff
-  def fetchRequests(requests: Seq[Request], context: Context)(implicit ec: ExecutionContext): Future[Seq[Response]] = {
+  def fetchRequests(requests: scala.Seq[Request], context: Context)(implicit ec: ExecutionContext): Future[scala.Seq[Response]] = {
     val startTime = System.currentTimeMillis()
     // we make issue one batch request per external source and flatten out it later
-    val responsesByNameF: List[Future[Seq[Response]]] = requests
+    val responsesByNameF: List[Future[scala.Seq[Response]]] = requests
```
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)
167-167: LGTM! Removed redundant conversions. Correctly removed unnecessary `.toSeq` calls, since `Seq(...)` already returns a Seq and the data variables are already sequences.
For consistency, consider also removing the redundant `.toSeq` in lines 13-17, 19-23, 91-95, 97-101, 239-243, and 245-249, where `Seq(...).toSeq.toDF(...)` can be simplified to `Seq(...).toDF(...)`.
Also applies to: 177-177, 188-188, 198-198
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)
64-72: Consider renaming for clarity. The handler now extends `ExternalSourceHandler` like the others, making the "Java" prefix misleading. Consider renaming it to better reflect its purpose (e.g., `PlusOneExternalHandlerV2`), or removing it if it duplicates `PlusOneExternalHandler`.
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
544-565: Consider consolidating duplicate cache lookups. The refactoring correctly uses `.toSeq` for Scala 2.13 compatibility. However, `metadataStore.getJoinConf(joinRequest.name)` is retrieved twice (lines 545 and 550-554). You could store the result after the refresh check to avoid the duplicate lookup.
Example consolidation:
```scala
val externalToJoinRequests: Seq[ExternalToJoinRequest] = validRequests.flatMap { joinRequest =>
  var joinConfTry = metadataStore.getJoinConf(joinRequest.name)
  if (joinConfTry.isFailure) {
    metadataStore.getJoinConf.refresh(joinRequest.name)
    joinConfTry = metadataStore.getJoinConf(joinRequest.name)
  }
  val parts = joinConfTry.get.join.onlineExternalParts
  parts.iterator().asScala.map { part =>
    val externalRequest = Try(part.applyMapping(joinRequest.keys)) match {
      case Success(mappedKeys)                     => Left(Request(part.source.metadata.name, mappedKeys))
      case Failure(exception: KeyMissingException) => Right(exception)
      case Failure(otherException)                 => throw otherException
    }
    ExternalToJoinRequest(externalRequest, joinRequest, part)
  }
}.toSeq
```
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)
108-115: Be aware of shallower semantics vs. `.deep` for nested arrays.
`arr1.toSeq == arr2.toSeq` is fine for 1-D arrays, but if you ever compare arrays-of-arrays it will no longer recurse like `.deep` did. If you want version-independent deep behavior, consider an explicit recursive branch:
```diff
-      case (arr1: Array[_], arr2: Array[_]) => arr1.toSeq == arr2.toSeq
+      case (arr1: Array[_], arr2: Array[_]) =>
+        arr1.length == arr2.length &&
+        arr1.indices.forall(i => deepEquals(arr1(i), arr2(i)))
```
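The pitfall is easy to demonstrate in isolation: `toSeq` wraps the outer array, but inner arrays are still compared by reference (a standalone sketch, not the project's `deepEquals`):

```scala
// 1-D arrays: toSeq comparison works because elements compare by value
assert(Array(1, 2).toSeq == Array(1, 2).toSeq)

// Nested arrays: the elements are themselves arrays, which use reference equality
val a = Array(Array(1, 2))
val b = Array(Array(1, 2))
assert(a.toSeq != b.toSeq)

// A recursive check restores deep semantics
def deepEquals(x: Any, y: Any): Boolean = (x, y) match {
  case (ax: Array[_], ay: Array[_]) =>
    ax.length == ay.length && ax.indices.forall(i => deepEquals(ax(i), ay(i)))
  case _ => x == y
}
assert(deepEquals(a, b))
```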
22-25: Verify BigQuery relocation doesn’t conflict with usage.The relocate of
org.apache.iceberg.gcp.bigquery.**is fine for shading, but please confirm there are no reflection/string FQN usages and no parallel unshadediceberg-bigqueryjars on the runtime classpath.service/package.mill (1)
6-9: Service cross-module and test wiring look correct.
moduleDepsandtest.moduleDepscorrectly target thecrossValuevariants; the additionalCross.Module[String]onServiceModuleis redundant but not harmful.Also applies to: 48-51
api/package.mill (1)
23-25: Avoid duplicating theCHRONON_VERSIONdefault.
publishVersionhere and inBaseModuleboth hardcode"0.0.32"; consider a single constant/helper (e.g., inConstants) to keep the default release version in sync.Also applies to: 42-44, 50-51
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)
89-100: multiPut:Future.successfulwould be cheaper and clearer
multiPutdoes only in-memory work and immediately has the fullresult; wrapping it inFuture { ... }just schedules an unnecessary task on theExecutionContext.You can simplify to:
- override def multiPut(putRequests: scala.Seq[KVStore.PutRequest]): Future[scala.Seq[Boolean]] = { + override def multiPut(putRequests: scala.Seq[KVStore.PutRequest]): Future[scala.Seq[Boolean]] = { val result = putRequests.map { case PutRequest(keyBytes, valueBytes, dataset, millis) => val table = database.get(dataset) val key = encode(keyBytes) table.compute(key, putFunc(millis.getOrElse(System.currentTimeMillis()) -> valueBytes)) true } - - Future { - result - } + Future.successful(result) }Behavior stays the same while avoiding extra thread-pool work.
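The difference is observable: `Future.successful` returns an already-completed future and needs no `ExecutionContext`, while `Future { ... }` must schedule a task before completing (a standalone sketch of the distinction):

```scala
import scala.concurrent.Future
import scala.util.Success

// Completed at construction time, no ExecutionContext required
val f: Future[Seq[Boolean]] = Future.successful(Seq(true, true))

// value is Some(Success(...)) immediately, without awaiting or a thread-pool hop
assert(f.isCompleted)
assert(f.value == Some(Success(Seq(true, true))))
```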
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (118)
.github/workflows/push_to_canary.yaml(2 hunks).github/workflows/reusable_non_spark_tests.yaml(9 hunks).github/workflows/reusable_spark_tests.yaml(11 hunks).github/workflows/test_grype.yaml(3 hunks).github/workflows/test_non_spark.yaml(1 hunks).github/workflows/test_spark.yaml(1 hunks)aggregator/package.mill(2 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala(2 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala(0 hunks)api/package.mill(2 hunks)api/src/main/scala/ai/chronon/api/Builders.scala(0 hunks)api/src/main/scala/ai/chronon/api/CollectionExtensions.scala(0 hunks)api/src/main/scala/ai/chronon/api/DataType.scala(1 hunks)api/src/main/scala/ai/chronon/api/Extensions.scala(0 hunks)api/src/main/scala/ai/chronon/api/QueryUtils.scala(1 hunks)api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala(3 
hunks)api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala(1 hunks)api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala(0 hunks)api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala(0 hunks)build.mill(5 hunks)cloud_aws/package.mill(2 hunks)cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala(0 hunks)cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala(0 hunks)cloud_gcp/package.mill(3 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala(5 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala(3 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala(2 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala(3 hunks)flink/package.mill(2 hunks)flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/FlinkJob.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala(2 hunks)flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala(1 hunks)flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala(0 hunks)flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala(0 
hunks)flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala(2 hunks)flink_connectors/package.mill(2 hunks)online/package.mill(2 hunks)online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java(1 hunks)online/src/main/java/ai/chronon/online/JavaFetcher.java(1 hunks)online/src/main/scala/ai/chronon/online/Api.scala(2 hunks)online/src/main/scala/ai/chronon/online/CatalystUtil.scala(0 hunks)online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala(1 hunks)online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala(1 hunks)online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala(0 hunks)online/src/main/scala/ai/chronon/online/HTTPKVStore.scala(1 hunks)online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala(0 hunks)online/src/main/scala/ai/chronon/online/TileCodec.scala(2 hunks)online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala(2 hunks)online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala(2 hunks)online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala(1 hunks)online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala(1 hunks)online/src/main/scala/ai/chronon/online/stats/DriftStore.scala(0 hunks)online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala(1 hunks)online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala(1 hunks)python/package.mill(1 hunks)service/package.mill(2 hunks)service_commons/package.mill(2 hunks)spark/package.mill(3 hunks)spark/src/main/scala/ai/chronon/spark/Analyzer.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/Driver.scala(3 
hunks)spark/src/main/scala/ai/chronon/spark/Extensions.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/GroupBy.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/Join.scala(4 hunks)spark/src/main/scala/ai/chronon/spark/JoinBase.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/JoinUtils.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/batch/Eval.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala(2 hunks)spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala(2 
hunks)spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala(9 hunks)spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala(1 hunks)
💤 Files with no reviewable changes (43)
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
- api/src/main/scala/ai/chronon/api/Extensions.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
- api/src/main/scala/ai/chronon/api/Builders.scala
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
✅ Files skipped from review due to trivial changes (4)
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
🚧 Files skipped from review as they are similar to previous changes (38)
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
- .github/workflows/push_to_canary.yaml
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- .github/workflows/test_non_spark.yaml
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
- .github/workflows/reusable_spark_tests.yaml
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
- .github/workflows/test_spark.yaml
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
- .github/workflows/reusable_non_spark_tests.yaml
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- aggregator/package.mill
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
- online/src/main/java/ai/chronon/online/JavaFetcher.java
- spark/src/main/scala/ai/chronon/spark/Extensions.scala
- api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- flink/package.mill
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
- online/src/main/scala/ai/chronon/online/Api.scala
- spark/src/main/scala/ai/chronon/spark/Driver.scala
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
- spark/src/main/scala/ai/chronon/spark/Join.scala
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
🧰 Additional context used
🧠 Learnings (19)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- cloud_gcp/package.mill
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
- .github/workflows/test_grype.yaml
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- api/src/main/scala/ai/chronon/api/QueryUtils.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- build.mill
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-05-25T15:57:30.687Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.
Applied to files:
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
- cloud_gcp/package.mill
- .github/workflows/test_grype.yaml
- spark/package.mill
- cloud_aws/package.mill
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.
Applied to files:
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Applied to files:
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.
Applied to files:
flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
🧬 Code graph analysis (7)
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)
flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala (1)
DeserializationSchemaBuilder(31-62)
flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (1)
ProjectedEvent(95-95)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)
online/src/main/scala/ai/chronon/online/Api.scala (1)
KVStore(36-53)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
fetch(48-53)
fetch(57-61)
fetch(65-71)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)
multiGet(108-118)multiPut(308-359)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
ScalaJavaConversions(5-96)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
online(149-149)
online(176-180)
Fetcher(55-147)
Fetcher(152-685)
Response(68-68)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)
empty(78-78)
online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1)
spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
toSeq(29-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
- GitHub Check: python_lint
- GitHub Check: python_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (41)
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1)
47-47: LGTM! Type annotation aligns with Scala 2.13.
The change from `scala.collection.Seq` to `scala.Seq` is correct and consistent with the broader Scala 2.13 migration.
.github/workflows/test_grype.yaml (4)
27-28: Environment variable properly defines versioned build target.
47-47: Versioned Mill targets correctly parameterized.
The build commands consistently use the SCALA_VERSION variable with proper Mill cross-build syntax.
Also applies to: 50-50, 53-53, 56-56
78-100: Artifact upload paths consistently versioned.
All four JARs are properly uploaded from versioned output directories.
58-63: The Dockerfile contains no flink references, confirming that the service and cloud_gcp JARs are the only artifacts needed for the Docker build. The workflow file correctly copies only these two JARs to `build_output`.
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)
4-4: LGTM! Unused import removed.
The concrete schema class is accessed via the builder, so the direct import is unnecessary.
75-75: LGTM! Proper Scala 2.13 collection handling.
Explicit `.toSeq` materialization aligns with 2.13 best practices.
online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1)
78-78: Seq qualifier change aligns with Scala 2.13 immutable defaults.
The change from `collection.Seq` to `scala.Seq` is correct. `CatalystUtil` accepts `Seq[(String, String)]`, and `scala.Seq` (immutable) is fully compatible with the unqualified `Seq` parameter type in Scala 2.13.
spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
35-35: LGTM—correct for Scala 2.13 migration.
Removing the explicit `scala.collection.Seq` import allows the predef `Seq` (which is `immutable.Seq` in 2.13) to be used. All usages in the file are compatible.
python/package.mill (1)
46-48: The hard-coded "0.0.32" fallback is used consistently across all package files (build.mill, python/package.mill, api/package.mill). The `ast-grep` search found no `build.Constants.chrononVersion` object, confirming it doesn't exist. This is an intentional, consistent pattern—not a maintenance issue.
Likely an incorrect or invalid review comment.
flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1)
20-20: LGTM—correct for Scala 2.13.
Predef imports `immutable.Seq` by default. All usages in this file remain compatible.
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
22-22: LGTM—consistent with Scala 2.13 standardization.
The import cleanup and explicit `scala.Seq` qualification align with the cross-version migration pattern.
Also applies to: 30-30
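A small standalone illustration of the 2.13 default these comments rely on (sketch only, not code from the PR):

```scala
// In Scala 2.13 the unqualified `Seq` (and `scala.Seq`) alias
// scala.collection.immutable.Seq, so these are the same type:
val a: Seq[Int] = Seq(1, 2, 3)
val b: scala.collection.immutable.Seq[Int] = a // no conversion needed

// Generic or mutable collection.Seq values must be materialized explicitly:
val buf = scala.collection.mutable.ArrayBuffer(1, 2, 3)
val c: Seq[Int] = buf.toSeq // copies into an immutable Seq
```

On 2.12, `Seq` still aliases the generic `scala.collection.Seq`, which is why the PR qualifies signatures explicitly during the cross-build window.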
flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (1)
80-80: LGTM! Collection conversion is correct and consistent.
The `.asScala.toSeq` pattern correctly materializes Java collections as immutable Seqs, aligning with the PR's normalization effort.
Also applies to: 115-115
online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (2)
39-56: Implementation is correct.
The conversion logic properly handles Scala–Java interop for requests/responses and futures.
34-34: Inheritance change is correct and safe.
JavaExternalSourceHandler is the only Java implementation of ExternalSourceHandler in the codebase—no subclasses exist to be affected by the inheritance change.
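The request/response and future conversions reviewed here follow the standard Scala 2.13 interop pattern; a simplified sketch under assumed placeholder types (`Request`, `Response`, `fetch` are stand-ins, not the fetcher's actual API, and `scala.jdk.FutureConverters` is 2.13-only — a 2.12 cross-build would use scala-java8-compat instead):

```scala
import java.util.concurrent.CompletableFuture
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.CollectionConverters._
import scala.jdk.FutureConverters._

case class Request(name: String)
case class Response(name: String)

// Scala-side handler: immutable Seq in, Future of Seq out.
def fetch(requests: Seq[Request])(implicit ec: ExecutionContext): Future[Seq[Response]] =
  Future.successful(requests.map(r => Response(r.name)))

// Java-facing wrapper: java.util.List in, CompletableFuture of java.util.List out.
def fetchJava(requests: java.util.List[Request])(implicit
    ec: ExecutionContext): CompletableFuture[java.util.List[Response]] =
  fetch(requests.asScala.toSeq) // Java List -> immutable scala.Seq
    .map(_.asJava)              // scala.Seq -> Java List
    .asJava                     // scala Future -> java CompletionStage
    .toCompletableFuture
```

The `.asScala.toSeq` materialization matters: it copies into an immutable `Seq`, so the Scala side never observes later mutation of the caller's Java list.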
spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (1)
57-58: LGTM! Cleaner API usage.
Replacing `getAs[Seq[Row]]` with `getSeq[Row]` throughout is cleaner and aligns with Scala 2.13 normalization.
Also applies to: 74-75, 82-83, 126-127, 134-135, 142-143, 150-151, 221-222, 230-231, 270-271
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)
16-16: Signature change correct; parent trait signatures are compatible.
The explicit `scala.Seq` and `KVStore.GetRequest` qualifications match the parent trait's unqualified `Seq[GetRequest]` (resolved identically in Scala 2.13). The override is valid.
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
19-19: Import narrowing to `scala.collection.Map` looks correct
Dropping `scala.collection.Seq` keeps `Seq` using the default immutable alias while preserving existing `Map` usages; no behavior change observed.
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
19-19: LGTM!
Import consolidations are clean and appropriate for the Scala 2.13 migration.
Also applies to: 21-21, 27-27
56-62: LGTM!
Migration from `JavaExternalSourceHandler` to `ExternalSourceHandler` is correct. The `fetch` implementation properly returns failures wrapped in `Future`. Same ExecutionContext verification from line 48 applies here.
48-53: ExecutionContext is available via ExternalSourceHandler.
The `Future()` call at line 49 will resolve the implicit `ExecutionContext` from `ExternalSourceHandler` (which `PlusOneExternalHandler` extends). Line 209 of `Api.scala` defines it as `implicit lazy val executionContext: ExecutionContext = ExternalSourceHandler.executor`. The signature change to `scala.Seq` is correct for Scala 2.13. No changes needed.
api/src/main/scala/ai/chronon/api/DataType.scala (1)
234-235: No issues found. Change is correct for Scala 2.13.
StructType is inherently immutable (Array-backed with read-only interface: `apply`, `length`, `iterator`). The change from `scala.collection.Seq` to unqualified `Seq` correctly resolves to `scala.collection.immutable.Seq` in Scala 2.13. No code attempts mutable operations, and existing tests verify serialization compatibility.
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
45-45: LGTM - Scala 2.13 compatibility.
The explicit mutable collections import is required for Scala 2.13.
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2)
66-66: LGTM: Proper Java-Scala conversion.
The `.asScala.toSeq` correctly materializes the Java collection into a Scala `Seq[String]`, aligning with the updated `QueryUtils.build` signature.
80-80: LGTM: Consistent sequence concatenation.
Using `++ Seq(...)` for the final element maintains consistency with the preceding concatenations.
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)
35-35: LGTM: Import cleanup for Scala 2.13.
Removing `Seq` from the `scala.collection` import aligns with the Scala 2.13 migration.
294-304: Return type change verified as safe across all call sites.
All seven call sites in JoinBase.scala and VersioningTest.scala use only immutable-Seq-compatible operations (`.isEmpty`, `.foreach`, `.size`, equality comparison). The signature change from `scala.collection.Seq[String]` to `scala.Seq[String]` introduces no breaking changes.
api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)
49-49: All callers verified—parameter type change is safe.
The `whereClauses` method (DataRange.scala:78) returns `Seq[String]` (immutable). All five call sites pass compatible immutable sequences:
- `effectiveRange.whereClauses` → `Seq[String]`
- `.toScala` and `.asScala.toSeq` → immutable
- Concatenations (`++`, `:+`) preserve immutability
No breaking changes. Parameter type narrowing is backward compatible.
online/src/main/scala/ai/chronon/online/TileCodec.scala (3)
31-33: Import cleanup looks good
Dropping the explicit `Seq` import and keeping `mutable` only is consistent with relying on the default `Seq` alias and keeps imports minimal.
35-42: Materializing `unWindowed` results as `Seq` is reasonable
Using `.toSeq` on `flatMap(_.unWindowed)` gives `RowAggregator` a concrete sequence and avoids surprises from Java–Scala view types; overhead is minor for this setup path.
44-51: Same `toSeq` rationale for windowed aggregations
The `.toSeq` on `flatMap(_.unpack)` similarly normalizes the collection passed to `RowAggregator`; behavior remains the same with clearer types.
service_commons/package.mill (1)
6-9: Cross wiring for service_commons and tests looks consistent.
`moduleDeps` and `test.moduleDeps` correctly use `crossValue` and align with other modules' cross setup.
Also applies to: 21-23
spark/package.mill (1)
7-11: Spark cross-module + test setup is consistent and sound.
Cross wiring and test deps align with the new `scalaVersions` scheme and other modules.
Also applies to: 27-35
cloud_aws/package.mill (1)
6-9: Cloud AWS cross wiring and tests are correctly updated.
Module and test dependencies use the appropriate `crossValue` variants and look consistent with the rest of the build.
Also applies to: 24-31
online/package.mill (2)
2-9: Cross-Scala wiring for `online` module looks consistent
`Cross[OnlineModule](build.Constants.scalaVersions)` plus `OnlineModule` extending `Cross.Module[String]` with `build.BaseModule` and using `crossValue` in `moduleDeps` all look aligned with the broader cross-build pattern; just make sure `scalaVersion` for this module is effectively derived from `crossValue` so the `scala-reflect`/`scala-compiler` mvn lines pick the correct 2.12 vs 2.13 artifacts.
38-41: Test module cross-wiring looks correct
`scalaVersion = crossValue` and `moduleDeps = Seq(build.online(crossValue), build.api(crossValue).test)` give you per-Scala-version tests hooked to the matching main and API test modules; this looks good and consistent with the new cross layout.
flink_connectors/package.mill (2)
2-12: Flink connectors Cross module setup looks good
The move to `Cross[FlinkConnectorsModule](build.Constants.scalaVersions)` with `FlinkConnectorsModule` extending `Cross.Module[String]` and wiring `moduleDeps` via `crossValue` (aggregator/api/online/flink) looks coherent and should give per-Scala builds for this stack; just confirm that the underlying `scalaVersion` used by this module is tied to `crossValue` as intended.
26-28: Flink connectors test module correctly follows cross-Scala pattern
Setting `scalaVersion = crossValue` and `moduleDeps = Seq(build.flink_connectors(crossValue))` is the expected pattern for per-Scala tests of this module and matches the rest of the Cross setup.
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (2)
11-26: `parseSkewKeys`: `.asScala.toSeq` looks correct
Using `list.asScala.toSeq` to back `Map[String, Seq[String]]` is the right normalization for Scala 2.13 and matches the method's declared return type. No further issues here.
41-47: `allFilters.toSeq.toJava` is a safe interop normalization
Materializing `allFilters` to a `Seq` before `toJava` avoids collection-type quirks in 2.13 and keeps the Java-facing API deterministic. Change looks good.
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2)
11-11: StagingQuery alias + `setups.asScala.toSeq` are consistent with 2.13 cleanup
Using `StagingQueryUtil` as an alias keeps the call site explicit without changing behavior, and normalizing `setups` via `Option(...).map(_.asScala.toSeq).getOrElse(Seq.empty)` matches how other Java collections are handled in this file. Looks good.
Also applies to: 173-178
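The cross-build wiring reviewed throughout these package.mill comments boils down to Mill's `Cross` module pattern; a trimmed sketch reusing the names the PR itself shows (`Constants.scalaVersions`, `crossValue`), not the full build definition:

```scala
import mill._, scalalib._

object Constants {
  val scalaVersions = Seq("2.12.18", "2.13.17")
}

// One module instance is generated per Scala version, e.g. online[2.13.17].
object online extends Cross[OnlineModule](Constants.scalaVersions)
trait OnlineModule extends Cross.Module[String] with ScalaModule {
  // crossValue is the Scala version string for this instance.
  def scalaVersion = crossValue
  // To depend on the matching-Scala-version variant of a sibling cross
  // module, pass crossValue through, e.g.:
  //   def moduleDeps = Seq(api(crossValue))
}
```

This is why the CI workflows can target `module[SCALA_VERSION].assembly` and find outputs under `out/<module>/<SCALA_VERSION>/...`.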
520c8fc to 4482794
Actionable comments posted: 1
♻️ Duplicate comments (1)
build.mill (1)
18-19: Scala 2.13 upgrade blocked by deprecated imports.
Past reviews identified 41 `scala.collection.JavaConverters` imports that must be migrated to `scala.jdk.CollectionConverters` before this upgrade.
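The migration itself is mechanical; a before/after sketch (assuming the scala-collection-compat library is on the 2.12 classpath, which backports the new import for cross-builds):

```scala
// Before (deprecated since 2.13):
//   import scala.collection.JavaConverters._
// After:
import scala.jdk.CollectionConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")

val scalaSeq: Seq[String] = javaList.asScala.toSeq // materialize as immutable Seq
val backToJava: java.util.List[String] = scalaSeq.asJava
```

The `asScala`/`asJava` extension methods keep the same names, so the change is usually import-only.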
🧹 Nitpick comments (3)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)
64-72: `JavaPlusOneExternalHandler` is now identical to `PlusOneExternalHandler`.
After migration, both handlers share the same base class and implementation. Consider consolidating or documenting why both are needed in the test mock.
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
22-22: Seq import removal narrows `fetchRequests`' param type
Dropping `scala.collection.Seq` means `Seq[Request]` now aliases `scala.collection.immutable.Seq[Request]`; please confirm no callers pass `mutable.Seq` here, and consider switching the signature (and related `Seq` usages) to `scala.Seq` for clarity/consistency with the rest of the PR.
service_commons/package.mill (1)
6-9: Cross-building works but pattern inconsistent.
`ServiceCommonsModule` only extends `build.BaseModule` (like `CloudGcpModule`), while `FlinkModule` and `CloudAwsModule` explicitly extend `Cross.Module[String]`. Both approaches work since `BaseModule` extends `CrossScalaModule`, but consistency would improve clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (119)
.github/workflows/push_to_canary.yaml(2 hunks).github/workflows/reusable_non_spark_tests.yaml(9 hunks).github/workflows/reusable_spark_tests.yaml(11 hunks).github/workflows/test_grype.yaml(3 hunks).github/workflows/test_non_spark.yaml(1 hunks).github/workflows/test_spark.yaml(1 hunks)aggregator/package.mill(2 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala(2 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala(0 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala(0 hunks)api/package.mill(2 hunks)api/src/main/scala/ai/chronon/api/Builders.scala(0 hunks)api/src/main/scala/ai/chronon/api/CollectionExtensions.scala(0 hunks)api/src/main/scala/ai/chronon/api/DataType.scala(1 hunks)api/src/main/scala/ai/chronon/api/Extensions.scala(0 hunks)api/src/main/scala/ai/chronon/api/QueryUtils.scala(1 hunks)api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala(0 hunks)api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala(3 
hunks)api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala(2 hunks)api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala(0 hunks)api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala(0 hunks)build.mill(5 hunks)cloud_aws/package.mill(2 hunks)cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala(0 hunks)cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala(0 hunks)cloud_gcp/package.mill(3 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala(5 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala(3 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala(2 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala(3 hunks)flink/package.mill(2 hunks)flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/FlinkJob.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala(2 hunks)flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala(1 hunks)flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala(0 hunks)flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala(0 
hunks)flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala(2 hunks)flink_connectors/package.mill(2 hunks)online/package.mill(2 hunks)online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java(1 hunks)online/src/main/java/ai/chronon/online/JavaFetcher.java(2 hunks)online/src/main/scala/ai/chronon/online/Api.scala(2 hunks)online/src/main/scala/ai/chronon/online/CatalystUtil.scala(0 hunks)online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala(1 hunks)online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala(1 hunks)online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala(0 hunks)online/src/main/scala/ai/chronon/online/HTTPKVStore.scala(1 hunks)online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala(0 hunks)online/src/main/scala/ai/chronon/online/TileCodec.scala(2 hunks)online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala(2 hunks)online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala(2 hunks)online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala(1 hunks)online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala(1 hunks)online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala(1 hunks)online/src/main/scala/ai/chronon/online/stats/DriftStore.scala(0 hunks)online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala(1 hunks)online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala(1 hunks)python/package.mill(1 hunks)service/package.mill(2 hunks)service_commons/package.mill(2 hunks)spark/package.mill(3 hunks)spark/src/main/scala/ai/chronon/spark/Analyzer.scala(1 
hunks)spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/Driver.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/Extensions.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/GroupBy.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/Join.scala(4 hunks)spark/src/main/scala/ai/chronon/spark/JoinBase.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/JoinUtils.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/batch/Eval.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala(0 hunks)spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala(2 hunks)spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala(1 
hunks)spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala(2 hunks)spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala(9 hunks)spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala(1 hunks)spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala(1 hunks)
💤 Files with no reviewable changes (43)
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
- api/src/main/scala/ai/chronon/api/Builders.scala
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
- api/src/main/scala/ai/chronon/api/Extensions.scala
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
🚧 Files skipped from review as they are similar to previous changes (44)
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
- .github/workflows/reusable_spark_tests.yaml
- online/src/main/scala/ai/chronon/online/Api.scala
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
- .github/workflows/test_spark.yaml
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- .github/workflows/test_non_spark.yaml
- .github/workflows/test_grype.yaml
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
- online/package.mill
- api/src/main/scala/ai/chronon/api/QueryUtils.scala
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
- spark/src/main/scala/ai/chronon/spark/Join.scala
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
- .github/workflows/push_to_canary.yaml
- spark/src/main/scala/ai/chronon/spark/Driver.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
- spark/src/main/scala/ai/chronon/spark/Extensions.scala
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
- online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
- api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
- python/package.mill
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
- api/package.mill
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
🧰 Additional context used
🧠 Learnings (23)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- cloud_gcp/package.mill
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- build.mill
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- online/src/main/java/ai/chronon/online/JavaFetcher.java
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
- online/src/main/scala/ai/chronon/online/TileCodec.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Applied to files:
online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
- cloud_gcp/package.mill
- spark/package.mill
- cloud_aws/package.mill
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Applied to files:
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2025-10-28T15:02:05.707Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala:42-47
Timestamp: 2025-10-28T15:02:05.707Z
Learning: In Flink jobs (e.g., SparkExpressionEvalFn.scala), the metric group uses "feature_group" as the tag name, not "group_by", because existing dashboards depend on this naming convention.
Applied to files:
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2025-05-02T16:19:11.001Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.
Applied to files:
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Applied to files:
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-31T18:27:44.973Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.
Applied to files:
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-15T15:33:22.265Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Applied to files:
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
- online/src/main/java/ai/chronon/online/JavaFetcher.java
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-01-15T21:00:35.574Z
Learnt from: david-zlai
Repo: zipline-ai/chronon PR: 222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
🧬 Code graph analysis (10)
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (4)
- api/src/main/java/ai/chronon/api/thrift/Option.java (1)
  - `Option` (25-143)
- spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  - `toSeq` (29-29)
- online/src/main/scala/ai/chronon/online/JoinCodec.scala (1)
  - `toMap` (51-51)
- api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  - `query` (413-421)
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (2)
  - `ScalaJavaConversions` (5-96)
  - `IteratorOps` (50-54)
spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (4)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
  - `toChrononSchema` (53-55)
- api/src/main/scala/ai/chronon/api/Builders.scala (1)
  - `GroupBy` (141-167)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1)
  - `runTimestampChecks` (459-488)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
  - `inputDf` (479-518)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
- api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)
  - `empty` (78-78)
online/src/main/scala/ai/chronon/online/TileCodec.scala (2)
- api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  - `unWindowed` (301-319)
  - `unpack` (275-297)
- api/src/main/scala/ai/chronon/api/DataType.scala (1)
  - `unpack` (235-235)
online/src/main/java/ai/chronon/online/JavaFetcher.java (4)
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (2)
  - `fetchGroupBys` (183-185)
  - `fetchJoin` (187-234)
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (1)
  - `fetchGroupBys` (150-266)
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
  - `ScalaJavaConversions` (5-96)
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
  - `listJoins` (166-218)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (1)
- spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1)
  - `toSeq` (29-29)
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
  - `fetch` (48-53)
  - `fetch` (57-61)
  - `fetch` (65-71)
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)
- flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala (1)
  - `DeserializationSchemaBuilder` (31-62)
- flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (1)
  - `ProjectedEvent` (95-95)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5)
  - `online` (149-149)
  - `online` (177-181)
  - `Fetcher` (55-147)
  - `Fetcher` (152-759)
  - `Response` (68-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
- GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: build
- GitHub Check: python_lint
- GitHub Check: python_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (61)
online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2)
31-35: Import narrowing to `scala.collection.mutable` is correct. Using `SortedMap` from `scala.collection.immutable` and relying on Predef's `Seq` while importing only `mutable` is idiomatic and avoids the redundant `Seq` import. No behavior change.
192-213: Returning `newAcc.toSeq` makes the `listJoins` API safely immutable. Switching from returning the mutable `ArrayBuffer` to `newAcc.toSeq` ensures callers get an immutable `Seq[String]`, preventing accidental external mutation and aligning with the `Future[Seq[String]]` signature.
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)
66-81: Seq normalization looks good. Normalizing `query.wheres` to `Seq[String]` via `.asScala.toSeq` and building `baseWheres ++ timeWheres ++ Seq(s"($keyWhereOption)")` is type-safe, preserves ordering, and matches the updated `QueryUtils.build` API.
spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1)
28-28: LGTM! Clean removal of unused import. Aligns with the Scala 2.13 migration.
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2)
4-4: Import update for `ProjectedEvent` looks correct. `ProjectedEvent` is used for `resultList` and matches the type returned by `buildSourceProjectionDeserSchema`; cleaning up the unused schema import is fine.
75-75: Explicit `.toSeq` keeps behavior and clarifies the return type. Converting `resultList.asScala` to `Seq` before `map(_.fields)` is behaviorally equivalent here and matches the `Seq[Map[String, Any]]` return type.
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
46-46: LGTM! Import cleanup for Scala 2.13 compatibility. `Seq` resolves via Predef.
spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1)
54-54: LGTM! Import cleanup consistent with the Scala 2.13 upgrade pattern.
spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1)
41-41: LGTM! Unused import removal aligns with the Scala 2.13 migration.
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3)
19-27: Import changes look good for Scala 2.13 compatibility. The wildcard import on line 21 brings in `ExternalSourceHandler`, used by the handler classes below.
47-54: LGTM! The signature update to `scala.Seq` is correct for Scala 2.13 compatibility.
56-62: LGTM! Handler correctly migrated from `JavaExternalSourceHandler` to the unified `ExternalSourceHandler` trait.
spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3)
11-11: The explicit `scala.collection.Map` import is fine. It keeps signatures consistent with the generic `Map` while preserving behavior.
24-26: `list.asScala.toSeq` is a safer materialization. It avoids exposing a mutable view and makes the `Seq[String]` contract explicit; good for 2.13 collections.
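The materialization point can be seen in a minimal sketch, using the plain `scala.jdk` converters standing in for the project's `ScalaJavaConversions` helpers (the filter strings here are illustrative):

```scala
import scala.jdk.CollectionConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("ds = '2024-01-01'")

// asScala is a live wrapper over the Java list; toSeq materializes an immutable copy (2.13)
val view = javaList.asScala
val snapshot: Seq[String] = javaList.asScala.toSeq

// mutate the underlying Java list after both conversions
javaList.add("user_id IS NOT NULL")
```

The wrapper sees the later mutation while the materialized `Seq` does not, which is exactly why `.toSeq` matters at API boundaries.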
41-47: `allFilters.toSeq.toJava` improves Java interop robustness. It ensures `setWheres` always sees a proper `Seq` before conversion; matches the Scala/Java conversion intent.
online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1)
24-26: No actionable issues found; the import change is benign. The 47 call sites across the codebase pass `api.StructType` objects (using the second overload) or construct sequences that resolve correctly. Removing `import scala.collection.Seq` does not tighten the type signature: the unqualified `Seq` resolves to `scala.Seq` (on 2.12 an alias for the covariant `scala.collection.Seq`; on 2.13 an alias for `scala.collection.immutable.Seq`). No call sites pass explicit `mutable.Seq` instances, and all current usage patterns remain valid.
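The 2.13 resolution can be made concrete with a quick check (compiles on Scala 2.13+, where the Predef alias points at the immutable hierarchy):

```scala
// On Scala 2.13+, the unqualified Seq alias is scala.collection.immutable.Seq
implicitly[Seq[Int] =:= scala.collection.immutable.Seq[Int]]

val xs: Seq[Int] = Seq(1, 2, 3)
val isImmutable = xs.isInstanceOf[scala.collection.immutable.Seq[_]]
```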
16-21: LGTM! Import changes align with the Scala 2.13 migration pattern. The `ai.chronon.eval._` wildcard import brings in the eval result types used throughout this file.
189-196: LGTM! Consistent with the `toScala` pattern used elsewhere (e.g., `GroupBy.scala` line 488). The `.toSeq` ensures type compatibility with the `runTimestampChecks` signature.
260-268: LGTM! Correctly migrates the Java Map → Scala Map conversion using `JMapOps.toScala`. The `.toMap` ensures an immutable `Map[String, String]` for `checkKeySchema`.
online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (1)
270-270: LGTM - Scala 2.13 migration. Replacing the deprecated `mapValues` with an eager `map` is correct.
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2)
22-22: LGTM: Necessary import for the Scala 2.13 migration. The import enables consistent Java-to-Scala iterator conversion.
Based on learnings, `ScalaJavaConversions` imports are correct for this migration.
87-97: LGTM: Clean Scala 2.13 collection conversion migration. The changes normalize Java-Scala interop and improve error handling. The explicit `scala.collection.Seq[Any]` type correctly handles both mutable and immutable sequences, and the fallback case provides clearer error messages than the previous `MatchError`.
spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1)
38-38: Import removal aligns with Scala 2.13. Removing the `scala.collection.Seq` import is correct; Scala 2.13's default `Seq` from Predef suffices.
online/src/main/scala/ai/chronon/online/TileCodec.scala (3)
32-32: LGTM! Scala 2.13 import cleanup. Removing `Seq` from the explicit imports is correct; it's available in Predef.
40-41: LGTM! Correct materialization for Scala 2.13. The `.toSeq` materializes the collection after `.asScala`, which is required in Scala 2.13.
49-50: LGTM! Consistent with `buildRowAggregator`. Same correct pattern: materialize with `.toSeq` after the `flatMap`.
.github/workflows/reusable_non_spark_tests.yaml (2)
1-12: Workflow conversion to the reusable pattern with a version input looks solid. The transition from trigger-based to `workflow_call` with a required `scala-version` input is clean. Environment variable binding and description are appropriate.
34-34: Mill cross-version syntax verified; no issues found. The escaping is correct and all modules use the proper `Cross[Module](versions)` pattern. At runtime, `\[` and `\]` in the workflow resolve to literal `[` and `]`, producing valid Mill commands like `./mill flink[2.13.17].test`.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)
106-106: Version 2.3 is valid and supported—confirm YARN resource calculator impact.Version 2.3 changes YARN's default resource calculator from DefaultResourceCalculator to DominantResourceCalculator, which impacts autoscaling behavior. This is a significant internal change from the deprecated 1.2. Verify that existing job configurations and autoscaling policies work as expected with this new calculator.
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
30-34: The `ContextualHandler.fetch` signature looks consistent. Using `scala.Seq` in the param and return types aligns this override with the updated fetcher APIs elsewhere; the body remains behaviorally unchanged.
online/src/main/java/ai/chronon/online/JavaFetcher.java (3)
137-142: Response conversion loop looks correct. Building `jResps` from the Scala `List` via an iterator is straightforward and preserves ordering; no issues seen here.
163-167: Fetcher entrypoints now consistently use a Java `List` plus `ScalaJavaConversions.toScala`. `fetchGroupBys`, `fetchJoin`, and `fetchJoinBase64Avro` all follow the same pattern of converting Java requests to a `List<Fetcher.Request>` and only then to a Scala `Seq` at the boundary; this looks clean and consistent.

Also applies to: 171-177, 181-187
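A minimal sketch of that boundary, using the stdlib `scala.jdk.javaapi` converters in place of the project's `ScalaJavaConversions`/`FutureConverters` helpers (the join names are illustrative):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.jdk.javaapi.{CollectionConverters, FutureConverters}

// a Scala Future[Seq[String]] as a fetcher-style API would return it
val scalaResult: Future[Seq[String]] = Future(Seq("join_a", "join_b"))

// expose it to Java callers: CompletionStage plus java.util.List at the boundary
val javaFuture: java.util.concurrent.CompletableFuture[java.util.List[String]] =
  FutureConverters.asJava(scalaResult.map(s => CollectionConverters.asJava(s))).toCompletableFuture

val javaList = javaFuture.get()
```

Doing the collection conversion inside the `map` keeps the Scala side idiomatic and hands Java callers a ready-to-use `CompletableFuture<List<String>>`.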
205-205: The `listJoins` Java-facing conversion looks good. The `FutureConverters` + `ScalaJavaConversions::toJava` pipeline cleanly exposes `Future[Seq[String]]` as `CompletableFuture<List<String>>` without extra boilerplate.
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)
35-35: LGTM - Scala 2.13 import cleanup. Correct to remove the `Seq` import since the code now uses explicit `scala.Seq`.
294-294: Return type change verified as safe. All 6 callers use only polymorphic Seq methods (`.isEmpty`, `.foreach`, `.size`, comparisons) compatible with immutable `Seq`.
api/src/main/scala/ai/chronon/api/DataType.scala (5)
237-241: LGTM! Standard `Seq` trait implementations correctly delegate to the underlying array.
242-242: LGTM! The new helper method correctly finds a field's type by name.
244-251: LGTM! Correctly casts Map values to an Array based on the struct schema. Missing fields default to null.
253-260: LGTM! Correctly casts Map values with type coercion. Consistent with `castArr` behavior.
234-235: The `Seq` type change is verified compatible with the codebase. The search found no incompatible usages. All `StructType` instances in the codebase use read-only operations (`.fields`, `.map`, `.iterator`). The change to `Seq` (immutable in Scala 2.13) aligns with the implementation's existing contract of only providing immutable operations. Internal code handles this correctly.

Note: this is a public API breaking change for external consumers. Consider documenting the Scala 2.13 migration requirement if applicable.
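The pattern under review reduces to a small hypothetical sketch (names like `MyStructType`, `typeOf`, and `castArr` are illustrative, not the repo's actual `DataType.scala` code): a struct type that is itself a `Seq` of its fields by delegating `apply`/`length`/`iterator` to the backing array, plus a helper that orders `Map` values by schema position and fills missing fields with null.

```scala
case class Field(name: String, typ: String)

// hypothetical: delegates the Seq contract to the backing array
case class MyStructType(name: String, fields: Array[Field]) extends Seq[Field] {
  def apply(idx: Int): Field = fields(idx)
  def length: Int = fields.length
  def iterator: Iterator[Field] = fields.iterator

  // look up a field's type by name
  def typeOf(fieldName: String): Option[String] =
    fields.find(_.name == fieldName).map(_.typ)

  // order Map values by schema position; missing fields become null
  def castArr(valueMap: Map[String, Any]): Array[Any] =
    fields.map(f => valueMap.getOrElse(f.name, null))
}

val st = MyStructType("user", Array(Field("id", "long"), Field("name", "string")))
val row = st.castArr(Map("id" -> 1L))
```

Because the type extends `Seq[Field]`, callers get `.map`, `.find`, `.iterator`, and friends for free while the array stays the single source of truth.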
spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1)
5-7: The `Seq` import removal looks safe. Using unqualified `Seq` in `multiGet`/`multiPut` is fine, since `scala.Predef` keeps `Seq` in scope for both 2.12 and 2.13, so dropping `scala.collection.Seq` here should not affect compilation, and the `Future` usage can continue to rely on the implicit `ExecutionContext` provided by `KVStore`. Based on learnings, this matches how other `KVStore` implementations are written.

Also applies to: 17-21
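A minimal sketch of that convention (a hypothetical `KVStoreLike` trait, not the real `KVStore`): the base trait supplies the implicit `ExecutionContext`, so implementations can build `Future`s without importing one.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// hypothetical base trait: provides the implicit EC for all implementations
trait KVStoreLike {
  implicit val executionContext: ExecutionContext = ExecutionContext.global
  def multiGet(keys: Seq[String]): Future[Seq[Option[String]]]
}

class InMemoryStore(data: Map[String, String]) extends KVStoreLike {
  override def multiGet(keys: Seq[String]): Future[Seq[Option[String]]] =
    Future(keys.map(data.get)) // runs on the trait-provided executionContext
}

val store = new InMemoryStore(Map("a" -> "1"))
val result = Await.result(store.multiGet(Seq("a", "b")), 5.seconds)
```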
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (2)
16-16: Method not implemented. `multiGet` returns `???`. If this method is called, it will throw `NotImplementedError`.

⛔ Skipped due to learnings
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
18-37: Code is correct; signatures match. The trait uses unqualified `Seq[Boolean]`, and the override uses `scala.Seq[Boolean]`; both resolve to the same type in Scala 2.13+. The implementation's unqualified `Seq` return values align with the trait's return type. No type mismatch exists.
build.mill (2)
314-324: Publishing configuration looks good. Standard POM settings and version management from an environment variable.
132-280: GCS publishing: gcloud is present in CI, but the mill tasks are defined without being wired in. The workflows (.github/workflows/push_to_canary.yaml and .github/workflows/publish_release.yaml) configure google-github-actions/auth@v2 and google-github-actions/setup-gcloud@v2 and already run `gcloud storage cp`/`objects update`, so gcloud is available and authenticated in CI. build.mill adds `GcsPublishModule` (trait at build.mill:133) with `publishGcsAssembly()` (≈line 182) and `publishGcsJar()` (≈line 233) and mixes it into `BaseModule` (≈line 283), but no workflow or other file invokes those mill tasks. If CI should call them, add a mill run step in the appropriate job (and ensure auth + setup-gcloud are present in that job).
cloud_gcp/package.mill (2)
8-11: Cross-building wired correctly. The module extends `Cross[CloudGcpModule]` and uses `crossValue` for dependencies.
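For reference, the wiring looks roughly like this in a Mill build file (module names, versions, and dependencies here are illustrative, not the repo's actual package.mill contents):

```scala
import mill._
import mill.scalalib._

val scalaVersions = Seq("2.12.18", "2.13.17")

object online extends Cross[OnlineModule](scalaVersions)
trait OnlineModule extends ScalaModule with Cross.Module[String] {
  def scalaVersion = crossValue
}

object cloudGcp extends Cross[CloudGcpModule](scalaVersions)
trait CloudGcpModule extends ScalaModule with Cross.Module[String] {
  def scalaVersion = crossValue
  // pick the matching cross variant of the sibling module
  def moduleDeps = Seq(online(crossValue))
}
```

Selecting `online(crossValue)` in `moduleDeps` is what keeps each cross variant depending on the sibling module built for the same Scala version.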
57-59: Test cross-wiring looks good. The test module correctly references the `crossValue` variants.
flink/package.mill (2)
6-11: Cross-building implementation correct. Follows the standard pattern with `Cross.Module[String]` and `crossValue`-aware dependencies.
42-43: Test wiring correct.
aggregator/package.mill (2)
6-9: Cross-building correct.
22-23: Test wiring correct.
flink_connectors/package.mill (2)
6-11: Cross-building correct.
27-28: Test wiring correct.
service_commons/package.mill (1)
22-23: Test wiring correct.
spark/package.mill (2)
7-10: Cross-building correct.
28-30: Test wiring correct.
cloud_aws/package.mill (2)
6-9: Cross-building correct.
25-28: Test wiring correct.
service/package.mill (3)
2-2: The import looks correct for Mill cross-building. `mill.api._` provides the necessary cross-building utilities.
6-8: Cross-building setup looks good. Standard Mill pattern: `Cross[ServiceModule]` with the `Cross.Module[String]` trait, and `crossValue` properly wiring version-specific dependencies.
48-55: Test module wiring is correct. Using `crossValue` for the module dependency and `super.mvnDeps()` to properly inherit base test dependencies from `BaseJUnit4TestModule`.
Actionable comments posted: 0
♻️ Duplicate comments (1)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)
145-194: Java→Scala request conversion and callsites now align with Fetcher APIs. `convertJavaRequestList` returning `List<Fetcher.Request>` and wrapping with `ScalaJavaConversions.toScala(...)` at all callsites fixes the earlier type mismatch; flow is consistent across groupBy/join/model-transforms paths. Minor nit: you could pre-size `scalaRequests` via `new ArrayList<>(requests.size())` to avoid internal resizing.
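The pre-sizing nit can be shown with a small, self-contained Scala sketch. `JavaRequest` and `toScalaRequest` here are hypothetical stand-ins for the real Fetcher types; only the allocation pattern is the point.

```scala
import java.util.{ArrayList => JArrayList, List => JList}

// Stand-in for JavaFetcher's request type (illustrative only).
case class JavaRequest(name: String) {
  def toScalaRequest: String = name // real code would build a Fetcher.Request
}

def convertJavaRequestList(requests: JList[JavaRequest]): JList[String] = {
  // Pre-sizing to requests.size() allocates the backing array once,
  // instead of growing it repeatedly while elements are appended.
  val converted = new JArrayList[String](requests.size())
  requests.forEach(r => converted.add(r.toScalaRequest))
  converted
}
```

The capacity hint is a micro-optimization; it matters mostly for large request batches on the hot fetch path.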
🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
618-641: External-request dedup refactor preserves behavior; slight DRY opportunity. Using `validRequests.flatMap` plus `parts.iterator().asScala.map(...)` is clearer and still relies on cached `getJoinConf` data. You could optionally reuse the initial `joinConf` instead of calling `metadataStore.getJoinConf(joinRequest.name)` again when computing `parts`, but as written it's functionally correct.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (4 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala
online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
🧬 Code graph analysis (2)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
ScalaJavaConversions(5-96)
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (1)
online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (1)
fetchModelTransforms(67-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
- GitHub Check: python_tests
- GitHub Check: python_lint
- GitHub Check: build
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (6)
online/src/main/java/ai/chronon/online/JavaFetcher.java (2)
127-143: Response conversion from Scala future looks correct. Manual iteration over `resps.responses().toList()` into a pre-sized `ArrayList` is fine and keeps metrics instrumentation unchanged.
196-200: `listJoins` Scala→Java conversion is straightforward. Converting the Scala `Seq` from `metadataStore.listJoins` to `java.util.List` via `ScalaJavaConversions.toJava` is appropriate here.
online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (3)
67-87: `fetchModelTransforms` scala.Seq migration looks consistent. Using `scala.Seq[Request]`/`Future[scala.Seq[Response]]` at the API boundary while keeping internal logic with `Seq.empty` and `requests.map(...)` is fine and aligns with the project-wide scala.Seq normalization.
270-275: Safe conversion of SQL results to `Map[String, AnyRef]`. Using `results.head.map { (k, v) => k -> v.asInstanceOf[AnyRef] }` under `results.nonEmpty` is a clean replacement for `mapValues` and keeps the expected key/value types.
277-285: Passthrough field filtering is correct and avoids deprecated APIs. `baseData.filter { case (k, _) => fieldsToPassthrough.contains(k) }` with a precomputed `Set` preserves the semantics of the old `filterKeys` while being 2.13-friendly.
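For readers tracking the 2.13 migration, both rewrites can be sketched in one minimal, self-contained snippet. The data here is illustrative, standing in for the real SQL results and passthrough field set.

```scala
// Illustrative inputs, standing in for the real SQL results / passthrough set.
val results: Seq[Map[String, Any]] = Seq(Map("id" -> 1, "score" -> 0.9))
val fieldsToPassthrough: Set[String] = Set("id")

// 2.13-friendly replacement for .mapValues (which now returns a lazy MapView):
// map eagerly, producing a strict Map with the desired value type.
val baseData: Map[String, AnyRef] =
  if (results.nonEmpty) results.head.map { case (k, v) => k -> v.asInstanceOf[AnyRef] }
  else Map.empty

// 2.13-friendly replacement for .filterKeys (also deprecated/lazy in 2.13):
// filter entries against a precomputed Set for O(1) membership checks.
val passthrough: Map[String, AnyRef] =
  baseData.filter { case (k, _) => fieldsToPassthrough.contains(k) }
```

Both forms compile identically on 2.12, so they are safe to land ahead of the cross-build.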
43-48: Model-transforms path and scala.Seq usage look sound. Importing `scala.collection.mutable` matches the later `LinkedHashMap`/`HashMap` usages.
`fetchModelTransforms` now:
- Accepts `scala.Seq[Request]`,
- Groups by `req.name` and runs `fetchModelTransformsWithConf` per group,
- Pairs each response with its original index and, after `Future.sequence`, flattens and sorts by index.
This preserves ordering and cleanly handles both inline and metadata-driven modelTransforms configs; `fetchModelTransformsWithConf` correctly reuses the same `ModelTransformsFetcher` for join-source and direct paths.
Also applies to: 236-308
Force-pushed 01c074c to 95972c2 (compare)
Actionable comments posted: 1
♻️ Duplicate comments (3)
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)
145-199: Minor perf/naming tweaks for request conversion and callsites.
- In `convertJavaRequestList`, consider pre-sizing the list to avoid growth reallocations:

```diff
- List<Fetcher.Request> scalaRequests = new ArrayList<>();
+ List<Fetcher.Request> scalaRequests = new ArrayList<>(requests.size());
```

- `scalaRequests` is now a Java `List`; optionally rename (e.g. `requestsList`) for clarity.
- The `fetchGroupBys`/`fetchJoin`/`fetchJoinBase64Avro`/`fetchModelTransforms`/`listJoins` usages of this helper look consistent and correct.
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (1)
13-13: Remove redundant `.toSeq` after `.toScala`. Since `toScala` on `java.util.List[T]` already returns `Seq[T]` (per ScalaJavaConversions.scala:15), calling `.toSeq` is unnecessary. Change both occurrences to just `.toScala`:
- Line 13: `.map(_.toScala)` instead of `.map(_.toScala.toSeq)`
- Line 131: `.map(_.toScala)` instead of `.map(_.toScala.toSeq)`
Also applies to: 131-131
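As a standard-library illustration of why the trailing `.toSeq` is redundant: the project's `ScalaJavaConversions.toScala` is assumed to wrap the converters below (on 2.13 they live in `scala.jdk.CollectionConverters`; on 2.12 the equivalent is `scala.collection.JavaConverters`), and the converted value already satisfies `Seq`.

```scala
import scala.jdk.CollectionConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")
javaList.add("b")

// asScala on a java.util.List yields a mutable Buffer, which already
// conforms to scala.collection.Seq; a trailing .toSeq only forces an
// extra coercion/copy without changing what callers can do with it.
val asSeq: scala.collection.Seq[String] = javaList.asScala
```

Dropping `.toSeq` keeps call sites shorter and avoids an avoidable allocation on 2.13, where `toSeq` materializes an immutable copy.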
build.mill (1)
18-18: Scala 2.13 enablement: keep in sync with JavaConverters migration review. `scalaVersions` now includes 2.13.17; please only land this once the `JavaConverters` → `CollectionConverters` migration and the 2.13 compilation/tests called out in the earlier review are fully green.
🧹 Nitpick comments (7)
.github/workflows/reusable_non_spark_tests.yaml (1)
34-34: Remove unnecessary bracket escaping from mill commands. The backslash escaping of brackets (`\[` and `\]`) is non-idiomatic in GitHub Actions run steps. Brackets don't require escaping in this context and can be used directly in the command string.
Apply this diff to all mill test commands (shown for flink, repeat for other jobs):

```diff
  - name: Run Flink tests
    run: |
-     ./mill flink\[${{ env.SCALA_VERSION }}\].test
+     ./mill flink[${{ env.SCALA_VERSION }}].test
```

Alternatively, if you prefer explicit quoting for clarity:

```diff
  - name: Run Flink tests
    run: |
-     ./mill flink\[${{ env.SCALA_VERSION }}\].test
+     ./mill "flink[${{ env.SCALA_VERSION }}].test"
```

Also applies to: 55-55, 76-76, 97-97, 118-118, 139-139, 160-160, 181-181
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
87-97: Broaden Java input type match to avoid brittle behavior. Right now only `java.util.ArrayList[Any]` is accepted; any other `java.util.List` (e.g., `LinkedList`, unmodifiable lists) will now hit the `IllegalArgumentException`. Unless callers are guaranteed to always send `ArrayList`, it's safer to match `util.List[_]` instead, and to avoid `Any` in the pattern:

```diff
- val anyIterator = inputVal match {
-   case inputList: util.ArrayList[Any] => inputList.iterator().toScala
-   case inputSeq: scala.collection.Seq[Any] => inputSeq.iterator
+ val anyIterator = inputVal match {
+   case inputList: util.List[_] => inputList.iterator().toScala
+   case inputSeq: scala.collection.Seq[_] => inputSeq.iterator
    case other => throw new IllegalArgumentException(s"Unsupported input type: ${other.getClass.getName}")
  }
```

This keeps the fail-fast behavior while being more tolerant of upstream Java collections and avoiding unchecked-type warnings.
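A runnable sketch of the broadened match, using the standard `asScala` converter in place of the project's `.toScala` extension (the surrounding `ColumnAggregator` context is omitted; only the dispatch logic is shown):

```scala
import java.{util => ju}
import scala.jdk.CollectionConverters._ // 2.13; scala.collection.JavaConverters on 2.12

// Accept any java.util.List implementation plus Scala Seqs; fail fast otherwise.
def toAnyIterator(inputVal: Any): Iterator[Any] = inputVal match {
  case inputList: ju.List[_]             => inputList.iterator().asScala
  case inputSeq: scala.collection.Seq[_] => inputSeq.iterator
  case other =>
    throw new IllegalArgumentException(s"Unsupported input type: ${other.getClass.getName}")
}
```

Matching on the `List` interface rather than the `ArrayList` class means `LinkedList`, `Collections.unmodifiableList`, and Avro's own list types all take the same path.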
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)
80-80: Change is correct; `:+` would be more concise. Functionally equivalent, though `:+ s"($keyWhereOption)"` is the Scala idiom for a single-element append. Acceptable given the PR's standardization goals.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)
108-135: PeripheralsConfig example is fine; note env-specific bits for future enablement. Keeping the PeripheralsConfig / SparkHistoryServer example commented out as inline docs is reasonable and avoids unused-code issues. When you eventually enable it, consider wiring the cluster ID via config instead of a hard-coded `canary` path.
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)
169-179: StagingQueryUtil usage and setups conversion look correct; tiny DRY nit. `StagingQueryUtil.from(...)` + `Option(...).asScala.toSeq.getOrElse(Seq.empty)` is null-safe and matches other Java→Scala conversions in this file. You could very slightly DRY by reusing `stagingQueryConf` for `setups` instead of `stagingQuery.stagingQuery`, but the current code is fine.
build.mill (2)
31-38: Avoid duplicating PomSettings configuration. `BaseModule.pomSettings` is effectively `Constants.pomSettingsFor(artifactName())`; consider delegating to that helper to keep POM metadata in one place.
Also applies to: 313-321
127-130: GCS publish tasks rely on git, gcloud, and the CHRONON_VERSION env var. The `GcsPublishModule` helpers look solid, but they will fail hard if git/gcloud are missing or CHRONON_VERSION is unset (falling back to 0.0.32); make sure your release/CI docs or configs call this out, or disable via `publishToGcsEnabled` where needed.
Also applies to: 132-279, 323-325
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (120)
- .github/workflows/push_to_canary.yaml (2 hunks)
- .github/workflows/reusable_non_spark_tests.yaml (9 hunks)
- .github/workflows/reusable_spark_tests.yaml (11 hunks)
- .github/workflows/test_grype.yaml (3 hunks)
- .github/workflows/test_non_spark.yaml (1 hunks)
- .github/workflows/test_spark.yaml (1 hunks)
- aggregator/package.mill (2 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (0 hunks)
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (0 hunks)
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (0 hunks)
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (0 hunks)
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (0 hunks)
- api/package.mill (2 hunks)
- api/src/main/scala/ai/chronon/api/Builders.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
- api/src/main/scala/ai/chronon/api/Extensions.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (0 hunks)
- api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (3 hunks)
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (2 hunks)
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (0 hunks)
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (0 hunks)
- build.mill (5 hunks)
- cloud_aws/package.mill (2 hunks)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (0 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (0 hunks)
- cloud_gcp/package.mill (3 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (5 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (2 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (3 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (0 hunks)
- flink/package.mill (2 hunks)
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (2 hunks)
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala (1 hunks)
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (0 hunks)
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (0 hunks)
- flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1 hunks)
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (1 hunks)
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1 hunks)
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (0 hunks)
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (2 hunks)
- flink_connectors/package.mill (2 hunks)
- online/package.mill (2 hunks)
- online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1 hunks)
- online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (4 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (0 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
- online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala (3 hunks)
- online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (0 hunks)
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (1 hunks)
- python/package.mill (1 hunks)
- service/package.mill (2 hunks)
- service_commons/package.mill (2 hunks)
- spark/package.mill (3 hunks)
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/Extensions.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Join.scala (4 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (0 hunks)
- spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala (9 hunks)
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala (1 hunks)
💤 Files with no reviewable changes (44)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala
- spark/src/main/scala/ai/chronon/spark/batch/ModularMonolith.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
- api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
- flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
- online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
- online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
- flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
- spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
- flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
- flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
- spark/src/main/scala/ai/chronon/spark/batch/StepRunner.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
- aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
- api/src/main/scala/ai/chronon/api/Extensions.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
- api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
- api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
- spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
- api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
- online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
- api/src/main/scala/ai/chronon/api/CollectionExtensions.scala
- online/src/main/scala/ai/chronon/online/CatalystUtil.scala
- spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
- flink/src/main/scala/ai/chronon/flink/chaining/ChainedGroupByJob.scala
- api/src/main/scala/ai/chronon/api/Builders.scala
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
- online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala
- spark/src/main/scala/ai/chronon/spark/JoinBase.scala
- flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
- api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
✅ Files skipped from review due to trivial changes (2)
- spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
- spark/src/test/scala/ai/chronon/spark/batch/BatchNodeRunnerTest.scala
🚧 Files skipped from review as they are similar to previous changes (39)
- online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
- flink/src/main/scala/ai/chronon/flink/chaining/JoinSourceQueryFunction.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
- spark/src/test/scala/ai/chronon/spark/join/UnionJoinSpec.scala
- api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
- spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
- cloud_gcp/package.mill
- api/src/main/scala/ai/chronon/api/QueryUtils.scala
- spark/src/test/scala/ai/chronon/spark/batch/MergeJobAnalyzeReuseTest.scala
- online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- python/package.mill
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
- spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
- .github/workflows/test_non_spark.yaml
- spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
- online/src/main/scala/ai/chronon/online/fetcher/ModelTransformsFetcher.scala
- flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
- spark/src/main/scala/ai/chronon/spark/Analyzer.scala
- flink/package.mill
- spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
- online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
- aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
- spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/ChainingFetcherTest.scala
- online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
- api/package.mill
- spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
- spark/src/test/scala/ai/chronon/spark/utils/DataFrameGen.scala
- spark/src/main/scala/ai/chronon/spark/Extensions.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
- .github/workflows/test_spark.yaml
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
- spark/src/main/scala/ai/chronon/spark/Driver.scala
- spark/src/test/scala/ai/chronon/spark/fetcher/FetcherMetadataTest.scala
- online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
- spark/src/test/scala/ai/chronon/spark/groupby/GroupByUploadTest.scala
- flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
- spark/src/test/scala/ai/chronon/spark/other/MockKVStore.scala
🧰 Additional context used
🧠 Learnings (24)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2025-10-28T15:00:37.670Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala:99-104
Timestamp: 2025-10-28T15:00:37.670Z
Learning: In flink/src/main/scala/ai/chronon/flink/FlinkGroupByStreamingJob.scala, AvroCodecFn and TiledAvroCodecFn do not require explicit imports from ai.chronon.flink.deser as the code compiles and runs fine without them.
Applied to files:
online/src/main/scala/ai/chronon/online/TileCodec.scala
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
online/src/main/scala/ai/chronon/online/TileCodec.scala
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
.github/workflows/test_grype.yaml
.github/workflows/push_to_canary.yaml
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
service_commons/package.mill
flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
build.mill
flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
online/src/main/java/ai/chronon/online/JavaFetcher.java
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
spark/src/main/scala/ai/chronon/spark/Join.scala
📚 Learning: 2024-11-03T14:51:40.825Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Applied to files:
online/src/main/scala/ai/chronon/online/TileCodec.scala
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Applied to files:
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-31T18:29:45.027Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
- .github/workflows/reusable_spark_tests.yaml
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
- .github/workflows/test_grype.yaml
- .github/workflows/push_to_canary.yaml
- spark/package.mill
- cloud_aws/package.mill
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- online/src/main/java/ai/chronon/online/JavaFetcher.java
📚 Learning: 2025-01-15T21:00:35.574Z
Learnt from: david-zlai
Repo: zipline-ai/chronon PR: 222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Applied to files:
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
- api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
📚 Learning: 2025-09-26T01:29:32.984Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1182
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala:169-191
Timestamp: 2025-09-26T01:29:32.984Z
Learning: In BigTableMetricsKvStore, schema keys (ending with "#schema") are always queried alone in multiGet requests and never mixed with regular metric keys, so there's no concern about timestamp filter leakage between different key types within the same query.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-04-21T15:10:40.819Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-09-24T00:22:38.257Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 1169
File: spark/src/main/scala/ai/chronon/spark/batch/iceberg/IcebergPartitionStatsExtractor.scala:79-94
Timestamp: 2025-09-24T00:22:38.257Z
Learning: BigTableMetricsKvStore defaults tsMillis to System.currentTimeMillis() when not provided in PutRequest, so PutRequests without explicit timestamps are handled correctly by the KV store implementation.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2024-10-17T01:02:40.431Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In `DynamoDBKVStoreImpl.scala`, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic `Exception` type in methods like `extractListValues`.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala
📚 Learning: 2025-05-19T17:50:44.503Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 789
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala:531-533
Timestamp: 2025-05-19T17:50:44.503Z
Learning: Scala 2.12 doesn't have `toIntOption`. For safely parsing strings to integers in Scala 2.12, use `scala.util.Try(string.toInt).getOrElse(...)` or check with regex using `string.matches("\\d+")` before calling `toInt`.
Applied to files:
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
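A minimal sketch of the 2.12-safe parsing pattern described above (helper name is hypothetical):

```scala
import scala.util.Try

// Scala 2.12 has no `toIntOption` (added in 2.13), so fall back to
// Try-based parsing with an explicit default on failure.
def parseIntOrElse(s: String, default: Int): Int =
  Try(s.toInt).getOrElse(default)

parseIntOrElse("42", 0)   // 42
parseIntOrElse("oops", 0) // 0
```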
📚 Learning: 2024-10-07T15:04:30.069Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111
Timestamp: 2024-10-07T15:04:30.069Z
Learning: In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-08T16:18:45.669Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-17T00:12:09.763Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-10-07T15:09:51.567Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Applied to files:
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.
Applied to files:
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
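The Await-avoidance learning above, sketched with an illustrative upload function (not the project's actual API):

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Hypothetical async write; stands in for a KV-store put.
def upload(batch: Seq[Int]): Future[Int] = Future(batch.sum)

// Compose with Future.sequence/map instead of blocking on Await.result;
// the caller decides where (if anywhere) to block.
val total: Future[Int] =
  Future.sequence(Seq(upload(Seq(1, 2)), upload(Seq(3)))).map(_.sum)

total.foreach(t => println(s"uploaded $t"))
```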
📚 Learning: 2024-11-06T21:54:56.160Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Applied to files:
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala
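A rough sketch of the serialization point above, using an illustrative class (not the actual SummaryUploader):

```scala
import scala.concurrent.Future

// Taking an implicit ExecutionContext as a constructor parameter would
// capture it in the serialized closure and can break Spark serialization;
// pulling in the global context at the use site sidesteps that.
class Uploader extends Serializable {
  import scala.concurrent.ExecutionContext.Implicits.global

  def put(value: Int): Future[Int] = Future(value * 2)
}
```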
📚 Learning: 2025-10-28T14:59:45.280Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 1247
File: flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala:107-114
Timestamp: 2025-10-28T14:59:45.280Z
Learning: In JoinEnrichmentAsyncFunction (flink/src/main/scala/ai/chronon/flink/joinrunner/JoinEnrichmentAsyncFunction.scala), events should be swallowed on join fetch errors rather than passing through original events or failing the stream. This prevents Flink app failures/restarts while avoiding incomplete data reaching downstream join source queries that expect enriched fields.
Applied to files:
spark/src/main/scala/ai/chronon/spark/Join.scala
🧬 Code graph analysis (8)
online/src/main/scala/ai/chronon/online/TileCodec.scala (3)
- api/src/main/scala/ai/chronon/api/Extensions.scala (2): `unWindowed` (301-319), `unpack` (275-297)
- spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1): `toSeq` (29-29)
- api/src/main/scala/ai/chronon/api/DataType.scala (1): `unpack` (235-235)
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (2)
- spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (3): `StagingQuery` (16-94), `StagingQuery` (114-183), `from` (152-164)
- api/src/main/scala/ai/chronon/api/Extensions.scala (3): `from` (327-369), `setups` (574-582), `setups` (1127-1131)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
- online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (5): `online` (149-149), `online` (177-181), `Fetcher` (55-147), `Fetcher` (152-760), `Response` (68-68)
- online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1): `fetch` (30-34)
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
- api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (2): `ScalaJavaConversions` (5-96), `IteratorOps` (50-54)
online/src/main/scala/ai/chronon/online/Api.scala (2)
- online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1): `fetch` (30-34)
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (3): `fetch` (48-53), `fetch` (57-61), `fetch` (65-71)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (1)
- spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (1): `toSeq` (29-29)
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (3)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2): `multiGet` (108-118), `multiPut` (308-359)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2): `multiGet` (115-156), `multiPut` (199-219)
- online/src/main/scala/ai/chronon/online/Api.scala (1): `KVStore` (36-53)
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1)
- online/src/main/scala/ai/chronon/online/Api.scala (1): `KVStore` (36-53)
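The `IteratorOps` helper referenced in the code graph above is, in spirit, a Java-to-Scala iterator adapter. A minimal sketch (the project's actual implementation may differ in shape):

```scala
// Wraps java.util.Iterator so Scala combinators (map, filter, grouped)
// become available without copying the underlying data.
implicit class JavaIteratorOps[T](it: java.util.Iterator[T]) {
  def toScala: Iterator[T] = new Iterator[T] {
    override def hasNext: Boolean = it.hasNext
    override def next(): T = it.next()
  }
}

val jList = new java.util.ArrayList[String]()
jList.add("a")
jList.add("b")
val upper = jList.iterator().toScala.map(_.toUpperCase).toList // List("A", "B")
```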
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: build
- GitHub Check: python_tests
- GitHub Check: python_lint
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
- GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
🔇 Additional comments (48)
.github/workflows/push_to_canary.yaml (3)
15-15: Verify SCALA_VERSION aligns with PR intent. The PR title indicates a Scala 2.13 migration, but `SCALA_VERSION` is hardcoded to `2.12.18`. Confirm whether this is intentional (e.g., maintaining backward compatibility as the default) or if it should be updated to a 2.13.x version.
84-88: Versioned Mill targets are properly configured. The escape sequences for `${{ env.SCALA_VERSION }}` are correct, and all five modules (flink, cloud_gcp, cloud_aws, service, flink_connectors) consistently use the versioned target syntax.
94-94: Artifact upload paths consistently use the versioned directory structure. All five artifact uploads have been updated to include the `${{ env.SCALA_VERSION }}` subdirectory in the path. The versioned paths are uniform across the flink, cloud_aws, cloud_gcp, service, and flink_connectors modules.
Also applies to: 100-100, 106-106, 112-112, 118-118
.github/workflows/reusable_spark_tests.yaml (2)
1-12: Reusable workflow refactoring looks solid. Clean conversion to support dynamic Scala version selection. The input parameter is properly required and documented; the environment variable centralizes version management.
35-35: All test jobs consistently updated to use the versioned mill selector. Each job now parameterizes the Scala version via `spark[${{ env.SCALA_VERSION }}]`, enabling cross-version testing. Bracket escaping is correct.
Also applies to: 57-57, 79-79, 101-101, 123-123, 145-145, 167-167, 189-189, 211-211, 233-233
online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (1)
215-221: GenericData.Array wildcard pattern is appropriate. Matching on `GenericData.Array[_]` correctly generalizes the element type, avoiding over-specific type params while preserving the existing iteration behavior.
spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (2)
34-35: Import cleanup aligns with scala.Seq usage. Dropping `scala.collection.Seq` in favor of `scala.collection.mutable` keeps `Seq` as immutable `scala.Seq`, matching the rest of the file and the 2.13 migration direction.
547-566: Correctly convert mutable buffer to immutable `Seq` for exception. Using `inconsistentFields.toSeq` matches the `IncompatibleSchemaException` signature (`Seq[(String, DataType, DataType)]`) and avoids leaking a mutable buffer.
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (1)
31-31: Import cleanup matches 2.13 collections usage. Using `scala.collection.{immutable, mutable}` and dropping `scala.collection.Seq` is correct here: `immutable.Seq`/`mutable.ListBuffer` stay explicit, and unqualified `Seq` cleanly resolves via `scala.Seq`/Predef without changing behavior.
.github/workflows/reusable_non_spark_tests.yaml (1)
1-12: ✓ Workflow call pattern correctly structured. The conversion to `workflow_call` with a parametrized scala-version input is well-designed for cross-version testing. Naming and structure follow GitHub Actions conventions.
spark/src/test/scala/ai/chronon/spark/fetcher/FetcherTestUtil.scala (1)
29-29: LGTM! Correct Scala 2.13 migration: removing the explicit `scala.collection.Seq` import allows the code to use `Predef.Seq` (immutable by default in 2.13).
spark/src/test/scala/ai/chronon/spark/other/SchemaEvolutionTest.scala (1)
39-44: Redundant `Seq` import removal looks safe. Dropping the explicit `scala.collection.Seq` import means `Seq` now resolves to `Predef.Seq` (`immutable.Seq` in 2.13), but all usages here are test-local and use immutable collections anyway, so behavior and test semantics remain unchanged while aligning with the wider 2.13 cleanup.
flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (1)
22-22: Import cleanup LGTM. Removing the unused `scala.collection.Seq` import here (leaving only the blank separator line) keeps the test imports minimal and has no behavioral impact.
flink/src/test/scala/ai/chronon/flink/chaining/ChainedGroupByJobIntegrationTest.scala (1)
20-20: Seq import removal is appropriate. Keeping `JavaConverters` for `.asScala` while dropping the redundant `scala.collection.Seq` import matches the collection-normalization effort and doesn't affect behavior.
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
22-22: IteratorOps import is appropriate here. Using `ScalaJavaConversions.IteratorOps` to wrap `java.util.Iterator` → Scala `Iterator` is consistent with the shared conversion helpers and keeps 2.13 Java/Scala interop localized. Looks good.
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)
66-66: LGTM: explicit `.toSeq` conversion. The `.asScala.toSeq` ensures an immutable `Seq[String]`, which is correct for Scala 2.13 compatibility.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableMetricsKvStore.scala (3)
19-19: LGTM: Import simplification for Scala 2.13. Removing the explicit `Seq` import aligns with cross-version compatibility, as the code now relies on Predef's default `Seq` and uses explicit `.toSeq` conversions where needed.
147-147: LGTM: Explicit Seq conversion for cross-version compatibility. The `.toSeq` ensures the collection is explicitly converted to `Seq[KVStore.TimedValue]`, which is essential for Scala 2.13 compatibility.
236-236: LGTM: Explicit Seq conversion for cross-version compatibility. The `.toSeq` ensures the collection is explicitly converted to `Seq[KVStore.ListValue]`, which is essential for Scala 2.13 compatibility.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocServerlessSubmitter.scala (1)
104-107: RuntimeConfig version bump looks safe. Only the Dataproc runtime version string changes; no other logic around `RuntimeConfig` is modified. From this file's perspective, the change is isolated and looks fine.
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)
11-11: Alias import clarifies which StagingQuery is used. Good call aliasing `ai.chronon.spark.batch.StagingQuery` so the API `StagingQuery` name stays unambiguous in this file.
spark/src/main/scala/ai/chronon/spark/Join.scala (4)
34-34: LGTM! Import cleanup. Removed unused `Seq` from the mutable import, aligning with explicit `scala.Seq` usage elsewhere.
149-150: LGTM! Explicit Seq conversion. The `.toSeq` conversion ensures type consistency for Scala 2.13.
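The `.toSeq` pattern that recurs across these comments comes down to one Scala 2.13 change: unqualified `Seq` is now `scala.collection.immutable.Seq`. A small sketch with illustrative types:

```scala
import scala.collection.mutable

// On 2.13 this signature demands an immutable Seq.
case class Inconsistent(fields: Seq[(String, String)])

val buf = mutable.ListBuffer[(String, String)]()
buf += (("user_id", "LongType"))

// `.toSeq` snapshots the mutable buffer into an immutable Seq,
// so the buffer can no longer leak through the case class.
val err = Inconsistent(buf.toSeq)
```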
208-208: LGTM! Consistent Seq conversion. Explicit `.toSeq` ensures the return type matches the method signature.
329-329: LGTM! Java collection conversion. The `list.asScala.toSeq` pattern correctly converts Java collections to Scala Seq.
spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2)
51-76: LGTM! API signature updates. Method signatures correctly updated to use `scala.Seq` for Scala 2.13 compatibility. The `.toSeq` conversion at line 72 ensures proper type alignment.
89-100: LGTM! Consistent signature update. The `multiPut` signature updated to match the pattern established in `multiGet`.
online/src/main/scala/ai/chronon/online/HTTPKVStore.scala (1)
16-37: LGTM! KVStore signature alignment. Method signatures updated to use `scala.Seq`, consistent with the KVStore trait and other implementations.
api/src/main/scala/ai/chronon/api/DataType.scala (1)
234-261: LGTM! StructType now Seq-compatible. The signature updates and new methods (`iterator`, `stringPrefix`, `typeOf`, `castArr`, `cast`) properly extend StructType to behave as a `Seq[StructField]`, improving ergonomics while maintaining backwards compatibility.
online/src/main/scala/ai/chronon/online/Api.scala (2)
208-211: LGTM! ExternalSourceHandler trait migration. The shift from abstract class to trait with `scala.Seq` signatures aligns with Scala 2.13 patterns and maintains Java interop (traits compile to interfaces).
online/src/main/java/ai/chronon/online/JavaExternalSourceHandler.java (1)
34-56: LGTM! Java interop update. Correctly changed from `extends` to `implements` since `ExternalSourceHandler` is now a Scala trait (compiles to a Java interface).
online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (1)
22-34: LGTM! Signature alignment. Import cleanup and the `fetch` signature updated to use `scala.Seq`, consistent with the ExternalSourceHandler trait changes.
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (2)
19-27: LGTM! Import consolidation. Grouped imports for better readability.
47-72: LGTM! Mock handler migration. All mock external handlers correctly updated to extend the trait-based `ExternalSourceHandler` with `scala.Seq` signatures. The implementations properly return `Future[scala.Seq[Fetcher.Response]]`.
online/src/main/scala/ai/chronon/online/TileCodec.scala (2)
32-32: LGTM: Seq import cleanup aligns with Scala 2.13 defaults. Removing the explicit `Seq` import lets the code use `scala.collection.immutable.Seq` (from Predef), which is the Scala 2.13 standard.
40-40: LGTM: explicit `.toSeq` ensures proper materialization. The `.toSeq` calls guarantee a concrete `Seq` type after `flatMap`, avoiding Scala 2.13 collection wrapper issues.
Also applies to: 49-49
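The trait-to-interface interop noted in the ExternalSourceHandler comments above can be sketched with simplified signatures (the real trait's types differ):

```scala
// A Scala trait with only abstract methods compiles to a JVM interface,
// which is why the Java side switches from `extends` to `implements`.
trait Handler {
  def fetch(requests: scala.Seq[String]): scala.Seq[String]
}

// Scala implementations are unchanged:
class ReverseHandler extends Handler {
  override def fetch(requests: scala.Seq[String]): scala.Seq[String] =
    requests.map(_.reverse)
}

// In Java this now reads:
//   public class JavaHandler implements Handler { ... }
```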
.github/workflows/test_grype.yaml (4)
27-28: Clean SCALA_VERSION environment variable setup. Centralizing the Scala version as an env var is a solid approach for cross-build orchestration.
47-47: Versioned mill targets consistently applied. All four module builds use the `[${{ env.SCALA_VERSION }}]` syntax uniformly.
Also applies to: 50-50, 53-53, 56-56
62-63: Versioned artifact paths are consistent. All Mill output paths follow the pattern `out/<module>/${{ env.SCALA_VERSION }}/assembly.dest/out.jar` and are mirrored in uploads and scans. Paths properly flow through the artifact upload → download → Grype scan stages.
Also applies to: 82-82, 88-88, 94-94, 100-100
52-56: Original concern about Flink JARs is incorrect; the actual issue is a missing cloud_aws build. Verification confirms the Dockerfile does not reference Flink JARs, so lines 52–56 building them are unused for Docker. However, the Dockerfile expects `build_output/cloud_aws_lib_deploy.jar` (for `CLOUD_AWS_JAR_PATH`), but the workflow has no build step for `cloud_aws`. Add the missing build step:

```yaml
- name: Build Cloud AWS JAR
  run: ./mill cloud_aws\[${{ env.SCALA_VERSION }}\].assembly
```

...then in "Prepare build artifacts":

```shell
cp out/cloud_aws/${{ env.SCALA_VERSION }}/assembly.dest/out.jar build_output/cloud_aws_lib_deploy.jar
```

⛔ Skipped due to learnings
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
online/src/main/java/ai/chronon/online/JavaFetcher.java (2)
97-120: Builder section: no behavioral change. Spacing around builder methods only; the fluent API and semantics are unchanged.
127-143: Response conversion path looks correct. Creating a pre-sized `ArrayList` and filling it via the iterator over `resps.responses()` is simple and safe; no issues spotted.
spark/package.mill (1)
7-10: LGTM! Cross-build pattern correctly implemented with the `Cross.Module[String]` trait and `crossValue` wiring.
service/package.mill (1)
6-8: LGTM! Cross-version pattern correctly applied.
aggregator/package.mill (1)
6-9: LGTM! Cross-build wiring is consistent with other modules.
online/package.mill (1)
6-9: LGTM! Cross-version module correctly configured.
cloud_aws/package.mill (1)
2-9: Cross wiring for cloud_aws looks good. `Cross[CloudAwsModule]`, crossValue-based `moduleDeps`, and test `scalaVersion`/`moduleDeps` are consistent; no blockers here.
Also applies to: 24-28
flink_connectors/package.mill (1)
2-12: Cross-version wiring for flink_connectors looks consistent. The cross module, crossValue-based deps, and test `scalaVersion`/`moduleDeps` line up with the new BaseModule/BaseTestModule pattern.
Also applies to: 26-29
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
service/package.mill (1)
51-58: Add `def scalaVersion = crossValue` to the service test module. The service test module (lines 51-59 in service/package.mill) uses `crossValue` in `moduleDeps` but lacks the explicit `scalaVersion = crossValue` definition. BaseJUnit4TestModule doesn't provide this; all other cross-compiled test modules (spark, flink, online, flink_connectors, cloud_gcp, cloud_aws, api, aggregator) explicitly set it in their test blocks. Add the missing line to maintain consistency.
🧹 Nitpick comments (1)
build.mill (1)
183-229: Add pre-flight checks for gcloud availability. Shell commands assume `gcloud` is installed and authenticated. Consider verifying availability before attempting the upload to provide clearer errors.

```diff
 def publishGcsAssembly() = Task.Command {
   if (!publishToGcsEnabled) {
     println(s"GCS publishing disabled for ${artifactName()}")
     ()
   } else {
+    // Verify gcloud is available
+    try {
+      os.proc("gcloud", "version").call(stdout = os.Pipe, stderr = os.Pipe)
+    } catch {
+      case e: Exception =>
+        throw new Exception("gcloud CLI not found. Install and authenticate gcloud before publishing.", e)
+    }
+
     // Get the assembly jar path
     val jarPathRef = assembly()
```

Also applies to: 234-280
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- aggregator/package.mill (2 hunks)
- api/package.mill (2 hunks)
- build.mill (5 hunks)
- cloud_aws/package.mill (2 hunks)
- cloud_gcp/package.mill (3 hunks)
- flink/package.mill (2 hunks)
- flink_connectors/package.mill (2 hunks)
- online/package.mill (2 hunks)
- service/package.mill (2 hunks)
- service_commons/package.mill (2 hunks)
- spark/package.mill (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- aggregator/package.mill
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
- spark/package.mill
- cloud_aws/package.mill
- cloud_gcp/package.mill
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
- service_commons/package.mill
- build.mill
- cloud_gcp/package.mill
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)
- GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
- GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
- GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.13.17) / join_tests
- GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
- GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
- GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
- GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
- GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
- GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
- GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
- GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
- GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
- GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
- GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
- GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
- GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
- GitHub Check: Test Spark (Scala 2.12.18) / join_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
- GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
- GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
- GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
- GitHub Check: build
- GitHub Check: python_lint
- GitHub Check: python_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (10)
spark/package.mill (2)
6-12: LGTM! Cross-version module wiring is clean and consistent with the pattern used across other modules.
29-32: LGTM! Test module properly wires scalaVersion and moduleDeps using crossValue.
cloud_gcp/package.mill (2)
7-14: LGTM! Cross-version wiring follows the established pattern.
59-62: LGTM! Test module correctly uses crossValue for version and dependency wiring.
service/package.mill (1)
5-11: LGTM! Cross-version module pattern is consistent.
flink/package.mill (2)
5-13: LGTM! Cross-version wiring is clean.
43-45: LGTM! Test module properly configured with crossValue.
online/package.mill (2)
5-11: LGTM! Cross-version module structure is consistent with the rest of the codebase.
40-43: LGTM! Test module wiring is correct.
api/package.mill (1)
25-27: CHRONON_VERSION env var is never set in publishing workflows, so Thrift artifacts will always publish as "0.0.32".
Verification found:
- publish_release.yaml and push_to_canary.yaml set VERSION and pass ZIPLINE_VERSION to mill
- api/package.mill lines 25-27 and 44-46 read CHRONON_VERSION (not set)
- The Thrift publishVersion tasks fall back to the hardcoded default "0.0.32"
- Inconsistent env var naming: VERSION (workflows) vs CHRONON_VERSION (code) vs ZIPLINE_VERSION (python wheel)
Either export CHRONON_VERSION in the workflows or align the env var name across all artifact types.
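A minimal sketch of the build-side option, assuming only the env var names and the default quoted in the comment above (the actual task shape in api/package.mill may differ):

```scala
// Hypothetical version-resolution helper for api/package.mill.
// Reads CHRONON_VERSION first, falls back to the VERSION name the
// workflows already export, then to the current hardcoded default.
object VersionResolution {
  val defaultVersion = "0.0.32"

  def resolvedVersion: String =
    sys.env.get("CHRONON_VERSION")
      .orElse(sys.env.get("VERSION")) // accept the name the workflows set today
      .getOrElse(defaultVersion)

  def main(args: Array[String]): Unit =
    println(s"publishing as $resolvedVersion")
}
```

Alternatively, export CHRONON_VERSION alongside VERSION in publish_release.yaml and push_to_canary.yaml so every artifact type resolves the same value.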
val defaultScalaVersion = "2.12.18"
val scalaVersions = Seq(defaultScalaVersion, "2.13.17")
🧩 Analysis chain
Verify JavaConverters migration is complete for Scala 2.13.
Previous reviews found 41 deprecated scala.collection.JavaConverters imports that must be migrated to scala.jdk.CollectionConverters for Scala 2.13 compatibility.
🏁 Script executed:
#!/bin/bash
# Check for remaining deprecated JavaConverters imports
echo "=== Deprecated JavaConverters count ==="
rg -c "scala\.collection\.JavaConverters" --type scala | wc -l
echo "=== New CollectionConverters count ==="
rg -c "scala\.jdk\.CollectionConverters" --type scala | wc -l
echo "=== Files still using JavaConverters ==="
rg "scala\.collection\.JavaConverters" --type scala -l | head -20Length of output: 1464
Migration to scala.jdk.CollectionConverters is incomplete for Scala 2.13 support.
Verification confirms 41 files still use the deprecated scala.collection.JavaConverters import. With Scala 2.13.17 added at line 19, these will emit deprecation warnings under 2.13 and fail any build that treats warnings as errors. Migrate all 41 occurrences to scala.jdk.CollectionConverters:
- spark/src/main/scala/ (11 files)
- spark/src/test/scala/ (5 files)
- online/src/main/scala/ (6 files)
- online/src/test/scala/ (3 files)
- flink/src/test/scala/ (1 file)
- other modules (15 files)
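The mechanical change in each of those files can be sketched as follows (a hedged example; the object and values are illustrative, not taken from the repo):

```scala
// Before (deprecated since Scala 2.13, though still present):
//   import scala.collection.JavaConverters._
// After (native in 2.13; provided for 2.12 by scala-collection-compat):
import scala.jdk.CollectionConverters._

object ConvertersMigrationExample {
  def main(args: Array[String]): Unit = {
    val javaList = java.util.Arrays.asList("a", "b", "c")
    // Java -> Scala: .asScala yields a mutable Buffer view; .toSeq makes it immutable
    val scalaSeq: Seq[String] = javaList.asScala.toSeq
    // Scala -> Java: .asJava wraps the Scala collection without copying
    val roundTrip: java.util.List[String] = scalaSeq.asJava
    println(roundTrip) // prints [a, b, c]
  }
}
```

Since this PR cross-builds 2.12.18 and 2.13.17, the 2.12 side needs the scala-collection-compat dependency for the new import to resolve.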
Summary
Checklist
Summary by CodeRabbit
Infrastructure
API
Chores