[SPARK-41206][SQL][FOLLOWUP] Make result of checkColumnNameDuplication stable to fix COLUMN_ALREADY_EXISTS check failed with Scala 2.13

LuciferYang · beliefer · commit 01b4f380da59 · 2022-12-18T08:44:36.000+08:00
### What changes were proposed in this pull request?
This PR adds a sort before `columnAlreadyExistsError` is thrown, so that the result of `SchemaUtils#checkColumnNameDuplication` is stable.

### Why are the changes needed?
To fix the `COLUMN_ALREADY_EXISTS` check failures with Scala 2.13. `names.groupBy(identity)` returns a `Map` with no guaranteed iteration order, so when more than one column name is duplicated, `collectFirst` can report a different name on Scala 2.13 than on Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GA
- Manual test:

```
dev/change-scala-version.sh 2.13
build/sbt clean "sql/testOnly org.apache.spark.sql.DataFrameSuite" -Pscala-2.13
build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV1Suite" -Pscala-2.13
build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV2Suite" -Pscala-2.13
build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonLegacyTimeParserSuite" -Pscala-2.13
```

All tests passed.

Closes apache#38764 from LuciferYang/SPARK-41206.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
@@ -107,7 +107,7 @@ private[spark] object SchemaUtils {
     val names = if (caseSensitiveAnalysis) columnNames else columnNames.map(_.toLowerCase)
     // scalastyle:on caselocale
     if (names.distinct.length != names.length) {
-      val columnName = names.groupBy(identity).collectFirst {
+      val columnName = names.groupBy(identity).toSeq.sortBy(_._1).collectFirst {
         case (x, ys) if ys.length > 1 => x
       }.get
       throw QueryCompilationErrors.columnAlreadyExistsError(columnName)
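
For illustration only, the patched expression can be exercised outside Spark. The sketch below is not part of the patch: `StableDuplicateCheck` and `firstDuplicate` are hypothetical names, and the helper returns an `Option` instead of calling `.get` and throwing `columnAlreadyExistsError`. It shows that sorting the grouped entries by key makes the reported duplicate independent of the `Map` iteration order.

```scala
object StableDuplicateCheck {
  // Hypothetical helper mirroring the patched expression in SchemaUtils;
  // it is an illustration, not Spark's API.
  def firstDuplicate(names: Seq[String]): Option[String] =
    names
      .groupBy(identity) // Map[String, Seq[String]]: iteration order is unspecified
      .toSeq             // materialize the (name, occurrences) pairs
      .sortBy(_._1)      // order by column name so the pick is deterministic
      .collectFirst {
        case (name, occurrences) if occurrences.length > 1 => name
      }

  def main(args: Array[String]): Unit = {
    // Both "b" and "c" are duplicated; the lexicographically smallest one is reported.
    println(firstDuplicate(Seq("c", "b", "a", "b", "c")))
    // No duplicates: nothing is reported.
    println(firstDuplicate(Seq("x", "y")))
  }
}
```

With the sort in place, the reported name is always the lexicographically smallest duplicated name, whichever Scala version builds the `Map`.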