[SPARK-50851][ML][CONNECT][PYTHON] Express ML params with proto.Expression.Literal
### What changes were proposed in this pull request?
Express ML params with `proto.Expression.Literal`:
1, introduce `Literal.SpecializedArray` for large primitive literal arrays (e.g. initial model coefficients, which can be large)
```
message SpecializedArray {
  oneof value_type {
    Bools bools = 1;
    Ints ints = 2;
    Longs longs = 3;
    Floats floats = 4;
    Doubles doubles = 5;
    Strings strings = 6;
  }
  message Bools {
    repeated bool values = 1;
  }
  message Ints {
    repeated int32 values = 1;
  }
  message Longs {
    repeated int64 values = 1;
  }
  message Floats {
    repeated float values = 1;
  }
  message Doubles {
    repeated double values = 1;
  }
  message Strings {
    repeated string values = 1;
  }
}
```
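As a rough illustration of how a client might pick the `oneof` field for a homogeneous Python list, here is a minimal sketch. It uses plain dicts rather than the generated protobuf classes, and the helper name `to_specialized_array` is hypothetical; it is not the code added by this PR.

```python
from typing import Any, Dict, List

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_specialized_array(values: List[Any]) -> Dict[str, Dict[str, List[Any]]]:
    """Hypothetical helper: choose the SpecializedArray field for a homogeneous list."""
    if not values:
        raise ValueError("cannot infer the element type of an empty list")
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        field = "bools"
    elif all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        if all(INT32_MIN <= v <= INT32_MAX for v in values):
            field = "ints"
        elif all(INT64_MIN <= v <= INT64_MAX for v in values):
            field = "longs"
        else:
            raise OverflowError("integer does not fit in int64")
    elif all(isinstance(v, float) for v in values):
        # Python floats are 64-bit, so this sketch never selects "floats".
        field = "doubles"
    elif all(isinstance(v, str) for v in values):
        field = "strings"
    else:
        raise TypeError("values must all be bool, int, float, or str")
    return {field: {"values": list(values)}}
```

Mixed lists such as `[True, 1]` are rejected in this sketch rather than coerced, since each `SpecializedArray` carries exactly one element type.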
2, Replace `proto.Param` with `proto.Expression` to be consistent with the SQL side
For `Param[Vector]` and `Param[Matrix]`, apply `proto.Expression.Literal.Struct` with the underlying schema of `VectorUDT` and `MatrixUDT`.
E.g. for `Param[Vector]` with value `Vectors.sparse(4, [(1, 1.0), (3, 5.5)])`, the message looks like:
```
literal {
  struct {
    struct_type {
      struct {
        ... <- schema of VectorUDT
      }
    }
    elements {
      byte: 0
    }
    elements {
      integer: 4
    }
    elements {
      specialized_array {
        ints {
          values: 1
          values: 3
        }
      }
    }
    elements {
      specialized_array {
        doubles {
          values: 1
          values: 5.5
        }
      }
    }
  }
}
```
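The mapping from a sparse vector to the four struct elements can be sketched as follows. This is only an illustration with nested dicts standing in for the protobuf message. The function name `sparse_vector_to_struct` is hypothetical, and the `struct_type` schema is left out, just as it is elided in the message above.

```python
from typing import Any, Dict, List, Tuple


def sparse_vector_to_struct(size: int, pairs: List[Tuple[int, float]]) -> Dict[str, Any]:
    """Hypothetical helper: lay out a sparse vector as the struct literal shown above."""
    ordered = sorted(pairs)  # order the (index, value) pairs by index
    indices = [int(i) for i, _ in ordered]
    values = [float(v) for _, v in ordered]
    return {
        "literal": {
            "struct": {
                # struct_type (the VectorUDT schema) is intentionally omitted here.
                "elements": [
                    {"byte": 0},  # first element; 0 in the sparse example above
                    {"integer": size},  # vector size
                    {"specialized_array": {"ints": {"values": indices}}},
                    {"specialized_array": {"doubles": {"values": values}}},
                ]
            }
        }
    }


msg = sparse_vector_to_struct(4, [(1, 1.0), (3, 5.5)])
elements = msg["literal"]["struct"]["elements"]
```

Here `elements` has the same order as the message: the leading byte, the size, the index array, and the value array.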
### Why are the changes needed?
1, to optimize large literal arrays for both ML and SQL (it can be applied on the SQL side later)
2, to be consistent with the SQL side, e.g. the parameterized SQL message:
```
// (Optional) A map of parameter names to expressions.
// It cannot coexist with `pos_arguments`.
map<string, Expression.Literal> named_arguments = 4;
// (Optional) A sequence of expressions for positional parameters in the SQL query text.
// It cannot coexist with `named_arguments`.
repeated Expression pos_arguments = 5;
```
3, to minimize the protobuf changes
### Does this PR introduce _any_ user-facing change?
no, refactor-only
### How was this patch tested?
existing tests
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #49529 from zhengruifeng/ml_proto_expr.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>