-
Notifications
You must be signed in to change notification settings - Fork 328
Expose DataStreamWriter.ForeachBatch API #549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@@ -67,32 +66,19 @@ class DotnetBackendHandler(server: DotnetBackend) | |||
writeInt(dos, -1) | |||
} | |||
case "connectCallback" => | |||
val t = readObjectType(dis) | |||
var t = readObjectType(dis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make it simpler and allow only one call to "connectCallback"
src/csharp/Microsoft.Spark.E2ETest/IpcTests/Sql/Streaming/DataStreamWriterTests.cs
Outdated
Show resolved
Hide resolved
src/scala/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackClient.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
@@ -86,13 +88,17 @@ public void TestForeachBatch() | |||
.WriteStream() | |||
.ForeachBatch((df, id) => | |||
{ | |||
df.Write().Csv(Path.Combine(dstTempDirectory.Path, id.ToString())); | |||
Func<Column, Column> innerUdf = Udf<int, int>(i => i + 200); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work until the bug re broadcast variables is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this is related to the broadcast variable bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s_jvmBroadcastVariables is [ThreadStatic]
and is null
when the ForeachBatchCallbackHandler
is called (since it runs in a separate worker task.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok. it's in a critical path (CreatePythonFunction
). :)
- Start CallbackServer and Worker as threads instead of tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early feedback (4 C# files remaining)
src/csharp/Microsoft.Spark/Interop/Ipc/ForeachBatchCallbackHandler.cs
Outdated
Show resolved
Hide resolved
// Tests can subclass this to get Console output to display when using | ||
// xUnit testing framework. | ||
// Workaround found at https://github.com/microsoft/vstest/issues/799 | ||
public class MakeConsoleWork : IDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this class please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed.
.../microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/DotnetBackendHandler.scala
Outdated
Show resolved
Hide resolved
src/scala/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackClient.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all files now. Looking really good.
.../microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/DotnetBackendHandler.scala
Outdated
Show resolved
Hide resolved
src/scala/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/DotnetBackend.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
src/scala/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackClient.scala
Outdated
Show resolved
Hide resolved
src/csharp/Microsoft.Spark/Interop/Ipc/ForeachBatchCallbackHandler.cs
Outdated
Show resolved
Hide resolved
src/scala/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/DotnetBackend.scala
Outdated
Show resolved
Hide resolved
...la/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/api/dotnet/CallbackConnection.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @suhsteve for working so hard on this!
@imback82 thanks for reviewing! Let me add the relevant files to |
This PR exposes the
DataStreamWriter.ForeachBatch
API#208
Users can call this API by using an
Action<DataFrame, long>
delegate. For exampleLogs produced on the dotnet side:
Logs produced on the JVM side: