[Bug]: No parallelization for ReadFromParquet (or any Python Read transforms) in Spark RDD Runner

### What happened?

The Python `iobase.Read` transform is implemented as a splittable DoFn (SDF). Because the SparkRunner does not support SDFs, every Read operation ends up on a single Spark task/partition, which does not scale for any moderately sized or larger workload. As a workaround, I had to set the option `--experiments=pre_optimize=all`, which expands the SDF into pair + split + read steps. But this option is hidden, undocumented, and effectively magic. I think it would be better if `translations.expand_sdf` were enabled on all runners that don't support SDFs.
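For anyone hitting the same problem, here is a minimal sketch of applying the workaround when building pipeline arguments. The helper name `with_pre_optimize` is hypothetical and only for illustration; the flag itself is the one quoted above, and `--runner=SparkRunner` is assumed to be how the pipeline is launched.

```python
from typing import List, Optional

# Workaround flag from this report. Per the description above, it causes
# the SDF behind Read to be expanded into pair + split + read steps.
PRE_OPTIMIZE_FLAG = "--experiments=pre_optimize=all"


def with_pre_optimize(args: Optional[List[str]] = None) -> List[str]:
    """Return a copy of `args` with the workaround flag appended once.

    Hypothetical helper, not part of Beam.
    """
    result = list(args or [])
    if PRE_OPTIMIZE_FLAG not in result:
        result.append(PRE_OPTIMIZE_FLAG)
    return result


# Assumed launch arguments; adjust for your own job.
pipeline_args = with_pre_optimize(["--runner=SparkRunner"])
print(pipeline_args)
```

The resulting list can then be passed to `PipelineOptions(pipeline_args)` from `apache_beam.options.pipeline_options` when constructing the pipeline.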

### Issue Priority

Priority: 2

### Issue Component

Component: runner-spark

[Bug]: No parallelization for ReadFromParquet (or any Python Read transforms) in Spark RDD Runner #24422
