-
Notifications
You must be signed in to change notification settings - Fork 28.7k
SPARK-1181. 'mvn test' fails out of the box since sbt assembly does not necessarily exist #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…require the assembly to pass
The standard maven build procedure should be to run Requiring maven users to invoke "sbt assembly" not only forces downloading SBT itself, but also ends up duplicating artifacts in .ivy2 and .m2. |
Merged build triggered. |
Merged build started. |
In Maven, you can run tests that depend on packages/assemblies during Maven's This approach would require us to move the integration tests into a separate test directory or package. |
Merged build finished. |
One or more automated tests failed |
OK that works, to package and then test. In the canonical Maven lifecycle, packaging comes after test, so test would not depend on packaging. In practice this is at worst a fine workaround. I agree that making these integration tests is likely the right-est answer. I will close this however as I think that's for another day; other bigger build changes might make this point irrelevant. |
Add pyspark test
[SPARK-24917] make chunk size configurable
[SPARK-24917] make chunk size configurable
…-ccm Fixes issue apache#76
### What changes were proposed in this pull request? This PR added a physical rule to remove redundant project nodes. A `ProjectExec` is redundant when 1. It has the same output attributes and order as its child's output when ordering of these attributes is required. 2. It has the same output attributes as its child's output when attribute output ordering is not required. For example: After Filter: ``` == Physical Plan == *(1) Project [a#14L, b#15L, c#16, key#17] +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 5)) +- *(1) ColumnarToRow +- FileScan parquet [a#14L,b#15L,c#16,key#17] ``` The `Project a#14L, b#15L, c#16, key#17` is redundant because its output is exactly the same as filter's output. Before Aggregate: ``` == Physical Plan == *(2) HashAggregate(keys=[key#17], functions=[sum(a#14L), last(b#15L, false)], output=[sum_a#39L, key#17, last_b#41L]) +- Exchange hashpartitioning(key#17, 5), true, [id=#77] +- *(1) HashAggregate(keys=[key#17], functions=[partial_sum(a#14L), partial_last(b#15L, false)], output=[key#17, sum#49L, last#50L, valueSet#51]) +- *(1) Project [key#17, a#14L, b#15L] +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 100)) +- *(1) ColumnarToRow +- FileScan parquet [a#14L,b#15L,key#17] ``` The `Project key#17, a#14L, b#15L` is redundant because hash aggregate doesn't require child plan's output to be in a specific order. ### Why are the changes needed? It removes unnecessary query nodes and makes query plan cleaner. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests Closes #29031 from allisonwang-db/remove-project. Lead-authored-by: allisonwang-db <[email protected]> Co-authored-by: allisonwang-db <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
The test suite requires that "sbt assembly" has been run in order for some tests (like DriverSuite) to pass. The tests themselves say as much.
This means that a "mvn test" from a fresh clone fails.
There's a pretty simple fix, to have Maven's test-compile phase invoke "sbt assembly". I suppose the only downside is re-invoking "sbt assembly" each time tests are run.
I'm open to ideas about how to set this up more intelligently but it would be a generally good thing if the Maven build's tests passed out of the box.