Skip to content

[SPARK-1436] In-memory columnar storage bug fixes #374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 20 commits into from

Conversation

liancheng
Copy link
Contributor

Fixed several bugs of in-memory columnar storage to make HiveInMemoryCompatibilitySuite pass.

@rxin @marmbrus It is reasonable to include HiveInMemoryCompatibilitySuite in this PR, but I didn't, since it significantly increases test execution time. What do you think?

UPDATE HiveCompatibilitySuite has been made to cache tables in memory. HiveInMemoryCompatibilitySuite was removed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@marmbrus
Copy link
Contributor

marmbrus commented Apr 9, 2014

A few notes.

  • This is tracked by SPARK-1436
  • We are trying to include this in 1.0. If we can't we will need to flip off cachingTables or something as this currently results in the SparkSQL returning wrong answers.
  • Running the Hive tests found tons of bugs, so I think we need to include them until there is better coverage for columnar stuff in unit tests. SPARK-1455 will make the increase in test time less of an issue.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13967/

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13968/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13978/

@marmbrus
Copy link
Contributor

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13980/

@liancheng
Copy link
Contributor Author

@marmbrus I think we'd better remove either HiveInMemoryCompatabilitySuite or HiveCompatabilitySuite. Travis complains that the build time is too long (> 50 min). I prefer removing HiveCompatabilitySuite since the in-memory version should have already covered it.

@marmbrus
Copy link
Contributor

Okay, that's a good point. Why don't you just move the before and after
functions that turn on caching into hive compatibility and remove the in
memory sub class.
On Apr 11, 2014 10:18 AM, "Cheng Lian" [email protected] wrote:

@marmbrus https://github.com/marmbrus I think we'd better remove either
HiveInMemoryCompatabilitySuite or HiveCompatabilitySuite. Travis
complains that the build time is too long (> 50 min). I prefer removing
HiveCompatabilitySuite since the in-memory version should have already
covered it.

Reply to this email directly or view it on GitHubhttps://github.com//pull/374#issuecomment-40227452
.

@liancheng
Copy link
Contributor Author

OK, merged these two giant suites, hope both Travis and Jenkins are happy with this.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14055/

@marmbrus
Copy link
Contributor

Hmm, this failure seems to be related to the new Python Spark SQL API (which isn't even part of this PR). I will investigate.

@pwendell
Copy link
Contributor

@marmbrus thanks for catching this I've submitted #393 to fix it.

@marmbrus
Copy link
Contributor

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14115/

@marmbrus
Copy link
Contributor

@pwendell this is ready to merge.

@pwendell
Copy link
Contributor

Thanks merged into master and 1.0

asfgit pushed a commit that referenced this pull request Apr 14, 2014
Fixed several bugs of in-memory columnar storage to make `HiveInMemoryCompatibilitySuite` pass.

@rxin @marmbrus It is reasonable to include `HiveInMemoryCompatibilitySuite` in this PR, but I didn't, since it significantly increases test execution time. What do you think?

**UPDATE** `HiveCompatibilitySuite` has been made to cache tables in memory. `HiveInMemoryCompatibilitySuite` was removed.

Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes #374 from liancheng/inMemBugFix and squashes the following commits:

6ad6d9b [Cheng Lian] Merged HiveCompatibilitySuite and HiveInMemoryCompatibilitySuite
5bdbfe7 [Cheng Lian] Revert 882c538 & 8426ddc, which introduced regression
882c538 [Cheng Lian] Remove attributes field from InMemoryColumnarTableScan
32cc9ce [Cheng Lian] Code style cleanup
99382bf [Cheng Lian] Enable compression by default
4390bcc [Cheng Lian] Report error for any Throwable in HiveComparisonTest
d1df4fd [Michael Armbrust] Remove test tables that might always get created anyway?
ab9e807 [Michael Armbrust] Fix the logged console version of failed test cases to use the new syntax.
1965123 [Michael Armbrust] Don't use coalesce for gathering all data to a single partition, as it does not work correctly with mutable rows.
e36cdd0 [Michael Armbrust] Spelling.
2d0e168 [Michael Armbrust] Run Hive tests in-memory too.
6360723 [Cheng Lian] Made PreInsertionCasts support SparkLogicalPlan and InMemoryColumnarTableScan
c9b0f6f [Cheng Lian] Let InsertIntoTable support InMemoryColumnarTableScan
9c8fc40 [Cheng Lian] Disable compression by default
e619995 [Cheng Lian] Bug fix: incorrect byte order in CompressionScheme.columnHeaderSize
8426ddc [Cheng Lian] Bug fix: InMemoryColumnarTableScan should cache columns specified by the attributes argument
036cd09 [Cheng Lian] Clean up unused imports
44591a5 [Cheng Lian] Bug fix: NullableColumnAccessor.hasNext must take nulls into account
052bf41 [Cheng Lian] Bug fix: should only gather compressibility info for non-null values
95b3301 [Cheng Lian] Fixed bugs in IntegralDelta
(cherry picked from commit 7dbca68)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 7dbca68 Apr 14, 2014
@liancheng liancheng changed the title [BUGFIX] In-memory columnar storage bug fixes [SPARK-1436] In-memory columnar storage bug fixes Apr 15, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Fixed several bugs of in-memory columnar storage to make `HiveInMemoryCompatibilitySuite` pass.

@rxin @marmbrus It is reasonable to include `HiveInMemoryCompatibilitySuite` in this PR, but I didn't, since it significantly increases test execution time. What do you think?

**UPDATE** `HiveCompatibilitySuite` has been made to cache tables in memory. `HiveInMemoryCompatibilitySuite` was removed.

Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#374 from liancheng/inMemBugFix and squashes the following commits:

6ad6d9b [Cheng Lian] Merged HiveCompatibilitySuite and HiveInMemoryCompatibilitySuite
5bdbfe7 [Cheng Lian] Revert 882c538 & 8426ddc, which introduced regression
882c538 [Cheng Lian] Remove attributes field from InMemoryColumnarTableScan
32cc9ce [Cheng Lian] Code style cleanup
99382bf [Cheng Lian] Enable compression by default
4390bcc [Cheng Lian] Report error for any Throwable in HiveComparisonTest
d1df4fd [Michael Armbrust] Remove test tables that might always get created anyway?
ab9e807 [Michael Armbrust] Fix the logged console version of failed test cases to use the new syntax.
1965123 [Michael Armbrust] Don't use coalesce for gathering all data to a single partition, as it does not work correctly with mutable rows.
e36cdd0 [Michael Armbrust] Spelling.
2d0e168 [Michael Armbrust] Run Hive tests in-memory too.
6360723 [Cheng Lian] Made PreInsertionCasts support SparkLogicalPlan and InMemoryColumnarTableScan
c9b0f6f [Cheng Lian] Let InsertIntoTable support InMemoryColumnarTableScan
9c8fc40 [Cheng Lian] Disable compression by default
e619995 [Cheng Lian] Bug fix: incorrect byte order in CompressionScheme.columnHeaderSize
8426ddc [Cheng Lian] Bug fix: InMemoryColumnarTableScan should cache columns specified by the attributes argument
036cd09 [Cheng Lian] Clean up unused imports
44591a5 [Cheng Lian] Bug fix: NullableColumnAccessor.hasNext must take nulls into account
052bf41 [Cheng Lian] Bug fix: should only gather compressibility info for non-null values
95b3301 [Cheng Lian] Fixed bugs in IntegralDelta
@liancheng liancheng deleted the inMemBugFix branch July 3, 2014 21:27
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Jul 25, 2017
Otherwise we can get a Scalastyle error when building from SBT.
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
Otherwise we can get a Scalastyle error when building from SBT.
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Use different properties and existed id to test confliction cases
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants