[SPARK-4237][BUILD] Fix MANIFEST.MF in maven assembly jar #3103
Conversation
Test build #22914 has started for PR 3103 at commit
Hey @scwf, I'm also working on this. As you've mentioned, the cause of this error is that somehow the assembly jar built by Maven misses a legitimate MANIFEST.MF, while HiveThriftServer2 relies on the `Implementation-Version` field there. Of course, the MANIFEST.MF issue should be fixed. On the other hand, while writing the version inspection code, I failed to notice the existing `SparkContext.version`.
Test build #22914 has finished for PR 3103 at commit
|
Test PASSed.
@liancheng, agreed. This PR writes the right info into MANIFEST.MF and can also fix the Beeline issue, but Guava's entries are still left there; we should clean them up.
Just opened #3105 to fix SPARK-4225. SPARK-4237 still needs to be fixed, but I'm not sure whether adding implementation/specification fields can solve this issue entirely. I haven't tried this PR locally, but I guess it only adds the required fields to the Guava MANIFEST.MF (which replaces the Spark MANIFEST.MF in the assembly jar)?
Yes, that's why I said we should clean up the Guava MANIFEST.MF.
Then would you mind adding a [WIP] tag to your PR title?
<manifestEntries>
  <Implementation-Vendor>org.apache.spark</Implementation-Vendor>
  <Implementation-Title>spark-assembly</Implementation-Title>
  <Implementation-Version>1.2.0-SNAPSHOT</Implementation-Version>
Can you use the project.version instead of hard-coding this? I don't think you need or want to specify all this stuff beyond maybe vendor and version. There is no "specification" here.
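A sketch of what this suggestion could look like, keeping only the entries discussed (the exact entry set here is an assumption, not the final patch); `${project.version}` is the standard Maven property that resolves to the version declared in the POM:

```xml
<manifestEntries>
  <Implementation-Vendor>org.apache.spark</Implementation-Vendor>
  <Implementation-Title>spark-assembly</Implementation-Title>
  <!-- resolves to the POM's version, so it stays correct across releases -->
  <Implementation-Version>${project.version}</Implementation-Version>
</manifestEntries>
```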
Sure. Here I was just referring to the MANIFEST.MF produced by SBT.
But this is a change to the Maven build. SBT isn't used to produce the artifacts.
To make this clear: this PR just adds the needed info to the manifest file to fix this problem. Now I am trying to remove Guava's MANIFEST.MF and make a new one (just like SBT's). I am not a Maven/build expert; @srowen, is this OK? Any ideas?
Just FYI, in #3105, HiveThriftServer2 doesn't depend on MANIFEST.MF anymore.
Right.
It's not good to depend on MANIFEST.MF if it's avoidable. I don't understand the discussion about changing Guava's MANIFEST.MF; surely that's irrelevant. Maven does not use Guava's MANIFEST.MF in the assembly; it does combine entries from dependency manifests. Are you saying that the combined MANIFEST.MF includes these fields from Guava, not HiveThriftServer? Really, the combined MANIFEST.MF should blank these values if anything. It sounds like we can avoid this mechanism entirely, except for Beeline; but does that cause a problem other than logging "???"?
Yeah, I think it's the combined MANIFEST.MF, and I misunderstood here. The MANIFEST.MF is as follows:
Now with this MANIFEST.MF, Beeline does not work (we cannot use it to connect to the Thrift server); logging "???" is just a small issue.
How about changing the assembly build to blank these properties by default, instead of randomly inheriting them from Guava, and then setting them in the build profile that requires Beeline? But I don't see how setting "1.2.0-SNAPSHOT", the Spark version, helps Beeline; it would be wrong to set this value in the Spark assembly anyway. Really this is a Beeline problem that prevents it from being repackaged. Are you sure this is a problem? The error you cite does not appear related.
You mean to set these in a Maven profile?
@srowen How about this change? It just adds Implementation-* to the MANIFEST.MF. I think it's OK since the SBT assembly jar's MANIFEST.MF also has them.
Test build #22925 has started for PR 3103 at commit
@srowen Sorry for not being clear enough about the Guava stuff. The situation is: currently, if you build the Spark master branch (say c8abddc) with Maven, the META-INF/MANIFEST.MF file within the assembly jar is copied from Guava 15.0 and doesn't contain legitimate information about Spark. That's why I said this PR only adds implementation/specification fields to the Guava manifest. Only Maven suffers from this issue; the SBT build is fine. In SBT, we discard all MANIFEST.MF files in dependency jars in the assembly merge strategies, and SBT is able to generate a proper MANIFEST.MF for the Spark assembly jar. However, if we exclude MANIFEST.MF in assembly/pom.xml, no MANIFEST.MF is left in the assembly jar.
@liancheng the MANIFEST.MF is not copied from any particular place, but is merged from all dependency manifests. This is good for libraries that need some custom property to be preserved, although I'm not aware of a particular example of that. The problem is that common properties like this collide and we end up with some arbitrary value from an arbitrary dependency. The manifest is still 'proper' but some values aren't helpful. Yes, they should be blanked out or set to something Spark-specific. The Maven build can easily discard all of these values like SBT. I'm not suggesting no MANIFEST.MF. But I still don't see how this can help Beeline. This PR sets values that are definitely wrong for it.
Test build #22925 has finished for PR 3103 at commit
|
Test PASSed.
@srowen Hm, I don't think the MANIFEST.MF file is a merged version. Here is its full content, which is exactly the same as the one I extracted from guava-14.0.1.jar (it's Guava 14.0.1 rather than the Guava 15.0 I mentioned above):
The assembly jar was built with the following command (I was debugging a Jenkins build failure then; the
Then I simply extracted the assembly jar with
Hm, the shade plugin can merge MANIFEST.MF files and I thought it was set to, but maybe it just lets whole files collide and Guava's wins. It may be best to exclude all MANIFEST.MF files and explicitly control what goes into the final one. I'm still wondering how this interacts with the Beeline issue, though. If this change "works" but is not setting a valid value for Beeline, then is it really the source of the issue?
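For reference, a sketch of the exclude-then-regenerate approach discussed here, using maven-shade-plugin's filters together with its ManifestResourceTransformer (the entry values are illustrative, and later comments in this thread report the exclusion may be ignored once manifestEntries are set):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <!-- drop every dependency's manifest so no arbitrary one "wins" -->
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/MANIFEST.MF</exclude>
        </excludes>
      </filter>
    </filters>
    <transformers>
      <!-- then write Spark-specific entries into a fresh manifest -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
        <manifestEntries>
          <Implementation-Version>${project.version}</Implementation-Version>
        </manifestEntries>
      </transformer>
    </transformers>
  </configuration>
</plugin>
```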
Beeline just looks for the MANIFEST.MF of the jar it is loaded from, and prints whatever is written in the `Implementation-Version` field. However, IMO, the Beeline version info problem is not that important. What are important are two other things:
@scwf indicated that the Beeline issue was the cause of the TProtocolException above. If it isn't, and #3105 fixes SPARK-4225, then we're just left with improving the build by cleaning up the final MANIFEST.MF in the Maven build. How about trying just excluding MANIFEST.MF entirely and adding no entries? You can see how other files in META-INF are already excluded; just omit MANIFEST.MF too. Then this PR is just a nice-to-have on top of that, to add back a few basic entries that are appropriate for Spark.
Ah, actually both the Beeline problem and the TProtocolException are caused by the wrong MANIFEST.MF file, since they both rely on Implementation-Version :) Agree with the other parts.
So now I move the core dependency to the first position and use a transformer to overwrite Implementation-* and Specification-*; then the MANIFEST.MF is the same as SBT's.
Test build #22985 has started for PR 3103 at commit
Test build #22985 has finished for PR 3103 at commit
Test PASSed.
<artifactId>guava</artifactId>
<scope>compile</scope>
</dependency>
<!-- Note: place this first to ensure the proper MANIFEST.MF -->
This is not the way to control MANIFEST.MF, since this ordering is not guaranteed to affect anything at all. You probably want to exclude MANIFEST.MF from the dependencies entirely.
Hi @srowen, if we exclude MANIFEST.MF from the dependencies entirely, there will be no MANIFEST.MF in the assembly jar. I failed to find a way to add a MANIFEST.MF in the shade plugin (assembly jar); it seems we can only set manifestEntries but cannot create a new MANIFEST.MF in the shade plugin.
Another issue is that if we exclude MANIFEST.MF but set manifestEntries, the exclusion becomes invalid (see #3103 (comment)).
Having no MANIFEST.MF file isn't a problem per se, if we don't want it to contain anything anyway. Why not exclude MANIFEST.MF and also set the entries you want? I don't see that option tried. But the latter is really optional; you don't even need MANIFEST.MF at all.
So here I place the core dependency first and overwrite the version info by setting manifestEntries. Any advice on how to make a clean MANIFEST.MF in the shade plugin?
Another solution is to publish an empty jar with a proper MANIFEST.MF for this issue, but I don't suggest that because it makes things more complex.
Ordering does not necessarily guarantee this result. Why do you want MANIFEST.MF at all? The discussion here suggests it is not and does not need to be used.
@srowen "Why not exclude MANIFEST.MF and also set entries you want?"
@scwf Yes, that's what I mean. I don't even think it's necessary to set Spark properties.
@srowen Actually I have tried that, and the result is that the MANIFEST.MF is still in the assembly jar. I mean, when setting manifestEntries,
the excludes are ignored. So your other suggestion is to not include a MANIFEST.MF in the assembly jar at all, right? If so, the SBT assembly jar and the Maven assembly jar will behave differently for MANIFEST.MF; is that OK?
@scwf Yes. Again, what is in MANIFEST.MF in this case? Just the props you set? That's fine. Or: how about no MANIFEST.MF? Is there a problem with that? I don't think what SBT does matters, but what does SBT produce? You said it already discards other third-party MANIFEST.MF files.
SBT produces the MANIFEST.MF as follows; it discards the other third-party MANIFEST.MF files but fills in Spark's:
@srowen, OK, I will exclude MANIFEST.MF so that there is no MANIFEST.MF in the Maven assembly jar.
OK. This PR should produce the same output in the Maven build. If you mean that setting manifest entries causes it to ignore excludes for MANIFEST.MF... hm, really? That seems wrong, like a bug. Next best is to simply exclude MANIFEST.MF from the Maven build.
@srowen Yes, in my local test it is, and the MANIFEST.MF is inherited from the first dependency.
Now this PR makes Maven produce the same output as SBT. You can give it a test.
…rsion This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks. Another related PR is #3103, which tries to fix the MANIFEST issue. Author: Cheng Lian <[email protected]> Closes #3105 from liancheng/spark-4225 and squashes the following commits: d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version (cherry picked from commit 86e9eaa) Signed-off-by: Michael Armbrust <[email protected]>
retest this please
Test build #23116 has started for PR 3103 at commit
Test build #23116 has finished for PR 3103 at commit
Test PASSed.
ping @pwendell |
Ping |
I think this should be closed unless we can figure out why excluding MANIFEST.MF doesn't work.
Yes, this solution is a hack and I have not found a proper way to fix it. We no longer use MANIFEST.MF in sql/beeline, so I'm closing this.
Build Spark with Maven as follows:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package
Then run the Spark SQL JDBC/ODBC server; the output will be:
JackydeMacBook-Pro:spark1 jackylee$ bin/beeline
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Beeline version ??? by Apache Hive
Then use beeline to connect to the Thrift server and get this error:
14/11/04 11:30:31 INFO ObjectStore: Initialized ObjectStore
14/11/04 11:30:31 INFO AbstractService: Service:ThriftBinaryCLIService is started.
14/11/04 11:30:31 INFO AbstractService: Service:HiveServer2 is started.
14/11/04 11:30:31 INFO HiveThriftServer2: HiveThriftServer2 started
14/11/04 11:30:31 INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000
14/11/04 11:33:26 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
14/11/04 11:33:26 INFO HiveMetaStore: No user is added in admin role, since config is empty
14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
14/11/04 11:33:26 ERROR TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value!
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240)
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
at org.apache.thrift.TUnion.write(TUnion.java:152)
at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:456)
at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:406)
at org.apache.hive.service.cli.thrift.TGetInfoResp.write(TGetInfoResp.java:341)
at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3754)
at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3718)
at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result.write(TCLIService.java:3669)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Actually, this is because there is no Implementation-Version in the MANIFEST.MF for Beeline when building with Maven. https://github.com/apache/hive/blob/release-0.13.1/beeline/src/java/org/apache/hive/beeline/BeeLine.java#L289
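To illustrate the kind of lookup BeeLine performs (the class below is a hypothetical demo, not code from this PR): it reads the Implementation-Version main attribute from a jar manifest via java.util.jar, and when that field is absent, as in the Guava-derived manifest, only "???" is left to print.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestVersionDemo {
    // Build an in-memory manifest carrying the field this PR adds,
    // then read Implementation-Version back the way BeeLine does.
    public static String readVersion() {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.IMPLEMENTATION_VERSION, "1.2.0-SNAPSHOT");

        String v = attrs.getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        // With the Guava-derived manifest this field is missing,
        // and BeeLine is left printing "???" instead of a version.
        return v == null ? "???" : v;
    }

    public static void main(String[] args) {
        System.out.println(readVersion());
    }
}
```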