Move JRuby extension to SnakeYAML Engine #612

Merged
merged 8 commits into from
Jan 20, 2023

Conversation

headius
Contributor

@headius headius commented Jan 13, 2023

See jruby/jruby#7570 for some of the justification for this move. We only require the parser from SnakeYAML, but in the original form it is encumbered with Java object serialization code that keeps getting flagged as a CVE risk. We disagree with the assessment, at least as it pertains to JRuby (we do not use the code in question) but our inclusion of the library continues to get flagged by auditing tools.

This PR starts the process of moving to the successor library, SnakeYAML Engine. The parser API is largely unchanged, except as seen in this commit. No Java exceptions are thrown, but a number of Psych tests fail (possibly due to Engine being YAML 1.2 only).

@headius headius changed the title Initial move to SnakeYAML Engine Move JRuby extension to SnakeYAML Engine Jan 13, 2023
@asomov

asomov commented Jan 13, 2023

The failures might be caused by the schema. Version 2.5 uses the JSON schema; version 2.6 (released yesterday) also supports the Core schema, which is what SnakeYAML used. The Core schema must be enabled explicitly, though (JSON is the default).
More on schemas: https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas
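
To make the schema difference concrete, here is a minimal Ruby sketch of how a few plain scalars resolve under the JSON and Core schemas described in chapter 10 of the YAML 1.2.2 spec. The constants `JSON_RULES` and `CORE_RULES` and the helper `resolve_kind` are hypothetical names made up for this illustration; they are not part of Psych or SnakeYAML Engine. Only a subset of the rules is covered (no floats, no empty-scalar null), and falling back to `:str` for unmatched scalars is a simplifying assumption.

```ruby
# Illustration only: a subset of the tag-resolution patterns from
# YAML 1.2.2 chapter 10. Not the implementation used by any library.
JSON_RULES = {
  null: /\Anull\z/,
  bool: /\A(?:true|false)\z/,
  int:  /\A-?(?:0|[1-9][0-9]*)\z/
}.freeze

CORE_RULES = {
  null: /\A(?:null|Null|NULL|~)\z/,
  bool: /\A(?:true|True|TRUE|false|False|FALSE)\z/,
  int:  /\A(?:[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+)\z/
}.freeze

# Return the first kind whose pattern matches, or :str as a fallback
# (the fallback is an assumption made for this sketch).
def resolve_kind(scalar, rules)
  match = rules.find { |_kind, pattern| pattern.match?(scalar) }
  match ? match.first : :str
end

%w[~ True 0x1F 12].each do |scalar|
  json = resolve_kind(scalar, JSON_RULES)
  core = resolve_kind(scalar, CORE_RULES)
  puts "#{scalar.inspect}: JSON=#{json} Core=#{core}"
end
```

Scalars such as `~`, `True`, and `0x1F` resolve differently under the two schemas in this sketch, which is the kind of mismatch that could make tests written against SnakeYAML's behavior fail under the JSON default.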

@headius
Contributor Author

headius commented Jan 13, 2023

@asomov Thanks for that, but it doesn't seem to impact the failures too much. I get 32F and 2E with my latest local fixes. Nearly all of these fail due to the %YAML 1.2 header being emitted where no header was emitted before. Forcing the version to 1.1 reduces it to 27F. I don't see a way to turn off the explicit header.

I'll push my latest after some cleanup.

@asomov

asomov commented Jan 13, 2023

@headius it looks like a bug - I am pretty sure that the %YAML 1.2 directive is not emitted by default. I will check...

@headius
Contributor Author

headius commented Jan 13, 2023

@asomov I figured out how to disable it, by passing an empty Optional into the DocumentStartEvent constructor. Now I'm down to 7F, 1E...mostly failing tests that want the YAML directive to be present. 🤣

This eliminates the %YAML 1.2 directive at the start of each emit,
which improves tests passing but also breaks a few tests that
*expect* the YAML directive to be present.
@headius headius mentioned this pull request Jan 13, 2023
@headius
Contributor Author

headius commented Jan 13, 2023

With the latest push, I'm down to only one error! The header problem was my code not honoring Psych's way of indicating whether a version header should be printed.
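
For readers unfamiliar with how Psych exposes this choice at the Ruby level, the following is a minimal sketch using the documented `header:` option of `Psych.dump`. It only shows the user-facing behavior; how the JRuby extension passes this through to SnakeYAML Engine is not shown here, and the sample hash is invented for illustration.

```ruby
require "psych"

# Sample data invented for this illustration.
fare = { "fareref" => "DOGMA", "currency" => "GBP" }

# By default no %YAML directive is requested.
plain = Psych.dump(fare)

# The :header option asks the emitter to write a %YAML directive.
with_header = Psych.dump(fare, header: true)

puts plain
puts with_header
```

The exact version number printed in the directive depends on the backend, so the sketch makes no claim about it.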

The remaining error is a test case that triggers a SyntaxError in Engine:

        assert_to_yaml(
            [{"arrival"=>"EDI", "departure"=>"LAX", "fareref"=>"DOGMA", "currency"=>"GBP"}, {"arrival"=>"MEL", "departure"=>"SYD", "fareref"=>"MADF", "currency"=>"AUD"}, {"arrival"=>"MCO", "departure"=>"JFK", "fareref"=>"DFSF", "currency"=>"USD"}], <<EOY
  -
    &F fareref: DOGMA
    &C currency: GBP
    &D departure: LAX
    &A arrival: EDI
  - { *F: MADF, *C: AUD, *D: SYD, *A: MEL }
  - { *F: DFSF, *C: USD, *D: JFK, *A: MCO }
EOY
        )

The <<EOY ... EOY is a multiline string. I'm not sure what this error means.

@headius
Contributor Author

headius commented Jan 13, 2023

Oops, here's that error:

Error: test_spec_anchors_and_aliases(Psych_Unit_Tests): Psych::SyntaxError: (<unknown>): expected ',' or '}', but got <scalar> while parsing a flow mapping at line 6 column 11

Ruby trace:

Error: test_spec_anchors_and_aliases(Psych_Unit_Tests): Psych::SyntaxError: (<unknown>): expected ',' or '}', but got <scalar> while parsing a flow mapping at line 6 column 11
org/jruby/ext/psych/PsychParser.java:274:in `_native_parse'
/home/headius/work/psych/lib/psych/parser.rb:62:in `parse'
/home/headius/work/psych/lib/psych.rb:455:in `parse_stream'
/home/headius/work/psych/lib/psych.rb:399:in `parse'
/home/headius/work/psych/lib/psych.rb:323:in `safe_load'
/home/headius/work/psych/lib/psych.rb:369:in `load'
/home/headius/work/psych/test/psych/helper.rb:45:in `assert_to_yaml'
/home/headius/work/psych/test/psych/test_yaml.rb:219:in `test_spec_anchors_and_aliases'

Full trace with Java lines (JIT threshold 0 to reduce interpreter frames):

Error: test_spec_anchors_and_aliases(Psych_Unit_Tests): Psych::SyntaxError: (<unknown>): expected ',' or '}', but got <scalar> while parsing a flow mapping at line 6 column 11
java/lang/Thread.java:1610:in `getStackTrace'
org/jruby/runtime/backtrace/TraceType.java:247:in `getBacktraceData'
org/jruby/runtime/backtrace/TraceType.java:53:in `getBacktrace'
org/jruby/RubyException.java:402:in `captureBacktrace'
org/jruby/exceptions/RaiseException.java:208:in `preRaise'
org/jruby/exceptions/RaiseException.java:65:in `<init>'
org/jruby/exceptions/Exception.java:39:in `<init>'
org/jruby/exceptions/StandardError.java:38:in `<init>'
org/jruby/exceptions/RuntimeError.java:38:in `<init>'
org/jruby/RubyRuntimeError.java:52:in `constructThrowable'
org/jruby/RubyException.java:364:in `toThrowable'
org/jruby/RubyKernel.java:922:in `raise'
org/jruby/ext/psych/PsychParser.java:407:in `raiseParserException'
org/jruby/ext/psych/PsychParser.java:274:in `parse'
/home/headius/work/psych/lib/psych/parser.rb:62:in `parse'
/home/headius/work/psych/lib/psych.rb:455:in `parse_stream'
/home/headius/work/psych/lib/psych.rb:399:in `parse'
/home/headius/work/psych/lib/psych.rb:323:in `safe_load'
/home/headius/work/psych/lib/psych.rb:369:in `load'
org/jruby/internal/runtime/methods/CompiledIRMethod.java:139:in `call'
org/jruby/internal/runtime/methods/CompiledIRMethod.java:162:in `call'
org/jruby/internal/runtime/methods/MixedModeIRMethod.java:185:in `call'
org/jruby/RubyClass.java:530:in `finvokeWithRefinements'
org/jruby/RubyBasicObject.java:1715:in `send'
org/jruby/RubyKernel.java:2235:in `send'
org/jruby/RubyKernel$INVOKER$s$send.gen:-1:in `call'
org/jruby/internal/runtime/methods/JavaMethod.java:444:in `call'
/home/headius/work/psych/test/psych/helper.rb:45:in `assert_to_yaml'
org/jruby/internal/runtime/methods/CompiledIRMethod.java:139:in `call'
org/jruby/internal/runtime/methods/MixedModeIRMethod.java:112:in `call'
org/jruby/ir/targets/indy/InvokeSite.java:208:in `invoke'
/home/headius/work/psych/test/psych/test_yaml.rb:219:in `test_spec_anchors_and_aliases'

@headius
Contributor Author

headius commented Jan 13, 2023

@hsbt @tenderlove This is very close to 100%. My reading of the YAML 1.2 spec seems to indicate that it should be mostly a superset of 1.1, but maybe you have some concerns about the JRuby extension updating to 1.2 before the C extension.

For context, see jruby/jruby#7570 where we are dealing with some CVE reports against the old SnakeYAML. To summarize, SnakeYAML has features that allow dumping and loading Java objects, similar to Psych's support for dumping and loading Ruby objects. And like Psych, SnakeYAML has had this feature reported as being exploitable. There's ongoing debate about how to handle this in SnakeYAML, since the exploit is the feature (object serialization), but we can dodge the issue by moving to SnakeYAML Engine since it does not have the same feature.

Note that JRuby has never used this serialization feature, but just having SnakeYAML ship with JRuby causes us to be flagged as well.
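
As a rough illustration of the Psych side of that comparison, the sketch below dumps a Ruby object and shows that `Psych.safe_load` refuses the resulting object tag unless the class is explicitly permitted. The `Fare` class is hypothetical and exists only for this example; nothing here exercises SnakeYAML or SnakeYAML Engine.

```ruby
require "psych"

# Hypothetical class used only for this illustration.
class Fare
  def initialize(ref)
    @ref = ref
  end
end

# Dumping an arbitrary object produces a !ruby/object tag.
yaml = Psych.dump(Fare.new("DOGMA"))

# safe_load rejects tags for classes that were not permitted.
begin
  Psych.safe_load(yaml)
  rejected = false
rescue Psych::DisallowedClass
  rejected = true
end

puts yaml
puts "safe_load rejected the object tag: #{rejected}"
```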

@asomov

asomov commented Jan 14, 2023

@headius sorry, I did not quite catch how I can help. The Engine indeed had some modifications to be closer to the YAML 1.2 spec.

@headius
Contributor Author

headius commented Jan 14, 2023

@asomov I don't know how to interpret that syntax error. It may be a valid rejection of off-spec YAML.

@hsbt
Member

hsbt commented Jan 16, 2023

@headius Thanks for your work. I'm +1 on migrating from SnakeYAML to SnakeYAML Engine for JRuby.

It's a great first step toward supporting the YAML 1.2 spec, since I haven't understood the details of the YAML 1.2 spec yet 😇

@headius
Contributor Author

headius commented Jan 18, 2023

The last two things that fail are both within test_spec_anchors_and_aliases:

        assert_to_yaml(
            [{"arrival"=>"EDI", "departure"=>"LAX", "fareref"=>"DOGMA", "currency"=>"GBP"}, {"arrival"=>"MEL", "departure"=>"SYD", "fareref"=>"MADF", "currency"=>"AUD"}, {"arrival"=>"MCO", "departure"=>"JFK", "fareref"=>"DFSF", "currency"=>"USD"}], <<EOY
  -
    &F fareref: DOGMA
    &C currency: GBP
    &D departure: LAX
    &A arrival: EDI
  - { *F: MADF, *C: AUD, *D: SYD, *A: MEL }
  - { *F: DFSF, *C: USD, *D: JFK, *A: MCO }
EOY
        )
        assert_to_yaml(
            {"ALIASES"=>["fareref", "currency", "departure", "arrival"], "FARES"=>[{"arrival"=>"EDI", "departure"=>"LAX", "fareref"=>"DOGMA", "currency"=>"GBP"}, {"arrival"=>"MEL", "departure"=>"SYD", "fareref"=>"MADF", "currency"=>"AUD"}, {"arrival"=>"MCO", "departure"=>"JFK", "fareref"=>"DFSF", "currency"=>"USD"}]}, <<EOY
---
ALIASES: [&f fareref, &c currency, &d departure, &a arrival]
FARES:
- *f: DOGMA
  *c: GBP
  *d: LAX
  *a: EDI
- *f: MADF
  *c: AUD
  *d: SYD
  *a: MEL
- *f: DFSF
  *c: USD
  *d: JFK
  *a: MCO
EOY
        )

The YAML in these two sections follows, and as far as I can tell it does not parse with any online YAML checkers:

  -
    &F fareref: DOGMA
    &C currency: GBP
    &D departure: LAX
    &A arrival: EDI
  - { *F: MADF, *C: AUD, *D: SYD, *A: MEL }
  - { *F: DFSF, *C: USD, *D: JFK, *A: MCO }
---
ALIASES: [&f fareref, &c currency, &d departure, &a arrival]
FARES:
- *f: DOGMA
  *c: GBP
  *d: LAX
  *a: EDI

- *f: MADF
  *c: AUD
  *d: SYD
  *a: MEL

- *f: DFSF
  *c: USD
  *d: JFK
  *a: MCO

The errors for these two cases, with the text coming from SnakeYAML, are below:

Error: test_spec_anchors_and_aliases(Psych_Unit_Tests): Psych::SyntaxError: (<unknown>): expected ',' or '}', but got <scalar> while parsing a flow mapping at line 6 column 11
org/jruby/ext/psych/PsychParser.java:274:in `_native_parse'
/home/headius/work/psych/lib/psych/parser.rb:62:in `parse'
/home/headius/work/psych/lib/psych.rb:455:in `parse_stream'
/home/headius/work/psych/lib/psych.rb:399:in `parse'
/home/headius/work/psych/lib/psych.rb:323:in `safe_load'
/home/headius/work/psych/lib/psych.rb:369:in `load'
/home/headius/work/psych/test/psych/helper.rb:45:in `assert_to_yaml'
/home/headius/work/psych/test/psych/test_yaml.rb:219:in `test_spec_anchors_and_aliases'
     216: EOY
     217: 	 	)
     218: 
  => 219:         assert_to_yaml(
     220:             [{"arrival"=>"EDI", "departure"=>"LAX", "fareref"=>"DOGMA", "currency"=>"GBP"}, {"arrival"=>"MEL", "departure"=>"SYD", "fareref"=>"MADF", "currency"=>"AUD"}, {"arrival"=>"MCO", "departure"=>"JFK", "fareref"=>"DFSF", "currency"=>"USD"}], <<EOY
     221:   -
     222:     &F fareref: DOGMA

and

Error: test_spec_anchors_and_aliases(Psych_Unit_Tests): Psych::SyntaxError: (<unknown>): expected <block end>, but found '<scalar>' while parsing a block mapping at line 4 column 7
org/jruby/ext/psych/PsychParser.java:274:in `_native_parse'
/home/headius/work/psych/lib/psych/parser.rb:62:in `parse'
/home/headius/work/psych/lib/psych.rb:455:in `parse_stream'
/home/headius/work/psych/lib/psych.rb:399:in `parse'
/home/headius/work/psych/lib/psych.rb:323:in `safe_load'
/home/headius/work/psych/lib/psych.rb:369:in `load'
/home/headius/work/psych/test/psych/helper.rb:45:in `assert_to_yaml'
/home/headius/work/psych/test/psych/test_yaml.rb:231:in `test_spec_anchors_and_aliases'
     228: # EOY
     229: #         )
     230: 
  => 231:         assert_to_yaml(
     232:             {"ALIASES"=>["fareref", "currency", "departure", "arrival"], "FARES"=>[{"arrival"=>"EDI", "departure"=>"LAX", "fareref"=>"DOGMA", "currency"=>"GBP"}, {"arrival"=>"MEL", "departure"=>"SYD", "fareref"=>"MADF", "currency"=>"AUD"}, {"arrival"=>"MCO", "departure"=>"JFK", "fareref"=>"DFSF", "currency"=>"USD"}]}, <<EOY
     233: ---
     234: ALIASES: [&f fareref, &c currency, &d departure, &a arrival]

I've stared at the specs for a bit but can't figure out how this YAML fits into the grammar. Perhaps these cases are no longer valid in YAML 1.2? @asomov @hsbt @tenderlove

@asomov

asomov commented Jan 18, 2023

@headius this is valid in YAML 1.2:

-
  &F fareref: DOGMA
  &C currency: GBP
  &D departure: LAX
  &A arrival: EDI
- { *F : MADF, *C : AUD, *D : SYD, *A : MEL }
- { *F : DFSF, *C : USD, *D : JFK, *A : MCO }

Mind the extra space between the alias and the colon. It is required because a colon can be part of an alias name (*F: and *F are both valid aliases).
Let me double check it...
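
A minimal Ruby sketch of the spaced form, trimmed to two of the four keys from the test case, is shown below. It uses `Psych.safe_load` with `aliases: true` so that aliases are permitted; which YAML backend sits underneath depends on the Ruby implementation running it, and the sketch makes no claim about how the unspaced form behaves.

```ruby
require "psych"

# Shortened version of the test document, with a space between each
# alias and the colon that follows it.
yaml = <<~EOY
  -
    &F fareref: DOGMA
    &C currency: GBP
  - { *F : MADF, *C : AUD }
EOY

# Aliases must be enabled explicitly for safe_load.
fares = Psych.safe_load(yaml, aliases: true)

fares.each { |fare| puts fare.inspect }
```

Each alias used as a key resolves to the anchored scalar (`fareref`, `currency`), so both list entries end up as hashes with the same keys.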

@headius
Contributor Author

headius commented Jan 18, 2023

@asomov Aha! If I update the YAML with the extra space, it does pass correctly under SnakeYAML Engine!

@hsbt @tenderlove: Would we prefer to test only the new 1.2 syntax (which parses fine on 1.0 and 1.1) or test both with a version guard of some kind?

I will push a change to the test for now.

This allows these tests to pass on SnakeYAML Engine -- which is a
1.2-only YAML library -- while still passing on libyaml 1.1.
@headius headius marked this pull request as ready for review January 18, 2023 18:01
@tenderlove
Member

Would we prefer to test only the new 1.2 syntax (which parses fine on 1.0 and 1.1) or test both with a version guard of some kind?

If the 1.2 syntax works on 1.0 and 1.1, then probably just update it to use the 1.2 syntax? I don't think there is a reason to have version specific tests (at least I can't think of a reason).

@headius
Contributor Author

headius commented Jan 18, 2023

just update it to use the 1.2 syntax?

Done! The most recent commit in this branch updates the test and everything is green.

If we could embed indy call sites here they would cache as
constants; this is the best we can do at the moment.
@headius
Contributor Author

headius commented Jan 20, 2023

@hsbt @tenderlove Any objections to merging this? Does it need a major version update? Any concerns about releasing it soon?

@tenderlove tenderlove merged commit 2d472f5 into ruby:master Jan 20, 2023
@tenderlove
Member

Does it need a major version update?

I am unsure. If it's backwards compatible, then I don't see any need for a major version increase.

@hsbt
Member

hsbt commented Jan 20, 2023

👍 I'm +1 on bumping the version to "5.1.0" for this.

@headius headius deleted the snakeyaml_engine branch January 23, 2023 17:40
@headius
Contributor Author

headius commented Jan 23, 2023

Thanks so much everyone! Looking forward to incorporating this in JRuby!

@headius
Contributor Author

headius commented Jan 23, 2023

Oh, @hsbt @tenderlove if we can push a 5.1.pre gem I can run it through JRuby's CI and make sure everything looks ok there too.
