Commit 3c137eb

kou authored and mame committed
Fix a parser bug that some data may be ignored before DOCTYPE

HackerOne: HO-1104077

For example, "x<?x y" in "x<?x y\n<!--..." is ignored.

Reported by Juho Nurminen. Thanks!!!

1 parent 9b311e5 commit 3c137eb
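
The example in the commit message can be reproduced with plain regular expressions, without loading REXML. The sketch below is only an illustration: it copies the old and new prolog patterns from the diff and applies them to the input from the regression test. The variable names are made up for this example.

```ruby
# Input taken from the commit message and the regression test.
xml = "x<?x y\n<!--?><?x -->?>\n<r/>\n"

# In Ruby, ^ matches at the start of every line, while \A matches only at
# the start of the string.
old_pattern = /^((?:\s+)|(?:<[^>]*>))/um   # pattern before this commit
new_pattern = /\A((?:\s+)|(?:<[^>]*>))/um  # pattern after this commit

old_md = old_pattern.match(xml)
new_md = new_pattern.match(xml)

puts old_md[1].inspect         # markup found at the start of the second line
puts old_md.pre_match.inspect  # text the old pattern silently skipped over
puts new_md.inspect            # no match at the very start of the input
```

With the `^` anchor the match starts on the second line, so everything before it is never examined. With `\A` the parser has to deal with whatever is actually at the start of the buffer.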

3 files changed: +27 -8 lines changed

lib/rexml/parsers/baseparser.rb

Lines changed: 8 additions & 7 deletions
@@ -195,11 +195,9 @@ def pull_event
         return [ :end_document ] if empty?
         return @stack.shift if @stack.size > 0
         #STDERR.puts @source.encoding
-        @source.read if @source.buffer.size<2
         #STDERR.puts "BUFFER = #{@source.buffer.inspect}"
         if @document_status == nil
-          #@source.consume( /^\s*/um )
-          word = @source.match( /^((?:\s+)|(?:<[^>]*>))/um )
+          word = @source.match( /\A((?:\s+)|(?:<[^>]*>))/um )
           word = word[1] unless word.nil?
           #STDERR.puts "WORD = #{word.inspect}"
           case word
@@ -257,18 +255,16 @@ def pull_event
               @stack << [ :end_doctype ]
             end
             return args
-          when /^\s+/
+          when /\A\s+/
           else
             @document_status = :after_doctype
-            @source.read if @source.buffer.size<2
-            md = @source.match(/\s*/um, true)
             if @source.encoding == "UTF-8"
               @source.buffer.force_encoding(::Encoding::UTF_8)
             end
           end
         end
         if @document_status == :in_doctype
-          md = @source.match(/\s*(.*?>)/um)
+          md = @source.match(/\A\s*(.*?>)/um)
           case md[1]
           when SYSTEMENTITY
             match = @source.match( SYSTEMENTITY, true )[1]
@@ -349,7 +345,11 @@ def pull_event
             return [ :end_doctype ]
           end
         end
+        if @document_status == :after_doctype
+          @source.match(/\A\s*/um, true)
+        end
         begin
+          @source.read if @source.buffer.size<2
           if @source.buffer[0] == ?<
             if @source.buffer[1] == ?/
               @nsstack.shift
@@ -392,6 +392,7 @@ def pull_event
               unless md
                 raise REXML::ParseException.new("malformed XML: missing tag start", @source)
               end
+              @document_status = :in_element
               prefixes = Set.new
               prefixes << md[2] if md[2]
               @nsstack.unshift(curr_ns=Set.new)
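
Several calls in this file pass `true` as the second argument to `@source.match`, which consumes the matched text. The sketch below shows why an unanchored pattern is risky in that mode. `MiniSource` is a hypothetical stand-in written for this illustration, not REXML's actual source class, and it assumes that consuming means keeping only the text after the match. The comment pattern is likewise an assumption modelled on a typical XML comment regexp.

```ruby
# Hypothetical stand-in for the parser's source object (assumption: a
# consuming match keeps only the text that follows the match).
class MiniSource
  attr_reader :buffer

  def initialize(text)
    @buffer = text
  end

  # Match against the buffer. When consume is true and the pattern matched,
  # the buffer is replaced by the text after the match.
  def match(pattern, consume = false)
    md = pattern.match(@buffer)
    @buffer = md.post_match if consume && md
    md
  end
end

source = MiniSource.new("x<?x y\n<!--?><?x -->?>\n<r/>\n")

# Unanchored comment pattern (assumed for illustration).
comment = source.match(/<!--(.*?)-->/um, true)
dropped = comment.pre_match

puts comment[1].inspect     # body of the matched comment
puts dropped.inspect        # text before the match, lost after consuming
puts source.buffer.inspect  # what remains for the next parsing step
```

Under these assumptions, any text before an unanchored match disappears from the buffer once the match is consumed. Anchoring the prolog patterns with `\A` means a consuming match can only start at the beginning of the buffer.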

test/parse/test_processing_instruction.rb

Lines changed: 19 additions & 0 deletions
@@ -20,6 +20,25 @@ def test_no_name
 <??>
         DETAIL
       end
+
+      def test_garbage_text
+        # TODO: This should be parse error.
+        # Create test/parse/test_document.rb or something and move this to it.
+        doc = parse(<<-XML)
+x<?x y
+<!--?><?x -->?>
+<r/>
+        XML
+        pi = doc.children[1]
+        assert_equal([
+                       "x",
+                       "y\n<!--",
+                     ],
+                     [
+                       pi.target,
+                       pi.content,
+                     ])
+      end
     end
   end
 end

test/parser/test_ultra_light.rb

Lines changed: 0 additions & 1 deletion
@@ -16,7 +16,6 @@ def test_entity_declaration
           nil,
           [:entitydecl, "name", "value"]
         ],
-        [:text, "\n"],
         [:start_element, :parent, "root", {}],
         [:text, "\n"],
       ],
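
The removed `[:text, "\n"]` expectation appears to follow from the new whitespace skip that runs while the parser is in the `:after_doctype` state. This is a reading of the diff, not something the commit message states. The sketch below applies the same pattern, `/\A\s*/um`, to an assumed buffer as it might look just after a DOCTYPE has been parsed.

```ruby
# Assumed buffer contents right after the DOCTYPE has been handled.
buffer = "\n<root>\n</root>"

# \A\s* always matches, possibly with an empty string, so consuming it
# strips leading whitespace and leaves everything else untouched.
md = buffer.match(/\A\s*/um)
skipped = md[0]
buffer = md.post_match

puts skipped.inspect  # whitespace that would otherwise become a text event
puts buffer.inspect   # buffer now starts at the root element

# With no leading whitespace the match is empty and nothing is removed.
empty = "<root/>".match(/\A\s*/um)[0]
puts empty.inspect
```

Because the skip only applies before the first start tag, the `"\n"` inside the root element is still reported as text, which matches the expectation that remains in the test.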
