Skip to content

Conversation

@scoder
Copy link
Contributor

@scoder scoder commented Jul 19, 2019

Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.

https://bugs.python.org/issue37399

… even when comments or pis are discarded.

Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
@dimaqq
Copy link
Contributor

dimaqq commented Jul 23, 2019

I think it would be nice to add a test case for Whitespace Processing, something like:

<a>text
  <!-- comment -->
  tail</a>

I can't tell off the top of my head what the text value should be :)
Apparently Py3.7 just yanks the comment and leaves a.text == "text\n \n tail" (GitHub renders this incorrectly, it's -newline-space-space-newline-space-space-)

Ref: e.g. https://docstore.mik.ua/orelly/xml/schema/ch04_02.htm

@dimaqq
Copy link
Contributor

dimaqq commented Jul 23, 2019

My compiler complains of logical ops not being explicit enough. Personally I think it's silly, but the rest of CPython code base uses explicit parentheses in such cases, if ((a && b) || c).

/Users/user.surnam/cpython/Modules/_elementtree.c:3639:35: warning: '&&' within '||' [-Wlogical-op-parentheses]
        if (target->events_append && target->pi_event_obj || target->insert_pis) {
            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ ~~
/Users/user.surnam/cpython/Modules/_elementtree.c:3639:35: note: place parentheses around the '&&' expression to silence this warning
        if (target->events_append && target->pi_event_obj || target->insert_pis) {
                                  ^

@scoder
Copy link
Contributor Author

scoder commented Jul 23, 2019

Thanks @dimaqq, I followed both of your suggestions.

@dimaqq
Copy link
Contributor

dimaqq commented Jul 24, 2019

Hi Stefan, I'm afraid there's a memory leak, consider this program:

from xml.etree import ElementTree
while True: ElementTree.fromstring("<a>text<!--c1-->a<!--c2-->b<!--c3-->foo</a>").text

d395209, common ancestor with master branch: ~5MB, stable
21cd0bd bpo-37399_missing_tail branch: grows to ~800MB in about a minute, probably unbounded

@ZackerySpytz
Copy link
Contributor

I tested this PR locally, and I see the following when ./python -m test -R 3:3 test_xml_etree_c is run:

Run tests sequentially
0:00:00 load avg: 1.40 [1/1] test_xml_etree_c
beginning 6 repetitions
123456
......
test_xml_etree_c leaked [20, 8, 20] references, sum=48
test_xml_etree_c leaked [8, 6, 8] memory blocks, sum=22
test_xml_etree_c failed

== Tests result: FAILURE ==

1 test failed:
    test_xml_etree_c

Total duration: 3 sec 50 ms
Tests result: FAILURE

@scoder
Copy link
Contributor Author

scoder commented Jul 24, 2019

Thanks for testing and reporting the leak. I resolved it, and also cleaned up a couple of other places to make them clearer.

@scoder scoder merged commit c6cb4cd into python:master Jul 24, 2019
@miss-islington
Copy link
Contributor

Thanks @scoder for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @scoder, I could not cleanly backport this to 3.8 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker c6cb4cdd21c0c3a09b0617dbfaa7053d3bfa6def 3.8

scoder added a commit to scoder/cpython that referenced this pull request Jul 24, 2019
…nt/pi (pythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
@scoder scoder deleted the bpo-37399_missing_tail branch July 24, 2019 18:25
@bedevere-bot
Copy link

GH-14936 is a backport of this pull request to the 3.8 branch.

scoder added a commit that referenced this pull request Jul 24, 2019
…nt/pi (GH-14856) (GH-14936)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…ythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…ythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020
…ythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip news type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants