Commit ecc8c1f

xcp.xmlunwrap: encode() only if type is unicode (only for Py2)
xcp.xmlunwrap extracts XML Elements from XML, and on Python 2 the unwrapped unicode is encoded into Py2:str (bytes). Python 3 unwraps XML Text elements as the Py3:str type, which is likewise Unicode, but since Py3:str is the native type, we don't want to encode the Py3:str to Py3:bytes, as that would break the API for use on Python 3.

Because binary data is not legal XML content and XML Text elements are defined to be encoded text, UTF-8 is the standard encoding, which Python converts to. It is thus fine to only encode() to Py2:str (= bytes) on Python 2, as a legacy operation which can be removed once we drop Python 2.

Signed-off-by: Bernhard Kaindl <[email protected]>
1 parent 08f0001 commit ecc8c1f
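
The guard described in the commit message can be sketched in isolation. This is a minimal illustration only: `to_native_str` is a hypothetical helper name and is not part of xcp.xmlunwrap, and the sample value `"eth0"` is made up.

```python
def to_native_str(value):
    """Return value as the interpreter's native str type.

    Hypothetical helper for illustration; not part of xcp.xmlunwrap.
    """
    # On Python 3, text is already str, so the branch is skipped.
    # On Python 2, text from the XML parser is unicode, which is not a
    # Py2 str, so it is encoded to Py2 str (bytes).
    if not isinstance(value, str):
        value = value.encode()
    return value


result = to_native_str(u"eth0")

# Unconditionally calling encode() on Python 3 would have produced bytes
# instead, which is the API break the commit avoids.
encoded = u"eth0".encode()
```

With this guard, callers on either Python version receive the type that the plain `str` name refers to on that interpreter.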

1 file changed, 6 additions and 2 deletions

xcp/xmlunwrap.py

Lines changed: 6 additions & 2 deletions
@@ -34,7 +34,9 @@ def getText(nodelist):
     for node in nodelist.childNodes:
         if node.nodeType == node.TEXT_NODE:
             rc = rc + node.data
-    return rc.encode().strip()
+    if not isinstance(rc, str): # Python 2 only, otherwise it would return unicode
+        rc = rc.encode()
+    return rc.strip()
 
 def getElementsByTagName(el, tags, mandatory = False):
     matching = []
@@ -47,7 +49,9 @@ def getElementsByTagName(el, tags, mandatory = False):
 def getStrAttribute(el, attrs, default = '', mandatory = False):
     matching = []
     for attr in attrs:
-        val = el.getAttribute(attr).encode()
+        val = el.getAttribute(attr)
+        if not isinstance(val, str): # Python 2 only, otherwise it would return unicode
+            val = val.encode()
         if val != '':
             matching.append(val)
     if len(matching) == 0:
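
The effect of the change can be checked against the standard library's `xml.dom.minidom`. The sketch below is not the module's code: `get_text` only mirrors the patched `getText` from the hunk above, the initial `rc = ""` is assumed because that line lies outside the diff context, and the `<interface>` document is a made-up example.

```python
from xml.dom import minidom


def get_text(element):
    """Concatenate the direct text children of element, stripped.

    Mirrors the patched getText(); the initial rc = "" is an assumption.
    """
    rc = ""
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    if not isinstance(rc, str):  # Python 2 only, otherwise it would return unicode
        rc = rc.encode()
    return rc.strip()


# Hypothetical sample document, used only for this illustration.
doc = minidom.parseString('<interface name="eth0"> static </interface>')
root = doc.documentElement

text = get_text(root)
name = root.getAttribute("name")
if not isinstance(name, str):  # same guard as in getStrAttribute
    name = name.encode()
```

On Python 3 both `text` and `name` stay `str`, so comparisons such as `val != ''` in `getStrAttribute` keep working; with the old unconditional `encode()` they would have been `bytes`, and `bytes` never compares equal to `str` there.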

0 commit comments

Comments
 (0)