xcp.xmlunwrap: encode() only if type is unicode (only for Py2)

bernhardkaindl · bernhardkaindl · commit ecc8c1f2a88d · 2023-05-08T15:28:55.000+02:00
xcp.xmlunwrap extracts XML Elements from XML, and for Python2,
the unwrapped unicode is encoded into Py2:str(bytes).

Python3 unwraps XML Text elements as the Py3:str type which is
likewise Unicode, but since Py3:str is the native type, we don't
want to encode the Py3:str to Py3:bytes as that would break the
API for use on Python3.

Because binary data is not legal XML content and XML Text elements
are defined to be encoded text, UTF-8 is the standard encoding,
which Python converts to.

It is therefore fine to encode() to Py2:str(=bytes) only on Python2,
as a legacy operation which can be removed once we drop Python2.
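
A minimal sketch of the resulting behaviour, assuming xml.dom.minidom as
the parser. get_text is a hypothetical standalone copy of the patched
getText() and the sample XML is made up for illustration; neither is part
of this change:

```python
from xml.dom import minidom


def get_text(element):
    """Hypothetical standalone copy of the patched getText()."""
    rc = ""
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    # On Python 2, node.data is unicode, so rc is unicode and gets encoded
    # to the native str (bytes). On Python 3, rc is already the native str
    # and is left untouched.
    if not isinstance(rc, str):
        rc = rc.encode()
    return rc.strip()


# Made-up sample document, used only for this illustration.
doc = minidom.parseString("<host> xenserver </host>")
text = get_text(doc.documentElement)
```

On either Python version, the caller gets the interpreter's native str
type back, so comparisons such as text == "xenserver" behave the same.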

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
diff --git a/xcp/xmlunwrap.py b/xcp/xmlunwrap.py
@@ -34,7 +34,9 @@ def getText(nodelist):
     for node in nodelist.childNodes:
         if node.nodeType == node.TEXT_NODE:
             rc = rc + node.data
-    return rc.encode().strip()
+    if not isinstance(rc, str):  # Python 2 only, otherwise it would return unicode
+        rc = rc.encode()
+    return rc.strip()
 
 def getElementsByTagName(el, tags, mandatory = False):
     matching = []
@@ -47,7 +49,9 @@ def getElementsByTagName(el, tags, mandatory = False):
 def getStrAttribute(el, attrs, default = '', mandatory = False):
     matching = []
     for attr in attrs:
-        val = el.getAttribute(attr).encode()
+        val = el.getAttribute(attr)
+        if not isinstance(val, str):  # Python 2 only, otherwise it would return unicode
+            val = val.encode()
         if val != '':
             matching.append(val)
     if len(matching) == 0: