@@ -1154,6 +1154,10 @@ Connection objects
                  f.write('%s\n' % line)
          con.close()

+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+

    .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)

@@ -1220,6 +1224,10 @@ Connection objects

       .. versionadded:: 3.7

+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
    .. method:: getlimit(category, /)

       Get a connection runtime limit.
@@ -1441,39 +1449,8 @@ Connection objects
       and returns a text representation of it.
       The callable is invoked for SQLite values with the ``TEXT`` data type.
       By default, this attribute is set to :class:`str`.
-      If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.

-      Example:
-
-      .. testcode::
-
-         con = sqlite3.connect(":memory:")
-         cur = con.cursor()
-
-         AUSTRIA = "Österreich"
-
-         # by default, rows are returned as str
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert row[0] == AUSTRIA
-
-         # but we can make sqlite3 always return bytestrings ...
-         con.text_factory = bytes
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert type(row[0]) is bytes
-         # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
-         # database ...
-         assert row[0] == AUSTRIA.encode("utf-8")
-
-         # we can also implement a custom text_factory ...
-         # here we implement one that appends "foo" to all strings
-         con.text_factory = lambda x: x.decode("utf-8") + "foo"
-         cur.execute("SELECT ?", ("bar",))
-         row = cur.fetchone()
-         assert row[0] == "barfoo"
-
-         con.close()
+      See :ref:`sqlite3-howto-encoding` for more details.

    .. attribute:: total_changes

@@ -1632,7 +1609,6 @@ Cursor objects
              COMMIT;
          """)

-
    .. method:: fetchone()

       If :attr:`~Cursor.row_factory` is ``None``,
@@ -2611,6 +2587,47 @@ With some adjustments, the above recipe can be adapted to use a
 instead of a :class:`~collections.namedtuple`.


+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Assuming we now have a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, encoding="latin2")
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, errors="surrogateescape")
+
+.. note::
+
+   The :mod:`!sqlite3` module API does not support strings
+   containing surrogates.
+
+.. seealso::
+
+   :ref:`unicode-howto`
+
+
 .. _sqlite3-explanation:

 Explanation
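
The two ``text_factory`` techniques added in the how-to above can be tried end to end with a small in-memory database. The following is only a sketch under stated assumptions: the ``dictionary`` table, its column names, and the sample word are hypothetical and not part of the patch, and ``CAST(? AS TEXT)`` is used here merely as one way to simulate legacy rows whose ``TEXT`` values hold Latin-2 bytes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table standing in for the Czech-English dictionary.
con.execute("CREATE TABLE dictionary(czech TEXT, english TEXT)")

# Simulate legacy data: casting a bytes parameter to TEXT keeps the
# raw Latin-2 bytes as-is in a TEXT value (assumes a UTF-8 database,
# which is the default for new databases).
czech = "žluťoučký"
con.execute(
    "INSERT INTO dictionary VALUES (CAST(? AS TEXT), ?)",
    (czech.encode("latin2"), "yellowish"),
)

# Technique 1: decode every TEXT value as Latin-2.
con.text_factory = lambda data: str(data, encoding="latin2")
latin2_row = con.execute("SELECT czech, english FROM dictionary").fetchone()

# Technique 2: decode as UTF-8 and map undecodable bytes to lone surrogates.
con.text_factory = lambda data: str(data, errors="surrogateescape")
escaped = con.execute("SELECT czech FROM dictionary").fetchone()[0]

# The escaped string can be turned back into the original bytes.
original_bytes = escaped.encode("utf-8", errors="surrogateescape")

con.close()
```

Because the module API does not support strings containing surrogates, a value such as ``escaped`` would likely need to be encoded back to ``bytes``, as in the last step, before being passed to the database again.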