Skip to content

gh-108021: In sqlite3, use 64-bit APIs for binding and returning text and blobs #108075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from

Conversation

erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Aug 17, 2023

  • Use sqlite3_bind_blob64() iso. sqlite3_bind_blob()
  • Use sqlite3_result_blob64() iso. sqlite3_result_blob()
  • Use sqlite3_bind_text64() iso. sqlite3_bind_text()
  • Use sqlite3_result_text64() iso. sqlite3_result_text()

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented Aug 17, 2023

@serhiy-storchaka: I did not add or adjust existing tests yet.

Existing tests:

test_func_return_too_large_text, test_func_return_too_large_blob, test_too_large_string, and test_too_large_blob currently require 64-bit platform and check for 32-bit sizes.

@erlend-aasland
Copy link
Contributor Author

@serhiy-storchaka: I adjusted the tests in d669736; I'm not sure if we have buildbots that can test this, though.

Comment on lines 742 to 749
#if SIZEOF_SIZE_T > 8
if (sz > 0xffffffffffffffff) {
PyErr_SetString(PyExc_OverflowError,
"string is longer than INT_MAX bytes");
"Error setting result value: "
"string is longer than 2**64-1 bytes");
return -1;
}
sqlite3_result_text(context, str, (int)sz, SQLITE_TRANSIENT);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this code is that it will be excluded from compilation on all current platforms where Python can be build and on all future platforms for long time. Therefore this code will degrade.

I do not know how better to solve this issue. Try to introduce a macro PYSQLITE_MAX_SIZE for unsigned 0xffffffffffffffffu, remove conditional compilation and cast sz to unsigned size_t before comparison. Check if any compiler emit a warning about always true condition. Now somebody at least will be able to test the code by temporary reducing the limit. At least we can be sure that the code is still syntactically correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the SQLite API consistently uses unsigned uint_64 values, and the Python C APIs (involved here) consistently used signed ssize_t values.

  • if ssize_t is less than 8 bytes, we're ok; it will fit in uint_64 anyway
  • if ssize_t is 8 bytes or larger, we must check if the value is larger than 2**63-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also the code in _pysqlite_long_as_int64 (Modules/_sqlite/util.c).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it. 😉

Anyway, it exclude from compilation simpler code.

What I propose here, to write either:

if ((size_t)sz > PYSQLITE_MAX_SIZE) {

where PYSQLITE_MAX_SIZE is defined as 0x7fffffffu (the current limit) or 0x7fffffffffffffffu (theoretical maximum), or

if ((sqlite3_uint64)sz != (size_t)sz) {

which just test the safety of integer conversion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; I didn't have time to look at this since yesterday. Maybe I'll find time to return to it later today. Thanks for chiming in; these C data type issues are hard to get straight!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to your former suggestion.

@unittest.skipUnless(sys.maxsize > 2**32, 'requires 64bit platform')
@support.bigmemtest(size=2**31, memuse=4, dry_run=False)
@unittest.skipUnless(sys.maxsize > 2**64, 'requires (s)size_t > 64bit')
@support.bigmemtest(size=2**64+1, memuse=4, dry_run=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not the purpose of these tests. It does not test that the code raises an expected exception. It tests that the code does not corrupt the data, does not raise unexpected exception and does not crash. What would happen if remove the overflow check.

How do existing tests fail with these changes applied? Do they stop raising DataError? It is good, we should change tests for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not test that the code raises an expected exception.

The test executes assertRaises; it does test that an exception is raised. Yes, that also implies that no corruption has happened and that sqlite3 did not segfault.

The current test code does this:

  1. assert that DataError is raised when binding very large strings or buffers
  2. select rows from the test table and assert that no row was inserted

IOW, it verifies that sqlite3 raised an exception for large input, and that no rows were added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, now the behavior is changed, it was not unintentional, and the tests should be changed as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] the tests should be changed as well.

That is part of the aim of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if left these tests as is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, on my machine, they actually succeed (though I have to remove the bigmem decorators in order to force them to run), so I guess they are already broken.

sqlite3
-------

* :mod:`sqlite3` now supports strings and BLOB sizes up to ``2**64-1`` bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the hardcoded limit is still 2**31-1, according to the sources. So this change does not change much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are both compile time and runtime limits that are in effect. Perhaps we can just consider this an implementation detail and remove the NEWS/What's New entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the NEWS/What's New entries; I guess this can be considered an internal change.

Comment on lines 742 to 749
#if SIZEOF_SIZE_T > 8
if (sz > 0xffffffffffffffff) {
PyErr_SetString(PyExc_OverflowError,
"string is longer than INT_MAX bytes");
"Error setting result value: "
"string is longer than 2**64-1 bytes");
return -1;
}
sqlite3_result_text(context, str, (int)sz, SQLITE_TRANSIENT);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it. 😉

Anyway, it exclude from compilation simpler code.

What I propose here, to write either:

if ((size_t)sz > PYSQLITE_MAX_SIZE) {

where PYSQLITE_MAX_SIZE is defined as 0x7fffffffu (the current limit) or 0x7fffffffffffffffu (theoretical maximum), or

if ((sqlite3_uint64)sz != (size_t)sz) {

which just test the safety of integer conversion.

@unittest.skipUnless(sys.maxsize > 2**32, 'requires 64bit platform')
@support.bigmemtest(size=2**31, memuse=4, dry_run=False)
@unittest.skipUnless(sys.maxsize > 2**64, 'requires (s)size_t > 64bit')
@support.bigmemtest(size=2**64+1, memuse=4, dry_run=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if left these tests as is?

@erlend-aasland
Copy link
Contributor Author

I'm marking this as a draft; after all it's too experimental to consider for landing in its current form.

@erlend-aasland erlend-aasland marked this pull request as draft September 4, 2023 13:38
@erlend-aasland
Copy link
Contributor Author

I don't have the time nor interest to follow this through. Feel free to pick it up.

@erlend-aasland erlend-aasland deleted the sqlite/64-bit branch October 11, 2023 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants