-
Notifications
You must be signed in to change notification settings - Fork 329
No double writing of lobs #2483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release80
Are you sure you want to change the base?
Conversation
I could't find any normal code path where inline quoting of LOBs would take place. But it might be used by people who generate raw SQL queries so I think it is worth having it somehow working instead of silently making it empty data. For example `#build_fixture_sql` and `#build_fixture_statements` in ActiveRecord::ConnectionAdapters::DatabaseStatements use it but we override `#insert_fixture` and `#insert_fixtures_set` which use these. Actually this `empty_nclob()` does not even work so perhaps this code path is not used for a very long time. Using it, I get an error > ORA-00904: "EMPTY_NCLOB": invalid identifier In the past such fix was rejected rsim#1588 but I think originally this empty quoting intended to workaround writing the quoted value and then `write_lob()` method inserting it again. But nowadays `write_lob()` is actually redundant and removed in another commit. So basically we have these as a last resort effort to make things work for niche use cases with the significant limitation on BLOBs length in particular.
In the very distant past, LOBs were not possible to insert together with the row. So this complicated callback method was necessary. Initially an `empty_clob()` was inserted and then the callbacks would call `write_lobs()` to insert the actually desired lobs. But nowadays it is performing just a second redundant insert. With prepared statements, the `OCI::CLOB/BLOB/NCLOB` objects are used. And the LOBs get inserted from the beginning as they should. And this also supports large LOBs. In fact presently up to 2GB - 8 bytes but this can be improved even further by making ruby-oci8 to use multiple `write` calls for the supplied argument instead of a single one which fails for large values. Not a high priority to increase max supported string size though, given that is all this huge sting needs to be in memory, possibly multiple copies. This is not a loss of functionality compared to `write_lob()` because: * `write_lobs()` also used a single `write` call * if `OCI:BLOB` would crash then presence of `write_lobs()` wouldn't help
c6336f4 to
5c6c9b9
Compare
|
Hi @akostadinov, Thank you for this PR and for identifying the double-write issue. You're right that when However, this PR breaks functionality when The IssueWhen prepared statements are disabled, the adapter can't bind LOB data as parameters. Instead, it must:
Without the callback, large LOBs fail with Test BranchesI've created some additional spec tests that demonstrate this behavior:
The additional failures on the PR branch:
Path ForwardThe double-write issue you've identified is real and worth fixing. Would you mind opening an issue to track it separately? The challenge is finding a way to skip the |
When prepared_statements is enabled (the default), LOB data is already written during INSERT via temporary LOB binding in type_cast(). The write_lobs callback was redundantly writing the same data again via SELECT FOR UPDATE. This change skips the callback in the prepared statements path while preserving it for the unprepared path, where empty_clob()/empty_blob() literals require the callback to populate LOB data after INSERT. Fixes the double-write issue identified in rsim#2483.
|
My line of thinking is that we should not be obliged to support just any code path, whether it makes sense or not. If somebody wants to disable prepared statements, then they should live with the inherent limitations of that approach. Which is 4000 chars for BLOBS, or 32k when This We may somehow force LOBs to use prepared statements in all situation, to assure things working with such configuration but then what is the point disabling prepared statements in the first place? In summary: Pure Oracle inline SQL limits blobs (but not clobs) to 4k/32k. Disabling prepared statements means user requesting inline SQL so they will be subject to this database server limitation. The normal code path of using prepared statements has no such limitations and has no drawbacks compared to the inline SQL. @andynu , thank you very much for reviewing this, let me know if you find flaws in the thinking above. Maybe I'm missing some valid use cases? P.S. Moreover, |
|
I think we should continue supporting existing code paths. Disabling LOB management for non-prepared statements would be a significant change that I, a fellow contributor and not one of the primary maintainers of this library, am not prepared to make (🥁 ). I don't want to interfere with someone else's existing use. And the use or non-use of prepared statements is a decision that you make based on the homogeneity or heterogeneity of your SQL queries. The reason I added an additional write_lobs callback was because of a need I had in a production system. And the overall write_lobs hook predates my involvement here. I think smoothing over Oracle's quirks, including working around server limitations, is part of what this library does. We're quite a ways away from 'Pure Oracle inline SQL' here. Removing that for non-prepared statements feels like a regression, even if the use case is niche. I still agree that we need to address the double write. I've drafted a modest change that I think addresses that andynu@151e629 You're right that this path may become untenable if the kubo/ruby-oci8#271 changes land. I'd rather address that when it happens than preemptively remove functionality. |
|
@andynu , so would you share your use cases that requires My tracking down of the And how about quoting that inserts |
|
Thanks for the discussion, @akostadinov. To provide some context: I encountered this exact issue in production (see #2477). Rails 8's I understand the appeal of simplifying the codebase by removing the
My approach in andynu@151e629 addresses both: skip the callback when prepared statements handle the LOB directly, preserve it when they don't. On terminology: I'd prefer we avoid characterizing code as "ugly"—it's subjective and doesn't help us evaluate whether the code is solving a real problem. The question is whether this path serves legitimate use cases (it does, as #2477 demonstrates) and whether the maintenance cost is justified. My position is that we fix the double-write issue without breaking the unprepared statements path. I've demonstrated one approach in andynu@151e629. I'm happy to discuss alternative implementations, but removing large LOB support for the unprepared statements path would be a breaking change and a regression—that's not a direction I'm willing to take. |
|
I see. It is a valid issue that So if I would rather look for options to make query log tags working with prepared statements, before tying to fix
|
|
The name changed, but the pattern pre-existed for updates. My PR only expanded it for the create case.
The original pattern was introduced 8 years ago for the update case. c5c68fe I'm open to exploring alternative solutions to make the query_log_tags_enabled feature work, but removing a feature that has been in place for 8 years (in the update condition) would be a regression and could break users relying on the non-truncated behavior. I'm very hesitant unless we can find a way that does it without the truncation. |
…ents This spec clearly demonstrates the critical difference between LOB handling with prepared statements (bind parameters) vs without (raw SQL with empty_clob()). Key points demonstrated: - With prepared_statements: true, LOB data flows through type_cast() which creates OCI8::CLOB temp LOBs. Data is populated BEFORE INSERT. - With prepared_statements: false, SQL contains empty_clob() literals. The write_lobs callback is REQUIRED to populate LOB data after INSERT. This test suite will FAIL if the lob.rb callbacks are removed, proving they are necessary for backwards compatibility with prepared_statements: false. Related to: rsim#2483
This is an important change not only because of improved performance but also because it is needed for compatibility with a potential improvement of ruby-oci8 handling of LOBs, see kubo/ruby-oci8#271. Please read that upstream pull request and the linked kubo/ruby-oci8#230 for explanation about a huge memory bloat issue when many queries retrieve LOBs.
With regards to immediate effect on this project, I think it is better to read the detailed commit messages of the 2 commits part of this PR. In a nutshell:
write_lobs()has been updating the inserted LOBs a second time after a perfectly valid insertion usingOCI::BLOB.neworCLOBorNCLOB. So it was completely redundant#insert_fixtures_setmethod had to implemented using prepared statements because the Rails default relies on@conn.quote("value")for everything including LOBs and that has a limitation of only 2k characters for BLOBs withoutMAX_STRING_SIZE=EXTENDEDalthough (N)CLOBS can be still quiet large in either configuration with the implementation I here propose.@conn.quote("value")has been updated to support LOBs instead of usingempty_(bnc)lob()and it works quiet well now although it is questionably if that method has valid use cases, still better to be working to avoid surprises in edge casesWith regards to trying out all the improvements of kubo/ruby-oci8#271 and this PR with a monkey patch in your project, you can just add to your initializers something like:
Additionally, I'm planning to submit another PR that would clean-up cursor cache when connections are returned to the ActiveRecord pool. Because even with the above improvements, still the memory consumption is high with certain hard to reproduce workloads. Reducing the cached statements limit also doesn't help consistently. So I will make it a configuration option. But to try it out:
I will appreciate any feedback!
P.S. I filed this PR against the
release80branch because master is broken and tests cannot pass but can be happily applied on top of #2480