Skip to content

Conversation

@akostadinov
Copy link
Contributor

@akostadinov akostadinov commented Dec 21, 2025

This is an important change not only because of improved performance but also because it is needed for compatibility with a potential improvement of ruby-oci8 handling of LOBs, see kubo/ruby-oci8#271. Please read that upstream pull request and the linked kubo/ruby-oci8#230 for explanation about a huge memory bloat issue when many queries retrieve LOBs.

With regards to immediate effect on this project, I think it is better to read the detailed commit messages of the 2 commits part of this PR. In a nutshell:

  • write_lobs() has been updating the inserted LOBs a second time after a perfectly valid insertion using OCI::BLOB.new or CLOB or NCLOB. So it was completely redundant
  • the #insert_fixtures_set method had to implemented using prepared statements because the Rails default relies on @conn.quote("value") for everything including LOBs and that has a limitation of only 2k characters for BLOBs without MAX_STRING_SIZE=EXTENDED although (N)CLOBS can be still quiet large in either configuration with the implementation I here propose.
  • @conn.quote("value") has been updated to support LOBs instead of using empty_(bnc)lob() and it works quiet well now although it is questionably if that method has valid use cases, still better to be working to avoid surprises in edge cases

With regards to trying out all the improvements of kubo/ruby-oci8#271 and this PR with a monkey patch in your project, you can just add to your initializers something like:

    # disable the write_lobs callback
    ActiveRecord::Base.skip_callback(:update, :after, :enhanced_write_lobs)

    # retrieve LOBs as LONG and LONGRAW
    OCI8::BindType::Mapping[:clob] = OCI8::BindType::Long
    OCI8::BindType::Mapping[:blob] = OCI8::BindType::LongRaw

    # disable ruby-oci8 static array fetching when lobs are detected
    module OCI8DisableArrayFetch
      private
      def define_one_column(pos, param)
        @fetch_array_size = nil # disable memory array fetching anytime
        super # call original
      end
    end
    
    OCI8::Cursor.prepend(OCI8DisableArrayFetch)

Additionally, I'm planning to submit another PR that would clean-up cursor cache when connections are returned to the ActiveRecord pool. Because even with the above improvements, still the memory consumption is high with certain hard to reproduce workloads. Reducing the cached statements limit also doesn't help consistently. So I will make it a configuration option. But to try it out:

    # clean-up prepared statements/cursors on connection return to pool
    module OracleStatementCleanup
      def self.included(base)
        base.set_callback :checkin, :after, :close_and_clear_statements
      end

      def close_and_clear_statements
        @statements&.clear
      end
    end

    ActiveRecord::ConnectionAdapters::OracleEnhancedAdapter.include OracleStatementCleanup

I will appreciate any feedback!

P.S. I filed this PR against the release80 branch because master is broken and tests cannot pass but can be happily applied on top of #2480

I could't find any normal code path where inline quoting of LOBs would
take place. But it might be used by people who generate raw SQL queries
so I think it is worth having it somehow working instead of silently
making it empty data.

For example `#build_fixture_sql` and `#build_fixture_statements` in
ActiveRecord::ConnectionAdapters::DatabaseStatements use it but we
override `#insert_fixture` and `#insert_fixtures_set` which use these.

Actually this `empty_nclob()` does not even work so perhaps this code
path is not used for a very long time. Using it, I get an error

> ORA-00904: "EMPTY_NCLOB": invalid identifier

In the past such fix was rejected rsim#1588 but I think originally this
empty quoting intended to workaround writing the quoted value and then
`write_lob()` method inserting it again. But nowadays `write_lob()` is
actually redundant and removed in another commit.

So basically we have these as a last resort effort to make things work
for niche use cases with the significant limitation on BLOBs length in
particular.
In the very distant past, LOBs were not possible to insert together with
the row. So this complicated callback method was necessary. Initially
an `empty_clob()` was inserted and then the callbacks would call
`write_lobs()` to insert the actually desired lobs.

But nowadays it is performing just a second redundant insert. With
prepared statements, the `OCI::CLOB/BLOB/NCLOB` objects are used. And
the LOBs get inserted from the beginning as they should. And this also
supports large LOBs.

In fact presently up to 2GB - 8 bytes but this can be improved even
further by making ruby-oci8 to use multiple `write` calls for the supplied
argument instead of a single one which fails for large values.

Not a high priority to increase max supported string size though, given
that is all this huge sting needs to be in memory, possibly multiple
copies.

This is not a loss of functionality compared to `write_lob()` because:
* `write_lobs()` also used a single `write` call
* if `OCI:BLOB` would crash then presence of `write_lobs()` wouldn't
  help
@andynu
Copy link
Collaborator

andynu commented Dec 27, 2025

Hi @akostadinov,

Thank you for this PR and for identifying the double-write issue. You're right that when prepared_statements: true (the default), LOB data flows through type_cast() which creates temporary LOBs that Oracle handles directly during INSERT — making the subsequent write_lobs callback redundant.

However, this PR breaks functionality when prepared_statements: false.

The Issue

When prepared statements are disabled, the adapter can't bind LOB data as parameters. Instead, it must:

  1. INSERT with empty_clob()/empty_blob() literals
  2. Use the write_lobs callback to SELECT ... FOR UPDATE and write the LOB content

Without the callback, large LOBs fail with ORA-01704: string literal too long (Oracle's 4000-byte limit for literals).

Test Branches

I've created some additional spec tests that demonstrate this behavior:

The additional failures on the PR branch:

  • 100KB BLOB creation → ORA-01704: string literal too long
  • SQL verification expecting empty_clob() + FOR UPDATE pattern

Path Forward

The double-write issue you've identified is real and worth fixing. Would you mind opening an issue to track it separately?

The challenge is finding a way to skip the write_lobs hooks when on the prepared statements path, while preserving them for the unprepared case.

andynu added a commit to andynu/oracle-enhanced that referenced this pull request Dec 27, 2025
When prepared_statements is enabled (the default), LOB data is already
written during INSERT via temporary LOB binding in type_cast(). The
write_lobs callback was redundantly writing the same data again via
SELECT FOR UPDATE.

This change skips the callback in the prepared statements path while
preserving it for the unprepared path, where empty_clob()/empty_blob()
literals require the callback to populate LOB data after INSERT.

Fixes the double-write issue identified in rsim#2483.
@akostadinov
Copy link
Contributor Author

My line of thinking is that we should not be obliged to support just any code path, whether it makes sense or not.

If somebody wants to disable prepared statements, then they should live with the inherent limitations of that approach. Which is 4000 chars for BLOBS, or 32k when MAX_STRING_SIZE=EXTENDED is set. For CLOBS and NCLOBS there is no such limitation with this implementation.

This write_lobs approach is still something that is not a normal SQL statement. If somebody wants pure inline SQL (is this the correct term?) for whatever reason, then write_lobs doesn't truly help.

We may somehow force LOBs to use prepared statements in all situation, to assure things working with such configuration but then what is the point disabling prepared statements in the first place?

In summary: Pure Oracle inline SQL limits blobs (but not clobs) to 4k/32k. Disabling prepared statements means user requesting inline SQL so they will be subject to this database server limitation. The normal code path of using prepared statements has no such limitations and has no drawbacks compared to the inline SQL.

@andynu , thank you very much for reviewing this, let me know if you find flaws in the thinking above. Maybe I'm missing some valid use cases?

P.S. Moreover, SELECT ... FOR UPDATE does not work when using the efficient LOB retrieval method as described kubo/ruby-oci8#271 (and can easily be configured in an initializer presently), so this is yet another limitation which makes the exercise of keeping the write_lobs code path more of a nuisance and maintenance burden than actually help valid use cases.

@andynu
Copy link
Collaborator

andynu commented Dec 28, 2025

I think we should continue supporting existing code paths. Disabling LOB management for non-prepared statements would be a significant change that I, a fellow contributor and not one of the primary maintainers of this library, am not prepared to make (🥁 ). I don't want to interfere with someone else's existing use. And the use or non-use of prepared statements is a decision that you make based on the homogeneity or heterogeneity of your SQL queries. The reason I added an additional write_lobs callback was because of a need I had in a production system. And the overall write_lobs hook predates my involvement here.

I think smoothing over Oracle's quirks, including working around server limitations, is part of what this library does. We're quite a ways away from 'Pure Oracle inline SQL' here. Removing that for non-prepared statements feels like a regression, even if the use case is niche.

I still agree that we need to address the double write. I've drafted a modest change that I think addresses that andynu@151e629

You're right that this path may become untenable if the kubo/ruby-oci8#271 changes land. I'd rather address that when it happens than preemptively remove functionality.

@akostadinov
Copy link
Contributor Author

@andynu , so would you share your use cases that requires write_lobs? Also can you elaborate more about "decision that you make based on the homogeneity or heterogeneity of your SQL queries"? As I wrote above, I don't see valid use cases to disable prepared statements when working with LOBs especially.

My tracking down of the write_lobs callbacks is to the very beginning when there were no other options to add LOBs. So to my thinking, the need for it is gone, and it is a rather ugly workaround involving a whole separate query and circling through all object changed attributes. Few people contribute to the oracle oracle_enhanced and ruby-oci8 and keeping such stuff IMO is counter-productive... unless there are use cases you can't easily adapt (which may exist and maybe I just don't know about them).

And how about quoting that inserts empty_clob()? I think this is a bug, people complained about it in the past that they got bitten by it. Some outdated stuff just has to go if a project is to keep its sanity. That's only my take of course.

@andynu
Copy link
Collaborator

andynu commented Dec 30, 2025

Thanks for the discussion, @akostadinov.

To provide some context: I encountered this exact issue in production (see #2477). Rails 8's query_log_tags_enabled = true causes Rails to automatically disable prepared statements, which means users can end up on the unprepared path without explicitly choosing it. This isn't just a niche configuration—it's the default development behavior in Rails 8.

I understand the appeal of simplifying the codebase by removing the write_lobs path, but I think we're solving different problems:

  • The double-write issue is real and worth fixing. I agree completely that when prepared_statements: true, the write_lobs callback is redundant and should be skipped.
  • The unprepared path is also real and serves actual users. Removing support for large LOBs on that path is a breaking change for existing deployments.

My approach in andynu@151e629 addresses both: skip the callback when prepared statements handle the LOB directly, preserve it when they don't.

On terminology: I'd prefer we avoid characterizing code as "ugly"—it's subjective and doesn't help us evaluate whether the code is solving a real problem. The question is whether this path serves legitimate use cases (it does, as #2477 demonstrates) and whether the maintenance cost is justified.

My position is that we fix the double-write issue without breaking the unprepared statements path. I've demonstrated one approach in andynu@151e629. I'm happy to discuss alternative implementations, but removing large LOB support for the unprepared statements path would be a breaking change and a regression—that's not a direction I'm willing to take.

@akostadinov
Copy link
Contributor Author

I see. It is a valid issue that query_log_tags_enabled feature should be made working. But it also comes to show that previously non-prepared statements did not actually work, and the new default query_log_tags_enabled setting only made that noticeable. Am I correct in this conclusion?

So if write_lobs is removed, this will not be removal of functionality. That functionality was already not there.

I would rather look for options to make query log tags working with prepared statements, before tying to fix write_lobs. I will look at that after the holidays.

write_lobs can be fixed, even with efficient piecewise retrieval of LOBs, we can always use an UPDATE prepared statement instead of SELECT ... FOR UPDATE. I just see it as a really important simplification if we get rid of it instead. It seems too easy to get things wrong with it, as apparent by the current state of the code. i.e. two unnoticed severe issues - double writing and non-prepared statements not actually working.

@andynu
Copy link
Collaborator

andynu commented Dec 30, 2025

The name changed, but the pattern pre-existed for updates. My PR only expanded it for the create case.

The original pattern was introduced 8 years ago for the update case. c5c68fe

I'm open to exploring alternative solutions to make the query_log_tags_enabled feature work, but removing a feature that has been in place for 8 years (in the update condition) would be a regression and could break users relying on the non-truncated behavior. I'm very hesitant unless we can find a way that does it without the truncation.

andynu added a commit to andynu/oracle-enhanced that referenced this pull request Dec 31, 2025
…ents

This spec clearly demonstrates the critical difference between LOB handling
with prepared statements (bind parameters) vs without (raw SQL with empty_clob()).

Key points demonstrated:
- With prepared_statements: true, LOB data flows through type_cast() which
  creates OCI8::CLOB temp LOBs. Data is populated BEFORE INSERT.
- With prepared_statements: false, SQL contains empty_clob() literals.
  The write_lobs callback is REQUIRED to populate LOB data after INSERT.

This test suite will FAIL if the lob.rb callbacks are removed, proving
they are necessary for backwards compatibility with prepared_statements: false.

Related to: rsim#2483
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants