Skip to content

Conversation

17dec
Copy link

@17dec 17dec commented Dec 20, 2023

No description provided.

@17dec
Copy link
Author

17dec commented Dec 21, 2023

After a good night's sleep I realized a few things:

  • I initially forgot to consume @frame_size on IO.close, so closing after a partial read was bugged.
  • Mandating an IO.close when reading felt like fragile API design, so it now automatically consumes the final frames when the reader has been drained. IO.close is now optional.
  • The protocol allows us to allocate perfectly-sized buffers to read COPY data on a row-by-row basis, but the bare IO interface didn't expose that functionality. I've added a remaining_row_size method to support that use case. (I thought of implementing a more performant specialization of IO.gets on top, but that's only correct for TEXT formats)
  • With the COPY-out infrastructure in place, adding COPY-in support is actually pretty trivial as well. I don't really have a use case for that yet, but implemented anyway.

That should be all for now, hopefully I got everything this time. :)

@17dec 17dec changed the title Add support for COPY out data transfers Add support for COPY data transfers Dec 21, 2023
@jhf
Copy link

jhf commented Jan 23, 2024

I've tried this in a crystal project for fast import from a csv file to postgresql.
It worked as expected with copying, but I think there should be an example on how to use that, I can make a PR after this is merged.
My usage looks like this:

DB.connect("postgres://#{postgres_user}:#{postgres_password}@#{postgres_host}:#{postgres_port}/#{postgres_db}") do |db|
  io = db.exec_copy "COPY public.legal_unit_region_activity_category_stats_current(tax_reg_ident,name,employees,physical_region_code,primary_activity_category_code) FROM STDIN"
  csv = CSV.new(File.open(import_filename), headers: true, separator: ',', quote_char: '"')
  while csv.next
    sql_text = [csv["tax_reg_ident"],
                csv["name"],
                csv["employees"],
                csv["physical_region_code"],
                csv["primary_activity_category_code"],
    ].map do |v|
      case v
      when ""  then nil
      else          v
      end
    end.join("\t")
    puts "Uploading #{sql_text}" if verbose
    io.puts sql_text
  end
  puts "Waiting for processing" if verbose
  io.close
  db.close
end

I think there are possible improvements, such as having a IO#write_row that does the join and the puts, but I think those can come later.
For efficiency there could also be a binary version, but that would require using a PostgreSQL binary encoding.

@jhf
Copy link

jhf commented Mar 19, 2025

I'm still using this feature, is there any problem that prevents merging?

@will will merged commit c2c783e into will:master Mar 21, 2025
1 check passed
@will
Copy link
Owner

will commented Mar 21, 2025

Thanks for the reminder @jhf

And super thanks for contributing this @17dec and sorry this dropped off my radar for so long

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants