Skip to content

Use pickle's loads/dumps as a faster deepcopy#5281

Merged
beckermr merged 22 commits intoconda:mainfrom
kenodegard:pickle-vs-deepcopy
Jun 25, 2024
Merged

Use pickle's loads/dumps as a faster deepcopy#5281
beckermr merged 22 commits intoconda:mainfrom
kenodegard:pickle-vs-deepcopy

Conversation

@kenodegard
Copy link
Copy Markdown
Contributor

@kenodegard kenodegard commented Apr 12, 2024

Description

Attempting to use pickle instead of deepcopy for a faster copy.

Split from:

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

Signed-off-by: Marcel Bargull <marcel.bargull@udo.edu>
@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Apr 12, 2024
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 12, 2024

CodSpeed Performance Report

Merging #5281 will improve performances by 66.17%

Comparing kenodegard:pickle-vs-deepcopy (e5318de) with main (fbbc298)

Summary

⚡ 1 improvements
✅ 2 untouched benchmarks

Benchmarks breakdown

Benchmark main kenodegard:pickle-vs-deepcopy Change
test_pin_subpackage_benchmark 12.2 s 7.4 s +66.17%

@kenodegard kenodegard marked this pull request as ready for review April 12, 2024 20:14
@kenodegard kenodegard requested a review from a team as a code owner April 12, 2024 20:14
Copy link
Copy Markdown
Contributor

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment here is as last time. If we encounter a data structure that does not pickle, we will get an error. It might be worth making a separate helper function with a try except block falling back to deepcopy.

@beckermr
Copy link
Copy Markdown
Contributor

pre-commit.ci autofix

@beckermr
Copy link
Copy Markdown
Contributor

beckermr commented Jun 21, 2024

@kenodegard @dholth Are we worried about errors where something cannot be pickled or shall we merge this one?

The test failures appear to be unrelated warnings appearing w/ python 3.12:

DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.

I don't follow it totally but it appears to happen in extractions of packages.

@minrk
Copy link
Copy Markdown
Contributor

minrk commented Jun 21, 2024

Are we worried about errors where something cannot be pickled

Since this is currently only applied to variant objects and I think variants are necessarily dicts of lists, strings and numbers, I don't think this is likely to come up.

beckermr
beckermr previously approved these changes Jun 22, 2024
Copy link
Copy Markdown
Contributor

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a news item and then merge!

@isuruf isuruf added this to the 24.7.x milestone Jun 22, 2024
beckermr
beckermr previously approved these changes Jun 22, 2024
@jaimergp
Copy link
Copy Markdown
Contributor

DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.

I don't follow it totally but it appears to happen in extractions of packages.

Yep, this is the tarfile usage in conda-package-streaming. Added conda/conda-package-streaming#87 to track. For now we can add a filterwarnings setting to pyproject.toml, I'd say. I think @zklaus was looking into it.

@beckermr
Copy link
Copy Markdown
Contributor

filterwarnings setting to pyproject.toml

do you have an example of this or a link to the docs?

I am not following.

@beckermr beckermr enabled auto-merge (squash) June 25, 2024 11:22
@beckermr beckermr merged commit 3a09f13 into conda:main Jun 25, 2024
@kenodegard kenodegard deleted the pickle-vs-deepcopy branch June 25, 2024 14:05
@minrk
Copy link
Copy Markdown
Contributor

minrk commented Jun 28, 2024

Thanks for this! I've been testing it out, and with this and #5384 rerender time is reduced by ~90%, from over 30 minutes to about 4 on some recipes with a large variant matrix (petsc4py).

FWIW, I spent a little time looking into making config.variant itself an immutable dict via deepfreeze since that's already computed for the hash anyway and would eliminate the need to copy variants altogether. But there are a few places where the variant is modified in place that make it a little bit tricky. Still doable and perhaps right thing to do, but not so simple. I think it also comes at an increased cost of computing list_of_dicts_to_dict_of_lists because there would end up being a lot of converting tuples to lists and back.

I also noticed in conda-forge/conda-smithy#1967 that deepfreeze is quite a lot more expensive than using HashableDict (deepfreeze takes ~10x the time of hash(HashableDict(dict))

@beeankha beeankha mentioned this pull request Jul 16, 2024
55 tasks
@github-actions github-actions bot added the locked [bot] locked due to inactivity label Jun 29, 2025
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 29, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed [bot] added once the contributor has signed the CLA locked [bot] locked due to inactivity

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

8 participants