Optimize artifact loading during Pkg.build stage, remove Pkg application dependency from JLL libraries #538

@vtjnash

Some history: way back when, the Julia ecosystem handled artifacts a little like JLL files handle them today (through the BinDeps ecosystem), figuring them out on the fly when the package was loaded. This was a performance and debugging (configuration) mess. Eventually, Pkg.build was created to make sense of it all. Before that, loading packages was awkward. Loading could run into various issues, such as files failing to load with no good debugging tooling available at load time, and it violated the principle that .ji files are immutable caches dependent only on their .jl source files. Perhaps equally important, it was slow. Not very slow, but just slow enough to be a problem once it was used everywhere. We've improved many things since then (such as adding incremental .ji precompile files, baking Pkg into the system image, and building larger precompiled system images), which sometimes hides some of that overhead. Artifacts and BinaryBuilder are also much better now, since artifacts are almost always handled during Pkg.build, are more declarative, and are more carefully managed (and manageable).

But it seems that BinaryBuilder is also regressing the ecosystem somewhat on these axes (e.g. JuliaLang/julia#33985 (comment) and JuliaGraphics/Gtk.jl#447 (comment)). When you're an important low-level package like this, you have to be attentive to these little details, which Pkg itself gets to ignore because it's just the end application.

I'm looking into how to fix this, but I'm opening the issue in advance as a place to track progress. I don't know yet specifically what this should look like, but here are some quick thoughts on possible options for a roadmap:

  • move Platform definition code to Sys
  • move Artifact parsing code to Base
  • (or perhaps make a small package that just provides those helpers)
  • make the .jll files fully declarative (so that a helper module or function knows how to fill them in based on the contents of an Artifacts file)
  • make a pre-processed binary representation of the graph in a cache file (this is also needed to fix other Distributed.jl and incremental precompile issues related to normal Project/Manifest usage, e.g. "Workers should inherit Pkg environment" JuliaLang/julia#28781 and "Code loading might be better just fully parsing TOML files" JuliaLang/julia#27414)
  • move calls that use the Pkg application into a subprocess, where they won't pollute the .ji file
  • create a database (and define a location for it, perhaps an SQLite file) that supports efficient Artifact queries (and perhaps Manifest queries too)
