Update "Producing Consistent Snapshots" #75

Merged
78 changes: 37 additions & 41 deletions pep-0458.txt
Producing Consistent Snapshots
------------------------------

When a new distribution file is uploaded to PyPI, PyPI MUST update the
*bin-n* metadata responsible for that file. Remember that all target files
are sorted into
bins by their filename hashes. PyPI MUST also update *snapshot* to account for
the updated *bin-n* metadata, and *timestamp* to account for the updated
*snapshot* metadata. These updates SHOULD be handled by an automated *snapshot
process*.
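
For illustration, here is a minimal sketch of how a target file might be
mapped to its *bin-n* role; the bin count (256) and the use of SHA-256 over
the file name are assumptions made for this example, not requirements stated
here::

    import hashlib

    def bin_for_target(filename):
        # Assumed scheme: 16**2 = 256 bins, keyed by the first two hex
        # digits of the SHA-256 hash of the target file name.
        digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
        return "bin-" + digest[:2]

    # e.g. bin_for_target("foo-1.0.tar.gz") returns one of "bin-00" ... "bin-ff"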

File uploads MAY be handled in parallel; however, consistent snapshots MUST
be produced in a strictly sequential manner. Furthermore, as long as
distribution files are self-contained, a consistent snapshot MAY be produced
for each uploaded file. To do so, upload processes place new distribution
files into a concurrency-safe FIFO queue, and the snapshot process reads from
that queue one file at a time and performs the following tasks:

First, it adds the new file path to the relevant *bin-n* metadata, increments
its version number, signs it with the *bin-n* role key, and writes it to
*VERSION_NUMBER.bin-N.json*.

Then, it takes the most recent *snapshot* metadata, updates its *bin-n*
metadata version numbers, increments its own version number, signs it with the
*snapshot* role key, and writes it to *VERSION_NUMBER.snapshot.json*.

And finally, the snapshot process takes the most recent *timestamp* metadata,
updates its *snapshot* metadata hash and version number, increments its own
version number, sets a new expiration time, signs it with the *timestamp* role
key, and writes it to *timestamp.json*.
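
To make that sequence of tasks concrete, here is a minimal sketch of the
snapshot process; the metadata layout, key names, and the ``sign_and_write``
stub are illustrative assumptions, not the actual PyPI implementation::

    import hashlib
    import json
    import queue
    import time

    upload_queue = queue.Queue()  # concurrency-safe FIFO, filled by upload processes

    def sign_and_write(filename, metadata, key):
        # Stub: a real implementation would sign the canonical form of
        # *metadata* with the role's key before writing it out.
        metadata["signatures"] = [{"keyid": key, "sig": "..."}]
        with open(filename, "w") as f:
            json.dump(metadata, f)

    def process_one_upload(bins, snapshot, timestamp, file_path):
        # 1. Add the new file path to the responsible bin-n metadata
        #    (same assumed bin scheme as the sketch above).
        bin_name = "bin-" + hashlib.sha256(file_path.encode()).hexdigest()[:2]
        bin_md = bins[bin_name]
        bin_md["targets"][file_path] = {}  # file hashes and length omitted here
        bin_md["version"] += 1
        sign_and_write("%d.%s.json" % (bin_md["version"], bin_name),
                       bin_md, "bin-key")

        # 2. Update snapshot to account for the new bin-n version number.
        snapshot["meta"][bin_name] = bin_md["version"]
        snapshot["version"] += 1
        sign_and_write("%d.snapshot.json" % snapshot["version"],
                       snapshot, "snapshot-key")

        # 3. Update timestamp with the new snapshot version (its hash is
        #    omitted for brevity), increment its own version number, and
        #    set a fresh expiration time.
        timestamp["meta"]["snapshot"] = snapshot["version"]
        timestamp["version"] += 1
        timestamp["expires"] = time.time() + 24 * 60 * 60
        sign_and_write("timestamp.json", timestamp, "timestamp-key")

    def snapshot_process(bins, snapshot, timestamp):
        # Consume the queue one file at a time: consistent snapshots are
        # produced strictly sequentially even if uploads arrive in parallel.
        while True:
            process_one_upload(bins, snapshot, timestamp, upload_queue.get())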

When updating *bin-n* metadata for a consistent snapshot, the snapshot process
SHOULD also include any new or updated hashes of simple index pages in the
relevant *bin-n* metadata. Note that simple index pages may be generated
dynamically on API calls, so it is important that their output remain stable
throughout the validity of a consistent snapshot.
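
Here is a sketch of how the hash of a project's simple index page might be
recorded in the same *bin-n* targets; the target path layout is an assumption
made for illustration::

    import hashlib

    def add_index_page(bin_md, project_name, index_html):
        # Record the current simple index page so that clients can verify
        # it against the consistent snapshot it belongs to.
        body = index_html.encode("utf-8")
        bin_md["targets"]["simple/%s/index.html" % project_name] = {
            "hashes": {"sha256": hashlib.sha256(body).hexdigest()},
            "length": len(body),
        }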

Since the snapshot process MUST generate consistent snapshots in a strictly
sequential manner, it constitutes a bottleneck. Fortunately, the operation of
signing is fast enough that this may be done a thousand or more times per
second.
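
As a rough way to check the claimed signing throughput on given hardware
(assuming the PyNaCl library and an Ed25519 key; a benchmark sketch, not part
of the proposal)::

    import time
    import nacl.signing

    key = nacl.signing.SigningKey.generate()
    payload = b"x" * 2048  # roughly the size of a small metadata file

    n = 10000
    start = time.time()
    for _ in range(n):
        key.sign(payload)
    print("%.0f signatures/second" % (n / (time.time() - start)))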

Moreover, PyPI MAY serve distribution files to clients before the corresponding
consistent snapshot metadata is generated. In that case, the client software
SHOULD inform the user that full TUF protection is not yet available but will
be shortly.
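
A sketch of that client-side behavior, where ``get_target_info``,
``fetch_and_verify``, and ``fetch_unverified`` are hypothetical client
functions introduced only for illustration::

    def download(file_path):
        info = get_target_info(file_path)  # hypothetical: None if the file
        if info is None:                   # is not yet in a snapshot
            print("Warning: full TUF protection for %s is not yet "
                  "available, but will be shortly." % file_path)
            return fetch_unverified(file_path)     # hypothetical fallback
        return fetch_and_verify(file_path, info)   # hypothetical verified path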

PyPI SHOULD use a `transaction log`__ to record upload processes and the
snapshot queue for auditing and to recover from errors after a server failure.

__ https://en.wikipedia.org/wiki/Transaction_log
