Only use sha 512 #71

Merged
merged 3 commits into from
Nov 22, 2019
Changes from 2 commits
93 changes: 44 additions & 49 deletions pep-0458.txt
@@ -307,9 +307,9 @@ kinds of metadata RECOMMENDED for PyPI.

__ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/METADATA.md

In addition, all target files SHOULD be available on disk at least three times.
In addition, all target files SHOULD be available on disk at least two times.
Once under their original filename, to provide backwards compatibility, and
twice with their SHA-256 and SHA-512 hash respectively included in their
once with their SHA-512 hash included in their
filename. This is required to produce `Consistent Snapshots`_.

Depending on the file system used, different data deduplication mechanisms MAY
@@ -321,7 +321,7 @@ PyPI and TUF Metadata

TUF metadata provides information that clients can use to make update
decisions. For example, a *targets* metadata lists the available target files
on PyPI and includes the required signatures, cryptographic hashes, and
on PyPI and includes the required signatures, cryptographic hash, and
file sizes for each. Different metadata files provide different information, which are
signed by separate roles. The *root* role indicates what metadata belongs to
each role. The concept of roles allows TUF to delegate responsibilities
@@ -345,20 +345,19 @@ roles used in TUF.
Figure 1: An overview of the TUF roles.

Unless otherwise specified, this PEP RECOMMENDS that every metadata or
target file be hashed using both the SHA2-256 and SHA2-512 functions of
target file be hashed using the SHA2-512 function of
the `SHA-2`__ family. SHA-2 has native and well-tested Python 2 and 3
support (allowing for verification of these hashes without additional,
non-Python dependencies), and using both functions should provide
sufficient protection against `collision attacks`__ for the foreseeable
future. However, this assumes that a collision attack for SHA2-256 does
not easily translate to SHA2-512. If stronger security guarantees are
required, then SHA2-256 and `SHA3-256`__ MAY be used instead, since they
are based on very different designs from each other. However, SHA-3
non-Python dependencies). If stronger security guarantees are
required, then both SHA2-256 and SHA2-512 or both SHA2-256 and `SHA3-256`__
MAY be used instead. SHA2-256 and SHA3-256
are based on very different designs from each other, providing extra protection
against `collision attacks`__. However, SHA-3
requires installing additional, non-Python dependencies for `Python 2`__.

__ https://en.wikipedia.org/wiki/SHA-2
__ https://en.wikipedia.org/wiki/Collision_attack
__ https://en.wikipedia.org/wiki/SHA-3
__ https://en.wikipedia.org/wiki/Collision_attack
__ https://pip.pypa.io/en/latest/development/release-process/#python-2-support
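The single-hash recommendation above can be illustrated with a minimal sketch that uses only the standard library. The helper names `sha512_hexdigest` and `matches_expected` are hypothetical and do not come from the PEP or from TUF; the point is only that SHA2-512 hashing and verification need nothing beyond `hashlib`.

```python
import hashlib
import hmac


def sha512_hexdigest(data: bytes) -> str:
    """Return the SHA2-512 digest of `data` as a hexadecimal string."""
    return hashlib.sha512(data).hexdigest()


def matches_expected(data: bytes, expected_hex: str) -> bool:
    """Check `data` against a hex digest taken from trusted metadata.

    hmac.compare_digest performs a constant-time comparison; that is a
    defensive choice of this sketch, not a requirement stated in the PEP.
    """
    return hmac.compare_digest(sha512_hexdigest(data), expected_hex.lower())


if __name__ == "__main__":
    payload = b"example distribution bytes"
    digest = sha512_hexdigest(payload)
    print(len(digest))                        # length of the hex digest
    print(matches_expected(payload, digest))  # digest matches its own input
```

A SHA2-512 hexadecimal digest is 128 characters long, which is the value used for constant C1 in Table 1.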


@@ -509,13 +508,13 @@ __ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/TUTORIAL.md#deleg
Based on our findings as of the time this document was updated for
implementation (Nov 7 2019), summarized in Tables 1-2, PyPI SHOULD
split all targets in the *bins* role by delegating them to 16,384
*bin-n* roles (see C11 in Table 1). Each *bin-n* role would sign
for the PyPI targets whose SHA2-256 hashes fall into that bin
*bin-n* roles (see C10 in Table 1). Each *bin-n* role would sign
for the PyPI targets whose SHA2-512 hashes fall into that bin
(see Figure 2 and `Consistent Snapshots`_). It was found
that this number of bins would result in a 5-9% metadata overhead
(relative to the average size of downloaded distribution files; see V14 and
V16 in Table 2) for returning users, and a 70% overhead for new
users who are installing pip for the first time (see V18 in Table 2).
(relative to the average size of downloaded distribution files; see V13 and
V15 in Table 2) for returning users, and a 69% overhead for new
users who are installing pip for the first time (see V17 in Table 2).
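As a rough sketch of how a target could be assigned to one of the 16,384 *bin-n* roles: the code below assumes that the SHA2-512 hash is taken over the target's relative path and that each bin owns a contiguous range of 4-hex-digit path hash prefixes. Both are assumptions made for illustration, and the names `bin_index` and `bin_prefixes` are hypothetical.

```python
import hashlib

NUM_BINS = 16_384                              # number of bin-n roles
PREFIX_LEN = 4                                 # hex digits per path hash prefix
TOTAL_PREFIXES = 16 ** PREFIX_LEN              # 65,536 possible prefixes
PREFIXES_PER_BIN = TOTAL_PREFIXES // NUM_BINS  # 4 prefixes per bin


def bin_index(target_path: str) -> int:
    """Map a target path to a bin number in the range [0, NUM_BINS)."""
    digest = hashlib.sha512(target_path.encode("utf-8")).hexdigest()
    prefix = int(digest[:PREFIX_LEN], 16)
    return prefix // PREFIXES_PER_BIN


def bin_prefixes(index: int) -> list:
    """List the path hash prefixes that a given bin is responsible for."""
    start = index * PREFIXES_PER_BIN
    return [format(p, "04x") for p in range(start, start + PREFIXES_PER_BIN)]


if __name__ == "__main__":
    # The path below is a made-up example, not a real PyPI target.
    idx = bin_index("packages/example/example-1.0.tar.gz")
    print(idx, bin_prefixes(idx))
```

Because 65,536 prefixes are spread over 16,384 bins, each bin covers exactly four prefixes under these assumptions.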

A few assumptions used in calculating these metadata overhead percentages:

@@ -526,31 +525,29 @@ A few assumptions used in calculating these metadata overhead percentages:
+------+--------------------------------------------------+-----------+
| Name | Description | Value |
+------+--------------------------------------------------+-----------+
| C1 | # of bytes in a SHA2-256 hexadecimal digest | 64 |
| C1 | # of bytes in a SHA2-512 hexadecimal digest | 128 |
+------+--------------------------------------------------+-----------+
| C2 | # of bytes in a SHA2-512 hexadecimal digest | 128 |
| C2 | # of bytes for a SHA2-512 public key ID | 64 |
+------+--------------------------------------------------+-----------+
| C3 | # of bytes for a SHA2-256 public key ID | 64 |
| C3 | # of bytes for an Ed25519 signature | 128 |
+------+--------------------------------------------------+-----------+
| C4 | # of bytes for an Ed25519 signature | 128 |
| C4 | # of bytes for an Ed25519 public key | 64 |
+------+--------------------------------------------------+-----------+
| C5 | # of bytes for an Ed25519 public key | 64 |
| C5 | # of bytes for a target relative file path | 256 |
+------+--------------------------------------------------+-----------+
| C6 | # of bytes for a target relative file path | 256 |
| C6 | # of bytes to encode a target file size | 7 |
+------+--------------------------------------------------+-----------+
| C7 | # of bytes to encode a target file size | 7 |
| C7 | # of bytes to encode a version number | 6 |
+------+--------------------------------------------------+-----------+
| C8 | # of bytes to encode a version number | 6 |
| C8 | # of targets (simple indices and distributions) | 2,273,539 |
+------+--------------------------------------------------+-----------+
| C9 | # of targets (simple indices and distributions) | 2,273,539 |
| C9 | Average # of bytes for a downloaded distribution | 2,184,393 |
+------+--------------------------------------------------+-----------+
| C10 | Average # of bytes for a downloaded distribution | 2,184,393 |
+------+--------------------------------------------------+-----------+
| C11 | # of bins | 16,384 |
| C10 | # of bins | 16,384 |
+------+--------------------------------------------------+-----------+

C9 was computed by querying the number of release files.
C10 was derived by taking the average between a rough estimate of the average
C8 was computed by querying the number of release files.
C9 was derived by taking the average between a rough estimate of the average
size of release files *downloaded* over the past 31 days (1,628,321 bytes),
and the average size of release files on disk (2,740,465 bytes).
Ernest W. Durbin III helped to provide these numbers on November 7, 2019.
@@ -560,41 +557,39 @@ Table 1: A list of constants used to calculate metadata overhead.
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| Name | Description | Formula | Value |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V1 | Length of a path hash prefix | math.ceil(math.log(C11, 16)) | 4 |
| V1 | Length of a path hash prefix | math.ceil(math.log(C10, 16)) | 4 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V2 | Total # of path hash prefixes | 16**V1 | 65,536 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V3 | Avg # of targets per bin | math.ceil(C9/C11) | 139 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V4 | Avg size of SHA-256 hashes per bin | V3*C1 | 8,896 |
| V3 | Avg # of targets per bin | math.ceil(C8/C10) | 139 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V5 | Avg size of SHA-512 hashes per bin | V3*C2 | 17,792 |
| V4 | Avg size of SHA-512 hashes per bin | V3*C1 | 17,792 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V6 | Avg size of target paths per bin | V3*C6 | 35,584 |
| V5 | Avg size of target paths per bin | V3*C5 | 35,584 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V7 | Avg size of lengths per bin | V3*C7 | 973 |
| V6 | Avg size of lengths per bin | V3*C6 | 973 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V8 | Avg size of bin-n metadata (bytes) | V4+V5+V6+V7 | 63,245 |
| V7 | Avg size of bin-n metadata (bytes) | V4+V5+V6 | 54,349 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V9 | Total size of public key IDs in bins | C11*C3 | 1,048,576 |
| V8 | Total size of public key IDs in bins | C10*C2 | 1,048,576 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V10 | Total size of path hash prefixes in bins | V1*V2 | 262,144 |
| V9 | Total size of path hash prefixes in bins | V1*V2 | 262,144 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V11 | Est. size of bins metadata (bytes) | V9+V10 | 1,310,720 |
| V10 | Est. size of bins metadata (bytes) | V8+V9 | 1,310,720 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V12 | Est. size of snapshot metadata (bytes) | C11*C8 | 98,304 |
| V11 | Est. size of snapshot metadata (bytes) | C10*C7 | 98,304 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V13 | Est. size of metadata overhead per distribution per returning user (same snapshot) | 2*V8 | 126,490 |
| V12 | Est. size of metadata overhead per distribution per returning user (same snapshot) | 2*V7 | 108,698 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V14 | Est. metadata overhead per distribution per returning user (same snapshot) | round((V13/C10)*100) | 6% |
| V13 | Est. metadata overhead per distribution per returning user (same snapshot) | round((V12/C9)*100) | 5% |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V15 | Est. size of metadata overhead per distribution per returning user (diff snapshot) | V13+V12 | 224,794 |
| V14 | Est. size of metadata overhead per distribution per returning user (diff snapshot) | V12+V11 | 207,002 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V16 | Est. metadata overhead per distribution per returning user (diff snapshot) | round((V15/C10)*100) | 10% |
| V15 | Est. metadata overhead per distribution per returning user (diff snapshot) | round((V14/C9)*100) | 9% |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V17 | Est. size of metadata overhead per distribution per new user | V15+V11 | 1,535,514 |
| V16 | Est. size of metadata overhead per distribution per new user | V14+V10 | 1,517,722 |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+
| V18 | Est. metadata overhead per distribution per new user | round((V17/C10)*100) | 70% |
| V17 | Est. metadata overhead per distribution per new user | round((V16/C9)*100) | 69% |
+------+------------------------------------------------------------------------------------+------------------------------+-----------+

Table 2: Estimated metadata overheads for new and returning users.
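The SHA2-512-only figures can be recomputed directly from the constants. The sketch below uses the renumbered names from the updated tables (C1 through C10, V1 through V17); the dictionary layout is only a convenience of this example.

```python
import math

# Constants from Table 1 (updated numbering).
C1, C2 = 128, 64            # SHA2-512 hex digest, public key ID
C5, C6, C7 = 256, 7, 6      # target path, file size, version number
C8 = 2_273_539              # number of targets
C9 = 2_184_393              # average bytes per downloaded distribution
C10 = 16_384                # number of bins

V = {}
V[1] = math.ceil(math.log(C10, 16))   # length of a path hash prefix
V[2] = 16 ** V[1]                     # total number of path hash prefixes
V[3] = math.ceil(C8 / C10)            # average targets per bin
V[4] = V[3] * C1                      # SHA-512 hashes per bin
V[5] = V[3] * C5                      # target paths per bin
V[6] = V[3] * C6                      # lengths per bin
V[7] = V[4] + V[5] + V[6]             # bin-n metadata size
V[8] = C10 * C2                       # public key IDs in bins
V[9] = V[1] * V[2]                    # path hash prefixes in bins
V[10] = V[8] + V[9]                   # bins metadata size
V[11] = C10 * C7                      # snapshot metadata size
V[12] = 2 * V[7]                      # returning user, same snapshot (bytes)
V[13] = round(V[12] / C9 * 100)       # ... as a percentage
V[14] = V[12] + V[11]                 # returning user, different snapshot
V[15] = round(V[14] / C9 * 100)       # ... as a percentage
V[16] = V[14] + V[10]                 # new user (bytes)
V[17] = round(V[16] / C9 * 100)       # ... as a percentage

if __name__ == "__main__":
    for name in sorted(V):
        print(f"V{name}: {V[name]:,}")
```

Running this reproduces the updated column: V7 is 54,349 bytes, and the overheads V13, V15, and V17 come out to 5%, 9%, and 69%.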
@@ -829,7 +824,7 @@ version of the *snapshot* metadata, which in turn lists the versions of the
snapshot.

The *targets* or delegated targets metadata refer to the actual target
files, including all of their cryptographic hashes as specified above.
files, including their cryptographic hashes as specified above.
Thus, to mark a target file as part of a consistent snapshot it MUST, when
written to disk, include its hash in its filename:
