Skip to content

Cross platform test and UTF-8 fixes#1553

Merged
jku merged 6 commits intomainfrom
cross-os-test
Oct 1, 2025
Merged

Cross platform test and UTF-8 fixes#1553
jku merged 6 commits intomainfrom
cross-os-test

Conversation

@jku
Copy link
Member

@jku jku commented Sep 25, 2025

This should fix #1552

  • added a test workflow based on idea taken from model-transparency
  • the test immediately fails because apparently we write cp1252 on Windows (and this is different with rekor 2 checkpoints that contain an mdash)
  • I've fixed the issues by being explicit about encoding: This changes serialization on Windows. Maybe we should actually be reading in bytes but I wanted the change to be minimal here.

Some details on the impact of this issue:

  • It is possible that some bundles produced on older releases on windows are no longer verifiable on windows when using a newer release (if the bundles did have encoding issues they used to be verifiable on windows but were never verifiable on other platforms). I am not aware of any such examples.
  • In practice it seems this issue triggers specifically with rekor v2 because of the new checkpoint format

@jku

This comment was marked as outdated.

@jku

This comment was marked as outdated.

@jku jku marked this pull request as ready for review September 25, 2025 12:49
@spencerschrock
Copy link

This should fix #1552

I noticed all the encoding changes are in _cli.py. Is this something we (model-transparency) would be hitting in our call path? Or is sigstore-python CLI used by rekor2, which we may be hitting by upgrading to sigstore-python v4?

@jku
Copy link
Member Author

jku commented Sep 25, 2025

I noticed all the encoding changes are in _cli.py. Is this something we (model-transparency) would be hitting in our call path? Or is sigstore-python CLI used by rekor2, which we may be hitting by upgrading to sigstore-python v4?

oh good point: _cli is not used by other modules so it's possible that some of your code has the same issues (maybe it was made following the _cli module here?)

@spencerschrock
Copy link

oh good point: _cli is not used by other modules so it's possible that some of your code has the same issues

Indeed, we were also reading and writing signatures without specifying an encoding. Forcing everything to utf-8 seemed to resolve our issues

Maybe we should actually be reading in bytes

model-transparency will be thinking about this too

@spencerschrock
Copy link

Maybe we should actually be reading in bytes

For the JSON bundle, I think UTF-8 is more conformant depending on how one interprets "closed ecosystem"?

https://datatracker.ietf.org/doc/html/rfc8259#section-8.1

JSON text exchanged between systems that are not part of a closed
ecosystem MUST be encoded using UTF-8 [RFC3629].

jku added 4 commits September 25, 2025 19:43
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Apparently Python really likes to use cp1252 on some platforms but our
bundles must not be platform specific.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Copy link
Member Author

jku commented Sep 25, 2025

To be clear, I didn't mean we should think about changing the representation on disk: utf-8 text is fine --I was just thinking that it might be clearer to read bytes and pass those to the json parser (double checking that it actually uses utf-8 in all cases).

@jku jku changed the title Cross platform test and fixes Cross platform test and UTF-8 fixes Sep 26, 2025
Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- it's super annoying that the default codepage on Windows is still causing problems like this.

@jku jku merged commit b6def06 into main Oct 1, 2025
38 checks passed
@jku jku deleted the cross-os-test branch October 1, 2025 07:20
@jku jku mentioned this pull request Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

windows encoding issues

3 participants