Skip to content

Conversation

mzihlmann
Copy link
Contributor

@mzihlmann mzihlmann commented Jun 8, 2025

Hi there,

I'm working on my kaniko fork. Kaniko has integration tests to check that the output is identical whether you build it with docker or kaniko and also whether you build it from scratch or from cache. For that end it currently still uses container-diff. The problem, container-diff is not strict enough. It only cares about file-contents. metadata is ignored, layer hash is ignored etc etc. Hence kaniko has issues with permissions, file-mode and cache misses that all go undetected in the integration tests. My plan now is to switch integration tests to diffoci, it fulfills all the requirements it seems. Thank you for this awesome tool!

The problem however is that currently timestamps are stored in PAX records, this makes it unusable in my case. Currently I hooked it up to --ignore-timestamps as in my case it only contains the timestamps. Please let me know if you would prefer a --ignore-pax-record or something similar instead.

Further I need to distinguish the timestamps on the files and the timestamps on the image metadata. If I rebuild from cache, the timestamps on the files must be identical, the timestamps on the image are new. Hence I introduced --ignore-file-timestamps and updated accordingly.

Again, thank you for this awesome tool!

@mzihlmann mzihlmann force-pushed the master branch 4 times, most recently from dc34f85 to 2e6e168 Compare June 8, 2025 10:10
@mzihlmann
Copy link
Contributor Author

hmmm it looks like switching will not be straightforward. As kaniko currently messes up both permissions and file mode on files it shouldnt touch, there is a very large diff in each layer. I think it will be easier to first adapt container-diff to be a little more strict, fix those issues in kaniko and then switch to diffoci.

@mzihlmann
Copy link
Contributor Author

meh, adding that capability to container-diff might not be as simple as I thought, they don't compare the tar contents as you do, but instead unpack the files into a local directory and compare them there, of course losing all file-metadata in the process. Let me try to maybe make yours less strict so I can bring it to work in kaniko.

@mzihlmann
Copy link
Contributor Author

juhuuu! finally got integration-tests using diffoci working (a loooot of new ignores necessary). already found the first bug
mzihlmann/kaniko#46 (comment)
Again, thank you for this tool!

pkg/diff/diff.go Outdated
@@ -865,8 +866,8 @@ func (d *differ) diffTarEntries(ctx context.Context, node *EventTreeNode, in [2]

func (d *differ) diffTarEntry(ctx context.Context, node *EventTreeNode, in [2]EventInput) (dirsToBeRemovedIfEmpty []string, retErr error) {
var negligibleTarFields []string
if d.o.IgnoreTimestamps {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to break the current behavior of IgnoreTimestamps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timestamps on the docker metadata and timestamps in the filesystem are two different beasts and in my specific use-case having a single switch is insufficient.

When I rebuild from cache I want to make sure that the filesystem is identical, but a new docker image will be emitted and therefore updating the timestamp on the image metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's a breaking interface change, alternative would be to expose two new flags:

ignore-file-timestamps
ignore-image-timestamps

and bundle them together like we do with semantic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundle them together

SGTM

@@ -43,6 +43,7 @@ func NewCommand() *cobra.Command {
"ignore-history",
"ignore-file-order",
"ignore-file-mode-redundant-bits",
"ignore-file-timestamps",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a new flag? Can we just reuse ignore-timestamps?

@mzihlmann
Copy link
Contributor Author

mzihlmann commented Jun 19, 2025

I also have a few more changes that were necessary to make diffoci work in the context of kaniko integration tests, I think some could be valuable here too (there is at least one bugfix I haven't yet opened here). Now that I see that I'm not talking into the void Ill open them later today.
master...mzihlmann:diffoci:kaniko

@AkihiroSuda
Copy link
Member

Needs rebase

Signed-off-by: Martin Zihlmann <[email protected]>
Signed-off-by: Martin Zihlmann <[email protected]>
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit 1cb60ce into reproducible-containers:master Jun 26, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants