Write sparse index #563
I think I am a little stuck with this at the moment because I am not entirely sure how much functionality should belong to the write implementation. For example: …
Just pushing what I have so far for you to maybe get a better idea, but keep in mind this is still veeery much WIP, so don't expect too much…yet :D
this fixes CI
Previously it would have been possible to write the sparse extension without any indication that it is actually needed. For now we rely on a flag which is technically a cache; after all, `Mode::DIR` indicates whether an index entry is sparse or not, and with it the index itself. At least so I think.
…if `State` requires it. Speaking of which, I do wonder how the `link` extension plays into all of this. Can there be sparse indices without `link`? I think so; after all, it's just an optimization to avoid having to read from trees all the time.
I want to set up a test that reads a sparse index into `State`, which then calls `write_to()` with options to write out a regular (non-sparse) index. That means the entries in the sparse index marked as `Mode::DIR` would have to be transformed, i.e. expanded, into a list of `Mode::File` entries. Is that transformation part of the work that `State::write_to` should do, or does it belong to a different set of functions that modify the `State` before writing? In other words: does `write_to` just take the `State` and "stupidly" write it to a file, or is it also responsible for these kinds of transformations?
I took a quick look and am happy to share my thoughts on this.
My first question was: is `State::write_to()` allowed to mutate? It probably shouldn't and currently isn't, which indicates that on-the-fly changes to `State` as it's written aren't advisable. Taking this as an indication, there would be no other way than to have the caller perform an additional step to write a non-sparse index.
From what I could tell, my expectation would be that `write_to` serializes exactly what's currently in memory, and that it fails if, for whatever reason, the task can't be performed without mutation.
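A minimal sketch of that contract, assuming simplified, hypothetical `State`, `Entry`, `Mode`, `Options` and `Error` types (not gitoxide's actual API): `write_to` takes `&self`, so it cannot expand `DIR` entries itself, and it returns an error when the requested output would require doing so. The output is a toy text format, not the binary index format.

```rust
use std::io::{self, Write};

// Hypothetical, simplified stand-ins for the real index types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    File,
    Dir, // a sparse directory entry
}

#[derive(Clone, Debug)]
pub struct Entry {
    pub path: String,
    pub mode: Mode,
}

pub struct State {
    pub entries: Vec<Entry>,
}

// Hypothetical write options.
pub struct Options {
    // The caller wants a regular, non-sparse index on disk.
    pub require_non_sparse: bool,
}

#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    // Writing as requested would need `Dir` entries to be expanded first.
    SparseEntriesPresent,
}

impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}

impl State {
    // Serializes exactly what is in memory. Taking `&self` means this
    // function cannot expand entries, so it refuses instead.
    pub fn write_to(&self, mut out: impl Write, options: Options) -> Result<(), Error> {
        let has_dir_entries = self.entries.iter().any(|e| e.mode == Mode::Dir);
        if options.require_non_sparse && has_dir_entries {
            return Err(Error::SparseEntriesPresent);
        }
        for e in &self.entries {
            // Toy line format: "<mode> <path>".
            writeln!(out, "{:?} {}", e.mode, e.path)?;
        }
        Ok(())
    }
}
```

With this split, the caller expands the `DIR` entries in a separate step and only then calls `write_to` with `require_non_sparse` set.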
I have also made a few minor modifications which help to make that approach clearer.
One thing to be careful about is the `is_sparse` flag. It's a cache, and I think `fn is_sparse()` could make clear what a sparse index is. Is it the presence of the `sdir` extension, or is that extension always present if there are `DIR` mode entries? I'd expect it to be the latter, so the flag is really just a shortcut telling us these entries are present.
However, with the `State` being mutable, this might not be the case anymore, and I suggest not relying on this flag at all, but rather checking all entries to see if the sparse extension is necessary. Furthermore, lower-case extensions are mandatory, hence they can't be affected by options.
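A sketch of deriving sparseness from the entries instead of a cached flag, using hypothetical simplified types. The optional/mandatory rule follows Git's index-format documentation: an extension whose signature starts with an upper-case letter `A`..`Z` is optional, so `sdir` and `link` are mandatory. `extensions_to_write` is a hypothetical helper in which options can only select optional extensions, while `sdir` depends solely on the entries.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    File,
    Dir,
}

pub struct Entry {
    pub mode: Mode,
}

// Sparseness is derived from the entries, not trusted from a cached flag.
pub fn is_sparse(entries: &[Entry]) -> bool {
    entries.iter().any(|e| e.mode == Mode::Dir)
}

// Git's index format: signatures starting with 'A'..'Z' are optional,
// anything else must be understood by readers.
pub fn is_optional_extension(signature: &[u8; 4]) -> bool {
    signature[0].is_ascii_uppercase()
}

// Hypothetical helper: `optional` comes from write options and may only
// contain optional extensions; `sdir` is added purely based on the entries.
pub fn extensions_to_write(entries: &[Entry], optional: &[[u8; 4]]) -> Vec<[u8; 4]> {
    let mut out: Vec<[u8; 4]> = optional
        .iter()
        .copied()
        .filter(is_optional_extension) // mandatory ones can't be toggled by options
        .collect();
    if is_sparse(entries) {
        out.push(*b"sdir");
    }
    out
}
```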
On another note, isn't the resolution of entries, i.e. `DIR` -> `FILE`, dependent on the presence of the `link` extension? I thought that this is a performance improvement to avoid having to recurse trees (which, as we now know, is slower in git than it has to be). If that is the case, it's certainly something to make sure we don't forget and somehow incorporate into the resolution API. It's certainly OK to do it in steps, though, and focus on the non-`link` version first.
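For the non-`link` version, resolving `DIR` -> `FILE` amounts to recursing into the tree each sparse directory entry points to. The sketch assumes a hypothetical in-memory `Trees` map keyed by toy `u32` ids instead of real object ids and an object database, and assumes sparse directory paths end in `/`.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    File,
    Dir,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Entry {
    pub path: String,
    pub mode: Mode,
    // For `Dir` entries: the (toy) id of the tree they summarize.
    pub id: u32,
}

// Hypothetical object store: tree id -> children as (name, mode, id).
pub type Trees = HashMap<u32, Vec<(String, Mode, u32)>>;

// Expands every `Dir` entry into the `File` entries below it.
pub fn expand(entries: &[Entry], trees: &Trees) -> Result<Vec<Entry>, String> {
    let mut out = Vec::new();
    for e in entries {
        match e.mode {
            Mode::File => out.push(e.clone()),
            Mode::Dir => expand_tree(&e.path, e.id, trees, &mut out)?,
        }
    }
    // Keep entries ordered by path bytes, as in the index.
    out.sort_by(|a, b| a.path.cmp(&b.path));
    Ok(out)
}

// Recursively walks one tree, prefixing child names with `prefix`.
fn expand_tree(prefix: &str, tree: u32, trees: &Trees, out: &mut Vec<Entry>) -> Result<(), String> {
    let children = trees.get(&tree).ok_or_else(|| format!("missing tree {tree}"))?;
    for (name, mode, id) in children {
        let path = format!("{prefix}{name}");
        match mode {
            Mode::File => out.push(Entry { path, mode: Mode::File, id: *id }),
            Mode::Dir => expand_tree(&format!("{path}/"), *id, trees, out)?,
        }
    }
    Ok(())
}
```

An optimization such as the one discussed for `link` would replace the recursion in `expand_tree`, while `expand` could keep its signature.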
Thanks a lot! I think keeping this PR up-to-date generally helps as I am likely to make some changes while I am looking at it.
Great, thank you for your response, that definitely clears things up!
I guess I was a little bit confused about this while looking into how …
Oh, right! That's good to keep in mind and maybe adjust as we know more. After all, …
By not using the cached …
I think this is ready to be reviewed; we can clear up the remaining questions and find out if you deem it worthy of being merged :D Some remaining open questions / tasks:
Thanks a lot for the summary! Despite open questions, I think it's OK to merge and potentially address them in a follow-up PR depending on your preference.
Can we have a checkbox to revisit this in the tracking ticket? Eventually we'll better understand whether enforcing a sort order like …
Yes, we need to. There is an
After going back and forth with this, my opinion really depends on what you find best. I will hopefully get to the review soon. (Please note that changing a comment also doesn't trigger a new notification, so I saw this one by chance while following up on the first email I got, which didn't have the review note yet.)
Conflicts: git-index/tests/index/file/read.rs
It came up in starship: starship/starship#3245
…filters All obtained from starship: starship/starship#3841
It's something we want to read, but not write. Minimal support, that's all we need. Sparse indices seem to be the future.
Thanks for the session today! Let me sum up what we came up with.
Once …
Did I miss anything major, or is there anything else I could help with? Please let me know.
I actually missed something above: it's the need for supporting …
We try really hard to describe these in detail and also condense them into an enum that represents all valid states. At this point it's not clear where these are going to be used, as related features like checkout and status aren't implemented yet.
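To illustrate condensing the flags into an enum of valid states, here is a hypothetical `Sparse` enum built from already-resolved values of `core.sparseCheckout`, `core.sparseCheckoutCone` and `index.sparse` (parsing and defaults are out of scope). It encodes that, per Git's documentation, `index.sparse` has no effect unless both `core.sparseCheckout` and `core.sparseCheckoutCone` are enabled, so invalid combinations collapse into valid states.

```rust
// Hypothetical condensed view of sparse-checkout related configuration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Sparse {
    // `core.sparseCheckout` is off.
    Disabled,
    // Sparse checkout with arbitrary patterns: no sparse index possible.
    Patterns,
    // Cone mode, but the index stays fully expanded.
    Cone,
    // Cone mode with `index.sparse`: `DIR` entries may appear.
    ConeWithSparseIndex,
}

impl Sparse {
    // Takes already-resolved boolean values of the three settings.
    pub fn from_config(sparse_checkout: bool, cone: bool, index_sparse: bool) -> Self {
        match (sparse_checkout, cone, index_sparse) {
            (false, _, _) => Sparse::Disabled,
            (true, false, _) => Sparse::Patterns, // `index.sparse` is ignored here
            (true, true, false) => Sparse::Cone,
            (true, true, true) => Sparse::ConeWithSparseIndex,
        }
    }

    pub fn allows_sparse_index(self) -> bool {
        matches!(self, Sparse::ConeWithSparseIndex)
    }
}
```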
…ies anymore. Let's prefer to be round-trippable to avoid degenerating any information. At a later stage, I imagine the methods that operate on an index, for instance by adding DIR entries or removing them, to detect this case and remove the extension automatically.
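A sketch of that idea, assuming a hypothetical `State` that remembers whether the `sdir` extension was read, so that writing it back unchanged round-trips even without `DIR` entries. Only methods that add or remove `DIR` entries update the flag; sorting and all other entry fields are omitted.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    File,
    Dir,
}

pub struct Entry {
    pub path: String,
    pub mode: Mode,
}

// Hypothetical state: `has_sdir_extension` mirrors what was read from disk.
pub struct State {
    pub entries: Vec<Entry>,
    pub has_sdir_extension: bool,
}

impl State {
    pub fn remove_entry(&mut self, path: &str) -> Option<Entry> {
        let pos = self.entries.iter().position(|e| e.path == path)?;
        let removed = self.entries.remove(pos);
        // The last sparse directory entry is gone: drop the extension too.
        if removed.mode == Mode::Dir && !self.entries.iter().any(|e| e.mode == Mode::Dir) {
            self.has_sdir_extension = false;
        }
        Some(removed)
    }

    // Appends without sorting, to keep the sketch short.
    pub fn add_entry(&mut self, entry: Entry) {
        if entry.mode == Mode::Dir {
            self.has_sdir_extension = true;
        }
        self.entries.push(entry);
    }
}
```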
More thoughts related to the comment above. I think this can be simpler and completely contained within the …
Note that we will likely need something like `desired_version` as an option before V4 indices are supported, even when we start writing indices for commits, for example. By then we also want to support all extensions, so a lot of work is needed for proper support.
I just read in code that …
Thanks a lot for your help! I used this review to fully understand sparse indices (at least more 'fully' than before :D) and now have a much better understanding of how it all comes together with upcoming features. Very useful. I see the following next steps:
Does this make sense at all, or did I miss something? Please let me know.
I just noticed that there is already a 'hook' in the current …
Thanks for the tip, I will take a closer look at that!
When I fetched …
Tracked in #562
Tasks
- raw byte comparison is failing because of an order mismatch in the tree extension, and that should be investigated / fixed
  - for now, though, this can be worked around by
    - comparing only the relevant portions of bytes instead of all (all bytes are relevant)
    - creating the baseline index without the tree extension (not sure that is a possibility with git)
- `gix progress` to reflect those findings
- sparse index can be written by providing the corresponding extension flag in the `write::Options`
- index decides automatically if it needs to be sparse or not based on what kind of options are being passed in (`State` or not)
  - `is_sparse` flag tracks / caches this for us
    - do we trust other functions to modify this flag correctly, or do we need a way of verifying this, a.k.a. checking the DIR entries ourselves?
Notes
Just for myself to remember and to hammer the point home: `write_to` just looks at the current `State` and writes whatever it finds to `out`. There are some optional extensions that can be configured directly, but it does not care about anything else (like git-config stuff). That is handled entirely by different functions that mutate the `State` to the desired state before writing it out.