allow self influence iteration options #1002
Closed
This pull request was exported from Phabricator. Differential Revision: D37743920 |
99warriors added a commit to 99warriors/captum that referenced this pull request (Jul 31, 2022)
Summary: Pull Request resolved: pytorch#1002

- For self influence computation, there needs to be an iteration over both checkpoints and batches. This diff adds a `by_checkpoints` option. If true, the outer iteration is over checkpoints. If false, the outer iteration is over batches. Because self influence computation can be called through the `influence` and `self_influence` methods, this option is added to both methods. Because only `TracInCP` and `TracInCPFast` should be used for self influence computation, only those classes are changed.
- To implement this option, the old `self_influence` method, which had the outer iteration over checkpoints, is renamed to a private `_self_influence_by_checkpoints` method. A new `_self_influence_by_batches` method is added, which has an outer iteration over batches and re-uses the `_self_influence_by_checkpoints` method to compute self influence scores for a single batch (this method can accept either a single batch or a dataloader yielding batches). Because the logic of this method is the same for all classes, a helper method, `_self_influence_by_batches_helper`, is added to `captum.influence._utils.common`. Finally, the new `self_influence` method simply chooses whether to call `_self_influence_by_checkpoints` or `_self_influence_by_batches`.
- Documentation describing the two options for `by_checkpoints` is added to the `self_influence` and `influence` methods.
- `test_tracin_show_progress` now differentiates between 2 modes: "self influence by checkpoints" (the original test for the progress bar when calculating self influence scores, which checks whether the outer progress bar over checkpoints and the inner progress bars over batches both reach 100%), and the newly added mode "self influence by batches", which checks whether the progress bar over batches reaches 100%.
- `test_tracin_self_influence` now also checks whether computing self influence scores gives the same result regardless of whether `by_checkpoints` is True or False.

Reviewed By: NarineK

Differential Revision: D37743920

fbshipit-source-id: e7fe669d3fdbc2d2b3c4c16ed3eb56651b0bd8fa
ed79fd4 to 986e5be
Summary: Pull Request resolved: pytorch#994

Change `TracInCP._self_influence_batch_tracincp` and `TracInCP._self_influence_batches_tracincp_fast` to be named `self_influence`, which is now public and now accepts a DataLoader yielding batches (as well as a single batch, as before). The modified helper function can be called by external functions to compute self influence.

The helper itself is also changed to improve efficiency by reducing the number of times checkpoints are loaded. Despite being able to compute self influence scores for a dataloader yielding batches, the modified helper still loads each checkpoint only once per call. This is because it now has an outer iteration over checkpoints and an inner iteration over batches (the order of iteration is reversed compared to before). This helper is called by `influence` when running it in self influence mode.

The reason we cannot just increase the batch size to reduce the number of checkpoint loadings is that for large models (precisely those for which loading checkpoints is expensive), the model takes up too much memory, so the batch size cannot be too large.

Minor change: for the `influence_src_dataset` argument of all `__init__`'s, add a description of the assumptions we make about the batches yielded by the dataloader.

Differential Revision: D35603078

fbshipit-source-id: 87063052e68441b82514489f4d9f9ad29b396da4
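The effect of the loop order on checkpoint loading can be illustrated with a toy sketch. Nothing here is Captum code: `CountingLoader`, `batches_outer`, and `checkpoints_outer` are hypothetical names, a "checkpoint" is reduced to a single scalar weight, and the per-example score is a made-up `weight * x * x`. The only point is to count how often a checkpoint is loaded under each order.

```python
from typing import List, Sequence


class CountingLoader:
    """Hypothetical stand-in for an expensive checkpoint load; counts calls."""

    def __init__(self) -> None:
        self.loads = 0

    def load(self, checkpoint: str) -> float:
        self.loads += 1
        # Toy "model state": a scalar derived from the checkpoint name.
        return float(len(checkpoint))


def batches_outer(
    loader: CountingLoader,
    checkpoints: Sequence[str],
    batches: Sequence[List[float]],
) -> List[float]:
    """Outer loop over batches: every batch reloads every checkpoint."""
    scores: List[float] = []
    for batch in batches:
        batch_scores = [0.0] * len(batch)
        for ckpt in checkpoints:
            weight = loader.load(ckpt)
            for i, x in enumerate(batch):
                batch_scores[i] += weight * x * x
        scores.extend(batch_scores)
    return scores


def checkpoints_outer(
    loader: CountingLoader,
    checkpoints: Sequence[str],
    batches: Sequence[List[float]],
) -> List[float]:
    """Outer loop over checkpoints: each checkpoint is loaded once per call."""
    scores = [0.0] * sum(len(b) for b in batches)
    for ckpt in checkpoints:
        weight = loader.load(ckpt)
        pos = 0
        for batch in batches:
            for x in batch:
                scores[pos] += weight * x * x
                pos += 1
    return scores


if __name__ == "__main__":
    ckpts = ["a", "bb", "ccc"]
    data = [[1.0, 2.0], [3.0]]
    old, new = CountingLoader(), CountingLoader()
    assert batches_outer(old, ckpts, data) == checkpoints_outer(new, ckpts, data)
    print(old.loads, new.loads)  # 6 loads (2 batches x 3 checkpoints) vs. 3 loads
```

With the batch loop outermost, the number of loads grows with the number of batches; with the checkpoint loop outermost, it equals the number of checkpoints regardless of how many batches the dataloader yields.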
Summary: Pull Request resolved: pytorch#1002

- For self influence computation, there needs to be an iteration over both checkpoints and batches. This diff adds a `by_checkpoints` option. If true, the outer iteration is over checkpoints. If false, the outer iteration is over batches. Because self influence computation can be called through the `influence` and `self_influence` methods, this option is added to both methods. Because only `TracInCP` and `TracInCPFast` should be used for self influence computation, only those classes are changed.
- To implement this option, the old `self_influence` method, which had the outer iteration over checkpoints, is renamed to a private `_self_influence_by_checkpoints` method. A new `_self_influence_by_batches` method is added, which has an outer iteration over batches and re-uses the `_self_influence_by_checkpoints` method to compute self influence scores for a single batch (this method can accept either a single batch or a dataloader yielding batches). Because the logic of this method is the same for all classes, a helper method, `_self_influence_by_batches_helper`, is added to `captum.influence._utils.common`. Finally, the new `self_influence` method simply chooses whether to call `_self_influence_by_checkpoints` or `_self_influence_by_batches`.
- Documentation describing the two options for `by_checkpoints` is added to the `self_influence` and `influence` methods.
- `test_tracin_show_progress` now differentiates between 2 modes: "self influence by checkpoints" (the original test for the progress bar when calculating self influence scores, which checks whether the outer progress bar over checkpoints and the inner progress bars over batches both reach 100%), and the newly added mode "self influence by batches", which checks whether the progress bar over batches reaches 100%.
- `test_tracin_self_influence` now also checks whether computing self influence scores gives the same result regardless of whether `by_checkpoints` is True or False.

Reviewed By: NarineK

Differential Revision: D37743920

fbshipit-source-id: a4e0c44299b31bf50fe2b5b4cb4d2e62c669208a
986e5be to d3b1487
Summary:

- For self influence computation, there needs to be an iteration over both checkpoints and batches. This diff adds a `by_checkpoints` option. If true, the outer iteration is over checkpoints. If false, the outer iteration is over batches. Because self influence computation can be called through the `influence` and `self_influence` methods, this option is added to both methods. Because only `TracInCP` and `TracInCPFast` should be used for self influence computation, only those classes are changed.
- To implement this option, the old `self_influence` method, which had the outer iteration over checkpoints, is renamed to a private `_self_influence_by_checkpoints` method. A new `_self_influence_by_batches` method is added, which has an outer iteration over batches and re-uses the `_self_influence_by_checkpoints` method to compute self influence scores for a single batch (this method can accept either a single batch or a dataloader yielding batches). Because the logic of this method is the same for all classes, a helper method, `_self_influence_by_batches_helper`, is added to `captum.influence._utils.common`. Finally, the new `self_influence` method simply chooses whether to call `_self_influence_by_checkpoints` or `_self_influence_by_batches`.
- Documentation describing the two options for `by_checkpoints` is added to the `self_influence` and `influence` methods.
- `test_tracin_show_progress` now differentiates between 2 modes: "self influence by checkpoints" (the original test for the progress bar when calculating self influence scores, which checks whether the outer progress bar over checkpoints and the inner progress bars over batches both reach 100%), and the newly added mode "self influence by batches", which checks whether the progress bar over batches reaches 100%.
- `test_tracin_self_influence` now also checks whether computing self influence scores gives the same result regardless of whether `by_checkpoints` is True or False.

Reviewed By: NarineK

Differential Revision: D37743920
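As a rough illustration of the structure described in the summary, here is a minimal toy sketch of the dispatch. It is not the Captum implementation: the `toy_` and `_toy_` names are hypothetical, checkpoints are reduced to scalar weights, the per-example score is a made-up `weight * x * x`, and the default value of `by_checkpoints` is chosen arbitrarily. It only mirrors the shape of the change: a by-checkpoints routine that accepts any number of batches, a by-batches routine that re-uses it one batch at a time, and a public entry point that picks between them.

```python
from typing import Iterable, List, Sequence

Batch = List[float]


def _toy_by_checkpoints(weights: Sequence[float], batches: Iterable[Batch]) -> List[float]:
    """Outer loop over checkpoints, inner loop over batches."""
    batches = list(batches)
    scores = [0.0] * sum(len(b) for b in batches)
    for weight in weights:  # one pass per (toy) checkpoint
        pos = 0
        for batch in batches:
            for x in batch:
                scores[pos] += weight * x * x
                pos += 1
    return scores


def _toy_by_batches(weights: Sequence[float], batches: Iterable[Batch]) -> List[float]:
    """Outer loop over batches; re-uses the by-checkpoints routine per batch."""
    scores: List[float] = []
    for batch in batches:
        scores.extend(_toy_by_checkpoints(weights, [batch]))
    return scores


def toy_self_influence(
    weights: Sequence[float],
    batches: Iterable[Batch],
    by_checkpoints: bool = True,  # default is an arbitrary choice for this sketch
) -> List[float]:
    """Choose the iteration order; both orders should give the same scores."""
    if by_checkpoints:
        return _toy_by_checkpoints(weights, batches)
    return _toy_by_batches(weights, batches)
```

In this toy setting the two modes return identical scores, which is the same property the updated `test_tracin_self_influence` is described as checking for the real classes.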