
allow self influence iteration options #1002


Closed
wants to merge 2 commits

Conversation

99warriors
Contributor

Summary:

  • For self influence computation, there needs to be an iteration over both checkpoints and batches. This diff adds a by_checkpoints option: if true, the outer iteration is over checkpoints; if false, the outer iteration is over batches. Because self influence computation can be called through the influence and self_influence methods, this option is added to both methods. Because only TracInCP and TracInCPFast should be used for self influence computation, only those classes are changed.
  • To implement this option, the old self_influence method, whose outer iteration was over checkpoints, is renamed to a private _self_influence_by_checkpoints method. A new _self_influence_by_batches method is added, whose outer iteration is over batches; it re-uses the _self_influence_by_checkpoints method to compute self influence scores for a single batch (that method can accept both a single batch and a dataloader yielding batches). Because the logic of this method is the same for all classes, a helper method, _self_influence_by_batches_helper, is added to captum.influence._utils.common. Finally, the new self_influence method simply chooses whether to call _self_influence_by_checkpoints or _self_influence_by_batches.
  • Documentation describing the two options for by_checkpoints is added to the self_influence and influence methods.
  • test_tracin_show_progress now differentiates between 2 modes: "self influence by checkpoints" (the original test for progress bar when calculating self influence scores, which checks whether the outer progress bar over checkpoints and inner progress bars over batches both reach 100%), and the newly added mode "self influence by batches", which checks whether the progress bar over batches reaches 100%.
  • test_tracin_self_influence now also checks that computing self influence scores gives the same result regardless of whether by_checkpoints is True or False.
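The control flow described above can be sketched as follows. This is a minimal mock, not captum's actual TracInCP implementation: the method names mirror the ones in this diff, but the scoring rule (checkpoint weight times squared example value) is a stand-in, and "checkpoints" here are just floats standing in for saved model states.

```python
class TracInSketch:
    """Toy model of the by_checkpoints dispatch. A batch is a list of
    floats; a 'dataloader' is a list of batches."""

    def __init__(self, checkpoints):
        self.checkpoints = checkpoints  # stand-in for saved model states

    def _self_influence_by_checkpoints(self, inputs):
        # Accepts a single batch or a list of batches. The outer loop is
        # over checkpoints, so each checkpoint is "loaded" only once.
        batches = inputs if isinstance(inputs[0], list) else [inputs]
        scores = [0.0] * sum(len(b) for b in batches)
        for w in self.checkpoints:
            i = 0
            for batch in batches:
                for x in batch:
                    scores[i] += w * x * x  # stand-in influence term
                    i += 1
        return scores

    def _self_influence_by_batches(self, inputs):
        # Outer loop over batches; each batch's scores are computed by
        # reusing the by-checkpoints method on that single batch.
        batches = inputs if isinstance(inputs[0], list) else [inputs]
        scores = []
        for batch in batches:
            scores.extend(self._self_influence_by_checkpoints(batch))
        return scores

    def self_influence(self, inputs, by_checkpoints=True):
        # The public method only chooses the iteration order.
        if by_checkpoints:
            return self._self_influence_by_checkpoints(inputs)
        return self._self_influence_by_batches(inputs)
```

Both orderings produce the same scores, which is exactly what the new check in test_tracin_self_influence asserts; they differ only in how often checkpoints are (re)loaded.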

Reviewed By: NarineK

Differential Revision: D37743920

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D37743920


99warriors added a commit to 99warriors/captum that referenced this pull request Jul 31, 2022
Summary:
Pull Request resolved: pytorch#1002

- For self influence computation, there needs to be an iteration over both checkpoints and batches. This diff adds a `by_checkpoints` option: if true, the outer iteration is over checkpoints; if false, the outer iteration is over batches. Because self influence computation can be called through the `influence` and `self_influence` methods, this option is added to both methods. Because only `TracInCP` and `TracInCPFast` should be used for self influence computation, only those classes are changed.
- To implement this option, the old `self_influence` method, whose outer iteration was over checkpoints, is renamed to a private `_self_influence_by_checkpoints` method. A new `_self_influence_by_batches` method is added, whose outer iteration is over batches; it re-uses the `_self_influence_by_checkpoints` method to compute self influence scores for a single batch (that method can accept both a single batch and a dataloader yielding batches). Because the logic of this method is the same for all classes, a helper method, `_self_influence_by_batches_helper`, is added to `captum.influence._utils.common`. Finally, the new `self_influence` method simply chooses whether to call `_self_influence_by_checkpoints` or `_self_influence_by_batches`.
- Documentation describing the two options for `by_checkpoints` is added to the `self_influence` and `influence` methods.
- `test_tracin_show_progress` now differentiates between 2 modes: "self influence by checkpoints" (the original test for progress bar when calculating self influence scores, which checks whether the outer progress bar over checkpoints and inner progress bars over batches both reach 100%), and the newly added mode "self influence by batches", which checks whether the progress bar over batches reaches 100%.
- `test_tracin_self_influence` now also checks whether computing self influence scores gives the same result regardless of whether `by_checkpoints` is True or False

Reviewed By: NarineK

Differential Revision: D37743920

fbshipit-source-id: e7fe669d3fdbc2d2b3c4c16ed3eb56651b0bd8fa
Summary:
Pull Request resolved: pytorch#994

change `TracInCP._self_influence_batch_tracincp` and `TracInCP._self_influence_batches_tracincp_fast` to be named `self_influence`, which is now public, and to accept a DataLoader yielding batches (as well as a single batch, as before).  The modified helper function can be called by external functions to compute self influence.

The helper itself is also changed to improve efficiency, by reducing the number of times checkpoints are loaded.  The modified helper, despite being able to compute self influence scores for a dataloader yielding batches, still only loads each checkpoint once, per call.  This is because the modified helper now has an outer iteration over checkpoints, and an inner iteration over batches (the order of iteration is reversed compared to before). This helper is called by `influence` when running it in self influence mode.
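The efficiency argument above can be made concrete with a toy load counter. These are hypothetical functions (the arithmetic is a stand-in, and `loads` is just a list recording simulated checkpoint loads, not captum's API); the point is only the iteration order.

```python
def scores_checkpoints_outer(checkpoints, batches, loads):
    # Outer loop over checkpoints: each checkpoint is "loaded" exactly
    # once per call, no matter how many batches there are.
    scores = [0.0] * sum(len(b) for b in batches)
    for ckpt in checkpoints:
        loads.append(ckpt)  # one load per checkpoint
        i = 0
        for batch in batches:
            for x in batch:
                scores[i] += ckpt * x * x  # stand-in influence term
                i += 1
    return scores

def scores_batches_outer(checkpoints, batches, loads):
    # Reversed order: every checkpoint is re-loaded for every batch.
    scores = []
    for batch in batches:
        batch_scores = [0.0] * len(batch)
        for ckpt in checkpoints:
            loads.append(ckpt)  # one load per checkpoint PER batch
            for i, x in enumerate(batch):
                batch_scores[i] += ckpt * x * x
        scores.extend(batch_scores)
    return scores
```

With C checkpoints and B batches, the first ordering performs C loads and the second performs C × B loads, while both return identical scores.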

The reason we cannot simply increase the batch size to reduce the number of checkpoint loads is that for large models (precisely those for which loading checkpoints is expensive), the model itself takes up too much memory, so the batch size cannot be made very large.

Minor change: for the `influence_src_dataset` argument of all `__init__`s, add a description of the assumptions we make about the batches yielded by the dataloader.

Differential Revision: D35603078

fbshipit-source-id: 87063052e68441b82514489f4d9f9ad29b396da4
Summary:
Pull Request resolved: pytorch#1002

- For self influence computation, there needs to be an iteration over both checkpoints and batches. This diff adds a `by_checkpoints` option: if true, the outer iteration is over checkpoints; if false, the outer iteration is over batches. Because self influence computation can be called through the `influence` and `self_influence` methods, this option is added to both methods. Because only `TracInCP` and `TracInCPFast` should be used for self influence computation, only those classes are changed.
- To implement this option, the old `self_influence` method, whose outer iteration was over checkpoints, is renamed to a private `_self_influence_by_checkpoints` method. A new `_self_influence_by_batches` method is added, whose outer iteration is over batches; it re-uses the `_self_influence_by_checkpoints` method to compute self influence scores for a single batch (that method can accept both a single batch and a dataloader yielding batches). Because the logic of this method is the same for all classes, a helper method, `_self_influence_by_batches_helper`, is added to `captum.influence._utils.common`. Finally, the new `self_influence` method simply chooses whether to call `_self_influence_by_checkpoints` or `_self_influence_by_batches`.
- Documentation describing the two options for `by_checkpoints` is added to the `self_influence` and `influence` methods.
- `test_tracin_show_progress` now differentiates between 2 modes: "self influence by checkpoints" (the original test for progress bar when calculating self influence scores, which checks whether the outer progress bar over checkpoints and inner progress bars over batches both reach 100%), and the newly added mode "self influence by batches", which checks whether the progress bar over batches reaches 100%.
- `test_tracin_self_influence` now also checks whether computing self influence scores gives the same result regardless of whether `by_checkpoints` is True or False

Reviewed By: NarineK

Differential Revision: D37743920

fbshipit-source-id: a4e0c44299b31bf50fe2b5b4cb4d2e62c669208a


2 participants