MaxVit model #6342
Merged
Changes from 1 commit
33 commits
f15fd92 Added maxvit architecture and tests (TeodorPoncu)
c5b2839 rebased + addresed comments (TeodorPoncu)
5e8a222 Revert "rebased + addresed comments" (TeodorPoncu)
aa95139 Re-added model changes after revert (TeodorPoncu)
1fddecc aligned with partial original implementation (TeodorPoncu)
b7f0e97 removed submitit script fixed lint (TeodorPoncu)
872f40f mypy fix for too many arguments (TeodorPoncu)
f561edf updated old tests (TeodorPoncu)
314b82a removed per batch lr scheduler and seed setting (TeodorPoncu)
a4863e9 removed ontap (TeodorPoncu)
c4406e4 Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
2111680 added docs, validated weights (TeodorPoncu)
cc51c2b fixed test expect, moved shape assertions in the begging for torch.fx… (TeodorPoncu)
d2dfe71 mypy fix (TeodorPoncu)
328f9b6 lint fix (TeodorPoncu)
b334b7f added legacy interface (TeodorPoncu)
ebb8c16 added weight link (TeodorPoncu)
e281371 Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
20422bc updated docs (TeodorPoncu)
9ad86fe Merge branch 'BATERIES]-add-max-vit' of https://github.com/pytorch/vi… (TeodorPoncu)
775990c Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
a24e549 Update references/classification/train.py (TeodorPoncu)
bb42548 Update torchvision/models/maxvit.py (TeodorPoncu)
ed21d3d adressed comments (TeodorPoncu)
09e4ced Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
521d6d5 update ra_maginuted and augmix_severity default values (TeodorPoncu)
79cb004 Merge branch 'BATERIES]-add-max-vit' of https://github.com/pytorch/vi… (TeodorPoncu)
97cbcd8 adressed some comments (TeodorPoncu)
9fc6a5b Merge branch 'BATERIES]-add-max-vit' of https://github.com/pytorch/vi… (TeodorPoncu)
6b00ca8 remove input_channels parameter (TeodorPoncu)
45d3966 Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
2aca920 Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
cab35c1 Merge branch 'main' into BATERIES]-add-max-vit (TeodorPoncu)
@@ -0,0 +1,122 @@
+import argparse
TeodorPoncu marked this conversation as resolved. (Outdated)
+import os
+import uuid
+from pathlib import Path
+
+import train
+import submitit
+
+
+def parse_args():
+    train_parser = train.get_args_parser(add_help=False)
+    parser = argparse.ArgumentParser("Submitit for train", parents=[train_parser], add_help=True)
+    parser.add_argument("--ngpus", default=8, type=int, help="Number of gpus to request on each node")
+    parser.add_argument("--nodes", default=1, type=int, help="Number of nodes to request")
+    parser.add_argument("--timeout", default=60*24*30, type=int, help="Duration of the job")
+    parser.add_argument("--job_dir", default="", type=str, help="Job dir. Leave empty for automatic.")
+    parser.add_argument("--partition", default="train", type=str, help="the partition (default train).")
+    return parser.parse_args()
+
+
+def get_shared_folder() -> Path:
+    user = os.getenv("USER")
+    path = "/data/checkpoints"
+    if Path(path).is_dir():
+        p = Path(f"{path}/{user}/experiments")
+        p.mkdir(exist_ok=True)
+        return p
+    raise RuntimeError("No shared folder available")
+
+
+def get_init_file_folder() -> Path:
+    user = os.getenv("USER")
+    path = "/shared"
+    if Path(path).is_dir():
+        p = Path(f"{path}/{user}")
+        p.mkdir(exist_ok=True)
+        return p
+    raise RuntimeError("No shared folder available")
+
+
+def get_init_file():
+    # Init file must not exist, but it's parent dir must exist.
+    os.makedirs(str(get_init_file_folder()), exist_ok=True)
+    init_file = get_init_file_folder() / f"{uuid.uuid4().hex}_init"
+    if init_file.exists():
+        os.remove(str(init_file))
+    return init_file
+
+
+class Trainer(object):
+    def __init__(self, args):
+        self.args = args
+
+    def __call__(self):
+        import train
+
+        self._setup_gpu_args()
+        train.main(self.args)
+
+    def checkpoint(self):
+        import os
+        import submitit
+        from pathlib import Path
+
+        self.args.dist_url = get_init_file().as_uri()
+        checkpoint_file = os.path.join(self.args.output_dir, "checkpoint.pth")
+        if os.path.exists(checkpoint_file):
+            self.args.resume = checkpoint_file
+        print("Requeuing ", self.args)
+        empty_trainer = type(self)(self.args)
+        return submitit.helpers.DelayedSubmission(empty_trainer)
+
+    def _setup_gpu_args(self):
+        import submitit
+        from pathlib import Path
+
+        job_env = submitit.JobEnvironment()
+        self.args.output_dir = Path(str(self.args.output_dir).replace("%j", str(job_env.job_id)))
+        self.args.gpu = job_env.local_rank
+        self.args.rank = job_env.global_rank
+        self.args.world_size = job_env.num_tasks
+        print(f"Process group: {job_env.num_tasks} tasks, rank: {job_env.global_rank}")
+
+
+def main():
+    args = parse_args()
+    if args.job_dir == "":
+        args.job_dir = get_shared_folder() / "%j"
+
+    # Note that the folder will depend on the job_id, to easily track experiments
+    executor = submitit.AutoExecutor(folder=args.job_dir, slurm_max_num_timeout=300)
+
+    # cluster setup is defined by environment variables
+    num_gpus_per_node = args.ngpus
+    nodes = args.nodes
+    timeout_min = args.timeout
+
+    executor.update_parameters(
+        #mem_gb=96 * num_gpus_per_node, # 768GB per machine
+        gpus_per_node=num_gpus_per_node,
+        tasks_per_node=num_gpus_per_node, # one task per GPU
+        cpus_per_task=12, # 96 cpus per machine
+        nodes=nodes,
+        timeout_min=timeout_min, # max is 60 * 72
+        slurm_partition=args.partition,
+        slurm_signal_delay_s=120,
+    )
+
+
+    executor.update_parameters(name="torchvision")
+
+    args.dist_url = get_init_file().as_uri()
+    args.output_dir = args.job_dir
+
+    trainer = Trainer(args)
+    job = executor.submit(trainer)
+
+    print("Submitted job_id:", job.job_id)
+
+
+if __name__ == "__main__":
+    main()
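The two path conventions this script relies on can be tried without submitit or a cluster. The sketch below is a rough illustration, not code from the PR: it shows how the `%j` placeholder in the job directory is swapped for the job id (as `_setup_gpu_args` does), and how a fresh, non-existent rendezvous file is turned into a `file://` URI for `dist_url` (as `get_init_file` does). The helper names `resolve_output_dir` and `make_init_file_uri`, the temporary directory, and the job id `12345` are hypothetical. Passing `parents=True` is an added assumption so the sketch also works when intermediate directories are missing; the script's folder helpers call `mkdir(exist_ok=True)` without it.

```python
import tempfile
import uuid
from pathlib import Path


def resolve_output_dir(output_dir, job_id):
    """Replace the "%j" placeholder with the job id (hypothetical helper)."""
    return Path(str(output_dir).replace("%j", str(job_id)))


def make_init_file_uri(folder):
    """Return a file:// URI for a not-yet-existing init file inside `folder`."""
    # as_uri() needs an absolute path, so resolve the folder first.
    folder = Path(folder).resolve()
    # Assumption: create missing parent directories as well.
    folder.mkdir(parents=True, exist_ok=True)
    init_file = folder / f"{uuid.uuid4().hex}_init"
    # The init file itself must not exist; only its parent directory must.
    if init_file.exists():
        init_file.unlink()
    return init_file.as_uri()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the shared experiments folder plus the "%j" suffix.
        print(resolve_output_dir(Path(tmp) / "experiments" / "%j", 12345))
        # Stand-in for the per-user shared folder used for the init file.
        print(make_init_file_uri(Path(tmp) / "shared" / "someuser"))
```

Because the URI embeds a random `uuid4` name, each call yields a different rendezvous file, which is why `checkpoint` in the script requests a new one before requeuing.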