Skip to content

Use run name for logging with WandbLogger #12604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 21, 2022
Merged

Use run name for logging with WandbLogger #12604

merged 9 commits into from
Jun 21, 2022

Conversation

tshu-w
Copy link
Contributor

@tshu-w tshu-w commented Apr 4, 2022

What does this PR do?

Fixes #12157 and #12603: This fix the name property of WandbLogger for #12028.

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@manangoel99
Copy link
Contributor

manangoel99 commented Apr 4, 2022

Hello @tshu-w ! Using the run name makes a lot of sense instead of the project name.

@tshu-w
Copy link
Contributor Author

tshu-w commented Apr 4, 2022

@manangoel99 Thank you for confirming and link the old PR. Let's ping PL members to make sure this PR is perfected and merged if there is no reply for some time.

@krshrimali
Copy link
Contributor

Hi, @tshu-w - Thank you for creating this PR. We appreciate the contribution 🚀 Looks like the tests are failing, mostly because there is project_name() used in the tests which should be renamed as well to name. Would you mind updating the relevant tests and ping once ready? Please see the log here: https://github.com/PyTorchLightning/pytorch-lightning/runs/5813691929?check_suite_focus=true.

Apologies for the delay in the old PR!

@carmocca carmocca added bug Something isn't working logger: wandb Weights & Biases labels Apr 4, 2022
@carmocca carmocca added this to the 1.7 milestone Apr 4, 2022
@manangoel99
Copy link
Contributor

This will bring one change though. The directory structure for checkpoints currently looks like <project_name>/<run_id> but after this it will be <run_name>/<run_id>

@manangoel99
Copy link
Contributor

@tshu-w though having logger.name return the run name makes more sense but the changed directory structure does not seem as intuitive. So maybe instead of chaning the behaviour of the name method we could have a new method run_name which returns the run name. What do you think?

@tshu-w
Copy link
Contributor Author

tshu-w commented Apr 5, 2022

@krshrimali Hi, thank you remind, I update the test file now.

This will bring one change though. The directory structure for checkpoints currently looks like <project_name>/<run_id> but after this it will be <run_name>/<run_id>

This is as expected, PL default TensorBoardLogger stores checkpoints to save_dir/name/version(for more discuss, see #12028), and I believe that dir, name and id in wandb correspond to save_dir, name and version respectively.

@tshu-w though having logger.name return the run name makes more sense but the changed directory structure does not seem as intuitive. So maybe instead of chaning the behaviour of the name method we could have a new method run_name which returns the run name. What do you think?

As mentioned above, the save_dir, name, and version of a PL logger have specific meanings, and I think the wandb name is the concept that just corresponds to the PL logger's name. Therefore, we need to change the behavior of name, rather than defining a new run_name (which could also be an alias for name, but that's beyond this PR).
cc @awaelchli

@manangoel99
Copy link
Contributor

QQ what does name in PL signify. What I mean by this is that you could be naming a project or an experiment in the project.
The name property in WandbLogger assumes that the user is naming the project and not the experiment (equivalent to a wandb run object).

@tshu-w
Copy link
Contributor Author

tshu-w commented Apr 5, 2022

QQ what does name in PL signify. What I mean by this is that you could be naming a project or an experiment in the project.
The name property in WandbLogger assumes that the user is naming the project and not the experiment (equivalent to a wandb run object).

I'm the new users of wandb, but from the document here https://docs.wandb.ai/ref/python/init, should project for project name and name for run/experiment name? And I see using run_object.name got the name in init_args while project_name() got the project. Is there something I'm getting wrong?

As for which name in PL logger should correspond, I believe it is run name.

@manangoel99
Copy link
Contributor

manangoel99 commented Apr 6, 2022

@tshu-w we can make this change but if save_dir=None, this causes a lot of clutter for the user so we can change it to have save_dir=self.experiment.project_name() if save_dir=None so that everything goes under one directory.
We would also like to keep a deprecation warning here to make sure it doesn't break any existing production workflows.
cc @awaelchli @carmocca

@tshu-w
Copy link
Contributor Author

tshu-w commented Apr 6, 2022

@manangoel99 agree with you, but that's beyond the scope of this PR.

@mergify mergify bot added the has conflicts label Apr 6, 2022
@manangoel99
Copy link
Contributor

manangoel99 commented Apr 6, 2022

@tshu-w I think in that case maybe we could close this PR and I can come up with another PR meanwhile to have deprecation warnings now since the behaviour of the arguments will be changing and this could break people's code so we need to raise deprecation warnings.
@awaelchli @rohitgr7 @carmocca apologies if this is an ignorant question but does pytorch lightning have guidelines for handling changes in argument behaviour. Something like keeping it the same in 1.7 with warnings and then finally changing it in 1.8.

@tshu-w
Copy link
Contributor Author

tshu-w commented Apr 7, 2022

I think in that case maybe we could close this PR and I can come up with another PR meanwhile to have deprecation warnings now since the behaviour of the arguments will be changing and this could break people's code so we need to raise deprecation warnings.

It seems to me that the inconsistency between self._experiment.project_name() and self._name can be solved first (as has been suggested by many people). If the user is bothered by multiple directories, this can also be solved by simply setting save_dir. Anyway, let's hear from PL members.

@awaelchli
Copy link
Contributor

May I suggest the following.

  1. Return the run name in the name property as done in this PR.
  2. Since the project name is mandatory to provide, and the name optional, set the run name to the project name by default (instead of letting wandb generate random names).

This way, we are backward compatible partially, and the directory doesn't get cluttered every time you run a new experiment.

@tshu-w
Copy link
Contributor Author

tshu-w commented Apr 12, 2022

@manangoel99 @awaelchli suggest make sense to me, what about you?

@tshu-w
Copy link
Contributor Author

tshu-w commented May 31, 2022

I just noticed my wandb logger folder setup is all messed up with PL 1.6.3, looking forward to pulling this fix when it's merged!

Hi, this issue exists in PL 1.5 either, If you meet error in PL 1.6.3. It might be other bug?

And if @awaelchli could mention other appropriate reviewers.

@rohanshad
Copy link

I just noticed my wandb logger folder setup is all messed up with PL 1.6.3, looking forward to pulling this fix when it's merged!

Hi, this issue exists in PL 1.5 either, If you meet error in PL 1.6.3. It might be other bug?

And if @awaelchli could at other appropriate reviewers.

On 1.5.10 the logger would store checkpoints like this:

project_name
        ├── version_id (alphanumeric 8 digit)
        ├── version_id (alphanumeric 8 digit)
                          ├── checkpoints

On 1.6.3 they're all like this now:

version_name **(not the same as version_id)**
        ├── version_None 
                        ├── checkpoints

Completely messes up the way I track experiments since on the online interface of wandb references version_id and not version_name. Is this a different issue altogether?

@tshu-w
Copy link
Contributor Author

tshu-w commented May 31, 2022

Completely messes up the way I track experiments since on the online interface of wandb references version_id and not version_name. Is this a different issue altogether?

I suspect yes, though I'm not why this happening on your side. This PR just want to change logger structure like this:

run_name or project_name
        ├── version_id (alphanumeric 8 digit)
        ├── version_id (alphanumeric 8 digit)
                          ├── checkpoints

@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels May 31, 2022
@awaelchli
Copy link
Contributor

Yes I pinged the team for one the third review. Sorry about the wait time, things are slow at the moment.

Copy link
Contributor

@akihironitta akihironitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Jun 3, 2022
@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Jun 15, 2022
@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Jun 21, 2022
@awaelchli awaelchli enabled auto-merge (squash) June 21, 2022 13:49
@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Jun 21, 2022
@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Jun 21, 2022
@awaelchli awaelchli merged commit 749709f into Lightning-AI:master Jun 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working community This PR is from the community logger: wandb Weights & Biases ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The name property of wandb and comet return project_name, while other loggers like mlflow, neptune, test_tube and tensorboard return run_name.
7 participants