Support using SwanLab for experiment tracking #98
Thank you for your contribution! The integration of `swanlab` configuration is fine, but some logging utilities should be merged with existing ones using wandb.
docs/tutorial/quickstart.md
Outdated
@@ -99,10 +99,14 @@ python3 training/main_sync_ppo.py --help

We recommend using Weights & Biases (wandb) for monitoring. Run `wandb login` or set the `WANDB_API_KEY` environment variable. Set `wandb.mode=True` in your configuration to upload training statistics.

+ Alternatively, you can use SwanLab for monitoring. Run swanlab login or set the `SWANLAB_API_KEY` environment variable. Set `swanlab.mode=True` in your configuration to upload training statistics.
Could you add a hyperlink to SwanLab to let more people know about it?
Also, use "`" to quote `swanlab login`.
`swanlab.mode` should be set to `online` if it has the same API as wandb. The previous typo was fixed in #100.
Can you try to merge the two lines about `wandb` and `swanlab` somehow?
Thank you for the feedback!
I've updated the documentation based on your suggestions:
- Added official links to Weights & Biases and SwanLab for better user reference.
- Used backticks to quote commands and parameters (e.g., `wandb login`, `swanlab login`).
- Updated `swanlab.mode` usage to align with WandB's API convention, now using `"local"` and `"cloud"` instead of `True`.
- Merged WandB and SwanLab descriptions into a single, concise statement for better readability.
- Added a note about using `swanlab.mode="local"` if the server is unreachable.
pyproject.toml
Outdated
@@ -61,6 +61,8 @@ dependencies = [
"colorlog",
"psutil",
"pynvml",
+ "swanlab==0.6.2",
">=" or "=="?
realhf/base/logging.py
Outdated
@@ -158,6 +158,20 @@ def log_wandb_tensorboard(data, step=None, summary_writer=None):
    for key, val in data.items():
        summary_writer.add_scalar(f"{key}", val, step)


+ def log_swanlab_tensorboard(data, step=None, summary_writer=None):
This function should be merged with `log_wandb_tensorboard`.
Thank you for the feedback!
I've refactored the code and merged `log_swanlab_tensorboard` with `log_wandb_tensorboard` into a single function called `log_swanlab_wandb_tensorboard`.
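For readers following the thread, the merge described here might look roughly like the sketch below. It is not the PR's actual code: the `sinks` parameter and the `LogSink` alias are hypothetical, introduced so the example runs without wandb or SwanLab installed. Only the `summary_writer.add_scalar(key, val, step)` loop mirrors the hunk quoted above.

```python
from typing import Callable, Dict, Iterable, Optional

# A sink is any callable taking (data, step), for example a thin wrapper
# around a tracker's log call. Passing sinks in is an assumption of this
# sketch, not something shown in the PR.
LogSink = Callable[[Dict[str, float], Optional[int]], None]


def log_swanlab_wandb_tensorboard(
    data: Dict[str, float],
    step: Optional[int] = None,
    summary_writer=None,
    sinks: Iterable[LogSink] = (),
) -> None:
    """Send one dict of scalars to every enabled tracker."""
    # Experiment trackers (wandb, SwanLab, ...) all receive the same payload.
    for sink in sinks:
        sink(data, step)
    # TensorBoard part, same shape as the loop in the existing function.
    if summary_writer is not None:
        for key, val in data.items():
            summary_writer.add_scalar(f"{key}", val, step)
```

With both trackers enabled, the sinks could plausibly be `lambda d, s: wandb.log(d, step=s)` and `lambda d, s: swanlab.log(d, step=s)`; one function and one call site then replace the two parallel ones.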
realhf/system/model_function_call.py
Outdated
@@ -447,6 +448,11 @@ async def run_step(self, buf_indices, sample, buffer_id: int):
    step=ctrl.step_info.global_step,
    summary_writer=self.summary_writer,
)
+ logging.log_swanlab_tensorboard(
Merge this with `log_wandb_tensorboard`.
I've refactored the code and merged `log_swanlab_tensorboard` with `log_wandb_tensorboard` into a single function called `log_swanlab_wandb_tensorboard`.
requirements.txt
Outdated
prettytable
+ swanlab==0.6.2
Please double-check the version requirement.
I've modified the dependency to use the latest version automatically.
- Added official links for better user reference
- Used backticks to quote commands and parameters
- Unified mode settings to use "online" / "cloud" convention
- Merged WandB and SwanLab descriptions into a single concise statement
- Added note on using `swanlab.mode="local"` when server connection is unavailable
…o log_swanlab_wandb_tensorboard
- Unified logging logic for SwanLab, WandB, and TensorBoard to reduce code duplication
- Updated SwanLab version in pyproject.toml
- Updated SwanLab version in requirements.txt
8474630 to f86d103
- Config now uses provided arguments first
- Falls back to reading from config.yaml if no input is given
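The precedence in this commit note (explicit arguments first, `config.yaml` only as a fallback) can be sketched as follows. The function name and the `load_fallback` callable are hypothetical; the callable stands in for whatever reads `config.yaml` in the real code, which this thread does not show.

```python
from typing import Any, Callable, Dict, Optional


def resolve_swanlab_config(
    provided: Optional[Dict[str, Any]],
    load_fallback: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the provided config if any was given, else the fallback."""
    if provided:
        # Arguments supplied by the caller take priority.
        return dict(provided)
    # Nothing supplied (None or empty): read the fallback source instead.
    return dict(load_fallback())
```

Passing the loader as a callable keeps the file read lazy, so the fallback file is never touched when arguments are supplied.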
Thanks for the feedback! I've updated the code based on your suggestion. Kindly review it again at your convenience.
Thank you again for your contribution! We are almost there.
As a kind reminder, please format the files such that the CI will pass:
pip install -e .
# clear any external packages installed locally
rm -rf ./sympy
rm -rf ./sglang
# Run formatting
isort . && black .
realhf/base/logging.py
Outdated
_LATEST_WANDB_STEP = 0
+ _LATEST_SWANLAB_STEP = 0
These two step variables are the same. Keeping a single `_LATEST_LOG_STEP` will be fine.
Thanks for the feedback!
I've merged `_LATEST_WANDB_STEP` and `_LATEST_SWANLAB_STEP` into `_LATEST_LOG_STEP`.
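One way a single shared counter could work is sketched below. Only the name `_LATEST_LOG_STEP` comes from the thread. The helper `resolve_log_step` and its behavior (reuse the latest step when none is given, otherwise advance the counter) are assumptions made for illustration.

```python
from typing import Optional

# One counter shared by every tracker, replacing the per-tracker copies.
_LATEST_LOG_STEP = 0


def resolve_log_step(step: Optional[int] = None) -> int:
    """Pick the step to log at and keep the shared counter up to date."""
    global _LATEST_LOG_STEP
    if step is None:
        # No explicit step: log at the most recent step seen so far.
        return _LATEST_LOG_STEP
    # Explicit step: remember the largest step seen, never move backwards.
    _LATEST_LOG_STEP = max(_LATEST_LOG_STEP, step)
    return step
```

Because every backend receives the same resolved step, keeping one counter per backend would only duplicate state.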
docs/tutorial/quickstart.md
Outdated
@@ -97,12 +97,15 @@ python3 training/main_sync_ppo.py --help

## Monitoring the Training Process

- We recommend using Weights & Biases (wandb) for monitoring. Run `wandb login` or set the `WANDB_API_KEY` environment variable. Set `wandb.mode=online` in your configuration to upload training statistics.
+ We recommend using [Weights & Biases (wandb)](https://github.com/wandb/wandb) or [SwanLab](https://github.com/SwanHubX/SwanLab) for monitoring—run `wandb login` or `swanlab login`, or set the corresponding environment variable API key (`WANDB_API_KEY` or `SWANLAB_API_KEY`). Set `wandb.mode="online"` or `swanlab.mode="cloud"` in your configuration to upload training statistics. If you cannot connect to the server, you can also use `swanlab.mode="local"` to save data locally without uploading.
Please mention `wandb.mode=offline` together with `swanlab.mode=local`.
Thanks for the feedback!
I've added a note on using `wandb.mode="offline"` together with `swanlab.mode="local"`.
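The mode pairs discussed in this thread (`online`/`offline` for wandb, `cloud`/`local` for SwanLab) can be summarized in a small check like the one below. The helper name is hypothetical, and the `"disabled"` entry in each set is an assumption that does not appear in the thread.

```python
# Mode values per tracker. "online"/"offline" and "cloud"/"local" come from
# the review thread; "disabled" is an assumed extra value.
ALLOWED_MODES = {
    "wandb": {"online", "offline", "disabled"},
    "swanlab": {"cloud", "local", "disabled"},
}

# Modes that upload training statistics to a remote server.
UPLOADING_MODES = {"wandb": "online", "swanlab": "cloud"}


def uploads_to_server(tracker: str, mode: str) -> bool:
    """Validate a tracker/mode pair and report whether it uploads data."""
    if tracker not in ALLOWED_MODES:
        raise ValueError(f"unknown tracker: {tracker!r}")
    if mode not in ALLOWED_MODES[tracker]:
        raise ValueError(f"invalid mode {mode!r} for {tracker}")
    return mode == UPLOADING_MODES[tracker]
```

A check of this kind would reject the earlier `mode=True` setting outright instead of passing it through to the tracker.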
- Updated SwanLab version in requirements.txt
Thank you for the reminder! I've formatted the code using `black` and `isort`. Let me know if there's anything else I can improve!
@GurrenLagann97 Can you provide another review?
Seems perfect
🎉😄 Thanks for your contribution to the SwanLab and AReaL community. @xichengpro
* Support using SwanLab for experiment tracking
* docs: improve WandB and SwanLab integration documentation
  - Added official links for better user reference
  - Used backticks to quote commands and parameters
  - Unified mode settings to use "online" / "cloud" convention
  - Merged WandB and SwanLab descriptions into a single concise statement
  - Added note on using `swanlab.mode="local"` when server connection is unavailable
* refactor: update default value of api_key
* fix: correct help description from WandB to SwanLab in SwanLabConfig
* refactor: merge log_swanlab_tensorboard and log_wandb_tensorboard into log_swanlab_wandb_tensorboard
  - Unified logging logic for SwanLab, WandB, and TensorBoard to reduce code duplication
* chore: update swanlab version in dependency config files
  - Updated SwanLab version in pyproject.toml
  - Updated SwanLab version in requirements.txt
* refactor: enhance SwanLab config handling for logging purposes
  - Config now uses provided arguments first
  - Falls back to reading from config.yaml if no input is given
* docs: add note on using when server connection is unavailable
* refactor: merge _LATEST_WANDB_STEP and _LATEST_SWANLAB_STEP into _LATEST_LOG_STEP
* Format code with black and isort
* chore: update swanlab version in dependency config files
  - Updated SwanLab version in requirements.txt
* refactor: rename swanlab_wandb_data to log_data

---------

Co-authored-by: dubingnan <[email protected]>
A significant number of users are unable to access wandb due to network restrictions and are more accustomed to using the localized tool SwanLab. To improve the project's usability and local compatibility, this PR adds support for integrating with SwanLab.