
Conversation

Contributor

@yunfeng-scale commented Mar 1, 2024

Pull Request Summary

  • Partial open source of tool_completion
  • Adapt batch inference to run the tool completion loop (generations -> tool run); a rough sketch of this loop follows below
  • Add tool config to the batch inference API
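
A rough sketch of that generations -> tool loop (the helper names generate_batch and should_run, and the tool call signature, are illustrative here, not the actual identifiers in this PR):

from typing import Callable, List

def run_tool_completion_loop(
    prompts: List[str],
    generate_batch: Callable[[List[str]], List[str]],  # one batched model inference step (e.g. a vLLM call)
    tool,                                               # tool instance, e.g. CodeBlockEvaluator()
    max_iterations: int = 5,
) -> List[str]:
    """Alternate between model generation and tool execution until no text triggers the tool."""
    texts = list(prompts)
    for _ in range(max_iterations):
        generations = generate_batch(texts)
        any_tool_ran = False
        for i, generation in enumerate(generations):
            texts[i] += generation
            if tool.should_run(texts[i]):  # e.g. the generation ended with a runnable code block
                texts[i], _num_tool_tokens = tool(texts[i])
                any_tool_ran = True
        if not any_tool_ran:
            break  # no tool calls left; the generations are final
    return texts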

For notekeeping, some things to fix:

  1. vLLM stop sequences do not cut token logprobs, so extra stop-sequence token logprobs show up in the output
  2. the tool should not use a hardcoded tokenizer; it should use whatever tokenizer is provided together with the model weights (a minimal sketch follows below)
  3. token logprobs are not captured for tool output
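
For item 2, a minimal sketch of loading the tokenizer that ships with the model weights instead of a hardcoded one (this assumes a Hugging Face style checkpoint directory; the path below is illustrative):

from transformers import AutoTokenizer

def load_tool_tokenizer(model_weights_dir: str):
    # Use the tokenizer stored alongside the model weights so the tool's token
    # accounting matches whichever model the batch job is actually running.
    return AutoTokenizer.from_pretrained(model_weights_dir)

tokenizer = load_tool_tokenizer("/path/to/model-weights")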

Test Plan and Usage Guide

  • Some local testing
  • Tested running jobs in the training cluster
  • Added unit tests
  • vLLM batch image deployed

@yunfeng-scale requested review from auag92, Georgepu1 and a team on March 1, 2024 05:06

@Georgepu1 left a comment

Overall LGTM, and it's nice that tool_completion currently mirrors the latest version; just a couple of nits. Thanks for adding this feature for batch completion calls; we'll add it into our inference pipeline.



TOOL_MAP = {
    Tools.CODE_EVALUATOR: CodeBlockEvaluator,
}

Contributor

nit: move each tool to its own file in case more tools are added in the future
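
One hypothetical way that per-file layout could look (the module and file names below are illustrative, not from this PR):

# tools/code_block_evaluator.py  (hypothetical per-tool module)
class CodeBlockEvaluator:
    """Finds code blocks in a generation, runs them, and appends the result."""

# tools/registry.py  (hypothetical registry module that only wires tools together)
from enum import Enum

from .code_block_evaluator import CodeBlockEvaluator

class Tools(str, Enum):
    CODE_EVALUATOR = "code_evaluator"

TOOL_MAP = {
    Tools.CODE_EVALUATOR: CodeBlockEvaluator,
}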

    self.evaluate = self.evaluate_code_in_docker
except docker.errors.DockerException:
    # If docker is not available, use the python interpreter
    self.evaluate = self.evaluate_code_using_exec

Contributor

Is this safe for the batch job? It seems that potentially harmful generations could break this.

Contributor

(Synced offline) Known risk. For now, since the main use is internal, the risk should be low and this is okay; there will be follow-ups to mitigate the issues.

Contributor Author

I think for batch jobs the risk is more manageable.
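
One possible shape for the follow-up mitigation mentioned above (a sketch only, assuming the non-Docker fallback keeps executing untrusted generations on the host; the function name is illustrative): run the snippet in a separate interpreter process with a timeout rather than exec in the batch worker's own process.

import subprocess
import sys

def evaluate_code_in_subprocess(code: str, timeout_s: float = 10.0) -> str:
    # Run generated code in a separate Python process with a timeout so a hung or
    # crashing snippet cannot take down the batch worker itself.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return "[tool error] code execution timed out"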
