# CServe Lightweight Benchmarker
This benchmarker operates entirely outside of any serving framework and can easily be extended and modified. It provides a variety of statistics and profiling modes, and is intended as a standalone tool for precise, statistically significant benchmarking with a particular input/output distribution. Each request consists of a single prompt and a single decode.

### Installation
1) Install Rust: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` and restart the shell. See: https://www.rust-lang.org/tools/install
2) Run `cargo build` in `benchmarks/lightweight_benchmark`.

### Usage
1) Ensure the framework you wish to benchmark (CServe or vLLM) is running so that its generate API is exposed.
2) Find the binary under `target/lightweight_benchmarker` and launch it with the arguments described below. An example invocation follows.
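For example, with a CServe endpoint already listening on `localhost:8080`, a run might look like the sketch below. The binary path, tokenizer name, and all numeric values are placeholders; see the argument list in the next section for what each flag does.

```bash
# Illustrative invocation only; adjust the binary path, tokenizer name,
# and flag values to your setup.
./target/lightweight_benchmarker \
    --framework cserve \
    --hostname localhost \
    --port 8080 \
    --tokenizer-name meta-llama/Llama-2-7b-hf \
    --request-distribution poisson \
    --request-rate 2.0 \
    --num-requests 100 \
    --prompt-low 128 --prompt-high 256 \
    --decode-low 64 --decode-high 128 \
    --output-file results.json
```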

### Arguments and Feature List
* `--help`: Print all of the options below.
* `--verbose`: Enable verbose printing of various INFO messages.
* `--output-file`: Specify a filename for the output JSON. If omitted, the output is printed to standard out.
* `--benchmark-note`: Specify a note (string) to include in the output JSON. Default: empty.
* `--config-file`: An alternative way of passing in all the arguments below. They are specified in camelCase (e.g. `--request-rate` becomes `requestRate`). If both a config file and a CLI argument are specified for the same parameter, the CLI argument takes precedence. This makes it possible to use a config file to specify your own defaults; see the example config after this list.
* `--text-file`: Specify a text file to use as prompt input. Useful for speculative-decoding-style tasks.
* `--tokenizer-name`: Name of the model you intend to use; its tokenizer is used to build prompts with an exact number of tokens.
* `--hostname`: Specify the hostname where the endpoint is located. Default: `localhost`.
* `--port`: Specify the port on the hostname where the endpoint is located. Default: `8080`.
* `--framework`: Specify the framework. Can be one of:
    * vllm
    * cserve
* `--request-distribution`: Specify the arrival distribution for input requests. Can be one of:
    * poisson (with rate `request_rate`)
    * even (non-random; requests arrive every `1/request_rate`)
    * same (all requests start at time 0). Default.
* `--num-samples`: Number of times to run the experiment. Default: 1.
* `--num-requests`: Number of requests to launch per trial. Default: 1.
* `--use-beam-search`: Whether to use beam search. Default: false.
* `--best-of`: Beam width when beam search is enabled; otherwise, the number of output sequences produced per prompt. Default: 1.
* `--request-rate`: Request rate used in the request distribution. Default: 1.0.
* `--prompt-low`: The (inclusive) start of the range of the uniform distribution from which prompt lengths are sampled.
* `--prompt-high`: The (exclusive) end of the range of the uniform distribution from which prompt lengths are sampled.
* `--decode-low`: The (inclusive) start of the range of the uniform distribution from which decode lengths are sampled.
* `--decode-high`: The (exclusive) end of the range of the uniform distribution from which decode lengths are sampled.
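As a sketch of the `--config-file` workflow mentioned above: the camelCase keys mirror the CLI flags, and a flag passed on the command line overrides the corresponding value in the file. The JSON format, the file name, and the values below are assumptions for illustration only.

```bash
# Hypothetical config file; keys are the camelCase forms of the CLI flags
# (JSON format and exact key names assumed for illustration).
cat > bench_config.json <<'EOF'
{
  "framework": "cserve",
  "hostname": "localhost",
  "port": 8080,
  "tokenizerName": "meta-llama/Llama-2-7b-hf",
  "requestDistribution": "poisson",
  "requestRate": 2.0,
  "numRequests": 100,
  "promptLow": 128,
  "promptHigh": 256,
  "decodeLow": 64,
  "decodeHigh": 128,
  "outputFile": "results.json"
}
EOF

# A CLI argument takes precedence over the config file for the same parameter,
# so this run uses a request rate of 4.0 instead of 2.0.
./target/lightweight_benchmarker --config-file bench_config.json --request-rate 4.0
```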