
Commit 122da79

Author: Muralidhar Andoorveedu
Initial commit (0 parents)

File tree: 4 files changed, +919 −0 lines


.gitignore

Lines changed: 2 additions & 0 deletions

    target/*
    Cargo.lock

Cargo.toml

Lines changed: 18 additions & 0 deletions

    [package]
    name = "lightweight_benchmark"
    version = "0.1.0"
    edition = "2021"

    # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

    [dependencies]
    reqwest = { version = "0.11", features = ["json"] }
    tokio = { version = "1", features = ["full"] }
    clap = { version = "4.0", features = ["derive"] }
    tokenizers = { version = "0.15.2", features = ["http"] }
    colored = "2.1.0"
    serde = "1.0.197"
    serde_json = "1.0.114"
    rand = "0.8.4"
    futures = "0.3"
    rand_distr = "0.4.3"

README.md

Lines changed: 37 additions & 0 deletions

# Cserve Lightweight Benchmarker

This benchmark operates entirely externally to any serving framework and can easily be extended and modified. It provides a variety of statistics and profiling modes, and is intended as a standalone tool for precise, statistically significant benchmarking with a particular input/output distribution. Each request consists of a single prompt and a single decode.

### Installation

1) Install Rust: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` and restart the shell. See: https://www.rust-lang.org/tools/install
2) Run `cargo build` in `benchmarks/lightweight_benchmark`.

### Usage

1) Ensure the framework you wish to benchmark (CServe or vLLM) is started so that the generate API is exposed.
2) Find the binary under `target/lightweight_benchmarker`, then launch it with the arguments specified below.
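As an illustrative sketch (not from the repository), an invocation might look like the following; the tokenizer name `facebook/opt-125m` and all flag values here are hypothetical placeholders, and the binary path assumes the location mentioned above:

```shell
# Hypothetical example run against a vLLM server on localhost:8080.
# All values below are placeholders; see the argument list for meanings.
./target/lightweight_benchmarker \
    --framework vllm \
    --hostname localhost --port 8080 \
    --tokenizer-name facebook/opt-125m \
    --request-distribution poisson --request-rate 4.0 \
    --num-requests 100 \
    --prompt-low 128 --prompt-high 256 \
    --decode-low 64 --decode-high 128 \
    --output-file results.json
```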
12+
### Arguments and Feature List

* `--help`: List all of the options below.
* `--verbose`: Enables verbose printing of various INFO messages.
* `--output-file`: Specify a filename for the output JSON. Otherwise, prints to standard out.
* `--benchmark-note`: Specify a note (string) to include in the output JSON. Default: empty.
* `--config-file`: An alternative way of passing in all of the arguments below. They are specified in camelCase (e.g. `--request-rate` becomes `requestRate`). If both a config file and a CLI argument are specified for the same parameter, the CLI argument takes precedence. This makes it possible to use a config file to specify your own defaults.
* `--text-file`: Specifies a text file to use as prompt input. Useful for speculative-decoding-type tasks.
* `--tokenizer-name`: Name of the model you intend to use. This helps tokenize an exact number of tokens for the prompt.
* `--hostname`: Specify the hostname where the endpoint is located. Default: localhost.
* `--port`: Specify the port on the hostname where the endpoint is located. Default: 8080.
* `--framework`: Specify the framework. Can be one of:
  * vllm
  * cserve
* `--request-distribution`: Specify the distribution with which input requests arrive. Can be one of:
  * poisson (arrivals follow a Poisson process with rate `request_rate`)
  * even (non-random, where requests arrive every `1/request_rate` seconds)
  * same (all requests start at time 0). Default.
* `--num-samples`: Number of times to run the experiment. Default: 1.
* `--num-requests`: Number of requests to launch per trial. Default: 1.
* `--use-beam-search`: Whether to use beam search or not. Default: False.
* `--best-of`: Beam width when beam search is on; otherwise, the number of output sequences produced from the prompt. Default: 1.
* `--request-rate`: Request rate used in the request distribution. Default: 1.0.
* `--prompt-low`: The (inclusive) start of the range of the uniform distribution from which prompt lengths are sampled.
* `--prompt-high`: The (exclusive) end of the range of the uniform distribution from which prompt lengths are sampled.
* `--decode-low`: The (inclusive) start of the range of the uniform distribution from which decode lengths are sampled.
* `--decode-high`: The (exclusive) end of the range of the uniform distribution from which decode lengths are sampled.
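Given the camelCase rule described for `--config-file`, a config file might look like the following sketch. Only `requestRate` is confirmed by the text above; the other field names are inferred from the corresponding flag names and should be checked against the actual parser:

```json
{
    "framework": "vllm",
    "hostname": "localhost",
    "port": 8080,
    "requestRate": 4.0,
    "requestDistribution": "poisson",
    "numRequests": 100,
    "promptLow": 128,
    "promptHigh": 256,
    "decodeLow": 64,
    "decodeHigh": 128
}
```

Any CLI argument passed alongside `--config-file` overrides the matching field here.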
