
Commit 9214d46
misc updates to README.md
1 parent: b1b7de9

File tree: 1 file changed, +39 −36 lines


README.md (39 additions & 36 deletions)
@@ -10,7 +10,7 @@ pip install .
 ```
 
 ## Usage
-After installing with the above instructions, the benchmarker can be invoked with `inference-benchmark <args>`.
+After installing with the above instructions, the benchmarker can be invoked with `fib <args>`.
 
 After you get your output (using `--output-file`), you can invoke one of the data postprocessors in `data_postprocessors`.
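The file written with `--output-file` is, per the field list later in this README, a JSON array of per-request objects. A minimal sketch of loading it; the two inline records are hypothetical and use only the documented `stream` field:

```python
import json

# Hypothetical two-record benchmark output, mimicking the documented
# "array of objects" shape; only the `stream` field is taken from the
# README's field list, all other fields are omitted here.
raw = '[{"stream": true}, {"stream": false}]'
records = json.loads(raw)

# Count how many requests used streaming.
streamed = sum(1 for r in records if r.get("stream"))
print(f"{streamed}/{len(records)} requests used streaming")  # 1/2 requests used streaming
```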

@@ -59,28 +59,32 @@ The output json file in an array of objects that contain the following fields:<br>
 * `stream`: Indicates if we used the stream argument or not
 
 ### Data Postprocessors
-Below is a description of the data postprocessors.
+Below is a description of the data postprocessors. Each can be invoked with `fib <postprocessor name here> [Options]`
 
-#### `performance.py`
+#### `analyse`
 Prints the following output for a given run, same as vLLM.
 
 ```
 ============ Serving Benchmark Result ============
-Successful requests:                     20
-Benchmark duration (s):                  19.39
-Total input tokens:                      407
-Total generated tokens:                  5112
-Request throughput (req/s):              1.03
-Input token throughput (tok/s):          20.99
-Output token throughput (tok/s):         263.66
+Successful requests:                     20
+Benchmark duration (s):                  4.12
+Total input tokens:                      3978
+Total generated tokens:                  4000
+Request throughput (req/s):              4.85
+Input token throughput (tok/s):          964.98
+Output token throughput (tok/s):         970.32
 ---------------Time to First Token----------------
-Mean TTFT (ms):                          24.66
-Median TTFT (ms):                        24.64
-P99 TTFT (ms):                           34.11
+Mean TTFT (ms):                          6.79
+Median TTFT (ms):                        4.81
+P99 TTFT (ms):                           17.90
 -----Time per Output Token (excl. 1st token)------
-Mean TPOT (ms):                          2295.86
-Median TPOT (ms):                        2362.54
-P99 TPOT (ms):                           2750.76
+Mean TPOT (ms):                          1.57
+Median TPOT (ms):                        1.59
+P99 TPOT (ms):                           1.90
+---------------Inter-token Latency----------------
+Mean ITL (ms):                           1.57
+Median ITL (ms):                         1.47
+P99 ITL (ms):                            2.71
 ==================================================
 ```
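The throughput rows in the new sample output follow directly from the counts above them. A quick sketch of that arithmetic; the formulas are assumed from the printed units rather than read from the benchmarker's source, and the results differ slightly from the printed values because the duration is shown rounded:

```python
# Figures from the sample `analyse` run above.
successful_requests = 20
duration_s = 4.12
total_input_tokens = 3978
total_generated_tokens = 4000

# Throughputs are totals divided by wall-clock duration (assumed).
request_throughput = successful_requests / duration_s           # ~4.85 req/s
input_token_throughput = total_input_tokens / duration_s        # ~965 tok/s
output_token_throughput = total_generated_tokens / duration_s   # ~971 tok/s
```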

@@ -90,7 +94,7 @@ Supports the following args:
 | --- | --- |
 | `--datapath` | Path to the output json file produced. |
 
-#### `itl.py`
+#### `generate-itl-plot`
 
 Returns a plot of inter-token latencies for a specific request. Takes the following args:

@@ -100,7 +104,7 @@ Returns a plot of inter-token latencies for a specific request. Takes the follow
 | `--output` | Path to save figure supported by matplotlib. |
 | `--request-num` | Which request to produce ITL plot for. |
 
-#### `ttft.py`
+#### `generate-ttft-plot`
 
 Generates a simple CDF plot of **time to first token** requests. You can pass a single file or a list of generated files from the benchmark to make a comparisson <br>

@@ -119,32 +123,31 @@ We will use gpt2 as the model<br>
 
 Once the backend is up and running we can go to the examples folder and run the inference benchmark using vllm_args.json file <br>
 `cd examples`<br>
-`inference-benchmark --config-file vllm_args.json --output-file vllm-benchmark.json`
+`fib benchmark --config-file vllm_args.json --output-file vllm-benchmark.json`
 
-then you can go to the folder data_postprocessors and see the performance with performance.py<br>
-`cd ../data_postprocessors` <br>
-`python performance.py --datapath ../examples/vllm-benchmark.json` <br>
+then you can run the performance analysis post-processor:<br>
+`fib analyse --datapath vllm-benchmark.json` <br>
 
 ```
 ============ Serving Benchmark Result ============
 Successful requests:                     20
-Benchmark duration (s):                  4.15
-Total input tokens:                      3836
+Benchmark duration (s):                  4.12
+Total input tokens:                      3978
 Total generated tokens:                  4000
-Request throughput (req/s):              4.82
-Input token throughput (tok/s):          925.20
-Output token throughput (tok/s):         964.76
+Request throughput (req/s):              4.85
+Input token throughput (tok/s):          964.98
+Output token throughput (tok/s):         970.32
 ---------------Time to First Token----------------
-Mean TTFT (ms):                          19.91
-Median TTFT (ms):                        22.11
-P99 TTFT (ms):                           28.55
+Mean TTFT (ms):                          6.79
+Median TTFT (ms):                        4.81
+P99 TTFT (ms):                           17.90
 -----Time per Output Token (excl. 1st token)------
-Mean TPOT (ms):                          6.73
-Median TPOT (ms):                        7.96
-P99 TPOT (ms):                           8.41
+Mean TPOT (ms):                          1.57
+Median TPOT (ms):                        1.59
+P99 TPOT (ms):                           1.90
 ---------------Inter-token Latency----------------
-Mean ITL (ms):                           6.73
-Median ITL (ms):                         7.40
-P99 ITL (ms):                            20.70
+Mean ITL (ms):                           1.57
+Median ITL (ms):                         1.47
+P99 ITL (ms):                            2.71
 ==================================================
 ```
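Each latency section in this output reduces a list of per-request (or per-token) measurements to mean, median, and P99. A small sketch of those summaries; the TTFT samples here are made up for illustration, not taken from a real run:

```python
import statistics

# Made-up per-request TTFT samples in milliseconds (illustration only).
ttft_ms = [4.2, 4.8, 5.1, 6.0, 13.9]

mean_ttft = statistics.mean(ttft_ms)
median_ttft = statistics.median(ttft_ms)
# 99th percentile via linear interpolation over the sorted sample.
p99_ttft = statistics.quantiles(ttft_ms, n=100, method="inclusive")[98]

print(f"Mean TTFT (ms):   {mean_ttft:.2f}")    # 6.80
print(f"Median TTFT (ms): {median_ttft:.2f}")  # 5.10
print(f"P99 TTFT (ms):    {p99_ttft:.2f}")     # 13.58
```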
