@@ -10,7 +10,7 @@ pip install .
```

## Usage
- After installing with the above instructions, the benchmarker can be invoked with `inference-benchmark <args>`.
+ After installing with the above instructions, the benchmarker can be invoked with `fib <args>`.

After you get your output (using `--output-file`), you can invoke one of the data postprocessors in `data_postprocessors`.

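For example, using the config and output file names from the examples section below, a benchmark run followed by post-processing might look like:

```
# run the benchmark and write the raw results to a json file
fib benchmark --config-file vllm_args.json --output-file vllm-benchmark.json

# summarize the results with the analyse postprocessor
fib analyse --datapath vllm-benchmark.json
```
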
@@ -59,28 +59,32 @@ The output json file in an array of objects that contain the following fields:<b
* `stream`: Indicates whether the `stream` argument was used

### Data Postprocessors
- Below is a description of the data postprocessors.
+ Below is a description of the data postprocessors. Each can be invoked with `fib <postprocessor name> [options]`.

- #### `performance.py`
+ #### `analyse`

Prints the following output for a given run, in the same format as vLLM.

```
============ Serving Benchmark Result ============
- Successful requests:                     20
- Benchmark duration (s):                  19.39
- Total input tokens:                      407
- Total generated tokens:                  5112
- Request throughput (req/s):              1.03
- Input token throughput (tok/s):          20.99
- Output token throughput (tok/s):         263.66
+ Successful requests:                     20
+ Benchmark duration (s):                  4.12
+ Total input tokens:                      3978
+ Total generated tokens:                  4000
+ Request throughput (req/s):              4.85
+ Input token throughput (tok/s):          964.98
+ Output token throughput (tok/s):         970.32
---------------Time to First Token----------------
- Mean TTFT (ms):                          24.66
- Median TTFT (ms):                        24.64
- P99 TTFT (ms):                           34.11
+ Mean TTFT (ms):                          6.79
+ Median TTFT (ms):                        4.81
+ P99 TTFT (ms):                           17.90
-----Time per Output Token (excl. 1st token)------
- Mean TPOT (ms):                          2295.86
- Median TPOT (ms):                        2362.54
- P99 TPOT (ms):                           2750.76
+ Mean TPOT (ms):                          1.57
+ Median TPOT (ms):                        1.59
+ P99 TPOT (ms):                           1.90
+ ---------------Inter-token Latency----------------
+ Mean ITL (ms):                           1.57
+ Median ITL (ms):                         1.47
+ P99 ITL (ms):                            2.71
==================================================
```

@@ -90,7 +94,7 @@ Supports the following args:
| --- | --- |
| `--datapath` | Path to the output json file produced. |

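For instance, to summarize the run saved above:

```
fib analyse --datapath vllm-benchmark.json
```
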
- #### `itl.py`
+ #### `generate-itl-plot`

Returns a plot of inter-token latencies for a specific request. Takes the following args:

@@ -100,7 +104,7 @@ Returns a plot of inter-token latencies for a specific request. Takes the follow
| `--output` | Path to save the figure (any format supported by matplotlib). |
| `--request-num` | Which request to produce the ITL plot for. |

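A possible invocation is sketched below; `--output` and `--request-num` are the flags listed above, while the results file is assumed here to be passed with `--datapath`, mirroring `analyse`:

```
# assumption: --datapath takes the benchmark output file, as it does for analyse
# --output and --request-num are the flags listed in the table above
fib generate-itl-plot --datapath vllm-benchmark.json --output itl.png --request-num 0
```
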
- #### `ttft.py`
+ #### `generate-ttft-plot`

Generates a simple CDF plot of **time to first token** across requests. You can pass a single file or a list of generated files from the benchmark to make a comparison.<br>

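The exact flags for this postprocessor are not shown here; a hypothetical comparison of two runs might look like the sketch below, where `--datapaths` and `--output` are illustrative names only:

```
# hypothetical flag names for illustration only; see the CLI help for the real ones
fib generate-ttft-plot --datapaths vllm-benchmark.json other-benchmark.json --output ttft_cdf.png
```
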
@@ -119,32 +123,31 @@ We will use gpt2 as the model<br>

Once the backend is up and running, we can go to the examples folder and run the inference benchmark using the vllm_args.json file:<br>
`cd examples` <br>
- `inference-benchmark --config-file vllm_args.json --output-file vllm-benchmark.json`
+ `fib benchmark --config-file vllm_args.json --output-file vllm-benchmark.json`

- then you can go to the folder data_postprocessors and see the performance with performance.py<br>
- `cd ../data_postprocessors` <br>
- `python performance.py --datapath ../examples/vllm-benchmark.json` <br>
+ then you can run the performance analysis postprocessor:<br>
+ `fib analyse --datapath vllm-benchmark.json` <br>

```
============ Serving Benchmark Result ============
Successful requests:                     20
- Benchmark duration (s):                  4.15
- Total input tokens:                      3836
+ Benchmark duration (s):                  4.12
+ Total input tokens:                      3978
Total generated tokens:                  4000
- Request throughput (req/s):              4.82
- Input token throughput (tok/s):          925.20
- Output token throughput (tok/s):         964.76
+ Request throughput (req/s):              4.85
+ Input token throughput (tok/s):          964.98
+ Output token throughput (tok/s):         970.32
---------------Time to First Token----------------
- Mean TTFT (ms):                          19.91
- Median TTFT (ms):                        22.11
- P99 TTFT (ms):                           28.55
+ Mean TTFT (ms):                          6.79
+ Median TTFT (ms):                        4.81
+ P99 TTFT (ms):                           17.90
-----Time per Output Token (excl. 1st token)------
- Mean TPOT (ms):                          6.73
- Median TPOT (ms):                        7.96
- P99 TPOT (ms):                           8.41
+ Mean TPOT (ms):                          1.57
+ Median TPOT (ms):                        1.59
+ P99 TPOT (ms):                           1.90
---------------Inter-token Latency----------------
- Mean ITL (ms):                           6.73
- Median ITL (ms):                         7.40
- P99 ITL (ms):                            20.70
+ Mean ITL (ms):                           1.57
+ Median ITL (ms):                         1.47
+ P99 ITL (ms):                            2.71
==================================================
```