Commit a64085a (parent 93434fd)

ci: bench: more resilient, more metrics

2 files changed: +55, -24

.github/workflows/bench.yml
examples/server/bench/bench.py

.github/workflows/bench.yml (51 additions, 20 deletions)
@@ -12,6 +12,15 @@ on:
           - Standard_NC4as_T4_v3
           - Standard_NC24ads_A100_v4
           - Standard_NC80adis_H100_v5
+      sha:
+        description: 'Commit SHA1 to build'
+        required: false
+        type: string
+      duration:
+        description: 'Duration of the bench'
+        type: string
+        default: 10m
+
   push:
     branches:
       - master
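
The two new `workflow_dispatch` inputs make it possible to benchmark an arbitrary commit for an arbitrary duration without pushing a branch. As a rough illustration only (not part of the commit), the inputs could be supplied through the GitHub REST API workflow-dispatch endpoint; the repository path, ref, token handling and input values below are assumptions:

```python
# Hypothetical manual dispatch of bench.yml with the new inputs.
import os
import requests

resp = requests.post(
    "https://api.github.com/repos/ggerganov/llama.cpp/actions/workflows/bench.yml/dispatches",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",  # placeholder token source
    },
    json={
        "ref": "master",
        "inputs": {
            "gpu-series": "Standard_NC4as_T4_v3",
            "sha": "a64085a",      # optional: commit to build
            "duration": "15m",     # overrides the 10m default
        },
    },
    timeout=30,
)
resp.raise_for_status()  # the endpoint answers 204 No Content on success
```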
@@ -31,13 +40,15 @@ jobs:
     runs-on: Standard_NC4as_T4_v3
     env:
       RUNNER_LABEL: Standard_NC4as_T4_v3 # FIXME Do not find a way to not duplicate it
+      N_USERS: 8
     if: ${{ github.event.inputs.gpu-series == 'Standard_NC4as_T4_v3' || github.event.schedule || github.event.pull_request || github.event.push.ref == 'refs/heads/master' }}
     steps:
       - name: Clone
         id: checkout
         uses: actions/checkout@v3
         with:
           fetch-depth: 0
+          ref: ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha || github.head_ref || github.ref_name }}

       - name: Install python env
         id: pipenv
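
The `ref:` added to the checkout step is a fallback chain: the GitHub Actions expression `a || b || c` resolves to the first non-empty value, so a manual dispatch with an explicit SHA, a pull request, a push and a scheduled run all check out the intended commit. A small Python sketch of that precedence, for illustration only (the parameter names mirror the GitHub contexts, they are not code from the commit):

```python
# Illustration of how `inputs.sha || pull_request.head.sha || sha || head_ref || ref_name` resolves.
def resolve_ref(inputs_sha: str | None,
                pr_head_sha: str | None,
                push_sha: str | None,
                head_ref: str | None,
                ref_name: str | None) -> str:
    """Return the first non-empty candidate, like `x || y || z` in Actions expressions."""
    for candidate in (inputs_sha, pr_head_sha, push_sha, head_ref, ref_name):
        if candidate:
            return candidate
    raise ValueError("no ref available")

# A manual dispatch with an explicit SHA wins over everything else:
assert resolve_ref("a64085a", None, "deadbeef", None, "master") == "a64085a"
# On a pull request the PR head SHA is used, not the merge ref:
assert resolve_ref(None, "1234abc", "deadbeef", "feature", "feature") == "1234abc"
```

The same chain reappears below on `--commit` and on the commit-status `sha:`, so the benchmark results and the status check are attached to the commit that was actually benchmarked.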
@@ -100,13 +111,13 @@ jobs:
             --runner-label ${{ env.RUNNER_LABEL }} \
             --name ${{ github.job }} \
             --branch ${{ github.head_ref || github.ref_name }} \
-            --commit ${{ github.sha }} \
+            --commit ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha }} \
             --scenario script.js \
-            --duration 10m \
+            --duration ${{ github.event.inputs.duration || '10m' }} \
             --hf-repo ggml-org/models \
             --hf-file phi-2/ggml-model-q4_0.gguf \
             --model-path-prefix /models \
-            --parallel 8 \
+            --parallel ${{ env.N_USERS }} \
             -ngl 33 \
             --batch-size 2048 \
             --ubatch-size 256 \
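
The bench invocation now takes its duration from the dispatch input and its concurrency from the shared `N_USERS` env var instead of hard-coded values. The actual argument parser in bench.py is not part of this diff; a minimal sketch of how those two parameterised flags could be declared, with assumed help texts and defaults:

```python
# Minimal sketch, not the actual bench.py parser.
import argparse

parser = argparse.ArgumentParser(description="server benchmark driver")
parser.add_argument("--duration", type=str, default="10m",
                    help="k6-style duration string, e.g. 10m or 90s")
parser.add_argument("--parallel", type=int, default=8,
                    help="number of concurrent simulated users")

args = parser.parse_args(["--duration", "15m", "--parallel", "8"])
print(args.duration, args.parallel)  # 15m 8
```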
@@ -125,14 +136,15 @@ jobs:
           name: benchmark-results
           compression-level: 9
           path: |
-            examples/server/bench/*.png
+            examples/server/bench/*.jpg
             examples/server/bench/*.json
             examples/server/bench/*.log

       - name: Commit status
         uses: Sibz/github-status-action@v1
         with:
           authToken: ${{secrets.GITHUB_TOKEN}}
+          sha: ${{ inputs.sha || github.event.pull_request.head.sha || github.sha }}
           context: bench-server-baseline
           description: |
             ${{ env.BENCH_RESULTS }}
@@ -145,10 +157,10 @@ jobs:
         with:
           client_id: ${{secrets.IMGUR_CLIENT_ID}}
           path: |
-            examples/server/bench/prompt_tokens_seconds.png
-            examples/server/bench/predicted_tokens_seconds.png
-            examples/server/bench/kv_cache_usage_ratio.png
-            examples/server/bench/requests_processing.png
+            examples/server/bench/prompt_tokens_seconds.jpg
+            examples/server/bench/predicted_tokens_seconds.jpg
+            examples/server/bench/kv_cache_usage_ratio.jpg
+            examples/server/bench/requests_processing.jpg

       - name: Extract mermaid
         id: set_mermaid
@@ -176,24 +188,39 @@ jobs:
           echo "$REQUESTS_PROCESSING" >> $GITHUB_ENV
           echo "EOF" >> $GITHUB_ENV

+      - name: Extract image url
+        id: extract_image_url
+        continue-on-error: true
+        run: |
+          set -eux
+
+          echo "IMAGE_0=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[0] }}" >> $GITHUB_ENV
+          echo "IMAGE_1=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[1] }}" >> $GITHUB_ENV
+          echo "IMAGE_2=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[2] }}" >> $GITHUB_ENV
+          echo "IMAGE_3=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[3] }}" >> $GITHUB_ENV
+
       - name: Comment PR
         uses: mshick/add-pr-comment@v2
         id: comment_pr
         if: ${{ github.event.pull_request != '' }}
-        continue-on-error: true
         with:
           message-id: bench-${{ github.job }}-${{ env.RUNNER_LABEL }}
           message: |
-            📈 **llama.cpp server** benchmark for _${{ github.job }}_ on _${{ env.RUNNER_LABEL }}_: **${{ env.BENCH_ITERATIONS}} iterations** 🚀
+            📈 **llama.cpp server** for _${{ github.job }}_ on _${{ env.RUNNER_LABEL }}_: **${{ env.BENCH_ITERATIONS}} iterations** 🚀

+            - Concurrent users: ${{ env.N_USERS }}
+            - HTTP request : avg=${{ env.HTTP_REQ_DURATION_AVG }}ms p(90)=${{ env.HTTP_REQ_DURATION_P_90_ }}ms passes=${{ env.HTTP_REQ_FAILED_FAILS }}reqs fails=${{ env.HTTP_REQ_FAILED_PASSES }}reqs
+            - Prompt processing (pp): avg=${{ env.LLAMACPP_PROMPT_TOKENS_AVG }}tk/s p(90)=${{ env.LLAMACPP_PROMPT_TOKENS_P_90_ }}tk/s **total=${{ env.LLAMACPP_PROMPT_TOKENS_TOTAL_COUNTER_RATE }}tk/s**
+            - Token generation (tg): avg=${{ env.LLAMACPP_TOKENS_SECOND_AVG }}tk/s p(90)=${{ env.LLAMACPP_TOKENS_SECOND_P_90_ }}tk/s **total=${{ env.LLAMACPP_COMPLETION_TOKENS_TOTAL_COUNTER_RATE }}tk/s**
+            - Finish reason : stop=${{ env.LLAMACPP_COMPLETIONS_STOP_RATE_PASSES }}reqs truncated=${{ env.LLAMACPP_COMPLETIONS_TRUNCATED_RATE_PASSES }}
             - ${{ env.BENCH_GRAPH_XLABEL }}
-            - req_avg=${{ env.HTTP_REQ_DURATION_AVG }} pp_avg=${{ env.LLAMACPP_PROMPT_TOKENS_AVG }} tks_avg=${{ env.LLAMACPP_TOKENS_SECOND_AVG }}
-
-
+
             <p align="center">
-            <img width="100%" height="100%" src="${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[0] }}" alt="prompt_tokens_seconds" />
+
+            <img width="100%" height="100%" src="${{ env.IMAGE_0 }}" alt="prompt_tokens_seconds" />

             <details>
+
             <summary>More</summary>

             ```mermaid
@@ -202,7 +229,7 @@ jobs:

             </details>

-            <img width="100%" height="100%" src="${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[1] }}" alt="predicted_tokens_seconds"/>
+            <img width="100%" height="100%" src="${{ env.IMAGE_1 }}" alt="predicted_tokens_seconds"/>

             <details>
             <summary>More</summary>
@@ -214,10 +241,14 @@ jobs:
             </details>

             </p>
+
             <details>
-            <summary>Details</summary>
-            <p align="center">
-            <img width="100%" height="100%" src="${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[2] }}" alt="kv_cache_usage_ratio" />
+
+            <summary>Details</summary>
+
+            <p align="center">
+
+            <img width="100%" height="100%" src="${{ env.IMAGE_2 }}" alt="kv_cache_usage_ratio" />

             <details>
             <summary>More</summary>
@@ -228,7 +259,7 @@ jobs:

             </details>

-            <img width="100%" height="100%" src="${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[3] }}" alt="requests_processing"/>
+            <img width="100%" height="100%" src="${{ env.IMAGE_3 }}" alt="requests_processing"/>

             <details>
             <summary>More</summary>
@@ -238,6 +269,6 @@ jobs:
             ```

             </details>
-
+
             </p>
           </details>
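
The expanded PR comment reads flattened k6 metric names such as HTTP_REQ_DURATION_AVG and HTTP_REQ_DURATION_P_90_ from the environment; bench.py writes one NAME_SUBMETRIC=value line per metric into $GITHUB_ENV. The exact escaping rule lives in bench.py and is not shown in this diff; a hedged sketch that reproduces the observed names (including the trailing underscore produced by the closing parenthesis of p(90)):

```python
# Sketch of the assumed metric-name flattening; escape_metric_name is illustrative,
# not quoted from bench.py.
import re

def escape_metric_name(name: str) -> str:
    # Replace anything that is not a letter or digit with '_' and upper-case the result.
    return re.sub(r"[^a-zA-Z0-9]", "_", name).upper()

metrics = {"http_req_duration": {"avg": 56.1, "p(90)": 81.2}}
for metric_name, submetrics in metrics.items():
    for submetric, value in submetrics.items():
        print(f"{escape_metric_name(metric_name)}_{escape_metric_name(submetric)}={value}")
# HTTP_REQ_DURATION_AVG=56.1
# HTTP_REQ_DURATION_P_90_=81.2
```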

examples/server/bench/bench.py (4 additions, 4 deletions)
@@ -70,7 +70,7 @@ def main(args_in: list[str] | None = None) -> None:
         for metric_name in data['metrics']:
             for metric_metric in data['metrics'][metric_name]:
                 value = data['metrics'][metric_name][metric_metric]
-                if isinstance(value, float):
+                if isinstance(value, float) or isinstance(value, int):
                     value = round(value, 2)
                 data['metrics'][metric_name][metric_metric]=value
                 github_env.write(
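
Widening the check means integer-valued k6 metrics (counters such as iteration or request counts) now take the same path as floats before being written to $GITHUB_ENV; `round()` leaves an int unchanged, so the change only affects which values enter the branch. The equivalent single-call form would be `isinstance(value, (float, int))`. A tiny illustration of the behaviour, with made-up values:

```python
# Illustration only: ints and floats are both rounded, other types pass through untouched.
values = {"avg": 56.1234, "count": 1200, "rate": 19.876543}
rounded = {k: round(v, 2) if isinstance(v, (float, int)) else v
           for k, v in values.items()}
print(rounded)  # {'avg': 56.12, 'count': 1200, 'rate': 19.88}
```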
@@ -149,11 +149,11 @@ def main(args_in: list[str] | None = None) -> None:
         plt.gca().spines["right"].set_alpha(0.0)
         plt.gca().spines["left"].set_alpha(0.3)

-        # Save the plot as a PNG image
-        plt.savefig(f'{metric}.png')
+        # Save the plot as a jpg image
+        plt.savefig(f'{metric}.jpg', dpi=60)
         plt.close()

-        # Mermaid format in case image failed
+        # Mermaid format in case images upload failed
         with (open(f"{metric}.mermaid", 'w') as mermaid_f):
             mermaid = (
                 f"""---
