【Nvidia】Jetson Orin 平台VLM部署方法与指标评测

# Jetson Orin 测试

## 2.1 直接测试
可以直接使用jetson container来部署sglang；也可以安装pytorch等jetson版本后直接运行sglang，可以直接使用第一种方案比较方便。
https://github.com/dusty-nv/jetson-containers
https://hub.docker.com/r/dustynv/sglang 
https://hub.docker.com/r/dustynv/l4t-pytorch

### 2.1.1 镜像拉取并创建容器

```
docker pull dustynv/sglang:r36.4-cu128-24.04
```

完成sglang的镜像拉取，然后启动容器，注意可以直接根据上方链接使用sglang的容器

```
docker run --runtime nvidia -it --rm -v /data/:/data/ --network=host dustynv/sglang:r36.4-cu128-24.04
```

### 2.1.2 查看sglang
可以直接查看sglang版本

```
root@tegra-ubuntu:/# pip list |grep sglang
sglang                            0.4.7.post1              /opt/venv/lib/python3.12/site-packages pip
```

### 2.1.3 提前准备好模型启动
使用Qwen2.5-VL-3B模型测试

```
python3 -m sglang.launch_server --model-path ./Qwen2.5-VL-3B-Instruct/ --host 0.0.0.0 --port 30000 --enable-metrics
```

### 2.1.4 相关性能指标参考如下
执行mmlu的opencompass评测（纯文本调用评测先）

- 功耗 27W左右（按照25W设置的功耗，因为是orin nx的设置，所以这里暂时没有修改，非agx orin的峰值功耗）
- 显存（内存）占用30GB左右 fp32部署

![](/media/202510/2025-10-28_180325_3436740.7590212560485443.png)

```

10-23-2025 11:13:25 RAM 56177/62842MB (lfb 1x4MB) SWAP 379/31421MB (cached 0MB) CPU [0%@729,4%@729,2%@729,5%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] GR3D_FREQ 99% cpu@55.843C tboard@44.875C soc2@51.718C tdiode@46C soc0@52.656C gpu@53.531C tj@55.843C soc1@52.718C VDDQ_VDD2_1V8AO 3918mW/3793mW VDD_GPU_SOC 11894mW/11871mW VDD_CPU_CV 0mW/46mW VIN_SYS_5V0 9676mW/9480mW

```

- 每秒token输出如下

```
[2025-10-23 11:14:57] Decode batch. #running-req: 8, #token: 1544, token usage: 0.00, cuda graph: True, gen throughput (token/s): 86.95, #queue-req: 0
[2025-10-23 11:15:00] Decode batch. #running-req: 8, #token: 1864, token usage: 0.00, cuda graph: True, gen throughput (token/s): 96.25, #queue-req: 0
[2025-10-23 11:15:04] Decode batch. #running-req: 8, #token: 2184, token usage: 0.00, cuda graph: True, gen throughput (token/s): 96.18, #queue-req: 0
[2025-10-23 11:15:05] INFO:     192.168.1.200:37808 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-10-23 11:15:07] Decode batch. #running-req: 7, #token: 2206, token usage: 0.00, cuda graph: True, gen throughput (token/s): 88.06, #queue-req: 0
[2025-10-23 11:15:09] INFO:     192.168.1.200:43068 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-10-23 11:15:09] INFO:     192.168.1.200:40004 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-10-23 11:15:10] Decode batch. #running-req: 5, #token: 1740, token usage: 0.00, cuda graph: True, gen throughput (token/s): 75.90, #queue-req: 0
```

### 2.1.5 实际opencompass mmlu的评测指标

可以对比一下跟4090上的差异
- Qwen2.5-VL-3B-Instruct-Jetson-SGLang-API

```
The markdown format results is as below:

| dataset | version | metric | mode | Qwen2.5-VL-3B-Instruct-Jetson-SGLang-API |
|----- | ----- | ----- | ----- | -----|
| lukaemon_mmlu_college_biology | bf6b83 | accuracy | gen | 68.06 |
| lukaemon_mmlu_college_chemistry | bf6b83 | accuracy | gen | 41.00 |
| lukaemon_mmlu_college_computer_science | bf6b83 | accuracy | gen | 52.00 |
| lukaemon_mmlu_college_mathematics | bf6b83 | accuracy | gen | 46.00 |
| lukaemon_mmlu_college_physics | bf6b83 | accuracy | gen | 56.86 |
| lukaemon_mmlu_electrical_engineering | bf6b83 | accuracy | gen | 58.62 |
| lukaemon_mmlu_astronomy | bf6b83 | accuracy | gen | 67.11 |
| lukaemon_mmlu_anatomy | bf6b83 | accuracy | gen | 57.78 |
| lukaemon_mmlu_abstract_algebra | bf6b83 | accuracy | gen | 42.00 |
| lukaemon_mmlu_machine_learning | bf6b83 | accuracy | gen | 50.89 |
| lukaemon_mmlu_clinical_knowledge | bf6b83 | accuracy | gen | 67.55 |
| lukaemon_mmlu_global_facts | bf6b83 | accuracy | gen | 43.00 |
| lukaemon_mmlu_management | bf6b83 | accuracy | gen | 70.87 |
| lukaemon_mmlu_nutrition | bf6b83 | accuracy | gen | 64.38 |
| lukaemon_mmlu_marketing | bf6b83 | accuracy | gen | 79.06 |
| lukaemon_mmlu_professional_accounting | bf6b83 | accuracy | gen | 49.65 |
| lukaemon_mmlu_high_school_geography | bf6b83 | accuracy | gen | 74.75 |
| lukaemon_mmlu_international_law | bf6b83 | accuracy | gen | 72.73 |
| lukaemon_mmlu_moral_scenarios | bf6b83 | accuracy | gen | 47.26 |
| lukaemon_mmlu_computer_security | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_high_school_microeconomics | bf6b83 | accuracy | gen | 74.37 |
| lukaemon_mmlu_professional_law | bf6b83 | accuracy | gen | 41.00 |
| lukaemon_mmlu_medical_genetics | bf6b83 | accuracy | gen | 69.00 |
| lukaemon_mmlu_professional_psychology | bf6b83 | accuracy | gen | 58.82 |
| lukaemon_mmlu_jurisprudence | bf6b83 | accuracy | gen | 71.30 |
| lukaemon_mmlu_world_religions | bf6b83 | accuracy | gen | 78.36 |
| lukaemon_mmlu_philosophy | bf6b83 | accuracy | gen | 64.31 |
| lukaemon_mmlu_virology | bf6b83 | accuracy | gen | 47.59 |
| lukaemon_mmlu_high_school_chemistry | bf6b83 | accuracy | gen | 58.62 |
| lukaemon_mmlu_public_relations | bf6b83 | accuracy | gen | 60.00 |
| lukaemon_mmlu_high_school_macroeconomics | bf6b83 | accuracy | gen | 66.92 |
| lukaemon_mmlu_human_sexuality | bf6b83 | accuracy | gen | 67.94 |
| lukaemon_mmlu_elementary_mathematics | bf6b83 | accuracy | gen | 84.13 |
| lukaemon_mmlu_high_school_physics | bf6b83 | accuracy | gen | 56.95 |
| lukaemon_mmlu_high_school_computer_science | bf6b83 | accuracy | gen | 75.00 |
| lukaemon_mmlu_high_school_european_history | bf6b83 | accuracy | gen | 73.94 |
| lukaemon_mmlu_business_ethics | bf6b83 | accuracy | gen | 64.00 |
| lukaemon_mmlu_moral_disputes | bf6b83 | accuracy | gen | 61.85 |
| lukaemon_mmlu_high_school_statistics | bf6b83 | accuracy | gen | 62.50 |
| lukaemon_mmlu_miscellaneous | bf6b83 | accuracy | gen | 76.12 |
| lukaemon_mmlu_formal_logic | bf6b83 | accuracy | gen | 47.62 |
| lukaemon_mmlu_high_school_government_and_politics | bf6b83 | accuracy | gen | 77.72 |
| lukaemon_mmlu_prehistory | bf6b83 | accuracy | gen | 63.58 |
| lukaemon_mmlu_security_studies | bf6b83 | accuracy | gen | 56.73 |
| lukaemon_mmlu_high_school_biology | bf6b83 | accuracy | gen | 76.45 |
| lukaemon_mmlu_logical_fallacies | bf6b83 | accuracy | gen | 71.78 |
| lukaemon_mmlu_high_school_world_history | bf6b83 | accuracy | gen | 75.11 |
| lukaemon_mmlu_professional_medicine | bf6b83 | accuracy | gen | 61.76 |
| lukaemon_mmlu_high_school_mathematics | bf6b83 | accuracy | gen | 61.11 |
| lukaemon_mmlu_college_medicine | bf6b83 | accuracy | gen | 65.32 |
| lukaemon_mmlu_high_school_us_history | bf6b83 | accuracy | gen | 74.02 |
| lukaemon_mmlu_sociology | bf6b83 | accuracy | gen | 75.62 |
| lukaemon_mmlu_econometrics | bf6b83 | accuracy | gen | 57.02 |
| lukaemon_mmlu_high_school_psychology | bf6b83 | accuracy | gen | 79.63 |
| lukaemon_mmlu_human_aging | bf6b83 | accuracy | gen | 64.13 |
| lukaemon_mmlu_us_foreign_policy | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_conceptual_physics | bf6b83 | accuracy | gen | 67.66 |

```

- Qwen2.5-VL-7B-Instruct-Jetson-SGLang-API

```
| dataset | version | metric | mode | Qwen2.5-VL-7B-Instruct-Jetson-SGLang-API |
|----- | ----- | ----- | ----- | -----|
| lukaemon_mmlu_college_biology | bf6b83 | accuracy | gen | 82.64 |
| lukaemon_mmlu_college_chemistry | bf6b83 | accuracy | gen | 54.00 |
| lukaemon_mmlu_college_computer_science | bf6b83 | accuracy | gen | 59.00 |
| lukaemon_mmlu_college_mathematics | bf6b83 | accuracy | gen | 46.00 |
| lukaemon_mmlu_college_physics | bf6b83 | accuracy | gen | 64.71 |
| lukaemon_mmlu_electrical_engineering | bf6b83 | accuracy | gen | 64.14 |
| lukaemon_mmlu_astronomy | bf6b83 | accuracy | gen | 75.66 |
| lukaemon_mmlu_anatomy | bf6b83 | accuracy | gen | 63.70 |
| lukaemon_mmlu_abstract_algebra | bf6b83 | accuracy | gen | 54.00 |
| lukaemon_mmlu_machine_learning | bf6b83 | accuracy | gen | 53.57 |
| lukaemon_mmlu_clinical_knowledge | bf6b83 | accuracy | gen | 77.36 |
| lukaemon_mmlu_global_facts | bf6b83 | accuracy | gen | 45.00 |
| lukaemon_mmlu_management | bf6b83 | accuracy | gen | 80.58 |
| lukaemon_mmlu_nutrition | bf6b83 | accuracy | gen | 72.22 |
| lukaemon_mmlu_marketing | bf6b83 | accuracy | gen | 87.18 |
| lukaemon_mmlu_professional_accounting | bf6b83 | accuracy | gen | 53.55 |
| lukaemon_mmlu_high_school_geography | bf6b83 | accuracy | gen | 82.32 |
| lukaemon_mmlu_international_law | bf6b83 | accuracy | gen | 73.55 |
| lukaemon_mmlu_moral_scenarios | bf6b83 | accuracy | gen | 40.11 |
| lukaemon_mmlu_computer_security | bf6b83 | accuracy | gen | 72.00 |
| lukaemon_mmlu_high_school_microeconomics | bf6b83 | accuracy | gen | 80.25 |
| lukaemon_mmlu_professional_law | bf6b83 | accuracy | gen | 46.09 |
| lukaemon_mmlu_medical_genetics | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_professional_psychology | bf6b83 | accuracy | gen | 67.65 |
| lukaemon_mmlu_jurisprudence | bf6b83 | accuracy | gen | 71.30 |
| lukaemon_mmlu_world_religions | bf6b83 | accuracy | gen | 80.12 |
| lukaemon_mmlu_philosophy | bf6b83 | accuracy | gen | 71.06 |
| lukaemon_mmlu_virology | bf6b83 | accuracy | gen | 51.81 |
| lukaemon_mmlu_high_school_chemistry | bf6b83 | accuracy | gen | 67.00 |
| lukaemon_mmlu_public_relations | bf6b83 | accuracy | gen | 60.91 |
| lukaemon_mmlu_high_school_macroeconomics | bf6b83 | accuracy | gen | 73.33 |
| lukaemon_mmlu_human_sexuality | bf6b83 | accuracy | gen | 73.28 |
| lukaemon_mmlu_elementary_mathematics | bf6b83 | accuracy | gen | 92.06 |
| lukaemon_mmlu_high_school_physics | bf6b83 | accuracy | gen | 68.21 |
| lukaemon_mmlu_high_school_computer_science | bf6b83 | accuracy | gen | 86.00 |
| lukaemon_mmlu_high_school_european_history | bf6b83 | accuracy | gen | 76.97 |
| lukaemon_mmlu_business_ethics | bf6b83 | accuracy | gen | 67.00 |
| lukaemon_mmlu_moral_disputes | bf6b83 | accuracy | gen | 66.76 |
| lukaemon_mmlu_high_school_statistics | bf6b83 | accuracy | gen | 73.15 |
| lukaemon_mmlu_miscellaneous | bf6b83 | accuracy | gen | 84.55 |
| lukaemon_mmlu_formal_logic | bf6b83 | accuracy | gen | 50.00 |
| lukaemon_mmlu_high_school_government_and_politics | bf6b83 | accuracy | gen | 90.67 |
| lukaemon_mmlu_prehistory | bf6b83 | accuracy | gen | 76.23 |
| lukaemon_mmlu_security_studies | bf6b83 | accuracy | gen | 65.31 |
| lukaemon_mmlu_high_school_biology | bf6b83 | accuracy | gen | 85.16 |
| lukaemon_mmlu_logical_fallacies | bf6b83 | accuracy | gen | 77.30 |
| lukaemon_mmlu_high_school_world_history | bf6b83 | accuracy | gen | 73.42 |
| lukaemon_mmlu_professional_medicine | bf6b83 | accuracy | gen | 73.16 |
| lukaemon_mmlu_high_school_mathematics | bf6b83 | accuracy | gen | 60.00 |
| lukaemon_mmlu_college_medicine | bf6b83 | accuracy | gen | 71.68 |
| lukaemon_mmlu_high_school_us_history | bf6b83 | accuracy | gen | 78.92 |
| lukaemon_mmlu_sociology | bf6b83 | accuracy | gen | 79.60 |
| lukaemon_mmlu_econometrics | bf6b83 | accuracy | gen | 54.39 |
| lukaemon_mmlu_high_school_psychology | bf6b83 | accuracy | gen | 88.44 |
| lukaemon_mmlu_human_aging | bf6b83 | accuracy | gen | 72.20 |
| lukaemon_mmlu_us_foreign_policy | bf6b83 | accuracy | gen | 82.00 |
| lukaemon_mmlu_conceptual_physics | bf6b83 | accuracy | gen | 78.72 |

```

- Qwen2.5-VL-7B-Instruct-AWQ-Jetson-SGLang-API

```
The markdown format results is as below:

| dataset | version | metric | mode | Qwen2.5-VL-7B-Instruct-AWQ-Jetson-SGLang-API |
|----- | ----- | ----- | ----- | -----|
| lukaemon_mmlu_college_biology | bf6b83 | accuracy | gen | 80.56 |
| lukaemon_mmlu_college_chemistry | bf6b83 | accuracy | gen | 51.00 |
| lukaemon_mmlu_college_computer_science | bf6b83 | accuracy | gen | 63.00 |
| lukaemon_mmlu_college_mathematics | bf6b83 | accuracy | gen | 43.00 |
| lukaemon_mmlu_college_physics | bf6b83 | accuracy | gen | 66.67 |
| lukaemon_mmlu_electrical_engineering | bf6b83 | accuracy | gen | 66.21 |
| lukaemon_mmlu_astronomy | bf6b83 | accuracy | gen | 71.71 |
| lukaemon_mmlu_anatomy | bf6b83 | accuracy | gen | 59.26 |
| lukaemon_mmlu_abstract_algebra | bf6b83 | accuracy | gen | 49.00 |
| lukaemon_mmlu_machine_learning | bf6b83 | accuracy | gen | 50.89 |
| lukaemon_mmlu_clinical_knowledge | bf6b83 | accuracy | gen | 74.34 |
| lukaemon_mmlu_global_facts | bf6b83 | accuracy | gen | 46.00 |
| lukaemon_mmlu_management | bf6b83 | accuracy | gen | 78.64 |
| lukaemon_mmlu_nutrition | bf6b83 | accuracy | gen | 73.20 |
| lukaemon_mmlu_marketing | bf6b83 | accuracy | gen | 85.90 |
| lukaemon_mmlu_professional_accounting | bf6b83 | accuracy | gen | 54.26 |
| lukaemon_mmlu_high_school_geography | bf6b83 | accuracy | gen | 82.32 |
| lukaemon_mmlu_international_law | bf6b83 | accuracy | gen | 70.25 |
| lukaemon_mmlu_moral_scenarios | bf6b83 | accuracy | gen | 46.82 |
| lukaemon_mmlu_computer_security | bf6b83 | accuracy | gen | 68.00 |
| lukaemon_mmlu_high_school_microeconomics | bf6b83 | accuracy | gen | 78.15 |
| lukaemon_mmlu_professional_law | bf6b83 | accuracy | gen | 44.39 |
| lukaemon_mmlu_medical_genetics | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_professional_psychology | bf6b83 | accuracy | gen | 65.20 |
| lukaemon_mmlu_jurisprudence | bf6b83 | accuracy | gen | 71.30 |
| lukaemon_mmlu_world_religions | bf6b83 | accuracy | gen | 77.19 |
| lukaemon_mmlu_philosophy | bf6b83 | accuracy | gen | 65.59 |
| lukaemon_mmlu_virology | bf6b83 | accuracy | gen | 47.59 |
| lukaemon_mmlu_high_school_chemistry | bf6b83 | accuracy | gen | 63.55 |
| lukaemon_mmlu_public_relations | bf6b83 | accuracy | gen | 60.00 |
| lukaemon_mmlu_high_school_macroeconomics | bf6b83 | accuracy | gen | 73.33 |
| lukaemon_mmlu_human_sexuality | bf6b83 | accuracy | gen | 77.10 |
| lukaemon_mmlu_elementary_mathematics | bf6b83 | accuracy | gen | 90.74 |
| lukaemon_mmlu_high_school_physics | bf6b83 | accuracy | gen | 60.26 |
| lukaemon_mmlu_high_school_computer_science | bf6b83 | accuracy | gen | 82.00 |
| lukaemon_mmlu_high_school_european_history | bf6b83 | accuracy | gen | 67.88 |
| lukaemon_mmlu_business_ethics | bf6b83 | accuracy | gen | 68.00 |
| lukaemon_mmlu_moral_disputes | bf6b83 | accuracy | gen | 65.32 |
| lukaemon_mmlu_high_school_statistics | bf6b83 | accuracy | gen | 70.37 |
| lukaemon_mmlu_miscellaneous | bf6b83 | accuracy | gen | 82.12 |
| lukaemon_mmlu_formal_logic | bf6b83 | accuracy | gen | 44.44 |
| lukaemon_mmlu_high_school_government_and_politics | bf6b83 | accuracy | gen | 87.56 |
| lukaemon_mmlu_prehistory | bf6b83 | accuracy | gen | 72.53 |
| lukaemon_mmlu_security_studies | bf6b83 | accuracy | gen | 63.67 |
| lukaemon_mmlu_high_school_biology | bf6b83 | accuracy | gen | 81.61 |
| lukaemon_mmlu_logical_fallacies | bf6b83 | accuracy | gen | 74.23 |
| lukaemon_mmlu_high_school_world_history | bf6b83 | accuracy | gen | 70.46 |
| lukaemon_mmlu_professional_medicine | bf6b83 | accuracy | gen | 70.22 |
| lukaemon_mmlu_high_school_mathematics | bf6b83 | accuracy | gen | 55.56 |
| lukaemon_mmlu_college_medicine | bf6b83 | accuracy | gen | 68.79 |
| lukaemon_mmlu_high_school_us_history | bf6b83 | accuracy | gen | 75.00 |
| lukaemon_mmlu_sociology | bf6b83 | accuracy | gen | 72.64 |
| lukaemon_mmlu_econometrics | bf6b83 | accuracy | gen | 48.25 |
| lukaemon_mmlu_high_school_psychology | bf6b83 | accuracy | gen | 88.07 |
| lukaemon_mmlu_human_aging | bf6b83 | accuracy | gen | 67.71 |
| lukaemon_mmlu_us_foreign_policy | bf6b83 | accuracy | gen | 78.00 |
| lukaemon_mmlu_conceptual_physics | bf6b83 | accuracy | gen | 74.89 |

```

## 2.2 使用SGLANG部署的一些实际测试数据结果对比

- 使用EvalScope进行压测压测指令如下，使用一个进程发送10个请求，请求结果为多模态数据。通过以太网直连和本地请求差别不大。

```
evalscope perf   \
--parallel 1     \
--number 10      \
--model Qwen2.5-VL-7B-Instruct \
--url http://192.168.1.102:30000/v1/chat/completions \
--api openai   --dataset random_vl   --max-tokens 1024   --min-tokens 1024 \
--prefix-length 0   --min-prompt-length 1024   --max-prompt-length 1024 \
--tokenizer-path /mnt/fd9ef272-d51b-4896-bfc8-9beaa52ae4a5/dingfeng1/Qwen2.5-VL-7B-Instruct/ \
--extra-args '{"ignore_eos": true}'
```

![](/media/202510/2025-10-28_180832_6931140.3080147407249699.png)

- AGX nvpmodel

![](/media/202510/2025-10-28_180550_9798400.6720138991101836.png)

## 2.3 Jetson AGX Orin and Thor 大模型部署性能分析

![](/media/202510/2025-10-28_180659_7454540.0354747622941578.png)

https://elinux.org/Jetson/L4T/Jetson_AI_Stack#AGX_Orin

从上方连接中可以找到详细的测试报告，针对于Nvidia官方给出的指标，值得注意的是图表只是针对于速度进行了分析，并且给出了token/s的输出结果，但是由于是（The table is run with VLLM with quantization=w4a 16, max concurrency =8, input seq len = 2048 and output seg len = 128.）w4a16的结果，所以具体的精度并没有看到。如果想要复现性能，需要严格遵循量化策略，否则很难达到实际的性能。

# 3 结论
- 使用SGLANG框架可以对VLM进行部署，同时基于prometheus+grafana的组合可以充分展示后端的性能
- SGLANG 0.5.4的版本提供了一些offline quantization的脚本可以将模型进行转化，否则模型如果没有quantization_config可能无法支持执行offline quantization选项
- 从实际部署来看使用opencompass+VLMEvalKit可以构建自定义数据流用于自有模型的评测（主要是针对一些量化部署优化的版本，可以创建多个工作流）
- Jetson NX orin 16GB可以用于部署Qwen2.5 VL 3B或者7B量化版的模型，从指标来看模型的整体损失在一些非复杂指标上看差异不大，不过量化都是基于官方的AWQ版本模型；Qwen3 有官方的FP8版本，从显存占用来看也能够正常执行。剩余显存用于KVCache使用，仍可在一定程度上加速token输出
- Jetson AGX orin 64GB可以基本部署当前主要的模型，性能方面可根据实时性要求调整，实测来看按照官方给出的指标基本能够对其。从官方指标看，对VLM和LLM的输出效率已经不错。针对VLA的输出仍然较为紧俏。

# Annexe
- https://www.qbitai.com/2025/05/281230.html
- https://github.com/dusty-nv/jetson-containers
- https://github.com/open-compass/VLMEvalKit/blob/main/run.py
- https://rank.opencompass.org.cn/leaderboard-multimodal
- https://opencompass.readthedocs.io/zh-cn/latest/get_started/quick_start.html
- https://rank.opencompass.org.cn/leaderboard-llm
- https://github.com/dusty-nv/jetson-containers

# 测试结果

- Jetson Orin NX 16GB Qwen2.5VL-7B-Instruct-AWQ
2025-10-28 11:13:09 - evalscope - INFO: Test connection successful.
2025-10-28 11:13:09 - evalscope - INFO: Save the data base to: outputs/20251028_111152/Qwen2.5-VL-7B-Instruct/benchmark_data.db
Processing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 9/10 [11:48<01:18, 78.58s/it]2025-10-28 11:26:17 - evalscope - INFO: {
  "Time taken for tests (s)": 787.4081,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 14.4455,
  "Total token throughput (tok/s)": 29.822,
  "Request throughput (req/s)": 0.0141,
  "Average latency (s)": 78.7394,
  "Average time to first token (s)": 2.1271,
  "Average time per output token (s)": 0.0749,
  "Average inter-token latency (s)": 0.083,
  "Average input tokens per request": 1090.0,
  "Average output tokens per request": 1024.0
}
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [13:07<00:00, 78.75s/it]
2025-10-28 11:26:17 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          |  787.408  |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   14.4455 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   29.822  |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0141 |
+-----------------------------------+-----------+
| Average latency (s)               |   78.7394 |
+-----------------------------------+-----------+
| Average time to first token (s)   |    2.1271 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0749 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.083  |
+-----------------------------------+-----------+
| Average input tokens per request  | 1090      |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 11:26:17 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  1.9576  | 0.0743  |  0.0748  |   78.5232   |     1090     |     1024      |    13.0223     |    26.8839    |
|     25%     |  1.9585  | 0.0746  |  0.0748  |   78.5264   |     1090     |     1024      |    13.0249     |    26.8894    |
|     50%     |  1.9605  | 0.0749  |  0.0749  |   78.6093   |     1090     |     1024      |     13.028     |    26.8957    |
|     66%     |  1.9605  | 0.0751  |  0.0749  |   78.6145   |     1090     |     1024      |    13.0386     |    26.9177    |
|     75%     |  1.9625  | 0.0753  |  0.0749  |   78.6184   |     1090     |     1024      |    13.0402     |    26.9209    |
|     80%     |  1.9639  | 0.0754  |  0.0749  |   78.6345   |     1090     |     1024      |    13.0407     |    26.922     |
|     90%     |  3.634   | 0.1492  |  0.075   |   80.2125   |     1090     |     1024      |    13.0414     |    26.9233    |
|     95%     |  3.634   |  0.15   |  0.075   |   80.2125   |     1090     |     1024      |    13.0414     |    26.9233    |
|     98%     |  3.634   | 0.1504  |  0.075   |   80.2125   |     1090     |     1024      |    13.0414     |    26.9233    |
|     99%     |  3.634   | 0.1506  |  0.075   |   80.2125   |     1090     |     1024      |    13.0414     |    26.9233    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 11:26:17 - evalscope - INFO: Save the summary to: outputs/20251028_111152/Qwen2.5-VL-7B-Instruct

- Jetson Orin NX 16GB Qwen2.5VL-3B-Instruct-AWQ
2025-10-28 07:31:15 - evalscope - INFO: Test connection successful.
2025-10-28 07:31:16 - evalscope - INFO: Save the data base to: outputs/20251028_073037/Qwen2.5-VL-3B-Instruct-AWQ/benchmark_data.db
Processing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 9/10 [05:28<00:36, 36.37s/it]2025-10-28 07:37:21 - evalscope - INFO: {
  "Time taken for tests (s)": 364.9373,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 31.1624,
  "Total token throughput (tok/s)": 64.3303,
  "Request throughput (req/s)": 0.0304,
  "Average latency (s)": 36.4749,
  "Average time to first token (s)": 1.1994,
  "Average time per output token (s)": 0.0345,
  "Average inter-token latency (s)": 0.0345,
  "Average input tokens per request": 1089.9,
  "Average output tokens per request": 1024.0
}
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [06:05<00:00, 36.51s/it]
2025-10-28 07:37:21 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          |  364.937  |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   31.1624 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   64.3303 |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0304 |
+-----------------------------------+-----------+
| Average latency (s)               |   36.4749 |
+-----------------------------------+-----------+
| Average time to first token (s)   |    1.1994 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0345 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.0345 |
+-----------------------------------+-----------+
| Average input tokens per request  | 1089.9    |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 07:37:21 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  1.0108  |  0.034  |  0.0345  |   36.2704   |     1090     |     1024      |    28.1535     |    58.1216    |
|     25%     |  1.0109  | 0.0342  |  0.0345  |   36.2841   |     1090     |     1024      |    28.1826     |    58.1786    |
|     50%     |  1.0226  | 0.0344  |  0.0345  |   36.3192   |     1090     |     1024      |    28.2138     |    58.246     |
|     66%     |  1.0236  | 0.0346  |  0.0345  |   36.3315   |     1090     |     1024      |    28.2213     |    58.2616    |
|     75%     |  1.0272  | 0.0347  |  0.0345  |   36.3345   |     1090     |     1024      |    28.2217     |    58.2624    |
|     80%     |  1.1157  | 0.0348  |  0.0345  |   36.372    |     1090     |     1024      |    28.2324     |    58.2845    |
|     90%     |  2.7487  |  0.035  |  0.0345  |   37.9942   |     1090     |     1024      |     28.237     |    58.2941    |
|     95%     |  2.7487  | 0.0353  |  0.0345  |   37.9942   |     1090     |     1024      |     28.237     |    58.2941    |
|     98%     |  2.7487  | 0.0357  |  0.0345  |   37.9942   |     1090     |     1024      |     28.237     |    58.2941    |
|     99%     |  2.7487  | 0.0359  |  0.0345  |   37.9942   |     1090     |     1024      |     28.237     |    58.2941    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 07:37:21 - evalscope - INFO: Save the summary to: outputs/20251028_073037/Qwen2.5-VL-3B-Instruct-AWQ

- Jetson Orin NX 16GB Qwen2.5VL-3B-Instruct
2025-10-28 06:44:50 - evalscope - INFO: Test connection successful.
2025-10-28 06:44:51 - evalscope - INFO: Save the data base to: outputs/20251028_064206/Qwen2.5-VL-3B-Instruct/benchmark_data.db
Processing:  90%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌             | 9/10 [24:43<02:44, 164.87s/it]2025-10-28 07:12:20 - evalscope - INFO: {
  "Time taken for tests (s)": 1648.7701,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 13.8011,
  "Total token throughput (tok/s)": 28.028,
  "Request throughput (req/s)": 0.0067,
  "Average latency (s)": 164.8485,
  "Average time to first token (s)": 2.3873,
  "Average time per output token (s)": 0.0794,
  "Average inter-token latency (s)": 0.0847,
  "Average input tokens per request": 2111.2,
  "Average output tokens per request": 2048.0
}
Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [27:28<00:00, 164.89s/it]
2025-10-28 07:12:20 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          | 1648.77   |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   13.8011 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   28.028  |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0067 |
+-----------------------------------+-----------+
| Average latency (s)               |  164.849  |
+-----------------------------------+-----------+
| Average time to first token (s)   |    2.3873 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0794 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.0847 |
+-----------------------------------+-----------+
| Average input tokens per request  | 2111.2    |
+-----------------------------------+-----------+
| Average output tokens per request | 2048      |
+-----------------------------------+-----------+
2025-10-28 07:12:20 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  2.3275  | 0.0788  |  0.0793  |  164.7491   |     2114     |     2048      |    12.4184     |    25.2371    |
|     25%     |  2.3319  | 0.0791  |  0.0793  |  164.7856   |     2114     |     2048      |    12.4231     |    25.2466    |
|     50%     |  2.3454  | 0.0794  |  0.0794  |  164.8226   |     2114     |     2048      |    12.4272     |    25.2548    |
|     66%     |   2.35   | 0.0796  |  0.0794  |  164.8417   |     2114     |     2048      |    12.4274     |    25.2554    |
|     75%     |  2.3667  | 0.0797  |  0.0794  |  164.8539   |     2114     |     2048      |    12.4283     |    25.2571    |
|     80%     |  2.4258  | 0.0798  |  0.0794  |  164.9161   |     2114     |     2048      |     12.431     |    25.2627    |
|     90%     |  2.7384  | 0.0804  |  0.0794  |  165.2194   |     2114     |     2048      |    12.4347     |    25.2701    |
|     95%     |  2.7384  | 0.1585  |  0.0794  |  165.2194   |     2114     |     2048      |    12.4347     |    25.2701    |
|     98%     |  2.7384  | 0.1592  |  0.0794  |  165.2194   |     2114     |     2048      |    12.4347     |    25.2701    |
|     99%     |  2.7384  | 0.1595  |  0.0794  |  165.2194   |     2114     |     2048      |    12.4347     |    25.2701    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 07:12:20 - evalscope - INFO: Save the summary to: outputs/20251028_064206/Qwen2.5-VL-3B-Instruct

- Jetson AGX Orin 64GB Qwen2.5VL-7B-Instruct
2025-10-28 08:26:55 - evalscope - INFO: Test connection successful.
2025-10-28 08:26:56 - evalscope - INFO: Save the data base to: outputs/20251028_082520/Qwen2.5-VL-7B-Instruct/benchmark_data.db
Processing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 9/10 [13:58<01:33, 93.17s/it]2025-10-28 08:42:28 - evalscope - INFO: {
  "Time taken for tests (s)": 931.9475,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 12.2087,
  "Total token throughput (tok/s)": 25.2031,
  "Request throughput (req/s)": 0.0119,
  "Average latency (s)": 93.1783,
  "Average time to first token (s)": 1.2425,
  "Average time per output token (s)": 0.0899,
  "Average inter-token latency (s)": 0.0954,
  "Average input tokens per request": 1089.9,
  "Average output tokens per request": 1024.0
}
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [15:32<00:00, 93.21s/it]
2025-10-28 08:42:28 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          |  931.947  |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   12.2087 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   25.2031 |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0119 |
+-----------------------------------+-----------+
| Average latency (s)               |   93.1783 |
+-----------------------------------+-----------+
| Average time to first token (s)   |    1.2425 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0899 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.0954 |
+-----------------------------------+-----------+
| Average input tokens per request  | 1089.9    |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 08:42:28 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  1.165   | 0.0893  |  0.0898  |   93.0585   |     1090     |     1024      |    10.9847     |    22.6667    |
|     25%     |  1.1651  | 0.0896  |  0.0898  |   93.0792   |     1090     |     1024      |    10.9869     |    22.6819    |
|     50%     |  1.1766  | 0.0899  |  0.0899  |   93.1483   |     1090     |     1024      |    10.9938     |    22.6962    |
|     66%     |  1.1793  | 0.0901  |  0.0899  |   93.1854   |     1090     |     1024      |    10.9955     |    22.6996    |
|     75%     |  1.1844  | 0.0903  |  0.0899  |   93.2022   |     1090     |     1024      |    11.0014     |    22.7118    |
|     80%     |  1.2404  | 0.0904  |  0.0899  |   93.2204   |     1090     |     1024      |    11.0038     |    22.7169    |
|     90%     |  1.806   | 0.0908  |   0.09   |   93.563    |     1090     |     1024      |    11.0045     |    22.7182    |
|     95%     |  1.806   | 0.1793  |   0.09   |   93.563    |     1090     |     1024      |    11.0045     |    22.7182    |
|     98%     |  1.806   | 0.1801  |   0.09   |   93.563    |     1090     |     1024      |    11.0045     |    22.7182    |
|     99%     |  1.806   | 0.1805  |   0.09   |   93.563    |     1090     |     1024      |    11.0045     |    22.7182    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 08:42:28 - evalscope - INFO: Save the summary to: outputs/20251028_082520/Qwen2.5-VL-7B-Instruct

- Jetson AGX Orin 64GB Qwen2.5VL-3B-Instruct
2025-10-28 09:10:11 - evalscope - INFO: Test connection successful.
2025-10-28 09:10:12 - evalscope - INFO: Save the data base to: outputs/20251028_090923/Qwen2.5-VL-3B-Instruct/benchmark_data.db
Processing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 9/10 [06:51<00:45, 45.71s/it]2025-10-28 09:17:49 - evalscope - INFO: {
  "Time taken for tests (s)": 457.2854,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 24.8763,
  "Total token throughput (tok/s)": 51.3535,
  "Request throughput (req/s)": 0.0243,
  "Average latency (s)": 45.7115,
  "Average time to first token (s)": 0.6067,
  "Average time per output token (s)": 0.0441,
  "Average inter-token latency (s)": 0.044,
  "Average input tokens per request": 1089.9,
  "Average output tokens per request": 1024.0
}
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [07:37<00:00, 45.74s/it]
2025-10-28 09:17:49 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          |  457.285  |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   24.8763 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   51.3535 |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0243 |
+-----------------------------------+-----------+
| Average latency (s)               |   45.7115 |
+-----------------------------------+-----------+
| Average time to first token (s)   |    0.6067 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0441 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.044  |
+-----------------------------------+-----------+
| Average input tokens per request  | 1089.9    |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 09:17:49 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  0.5522  | 0.0436  |  0.0441  |   45.6268   |     1090     |     1024      |    22.3743     |    46.1689    |
|     25%     |  0.5545  | 0.0438  |  0.0441  |   45.6311   |     1090     |     1024      |    22.4109     |    46.2662    |
|     50%     |  0.5583  | 0.0441  |  0.0441  |   45.6709   |     1090     |     1024      |    22.4274     |    46.3003    |
|     66%     |  0.5585  | 0.0442  |  0.0441  |   45.6874   |     1090     |     1024      |    22.4371     |    46.3204    |
|     75%     |  0.5654  | 0.0444  |  0.0441  |   45.6921   |     1090     |     1024      |    22.4408     |    46.328     |
|     80%     |  0.6076  | 0.0444  |  0.0441  |   45.7668   |     1090     |     1024      |     22.443     |    46.3325    |
|     90%     |  1.0066  | 0.0447  |  0.0442  |   46.1366   |     1090     |     1024      |    22.4528     |    46.3528    |
|     95%     |  1.0066  | 0.0449  |  0.0442  |   46.1366   |     1090     |     1024      |    22.4528     |    46.3528    |
|     98%     |  1.0066  | 0.0452  |  0.0442  |   46.1366   |     1090     |     1024      |    22.4528     |    46.3528    |
|     99%     |  1.0066  | 0.0455  |  0.0442  |   46.1366   |     1090     |     1024      |    22.4528     |    46.3528    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 09:17:49 - evalscope - INFO: Save the summary to: outputs/20251028_090923/Qwen2.5-VL-3B-Instruct

- Jetson AGX Orin 64GB Qwen2.5VL-7B-Instruct-AWQ
2025-10-28 08:49:51 - evalscope - INFO: Test connection successful.
2025-10-28 08:49:51 - evalscope - INFO: Save the data base to: outputs/20251028_084912/Qwen2.5-VL-7B-Instruct-AWQ/benchmark_data.db
Processing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 9/10 [05:25<00:36, 36.16s/it]2025-10-28 08:55:54 - evalscope - INFO: {
  "Time taken for tests (s)": 362.1074,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 31.4132,
  "Total token throughput (tok/s)": 64.7284,
  "Request throughput (req/s)": 0.0307,
  "Average latency (s)": 36.1938,
  "Average time to first token (s)": 0.8838,
  "Average time per output token (s)": 0.0345,
  "Average inter-token latency (s)": 0.0359,
  "Average input tokens per request": 1086.0,
  "Average output tokens per request": 1024.0
}
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [06:02<00:00, 36.22s/it]
2025-10-28 08:55:54 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          |  362.107  |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   31.4132 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   64.7284 |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0307 |
+-----------------------------------+-----------+
| Average latency (s)               |   36.1938 |
+-----------------------------------+-----------+
| Average time to first token (s)   |    0.8838 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0345 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.0359 |
+-----------------------------------+-----------+
| Average input tokens per request  | 1086      |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 08:55:54 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  0.8013  |  0.034  |  0.0345  |   36.0776   |     1090     |     1024      |    28.2728     |    57.5523    |
|     25%     |  0.8075  | 0.0343  |  0.0345  |   36.0778   |     1090     |     1024      |    28.3045     |    58.3678    |
|     50%     |  0.8141  | 0.0345  |  0.0345  |   36.1312   |     1090     |     1024      |    28.3511     |    58.509     |
|     66%     |  0.8146  | 0.0347  |  0.0345  |   36.1381   |     1090     |     1024      |    28.3599     |    58.5296    |
|     75%     |  0.8195  | 0.0348  |  0.0345  |   36.178    |     1090     |     1024      |    28.3831     |    58.5477    |
|     80%     |  0.9132  | 0.0349  |  0.0346  |   36.2186   |     1090     |     1024      |    28.3833     |    58.5956    |
|     90%     |  1.4437  | 0.0352  |  0.0346  |   36.8545   |     1090     |     1024      |    28.4154     |    58.596     |
|     95%     |  1.4437  | 0.0358  |  0.0346  |   36.8545   |     1090     |     1024      |    28.4154     |    58.596     |
|     98%     |  1.4437  | 0.0691  |  0.0346  |   36.8545   |     1090     |     1024      |    28.4154     |    58.596     |
|     99%     |  1.4437  | 0.0694  |  0.0346  |   36.8545   |     1090     |     1024      |    28.4154     |    58.596     |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 08:55:54 - evalscope - INFO: Save the summary to: outputs/20251028_084912/Qwen2.5-VL-7B-Instruct-AWQ

- Jetson AGX Orin 64GB Qwen2.5VL-3B-Instruct-AWQ
2025-10-28 09:02:24 - evalscope - INFO: Test connection successful.
2025-10-28 09:02:25 - evalscope - INFO: Save the data base to: outputs/20251028_090200/Qwen2.5-VL-3B-Instruct-AWQ/benchmark_data.db
Processing:  90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 9/10 [03:10<00:21, 21.15s/it]2025-10-28 09:05:57 - evalscope - INFO: {
  "Time taken for tests (s)": 211.5724,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 53.7527,
  "Total token throughput (tok/s)": 110.97,
  "Request throughput (req/s)": 0.0525,
  "Average latency (s)": 21.1406,
  "Average time to first token (s)": 0.5731,
  "Average time per output token (s)": 0.0201,
  "Average inter-token latency (s)": 0.0201,
  "Average input tokens per request": 1090.0,
  "Average output tokens per request": 1024.0
}
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [03:31<00:00, 21.17s/it]
2025-10-28 09:05:57 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          |  211.572  |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |   53.7527 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |  110.97   |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0525 |
+-----------------------------------+-----------+
| Average latency (s)               |   21.1406 |
+-----------------------------------+-----------+
| Average time to first token (s)   |    0.5731 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.0201 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.0201 |
+-----------------------------------+-----------+
| Average input tokens per request  | 1090      |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 09:05:57 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  0.4997  | 0.0197  |  0.0201  |   21.0531   |     1090     |     1024      |    48.2363     |    99.5815    |
|     25%     |  0.5061  | 0.0199  |  0.0201  |   21.0533   |     1090     |     1024      |    48.4606     |   100.0447    |
|     50%     |  0.5137  | 0.0201  |  0.0201  |   21.0881   |     1090     |     1024      |    48.5731     |   100.2768    |
|     66%     |  0.5175  | 0.0202  |  0.0201  |   21.0952   |     1090     |     1024      |    48.6303     |   100.3951    |
|     75%     |  0.5212  | 0.0203  |  0.0201  |   21.1306   |     1090     |     1024      |    48.6385     |   100.4118    |
|     80%     |  0.6271  | 0.0204  |  0.0201  |   21.2288   |     1090     |     1024      |    48.6389     |   100.4128    |
|     90%     |  1.0261  | 0.0206  |  0.0202  |   21.5677   |     1090     |     1024      |    48.6433     |   100.4219    |
|     95%     |  1.0261  | 0.0207  |  0.0202  |   21.5677   |     1090     |     1024      |    48.6433     |   100.4219    |
|     98%     |  1.0261  | 0.0209  |  0.0202  |   21.5677   |     1090     |     1024      |    48.6433     |   100.4219    |
|     99%     |  1.0261  | 0.0211  |  0.0202  |   21.5677   |     1090     |     1024      |    48.6433     |   100.4219    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 09:05:57 - evalscope - INFO: Save the summary to: outputs/20251028_090200/Qwen2.5-VL-3B-Instruct-AWQ

- Qwen2.5-VL-32B-Instruct-AWQ
2025-10-28 09:44:55 - evalscope - INFO: Test connection successful.
2025-10-28 09:44:56 - evalscope - INFO: Save the data base to: outputs/20251028_094248/Qwen2.5-VL-32B-Instruct-AWQ/benchmark_data.db
Processing:  90%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌             | 9/10 [19:20<02:08, 128.92s/it]2025-10-28 10:06:25 - evalscope - INFO: {
  "Time taken for tests (s)": 1289.0551,
  "Number of concurrency": 1,
  "Total requests": 10,
  "Succeed requests": 10,
  "Failed requests": 0,
  "Output token throughput (tok/s)": 8.8265,
  "Total token throughput (tok/s)": 18.2219,
  "Request throughput (req/s)": 0.0086,
  "Average latency (s)": 128.8895,
  "Average time to first token (s)": 2.9977,
  "Average time per output token (s)": 0.1231,
  "Average inter-token latency (s)": 0.1333,
  "Average input tokens per request": 1090.0,
  "Average output tokens per request": 1024.0
}
Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [21:29<00:00, 128.92s/it]
2025-10-28 10:06:25 - evalscope - INFO:
Benchmarking summary:
+-----------------------------------+-----------+
| Key                               |     Value |
+===================================+===========+
| Time taken for tests (s)          | 1289.06   |
+-----------------------------------+-----------+
| Number of concurrency             |    1      |
+-----------------------------------+-----------+
| Total requests                    |   10      |
+-----------------------------------+-----------+
| Succeed requests                  |   10      |
+-----------------------------------+-----------+
| Failed requests                   |    0      |
+-----------------------------------+-----------+
| Output token throughput (tok/s)   |    8.8265 |
+-----------------------------------+-----------+
| Total token throughput (tok/s)    |   18.2219 |
+-----------------------------------+-----------+
| Request throughput (req/s)        |    0.0086 |
+-----------------------------------+-----------+
| Average latency (s)               |  128.889  |
+-----------------------------------+-----------+
| Average time to first token (s)   |    2.9977 |
+-----------------------------------+-----------+
| Average time per output token (s) |    0.1231 |
+-----------------------------------+-----------+
| Average inter-token latency (s)   |    0.1333 |
+-----------------------------------+-----------+
| Average input tokens per request  | 1090      |
+-----------------------------------+-----------+
| Average output tokens per request | 1024      |
+-----------------------------------+-----------+
2025-10-28 10:06:25 - evalscope - INFO:
Percentile results:
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
|     10%     |  2.9331  | 0.1222  |  0.1229  |  128.8109   |     1090     |     1024      |     7.9424     |    16.3968    |
|     25%     |  2.9341  | 0.1226  |  0.123   |  128.8433   |     1090     |     1024      |     7.9426     |    16.3971    |
|     50%     |  2.9441  | 0.1231  |  0.1231  |   128.904   |     1090     |     1024      |     7.9441     |    16.4002    |
|     66%     |  2.9516  | 0.1234  |  0.1231  |  128.9153   |     1090     |     1024      |     7.946      |    16.4042    |
|     75%     |  2.9521  | 0.1236  |  0.1231  |  128.9252   |     1090     |     1024      |     7.9476     |    16.4075    |
|     80%     |  2.9891  | 0.1238  |  0.1231  |  128.9278   |     1090     |     1024      |     7.9496     |    16.4117    |
|     90%     |  3.4695  | 0.1246  |  0.1231  |  129.1473   |     1090     |     1024      |     7.9595     |    16.4321    |
|     95%     |  3.4695  | 0.2466  |  0.1231  |  129.1473   |     1090     |     1024      |     7.9595     |    16.4321    |
|     98%     |  3.4695  | 0.2475  |  0.1231  |  129.1473   |     1090     |     1024      |     7.9595     |    16.4321    |
|     99%     |  3.4695  | 0.2479  |  0.1231  |  129.1473   |     1090     |     1024      |     7.9595     |    16.4321    |
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+
2025-10-28 10:06:25 - evalscope - INFO: Save the summary to: outputs/20251028_094248/Qwen2.5-VL-32B-Instruct-AWQ