Published on 【Feng's Docs】
【Nvidia】VLM Deployment Methods and Benchmark Evaluation on the Jetson Orin Platform
# Jetson Orin Testing

## 2.1 Direct Testing

sglang can be deployed directly from a jetson-containers image, or it can be run directly after installing the Jetson builds of pytorch and related packages. The first option is the more convenient one.

- https://github.com/dusty-nv/jetson-containers
- https://hub.docker.com/r/dustynv/sglang
- https://hub.docker.com/r/dustynv/l4t-pytorch

### 2.1.1 Pull the Image and Create the Container

```
docker pull dustynv/sglang:r36.4-cu128-24.04
```

Once the sglang image has been pulled, start the container. The sglang container from the links above can be used as-is.

```
docker run --runtime nvidia -it --rm -v /data/:/data/ --network=host dustynv/sglang:r36.4-cu128-24.04
```

### 2.1.2 Check sglang

The sglang version can be checked directly:

```
root@tegra-ubuntu:/# pip list |grep sglang
sglang 0.4.7.post1 /opt/venv/lib/python3.12/site-packages pip
```

### 2.1.3 Launch with a Model Prepared in Advance

Test with the Qwen2.5-VL-3B model:

```
python3 -m sglang.launch_server --model-path ./Qwen2.5-VL-3B-Instruct/ --host 0.0.0.0 --port 30000 --enable-metrics
```

### 2.1.4 Reference Performance Metrics

The figures below were recorded while running the opencompass MMLU evaluation (text-only requests first).

- Power: about 27 W (the power mode was set to 25 W; this is the Orin NX setting and has not been changed for now, so it is not the AGX Orin peak power)
- GPU memory (RAM) usage: about 30 GB, fp32 deployment

```
10-23-2025 11:13:25 RAM 56177/62842MB (lfb 1x4MB) SWAP 379/31421MB (cached 0MB) CPU [0%@729,4%@729,2%@729,5%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] GR3D_FREQ 99% cpu@55.843C tboard@44.875C soc2@51.718C tdiode@46C soc0@52.656C gpu@53.531C tj@55.843C soc1@52.718C VDDQ_VDD2_1V8AO 3918mW/3793mW VDD_GPU_SOC 11894mW/11871mW VDD_CPU_CV 0mW/46mW VIN_SYS_5V0 9676mW/9480mW
```

- Tokens output per second:

```
[2025-10-23 11:14:57] Decode batch. #running-req: 8, #token: 1544, token usage: 0.00, cuda graph: True, gen throughput (token/s): 86.95, #queue-req: 0
[2025-10-23 11:15:00] Decode batch. #running-req: 8, #token: 1864, token usage: 0.00, cuda graph: True, gen throughput (token/s): 96.25, #queue-req: 0
[2025-10-23 11:15:04] Decode batch. #running-req: 8, #token: 2184, token usage: 0.00, cuda graph: True, gen throughput (token/s): 96.18, #queue-req: 0
[2025-10-23 11:15:05] INFO: 192.168.1.200:37808 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-10-23 11:15:07] Decode batch. #running-req: 7, #token: 2206, token usage: 0.00, cuda graph: True, gen throughput (token/s): 88.06, #queue-req: 0
[2025-10-23 11:15:09] INFO: 192.168.1.200:43068 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-10-23 11:15:09] INFO: 192.168.1.200:40004 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-10-23 11:15:10] Decode batch. #running-req: 5, #token: 1740, token usage: 0.00, cuda graph: True, gen throughput (token/s): 75.90, #queue-req: 0
```

### 2.1.5 Measured opencompass MMLU Results

These can be compared against the results on a 4090.

- Qwen2.5-VL-3B-Instruct-Jetson-SGLang-API

```
The markdown format results is as below:
| dataset | version | metric | mode | Qwen2.5-VL-3B-Instruct-Jetson-SGLang-API |
|----- | ----- | ----- | ----- | -----|
| lukaemon_mmlu_college_biology | bf6b83 | accuracy | gen | 68.06 |
| lukaemon_mmlu_college_chemistry | bf6b83 | accuracy | gen | 41.00 |
| lukaemon_mmlu_college_computer_science | bf6b83 | accuracy | gen | 52.00 |
| lukaemon_mmlu_college_mathematics | bf6b83 | accuracy | gen | 46.00 |
| lukaemon_mmlu_college_physics | bf6b83 | accuracy | gen | 56.86 |
| lukaemon_mmlu_electrical_engineering | bf6b83 | accuracy | gen | 58.62 |
| lukaemon_mmlu_astronomy | bf6b83 | accuracy | gen | 67.11 |
| lukaemon_mmlu_anatomy | bf6b83 | accuracy | gen | 57.78 |
| lukaemon_mmlu_abstract_algebra | bf6b83 | accuracy | gen | 42.00 |
| lukaemon_mmlu_machine_learning | bf6b83 | accuracy | gen | 50.89 |
| lukaemon_mmlu_clinical_knowledge | bf6b83 | accuracy | gen | 67.55 |
| lukaemon_mmlu_global_facts | bf6b83 | accuracy | gen | 43.00 |
| lukaemon_mmlu_management | bf6b83 | accuracy | gen | 70.87 |
| lukaemon_mmlu_nutrition | bf6b83 | accuracy | gen | 64.38 |
| lukaemon_mmlu_marketing | bf6b83 | accuracy | gen | 79.06 |
| lukaemon_mmlu_professional_accounting | bf6b83 | accuracy | gen | 49.65 |
| lukaemon_mmlu_high_school_geography | bf6b83 | accuracy | gen | 74.75 |
| lukaemon_mmlu_international_law | bf6b83 | accuracy | gen | 72.73 |
| lukaemon_mmlu_moral_scenarios | bf6b83 | accuracy | gen | 47.26 |
| lukaemon_mmlu_computer_security | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_high_school_microeconomics | bf6b83 | accuracy | gen | 74.37 |
| lukaemon_mmlu_professional_law | bf6b83 | accuracy | gen | 41.00 |
| lukaemon_mmlu_medical_genetics | bf6b83 | accuracy | gen | 69.00 |
| lukaemon_mmlu_professional_psychology | bf6b83 | accuracy | gen | 58.82 |
| lukaemon_mmlu_jurisprudence | bf6b83 | accuracy | gen | 71.30 |
| lukaemon_mmlu_world_religions | bf6b83 | accuracy | gen | 78.36 |
| lukaemon_mmlu_philosophy | bf6b83 | accuracy | gen | 64.31 |
| lukaemon_mmlu_virology | bf6b83 | accuracy | gen | 47.59 |
| lukaemon_mmlu_high_school_chemistry | bf6b83 | accuracy | gen | 58.62 |
| lukaemon_mmlu_public_relations | bf6b83 | accuracy | gen | 60.00 |
| lukaemon_mmlu_high_school_macroeconomics | bf6b83 | accuracy | gen | 66.92 |
| lukaemon_mmlu_human_sexuality | bf6b83 | accuracy | gen | 67.94 |
| lukaemon_mmlu_elementary_mathematics | bf6b83 | accuracy | gen | 84.13 |
| lukaemon_mmlu_high_school_physics | bf6b83 | accuracy | gen | 56.95 |
| lukaemon_mmlu_high_school_computer_science | bf6b83 | accuracy | gen | 75.00 |
| lukaemon_mmlu_high_school_european_history | bf6b83 | accuracy | gen | 73.94 |
| lukaemon_mmlu_business_ethics | bf6b83 | accuracy | gen | 64.00 |
| lukaemon_mmlu_moral_disputes | bf6b83 | accuracy | gen | 61.85 |
| lukaemon_mmlu_high_school_statistics | bf6b83 | accuracy | gen | 62.50 |
| lukaemon_mmlu_miscellaneous | bf6b83 | accuracy | gen | 76.12 |
| lukaemon_mmlu_formal_logic | bf6b83 | accuracy | gen | 47.62 |
| lukaemon_mmlu_high_school_government_and_politics | bf6b83 | accuracy | gen | 77.72 |
| lukaemon_mmlu_prehistory | bf6b83 | accuracy | gen | 63.58 |
| lukaemon_mmlu_security_studies | bf6b83 | accuracy | gen | 56.73 |
| lukaemon_mmlu_high_school_biology | bf6b83 | accuracy | gen | 76.45 |
| lukaemon_mmlu_logical_fallacies | bf6b83 | accuracy | gen | 71.78 |
| lukaemon_mmlu_high_school_world_history | bf6b83 | accuracy | gen | 75.11 |
| lukaemon_mmlu_professional_medicine | bf6b83 | accuracy | gen | 61.76 |
| lukaemon_mmlu_high_school_mathematics | bf6b83 | accuracy | gen | 61.11 |
| lukaemon_mmlu_college_medicine | bf6b83 | accuracy | gen | 65.32 |
| lukaemon_mmlu_high_school_us_history | bf6b83 | accuracy | gen | 74.02 |
| lukaemon_mmlu_sociology | bf6b83 | accuracy | gen | 75.62 |
| lukaemon_mmlu_econometrics | bf6b83 | accuracy | gen | 57.02 |
| lukaemon_mmlu_high_school_psychology | bf6b83 | accuracy | gen | 79.63 |
| lukaemon_mmlu_human_aging | bf6b83 | accuracy | gen | 64.13 |
| lukaemon_mmlu_us_foreign_policy | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_conceptual_physics | bf6b83 | accuracy | gen | 67.66 |
```

- Qwen2.5-VL-7B-Instruct-Jetson-SGLang-API

```
| dataset | version | metric | mode | Qwen2.5-VL-7B-Instruct-Jetson-SGLang-API |
|----- | ----- | ----- | ----- | -----|
| lukaemon_mmlu_college_biology | bf6b83 | accuracy | gen | 82.64 |
| lukaemon_mmlu_college_chemistry | bf6b83 | accuracy | gen | 54.00 |
| lukaemon_mmlu_college_computer_science | bf6b83 | accuracy | gen | 59.00 |
| lukaemon_mmlu_college_mathematics | bf6b83 | accuracy | gen | 46.00 |
| lukaemon_mmlu_college_physics | bf6b83 | accuracy | gen | 64.71 |
| lukaemon_mmlu_electrical_engineering | bf6b83 | accuracy | gen | 64.14 |
| lukaemon_mmlu_astronomy | bf6b83 | accuracy | gen | 75.66 |
| lukaemon_mmlu_anatomy | bf6b83 | accuracy | gen | 63.70 |
| lukaemon_mmlu_abstract_algebra | bf6b83 | accuracy | gen | 54.00 |
| lukaemon_mmlu_machine_learning | bf6b83 | accuracy | gen | 53.57 |
| lukaemon_mmlu_clinical_knowledge | bf6b83 | accuracy | gen | 77.36 |
| lukaemon_mmlu_global_facts | bf6b83 | accuracy | gen | 45.00 |
| lukaemon_mmlu_management | bf6b83 | accuracy | gen | 80.58 |
| lukaemon_mmlu_nutrition | bf6b83 | accuracy | gen | 72.22 |
| lukaemon_mmlu_marketing | bf6b83 | accuracy | gen | 87.18 |
| lukaemon_mmlu_professional_accounting | bf6b83 | accuracy | gen | 53.55 |
| lukaemon_mmlu_high_school_geography | bf6b83 | accuracy | gen | 82.32 |
| lukaemon_mmlu_international_law | bf6b83 | accuracy | gen | 73.55 |
| lukaemon_mmlu_moral_scenarios | bf6b83 | accuracy | gen | 40.11 |
| lukaemon_mmlu_computer_security | bf6b83 | accuracy | gen | 72.00 |
| lukaemon_mmlu_high_school_microeconomics | bf6b83 | accuracy | gen | 80.25 |
| lukaemon_mmlu_professional_law | bf6b83 | accuracy | gen | 46.09 |
| lukaemon_mmlu_medical_genetics | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_professional_psychology | bf6b83 | accuracy | gen | 67.65 |
| lukaemon_mmlu_jurisprudence | bf6b83 | accuracy | gen | 71.30 |
| lukaemon_mmlu_world_religions | bf6b83 | accuracy | gen | 80.12 |
| lukaemon_mmlu_philosophy | bf6b83 | accuracy | gen | 71.06 |
| lukaemon_mmlu_virology | bf6b83 | accuracy | gen | 51.81 |
| lukaemon_mmlu_high_school_chemistry | bf6b83 | accuracy | gen | 67.00 |
| lukaemon_mmlu_public_relations | bf6b83 | accuracy | gen | 60.91 |
| lukaemon_mmlu_high_school_macroeconomics | bf6b83 | accuracy | gen | 73.33 |
| lukaemon_mmlu_human_sexuality | bf6b83 | accuracy | gen | 73.28 |
| lukaemon_mmlu_elementary_mathematics | bf6b83 | accuracy | gen | 92.06 |
| lukaemon_mmlu_high_school_physics | bf6b83 | accuracy | gen | 68.21 |
| lukaemon_mmlu_high_school_computer_science | bf6b83 | accuracy | gen | 86.00 |
| lukaemon_mmlu_high_school_european_history | bf6b83 | accuracy | gen | 76.97 |
| lukaemon_mmlu_business_ethics | bf6b83 | accuracy | gen | 67.00 |
| lukaemon_mmlu_moral_disputes | bf6b83 | accuracy | gen | 66.76 |
| lukaemon_mmlu_high_school_statistics | bf6b83 | accuracy | gen | 73.15 |
| lukaemon_mmlu_miscellaneous | bf6b83 | accuracy | gen | 84.55 |
| lukaemon_mmlu_formal_logic | bf6b83 | accuracy | gen | 50.00 |
| lukaemon_mmlu_high_school_government_and_politics | bf6b83 | accuracy | gen | 90.67 |
| lukaemon_mmlu_prehistory | bf6b83 | accuracy | gen | 76.23 |
| lukaemon_mmlu_security_studies | bf6b83 | accuracy | gen | 65.31 |
| lukaemon_mmlu_high_school_biology | bf6b83 | accuracy | gen | 85.16 |
| lukaemon_mmlu_logical_fallacies | bf6b83 | accuracy | gen | 77.30 |
| lukaemon_mmlu_high_school_world_history | bf6b83 | accuracy | gen | 73.42 |
| lukaemon_mmlu_professional_medicine | bf6b83 | accuracy | gen | 73.16 |
| lukaemon_mmlu_high_school_mathematics | bf6b83 | accuracy | gen | 60.00 |
| lukaemon_mmlu_college_medicine | bf6b83 | accuracy | gen | 71.68 |
| lukaemon_mmlu_high_school_us_history | bf6b83 | accuracy | gen | 78.92 |
| lukaemon_mmlu_sociology | bf6b83 | accuracy | gen | 79.60 |
| lukaemon_mmlu_econometrics | bf6b83 | accuracy | gen | 54.39 |
| lukaemon_mmlu_high_school_psychology | bf6b83 | accuracy | gen | 88.44 |
| lukaemon_mmlu_human_aging | bf6b83 | accuracy | gen | 72.20 |
| lukaemon_mmlu_us_foreign_policy | bf6b83 | accuracy | gen | 82.00 |
| lukaemon_mmlu_conceptual_physics | bf6b83 | accuracy | gen | 78.72 |
```

- Qwen2.5-VL-7B-Instruct-AWQ-Jetson-SGLang-API

```
The markdown format results is as below:
| dataset | version | metric | mode | Qwen2.5-VL-7B-Instruct-AWQ-Jetson-SGLang-API |
|----- | ----- | ----- | ----- | -----|
| lukaemon_mmlu_college_biology | bf6b83 | accuracy | gen | 80.56 |
| lukaemon_mmlu_college_chemistry | bf6b83 | accuracy | gen | 51.00 |
| lukaemon_mmlu_college_computer_science | bf6b83 | accuracy | gen | 63.00 |
| lukaemon_mmlu_college_mathematics | bf6b83 | accuracy | gen | 43.00 |
| lukaemon_mmlu_college_physics | bf6b83 | accuracy | gen | 66.67 |
| lukaemon_mmlu_electrical_engineering | bf6b83 | accuracy | gen | 66.21 |
| lukaemon_mmlu_astronomy | bf6b83 | accuracy | gen | 71.71 |
| lukaemon_mmlu_anatomy | bf6b83 | accuracy | gen | 59.26 |
| lukaemon_mmlu_abstract_algebra | bf6b83 | accuracy | gen | 49.00 |
| lukaemon_mmlu_machine_learning | bf6b83 | accuracy | gen | 50.89 |
| lukaemon_mmlu_clinical_knowledge | bf6b83 | accuracy | gen | 74.34 |
| lukaemon_mmlu_global_facts | bf6b83 | accuracy | gen | 46.00 |
| lukaemon_mmlu_management | bf6b83 | accuracy | gen | 78.64 |
| lukaemon_mmlu_nutrition | bf6b83 | accuracy | gen | 73.20 |
| lukaemon_mmlu_marketing | bf6b83 | accuracy | gen | 85.90 |
| lukaemon_mmlu_professional_accounting | bf6b83 | accuracy | gen | 54.26 |
| lukaemon_mmlu_high_school_geography | bf6b83 | accuracy | gen | 82.32 |
| lukaemon_mmlu_international_law | bf6b83 | accuracy | gen | 70.25 |
| lukaemon_mmlu_moral_scenarios | bf6b83 | accuracy | gen | 46.82 |
| lukaemon_mmlu_computer_security | bf6b83 | accuracy | gen | 68.00 |
| lukaemon_mmlu_high_school_microeconomics | bf6b83 | accuracy | gen | 78.15 |
| lukaemon_mmlu_professional_law | bf6b83 | accuracy | gen | 44.39 |
| lukaemon_mmlu_medical_genetics | bf6b83 | accuracy | gen | 76.00 |
| lukaemon_mmlu_professional_psychology | bf6b83 | accuracy | gen | 65.20 |
| lukaemon_mmlu_jurisprudence | bf6b83 | accuracy | gen | 71.30 |
| lukaemon_mmlu_world_religions | bf6b83 | accuracy | gen | 77.19 |
| lukaemon_mmlu_philosophy | bf6b83 | accuracy | gen | 65.59 |
| lukaemon_mmlu_virology | bf6b83 | accuracy | gen | 47.59 |
| lukaemon_mmlu_high_school_chemistry | bf6b83 | accuracy | gen | 63.55 |
| lukaemon_mmlu_public_relations | bf6b83 | accuracy | gen | 60.00 |
| lukaemon_mmlu_high_school_macroeconomics | bf6b83 | accuracy | gen | 73.33 |
| lukaemon_mmlu_human_sexuality | bf6b83 | accuracy | gen | 77.10 |
| lukaemon_mmlu_elementary_mathematics | bf6b83 | accuracy | gen | 90.74 |
| lukaemon_mmlu_high_school_physics | bf6b83 | accuracy | gen | 60.26 |
| lukaemon_mmlu_high_school_computer_science | bf6b83 | accuracy | gen | 82.00 |
| lukaemon_mmlu_high_school_european_history | bf6b83 | accuracy | gen | 67.88 |
| lukaemon_mmlu_business_ethics | bf6b83 | accuracy | gen | 68.00 |
| lukaemon_mmlu_moral_disputes | bf6b83 | accuracy | gen | 65.32 |
| lukaemon_mmlu_high_school_statistics | bf6b83 | accuracy | gen | 70.37 |
| lukaemon_mmlu_miscellaneous | bf6b83 | accuracy | gen | 82.12 |
| lukaemon_mmlu_formal_logic | bf6b83 | accuracy | gen | 44.44 |
| lukaemon_mmlu_high_school_government_and_politics | bf6b83 | accuracy | gen | 87.56 |
| lukaemon_mmlu_prehistory | bf6b83 | accuracy | gen | 72.53 |
| lukaemon_mmlu_security_studies | bf6b83 | accuracy | gen | 63.67 |
| lukaemon_mmlu_high_school_biology | bf6b83 | accuracy | gen | 81.61 |
| lukaemon_mmlu_logical_fallacies | bf6b83 | accuracy | gen | 74.23 |
| lukaemon_mmlu_high_school_world_history | bf6b83 | accuracy | gen | 70.46 |
| lukaemon_mmlu_professional_medicine | bf6b83 | accuracy | gen | 70.22 |
| lukaemon_mmlu_high_school_mathematics | bf6b83 | accuracy | gen | 55.56 |
| lukaemon_mmlu_college_medicine | bf6b83 | accuracy | gen | 68.79 |
| lukaemon_mmlu_high_school_us_history | bf6b83 | accuracy | gen | 75.00 |
| lukaemon_mmlu_sociology | bf6b83 | accuracy | gen | 72.64 |
| lukaemon_mmlu_econometrics | bf6b83 | accuracy | gen | 48.25 |
| lukaemon_mmlu_high_school_psychology | bf6b83 | accuracy | gen | 88.07 |
| lukaemon_mmlu_human_aging | bf6b83 | accuracy | gen | 67.71 |
| lukaemon_mmlu_us_foreign_policy | bf6b83 | accuracy | gen | 78.00 |
| lukaemon_mmlu_conceptual_physics | bf6b83 | accuracy | gen | 74.89 |
```

## 2.2 Comparison of Measured Results with SGLang Deployments

- Stress testing was done with EvalScope using the command below: a single process (`--parallel 1`) sends 10 requests (`--number 10`) carrying multimodal data (`--dataset random_vl`). Results over a direct Ethernet link differ little from local requests.

```
evalscope perf \
  --parallel 1 \
  --number 10 \
  --model Qwen2.5-VL-7B-Instruct \
  --url http://192.168.1.102:30000/v1/chat/completions \
  --api openai --dataset random_vl --max-tokens 1024 --min-tokens 1024 \
  --prefix-length 0 --min-prompt-length 1024 --max-prompt-length 1024 \
  --tokenizer-path /mnt/fd9ef272-d51b-4896-bfc8-9beaa52ae4a5/dingfeng1/Qwen2.5-VL-7B-Instruct/ \
  --extra-args '{"ignore_eos": true}'
```

- AGX nvpmodel

## 2.3 Jetson AGX Orin and Thor Large-Model Deployment Performance Analysis

https://elinux.org/Jetson/L4T/Jetson_AI_Stack#AGX_Orin

The detailed test report is available at the link above. Note that Nvidia's official figures analyze speed only and report output in token/s. Because they are w4a16 results ("The table is run with VLLM with quantization=w4a16, max concurrency = 8, input seq len = 2048 and output seq len = 128."), no accuracy figures are shown. Reproducing that performance requires following the same quantization strategy strictly; otherwise the published numbers are hard to reach.

# 3 Conclusions

- The SGLang framework can deploy VLMs, and combining it with Prometheus + Grafana gives a full view of backend performance.
- SGLang 0.5.4 provides some offline quantization scripts for converting models. Without conversion, a model that has no `quantization_config` may not support the offline quantization option.
- In practice, opencompass + VLMEvalKit can be used to build custom data pipelines for evaluating in-house models (mainly quantized, deployment-optimized versions; multiple workflows can be created).
- Jetson Orin NX 16GB can deploy Qwen2.5 VL 3B or 7B quantized models. Judging by the metrics, the overall loss is small on the less complex subjects, although all quantized runs were based on the official AWQ models. Qwen3 has an official FP8 version, which also runs normally in terms of memory footprint. The remaining memory is used for the KV cache, which can still speed up token output to some extent.
- Jetson AGX Orin 64GB can deploy most of the current mainstream models. Performance can be tuned to real-time requirements, and the measured results are broadly in line with the official figures. Per the official figures, output efficiency for VLMs and LLMs is already good, while output for VLAs remains fairly tight.

# Annex

- https://www.qbitai.com/2025/05/281230.html
- https://github.com/dusty-nv/jetson-containers
- https://github.com/open-compass/VLMEvalKit/blob/main/run.py
- https://rank.opencompass.org.cn/leaderboard-multimodal
- https://opencompass.readthedocs.io/zh-cn/latest/get_started/quick_start.html
- https://rank.opencompass.org.cn/leaderboard-llm

# Test Results

- Jetson Orin NX 16GB Qwen2.5VL-7B-Instruct-AWQ

2025-10-28 11:13:09 - evalscope - INFO: Test connection successful.
2025-10-28 11:13:09 - evalscope - INFO: Save the data base to: outputs/20251028_111152/Qwen2.5-VL-7B-Instruct/benchmark_data.db

2025-10-28 11:26:17 - evalscope - INFO: { "Time taken for tests (s)": 787.4081, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 14.4455, "Total token throughput (tok/s)": 29.822, "Request throughput (req/s)": 0.0141, "Average latency (s)": 78.7394, "Average time to first token (s)": 2.1271, "Average time per output token (s)": 0.0749, "Average inter-token latency (s)": 0.083, "Average input tokens per request": 1090.0, "Average output tokens per request": 1024.0 }

2025-10-28 11:26:17 - evalscope - INFO: Benchmarking summary:

| Key | Value |
| --- | --- |
| Time taken for tests (s) | 787.408 |
| Number of concurrency | 1 |
| Total requests | 10 |
| Succeed requests | 10 |
| Failed requests | 0 |
| Output token throughput (tok/s) | 14.4455 |
| Total token throughput (tok/s) | 29.822 |
| Request throughput (req/s) | 0.0141 |
| Average latency (s) | 78.7394 |
| Average time to first token (s) | 2.1271 |
| Average time per output token (s) | 0.0749 |
| Average inter-token latency (s) | 0.083 |
| Average input tokens per request | 1090 |
| Average output tokens per request | 1024 |

2025-10-28 11:26:17 - evalscope - INFO: Percentile results:

| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10% | 1.9576 | 0.0743 | 0.0748 | 78.5232 | 1090 | 1024 | 13.0223 | 26.8839 |
| 25% | 1.9585 | 0.0746 | 0.0748 | 78.5264 | 1090 | 1024 | 13.0249 | 26.8894 |
| 50% | 1.9605 | 0.0749 | 0.0749 | 78.6093 | 1090 | 1024 | 13.028 | 26.8957 |
| 66% | 1.9605 | 0.0751 | 0.0749 | 78.6145 | 1090 | 1024 | 13.0386 | 26.9177 |
| 75% | 1.9625 | 0.0753 | 0.0749 | 78.6184 | 1090 | 1024 | 13.0402 | 26.9209 |
| 80% | 1.9639 | 0.0754 | 0.0749 | 78.6345 | 1090 | 1024 | 13.0407 | 26.922 |
| 90% | 3.634 | 0.1492 | 0.075 | 80.2125 | 1090 | 1024 | 13.0414 | 26.9233 |
| 95% | 3.634 | 0.15 | 0.075 | 80.2125 | 1090 | 1024 | 13.0414 | 26.9233 |
| 98% | 3.634 | 0.1504 | 0.075 | 80.2125 | 1090 | 1024 | 13.0414 | 26.9233 |
| 99% | 3.634 | 0.1506 | 0.075 | 80.2125 | 1090 | 1024 | 13.0414 | 26.9233 |

2025-10-28 11:26:17 - evalscope - INFO: Save the summary to: outputs/20251028_111152/Qwen2.5-VL-7B-Instruct

- Jetson Orin NX 16GB Qwen2.5VL-3B-Instruct-AWQ

2025-10-28 07:31:15 - evalscope - INFO: Test connection successful.

2025-10-28 07:31:16 - evalscope - INFO: Save the data base to: outputs/20251028_073037/Qwen2.5-VL-3B-Instruct-AWQ/benchmark_data.db

2025-10-28 07:37:21 - evalscope - INFO: { "Time taken for tests (s)": 364.9373, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 31.1624, "Total token throughput (tok/s)": 64.3303, "Request throughput (req/s)": 0.0304, "Average latency (s)": 36.4749, "Average time to first token (s)": 1.1994, "Average time per output token (s)": 0.0345, "Average inter-token latency (s)": 0.0345, "Average input tokens per request": 1089.9, "Average output tokens per request": 1024.0 }

2025-10-28 07:37:21 - evalscope - INFO: Benchmarking summary:

| Key | Value |
| --- | --- |
| Time taken for tests (s) | 364.937 |
| Number of concurrency | 1 |
| Total requests | 10 |
| Succeed requests | 10 |
| Failed requests | 0 |
| Output token throughput (tok/s) | 31.1624 |
| Total token throughput (tok/s) | 64.3303 |
| Request throughput (req/s) | 0.0304 |
| Average latency (s) | 36.4749 |
| Average time to first token (s) | 1.1994 |
| Average time per output token (s) | 0.0345 |
| Average inter-token latency (s) | 0.0345 |
| Average input tokens per request | 1089.9 |
| Average output tokens per request | 1024 |

2025-10-28 07:37:21 - evalscope - INFO: Percentile results:

| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10% | 1.0108 | 0.034 | 0.0345 | 36.2704 | 1090 | 1024 | 28.1535 | 58.1216 |
| 25% | 1.0109 | 0.0342 | 0.0345 | 36.2841 | 1090 | 1024 | 28.1826 | 58.1786 |
| 50% | 1.0226 | 0.0344 | 0.0345 | 36.3192 | 1090 | 1024 | 28.2138 | 58.246 |
| 66% | 1.0236 | 0.0346 | 0.0345 | 36.3315 | 1090 | 1024 | 28.2213 | 58.2616 |
| 75% | 1.0272 | 0.0347 | 0.0345 | 36.3345 | 1090 | 1024 | 28.2217 | 58.2624 |
| 80% | 1.1157 | 0.0348 | 0.0345 | 36.372 | 1090 | 1024 | 28.2324 | 58.2845 |
| 90% | 2.7487 | 0.035 | 0.0345 | 37.9942 | 1090 | 1024 | 28.237 | 58.2941 |
| 95% | 2.7487 | 0.0353 | 0.0345 | 37.9942 | 1090 | 1024 | 28.237 | 58.2941 |
| 98% | 2.7487 | 0.0357 | 0.0345 | 37.9942 | 1090 | 1024 | 28.237 | 58.2941 |
| 99% | 2.7487 | 0.0359 | 0.0345 | 37.9942 | 1090 | 1024 | 28.237 | 58.2941 |

2025-10-28 07:37:21 - evalscope - INFO: Save the summary to: outputs/20251028_073037/Qwen2.5-VL-3B-Instruct-AWQ

- Jetson Orin NX 16GB Qwen2.5VL-3B-Instruct

2025-10-28 06:44:50 - evalscope - INFO: Test connection successful.

2025-10-28 06:44:51 - evalscope - INFO: Save the data base to: outputs/20251028_064206/Qwen2.5-VL-3B-Instruct/benchmark_data.db

2025-10-28 07:12:20 - evalscope - INFO: { "Time taken for tests (s)": 1648.7701, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 13.8011, "Total token throughput (tok/s)": 28.028, "Request throughput (req/s)": 0.0067, "Average latency (s)": 164.8485, "Average time to first token (s)": 2.3873, "Average time per output token (s)": 0.0794, "Average inter-token latency (s)": 0.0847, "Average input tokens per request": 2111.2, "Average output tokens per request": 2048.0 }

2025-10-28 07:12:20 - evalscope - INFO: Benchmarking summary:

| Key | Value |
| --- | --- |
| Time taken for tests (s) | 1648.77 |
| Number of concurrency | 1 |
| Total requests | 10 |
| Succeed requests | 10 |
| Failed requests | 0 |
| Output token throughput (tok/s) | 13.8011 |
| Total token throughput (tok/s) | 28.028 |
| Request throughput (req/s) | 0.0067 |
| Average latency (s) | 164.849 |
| Average time to first token (s) | 2.3873 |
| Average time per output token (s) | 0.0794 |
| Average inter-token latency (s) | 0.0847 |
| Average input tokens per request | 2111.2 |
| Average output tokens per request | 2048 |

2025-10-28 07:12:20 - evalscope - INFO: Percentile results:

| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10% | 2.3275 | 0.0788 | 0.0793 | 164.7491 | 2114 | 2048 | 12.4184 | 25.2371 |
| 25% | 2.3319 | 0.0791 | 0.0793 | 164.7856 | 2114 | 2048 | 12.4231 | 25.2466 |
| 50% | 2.3454 | 0.0794 | 0.0794 | 164.8226 | 2114 | 2048 | 12.4272 | 25.2548 |
| 66% | 2.35 | 0.0796 | 0.0794 | 164.8417 | 2114 | 2048 | 12.4274 | 25.2554 |
| 75% | 2.3667 | 0.0797 | 0.0794 | 164.8539 | 2114 | 2048 | 12.4283 | 25.2571 |
| 80% | 2.4258 | 0.0798 | 0.0794 | 164.9161 | 2114 | 2048 | 12.431 | 25.2627 |
| 90% | 2.7384 | 0.0804 | 0.0794 | 165.2194 | 2114 | 2048 | 12.4347 | 25.2701 |
| 95% | 2.7384 | 0.1585 | 0.0794 | 165.2194 | 2114 | 2048 | 12.4347 | 25.2701 |
| 98% | 2.7384 | 0.1592 | 0.0794 | 165.2194 | 2114 | 2048 | 12.4347 | 25.2701 |
| 99% | 2.7384 | 0.1595 | 0.0794 | 165.2194 | 2114 | 2048 | 12.4347 | 25.2701 |

2025-10-28 07:12:20 - evalscope - INFO: Save the summary to: outputs/20251028_064206/Qwen2.5-VL-3B-Instruct

- Jetson AGX Orin 64GB Qwen2.5VL-7B-Instruct

2025-10-28 08:26:55 - evalscope - INFO: Test connection successful.

2025-10-28 08:26:56 - evalscope - INFO: Save the data base to: outputs/20251028_082520/Qwen2.5-VL-7B-Instruct/benchmark_data.db

2025-10-28 08:42:28 - evalscope - INFO: { "Time taken for tests (s)": 931.9475, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 12.2087, "Total token throughput (tok/s)": 25.2031, "Request throughput (req/s)": 0.0119, "Average latency (s)": 93.1783, "Average time to first token (s)": 1.2425, "Average time per output token (s)": 0.0899, "Average inter-token latency (s)": 0.0954, "Average input tokens per request": 1089.9, "Average output tokens per request": 1024.0 }

2025-10-28 08:42:28 - evalscope - INFO: Benchmarking summary:

| Key | Value |
| --- | --- |
| Time taken for tests (s) | 931.947 |
| Number of concurrency | 1 |
| Total requests | 10 |
| Succeed requests | 10 |
| Failed requests | 0 |
| Output token throughput (tok/s) | 12.2087 |
| Total token throughput (tok/s) | 25.2031 |
| Request throughput (req/s) | 0.0119 |
| Average latency (s) | 93.1783 |
| Average time to first token (s) | 1.2425 |
| Average time per output token (s) | 0.0899 |
| Average inter-token latency (s) | 0.0954 |
| Average input tokens per request | 1089.9 |
| Average output tokens per request | 1024 |

2025-10-28 08:42:28 - evalscope - INFO: Percentile results:

| Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10% | 1.165 | 0.0893 | 0.0898 | 93.0585 | 1090 | 1024 | 10.9847 | 22.6667 |
| 25% | 1.1651 | 0.0896 | 0.0898 | 93.0792 | 1090 | 1024 | 10.9869 | 22.6819 |
| 50% | 1.1766 | 0.0899 | 0.0899 | 93.1483 | 1090 | 1024 | 10.9938 | 22.6962 |
| 66% | 1.1793 | 0.0901 | 0.0899 | 93.1854 | 1090 | 1024 | 10.9955 | 22.6996 |
| 75% | 1.1844 | 0.0903 | 0.0899 | 93.2022 | 1090 | 1024 | 11.0014 | 22.7118 |
| 80% | 1.2404 | 0.0904 | 0.0899 | 93.2204 | 1090 | 1024 | 11.0038 | 22.7169 |
| 90% | 1.806 | 0.0908 | 0.09 | 93.563 | 1090 | 1024 | 11.0045 | 22.7182 |
| 95% | 1.806 | 0.1793 | 0.09 | 93.563 | 1090 | 1024 | 11.0045 | 22.7182 |
| 98% | 1.806 | 0.1801 | 0.09 | 93.563 | 1090 | 1024 | 11.0045 | 22.7182 |
| 99% | 1.806 | 0.1805 | 0.09 | 93.563 | 1090 | 1024 | 11.0045 | 22.7182 |

2025-10-28 08:42:28 - evalscope - INFO: Save the
summary to: outputs/20251028_082520/Qwen2.5-VL-7B-Instruct - Jetson AGX Orin 64GB Qwen2.5VL-3B-Instruct 2025-10-28 09:10:11 - evalscope - INFO: Test connection successful. 2025-10-28 09:10:12 - evalscope - INFO: Save the data base to: outputs/20251028_090923/Qwen2.5-VL-3B-Instruct/benchmark_data.db Processing: 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 9/10 [06:51<00:45, 45.71s/it]2025-10-28 09:17:49 - evalscope - INFO: { "Time taken for tests (s)": 457.2854, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 24.8763, "Total token throughput (tok/s)": 51.3535, "Request throughput (req/s)": 0.0243, "Average latency (s)": 45.7115, "Average time to first token (s)": 0.6067, "Average time per output token (s)": 0.0441, "Average inter-token latency (s)": 0.044, "Average input tokens per request": 1089.9, "Average output tokens per request": 1024.0 } Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [07:37<00:00, 45.74s/it] 2025-10-28 09:17:49 - evalscope - INFO: Benchmarking summary: +-----------------------------------+-----------+ | Key | Value | +===================================+===========+ | Time taken for tests (s) | 457.285 | +-----------------------------------+-----------+ | Number of concurrency | 1 | +-----------------------------------+-----------+ | Total requests | 10 | +-----------------------------------+-----------+ | Succeed requests | 10 | +-----------------------------------+-----------+ | Failed requests | 0 | +-----------------------------------+-----------+ | Output token throughput (tok/s) | 24.8763 | +-----------------------------------+-----------+ | Total token throughput (tok/s) | 51.3535 | +-----------------------------------+-----------+ | Request throughput (req/s) 
| 0.0243 | +-----------------------------------+-----------+ | Average latency (s) | 45.7115 | +-----------------------------------+-----------+ | Average time to first token (s) | 0.6067 | +-----------------------------------+-----------+ | Average time per output token (s) | 0.0441 | +-----------------------------------+-----------+ | Average inter-token latency (s) | 0.044 | +-----------------------------------+-----------+ | Average input tokens per request | 1089.9 | +-----------------------------------+-----------+ | Average output tokens per request | 1024 | +-----------------------------------+-----------+ 2025-10-28 09:17:49 - evalscope - INFO: Percentile results: +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | 10% | 0.5522 | 0.0436 | 0.0441 | 45.6268 | 1090 | 1024 | 22.3743 | 46.1689 | | 25% | 0.5545 | 0.0438 | 0.0441 | 45.6311 | 1090 | 1024 | 22.4109 | 46.2662 | | 50% | 0.5583 | 0.0441 | 0.0441 | 45.6709 | 1090 | 1024 | 22.4274 | 46.3003 | | 66% | 0.5585 | 0.0442 | 0.0441 | 45.6874 | 1090 | 1024 | 22.4371 | 46.3204 | | 75% | 0.5654 | 0.0444 | 0.0441 | 45.6921 | 1090 | 1024 | 22.4408 | 46.328 | | 80% | 0.6076 | 0.0444 | 0.0441 | 45.7668 | 1090 | 1024 | 22.443 | 46.3325 | | 90% | 1.0066 | 0.0447 | 0.0442 | 46.1366 | 1090 | 1024 | 22.4528 | 46.3528 | | 95% | 1.0066 | 0.0449 | 0.0442 | 46.1366 | 1090 | 1024 | 22.4528 | 46.3528 | | 98% | 1.0066 | 0.0452 | 0.0442 | 46.1366 | 1090 | 1024 | 22.4528 | 46.3528 | | 99% | 1.0066 | 0.0455 | 0.0442 | 46.1366 | 1090 | 1024 | 22.4528 | 46.3528 | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ 2025-10-28 09:17:49 - 
evalscope - INFO: Save the summary to: outputs/20251028_090923/Qwen2.5-VL-3B-Instruct - Jetson AGX Orin 64GB Qwen2.5VL-7B-Instruct-AWQ 2025-10-28 08:49:51 - evalscope - INFO: Test connection successful. 2025-10-28 08:49:51 - evalscope - INFO: Save the data base to: outputs/20251028_084912/Qwen2.5-VL-7B-Instruct-AWQ/benchmark_data.db Processing: 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 9/10 [05:25<00:36, 36.16s/it]2025-10-28 08:55:54 - evalscope - INFO: { "Time taken for tests (s)": 362.1074, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 31.4132, "Total token throughput (tok/s)": 64.7284, "Request throughput (req/s)": 0.0307, "Average latency (s)": 36.1938, "Average time to first token (s)": 0.8838, "Average time per output token (s)": 0.0345, "Average inter-token latency (s)": 0.0359, "Average input tokens per request": 1086.0, "Average output tokens per request": 1024.0 } Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [06:02<00:00, 36.22s/it] 2025-10-28 08:55:54 - evalscope - INFO: Benchmarking summary: +-----------------------------------+-----------+ | Key | Value | +===================================+===========+ | Time taken for tests (s) | 362.107 | +-----------------------------------+-----------+ | Number of concurrency | 1 | +-----------------------------------+-----------+ | Total requests | 10 | +-----------------------------------+-----------+ | Succeed requests | 10 | +-----------------------------------+-----------+ | Failed requests | 0 | +-----------------------------------+-----------+ | Output token throughput (tok/s) | 31.4132 | +-----------------------------------+-----------+ | Total token throughput (tok/s) | 64.7284 | 
+-----------------------------------+-----------+ | Request throughput (req/s) | 0.0307 | +-----------------------------------+-----------+ | Average latency (s) | 36.1938 | +-----------------------------------+-----------+ | Average time to first token (s) | 0.8838 | +-----------------------------------+-----------+ | Average time per output token (s) | 0.0345 | +-----------------------------------+-----------+ | Average inter-token latency (s) | 0.0359 | +-----------------------------------+-----------+ | Average input tokens per request | 1086 | +-----------------------------------+-----------+ | Average output tokens per request | 1024 | +-----------------------------------+-----------+ 2025-10-28 08:55:54 - evalscope - INFO: Percentile results: +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | 10% | 0.8013 | 0.034 | 0.0345 | 36.0776 | 1090 | 1024 | 28.2728 | 57.5523 | | 25% | 0.8075 | 0.0343 | 0.0345 | 36.0778 | 1090 | 1024 | 28.3045 | 58.3678 | | 50% | 0.8141 | 0.0345 | 0.0345 | 36.1312 | 1090 | 1024 | 28.3511 | 58.509 | | 66% | 0.8146 | 0.0347 | 0.0345 | 36.1381 | 1090 | 1024 | 28.3599 | 58.5296 | | 75% | 0.8195 | 0.0348 | 0.0345 | 36.178 | 1090 | 1024 | 28.3831 | 58.5477 | | 80% | 0.9132 | 0.0349 | 0.0346 | 36.2186 | 1090 | 1024 | 28.3833 | 58.5956 | | 90% | 1.4437 | 0.0352 | 0.0346 | 36.8545 | 1090 | 1024 | 28.4154 | 58.596 | | 95% | 1.4437 | 0.0358 | 0.0346 | 36.8545 | 1090 | 1024 | 28.4154 | 58.596 | | 98% | 1.4437 | 0.0691 | 0.0346 | 36.8545 | 1090 | 1024 | 28.4154 | 58.596 | | 99% | 1.4437 | 0.0694 | 0.0346 | 36.8545 | 1090 | 1024 | 28.4154 | 58.596 | 
+-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ 2025-10-28 08:55:54 - evalscope - INFO: Save the summary to: outputs/20251028_084912/Qwen2.5-VL-7B-Instruct-AWQ - Jetson AGX Orin 64GB Qwen2.5VL-3B-Instruct-AWQ 2025-10-28 09:02:24 - evalscope - INFO: Test connection successful. 2025-10-28 09:02:25 - evalscope - INFO: Save the data base to: outputs/20251028_090200/Qwen2.5-VL-3B-Instruct-AWQ/benchmark_data.db Processing: 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 9/10 [03:10<00:21, 21.15s/it]2025-10-28 09:05:57 - evalscope - INFO: { "Time taken for tests (s)": 211.5724, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 53.7527, "Total token throughput (tok/s)": 110.97, "Request throughput (req/s)": 0.0525, "Average latency (s)": 21.1406, "Average time to first token (s)": 0.5731, "Average time per output token (s)": 0.0201, "Average inter-token latency (s)": 0.0201, "Average input tokens per request": 1090.0, "Average output tokens per request": 1024.0 } Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [03:31<00:00, 21.17s/it] 2025-10-28 09:05:57 - evalscope - INFO: Benchmarking summary: +-----------------------------------+-----------+ | Key | Value | +===================================+===========+ | Time taken for tests (s) | 211.572 | +-----------------------------------+-----------+ | Number of concurrency | 1 | +-----------------------------------+-----------+ | Total requests | 10 | +-----------------------------------+-----------+ | Succeed requests | 10 | +-----------------------------------+-----------+ | Failed requests | 0 | +-----------------------------------+-----------+ | Output token throughput 
(tok/s) | 53.7527 | +-----------------------------------+-----------+ | Total token throughput (tok/s) | 110.97 | +-----------------------------------+-----------+ | Request throughput (req/s) | 0.0525 | +-----------------------------------+-----------+ | Average latency (s) | 21.1406 | +-----------------------------------+-----------+ | Average time to first token (s) | 0.5731 | +-----------------------------------+-----------+ | Average time per output token (s) | 0.0201 | +-----------------------------------+-----------+ | Average inter-token latency (s) | 0.0201 | +-----------------------------------+-----------+ | Average input tokens per request | 1090 | +-----------------------------------+-----------+ | Average output tokens per request | 1024 | +-----------------------------------+-----------+ 2025-10-28 09:05:57 - evalscope - INFO: Percentile results: +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | 10% | 0.4997 | 0.0197 | 0.0201 | 21.0531 | 1090 | 1024 | 48.2363 | 99.5815 | | 25% | 0.5061 | 0.0199 | 0.0201 | 21.0533 | 1090 | 1024 | 48.4606 | 100.0447 | | 50% | 0.5137 | 0.0201 | 0.0201 | 21.0881 | 1090 | 1024 | 48.5731 | 100.2768 | | 66% | 0.5175 | 0.0202 | 0.0201 | 21.0952 | 1090 | 1024 | 48.6303 | 100.3951 | | 75% | 0.5212 | 0.0203 | 0.0201 | 21.1306 | 1090 | 1024 | 48.6385 | 100.4118 | | 80% | 0.6271 | 0.0204 | 0.0201 | 21.2288 | 1090 | 1024 | 48.6389 | 100.4128 | | 90% | 1.0261 | 0.0206 | 0.0202 | 21.5677 | 1090 | 1024 | 48.6433 | 100.4219 | | 95% | 1.0261 | 0.0207 | 0.0202 | 21.5677 | 1090 | 1024 | 48.6433 | 100.4219 | | 98% | 1.0261 | 0.0209 | 0.0202 | 21.5677 | 1090 | 1024 | 48.6433 | 100.4219 | | 99% | 1.0261 | 0.0211 | 
0.0202 | 21.5677 | 1090 | 1024 | 48.6433 | 100.4219 | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ 2025-10-28 09:05:57 - evalscope - INFO: Save the summary to: outputs/20251028_090200/Qwen2.5-VL-3B-Instruct-AWQ - Qwen2.5-VL-32B-Instruct-AWQ 2025-10-28 09:44:55 - evalscope - INFO: Test connection successful. 2025-10-28 09:44:56 - evalscope - INFO: Save the data base to: outputs/20251028_094248/Qwen2.5-VL-32B-Instruct-AWQ/benchmark_data.db Processing: 90%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 9/10 [19:20<02:08, 128.92s/it]2025-10-28 10:06:25 - evalscope - INFO: { "Time taken for tests (s)": 1289.0551, "Number of concurrency": 1, "Total requests": 10, "Succeed requests": 10, "Failed requests": 0, "Output token throughput (tok/s)": 8.8265, "Total token throughput (tok/s)": 18.2219, "Request throughput (req/s)": 0.0086, "Average latency (s)": 128.8895, "Average time to first token (s)": 2.9977, "Average time per output token (s)": 0.1231, "Average inter-token latency (s)": 0.1333, "Average input tokens per request": 1090.0, "Average output tokens per request": 1024.0 } Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [21:29<00:00, 128.92s/it] 2025-10-28 10:06:25 - evalscope - INFO: Benchmarking summary: +-----------------------------------+-----------+ | Key | Value | +===================================+===========+ | Time taken for tests (s) | 1289.06 | +-----------------------------------+-----------+ | Number of concurrency | 1 | +-----------------------------------+-----------+ | Total requests | 10 | +-----------------------------------+-----------+ | Succeed requests | 10 | +-----------------------------------+-----------+ | Failed requests | 0 | 
+-----------------------------------+-----------+ | Output token throughput (tok/s) | 8.8265 | +-----------------------------------+-----------+ | Total token throughput (tok/s) | 18.2219 | +-----------------------------------+-----------+ | Request throughput (req/s) | 0.0086 | +-----------------------------------+-----------+ | Average latency (s) | 128.889 | +-----------------------------------+-----------+ | Average time to first token (s) | 2.9977 | +-----------------------------------+-----------+ | Average time per output token (s) | 0.1231 | +-----------------------------------+-----------+ | Average inter-token latency (s) | 0.1333 | +-----------------------------------+-----------+ | Average input tokens per request | 1090 | +-----------------------------------+-----------+ | Average output tokens per request | 1024 | +-----------------------------------+-----------+ 2025-10-28 10:06:25 - evalscope - INFO: Percentile results: +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | Percentiles | TTFT (s) | ITL (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Output (tok/s) | Total (tok/s) | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ | 10% | 2.9331 | 0.1222 | 0.1229 | 128.8109 | 1090 | 1024 | 7.9424 | 16.3968 | | 25% | 2.9341 | 0.1226 | 0.123 | 128.8433 | 1090 | 1024 | 7.9426 | 16.3971 | | 50% | 2.9441 | 0.1231 | 0.1231 | 128.904 | 1090 | 1024 | 7.9441 | 16.4002 | | 66% | 2.9516 | 0.1234 | 0.1231 | 128.9153 | 1090 | 1024 | 7.946 | 16.4042 | | 75% | 2.9521 | 0.1236 | 0.1231 | 128.9252 | 1090 | 1024 | 7.9476 | 16.4075 | | 80% | 2.9891 | 0.1238 | 0.1231 | 128.9278 | 1090 | 1024 | 7.9496 | 16.4117 | | 90% | 3.4695 | 0.1246 | 0.1231 | 129.1473 | 1090 | 1024 | 7.9595 | 16.4321 | | 95% | 3.4695 | 0.2466 | 0.1231 | 129.1473 | 1090 | 1024 | 7.9595 | 16.4321 | | 98% | 3.4695 | 0.2475 | 0.1231 | 
129.1473 | 1090 | 1024 | 7.9595 | 16.4321 | | 99% | 3.4695 | 0.2479 | 0.1231 | 129.1473 | 1090 | 1024 | 7.9595 | 16.4321 | +-------------+----------+---------+----------+-------------+--------------+---------------+----------------+---------------+ 2025-10-28 10:06:25 - evalscope - INFO: Save the summary to: outputs/20251028_094248/Qwen2.5-VL-32B-Instruct-AWQ
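
The five benchmark summaries above can be cross-checked with a little arithmetic. The sketch below is not part of the original test scripts. It assumes that end-to-end latency is roughly TTFT plus one TPOT for every token after the first, and that each request generated 1024 output tokens, as the logs report. The `runs` dictionary and the helper names are illustrative; the numbers are copied from the "Benchmarking summary" tables.

```python
# Averages copied from the evalscope "Benchmarking summary" tables (seconds).
runs = {
    "Qwen2.5-VL-7B-Instruct":      {"latency": 93.1783,  "ttft": 1.2425, "tpot": 0.0899},
    "Qwen2.5-VL-3B-Instruct":      {"latency": 45.7115,  "ttft": 0.6067, "tpot": 0.0441},
    "Qwen2.5-VL-7B-Instruct-AWQ":  {"latency": 36.1938,  "ttft": 0.8838, "tpot": 0.0345},
    "Qwen2.5-VL-3B-Instruct-AWQ":  {"latency": 21.1406,  "ttft": 0.5731, "tpot": 0.0201},
    "Qwen2.5-VL-32B-Instruct-AWQ": {"latency": 128.8895, "ttft": 2.9977, "tpot": 0.1231},
}
OUTPUT_TOKENS = 1024  # "Average output tokens per request" in every run


def estimated_latency(ttft, tpot, n_out=OUTPUT_TOKENS):
    """Assumed model: first token costs TTFT, each later token costs TPOT."""
    return ttft + tpot * (n_out - 1)


def per_request_tok_s(latency, n_out=OUTPUT_TOKENS):
    """Output tokens per second as seen by a single request."""
    return n_out / latency


def awq_speedup(base, quantized):
    """End-to-end latency ratio between a base model and its AWQ build."""
    return runs[base]["latency"] / runs[quantized]["latency"]


for name, r in runs.items():
    est = estimated_latency(r["ttft"], r["tpot"])
    print(f"{name}: reported {r['latency']:.2f}s, estimated {est:.2f}s, "
          f"{per_request_tok_s(r['latency']):.2f} tok/s per request")

print(f"7B AWQ speedup: {awq_speedup('Qwen2.5-VL-7B-Instruct', 'Qwen2.5-VL-7B-Instruct-AWQ'):.2f}x")
print(f"3B AWQ speedup: {awq_speedup('Qwen2.5-VL-3B-Instruct', 'Qwen2.5-VL-3B-Instruct-AWQ'):.2f}x")
```

With these figures the estimate stays within about 0.1% of the reported average latency for every run, and the AWQ builds come out roughly 2.6x (7B) and 2.2x (3B) faster end to end. The per-request rate (about 11 tok/s for the 7B model) lines up with the `Output (tok/s)` percentile column rather than with the aggregate `Output token throughput` in the summary; the logs do not say why the two differ.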
dingfeng
October 28, 2025, 18:10