Task

Detailed breakdown of individual task performance across different models.

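| Task Name (10 tasks) | gemini-3.1-pro | glm-4.7 | gpt-5.2-codex |
|---|---|---|---|
| | 90.5s | 148.2s | 149.6s |
| | 37.1s | 43.3s | 33.1s |
| | 194.3s | 144.7s | 91.6s |
| | 86.4s | 84.9s | 55.6s |
| | 228.1s | 119.2s | 123.1s |
| | 56.1s | 36.2s | 35.2s |
| | 64.0s | 54.6s | 46.5s |
| | 36.2s | 47.4s | 46.5s |
| | 280.6s | 253.4s | 49.5s |
| | 230.5s | 109.0s | 48.1s |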