Task

Detailed breakdown of individual task performance across different models.

Task Name (10 tasks)  claude-4-6-sonnet  gemini-3.1-pro  glm-4.7  gpt-5.2-codex
                      49.6s              43.2s           184.4s   38.8s
                      52.7s              104.4s          22.6s    45.8s
                      41.2s              39.3s           20.4s    32.0s
                      180.7s             448.7s          217.9s   45.3s
                      61.5s              147.4s          184.7s   96.6s
                      48.4s              45.8s           19.9s    45.5s
                      59.4s              38.0s           28.4s    52.4s
                      178.7s             215.2s          184.6s   206.2s
                      83.9s              88.8s           29.7s    19.0s
                      57.4s              50.7s           42.2s    50.3s