Task

Detailed breakdown of individual task performance across different models.

Task Name (10 tasks)  claude-4-6-sonnet  gemini-3.1-pro  glm-4.7  gpt-5.2-codex
                      83.3s              194.4s          437.1s   97.8s
                      118.9s             255.1s          524.6s   155.2s
                      98.4s              40.1s           408.6s   73.3s
                      27.7s              188.6s          230.3s   73.2s
                      270.2s             327.9s          356.2s   122.3s
                      72.1s              327.5s          126.8s   24.1s
                      56.1s              62.1s           85.0s    20.2s
                      48.2s              213.9s          287.9s   98.0s
                      97.0s              58.2s           245.2s   19.7s
                      74.0s              74.3s           422.7s   51.2s
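
The per-task figures above can be summarized per model. The following is a minimal sketch in Python, assuming that each row lists its four values in the same order as the model names in the header and that every value is a duration in seconds. The names MODELS, TIMES, means, and fastest are illustrative and are not part of the original page.

```python
from statistics import mean

# Model names in header order (assumed to match the value order in each row).
MODELS = ["claude-4-6-sonnet", "gemini-3.1-pro", "glm-4.7", "gpt-5.2-codex"]

# One tuple per task, values in seconds, copied from the table.
# Task names are not available, so rows are identified by position only.
TIMES = [
    (83.3, 194.4, 437.1, 97.8),
    (118.9, 255.1, 524.6, 155.2),
    (98.4, 40.1, 408.6, 73.3),
    (27.7, 188.6, 230.3, 73.2),
    (270.2, 327.9, 356.2, 122.3),
    (72.1, 327.5, 126.8, 24.1),
    (56.1, 62.1, 85.0, 20.2),
    (48.2, 213.9, 287.9, 98.0),
    (97.0, 58.2, 245.2, 19.7),
    (74.0, 74.3, 422.7, 51.2),
]

# Mean time per model: transpose rows into columns, then average each column.
means = {model: mean(column) for model, column in zip(MODELS, zip(*TIMES))}

# Number of tasks on which each model recorded the lowest time.
fastest = {model: 0 for model in MODELS}
for row in TIMES:
    fastest[MODELS[row.index(min(row))]] += 1

for model in MODELS:
    print(f"{model:18s} mean={means[model]:7.2f}s  fastest_on={fastest[model]}")
```

The mean is sensitive to a single slow task, so the count of tasks on which a model was fastest is reported alongside it as a second, rank-based view of the same data.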