Task
Detailed breakdown of individual task performance across different models.
Task Name (10 tasks)           claude-4-6-sonnet  gemini-3.1-pro  glm-4.7  gpt-5.2-codex
attach_plan_to_customer        83.3s              194.4s          437.1s   97.8s
balance_locking_ai_generation  118.9s             255.1s          524.6s   155.2s
boolean_feature_gate           98.4s              40.1s           408.6s   73.3s
check_feature_access           27.7s              188.6s          230.3s   73.2s
entity_workspace_limits        270.2s             327.9s          356.2s   122.3s
fail_open_resilient_check      72.1s              327.5s          126.8s   24.1s
get_or_create_customer         56.1s              62.1s           85.0s    20.2s
preview_plan_upgrade           48.2s              213.9s          287.9s   98.0s
track_usage_with_idempotency   97.0s              58.2s           245.2s   19.7s
usage_quota_hard_limit         74.0s              74.3s           422.7s   51.2s