Task
Detailed breakdown of individual task performance across different models.
| Task Name (10 tasks)             | claude-4-6-sonnet | gemini-3.1-pro | glm-4.7 | gpt-5.2-codex |
|----------------------------------|-------------------|----------------|---------|---------------|
| configure_sky_resources_yaml     | 49.6s             | 43.2s          | 184.4s  | 38.8s         |
| file_mount_collision_fix         | 52.7s             | 104.4s         | 22.6s   | 45.8s         |
| file_mount_s3_cached             | 41.2s             | 39.3s          | 20.4s   | 32.0s         |
| launch_gpu_cluster_with_autostop | 180.7s            | 448.7s         | 217.9s  | 45.3s         |
| managed_job_spot_failover        | 61.5s             | 147.4s         | 184.7s  | 96.6s         |
| private_repo_dependency_setup    | 48.4s             | 45.8s          | 19.9s   | 45.5s         |
| ray_cluster_port_conflict        | 59.4s             | 38.0s          | 28.4s   | 52.4s         |
| sdk_async_stream_and_get         | 178.7s            | 215.2s         | 184.6s  | 206.2s        |
| skyserve_vllm_autoscaling        | 83.9s             | 88.8s          | 29.7s   | 19.0s         |
| stop_and_terminate_clusters      | 57.4s             | 50.7s          | 42.2s   | 50.3s         |