Performance of AI coding models on SkyPilot tasks, ranked by success rate. For each model, the table lists the number of tasks passed, the average duration per task in seconds, and the success rate.
| Model | Passed | Avg Duration | Success Rate |
|---|---|---|---|
| #1 claude-4-6-sonnet | 8 | 137.5s | 80% |
| #2 gemini-3.1-pro | 7 | 178.2s | 70% |
| #3 gpt-5.2-codex | 7 | 110.4s | 70% |
| #4 glm-4.7 | 5 | 148.7s | 50% |
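
The success rate column appears to be the passed count divided by the total number of tasks. The Python sketch below recomputes it from the rows above. It assumes 10 tasks per model, which is inferred from 8 passed being 80% and is not stated in the source; `TOTAL_TASKS`, `success_rate`, and `rank` are hypothetical names used only for illustration. How ties are broken in the published ranking is not documented, so the sketch leaves tied models in their table order.

```python
# Rows copied from the table: (model, tasks passed, average duration in seconds).
RESULTS = [
    ("claude-4-6-sonnet", 8, 137.5),
    ("gemini-3.1-pro", 7, 178.2),
    ("gpt-5.2-codex", 7, 110.4),
    ("glm-4.7", 5, 148.7),
]

# Assumption: 10 tasks per model, inferred from "8 passed = 80%"; not stated in the source.
TOTAL_TASKS = 10


def success_rate(passed, total=TOTAL_TASKS):
    """Return the percentage of tasks passed."""
    return 100.0 * passed / total


def rank(results):
    """Sort by success rate, highest first; tied rows keep their input order."""
    return sorted(results, key=lambda row: success_rate(row[1]), reverse=True)


for position, (model, passed, avg_seconds) in enumerate(rank(RESULTS), start=1):
    print(f"#{position} {model}: {passed}/{TOTAL_TASKS} passed, "
          f"{success_rate(passed):.0f}%, avg {avg_seconds}s")
```

Sorting on success rate alone reproduces the table's order only because Python's `sorted` is stable; a different tie-break, such as lower average duration first, would swap the two models at 70%.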