Performance of AI coding models on Windmill tasks, measured by number of tasks passed, average execution time, and success rate.
| Model | Passed | Avg Duration | Success Rate |
|---|---|---|---|
| #1 gpt-5.2-codex | 30 | 71.0s | 100% |
| #2 glm-4.7 | 30 | 120.4s | 100% |
| #3 gemini-3-flash | 28 | 591.8s | 93% |
| #4 claude-4-6-sonnet | 23 | 119.9s | 77% |
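
The Success Rate column appears to be the passed count divided by the total number of tasks, rounded to a whole percent. The table does not state the total, so the sketch below assumes 30 tasks, inferred from the rows where 30 passed tasks correspond to 100%. The names `TOTAL_TASKS` and `success_rate` are illustrative and not part of any Windmill tooling.

```python
# Assumption: the suite has 30 tasks in total, inferred from the rows
# where 30 passed tasks correspond to a 100% success rate.
TOTAL_TASKS = 30


def success_rate(passed: int, total: int = TOTAL_TASKS) -> int:
    """Return the share of passed tasks as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total)


# Passed counts copied from the table above.
passed_by_model = {
    "gpt-5.2-codex": 30,
    "glm-4.7": 30,
    "gemini-3-flash": 28,
    "claude-4-6-sonnet": 23,
}

rates = {model: success_rate(passed) for model, passed in passed_by_model.items()}
for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

Under that assumption the computed percentages match the table: 28 of 30 rounds to 93% and 23 of 30 rounds to 77%.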