Live Benchmarks

Windmill Benchmark

Performance results of AI coding models on Windmill tasks, measuring success rate and execution time with high precision.

Total tasks: 30 | Last run: 5/7/2026

Model Performance

Rank  Model                 Passed  Avg Duration  Success Rate
#1    gpt-5.2-codex (NEW)   30      71.0s         100%
#2    glm-4.7               30      120.4s        100%
#3    gemini-3-flash        28      591.8s        93%
#4    claude-4-6-sonnet     23      119.9s        77%
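
The success rates above appear to be the passed count divided by the 30 total tasks, rounded to the nearest whole percent. That is an assumption inferred from the figures, not something the page states. A minimal sketch of that arithmetic, with a hypothetical helper name `success_rate`:

```python
# Assumption: success rate = passed / total tasks, rounded to a whole percent.
# The page does not document its formula; this only reproduces the listed figures.
TOTAL_TASKS = 30

# Passed counts copied from the results above.
passed = {
    "gpt-5.2-codex": 30,
    "glm-4.7": 30,
    "gemini-3-flash": 28,
    "claude-4-6-sonnet": 23,
}


def success_rate(passed_count: int, total: int = TOTAL_TASKS) -> int:
    """Return the percentage of tasks passed, rounded to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed_count / total)


# Map each model to its computed success rate in percent.
rates = {model: success_rate(count) for model, count in passed.items()}

if __name__ == "__main__":
    for model, rate in rates.items():
        print(f"{model}: {rate}%")
```

Under this assumption the computed values match the listed 100%, 100%, 93%, and 77%.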