Live Benchmarks

Autumn Benchmark

Performance of AI coding models on Autumn tasks, measured by success rate and average execution time per task.

Total tasks: 10

Last run: 4/11/2026

Model Performance

Rank	Model	Passed	Avg Duration	Success Rate
1	glm-4.7 (NEW)	7	378.0s	70%
2	gemini-3.1-pro	6	243.8s	60%
3	claude-4-6-sonnet	4	166.6s	40%
4	gpt-5.2-codex	2	140.9s	20%
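
The page does not state how Success Rate is computed. The listed values are consistent with passed tasks divided by total tasks (10), so the following minimal sketch assumes that formula. The names `TOTAL_TASKS`, `passed`, and `success_rate` are illustrative and not part of the benchmark itself.

```python
# Assumption: success rate = passed / total tasks, as a whole percent.
TOTAL_TASKS = 10  # "Total tasks" as listed on the page

# Passed counts copied from the Model Performance table.
passed = {
    "glm-4.7": 7,
    "gemini-3.1-pro": 6,
    "claude-4-6-sonnet": 4,
    "gpt-5.2-codex": 2,
}


def success_rate(passed_count, total=TOTAL_TASKS):
    """Return the share of tasks passed, rounded to a whole percent."""
    return round(100 * passed_count / total)


# Recompute each model's rate and rank models from highest to lowest.
rates = {model: success_rate(count) for model, count in passed.items()}
ranking = sorted(rates, key=rates.get, reverse=True)

for rank, model in enumerate(ranking, start=1):
    print(f"#{rank} {model}: {rates[model]}%")
```

With only 10 tasks, each passed task moves the rate by 10 percentage points, so small gaps between models should be read with caution.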