Live Benchmarks

Autumn Benchmark

Performance results for AI coding models on Autumn tasks, measured by success rate and average execution time.

Total tasks: 10
Last run: 4/11/2026

Model Performance

Rank  Model               Passed  Avg Duration  Success Rate
#1    glm-4.7 (NEW)       7       378.0s        70%
#2    gemini-3.1-pro      6       243.8s        60%
#3    claude-4-6-sonnet   4       166.6s        40%
#4    gpt-5.2-codex       2       140.9s        20%
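
The figures above are consistent with the success rate being the number of passed tasks divided by the total task count (10), and with models being ranked by tasks passed. The page does not state either rule, so the short Python sketch below treats both as assumptions; the names `TOTAL_TASKS`, `RESULTS`, `success_rate`, and `ranked` are hypothetical and not part of the benchmark's own code.

```python
# Assumption: success rate = passed / total tasks. The page does not state
# the formula; it is inferred from the listed figures.
TOTAL_TASKS = 10  # "Total tasks: 10" from the page

# (model, tasks passed, average duration in seconds), as listed on the page
RESULTS = [
    ("glm-4.7", 7, 378.0),
    ("gemini-3.1-pro", 6, 243.8),
    ("claude-4-6-sonnet", 4, 166.6),
    ("gpt-5.2-codex", 2, 140.9),
]


def success_rate(passed: int, total: int = TOTAL_TASKS) -> float:
    """Return the share of tasks passed, as a percentage."""
    return 100.0 * passed / total


def ranked(results):
    """Sort by tasks passed, highest first (assumed ranking rule)."""
    return sorted(results, key=lambda row: row[1], reverse=True)


for rank, (model, passed, avg_seconds) in enumerate(ranked(RESULTS), start=1):
    print(
        f"#{rank} {model}: {passed}/{TOTAL_TASKS} passed, "
        f"{avg_seconds:.1f}s avg, {success_rate(passed):.0f}%"
    )
```

Note that in this run the ranking by tasks passed is the reverse of the ranking by average duration: the models that passed more tasks also took longer on average.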