Live Benchmarks

Modal Benchmark

Results for AI coding models on Modal tasks, reporting how many tasks each model passed, its success rate, and its average execution time.

Total tasks: 11 | Last run: 4/2/2026

Model Performance

Rank  Model                      Passed  Avg Duration  Success Rate
#1    gemini-3.1-pro (NEW)       10      221.8s        91%
#2    gpt-5.2-codex-with-skills  9       139.9s        82%
#3    claude-4-6-sonnet          9       207.9s        82%
#4    glm-4.7                    8       313.2s        73%
#5    gpt-5.2-codex              7       178.4s        64%
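
The success rates listed above are consistent with dividing the number of passed tasks by the 11 total tasks and rounding to the nearest whole percent. The sketch below reproduces that arithmetic from the published figures. The ordering rule (more tasks passed first, then shorter average duration) is an assumption inferred from the two models tied at 82%; the page does not state it. The names `success_rate` and `rank` are hypothetical helpers, not part of the benchmark's code.

```python
TOTAL_TASKS = 11

# (model, tasks passed, average duration in seconds), copied from the results above
results = [
    ("gemini-3.1-pro", 10, 221.8),
    ("gpt-5.2-codex-with-skills", 9, 139.9),
    ("claude-4-6-sonnet", 9, 207.9),
    ("glm-4.7", 8, 313.2),
    ("gpt-5.2-codex", 7, 178.4),
]


def success_rate(passed, total=TOTAL_TASKS):
    """Percentage of tasks passed, rounded to the nearest whole percent."""
    return round(100 * passed / total)


def rank(rows):
    # Assumed ordering: more tasks passed first, then shorter average duration.
    return sorted(rows, key=lambda row: (-row[1], row[2]))


for position, (model, passed, avg_s) in enumerate(rank(results), start=1):
    rate = success_rate(passed)
    print(f"#{position} {model}: {passed}/{TOTAL_TASKS} = {rate}% ({avg_s}s avg)")
```

With only 11 tasks, a single task is worth about 9 percentage points, so the gap between adjacent ranks here is one task at most.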