Modal Benchmark

Performance of AI coding models on Modal tasks, measured by the number of tasks passed, the resulting success rate, and the average execution time in seconds.

Total tasks: 11

Last run: 4/2/2026

Model Performance

Model	Passed	Avg Duration	Success Rate
#1 gemini-3.1-pro (NEW)	10	221.8s	91%
#2 gpt-5.2-codex-with-skills	9	139.9s	82%
#3 claude-4-6-sonnet	9	207.9s	82%
#4 glm-4.7	8	313.2s	73%
#5 gpt-5.2-codex	7	178.4s	64%
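
The Success Rate column is consistent with dividing Passed by the 11 total tasks and rounding to the nearest whole percent. The short Python sketch below reproduces the column from the table's own numbers. The ordering rule it uses (more tasks passed first, then lower average duration as a tie-breaker) is an assumption that happens to match the ranking shown; the page does not state how ties are broken. The function and variable names are illustrative only.

```python
TOTAL_TASKS = 11  # "Total tasks" as reported on the page

# (model, tasks passed, average duration in seconds), copied from the table
results = [
    ("gemini-3.1-pro", 10, 221.8),
    ("gpt-5.2-codex-with-skills", 9, 139.9),
    ("claude-4-6-sonnet", 9, 207.9),
    ("glm-4.7", 8, 313.2),
    ("gpt-5.2-codex", 7, 178.4),
]


def success_rate(passed: int, total: int = TOTAL_TASKS) -> int:
    """Share of tasks passed, as a whole-number percentage."""
    return round(100 * passed / total)


# Assumed ranking rule: most tasks passed first, faster average wins ties.
ranked = sorted(results, key=lambda row: (-row[1], row[2]))

rates = {model: success_rate(passed) for model, passed, _ in results}

for rank, (model, passed, avg_s) in enumerate(ranked, start=1):
    print(f"#{rank} {model}\t{passed}\t{avg_s}s\t{rates[model]}%")
```

Under this rule, the two models that each passed 9 tasks are separated only by average duration, so a small timing change between runs could swap ranks #2 and #3.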