Live Benchmarks

Crossmint Benchmark

Performance results of AI coding models on Crossmint tasks, measuring success rate and execution time with high precision.

Total tasks: 10
Last run: 4/4/2026

Model Performance

Rank  Model            Passed  Avg Duration  Success Rate
#1    glm-4.7 (NEW)    7       164.1s        70%
#2    gemini-3.1-pro   7       166.0s        70%
#3    gpt-5.2-codex    6       140.3s        60%
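
The figures above are consistent with a success rate computed as passed tasks divided by the 10 total tasks. The Python sketch below recomputes the percentages and reproduces the displayed order. The `Result` class and the tie-break rule (higher success rate first, then lower average duration) are assumptions inferred from the listed numbers, not the benchmark's documented method.

```python
from dataclasses import dataclass

TOTAL_TASKS = 10  # "Total tasks: 10" as stated on the page


@dataclass
class Result:
    """Hypothetical record for one leaderboard row."""
    model: str
    passed: int
    avg_duration_s: float

    @property
    def success_rate(self) -> float:
        # Percentage of tasks passed out of the total task count.
        return 100.0 * self.passed / TOTAL_TASKS


# Values copied from the leaderboard, deliberately out of order.
results = [
    Result("gpt-5.2-codex", 6, 140.3),
    Result("gemini-3.1-pro", 7, 166.0),
    Result("glm-4.7", 7, 164.1),
]

# Assumed ordering: success rate descending, then average duration ascending.
ranked = sorted(results, key=lambda r: (-r.success_rate, r.avg_duration_s))

for rank, r in enumerate(ranked, start=1):
    print(
        f"#{rank} {r.model}: {r.passed}/{TOTAL_TASKS} passed, "
        f"{r.avg_duration_s:.1f}s avg, {r.success_rate:.0f}%"
    )
```

Under this assumed rule, glm-4.7 and gemini-3.1-pro tie at 7 of 10 tasks and are separated only by average duration, which matches their listed positions.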