Performance

Aggregated metrics across all your optimizations

Optimizations: 47 (59.6% success rate)
Total Spend: $1.7433 (443,874 tokens)
Avg Quality: 8.26 out of 10
Models Used: 15 (13 failed runs)

[Chart: Quality Trend (30 days)]

[Chart: Daily Cost (30 days)]
Model Performance

[Charts: Avg Quality Score, Avg Cost per Call ($)]
| # | Model | Provider | Runs | Success | Failed | Avg Quality | Best | Worst | Avg Cost | Total Cost | Avg Tokens | Avg Latency |
|---|-------|----------|------|---------|--------|-------------|------|-------|----------|------------|------------|-------------|
| 1 | claude-opus-4.6 | anthropic | 8 | 4 (50%) | 4 | 8.98 | 9.3 | 8.5 | $0.009264 | $0.0371 | 405 | 9,419ms |
| 2 | qwen3.6-plus | qwen | 60 | 60 (100%) | 0 | 8.73 | 9.5 | 6.2 | $0.004469 | $0.2681 | 1,538 | 34,858ms |
| 3 | claude-sonnet-4.6 | anthropic | 44 | 34 (77%) | 10 | 8.67 | 9.5 | 6.5 | $0.010701 | $0.3638 | 801 | 20,579ms |
| 4 | minimax-m2.7 | minimax | 24 | 24 (100%) | 0 | 8.60 | 9.4 | 7.5 | $0.003838 | $0.0921 | 2,031 | 37,134ms |
| 5 | gpt-oss-120b | openai | 48 | 47 (98%) | 1 | 8.60 | 9.5 | 7.5 | $0.000393 | $0.0185 | 1,264 | 19,155ms |
| 6 | claude-sonnet-4 | anthropic | 39 | 34 (87%) | 5 | 8.43 | 9.5 | 6.0 | $0.014988 | $0.5096 | 1,087 | 23,860ms |
| 7 | mimo-v2-pro | xiaomi | 56 | 56 (100%) | 0 | 8.38 | 9.5 | 6.5 | $0.003597 | $0.2014 | 936 | 12,229ms |
| 8 | gemini-3.1-pro-preview | google | 4 | 4 (100%) | 0 | 8.25 | 9.5 | 7.0 | $0.000000 | $0.0000 | 980 | 20,091ms |
| 9 | gemini-3-flash-preview | google | 4 | 4 (100%) | 0 | 8.00 | 9.5 | 6.0 | $0.000000 | $0.0000 | 192 | 3,481ms |
| 10 | gemma-4-31b-it | google | 8 | 7 (88%) | 1 | 7.87 | 9.3 | 5.5 | $0.007089 | $0.0496 | 296 | 22,048ms |
| 11 | gemma-3-27b-it | google | 135 | 132 (98%) | 3 | 7.82 | 9.0 | 5.5 | $0.000086 | $0.0114 | 615 | 13,773ms |
| 12 | gpt-4o | openai | 28 | 28 (100%) | 0 | 7.58 | 8.5 | 7.0 | $0.006523 | $0.1826 | 716 | 7,912ms |
| 13 | hermes-4-70b | nousresearch | 4 | 4 (100%) | 0 | 7.50 | 8.0 | 7.0 | $0.000000 | $0.0000 | 298 | 3,837ms |
| 14 | deepseek-v3.2 | deepseek | 4 | 3 (75%) | 1 | 7.23 | 8.2 | 6.5 | $0.000430 | $0.0013 | 732 | 18,639ms |
| 15 | gpt-4o-mini | openai | 14 | 13 (93%) | 1 | – | – | – | $0.000595 | $0.0077 | 1,070 | 13,824ms |
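For reference, a minimal sketch of how per-model rows like those above could be derived from raw run records. The record schema used here (`model`, `ok`, `quality`, `cost`) is an assumption for illustration, not the dashboard's actual data model.

```python
# Sketch: aggregate raw run records into per-model summary rows.
# Field names (model, ok, quality, cost) are hypothetical.
from collections import defaultdict

def aggregate_by_model(runs):
    """Group run records by model and compute the table's summary columns."""
    groups = defaultdict(list)
    for r in runs:
        groups[r["model"]].append(r)
    rows = []
    for model, rs in groups.items():
        ok = [r for r in rs if r["ok"]]
        qualities = [r["quality"] for r in ok if r["quality"] is not None]
        rows.append({
            "model": model,
            "runs": len(rs),
            "success": len(ok),
            "failed": len(rs) - len(ok),
            "avg_quality": round(sum(qualities) / len(qualities), 2) if qualities else None,
            "best": max(qualities) if qualities else None,
            "worst": min(qualities) if qualities else None,
            "avg_cost": sum(r["cost"] for r in rs) / len(rs),
            "total_cost": sum(r["cost"] for r in rs),
        })
    # Sort like the table: highest average quality first.
    rows.sort(key=lambda row: row["avg_quality"] or 0, reverse=True)
    return rows
```

Failed runs contribute to run counts and cost but not to the quality columns, which matches how a row like gpt-4o-mini can have costs but no quality figures.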

Prompt Generation Templates (5)

| Template | Usage | Avg Quality | Avg Tokens | Avg Cost | Total Cost | Avg Latency |
|----------|-------|-------------|------------|----------|------------|-------------|
| Default Prompt Engineering | – | – | 394 | $0.004173 | $0.0000 | 0ms |
| Constraint Explicit | – | – | 348 | $0.000443 | $0.0000 | 0ms |
| Contoh Pattern + Constraint Explicit | 18× | – | 1,738 | $0.002900 | $0.0015 | 6,611ms |
| Prompt Engineering Human-understandable | – | – | 357 | $0.004672 | $0.0000 | 0ms |
| Contoh Pattern | – | – | 699 | $0.007329 | $0.0000 | 0ms |

Evaluation Templates (1)

| Template | Usage | Avg Quality | Avg Tokens | Avg Cost | Total Cost | Avg Latency |
|----------|-------|-------------|------------|----------|------------|-------------|
| Default Evaluation | 25× | – | 7,933 | $0.026246 | $0.0131 | 4,171ms |

Prediction Template Accuracy Ranking

Ranked by average rank-match accuracy vs actual optimization scores

| # | Template | Runs | Avg Accuracy | Best Run | Worst Run | Total Rank Match |
|---|----------|------|--------------|----------|-----------|------------------|
| 🥇 | Specificity and Outcome Predictability Judge | 2 | 62.5% | 100.0% | 25.0% | 5 / 8 correct |
| 🥈 | Clarity and Communicative Effectiveness Judge | 4 | 56.3% | 100.0% | 25.0% | 9 / 16 correct |
| 🥉 | Structural Completeness Judge | 1 | 25.0% | 25.0% | 25.0% | 1 / 4 correct |
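Rank-match accuracy is presumably computed per run by comparing a judge's predicted ranking of candidates against the ranking implied by their actual optimization scores. A minimal sketch, assuming "rank match" means a candidate occupies the same rank position under both orderings (this interpretation is an assumption; the dashboard's exact definition may differ):

```python
# Sketch of a rank-match accuracy metric: the fraction of candidates
# whose rank under the predicted scores equals their rank under the
# actual scores. The definition is assumed, not taken from the source.

def rank_match_accuracy(predicted_scores, actual_scores):
    """Fraction of positions where both score lists rank an item the same."""
    def ranks(scores):
        # Rank 0 = highest score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        rank = [0] * len(scores)
        for pos, i in enumerate(order):
            rank[i] = pos
        return rank

    pred, act = ranks(predicted_scores), ranks(actual_scores)
    matches = sum(p == a for p, a in zip(pred, act))
    return matches / len(predicted_scores)
```

Under this assumed definition, a four-candidate run where only one position agrees scores 25%, consistent with the worst-run figures above.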

Performance by Optimization Goal

| Goal | Optimizations | Completed | Avg Quality | Total Cost | Avg Cost/Run |
|------|---------------|-----------|-------------|------------|--------------|
| Balanced | 41 | 26 (63%) | 8.26 | $1.4861 | $0.003753 |
| Quality maximal | 6 | 2 (33%) | 8.25 | $0.2572 | $0.004435 |
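As a quick sanity check, the headline 59.6% success rate agrees with the per-goal completion counts in this section:

```python
# Verify the headline success rate from the per-goal table:
# 26 of 41 Balanced and 2 of 6 Quality-maximal optimizations completed.
balanced_done, balanced_total = 26, 41
quality_done, quality_total = 2, 6

success_rate = (balanced_done + quality_done) / (balanced_total + quality_total)
print(f"{success_rate:.1%}")  # → 59.6%  (28 of 47, assuming "success" means "completed")
```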