Optimizations: 47 (59.6% success rate)
Total Spend: $1.7433 (443,874 tokens)
Avg Quality: 8.26 out of 10
Models Used: 15 (13 failed runs)
[Charts: Quality Trend (30 days); Daily Cost (30 days)]
Model Performance
[Charts: Avg Quality Score; Avg Cost per Call ($)]
| # | Model | Runs | Success | Failed | Avg Quality | Best | Worst | Avg Cost | Total Cost | Avg Tokens | Avg Latency |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | claude-opus-4.6 (anthropic) | 8 | 4 (50%) | 4 | 8.98 | 9.3 | 8.5 | $0.009264 | $0.0371 | 405 | 9,419ms |
| 2 | qwen3.6-plus (qwen) | 60 | 60 (100%) | 0 | 8.73 | 9.5 | 6.2 | $0.004469 | $0.2681 | 1,538 | 34,858ms |
| 3 | claude-sonnet-4.6 (anthropic) | 44 | 34 (77%) | 10 | 8.67 | 9.5 | 6.5 | $0.010701 | $0.3638 | 801 | 20,579ms |
| 4 | minimax-m2.7 (minimax) | 24 | 24 (100%) | 0 | 8.60 | 9.4 | 7.5 | $0.003838 | $0.0921 | 2,031 | 37,134ms |
| 5 | gpt-oss-120b (openai) | 48 | 47 (98%) | 1 | 8.60 | 9.5 | 7.5 | $0.000393 | $0.0185 | 1,264 | 19,155ms |
| 6 | claude-sonnet-4 (anthropic) | 39 | 34 (87%) | 5 | 8.43 | 9.5 | 6.0 | $0.014988 | $0.5096 | 1,087 | 23,860ms |
| 7 | mimo-v2-pro (xiaomi) | 56 | 56 (100%) | 0 | 8.38 | 9.5 | 6.5 | $0.003597 | $0.2014 | 936 | 12,229ms |
| 8 | gemini-3.1-pro-preview (google) | 4 | 4 (100%) | 0 | 8.25 | 9.5 | 7.0 | $0.000000 | $0.0000 | 980 | 20,091ms |
| 9 | gemini-3-flash-preview (google) | 4 | 4 (100%) | 0 | 8.00 | 9.5 | 6.0 | $0.000000 | $0.0000 | 192 | 3,481ms |
| 10 | gemma-4-31b-it (google) | 8 | 7 (88%) | 1 | 7.87 | 9.3 | 5.5 | $0.007089 | $0.0496 | 296 | 22,048ms |
| 11 | gemma-3-27b-it (google) | 135 | 132 (98%) | 3 | 7.82 | 9.0 | 5.5 | $0.000086 | $0.0114 | 615 | 13,773ms |
| 12 | gpt-4o (openai) | 28 | 28 (100%) | 0 | 7.58 | 8.5 | 7.0 | $0.006523 | $0.1826 | 716 | 7,912ms |
| 13 | hermes-4-70b (nousresearch) | 4 | 4 (100%) | 0 | 7.50 | 8.0 | 7.0 | $0.000000 | $0.0000 | 298 | 3,837ms |
| 14 | deepseek-v3.2 (deepseek) | 4 | 3 (75%) | 1 | 7.23 | 8.2 | 6.5 | $0.000430 | $0.0013 | 732 | 18,639ms |
| 15 | gpt-4o-mini (openai) | 14 | 13 (93%) | 1 | - | - | - | $0.000595 | $0.0077 | 1,070 | 13,824ms |
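
The dashboard does not say how Total Cost is derived. The figures are consistent with Total Cost being Avg Cost multiplied by the number of successful runs (for claude-opus-4.6, 4 × $0.009264 ≈ $0.0371), but that relationship is inferred from the numbers, not documented. A minimal sketch that checks the assumption against a few rows copied from the table; the function names are hypothetical:

```python
# Rows copied from the Model Performance table:
# (model, successful runs, avg cost per call in $, reported total cost in $).
ROWS = [
    ("claude-opus-4.6", 4, 0.009264, 0.0371),
    ("qwen3.6-plus", 60, 0.004469, 0.2681),
    ("claude-sonnet-4.6", 34, 0.010701, 0.3638),
    ("gpt-oss-120b", 47, 0.000393, 0.0185),
    ("gemma-3-27b-it", 132, 0.000086, 0.0114),
]


def estimated_total(successes: int, avg_cost: float) -> float:
    """Assumed relationship: total cost = successful runs x avg cost per call."""
    return successes * avg_cost


def max_abs_error(rows) -> float:
    """Largest gap between the estimate and the reported total, in dollars."""
    return max(abs(estimated_total(s, a) - t) for _, s, a, t in rows)


if __name__ == "__main__":
    for model, successes, avg_cost, total in ROWS:
        print(f"{model}: estimated ${estimated_total(successes, avg_cost):.4f}, reported ${total:.4f}")
```

If the assumption holds, every estimate should agree with the reported total to within display rounding (four decimal places).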
Prompt Generation Templates (5)
| Template | Usage | Avg Quality | Avg Tokens | Avg Cost | Total Cost | Avg Latency |
|---|---|---|---|---|---|---|
| Default Prompt Engineering | 3× | - | 394 | $0.004173 | $0.0000 | 0ms |
| Constraint Explicit | 2× | - | 348 | $0.000443 | $0.0000 | 0ms |
| Contoh Pattern + Constraint Explicit | 18× | - | 1,738 | $0.002900 | $0.0015 | 6,611ms |
| Prompt Engineering Human-understandable | 8× | - | 357 | $0.004672 | $0.0000 | 0ms |
| Contoh Pattern | 1× | - | 699 | $0.007329 | $0.0000 | 0ms |
Evaluation Templates (1)
| Template | Usage | Avg Quality | Avg Tokens | Avg Cost | Total Cost | Avg Latency |
|---|---|---|---|---|---|---|
| Default Evaluation | 25× | - | 7,933 | $0.026246 | $0.0131 | 4,171ms |
Prediction Template Accuracy Ranking
Ranked by average rank-match accuracy vs actual optimization scores
| # | Template | Runs | Avg Accuracy | Best Run | Worst Run | Total Rank Match |
|---|---|---|---|---|---|---|
| 🥇 | Specificity and Outcome Predictability Judge | 2 | 62.5% | 100.0% | 25.0% | 5 / 8 correct |
| 🥈 | Clarity and Communicative Effectiveness Judge | 4 | 56.3% | 100.0% | 25.0% | 9 / 16 correct |
| 🥉 | Structural Completeness Judge | 1 | 25.0% | 25.0% | 25.0% | 1 / 4 correct |
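
The page does not define rank-match accuracy. One reading consistent with the table (four ranked positions per run, so 5 / 8 correct over 2 runs gives 62.5%) is the fraction of positions where the predicted rank equals the actual rank, averaged over runs. The sketch below assumes that definition; the function names and the candidate labels `A` to `D` are hypothetical:

```python
def rank_match_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of rank positions where the predicted item equals the actual item."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("rankings must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def average_accuracy(runs: list[tuple[list[str], list[str]]]) -> float:
    """Mean of per-run accuracies over (predicted, actual) ranking pairs."""
    return sum(rank_match_accuracy(p, a) for p, a in runs) / len(runs)


# Hypothetical data: one perfect run and one run with a single correct position.
actual = ["A", "B", "C", "D"]
runs = [
    (["A", "B", "C", "D"], actual),  # 4 of 4 positions match
    (["A", "C", "D", "B"], actual),  # only position 1 matches
]

if __name__ == "__main__":
    print(f"{average_accuracy(runs):.1%}")
```

With these two hypothetical runs the best run is 100%, the worst is 25%, and 5 of 8 positions match, the same shape as the first row of the table.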
Performance by Optimization Goal
| Goal | Optimizations | Completed | Avg Quality | Total Cost | Avg Cost/Run |
|---|---|---|---|---|---|
| Balanced | 41 | 26 (63%) | 8.26 | $1.4861 | $0.003753 |
| Quality maximal | 6 | 2 (33%) | 8.25 | $0.2572 | $0.004435 |
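
The Completed percentages and the overall 59.6% success rate can be reproduced from the counts in this table. Avg Cost/Run does not equal Total Cost divided by Optimizations, so it appears to be averaged over individual model runs instead; that interpretation is an assumption, as are the function names in this sketch:

```python
# Figures copied from the Performance by Optimization Goal table:
# goal -> (optimizations, completed, total cost in $, avg cost per run in $).
GOALS = {
    "Balanced": (41, 26, 1.4861, 0.003753),
    "Quality maximal": (6, 2, 0.2572, 0.004435),
}


def completion_rate(optimizations: int, completed: int) -> float:
    """Completed optimizations as a percentage of all optimizations."""
    return 100 * completed / optimizations


def implied_runs(total_cost: float, avg_cost_per_run: float) -> int:
    """Assumption: Avg Cost/Run = Total Cost / number of billed model runs."""
    return round(total_cost / avg_cost_per_run)


total_optimizations = sum(g[0] for g in GOALS.values())
total_completed = sum(g[1] for g in GOALS.values())
overall_rate = completion_rate(total_optimizations, total_completed)

if __name__ == "__main__":
    for goal, (n, done, cost, avg) in GOALS.items():
        print(f"{goal}: {completion_rate(n, done):.0f}% completed, ~{implied_runs(cost, avg)} runs")
    print(f"Overall: {overall_rate:.1f}%")
```

Under this reading, the implied run counts for the two goals add up to the number of successful runs in the Model Performance table.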