AI Model Security Leaderboard
SOAI Security Index — Independent Model Security Rankings
Models are scored using automated adversarial testing across multiple attack vectors. The SOAI Security Index (SSI) measures resilience to prompt injection, data exfiltration, jailbreaks, and agentic manipulation. Updated weekly.
SOAI Security Index — Overall security posture combining all metrics
Resilience to prompt injection and jailbreak attacks
Protection against PII extraction and training data exposure
Robustness under autonomous multi-step agent scenarios
| # | Model | Provider | SSI Score | CASI | Injection | Leakage | AWR | Trend |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude 4 Sonnet | Anthropic | 95.2 | 95.1 | 97.2 | 94.5 | 93.8 | +2.1 |
| 2 | GPT-5 | OpenAI | 93.1 | 93.4 | 95.8 | 92.1 | 91.2 | +4.3 |
| 3 | GPT-5 Nano | OpenAI | 91.8 | 91.8 | 94.1 | 90.8 | 90.5 | +1.8 |
| 4 | Gemini 2.5 Pro | Google | 90.2 | 90.2 | 92.4 | 89.3 | 88.7 | +3.1 |
| 5 | Claude 3.5 Opus | Anthropic | 90.2 | 89.6 | 91.8 | 88.2 | 91.3 | +0.5 |
| 6 | Llama 4 | Meta | 86.9 | 87.3 | 88.9 | 86.4 | 85.1 | +5.2 |
| 7 | DeepSeek V3 | DeepSeek | 84.8 | 85.1 | 86.7 | 84.9 | 82.4 | +2.7 |
| 8 | Mistral Large 3 | Mistral | 83.6 | 84.2 | 85.3 | 83.1 | 81.8 | +1.4 |
| 9 | Kimi K2 | Moonshot | 81.8 | 82.7 | 83.9 | 81.2 | 79.5 | +3.9 |
| 10 | Qwen 3.5 | Alibaba | 79.9 | 80.4 | 82.1 | 79.5 | 77.8 | +2.1 |
| 11 | GPT-4 Turbo | OpenAI | 78.4 | 78.9 | 80.5 | 77.8 | 76.2 | -1.2 |
| 12 | Command R+ | Cohere | 75.9 | 76.3 | 78.1 | 75.4 | 73.9 | +0.8 |
Attack Vectors Tested
Direct Prompt Injection: Adversarial prompts designed to override system instructions
Indirect Prompt Injection: Hidden instructions in external data sources such as documents or web pages
Jailbreak Attacks: Techniques to bypass safety filters and content policies
FlipAttack (Homoglyph): Unicode homoglyph substitution to evade text-based filters
Multi-Turn Manipulation: Gradual context manipulation across conversation turns
Data Exfiltration: Attempts to extract training data, PII, or system prompts
Tool Misuse: Coercing models into misusing available tools or APIs
Agentic Workflow Exploit: Multi-step autonomous agent manipulation scenarios
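To make the homoglyph vector listed above more concrete, here is a minimal sketch of one way a text filter might flag suspicious input: it reports words whose letters come from more than one Unicode script (for example, a Cyrillic "о" inside an otherwise Latin word). This is an illustration only, not SOAI's test harness; the function names `scripts_in` and `mixed_script_words` are hypothetical, and taking the first word of a character's Unicode name as its script is a simplifying assumption.

```python
import unicodedata


def scripts_in(word: str) -> set:
    """Return the scripts used by the letters of `word`.

    Assumption: the first token of a character's Unicode name
    (e.g. "LATIN", "CYRILLIC", "GREEK") is a good enough script label.
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def mixed_script_words(text: str) -> list:
    """Return whitespace-separated words that mix letters from several scripts."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]


# The "o" in the last word is U+043E (Cyrillic small letter o), not Latin "o".
sample = "ignore previous instructi\u043ens"
flagged = mixed_script_words(sample)
print(flagged)
```

A check like this catches only cross-script substitutions; look-alikes within a single script, or text written entirely in another script, would need a different approach.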
Scoring Methodology
Automated Red-Teaming
Each model is subjected to thousands of automated adversarial attacks across all vector categories. Tests are regenerated weekly to prevent overfitting to known patterns.
Multi-Dimensional Scoring
Scores reflect both direct attack resistance and behavioral analysis under stress. The SSI composite is a weighted sum of four components: injection resistance (30%), data leakage protection (25%), agentic resistance (25%), and general safety alignment (20%).
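The weighting described above can be sketched as a simple weighted sum. The snippet below is a minimal illustration, not SOAI's actual scoring code: the function name `composite_score` and the component keys are hypothetical, and mapping the table's CASI column to "general safety alignment" and its AWR column to "agentic resistance" is an assumption the page does not state.

```python
# Weights as stated in the methodology text; they sum to 1.0.
WEIGHTS = {
    "injection": 0.30,   # injection resistance
    "leakage": 0.25,     # data leakage
    "agentic": 0.25,     # agentic resistance (assumed to be the AWR column)
    "alignment": 0.20,   # general safety alignment (assumed to be the CASI column)
}


def composite_score(scores: dict) -> float:
    """Return the weighted sum of the four component scores (0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Component values taken from the first row of the leaderboard table.
example = {"injection": 97.2, "leakage": 94.5, "agentic": 93.8, "alignment": 95.1}
print(composite_score(example))
```

The published SSI figures may involve rounding or inputs not described on this page, so a value computed this way need not match the table exactly.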
Independent Evaluation
All tests are run independently by SOAI infrastructure. No model provider has input into test design, scoring, or rankings. Results are reproducible and auditable.