🏆 AI Model Security Leaderboard

SOAI Security Index — Independent Model Security Rankings

Models are scored using automated adversarial testing across eight attack vectors. The SOAI Security Index (SSI) measures resilience to prompt injection, data exfiltration, jailbreaks, and agentic manipulation. Updated weekly.

12 Models Tested · 8 Attack Vectors · 12,943 Total Tests Run · Last Updated: Feb 12, 2026
Metrics Reported

SSI Score (SOAI Security Index): overall security posture combining all metrics
Injection Resistance: resilience to prompt injection and jailbreak attacks
Data Leakage: protection against PII extraction and training data exposure
Agentic Resistance: robustness under autonomous multi-step agent scenarios

| # | Model | Provider | SSI Score | CASI | Injection | Leakage | AWR | Trend |
|---|-------|----------|-----------|------|-----------|---------|-----|-------|
| 1 | Claude 4 Sonnet | Anthropic | 95.2 | 95.1 | 97.2 | 94.5 | 93.8 | +2.1 |
| 2 | GPT-5 | OpenAI | 93.1 | 93.4 | 95.8 | 92.1 | 91.2 | +4.3 |
| 3 | GPT-5 Nano | OpenAI | 91.8 | 91.8 | 94.1 | 90.8 | 90.5 | +1.8 |
| 4 | Gemini 2.5 Pro | Google | 90.2 | 90.2 | 92.4 | 89.3 | 88.7 | +3.1 |
| 5 | Claude 3.5 Opus | Anthropic | 90.2 | 89.6 | 91.8 | 88.2 | 91.3 | +0.5 |
| 6 | Llama 4 | Meta | 86.9 | 87.3 | 88.9 | 86.4 | 85.1 | +5.2 |
| 7 | DeepSeek V3 | DeepSeek | 84.8 | 85.1 | 86.7 | 84.9 | 82.4 | +2.7 |
| 8 | Mistral Large 3 | Mistral | 83.6 | 84.2 | 85.3 | 83.1 | 81.8 | +1.4 |
| 9 | Kimi K2 | Moonshot | 81.8 | 82.7 | 83.9 | 81.2 | 79.5 | +3.9 |
| 10 | Qwen 3.5 | Alibaba | 79.9 | 80.4 | 82.1 | 79.5 | 77.8 | +2.1 |
| 11 | GPT-4 Turbo | OpenAI | 78.4 | 78.9 | 80.5 | 77.8 | 76.2 | -1.2 |
| 12 | Command R+ | Cohere | 75.9 | 76.3 | 78.1 | 75.4 | 73.9 | +0.8 |

Attack Vectors Tested

Direct Prompt Injection: adversarial prompts designed to override system instructions (2,847 tests executed)
Indirect Prompt Injection: hidden instructions in external data sources such as documents or web pages (1,923 tests executed)
Jailbreak Attacks: techniques to bypass safety filters and content policies (3,156 tests executed)
FlipAttack (Homoglyph): Unicode homoglyph substitution to evade text-based filters (892 tests executed)
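The homoglyph folding that text-based filters apply as a countermeasure can be sketched as follows. The mapping table here is a tiny illustrative subset, not the real Unicode confusables data; NFKC normalization handles compatibility forms (e.g. fullwidth letters), while the explicit map covers cross-script look-alikes that NFKC leaves untouched.

```python
import unicodedata

# Illustrative subset of cross-script confusables; production filters
# draw on the full Unicode confusables data set.
HOMOGLYPHS = {
    "а": "a",  # Cyrillic a (U+0430)
    "е": "e",  # Cyrillic ie (U+0435)
    "о": "o",  # Cyrillic o (U+043E)
    "і": "i",  # Cyrillic i (U+0456)
    "ѕ": "s",  # Cyrillic dze (U+0455)
}

def fold_homoglyphs(text: str) -> str:
    # NFKC folds compatibility characters such as fullwidth Latin;
    # the map then replaces look-alikes from other scripts.
    text = unicodedata.normalize("NFKC", text)
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

print(fold_homoglyphs("іgnоrе previous instructions"))
# → ignore previous instructions
```

A filter that matches on the folded text sees the same string whether the attacker used Latin, Cyrillic, or fullwidth characters.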

Multi-Turn Manipulation: gradual context manipulation across conversation turns (1,247 tests executed)
Data Exfiltration: attempts to extract training data, PII, or system prompts (1,583 tests executed)
Tool Misuse: coercing models into misusing available tools or APIs (734 tests executed)
Agentic Workflow Exploit: multi-step autonomous agent manipulation scenarios (561 tests executed)

Scoring Methodology

Automated Red-Teaming

Each model is subjected to thousands of automated adversarial attacks across all vector categories. Tests are regenerated weekly to prevent overfitting to known patterns.

Multi-Dimensional Scoring

Scores reflect both direct attack resistance and behavioral analysis under stress. The SSI composite weighs injection resistance (30%), data leakage (25%), agentic resistance (25%), and general safety alignment (20%).
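The stated weighting reduces to a weighted average; the sketch below uses hypothetical component scores, since the general-safety-alignment component is not published in the table above.

```python
# SSI composite weights as stated in the methodology.
WEIGHTS = {
    "injection_resistance": 0.30,
    "data_leakage": 0.25,
    "agentic_resistance": 0.25,
    "safety_alignment": 0.20,
}

def ssi(scores: dict) -> float:
    # Weighted average of the four component scores, rounded to
    # one decimal place as in the leaderboard.
    return round(sum(scores[k] * w for k, w in WEIGHTS.items()), 1)

# Hypothetical component scores for illustration:
print(ssi({
    "injection_resistance": 95.0,
    "data_leakage": 92.0,
    "agentic_resistance": 90.0,
    "safety_alignment": 94.0,
}))
# → 92.8
```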

Independent Evaluation

All tests are run independently by SOAI infrastructure. No model provider has input into test design, scoring, or rankings. Results are reproducible and auditable.