🏆

AI Model Security Leaderboard

SOAI Security Index — Independent Model Security Rankings

Models are scored using automated adversarial testing across multiple attack vectors. The SOAI Security Index (SSI) measures resilience to prompt injection, data exfiltration, jailbreaks, and agentic manipulation. Updated weekly.

Models Tested

Attack Vectors

12,943

Total Tests Run

Feb 12, 2026

Last Updated

SSI Score

SOAI Security Index — Overall security posture combining all metrics

Injection Resistance

Resilience to prompt injection and jailbreak attacks

Data Leakage

Protection against PII extraction and training data exposure

Agentic Resistance

Robustness under autonomous multi-step agent scenarios

#	Model	SSI Score	CASI	Injection	Leakage	AWR	Trend
1	Claude 4 Sonnet Anthropic	95.2	95.1	97.2	94.5	93.8	+2.1
2	GPT-5 OpenAI	93.1	93.4	95.8	92.1	91.2	+4.3
3	GPT-5 Nano OpenAI	91.8	91.8	94.1	90.8	90.5	+1.8
4	Gemini 2.5 Pro Google	90.2	90.2	92.4	89.3	88.7	+3.1
5	Claude 3.5 Opus Anthropic	90.2	89.6	91.8	88.2	91.3	+0.5
6	Llama 4 Meta	86.9	87.3	88.9	86.4	85.1	+5.2
7	DeepSeek V3 DeepSeek	84.8	85.1	86.7	84.9	82.4	+2.7
8	Mistral Large 3 Mistral	83.6	84.2	85.3	83.1	81.8	+1.4
9	Kimi K2 Moonshot	81.8	82.7	83.9	81.2	79.5	+3.9
10	Qwen 3.5 Alibaba	79.9	80.4	82.1	79.5	77.8	+2.1
11	GPT-4 Turbo OpenAI	78.4	78.9	80.5	77.8	76.2	-1.2
12	Command R+ Cohere	75.9	76.3	78.1	75.4	73.9	+0.8

Attack Vectors Tested

Direct Prompt Injection

Adversarial prompts designed to override system instructions

2,847

tests executed

Indirect Prompt Injection

Hidden instructions in external data sources like documents or web pages

1,923

tests executed

Jailbreak Attacks

Techniques to bypass safety filters and content policies

3,156

tests executed

FlipAttack (Homoglyph)

Unicode homoglyph substitution to evade text-based filters

892

tests executed

Multi-Turn Manipulation

Gradual context manipulation across conversation turns

1,247

tests executed

Data Exfiltration

Attempts to extract training data, PII, or system prompts

1,583

tests executed

Tool Misuse

Coercing models into misusing available tools or APIs

734

tests executed

Agentic Workflow Exploit

Multi-step autonomous agent manipulation scenarios

561

tests executed

Scoring Methodology

Automated Red-Teaming

Each model is subjected to thousands of automated adversarial attacks across all vector categories. Tests are regenerated weekly to prevent overfitting to known patterns.

Multi-Dimensional Scoring

Scores reflect both direct attack resistance and behavioral analysis under stress. The SSI composite weighs injection resistance (30%), data leakage (25%), agentic resistance (25%), and general safety alignment (20%).

Independent Evaluation

All tests are run independently by SOAI infrastructure. No model provider has input into test design, scoring, or rankings. Results are reproducible and auditable.