|
# Vulnerability Classification Framework |
|
|
|
This document provides a standardized system for classifying vulnerabilities identified during LLM security testing. This classification framework enables consistent categorization, facilitates trend analysis, and supports effective remediation prioritization. |
|
|
|
## Classification Dimensions |
|
|
|
Vulnerabilities are classified across multiple dimensions to capture their full nature and impact. |
|
|
|
### 1. Vulnerability Class |
|
|
|
The primary categorization based on the fundamental mechanism of the vulnerability. |
|
|
|
#### Primary Classes |
|
|
|
- **PJV**: Prompt Injection Vulnerabilities |
|
- **BEF**: Boundary Enforcement Failures |
|
- **IEV**: Information Extraction Vulnerabilities |
|
- **CET**: Classifier Evasion Techniques |
|
- **MVV**: Multimodal Vulnerability Vectors |
|
- **TUV**: Tool Use Vulnerabilities |
|
- **ACF**: Authentication Control Failures |
|
- **RSV**: Response Synthesis Vulnerabilities |
|
|
|
### 2. Subclass |
|
|
|
Specific subcategory within the primary vulnerability class. |
|
|
|
#### Example Subclasses (for PJV - Prompt Injection Vulnerabilities) |
|
|
|
- **PJV-DIR**: Direct Instruction Injection |
|
- **PJV-IND**: Indirect Instruction Manipulation |
|
- **PJV-CRX**: Cross-Context Injection |
|
|
|
#### Example Subclasses (for BEF - Boundary Enforcement Failures) |
|
|
|
- **BEF-CPC**: Content Policy Circumvention |
|
- **BEF-CRB**: Capability Restriction Bypass |
|
- **BEF-ABV**: Authorization Boundary Violations |
|
|
|
#### Example Subclasses (for IEV - Information Extraction Vulnerabilities) |
|
|
|
- **IEV-TDE**: Training Data Extraction |
|
- **IEV-SIL**: System Instruction Leakage |
|
- **IEV-PAI**: Parameter Inference |
|
|
|
#### Example Subclasses (for CET - Classifier Evasion Techniques) |
|
|
|
- **CET-LOB**: Linguistic Obfuscation |
|
- **CET-CTM**: Context Manipulation |
|
- **CET-TBM**: Technical Bypass Methods |
|
|
|
#### Example Subclasses (for MVV - Multimodal Vulnerability Vectors) |
|
|
|
- **MVV-CMI**: Cross-Modal Injection |
|
- **MVV-MIC**: Modal Interpretation Conflicts |
|
- **MVV-MTV**: Modal Translation Vulnerabilities |
|
|
|
#### Example Subclasses (for TUV - Tool Use Vulnerabilities) |
|
|
|
- **TUV-TSM**: Tool Selection Manipulation |
|
- **TUV-PAI**: Parameter Injection |
|
- **TUV-FCH**: Function Call Hijacking |
|
|
|
#### Example Subclasses (for ACF - Authentication Control Failures) |
|
|
|
- **ACF-ICE**: Identity Confusion Exploitation |
|
- **ACF-PIE**: Permission Inheritance Exploitation |
|
- **ACF-SBV**: Session Boundary Violations |
|
|
|
#### Example Subclasses (for RSV - Response Synthesis Vulnerabilities) |
|
|
|
- **RSV-MET**: Metadata Manipulation |
|
- **RSV-CMH**: Content Moderation Hallucination |
|
- **RSV-USP**: Unsafe Synthesis Patterns |
|
|
|
### 3. Attack Vector |
|
|
|
The primary method or channel through which the vulnerability is exploited. |
|
|
|
#### Categories |
|
|
|
- **TXT**: Text-Based |
|
- **IMG**: Image-Based |
|
- **AUD**: Audio-Based |
|
- **COD**: Code-Based |
|
- **DOC**: Document-Based |
|
- **MUL**: Multi-Vector |
|
- **API**: API-Based |
|
- **TOL**: Tool-Based |
|
|
|
### 4. Impact Type |
|
|
|
The primary negative impact resulting from successful exploitation. |
|
|
|
#### Categories |
|
|
|
- **DIS**: Disclosure of Sensitive Information |
|
- **POL**: Policy Violation |
|
- **BYP**: Security Bypass |
|
- **MAN**: System Manipulation |
|
- **ACC**: Unauthorized Access |
|
- **DEG**: Service Degradation |
|
- **HAL**: Harmful Output Generation |
|
- **PRV**: Privacy Violation |
|
|
|
### 5. Exploitation Complexity |
|
|
|
The level of technical expertise required to successfully exploit the vulnerability. |
|
|
|
#### Categories |
|
|
|
- **ECL**: Low (simple, requires minimal expertise) |
|
- **ECM**: Medium (moderate complexity, requires some domain knowledge) |
|
- **ECH**: High (complex, requires specialized knowledge) |
|
- **ECX**: Very High (sophisticated, requires expert-level understanding) |
|
|
|
### 6. Remediation Complexity |
|
|
|
The estimated complexity of implementing an effective remediation. |
|
|
|
#### Categories |
|
|
|
- **RCL**: Low (simple fix, localized change) |
|
- **RCM**: Medium (moderate complexity, potential side effects) |
|
- **RCH**: High (complex, requires significant architectural changes) |
|
- **RCX**: Very High (extremely difficult, may require fundamental redesign) |
|
|
|
### 7. Discovery Method |
|
|
|
How the vulnerability was discovered. |
|
|
|
#### Categories |
|
|
|
- **AUT**: Automated Testing |
|
- **MAN**: Manual Testing |
|
- **HYB**: Hybrid Approach |
|
- **USR**: User Report |
|
- **RES**: Research Finding |
|
- **ANA**: Log Analysis |
|
- **INC**: Incident Response |
|
|
|
### 8. Status |
|
|
|
The current state of the vulnerability. |
|
|
|
#### Categories |
|
|
|
- **NEW**: Newly Identified |
|
- **CNF**: Confirmed |
|
- **REJ**: Rejected (not a valid vulnerability) |
|
- **MIT**: Mitigated (temporary solution) |
|
- **FIX**: Fixed (permanent solution) |
|
- **DUP**: Duplicate of existing vulnerability |
|
- **DEF**: Deferred (not prioritized for immediate fix) |
|
|
|
## Composite Classification |
|
|
|
Vulnerabilities are assigned a composite classification code combining the above dimensions: |
|
|
|
``` |
|
[Vulnerability Class]-[Subclass]:[Attack Vector]/[Impact Type]-[Exploitation Complexity][Remediation Complexity]-[Discovery Method].[Status] |
|
``` |
|
|
|
### Example Classifications |
|
|
|
- `PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF`: A confirmed direct prompt injection vulnerability, text-based, leading to policy violations, low exploitation complexity, medium remediation complexity, discovered through manual testing. |
|
|
|
- `IEV-SIL:COD/DIS-ECM-RCH-AUT.NEW`: A newly identified system instruction leakage vulnerability, code-based, leading to disclosure of sensitive information, medium exploitation complexity, high remediation complexity, discovered through automated testing. |
|
|
|
- `MVV-CMI:IMG/BYP-ECH-RCM-HYB.MIT`: A mitigated cross-modal injection vulnerability, image-based, leading to security bypass, high exploitation complexity, medium remediation complexity, discovered through a hybrid testing approach. |
|
|
|
## Classification Workflow |
|
|
|
### 1. Initial Classification |
|
|
|
When a potential vulnerability is first identified: |
|
|
|
1. Assign primary vulnerability class and subclass |
|
2. Document attack vector and impact type |
|
3. Note discovery method |
|
4. Set status to `NEW` |
|
5. Estimation of exploitation complexity may be preliminary |
|
|
|
### 2. Verification |
|
|
|
During the verification phase: |
|
|
|
1. Confirm vulnerability through reproduction |
|
2. Refine classification based on deeper understanding |
|
3. Update exploitation complexity based on reproduction experience |
|
4. Change status to `CNF` or `REJ` |
|
|
|
### 3. Analysis |
|
|
|
During detailed analysis: |
|
|
|
1. Assess remediation complexity |
|
2. Document dependencies and affected components |
|
3. Update classification with complete understanding |
|
4. Link to related vulnerabilities if applicable |
|
|
|
### 4. Remediation Tracking |
|
|
|
During the remediation process: |
|
|
|
1. Update status as appropriate |
|
2. Document mitigation or fix approaches |
|
3. Link to verification testing results |
|
|
|
## Taxonomic Evolution |
|
|
|
This classification system is designed to evolve over time as new vulnerability classes emerge. The process for extending the taxonomy includes: |
|
|
|
1. **Identification**: Recognition of a new vulnerability pattern that doesn't fit existing classes |
|
2. **Definition**: Clear description of the new vulnerability class or subclass |
|
3. **Consultation**: Review with security experts to validate the new category |
|
4. **Integration**: Addition to the formal taxonomy with appropriate documentation |
|
5. **Retroactive Analysis**: Review of existing vulnerabilities to identify any that should be reclassified |
|
|
|
## Usage Guidelines |
|
|
|
### For Testers |
|
|
|
- Assign preliminary classifications during testing |
|
- Document all observed behaviors clearly to enable accurate classification |
|
- Highlight unusual patterns that may indicate new vulnerability classes |
|
|
|
### For Security Analysts |
|
|
|
- Verify and refine classifications |
|
- Ensure consistency across similar vulnerabilities |
|
- Identify patterns and trends within vulnerability classes |
|
|
|
### For Developers |
|
|
|
- Use classification to understand vulnerability mechanisms |
|
- Reference similar vulnerabilities by class to inform remediation approaches |
|
- Track remediation effectiveness by vulnerability class |
|
|
|
## Reporting Standards |
|
|
|
All vulnerability reports should include: |
|
|
|
1. Full classification code |
|
2. Detailed description of the vulnerability |
|
3. Reproduction steps |
|
4. Example exploitation (and its success rate) |
|
5. Potential impact analysis |
|
6. Suggested remediation approaches |
|
|
|
## Conclusion |
|
|
|
This classification framework provides a standardized approach to categorizing LLM security vulnerabilities. By applying this framework consistently, the security community can develop a shared understanding of vulnerability patterns, track trends over time, and develop more effective remediation strategies. |
|
|
|
For examples of classified vulnerabilities, refer to the [vulnerability catalog](../research/vulnerabilities/catalog.md). |
|
|