# Vulnerability Classification Framework
This document provides a standardized system for classifying vulnerabilities identified during LLM security testing. The framework enables consistent categorization, facilitates trend analysis, and supports remediation prioritization.
## Classification Dimensions
Vulnerabilities are classified across multiple dimensions to capture their full nature and impact.
### 1. Vulnerability Class
The primary categorization based on the fundamental mechanism of the vulnerability.
#### Primary Classes
- **PJV**: Prompt Injection Vulnerabilities
- **BEF**: Boundary Enforcement Failures
- **IEV**: Information Extraction Vulnerabilities
- **CET**: Classifier Evasion Techniques
- **MVV**: Multimodal Vulnerability Vectors
- **TUV**: Tool Use Vulnerabilities
- **ACF**: Authentication Control Failures
- **RSV**: Response Synthesis Vulnerabilities
### 2. Subclass
Specific subcategory within the primary vulnerability class.
#### Example Subclasses (for PJV - Prompt Injection Vulnerabilities)
- **PJV-DIR**: Direct Instruction Injection
- **PJV-IND**: Indirect Instruction Manipulation
- **PJV-CRX**: Cross-Context Injection
#### Example Subclasses (for BEF - Boundary Enforcement Failures)
- **BEF-CPC**: Content Policy Circumvention
- **BEF-CRB**: Capability Restriction Bypass
- **BEF-ABV**: Authorization Boundary Violations
#### Example Subclasses (for IEV - Information Extraction Vulnerabilities)
- **IEV-TDE**: Training Data Extraction
- **IEV-SIL**: System Instruction Leakage
- **IEV-PAI**: Parameter Inference
#### Example Subclasses (for CET - Classifier Evasion Techniques)
- **CET-LOB**: Linguistic Obfuscation
- **CET-CTM**: Context Manipulation
- **CET-TBM**: Technical Bypass Methods
#### Example Subclasses (for MVV - Multimodal Vulnerability Vectors)
- **MVV-CMI**: Cross-Modal Injection
- **MVV-MIC**: Modal Interpretation Conflicts
- **MVV-MTV**: Modal Translation Vulnerabilities
#### Example Subclasses (for TUV - Tool Use Vulnerabilities)
- **TUV-TSM**: Tool Selection Manipulation
- **TUV-PAI**: Parameter Injection
- **TUV-FCH**: Function Call Hijacking
#### Example Subclasses (for ACF - Authentication Control Failures)
- **ACF-ICE**: Identity Confusion Exploitation
- **ACF-PIE**: Permission Inheritance Exploitation
- **ACF-SBV**: Session Boundary Violations
#### Example Subclasses (for RSV - Response Synthesis Vulnerabilities)
- **RSV-MET**: Metadata Manipulation
- **RSV-CMH**: Content Moderation Hallucination
- **RSV-USP**: Unsafe Synthesis Patterns
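The class and subclass lists above can be held in a nested lookup table. The following is a minimal sketch, not part of the framework itself: the names `SUBCLASSES` and `describe` are hypothetical, and the table contains only the example subclasses listed in this document. It also shows why a subclass suffix must always be read together with its class prefix: `PAI` means different things under `IEV` and `TUV`.

```python
# Example subclasses from this document, keyed by primary class.
# SUBCLASSES and describe() are illustrative names, not part of the framework.
SUBCLASSES = {
    "PJV": {"DIR": "Direct Instruction Injection",
            "IND": "Indirect Instruction Manipulation",
            "CRX": "Cross-Context Injection"},
    "BEF": {"CPC": "Content Policy Circumvention",
            "CRB": "Capability Restriction Bypass",
            "ABV": "Authorization Boundary Violations"},
    "IEV": {"TDE": "Training Data Extraction",
            "SIL": "System Instruction Leakage",
            "PAI": "Parameter Inference"},
    "CET": {"LOB": "Linguistic Obfuscation",
            "CTM": "Context Manipulation",
            "TBM": "Technical Bypass Methods"},
    "MVV": {"CMI": "Cross-Modal Injection",
            "MIC": "Modal Interpretation Conflicts",
            "MTV": "Modal Translation Vulnerabilities"},
    "TUV": {"TSM": "Tool Selection Manipulation",
            "PAI": "Parameter Injection",
            "FCH": "Function Call Hijacking"},
    "ACF": {"ICE": "Identity Confusion Exploitation",
            "PIE": "Permission Inheritance Exploitation",
            "SBV": "Session Boundary Violations"},
    "RSV": {"MET": "Metadata Manipulation",
            "CMH": "Content Moderation Hallucination",
            "USP": "Unsafe Synthesis Patterns"},
}


def describe(subclass_code: str) -> str:
    """Return the name for a code such as 'PJV-DIR'; raises KeyError if unknown."""
    vuln_class, subclass = subclass_code.split("-")
    return SUBCLASSES[vuln_class][subclass]


print(describe("IEV-PAI"))  # Parameter Inference
print(describe("TUV-PAI"))  # Parameter Injection
```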
### 3. Attack Vector
The primary method or channel through which the vulnerability is exploited.
#### Categories
- **TXT**: Text-Based
- **IMG**: Image-Based
- **AUD**: Audio-Based
- **COD**: Code-Based
- **DOC**: Document-Based
- **MUL**: Multi-Vector
- **API**: API-Based
- **TOL**: Tool-Based
### 4. Impact Type
The primary negative impact resulting from successful exploitation.
#### Categories
- **DIS**: Disclosure of Sensitive Information
- **POL**: Policy Violation
- **BYP**: Security Bypass
- **MAN**: System Manipulation
- **ACC**: Unauthorized Access
- **DEG**: Service Degradation
- **HAL**: Harmful Output Generation
- **PRV**: Privacy Violation
### 5. Exploitation Complexity
The level of technical expertise required to successfully exploit the vulnerability.
#### Categories
- **ECL**: Low (simple, requires minimal expertise)
- **ECM**: Medium (moderate complexity, requires some domain knowledge)
- **ECH**: High (complex, requires specialized knowledge)
- **ECX**: Very High (sophisticated, requires expert-level understanding)
### 6. Remediation Complexity
The estimated complexity of implementing an effective remediation.
#### Categories
- **RCL**: Low (simple fix, localized change)
- **RCM**: Medium (moderate complexity, potential side effects)
- **RCH**: High (complex, requires significant architectural changes)
- **RCX**: Very High (extremely difficult, may require fundamental redesign)
### 7. Discovery Method
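How the vulnerability was discovered. Note that the code `MAN` is also used under Impact Type (System Manipulation); the two are distinguished by their position in the composite classification code.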
#### Categories
- **AUT**: Automated Testing
- **MAN**: Manual Testing
- **HYB**: Hybrid Approach
- **USR**: User Report
- **RES**: Research Finding
- **ANA**: Log Analysis
- **INC**: Incident Response
### 8. Status
The current state of the vulnerability.
#### Categories
- **NEW**: Newly Identified
- **CNF**: Confirmed
- **REJ**: Rejected (not a valid vulnerability)
- **MIT**: Mitigated (temporary solution)
- **FIX**: Fixed (permanent solution)
- **DUP**: Duplicate of existing vulnerability
- **DEF**: Deferred (not prioritized for immediate fix)
## Composite Classification
Vulnerabilities are assigned a composite classification code combining the above dimensions:
```
[Vulnerability Class]-[Subclass]:[Attack Vector]/[Impact Type]-[Exploitation Complexity]-[Remediation Complexity]-[Discovery Method].[Status]
```
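As an illustration, a composite code could be assembled from its eight fields as sketched below. The `Classification` dataclass and its field names are hypothetical; the separators follow the example classifications in this document, which place a hyphen between the exploitation and remediation complexity codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """One vulnerability described along the eight dimensions (hypothetical type)."""
    vuln_class: str      # e.g. "PJV"
    subclass: str        # three-letter suffix only, e.g. "DIR"
    attack_vector: str   # e.g. "TXT"
    impact_type: str     # e.g. "POL"
    exploitation: str    # e.g. "ECL"
    remediation: str     # e.g. "RCM"
    discovery: str       # e.g. "MAN"
    status: str          # e.g. "CNF"

    def code(self) -> str:
        """Join the fields with the separators of the composite format."""
        return (
            f"{self.vuln_class}-{self.subclass}:"
            f"{self.attack_vector}/{self.impact_type}-"
            f"{self.exploitation}-{self.remediation}-"
            f"{self.discovery}.{self.status}"
        )


example = Classification("PJV", "DIR", "TXT", "POL", "ECL", "RCM", "MAN", "CNF")
print(example.code())  # PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF
```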
### Example Classifications
- `PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF`: A confirmed direct prompt injection vulnerability, text-based, leading to policy violations, low exploitation complexity, medium remediation complexity, discovered through manual testing.
- `IEV-SIL:COD/DIS-ECM-RCH-AUT.NEW`: A newly identified system instruction leakage vulnerability, code-based, leading to disclosure of sensitive information, medium exploitation complexity, high remediation complexity, discovered through automated testing.
- `MVV-CMI:IMG/BYP-ECH-RCM-HYB.MIT`: A mitigated cross-modal injection vulnerability, image-based, leading to security bypass, high exploitation complexity, medium remediation complexity, discovered through a hybrid testing approach.
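Codes like the examples above can be split back into their fields with a regular expression. This is a sketch under stated assumptions: every field is three uppercase letters, the complexity fields start with `EC` and `RC`, and the name `parse_classification` is hypothetical. It checks only the shape of the code, not whether each value belongs to the taxonomy.

```python
import re

# Shape of a composite code, following the example classifications.
CODE_PATTERN = re.compile(
    r"^(?P<vuln_class>[A-Z]{3})-(?P<subclass>[A-Z]{3})"
    r":(?P<attack_vector>[A-Z]{3})/(?P<impact_type>[A-Z]{3})"
    r"-(?P<exploitation>EC[LMHX])-(?P<remediation>RC[LMHX])"
    r"-(?P<discovery>[A-Z]{3})\.(?P<status>[A-Z]{3})$"
)


def parse_classification(code: str) -> dict[str, str]:
    """Split a composite code into named fields; raise ValueError if malformed."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a valid composite classification code: {code!r}")
    return match.groupdict()


parsed = parse_classification("IEV-SIL:COD/DIS-ECM-RCH-AUT.NEW")
print(parsed["subclass"], parsed["status"])  # SIL NEW
```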
## Classification Workflow
### 1. Initial Classification
When a potential vulnerability is first identified:
1. Assign primary vulnerability class and subclass
2. Document attack vector and impact type
3. Note discovery method
4. Set status to `NEW`
5. Record a preliminary estimate of exploitation complexity
### 2. Verification
During the verification phase:
1. Confirm vulnerability through reproduction
2. Refine classification based on deeper understanding
3. Update exploitation complexity based on reproduction experience
4. Change status to `CNF` or `REJ`
### 3. Analysis
During detailed analysis:
1. Assess remediation complexity
2. Document dependencies and affected components
3. Update classification with complete understanding
4. Link to related vulnerabilities if applicable
### 4. Remediation Tracking
During the remediation process:
1. Update status as appropriate
2. Document mitigation or fix approaches
3. Link to verification testing results
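The status changes implied by this workflow could be enforced with a small transition table. The sketch below is an assumption, not a rule stated by the framework: the document explicitly names only the move from `NEW` to `CNF` or `REJ`, and the remaining transitions, along with the names `ALLOWED_TRANSITIONS` and `change_status`, are illustrative.

```python
# Assumed transition table; only NEW -> CNF/REJ is stated in the workflow.
ALLOWED_TRANSITIONS = {
    "NEW": {"CNF", "REJ", "DUP"},
    "CNF": {"MIT", "FIX", "DEF", "DUP"},
    "DEF": {"MIT", "FIX"},
    "MIT": {"FIX"},
    "REJ": set(),
    "FIX": set(),
    "DUP": set(),
}


def change_status(code: str, new_status: str) -> str:
    """Return the composite code with its status suffix replaced, if allowed."""
    prefix, current = code.rsplit(".", 1)
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current} to {new_status}")
    return f"{prefix}.{new_status}"


confirmed = change_status("PJV-DIR:TXT/POL-ECL-RCM-MAN.NEW", "CNF")
print(confirmed)  # PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF
```

Teams that allow reopening fixed or rejected findings would need to extend the table accordingly.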
## Taxonomic Evolution
This classification system is designed to evolve over time as new vulnerability classes emerge. The process for extending the taxonomy includes:
1. **Identification**: Recognition of a new vulnerability pattern that doesn't fit existing classes
2. **Definition**: Clear description of the new vulnerability class or subclass
3. **Consultation**: Review with security experts to validate the new category
4. **Integration**: Addition to the formal taxonomy with appropriate documentation
5. **Retroactive Analysis**: Review of existing vulnerabilities to identify any that should be reclassified
## Usage Guidelines
### For Testers
- Assign preliminary classifications during testing
- Document all observed behaviors clearly to enable accurate classification
- Highlight unusual patterns that may indicate new vulnerability classes
### For Security Analysts
- Verify and refine classifications
- Ensure consistency across similar vulnerabilities
- Identify patterns and trends within vulnerability classes
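Because the class is the leading field and the status is the trailing field of every composite code, simple trend counts need no full parser. A minimal sketch, assuming the codes are available as a list of strings; the function names are hypothetical and the last entry in `findings` is an invented illustration rather than a catalogued vulnerability.

```python
from collections import Counter


def count_by_class(codes):
    """Count findings per primary class (the text before the first hyphen)."""
    return Counter(code.split("-", 1)[0] for code in codes)


def count_by_status(codes):
    """Count findings per status (the text after the final dot)."""
    return Counter(code.rsplit(".", 1)[-1] for code in codes)


findings = [
    "PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF",
    "IEV-SIL:COD/DIS-ECM-RCH-AUT.NEW",
    "MVV-CMI:IMG/BYP-ECH-RCM-HYB.MIT",
    "PJV-IND:DOC/MAN-ECM-RCM-AUT.CNF",  # hypothetical finding for illustration
]

print(count_by_class(findings).most_common(1))  # [('PJV', 2)]
print(count_by_status(findings)["CNF"])         # 2
```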
### For Developers
- Use classification to understand vulnerability mechanisms
- Reference similar vulnerabilities by class to inform remediation approaches
- Track remediation effectiveness by vulnerability class
## Reporting Standards
All vulnerability reports should include:
1. Full classification code
2. Detailed description of the vulnerability
3. Reproduction steps
4. Example exploitation (and its success rate)
5. Potential impact analysis
6. Suggested remediation approaches
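A report could be checked against this list before submission. The sketch below assumes reports are plain dictionaries; the key names and the `missing_fields` helper are hypothetical and simply mirror the six items above.

```python
# Hypothetical key names, one per required item in the list above.
REQUIRED_FIELDS = (
    "classification_code",
    "description",
    "reproduction_steps",
    "example_exploitation",   # should also state the success rate
    "impact_analysis",
    "suggested_remediation",
)


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def missing_fields(report: dict) -> list[str]:
    """Return the required fields that are absent or blank, in list order."""
    return [name for name in REQUIRED_FIELDS if is_blank(report.get(name))]


draft = {
    "classification_code": "PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF",
    "description": "Direct instruction injection via user message.",
    "reproduction_steps": [],
}
print(missing_fields(draft))
```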
## Conclusion
This classification framework provides a standardized approach to categorizing LLM security vulnerabilities. By applying it consistently, the security community can build a shared understanding of vulnerability patterns, track trends over time, and develop more effective remediation strategies.
For examples of classified vulnerabilities, refer to the [vulnerability catalog](../research/vulnerabilities/catalog.md).