# DeepFashion2 Dataset Integration
This document describes the integration of the DeepFashion2 dataset with the Vestiq fashion analysis system.
## Overview
DeepFashion2 is a comprehensive fashion dataset that provides:
- 491K diverse images of 13 popular clothing categories
- Bounding box annotations for fashion items
- Dense landmark and per-pixel mask annotations
- Commercial-consumer clothes correspondence
- Scale, occlusion, zoom-in, and viewpoint labels
## Integration Features
### 1. Dataset Loading and Processing
- **DeepFashion2Dataset**: PyTorch dataset class for loading images and annotations
- **Category Mapping**: Maps DeepFashion2 categories to yainage90 model categories
- **Data Transforms**: Standard preprocessing for fashion images
- **Batch Processing**: Efficient DataLoader implementation
### 2. Evaluation Framework
- **Detection Accuracy**: Evaluate fashion object detection performance
- **Feature Quality**: Assess feature extraction capabilities
- **Classification Metrics**: Precision, recall, F1-score, confusion matrix
- **Visualization**: Confusion matrix plots and performance charts
### 3. API Endpoints
- `/deepfashion2/status` - Check integration status and dataset availability
- `/deepfashion2/statistics` - Get dataset statistics and category distribution
- `/deepfashion2/evaluate` - Run evaluation using DeepFashion2 as benchmark
- `/deepfashion2/setup-instructions` - Get setup instructions for the dataset
## Category Mapping
DeepFashion2 uses 13 fine-grained categories. They map onto four of yainage90's 7 categories (top, outer, bottom, dress):
| DeepFashion2 Category | yainage90 Category |
|----------------------|-------------------|
| short_sleeved_shirt | top |
| long_sleeved_shirt | top |
| short_sleeved_outwear| outer |
| long_sleeved_outwear | outer |
| vest | top |
| sling | top |
| shorts | bottom |
| trousers | bottom |
| skirt | bottom |
| short_sleeved_dress | dress |
| long_sleeved_dress | dress |
| vest_dress | dress |
| sling_dress | dress |
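The table can be expressed as a plain lookup. The sketch below transcribes it into a dictionary; the names `DF2_TO_YAINAGE90` and `map_category` are illustrative assumptions and may not match the identifiers used in the Vestiq code.

```python
# Mapping transcribed from the table above.
# DF2_TO_YAINAGE90 and map_category are hypothetical names for illustration.
DF2_TO_YAINAGE90 = {
    "short_sleeved_shirt": "top",
    "long_sleeved_shirt": "top",
    "short_sleeved_outwear": "outer",
    "long_sleeved_outwear": "outer",
    "vest": "top",
    "sling": "top",
    "shorts": "bottom",
    "trousers": "bottom",
    "skirt": "bottom",
    "short_sleeved_dress": "dress",
    "long_sleeved_dress": "dress",
    "vest_dress": "dress",
    "sling_dress": "dress",
}


def map_category(df2_name: str) -> str:
    """Return the yainage90 category for a DeepFashion2 category name."""
    try:
        return DF2_TO_YAINAGE90[df2_name]
    except KeyError:
        raise ValueError(f"Unknown DeepFashion2 category: {df2_name!r}") from None
```

Raising on unknown names, rather than returning a default, makes a typo in a category label fail loudly during evaluation.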
## Setup Instructions
### 1. Download the Dataset
The DeepFashion2 dataset requires manual download due to licensing requirements:
1. Visit the official repository: https://github.com/switchablenorms/DeepFashion2
2. Follow the dataset download instructions
3. Register and download the dataset files
### 2. Dataset Structure
Extract the dataset to `./data/deepfashion2/` with the following structure:
```
deepfashion2/
├── train/
│   ├── image/    # Training images
│   └── annos/    # Training annotations (JSON)
├── validation/
│   ├── image/    # Validation images
│   └── annos/    # Validation annotations (JSON)
└── test/
    ├── image/    # Test images
    └── annos/    # Test annotations (JSON)
```
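To confirm the extraction matches this layout before starting the server, a small standalone check can help. This is a sketch using only the standard library; `missing_dirs` is a hypothetical helper and not part of the integration.

```python
from pathlib import Path

# Directories expected by the layout above.
EXPECTED_SPLITS = ("train", "validation", "test")
EXPECTED_SUBDIRS = ("image", "annos")


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [
        f"{split}/{sub}"
        for split in EXPECTED_SPLITS
        for sub in EXPECTED_SUBDIRS
        if not (root / split / sub).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_dirs("./data/deepfashion2")
    print("Layout OK" if not missing else f"Missing: {missing}")
```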
### 3. Install Dependencies
Install additional dependencies for evaluation:
```bash
pip install scikit-learn matplotlib seaborn
```
### 4. Verify Setup
Check the integration status:
```bash
curl http://localhost:7861/deepfashion2/status
```
## Usage Examples
### 1. Basic Dataset Loading
```python
from deepfashion2_utils import DeepFashion2Config, DeepFashion2Dataset

config = DeepFashion2Config()
dataset = DeepFashion2Dataset(
    root_dir=config.dataset_root,
    split='validation',
    load_annotations=True,
)

# Get a sample
sample = dataset[0]
print(f"Image: {sample['image_path']}")
print(f"Categories: {dataset.get_categories_in_image(sample['annotations'])}")
```
### 2. Running Evaluation
```python
from deepfashion2_evaluation import run_full_evaluation
from fast import analyzer
# Run evaluation with 100 samples
report_path = run_full_evaluation(analyzer, max_samples=100)
print(f"Evaluation report saved to: {report_path}")
```
### 3. API Usage
```bash
# Check status
curl -X GET "http://localhost:7861/deepfashion2/status"
# Get dataset statistics
curl -X GET "http://localhost:7861/deepfashion2/statistics"
# Run evaluation
curl -X POST "http://localhost:7861/deepfashion2/evaluate?max_samples=50"
# Get setup instructions
curl -X GET "http://localhost:7861/deepfashion2/setup-instructions"
```
## Evaluation Metrics
### Detection Accuracy
- **Category-level accuracy**: How well the model detects clothing categories
- **Detection score**: IoU-like metric for category overlap
- **Confusion matrix**: Detailed breakdown of predictions vs ground truth
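One plausible reading of an "IoU-like metric for category overlap" is the Jaccard index between the set of predicted categories and the set of ground-truth categories for an image. The sketch below assumes that reading; the evaluation module's actual formula may differ, and `category_overlap` is a hypothetical name.

```python
def category_overlap(predicted, ground_truth):
    """Jaccard index between predicted and ground-truth category sets."""
    pred, truth = set(predicted), set(ground_truth)
    union = pred | truth
    if not union:
        # Nothing predicted and nothing to find: treat as a perfect match.
        return 1.0
    return len(pred & truth) / len(union)


# One shared category ("top") out of three distinct categories overall.
score = category_overlap(["top", "bottom"], ["top", "dress"])
print(f"{score:.3f}")
```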
### Feature Quality
- **Feature dimension**: Dimensionality of extracted features
- **Intra-category similarity**: How similar features are within the same category
- **Inter-category distance**: How well features separate different categories
- **Feature separability**: Overall quality metric for feature discrimination
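To make these four quantities concrete, here is a minimal sketch that computes them from a list of feature vectors and their category labels. It assumes cosine similarity as the underlying measure and defines separability as intra-category similarity minus inter-category similarity; both choices are assumptions for illustration and may not match the evaluation module.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def feature_quality(features, labels):
    """Summarise feature quality over all pairs of samples."""
    intra, inter = [], []
    for (f1, l1), (f2, l2) in combinations(zip(features, labels), 2):
        (intra if l1 == l2 else inter).append(cosine(f1, f2))
    # Empty pair lists fall back to a similarity of 0.0.
    intra_sim = sum(intra) / len(intra) if intra else 0.0
    inter_sim = sum(inter) / len(inter) if inter else 0.0
    return {
        "feature_dimension": len(features[0]) if features else 0,
        "intra_category_similarity": intra_sim,
        "inter_category_distance": 1.0 - inter_sim,
        "feature_separability": intra_sim - inter_sim,
    }
```

The pairwise loop is quadratic in the number of samples, so it suits the small subsets used for quick checks rather than the full validation split.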
## Configuration Options
### DeepFashion2Config
```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DeepFashion2Config:
    dataset_root: str = "./data/deepfashion2"
    categories: List[str] = None  # Auto-populated with 13 categories
    image_size: Tuple[int, int] = (224, 224)
    batch_size: int = 32
    num_workers: int = 4
```
### Customization
You can customize the configuration for your specific needs:
```python
config = DeepFashion2Config(
    dataset_root="/path/to/your/deepfashion2",
    image_size=(256, 256),
    batch_size=16,
)
```
## Performance Considerations
### Memory Usage
- The dataset is large (~15GB); make sure you have enough disk space before extracting it
- Choose a batch size that fits in the available GPU memory
- Increase `num_workers` to load data in parallel, and lower it if host memory becomes a constraint
### CPU Optimization
- The system automatically detects CPU vs GPU and optimizes accordingly
- CPU inference uses float32 precision and limited threads
- GPU inference uses float16 precision for better performance
### Evaluation Speed
- Limit `max_samples` for faster evaluation during development
- Full evaluation on the entire validation set may take significant time
- Consider running evaluations on a subset for quick feedback
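If you want a subset that stays the same across runs, one option is to draw the sample indices with a fixed seed. The sketch below is an assumption about how such a subset could be chosen; how `max_samples` selects samples inside the evaluation module is not specified here, and `sample_indices` is a hypothetical helper.

```python
import random


def sample_indices(dataset_size, max_samples, seed=0):
    """Pick up to `max_samples` distinct indices, reproducibly for a given seed."""
    if max_samples is None or max_samples >= dataset_size:
        return list(range(dataset_size))
    rng = random.Random(seed)  # local generator, leaves global random state alone
    return sorted(rng.sample(range(dataset_size), max_samples))


indices = sample_indices(dataset_size=1000, max_samples=50)
print(len(indices))
```

The resulting indices can be passed to `torch.utils.data.Subset` to wrap an existing dataset without copying it.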
## Troubleshooting
### Common Issues
1. **Dataset not found**: Ensure the dataset is extracted to the correct path
2. **Permission errors**: Check file permissions for the dataset directory
3. **Memory errors**: Reduce batch size or number of workers
4. **Import errors**: Install missing dependencies (scikit-learn, matplotlib, seaborn)
### Debug Mode
Enable debug logging to troubleshoot issues:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Future Enhancements
### Planned Features
- **Training Pipeline**: Fine-tune models on DeepFashion2 data
- **Advanced Metrics**: Add more sophisticated evaluation metrics
- **Visualization Tools**: Enhanced plotting and analysis tools
- **Benchmark Comparisons**: Compare against other fashion datasets
### Contributing
To contribute to the DeepFashion2 integration:
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request
## References
- [DeepFashion2 Paper](https://arxiv.org/abs/1901.07973)
- [DeepFashion2 Repository](https://github.com/switchablenorms/DeepFashion2)
- [yainage90 Models](https://huggingface.co/yainage90)
## License
This integration follows the same license as the main Vestiq project. The DeepFashion2 dataset has its own licensing terms that must be respected.