---
license: cc-by-sa-4.0
pipeline_tag: feature-extraction
library_name: timm
language: []
base_model: timm/convnext_base.fb_in22k_ft_in1k
embedding_dimension: 512
training_steps: 108
model_type: trendyol_arcface
tags:
- computer-vision
- image-feature-extraction
- arcface
- product-similarity
- e-commerce
- image-embeddings
- convnext
---
# E-Commerce Product Image Encoder
_ConvNeXt-based image embedding model for product unification and visual search on the Trendyol e-commerce catalogue._
## Model Details
- **Architecture**: ConvNeXt-Base (224px) backbone + 512-dim projection head with BatchNorm and ArcFace loss
- **Objective**: ArcFace with additive angular margin (scale=128, margin=0.25) for improved product similarity learning
- **Training Data**: Large-scale Trendyol product image dataset covering diverse e-commerce categories
- **Hardware**: Multi-GPU training with PyTorch Lightning (training epoch: 5, global steps: 108)
- **Framework**: PyTorch Lightning 1.8.1 with mixed-precision training
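
The ArcFace objective listed above can be illustrated with a minimal sketch. It is not the training code used for this model: the class name `ArcFaceHead`, the number of classes, and the weight initialisation are assumptions made for illustration, and only `scale=128` and `margin=0.25` come from this card. For brevity it also omits the handling that production implementations add for angles where `theta + margin` exceeds π.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceHead(nn.Module):
    """Hypothetical ArcFace classification head, used only during training."""

    def __init__(self, embedding_dim=512, num_classes=1000, scale=128.0, margin=0.25):
        super().__init__()
        self.scale = scale
        self.margin = margin
        # One learnable class centre per product identity (assumed setup).
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        # Cosine of the angle between each embedding and each class centre.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        cosine = cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7)
        theta = torch.acos(cosine)
        # Add the angular margin to the target-class angle only.
        is_target = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        theta = torch.where(is_target, theta + self.margin, theta)
        # Scaled logits, ready for a standard cross-entropy loss.
        return self.scale * torch.cos(theta)


head = ArcFaceHead(num_classes=10)
embeddings = torch.randn(4, 512)        # stand-in for encoder outputs
labels = torch.tensor([0, 3, 3, 7])     # stand-in product identities
logits = head(embeddings, labels)
loss = F.cross_entropy(logits, labels)
```

The head is needed only to compute the loss; at inference time the encoder's normalized 512-dimensional output is used directly.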
## Intended Use
- **Primary** – Generate embeddings for duplicate product detection ("unification"), near-duplicate search, and product similarity ranking in e-commerce applications
- **Secondary** – Feature extractor for image-based product recommendation systems and visual search
- **Downstream Tasks** – Product clustering, visual search, duplicate detection, and content-based product recommendation
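
As a rough illustration of the search and duplicate-detection use cases above, the sketch below ranks catalogue items by cosine similarity and flags pairs above a threshold. The function names `top_k_similar` and `find_duplicates`, the synthetic embeddings, and the `0.9` threshold are all assumptions for the example; a suitable threshold has to be tuned on real data.

```python
import torch
import torch.nn.functional as F


def top_k_similar(query, catalogue, k=3):
    """Return (scores, indices) of the k catalogue rows most similar to each query row.

    Both inputs are expected to be L2-normalized, so the dot product is the cosine similarity.
    """
    scores = query @ catalogue.T
    return torch.topk(scores, k=min(k, catalogue.size(0)), dim=1)


def find_duplicates(embeddings, threshold=0.9):
    """Return index pairs (i, j) with i < j whose cosine similarity is at least `threshold`."""
    sim = embeddings @ embeddings.T
    # Keep each unordered pair once and drop self-matches on the diagonal.
    upper = torch.triu(torch.ones_like(sim, dtype=torch.bool), diagonal=1)
    pairs = torch.nonzero((sim >= threshold) & upper)
    return [tuple(pair) for pair in pairs.tolist()]


# Synthetic stand-ins for model outputs: 5 items, 512 dimensions each.
torch.manual_seed(0)
catalogue = F.normalize(torch.randn(5, 512), dim=1)
catalogue[4] = catalogue[1]  # plant an exact duplicate of item 1

scores, indices = top_k_similar(catalogue[1:2], catalogue, k=2)
duplicates = find_duplicates(catalogue, threshold=0.9)
```

The full pairwise similarity matrix grows quadratically with catalogue size, so a large catalogue would typically need an approximate nearest-neighbour index instead.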
## Usage
Complete example to load the model and generate embeddings:
```python
import io
import json

import requests
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
from safetensors.torch import load_file

# 1. Define the model class
class TYArcFaceModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        # Backbone without a classification head (num_classes=0)
        self.backbone = timm.create_model(
            config['backbone_name'],
            pretrained=False,
            num_classes=0
        )
        self.bn1 = nn.BatchNorm2d(config['backbone_features'])
        # Projects the flattened feature map to the embedding dimension
        self.fc11 = nn.Linear(
            config['backbone_features'] * config['hidden_size'],
            config['embedding_dim']
        )
        self.bn11 = nn.BatchNorm1d(config['embedding_dim'])

    def forward(self, x):
        features = self.backbone.forward_features(x)  # (B, C, H, W) feature map
        features = self.bn1(features)
        features = features.flatten(start_dim=1)
        features = self.fc11(features)
        features = self.bn11(features)
        features = F.normalize(features, p=2, dim=1)  # L2-normalize the embeddings
        return features

# 2. Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load configuration and weights
with open('config.json') as f:
    config = json.load(f)
model = TYArcFaceModel(config)
state_dict = load_file('model.safetensors')

# Filter to only load compatible weights
model_keys = set(model.state_dict().keys())
filtered_state_dict = {k: v for k, v in state_dict.items() if k in model_keys}
result = model.load_state_dict(filtered_state_dict, strict=False)
if result.missing_keys:
    # strict=False hides missing weights, so report them explicitly
    print(f"Warning: {len(result.missing_keys)} model weights were not found in the checkpoint")
model.to(device)
model.eval()
print("✅ Model loaded successfully!")
print(f"📊 Ready to generate {config['embedding_dim']}-dimensional embeddings")

# 3. Define preprocessing transforms
transform = transforms.Compose([
    transforms.Resize((config['input_size'], config['input_size'])),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=config['normalization']['mean'],
        std=config['normalization']['std']
    )
])

# 4. Process an image and generate embeddings
def get_embeddings(image_path_or_url):
    """Get embeddings for a single image given a local path or an HTTP(S) URL."""
    # Load image
    if image_path_or_url.startswith('http'):
        response = requests.get(image_path_or_url, timeout=10)
        response.raise_for_status()  # fail early on HTTP errors
        image = Image.open(io.BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(image_path_or_url).convert('RGB')

    # Preprocess
    input_tensor = transform(image).unsqueeze(0).to(device)

    # Generate embeddings
    with torch.no_grad():
        embeddings = model(input_tensor)
    return embeddings

# 5. Example usage
image_url = "https://example.com/product_image.jpg"  # Replace with your image
embeddings = get_embeddings(image_url)
print(f"Embedding shape: {embeddings.shape}")  # torch.Size([1, 512])

# 6. Compute similarity between two products
def compute_similarity(embedding1, embedding2):
    """Compute cosine similarity between two embeddings"""
    return F.cosine_similarity(embedding1, embedding2, dim=1)

# Example: Compare two products
# embedding2 = get_embeddings("path/to/another/image.jpg")
# similarity_score = compute_similarity(embeddings, embedding2)
# print(f"Product similarity: {similarity_score.item():.4f}")
```
## Model Performance
The model was trained with ArcFace loss, which provides several advantages for product similarity tasks:
- **Improved Discriminative Power**: ArcFace adds angular margin in the feature space, creating better separation between different products
- **Normalized Embeddings**: All output embeddings are L2-normalized, making cosine similarity computation efficient
- **Scale Robustness**: The learned representations are robust to scale variations in product images
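
The efficiency point about normalized embeddings follows from a short identity. For two L2-normalized embeddings $a$ and $b$ (so $\|a\|_2 = \|b\|_2 = 1$), cosine similarity reduces to a dot product, and squared Euclidean distance is a decreasing function of it:

$$
\cos(a, b) = \frac{a^\top b}{\|a\|_2 \, \|b\|_2} = a^\top b,
\qquad
\|a - b\|_2^2 = \|a\|_2^2 + \|b\|_2^2 - 2\, a^\top b = 2 - 2\, a^\top b .
$$

As a consequence, ranking neighbours by highest cosine similarity, highest inner product, or lowest Euclidean distance gives the same order, so any of the three metrics can be used in a nearest-neighbour index.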
### Training Configuration
- **Backbone**: ConvNeXt-Base pretrained on ImageNet-22k and fine-tuned on ImageNet-1k
- **Embedding Dimension**: 512
- **ArcFace Scale**: 128
- **ArcFace Margin**: 0.25
- **Input Resolution**: 224×224
- **Normalization**: ImageNet statistics
- **Training Framework**: PyTorch Lightning 1.8.1
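
The shape arithmetic behind this configuration can be checked with a small stand-in for the projection head. The sketch assumes the standard ConvNeXt-Base layout (1024 output channels at stride 32, hence a 7×7 feature map for a 224×224 input); the constant names are hypothetical and the actual values in `config.json` may differ. A random tensor replaces the backbone output so that the example runs without downloading weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed values, not read from config.json.
BACKBONE_FEATURES = 1024                  # ConvNeXt-Base final channel count
SPATIAL_POSITIONS = (224 // 32) ** 2      # 7 x 7 = 49 positions at stride 32
EMBEDDING_DIM = 512

# Same layer order as the projection head in the usage example.
head = nn.Sequential(
    nn.BatchNorm2d(BACKBONE_FEATURES),
    nn.Flatten(start_dim=1),
    nn.Linear(BACKBONE_FEATURES * SPATIAL_POSITIONS, EMBEDDING_DIM),
    nn.BatchNorm1d(EMBEDDING_DIM),
).eval()

# Random stand-in for the backbone feature map of a batch of two images.
feature_map = torch.randn(2, BACKBONE_FEATURES, 7, 7)
with torch.no_grad():
    embeddings = F.normalize(head(feature_map), p=2, dim=1)

print(embeddings.shape)        # batch of 512-dimensional vectors
print(embeddings.norm(dim=1))  # each row has unit L2 norm
```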
## Limitations
- **Domain Specificity**: Optimized for e-commerce product images; may not generalize well to other image domains
- **Image Quality**: Performance may degrade on low-quality, heavily compressed, or significantly distorted images
- **Category Bias**: Performance may vary across different product categories based on training data distribution
- **Scale Dependency**: Input images should be resized to 224×224 for optimal performance
## Bias Analysis
- **Dataset Bias**: The model's embeddings may reflect biases present in the e-commerce training dataset
- **Product Category Imbalance**: Some product categories may be over-represented in the training data
- **Brand and Style Bias**: The model may learn to encode brand-specific or style-specific features that could affect similarity judgments
## Environmental Impact
- **Training Hardware**: Multi-GPU setup with PyTorch Lightning
- **Training Time**: 5 epochs with 108 global steps
- **Energy Consumption**: Carbon footprint is estimated to be moderate, given the relatively short training duration
## Ethical Considerations
- **Commercial Use**: Designed for e-commerce applications; consider potential impacts on market competition
- **Privacy**: Ensure compliance with data protection regulations when processing product images
- **Fairness**: Monitor for biased similarity judgments across different product categories or brands
## Citation
```bibtex
@misc{trendyol2025convnextarcface,
title={E-Commerce Product Image Encoder: High-Fidelity Image Embeddings for E-commerce Product Unification},
author={Trendyol Data Science Team},
year={2025},
howpublished={\url{https://huggingface.co/Trendyol/e-commerce-product-image-encoder}}
}
```
## Model Card Authors
- Trendyol Data Science Team
- Model trained using the TYArcFace architecture with ConvNeXt backbone
## License
This model is released by Trendyol as a source-available, non-open-source model.
### You are allowed to:
- View, download, and evaluate the model weights.
- Use the model for non-commercial research and internal testing.
- Use the model or its derivatives for commercial purposes, provided that:
  - You cite Trendyol as the original model creator.
  - You notify Trendyol in advance via [email protected] or other designated contact.
### You are not allowed to:
- Redistribute or host the model or its derivatives on third-party platforms without prior written consent from Trendyol.
- Use the model in applications violating ethical standards, including but not limited to surveillance, misinformation, or harm to individuals or groups.
By downloading or using this model, you agree to the terms above.
© 2025 Trendyol Group. All rights reserved.
See the [LICENSE](LICENSE) file for more details.
---
_For technical support or questions about this model, please contact the Trendyol Data Science team._