---
license: cc-by-sa-4.0
pipeline_tag: feature-extraction
library_name: timm
language: []
base_model: timm/convnext_base.fb_in22k_ft_in1k
embedding_dimension: 512
training_steps: 108
model_type: trendyol_arcface
tags:
- computer-vision
- image-feature-extraction
- arcface
- product-similarity
- e-commerce
- image-embeddings
- convnext
---

# E-Commerce Product Image Encoder

_ConvNeXt-based image embedding model for product unification and visual search on the Trendyol e-commerce catalogue._

## Model Details

- **Architecture**: ConvNeXt-Base (224px) backbone + 512-dim projection head with BatchNorm, trained with ArcFace loss
- **Objective**: ArcFace with additive angular margin (scale=128, margin=0.25) for improved product similarity learning
- **Training Data**: Large-scale Trendyol product image dataset covering diverse e-commerce categories
- **Hardware**: Multi-GPU training with PyTorch Lightning (5 epochs, 108 global steps)
- **Framework**: PyTorch Lightning 1.8.1 with mixed-precision training
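
The ArcFace objective listed above can be illustrated with a small NumPy sketch. This is not the training code for this model: the function name `arcface_logits` and the random toy inputs are hypothetical, and only `scale=128` and `margin=0.25` are taken from this card. It shows the core idea, which is adding the margin to the angle between an embedding and its ground-truth class center before scaling.

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, scale=128.0, margin=0.25):
    """Return scale * cos(theta + margin) for the target class and
    scale * cos(theta) for every other class."""
    # L2-normalize embeddings and class centers row by row.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)

    # Cosine of the angle between every embedding and every class center.
    cos = np.clip(emb @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)

    # Apply the additive angular margin only at the ground-truth position.
    one_hot = np.zeros_like(cos, dtype=bool)
    one_hot[np.arange(len(labels)), labels] = True
    return scale * np.where(one_hot, np.cos(theta + margin), cos)

# Toy data (hypothetical): 4 samples, 10 classes, 512-dim embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 512))
centers = rng.normal(size=(10, 512))
labels = np.array([0, 3, 5, 9])

logits = arcface_logits(emb, centers, labels)              # with margin
plain = arcface_logits(emb, centers, labels, margin=0.0)   # plain cosine logits
```

The resulting logits would then go into a standard softmax cross-entropy loss. The margin lowers the target-class logit, so the network has to pull each embedding closer to its class center to reach the same loss. This sketch omits the handling some implementations add for the case where `theta + margin` exceeds π.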

## Intended Use

- **Primary** – Generate embeddings for duplicate product detection ("unification"), near-duplicate search, and product similarity ranking in e-commerce applications
- **Secondary** – Feature extractor for image-based product recommendation systems and visual search
- **Downstream Tasks** – Product clustering, visual search, duplicate detection, and content-based product recommendation
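
As a rough illustration of the near-duplicate search use case, the sketch below ranks a catalogue of embeddings against a query by cosine similarity. Everything here is assumed for illustration: the helper `top_k_similar`, the random stand-in embeddings, and the `DUPLICATE_THRESHOLD` value of 0.9, which would need tuning on labelled product pairs.

```python
import numpy as np

def top_k_similar(query, catalogue, k=3):
    """Return (indices, scores) of the k catalogue rows most similar to query.

    Both inputs are assumed to be L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = catalogue @ query          # shape: (num_items,)
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order, scores[order]

# Random unit vectors standing in for 512-dim product embeddings.
rng = np.random.default_rng(0)
catalogue = rng.normal(size=(100, 512))
catalogue /= np.linalg.norm(catalogue, axis=1, keepdims=True)

# Simulate a near-duplicate of item 42: the same vector plus a little noise.
query = catalogue[42] + 0.01 * rng.normal(size=512)
query /= np.linalg.norm(query)

indices, scores = top_k_similar(query, catalogue)

# Hypothetical cut-off: treat anything above it as the same product.
DUPLICATE_THRESHOLD = 0.9
duplicates = indices[scores >= DUPLICATE_THRESHOLD]
```

A brute-force matrix product like this is only practical for small catalogues; larger ones would typically use an approximate nearest-neighbour index instead.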

## Usage

Complete example to load the model and generate embeddings:

```python
import io
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
import json
from safetensors.torch import load_file
from PIL import Image
import torchvision.transforms as transforms
import requests


# 1. Define the model class
class TYArcFaceModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        # ConvNeXt backbone without a classification head
        self.backbone = timm.create_model(
            config['backbone_name'],
            pretrained=False,
            num_classes=0
        )
        self.bn1 = nn.BatchNorm2d(config['backbone_features'])
        # Projection from the flattened feature map to the embedding space
        self.fc11 = nn.Linear(
            config['backbone_features'] * config['hidden_size'],
            config['embedding_dim']
        )
        self.bn11 = nn.BatchNorm1d(config['embedding_dim'])

    def forward(self, x):
        features = self.backbone.forward_features(x)  # (B, C, H, W) feature map
        features = self.bn1(features)
        features = features.flatten(start_dim=1)
        features = self.fc11(features)
        features = self.bn11(features)
        features = F.normalize(features, p=2, dim=1)  # L2-normalize the embeddings
        return features


# 2. Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load configuration and weights
with open('config.json') as f:
    config = json.load(f)
model = TYArcFaceModel(config)
state_dict = load_file('model.safetensors')

# Filter to only load compatible weights
model_keys = set(model.state_dict().keys())
filtered_state_dict = {k: v for k, v in state_dict.items() if k in model_keys}

# strict=False tolerates missing keys, so report them instead of failing silently
load_result = model.load_state_dict(filtered_state_dict, strict=False)
if load_result.missing_keys:
    print(f"Warning: keys missing from checkpoint: {load_result.missing_keys}")
model.to(device)
model.eval()

print("✅ Model loaded successfully!")
print(f"📊 Ready to generate {config['embedding_dim']}-dimensional embeddings")

# 3. Define preprocessing transforms
transform = transforms.Compose([
    transforms.Resize((config['input_size'], config['input_size'])),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=config['normalization']['mean'],
        std=config['normalization']['std']
    )
])


# 4. Process an image and generate embeddings
def get_embeddings(image_path_or_url):
    """Get embeddings for a single image"""
    # Load image
    if image_path_or_url.startswith('http'):
        response = requests.get(image_path_or_url, timeout=10)
        response.raise_for_status()  # fail early on HTTP errors
        image = Image.open(io.BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(image_path_or_url).convert('RGB')

    # Preprocess
    input_tensor = transform(image).unsqueeze(0).to(device)

    # Generate embeddings
    with torch.no_grad():
        embeddings = model(input_tensor)

    return embeddings


# 5. Example usage
image_url = "https://example.com/product_image.jpg"  # Replace with your image
embeddings = get_embeddings(image_url)
print(f"Embedding shape: {embeddings.shape}")  # torch.Size([1, 512])


# 6. Compute similarity between two products
def compute_similarity(embedding1, embedding2):
    """Compute cosine similarity between two embeddings"""
    return F.cosine_similarity(embedding1, embedding2, dim=1)


# Example: Compare two products
# embedding2 = get_embeddings("path/to/another/image.jpg")
# similarity_score = compute_similarity(embeddings, embedding2)
# print(f"Product similarity: {similarity_score.item():.4f}")
```

## Model Performance

The model was trained with ArcFace loss, which offers several advantages for product similarity tasks:

- **Improved Discriminative Power**: ArcFace adds an angular margin in the feature space, which separates different products more clearly
- **Normalized Embeddings**: All output embeddings are L2-normalized, so cosine similarity reduces to a dot product
- **Scale Robustness**: The learned representations are robust to scale variations in product images
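
The practical consequence of L2-normalized outputs can be checked numerically. The sketch below uses random vectors as stand-ins for real embeddings (an assumption, not model output) and verifies two identities: cosine similarity equals the plain dot product, and squared Euclidean distance equals `2 - 2 * cosine`, so indexes built for L2 distance rank neighbours in the same order as cosine similarity.

```python
import numpy as np

# Random vectors standing in for un-normalized 512-dim features.
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 512))
norms = np.linalg.norm(raw, axis=1)

# L2-normalize, as the model's final layer does.
emb = raw / norms[:, None]

# Cosine similarity from its full definition on the raw vectors...
cosine_full = (raw @ raw.T) / np.outer(norms, norms)
# ...and as a plain dot product of the normalized vectors.
cosine_dot = emb @ emb.T

# Pairwise squared Euclidean distance between normalized vectors.
sq_dist = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)
```

Here `cosine_full` and `cosine_dot` agree up to floating-point error, and `sq_dist` matches `2 - 2 * cosine_dot`.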

### Training Configuration

- **Backbone**: ConvNeXt-Base pretrained on ImageNet-22k and fine-tuned on ImageNet-1k
- **Embedding Dimension**: 512
- **ArcFace Scale**: 128
- **ArcFace Margin**: 0.25
- **Input Resolution**: 224×224
- **Normalization**: ImageNet statistics
- **Training Framework**: PyTorch Lightning 1.8.1
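
The Usage example reads several keys from `config.json`, and the settings above suggest how those values fit together. The dictionary below is a guess at plausible contents, not the shipped file: the key names come from the Usage code, but every value is an assumption. In particular, `hidden_size` is read here as the number of spatial positions in the backbone's final feature map. The arithmetic relies on ConvNeXt downsampling its input by a total factor of 32 and on ConvNeXt-Base having 1024 channels in its last stage.

```python
# Hypothetical config values; check them against the real config.json.
config = {
    "backbone_name": "convnext_base",   # assumed timm model name
    "backbone_features": 1024,          # channels in ConvNeXt-Base's last stage
    "hidden_size": 49,                  # assumed: 7 x 7 spatial positions
    "embedding_dim": 512,
    "input_size": 224,
    "normalization": {                  # ImageNet statistics
        "mean": [0.485, 0.456, 0.406],
        "std": [0.229, 0.224, 0.225],
    },
}

# ConvNeXt reduces spatial resolution by 4 in the stem and by 2 three more times.
CONVNEXT_TOTAL_STRIDE = 32

side = config["input_size"] // CONVNEXT_TOTAL_STRIDE       # 224 / 32 = 7
spatial_positions = side * side                            # 7 * 7 = 49
fc_in_features = config["backbone_features"] * spatial_positions

print(f"Feature map: {config['backbone_features']} x {side} x {side}")
print(f"Projection layer input size: {fc_in_features}")
```

Under these assumptions the projection layer `fc11` takes 50,176 inputs, which is why the input resolution needs to stay at 224×224: a different size would change the flattened feature length and no longer match the trained weights.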

## Limitations

- **Domain Specificity**: Optimized for e-commerce product images; may not generalize well to other image domains
- **Image Quality**: Performance may degrade on low-quality, heavily compressed, or significantly distorted images
- **Category Bias**: Performance may vary across product categories, depending on the training data distribution
- **Scale Dependency**: Input images should be resized to 224×224 for optimal performance

## Bias Analysis

- **Dataset Bias**: The model's embeddings may reflect biases present in the e-commerce training dataset
- **Product Category Imbalance**: Some product categories may be over-represented in the training data
- **Brand and Style Bias**: The model may learn to encode brand-specific or style-specific features that could affect similarity judgments

## Environmental Impact

- **Training Hardware**: Multi-GPU setup with PyTorch Lightning
- **Training Time**: 5 epochs with 108 global steps
- **Energy Consumption**: Carbon footprint estimated to be moderate, given the relatively short training duration

## Ethical Considerations

- **Commercial Use**: Designed for e-commerce applications; consider potential impacts on market competition
- **Privacy**: Ensure compliance with data protection regulations when processing product images
- **Fairness**: Monitor for biased similarity judgments across different product categories or brands

## Citation

```bibtex
@misc{trendyol2025convnextarcface,
  title={E-Commerce Product Image Encoder: High-Fidelity Image Embeddings for E-commerce Product Unification},
  author={Trendyol Data Science Team},
  year={2025},
  howpublished={\url{https://huggingface.co/Trendyol/e-commerce-product-image-encoder}}
}
```

## Model Card Authors

- Trendyol Data Science Team
- Model trained using the TYArcFace architecture with ConvNeXt backbone

## License

This model is released by Trendyol as a source-available, non-open-source model.

### You are allowed to:

- View, download, and evaluate the model weights.
- Use the model for non-commercial research and internal testing.
- Use the model or its derivatives for commercial purposes, provided that:
  - You cite Trendyol as the original model creator.
  - You notify Trendyol in advance via [email protected] or other designated contact.

### You are not allowed to:

- Redistribute or host the model or its derivatives on third-party platforms without prior written consent from Trendyol.
- Use the model in applications violating ethical standards, including but not limited to surveillance, misinformation, or harm to individuals or groups.

By downloading or using this model, you agree to the terms above.

© 2025 Trendyol Group. All rights reserved.

See the [LICENSE](LICENSE) file for more details.

---

_For technical support or questions about this model, please contact the Trendyol Data Science team._