# Sentence Transformer Quantized Model for Movie Recommendation on the MovieLens Dataset

This repository hosts a quantized version of the Sentence Transformer model, fine-tuned for movie recommendation using the MovieLens dataset. The model has been optimized with FP16 quantization for efficient deployment without significant accuracy loss.

## Model Details

- **Model Architecture:** Sentence Transformer 
- **Task:** Movie Recommendation  
- **Dataset:** MovieLens Dataset
- **Quantization:** Float16  
- **Fine-tuning Framework:** Hugging Face Transformers  

---

## Installation

```bash
pip install pandas torch sentence-transformers scikit-learn
```

---

## Loading the Model

```python
from difflib import get_close_matches

import pandas as pd
import torch
from sentence_transformers import SentenceTransformer, util

# Load the model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2', device=device)

# movie_subset is a DataFrame with a "title" column (adjust the path to your
# copy of the MovieLens data); movie_embeddings holds one embedding per title.
movie_subset = pd.read_csv("movies.csv")
movie_embeddings = model.encode(movie_subset["title"].tolist(), convert_to_tensor=True)


# Recommend movies
def recommend_by_movie_name(movie_name, top_k=5):
    titles = movie_subset["title"].tolist()
    matches = get_close_matches(movie_name, titles, n=1, cutoff=0.6)

    if not matches:
        print(f"❌ Movie '{movie_name}' not found in dataset.")
        return

    matched_title = matches[0]
    movie_index = movie_subset[movie_subset["title"] == matched_title].index[0]

    query_embedding = movie_embeddings[movie_index]
    scores = util.pytorch_cos_sim(query_embedding, movie_embeddings)[0]
    top_results = torch.topk(scores, k=top_k + 1)

    print(f"\n🎬 Recommendations for: {matched_title}")
    for score, idx_tensor in zip(top_results[0][1:], top_results[1][1:]):  # skip the query movie itself
        idx = idx_tensor.item()  # convert tensor index to int
        title = movie_subset.iloc[idx]["title"]
        print(f"  {title} (Score: {score:.4f})")


# Pass the movie name
recommend_by_movie_name("Toy Story")
```

---

## Fine-Tuning Details

### Dataset

The dataset is sourced from the MovieLens dataset on Hugging Face. It contains 20,000 movies and their genres.
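
The repository does not show the preprocessing step, but a common approach is to combine each movie's title with its pipe-separated genre string before embedding. The snippet below is a minimal sketch of that idea; the column names (`movieId`, `title`, `genres`) follow the standard MovieLens schema, and the combined `text` column is an assumption, not the repository's exact pipeline.

```python
import pandas as pd

# Tiny stand-in for the MovieLens movies table
movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Toy Story (1995)", "Jumanji (1995)"],
    "genres": ["Adventure|Animation|Children", "Adventure|Children|Fantasy"],
})

# Combine title and genres into one string so the sentence transformer
# has genre context when embedding each movie
movies["text"] = movies["title"] + " - " + movies["genres"].str.replace("|", ", ", regex=False)
print(movies["text"].tolist())
```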

### Training

- **Epochs:** 2  
- **warmup_steps:** 100  
- **show_progress_bar:** True  
- **Evaluation strategy:** `epoch`  

---

## Quantization

Post-training quantization was applied using PyTorch’s `half()` method (FP16) to reduce model size and inference time.
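
In PyTorch, this kind of post-training quantization amounts to a single `half()` call, which converts all parameters in place. The sketch below demonstrates the effect on a plain `nn.Linear` stand-in; the same call applies to a loaded `SentenceTransformer` before saving.

```python
import torch
import torch.nn as nn

# Stand-in module; .half() works the same on any nn.Module
layer = nn.Linear(384, 384)
print(layer.weight.dtype)  # torch.float32

layer.half()  # convert all parameters and buffers to FP16 in place
print(layer.weight.dtype)  # torch.float16

# FP16 uses 2 bytes per parameter instead of 4, roughly halving memory
fp16_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())
```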

---

## Repository Structure

```
.
β”œβ”€β”€ quantized-model/               # Contains the quantized model files
β”‚   β”œβ”€β”€ config.json
β”‚   β”œβ”€β”€ config_sentence_transformers.json
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ modules.json
β”‚   β”œβ”€β”€ sentence_bert_config.json
β”‚   β”œβ”€β”€ special_tokens_map.json
β”‚   β”œβ”€β”€ tokenizer.json
β”‚   β”œβ”€β”€ tokenizer_config.json
β”‚   └── vocab.txt
β”œβ”€β”€ README.md                      # Model documentation
```

---

## Limitations

- The model is trained specifically for movie recommendation on the MovieLens dataset and may not generalize to other domains.
- FP16 quantization may result in slight numerical instability in edge cases.


---

## Contributing

Feel free to open issues or submit pull requests to improve the model or documentation.