import streamlit as st

from utils.layout import render_layout


def render_report():
    st.title("Image Classification CV and Fine-Tuned NLP Recipe Recommendation")

    # Title Page Information
    st.markdown("""
**Authors:** Saksham Lakhera and Ahmed Zaher

**Date:** July 2025
""")
    # Abstract
    st.subheader("Abstract")
    st.markdown("""
**NLP Engineering Perspective:**

This project addresses the challenge of improving recipe recommendation systems through
advanced semantic search using transformer-based language models. We explain how to
fine-tune a model on domain-specific data so that it captures the nuanced relationships
between ingredients and cooking techniques in culinary contexts.

Our approach leverages BERT (Bidirectional Encoder Representations from Transformers),
fine-tuned on a custom recipe dataset, to develop a semantic understanding of culinary content.
We preprocessed and structured a subset of 15,000 recipes into standardized sequences organized
by food categories (proteins, vegetables, legumes, etc.) to create training data optimized for
the BERT architecture.

The model was fine-tuned to learn contextual embeddings that capture semantic relationships
between ingredients and tags. Finally, we generate embeddings for all recipes in our
dataset and perform cosine-similarity retrieval to produce the top-K most relevant recipes
for a user query.
""")
    # Introduction
    st.subheader("Introduction")
    st.markdown("""
This term project serves primarily as an educational exercise aimed at giving us
end-to-end exposure to building a modern NLP system. Our goal is to construct a semantic
recipe-search engine that demonstrates how domain-specific fine-tuning of BERT can
substantially improve retrieval quality over simple keyword matching.

**Key Contributions:**
- A cleaned, category-labelled recipe subset of 15,000 recipes
- Training scripts that yield domain-adapted contextual embeddings
- A production-ready retrieval service that returns the top-K most relevant recipes
- Comparative evaluation against classical baselines
""")
    # Dataset and Preprocessing
    st.subheader("Dataset and Pre-processing")
    st.markdown("""
**Data Sources:**

The project draws from two CSV files:
- **Raw_recipes.csv:** 231,637 rows, one per recipe, with columns *id, name, ingredients, tags, minutes, steps, description, n_steps, n_ingredients*
- **Raw_interactions.csv:** user feedback containing *recipe_id, user_id, rating, review text*
""")
    st.markdown("""
**Corpus Filtering and Subset Selection**

- **Invalid rows removed:** recipes with empty ingredient lists, missing tags, or fewer than three tags in total
- **Random sampling:** 15,000 recipes selected for NLP fine-tuning
- **Positive/negative pairs:** generated for contrastive learning using ratings and tag similarity (see the sketch below)
- **Train/test split:** 80/20 stratified split (12,000/3,000 pairs)
""")
    st.markdown("""
**Text Pre-processing Pipeline**

- **Lower-casing & punctuation removal:** text normalized to lowercase, special characters stripped
- **Stop-descriptor removal:** culinary modifiers (*fresh, chopped, minced*) and measurements (*tablespoons, teaspoons, cups*, etc.) removed
- **Ingredient ordering:** ingredients re-ordered into the sequence protein → vegetables/grains/dairy → other
- **Tag normalization:** tags mapped to 7 main categories: *cuisine, course, main-ingredient, dietary, difficulty, occasion, cooking_method*
- **Tokenization:** standard *bert-base-uncased* WordPiece tokenizer, sequences truncated/padded to 128 tokens (see the sketch below)
""")
    # Technical Specifications
    st.subheader("Technical Specifications")
    col1, col2 = st.columns(2)
    with col1:
        st.markdown("""
**Dataset:**
- Total Recipes: 231,637
- Training Set: 12,000 recipes
- Average Tags per Recipe: ~6
- Ingredients per Recipe: 3-20
""")
    with col2:
        st.markdown("""
**Infrastructure:**
- Python 3.10
- PyTorch 2.1 (CUDA 11.8)
- Transformers 4.38
- Google Colab A100 GPU
""")
    # Methodology
    st.subheader("Methodology")
    st.markdown("""
**Model Architecture**

- **Base Model:** bert-base-uncased
- **Additional Layers:** in some runs, a single linear layer with dropout (p = 0.1)
- **Training Objective:** triplet-margin loss with a margin of 1.0

We first trained the model directly on the raw data to see whether it would produce
good results. As seen in Table 1, this run achieved a very low training loss, but the
loss on the validation set was much higher. We then cleaned the data by removing extra
whitespace, lower-casing the text, and stripping all punctuation, and retrained the
model. This produced a highly overfitted model, as seen in Table 1 and the results
section below. Next, we added a single linear layer on top of BERT's architecture,
with dropout to counter the overfitting. As Table 1 shows, the results improved.
Although the semantic results were better than before, the model still struggled to
identify the relationships between ingredients and the different tags. We therefore
further structured the data by ordering the tags and ingredients consistently across
the dataset and retrained the model. This yielded better training and validation
losses, which is also evident in the semantic retrieval results below. A sketch of a
single training step appears after this section.

**Website Development:**

- We used Streamlit to develop the website. However, we faced a few issues with the size of the trained model, so we switched hosting to Hugging Face.
- The website loads the pre-trained model along with the recipe embeddings and the top-K retrieval function, then waits for the user to enter a query.
- The query is then processed by the model, and the top-K recipes are returned.
""")
st.markdown("**Hyperparameters and Training**") | |
col1, col2 = st.columns(2) | |
with col1: | |
st.markdown(""" | |
- **Batch size:** 8 | |
- **Max sequence length:** 128 tokens | |
- **Learning rate:** 2 Γ 10β»β΅ | |
- **Weight decay:** 0.01 | |
""") | |
with col2: | |
st.markdown(""" | |
- **Optimizer:** AdamW | |
- **Epochs:** 3 | |
- **Hardware:** Google Colab A100 GPU (40 GB VRAM) | |
- **Training time:** ~30 minutes per run | |
""") | |
    # Mathematical Formulations
    st.subheader("Mathematical Formulations and Top-K Retrieval")
    st.markdown("""
**Query Embedding and Similarity Calculation:** we used the trained model weights to
generate embeddings for the entire recipe corpus. Once a user query is submitted, we
embed it with the trained model and use the cosine-similarity formula below to retrieve
the top-K recipes. We then keep only recipes with an average rating of at least 3.0 and
at least 5 ratings, and sort the results by similarity and then by average rating.
""")
    st.latex(r"""
\text{Similarity}(q, r_i) = \cos(\hat{q}, \hat{r}_i) = \frac{\hat{q} \cdot \hat{r}_i}{\|\hat{q}\|\,\|\hat{r}_i\|}
""")
    st.markdown(r"where $\hat{q}$ is the BERT embedding of the query and $\hat{r}_i$ is the embedding of the $i$-th recipe.")
    # Results
    st.subheader("Results")
    st.markdown("**Training and Validation Loss**")
    results_data = {
        "Run": [1, 2, 3, 4],
        "Configuration": [
            "Raw, no cleaning/ordering",
            "Cleaned text, unordered",
            "Cleaned text + single layer + dropout",
            "Cleaned text + ordering",
        ],
        "Epoch-3 Train Loss": [0.0065, 0.0023, 0.0061, 0.0119],
        "Validation Loss": [0.1100, 0.0000, 0.0118, 0.0067],
    }
    st.table(results_data)
    st.markdown("Table 1: Training and validation loss for each run")
    st.markdown("""
**Key Finding:** Run 4 (cleaned text + ordering) achieved the best balance
between low validation loss and meaningful retrieval quality.
""")
st.markdown("**Qualitative Retrieval Examples**") | |
st.markdown(""" | |
In this section, we will show how the results of the model differ between runs and how the model performs on different queries. | |
**Query: "beef steak dinner"** | |
- Run 1 (Raw): *to die for crock pot roast*, *crock pot chicken with black beans* | |
- Run 2 (Cleaned text, unordered): *aussie pepper steak steak with creamy pepper sauce* | |
- Run 3 (Cleaned text + single layer + dropout): *balsamic rib eye steak with bleu cheese sauce* | |
- Run 4 (Final): *grilled garlic steak dinner*, *classic beef steak au poivre* | |
**Query: "chicken italian pasta"** | |
- Run 1 (Raw): *to die for crock pot roast*, *crock pot chicken with black beans* | |
- Run 2 (Cleaned text, unordered): *baked chicken soup* | |
- Run 3 (Cleaned text + single layer + dropout): *absolute best ever lasagna* | |
- Run 4 (Final): *creamy tuscan chicken pasta*, *italian chicken penne bake* | |
**Query: "vegetarian salad healthy"** | |
- Run 1 (Raw): *to die for crock pot roast* | |
- Run 2 (Cleaned text, unordered): *avocado mandarin salad* | |
- Run 3 (Cleaned text + single layer + dropout): *black bean and sweet potato salad* | |
- Run 4 (Final): *kale quinoa power salad*, *superfood spinach & berry salad* | |
""") | |
    # Discussion and Conclusion
    st.subheader("Discussion and Conclusion")
    st.markdown("""
The experimental evidence underscores the importance of disciplined pre-processing when
adapting large language models to niche domains. The breakthrough came with ingredient
ordering (protein → vegetables → grains → dairy → other), which supplied consistent
positional signals. As the results show, the model's performance improves with the
addition of the single layer and dropout, but the results are still not as good as the
final run, in which we added the ordering of the ingredients.

**Key Achievements:**
- End-to-end recipe recommendation system with semantic search
- Meaningful semantic understanding of culinary content
- Reproducible blueprint for domain-specific NLP applications

**Limitations:**
- Relatively small private training set (12k samples) compared to public corpora
- Further pre-processing could improve the results
- Minimal hyperparameter search conducted
- Only single-machine deployment tested
- The model does not handle complex queries well, nor synonyms and antonyms
""")
    # References
    st.subheader("References")
    st.markdown("""
[1] Vaswani et al., "Attention Is All You Need," NeurIPS, 2017.

[2] Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," NAACL-HLT, 2019.

[3] Reimers and Gurevych, "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks," EMNLP-IJCNLP, 2019.

[4] Hugging Face, "BERT Model Documentation," 2024.
""")
    st.markdown("---")
    st.markdown("© 2025 CSE 555 Term Project. All rights reserved.")

# Render the report
render_layout(render_report)