Report section: Title Page
• Title: Term Project
• Authors: Saksham Lakhera and Ahmed Zaher
• Course: CSE 555 - Introduction to Pattern Recognition
• Date: July 20, 2025

Abstract

NLP Engineering Perspective

This project addresses the challenge of improving recipe recommendation systems through semantic search built on transformer-based language models. Traditional keyword-based search methods often fail to capture the nuanced relationships between ingredients, cooking techniques, and user preferences in culinary contexts. Our approach fine-tunes BERT (Bidirectional Encoder Representations from Transformers) on a custom recipe dataset to develop a semantic understanding of culinary content. We preprocessed and structured a subset of 15,000 recipes into standardized sequences organized by food categories (proteins, vegetables, legumes, etc.) to create training data suited to the BERT architecture. The model was fine-tuned to learn contextual embeddings that capture semantic relationships between ingredients and tags. Finally, we generated embeddings for all recipes in our dataset and implemented a cosine-similarity retrieval system that returns the top-K most relevant recipes for a user's search query. Our evaluation demonstrates [PLACEHOLDER: key quantitative results - e.g., Recall@10 = X.XX, MRR = X.XX, improvement over baseline = +XX%]. This work provides practical experience in transformer fine-tuning for domain-specific applications and demonstrates the effectiveness of structured data preprocessing for improving semantic search in the culinary domain.

Computer-Vision Engineering Perspective

(Reserved: to be completed by the CV author)

Introduction

NLP Engineering Perspective

This term project, carried out for CSE 555, serves primarily as an educational exercise aimed at giving graduate students end-to-end exposure to building a modern NLP system.
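The category-based sequencing described in the abstract can be sketched as follows. This is a minimal illustration only: the category names, their ordering, and the ingredient-to-category lookup table below are assumptions for exposition, not the project's actual vocabulary or pipeline.

```python
# Illustrative sketch: group a recipe's ingredients by food category and
# emit one standardized token sequence, as used for BERT fine-tuning.
# CATEGORY_ORDER and INGREDIENT_CATEGORY are hypothetical placeholders.
CATEGORY_ORDER = ["protein", "vegetable", "legume", "grain", "spice"]

INGREDIENT_CATEGORY = {
    "chicken": "protein",
    "chickpeas": "legume",
    "spinach": "vegetable",
    "rice": "grain",
    "cumin": "spice",
}

def to_sequence(ingredients):
    """Order ingredients by food category and join them into one string."""
    buckets = {c: [] for c in CATEGORY_ORDER}
    for ing in ingredients:
        cat = INGREDIENT_CATEGORY.get(ing)
        if cat:  # unknown ingredients are dropped in this sketch
            buckets[cat].append(ing)
    return " ".join(ing for c in CATEGORY_ORDER for ing in buckets[c])

print(to_sequence(["rice", "chicken", "cumin", "spinach"]))
# chicken spinach rice cumin
```

The point of the fixed category ordering is that every recipe presents its ingredients to the model in a consistent positional layout, which is what the abstract means by "standardized sequences."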
Our goal is to construct a semantic recipe-search engine that demonstrates how domain-specific fine-tuning of BERT can substantially improve retrieval quality over simple keyword matching. We created a preprocessing pipeline that restructures 15,000 recipes into standardized ingredient-sequence representations and then fine-tuned BERT on this corpus. Key contributions include (i) a cleaned, category-labelled recipe subset, (ii) training scripts that yield domain-adapted contextual embeddings, and (iii) a production-ready retrieval service that returns the top-K most relevant recipes for an arbitrary user query via cosine-similarity ranking. A comparative evaluation against classical lexical baselines will be presented in Section 9 [PLACEHOLDER: baseline summary]. The project thus provides a compact blueprint of the full NLP workflow, from data curation through deployment.

Computer-Vision Engineering Perspective

The Computer-Vision track followed a three-phase pipeline designed to simulate the data-engineering challenges of real-world projects. Phase 1 consisted of collecting more than 6,000 food photographs under diverse lighting conditions and backgrounds, deliberately introducing noise to improve model robustness. Phase 2 handled image preprocessing, augmentation, and the subsequent training and evaluation of a convolutional neural network whose weights capture salient visual features of dishes. Phase 3 integrated the trained network into the shared web application so that users can upload an image and receive 5-10 recipe recommendations that match both visually and semantically. Detailed architecture choices and quantitative results will be provided in later sections [PLACEHOLDER: CV performance metrics].
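The cosine-similarity ranking step described above can be sketched as follows. This is a toy illustration under stated assumptions: the two-dimensional vectors and recipe ids are placeholders, whereas in the actual system both the query and the recipes would be embedded by the fine-tuned BERT model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, recipe_embs, k=5):
    """Rank recipe ids by cosine similarity to the query embedding."""
    scored = [(rid, cosine(query_emb, emb)) for rid, emb in recipe_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Placeholder embeddings; real ones come from the fine-tuned BERT model.
embs = {"r1": [1.0, 0.0], "r2": [0.7, 0.7], "r3": [0.0, 1.0]}
print(top_k([1.0, 0.1], embs, k=2))
```

In production the recipe embeddings are computed once offline, so a query costs one forward pass through the model plus a similarity scan (or an approximate nearest-neighbour index for larger corpora).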
Background / Related Work
  • Survey of prior methods and the state of the art
  • Clear positioning of your approach relative to existing literature

Dataset and Pre-processing
  • Data source(s), collection or selection criteria
  • Cleaning, normalization, augmentation, class balancing, etc.

Methodology
  • Theoretical foundations and algorithms used
  • Model architecture, feature extraction, hyper-parameters
  • Assumptions and justifications

Experimental Setup
  • Hardware / software environment
  • Train / validation / test split, cross-validation strategy
  • Evaluation metrics (accuracy, F1-score, ROC-AUC, etc.)

Results
  • Quantitative tables and charts
  • Qualitative examples (e.g., confusion matrix, sample outputs)
  • Statistical significance tests where applicable

Discussion
  • Interpretation of results (why methods worked or failed)
  • Comparison with baselines or published benchmarks
  • Limitations of your study

Conclusion
  • Recap of contributions and findings
  • Practical implications

Future Work
  • Concrete next steps or open problems

Acknowledgments (if appropriate)
  • Funding sources, collaborators, data providers

References
  • Properly formatted bibliography (IEEE, APA, etc.)

Appendices (optional)
  • Supplementary proofs, additional graphs, extensive tables, code snippets