sakshamlakhera committed on
Commit
b485e4a
·
1 Parent(s): 6a4ece9
Files changed (1)
  1. pages/4_Report.py +31 -33
pages/4_Report.py CHANGED
@@ -23,7 +23,7 @@ def render_report():
23
  st.markdown("""
24
  <div style='text-align: justify'>
25
 
26
- The project is a <b>recipe recommendation system</b> that allows users to either <b>type a textual query</b> or <b>upload images of food items</b>. Based on the inputs — including user-provided tags and detected ingredients — the application returns the most relevant recipes using semantic search and image classification.
27
 
28
  <h4>1.1 NLP Task:</h4>
29
 
@@ -55,7 +55,7 @@ def render_report():
55
  st.subheader("2. Introduction")
56
 
57
  st.markdown("""
58
- In an increasingly digital culinary world, users often look for personalized recipe recommendations based on either what they have at hand or what they crave. While traditional recipe search engines rely heavily on keyword matching, they fail to understand the deeper semantic context of ingredients, cooking methods, and dietary preferences. Similarly, visual recognition of food items can play a key role in enabling intuitive, image-based search experiences — especially for users unsure of ingredient names or spelling.
59
 
60
  This project aims to build an **end-to-end multimodal recipe recommendation system** that supports both **natural language queries** and **image-based inputs**. Users can either type a textual query such as “healthy vegetarian salad” or upload an image of a food item (e.g., pear, onion), and the system will return the most relevant recipes. This is achieved by integrating two advanced deep learning pipelines:
61
 
@@ -71,11 +71,11 @@ def render_report():
71
  st.markdown("## 3. CV: Produce Classification Task")
72
 
73
  st.markdown("""
74
- For our Produce Classification task, we manually captured images of **tomato, onion, pear, and strawberry**, collecting a total of **12,000 images** — approximately **3,000 per category**.
75
  Within each category, we introduced **3 intra-class variations**, with around **1,000 samples per variation**, by photographing the produce in different physical states:
76
 
77
  - **Whole:** The item is uncut and intact (e.g., an entire pear or onion).
78
- - **Halved/Hulled:** The item is partially cut — for example, a strawberry with the hull removed or a fruit sliced in half.
79
  - **Sliced:** The item is cut into smaller segments or slices, such as onion rings or tomato wedges.
80
 
81
  These variations allow the model to generalize better by learning visual features across different presentations, shapes, and cross-sections of each produce type.
@@ -109,12 +109,12 @@ def render_report():
109
  st.markdown("""
110
  <div style='text-align: justify;'>
111
 
112
- This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b> in one sample image per class. It's a <b>visual summary of color composition</b> and can reveal important patterns about your dataset.
113
 
114
  </div>
115
  """, unsafe_allow_html=True)
116
 
117
- st.image("assets/images/part1_image_histogram.png", caption="RGB histogram distribution for one image per class", use_container_width=True)
118
 
119
  st.markdown("""
120
  From the above histograms, we observe the following:
@@ -135,7 +135,7 @@ def render_report():
135
  **1. Onion**
136
  - Red & Green: Sharp peaks at 140–150
137
  - Blue: Broad peak around 100
138
- - Likely reflects white/yellow onion layers with soft shadows; blue may be from background or lighting
139
  - The model may learn to detect mid-range blue with red-green spikes
140
  """)
141
 
@@ -156,7 +156,6 @@ def render_report():
156
  - Green & Blue: Peaks between 50–120
157
  - Red: Moderate and spread around 100–150
158
  - Suggests soft green/yellow pear tones with consistent lighting
159
- - Minimal intra-class variation makes this class stable for classification
160
  """)
161
 
162
  with col4:
@@ -189,7 +188,7 @@ def render_report():
189
 
190
  1. **Blurriness of All Average Images**
191
  - High blur indicates significant variation in object position, orientation, and size.
192
- - No consistent alignment or cropping — objects appear in different parts of the frame.
193
 
194
  2. **Centered Color Blobs**
195
  - Each average image displays a dominant center color:
@@ -229,7 +228,7 @@ def render_report():
229
  Although the typical split is 70:15:15, we opted to test on more data to better evaluate generalization and avoid overfitting.
230
 
231
  Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We employed the **Adam optimizer** with a learning rate of **0.0001**.
232
- We also implemented **early stopping** with a patience of 5 epochs — meaning training stops if no improvement is seen in validation accuracy for 5 consecutive epochs.
233
  """)
234
 
235
  # Insert training & validation graph
@@ -253,7 +252,7 @@ def render_report():
253
 
254
  From the confusion matrix, it is evident that the model demonstrates strong **class separability** and **robust generalization**, with only **17 total misclassifications out of 3,035 test samples**.
255
 
256
- This confirms that the model is capable of distinguishing even visually similar classes with high precision.
257
  """)
258
 
259
  st.markdown("""
@@ -282,7 +281,7 @@ def render_report():
282
 
283
  Most misclassifications occurred between **strawberry** and **onion**. These classes exhibited greater variation in object positioning. In some cases, the objects (onion or strawberry) were **partially hidden**, with only a small portion visible, and were also affected by **poor lighting conditions**. Such combinations made it challenging for the model to make accurate predictions.
284
 
285
- However, with an F1-score of **99%** for these classes, we can confidently conclude that the model performed well overall — especially on images where the object was **clearly visible**, **fully within the frame**, and in **good general condition**. This further suggests that the model is **robust and ready for real-world use**.
286
 
287
  Notably, we did not observe any misclassifications for **pear** and **tomato**. Based on our earlier data analysis, images in these classes were generally **well-centered and localized**, which likely contributed to the model's high accuracy (100%) in those categories.
288
  """)
@@ -296,7 +295,7 @@ def render_report():
296
 
297
  To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
298
 
299
- The image below shows a **single most-activated channel per layer** for each class: Onion, Pear, Tomato, and Strawberry — across **9 convolutional stages**.
300
 
301
  """)
302
 
@@ -320,7 +319,7 @@ def render_report():
320
 
321
  3. **Deep Layers (Conv 7–9):**
322
  - Feature maps become **coarser and more focused**, losing most spatial resolution.
323
- - The network now highlights only **key discriminative regions** — often the **center mass** of the object.
324
  - While the original shape is nearly lost, **strong activation in a focused area** indicates high confidence in classification.
325
  - This shows the model is no longer looking at superficial textures, but **has learned what features truly define each class**.
326
 
@@ -343,11 +342,11 @@ def render_report():
343
  st.markdown("""
344
  As mentioned earlier, we have **3,000 images per class**, and within each class, there are **1,000 images per variation** of **whole**, **halved/hulled**, and **sliced/cored**.
345
 
346
- These variations not only help make our main classification model more **robust to presentation differences**, but also allow us to analyze how the model performs under **intra-class variation** — that is, variation within the same object category.
347
 
348
  ### Importance of Intra-Class Variation Analysis:
349
 
350
- - In real-world settings (e.g., cooking, grocery shelves, or user-uploaded photos), food items can appear in multiple forms — whole, cut, or partially visible.
351
  - A model that performs well only on whole items may fail when the object is sliced or obscured.
352
  - By training and evaluating a separate **variation classifier**, we can:
353
  - Assess the **distinctiveness** of each variation within a class.
@@ -369,7 +368,7 @@ def render_report():
369
 
370
  As we are using the **EfficientNet-B0** model, all images in our dataset are resized to <b>224×224</b> pixels. This is the standard input size for EfficientNet and ensures compatibility with pre-trained weights, as well as efficient GPU utilization during training.
371
 
372
- Below are sample resized images for each class — <b>onion, pear, strawberry, and tomato</b> — showing their intra-class variations: <b>whole, halved/hulled, and sliced</b>.
373
  These samples provide a visual sense of the input data and the diversity of presentation styles within each category.
374
 
375
  For training purposes, the images were normalized by dividing each pixel value by <b>255</b>.
@@ -398,7 +397,7 @@ def render_report():
398
  st.markdown("""
399
  <div style='text-align: justify;'>
400
 
401
- This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b> in one sample image per class. It's a <b>visual summary of color composition</b> and can reveal important patterns about your dataset.
402
 
403
  </div>
404
  """, unsafe_allow_html=True)
@@ -421,12 +420,12 @@ def render_report():
421
  - All three channels peak around pixel values 130–150.
422
  - Histogram is **smoother and more centered**, indicating balanced exposure and color.
423
  - Slight red dominance in the mid-range may be due to the red/pink inner rings being more exposed.
424
- - **Interpretation:** Sliced onions offer the most uniform and balanced appearance across all channels — useful for training stability.
425
-
426
  **3. Whole**
427
  - Shows **high red peaks** near pixel value 220 and strong green variation around 120–150.
428
  - Blue is less dominant and shows more fluctuation in the mid-range.
429
- - Histogram is noisier with more channel separation — likely due to outer skin, glare, or inconsistent lighting.
430
  - **Interpretation:** Whole onions are visually more complex, capturing skins, glare, and full curvature. This leads to higher variation.
431
  - To capture this complexity effectively, using **RGB channels** is essential.
432
 
@@ -474,7 +473,7 @@ def render_report():
474
  - **Lighting Conditions:**
475
  - Halved images show the best exposure balance.
476
  - Sliced images include darker regions, hinting at variability in data quality.
477
- - Whole pears are consistently lit but may be low contrast.
478
 
479
  - **Model Implications:**
480
  - Halved pears are optimal for training due to stable exposure.
@@ -499,8 +498,7 @@ def render_report():
499
  **2. Sliced**
500
  - Displays a **strong green peak near 140** and red around 130–150, which are consistent with the **flesh and seedy outer layer** of strawberries.
501
  - Blue is subdued across the entire range, which is expected for strawberries.
502
- - Histogram is **tighter and more concentrated** than hulled, with fewer highlights and shadows.
503
- - **Interpretation:** Sliced strawberries appear more uniform and less reflective, providing a **clean but slightly less diverse color profile than hulled**.
504
 
505
  **3. Whole**
506
  - Broad red and green peaks from **100–160**, with visible spikes around **140–150**, typical of a fully intact strawberry's surface.
@@ -546,7 +544,7 @@ def render_report():
546
  - Strong **red peak near 150–160** represents the core tomato surface.
547
  - Green and blue show defined peaks around 90–130, suggesting the presence of both background and stem/leaf regions.
548
  - Well-defined, multi-peak structure shows moderate saturation and **good contrast**.
549
- - **Interpretation:** Whole tomatoes appear cleanly illuminated and well-captured, with a **balanced mix of object and background**. Likely it will be stable and reliable variation for model training.
550
 
551
  **Dataset Insights**
552
 
@@ -670,7 +668,7 @@ def render_report():
670
  st.image("assets/images/part2_avg_tomato.png", caption="Average image tomato variations", use_container_width=True)
671
 
672
  st.markdown("""
673
- **isual Observations**
674
 
675
  1. **Diced**
676
  - Multiple reddish blobs are visible but still form a centralized mass.
@@ -693,7 +691,7 @@ def render_report():
693
  """)
694
 
695
 
696
- st.markdown("### 3.4 Image Analysis conclusion")
697
 
698
  st.markdown("""
699
  The combination of average images and RGB histogram plots reveals that, in general, all classes (onion, pear, strawberry, tomato) demonstrate a strong central focus in their average images. This is ideal for convolutional neural networks (CNNs), which exploit spatial locality. However, such consistency may limit generalization to real-world, off-centered samples.
@@ -717,7 +715,7 @@ def render_report():
717
  st.markdown("### 4.5 Training and Results")
718
 
719
  st.markdown("""
720
- We used a dataset of **3,000 manually labeled images per class**, with 1,000 images for each intra-class variation — **whole**, **halved/hulled**, and **sliced** — across four categories: **tomato**, **onion**, **pear**, and **strawberry**.
721
 
722
  The dataset was split using either a **60:20:20 ratio** or, in some cases, a **50:25:25 ratio** for training, validation, and testing, respectively.
723
  """)
@@ -751,9 +749,9 @@ def render_report():
751
  From the average image analysis, these classes showed **higher visual noise and blur**, indicating significant intra-class variation. Without augmentation, the model risked **overfitting to noise** and generalizing poorly. Rotation-based augmentation helped expose the model to diverse orientations and reduce this risk.
752
 
753
  **Why no augmentation for pear and tomato?**
754
- Our analysis of their average images and RGB histograms revealed that these classes were **well-centered**, **well-lit**, and had **limited background variation**. As a result, the model could learn from them effectively without augmentation. Although classification performance may degrade in real-world scenarios with cluttered or complex backgrounds, in **ideal settings** — where images are centered and consistently lit — these classes are expected to yield **strong performance even without augmentation**.
755
 
756
- Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We also implemented **early stopping** with a patience of 3 epochs — meaning training stops if no improvement is seen in validation accuracy for 5 consecutive epochs.
757
  """)
758
 
759
 
@@ -824,10 +822,10 @@ def render_report():
824
  Minor confusion is observed between **hulled and whole**, likely due to similar color and shape. Still, the model maintains excellent overall accuracy and balance.
825
 
826
  3. **Pear**
827
- Perfect classification across all classes — no false positives or false negatives — reflecting highly consistent, separable visual features in the dataset.
828
 
829
  4. **Tomato**
830
- No misclassifications were made; the model distinguishes **diced, sliced, and whole** tomatoes perfectly — likely due to strong shape and texture differences across classes.
831
  """)
832
 
833
  st.markdown("""
@@ -866,7 +864,7 @@ def render_report():
866
 
867
  To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
868
 
869
- The image below shows the **single most-activated channel per layer** for each intraclass variation — whole, halved/hulled, and sliced — of the main classes: Onion, Pear, Tomato, and Strawberry, across **9 convolutional stages**.
870
  """)
871
 
872
  st.markdown("""
 
23
  st.markdown("""
24
  <div style='text-align: justify'>
25
 
26
+ The project is a <b>recipe recommendation system</b> that allows users to either <b>type a textual query</b> or <b>upload images of food items</b>. Based on the inputs, including user-provided tags and detected ingredients, the application returns the most relevant recipes using semantic search and image classification.
27
 
28
  <h4>1.1 NLP Task:</h4>
29
 
 
55
  st.subheader("2. Introduction")
56
 
57
  st.markdown("""
58
+ In an increasingly digital culinary world, users often look for personalized recipe recommendations based on either what they have at hand or what they crave. While traditional recipe search engines rely heavily on keyword matching, they fail to understand the deeper semantic context of ingredients, cooking methods, and dietary preferences. Similarly, visual recognition of food items can play a key role in enabling intuitive, image-based search experiences, especially for users unsure of ingredient names or spelling.
59
 
60
  This project aims to build an **end-to-end multimodal recipe recommendation system** that supports both **natural language queries** and **image-based inputs**. Users can either type a textual query such as “healthy vegetarian salad” or upload an image of a food item (e.g., pear, onion), and the system will return the most relevant recipes. This is achieved by integrating two advanced deep learning pipelines:
61
 
 
71
  st.markdown("## 3. CV: Produce Classification Task")
72
 
73
  st.markdown("""
74
+ For our Produce Classification task, we manually captured images of **tomato, onion, pear, and strawberry**, collecting a total of **12,000 images**, approximately **3,000 per category**.
75
  Within each category, we introduced **3 intra-class variations**, with around **1,000 samples per variation**, by photographing the produce in different physical states:
76
 
77
  - **Whole:** The item is uncut and intact (e.g., an entire pear or onion).
78
+ - **Halved/Hulled:** The item is partially cut, for example, a strawberry with the hull removed or a fruit sliced in half.
79
  - **Sliced:** The item is cut into smaller segments or slices, such as onion rings or tomato wedges.
80
 
81
  These variations allow the model to generalize better by learning visual features across different presentations, shapes, and cross-sections of each produce type.
 
109
  st.markdown("""
110
  <div style='text-align: justify;'>
111
 
112
+ This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b> per class. It's a <b>visual summary of color composition</b> and can reveal important patterns about the dataset.
113
 
114
  </div>
115
  """, unsafe_allow_html=True)
116
 
117
+ st.image("assets/images/part1_image_histogram.png", caption="RGB histogram distribution per class", use_container_width=True)
118
 
119
  st.markdown("""
120
  From the above histograms, we observe the following:
 
135
  **1. Onion**
136
  - Red & Green: Sharp peaks at 140–150
137
  - Blue: Broad peak around 100
138
+ - Likely reflects white/yellow onion layers with soft shadows.
139
  - The model may learn to detect mid-range blue with red-green spikes
140
  """)
141
 
 
156
  - Green & Blue: Peaks between 50–120
157
  - Red: Moderate and spread around 100–150
158
  - Suggests soft green/yellow pear tones with consistent lighting
 
159
  """)
160
 
161
  with col4:
 
188
 
189
  1. **Blurriness of All Average Images**
190
  - High blur indicates significant variation in object position, orientation, and size.
191
+ - No consistent alignment or cropping; objects appear in different parts of the frame.
192
 
193
  2. **Centered Color Blobs**
194
  - Each average image displays a dominant center color:
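The "average images" this hunk analyzes are, in the usual approach, a per-pixel mean over all images of a class. A minimal sketch of how such an image can be computed, assuming NumPy and PIL and a hypothetical directory layout:

```python
# Sketch: per-class "average image" as the per-pixel mean over resized
# samples. Directory layout is hypothetical.
from pathlib import Path
import numpy as np
from PIL import Image

def average_image(class_dir: str, size=(224, 224)) -> Image.Image:
    # Stack as float32 so the mean is not clipped by uint8 arithmetic.
    stack = np.stack([
        np.asarray(Image.open(p).convert("RGB").resize(size), dtype=np.float32)
        for p in sorted(Path(class_dir).glob("*.jpg"))
    ])
    return Image.fromarray(stack.mean(axis=0).astype(np.uint8))

average_image("data/onion").save("avg_onion.png")
```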
 
228
  Although the typical split is 70:15:15, we opted to test on more data to better evaluate generalization and avoid overfitting.
229
 
230
  Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We employed the **Adam optimizer** with a learning rate of **0.0001**.
231
+ We also implemented **early stopping** with a patience of 5 epochs, meaning training stops if no improvement is seen in validation accuracy for 5 consecutive epochs.
232
  """)
233
 
234
  # Insert training & validation graph
 
252
 
253
  From the confusion matrix, it is evident that the model demonstrates strong **class separability** and **robust generalization**, with only **17 total misclassifications out of 3,035 test samples**.
254
 
255
+ This confirms that the model is capable of distinguishing these classes with high precision.
256
  """)
257
 
258
  st.markdown("""
 
281
 
282
  Most misclassifications occurred between **strawberry** and **onion**. These classes exhibited greater variation in object positioning. In some cases, the objects (onion or strawberry) were **partially hidden**, with only a small portion visible, and were also affected by **poor lighting conditions**. Such combinations made it challenging for the model to make accurate predictions.
283
 
284
+ However, with an F1-score of **99%** for these classes, we can confidently conclude that the model performed well overall, especially on images where the object was **clearly visible**, **fully within the frame**, and in **good general condition**. This further suggests that the model is **robust and ready for real-world use**.
285
 
286
  Notably, we did not observe any misclassifications for **pear** and **tomato**. Based on our earlier data analysis, images in these classes were generally **well-centered and localized**, which likely contributed to the model's high accuracy (100%) in those categories.
287
  """)
 
295
 
296
  To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
297
 
298
+ The image below shows a **single most-activated channel per layer** for each class (Onion, Pear, Tomato, and Strawberry), across **9 convolutional stages**.
299
 
300
  """)
301
 
 
319
 
320
  3. **Deep Layers (Conv 7–9):**
321
  - Feature maps become **coarser and more focused**, losing most spatial resolution.
322
+ - The network now highlights only **key discriminative regions**, often the **center mass** of the object.
323
  - While the original shape is nearly lost, **strong activation in a focused area** indicates high confidence in classification.
324
  - This shows the model is no longer looking at superficial textures, but **has learned what features truly define each class**.
325
 
 
342
  st.markdown("""
343
  As mentioned earlier, we have **3,000 images per class**, and within each class, there are **1,000 images per variation** of **whole**, **halved/hulled**, and **sliced/cored**.
344
 
345
+ These variations not only help make our main classification model more **robust to presentation differences**, but also allow us to analyze how the model performs under **intra-class variation**, that is, variation within the same object category.
346
 
347
  ### Importance of Intra-Class Variation Analysis:
348
 
349
+ - In real-world settings (e.g., cooking, grocery shelves, or user-uploaded photos), food items can appear in multiple forms (whole, cut, or partially visible).
350
  - A model that performs well only on whole items may fail when the object is sliced or obscured.
351
  - By training and evaluating a separate **variation classifier**, we can:
352
  - Assess the **distinctiveness** of each variation within a class.
 
368
 
369
  As we are using the **EfficientNet-B0** model, all images in our dataset are resized to <b>224×224</b> pixels. This is the standard input size for EfficientNet and ensures compatibility with pre-trained weights, as well as efficient GPU utilization during training.
370
 
371
+ Below are sample resized images for each class (<b>onion, pear, strawberry, and tomato</b>) showing their intra-class variations: <b>whole, halved/hulled, and sliced</b>.
372
  These samples provide a visual sense of the input data and the diversity of presentation styles within each category.
373
 
374
  For training purposes, the images were normalized by dividing each pixel value by <b>255</b>.
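The preprocessing described above amounts to a resize plus a division by 255. A minimal sketch, assuming PIL and an illustrative file path:

```python
# Sketch: resize to EfficientNet-B0's 224x224 input and normalize to [0, 1]
# by dividing by 255. The file path is illustrative.
import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    img = Image.open(path).convert("RGB").resize((224, 224))
    return np.asarray(img, dtype=np.float32) / 255.0

x = preprocess("data/onion/whole/img_0001.jpg")
print(x.shape, float(x.min()), float(x.max()))  # (224, 224, 3), within [0, 1]
```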
 
397
  st.markdown("""
398
  <div style='text-align: justify;'>
399
 
400
+ This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b> of images per class. It's a <b>visual summary of color composition</b> and can reveal important patterns about the dataset.
401
 
402
  </div>
403
  """, unsafe_allow_html=True)
 
420
  - All three channels peak around pixel values 130–150.
421
  - Histogram is **smoother and more centered**, indicating balanced exposure and color.
422
  - Slight red dominance in the mid-range may be due to the red/pink inner rings being more exposed.
423
+ - **Interpretation:** Sliced onions offer the most uniform and balanced appearance across all channels.
424
+
425
  **3. Whole**
426
  - Shows **high red peaks** near pixel value 220 and strong green variation around 120–150.
427
  - Blue is less dominant and shows more fluctuation in the mid-range.
428
+ - Histogram is noisier with more channel separation, likely due to outer skin, glare, or inconsistent lighting.
429
  - **Interpretation:** Whole onions are visually more complex, capturing skins, glare, and full curvature. This leads to higher variation.
430
  - To capture this complexity effectively, using **RGB channels** is essential.
431
 
 
473
  - **Lighting Conditions:**
474
  - Halved images show the best exposure balance.
475
  - Sliced images include darker regions, hinting at variability in data quality.
476
+ - Whole pears are consistently lit.
477
 
478
  - **Model Implications:**
479
  - Halved pears are optimal for training due to stable exposure.
 
498
  **2. Sliced**
499
  - Displays a **strong green peak near 140** and red around 130–150, which are consistent with the **flesh and seedy outer layer** of strawberries.
500
  - Blue is subdued across the entire range, which is expected for strawberries.
501
+ - **Interpretation:** Sliced strawberries appear more uniform and less reflective, providing a **clean color profile**.
 
502
 
503
  **3. Whole**
504
  - Broad red and green peaks from **100–160**, with visible spikes around **140–150**, typical of a fully intact strawberry's surface.
 
544
  - Strong **red peak near 150–160** represents the core tomato surface.
545
  - Green and blue show defined peaks around 90–130, suggesting presence of both background and stem/leaf regions.
546
  - Well-defined, multi-peak structure shows moderate saturation and **good contrast**.
547
+ - **Interpretation:** Whole tomatoes appear cleanly illuminated and well-captured, with a **balanced mix of object and background**.
548
 
549
  **Dataset Insights**
550
 
 
668
  st.image("assets/images/part2_avg_tomato.png", caption="Average image tomato variations", use_container_width=True)
669
 
670
  st.markdown("""
671
+ **Visual Observations**
672
 
673
  1. **Diced**
674
  - Multiple reddish blobs are visible but still form a centralized mass.
 
691
  """)
692
 
693
 
694
+ st.markdown("### 4.4 Image Analysis conclusion")
695
 
696
  st.markdown("""
697
  The combination of average images and RGB histogram plots reveals that, in general, all classes (onion, pear, strawberry, tomato) demonstrate a strong central focus in their average images. This is ideal for convolutional neural networks (CNNs), which exploit spatial locality. However, such consistency may limit generalization to real-world, off-centered samples.
 
715
  st.markdown("### 4.5 Training and Results")
716
 
717
  st.markdown("""
718
+ We used a dataset of **3,000 manually labeled images per class**, with 1,000 images for each intra-class variation (**whole**, **halved/hulled**, and **sliced**) across four categories: **tomato**, **onion**, **pear**, and **strawberry**.
719
 
720
  The dataset was split using either a **60:20:20 ratio** or, in some cases, a **50:25:25 ratio** for training, validation, and testing, respectively.
721
  """)
 
749
  From the average image analysis, these classes showed **higher visual noise and blur**, indicating significant intra-class variation. Without augmentation, the model risked **overfitting to noise** and generalizing poorly. Rotation-based augmentation helped expose the model to diverse orientations and reduce this risk.
750
 
751
  **Why no augmentation for pear and tomato?**
752
+ Our analysis of their average images and RGB histograms revealed that these classes were **well-centered**, **well-lit**, and had **limited background variation**. As a result, the model could learn from them effectively without augmentation. Although classification performance may degrade in real-world scenarios with cluttered or complex backgrounds, in **ideal settings** where images are centered and consistently lit, these classes are expected to yield **strong performance even without augmentation**.
753
 
754
+ Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We also implemented **early stopping** with a patience of 3 epochs, meaning training stops if no improvement is seen in validation accuracy for 3 consecutive epochs.
755
  """)
756
 
757
 
 
822
  Minor confusion is observed between **hulled and whole**, likely due to similar color and shape. Still, the model maintains excellent overall accuracy and balance.
823
 
824
  3. **Pear**
825
+ Perfect classification across all classes (no false positives or false negatives), reflecting highly consistent, separable visual features in the dataset.
826
 
827
  4. **Tomato**
828
+ No misclassifications were made. The model distinguishes **diced, sliced, and whole** tomatoes perfectly, likely due to strong shape and texture differences across classes.
829
  """)
830
 
831
  st.markdown("""
 
864
 
865
  To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
866
 
867
+ The image below shows the **single most-activated channel per layer** for each intra-class variation (whole, halved/hulled, and sliced) of the main classes: Onion, Pear, Tomato, and Strawberry, across **9 convolutional stages**.
868
  """)
869
 
870
  st.markdown("""