sakshamlakhera committed
Commit b485e4a · 1 Parent(s): 6a4ece9
fixes
pages/4_Report.py +31 -33
pages/4_Report.py
CHANGED
@@ -23,7 +23,7 @@ def render_report():
|
|
23 |
st.markdown("""
|
24 |
<div style='text-align: justify'>
|
25 |
|
26 |
-
The project is a <b>recipe recommendation system</b> that allows users to either <b>type a textual query</b> or <b>upload images of food items</b>. Based on the inputs
|
27 |
|
28 |
<h4>1.1 NLP Task:</h4>
|
29 |
|
@@ -55,7 +55,7 @@ def render_report():
|
|
55 |
st.subheader("2. Introduction")
|
56 |
|
57 |
st.markdown("""
|
58 |
-
In an increasingly digital culinary world, users often look for personalized recipe recommendations based on either what they have at hand or what they crave. While traditional recipe search engines rely heavily on keyword matching, they fail to understand the deeper semantic context of ingredients, cooking methods, and dietary preferences. Similarly, visual recognition of food items can play a key role in enabling intuitive, image-based search experiences
|
59 |
|
60 |
This project aims to build an **end-to-end multimodal recipe recommendation system** that supports both **natural language queries** and **image-based inputs**. Users can either type a textual query such as “healthy vegetarian salad” or upload an image of a food item (e.g., pear, onion), and the system will return the most relevant recipes. This is achieved by integrating two advanced deep learning pipelines:
|
61 |
|
@@ -71,11 +71,11 @@ def render_report():
|
|
71 |
st.markdown("## 3. CV: Produce Classification Task")
|
72 |
|
73 |
st.markdown("""
|
74 |
-
For our Produce Classification task, we manually captured images of **tomato, onion, pear, and strawberry**, collecting a total of **12,000 images
|
75 |
Within each category, we introduced **3 intra-class variations**, with around **1,000 samples per variation**, by photographing the produce in different physical states:
|
76 |
|
77 |
- **Whole:** The item is uncut and intact (e.g., an entire pear or onion).
|
78 |
-
- **Halved/Hulled:** The item is partially cut
|
79 |
- **Sliced:** The item is cut into smaller segments or slices, such as onion rings or tomato wedges.
|
80 |
|
81 |
These variations allow the model to generalize better by learning visual features across different presentations, shapes, and cross-sections of each produce type.
|
@@ -109,12 +109,12 @@ def render_report():
|
|
109 |
st.markdown("""
|
110 |
<div style='text-align: justify;'>
|
111 |
|
112 |
-
This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b>
|
113 |
|
114 |
</div>
|
115 |
""", unsafe_allow_html=True)
|
116 |
|
117 |
-
st.image("assets/images/part1_image_histogram.png", caption="RGB histogram distribution
|
118 |
|
119 |
st.markdown("""
|
120 |
From the above histograms, we observe the following:
|
@@ -135,7 +135,7 @@ def render_report():
|
|
135 |
**1. Onion**
|
136 |
- Red & Green: Sharp peaks at 140–150
|
137 |
- Blue: Broad peak around 100
|
138 |
-
- Likely reflects white/yellow onion layers with soft shadows
|
139 |
- The model may learn to detect mid-range blue with red-green spikes
|
140 |
""")
|
141 |
|
@@ -156,7 +156,6 @@ def render_report():
|
|
156 |
- Green & Blue: Peaks between 50–120
|
157 |
- Red: Moderate and spread around 100–150
|
158 |
- Suggests soft green/yellow pear tones with consistent lighting
|
159 |
-
- Minimal intra-class variation makes this class stable for classification
|
160 |
""")
|
161 |
|
162 |
with col4:
|
@@ -189,7 +188,7 @@ def render_report():
|
|
189 |
|
190 |
1. **Blurriness of All Average Images**
|
191 |
- High blur indicates significant variation in object position, orientation, and size.
|
192 |
-
- No consistent alignment or cropping
|
193 |
|
194 |
2. **Centered Color Blobs**
|
195 |
- Each average image displays a dominant center color:
|
@@ -229,7 +228,7 @@ def render_report():
|
|
229 |
Although the typical split is 70:15:15, we opted to test on more data to better evaluate generalization and avoid overfitting.
|
230 |
|
231 |
Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We employed the **Adam optimizer** with a learning rate of **0.0001**.
|
232 |
-
We also implemented **early stopping** with a patience of 5 epochs
|
233 |
""")
|
234 |
|
235 |
# Insert training & validation graph
|
@@ -253,7 +252,7 @@ def render_report():
|
|
253 |
|
254 |
From the confusion matrix, it is evident that the model demonstrates strong **class separability** and **robust generalization**, with only **17 total misclassifications out of 3,035 test samples**.
|
255 |
|
256 |
-
This confirms that the model is capable of distinguishing
|
257 |
""")
|
258 |
|
259 |
st.markdown("""
|
@@ -282,7 +281,7 @@ def render_report():
|
|
282 |
|
283 |
Most misclassifications occurred between **strawberry** and **onion**. These classes exhibited greater variation in object positioning. In some cases, the objects (onion or strawberry) were **partially hidden**, with only a small portion visible, and were also affected by **poor lighting conditions**. Such combinations made it challenging for the model to make accurate predictions.
|
284 |
|
285 |
-
However, with an F1-score of **99%** for these classes, we can confidently conclude that the model performed well overall
|
286 |
|
287 |
Notably, we did not observe any misclassifications for **pear** and **tomato**. Based on our earlier data analysis, images in these classes were generally **well-centered and localized**, which likely contributed to the model's high accuracy (100%) in those categories.
|
288 |
""")
|
@@ -296,7 +295,7 @@ def render_report():
|
|
296 |
|
297 |
To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
|
298 |
|
299 |
-
The image below shows a **single most-activated channel per layer** for each class
|
300 |
|
301 |
""")
|
302 |
|
@@ -320,7 +319,7 @@ def render_report():
|
|
320 |
|
321 |
3. **Deep Layers (Conv 7–9):**
|
322 |
- Feature maps become **coarser and more focused**, losing most spatial resolution.
|
323 |
-
- The network now highlights only **key discriminative regions
|
324 |
- While the original shape is nearly lost, **strong activation in a focused area** indicates high confidence in classification.
|
325 |
- This shows the model is no longer looking at superficial textures, but **has learned what features truly define each class**.
|
326 |
|
@@ -343,11 +342,11 @@ def render_report():
|
|
343 |
st.markdown("""
|
344 |
As mentioned earlier, we have **3,000 images per class**, and within each class, there are **1,000 images per variation** of **whole**, **halved/hulled**, and **sliced/cored**.
|
345 |
|
346 |
-
These variations not only help make our main classification model more **robust to presentation differences**, but also allow us to analyze how the model performs under **intra-class variation
|
347 |
|
348 |
### Importance of Intra-Class Variation analysis:
|
349 |
|
350 |
-
- In real-world settings (e.g., cooking, grocery shelves, or user-uploaded photos), food items can appear in multiple forms
|
351 |
- A model that performs well only on whole items may fail when the object is sliced or obscured.
|
352 |
- By training and evaluating a separate **variation classifier**, we can:
|
353 |
- Assess the **distinctiveness** of each variation within a class.
|
@@ -369,7 +368,7 @@ def render_report():
|
|
369 |
|
370 |
As we are using the **EfficientNet-B0** model, all images in our dataset are resized to <b>224×224</b> pixels. This is the standard input size for EfficientNet and ensures compatibility with pre-trained weights, as well as efficient GPU utilization during training.
|
371 |
|
372 |
-
Below are sample resized images for each class
|
373 |
These samples provide a visual sense of the input data and the diversity of presentation styles within each category.
|
374 |
|
375 |
For training purposes, the images were normalized by dividing each pixel value by <b>255</b>.
|
@@ -398,7 +397,7 @@ def render_report():
|
|
398 |
st.markdown("""
|
399 |
<div style='text-align: justify;'>
|
400 |
|
401 |
-
This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b>
|
402 |
|
403 |
</div>
|
404 |
""", unsafe_allow_html=True)
|
@@ -421,12 +420,12 @@ def render_report():
|
|
421 |
- All three channels peak around pixel values 130–150.
|
422 |
- Histogram is **smoother and more centered**, indicating balanced exposure and color.
|
423 |
- Slight red dominance in the mid-range may be due to the red/pink inner rings being more exposed.
|
424 |
-
- **Interpretation:** Sliced onions offer the most uniform and balanced appearance across all channels
|
425 |
-
|
426 |
**3. Whole**
|
427 |
- Shows **high red peaks** near pixel value 220 and strong green variation around 120–150.
|
428 |
- Blue is less dominant and shows more fluctuation in the mid-range.
|
429 |
-
- Histogram is noisier with more channel separation
|
430 |
- **Interpretation:** Whole onions are visually more complex, capturing skins, glare, and full curvature. This leads to higher variation.
|
431 |
- To capture this complexity effectively, using **RGB channels** is essential.
|
432 |
|
@@ -474,7 +473,7 @@ def render_report():
|
|
474 |
- **Lighting Conditions:**
|
475 |
- Halved images show the best exposure balance.
|
476 |
- Sliced images include darker regions, hinting at variability in data quality.
|
477 |
-
- Whole pears are consistently lit
|
478 |
|
479 |
- **Model Implications:**
|
480 |
- Halved pears are optimal for training due to stable exposure.
|
@@ -499,8 +498,7 @@ def render_report():
|
|
499 |
**2. Sliced**
|
500 |
- Displays a **strong green peak near 140** and red around 130–150, which are consistent with the **flesh and seedy outer layer** of strawberries.
|
501 |
- Blue is subdued across the entire range, which is expected for strawberries.
|
502 |
-
-
|
503 |
-
- **Interpretation:** Sliced strawberries appear more uniform and less reflective, providing a **clean but slightly less diverse color profile than hulled**.
|
504 |
|
505 |
**3. Whole**
|
506 |
- Broad red and green peaks from **100–160**, with visible spikes around **140–150**, typical of a fully intact strawberry's surface.
|
@@ -546,7 +544,7 @@ def render_report():
|
|
546 |
- Strong **red peak near 150–160** represents the core tomato surface.
|
547 |
- Green and blue show defined peaks around 90–130, suggesting the presence of both background and stem/leaf regions.
|
548 |
- Well-defined, multi-peak structure shows moderate saturation and **good contrast**.
|
549 |
-
- **Interpretation:** Whole tomatoes appear cleanly illuminated and well-captured, with a **balanced mix of object and background**.
|
550 |
|
551 |
**Dataset Insights**
|
552 |
|
@@ -670,7 +668,7 @@ def render_report():
|
|
670 |
st.image("assets/images/part2_avg_tomato.png", caption="Average image tomato variations", use_container_width=True)
|
671 |
|
672 |
st.markdown("""
|
673 |
-
**
|
674 |
|
675 |
1. **Diced**
|
676 |
- Multiple reddish blobs are visible but still form a centralized mass.
|
@@ -693,7 +691,7 @@ def render_report():
|
|
693 |
""")
|
694 |
|
695 |
|
696 |
-
st.markdown("###
|
697 |
|
698 |
st.markdown("""
|
699 |
The combination of average images and RGB histogram plots reveals that, in general, all classes (onion, pear, strawberry, tomato) demonstrate a strong central focus in their average images. This is ideal for convolutional neural networks (CNNs), which exploit spatial locality. However, such consistency may limit generalization to real-world, off-centered samples.
|
@@ -717,7 +715,7 @@ def render_report():
|
|
717 |
st.markdown("### 4.5 Training and Results")
|
718 |
|
719 |
st.markdown("""
|
720 |
-
We used a dataset of **3,000 manually labeled images per class**, with 1,000 images for each intra-class variation
|
721 |
|
722 |
The dataset was split using either a **60:20:20 ratio** or, in some cases, a **50:25:25 ratio** for training, validation, and testing, respectively.
|
723 |
""")
|
@@ -751,9 +749,9 @@ def render_report():
|
|
751 |
From the average image analysis, these classes showed **higher visual noise and blur**, indicating significant intra-class variation. Without augmentation, the model risked **overfitting to noise** and generalizing poorly. Rotation-based augmentation helped expose the model to diverse orientations and reduce this risk.
|
752 |
|
753 |
**Why no augmentation for pear and tomato?**
|
754 |
-
Our analysis of their average images and RGB histograms revealed that these classes were **well-centered**, **well-lit**, and had **limited background variation**. As a result, the model could learn from them effectively without augmentation. Although classification performance may degrade in real-world scenarios with cluttered or complex backgrounds, in **ideal settings**
|
755 |
|
756 |
-
Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We also implemented **early stopping** with a patience of 3 epochs
|
757 |
""")
|
758 |
|
759 |
|
@@ -824,10 +822,10 @@ def render_report():
|
|
824 |
Minor confusion is observed between **hulled and whole**, likely due to similar color and shape. Still, the model maintains excellent overall accuracy and balance.
|
825 |
|
826 |
3. **Pear**
|
827 |
-
Perfect classification across all classes
|
828 |
|
829 |
4. **Tomato**
|
830 |
-
No misclassifications were made
|
831 |
""")
|
832 |
|
833 |
st.markdown("""
|
@@ -866,7 +864,7 @@ def render_report():
|
|
866 |
|
867 |
To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
|
868 |
|
869 |
-
The image below shows the **single most-activated channel per layer** for each intraclass variation
|
870 |
""")
|
871 |
|
872 |
st.markdown("""
|
|
|
23 |
st.markdown("""
|
24 |
<div style='text-align: justify'>
|
25 |
|
26 |
+
The project is a <b>recipe recommendation system</b> that allows users to either <b>type a textual query</b> or <b>upload images of food items</b>. Based on the inputs, including user-provided tags and detected ingredients, the application returns the most relevant recipes using semantic search and image classification.
|
27 |
|
28 |
<h4>1.1 NLP Task:</h4>
|
29 |
|
|
|
55 |
st.subheader("2. Introduction")
|
56 |
|
57 |
st.markdown("""
|
58 |
+
In an increasingly digital culinary world, users often look for personalized recipe recommendations based on either what they have at hand or what they crave. While traditional recipe search engines rely heavily on keyword matching, they fail to understand the deeper semantic context of ingredients, cooking methods, and dietary preferences. Similarly, visual recognition of food items can play a key role in enabling intuitive, image-based search experiences, especially for users unsure of ingredient names or spelling.
|
59 |
|
60 |
This project aims to build an **end-to-end multimodal recipe recommendation system** that supports both **natural language queries** and **image-based inputs**. Users can either type a textual query such as “healthy vegetarian salad” or upload an image of a food item (e.g., pear, onion), and the system will return the most relevant recipes. This is achieved by integrating two advanced deep learning pipelines:
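The retrieval half of this pipeline is not shown in this file; as a minimal sketch, assuming an off-the-shelf sentence-transformers model such as `all-MiniLM-L6-v2` and a toy recipe list (both placeholders, not the project's actual setup), embedding-based recipe search could look like this:

```python
# Illustrative sketch only: semantic recipe retrieval with sentence embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

recipes = [
    "Grilled vegetable salad with olive oil and lemon",
    "Creamy tomato soup with basil",
    "Strawberry spinach salad with balsamic dressing",
]
recipe_emb = model.encode(recipes, convert_to_tensor=True, normalize_embeddings=True)

query = "healthy vegetarian salad"
query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)

# Cosine similarity between the query and every recipe, then take the top match.
scores = util.cos_sim(query_emb, recipe_emb)[0]
best = scores.argmax().item()
print(recipes[best], float(scores[best]))
```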
|
61 |
|
|
|
71 |
st.markdown("## 3. CV: Produce Classification Task")
|
72 |
|
73 |
st.markdown("""
|
74 |
+
For our Produce Classification task, we manually captured images of **tomato, onion, pear, and strawberry**, collecting a total of **12,000 images**, approximately **3,000 per category**.
|
75 |
Within each category, we introduced **3 intra-class variations**, with around **1,000 samples per variation**, by photographing the produce in different physical states:
|
76 |
|
77 |
- **Whole:** The item is uncut and intact (e.g., an entire pear or onion).
|
78 |
+
- **Halved/Hulled:** The item is partially cut, for example, a strawberry with the hull removed or a fruit sliced in half.
|
79 |
- **Sliced:** The item is cut into smaller segments or slices, such as onion rings or tomato wedges.
|
80 |
|
81 |
These variations allow the model to generalize better by learning visual features across different presentations, shapes, and cross-sections of each produce type.
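As an illustration of how a dataset organized this way might be loaded, here is a minimal sketch assuming one folder per class; the directory layout and names are hypothetical, not taken from the project:

```python
# Minimal sketch: load per-class image folders with torchvision (paths are placeholders).
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # EfficientNet-B0 input size
    transforms.ToTensor(),           # scales pixel values to [0, 1]
])

# e.g. data/produce/{tomato,onion,pear,strawberry}/*.jpg
dataset = datasets.ImageFolder("data/produce", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
print(dataset.classes, len(dataset))
```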
|
|
|
109 |
st.markdown("""
|
110 |
<div style='text-align: justify;'>
|
111 |
|
112 |
+
This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b> per class. It's a <b>visual summary of color composition</b> and can reveal important patterns about the dataset.
|
113 |
|
114 |
</div>
|
115 |
""", unsafe_allow_html=True)
|
116 |
|
117 |
+
st.image("assets/images/part1_image_histogram.png", caption="RGB histogram distribution per class", use_container_width=True)
|
118 |
|
119 |
st.markdown("""
|
120 |
From the above histograms, we observe the following:
|
|
|
135 |
**1. Onion**
|
136 |
- Red & Green: Sharp peaks at 140–150
|
137 |
- Blue: Broad peak around 100
|
138 |
+
- Likely reflects white/yellow onion layers with soft shadows.
|
139 |
- The model may learn to detect mid-range blue with red-green spikes
|
140 |
""")
|
141 |
|
|
|
156 |
- Green & Blue: Peaks between 50–120
|
157 |
- Red: Moderate and spread around 100–150
|
158 |
- Suggests soft green/yellow pear tones with consistent lighting
|
|
|
159 |
""")
|
160 |
|
161 |
with col4:
|
|
|
188 |
|
189 |
1. **Blurriness of All Average Images**
|
190 |
- High blur indicates significant variation in object position, orientation, and size.
|
191 |
+
- No consistent alignment or cropping, objects appear in different parts of the frame.
|
192 |
|
193 |
2. **Centered Color Blobs**
|
194 |
- Each average image displays a dominant center color:
|
|
|
228 |
Although the typical split is 70:15:15, we opted to test on more data to better evaluate generalization and avoid overfitting.
|
229 |
|
230 |
Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We employed the **Adam optimizer** with a learning rate of **0.0001**.
|
231 |
+
We also implemented **early stopping** with a patience of 5 epochs, meaning training stops if no improvement is seen in validation accuracy for 5 consecutive epochs.
|
232 |
""")
|
233 |
|
234 |
# Insert training & validation graph
|
|
|
252 |
|
253 |
From the confusion matrix, it is evident that the model demonstrates strong **class separability** and **robust generalization**, with only **17 total misclassifications out of 3,035 test samples**.
|
254 |
|
255 |
+
This confirms that the model is capable of distinguishing these classes with high precision.
|
256 |
""")
|
257 |
|
258 |
st.markdown("""
|
|
|
281 |
|
282 |
Most misclassifications occurred between **strawberry** and **onion**. These classes exhibited greater variation in object positioning. In some cases, the objects (onion or strawberry) were **partially hidden**, with only a small portion visible, and were also affected by **poor lighting conditions**. Such combinations made it challenging for the model to make accurate predictions.
|
283 |
|
284 |
+
However, with an F1-score of **99%** for these classes, we can confidently conclude that the model performed well overall, especially on images where the object was **clearly visible**, **fully within the frame**, and in **good general condition**. This further suggests that the model is **robust and ready for real-world use**.
|
285 |
|
286 |
Notably, we did not observe any misclassifications for **pear** and **tomato**. Based on our earlier data analysis, images in these classes were generally **well-centered and localized**, which likely contributed to the model's high accuracy (100%) in those categories.
|
287 |
""")
|
|
|
295 |
|
296 |
To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
|
297 |
|
298 |
+
The image below shows a **single most-activated channel per layer** for each class (Onion, Pear, Tomato, and Strawberry), across **9 convolutional stages**.
|
299 |
|
300 |
""")
|
301 |
|
|
|
319 |
|
320 |
3. **Deep Layers (Conv 7–9):**
|
321 |
- Feature maps become **coarser and more focused**, losing most spatial resolution.
|
322 |
+
- The network now highlights only **key discriminative regions**, often the **center mass** of the object.
|
323 |
- While the original shape is nearly lost, **strong activation in a focused area** indicates high confidence in classification.
|
324 |
- This shows the model is no longer looking at superficial textures, but **has learned what features truly define each class**.
|
325 |
|
|
|
342 |
st.markdown("""
|
343 |
As mentioned earlier, we have **3,000 images per class**, and within each class, there are **1,000 images per variation** of **whole**, **halved/hulled**, and **sliced/cored**.
|
344 |
|
345 |
+
These variations not only help make our main classification model more **robust to presentation differences**, but also allow us to analyze how the model performs under **intra-class variation**, that is, variation within the same object category.
|
346 |
|
347 |
### Importance of Intra-Class Variation analysis:
|
348 |
|
349 |
+
- In real-world settings (e.g., cooking, grocery shelves, or user-uploaded photos), food items can appear in multiple forms (whole, cut, or partially visible).
|
350 |
- A model that performs well only on whole items may fail when the object is sliced or obscured.
|
351 |
- By training and evaluating a separate **variation classifier**, we can:
|
352 |
- Assess the **distinctiveness** of each variation within a class.
|
|
|
368 |
|
369 |
As we are using the **EfficientNet-B0** model, all images in our dataset are resized to <b>224×224</b> pixels. This is the standard input size for EfficientNet and ensures compatibility with pre-trained weights, as well as efficient GPU utilization during training.
|
370 |
|
371 |
+
Below are sample resized images for each class (<b>onion, pear, strawberry, and tomato</b>) showing their intra-class variations: <b>whole, halved/hulled, and sliced</b>.
|
372 |
These samples provide a visual sense of the input data and the diversity of presentation styles within each category.
|
373 |
|
374 |
For training purposes, the images were normalized by dividing each pixel value by <b>255</b>.
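A minimal sketch of this resize-and-normalize step, with a placeholder file name:

```python
# Resize to 224x224 and scale pixel values to [0, 1] by dividing by 255.
import numpy as np
from PIL import Image

img = Image.open("sample_pear.jpg").convert("RGB")
img = img.resize((224, 224))                     # EfficientNet-B0 input size
arr = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
print(arr.shape, arr.min(), arr.max())           # (224, 224, 3)
```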
|
|
|
397 |
st.markdown("""
|
398 |
<div style='text-align: justify;'>
|
399 |
|
400 |
+
This RGB histogram plot shows the <b>distribution of pixel intensities</b> for the <b>Red, Green, and Blue channels</b> of images per class. It's a <b>visual summary of color composition</b> and can reveal important patterns about the dataset.
|
401 |
|
402 |
</div>
|
403 |
""", unsafe_allow_html=True)
|
|
|
420 |
- All three channels peak around pixel values 130–150.
|
421 |
- Histogram is **smoother and more centered**, indicating balanced exposure and color.
|
422 |
- Slight red dominance in the mid-range may be due to the red/pink inner rings being more exposed.
|
423 |
+
- **Interpretation:** Sliced onions offer the most uniform and balanced appearance across all channels.
|
424 |
+
|
425 |
**3. Whole**
|
426 |
- Shows **high red peaks** near pixel value 220 and strong green variation around 120–150.
|
427 |
- Blue is less dominant and shows more fluctuation in the mid-range.
|
428 |
+
- Histogram is noisier with more channel separation, likely due to outer skin, glare, or inconsistent lighting.
|
429 |
- **Interpretation:** Whole onions are visually more complex, capturing skins, glare, and full curvature. This leads to higher variation.
|
430 |
- To capture this complexity effectively, using **RGB channels** is essential.
|
431 |
|
|
|
473 |
- **Lighting Conditions:**
|
474 |
- Halved images show the best exposure balance.
|
475 |
- Sliced images include darker regions, hinting at variability in data quality.
|
476 |
+
- Whole pears are consistently lit.
|
477 |
|
478 |
- **Model Implications:**
|
479 |
- Halved pears are optimal for training due to stable exposure.
|
|
|
498 |
**2. Sliced**
|
499 |
- Displays a **strong green peak near 140** and red around 130–150, which are consistent with the **flesh and seedy outer layer** of strawberries.
|
500 |
- Blue is subdued across the entire range, which is expected for strawberries.
|
501 |
+
- **Interpretation:** Sliced strawberries appear more uniform and less reflective, providing a **clean color profile**.
|
|
|
502 |
|
503 |
**3. Whole**
|
504 |
- Broad red and green peaks from **100–160**, with visible spikes around **140–150**, typical of a fully intact strawberry's surface.
|
|
|
544 |
- Strong **red peak near 150–160** represents the core tomato surface.
|
545 |
- Green and blue show defined peaks around 90–130, suggesting the presence of both background and stem/leaf regions.
|
546 |
- Well-defined, multi-peak structure shows moderate saturation and **good contrast**.
|
547 |
+
- **Interpretation:** Whole tomatoes appear cleanly illuminated and well-captured, with a **balanced mix of object and background**.
|
548 |
|
549 |
**Dataset Insights**
|
550 |
|
|
|
668 |
st.image("assets/images/part2_avg_tomato.png", caption="Average image tomato variations", use_container_width=True)
|
669 |
|
670 |
st.markdown("""
|
671 |
+
**Visual Observations**
|
672 |
|
673 |
1. **Diced**
|
674 |
- Multiple reddish blobs are visible but still form a centralized mass.
|
|
|
691 |
""")
|
692 |
|
693 |
|
694 |
+
st.markdown("### 4.4 Image Analysis conclusion")
|
695 |
|
696 |
st.markdown("""
|
697 |
The combination of average images and RGB histogram plots reveals that, in general, all classes (onion, pear, strawberry, tomato) demonstrate a strong central focus in their average images. This is ideal for convolutional neural networks (CNNs), which exploit spatial locality. However, such consistency may limit generalization to real-world, off-centered samples.
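For reference, an average image of the kind analyzed here can be computed by stacking and averaging all images of one class; the folder path below is a placeholder:

```python
# Sketch: compute a per-class "average image" by taking the per-pixel mean.
import glob
import numpy as np
from PIL import Image

paths = glob.glob("data/produce/tomato/*.jpg")  # placeholder class folder
stack = np.stack([
    np.asarray(Image.open(p).convert("RGB").resize((224, 224)), dtype=np.float32)
    for p in paths
])
avg = stack.mean(axis=0).astype(np.uint8)  # per-pixel mean across all images
Image.fromarray(avg).save("avg_tomato.png")
```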
|
|
|
715 |
st.markdown("### 4.5 Training and Results")
|
716 |
|
717 |
st.markdown("""
|
718 |
+
We used a dataset of **3,000 manually labeled images per class**, with 1,000 images for each intra-class variation (**whole**, **halved/hulled**, and **sliced**) across four categories: **tomato**, **onion**, **pear**, and **strawberry**.
|
719 |
|
720 |
The dataset was split using either a **60:20:20 ratio** or, in some cases, a **50:25:25 ratio** for training, validation, and testing, respectively.
|
721 |
""")
|
|
|
749 |
From the average image analysis, these classes showed **higher visual noise and blur**, indicating significant intra-class variation. Without augmentation, the model risked **overfitting to noise** and generalizing poorly. Rotation-based augmentation helped expose the model to diverse orientations and reduce this risk.
|
750 |
|
751 |
**Why no augmentation for pear and tomato?**
|
752 |
+
Our analysis of their average images and RGB histograms revealed that these classes were **well-centered**, **well-lit**, and had **limited background variation**. As a result, the model could learn from them effectively without augmentation. Although classification performance may degrade in real-world scenarios with cluttered or complex backgrounds, in **ideal settings** where images are centered and consistently lit, these classes are expected to yield **strong performance even without augmentation**.
|
753 |
|
754 |
+
Due to hardware constraints (GPU memory limits), we used a **batch size of 32**. We also implemented **early stopping** with a patience of 3 epochs, meaning training stops if no improvement is seen in validation accuracy for 3 consecutive epochs.
|
755 |
""")
|
756 |
|
757 |
|
|
|
822 |
Minor confusion is observed between **hulled and whole**, likely due to similar color and shape. Still, the model maintains excellent overall accuracy and balance.
|
823 |
|
824 |
3. **Pear**
|
825 |
+
Perfect classification across all classes (no false positives or false negatives), reflecting highly consistent, separable visual features in the dataset.
|
826 |
|
827 |
4. **Tomato**
|
828 |
+
No misclassifications were made. The model distinguishes **diced, sliced, and whole** tomatoes perfectly, likely due to strong shape and texture differences across classes.
|
829 |
""")
|
830 |
|
831 |
st.markdown("""
|
|
|
864 |
|
865 |
To understand what our model has actually learned and how it perceives different food items internally, we visualized **feature maps** extracted from various convolutional layers of **EfficientNet-B0**.
|
866 |
|
867 |
+
The image below shows the **single most-activated channel per layer** for each intra-class variation (whole, halved/hulled, and sliced) of the main classes: Onion, Pear, Tomato, and Strawberry, across **9 convolutional stages**.
|
868 |
""")
|
869 |
|
870 |
st.markdown("""
|