Commit b67b96f by yazodi · verified
1 Parent(s): f2d37a9

Upload 5 files

Files changed (5)
  1. README.md +116 -20
  2. app.py +50 -0
  3. model_columns.pkl +3 -0
  4. requirements.txt +5 -3
  5. rf_model.pkl +3 -0
README.md CHANGED
@@ -1,20 +1,116 @@
- ---
- title: Blueberry Yield Regression App
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Streamlit template space
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ ---
+ title: "🍇 Blueberry Yield Regression"
+ emoji: 🌾
+ colorFrom: indigo
+ colorTo: green
+ sdk: streamlit
+ app_file: app.py
+ pinned: true
+ license: mit
+ tags:
+ - regression
+ - machine-learning
+ - streamlit
+ - kaggle
+ - agriculture
+ ---
+
+ # 🍇 Blueberry Yield Prediction with Machine Learning
+
+ This project is an end-to-end machine learning pipeline that predicts the **yield of wild blueberries** from environmental and biological features such as pollinator counts, rainfall, and fruit measurements.
+
+ ## 📌 Project Type
+
+ - Supervised Learning
+ - Regression Problem
+
+ ---
+
+ ## 🔍 Problem Description
+
+ Predicting agricultural yield is central to planning, sustainability, and food economics. The dataset used in this project comes from the **Kaggle Playground Series S3E14** competition and contains information on:
+
+ - Different species of pollinators (honeybee, bumblebee, osmia...)
+ - Environmental conditions (rainfall days, temperature ranges...)
+ - Fruit attributes (fruit mass, fruit set, seed count...)
+
+ 🎯 **Goal**: Predict the `yield` (kg/ha) of blueberries based on input features.
+
+ ---
+
+ ## 📊 Dataset Info
+
+ - `train.csv`: 15,289 samples with 18 columns (including `id` and the target `yield`)
+ - `test.csv`: same structure, no target
+ - No missing values, clean numerical data
+
+ ---
+
+ ## 📈 What We Did (Pipeline Summary)
+
+ 1. **EDA (Exploratory Data Analysis)**
+    - Checked for missing values ✅
+    - Analyzed feature distributions & target (`yield`)
+    - Built correlation heatmaps; strongest positive correlations:
+      - `fruitmass`, `fruitset`, `seeds`
+
+ 2. **Data Preprocessing**
+    - Removed `id` column
+    - Selected features based on correlation
+    - No categorical encoding needed (all numerical)
+
+ 3. **Model Training**
+    - Model: `RandomForestRegressor`
+    - Train-Test Split: 80/20
+    - **Results**:
+      - RMSE ≈ **573.8**
+      - R² Score ≈ **0.81** ✅
+
+ 4. **Test Prediction & Submission**
+    - Predictions made on `test.csv`
+    - `submission.csv` generated for Kaggle submission
+
+ 5. **Streamlit App**
+    - Users input bee counts, rain days, and fruit measurements
+    - Predicts blueberry yield in kg/ha
+    - Uses trained model (`rf_model.pkl`) behind the scenes
+
+ ---
+
+ ## 🚀 Try it Online
+
+ 🌐 You can try this app live here:
+ [Hugging Face Space Link](https://huggingface.co/spaces/yazodi/blueberry-yield-regression-app)
+
+ ---
+
+ ## 🔮 What Could Be Improved?
+
+ | Area | Suggestion |
+ |------|------------|
+ | Feature Engineering | Create interaction terms, try log/ratio features |
+ | Model | Try LightGBM, XGBoost, or stacking |
+ | Tuning | GridSearchCV or Optuna for hyperparameter optimization |
+ | Visualization | Add interactive charts in Streamlit app |
+ | Real-World Data | Add satellite weather data, soil types, historical trends |
+
+ ---
+
+ ## 📁 Project Structure
+
+ 📦 blueberry-yield-regression
+ ├── app.py
+ ├── rf_model.pkl
+ ├── model_columns.pkl
+ ├── requirements.txt
+ ├── submission.csv
+ └── README.md
+
+
+ ---
+
+ ## 📜 License
+
+ MIT License – Free to use, modify and distribute.
+
+ ---
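Review note: the README reports RMSE ≈ 573.8 and R² ≈ 0.81 without defining either metric. As a rough, dependency-free sketch of what those two numbers measure, the snippet below computes both by hand. The function names and the toy values are illustrative assumptions, not code or data from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (kg/ha here)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy hold-out targets and predictions (made-up numbers, not project data).
y_true = [4000.0, 5000.0, 6000.0, 7000.0]
y_pred = [4200.0, 4900.0, 6300.0, 6800.0]

print(f"RMSE: {rmse(y_true, y_pred):.1f}")
print(f"R2:   {r2_score(y_true, y_pred):.3f}")
```

Because RMSE carries the unit of the target, it is easiest to judge next to the typical range of `yield`; R² is unitless and compares the model against always predicting the mean.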
app.py ADDED
@@ -0,0 +1,50 @@
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ import joblib
+
+ # Title
+ st.title("🍇 Blueberry Yield Prediction App")
+ st.write("This app predicts wild blueberry yield from environmental and biological factors.")
+
+ # Input fields
+ clonesize = st.slider("Clone Size", 0.0, 10.0, 1.0)
+ honeybee = st.slider("Honeybee Count", 0.0, 10.0, 1.0)
+ bumbles = st.slider("Bumblebee Count", 0.0, 10.0, 1.0)
+ andrena = st.slider("Andrena Count", 0.0, 10.0, 1.0)
+ osmia = st.slider("Osmia Count", 0.0, 10.0, 1.0)
+ RainingDays = st.slider("Raining Days", 0.0, 100.0, 20.0)
+ AverageRainingDays = st.slider("Average Raining Days", 0.0, 100.0, 30.0)
+ fruitset = st.slider("Fruit Set", 0.0, 1.0, 0.5)
+ fruitmass = st.slider("Fruit Mass", 0.0, 10.0, 5.0)
+ seeds = st.slider("Seed Count", 0.0, 100.0, 50.0)
+
+ # Convert the inputs to a single-row DataFrame
+ user_input = pd.DataFrame([{
+     "clonesize": clonesize,
+     "honeybee": honeybee,
+     "bumbles": bumbles,
+     "andrena": andrena,
+     "osmia": osmia,
+     "RainingDays": RainingDays,
+     "AverageRainingDays": AverageRainingDays,
+     "fruitset": fruitset,
+     "fruitmass": fruitmass,
+     "seeds": seeds
+ }])
+
+ # Load the model and the list of columns it was trained on
+ model = joblib.load("rf_model.pkl")
+ model_columns = joblib.load("model_columns.pkl")
+
+ # Add any columns the form does not collect, filled with 0
+ for col in model_columns:
+     if col not in user_input.columns:
+         user_input[col] = 0
+
+ user_input = user_input[model_columns]
+
+ # Prediction
+ if st.button("Show Prediction"):
+     pred = model.predict(user_input)[0]
+     st.success(f"🌱 Predicted Wild Blueberry Yield: {pred:.2f} kg/ha")
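Review note: the loop in `app.py` that adds missing columns and then reorders them keeps the input aligned with the features the model was trained on. Below is a minimal, dependency-free sketch of the same idea. The helpers `align_features` and `missing_features` and the example column list are hypothetical, since the actual contents of `model_columns.pkl` are not shown in this commit.

```python
def align_features(row, model_columns, fill_value=0):
    """Return the row's values in model_columns order.

    Features absent from the row are filled with fill_value; keys the
    model does not know about are dropped.
    """
    return [row.get(col, fill_value) for col in model_columns]


def missing_features(row, model_columns):
    """List the model columns that the row does not provide."""
    return [col for col in model_columns if col not in row]


# Hypothetical column list; the real one lives in model_columns.pkl.
example_columns = ["clonesize", "honeybee", "MaxOfUpperTRange", "fruitset"]
example_row = {"fruitset": 0.5, "clonesize": 1.0, "honeybee": 1.0}

print(align_features(example_row, example_columns))
print(missing_features(example_row, example_columns))
```

With pandas, `user_input.reindex(columns=model_columns, fill_value=0)` should give the same result as the loop plus the reordering step. If the model was trained on features the form does not collect, a fill value of 0 may fall well outside the training range, so reporting the missing names (as `missing_features` does) is one way to make that visible.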
model_columns.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f8f2353d1c8c3d79295e022ad6bd9a36aa8bc6bb2ce3f6b597b67cc2fea59ac
+ size 255
requirements.txt CHANGED
@@ -1,3 +1,5 @@
- altair
- pandas
- streamlit
+ streamlit
+ pandas
+ numpy
+ scikit-learn
+ joblib
rf_model.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68b74682bb46d81c2aa0e680cea3abae0a97da6f372a366babe5a3bebd77e300
+ size 108065345
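Review note: the two `.pkl` entries in this diff are Git LFS pointer files, not the binaries themselves. Each records a spec version, a SHA-256 object ID, and a size in bytes. As a sketch of how a downloaded file could be checked against its pointer, the snippet below parses the `rf_model.pkl` pointer and compares size and digest. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the payload is a stand-in, not the real model file.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer text copied from the rf_model.pkl entry in this commit.
rf_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:68b74682bb46d81c2aa0e680cea3abae0a97da6f372a366babe5a3bebd77e300\n"
    "size 108065345\n"
)
print(rf_pointer["size"], "bytes expected for rf_model.pkl")

# Stand-in payload to exercise the check without the real file.
payload = b"stand-in for a downloaded file"
demo_pointer = {
    "version": rf_pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
print(matches_pointer(payload, demo_pointer))
```

For a file of roughly 108 MB, hashing in chunks instead of reading everything into memory would be the more careful choice; the single-buffer version is kept here for brevity.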