Spaces: yazodi/blueberry-yield-regression-app


yazodi committed 6 days ago

Commit b67b96f (verified) · 1 Parent(s): f2d37a9

Upload 5 files


Files changed (5)

README.md +116 -20
app.py +50 -0
model_columns.pkl +3 -0
requirements.txt +5 -3
rf_model.pkl +3 -0

README.md CHANGED

@@ -1,20 +1,116 @@
----
-title: Blueberry Yield Regression App
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
-- streamlit
-pinned: false
-short_description: Streamlit template space
-license: mit
----
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+---
+title: "🍇 Blueberry Yield Regression"
+emoji: 🌾
+colorFrom: indigo
+colorTo: green
+sdk: streamlit
+app_file: app.py
+pinned: true
+license: mit
+tags:
+  - regression
+  - machine-learning
+  - streamlit
+  - kaggle
+  - agriculture
+---
+# 🍇 Blueberry Yield Prediction with Machine Learning
+This project is a complete machine learning pipeline that predicts the **yield of wild blueberries** from environmental and biological features such as pollinator counts, rainfall, and fruit measurements.
+## 📌 Project Type
+- Supervised Learning
+- Regression Problem
+---
+## 🔍 Problem Description
+Predicting agricultural yield is crucial for planning, sustainability, and food economics. The dataset used in this project comes from the **Kaggle Playground Series S3E14** competition and contains information on:
+- Different species of pollinators (honeybee, bumblebee, osmia...)
+- Environmental conditions (rainfall days, temperature ranges...)
+- Fruit attributes (fruit mass, fruit set, seed count...)
+🎯 **Goal**: Predict the `yield` (kg/ha) of blueberries based on input features.
+---
+## 📊 Dataset Info
+- `train.csv`: 15,289 samples with 18 columns (including `id` and the `yield` target)
+- `test.csv`: same structure, no target
+- No missing values, clean numerical data
+---
+## 📈 What We Did (Pipeline Summary)
+1. **EDA (Exploratory Data Analysis)**
+   - Checked for missing values ✅
+   - Analyzed feature distributions & target (`yield`)
+   - Built correlation heatmaps — strongest positive correlations:
+     - `fruitmass`, `fruitset`, `seeds`
+2. **Data Preprocessing**
+   - Removed `id` column
+   - Standard feature selection based on correlation
+   - No categorical encoding needed (all numerical)
+3. **Model Training**
+   - Model: `RandomForestRegressor`
+   - Train-Test Split: 80/20
+   - **Results**:
+     - RMSE ≈ **573.8**
+     - R² Score ≈ **0.81** ✅
+4. **Test Prediction & Submission**
+   - Predictions made on `test.csv`
+   - `submission.csv` generated for Kaggle submission
+5. **Streamlit App**
+   - Users input bee counts, rain days, and fruit measurements
+   - Predicts blueberry yield in kg/ha
+   - Uses trained model (`rf_model.pkl`) behind the scenes
+---
+## 🚀 Try it Online
+🌐 You can try this app live here:
+[Hugging Face Space Link](https://huggingface.co/spaces/yazodi/blueberry-yield-regression-app)
+---
+## 🔮 What Could Be Improved?
+| Area | Suggestion |
+|------|------------|
+| Feature Engineering | Create interaction terms, try log/ratio features |
+| Model | Try LightGBM, XGBoost, or stacking |
+| Tuning | GridSearchCV or Optuna for hyperparameter optimization |
+| Visualization | Add interactive charts in Streamlit app |
+| Real-World Data | Add satellite weather data, soil types, historical trends |
+---
+## 📁 Project Structure
+📦 blueberry-yield-regression
+├── app.py
+├── rf_model.pkl
+├── model_columns.pkl
+├── requirements.txt
+├── submission.csv
+└── README.md
+---
+## 📜 License
+MIT License – Free to use, modify and distribute.
+---
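
Review note: the training script behind the README's pipeline summary is not part of this commit, so the following is only a rough sketch of the steps it describes (drop `id`, 80/20 split, `RandomForestRegressor`, RMSE and R²). The synthetic data, the `make_synthetic_frame` and `train_and_evaluate` helpers, and the hyperparameters (`n_estimators=50`, `random_state=42`) are assumptions for illustration, and the numbers it prints are unrelated to the reported RMSE ≈ 573.8 and R² ≈ 0.81.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_frame(n_rows=500, seed=0):
    """Hypothetical stand-in for train.csv; the real data is not in this commit."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "id": np.arange(n_rows),
        "fruitset": rng.uniform(0.2, 0.7, n_rows),
        "fruitmass": rng.uniform(0.3, 0.6, n_rows),
        "seeds": rng.uniform(20.0, 50.0, n_rows),
    })
    # Invented target: a noisy linear mix, only so the sketch has something to fit.
    df["yield"] = (
        8000 * df["fruitset"] + 2000 * df["fruitmass"] + 50 * df["seeds"]
        + rng.normal(0, 200, n_rows)
    )
    return df


def train_and_evaluate(df):
    """Drop id, split 80/20, fit a random forest, and report RMSE and R²."""
    X = df.drop(columns=["id", "yield"])
    y = df["yield"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    r2 = float(r2_score(y_test, pred))
    return model, list(X.columns), rmse, r2


model, model_columns, rmse, r2 = train_and_evaluate(make_synthetic_frame())
print(f"RMSE={rmse:.1f}  R2={r2:.3f}")
# To persist for the app (file names taken from this commit):
#   joblib.dump(model, "rf_model.pkl")
#   joblib.dump(model_columns, "model_columns.pkl")
```

Saving the column list alongside the model is what lets `app.py` rebuild the feature order at prediction time.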

app.py ADDED

@@ -0,0 +1,50 @@

+import streamlit as st
+import pandas as pd
+import numpy as np
+import joblib
+# Title
+st.title("🍇 Blueberry Yield Prediction App")
+st.write("This app predicts blueberry yield from environmental and biological factors.")
+# Input fields
+clonesize = st.slider("Clone Size", 0.0, 10.0, 1.0)
+honeybee = st.slider("Honeybee Count", 0.0, 10.0, 1.0)
+bumbles = st.slider("Bumblebee Count", 0.0, 10.0, 1.0)
+andrena = st.slider("Andrena Count", 0.0, 10.0, 1.0)
+osmia = st.slider("Osmia Count", 0.0, 10.0, 1.0)
+RainingDays = st.slider("Raining Days", 0.0, 100.0, 20.0)
+AverageRainingDays = st.slider("Average Raining Days", 0.0, 100.0, 30.0)
+fruitset = st.slider("Fruit Set", 0.0, 1.0, 0.5)
+fruitmass = st.slider("Fruit Mass", 0.0, 10.0, 5.0)
+seeds = st.slider("Seed Count", 0.0, 100.0, 50.0)
+# Convert the inputs to a single-row DataFrame
+user_input = pd.DataFrame([{
+    "clonesize": clonesize,
+    "honeybee": honeybee,
+    "bumbles": bumbles,
+    "andrena": andrena,
+    "osmia": osmia,
+    "RainingDays": RainingDays,
+    "AverageRainingDays": AverageRainingDays,
+    "fruitset": fruitset,
+    "fruitmass": fruitmass,
+    "seeds": seeds
+}])
+# Load the model and its training column list
+model = joblib.load("rf_model.pkl")
+model_columns = joblib.load("model_columns.pkl")
+# Add any missing columns (filled with 0), then match the training column order
+for col in model_columns:
+    if col not in user_input.columns:
+        user_input[col] = 0
+user_input = user_input[model_columns]
+# Prediction
+if st.button("Show Prediction"):
+    pred = model.predict(user_input)[0]
+    st.success(f"🌱 Predicted Blueberry Yield: {pred:.2f} kg/ha")
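
Review note: the column-alignment loop in `app.py` could probably be written more compactly with `DataFrame.reindex`. The following is a minimal sketch, assuming `model_columns` is a plain list of feature names. The contents of `model_columns.pkl` are not visible in this diff, so the list below and the `align_to_model_columns` helper are hypothetical. Filling a missing feature with 0 mirrors what the app does now; it is not necessarily a sensible default for every feature.

```python
import pandas as pd


def align_to_model_columns(user_input, model_columns, fill_value=0):
    """Return user_input with exactly model_columns, in that order.

    Columns the form did not supply are filled with fill_value; extra
    columns that are not in model_columns are dropped.
    """
    return user_input.reindex(columns=list(model_columns), fill_value=fill_value)


# Hypothetical column list; the real one lives in model_columns.pkl.
model_columns = ["clonesize", "honeybee", "MaxOfUpperTRange", "fruitset"]
user_input = pd.DataFrame(
    [{"fruitset": 0.5, "clonesize": 25.0, "honeybee": 0.5, "extra": 1}]
)

aligned = align_to_model_columns(user_input, model_columns)
print(aligned.columns.tolist())
```

Unlike the loop, `reindex` also drops columns the model was not trained on, which avoids a feature-name mismatch at `predict` time.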

model_columns.pkl ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2f8f2353d1c8c3d79295e022ad6bd9a36aa8bc6bb2ce3f6b597b67cc2fea59ac
+size 255

requirements.txt CHANGED

@@ -1,3 +1,5 @@
-altair
-pandas
-streamlit

+streamlit
+pandas
+numpy
+scikit-learn
+joblib
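
Review note: none of these requirements is pinned, and scikit-learn does not guarantee that a pickled model loads correctly under a different library version. One option is to record the versions installed in the training environment and pin them. The following is a minimal sketch using only the standard library; the `pin_requirements` helper is hypothetical and would need to run in the environment that produced `rf_model.pkl`.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_requirements(names):
    """Return 'name==installed_version' lines; leave a name unpinned if not installed."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(name)
    return lines


# Package names from the requirements.txt in this commit.
for line in pin_requirements(["streamlit", "pandas", "numpy", "scikit-learn", "joblib"]):
    print(line)
```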

rf_model.pkl ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68b74682bb46d81c2aa0e680cea3abae0a97da6f372a366babe5a3bebd77e300
+size 108065345
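
Review note: both `.pkl` entries are Git LFS pointer files, which record only the SHA-256 digest and byte size of the real artifact. The following is a minimal sketch of checking downloaded bytes against such a pointer; the `parse_lfs_pointer` and `matches_pointer` helpers are hypothetical names. For the roughly 108 MB `rf_model.pkl`, hashing the file in chunks would use less memory than reading it whole.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


# Pointer text for model_columns.pkl, copied from this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:2f8f2353d1c8c3d79295e022ad6bd9a36aa8bc6bb2ce3f6b597b67cc2fea59ac\n"
    "size 255\n"
)
print(pointer["algo"], pointer["size"])
```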