---
title: "🍇 Blueberry Yield Regression"
emoji: 🌾
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: true
license: mit
tags:
  - regression
  - machine-learning
  - streamlit
  - kaggle
  - agriculture
---

# 🍇 Blueberry Yield Prediction with Machine Learning

This project is a complete machine learning pipeline that predicts the **yield of wild blueberries** using various environmental and biological features such as pollinator counts, rainfall, and fruit measurements.

## 📌 Project Type

- Supervised Learning
- Regression Problem

---

## 🔍 Problem Description

Predicting agricultural yield is a crucial component in planning, sustainability, and food economics. The dataset used in this project comes from the **Kaggle Playground Series S3E14** competition and contains information on:

- Different species of pollinators (honeybee, bumblebee, osmia...)
- Environmental conditions (rainfall days, temperature ranges...)
- Fruit attributes (fruit mass, fruit set, seed count...)

🎯 **Goal**: Predict the `yield` (kg/ha) of blueberries based on input features.

---

## 📊 Dataset Info

- `train.csv`: 15,289 samples with 18 features
- `test.csv`: same structure, no target
- No missing values, clean numerical data

---

## 📈 What We Did (Pipeline Summary)

1. **EDA (Exploratory Data Analysis)**
   - Checked for missing values ✅
   - Analyzed feature distributions & target (`yield`)
   - Built correlation heatmaps; strongest positive correlations:
     - `fruitmass`, `fruitset`, `seeds`

2. **Data Preprocessing**
   - Removed `id` column
   - Standard feature selection based on correlation
   - No categorical encoding needed (all numerical)

3. **Model Training**
   - Model: `RandomForestRegressor`
   - Train-Test Split: 80/20
   - **Results**:
     - RMSE ≈ **573.8**
     - R² Score ≈ **0.81** ✅

4. **Test Prediction & Submission**
   - Predictions made on `test.csv`
   - `submission.csv` generated for Kaggle submission

5. **Streamlit App**
   - Users input bee counts, rain days, and fruit measurements
   - Predicts blueberry yield in kg/ha
   - Uses trained model (`rf_model.pkl`) behind the scenes

---

## 🚀 Try it Online

🌐 You can try this app live here: [Hugging Face Space Link](https://huggingface.co/spaces/yazodi/blueberry-yield-regression-app)

---

## 🔮 What Could Be Improved?

| Area | Suggestion |
|------|------------|
| Feature Engineering | Create interaction terms, try log/ratio features |
| Model | Try LightGBM, XGBoost, or stacking |
| Tuning | GridSearchCV or Optuna for hyperparameter optimization |
| Visualization | Add interactive charts in Streamlit app |
| Real-World Data | Add satellite weather data, soil types, historical trends |

---

## 📁 Project Structure

    📦 blueberry-yield-regression
    ├── app.py
    ├── rf_model.pkl
    ├── model_columns.pkl
    ├── requirements.txt
    ├── submission.csv
    └── README.md

---

## 📜 License

MIT License: free to use, modify, and distribute.

---
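
## 🧮 How the Reported Metrics Are Defined

The Model Training step reports an RMSE and an R² score. As a reference for reading those numbers, the sketch below shows how the two metrics are defined, in plain Python. The function names `rmse` and `r_squared` and the sample values are illustrative assumptions, not code or data from this project; the project itself may compute the metrics differently (for example with scikit-learn's metric helpers).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: the typical prediction error, in target units (kg/ha here)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical yields (kg/ha), made up purely for illustration.
actual = [4000.0, 5000.0, 6000.0, 7000.0]
predicted = [4100.0, 4900.0, 6200.0, 6800.0]

print(f"RMSE: {rmse(actual, predicted):.2f}")      # RMSE: 158.11
print(f"R^2:  {r_squared(actual, predicted):.2f}")  # R^2:  0.98
```

Read this way, an RMSE of about 573.8 means predictions on the held-out 20% were typically off by a few hundred kg/ha, and an R² of about 0.81 means the model accounted for roughly 81% of the variance in yield on that split.

---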