title: Blueberry Yield Regression
emoji: 🌾
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: true
license: mit
tags:
- regression
- machine-learning
- streamlit
- kaggle
- agriculture
Blueberry Yield Prediction with Machine Learning
This project is a complete machine learning pipeline that predicts the yield of wild blueberries using various environmental and biological features such as pollinator counts, rainfall, and fruit measurements.
Project Type
- Supervised Learning
- Regression Problem
Problem Description
Predicting agricultural yield is a crucial component in planning, sustainability, and food economics. The dataset used in this project comes from the Kaggle Playground Series S3E14 competition and contains information on:
- Different species of pollinators (honeybee, bumblebee, osmia...)
- Environmental conditions (rainfall days, temperature ranges...)
- Fruit attributes (fruit mass, fruit set, seed count...)
Goal: Predict the yield (kg/ha) of blueberries based on input features.
Dataset Info
- train.csv: 15,289 samples with 18 features
- test.csv: same structure, no target
- No missing values, clean numerical data
What We Did (Pipeline Summary)
EDA (Exploratory Data Analysis)
- Checked for missing values
- Analyzed feature distributions & target (yield)
- Built correlation heatmaps; strongest positive correlations: fruitmass, fruitset, seeds
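The EDA steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's notebook: the frame stands in for train.csv, the RainingDays column name and every coefficient are assumptions, and only fruitset, seeds, and yield are names taken from this README.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv (assumption: the real file is not loaded here).
rng = np.random.default_rng(0)
n = 200
fruitset = rng.uniform(0.2, 0.7, n)
df = pd.DataFrame({
    "fruitset": fruitset,
    "seeds": 20 + 40 * fruitset + rng.normal(0, 1, n),
    "RainingDays": rng.uniform(1, 34, n),  # assumed column name
})
df["yield"] = (
    2000 + 9000 * df["fruitset"] - 60 * df["RainingDays"] + rng.normal(0, 200, n)
)

# 1) Count missing values per column.
missing = df.isna().sum()

# 2) Pearson correlation of each feature with the target, strongest first.
corr_with_target = df.corr()["yield"].drop("yield").sort_values(ascending=False)

print(missing)
print(corr_with_target)
```

On the real data, the same `corr_with_target` series is what a heatmap of `df.corr()` visualizes for the yield row.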
Data Preprocessing
- Removed id column
- Standard feature selection based on correlation
- No categorical encoding needed (all numerical)
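One way the preprocessing steps could look in code is shown below. The README does not state how the correlation-based selection was done, so the helper `select_by_correlation` and its 0.3 threshold are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def select_by_correlation(df, target="yield", threshold=0.3):
    """Return feature names whose |Pearson r| with the target is >= threshold.

    Hypothetical helper: the threshold value is an assumption, not the
    project's actual setting.
    """
    corr = df.corr()[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()


# Synthetic stand-in for train.csv, including an id column and a pure-noise feature.
rng = np.random.default_rng(1)
n = 300
raw = pd.DataFrame({
    "id": np.arange(n),
    "fruitset": rng.uniform(0.2, 0.7, n),
    "fruitmass": rng.uniform(0.3, 0.5, n),
    "noise_feature": rng.normal(0, 1, n),
})
raw["yield"] = (
    8000 * raw["fruitset"] + 15000 * raw["fruitmass"] + rng.normal(0, 100, n)
)

# Drop the row identifier, then keep only features correlated with the target.
clean = raw.drop(columns=["id"])
selected = select_by_correlation(clean, threshold=0.3)
print(selected)
```

Dropping id first matters: it is a row counter, and leaving it in would let it compete as a feature during selection and training.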
Model Training
- Model: RandomForestRegressor
- Train-Test Split: 80/20
- Results:
  - RMSE ≈ 573.8
  - R² Score ≈ 0.81
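The training setup above could be reproduced along these lines. This is a sketch on synthetic data, so the printed RMSE and R² will not match the figures reported here; the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions, since the README does not list them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and target standing in for the cleaned training data.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "fruitset": rng.uniform(0.2, 0.7, n),
    "fruitmass": rng.uniform(0.3, 0.5, n),
    "seeds": rng.uniform(20, 50, n),
})
y = 8000 * X["fruitset"] + 15000 * X["fruitmass"] + rng.normal(0, 200, n)

# 80/20 train-validation split, as described in the README.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# RMSE is the square root of the mean squared error; R² is the share of
# target variance explained on the held-out 20%.
pred = model.predict(X_val)
rmse = float(np.sqrt(mean_squared_error(y_val, pred)))
r2 = float(r2_score(y_val, pred))
print(f"RMSE: {rmse:.1f}  R2: {r2:.3f}")
```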
Test Prediction & Submission
- Predictions made on test.csv
- submission.csv generated for Kaggle submission
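A submission file of this kind might be produced as sketched below. The two-column layout (id, yield) is assumed from the usual Kaggle Playground format, the data is synthetic, and the file is written to a temporary directory rather than the project root.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
feature_cols = ["fruitset", "fruitmass", "seeds"]

# Synthetic stand-ins for train.csv and test.csv (test has an id but no target).
train = pd.DataFrame(rng.uniform(0, 1, (200, 3)), columns=feature_cols)
train["yield"] = 6000 * train["fruitset"] + rng.normal(0, 100, 200)
test = pd.DataFrame(rng.uniform(0, 1, (50, 3)), columns=feature_cols)
test.insert(0, "id", np.arange(1000, 1050))

model = RandomForestRegressor(n_estimators=50, random_state=7)
model.fit(train[feature_cols], train["yield"])

# Keep the id for the submission, but predict only from the feature columns.
submission = pd.DataFrame({
    "id": test["id"],
    "yield": model.predict(test[feature_cols]),
})

out_path = Path(tempfile.mkdtemp()) / "submission.csv"
submission.to_csv(out_path, index=False)  # index=False avoids an extra unnamed column
print(out_path)
```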
Streamlit App
- Users input bee counts, rain days, and fruit measurements
- Predicts blueberry yield in kg/ha
- Uses trained model (rf_model.pkl) behind the scenes
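The prediction path behind the app might look something like the sketch below. It assumes that rf_model.pkl is a pickled scikit-learn model and that model_columns.pkl holds the ordered list of training column names; neither is confirmed by this README. The helper `predict_yield`, the honeybee and RainingDays column names, and the synthetic model are illustrative only. In app.py the input dictionary would presumably be filled from Streamlit widgets such as `st.number_input`; that wiring is left out so the sketch runs without Streamlit.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Build stand-in artifacts in a temp directory (assumed file contents).
workdir = Path(tempfile.mkdtemp())
columns = ["honeybee", "RainingDays", "fruitset", "fruitmass", "seeds"]
rng = np.random.default_rng(3)
X = pd.DataFrame(rng.uniform(0, 1, (200, len(columns))), columns=columns)
y = 7000 * X["fruitset"] + rng.normal(0, 100, 200)
trained = RandomForestRegressor(n_estimators=50, random_state=3).fit(X, y)
with open(workdir / "rf_model.pkl", "wb") as f:
    pickle.dump(trained, f)
with open(workdir / "model_columns.pkl", "wb") as f:
    pickle.dump(columns, f)


def predict_yield(user_inputs, model_path, columns_path):
    """Load the saved model and column order, then predict for one input row."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(columns_path, "rb") as f:
        model_columns = pickle.load(f)
    # Reorder the inputs to match training; absent columns become NaN.
    row = pd.DataFrame([user_inputs]).reindex(columns=model_columns)
    if row.isna().any().any():
        absent = row.columns[row.isna().any()].tolist()
        raise ValueError(f"missing inputs: {absent}")
    return float(model.predict(row)[0])
```

Storing the column list next to the model guards against a silent mismatch between the order of the form fields and the order of the features the model was trained on.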
Try it Online
You can try this app live here:
Hugging Face Space Link
What Could Be Improved?
Area | Suggestion |
---|---|
Feature Engineering | Create interaction terms, try log/ratio features |
Model | Try LightGBM, XGBoost, or stacking |
Tuning | GridSearchCV or Optuna for hyperparameter optimization |
Visualization | Add interactive charts in Streamlit app |
Real-World Data | Add satellite weather data, soil types, historical trends |
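As an illustration of the Tuning row, a small GridSearchCV run over a random forest could look like this. The parameter grid, the 3-fold setting, and the synthetic data are all assumptions chosen to keep the sketch fast; a real search would use the actual training set and a wider grid.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training features and target.
rng = np.random.default_rng(11)
n = 300
X = pd.DataFrame({
    "fruitset": rng.uniform(0.2, 0.7, n),
    "fruitmass": rng.uniform(0.3, 0.5, n),
    "seeds": rng.uniform(20, 50, n),
})
y = 8000 * X["fruitset"] + 15000 * X["fruitmass"] + rng.normal(0, 200, n)

# Deliberately tiny grid; values are illustrative, not recommendations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=11),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
)
search.fit(X, y)

print(search.best_params_)
print(f"CV RMSE: {-search.best_score_:.1f}")
```

Scikit-learn reports this metric as a negative number so that larger is always better; negate `best_score_` to read it as an RMSE.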
Project Structure
blueberry-yield-regression
├── app.py
├── rf_model.pkl
├── model_columns.pkl
├── requirements.txt
├── submission.csv
└── README.md
License
MIT License: free to use, modify, and distribute.