metadata
title: 🍇 Blueberry Yield Regression
emoji: 🌾
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: true
license: mit
tags:
  - regression
  - machine-learning
  - streamlit
  - kaggle
  - agriculture

🍇 Blueberry Yield Prediction with Machine Learning

This project is an end-to-end machine learning pipeline that predicts wild blueberry yield from environmental and biological features such as pollinator counts, rainfall, and fruit measurements.

📌 Project Type

  • Supervised Learning
  • Regression Problem

🔍 Problem Description

Yield prediction supports agricultural planning, sustainability, and food economics. The dataset used in this project comes from the Kaggle Playground Series S3E14 competition and contains information on:

  • Different species of pollinators (honeybee, bumblebee, osmia...)
  • Environmental conditions (rainfall days, temperature ranges...)
  • Fruit attributes (fruit mass, fruit set, seed count...)

🎯 Goal: Predict the yield (kg/ha) of blueberries based on input features.


📊 Dataset Info

  • train.csv: 15,289 samples with 18 features
  • test.csv: same structure, no target
  • No missing values, clean numerical data
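
The shape and cleanliness claims above can be checked with a few lines of pandas. The sketch below is illustrative only: the helper name `summarize_dataset` is hypothetical, and the small in-memory frame uses made-up values as a stand-in for `train.csv` so that the block runs on its own.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Return basic sanity-check facts about a tabular dataset."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        # Total count of missing cells across the whole frame
        "n_missing": int(df.isna().sum().sum()),
        # True only if every column has a numeric dtype
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes),
    }


# Stand-in data with made-up values; in the project this would be
# pd.read_csv("train.csv") instead.
demo = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "honeybee": [0.5, 0.25, 0.75],
        "fruitset": [0.42, 0.51, 0.47],
        "yield": [4500.0, 6100.0, 5300.0],
    }
)
summary = summarize_dataset(demo)
print(summary)
```

Running the same helper on the real training file should report zero missing cells and all-numeric columns if the description above holds.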

📈 What We Did (Pipeline Summary)

  1. EDA (Exploratory Data Analysis)

    • Checked for missing values ✅
    • Analyzed feature distributions & target (yield)
    • Built correlation heatmaps; strongest positive correlations:
      • fruitmass, fruitset, seeds
  2. Data Preprocessing

    • Removed id column
    • Standard feature selection based on correlation
    • No categorical encoding needed (all numerical)
  3. Model Training

    • Model: RandomForestRegressor
    • Train-Test Split: 80/20
    • Results:
      • RMSE ≈ 573.8
      • R² Score ≈ 0.81 ✅
  4. Test Prediction & Submission

    • Predictions made on test.csv
    • submission.csv generated for Kaggle submission
  5. Streamlit App

    • Users input bee counts, rain days, and fruit measurements
    • Predicts blueberry yield in kg/ha
    • Uses trained model (rf_model.pkl) behind the scenes
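
The preprocessing and training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the data is synthetic (made-up values standing in for `train.csv`), only three of the feature columns are included, and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions, since the README does not state them. The metrics it prints therefore will not match the RMSE and R² reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv (made-up values, not the Kaggle data).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame(
    {
        "id": np.arange(n),
        "fruitset": rng.uniform(0.2, 0.7, n),
        "fruitmass": rng.uniform(0.3, 0.6, n),
        "seeds": rng.uniform(20, 50, n),
    }
)
df["yield"] = 8000 * df["fruitset"] + 50 * df["seeds"] + rng.normal(0, 200, n)

# Preprocessing: drop the identifier column, separate features and target.
X = df.drop(columns=["id", "yield"])
y = df["yield"]

# 80/20 train/validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the random forest (hyperparameters are assumed, not from the README).
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out 20%.
pred = model.predict(X_val)
rmse = float(np.sqrt(mean_squared_error(y_val, pred)))
r2 = float(r2_score(y_val, pred))
print(f"RMSE: {rmse:.1f}, R^2: {r2:.3f}")
```

RMSE is computed as the square root of the mean squared error so that it is expressed in the same unit as the target (kg/ha).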

🚀 Try it Online

🌐 You can try this app live here:
Hugging Face Space Link


🔮 What Could Be Improved?

| Area | Suggestion |
| --- | --- |
| Feature Engineering | Create interaction terms, try log/ratio features |
| Model | Try LightGBM, XGBoost, or stacking |
| Tuning | GridSearchCV or Optuna for hyperparameter optimization |
| Visualization | Add interactive charts in Streamlit app |
| Real-World Data | Add satellite weather data, soil types, historical trends |
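
As one example of the tuning suggestion, a grid search over a couple of random forest hyperparameters might look like the sketch below. The parameter grid, the 3-fold cross-validation, and the synthetic data are all assumptions chosen to keep the example small and fast; a real search would run on the training features and likely cover a wider grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (made-up values) so the example runs on its own.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 5000 * X[:, 0] + 1000 * X[:, 1] + rng.normal(0, 100, 200)

# Small illustrative grid; the values are assumptions, not tuned choices.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

# Flip the sign back to get the cross-validated RMSE of the best setting.
best_rmse = -search.best_score_
print(search.best_params_, round(best_rmse, 1))
```

Because scikit-learn scorers follow a "higher is better" convention, the RMSE scorer is negated, which is why the sign is flipped when reading `best_score_`.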

📁 Project Structure

📦 blueberry-yield-regression
├── app.py
├── rf_model.pkl
├── model_columns.pkl
├── requirements.txt
├── submission.csv
└── README.md
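
The README does not say what `model_columns.pkl` contains. Assuming it stores the ordered list of feature names used at training time, `app.py` would need to arrange the user's inputs in that same order before calling the model. The sketch below shows one way to do that; the helper `build_feature_row`, the column names, and the input values are all hypothetical.

```python
import pandas as pd


def build_feature_row(user_inputs: dict, model_columns: list) -> pd.DataFrame:
    """Arrange user inputs into a one-row frame matching the training column order."""
    missing = [c for c in model_columns if c not in user_inputs]
    if missing:
        raise ValueError(f"Missing inputs for columns: {missing}")
    # Selecting by model_columns drops any extra keys and fixes the order.
    return pd.DataFrame([user_inputs])[model_columns]


# Hypothetical column list; in the app this would presumably be loaded
# from model_columns.pkl rather than hard-coded.
model_columns = ["honeybee", "rain_days", "fruitset"]

row = build_feature_row(
    {"fruitset": 0.5, "honeybee": 0.25, "rain_days": 16.0, "unused": 1},
    model_columns,
)
print(row)
```

The resulting one-row frame could then be passed to the loaded model's `predict` method; keeping the column order identical to training avoids silently feeding features into the wrong positions.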


📜 License

MIT License: free to use, modify, and distribute.