import streamlit as st

# Page Title
st.title("🛠️ Feature Engineering & Feature Selection")

# Feature Engineering Section
st.markdown("""
### ✨ Feature Engineering:
Several transformations were applied to prepare the dataset for modeling:

- **Encoding**: Used **Ordinal Encoding** to convert categorical variables like Gender, Sleep Duration, and Dietary Habits into numerical values.  
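- **Scaling**: Applied **StandardScaler** to standardize numerical features such as CGPA, Age, and Schedule Pressure to zero mean and unit variance, so that no single feature dominates the distance calculations used by KNN.  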
- **Data Cleaning**: Removed irrelevant or noisy columns that did not contribute to the prediction task.  
- **Balancing**: Checked the target (`Depression`) for class imbalance, since a skewed class distribution can bias the model toward the majority class.
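
The encoding, scaling, and balance-check steps can be sketched in plain Python as shown below. This is a minimal illustration of what the transformations compute, not the project's actual pipeline: the sample rows, the category ordering in `sleep_order`, and the helper names `ordinal_encode` and `standardize` are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical sample rows (illustrative values, not taken from the dataset)
rows = [
    {"Sleep Duration": "5-6 hours", "CGPA": 7.0, "Depression": 1},
    {"Sleep Duration": "7-8 hours", "CGPA": 8.0, "Depression": 0},
    {"Sleep Duration": "Less than 5 hours", "CGPA": 9.0, "Depression": 1},
]

def ordinal_encode(values, categories):
    # Map each category to its position in an explicit ordering
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values]

def standardize(values):
    # z-score using the population standard deviation, as StandardScaler does;
    # assumes the column is not constant (sigma > 0)
    mu, sigma = mean(values), pstdev(values)
    return [(value - mu) / sigma for value in values]

# Assumed ordering of the sleep categories, shortest to longest
sleep_order = ["Less than 5 hours", "5-6 hours", "7-8 hours"]

sleep_encoded = ordinal_encode([r["Sleep Duration"] for r in rows], sleep_order)
cgpa_scaled = standardize([r["CGPA"] for r in rows])

# Class balance check: count how often each target label occurs
class_counts = Counter(r["Depression"] for r in rows)
```

After standardization the column has mean 0 and standard deviation 1, which keeps features on comparable scales for a distance-based model.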
""")

# Selected Features Section
st.markdown("""
### ✅ Selected Features:
The following features were retained for training the model based on correlation analysis and domain relevance:

- Gender  
- Age  
- Academic Pressure  
- Study Satisfaction  
- Sleep Duration  
- Dietary Habits  
- Financial Stress  
- CGPA  
- Schedule Pressure  
- Integration Complexity
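
One way to carry out the correlation analysis mentioned above is to rank features by the absolute Pearson correlation with the target. The sketch below assumes already-encoded numeric columns; the values and the `pearson` helper are illustrative assumptions, not the project's code or data.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical encoded feature columns (illustrative values only)
features = {
    "Academic Pressure": [1, 2, 3, 4, 5],
    "Study Satisfaction": [5, 4, 3, 2, 1],
    "Age": [20, 25, 21, 24, 22],
}
target = [0, 0, 1, 1, 1]  # hypothetical Depression labels

# Rank features from strongest to weakest absolute correlation with the target
ranked = sorted(
    ((name, pearson(column, target)) for name, column in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
```

Features at the bottom of `ranked` are candidates for removal, subject to the domain-relevance check.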
""")

# Dropped Features Section
st.markdown("""
### 🚫 Dropped Features:
- Redundant or low-impact features such as `Job Satisfaction`, `Profession`, and `City`  
- Highly correlated features that introduced multicollinearity
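
A simple way to handle multicollinearity is to keep a feature only if it is not strongly correlated with one that has already been kept. The sketch below is an assumption about how such a filter could look: the `0.9` threshold, the `drop_collinear` helper, and the `Hypothetical Duplicate` column are all made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_collinear(columns, threshold=0.9):
    # Greedy filter: keep a column only if its absolute correlation with
    # every previously kept column stays below the threshold
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical encoded columns (illustrative values only)
columns = {
    "Academic Pressure": [1, 2, 3, 4, 5],
    "Hypothetical Duplicate": [2, 4, 6, 8, 11],  # nearly a rescaled copy
    "Age": [20, 25, 21, 24, 22],
}

kept = drop_collinear(columns)
```

Because the filter is greedy, the result depends on column order: the first feature of each correlated group is the one that survives.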

The refined dataset was then used to train the **KNN classifier** for depression prediction.
""")

# Navigation Buttons
if st.button("Next >>"):
    st.switch_page(r"pages/5 Model Building.py")

if st.button("<< Back"):
    st.switch_page(r"pages/3 EDA.py")