Update README.md
# End-to-End Automated MLOps Framework

**Author**: Spencer Purdy

This project is a comprehensive, enterprise-grade MLOps platform that demonstrates a complete, automated lifecycle for machine learning models. It handles everything from automated training and hyperparameter optimization to versioning, production deployment, drift detection, A/B testing, and ongoing performance monitoring.

The entire system is orchestrated by a central engine and managed through a powerful, multi-tab Gradio interface, providing a single pane of glass for all MLOps activities.

## Core Features

* **Automated Model Training**: The system features a `ModelTrainer` that automatically trains a custom PyTorch neural network on tabular data. It includes support for handling class imbalance with SMOTE and integrates `Optuna` for sophisticated hyperparameter optimization (see the training sketch after this list).
* **Model Registry and Versioning**: A robust `ModelRegistry` tracks all trained model versions, their performance metrics, and metadata. Models are persisted to disk and logged in a SQLite database, with functionality to promote any version to the "production" stage (see the registry sketch after this list).
* **Data and Concept Drift Detection**: The platform integrates both `Evidently` and `Alibi-Detect` (with a statistical fallback) to continuously monitor for data drift between the reference training data and live inference data. Drift scores are tracked over time.
* **Automated Retraining**: A background process can be enabled to periodically check for significant data drift. If the drift threshold is exceeded, it automatically triggers a new model training cycle and initiates an A/B test against the current production model.
* **Live A/B Testing**: The `ABTestManager` allows for controlled experiments between the current production model and a challenger. It routes inference traffic, records performance metrics for both models, and determines a statistical winner.
* **Comprehensive Monitoring & Cost Tracking**:
  * **Performance**: The `PerformanceMonitor` uses Prometheus-compatible metrics to track prediction latency, accuracy, and throughput. It also logs detailed performance data to a database for historical analysis.
  * **Cost**: The `CostTracker` provides reports on estimated operational costs, breaking them down by training, inference, and model storage based on configurable rates.
* **Model Cards and Explainability**: The system can generate detailed model cards that consolidate metadata, performance metrics, and operational history. `SHAP` is also included as a dependency for future explainability features.
* **Hugging Face Hub Integration**: Models can be exported directly from the registry to the Hugging Face Hub, with an automatically generated model card (`README.md`); a minimal export sketch follows this list.
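The sketches below are minimal, self-contained illustrations of the techniques named above, not the project's actual components. First, Optuna-driven hyperparameter search over a small PyTorch classifier on synthetic, imbalanced tabular data with SMOTE rebalancing; the network shape, search ranges, and function names are assumptions made for illustration.

```python
# Minimal sketch: SMOTE rebalancing + Optuna hyperparameter search for a small
# PyTorch classifier. Names and search ranges are illustrative, not the actual ModelTrainer.
import optuna
import torch
import torch.nn as nn
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)  # rebalance classes

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.long)

def objective(trial: optuna.Trial) -> float:
    # Sample an architecture and learning rate, train briefly, return validation accuracy.
    hidden = trial.suggest_int("hidden_size", 16, 128)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    model = nn.Sequential(
        nn.Linear(20, hidden), nn.ReLU(), nn.Dropout(dropout), nn.Linear(hidden, 2)
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(30):  # short full-batch training loop, enough for the sketch
        optimizer.zero_grad()
        loss_fn(model(X_train_t), y_train_t).backward()
        optimizer.step()
    with torch.no_grad():
        return (model(X_val_t).argmax(dim=1) == y_val_t).float().mean().item()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best hyperparameters:", study.best_params, "| validation accuracy:", study.best_value)
```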
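Next, a sketch of how a SQLite-backed registry might record versions and promote one to production. The table schema and helper functions are assumptions for illustration, not the actual `ModelRegistry` implementation.

```python
# Sketch of a SQLite-backed model registry: register versions, promote one to production.
# Schema and function names are illustrative assumptions, not the project's ModelRegistry.
import json
import sqlite3

conn = sqlite3.connect("model_registry.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS models (
           version INTEGER PRIMARY KEY,
           artifact_path TEXT,
           metrics TEXT,          -- JSON blob of evaluation metrics
           stage TEXT DEFAULT 'staging'
       )"""
)

def register_model(version: int, artifact_path: str, metrics: dict) -> None:
    conn.execute(
        "INSERT INTO models (version, artifact_path, metrics) VALUES (?, ?, ?)",
        (version, artifact_path, json.dumps(metrics)),
    )
    conn.commit()

def promote_to_production(version: int) -> None:
    # Demote any current production model, then promote the chosen version.
    conn.execute("UPDATE models SET stage = 'staging' WHERE stage = 'production'")
    conn.execute("UPDATE models SET stage = 'production' WHERE version = ?", (version,))
    conn.commit()

register_model(1, "artifacts/model_v1.pt", {"accuracy": 0.91, "f1": 0.88})
promote_to_production(1)
print(conn.execute("SELECT version, stage FROM models").fetchall())
```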
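Finally, a sketch of the Hub export described in the last bullet, using `huggingface_hub`. The repository id and file paths are placeholders, and a valid Hugging Face token is assumed (for example via `huggingface-cli login`).

```python
# Sketch: push a saved model artifact and a generated model card to the Hugging Face Hub.
# The repo id and file paths are placeholders; a valid HF token is required.
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/mlops-demo-model"  # placeholder repository id
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
api.upload_file(
    path_or_fileobj="artifacts/model_v1.pt",  # saved model artifact from the registry
    path_in_repo="model_v1.pt",
    repo_id=repo_id,
)
api.upload_file(
    path_or_fileobj="model_card.md",  # generated model card, uploaded as the repo README
    path_in_repo="README.md",
    repo_id=repo_id,
)
```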
## How It Works

The platform operates as a cohesive system of specialized components orchestrated by the main `MLOpsEngine`:

1. **Training**: A user initiates a training job from the UI. The `ModelTrainer` uses `Optuna` to find the best hyperparameters and then trains a `CustomNeuralNetwork` model.
2. **Registration**: The newly trained model, along with its performance metrics and metadata, is registered in the `ModelRegistry`. The model artifact is saved, and its details are recorded in the SQLite database.
3. **Promotion**: A user can review all registered models and promote a specific version to be the active "production" model via the UI.
4. **Prediction**: When a prediction request is made, the engine retrieves the current production model (or routes to an A/B test model if one is active) to perform inference. Latency and other performance metrics are logged by the `PerformanceMonitor` (see the monitoring sketch after this list).
5. **Monitoring & Drift Detection**: In the background, the `DriftDetector` continuously compares incoming data against a reference dataset. If drift is detected and auto-retraining is enabled, it triggers the training of a new "challenger" model (see the drift-check sketch after this list).
6. **A/B Testing**: The new challenger model is automatically placed into an A/B test against the current production model. Live traffic is split between them until a statistically significant winner is found, which can then be automatically promoted (see the A/B sketch after this list).
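To make the flow above concrete, here are three small, self-contained sketches keyed to steps 4 through 6; the metric names, thresholds, and helper functions are illustrative assumptions rather than the project's actual components. For step 4, latency and throughput tracking with `prometheus_client` could look roughly like this:

```python
# Sketch: Prometheus-style latency and throughput metrics around a prediction call.
# Metric names are illustrative, not the project's actual PerformanceMonitor.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("predictions", "Number of predictions served")
PREDICTION_LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

def predict(features):
    # Stand-in for real model inference.
    time.sleep(random.uniform(0.01, 0.05))
    return int(sum(features) > 0)

def monitored_predict(features):
    start = time.perf_counter()
    result = predict(features)
    PREDICTION_LATENCY.observe(time.perf_counter() - start)
    PREDICTIONS.inc()
    return result

if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at http://localhost:8000/metrics
    for _ in range(100):
        monitored_predict([random.gauss(0, 1) for _ in range(20)])
```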
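For step 5, the statistical fallback for drift detection can be as simple as a per-feature two-sample Kolmogorov-Smirnov test; this is a generic stand-in, not the project's `DriftDetector`:

```python
# Sketch: per-feature data drift check using two-sample Kolmogorov-Smirnov tests.
# A generic stand-in for the statistical fallback, not the project's DriftDetector.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> dict:
    drifted = {}
    for i in range(reference.shape[1]):
        statistic, p_value = ks_2samp(reference[:, i], current[:, i])
        if p_value < alpha:
            drifted[f"feature_{i}"] = {"ks_statistic": round(float(statistic), 3),
                                       "p_value": round(float(p_value), 4)}
    return {
        "drift_detected": bool(drifted),
        "drift_share": len(drifted) / reference.shape[1],
        "drifted_features": drifted,
    }

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(1000, 5))
current = reference.copy()
current[:, 0] += 0.8  # simulate a shifted feature in live data
print(detect_drift(reference, current))
```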
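And for step 6, splitting traffic between champion and challenger and calling a winner once the accuracy gap is statistically significant can be sketched with a two-proportion z-test; the routing rule and significance check are simplified assumptions, not the `ABTestManager` internals:

```python
# Sketch: route traffic between champion and challenger, then test whether the
# difference in observed accuracy is significant (two-proportion z-test).
# Simplified assumptions, not the project's ABTestManager internals.
import math
import random

results = {"champion": {"correct": 0, "total": 0}, "challenger": {"correct": 0, "total": 0}}

def route(traffic_split: float = 0.5) -> str:
    # Send a share of traffic to the challenger, the rest to the champion.
    return "challenger" if random.random() < traffic_split else "champion"

def record(arm: str, correct: bool) -> None:
    results[arm]["total"] += 1
    results[arm]["correct"] += int(correct)

def winner(alpha: float = 0.05) -> str:
    a, b = results["champion"], results["challenger"]
    p1, p2 = a["correct"] / a["total"], b["correct"] / b["total"]
    pooled = (a["correct"] + b["correct"]) / (a["total"] + b["total"])
    se = math.sqrt(pooled * (1 - pooled) * (1 / a["total"] + 1 / b["total"]))
    z = (p2 - p1) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided p-value
    if p_value >= alpha:
        return "no significant difference yet"
    return "challenger" if p2 > p1 else "champion"

# Simulate feedback where the challenger is slightly more accurate than the champion.
for _ in range(2000):
    arm = route()
    accuracy = 0.84 if arm == "challenger" else 0.80
    record(arm, random.random() < accuracy)

print(results, "->", winner())
```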
## Technical Stack

* **Machine Learning & Training**: scikit-learn, PyTorch, imbalanced-learn
* **MLOps & Experiment Tracking**: MLflow, Optuna, Hugging Face Hub, W&B
* **Drift & Anomaly Detection**: Evidently, Alibi-Detect, SHAP
* **Web Interface & Visualization**: Gradio, Matplotlib, Seaborn, Plotly, Yellowbrick
* **Infrastructure & Utilities**: Prometheus Client, Joblib, SQLite

## How to Use the Demo

The Gradio interface is organized into tabs that follow a logical MLOps workflow.
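The tab layout follows the standard Gradio `Blocks`/`Tab` pattern; below is a stripped-down skeleton of how such a UI is wired up. The tab names mirror the demo, but the callbacks are placeholders rather than the Space's actual event handlers.

```python
# Stripped-down skeleton of the tabbed Gradio UI; callbacks are placeholders,
# not the Space's actual event handlers.
import gradio as gr

def train_model(n_samples):
    return f"Trained a new model version on {int(n_samples)} samples (placeholder)."

def predict(feature_1, feature_2):
    return f"Prediction for ({feature_1}, {feature_2}): class 0 (placeholder)."

with gr.Blocks(title="End-to-End Automated MLOps Framework") as demo:
    with gr.Tab("Model Training"):
        n_samples = gr.Slider(500, 10000, value=2000, step=500, label="Training samples")
        train_btn = gr.Button("Train New Model")
        train_output = gr.Textbox(label="Training result")
        train_btn.click(train_model, inputs=n_samples, outputs=train_output)
    with gr.Tab("Make Predictions"):
        f1 = gr.Number(label="Feature 1")
        f2 = gr.Number(label="Feature 2")
        predict_btn = gr.Button("Predict")
        predict_output = gr.Textbox(label="Prediction")
        predict_btn.click(predict, inputs=[f1, f2], outputs=predict_output)
    # Further tabs (Model Registry, Drift Detection, A/B Testing, Performance
    # Monitoring, Cost Tracking) follow the same pattern.

if __name__ == "__main__":
    demo.launch()
```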

1. **Train a Model**: Navigate to the **Model Training** tab, select the number of training samples, and click **Train New Model**. This will create the first version in the registry.
2. **Manage Models**: Go to the **Model Registry** tab. Click **Refresh Model List** to see all trained models. Select a version from the dropdown and click **Promote to Production** to make it active.
3. **Make Predictions**: In the **Make Predictions** tab, enter values for the features and click **Predict**. The result from the current production model will be displayed.
4. **Detect Drift**: Go to the **Drift Detection** tab and click **Check for Data Drift** to simulate checking a new batch of data against the original training data.
5. **Run an A/B Test**: In the **A/B Testing** tab, click **Start New A/B Test**. This will train a new challenger model and run it against the current production model. To generate results, make several predictions in the **Make Predictions** tab with the **Use A/B Test** checkbox ticked.
6. **Monitor Performance**: Check the **Performance Monitoring** and **Cost Tracking** tabs to see live operational dashboards for the system.

## Disclaimer

This project is an advanced demonstration of MLOps principles and is intended for educational and portfolio purposes. It uses synthetically generated data for its training and drift detection processes. While built to be robust, it is not intended for direct use in a live production environment without extensive testing and validation.