---
license: apache-2.0
datasets:
- scikit-learn/iris
metrics:
- accuracy
library_name: pytorch
pipeline_tag: tabular-classification
---

# logistic-regression-iris

A logistic regression model trained on the Iris dataset.

It takes two inputs, `'PetalLengthCm'` and `'PetalWidthCm'`, and predicts whether the species is `'Iris-setosa'` (a binary classification task).

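The decision rule behind a binary logistic regression of this kind can be sketched in plain Python. This is a minimal illustration and not the trained model: the weights `w` and bias `b` are made-up placeholder values, and `predict_setosa` is a hypothetical helper name. The inputs are assumed to be already standardized.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued logit to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_setosa(features, w, b, threshold=0.5):
    """Return (probability, predicted class) for one standardized example.

    features: [petal_length, petal_width] after standardization.
    w, b: weights and bias of the linear layer (placeholder values here).
    """
    logit = sum(wi * xi for wi, xi in zip(w, features)) + b
    proba = sigmoid(logit)
    return proba, int(proba > threshold)

# Placeholder parameters, NOT the trained weights of this model.
w, b = [-2.0, -2.0], -1.0

# A flower with small (below-average) petals gets a high Iris-setosa probability.
proba, pred = predict_setosa([-1.3, -1.2], w, b)
print(f"P(Iris-setosa) = {proba:.3f}, predicted class = {pred}")
```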
It is a PyTorch adaptation of the scikit-learn model in Chapter 10 of Aurélien Géron's book 'Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow'.

Code: https://github.com/sambitmukherjee/handson-ml3-pytorch/blob/main/chapter10/logistic_regression_iris.ipynb

Experiment tracking: https://wandb.ai/sadhaklal/logistic-regression-iris

## Usage

```
!pip install -q datasets

from datasets import load_dataset

# Load the Iris dataset and select the two input features and the binary target.
iris = load_dataset("scikit-learn/iris")
iris.set_format("pandas")
iris_df = iris['train'][:]
X = iris_df[['PetalLengthCm', 'PetalWidthCm']]
y = (iris_df['Species'] == "Iris-setosa").astype(int)

class_names = ["Not Iris-setosa", "Iris-setosa"]

from sklearn.model_selection import train_test_split

# Recreate the train/validation split so that new data can be normalized
# with the training-set mean and standard deviation.
X_train, X_val, y_train, y_val = train_test_split(X.values, y.values, test_size=0.3, stratify=y, random_state=42)
X_means, X_stds = X_train.mean(axis=0), X_train.std(axis=0)

import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

device = torch.device("cpu")

class LinearModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 1)  # 2 input features -> 1 logit

    def forward(self, x):
        out = self.fc(x)
        return out

# Download the trained weights from the Hugging Face Hub.
model = LinearModel.from_pretrained("sadhaklal/logistic-regression-iris")
model.to(device)

# Inference on new data:
import numpy as np

X_new = np.array([[2.0, 0.5], [3.0, 1.0]]) # Contains data on 2 new flowers.
X_new = ((X_new - X_means) / X_stds) # Normalize.
X_new = torch.from_numpy(X_new).float()

model.eval()
X_new = X_new.to(device)
with torch.no_grad():
    logits = model(X_new)
    proba = torch.sigmoid(logits.squeeze())  # Convert logits to probabilities.
    preds = (proba > 0.5).long()  # Threshold at 0.5 to get class indices.

print(f"Predicted classes: {preds}")
print(f"Predicted probabilities of being Iris-setosa: {proba}")
```

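The normalization step in the usage snippet standardizes each feature with the training set's mean and standard deviation. The sketch below reproduces that arithmetic with the standard library only. The training rows are made up for illustration, and `column_stats` and `standardize` are hypothetical helper names. It uses the population standard deviation (`statistics.pstdev`), which corresponds to the default of NumPy's `std` (`ddof=0`).

```python
import statistics

def column_stats(rows):
    """Return per-feature means and population standard deviations."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def standardize(row, means, stds):
    """Shift and scale one example using precomputed training statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

# Made-up training rows: [petal length (cm), petal width (cm)].
train_rows = [[1.0, 0.2], [2.0, 0.4], [3.0, 0.6], [6.0, 2.0]]
means, stds = column_stats(train_rows)

# New data is transformed with the *training* statistics, never its own.
print(standardize([2.0, 0.5], means, stds))
```

Reusing the training statistics at inference time keeps new inputs on the same scale the model saw during training.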
## Metric

As shown above, the validation set contains 30% of the examples (selected at random in a stratified fashion).

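"Stratified" means the split keeps the class proportions the same in both subsets. The sketch below is a simplified standard-library illustration of that idea, not scikit-learn's actual `train_test_split` implementation. `stratified_split` is a hypothetical helper, and the labels are synthetic stand-ins with the same class balance as the Iris data (50 Iris-setosa out of 150 examples). With a 30% validation fraction, this yields 45 validation indices, 15 of them positive.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.3, seed=42):
    """Split example indices so each class keeps its proportion in both subsets."""
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Synthetic labels: 50 positives (Iris-setosa) and 100 negatives.
labels = [1] * 50 + [0] * 100
train_idx, val_idx = stratified_split(labels)

n_val_pos = sum(labels[i] for i in val_idx)
print(len(train_idx), len(val_idx), n_val_pos)
```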
Accuracy on the validation set: 1.0

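Accuracy is the fraction of examples whose predicted class matches the true label. A minimal sketch, using made-up predictions and labels rather than the model's actual outputs (`accuracy` is a hypothetical helper):

```python
def accuracy(preds, labels):
    """Return the fraction of predictions that equal the true labels."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == t) for p, t in zip(preds, labels))
    return correct / len(labels)

# Made-up labels and predictions for five examples.
y_true = [1, 0, 0, 1, 0]
all_correct = [1, 0, 0, 1, 0]
one_mistake = [1, 0, 1, 1, 0]

print(accuracy(all_correct, y_true))
print(accuracy(one_mistake, y_true))
```

An accuracy of 1.0 therefore means every validation example was classified correctly.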