# πŸ” Encrypted Text Classifier – 20 Newsgroups Cipher Challenge
This project is built for the [Kaggle Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge), where the goal is to classify encrypted text documents into 20 different newsgroup categories.
🎯 Without decrypting the text at all, we trained a character-level machine learning model that reaches about **63% accuracy**.

---
## 📂 Project Structure
```
cipher-classifier/
├── app.py                 # Streamlit app
├── cipher_classifier.pkl  # Pickled model + vectorizer
├── train.csv              # Kaggle training data
├── requirements.txt       # Libraries for deployment
└── README.md
```
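
`cipher_classifier.pkl` stores the fitted model together with the fitted vectorizer, because new ciphertext must be turned into features with the *same* vocabulary used at training time. The README does not document the file's internal layout, so this is only a sketch: it assumes a dict with hypothetical `"vectorizer"` and `"model"` keys, and the helpers `save_bundle`, `load_bundle` and `predict` are illustrative names, not functions from `app.py`. Toy strings stand in for the real ciphertext.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def save_bundle(path, vectorizer, model):
    """Pickle the fitted vectorizer and model together (hypothetical dict layout)."""
    with open(path, "wb") as f:
        pickle.dump({"vectorizer": vectorizer, "model": model}, f)


def load_bundle(path):
    """Load the dict written by save_bundle."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(bundle, texts):
    """Transform raw ciphertext with the already-fitted vectorizer, then classify."""
    features = bundle["vectorizer"].transform(texts)
    return bundle["model"].predict(features)


# Toy data standing in for the Kaggle ciphertext and integer targets.
texts = ["aaab aab", "aaba abaa", "xyzx zyx", "zxyz xzy"]
labels = [0, 0, 1, 1]
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 3))
model = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(texts), labels)

# Write to a temporary directory so a real cipher_classifier.pkl is never overwritten.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cipher_classifier.pkl")
    save_bundle(path, vectorizer, model)
    bundle = load_bundle(path)

print(predict(bundle, ["abaa aab"]))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.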
---
## 🧠 Model Overview
- **Input:** Ciphertext strings (unreadable encrypted text)
- **Vectorization:** `CountVectorizer` with char-level n-grams (1 to 3)
- **Model:** Logistic Regression (sklearn)
- **Accuracy:** ~63% (without decryption)
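
The overview above can be sketched as a scikit-learn pipeline: character 1 to 3-gram counts feed a logistic regression. `analyzer="char"` and `ngram_range=(1, 3)` follow the README. `lowercase=False` (so distinct cipher symbols are not merged) and `max_iter=1000` are assumptions, not the author's confirmed settings, and the toy strings only stand in for real ciphertext.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_model():
    """Char-level 1-3 gram counts fed into logistic regression."""
    return make_pipeline(
        # Assumption: keep case, since upper/lower cipher symbols may differ in meaning.
        CountVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
        # Assumption: raise the iteration cap so the solver converges on sparse counts.
        LogisticRegression(max_iter=1000),
    )


# Toy stand-in for the ciphertext documents and their integer newsgroup labels.
texts = ["aaab aab", "aaba abaa", "xyzx zyx", "zxyz xzy"]
labels = [0, 0, 1, 1]

model = build_model()
model.fit(texts, labels)
print(model.predict(["abaa aab", "xzyx zy"]))
```

On the real data, `texts` and `labels` would come from `train.csv`; the column names `ciphertext` and `target` are assumed from the Kaggle data page, not stated in this README. Accuracy should be measured on a held-out split (for example with `sklearn.model_selection.train_test_split`), not on the training rows.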
---
## Example Output

| Input (Ciphertext) | Predicted Label |
|---|---|
| `['W')(7x1zay7Hb3...` | 15 |
| `Tx4a8M\HNsyp;HM...` | 8 |
πŸ“¦ Deployment
This app is designed to run on:
🟒 Hugging Face Spaces
🟒 Streamlit Cloud
πŸ”΅ GitHub
πŸ“Œ Kaggle Link
You can download the dataset from the official competition:
πŸ‘‰ Kaggle – 20 Newsgroups Ciphertext Challenge