# Encrypted Text Classifier: 20 Newsgroups Cipher Challenge
This project is built for the [Kaggle Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge), where the goal is to classify encrypted text documents into 20 different newsgroup categories.
Even without decrypting the text, our character-level machine learning model achieves about **63% accuracy**.
---
## Project Structure
cipher-classifier/
├── app.py                  # Streamlit app
├── cipher_classifier.pkl   # Pickled model + vectorizer
├── train.csv               # Kaggle training data
├── requirements.txt        # Libraries for deployment
└── README.md
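
The layout above says `cipher_classifier.pkl` holds both the model and the vectorizer. How the two are packed into one file is not stated, so the following is only a minimal sketch that assumes a dictionary with the hypothetical keys `"vectorizer"` and `"model"`; the helper names `save_bundle` and `load_bundle` are likewise made up for illustration.

```python
import pickle
from pathlib import Path


def save_bundle(path, vectorizer, model):
    """Write the vectorizer and the model into a single pickle file.

    The dict layout and its keys are an assumption, not the project's
    confirmed format.
    """
    with Path(path).open("wb") as f:
        pickle.dump({"vectorizer": vectorizer, "model": model}, f)


def load_bundle(path):
    """Read the file written by save_bundle and return (vectorizer, model)."""
    with Path(path).open("rb") as f:
        bundle = pickle.load(f)
    return bundle["vectorizer"], bundle["model"]
```

Under this assumed layout, `app.py` could call `vectorizer, model = load_bundle("cipher_classifier.pkl")` once at startup. Pickle files can execute arbitrary code when loaded, so only load a file from a source you trust.
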
---
## Model Overview
- **Input:** Ciphertext strings (unreadable encrypted text)
- **Vectorization:** `CountVectorizer` with char-level n-grams (1 to 3)
- **Model:** Logistic Regression (sklearn)
- **Accuracy:** ~63% (without decryption)
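
The components listed above can be wired together roughly as follows. This is a sketch rather than the project's actual training script: the toy strings stand in for the Kaggle data, and `lowercase=False` and `max_iter=1000` are assumptions (ciphertext is likely case-sensitive, and the default iteration limit may be too low for the real data).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (NOT the Kaggle data): two made-up "ciphertext" classes
# that use disjoint character sets.
texts = ["qzqzqzxq", "zqzqxqzq", "qqzzqzxz", "mnmnmkmn", "nmnmknmn", "mmnnmknm"]
labels = [0, 0, 0, 1, 1, 1]

clf = make_pipeline(
    # Count character n-grams of length 1 to 3; lowercase=False is an
    # assumption so that upper- and lower-case symbols stay distinct.
    CountVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
    # max_iter is raised from the default as a precaution (assumption).
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Classify unseen strings without any decryption step.
predictions = clf.predict(["zqzqzq", "nmnmnm"])
```

Character n-grams suit this task because a simple substitution-style cipher preserves how often symbols and short symbol sequences occur, even though the words themselves are unreadable.
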
---
## Example Output
| Input (Ciphertext) | Predicted Label |
|---|---|
| `['W')(7x1zay7Hb3...` | 15 |
| `Tx4a8M\HNsyp;HM...` | 8 |
## Deployment
This app is designed to run on:
- Hugging Face Spaces
- Streamlit Cloud
- GitHub
## Kaggle Link
You can download the dataset from the official competition:
[Kaggle: 20 Newsgroups Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge)