# Encrypted Text Classifier: 20 Newsgroups Cipher Challenge

This project is built for the [Kaggle Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge), where the goal is to classify encrypted text documents into 20 different newsgroup categories.

Even without decrypting the text, we trained a character-level machine learning model that achieves over **63% accuracy**.

---

## Project Structure

    cipher-classifier/
    ├── app.py                  # Streamlit app
    ├── cipher_classifier.pkl   # Pickled model + vectorizer
    ├── train.csv               # Kaggle training data
    ├── requirements.txt        # Libraries for deployment
    └── README.md

---

## Model Overview

- **Input:** Ciphertext strings (unreadable encrypted text)
- **Vectorization:** `CountVectorizer` with char-level n-grams (1 to 3)
- **Model:** Logistic Regression (scikit-learn)
- **Accuracy:** ~63% (without decryption)
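
The setup listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script. The toy strings and labels stand in for `train.csv`, and `lowercase=False` and `max_iter=1000` are assumptions (case may carry signal in ciphertext, and the default iteration limit can be too low for wide n-gram features).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the ciphertext column and its labels (hypothetical data,
# not taken from train.csv).
texts = ["abcabcabc", "abccbaabc", "xyzxyzxyz", "zyxxyzxyz"]
labels = [0, 0, 1, 1]

# Character-level counts over 1- to 3-grams, fed into logistic regression.
# lowercase=False is an assumption: it keeps upper- and lowercase symbols distinct.
clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Inspect the learned n-gram vocabulary and classify two unseen strings.
vocab = clf.named_steps["countvectorizer"].vocabulary_
pred = clf.predict(["abcabc", "xyzxyz"])
print(len(vocab), "n-gram features;", "predictions:", list(pred))
```

Wrapping both steps in one pipeline keeps the vectorizer and the model fitted together, so the n-gram vocabulary used at prediction time matches the one used during training.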

---

## Example Output

| Input (Ciphertext) | Predicted Label |
| --- | --- |
| `['W')(7x1zay7Hb3...` | 15 |
| `Tx4a8M\HNsyp;HM...` | 8 |
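
Predictions like the example rows above could be produced by loading the pickled artifacts and transforming new ciphertext before calling the model. The sketch below assumes the pickle holds a `(vectorizer, model)` tuple. The real layout of `cipher_classifier.pkl` is not documented here, so treat that layout, the helper name `predict_labels`, and the toy training strings as hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in model so the example runs on its own (hypothetical data).
texts = ["abcabcabc", "abccbaabc", "xyzxyzxyz", "zyxxyzxyz"]
labels = [15, 15, 8, 8]
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False)
model = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(texts), labels)


def predict_labels(pkl_path, ciphertexts):
    """Load an assumed (vectorizer, model) tuple and return predicted labels."""
    with open(pkl_path, "rb") as fh:
        vec, clf = pickle.load(fh)  # assumption: the pickle stores a 2-tuple
    return clf.predict(vec.transform(ciphertexts)).tolist()


# Round-trip through a temporary file instead of touching the real pickle.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cipher_classifier.pkl"
    with open(path, "wb") as fh:
        pickle.dump((vectorizer, model), fh)
    predicted = predict_labels(path, ["abcabc", "xyzxyz"])

print(predicted)
```

Only unpickle files from a source you trust, since `pickle.load` can execute arbitrary code.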

## Deployment

This app is designed to run on:

- Hugging Face Spaces
- Streamlit Cloud
- GitHub

## Kaggle Link

You can download the dataset from the official competition:

[Kaggle: 20 Newsgroups Ciphertext Challenge](https://www.kaggle.com/competitions/20-newsgroups-ciphertext-challenge)