Encrypted Text Classifier: 20 Newsgroups Cipher Challenge
This project is built for the Kaggle Ciphertext Challenge, where the goal is to classify encrypted text documents into 20 different newsgroup categories.
Even without decrypting the text, we trained a character-level machine learning model that achieves over 63% accuracy.
Project Structure

    cipher-classifier/
    ├── app.py                  # Streamlit app
    ├── cipher_classifier.pkl   # Pickled model + vectorizer
    ├── train.csv               # Kaggle training data
    ├── requirements.txt        # Libraries for deployment
    └── README.md
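
The layout above stores the model and vectorizer together in `cipher_classifier.pkl`. A minimal sketch of how such a bundle could be written and then loaded by an app is shown here. The dict layout (`"vectorizer"` and `"model"` keys), the `predict_label` helper, and the toy training strings are assumptions for illustration, not the actual contents of the file or of `app.py`.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in data (hypothetical); the real project trains on train.csv.
texts = ["abab abba", "bbaa abab", "xyxy yxxy", "yyxx xyxy"]
labels = [0, 0, 1, 1]

vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 3))
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

# Assumed bundle layout: one dict holding both fitted objects.
# A temporary directory is used so the sketch does not overwrite a real file.
bundle_path = Path(tempfile.mkdtemp()) / "cipher_classifier.pkl"
with bundle_path.open("wb") as fh:
    pickle.dump({"vectorizer": vectorizer, "model": model}, fh)

# Roughly what an app could do once at startup.
with bundle_path.open("rb") as fh:
    bundle = pickle.load(fh)


def predict_label(ciphertext: str) -> int:
    """Vectorize one ciphertext string and return the predicted class id."""
    features = bundle["vectorizer"].transform([ciphertext])
    return int(bundle["model"].predict(features)[0])
```

Keeping both objects in one file means the app always uses the same vocabulary the model was trained with. Pickle files can execute code when loaded, so only load bundles from a source you trust.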
Model Overview
- Input: Ciphertext strings (unreadable encrypted text)
- Vectorization: `CountVectorizer` with char-level n-grams (1 to 3)
- Model: Logistic Regression (sklearn)
- Accuracy: ~63% (without decryption)
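
The setup listed above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the project's training script: the `build_classifier` helper, the toy strings, `max_iter=1000`, and `lowercase=False` (chosen on the assumption that letter case carries signal in ciphertext) are all assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_classifier():
    """Character n-gram counts (lengths 1 to 3) feeding a logistic regression."""
    vectorizer = CountVectorizer(
        analyzer="char",      # count characters, not words
        ngram_range=(1, 3),   # unigrams, bigrams, and trigrams
        lowercase=False,      # assumption: keep case, since ciphertext may be case-sensitive
    )
    model = LogisticRegression(max_iter=1000)
    return make_pipeline(vectorizer, model)


# Toy stand-in data (hypothetical): two synthetic "ciphers" with disjoint alphabets.
texts = ["abab abba baab", "bbaa abab aabb", "xyxy yxxy xyyx", "yyxx xyxy xxyy"]
labels = [0, 0, 1, 1]

clf = build_classifier()
clf.fit(texts, labels)
predictions = clf.predict(["abba abab", "xyyx yxxy"])
```

Character n-grams work on ciphertext because they need no tokenization or vocabulary of real words; the classifier only relies on which short character sequences are frequent in each class.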
Example Output

| Input (Ciphertext) | Predicted Label |
| --- | --- |
| ['W')(7x1zay7Hb3... | 15 |
| Tx4a8M\HNsyp;HM... | 8 |
Deployment
This app is designed to run on:
- Hugging Face Spaces
- Streamlit Cloud
- GitHub
Kaggle Link
You can download the dataset from the official competition page: 20 Newsgroups Ciphertext Challenge on Kaggle.