Sarcasm Detection with BERT

This repository contains a fine-tuned BERT model for detecting sarcasm in headlines and other short text. On the held-out 10% evaluation split, the model reaches 93.86% accuracy at distinguishing sarcastic from non-sarcastic content.

Model Details

Model Name: BERT-Base-Uncased Fine-tuned for Sarcasm Detection
Model Architecture: BERT Base (110M parameters)
Task: Binary Classification (Sarcastic vs Non-Sarcastic)
Dataset: Sarcasm Headlines Dataset
Quantization: Float16 half-precision weights (for optimized deployment)
Fine-tuning Framework: Hugging Face Transformers

Dataset

The model was trained on the Sarcasm Headlines Dataset, which contains:

Total Samples: 26,709 headlines
Features:
- headline: The text content to classify
- is_sarcastic: Binary label (1 for sarcastic, 0 for non-sarcastic)
Train/Test Split: 90% training, 10% evaluation
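
As a minimal sketch of how the 90/10 split could be reproduced, the snippet below reads the dataset as JSON Lines and shuffles it with a fixed seed. The file layout (one JSON object per line), the seed, and the helper names `load_headlines` and `split_train_eval` are assumptions for illustration; the training notebook may split the data differently.

```python
import json
import random


def load_headlines(path):
    """Read one JSON object per line, keeping only the two fields used for training."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            records.append({"headline": row["headline"], "label": int(row["is_sarcastic"])})
    return records


def split_train_eval(records, eval_fraction=0.1, seed=42):
    """Shuffle a copy of the records and hold out eval_fraction of them for evaluation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]
```

Applied to the 26,709 headlines listed above, this split gives 24,038 training and 2,671 evaluation examples.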

Performance Metrics

Epoch	Training Loss	Validation Loss	Accuracy
1	0.2048	0.1821	92.96%
2	0.1138	0.2792	91.01%
3	0.0586	0.2372	93.86%

Final Model Performance:

Best Accuracy: 93.86%
Final Training Loss: 0.146
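
The accuracy column is the fraction of evaluation headlines whose highest-scoring class matches the label. Below is a minimal sketch of a metric function in the shape the Hugging Face `Trainer` accepts for `compute_metrics`. It uses NumPy directly; whether the training notebook computes accuracy this way or through the `evaluate` library is an assumption.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Return accuracy given a (logits, labels) pair, as produced during evaluation."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # pick the higher-scoring class per row
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}


# Four toy examples; the last one is misclassified
toy_logits = np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [0.3, 0.2]])
toy_labels = np.array([0, 1, 1, 1])
print(compute_metrics((toy_logits, toy_labels)))  # {'accuracy': 0.75}
```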

Installation

pip install transformers datasets evaluate scikit-learn torch

Usage

Quick Start

from transformers import pipeline

# Load the fine-tuned model and tokenizer from the local directory
classifier = pipeline(
    "text-classification",
    model="./sarcasm_model",
    tokenizer="./sarcasm_model",
)

# Test examples
test_inputs = [
    "I'm absolutely thrilled to be stuck in traffic again.",
    "The weather is nice and sunny today.",
    "Oh great, another email from the boss with more tasks."
]

for sentence in test_inputs:
    result = classifier(sentence)[0]
    # Default label names: LABEL_1 is sarcastic, LABEL_0 is not sarcastic
    label = "Sarcastic" if result["label"] == "LABEL_1" else "Not Sarcastic"
    print(f"'{sentence}' → {label} (Confidence: {result['score']:.2f})")

Manual Model Loading

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("./sarcasm_model")
tokenizer = AutoTokenizer.from_pretrained("./sarcasm_model")

# Tokenize input
text = "Oh wonderful, another Monday morning!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = outputs.logits.argmax(dim=1).item()

label_mapping = {0: "Not Sarcastic", 1: "Sarcastic"}
confidence = predictions[0][predicted_class].item()
print(f"Prediction: {label_mapping[predicted_class]} (Confidence: {confidence:.2f})")
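
The same decoding step extends to batches of logits. The sketch below is one way to turn a `(batch, 2)` logits tensor into labels and confidences; `decode_logits` is a hypothetical helper, and the toy logits stand in for `model(**inputs).logits`.

```python
import torch

LABELS = {0: "Not Sarcastic", 1: "Sarcastic"}


def decode_logits(logits):
    """Map a (batch, 2) logits tensor to a list of (label, confidence) pairs."""
    probabilities = torch.softmax(logits, dim=-1)
    confidences, classes = probabilities.max(dim=-1)  # top probability and its class index
    return [(LABELS[c], p) for c, p in zip(classes.tolist(), confidences.tolist())]


# Toy logits: the first row favors class 0, the second favors class 1
toy_logits = torch.tensor([[3.0, -1.0], [-2.0, 2.0]])
for label, confidence in decode_logits(toy_logits):
    print(f"{label} (Confidence: {confidence:.2f})")
```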

Training Configuration

Model Parameters

Base Model: bert-base-uncased
Number of Labels: 2 (binary classification)
Max Sequence Length: 128 tokens
Tokenization: WordPiece with padding and truncation

Training Arguments

Learning Rate: 2e-5
Batch Size: 16 (training), 32 (evaluation)
Epochs: 3
Weight Decay: 0.01
Evaluation Strategy: Every epoch
Optimizer: AdamW (default)
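
To make these settings concrete, here is a sketch that collects them as keyword arguments named after `transformers.TrainingArguments` parameters and derives the number of optimizer steps. It assumes a single GPU, no gradient accumulation, and the 24,038-example training set implied by the 90/10 split. `output_dir` is a hypothetical path, and `eval_strategy` is spelled `evaluation_strategy` in older Transformers releases.

```python
import math

# Keyword arguments mirroring the list above; usable as TrainingArguments(**TRAINING_KWARGS)
TRAINING_KWARGS = {
    "output_dir": "./results",          # hypothetical output path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "eval_strategy": "epoch",           # "evaluation_strategy" in older releases
}

TRAIN_EXAMPLES = 24_038  # assumed: 26,709 headlines minus a 10% evaluation split


def optimizer_steps(n_examples, batch_size, epochs):
    """Steps per epoch (last partial batch included) and total steps over all epochs."""
    per_epoch = math.ceil(n_examples / batch_size)
    return per_epoch, per_epoch * epochs


steps_per_epoch, total_steps = optimizer_steps(
    TRAIN_EXAMPLES,
    TRAINING_KWARGS["per_device_train_batch_size"],
    TRAINING_KWARGS["num_train_epochs"],
)
print(steps_per_epoch, total_steps)  # 1503 4509
```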

Hardware Requirements

GPU: NVIDIA Tesla T4 (or equivalent)
Memory: ~4GB GPU memory for training
Training Time: ~18 minutes for 3 epochs

Model Architecture

The model uses BERT's transformer architecture with:

Encoder Layers: 12
Attention Heads: 12
Hidden Size: 768
Vocabulary Size: 30,522
Classification Head: Linear layer (768 → 2)
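
These figures account for the roughly 110M parameters quoted under Model Details. The sketch below tallies them by component. The maximum position count (512), token type count (2), and feed-forward size (3,072) are not listed above; they are assumed from the standard `bert-base-uncased` configuration.

```python
VOCAB_SIZE = 30_522
HIDDEN = 768
LAYERS = 12
NUM_LABELS = 2
# Assumed from the standard bert-base-uncased config (not listed in this README)
MAX_POSITIONS = 512
TYPE_VOCAB = 2
INTERMEDIATE = 3_072


def linear(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(size):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


# Word, position, and token type embeddings share one LayerNorm
embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)        # query, key, value, attention output
    + layer_norm(HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)    # feed-forward expansion
    + linear(INTERMEDIATE, HIDDEN)    # feed-forward projection
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)  # the 768 -> 2 classification head

total = embeddings + LAYERS * encoder_layer + pooler + classifier
print(f"{total:,}")  # 109,483,778
```

The number of attention heads does not appear in the tally because the 12 heads split the 768-wide projections rather than adding parameters.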

File Structure

sarcasm-detection/
├── sarcasm_model/              # Main fine-tuned model
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── vocab.txt
│   └── tokenizer.json
├── quantized-model/            # Float16 quantized version
│   ├── config.json
│   ├── model.safetensors
│   └── tokenizer files...
├── logs/                       # Training logs
├── sarcasm-detection.ipynb     # Training notebook
└── README.md                   # This file

Quantization

A quantized version of the model is available for deployment optimization:

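import torch
from transformers import AutoModelForSequenceClassification

# Load quantized model (Float16)
quantized_model = AutoModelForSequenceClassification.from_pretrained("./quantized-model")
# Cast explicitly so the weights are held in half precision
quantized_model = quantized_model.to(dtype=torch.float16)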

Benefits of Quantization:

Reduced Memory Usage: ~50% smaller model size than the Float32 weights
Faster Inference: Improved speed on hardware with native Float16 support, such as recent NVIDIA GPUs
Minimal Accuracy Loss: Float16 weights generally maintain classification performance
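
The size reduction follows from storing each weight in 2 bytes instead of 4. The sketch below measures this on a stand-in `torch.nn.Linear(768, 2)` layer, the same shape as the classification head, rather than on the full model; `parameter_bytes` is a hypothetical helper.

```python
import copy

import torch


def parameter_bytes(module):
    """Total bytes occupied by a module's parameters."""
    return sum(p.numel() * p.element_size() for p in module.parameters())


head_fp32 = torch.nn.Linear(768, 2)          # stand-in for the 768 -> 2 classification head
head_fp16 = copy.deepcopy(head_fp32).half()  # cast a copy so the Float32 original is untouched

fp32_bytes = parameter_bytes(head_fp32)
fp16_bytes = parameter_bytes(head_fp16)
print(fp32_bytes, fp16_bytes)  # 6152 3076
```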

Limitations

Domain Specificity: Trained primarily on headlines; may not generalize perfectly to other text types
Context Dependency: Sarcasm detection can be highly context-dependent and subjective
Cultural Nuances: May not capture sarcasm patterns from different cultural contexts
Short Text Focus: Optimized for headline-length text; inputs longer than 128 tokens are truncated
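
Because the model works with at most 128 tokens, longer passages lose their tail before classification. A small helper for flagging such inputs is sketched below; `exceeds_max_length` is a hypothetical name, and it assumes a Hugging Face tokenizer that returns `input_ids` when called on a string.

```python
MAX_LENGTH = 128  # matches the max sequence length used for training


def exceeds_max_length(tokenizer, text, max_length=MAX_LENGTH):
    """Return True if the text would be truncated at max_length tokens.

    Special tokens such as [CLS] and [SEP] count toward the limit, so the
    tokenizer is called without truncation and the full length is compared.
    """
    input_ids = tokenizer(text, add_special_tokens=True, truncation=False)["input_ids"]
    return len(input_ids) > max_length
```

Calling it with the tokenizer loaded from `./sarcasm_model` before classification would let long inputs be split or skipped rather than silently truncated.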

Potential Improvements

Data Augmentation: Include more diverse sarcasm examples
Ensemble Methods: Combine multiple models for better accuracy
Context Integration: Incorporate additional context beyond the headline
Multi-language Support: Extend to other languages
Real-time Processing: Optimize for streaming applications

Applications

Social Media Monitoring: Detect sarcastic comments and posts
Content Moderation: Identify potentially misleading sarcastic content
Sentiment Analysis Enhancement: Improve sentiment classification accuracy
News Analysis: Analyze editorial tone and bias in headlines
Customer Feedback: Better understand customer sentiment in reviews

Citation

If you use this model in your research, please cite:

@misc{sarcasm_detection_bert,
  title={BERT-based Sarcasm Detection for Headlines},
  author={Your Name},
  year={2025},
  note={Fine-tuned BERT model for binary sarcasm classification}
}

Contributing

Contributions are welcome! Please feel free to:

Report bugs or issues
Suggest improvements
Add new features
Improve documentation

License

This project is licensed under the MIT License. The underlying BERT model is released by Google under the Apache 2.0 license.

Acknowledgments

Hugging Face for the Transformers library
Google Research for the original BERT model
Kaggle for hosting the Sarcasm Headlines Dataset
PyTorch for the deep learning framework