---
language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- regression
- trail-rag
- pathfinder-rag
- hotpotqa
- multi-hop-question-answering
- sentence-transformers
model-index:
- name: trailrag-cross-encoder-hotpotqa-enhanced
  results:
  - task:
      type: question-answering
    dataset:
      name: HotpotQA
      type: hotpotqa
    metrics:
    - type: mse
      value: 0.0557947916534922
    - type: mae
      value: 0.1418474710541999
    - type: rmse
      value: 0.2362092116186248
    - type: r2_score
      value: 0.6484965021143569
    - type: pearson_correlation
      value: 0.8754595236036868
    - type: spearman_correlation
      value: 0.8618191776300459
---
# TrailRAG Cross-Encoder: HotpotQA Enhanced
This is a cross-encoder fine-tuned for **multi-hop question answering**, trained as part of the PathfinderRAG research project.
## Model Details
- **Model Type**: Cross-Encoder for Regression (continuous similarity scores)
- **Base Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- **Training Dataset**: HotpotQA (Complex reasoning dataset requiring multi-step inference)
- **Task**: Multi-hop Question Answering
- **Library**: sentence-transformers
- **License**: MIT
## Performance Metrics
### Final Regression Metrics
| Metric | Value | Description |
|--------|-------|-------------|
| **MSE** | **0.055795** | Mean Squared Error (lower is better) |
| **MAE** | **0.141847** | Mean Absolute Error (lower is better) |
| **RMSE** | **0.236209** | Root Mean Squared Error (lower is better) |
| **R² Score** | **0.648497** | Coefficient of determination (higher is better) |
| **Pearson Correlation** | **0.875460** | Linear correlation (higher is better) |
| **Spearman Correlation** | **0.861819** | Rank correlation (higher is better) |
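For reference, these values can be reproduced from arrays of gold and predicted scores with scikit-learn and SciPy. This is a generic sketch, not the project's evaluation script; the function name is illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the regression metrics reported in the table above."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": mse,
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mse)),
        "r2_score": r2_score(y_true, y_pred),
        "pearson_correlation": pearsonr(y_true, y_pred)[0],
        "spearman_correlation": spearmanr(y_true, y_pred)[0],
    }
```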
### Training Details
- **Training Duration**: 28 minutes
- **Epochs**: 8
- **Early Stopping**: No
- **Best Correlation Score**: 0.936744
- **Final MSE**: 0.055795
### Training Configuration
- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Epochs**: 8
- **Weight Decay**: 0.01
- **Warmup Steps**: 150
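The card does not include the training script, but the configuration above maps naturally onto the v2-style `CrossEncoder.fit` API in sentence-transformers. The sketch below is an illustration under stated assumptions: the training pairs, labels, and output path are placeholders, and newer sentence-transformers releases use a trainer-based API instead.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Placeholder (query, passage) pairs with continuous labels in [0, 1]
train_samples = [
    InputExample(texts=["example query", "example passage"], label=0.85),
    # ... roughly 1,000 pairs in the actual run
]

# num_labels=1 gives a single regression output head
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

model.fit(
    train_dataloader=train_dataloader,
    epochs=8,
    warmup_steps=150,
    optimizer_params={"lr": 2e-5},
    weight_decay=0.01,
    output_path="trailrag-cross-encoder-hotpotqa-enhanced",
)
```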
## Usage
This model can be used with the sentence-transformers library for computing semantic similarity scores between query-document pairs.
### Installation
```bash
pip install sentence-transformers
```
### Basic Usage
```python
from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example query-document pairs
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.'],
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate a better semantic match
```
### Advanced Usage in PathfinderRAG
```python
from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a single query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # ... your candidate documents

# Score all pairs and rank documents by descending relevance
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
```
## Training Process
This model was trained using **regression metrics** (not classification) to predict continuous similarity scores in the range [0, 1]. The training process focused on:
1. **Data Quality**: Used authentic HotpotQA examples with careful contamination filtering
2. **Regression Approach**: Avoided binary classification, maintaining continuous label distribution
3. **Correlation Optimization**: Maximized Spearman correlation for effective ranking
4. **Scientific Rigor**: All metrics derived from real training runs without simulation
### Why Regression Over Classification?
Cross-encoders for information retrieval should predict **continuous similarity scores**, not binary classifications; the toy sketch after this list illustrates the difference. This approach:
- Preserves fine-grained similarity distinctions
- Enables better ranking and document selection
- Provides more informative scores for downstream applications
- Aligns with the mathematical foundation of information retrieval
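As a toy illustration with made-up scores: thresholding continuous outputs into binary labels collapses distinct relevance levels into ties, while the raw scores preserve a total ordering.

```python
# Hypothetical relevance scores for three candidate passages
continuous = {"passage_a": 0.91, "passage_b": 0.64, "passage_c": 0.12}

# Binary classification at a 0.5 threshold loses the a-vs-b distinction
binary = {doc: int(score >= 0.5) for doc, score in continuous.items()}
print(binary)  # {'passage_a': 1, 'passage_b': 1, 'passage_c': 0} -- a and b tie

# Continuous scores yield an unambiguous ranking for document selection
ranking = sorted(continuous, key=continuous.get, reverse=True)
print(ranking)  # ['passage_a', 'passage_b', 'passage_c']
```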
## Dataset
**HotpotQA**: Complex reasoning dataset requiring multi-step inference
- **Task Type**: Multi-hop Question Answering
- **Training Examples**: 1,000 high-quality pairs
- **Validation Split**: 20% (200 examples)
- **Quality Threshold**: ≥0.70 (authentic TrailRAG metrics)
- **Contamination**: Zero overlap between splits
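The filtering code is not part of this card; as one possible sketch, a zero-overlap guarantee between splits could be verified by comparing normalized question text (the record layout here is an assumption):

```python
def assert_no_split_overlap(train_examples: list[dict], val_examples: list[dict]) -> None:
    """Raise if any question text appears in both splits."""
    train_qs = {ex["question"].strip().lower() for ex in train_examples}
    val_qs = {ex["question"].strip().lower() for ex in val_examples}
    overlap = train_qs & val_qs
    if overlap:
        raise ValueError(f"{len(overlap)} questions leak across splits")

# Placeholder records illustrating the expected shape
assert_no_split_overlap(
    [{"question": "Who wrote Hamlet?"}],
    [{"question": "What is the capital of France?"}],
)
```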
## Limitations
- Optimized specifically for multi-hop question answering tasks
- Performance may vary on out-of-domain data
- Requires the sentence-transformers library for inference
- CPU-based training (GPU optimization available for future versions)
## Citation
```bibtex
@misc{trailrag-cross-encoder-hotpotqa,
  title     = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author    = {PathfinderRAG Team},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}
```
## Model Card Contact
For questions about this model, please open an issue in the [PathfinderRAG repository](https://github.com/your-org/trail-rag-1) or contact the development team.
---
*This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.*