---
language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- regression
- trail-rag
- pathfinder-rag
- hotpotqa
- multi-hop-question-answering
- sentence-transformers
model-index:
- name: trailrag-cross-encoder-hotpotqa-enhanced
  results:
  - task:
      type: question-answering
    dataset:
      name: HotpotQA
      type: hotpotqa
    metrics:
    - type: mse
      value: 0.0557947916534922
    - type: mae
      value: 0.1418474710541999
    - type: rmse
      value: 0.2362092116186248
    - type: r2_score
      value: 0.6484965021143569
    - type: pearson_correlation
      value: 0.8754595236036868
    - type: spearman_correlation
      value: 0.8618191776300459
---

# TrailRAG Cross-Encoder: HotpotQA Enhanced

This is a cross-encoder fine-tuned for **multi-hop question answering**, trained as part of the PathfinderRAG research project.

## Model Details

- **Model Type**: Cross-Encoder for Regression (continuous similarity scores)
- **Base Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- **Training Dataset**: HotpotQA (complex reasoning dataset requiring multi-step inference)
- **Task**: Multi-hop Question Answering
- **Library**: sentence-transformers
- **License**: MIT

## Performance Metrics

### Final Regression Metrics

| Metric | Value | Description |
|--------|-------|-------------|
| **MSE** | **0.055795** | Mean Squared Error (lower is better) |
| **MAE** | **0.141847** | Mean Absolute Error (lower is better) |
| **RMSE** | **0.236209** | Root Mean Squared Error (lower is better) |
| **R² Score** | **0.648497** | Coefficient of determination (higher is better) |
| **Pearson Correlation** | **0.875460** | Linear correlation (higher is better) |
| **Spearman Correlation** | **0.861819** | Rank correlation (higher is better) |
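
For readers who want to reproduce these metric types on their own predictions, the sketch below computes all six from a list of gold labels and a list of predicted scores, using only the standard library. The function names and the sample data are illustrative assumptions; this is not the evaluation script used for this model.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, R2, Pearson and Spearman for the given scores."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - (n * mse) / ss_tot,
        "pearson": pearson(y_true, y_pred),
        # Spearman is Pearson applied to the ranks of each sequence.
        "spearman": pearson(average_ranks(y_true), average_ranks(y_pred)),
    }

# Made-up labels and predictions, for illustration only.
labels = [0.0, 0.2, 0.5, 0.8, 1.0]
preds = [0.1, 0.1, 0.6, 0.7, 0.9]
metrics = regression_metrics(labels, preds)
print(metrics)
```

RMSE is the square root of MSE by definition, and the reported values are consistent with that: the square root of 0.055795 is approximately 0.236209.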

### Training Details

- **Training Duration**: 28 minutes
- **Epochs**: 8
- **Early Stopping**: No
- **Best Correlation Score**: 0.936744
- **Final MSE**: 0.055795

### Training Configuration

- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Epochs**: 8
- **Weight Decay**: 0.01
- **Warmup Steps**: 150
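
To put the warmup setting in context, the sketch below works out the step budget implied by this configuration and the learning rate at a given step. It rests on two assumptions that this card does not state: that 800 pairs were used for training (the 1,000 pairs minus the 200-example validation split), and that the schedule is a linear warmup followed by linear decay. The function name `learning_rate` is illustrative.

```python
import math

TRAIN_EXAMPLES = 800  # assumption: 1,000 pairs minus 200 validation examples
BATCH_SIZE = 16
EPOCHS = 8
WARMUP_STEPS = 150
PEAK_LR = 2e-5

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)

print(steps_per_epoch, total_steps)  # 50 400
for step in (0, 75, 150, 275, 400):
    print(step, learning_rate(step))
```

Under these assumptions, warmup covers 150 of 400 optimizer steps, so the learning rate reaches its peak at the end of the third epoch.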

## Usage

Use this model with the sentence-transformers library to compute continuous similarity scores for query-document pairs.

### Installation

```bash
pip install sentence-transformers
```

### Basic Usage

```python
from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example usage: each pair is [query, document]
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.'],
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate a better semantic match
```

### Advanced Usage in PathfinderRAG

```python
from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # ... more documents

# Score all pairs in one batch, then sort by descending score
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
```
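
A retrieval pipeline usually keeps only the best few documents after scoring. The sketch below wraps the ranking step in a hypothetical `rerank_top_k` helper that accepts any object with a `predict(pairs)` method, such as the `CrossEncoder` loaded above. The `WordOverlapScorer` class is a stand-in invented here so the example runs without downloading the model; it is not part of sentence-transformers or PathfinderRAG.

```python
from typing import List, Sequence, Tuple

def rerank_top_k(scorer, query: str, documents: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the k highest-scoring documents."""
    if not documents:
        return []
    scores = scorer.predict([[query, doc] for doc in documents])
    ranked = sorted(
        zip(documents, (float(s) for s in scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]

class WordOverlapScorer:
    """Offline stand-in for the real model: fraction of query words found in the document."""

    def predict(self, pairs):
        scores = []
        for query, document in pairs:
            query_words = set(query.lower().split())
            doc_words = set(document.lower().split())
            scores.append(len(query_words & doc_words) / max(len(query_words), 1))
        return scores

docs = [
    "Berlin is in Germany",
    "Paris is the capital of France",
    "France borders Spain",
]
top = rerank_top_k(WordOverlapScorer(), "capital of France", docs, k=2)
print(top)
```

To rank with the actual model, pass the loaded `CrossEncoder` instance in place of `WordOverlapScorer()`.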

## Training Process

This model was trained as a **regression** task (not classification) to predict continuous similarity scores in the range [0, 1]. Training focused on:

1. **Data Quality**: Used authentic HotpotQA examples with careful contamination filtering
2. **Regression Approach**: Avoided binary classification, maintaining a continuous label distribution
3. **Correlation Optimization**: Maximized Spearman correlation for effective ranking
4. **Scientific Rigor**: All metrics derived from real training runs without simulation

### Why Regression Over Classification?

Cross-encoders for information retrieval should predict **continuous similarity scores**, not binary labels. This approach:

- Preserves fine-grained similarity distinctions
- Enables better ranking and document selection
- Provides more informative scores for downstream applications
- Aligns with the mathematical foundation of information retrieval
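
The first two points can be illustrated with a small example. The scores and the 0.5 threshold below are made up for illustration: once scores are collapsed to binary labels, three of the four passages become indistinguishable, while the continuous scores still give a full ordering.

```python
# Hypothetical relevance scores for four candidate passages.
scores = {
    "passage_a": 0.91,
    "passage_b": 0.74,
    "passage_c": 0.58,
    "passage_d": 0.12,
}

THRESHOLD = 0.5  # arbitrary cut-off, for illustration only

# Binary view: every passage above the threshold gets the same label.
binary_labels = {name: int(score >= THRESHOLD) for name, score in scores.items()}

# Continuous view: a strict ordering of all four passages.
ranking = sorted(scores, key=scores.get, reverse=True)

tied_as_relevant = [name for name, label in binary_labels.items() if label == 1]

print(binary_labels)     # {'passage_a': 1, 'passage_b': 1, 'passage_c': 1, 'passage_d': 0}
print(ranking)           # ['passage_a', 'passage_b', 'passage_c', 'passage_d']
print(tied_as_relevant)  # ['passage_a', 'passage_b', 'passage_c']
```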

## Dataset

**HotpotQA**: Complex reasoning dataset requiring multi-step inference

- **Task Type**: Multi-hop Question Answering
- **Training Examples**: 1,000 high-quality pairs
- **Validation Split**: 20% (200 examples)
- **Quality Threshold**: ≥0.70 (authentic TrailRAG metrics)
- **Contamination**: Zero overlap between splits
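
One way to obtain a split with no overlap is to assign whole questions, rather than individual pairs, to either training or validation. The sketch below shows that idea on synthetic rows; the `split_by_question` helper, the row layout `(question, passage, label)`, and the data are assumptions for illustration and do not describe the actual TrailRAG preprocessing.

```python
import random

def split_by_question(rows, val_fraction=0.2, seed=0):
    """Split (question, passage, label) rows so that no question appears in both splits."""
    questions = sorted({question for question, _, _ in rows})
    random.Random(seed).shuffle(questions)
    n_val = round(len(questions) * val_fraction)
    val_questions = set(questions[:n_val])
    train = [row for row in rows if row[0] not in val_questions]
    val = [row for row in rows if row[0] in val_questions]
    return train, val

# Synthetic rows with labels in [0, 1]; not real HotpotQA data.
rows = [(f"question {i}", f"passage {i}", (i % 11) / 10) for i in range(1000)]

train_rows, val_rows = split_by_question(rows)
overlap = {row[0] for row in train_rows} & {row[0] for row in val_rows}

print(len(train_rows), len(val_rows), len(overlap))  # 800 200 0
```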

## Limitations

- Optimized specifically for multi-hop question answering tasks
- Performance may vary on out-of-domain data
- Requires the sentence-transformers library for inference
- Trained on CPU (GPU optimization is left for future versions)

## Citation

```bibtex
@misc{trailrag-cross-encoder-hotpotqa,
  title = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author = {PathfinderRAG Team},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}
```

## Model Card Contact

For questions about this model, please open an issue in the [PathfinderRAG repository](https://github.com/your-org/trail-rag-1) or contact the development team.

---

*This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.*