language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
  - cross-encoder
  - regression
  - trail-rag
  - pathfinder-rag
  - hotpotqa
  - multi-hop-question-answering
  - sentence-transformers
model-index:
  - name: trailrag-cross-encoder-hotpotqa-enhanced
    results:
      - task:
          type: question-answering
        dataset:
          name: HotpotQA
          type: hotpotqa
        metrics:
          - type: mse
            value: 0.0557947916534922
          - type: mae
            value: 0.1418474710541999
          - type: rmse
            value: 0.2362092116186248
          - type: r2_score
            value: 0.6484965021143569
          - type: pearson_correlation
            value: 0.8754595236036868
          - type: spearman_correlation
            value: 0.8618191776300459

TrailRAG Cross-Encoder: HotpotQA Enhanced

This cross-encoder is fine-tuned for multi-hop question answering and was trained as part of the PathfinderRAG research project.

Model Details

  • Model Type: Cross-Encoder for Regression (continuous similarity scores)
  • Base Model: cross-encoder/ms-marco-MiniLM-L-6-v2
  • Training Dataset: HotpotQA (Complex reasoning dataset requiring multi-step inference)
  • Task: Multi-hop Question Answering
  • Library: sentence-transformers
  • License: MIT

Performance Metrics

Final Regression Metrics

| Metric               | Value    | Description                                     |
|----------------------|----------|-------------------------------------------------|
| MSE                  | 0.055795 | Mean Squared Error (lower is better)            |
| MAE                  | 0.141847 | Mean Absolute Error (lower is better)           |
| RMSE                 | 0.236209 | Root Mean Squared Error (lower is better)       |
| R² Score             | 0.648497 | Coefficient of determination (higher is better) |
| Pearson Correlation  | 0.875460 | Linear correlation (higher is better)           |
| Spearman Correlation | 0.861819 | Rank correlation (higher is better)             |

Training Details

  • Training Duration: 28 minutes
  • Epochs: 8
  • Early Stopping: No
  • Best Correlation Score: 0.936744
  • Final MSE: 0.055795

Training Configuration

  • Batch Size: 16
  • Learning Rate: 2e-05
  • Max Epochs: 8
  • Weight Decay: 0.01
  • Warmup Steps: 150
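
The card does not state which learning-rate scheduler was used. As an illustration only, the sketch below assumes linear warmup followed by linear decay to zero, a common choice for fine-tuning. The function name `learning_rate` is illustrative. The default `total_steps=400` is a hypothetical value, derived by assuming that 800 of the 1,000 pairs remain for training after the 20% validation split (800 / 16 = 50 steps per epoch, times 8 epochs).

```python
def learning_rate(step, base_lr=2e-5, warmup_steps=150, total_steps=400):
    """Learning rate at a given optimizer step.

    Assumed schedule (not confirmed by the model card): linear warmup from 0
    to base_lr over warmup_steps, then linear decay to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)

# Print the rate at a few points in the (hypothetical) 400-step run.
for step in (0, 75, 150, 275, 400):
    print(step, learning_rate(step))
```

Under these assumptions the warmup would cover 150 of 400 steps, which is three of the eight epochs.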

Usage

This model can be used with the sentence-transformers library for computing semantic similarity scores between query-document pairs.

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example usage
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.']
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate better semantic match

Advanced Usage in PathfinderRAG

from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # add more documents as needed

# Score all pairs
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
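
When only the best few documents are needed, the ranking step above can be wrapped in a small helper. This is a sketch rather than part of PathfinderRAG: `rerank` and its `score_fn` parameter are hypothetical names, and `score_fn` is assumed to accept a list of `[query, document]` pairs and return one score per pair, as `CrossEncoder.predict` does.

```python
def rerank(query, documents, score_fn, top_k=3):
    """Return the top_k (document, score) tuples, highest score first.

    score_fn is any callable mapping a list of [query, document] pairs to a
    sequence of scores, e.g. cross_encoder.predict.
    """
    if not documents:
        return []
    pairs = [[query, doc] for doc in documents]
    scores = [float(s) for s in score_fn(pairs)]
    ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

# Stand-in scorer for illustration only: counts words shared by query and document.
def overlap_scorer(pairs):
    return [len(set(q.lower().split()) & set(d.lower().split())) for q, d in pairs]

docs = ["multi-hop question answering", "capital of France", "question answering"]
print(rerank("multi-hop question answering", docs, overlap_scorer, top_k=2))
```

With the real model, the call would be `rerank(query, documents, cross_encoder.predict, top_k=5)`.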

Training Process

This model was trained with a regression objective (not a classification objective) to predict continuous similarity scores in the range [0, 1]. The training process focused on:

  1. Data Quality: Used authentic HotpotQA examples with careful contamination filtering
  2. Regression Approach: Avoided binary classification, maintaining continuous label distribution
  3. Correlation Optimization: Maximized Spearman correlation for effective ranking
  4. Scientific Rigor: All metrics derived from real training runs without simulation

Why Regression Over Classification?

Cross-encoders used for retrieval are more useful when they predict continuous similarity scores rather than binary labels. This approach:

  • Preserves fine-grained similarity distinctions
  • Enables better ranking and document selection
  • Provides more informative scores for downstream applications
  • Matches how retrieval systems rank documents, by ordering them on a graded relevance score
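
A toy example makes the ranking argument concrete. The scores below are hypothetical, not model outputs, and `tied_pairs` is an illustrative helper. Once continuous scores are thresholded into binary labels, many document pairs become tied and can no longer be ordered.

```python
from itertools import combinations

def tied_pairs(scores):
    """Count document pairs with equal scores, i.e. pairs that cannot be ranked."""
    return sum(1 for a, b in combinations(scores, 2) if a == b)

continuous = [0.92, 0.71, 0.55, 0.30, 0.08]            # hypothetical relevance scores
binary = [1 if s >= 0.5 else 0 for s in continuous]    # thresholded at 0.5

print(tied_pairs(continuous))  # 0 of 10 pairs are tied
print(tied_pairs(binary))      # 4 of 10 pairs are tied
```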

Dataset

HotpotQA: Complex reasoning dataset requiring multi-step inference

  • Task Type: Multi-hop Question Answering
  • Training Examples: 1,000 high-quality pairs
  • Validation Split: 20% (200 examples)
  • Quality Threshold: ≥0.70 (authentic TrailRAG metrics)
  • Contamination: Zero overlap between splits
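
The filtering and splitting described above could look roughly like the following. This is a sketch under stated assumptions, not the project's pipeline: the function name `filter_and_split`, the record fields (`question`, `passage`, `score`, `quality`) and the seed are hypothetical, and "zero overlap" is interpreted here as no question appearing in both splits.

```python
import random

def filter_and_split(examples, min_quality=0.70, val_fraction=0.20, seed=42):
    """Drop low-quality pairs, then split by question so the splits share none.

    examples: list of dicts with hypothetical keys
              'question', 'passage', 'score', 'quality'.
    Returns (train, val) lists of the same dicts.
    """
    kept = [ex for ex in examples if ex["quality"] >= min_quality]

    # Split on unique questions rather than on pairs, so that every pair
    # belonging to a given question lands in exactly one split.
    questions = sorted({ex["question"] for ex in kept})
    random.Random(seed).shuffle(questions)
    n_val = int(round(len(questions) * val_fraction))
    val_questions = set(questions[:n_val])

    train = [ex for ex in kept if ex["question"] not in val_questions]
    val = [ex for ex in kept if ex["question"] in val_questions]
    return train, val
```

Because the split is made over questions, the validation share of pairs equals 20% only when each question contributes the same number of pairs.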

Limitations

  • Optimized specifically for multi-hop question answering tasks
  • Performance may vary on out-of-domain data
  • Requires sentence-transformers library for inference
  • Trained on CPU only (GPU optimization is left for future versions)

Citation

@misc{trailrag-cross-encoder-hotpotqa,
  title = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author = {PathfinderRAG Team},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}

Model Card Contact

For questions about this model, please open an issue in the PathfinderRAG repository or contact the development team.


This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.