metadata

language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
  - cross-encoder
  - regression
  - trail-rag
  - pathfinder-rag
  - hotpotqa
  - multi-hop-question-answering
  - sentence-transformers
model-index:
  - name: trailrag-cross-encoder-hotpotqa-enhanced
    results:
      - task:
          type: question-answering
        dataset:
          name: HotpotQA
          type: hotpotqa
        metrics:
          - type: mse
            value: 0.0557947916534922
          - type: mae
            value: 0.1418474710541999
          - type: rmse
            value: 0.2362092116186248
          - type: r2_score
            value: 0.6484965021143569
          - type: pearson_correlation
            value: 0.8754595236036868
          - type: spearman_correlation
            value: 0.8618191776300459

TrailRAG Cross-Encoder: HotpotQA Enhanced

This is a cross-encoder fine-tuned for multi-hop question answering, trained as part of the PathfinderRAG research project.

Model Details

Model Type: Cross-Encoder for Regression (continuous similarity scores)
Base Model: cross-encoder/ms-marco-MiniLM-L-6-v2
Training Dataset: HotpotQA (Complex reasoning dataset requiring multi-step inference)
Task: Multi-hop Question Answering
Library: sentence-transformers
License: MIT

Performance Metrics

Final Regression Metrics

Metric	Value	Description
MSE	0.055795	Mean Squared Error (lower is better)
MAE	0.141847	Mean Absolute Error (lower is better)
RMSE	0.236209	Root Mean Squared Error (lower is better)
R² Score	0.648497	Coefficient of determination (higher is better)
Pearson Correlation	0.875460	Linear correlation (higher is better)
Spearman Correlation	0.861819	Rank correlation (higher is better)
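
The metrics above follow their standard definitions. As a minimal sketch, they could be computed from gold labels and predicted scores as shown below. The helper names `regression_metrics`, `pearson`, and `average_ranks` are illustrative and not part of the PathfinderRAG code, and the toy labels are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def regression_metrics(y_true, y_pred):
    """Compute the six metrics reported in the table."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - (mse * n) / ss_tot,
        "pearson": pearson(y_true, y_pred),
        # Spearman is the Pearson correlation of the ranks.
        "spearman": pearson(average_ranks(y_true), average_ranks(y_pred)),
    }

# Toy example with made-up labels and predictions.
metrics = regression_metrics([0.0, 0.5, 1.0], [0.1, 0.4, 0.9])
```

Since RMSE is the square root of MSE, the reported values agree with each other: sqrt(0.055795) ≈ 0.236209.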

Training Details

Training Duration: 28 minutes
Epochs: 8
Early Stopping: No
Best Correlation Score: 0.936744
Final MSE: 0.055795

Training Configuration

Batch Size: 16
Learning Rate: 2e-05
Max Epochs: 8
Weight Decay: 0.01
Warmup Steps: 150
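
To put the warmup setting in context, the sketch below works out how many optimizer steps this configuration implies. It rests on assumptions: the card lists 1,000 pairs and a 20% validation split but does not say whether the 200 validation pairs are part of the 1,000, so both 800 and 1,000 training pairs are shown. The linear warmup and decay schedule in `lr_at_step` is also an assumption, since the card does not name the scheduler.

```python
import math

BATCH_SIZE = 16
EPOCHS = 8
WARMUP_STEPS = 150
PEAK_LR = 2e-5

def total_steps(n_train, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Optimizer steps over the whole run, keeping the final partial batch."""
    return math.ceil(n_train / batch_size) * epochs

def lr_at_step(step, n_total, warmup=WARMUP_STEPS, peak=PEAK_LR):
    """Assumed schedule: linear warmup to `peak`, then linear decay to zero."""
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (n_total - step) / (n_total - warmup))

for n_train in (800, 1000):  # assumption: either 800 or 1,000 training pairs
    steps = total_steps(n_train)
    print(n_train, steps, f"warmup covers {WARMUP_STEPS / steps:.1%} of training")
```

Under these assumptions the run has 400 or 504 steps, so the 150 warmup steps cover roughly 30 to 38 percent of training.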

Usage

This model can be used with the sentence-transformers library for computing semantic similarity scores between query-document pairs.

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example usage
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.']
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate better semantic match

Advanced Usage in PathfinderRAG

from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # add more candidate passages as needed

# Score all pairs
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
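
When only the best few passages are needed, the ranking step can be wrapped in a helper with a top-k cutoff. This is a minimal sketch: `rerank` and `word_overlap_scorer` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

def rerank(
    query: str,
    documents: Sequence[str],
    score_pairs: Callable[[List[List[str]]], Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair in one batch and keep the top_k."""
    if not documents:
        return []
    scores = score_pairs([[query, doc] for doc in documents])
    ranked = sorted(
        zip(documents, (float(s) for s in scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]

def word_overlap_scorer(pairs: List[List[str]]) -> List[float]:
    """Stand-in scorer: fraction of query words that appear in the document."""
    results = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        results.append(len(query_words & doc_words) / max(len(query_words), 1))
    return results

top = rerank(
    "capital of France",
    ["Paris is the capital of France", "AI is a field of computer science"],
    word_overlap_scorer,
    top_k=1,
)
```

With the real model, pass its `predict` method as the scorer: `rerank(query, documents, cross_encoder.predict)`.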

Training Process

This model was trained as a regression model rather than a classifier: it predicts continuous similarity scores in the range [0, 1]. The training process focused on:

Data Quality: Used authentic HotpotQA examples with careful contamination filtering
Regression Approach: Avoided binary classification, maintaining continuous label distribution
Correlation Optimization: Maximized Spearman correlation for effective ranking
Scientific Rigor: All metrics derived from real training runs without simulation

Why Regression Over Classification?

Cross-encoders for information retrieval should predict continuous similarity scores, not binary classifications. This approach:

Preserves fine-grained similarity distinctions
Enables better ranking and document selection
Provides more informative scores for downstream applications
Matches how retrieval is usually framed: documents are ordered by a real-valued relevance score
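
A small illustration of the first two points, using made-up scores for five passages: once scores are thresholded into binary labels, passages on the same side of the threshold can no longer be ordered against each other. The helper names are illustrative.

```python
def binarize(scores, threshold=0.5):
    """Collapse continuous relevance scores to 0/1 labels."""
    return [1 if s >= threshold else 0 for s in scores]

def distinct_levels(scores):
    """Number of distinct score values, i.e. how many rank levels survive."""
    return len(set(scores))

continuous = [0.92, 0.71, 0.55, 0.30, 0.08]  # made-up scores for five passages
binary = binarize(continuous)

print(binary)                       # [1, 1, 1, 0, 0]
print(distinct_levels(continuous))  # 5
print(distinct_levels(binary))      # 2
```

The continuous scores give a full ordering of all five passages, while the binary labels leave the top three tied.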

Dataset

HotpotQA: Complex reasoning dataset requiring multi-step inference

Task Type: Multi-hop Question Answering
Training Examples: 1,000 high-quality pairs
Validation Split: 20% (200 examples)
Quality Threshold: ≥0.70 (authentic TrailRAG metrics)
Contamination: Zero overlap between splits
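
The card does not describe how overlap between splits was prevented. One common approach, sketched here as an assumption and not as the project's actual procedure, is to split by question so that every pair for a given question lands in exactly one split. The function name `split_by_question` and the toy data are hypothetical.

```python
import random

def split_by_question(pairs, val_fraction=0.2, seed=42):
    """Split (question, passage, label) tuples so no question spans both splits."""
    questions = sorted({q for q, _, _ in pairs})
    random.Random(seed).shuffle(questions)
    n_val = round(len(questions) * val_fraction)
    val_questions = set(questions[:n_val])
    train = [p for p in pairs if p[0] not in val_questions]
    val = [p for p in pairs if p[0] in val_questions]
    return train, val

# Toy data: 10 hypothetical questions with 2 passages each.
pairs = [(f"question {i}", f"passage {i}-{j}", 0.5) for i in range(10) for j in range(2)]
train, val = split_by_question(pairs)

# Verify that no question appears in both splits.
train_questions = {q for q, _, _ in train}
val_questions = {q for q, _, _ in val}
assert train_questions.isdisjoint(val_questions)
```

Splitting at the question level keeps a passage scored for a validation question from leaking label information through a training pair with the same question.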

Limitations

Optimized specifically for multi-hop question answering tasks
Performance may vary on out-of-domain data
Requires the sentence-transformers library for inference
Trained on CPU only (GPU optimization is deferred to future versions)

Citation

@misc{trailrag-cross-encoder-hotpotqa,
  title = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author = {PathfinderRAG Team},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}

Model Card Contact

For questions about this model, please open an issue in the PathfinderRAG repository or contact the development team.

This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.