---
language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- regression
- trail-rag
- pathfinder-rag
- hotpotqa
- multi-hop-question-answering
- sentence-transformers
model-index:
- name: trailrag-cross-encoder-hotpotqa-enhanced
  results:
  - task:
      type: question-answering
    dataset:
      name: HotpotQA
      type: hotpotqa
    metrics:
    - type: mse
      value: 0.0557947916534922
    - type: mae
      value: 0.1418474710541999
    - type: rmse
      value: 0.2362092116186248
    - type: r2_score
      value: 0.6484965021143569
    - type: pearson_correlation
      value: 0.8754595236036868
    - type: spearman_correlation
      value: 0.8618191776300459
---
# TrailRAG Cross-Encoder: HotpotQA Enhanced
This is a cross-encoder fine-tuned for **multi-hop question answering**, trained as part of the PathfinderRAG research project.
## Model Details
- **Model Type**: Cross-Encoder for Regression (continuous similarity scores)
- **Base Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- **Training Dataset**: HotpotQA (Complex reasoning dataset requiring multi-step inference)
- **Task**: Multi-hop Question Answering
- **Library**: sentence-transformers
- **License**: MIT
## Performance Metrics
### Final Regression Metrics
| Metric | Value | Description |
|--------|-------|-------------|
| **MSE** | **0.055795** | Mean Squared Error (lower is better) |
| **MAE** | **0.141847** | Mean Absolute Error (lower is better) |
| **RMSE** | **0.236209** | Root Mean Squared Error (lower is better) |
| **R² Score** | **0.648497** | Coefficient of determination (higher is better) |
| **Pearson Correlation** | **0.875460** | Linear correlation (higher is better) |
| **Spearman Correlation** | **0.861819** | Rank correlation (higher is better) |
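For reference, these values can be reproduced from arrays of gold and predicted scores with scikit-learn and SciPy. This is a generic sketch, not the project's evaluation script; the function name is illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the regression metrics reported in the table above."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": mse,
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mse)),
        "r2_score": r2_score(y_true, y_pred),
        "pearson_correlation": pearsonr(y_true, y_pred)[0],
        "spearman_correlation": spearmanr(y_true, y_pred)[0],
    }
```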
### Training Details
- **Training Duration**: 28 minutes
- **Epochs**: 8
- **Early Stopping**: No
- **Best Correlation Score**: 0.936744
- **Final MSE**: 0.055795
### Training Configuration
- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Epochs**: 8
- **Weight Decay**: 0.01
- **Warmup Steps**: 150
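The card does not include the training script, but the configuration above maps naturally onto the v2-style `CrossEncoder.fit` API in sentence-transformers. The sketch below is an illustration under stated assumptions: the training pairs, labels, and output path are placeholders, and newer sentence-transformers releases use a trainer-based API instead.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Placeholder (query, passage) pairs with continuous labels in [0, 1]
train_samples = [
    InputExample(texts=["example query", "example passage"], label=0.85),
    # ... roughly 1,000 pairs in the actual run
]

# num_labels=1 gives a single regression output head
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

model.fit(
    train_dataloader=train_dataloader,
    epochs=8,
    warmup_steps=150,
    optimizer_params={"lr": 2e-5},
    weight_decay=0.01,
    output_path="trailrag-cross-encoder-hotpotqa-enhanced",
)
```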
## Usage
This model can be used with the sentence-transformers library for computing semantic similarity scores between query-document pairs.
### Installation
```bash
pip install sentence-transformers
```
### Basic Usage
```python
from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example query-document pairs
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.'],
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate a better semantic match
```
### Advanced Usage in PathfinderRAG
```python
from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a single query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # ... your candidate documents

# Score all pairs and rank documents by descending relevance
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
```
## Training Process
This model was trained using **regression metrics** (not classification) to predict continuous similarity scores in the range [0, 1]. The training process focused on:
1. **Data Quality**: Used authentic HotpotQA examples with careful contamination filtering
2. **Regression Approach**: Avoided binary classification, maintaining continuous label distribution
3. **Correlation Optimization**: Maximized Spearman correlation for effective ranking
4. **Scientific Rigor**: All metrics derived from real training runs without simulation
### Why Regression Over Classification?
Cross-encoders for information retrieval should predict **continuous similarity scores**, not binary classifications; the toy sketch after this list illustrates the difference. This approach:
- Preserves fine-grained similarity distinctions
- Enables better ranking and document selection
- Provides more informative scores for downstream applications
- Aligns with the mathematical foundation of information retrieval
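As a toy illustration with made-up scores: thresholding continuous outputs into binary labels collapses distinct relevance levels into ties, while the raw scores preserve a total ordering.

```python
# Hypothetical relevance scores for three candidate passages
continuous = {"passage_a": 0.91, "passage_b": 0.64, "passage_c": 0.12}

# Binary classification at a 0.5 threshold loses the a-vs-b distinction
binary = {doc: int(score >= 0.5) for doc, score in continuous.items()}
print(binary)  # {'passage_a': 1, 'passage_b': 1, 'passage_c': 0} -- a and b tie

# Continuous scores yield an unambiguous ranking for document selection
ranking = sorted(continuous, key=continuous.get, reverse=True)
print(ranking)  # ['passage_a', 'passage_b', 'passage_c']
```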
## Dataset
**HotpotQA**: Complex reasoning dataset requiring multi-step inference
- **Task Type**: Multi-hop Question Answering
- **Training Examples**: 1,000 high-quality pairs
- **Validation Split**: 20% (200 examples)
- **Quality Threshold**: ≥0.70 (authentic TrailRAG metrics)
- **Contamination**: Zero overlap between splits
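The filtering code is not part of this card; as one possible sketch, a zero-overlap guarantee between splits could be verified by comparing normalized question text (the record layout here is an assumption):

```python
def assert_no_split_overlap(train_examples: list[dict], val_examples: list[dict]) -> None:
    """Raise if any question text appears in both splits."""
    train_qs = {ex["question"].strip().lower() for ex in train_examples}
    val_qs = {ex["question"].strip().lower() for ex in val_examples}
    overlap = train_qs & val_qs
    if overlap:
        raise ValueError(f"{len(overlap)} questions leak across splits")

# Placeholder records illustrating the expected shape
assert_no_split_overlap(
    [{"question": "Who wrote Hamlet?"}],
    [{"question": "What is the capital of France?"}],
)
```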
## Limitations
- Optimized specifically for multi-hop question answering tasks
- Performance may vary on out-of-domain data
- Requires the sentence-transformers library for inference
- CPU-based training (GPU optimization available for future versions)
## Citation
```bibtex
@misc{trailrag-cross-encoder-hotpotqa,
  title     = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author    = {PathfinderRAG Team},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}
```
## Model Card Contact
For questions about this model, please open an issue in the [PathfinderRAG repository](https://github.com/your-org/trail-rag-1) or contact the development team.
---
*This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.*