LeanStateSearch2025.3 / README.md

Improve model card: Add pipeline tag, library, license & usage

24b9770 verified about 2 months ago

2.53 kB

	---
	pipeline_tag: text-ranking
	library_name: sentence-transformers
	license: mit
	---

	# Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever

	## Model Description

	This model is the first version designed for premise retrieval in Lean, based on the state representation of Lean. The model follows the architecture described in the paper:

	[Assisting Mathematical Formalization with A Learning-based Premise Retriever](https://arxiv.org/abs/2501.13959)

	The model implementation and code are available at:

	[GitHub Repository](https://github.com/ruc-ai4math/Premise-Retrieval)

	[Try our model](https://premise-search.com)

	## Usage

	You can use this model with the `sentence-transformers` library to embed queries and premises and then calculate their similarity for retrieval.

	```python
	from sentence_transformers import SentenceTransformer, util
	import torch

	# Load the pretrained model
	model = SentenceTransformer('ruc-ai4math/Lean_State_Search_Random')

	# Example Lean proof state (query) and a list of premises
	query = "<GOAL> (n : \u2115), n + 0 = n </GOAL>"
	premises = [
	"<VAR> (n : \u2115) </VAR> <GOAL> n + 0 = n </GOAL>",
	"<VAR> (n m : \u2115) </VAR> <GOAL> n + m = m + n </GOAL>",
	"<VAR> (n : \u2115) </VAR> <GOAL> n = n </GOAL>",
	"lemma add_zero (n : \u2115) : n + 0 = n := by sorry" # An actual Lean lemma
	]

	# Encode the query and premises into embeddings
	query_embedding = model.encode(query, convert_to_tensor=True)
	premise_embeddings = model.encode(premises, convert_to_tensor=True)

	# Calculate cosine similarity between the query and all premises
	cosine_scores = util.cos_sim(query_embedding, premise_embeddings)

	# Print the scores for each premise
	print("Similarity scores:")
	for i, score in enumerate(cosine_scores[0]):
	print(f" - Premise {i+1}: '{premises[i]}', Score: {score.item():.4f}")

	# Find the index of the premise with the highest similarity score
	best_match_idx = torch.argmax(cosine_scores).item()
	print(f"
	Best matching premise: '{premises[best_match_idx]}'")
	```

	## Citation
	If you use this model, please cite the following paper:

	```
	@misc{tao2025assistingmathematicalformalizationlearningbased,
	title={Assisting Mathematical Formalization with A Learning-based Premise Retriever},
	author={Yicheng Tao and Haotian Liu and Shanwen Wang and Hongteng Xu},
	year={2025},
	eprint={2501.13959},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2501.13959},
	}
	```