|
--- |
|
pipeline_tag: text-ranking |
|
library_name: sentence-transformers |
|
license: mit |
|
--- |
|
|
|
# Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever |
|
|
|
## Model Description |
|
|
|
This model is the first version designed for **premise retrieval** in **Lean**, based on the **state representation** of Lean. The model follows the architecture described in the paper: |
|
|
|
[Assisting Mathematical Formalization with A Learning-based Premise Retriever](https://arxiv.org/abs/2501.13959) |
|
|
|
The model implementation and code are available at: |
|
|
|
[GitHub Repository](https://github.com/ruc-ai4math/Premise-Retrieval) |
|
|
|
[Try our model](https://premise-search.com) |
|
|
|
## Usage |
|
|
|
You can use this model with the `sentence-transformers` library to embed queries and premises and then calculate their similarity for retrieval. |
|
|
|
```python |
|
from sentence_transformers import SentenceTransformer, util |
|
import torch |
|
|
|
# Load the pretrained model |
|
model = SentenceTransformer('ruc-ai4math/Lean_State_Search_Random') |
|
|
|
# Example Lean proof state (query) and a list of premises |
|
query = "<GOAL> (n : \u2115), n + 0 = n </GOAL>" |
|
premises = [ |
|
"<VAR> (n : \u2115) </VAR> <GOAL> n + 0 = n </GOAL>", |
|
"<VAR> (n m : \u2115) </VAR> <GOAL> n + m = m + n </GOAL>", |
|
"<VAR> (n : \u2115) </VAR> <GOAL> n = n </GOAL>", |
|
"lemma add_zero (n : \u2115) : n + 0 = n := by sorry" # An actual Lean lemma |
|
] |
|
|
|
# Encode the query and premises into embeddings |
|
query_embedding = model.encode(query, convert_to_tensor=True) |
|
premise_embeddings = model.encode(premises, convert_to_tensor=True) |
|
|
|
# Calculate cosine similarity between the query and all premises |
|
cosine_scores = util.cos_sim(query_embedding, premise_embeddings) |
|
|
|
# Print the scores for each premise |
|
print("Similarity scores:") |
|
for i, score in enumerate(cosine_scores[0]): |
|
print(f" - Premise {i+1}: '{premises[i]}', Score: {score.item():.4f}") |
|
|
|
# Find the index of the premise with the highest similarity score |
|
best_match_idx = torch.argmax(cosine_scores).item() |
|
print(f" |
|
Best matching premise: '{premises[best_match_idx]}'") |
|
``` |
|
|
|
## Citation |
|
If you use this model, please cite the following paper: |
|
|
|
``` |
|
@misc{tao2025assistingmathematicalformalizationlearningbased, |
|
title={Assisting Mathematical Formalization with A Learning-based Premise Retriever}, |
|
author={Yicheng Tao and Haotian Liu and Shanwen Wang and Hongteng Xu}, |
|
year={2025}, |
|
eprint={2501.13959}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2501.13959}, |
|
} |
|
``` |