# AstroMLab

AstroMLab is a diverse group of researchers dedicated to advancing the application of Large Language Models (LLMs) in astronomy. Our team includes:

- Leading astronomers, astrophysicists, and cosmologists.
- Natural language processing experts.
- Frontier arXivists from the NASA Astrophysics Data System.

## Objectives

- Develop specialized LLMs for astronomy
- Create open-source models for advanced research
- Facilitate LLM-driven end-to-end agentic research in astronomy

## Current Work

Our ongoing projects include:

- Curation of an astronomy-based benchmarking dataset
- Development of specialized astronomy LLMs
- Performance evaluation of models on astronomical tasks

## Models and Performance

We have developed several models, including AstroSage-LLaMA-3.1-70B ([de Haan et al. 2025b](https://arxiv.org/abs/2505.17592)), AstroSage-LLaMA-3.1-8B ([de Haan et al. 2025a](https://arxiv.org/abs/2411.09012)), AstroLLaMA-2-70B ([Pan et al. 2024](https://arxiv.org/abs/2409.19750)), and AstroLLaMA-3-8B ([Pan et al. 2024](https://arxiv.org/abs/2409.19750)). Our AstroSage models have demonstrated strong performance on astronomy Q&A tasks ([Ting et al. 2024](https://arxiv.org/abs/2407.11194)):

| Model | Score (%) |
|-------|-----------|
| **AstroSage-LLaMA-3.1-70B (AstroMLab)** | **86.2** |
| Claude-4-Opus | **86.3** |
| o3 | 85.4 |
| Claude-4-Sonnet | 85.0 |
| GPT-4.1 | 84.7 |
| o4-Mini | 84.7 |
| Gemini-2.5-Pro | 84.8 |
| Deepseek-R1 | 84.4 |
| Qwen-3-235B | 84.0 |
| LLaMA-4-Maverick | 83.4 |
| Deepseek-v3-2503 | 82.9 |
| Gemini-2.5-Flash-0520 | 82.3 |
| LLaMA-4-Scout | 82.2 |
| Grok-3 | 81.7 |
| Mistral-Medium-v3 | 81.8 |
| **AstroSage-LLaMA-3.1-8B (AstroMLab)** | **80.9** |
| Mistral-Large-v2 | 80.8 |
| Qwen-3-32B | 79.7 |
| Mistral-Small-v3.1 | 78.6 |
| GPT-4.1-Nano | 78.0 |
| Gemini-2-Flash-Lite | 78.4 |
| Gemma-3-27B | 76.9 |
| Qwen-3-14B | 76.4 |
| AstroLLaMA-2-7B | 44.3 |

As of this writing in May 2025, AstroSage-LLaMA-3.1-70B ([de Haan et al. 2025b](https://arxiv.org/abs/2505.17592)) achieves one of the highest scores on AstroBench ([Ting et al. 2024](https://arxiv.org/abs/2407.11194)): at 86.2%, it is within 0.1 percentage points of Claude-4-Opus (86.3%) and outperforms other leading models, including o3, Claude-4-Sonnet, Gemini-2.5-Pro, and GPT-4.1.
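The comparison above can be checked directly against the table. The following is a minimal sketch, not part of the AstroMLab tooling: the `scores` dictionary copies a handful of values from the table, and the helper name `margins` is a hypothetical choice made for illustration.

```python
# Scores (%) copied from the table above; only the models named in the
# surrounding paragraph are included.
scores = {
    "AstroSage-LLaMA-3.1-70B": 86.2,
    "Claude-4-Opus": 86.3,
    "o3": 85.4,
    "Claude-4-Sonnet": 85.0,
    "Gemini-2.5-Pro": 84.8,
    "GPT-4.1": 84.7,
}


def margins(reference: str, table: dict[str, float]) -> dict[str, float]:
    """Return the reference score minus every other score, in percentage points.

    `margins` is a hypothetical helper written for this example.
    """
    ref = table[reference]
    return {
        name: round(ref - score, 1)
        for name, score in table.items()
        if name != reference
    }


# Rank the models from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
gaps = margins("AstroSage-LLaMA-3.1-70B", scores)

for name in ranking:
    print(f"{name:28s} {scores[name]:5.1f}")
print(gaps)
```

A positive margin means AstroSage-LLaMA-3.1-70B scores higher than the listed model; a negative margin means it scores lower.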
## Support and Resources

Our research benefits from:

- Access to the Frontier nodes at Oak Ridge Leadership Computing Facility
- Support from Microsoft's Accelerating Foundation Models Research (AFMR) program

## Contact

For inquiries or collaboration opportunities, please contact: [email protected]