# AstroMLab
AstroMLab is a diverse group of researchers dedicated to advancing the application of Large Language Models (LLMs) in astronomy. Our team includes:
- Leading astronomers, astrophysicists, and cosmologists
- Natural language processing experts
- Frontier arXivists from the NASA Astrophysics Data System
## Objectives
- Develop specialized LLMs for astronomy
- Create open-source models for advanced research
- Facilitate LLM-driven end-to-end agentic research in astronomy
## Current Work
Our ongoing projects include:
- Curation of an astronomy-based benchmarking dataset
- Development of specialized astronomy LLMs
- Performance evaluation of models on astronomical tasks
## Models and Performance
We have developed several models, including AstroSage-LLaMA-3.1-70B ([de Haan et al. 2025b](https://arxiv.org/abs/2505.17592)), AstroSage-LLaMA-3.1-8B ([de Haan et al. 2025a](https://arxiv.org/abs/2411.09012)), AstroLLaMA-2-70B ([Pan et al. 2024](https://arxiv.org/abs/2409.19750)), and AstroLLaMA-3-8B ([Pan et al. 2024](https://arxiv.org/abs/2409.19750)). Our AstroSage models have demonstrated strong performance on astronomy Q&A tasks ([Ting et al. 2024](https://arxiv.org/abs/2407.11194)):
| Model | Score (%) |
|-------|-----------|
| Claude-4-Opus | 86.3 |
| **AstroSage-LLaMA-3.1-70B (AstroMLab)** | **86.2** |
| o3 | 85.4 |
| Claude-4-Sonnet | 85.0 |
| Gemini-2.5-Pro | 84.8 |
| GPT-4.1 | 84.7 |
| o4-Mini | 84.7 |
| Deepseek-R1 | 84.4 |
| Qwen-3-235B | 84.0 |
| LLaMA-4-Maverick | 83.4 |
| Deepseek-v3-2503 | 82.9 |
| Gemini-2.5-Flash-0520 | 82.3 |
| LLaMA-4-Scout | 82.2 |
| Mistral-Medium-v3 | 81.8 |
| Grok-3 | 81.7 |
| **AstroSage-LLaMA-3.1-8B (AstroMLab)** | **80.9** |
| Mistral-Large-v2 | 80.8 |
| Qwen-3-32B | 79.7 |
| Mistral-Small-v3.1 | 78.6 |
| Gemini-2-Flash-Lite | 78.4 |
| GPT-4.1-Nano | 78.0 |
| Gemma-3-27B | 76.9 |
| Qwen-3-14B | 76.4 |
| AstroLLaMA-2-7B | 44.3 |
As of this writing in May 2025, AstroSage-LLaMA-3.1-70B ([de Haan et al. 2025b](https://arxiv.org/abs/2505.17592)) achieves one of the highest scores on AstroBench ([Ting et al. 2024](https://arxiv.org/abs/2407.11194)), essentially tying with Claude-4-Opus (86.2 vs. 86.3) and outperforming other leading models including GPT-4.1, o3, Gemini-2.5-Pro, and Claude-4-Sonnet.
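The AstroSage models can be queried like any other causal LLM released on the Hugging Face Hub. The sketch below shows one way to do this with the `transformers` library; the repo id `AstroMLab/AstroSage-8B` and the prompt wording are illustrative assumptions, so check the AstroMLab page on the Hub for the exact model name and any recommended chat template before running.

```python
def build_prompt(question: str) -> str:
    """Wrap an astronomy question in a simple instruction-style prompt.

    The exact wording is an assumption; consult the model card for the
    recommended prompt or chat template.
    """
    return (
        "You are an expert astronomy research assistant.\n"
        f"Question: {question}\n"
        "Answer:"
    )


def ask(question: str, max_new_tokens: int = 256) -> str:
    """Generate an answer with an AstroSage model (requires a GPU for 8B+)."""
    # Heavy dependencies are imported lazily so the prompt helper above
    # can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AstroMLab/AstroSage-8B"  # hypothetical repo id -- verify on the Hub
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(ask("What mechanism is thought to power Type Ia supernovae?"))
```

This keeps the prompt construction separate from model loading, which makes it easy to swap in a different AstroSage checkpoint (e.g. the 70B variant) by changing only the repo id.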

## Support and Resources
Our research benefits from:
- Access to the Frontier nodes at Oak Ridge Leadership Computing Facility
- Support from Microsoft's Accelerating Foundation Models Research (AFMR) program
## Contact
For inquiries or collaboration opportunities, please contact: [email protected]