AstroMLab

AstroMLab is a diverse group of researchers dedicated to advancing the application of Large Language Models (LLMs) in astronomy, with the goal of expediting scientific discovery through LLM-driven research. Our team includes:

  • Leading astronomers, astrophysicists, and cosmologists
  • Natural language processing experts
  • Frontier arXivists from the NASA Astrophysics Data System

Objectives

  • Develop specialized LLMs for astronomy
  • Create open-source models for advanced research
  • Facilitate LLM-driven end-to-end agentic research in astronomy

Current Work

Our ongoing projects include:

  • Curation of an astronomy-based benchmarking dataset
  • Development of specialized astronomy LLMs
  • Performance evaluation of models on astronomical tasks (see the sketch after this list)
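
To make the evaluation item above concrete, the sketch below shows the simplest form such an evaluation can take: grading a model's multiple-choice answers against an answer key. The data and function names are hypothetical illustrations, not AstroMLab's actual benchmark harness.

```python
# Minimal sketch of multiple-choice grading; the data and names are
# hypothetical and do not reflect AstroMLab's actual evaluation pipeline.

def grade(predictions, answer_key):
    """Return the fraction of multiple-choice answers (e.g. 'A'-'D') matching the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

if __name__ == "__main__":
    # Toy example: three questions, two answered correctly.
    model_answers = ["A", "C", "B"]
    answer_key = ["A", "C", "D"]
    print(f"Accuracy: {grade(model_answers, answer_key):.1%}")  # Accuracy: 66.7%
```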

Models and Performance

We have developed several models, including AstroSage-LLaMA-3.1-70B (de Haan et al. 2025b), AstroSage-LLaMA-3.1-8B (de Haan et al. 2025a), AstroLLaMA-2-70B (Pan et al. 2024), and AstroLLaMA-3-8B (Pan et al. 2024). Our AstroSage models have demonstrated strong performance on astronomy Q&A tasks (Ting et al. 2024):

Model                                   Score (%)
AstroSage-LLaMA-3.1-70B (AstroMLab)     86.2
Claude-4-Opus                           86.3
o3                                      85.4
Claude-4-Sonnet                         85.0
GPT-4.1                                 84.7
o4-Mini                                 84.7
Gemini-2.5-Pro                          84.8
Deepseek-R1                             84.4
Qwen-3-235B                             84.0
LLaMA-4-Maverick                        83.4
Deepseek-v3-2503                        82.9
Gemini-2.5-Flash-0520                   82.3
LLaMA-4-Scout                           82.2
Grok-3                                  81.7
Mistral-Medium-v3                       81.8
AstroSage-LLaMA-3.1-8B (AstroMLab)      80.9
Mistral-Large-v2                        80.8
Qwen-3-32B                              79.7
Mistral-Small-v3.1                      78.6
GPT-4.1-Nano                            78.0
Gemini-2-Flash-Lite                     78.4
Gemma-3-27B                             76.9
Qwen-3-14B                              76.4
AstroLLaMA-2-7B                         44.3

As of this writing in May 2025, AstroSage-LLaMA-3.1-70B (de Haan et al. 2025b) achieves one of the highest scores on AstroBench (Ting et al. 2024), effectively tying with Claude-4-Opus (86.2% vs. 86.3%) and outperforming other leading models such as GPT-4.1, o3, Gemini-2.5-Pro, and Claude-4-Sonnet.

Figure: Cost and performance trade-off on AstroBench.
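
For readers who want to try these models, the snippet below is a minimal sketch of querying the 8B model with the Hugging Face transformers library. It assumes the model is published under a repository id such as AstroMLab/AstroSage-8B; check the organization's model listing for the exact name and any recommended prompt or chat template.

```python
# Minimal sketch of querying an AstroSage model with Hugging Face transformers.
# The repository id "AstroMLab/AstroSage-8B" is an assumption; confirm the exact
# name (and any chat template) on the organization's model page.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="AstroMLab/AstroSage-8B",  # assumed repository id
    device_map="auto",               # needs `accelerate`; omit to run on a single device
)

prompt = "Explain why Type Ia supernovae can be used as standardizable candles."
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```

The 70B variant can be loaded the same way, but it requires substantially more GPU memory or multi-GPU sharding.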

Support and Resources

Our research benefits from:

  • Access to the Frontier supercomputer at the Oak Ridge Leadership Computing Facility
  • Support from Microsoft's Accelerating Foundation Models Research (AFMR) program

Contact

For inquiries or collaboration opportunities, please contact: [email protected]