# AstroMLab
AstroMLab is a diverse group of researchers dedicated to advancing the application of Large Language Models (LLMs) in astronomy. Our team includes:
- Leading astronomers, astrophysicists, and cosmologists.
- Natural language processing experts.
- Frontier arXivists from the NASA Astrophysics Data System.
## Objectives
- Develop specialized LLMs for astronomy
- Create open-source models for advanced research
- Facilitate LLM-driven end-to-end agentic research in astronomy
## Current Work
Our ongoing projects include:
- Curation of an astronomy-based benchmarking dataset
- Development of specialized astronomy LLMs
- Performance evaluation of models on astronomical tasks
## Models and Performance
We have developed several models, including AstroSage-LLaMA-3.1-70B ([de Haan et al. 2025b](https://arxiv.org/abs/2505.17592)), AstroSage-LLaMA-3.1-8B ([de Haan et al. 2025a](https://arxiv.org/abs/2411.09012)), AstroLLaMA-2-70B ([Pan et al. 2024](https://arxiv.org/abs/2409.19750)), and AstroLLaMA-3-8B ([Pan et al. 2024](https://arxiv.org/abs/2409.19750)). Our AstroSage models have demonstrated strong performance on astronomy Q&A tasks ([Ting et al. 2024](https://arxiv.org/abs/2407.11194)):
| Model | Score (%) |
|-------|-----------|
| Claude-4-Opus | 86.3 |
| **AstroSage-LLaMA-3.1-70B (AstroMLab)** | **86.2** |
| o3 | 85.4 |
| Claude-4-Sonnet | 85.0 |
| Gemini-2.5-Pro | 84.8 |
| GPT-4.1 | 84.7 |
| o4-Mini | 84.7 |
| Deepseek-R1 | 84.4 |
| Qwen-3-235B | 84.0 |
| LLaMA-4-Maverick | 83.4 |
| Deepseek-v3-2503 | 82.9 |
| Gemini-2.5-Flash-0520 | 82.3 |
| LLaMA-4-Scout | 82.2 |
| Mistral-Medium-v3 | 81.8 |
| Grok-3 | 81.7 |
| **AstroSage-LLaMA-3.1-8B (AstroMLab)** | **80.9** |
| Mistral-Large-v2 | 80.8 |
| Qwen-3-32B | 79.7 |
| Mistral-Small-v3.1 | 78.6 |
| Gemini-2-Flash-Lite | 78.4 |
| GPT-4.1-Nano | 78.0 |
| Gemma-3-27B | 76.9 |
| Qwen-3-14B | 76.4 |
| AstroLLaMA-2-7B | 44.3 |
As of this writing in May 2025, AstroSage-LLaMA-3.1-70B ([de Haan et al. 2025b](https://arxiv.org/abs/2505.17592)) achieves one of the highest scores on AstroBench ([Ting et al. 2024](https://arxiv.org/abs/2407.11194)), effectively tying with Claude-4-Opus and outperforming other leading models, including GPT-4.1, o3, Gemini-2.5-Pro, and Claude-4-Sonnet.
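The scores above measure accuracy on the benchmark's multiple-choice astronomy questions. As an illustration only, the sketch below shows how such a score reduces to simple per-question accuracy; the record layout and `answer_fn` are hypothetical, not the actual AstroBench schema or evaluation harness.

```python
# Illustrative sketch: a benchmark score like those in the table is accuracy
# over multiple-choice questions. The record layout and answer_fn below are
# hypothetical, not the actual AstroBench schema or evaluation harness.
def benchmark_accuracy(records, answer_fn):
    """Return the percentage of questions answered correctly.

    records: list of dicts like
        {"question": str, "choices": {"A": str, "B": str, ...}, "answer": "B"}
    answer_fn: callable (question, choices) -> predicted choice letter
    """
    correct = sum(
        answer_fn(r["question"], r["choices"]) == r["answer"] for r in records
    )
    return 100.0 * correct / len(records)
```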
![Cost and performance trade-off in AstroBench](https://cdn-uploads.huggingface.co/production/uploads/64f12d6e057f7e90416ce3c4/EW5taqz-hYtKSsFVK6xeF.png)
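Our models are available on the Hugging Face Hub. Below is a minimal sketch of querying an AstroSage model with the `transformers` library, assuming the standard LLaMA-3.1 chat interface; the repository ID shown is illustrative, so check the AstroMLab organization page for the exact model names.

```python
# Minimal sketch of running an AstroSage model with Hugging Face transformers.
# The repository ID is illustrative; verify the exact name on the AstroMLab
# organization page. device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AstroMLab/AstroSage-LLaMA-3.1-8B"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# AstroSage models are chat-tuned LLaMA-3.1 derivatives, so the tokenizer's
# chat template should produce the expected prompt format.
messages = [{"role": "user", "content": "What sets the Chandrasekhar mass limit?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```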
## Support and Resources
Our research benefits from:
- Access to the Frontier nodes at Oak Ridge Leadership Computing Facility
- Support from Microsoft's Accelerating Foundation Models Research (AFMR) program
## Contact
For inquiries or collaboration opportunities, please contact: [email protected]