# Evaluation on STEM Benchmarks
To test Minerva's quantitative reasoning abilities we evaluated the model on STEM benchmarks ranging in difficulty from grade school level problems to graduate level coursework.
1. MATH: High school math competition level problems.
2. MMLU-STEM: A subset of the Massive Multitask Language Understanding benchmark focused on STEM, covering topics such as engineering, chemistry, math, and physics at the high school and college level.
3. GSM8k: Grade school level math problems involving basic arithmetic operations that should all be solvable by a talented middle school student.
We also evaluated Minerva on OCWCourses, a collection of college and graduate level problems covering a variety of STEM topics such as solid state chemistry, astronomy, differential equations, and special relativity that we collected from MIT OpenCourseWare.
In all cases, Minerva obtains state-of-the-art results, sometimes by a wide margin.
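For concreteness, the sketch below shows how final-answer accuracy on a benchmark like GSM8k could be scored. It is a minimal illustration, not Minerva's actual evaluation pipeline (which involves its own prompting and answer normalization); the `gsm8k` dataset identifier on the Hugging Face Hub is a reasonable assumption, and `generate_solution()` is a hypothetical stand-in for the model being evaluated.

```python
# Minimal sketch of final-answer accuracy scoring on GSM8k.
# Assumptions: the Hugging Face "gsm8k" dataset, and a hypothetical
# generate_solution() wrapping the model under evaluation.
import re
from datasets import load_dataset


def final_answer(text: str) -> str:
    """Extract the last number-like token; GSM8k gold answers follow '####'."""
    if "####" in text:
        text = text.split("####")[-1]
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return matches[-1].replace(",", "") if matches else ""


def generate_solution(question: str) -> str:
    """Hypothetical stand-in for the model being evaluated (not from the source)."""
    raise NotImplementedError


def evaluate(limit: int = 100) -> float:
    """Score exact-match accuracy of extracted final answers on the test split."""
    test = load_dataset("gsm8k", "main", split="test").select(range(limit))
    correct = 0
    for example in test:
        prediction = final_answer(generate_solution(example["question"]))
        gold = final_answer(example["answer"])
        correct += int(prediction == gold)
    return correct / len(test)
```

A similar exact-match loop could be adapted to the other benchmarks, with answer extraction tailored to each dataset's answer format.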
Reference: https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html