---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
tags:
- mathematics
- reasoning
- multi-modal-qa
- math-qa
- figure-qa
- geometry-qa
- math-word-problem
- textbook-qa
- vqa
- geometry-diagram
- synthetic-scene
- chart
- plot
- scientific-figure
- table
- function-plot
- abstract-scene
- puzzle-test
- document-image
- science
library_name: transformers
base_model:
- OpenGVLab/InternVL2-8B
datasets:
- MathLLMs/MM-MathInstruct
---

# MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Repo: [https://github.com/mathllm/MathCoder](https://github.com/mathllm/MathCoder)

Paper: [https://huggingface.co/papers/2505.10557](https://huggingface.co/papers/2505.10557)

## Introduction

We introduce MathCoder-VL, a series of open-source large multimodal models (LMMs) tailored for general math problem solving. We also introduce [FigCodifier-8B](https://huggingface.co/MathLLMs/FigCodifier), an image-to-code model.

| Base Model | Ours |
|------------|------|
| [Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5) | [MathCoder-VL-2B](https://huggingface.co/MathLLMs/MathCoder-VL-2B) |
| [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) | [MathCoder-VL-8B](https://huggingface.co/MathLLMs/MathCoder-VL-8B) |
| [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) | [FigCodifier-8B](https://huggingface.co/MathLLMs/FigCodifier) |

## Usage

For training and inference code, please refer to [InternVL](https://github.com/OpenGVLab/InternVL).
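The dataset snippet below decodes the `image` field of an MM-MathInstruct record with `BytesIO`, which implies the field holds raw encoded image bytes. As a quick, offline sanity check of that decoding step, this sketch builds a synthetic record with the same five fields (the `record` dict is illustrative, not a real dataset row) and decodes it the same way, so it runs without downloading the dataset:

```python
from io import BytesIO

from PIL import Image

# Build a stand-in record with the same fields as an MM-MathInstruct row.
# The "image" field here holds raw PNG bytes, mirroring how the real
# snippet below decodes it with BytesIO.
buf = BytesIO()
Image.new("RGB", (64, 48), color="white").save(buf, format="PNG")
record = {
    "id": "demo-0",
    "image": buf.getvalue(),  # raw PNG bytes
    "question": "What is 2 + 3?",
    "solution": "2 + 3 = 5",
    "image_path": "demo/0.png",
}

# Decode the bytes back into a PIL image, as in the dataset snippet.
img = Image.open(BytesIO(record["image"]))
print(img.size)  # (64, 48)
```

The same `Image.open(BytesIO(...))` pattern then applies to real rows once the dataset is loaded.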
```python
from io import BytesIO

from datasets import load_dataset
from PIL import Image

mm_mathinstruct = load_dataset("MathLLMs/MM-MathInstruct")
print(mm_mathinstruct)

# Show the last image in the train split.
img = Image.open(BytesIO(mm_mathinstruct['train'][-1]['image']))
img.show()
```

It should print:

```
DatasetDict({
    train: Dataset({
        features: ['id', 'image', 'question', 'solution', 'image_path'],
        num_rows: 2871988
    })
})
```

## Motivation