library_name: transformers
license: apache-2.0
datasets:
- crumb/askmistral-pile-2-15
language:
- en
# Model Card for crumb/nano-mistral

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** me
- **Model type:** Mistral
- **Language(s) (NLP):** en
- **License:** apache-2.0
## Uses

General web-text completion at extremely low resource use.

### Out-of-Scope Use

This is not an instruction-tuned model.

## Bias, Risks, and Limitations

The model was trained on web text. The data was filtered, but there is no guarantee that it contains no toxic content.

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(["Once upon a time,"], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Sample a continuation; the inputs dict is unpacked into keyword arguments
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_k=20, do_sample=True)
outputs = tokenizer.batch_decode(outputs)
for text in outputs:
    print(text)
```
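
The `temperature`, `top_k`, and `do_sample` arguments in the snippet above control how each next token is drawn. The following is a toy sketch in plain Python of that idea, not the Transformers implementation: the helper name `sample_top_k` and the five-entry logits list are made up for illustration.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=20, rng=random):
    """Draw one index from `logits` using temperature scaling and top-k filtering."""
    # Keep only the indices of the `top_k` largest logits
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = [logits[i] / temperature for i in kept]
    # Softmax over the surviving logits (subtract the max for numerical stability)
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample one of the kept indices according to those probabilities
    return rng.choices(kept, weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary
rng = random.Random(0)
draws = [sample_top_k(toy_logits, temperature=0.7, top_k=2, rng=rng) for _ in range(1000)]
print({index: draws.count(index) for index in sorted(set(draws))})
```

With `top_k=2` only the two highest-scoring indices can ever be drawn, and the lower temperature shifts more of the probability mass onto the top one.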
## Training Details

### Training Data

[crumb/askmistral-pile-2-15](https://huggingface.co/datasets/crumb/askmistral-pile-2-15)

### Training Procedure

| Parameter | Value |
| --- | --- |
| Context Length | 2048 |
| Batch Size | 128 |
| Learning Rate | 6e-4 |
| Scheduler | One-Cycle |
| Adam eps | 1e-8 |
| Adam beta1 | 0.9 |
| Adam beta2 | 0.95 |
| Weight Decay | 0.1 |
| Max Grad Norm | 1.0 |
| Optimizer | adamw_torch |
| Tokens | 3,401,640,960 |
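
The table lists a One-Cycle scheduler with a learning rate of 6e-4 but does not give the warm-up fraction or the start and end rates. The sketch below shows the general shape of such a schedule under stated assumptions: cosine annealing and the default shape parameters of PyTorch's `OneCycleLR` (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`). Those values, the helper name `one_cycle_lr`, and `total_steps` are illustrative and may not match the actual run.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=6e-4, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a simplified two-phase one-cycle schedule."""
    initial_lr = max_lr / div_factor          # rate at step 0
    min_lr = initial_lr / final_div_factor    # rate at the final step
    warmup_steps = pct_start * total_steps
    if step < warmup_steps:
        # Phase 1: rise from initial_lr to max_lr
        progress = step / warmup_steps
        start, end = initial_lr, max_lr
    else:
        # Phase 2: decay from max_lr to min_lr
        progress = (step - warmup_steps) / (total_steps - warmup_steps)
        start, end = max_lr, min_lr
    # Cosine interpolation between start (progress=0) and end (progress=1)
    return end + (start - end) * (1 + math.cos(math.pi * progress)) / 2


total_steps = 1000  # hypothetical; the real step count is not stated in the card
schedule = [one_cycle_lr(s, total_steps) for s in range(total_steps + 1)]
print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```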
#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision
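
For readers unfamiliar with the format: bfloat16 keeps the 8 exponent bits of float32 but only 7 explicit mantissa bits, so values near 1.0 are spaced 2^-7 apart. The sketch below emulates float32-to-bf16 rounding in plain Python to show what that spacing means for small increments. The helper `to_bf16` is hypothetical, handles only ordinary finite values (no NaN handling), and says nothing about how the custom trainer treated its weights.

```python
import struct


def to_bf16(x):
    """Round a Python float to the nearest bfloat16 value, returned as a float."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that bf16 drops
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


weight = 1.0
update = 6e-4  # same magnitude as the peak learning rate, chosen only for illustration
after = to_bf16(weight + update)
print(after == weight)  # the increment is smaller than half the bf16 spacing near 1.0
```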
#### Speeds, Sizes, Times

- train_runtime: 62541.9424 (seconds)
- train_samples_per_second: 26.557
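
These two figures can be cross-checked against the token count and context length from the training table. The sketch below is a back-of-the-envelope calculation that assumes `train_runtime` is in seconds and that both A6000 GPUs listed under Hardware were in use for the whole run; neither is stated explicitly in the card.

```python
train_runtime_s = 62541.9424   # train_runtime, assumed to be in seconds
samples_per_second = 26.557    # train_samples_per_second
total_tokens = 3_401_640_960   # "Tokens" row of the training table
context_length = 2048          # "Context Length" row of the training table
num_gpus = 2                   # assumption: both A6000s were used throughout

wall_clock_hours = train_runtime_s / 3600
gpu_hours = wall_clock_hours * num_gpus
tokens_per_second = total_tokens / train_runtime_s
# Independent estimate of the token count from the sample throughput
tokens_from_samples = samples_per_second * train_runtime_s * context_length

print(f"wall-clock hours: {wall_clock_hours:.2f}")
print(f"GPU-hours:        {gpu_hours:.2f}")
print(f"tokens/second:    {tokens_per_second:,.0f}")
print(f"tokens (samples x context): {tokens_from_samples:,.0f}")
```

Under these assumptions the GPU-hours figure lands within rounding of the 34.74 hours reported under Environmental Impact, and the sample-based token estimate agrees with the table to well under 0.1%.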
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Held-out set of [crumb/askmistral-pile-2-15](https://huggingface.co/datasets/crumb/askmistral-pile-2-15)

#### Metrics

Open LLM Leaderboard evaluation datasets and settings.

### Results

Open LLM Leaderboard mean score and standard error: (29.30, 0.42)

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| arc_challenge | 1 | none | 25 | acc | 0.1843 | ± | 0.0113 |
| | | none | 25 | acc_norm | 0.2167 | ± | 0.0120 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.4719 | ± | 0.0156 |
| winogrande | 1 | none | 5 | acc | 0.517 | ± | 0.014 |
| hellaswag | 1 | none | 10 | acc | 0.2803 | ± | 0.0045 |
| | | none | 10 | acc_norm | 0.2886 | ± | 0.0045 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.0008 | ± | 0.0008 |
| | | flexible-extract | 5 | exact_match | 0.0099 | ± | 0.0027 |

#### MMLU

value, stderr = (0.253980701754386, 0.004428598058450528)

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| world_religions | 0 | none | 5 | acc | 0.2222 | ± | 0.0319 |
| virology | 0 | none | 5 | acc | 0.2711 | ± | 0.0346 |
| us_foreign_policy | 0 | none | 5 | acc | 0.3300 | ± | 0.0473 |
| sociology | 0 | none | 5 | acc | 0.2388 | ± | 0.0301 |
| security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
| public_relations | 0 | none | 5 | acc | 0.2273 | ± | 0.0401 |
| professional_psychology | 0 | none | 5 | acc | 0.2484 | ± | 0.0175 |
| professional_medicine | 0 | none | 5 | acc | 0.4596 | ± | 0.0303 |
| professional_law | 0 | none | 5 | acc | 0.2464 | ± | 0.0110 |
| professional_accounting | 0 | none | 5 | acc | 0.2021 | ± | 0.0240 |
| prehistory | 0 | none | 5 | acc | 0.2130 | ± | 0.0228 |
| philosophy | 0 | none | 5 | acc | 0.2219 | ± | 0.0236 |
| nutrition | 0 | none | 5 | acc | 0.2157 | ± | 0.0236 |
| moral_scenarios | 0 | none | 5 | acc | 0.2380 | ± | 0.0142 |
| moral_disputes | 0 | none | 5 | acc | 0.2486 | ± | 0.0233 |
| miscellaneous | 0 | none | 5 | acc | 0.2516 | ± | 0.0155 |
| medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
| marketing | 0 | none | 5 | acc | 0.2265 | ± | 0.0274 |
| management | 0 | none | 5 | acc | 0.1748 | ± | 0.0376 |
| machine_learning | 0 | none | 5 | acc | 0.3125 | ± | 0.0440 |
| logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
| jurisprudence | 0 | none | 5 | acc | 0.2315 | ± | 0.0408 |
| international_law | 0 | none | 5 | acc | 0.3140 | ± | 0.0424 |
| human_sexuality | 0 | none | 5 | acc | 0.2519 | ± | 0.0381 |
| human_aging | 0 | none | 5 | acc | 0.3049 | ± | 0.0309 |
| high_school_world_history | 0 | none | 5 | acc | 0.2658 | ± | 0.0288 |
| high_school_us_history | 0 | none | 5 | acc | 0.2451 | ± | 0.0302 |
| high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
| high_school_psychology | 0 | none | 5 | acc | 0.1963 | ± | 0.0170 |
| high_school_physics | 0 | none | 5 | acc | 0.3046 | ± | 0.0376 |
| high_school_microeconomics | 0 | none | 5 | acc | 0.2773 | ± | 0.0291 |
| high_school_mathematics | 0 | none | 5 | acc | 0.2667 | ± | 0.0270 |
| high_school_macroeconomics | 0 | none | 5 | acc | 0.2667 | ± | 0.0224 |
| high_school_government_and_politics | 0 | none | 5 | acc | 0.2591 | ± | 0.0316 |
| high_school_geography | 0 | none | 5 | acc | 0.2424 | ± | 0.0305 |
| high_school_european_history | 0 | none | 5 | acc | 0.2242 | ± | 0.0326 |
| high_school_computer_science | 0 | none | 5 | acc | 0.2800 | ± | 0.0451 |
| high_school_chemistry | 0 | none | 5 | acc | 0.2857 | ± | 0.0318 |
| high_school_biology | 0 | none | 5 | acc | 0.3129 | ± | 0.0264 |
| global_facts | 0 | none | 5 | acc | 0.1500 | ± | 0.0359 |
| formal_logic | 0 | none | 5 | acc | 0.1905 | ± | 0.0351 |
| elementary_mathematics | 0 | none | 5 | acc | 0.2513 | ± | 0.0223 |
| electrical_engineering | 0 | none | 5 | acc | 0.2759 | ± | 0.0372 |
| econometrics | 0 | none | 5 | acc | 0.2456 | ± | 0.0405 |
| conceptual_physics | 0 | none | 5 | acc | 0.2638 | ± | 0.0288 |
| computer_security | 0 | none | 5 | acc | 0.1800 | ± | 0.0386 |
| college_physics | 0 | none | 5 | acc | 0.2549 | ± | 0.0434 |
| college_medicine | 0 | none | 5 | acc | 0.2023 | ± | 0.0306 |
| college_mathematics | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
| college_computer_science | 0 | none | 5 | acc | 0.2700 | ± | 0.0446 |
| college_chemistry | 0 | none | 5 | acc | 0.2500 | ± | 0.0435 |
| college_biology | 0 | none | 5 | acc | 0.2222 | ± | 0.0348 |
| clinical_knowledge | 0 | none | 5 | acc | 0.2377 | ± | 0.0262 |
| business_ethics | 0 | none | 5 | acc | 0.2100 | ± | 0.0409 |
| astronomy | 0 | none | 5 | acc | 0.1776 | ± | 0.0311 |
| anatomy | 0 | none | 5 | acc | 0.2593 | ± | 0.0379 |
| abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |

#### Summary
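
The card does not say how the headline mean and standard error were aggregated. One combination that reproduces the reported (29.30, 0.42) from the tables above is sketched here, and it should be read as an assumption rather than the documented method: average arc_challenge `acc_norm`, hellaswag `acc_norm`, truthfulqa_mc2 `acc`, winogrande `acc`, gsm8k `flexible-extract`, and the MMLU aggregate, and combine the standard errors as if the six tasks were independent.

```python
import math

# (value, stderr) pairs copied from the results tables above
scores = {
    "arc_challenge (acc_norm)": (0.2167, 0.0120),
    "hellaswag (acc_norm)": (0.2886, 0.0045),
    "truthfulqa_mc2 (acc)": (0.4719, 0.0156),
    "winogrande (acc)": (0.517, 0.014),
    "gsm8k (flexible-extract)": (0.0099, 0.0027),
    "mmlu (aggregate)": (0.253980701754386, 0.004428598058450528),
}

n = len(scores)
# Unweighted mean over the six tasks
mean = sum(value for value, _ in scores.values()) / n
# Standard error of that mean, assuming independent per-task errors
stderr = math.sqrt(sum(err ** 2 for _, err in scores.values())) / n

print(f"mean = {100 * mean:.2f}, stderr = {100 * stderr:.2f}")
# prints "mean = 29.30, stderr = 0.42"
```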
## Model Examination

It's ok.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A6000
- **Hours used:** 34.74
- **Cloud Provider:** n/a
- **Compute Region:** Iowa
- **Carbon Emitted:** 4.5 kg CO2eq
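
Estimates of this kind are roughly power draw times hours times the carbon intensity of the local grid. The sketch below works backwards from the figures in the list to the carbon intensity they imply. It assumes the RTX A6000's rated 300 W board power and that "Hours used" means total GPU-hours; the card states neither, so the result is indicative only.

```python
gpu_power_kw = 0.300      # assumption: RTX A6000 rated board power of 300 W
gpu_hours = 34.74         # "Hours used"
carbon_emitted_kg = 4.5   # "Carbon Emitted"

# Energy drawn by the GPUs over the run
energy_kwh = gpu_power_kw * gpu_hours
# Grid carbon intensity implied by the reported emissions
implied_intensity = carbon_emitted_kg / energy_kwh  # kg CO2eq per kWh

print(f"energy: {energy_kwh:.2f} kWh")
print(f"implied carbon intensity: {implied_intensity:.3f} kg CO2eq/kWh")
```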
## Technical Specifications

### Model Architecture and Objective

Mistral architecture, trained with a causal language modelling objective.

### Compute Infrastructure

#### Hardware

Lambda Vector with 2x A6000

#### Software

Hugging Face Transformers, PyTorch, and a custom trainer