NONET
NONET is a family of offline, quantized large language models fine-tuned for question answering with direct, concise answers. Designed for local execution with llama.cpp, NONET is available in multiple sizes and optimized for Android and Python-based environments.
Model Details
Model Description
NONET is intended for lightweight offline use, particularly on local devices such as mobile phones or single-board computers. The models have been fine-tuned for direct-answer QA and quantized to int8 (q8_0) using llama.cpp.
| Model Name | Base Model | Size |
|---|---|---|
| ChatNONET-135m-tuned-q8_0.gguf | SmolLM | 135M |
| ChatNONET-300m-tuned-q8_0.gguf | SmolLM | 300M |
| ChatNONET-1B-tuned-q8_0.gguf | LLaMA 3.2 | 1B |
| ChatNONET-3B-tuned-q8_0.gguf | LLaMA 3.2 | 3B |
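All four files are hosted at https://huggingface.co/McaTech/Nonet. As a minimal sketch, one of them can be fetched with the huggingface_hub library (assuming `pip install huggingface_hub`; the repo id comes from the model page URL and the filename from the table above):

```python
# Minimal sketch: download one of the quantized GGUF files from the Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="McaTech/Nonet",
    filename="ChatNONET-300m-tuned-q8_0.gguf",
)
print(model_path)  # local path to the cached GGUF file
```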
- Developed by: McaTech (Michael Cobol Agan)
- Model type: Causal decoder-only transformer
- Languages: English
- License: Apache 2.0
- Fine-tuned from:
  - SmolLM (135M, 300M variants)
  - LLaMA 3.2 (1B, 3B variants)
Uses
Direct Use
- Offline QA chatbot
- Local assistants (no internet required)
- Embedded Android or Python apps
Out-of-Scope Use
- Long-form text generation
- Tasks requiring real-time web access
- Creative storytelling or coding tasks
Bias, Risks, and Limitations
NONET may reproduce biases present in its base models or fine-tuning data. Outputs should not be relied upon for sensitive or critical decisions.
Recommendations
- Validate important responses
- Choose model size based on your device capability
- Avoid relying on it for personal or legal advice
How to Get Started with the Model
For Android Devices
- Try the Android app: Download ChatNONET APK
You can also build llama.cpp yourself and run the model from the command line:
```bash
# Clone llama.cpp and build it
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run the model in conversation mode (place the downloaded GGUF file in this directory first)
./llama-cli -m ./ChatNONET-300m-tuned-q8_0.gguf -p "You are ChatNONET AI assistant." -cnv
```
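For Python-based environments, the same GGUF files can be loaded with the llama-cpp-python bindings. The snippet below is a minimal sketch, assuming `pip install llama-cpp-python` and a locally downloaded model file; the question is illustrative:

```python
# Minimal sketch: offline QA with llama-cpp-python (illustrative paths/prompts).
from llama_cpp import Llama

llm = Llama(
    model_path="./ChatNONET-300m-tuned-q8_0.gguf",  # downloaded GGUF file
    n_ctx=2048,
    verbose=False,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are ChatNONET AI assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])
```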
Training Details
- Fine-tuning goal: Direct-answer question answering
- Precision: FP16 mixed precision
- Frameworks: PyTorch, Transformers, Bitsandbytes
- Quantization: int8 GGUF (q8_0) via llama.cpp
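As an illustration of the last point, a q8_0 GGUF file is typically produced with llama.cpp's conversion script. The sketch below shows such a pipeline under assumed paths; it is not the author's exact procedure:

```python
# Hedged sketch: convert a fine-tuned Hugging Face checkpoint to an int8
# (q8_0) GGUF file using llama.cpp's converter. Paths are illustrative.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "../chatnonet-300m",                            # fine-tuned HF checkpoint (assumed path)
        "--outfile", "ChatNONET-300m-tuned-q8_0.gguf",  # quantized output
        "--outtype", "q8_0",                            # int8 quantization
    ],
    cwd="llama.cpp",  # run from a llama.cpp checkout
    check=True,
)
```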
Evaluation
- Evaluated internally on short QA prompts
- Capable of direct factual or logical answers
- Larger models perform better on reasoning tasks
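As a rough illustration of this evaluation style, a short-QA smoke test could look like the following; the prompts are made up for the example and are not the author's internal evaluation set:

```python
# Hedged sketch: short-QA smoke test with llama-cpp-python (illustrative prompts).
from llama_cpp import Llama

llm = Llama(model_path="./ChatNONET-1B-tuned-q8_0.gguf", n_ctx=512, verbose=False)

questions = [
    "What is 12 + 30?",
    "Which planet is closest to the sun?",
]
for q in questions:
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": q}],
        max_tokens=32,
    )
    print(f"{q} -> {out['choices'][0]['message']['content'].strip()}")
```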
Technical Specifications
- Architecture:
  - SmolLM (135M, 300M)
  - LLaMA 3.2 (1B, 3B)
- Format: GGUF
- Quantization: q8_0 (int8)
- Deployment: Mobile (Android) and desktop via llama.cpp
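To verify the format and quantization of a downloaded file, the gguf Python package (maintained in the llama.cpp repository) can read GGUF metadata. A small sketch, assuming `pip install gguf`:

```python
# Sketch: list the metadata keys stored in a GGUF file.
# Standard keys include general.architecture and general.name.
from gguf import GGUFReader

reader = GGUFReader("./ChatNONET-300m-tuned-q8_0.gguf")
for name in reader.fields:
    print(name)
```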
Citation
```bibtex
@misc{chatnonet2025,
  title={ChatNONET: Offline Quantized Q&A Models},
  author={Michael Cobol Agan},
  year={2025},
  note={\url{https://huggingface.co/McaTech/Nonet}},
}
```
Contact
- Author: Michael Cobol Agan (McaTech)
- Facebook: FB Profile