|
--- |
|
base_model: unsloth/LFM2-1.2B |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- lfm2 |
|
license: apache-2.0 |
|
language: |
|
- en |
|
datasets: |
|
- ajibawa-2023/Software-Architecture |
|
--- |
|
|
|
# SoftwareArchitecture-Instruct-v1 |
|
|
|
<img src="banner.png" width="800"/> |
|
|
|
**Domain:** Software Architecture (for technical professionals) |
|
**Type:** Instruction-tuned LLM |
|
**Base:** LiquidAI/LFM2-1.2B (1.2B-parameter hybrid edge-optimized model)
|
**Fine-tuned on:** `ajibawa-2023/Software-Architecture` dataset |
|
**Author:** Mohamed Yasser (`yasserrmd`) |
|
|
|
--- |
|
|
|
## Model Description |
|
|
|
**SoftwareArchitecture-Instruct-v1** is an instruction-tuned adaptation of LiquidAI’s lightweight and efficient **LFM2-1.2B** model. It’s specifically tailored to deliver high-quality, accurate, and technically rich responses to questions about **software architecture**—designed with engineers and architects in mind. |
|
|
|
The base model, LFM2-1.2B, features a **16-layer hybrid design** (10 convolutional + 6 grouped query attention layers), supports a **32,768-token context**, and offers **fast inference on CPU, GPU, and NPU** platforms, making it well suited to both cloud and edge deployments.
|
|
|
--- |
|
|
|
## Benchmark Summary |
|
|
|
We performed a 50-prompt benchmark across diverse software architecture topics: |
|
|
|
| Metric | Value | |
|
|------------------------------|----------------------| |
|
| Average Words per Response | ~144 | |
|
| Median Words per Response | ~139 | |
|
| Min / Max Words per Response | 47 / 224 | |
|
| Avg Sentences per Output | ~8.6 | |
|
| Lexical Diversity (TTR) | ~0.73 | |
|
| Readability Complexity | High (professional-level) | |
|
| Accuracy (topic keyword coverage) | Majority ≥ 60% | |
|
| Off-topic Responses | None detected | |
|
|
|
**Interpretation:** |
|
- Responses are **substantive and domain-appropriate** for technical audiences. |
|
- Coverage is strong; while a few answers could benefit from including extra keywords, the core technical content is accurate.
|
- Readability intentionally leans into complexity, aligning with expert users. |
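
The benchmark's evaluation script was not released, but the surface metrics in the table can be approximated with a short sketch like the one below. The word and sentence regexes here are assumptions for illustration, not the benchmark's exact rules:

```python
import re

def response_metrics(text: str) -> dict:
    """Compute word count, sentence count, and type-token ratio (TTR)."""
    words = re.findall(r"[A-Za-z0-9'-]+", text.lower())
    # Naive sentence splitter: terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    ttr = len(set(words)) / len(words) if words else 0.0
    return {"words": len(words), "sentences": len(sentences), "ttr": round(ttr, 2)}

sample = "CQRS separates reads from writes. Commands mutate state. Queries never do."
print(response_metrics(sample))  # → {'words': 11, 'sentences': 3, 'ttr': 1.0}
```

Running a function like this over all 50 benchmark responses and averaging gives figures comparable to the table above.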
|
|
|
--- |
|
|
|
## Intended Use |
|
|
|
- **Ideal for:** Software architects, system designers, engineering leads, and experienced developers seeking architecture guidance. |
|
- **Use cases include:** |
|
- Exploring architectural patterns (e.g., CQRS, Saga, API Gateway). |
|
- Drafting design docs and decision rationale. |
|
- Architectural interview prep and system design walkthroughs. |
|
|
|
**Not intended for:** |
|
- Non-technical or general-purpose Q&A. |
|
- In-depth code generation or debugging without architectural focus. |
|
|
|
--- |
|
|
|
## Usage Example |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model_name = "yasserrmd/SoftwareArchitecture-Instruct-v1" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto") |
|
|
|
messages = [ |
|
{"role": "user", "content": "Explain the Saga pattern with orchestration and choreography."} |
|
] |
|
|
|
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return a dict so it can be unpacked into generate()
    return_tensors="pt",
).to(model.device)
|
|
|
outputs = model.generate( |
|
**inputs, |
|
max_new_tokens=256, |
|
temperature=0.3, |
|
repetition_penalty=1.05 |
|
) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
```
|
|
|
--- |
|
|
|
## Training Details
|
|
|
* **Base model:** `LiquidAI/LFM2-1.2B`, optimized for edge/CPU inference
|
* **Dataset:** `ajibawa-2023/Software-Architecture`
|
* **Fine-tuning:** Supervised instruction tuning |
|
* **Hyperparameters:** epochs, learning rate, and hardware are not specified
|
|
|
--- |
|
|
|
## Limitations
|
|
|
* **Answer length is capped** by `max_new_tokens`. Some responses may truncate mid-explanation; raising this limit improves completeness.
|
* **Keyword coverage is strong but not exhaustive.** A few responses could benefit from enriching with additional terms. |
|
* **Not a replacement** for expert-reviewed architectural validation—use as a support tool, not the final authority. |
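
As a rough aid for the truncation issue above, a simple heuristic can flag responses that likely hit the `max_new_tokens` cap. This is an illustrative sketch, not part of the model or benchmark:

```python
def looks_truncated(text: str) -> bool:
    """Heuristic: a complete answer usually ends with sentence-final
    punctuation or a closing code fence; anything else suggests generation
    stopped mid-sentence at the max_new_tokens cap."""
    tail = text.rstrip()
    return not tail.endswith((".", "!", "?", "```"))

print(looks_truncated("The Saga pattern coordinates distributed transactions."))  # False
print(looks_truncated("Orchestration relies on a central coordinator that"))      # True
```

If a response is flagged, re-running generation with a larger `max_new_tokens` (e.g. 512) usually yields a complete answer.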
|
|
|
--- |
|
|
|
## License
|
|
|
* **Base model license:** LFM Open License v1.0
|
* **Dataset license:** (Insert dataset license if known) |
|
|
|
--- |
|
|
|
## Author |
|
|
|
Mohamed Yasser – [Hugging Face profile](https://huggingface.co/yasserrmd) |