|
--- |
|
base_model: unsloth/LFM2-1.2B |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- lfm2 |
|
license: apache-2.0 |
|
language: |
|
- en |
|
datasets: |
|
- ajibawa-2023/Software-Architecture |
|
--- |
|
|
|
# SoftwareArchitecture-Instruct-v1 |
|
|
|
<img src="banner.png" width="800"/> |
|
|
|
**Domain:** Software Architecture (for technical professionals) |
|
**Type:** Instruction-tuned LLM |
|
**Base:** LiquidAI/LFM2-1.2B (1.2B-parameter hybrid edge-optimized model)
|
**Fine-tuned on:** `ajibawa-2023/Software-Architecture` dataset |
|
**Author:** Mohamed Yasser (`yasserrmd`) |
|
|
|
--- |
|
|
|
## Model Description |
|
|
|
**SoftwareArchitecture-Instruct-v1** is an instruction-tuned adaptation of LiquidAI’s lightweight and efficient **LFM2-1.2B** model. It’s specifically tailored to deliver high-quality, accurate, and technically rich responses to questions about **software architecture**—designed with engineers and architects in mind. |
|
|
|
The base model, LFM2-1.2B, features a **16-layer hybrid design** (10 convolutional + 6 grouped query attention layers), supports a **32,768-token context**, and offers **fast inference on CPU, GPU, and NPU** platforms, making it well suited to both cloud and edge deployments.
|
|
|
--- |
|
|
|
## Benchmark Summary |
|
|
|
We performed a 50-prompt benchmark across diverse software architecture topics: |
|
|
|
| Metric | Value | |
|
|------------------------------|----------------------| |
|
| Average Words per Response | ~144 | |
|
| Median Words per Response | ~139 | |
|
| Min / Max Words per Response | 47 / 224 | |
|
| Avg Sentences per Output | ~8.6 | |
|
| Lexical Diversity (TTR) | ~0.73 | |
|
| Readability Complexity | High (professional-level) | |
|
| Accuracy (topic keyword coverage) | Majority ≥ 60% | |
|
| Off-topic Responses | None detected | |
|
|
|
**Interpretation:** |
|
- Responses are **substantive and domain-appropriate** for technical audiences. |
|
- Coverage is strong; while a few answers could benefit from including extra keywords, the core technical content is accurate.
|
- Readability intentionally leans into complexity, aligning with expert users. |
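
The benchmark's evaluation script was not released, but the surface metrics in the table can be approximated with a short sketch like the one below. The word and sentence regexes here are assumptions for illustration, not the benchmark's exact rules:

```python
import re

def response_metrics(text: str) -> dict:
    """Compute word count, sentence count, and type-token ratio (TTR)."""
    words = re.findall(r"[A-Za-z0-9'-]+", text.lower())
    # Naive sentence splitter: terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    ttr = len(set(words)) / len(words) if words else 0.0
    return {"words": len(words), "sentences": len(sentences), "ttr": round(ttr, 2)}

sample = "CQRS separates reads from writes. Commands mutate state. Queries never do."
print(response_metrics(sample))  # → {'words': 11, 'sentences': 3, 'ttr': 1.0}
```

Running a function like this over all 50 benchmark responses and averaging gives figures comparable to the table above.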
|
|
|
--- |
|
|
|
## Intended Use |
|
|
|
- **Ideal for:** Software architects, system designers, engineering leads, and experienced developers seeking architecture guidance. |
|
- **Use cases include:** |
|
- Exploring architectural patterns (e.g., CQRS, Saga, API Gateway). |
|
- Drafting design docs and decision rationale. |
|
- Architectural interview prep and system design walkthroughs. |
|
|
|
**Not intended for:** |
|
- Non-technical or general-purpose Q&A. |
|
- In-depth code generation or debugging without architectural focus. |
|
|
|
--- |
|
|
|
## Usage Example |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model_name = "yasserrmd/SoftwareArchitecture-Instruct-v1" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto") |
|
|
|
messages = [ |
|
{"role": "user", "content": "Explain the Saga pattern with orchestration and choreography."} |
|
] |
|
|
|
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return a dict so it can be unpacked into generate()
    return_tensors="pt",
).to(model.device)
|
|
|
outputs = model.generate( |
|
**inputs, |
|
max_new_tokens=256, |
|
temperature=0.3, |
|
repetition_penalty=1.05 |
|
) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
```
|
|
|
--- |
|
|
|
## Training Details
|
|
|
* **Base model:** `LiquidAI/LFM2-1.2B`, optimized for edge/CPU inference
|
* **Dataset:** `ajibawa-2023/Software-Architecture`
|
* **Fine-tuning:** Supervised instruction tuning |
|
* **Hyperparameters:** epochs, learning rate, and hardware are not specified
|
|
|
--- |
|
|
|
## Limitations
|
|
|
* **Answer length is capped** by `max_new_tokens`. Some responses may truncate mid-explanation; raising this limit improves completeness.
|
* **Keyword coverage is strong but not exhaustive.** A few responses could benefit from enriching with additional terms. |
|
* **Not a replacement** for expert-reviewed architectural validation—use as a support tool, not the final authority. |
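
As a rough aid for the truncation issue above, a simple heuristic can flag responses that likely hit the `max_new_tokens` cap. This is an illustrative sketch, not part of the model or benchmark:

```python
def looks_truncated(text: str) -> bool:
    """Heuristic: a complete answer usually ends with sentence-final
    punctuation or a closing code fence; anything else suggests generation
    stopped mid-sentence at the max_new_tokens cap."""
    tail = text.rstrip()
    return not tail.endswith((".", "!", "?", "```"))

print(looks_truncated("The Saga pattern coordinates distributed transactions."))  # False
print(looks_truncated("Orchestration relies on a central coordinator that"))      # True
```

If a response is flagged, re-running generation with a larger `max_new_tokens` (e.g. 512) usually yields a complete answer.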
|
|
|
--- |
|
|
|
## License
|
|
|
* **Base model license:** LFM Open License v1.0
|
* **Dataset license:** (Insert dataset license if known) |
|
|
|
--- |
|
|
|
## Author |
|
|
|
Mohamed Yasser – [Hugging Face profile](https://huggingface.co/yasserrmd) |