---
license: apache-2.0
language:
- en
base_model:
- prithivMLmods/Qwen3-0.6B-ft-bf16
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- code
- moe
---

# **Theta-Crucis-0.6B-Turbo1**

> **Theta-Crucis-0.6B-Turbo1** is a compact, high-performance model designed for **code generation**, **technical reasoning**, and **structured output tasks**. Fine-tuned from **Qwen3-0.6B** using the **Mixture of Thoughts (MoT)** dataset with an emphasis on **code expert clusters**, this model delivers agile and accurate coding assistance in low-resource environments. At only **0.6B parameters**, it offers strong fluency in programming, structured syntax, and technical language generation.

> [!NOTE]
> GGUF: [https://huggingface.co/prithivMLmods/Theta-Crucis-0.6B-Turbo1-GGUF](https://huggingface.co/prithivMLmods/Theta-Crucis-0.6B-Turbo1-GGUF)

---

## **Key Features**

1. **MoT Fine-Tuning on Code Expert Clusters**
   Leveraging the **Mixture of Thoughts (MoT)** dataset, this model is fine-tuned on high-quality programming data spanning multiple languages, debugging patterns, and code-reasoning structures.

2. **Turbo Code Generation & Debugging**
   Excels at generating well-structured, clean code in Python, JavaScript, C++, and more, and can explain logic, identify bugs, and suggest improvements.

3. **Structured Output Capabilities**
   Supports output in **Markdown**, **JSON**, **YAML**, and **LaTeX**, making it well suited to auto-documentation, API formatting, and configuration-file generation.

4. **Technical Fluency Across Languages**
   Handles code queries and explanations in over **20 natural languages**, enabling global developer support and multilingual documentation.

5. **Lightweight, Inference-Optimized Design**
   Suitable for deployment on **edge devices**, **laptops**, or **VRAM-limited GPUs**, with fast inference and strong accuracy on technical prompts.
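
The structured-output support described above is most reliable when the model's response is validated before downstream use. A minimal sketch of such a check, assuming the model was prompted to reply in JSON (the `response` string and the `parse_structured_output` helper here are illustrative, not part of the model's API):

```python
import json

def parse_structured_output(text: str) -> dict:
    """Extract and parse JSON from a model response, tolerating a Markdown fence."""
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]    # drop the opening fence line
        text = text.rsplit("```", 1)[0]  # drop the closing fence
    return json.loads(text)

# Illustrative response; in practice this comes from tokenizer.batch_decode(...)
response = '```json\n{"task": "build", "steps": ["lint", "test", "package"]}\n```'
config = parse_structured_output(response)
print(config["steps"])  # ['lint', 'test', 'package']
```

Failing fast on malformed JSON (`json.loads` raises `JSONDecodeError`) is usually preferable to silently passing unchecked model text into a config pipeline.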

---

## **Quickstart with Transformers**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Theta-Crucis-0.6B-Turbo1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a Python function that checks if a string is a palindrome. Explain each step."

messages = [
    {"role": "system", "content": "You are an expert code assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
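
As a reference point for the quickstart prompt above, a hand-written sketch of the kind of palindrome checker the model is expected to produce (this is not captured model output):

```python
def is_palindrome(s: str) -> bool:
    # Keep only alphanumeric characters, lowercased, so punctuation
    # and spacing do not affect the comparison.
    cleaned = "".join(ch.lower() for ch in s if ch.isalnum())
    # A palindrome reads the same forwards and backwards.
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("hello"))  # False
```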

---

## **Intended Use**

* Programming education, code synthesis, and debugging support
* Structured data and config file generation (e.g., JSON, YAML)
* Developer assistant roles in multilingual and technical environments
* Deployment on constrained devices with high code output needs
* Fast prototyping and script generation across languages

---

## **Limitations**

* May underperform on long conversational or abstract language tasks
* Limited context length restricts multi-file or large-project reasoning
* Not designed for creative writing or open-ended dialogue
* Focused on technical and structured domains; general conversational fluency is limited

---

## **References**

1. [Qwen2.5 Technical Report (2024)](https://arxiv.org/pdf/2412.15115)
2. [YaRN: Efficient Context Window Extension of Large Language Models](https://arxiv.org/pdf/2309.00071)
3. [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts)