|
--- |
|
license: apache-2.0 |
|
base_model: Qwen/Qwen2.5-Coder-7B-Instruct |
|
tags: |
|
- generated_from_trainer |
|
- x3d |
|
- 3d-generation |
|
- lora |
|
- code-generation |
|
datasets: |
|
- stratplans/savage-x3d-generation |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# X3D Generation Model - Qwen2.5-Coder-7B LoRA |
|
|
|
This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for generating X3D (Extensible 3D) scene descriptions from natural language prompts. |
|
|
|
## Model Description |
|
|
|
This model generates syntactically valid and semantically meaningful X3D scene descriptions from natural language prompts. X3D is an ISO-standard XML-based format for representing 3D graphics, widely used in simulation, scientific visualization, and web-based 3D applications. |
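
For readers new to the format, here is a minimal hand-written X3D scene of the kind the model is trained to emit (illustrative only, not an actual model output); it describes a red sphere of radius 2, matching one of the example prompts below:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<X3D profile="Immersive" version="3.3">
  <Scene>
    <Shape>
      <Appearance>
        <Material diffuseColor="1 0 0"/>
      </Appearance>
      <Sphere radius="2"/>
    </Shape>
  </Scene>
</X3D>
```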
|
|
|
### Key Features |
|
- Generates valid X3D XML code from natural language descriptions |
|
- Trained on 19,712 instruction-response pairs derived from the Naval Postgraduate School Savage X3D Archive |
|
- Uses LoRA (Low-Rank Adaptation) for efficient fine-tuning |
|
- Compatible with 4-bit quantization for reduced memory usage
|
|
|
## Training Details |
|
|
|
### Dataset |
|
- **Source**: Naval Postgraduate School (NPS) Savage X3D Archive |
|
- **Original models**: 1,232 unique X3D files
|
- **Augmented dataset**: 19,712 instruction-response pairs |
|
- **Categories**: Military equipment, vehicles, buildings, terrain, humanoids, and abstract geometries |
|
|
|
### Model Architecture |
|
- **Base Model**: Qwen2.5-Coder-7B-Instruct (7.7B parameters) |
|
- **Fine-tuning Method**: LoRA with 4-bit quantization |
|
- **LoRA Configuration** (mirrored in the sketch after this list):
|
- Rank: 32 |
|
- Alpha: 64 |
|
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
|
- Trainable parameters: 80.7M (1.05% of total) |
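
For reference, the hyperparameters above translate into a `peft` `LoraConfig` along these lines (a sketch: `lora_dropout` and `task_type` are not stated in this card and are assumptions):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                   # LoRA rank, as listed above
    lora_alpha=64,          # LoRA scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,      # assumption: not stated in this card
    task_type="CAUSAL_LM",  # assumption: standard for decoder-only models
)
```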
|
|
|
### Training Configuration |
|
- **Hardware**: 5x NVIDIA RTX 4090 GPUs (24GB VRAM each) |
|
- **Training time**: 11.5 hours |
|
- **Epochs**: 3 |
|
- **Effective batch size**: 80 (see the sketch after this list)
|
- **Learning rate**: 2e-4 with cosine decay |
|
- **Final training loss**: 0.0086 |
|
- **Final validation loss**: 0.0112 |
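
In `transformers`, these settings correspond to a `TrainingArguments` sketch like the one below. Only the effective batch size across the 5 GPUs is given above, so the per-device batch size and gradient-accumulation split (2 × 8 × 5 = 80) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="x3d-qwen2.5-coder-7b-lora",
    num_train_epochs=3,
    per_device_train_batch_size=2,   # assumption: 2 * 8 accumulation * 5 GPUs = 80
    gradient_accumulation_steps=8,   # assumption, see above
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    bf16=True,                       # assumption: typical on RTX 4090s
)
```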
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
```bash
pip install transformers peft accelerate bitsandbytes
```
|
|
|
### Loading the Model |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit quantization (requires bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "stratplans/x3d-qwen2.5-coder-7b-lora")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("stratplans/x3d-qwen2.5-coder-7b-lora")

# Build a ChatML prompt and generate X3D
prompt = """<|im_start|>system
You are an X3D 3D model generator. Generate valid X3D XML code based on the user's description.
<|im_end|>
<|im_start|>user
Create an X3D model of a red sphere with radius 2 units
<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,       # temperature only takes effect when sampling
    temperature=0.7,
)
# Decode only the newly generated tokens, not the echoed prompt
x3d_code = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(x3d_code)
```
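
Because the model emits XML, a quick well-formedness check with Python's standard library can filter out bad generations before they reach a viewer (a minimal sketch; `x3d_code` is the string produced above, and files relying on external DTDs or entity definitions may need a fuller parser):

```python
import xml.etree.ElementTree as ET

def is_well_formed(x3d_code: str) -> bool:
    """Return True if the generated X3D parses as well-formed XML."""
    try:
        ET.fromstring(x3d_code)
        return True
    except ET.ParseError:
        return False

if not is_well_formed(x3d_code):
    print("Generated X3D is not well-formed; consider regenerating.")
```

As an alternative to writing the ChatML markers by hand, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` builds the same prompt from a list of role/content dictionaries.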
|
|
|
### Example Prompts |
|
|
|
1. "Create an X3D model of a blue cube with metallic surface" |
|
2. "Generate an X3D scene with a rotating pyramid" |
|
3. "Build an X3D model of a simple robot with movable joints" |
|
4. "Design an X3D terrain with hills and valleys" |
|
|
|
## Performance |
|
|
|
- **Generation speed**: ~50 tokens/second on a single RTX 4090

- **Memory requirement**: ~8 GB of VRAM for inference with 4-bit quantization

- **Validity rate**: an estimated 85% of first-pass generations are syntactically valid X3D

- **Semantic accuracy**: output follows the input specification in 70% of test cases
|
|
|
## Limitations |
|
|
|
1. The maximum context length was limited to 2048 tokens during training

2. Complex scenes may require multiple generation attempts

3. Animation and interaction features have limited support

4. Performs best on object types similar to those in the training data
|
|
|
## Citation |
|
|
|
If you use this model, please cite: |
|
|
|
```bibtex
@misc{x3d-qwen-2024,
  title={X3D Generation with Fine-tuned Qwen2.5-Coder},
  author={stratplans},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/stratplans/x3d-qwen2.5-coder-7b-lora}}
}
```
|
|
|
## License |
|
|
|
This model inherits the Apache 2.0 license from the base Qwen2.5-Coder model. |
|
|
|
## Acknowledgments |
|
|
|
- Naval Postgraduate School for the Savage X3D Archive |
|
- Qwen team for the base model |
|
- The Web3D Consortium and the wider X3D community