Update README.md

README.md
@@ -31,16 +31,46 @@ This model is designed for scalable training, long-context understanding, and ef
## 📁 Project Structure

```bash
MEM_TRANSFORMER/
├── configs/
│   └── config.json            # Model + training hyperparameters
│
├── data/
│   ├── edu_fineweb/           # Token-sharded training data
│   │   ├── train_000001.npy
│   │   ├── train_000002.npy
│   │   └── test_000001.npy
│   ├── hellaswag/
│   │   └── hellaswag_val.jsonl
│   └── fineweb.py             # Sharding logic with memory-aligned sequence control
│
├── model_core/
│   ├── __init__.py
│   ├── attention.py           # Grouped Query Attention, kNN & XL attention logic; Rotary Positional Encoding implementation
│   ├── model.py               # Transformer model with memory and RoPE support
│   ├── dataloader.py          # Memory-aware DataLoader
│   └── training.py            # train_memgpt function
│
├── scripts/
│   ├── train.py               # Training script (DDP-compatible)
│   ├── evaluate.py            # Evaluation on benchmarks
│   └── generate.py            # Text generation from a trained model
│
├── evaluation/
│   ├── __init__.py
│   ├── hellaswag.py           # HellaSwag data loader
│   └── val_hellaswag.py       # Evaluation logic with loss-based scoring
│
├── logs/
│   ├── log.txt                # Training logs
│   └── model_*.pt             # Checkpoints
│
├── .gitignore
├── README.md
└── requirements.txt
```

---
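The `data/edu_fineweb/` shards above are `.npy` files of token IDs. A minimal sketch of how such a shard could be consumed, assuming each shard is a flat 1-D array saved with `np.save` (the helper names `load_shard` and `iter_sequences` are illustrative, not the repo's actual `dataloader.py` API):

```python
import numpy as np

def load_shard(path):
    """Load a token shard and widen the dtype for embedding lookup."""
    return np.load(path).astype(np.int64)

def iter_sequences(tokens, seq_len):
    """Yield (input, target) pairs, with targets shifted by one token."""
    n_full = (len(tokens) - 1) // seq_len  # last token only serves as a target
    for i in range(n_full):
        chunk = tokens[i * seq_len : (i + 1) * seq_len + 1]
        yield chunk[:-1], chunk[1:]

if __name__ == "__main__":
    # Synthetic stand-in for a file like data/edu_fineweb/train_000001.npy
    demo = np.arange(1025, dtype=np.uint16).astype(np.int64)
    pairs = list(iter_sequences(demo, seq_len=256))
    print(len(pairs))  # 4 full (input, target) pairs
```

A real memory-aware loader would additionally keep sequences contiguous across batches so recurrent memory segments line up, which is what "memory-aligned sequence control" in `fineweb.py` suggests.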
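`val_hellaswag.py` is described as using loss-based scoring: for each of the four candidate endings, compute the average token loss under the model and predict the ending with the lowest loss. A toy sketch of that selection step, with stand-in log-probabilities instead of real model output (`ending_loss` and `pick_ending` are hypothetical names, not functions from this repo):

```python
import numpy as np

def ending_loss(token_logprobs):
    """Average negative log-probability over an ending's tokens."""
    return -float(np.mean(token_logprobs))

def pick_ending(per_ending_logprobs):
    """Loss-based scoring: choose the ending whose tokens the model
    finds most likely, i.e. the one with the lowest average loss."""
    losses = [ending_loss(lp) for lp in per_ending_logprobs]
    return int(np.argmin(losses))

# Toy per-token log-probs for 4 candidate endings
cands = [
    np.log([0.10, 0.20]),
    np.log([0.50, 0.60]),  # highest-probability tokens -> lowest loss
    np.log([0.05, 0.10]),
    np.log([0.30, 0.20]),
]
print(pick_ending(cands))  # 1
```

Averaging over the ending's tokens (rather than summing) keeps the score comparable across endings of different lengths.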