---
title: Pseudo2Code
emoji: πŸ‘€
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---

# πŸš€ Pseudo2Code – Transformer-based Pseudocode to C++ Converter

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Spaces-orange)](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
[![GitHub Repo](https://img.shields.io/badge/GitHub-asadsandhu/Pseudo2Code-black?logo=github)](https://github.com/asadsandhu/Pseudo2Code)

> A fully custom Transformer-based sequence-to-sequence model built from scratch in PyTorch to convert human-written pseudocode into executable C++ code. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.

---

## πŸ–ΌοΈ Demo

Try it live on **Hugging Face Spaces**:  
πŸ‘‰ https://huggingface.co/spaces/asadsandhu/Pseudo2Code

![App Demo](assets/demo.png)

---

## 🧠 Model Architecture

- Encoder–decoder **Transformer** implemented from scratch in PyTorch
- No pre-trained models or external weights (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabularies built for both the pseudocode input and the C++ output

```
Input:   Pseudocode lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  C++ code line for each pseudocode line
```
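
The architecture above can be sketched with PyTorch's built-in `nn.Transformer`. This is a minimal illustration using the hyperparameters listed later in this README, not the exact code in `train.py` (names and details are assumptions):

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Minimal encoder-decoder Transformer sketch (illustrative, not train.py)."""

    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_emb(src) + self.pos_emb(pos_s)
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(pos_t)
        # Causal mask: each target position may only attend to earlier ones.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(src_x, tgt_x, tgt_mask=mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab)

def greedy_decode(model, src, bos_id, eos_id, max_len=128):
    """Generate one token at a time, always taking the argmax."""
    tgt = torch.tensor([[bos_id]])
    for _ in range(max_len - 1):
        logits = model(src, tgt)
        next_id = logits[:, -1].argmax(-1, keepdim=True)
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return tgt
```

Greedy decoding is the simplest generation strategy: it is fast and deterministic, at the cost of never revisiting an early token choice.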

---

## πŸ“Š Dataset

We used the **SPoC dataset** from Stanford:

- βœ… Clean pseudocode–C++ line pairs
- βœ… Token-level annotations for syntax handling
- βœ… Multiple test splits (generalization to problems/workers)
- βœ… Custom preprocessing and vocabulary building implemented

> πŸ“Ž Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
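
A hypothetical sketch of the preprocessing step, assuming the SPoC TSV exposes `text` (pseudocode) and `code` (C++) columns; the project's actual logic lives in `train.py`:

```python
import csv
from collections import Counter

SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]

def load_pairs(path):
    """Read (pseudocode, C++) line pairs from a SPoC-style TSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [(row["text"], row["code"]) for row in reader if row["text"]]

def build_vocab(lines, min_freq=1):
    """Whitespace-token vocabulary with special tokens at fixed indices."""
    counts = Counter(tok for line in lines for tok in line.split())
    itos = SPECIALS + [t for t, c in counts.most_common() if c >= min_freq]
    return {tok: i for i, tok in enumerate(itos)}

def encode(line, vocab, max_len=128):
    """Map a line to <bos> ... <eos> ids, truncated and padded to max_len."""
    ids = [vocab["<bos>"]] + [vocab.get(t, vocab["<unk>"]) for t in line.split()]
    ids = ids[: max_len - 1] + [vocab["<eos>"]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

Two separate vocabularies are built this way, one over the pseudocode side and one over the C++ side of the pairs.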

---

## πŸ“ Directory Structure

```
.
β”œβ”€β”€ app.py                # Gradio web app for inference
β”œβ”€β”€ train.py              # Transformer training code
β”œβ”€β”€ model.pth             # Trained model weights
β”œβ”€β”€ spoc/                 # Dataset directory
β”‚   └── train/
β”‚       β”œβ”€β”€ spoc-train.tsv
β”‚       └── split/spoc-train-eval.tsv
β”œβ”€β”€ assets/
β”‚   └── demo.png          # App screenshot
└── README.md             # You're here
```

---

## πŸ› οΈ How to Run Locally

### βš™οΈ 1. Clone Repo & Install Requirements

```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```

Or manually install:

```bash
pip install torch gradio tqdm
```

### πŸš€ 2. Launch the App

Make sure `model.pth` is present (or train using `train.py`):

```bash
python app.py
```

The app will open in your browser.

---

## πŸ§ͺ Training the Model

You can retrain the model using the `train.py` script:

```bash
python train.py
```

By default, it downloads the data from the public repo and trains for 10 epochs.
It then saves a `model.pth` file containing the learned weights and vocabularies.
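
The core of such a training loop is standard teacher forcing: feed the target sequence shifted right and score the next-token predictions with cross-entropy, ignoring padding. A minimal sketch (function and argument names are illustrative, not the exact ones in `train.py`):

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, src, tgt, pad_id):
    """One teacher-forced training step; returns the scalar loss."""
    model.train()
    # Input is tgt[:, :-1]; the label at each position is the next token.
    logits = model(src, tgt[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt[:, 1:].reshape(-1),
        ignore_index=pad_id,  # padding positions do not contribute to the loss
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```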

---

## πŸ”§ Key Hyperparameters

| Parameter      | Value       |
| -------------- | ----------- |
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |

---

## 🧩 Example Input

```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```

### ⏩ Output C++

```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```

> ⚠️ The output above is raw token-level decoding: operators appear as separate tokens (`> >`, `< =`, `+ +`) and the model can make semantic mistakes (e.g. the `while` condition here drops the negation), so expect minor cleanup before compiling.

---

## πŸ“¦ Deployment

This app is deployed live on:

* **Hugging Face Spaces**: [Pseudo2Code](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
* **GitHub**: [github.com/asadsandhu/Pseudo2Code](https://github.com/asadsandhu/Pseudo2Code)

---

## πŸ™Œ Acknowledgements

* πŸ“˜ **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., et al. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)

* 🧠 Transformer Paper: ["Attention is All You Need"](https://arxiv.org/abs/1706.03762)

---

## πŸ§‘β€πŸ’» Author

**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)

---

## πŸ“„ License

This project is licensed under the MIT License.
Feel free to use, modify, and share with credit.