---
title: Code2Pseudo
emoji: 🏒
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert C++ to Pseudocode using a Transformer Model.
---

# πŸ”„ Code2Pseudo – Transformer-based C++ to Pseudocode Converter

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Spaces-orange)](https://huggingface.co/spaces/asadsandhu/Code2Pseudo)
[![GitHub Repo](https://img.shields.io/badge/GitHub-asadsandhu/Code2Pseudo-black?logo=github)](https://github.com/asadsandhu/Code2Pseudo)

> A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts executable C++ code into high-level pseudocode. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.

---

## πŸ–ΌοΈ Demo

Try it live on **Hugging Face Spaces**:  
πŸ‘‰ https://huggingface.co/spaces/asadsandhu/Code2Pseudo

![App Demo](assets/demo.png)

---

## 🧠 Model Architecture

- Built from scratch using the **Transformer** encoder-decoder architecture (PyTorch)
- No pre-trained weights or high-level seq2seq libraries – 100% custom code
- Token-level sequence generation with greedy decoding
- Custom tokenization and vocabulary building for both C++ and pseudocode

```text
Input:   C++ lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  Corresponding pseudocode line
```
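
For intuition, a minimal greedy-decoding loop for this setup might look like the sketch below. Note that `model.encode`/`model.decode` and the special-token ids are hypothetical stand-ins, not the repo's actual API:

```python
import torch

def greedy_decode(model, src_ids, bos_id, eos_id, max_len=128):
    """Greedily generate pseudocode token ids for one tokenized C++ line.

    Assumes `model` exposes encode()/decode() over id tensors; the real
    train.py / app.py may structure this differently.
    """
    model.eval()
    with torch.no_grad():
        memory = model.encode(src_ids)             # encoder output for the source line
        out = torch.tensor([[bos_id]])             # start the target with <bos>
        for _ in range(max_len - 1):
            logits = model.decode(out, memory)     # (1, cur_len, vocab_size)
            next_id = logits[0, -1].argmax().item()  # greedy: take the argmax token
            out = torch.cat([out, torch.tensor([[next_id]])], dim=1)
            if next_id == eos_id:                  # stop once <eos> is emitted
                break
    return out[0, 1:].tolist()                     # drop the leading <bos>
```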

---

## πŸ“Š Dataset

We trained on the **SPoC dataset**:

- βœ… Cleanly aligned C++ ↔ pseudocode line pairs
- βœ… High-quality syntactic coverage
- βœ… Multiple test splits available
- βœ… Custom preprocessing and token handling

> πŸ“Ž Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

---

## πŸ“ Directory Structure

```text
.
├── app.py                # Gradio web app (C++ → Pseudocode)
├── train.py              # Training script for code-to-pseudocode model
├── model.pth             # Trained model and vocab checkpoint
├── spoc/
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png          # Screenshot for README
└── README.md             # This file
```

---

## πŸ› οΈ How to Run Locally

### βš™οΈ 1. Clone the Repo

```bash
git clone https://github.com/asadsandhu/Code2Pseudo.git
cd Code2Pseudo
pip install torch gradio tqdm
```

### πŸš€ 2. Launch the Web App

Make sure `model.pth` exists (or train it first):

```bash
python app.py
```

The interface will open in your browser.
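
For a feel of what `app.py` wires together, a stripped-down Gradio wrapper looks roughly like this (`translate_line` is a hypothetical placeholder for the repo's actual inference function):

```python
import gradio as gr

def translate_line(cpp_code: str) -> str:
    # Placeholder: the real app tokenizes each line, runs the Transformer
    # with greedy decoding, and detokenizes the generated pseudocode.
    return "pseudocode for: " + cpp_code

demo = gr.Interface(
    fn=translate_line,
    inputs=gr.Textbox(lines=8, label="C++ code (one statement per line)"),
    outputs=gr.Textbox(label="Pseudocode"),
    title="Code2Pseudo",
)

if __name__ == "__main__":
    demo.launch()
```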

---

## πŸ§ͺ Training the Model

To retrain the Transformer model:

```bash
python train.py
```

By default:

* Downloads the SPoC dataset from GitHub
* Trains for 10 epochs
* Produces `model.pth` with weights and vocabulary
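
Schematically, the teacher-forced training step looks like the sketch below (all names are illustrative, not the repo's actual identifiers):

```python
import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, pad_id):
    """One teacher-forced epoch over (src, tgt) batches of token-id tensors."""
    model.train()
    total = 0.0
    for src, tgt in loader:
        logits = model(src, tgt[:, :-1])        # feed the target shifted right
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tgt[:, 1:].reshape(-1),             # predict the next target token
            ignore_index=pad_id,                # don't penalize padding positions
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)          # mean batch loss for the epoch
```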

---

## πŸ”§ Key Hyperparameters

| Parameter      | Value       |
| -------------- | ----------- |
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
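
For reference, here is how these settings would map onto PyTorch's built-in `nn.Transformer` (illustrative only; the repo implements the layers from scratch):

```python
import torch
import torch.nn as nn

# Illustrative mapping of the table above onto PyTorch's built-in module;
# the actual model in train.py is written by hand.
model = nn.Transformer(
    d_model=256,            # Embedding Dim
    nhead=4,                # Heads
    num_encoder_layers=2,   # Encoder Layers
    num_decoder_layers=2,   # Decoder Layers
    dim_feedforward=512,    # FFN Dim
    batch_first=True,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```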

---

## 🧩 Example Input

```cpp
int main() {
int n , nn , ans = 0 ;
cin >> n ;
for ( int i = 2 ; i <= n - 1 ; i ++ ) {
nn = n ;
while ( nn == 0 ) ans += nn % i , nn /= i ;
}
o = gcd ( ans , n - 2 ) ;
cout << ans / 2 / o ( n - 2 ) / o << endl ;
return 0;
}
```

### ⏩ Output Pseudocode

```text
create integers n , nn , ans with ans = 0
read n
for i = 2 to n - 1 inclusive
set nn to n
while nn is 0 , set ans to nn % 12 , set ans to nn % nn , set nn to nn / i
set value of gcd to ans and n - 2
print ans / 2 / ( n - 2 ) / o
```

---

## πŸ“¦ Deployment

Live demo hosted on:

* **Hugging Face Spaces**: [Code2Pseudo](https://huggingface.co/spaces/asadsandhu/Code2Pseudo)
* **GitHub**: [github.com/asadsandhu/Code2Pseudo](https://github.com/asadsandhu/Code2Pseudo)

---

## πŸ™Œ Acknowledgements

* πŸ“˜ **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., & Liang, P. (2020). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/2005.04326)

* 🧠 Transformer Paper: ["Attention is All You Need"](https://arxiv.org/abs/1706.03762)

---

## πŸ§‘β€πŸ’» Author

**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)

---

## πŸ“„ License

This project is licensed under the MIT License.
Use, remix, and distribute freely with attribution.