---
title: Pseudo2Code
emoji: πŸ‘€
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---
# πŸš€ Pseudo2Code – Transformer-based Pseudocode to C++ Converter
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Spaces-orange)](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
[![GitHub Repo](https://img.shields.io/badge/GitHub-asadsandhu/Pseudo2Code-black?logo=github)](https://github.com/asadsandhu/Pseudo2Code)
> A fully custom Transformer-based Sequence-to-Sequence model built from scratch in PyTorch to convert human-written pseudocode into executable C++ code. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.
---
## πŸ–ΌοΈ Demo
Try it live on **Hugging Face Spaces**:
πŸ‘‰ https://huggingface.co/spaces/asadsandhu/Pseudo2Code
![App Demo](assets/demo.png)
---
## 🧠 Model Architecture
- Encoder-decoder **Transformer** architecture implemented from scratch in PyTorch (see the sketch below)
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output
```text
Input: Pseudocode lines (line-by-line)
Model: Transformer (Encoder-Decoder)
Output: C++ code line for each pseudocode line
```
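The repo builds the Transformer layers by hand; for orientation, here is a minimal PyTorch sketch of the same encoder-decoder shape using the built-in `nn.Transformer` for brevity. Class and argument names are illustrative, with dimensions taken from the hyperparameter table below, and the repo's from-scratch implementation will differ in detail.

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Minimal encoder-decoder sketch; the repo's from-scratch version differs."""

    def __init__(self, src_vocab_size, tgt_vocab_size, d_model=256, nhead=4,
                 num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab_size, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)   # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def embed(self, ids, token_embed):
        pos = torch.arange(ids.size(1), device=ids.device)
        return token_embed(ids) + self.pos_embed(pos)

    def forward(self, src, tgt):
        # Causal mask: each target position attends only to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        hidden = self.transformer(self.embed(src, self.src_embed),
                                  self.embed(tgt, self.tgt_embed),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)   # (batch, tgt_len, tgt_vocab) logits
```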
---
## πŸ“Š Dataset
We used the **SPoC dataset** from Stanford:
- βœ… Clean pseudocode–C++ line pairs
- βœ… Token-level annotations for syntax handling
- βœ… Multiple test splits (generalization to problems/workers)
- βœ… Custom preprocessing and vocabulary building (sketched below)
> πŸ“Ž Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
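The preprocessing boils down to: read the TSV, whitespace-tokenize each column, and assign integer ids while reserving slots for special tokens. A minimal sketch, assuming the TSV's pseudocode and C++ columns are named `text` and `code` (check the actual header):

```python
import csv
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(tsv_path, column, min_freq=1):
    """Count whitespace tokens in one TSV column and assign integer ids."""
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row.get(column):                 # some rows have empty pseudocode
                counts.update(row[column].split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

src_vocab = build_vocab("spoc/train/spoc-train.tsv", "text")  # pseudocode side
tgt_vocab = build_vocab("spoc/train/spoc-train.tsv", "code")  # C++ side
```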
---
## πŸ“ Directory Structure
```
.
β”œβ”€β”€ app.py                        # Gradio web app for inference
β”œβ”€β”€ train.py                      # Transformer training code
β”œβ”€β”€ model.pth                     # Trained model weights
β”œβ”€β”€ spoc/                         # Dataset directory
β”‚   └── train/
β”‚       β”œβ”€β”€ spoc-train.tsv
β”‚       └── split/spoc-train-eval.tsv
β”œβ”€β”€ assets/
β”‚   └── demo.png                  # App screenshot
└── README.md                     # You're here
```
---
## πŸ› οΈ How to Run Locally
### βš™οΈ 1. Clone Repo & Install Requirements
```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```
Or manually install:
```bash
pip install torch gradio tqdm
```
### πŸš€ 2. Launch the App
Make sure `model.pth` is present (or train using `train.py`):
```bash
python app.py
```
The app will open in your browser.
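A stripped-down sketch of how a Gradio front end like `app.py` can be wired. `translate_line` is a stand-in for real model inference (see the greedy-decoding sketch further below) so that this snippet runs on its own; the actual app loads `model.pth` and decodes with the trained Transformer.

```python
import gradio as gr

# Stand-in for model inference; the real app calls the trained Transformer.
def translate_line(line: str) -> str:
    return f"// C++ for: {line}"   # placeholder output

def convert(pseudocode: str) -> str:
    # The model was trained on line pairs, so translate line by line.
    lines = [l for l in pseudocode.splitlines() if l.strip()]
    return "\n".join(translate_line(l) for l in lines)

demo = gr.Interface(
    fn=convert,
    inputs=gr.Textbox(lines=10, label="Pseudocode"),
    outputs=gr.Textbox(lines=10, label="C++ code"),
    title="Pseudo2Code",
)

if __name__ == "__main__":
    demo.launch()
```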
---
## πŸ§ͺ Training the Model
You can retrain the model using the `train.py` script:
```bash
python train.py
```
By default, it downloads data from the public repo and trains for 10 epochs.
It writes a `model.pth` file containing the learned weights and vocabulary.
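For orientation, a condensed teacher-forcing loop matching the hyperparameters below. The batch shapes, pad id, and function name are assumptions for the sketch; `train.py` is the authoritative version.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumes <pad> is id 0, as in the vocabulary sketch above

def train_model(model, loader, device, epochs=10, lr=1e-4):
    """Teacher forcing: predict target token t+1 from target tokens <= t."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)   # skip padding
    model.train()
    for epoch in range(epochs):
        total = 0.0
        for src, tgt in loader:          # padded (batch, src_len), (batch, tgt_len)
            src, tgt = src.to(device), tgt.to(device)
            logits = model(src, tgt[:, :-1])             # feed all but last token
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))     # predict shifted targets
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: loss {total / len(loader):.4f}")
    torch.save(model.state_dict(), "model.pth")
```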
---
## πŸ”§ Key Hyperparameters
| Parameter | Value |
| -------------- | ----------- |
| Model Type | Transformer |
| Max Length | 128 |
| Embedding Dim | 256 |
| FFN Dim | 512 |
| Heads | 4 |
| Encoder Layers | 2 |
| Decoder Layers | 2 |
| Batch Size | 64 |
| Epochs | 10 |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
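Greedy decoding, as noted in the architecture section, simply takes the arg-max token at each step until `<eos>` or the length cap. A sketch under the same assumptions as the model sketch above, with special-token ids following the vocabulary sketch:

```python
import torch

SOS_ID, EOS_ID = 1, 2   # assumed ids of <sos> and <eos>

@torch.no_grad()
def greedy_decode(model, src_ids, max_len=128):
    """Generate target ids one at a time, always picking the most likely token."""
    model.eval()
    src = torch.tensor([src_ids])                    # (1, src_len)
    out = torch.tensor([[SOS_ID]])                   # running target sequence
    for _ in range(max_len - 1):
        logits = model(src, out)                     # (1, cur_len, vocab)
        next_id = logits[0, -1].argmax().item()      # greedy pick for last step
        out = torch.cat([out, torch.tensor([[next_id]])], dim=1)
        if next_id == EOS_ID:
            break
    return out[0, 1:].tolist()                       # drop <sos>
```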
---
## 🧩 Example Input
```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```
### ⏩ Output C++
```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```
---
## πŸ“¦ Deployment
This app is deployed live on:
* **Hugging Face Spaces**: [Pseudo2Code](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
* **GitHub**: [github.com/asadsandhu/Pseudo2Code](https://github.com/asadsandhu/Pseudo2Code)
---
## πŸ™Œ Acknowledgements
* πŸ“˜ **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., et al. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)
* 🧠 Transformer Paper: ["Attention is All You Need"](https://arxiv.org/abs/1706.03762)
---
## πŸ§‘β€πŸ’» Author
**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)
---
## πŸ“„ License
This project is licensed under the MIT License.
Feel free to use, modify, and share with credit.