---
title: Code2Pseudo
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert C++ to pseudocode using a Transformer model.
---
# 🚀 Code2Pseudo: Transformer-based C++ to Pseudocode Converter

A fully custom Transformer sequence-to-sequence model, built from scratch in PyTorch, that converts executable C++ code into high-level pseudocode. Trained on Stanford's SPoC dataset.
## 🖼️ Demo

Try it live on Hugging Face Spaces:

🔗 https://huggingface.co/spaces/asadsandhu/Code2Pseudo
## 🧠 Model Architecture

- Built from scratch using the Transformer encoder-decoder architecture (PyTorch)
- No pre-trained libraries; 100% custom code
- Token-level sequence generation with greedy decoding
- Custom tokenization and vocabulary building for both C++ and pseudocode

```
Input:  C++ lines (line-by-line)
Model:  Transformer (Encoder-Decoder)
Output: Corresponding pseudocode line
```
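Concretely, an encoder-decoder with this shape can be sketched in a few lines of PyTorch. This is a minimal illustration, not the definition in this repo's `train.py`: the class and attribute names are assumptions, while the dimensions follow the hyperparameter table further down.

```python
import torch
import torch.nn as nn

class Code2Pseudo(nn.Module):
    """Illustrative seq2seq Transformer; names are assumptions, not the repo's."""
    def __init__(self, src_vocab_size, tgt_vocab_size, d_model=256, nhead=4,
                 num_layers=2, ffn_dim=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab_size, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=ffn_dim, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src, tgt):
        # Add learned positional embeddings to token embeddings.
        src = self.src_emb(src) + self.pos_emb(torch.arange(src.size(1), device=src.device))
        tgt = self.tgt_emb(tgt) + self.pos_emb(torch.arange(tgt.size(1), device=tgt.device))
        # Causal mask: each target position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)  # (batch, tgt_len, tgt_vocab_size)
```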
## 📊 Dataset

We trained on the SPoC dataset:

- ✅ Cleanly aligned C++ → pseudocode line pairs
- ✅ High-quality syntactic coverage
- ✅ Multiple test splits available
- ✅ Custom preprocessing and token handling

📄 Licensed under CC BY 4.0
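As an illustration of the preprocessing step, loading the aligned pairs and building vocabularies could look like the sketch below. The `code`/`text` column names follow the published SPoC TSV release (verify them against your copy), and the helper names are assumptions, not this repo's exact code.

```python
import csv
from collections import Counter

def load_pairs(path):
    """Read (C++ tokens, pseudocode tokens) pairs from a SPoC TSV file."""
    pairs = []
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f, delimiter='\t'):
            if row.get('code') and row.get('text'):
                # SPoC lines are already whitespace-tokenized, so split() is enough.
                pairs.append((row['code'].split(), row['text'].split()))
    return pairs

def build_vocab(token_lists, min_freq=1):
    """Map tokens to ids, reserving slots for the special symbols."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {'<pad>': 0, '<sos>': 1, '<eos>': 2, '<unk>': 3}
    for tok, c in counts.items():
        if c >= min_freq:
            vocab.setdefault(tok, len(vocab))
    return vocab
```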
## 📁 Directory Structure

```
.
├── app.py                      # Gradio web app (C++ → Pseudocode)
├── train.py                    # Training script for code-to-pseudocode model
├── model.pth                   # Trained model and vocab checkpoint
├── spoc/
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png                # Screenshot for README
└── README.md                   # This file
```
## 🛠️ How to Run Locally

### ⚙️ 1. Clone the Repo

```bash
git clone https://github.com/asadsandhu/Code2Pseudo.git
cd Code2Pseudo
pip install torch gradio tqdm
```
### 🚀 2. Launch the Web App

Make sure `model.pth` exists (or train it first):

```bash
python app.py
```

The interface will open in your browser.
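For reference, an `app.py`-style inference path with greedy decoding could look roughly like this. It reuses the illustrative `Code2Pseudo` class from the architecture sketch above, and the checkpoint keys (`model`, `src_vocab`, `tgt_vocab`) are assumptions; inspect `model.pth` for the real layout.

```python
import torch
import gradio as gr

ckpt = torch.load('model.pth', map_location='cpu')
src_vocab, tgt_vocab = ckpt['src_vocab'], ckpt['tgt_vocab']  # assumed keys
inv_tgt = {i: t for t, i in tgt_vocab.items()}
model = Code2Pseudo(len(src_vocab), len(tgt_vocab))  # class from the sketch above
model.load_state_dict(ckpt['model'])                 # assumed key
model.eval()

def convert(cpp_line, max_len=128):
    """Greedy decoding: emit the highest-probability token until <eos>."""
    src = torch.tensor([[src_vocab.get(t, src_vocab['<unk>'])
                         for t in cpp_line.split()]])
    out = [tgt_vocab['<sos>']]
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, torch.tensor([out]))
            nxt = logits[0, -1].argmax().item()  # greedy: take the top token
            if nxt == tgt_vocab['<eos>']:
                break
            out.append(nxt)
    return ' '.join(inv_tgt[i] for i in out[1:])

gr.Interface(fn=convert, inputs='text', outputs='text',
             title='Code2Pseudo').launch()
```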
## 🧪 Training the Model

To retrain the transformer model:

```bash
python train.py
```

By default, the script:

- Downloads the SPoC dataset from GitHub
- Trains for 10 epochs
- Produces `model.pth` with weights and vocabulary
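A skeleton of such a training loop, reusing the illustrative `load_pairs`, `build_vocab`, and `Code2Pseudo` helpers sketched above. A standard teacher-forced cross-entropy setup is assumed here; the actual `train.py` may differ in details, though the table below pins the key settings (Adam at 1e-4, 10 epochs, batch size 64).

```python
import torch
import torch.nn as nn

def encode(tokens, vocab, max_len=128):
    # <sos> tokens <eos>, truncated and padded to max_len.
    ids = [vocab['<sos>']] + [vocab.get(t, vocab['<unk>']) for t in tokens]
    ids = ids[:max_len - 1] + [vocab['<eos>']]
    return ids + [vocab['<pad>']] * (max_len - len(ids))

pairs = load_pairs('spoc/train/spoc-train.tsv')
src_vocab = build_vocab([s for s, _ in pairs])
tgt_vocab = build_vocab([t for _, t in pairs])
model = Code2Pseudo(len(src_vocab), len(tgt_vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)  # ignore <pad> positions

for epoch in range(10):
    for i in range(0, len(pairs), 64):  # batch size 64
        batch = pairs[i:i + 64]
        src = torch.tensor([encode(s, src_vocab) for s, _ in batch])
        tgt = torch.tensor([encode(t, tgt_vocab) for _, t in batch])
        logits = model(src, tgt[:, :-1])  # teacher forcing: shift target right
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       tgt[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()

torch.save({'model': model.state_dict(),
            'src_vocab': src_vocab, 'tgt_vocab': tgt_vocab}, 'model.pth')
```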
## 🧠 Key Hyperparameters

| Parameter      | Value       |
|----------------|-------------|
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
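These values map directly onto the illustrative constructor from the architecture sketch above:

```python
# Wiring the table's values into the sketched (assumed) constructor.
model = Code2Pseudo(
    src_vocab_size=len(src_vocab),
    tgt_vocab_size=len(tgt_vocab),
    d_model=256,    # Embedding Dim
    nhead=4,        # Heads
    num_layers=2,   # Encoder Layers / Decoder Layers
    ffn_dim=512,    # FFN Dim
    max_len=128,    # Max Length
)
```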
## 🧩 Example Input

The input follows the SPoC dataset's whitespace-tokenized format, which is why operators appear as `> >`, `= =`, and `+ +`:

```cpp
int main() {
    int n , nn , ans = 0 ;
    cin > > n ;
    for ( int i = 2 ; i < = n - 1 ; i + + ) {
        nn = n ;
        while ( nn = = 0 ) ans + = nn % i , nn / = i ;
    }
    o = gcd ( ans , n - 2 ) ;
    cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
    return 0;
}
```
## ⏩ Output Pseudocode

```
create integers n , nn , ans with ans = 0
read n
for i = 2 to n - 1 inclusive
set nn to n
while nn is 0 , set ans to nn % 12 , set ans to nn % nn , set nn to nn / i
set value of gcd to ans and n - 2
print ans / 2 / ( n - 2 ) / o
```
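Tying this back to the app sketch above: conversion runs line by line, so the hypothetical `convert` helper would be called once per C++ line.

```python
# Hypothetical usage of the convert() sketch; the comment shows the kind of
# output the example above suggests, not a guaranteed result.
print(convert("cin > > n ;"))  # e.g. "read n"
```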
## 📦 Deployment

Live demo hosted on:

- Hugging Face Spaces: [Code2Pseudo](https://huggingface.co/spaces/asadsandhu/Code2Pseudo)
- GitHub: [github.com/asadsandhu/Code2Pseudo](https://github.com/asadsandhu/Code2Pseudo)
## 🙏 Acknowledgements

- 📘 SPoC dataset (Stanford University): Kulal, S., Pasupat, P., & Liang, P. (2019). *SPoC: Search-based Pseudocode to Code*.
- 🧠 Transformer paper: Vaswani et al., "Attention Is All You Need"
## 🧑‍💻 Author

**Asad Ali**

- GitHub: [asadsandhu](https://github.com/asadsandhu)
- Hugging Face: [asadsandhu](https://huggingface.co/asadsandhu)
- LinkedIn: asadxali
## 📄 License

This project is licensed under the MIT License. Use, remix, and distribute freely with attribution.