---
title: Pseudo2Code
emoji: π
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---
# Pseudo2Code: Transformer-based Pseudocode to C++ Converter

A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. Trained on the SPoC dataset from Stanford.
## Demo

Try it live on Hugging Face Spaces:
https://huggingface.co/spaces/asadsandhu/Pseudo2Code
## Model Architecture

- Transformer architecture implemented from scratch in PyTorch
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output

Input: pseudocode lines (processed line by line)
Model: Transformer (encoder-decoder)
Output: one C++ code line per pseudocode line
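For orientation, here is a minimal sketch of this kind of encoder-decoder in PyTorch. It uses the built-in `nn.Transformer` module for brevity (the actual repo implements the layers from scratch); dimensions follow the hyperparameter table below, and vocabulary sizes are placeholders:

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer mapping pseudocode tokens to C++ tokens."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_emb(src) + self.pos_emb(pos_s)
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(pos_t)
        # Causal mask so each output token only attends to earlier ones.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        h = self.transformer(src_x, tgt_x, tgt_mask=mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab) logits
```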
## Dataset

We used the SPoC dataset from Stanford:

- Clean pseudocode-C++ line pairs
- Token-level annotations for syntax handling
- Multiple test splits (generalization to unseen problems and workers)
- Custom preprocessing and vocabulary building (see the sketch below)

Licensed under CC BY 4.0.
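As a rough illustration of the preprocessing step, the sketch below reads line pairs from the TSV and builds token vocabularies. The column names `text` and `code` are assumptions; check the actual TSV header, and see `train.py` for the real implementation:

```python
import csv
from collections import Counter

def load_pairs(path):
    """Read (pseudocode, C++) line pairs from a SPoC TSV file.
    Assumes columns named 'text' and 'code'; adjust to the real header."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["text"] and row["code"]:
                pairs.append((row["text"].split(), row["code"].split()))
    return pairs

def build_vocab(token_lists, min_freq=1):
    """Map tokens to ids, reserving slots for special symbols."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

pairs = load_pairs("spoc/train/spoc-train.tsv")
src_vocab = build_vocab(p[0] for p in pairs)
tgt_vocab = build_vocab(p[1] for p in pairs)
```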
## Directory Structure

```
.
├── app.py                        # Gradio web app for inference
├── train.py                      # Transformer training code
├── model.pth                     # Trained model weights
├── spoc/                         # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png                  # App screenshot
└── README.md                     # You're here
```
## How to Run Locally

### 1. Clone Repo & Install Requirements

```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```

Or install manually:

```bash
pip install torch gradio tqdm
```
### 2. Launch the App

Make sure `model.pth` is present (or train one with `train.py`):

```bash
python app.py
```

The app will open in your browser.
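For reference, the wiring of such a Gradio app looks roughly like the sketch below. The `translate` function here is a placeholder for the repo's actual model inference:

```python
import gradio as gr

def translate(line: str) -> str:
    # Placeholder: the real app runs greedy decoding with the trained model.
    return f"// TODO: model output for: {line}"

def convert(pseudocode: str) -> str:
    # The model was trained on line pairs, so translate line by line.
    return "\n".join(translate(line) for line in pseudocode.splitlines())

demo = gr.Interface(
    fn=convert,
    inputs=gr.Textbox(lines=8, label="Pseudocode"),
    outputs=gr.Code(language="cpp", label="Generated C++"),
    title="Pseudo2Code",
)

if __name__ == "__main__":
    demo.launch()
```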
## Training the Model

You can retrain the model with the `train.py` script:

```bash
python train.py
```

By default, it downloads the data from the public repo, trains for 10 epochs, and writes a `model.pth` file containing the learned weights and vocabulary.
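The training itself is a standard teacher-forced cross-entropy loop. A condensed sketch, assuming the `Seq2SeqTransformer` and vocabularies sketched earlier plus a `train_loader` yielding padded `(batch, seq)` id tensors:

```python
import torch
import torch.nn as nn

model = Seq2SeqTransformer(len(src_vocab), len(tgt_vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss(ignore_index=0)  # ignore <pad> positions

for epoch in range(10):
    for src, tgt in train_loader:
        logits = model(src, tgt[:, :-1])   # teacher forcing: shifted target in
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))   # predict the next token
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "model.pth")
```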
## Key Hyperparameters

| Parameter | Value |
|---|---|
| Model Type | Transformer |
| Max Length | 128 |
| Embedding Dim | 256 |
| FFN Dim | 512 |
| Heads | 4 |
| Encoder Layers | 2 |
| Decoder Layers | 2 |
| Batch Size | 64 |
| Epochs | 10 |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
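At inference time, generation is greedy: the decoder is repeatedly fed its own argmax predictions, up to Max Length. A minimal sketch, assuming `<sos>`/`<eos>` ids of 1 and 2 as in the vocabulary sketch above:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src_ids, sos=1, eos=2, max_len=128):
    """Generate target ids one at a time, always taking the argmax token."""
    src = torch.tensor([src_ids])
    out = torch.tensor([[sos]])
    for _ in range(max_len - 1):
        logits = model(src, out)                 # (1, cur_len, vocab)
        next_id = logits[0, -1].argmax().item()  # highest-probability token
        out = torch.cat([out, torch.tensor([[next_id]])], dim=1)
        if next_id == eos:
            break
    return out[0, 1:].tolist()                   # drop <sos>
```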
## Example Input

```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```
## Example Output (C++)

The output below is the model's raw token-level prediction, shown verbatim (tokens are space-separated and may contain model errors):

```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```
## Deployment

This app is deployed live on:

- Hugging Face Spaces: https://huggingface.co/spaces/asadsandhu/Pseudo2Code
- GitHub: https://github.com/asadsandhu/Pseudo2Code
## Acknowledgements

- SPoC dataset by Stanford University: Kulal, S., Pasupat, P., Chandra, K., Lee, M., Padon, O., Aiken, A., & Liang, P. (2019). SPoC: Search-based Pseudocode to Code.
- Transformer paper: Vaswani et al., "Attention Is All You Need".
## Author

**Asad Ali**
- GitHub: [asadsandhu](https://github.com/asadsandhu)
- Hugging Face: [asadsandhu](https://huggingface.co/asadsandhu)
- LinkedIn: asadxali
## License

This project is licensed under the MIT License. Feel free to use, modify, and share with credit.