---
title: Pseudo2Code
emoji: 👀
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---

# 🚀 Pseudo2Code – Transformer-based Pseudocode to C++ Converter


A fully custom Transformer sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. Trained on Stanford's SPoC dataset.


πŸ–ΌοΈ Demo

Try it live on Hugging Face Spaces:
πŸ‘‰ https://huggingface.co/spaces/asadsandhu/Pseudo2Code

App Demo


## 🧠 Model Architecture

- Transformer encoder-decoder implemented from scratch in PyTorch (sketched below)
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output

```
Input:   Pseudocode lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  C++ code line for each pseudocode line
```
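For reference, here is a minimal sketch of such an encoder-decoder in PyTorch. The class name and the learned positional embeddings are assumptions for illustration; the defaults mirror the hyperparameter table below, but the actual train.py may differ.

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Minimal encoder-decoder sketch (illustrative, not the exact train.py code)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positions (assumption)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq) tensors of token ids
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_e = self.src_embed(src) + self.pos_embed(pos_s)
        tgt_e = self.tgt_embed(tgt) + self.pos_embed(pos_t)
        # Causal mask keeps the decoder from attending to future tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        h = self.transformer(src_e, tgt_e, tgt_mask=mask)
        return self.out(h)  # (batch, seq, tgt_vocab) logits
```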

## 📊 Dataset

We used the SPoC dataset from Stanford:

- ✅ Clean pseudocode–C++ line pairs
- ✅ Token-level annotations for syntax handling
- ✅ Multiple test splits (generalization to unseen problems/workers)
- ✅ Custom preprocessing and vocabulary building (see the sketch below)

📎 The dataset is licensed under CC BY 4.0.
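As an illustration, loading the line pairs and building a vocabulary might look like the following. The text/code column names follow the published SPoC TSV format; the special-token ids are assumptions, not necessarily what train.py uses.

```python
import csv
from collections import Counter

def load_pairs(path="spoc/train/spoc-train.tsv"):
    """Read (pseudocode, C++) token pairs from a SPoC TSV file."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["text"]:  # lines like lone braces have no pseudocode
                pairs.append((row["text"].split(), row["code"].split()))
    return pairs

def build_vocab(token_lists, min_freq=1):
    """Assign ids to tokens, reserving slots for special symbols."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for tok, n in counts.most_common():
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab
```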


πŸ“ Directory Structure


.
β”œβ”€β”€ app.py                # Gradio web app for inference
β”œβ”€β”€ train.py              # Transformer training code
β”œβ”€β”€ model.pth             # Trained model weights
β”œβ”€β”€ spoc/                 # Dataset directory
β”‚   └── train/
β”‚       β”œβ”€β”€ spoc-train.tsv
β”‚       └── split/spoc-train-eval.tsv
β”œβ”€β”€ assets/
β”‚   └── demo.png          # App screenshot
└── README.md             # You're here

πŸ› οΈ How to Run Locally

βš™οΈ 1. Clone Repo & Install Requirements

git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt

Or manually install:

pip install torch gradio tqdm

### 🚀 2. Launch the App

Make sure model.pth is present (or train one with train.py):

```bash
python app.py
```

The app will open in your browser.
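For orientation, here is a minimal sketch of how such a Gradio app can be wired up. The `translate` stub is a hypothetical placeholder for the model's line-by-line greedy decoding (sketched under Key Hyperparameters); the real app.py may be organized differently.

```python
import gradio as gr

def translate(line: str) -> str:
    # Hypothetical stub: app.py would run the trained Transformer's
    # greedy decoding here (see the sketch under Key Hyperparameters).
    return line

def convert(pseudocode: str) -> str:
    # The model translates one pseudocode line at a time.
    return "\n".join(translate(line) for line in pseudocode.splitlines())

demo = gr.Interface(
    fn=convert,
    inputs=gr.Textbox(lines=10, label="Pseudocode"),
    outputs=gr.Code(language="cpp", label="Generated C++"),
    title="Pseudo2Code",
)

if __name__ == "__main__":
    demo.launch()
```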


## 🧪 Training the Model

You can retrain the model using the train.py script:

```bash
python train.py
```

By default, it downloads the data from the public repo, trains for 10 epochs, and writes a model.pth file containing the learned weights and vocabularies.
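In outline, the training loop implied by these defaults could look like the sketch below. It assumes `model` (the Seq2SeqTransformer sketch above) and a `train_loader` yielding padded (src, tgt) id batches; the Adam settings and epoch count match the hyperparameter table.

```python
import torch
import torch.nn as nn

PAD = 0  # pad id from the vocabulary sketch above (assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss(ignore_index=PAD)  # padding doesn't count toward loss

for epoch in range(10):
    model.train()
    for src, tgt in train_loader:
        logits = model(src, tgt[:, :-1])          # teacher forcing: feed shifted target
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))  # predict the next token at each step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}")

torch.save(model.state_dict(), "model.pth")
```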


## 🔧 Key Hyperparameters

| Parameter      | Value       |
|----------------|-------------|
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
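Token-level greedy decoding (mentioned under Model Architecture) simply takes the argmax at each step, up to Max Length. A minimal sketch, reusing the assumed `<sos>`/`<eos>` ids from the vocabulary sketch above; it yields the space-separated token streams shown in the example below.

```python
import torch

def greedy_decode(model, src_ids, tgt_vocab, max_len=128, sos=1, eos=2):
    """Translate one pseudocode line; always pick the most likely next token."""
    model.eval()
    src = torch.tensor([src_ids])          # (1, src_len)
    out = torch.tensor([[sos]])            # running target, starts with <sos>
    id2tok = {i: t for t, i in tgt_vocab.items()}
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, out)       # (1, cur_len, vocab)
            nxt = int(logits[0, -1].argmax())
            if nxt == eos:
                break
            out = torch.cat([out, torch.tensor([[nxt]])], dim=1)
    return " ".join(id2tok[i] for i in out[0, 1:].tolist())
```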

## 🧩 Example Input

```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```

## ⏩ Output C++

```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```

(The output is shown exactly as generated: a raw, space-separated token stream from line-by-line greedy decoding, which is why operators like `>>` appear as `> >`.)

## 📦 Deployment

This app is deployed live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code

## 🙌 Acknowledgements

- The SPoC dataset, released by Stanford under CC BY 4.0
πŸ§‘β€πŸ’» Author

Asad Ali GitHub: asadsandhu Hugging Face: asadsandhu LinkedIn: asadxali


## 📄 License

This project is licensed under the MIT License. Feel free to use, modify, and share it with credit.