---
title: Code2Pseudo
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert C++ to Pseudocode using a Transformer Model.
---
# Code2Pseudo: Transformer-based C++ to Pseudocode Converter

[License: MIT](LICENSE) · [Python](https://www.python.org/) · [Hugging Face Space](https://huggingface.co/spaces/asadsandhu/Code2Pseudo) · [GitHub](https://github.com/asadsandhu/Code2Pseudo)
> A fully custom Transformer-based sequence-to-sequence model built from scratch in PyTorch to convert executable C++ code into high-level pseudocode. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.
---

## Demo

Try it live on **Hugging Face Spaces**:
https://huggingface.co/spaces/asadsandhu/Code2Pseudo

![Demo](assets/demo.png)

---
## Model Architecture

- Built from scratch on the **Transformer** encoder-decoder architecture (PyTorch)
- No pre-trained models or weights: 100% custom code
- Token-level sequence generation with greedy decoding (sketched below)
- Custom tokenization and vocabulary building for both C++ and pseudocode

```
Input:  C++ lines (line-by-line)
Model:  Transformer (Encoder-Decoder)
Output: Corresponding pseudocode line
```
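For concreteness, here is a minimal sketch of what greedy decoding looks like in this kind of encoder-decoder setup. The function name, tensor shapes, and special-token ids are illustrative assumptions, not the repository's exact API:

```python
import torch

def greedy_decode(model, src_ids, bos_id, eos_id, max_len=128, device="cpu"):
    """Greedily decode one pseudocode line from a tokenized C++ line.

    Assumes `model(src, tgt)` returns logits shaped (tgt_len, batch, vocab),
    as with a torch.nn.Transformer-style seq2seq model.
    """
    model.eval()
    src = torch.tensor(src_ids, device=device).unsqueeze(1)   # (src_len, 1)
    tgt = torch.tensor([bos_id], device=device).unsqueeze(1)  # start with <bos>
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, tgt)                  # (tgt_len, 1, vocab)
            next_id = logits[-1, 0].argmax().item()   # pick the argmax token
            next_tok = torch.tensor([[next_id]], device=device)
            tgt = torch.cat([tgt, next_tok], dim=0)   # append and continue
            if next_id == eos_id:                     # stop at end-of-sequence
                break
    return tgt.squeeze(1).tolist()                    # token ids incl. <bos>
```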
---
## Dataset

We trained on the **SPoC dataset**:

- Cleanly aligned C++ → pseudocode line pairs
- High-quality syntactic coverage
- Multiple test splits available
- Custom preprocessing and token handling (a loader sketch follows below)

> Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
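As a concrete illustration of the preprocessing step, a loader for the aligned line pairs might look like this. The column names follow the released SPoC TSVs (`text` for pseudocode, `code` for C++), but the function itself is an assumption, not the repository's exact code:

```python
import csv

def load_spoc_pairs(path="spoc/train/spoc-train.tsv"):
    """Load aligned (C++ tokens, pseudocode tokens) pairs from a SPoC TSV.

    Assumes a header row with `code` (C++) and `text` (pseudocode) columns,
    as in the released SPoC splits.
    """
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            code, text = row["code"].strip(), row["text"].strip()
            if code and text:  # skip rows where either side is empty
                # SPoC is already token-split, so whitespace splitting works
                pairs.append((code.split(), text.split()))
    return pairs
```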
---
## Directory Structure

```
.
├── app.py                        # Gradio web app (C++ → Pseudocode)
├── train.py                      # Training script for code-to-pseudocode model
├── model.pth                     # Trained model and vocab checkpoint
├── spoc/
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png                  # Screenshot for README
└── README.md                     # This file
```
---

## How to Run Locally

### 1. Clone the Repo and Install Dependencies

```bash
git clone https://github.com/asadsandhu/Code2Pseudo.git
cd Code2Pseudo
pip install torch gradio tqdm
```
### 2. Launch the Web App

Make sure `model.pth` exists (or train it first):

```bash
python app.py
```

The interface will open in your browser.
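If you are curious what `app.py` roughly wires together, a minimal Gradio wrapper looks like the sketch below. The `translate_line` stub and checkpoint handling are assumptions for illustration, not the repository's exact code:

```python
import gradio as gr

def translate_line(line: str) -> str:
    # Stub: in the real app this tokenizes the line and runs the
    # Transformer's greedy decoder loaded from model.pth.
    return f"<pseudocode for: {line.strip()}>"

def convert(cpp_code: str) -> str:
    # Translate the snippet one line at a time, mirroring the
    # line-by-line setup the model was trained on.
    return "\n".join(translate_line(l) for l in cpp_code.splitlines() if l.strip())

demo = gr.Interface(
    fn=convert,
    inputs=gr.Textbox(lines=10, label="C++ code"),
    outputs=gr.Textbox(lines=10, label="Pseudocode"),
    title="Code2Pseudo",
)

if __name__ == "__main__":
    demo.launch()
```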
---
## Training the Model

To retrain the Transformer model:

```bash
python train.py
```

By default, the script:

* Downloads the SPoC dataset from GitHub
* Trains for 10 epochs
* Produces `model.pth` with weights and vocabulary
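Under the hood, training follows the standard teacher-forcing recipe for seq2seq Transformers. A condensed sketch of one epoch (names and batch shapes are illustrative assumptions, not `train.py`'s exact code):

```python
import torch.nn as nn

def train_epoch(model, loader, optimizer, pad_id, device="cpu"):
    """One epoch of teacher-forced training.

    Assumes `loader` yields (src, tgt) batches of token ids shaped
    (seq_len, batch) and that `model(src, tgt_in)` returns logits.
    """
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)  # skip pad positions
    model.train()
    total = 0.0
    for src, tgt in loader:
        src, tgt = src.to(device), tgt.to(device)
        tgt_in, tgt_out = tgt[:-1], tgt[1:]        # shift targets by one step
        logits = model(src, tgt_in)                # (tgt_len-1, batch, vocab)
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt_out.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)             # mean batch loss
```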
---
## Key Hyperparameters

| Parameter      | Value       |
| -------------- | ----------- |
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
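These settings map almost one-to-one onto `torch.nn.Transformer`. A sketch of how a model with exactly these dimensions could be assembled (the class name and embedding details are assumptions; the hyperparameter values come from the table above):

```python
import math
import torch
import torch.nn as nn

class Code2PseudoModel(nn.Module):
    """Seq2seq Transformer sized to the hyperparameters above."""

    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.scale = math.sqrt(d_model)
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff,
        )
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # src, tgt: (seq_len, batch) token ids
        sp = torch.arange(src.size(0), device=src.device).unsqueeze(1)
        tp = torch.arange(tgt.size(0), device=tgt.device).unsqueeze(1)
        src_e = self.src_emb(src) * self.scale + self.pos_emb(sp)
        tgt_e = self.tgt_emb(tgt) * self.scale + self.pos_emb(tp)
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(0)).to(src.device)  # block attention to future tokens
        return self.out(self.transformer(src_e, tgt_e, tgt_mask=causal))
```

For example, `Code2PseudoModel(src_vocab=5000, tgt_vocab=5000)` reproduces the 256-dim, 4-head, 2+2-layer configuration from the table (the vocabulary sizes here are placeholders).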
---
## Example Input

Tokens are space-separated to match SPoC's tokenization, which is why operators like `>>` appear as `> >`:

```cpp
int main() {
    int n , nn , ans = 0 ;
    cin > > n ;
    for ( int i = 2 ; i < = n - 1 ; i + + ) {
        nn = n ;
        while ( nn = = 0 ) ans + = nn % i , nn / = i ;
    }
    o = gcd ( ans , n - 2 ) ;
    cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
    return 0;
}
```

### Output Pseudocode

```text
create integers n , nn , ans with ans = 0
read n
for i = 2 to n - 1 inclusive
set nn to n
while nn is 0 , set ans to nn % 12 , set ans to nn % nn , set nn to nn / i
set value of gcd to ans and n - 2
print ans / 2 / ( n - 2 ) / o
```
---

## Deployment

Live demo hosted on:

* **Hugging Face Spaces**: [Code2Pseudo](https://huggingface.co/spaces/asadsandhu/Code2Pseudo)
* **GitHub**: [github.com/asadsandhu/Code2Pseudo](https://github.com/asadsandhu/Code2Pseudo)

---
## Acknowledgements

* **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., Chandra, K., Lee, M., Padon, O., Aiken, A., & Liang, P. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)
* Transformer paper: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)

---
## Author

**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)

---

## License

This project is licensed under the MIT License.
Use, remix, and distribute freely with attribution.