---
title: Pseudo2Code
emoji: π
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---
# Pseudo2Code: Transformer-based Pseudocode to C++ Converter
[License: MIT](LICENSE) · [Python](https://www.python.org/) · [Hugging Face Space](https://huggingface.co/spaces/asadsandhu/Pseudo2Code) · [GitHub](https://github.com/asadsandhu/Pseudo2Code)
> A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. Trained on Stanford's [SPoC dataset](https://arxiv.org/abs/1906.04908).
---

## 🖼️ Demo

Try it live on **Hugging Face Spaces**:
https://huggingface.co/spaces/asadsandhu/Pseudo2Code

![Demo screenshot](assets/demo.png)

---
## 🧠 Model Architecture

- A **Transformer** encoder-decoder implemented from scratch in PyTorch
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding (see the sketch below)
- Custom vocabulary construction for both pseudocode and C++ output

```
Input:  Pseudocode lines (line-by-line)
Model:  Transformer (Encoder-Decoder)
Output: C++ code line for each pseudocode line
```
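
For illustration, here is a minimal greedy-decoding sketch in PyTorch. The names `model`, `src_vocab`, and `tgt_vocab` are assumptions rather than the repo's exact API, and `model` is assumed to map a source sequence plus target prefix to per-token logits:

```python
# Hypothetical helper: `model`, `src_vocab`, `tgt_vocab` are illustrative
# names, not the actual interfaces in train.py/app.py.
import torch

BOS, EOS, UNK = "<bos>", "<eos>", "<unk>"

def greedy_decode(model, src_vocab, tgt_vocab, pseudo_line, max_len=128):
    inv_tgt = {i: t for t, i in tgt_vocab.items()}
    src_ids = [src_vocab.get(tok, src_vocab[UNK]) for tok in pseudo_line.split()]
    src = torch.tensor([src_ids])                    # (1, src_len)
    ys = torch.tensor([[tgt_vocab[BOS]]])            # growing target prefix
    model.eval()
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, ys)                  # (1, tgt_len, vocab_size)
            next_id = logits[0, -1].argmax().item()  # greedy: top token only
            ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
            if next_id == tgt_vocab[EOS]:            # stop at end-of-sequence
                break
    return " ".join(inv_tgt[i] for i in ys[0, 1:].tolist() if i != tgt_vocab[EOS])
```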
---

## Dataset

We used the **SPoC dataset** from Stanford:

- ✅ Clean pseudocode-to-C++ line pairs
- ✅ Token-level annotations for syntax handling
- ✅ Multiple test splits (generalization to unseen problems/workers)
- ✅ Custom preprocessing and vocabulary building (sketched below)

> Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
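
As a rough sketch of that preprocessing, the vocabularies can be built by counting whitespace-separated tokens in the training TSV. The column names `text` (pseudocode) and `code` (C++) follow the public SPoC layout but are assumptions here, not verified against `train.py`:

```python
# Vocabulary-building sketch; column names `text`/`code` are assumptions.
import csv
from collections import Counter

SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]

def build_vocab(tsv_path, column, min_freq=2):
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            counts.update(row[column].split())  # simple whitespace tokenization
    tokens = SPECIALS + [t for t, c in counts.most_common() if c >= min_freq]
    return {tok: i for i, tok in enumerate(tokens)}

src_vocab = build_vocab("spoc/train/spoc-train.tsv", "text")  # pseudocode side
tgt_vocab = build_vocab("spoc/train/spoc-train.tsv", "code")  # C++ side
```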
---

## Directory Structure

```
.
├── app.py                       # Gradio web app for inference
├── train.py                     # Transformer training code
├── model.pth                    # Trained model weights
├── spoc/                        # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png                 # App screenshot
└── README.md                    # You're here
```
---

## 🛠️ How to Run Locally

### ⚙️ 1. Clone the Repo & Install Requirements

```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```

Or manually install:

```bash
pip install torch gradio tqdm
```
### 🚀 2. Launch the App

Make sure `model.pth` is present (or train one with `train.py`):

```bash
python app.py
```

The app will open in your browser.
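
Under the hood, `app.py` exposes the model through a Gradio interface. Here is a minimal sketch of that wiring, reusing the hypothetical `greedy_decode` helper from the architecture section (the real app may be structured differently):

```python
# Illustrative Gradio wiring; not the literal contents of app.py.
import gradio as gr

def convert(pseudocode: str) -> str:
    # The model is trained on line pairs, so translate each pseudocode
    # line independently and join the generated C++ lines.
    lines = [l for l in pseudocode.splitlines() if l.strip()]
    return "\n".join(greedy_decode(model, src_vocab, tgt_vocab, l) for l in lines)

demo = gr.Interface(
    fn=convert,
    inputs=gr.Textbox(lines=10, label="Pseudocode"),
    outputs=gr.Textbox(lines=10, label="Generated C++"),
)
demo.launch()
```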
---

## 🧪 Training the Model

You can retrain the model with the `train.py` script:

```bash
python train.py
```

By default, it downloads the dataset from the public repo and trains for 10 epochs, then writes a `model.pth` file containing the learned weights and vocabularies.
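
The internal layout of `model.pth` is not documented here; one plausible way to bundle the weights and both vocabularies in a single checkpoint would be:

```python
# Hypothetical checkpoint layout; the actual keys in model.pth may differ.
import torch

torch.save(
    {
        "model_state": model.state_dict(),
        "src_vocab": src_vocab,  # pseudocode token -> id
        "tgt_vocab": tgt_vocab,  # C++ token -> id
    },
    "model.pth",
)

# Loading it back for inference:
ckpt = torch.load("model.pth", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
```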
---

## 🧠 Key Hyperparameters

| Parameter      | Value       |
| -------------- | ----------- |
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
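
To make the table concrete, here is how those sizes map onto PyTorch's built-in `nn.Transformer`. The repo implements the architecture from scratch, so this mirrors the dimensions only, not the actual code:

```python
# Sketch: same sizes as the table above, using PyTorch's stock Transformer.
import torch
import torch.nn as nn

core = nn.Transformer(
    d_model=256,            # Embedding Dim
    nhead=4,                # Heads
    num_encoder_layers=2,   # Encoder Layers
    num_decoder_layers=2,   # Decoder Layers
    dim_feedforward=512,    # FFN Dim
    batch_first=True,       # tensors shaped (batch, seq, feature)
)
optimizer = torch.optim.Adam(core.parameters(), lr=1e-4)
```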
---

## 🧩 Example Input

```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```
### ⏩ Output C++

```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```

The output is shown unedited as the model's raw token stream: multi-character operators come out split (`> >` for `>>`), and genuine prediction errors can appear, as expected for a from-scratch model.
---

## 📦 Deployment

This app is deployed live on:

* **Hugging Face Spaces**: [Pseudo2Code](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
* **GitHub**: [github.com/asadsandhu/Pseudo2Code](https://github.com/asadsandhu/Pseudo2Code)
---

## Acknowledgements

* **SPoC Dataset** by Stanford University:
  Kulal, S., Pasupat, P., et al. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)
* Transformer paper: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)
---

## 🧑‍💻 Author

**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)
---

## License

This project is licensed under the MIT License.
Feel free to use, modify, and share with credit.