---
title: Pseudo2Code
emoji: π
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---
# Pseudo2Code: A Transformer-Based Pseudocode-to-C++ Converter
[MIT License](LICENSE) · [Python](https://www.python.org/) · [Hugging Face Space](https://huggingface.co/spaces/asadsandhu/Pseudo2Code) · [GitHub](https://github.com/asadsandhu/Pseudo2Code)
> A custom Transformer sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.
---
## Demo
Try it live on **Hugging Face Spaces**:
https://huggingface.co/spaces/asadsandhu/Pseudo2Code

![App demo screenshot](assets/demo.png)
---
## Model Architecture
- Developed using the **Transformer** architecture from scratch in PyTorch
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output
```
Input: Pseudocode lines (line-by-line)
Model: Transformer (Encoder-Decoder)
Output: C++ code line for each pseudocode line
```
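For orientation, here is a minimal sketch of this kind of encoder-decoder in PyTorch. The repo implements its layers from scratch in `train.py`; this sketch leans on `nn.Transformer` for brevity, and all class/variable names and vocab sizes are illustrative, not the repo's actual API.

```python
import torch
import torch.nn as nn

# Dimensions mirror the hyperparameter table below; vocab sizes are placeholders.
EMB_DIM, FFN_DIM, HEADS, ENC_LAYERS, DEC_LAYERS, MAX_LEN = 256, 512, 4, 2, 2, 128

class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer mapping pseudocode tokens to C++ tokens."""
    def __init__(self, src_vocab: int, tgt_vocab: int):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, EMB_DIM)
        self.tgt_emb = nn.Embedding(tgt_vocab, EMB_DIM)
        self.pos_emb = nn.Embedding(MAX_LEN, EMB_DIM)   # learned positions
        self.transformer = nn.Transformer(
            d_model=EMB_DIM, nhead=HEADS,
            num_encoder_layers=ENC_LAYERS, num_decoder_layers=DEC_LAYERS,
            dim_feedforward=FFN_DIM, batch_first=True,
        )
        self.out = nn.Linear(EMB_DIM, tgt_vocab)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        # Causal mask keeps the decoder from peeking at future tokens.
        t = tgt.size(1)
        tgt_mask = torch.triu(
            torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1
        )
        hid = self.transformer(
            self.src_emb(src) + self.pos_emb(pos_s),
            self.tgt_emb(tgt) + self.pos_emb(pos_t),
            tgt_mask=tgt_mask,
        )
        return self.out(hid)    # (batch, tgt_len, tgt_vocab) logits
```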
---
## Dataset
We used the **SPoC dataset** from Stanford:
- Clean pseudocode–C++ line pairs
- Token-level annotations for syntax handling
- Multiple test splits (generalization to unseen problems/workers)
- Custom preprocessing and vocabulary building

> Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
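The dataset ships as TSV files (see the directory structure below). A minimal loading sketch, assuming the pseudocode and C++ columns are named `text` and `code`; those column names are an assumption, so check the actual header of `spoc-train.tsv`:

```python
import csv

def load_pairs(tsv_path: str):
    """Yield (pseudocode, cpp) line pairs from a SPoC-style TSV file.

    Column names 'text' and 'code' are assumed; adjust to the real header.
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row.get("text") and row.get("code"):   # skip blank entries
                yield row["text"], row["code"]

pairs = list(load_pairs("spoc/train/spoc-train.tsv"))
print(len(pairs), pairs[0])
```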
---
## Directory Structure
```
.
├── app.py                      # Gradio web app for inference
├── train.py                    # Transformer training code
├── model.pth                   # Trained model weights
├── spoc/                       # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png                # App screenshot
└── README.md                   # You're here
```
---
## How to Run Locally
### 1. Clone the Repo & Install Requirements
```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```
Or manually install:
```bash
pip install torch gradio tqdm
```
### 2. Launch the App
Make sure `model.pth` is present (or train using `train.py`):
```bash
python app.py
```
The app will open in your browser.
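For reference, `app.py` wires the model into a Gradio interface along these lines; `translate` is a stand-in for the app's actual inference function, and all names here are illustrative:

```python
import gradio as gr

def translate(pseudocode: str) -> str:
    # Stand-in: the real app runs each line through the trained
    # Transformer with greedy decoding (see app.py).
    return "\n".join(f"// {line}" for line in pseudocode.splitlines())

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(lines=8, label="Pseudocode"),
    outputs=gr.Textbox(lines=8, label="Generated C++"),
    title="Pseudo2Code",
)

if __name__ == "__main__":
    demo.launch()   # serves the app locally and opens it in the browser
```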
---
## Training the Model
You can retrain the model using the `train.py` script:
```bash
python train.py
```
By default, it downloads the data from the public repo and trains for 10 epochs, then saves a `model.pth` file containing the learned weights and vocabularies.
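For orientation, a condensed teacher-forcing training step of the kind `train.py` performs; `model` and `loader` are assumed to exist (as in the architecture sketch above), and the optimizer settings match the table below:

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, pad_idx: int, device: str = "cpu") -> float:
    """One pass over the data with teacher forcing; returns the mean loss."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)   # don't score padding
    model.train()
    total = 0.0
    for src, tgt in loader:                        # (batch, seq_len) token ids
        src, tgt = src.to(device), tgt.to(device)
        logits = model(src, tgt[:, :-1])           # decoder input: target shifted right
        loss = criterion(
            logits.reshape(-1, logits.size(-1)),   # (batch * len, vocab)
            tgt[:, 1:].reshape(-1),                # next-token targets
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

# Illustrative usage: Adam at 1e-4 for 10 epochs, as in the table below.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# for epoch in range(10):
#     print(epoch, train_epoch(model, train_loader, optimizer, pad_idx=0))
```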
---
## Key Hyperparameters
| Parameter | Value |
| -------------- | ----------- |
| Model Type | Transformer |
| Max Length | 128 |
| Embedding Dim | 256 |
| FFN Dim | 512 |
| Heads | 4 |
| Encoder Layers | 2 |
| Decoder Layers | 2 |
| Batch Size | 64 |
| Epochs | 10 |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
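Generation uses greedy decoding (noted under Model Architecture); a sketch of such a loop under the max length from the table above, with illustrative token-index names:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src, bos_idx: int, eos_idx: int, max_len: int = 128):
    """Generate target token ids one at a time, always taking the argmax."""
    model.eval()
    tgt = torch.tensor([[bos_idx]], device=src.device)    # start with <bos>
    for _ in range(max_len - 1):
        logits = model(src, tgt)                          # (1, len, vocab)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        tgt = torch.cat([tgt, next_tok], dim=1)
        if next_tok.item() == eos_idx:                    # stop at <eos>
            break
    return tgt.squeeze(0).tolist()
```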
---
## Example Input
```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```
### Output C++
```cpp
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
```
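The C++ above is raw token-level output, which is why operators arrive as separated tokens (`> >`, `= =`, `i + +`). If you need compilable spellings, a small heuristic post-processing pass can re-join them; this detokenizer is not part of the repo, purely a sketch:

```python
# Merge operator fragments produced by token-level generation (heuristic).
MERGES = [
    ("> >", ">>"), ("< <", "<<"), ("= =", "=="), ("! =", "!="),
    ("< =", "<="), ("> =", ">="), ("+ +", "++"), ("- -", "--"),
    ("+ =", "+="), ("- =", "-="), ("* =", "*="), ("/ =", "/="), ("% =", "%="),
]

def detokenize(line: str) -> str:
    for frag, joined in MERGES:
        line = line.replace(frag, joined)   # plain substring replacement
    return line

print(detokenize("cin > > n ;"))                                  # cin >> n ;
print(detokenize("for ( int i = 2 ; i < = n - 1 ; i + + ) {"))
```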
---
## Deployment
This app is deployed live on:
* **Hugging Face Spaces**: [Pseudo2Code](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
* **GitHub**: [github.com/asadsandhu/Pseudo2Code](https://github.com/asadsandhu/Pseudo2Code)
---
## Acknowledgements
* **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., et al. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)
* Transformer paper: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)
---
## Author
**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)
---
## License
This project is licensed under the MIT License.
Feel free to use, modify, and share with credit.