---
license: apache-2.0
---
# Seq2Seq Transformer for Function Call Generation
This repository hosts a custom-trained Seq2Seq Transformer model designed to convert natural language queries into corresponding function call representations. The model leverages an encoder-decoder Transformer architecture built from scratch using PyTorch and supports versioning to facilitate continuous improvements and updates.
## Model Description
- **Architecture:**
A full Transformer-based encoder-decoder model with multi-head attention and feed-forward layers. The model incorporates sinusoidal positional encoding to capture sequential information (a minimal sketch follows this list).
- **Tokenization & Vocabulary:**
The model uses a custom-built vocabulary derived from training data. Special tokens include:
- `<pad>` for padding,
- `<bos>` to denote the beginning of a sequence,
- `<eos>` to denote the end of a sequence, and
- `<unk>` for unknown tokens.
- **Training:**
Trained on paired examples of natural language inputs and function call outputs using a cross-entropy loss. Each training run increments the model version, and every version is stored for reproducibility and comparison (a training-loop sketch follows this list).
- **Inference:**
Greedy decoding is used to generate output sequences from an input sequence. Users can specify the model version to load the appropriate model for inference.
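The sketch below illustrates such an encoder-decoder model with sinusoidal positional encoding. The class names, hyperparameters, and special-token indices are assumptions chosen for illustration; the repository's own code may differ.

```python
import math

import torch
import torch.nn as nn

# Assumed special-token indices; the repository's vocabulary builder may order them differently.
PAD_IDX, BOS_IDX, EOS_IDX, UNK_IDX = 0, 1, 2, 3


class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]


class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer mapping query tokens to function-call tokens."""

    def __init__(self, src_vocab: int, tgt_vocab: int, d_model: int = 256,
                 nhead: int = 8, num_layers: int = 3, dim_ff: int = 512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model, padding_idx=PAD_IDX)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model, padding_idx=PAD_IDX)
        self.pos_enc = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.generator = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt, tgt_mask=None, src_padding_mask=None, tgt_padding_mask=None):
        src = self.pos_enc(self.src_emb(src))
        tgt = self.pos_enc(self.tgt_emb(tgt))
        out = self.transformer(src, tgt, tgt_mask=tgt_mask,
                               src_key_padding_mask=src_padding_mask,
                               tgt_key_padding_mask=tgt_padding_mask)
        return self.generator(out)  # logits over the target vocabulary
```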
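The training loop can follow the usual teacher-forcing recipe, with padding positions excluded from the cross-entropy loss and a version-stamped checkpoint written at the end of each run. Reusing the definitions from the sketch above, it might look like this; the function name and checkpoint path layout are assumptions.

```python
def train_one_version(model, loader, optimizer, version: int, device: str = "cpu", epochs: int = 10):
    """Teacher-forced training with cross-entropy; writes a versioned checkpoint."""
    criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)  # padding tokens do not contribute to the loss
    model.to(device).train()
    for _ in range(epochs):
        for src, tgt in loader:                        # tgt is <bos> ... <eos>, padded with <pad>
            src, tgt = src.to(device), tgt.to(device)
            tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]  # shift target for teacher forcing
            causal = torch.triu(torch.full((tgt_in.size(1), tgt_in.size(1)), float("-inf"),
                                           device=device), diagonal=1)
            logits = model(src, tgt_in, tgt_mask=causal,
                           src_padding_mask=(src == PAD_IDX),
                           tgt_padding_mask=(tgt_in == PAD_IDX))
            loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), f"checkpoints/model_v{version}.pt")  # assumed checkpoint layout
```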
## Intended Use
This model is primarily intended for:
- Automated function call generation from natural language instructions.
- Enhancing natural language interfaces for code generation or task automation.
- Integrating into virtual assistants and chatbots to execute backend function calls.
## Limitations
- **Data Dependency:**
The model's performance relies on the quality and representativeness of the training data. Out-of-distribution inputs may yield suboptimal or erroneous outputs.
- **Decoding Strategy:**
The current greedy decoding approach may not always produce the most diverse or optimal outputs. Alternative strategies (e.g., beam search) might be explored for improved results.
- **Generalization:**
While the model works well on data similar to its training examples, its performance may degrade on substantially different domains or complex instructions.
## Training Data
The model is trained on custom datasets comprising natural language inputs paired with function call outputs. Users are encouraged to fine-tune the model on domain-specific data to maximize its utility in real-world applications.
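The exact on-disk format is not documented here; as a purely hypothetical illustration, a training pair might look like the following (the function names and argument conventions are invented):

```python
# Hypothetical (input, output) pairs; the real dataset format may differ.
training_pairs = [
    ("Book me a flight from London to NYC",
     'book_flight(origin="London", destination="NYC")'),
    ("What's the weather in Paris tomorrow?",
     'get_weather(city="Paris", date="tomorrow")'),
]
```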
## How to Use
1. **Loading a Specific Version:**
The system supports multiple versions. Specify the model version at inference time to load the desired checkpoint (see the inference sketch below this list).
2. **Inference:**
Provide an input query (e.g., "Book me a flight from London to NYC"), and the model will generate the corresponding function call output.
3. **Publishing:**
The model can be published to the Hugging Face Hub with version-specific details for reproducibility and community sharing (see the publishing sketch below this list).
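A minimal greedy-decoding sketch, reusing the model class and special-token indices from the earlier sketches, might look like the following; the checkpoint path and the `tokenize`/`detokenize` helpers are assumptions standing in for the repository's own code.

```python
@torch.no_grad()
def greedy_decode(model, src, max_len: int = 64, device: str = "cpu"):
    """Generate a function-call token sequence one token at a time, always taking the argmax."""
    model.to(device).eval()
    src = src.to(device)                                    # shape (1, src_len)
    ys = torch.tensor([[BOS_IDX]], device=device)           # start with <bos>
    for _ in range(max_len - 1):
        logits = model(src, ys)                             # (1, cur_len, tgt_vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_token], dim=1)
        if next_token.item() == EOS_IDX:                    # stop at <eos>
            break
    return ys.squeeze(0).tolist()

# Load a specific version (the checkpoint path layout is an assumption).
version = 3
model = Seq2SeqTransformer(src_vocab=8000, tgt_vocab=8000)
model.load_state_dict(torch.load(f"checkpoints/model_v{version}.pt", map_location="cpu"))

# `tokenize` / `detokenize` stand in for the repository's own vocabulary helpers.
# src_ids = tokenize("Book me a flight from London to NYC")   # -> tensor of shape (1, src_len)
# print(detokenize(greedy_decode(model, src_ids)))
```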
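One possible way to publish a versioned checkpoint is the `huggingface_hub` client; the repository id and folder layout below are placeholders to adapt to your own namespace.

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="checkpoints",                       # local directory with model_v*.pt files
    repo_id="your-username/seq2seq-function-calls",  # placeholder repo id
    repo_type="model",
    commit_message="Upload versioned Seq2Seq checkpoint",
)
```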