---
license: apache-2.0
---

# Seq2Seq Transformer for Function Call Generation

This repository hosts a custom-trained Seq2Seq Transformer model designed to convert natural language queries into corresponding function call representations. The model leverages an encoder-decoder Transformer architecture built from scratch using PyTorch and supports versioning to facilitate continuous improvements and updates.

## Model Description

- **Architecture:**  
  An encoder-decoder Transformer with multi-head attention and feed-forward layers. Sinusoidal positional encoding is added to the token embeddings to capture order information (see the sketch after this list).

- **Tokenization & Vocabulary:**  
  The model uses a custom-built vocabulary derived from training data. Special tokens include:
  - `<pad>` for padding,
  - `<bos>` to denote the beginning of a sequence,
  - `<eos>` to denote the end of a sequence, and
  - `<unk>` for unknown tokens.

- **Training:**  
  Trained with a cross-entropy loss on paired examples of natural language inputs and function call outputs. Each training run increments the model version, and every version is stored for reproducibility and comparison.

- **Inference:**  
  Greedy decoding generates the output sequence token by token from an encoded input. Users can specify the model version to load for inference.
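
The following is a minimal sketch of the components described above (positional encoding, encoder-decoder model, and greedy decoder), assuming standard PyTorch `nn.Transformer` building blocks. Class names, hyperparameters, and special-token ids are illustrative placeholders rather than the exact values used in this repository.

```python
import math
import torch
import torch.nn as nn

# Illustrative special-token ids; the real vocabulary is built from the training data.
PAD_ID, BOS_ID, EOS_ID, UNK_ID = 0, 1, 2, 3


class PositionalEncoding(nn.Module):
    """Adds sinusoidal positional encodings to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]


class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer over a custom vocabulary (hyperparameters are placeholders)."""

    def __init__(self, vocab_size: int, d_model: int = 256, nhead: int = 8, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD_ID)
        self.pos_enc = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.transformer(
            self.pos_enc(self.embed(src)),
            self.pos_enc(self.embed(tgt)),
            tgt_mask=tgt_mask,
            src_key_padding_mask=(src == PAD_ID),
            tgt_key_padding_mask=(tgt == PAD_ID),
        )
        return self.generator(out)


@torch.no_grad()
def greedy_decode(model: Seq2SeqTransformer, src: torch.Tensor, max_len: int = 64) -> list:
    """Generates output token ids one at a time, always taking the argmax."""
    model.eval()
    ys = torch.tensor([[BOS_ID]], dtype=torch.long)
    for _ in range(max_len):
        logits = model(src, ys)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_id], dim=1)
        if next_id.item() == EOS_ID:
            break
    return ys.squeeze(0).tolist()
```

A teacher-forced training step with the cross-entropy objective might look like the following; the optimizer and the versioned checkpoint path are likewise assumptions about the repository layout.

```python
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)  # ignore padding positions in the loss

def train_step(model, optimizer, src, tgt):
    optimizer.zero_grad()
    logits = model(src, tgt[:, :-1])  # teacher forcing: feed the target shifted right
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

# After each run, the new version could be stored, e.g.
# torch.save(model.state_dict(), "checkpoints/v2/model.pt")  # illustrative path
```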

## Intended Use

This model is primarily intended for:
- Automated function call generation from natural language instructions.
- Enhancing natural language interfaces for code generation or task automation.
- Integrating into virtual assistants and chatbots to execute backend function calls.

## Limitations

- **Data Dependency:**  
  The model's performance relies on the quality and representativeness of the training data. Out-of-distribution inputs may yield suboptimal or erroneous outputs.

- **Decoding Strategy:**  
  The current greedy decoding approach may not always produce the most diverse or optimal outputs. Alternative strategies (e.g., beam search) might be explored for improved results.

- **Generalization:**  
  While the model works well on data similar to its training examples, its performance may degrade on substantially different domains or complex instructions.

## Training Data

The model is trained on custom datasets comprising natural language inputs paired with function call outputs. Users are encouraged to fine-tune the model on domain-specific data to maximize its utility in real-world applications.

## How to Use

1. **Loading a Specific Version:**  
   The repository stores multiple model versions; specify the desired version when performing inference to load the corresponding model.

2. **Inference:**  
   Provide an input text (e.g., "Book me a flight from London to NYC") and the model generates the corresponding function call output (see the usage sketch after this list).

3. **Publishing:**  
   The model can be published to the Hugging Face Hub with version-specific details for reproducibility and community sharing.
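
A hypothetical end-to-end usage sketch, assuming the repository ships per-version checkpoints and tokenization helpers. `load_model`, `encode`, and `decode` below are placeholder names, and the printed function call is only illustrative:

```python
import torch

# Placeholder helpers; the actual loading/tokenization utilities ship with this repository.
from seq2seq_fc import load_model, encode, decode  # hypothetical module and function names

model, vocab = load_model(version="v2")  # load a specific model version
src = torch.tensor([encode("Book me a flight from London to NYC", vocab)])
output_ids = greedy_decode(model, src)   # greedy decoding, as sketched under "Model Description"
print(decode(output_ids, vocab))         # e.g. book_flight(origin="London", destination="NYC")
```

For publishing, one option is to push a versioned checkpoint folder with the `huggingface_hub` client; the folder path and repository id below are placeholders:

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="checkpoints/v2",                       # directory holding the versioned weights and vocab
    repo_id="your-username/seq2seq-function-calls",     # placeholder repo id
    repo_type="model",
    commit_message="Publish model version v2",
)
```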