Build error

Commit 84677f5 · Parent(s): 904d1fd

update readme on env var usage

README.md CHANGED

@@ -28,13 +28,12 @@ hf_oauth_scopes:
 
 ## Introduction
 
-Synthetic Data Generator is a tool that allows you to create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and LLMs to generate synthetic data tailored to your specific needs.
+Synthetic Data Generator is a tool that allows you to create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and LLMs to generate synthetic data tailored to your specific needs. [The announcement blog](https://huggingface.co/blog/synthetic-data-generator) goes over a practical example of how to use it.
 
 Supported Tasks:
 
 - Text Classification
-- Supervised Fine-Tuning
-- Judging and rationale evaluation
+- Chat Data for Supervised Fine-Tuning
 
 This tool simplifies the process of creating custom datasets, enabling you to:
 
@@ -87,7 +86,7 @@ Optionally, you can use different models and APIs.
 
 - `BASE_URL`: The base URL for any OpenAI compatible API, e.g. `https://api-inference.huggingface.co/v1/`, `https://api.openai.com/v1/`.
 - `MODEL`: The model to use for generating the dataset, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`, `gpt-4o`.
-- `API_KEY`: The API key to use for the
+- `API_KEY`: The API key to use for the generation API, e.g. `hf_...`, `sk-...`. If not provided, it will default to the provided `HF_TOKEN` environment variable.
 
 Optionally, you can also push your datasets to Argilla for further curation by setting the following environment variables:
 
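The updated `API_KEY` entry documents a fallback: when `API_KEY` is not provided, the `HF_TOKEN` environment variable is used instead. The Python sketch below illustrates that lookup order alongside `BASE_URL` and `MODEL`, assuming all three are read directly from the process environment. `GenerationConfig` and `load_generation_config` are hypothetical names, not part of the Synthetic Data Generator codebase, and treating an empty `API_KEY` as "not provided" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the generation settings listed in the README."""
    base_url: Optional[str]
    model: Optional[str]
    api_key: Optional[str]


def load_generation_config(env: Mapping[str, str] = os.environ) -> GenerationConfig:
    """Read BASE_URL, MODEL and API_KEY, falling back to HF_TOKEN for the key.

    A sketch of the behavior the README describes; the real app may differ.
    """
    # Assumption: an empty string is treated the same as an unset variable.
    api_key = env.get("API_KEY") or env.get("HF_TOKEN")
    return GenerationConfig(
        base_url=env.get("BASE_URL") or None,
        model=env.get("MODEL") or None,
        api_key=api_key or None,
    )


if __name__ == "__main__":
    # Example: an OpenAI compatible endpoint where only HF_TOKEN is set,
    # so the API key is taken from HF_TOKEN.
    demo_env = {
        "BASE_URL": "https://api-inference.huggingface.co/v1/",
        "MODEL": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "HF_TOKEN": "hf_example",
    }
    print(load_generation_config(demo_env))
```

Passing the environment as a mapping, rather than reading `os.environ` inside the function, keeps the fallback logic easy to check without touching real credentials.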
 
			
