Commit 84677f5
1 Parent(s): 904d1fd

update readme on env var usage

README.md CHANGED
@@ -28,13 +28,12 @@ hf_oauth_scopes:
 
 ## Introduction
 
-Synthetic Data Generator is a tool that allows you to create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and LLMs to generate synthetic data tailored to your specific needs.
+Synthetic Data Generator is a tool that allows you to create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and LLMs to generate synthetic data tailored to your specific needs. [The announcement blog](https://huggingface.co/blog/synthetic-data-generator) goes over a practical example of how to use it.
 
 Supported Tasks:
 
 - Text Classification
-- Supervised Fine-Tuning
-- Judging and rationale evaluation
+- Chat Data for Supervised Fine-Tuning
 
 This tool simplifies the process of creating custom datasets, enabling you to:
 
@@ -87,7 +86,7 @@ Optionally, you can use different models and APIs.
 
 - `BASE_URL`: The base URL for any OpenAI compatible API, e.g. `https://api-inference.huggingface.co/v1/`, `https://api.openai.com/v1/`.
 - `MODEL`: The model to use for generating the dataset, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`, `gpt-4o`.
-- `API_KEY`: The API key to use for the
+- `API_KEY`: The API key to use for the generation API, e.g. `hf_...`, `sk-...`. If not provided, it will default to the provided `HF_TOKEN` environment variable.
 
 Optionally, you can also push your datasets to Argilla for further curation by setting the following environment variables:
 
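
The `API_KEY` fallback described in this change can be illustrated with a minimal sketch. This is not the project's actual code: the helper names `resolve_api_key` and `resolve_generation_config` are hypothetical, and treating an empty `API_KEY` as unset is an assumption. Only the variable names `BASE_URL`, `MODEL`, `API_KEY`, and `HF_TOKEN` come from the README.

```python
import os


def resolve_api_key(env=None):
    """Return API_KEY if set, otherwise fall back to HF_TOKEN (hypothetical helper).

    Assumption: an empty API_KEY is treated the same as a missing one.
    """
    env = os.environ if env is None else env
    return env.get("API_KEY") or env.get("HF_TOKEN")


def resolve_generation_config(env=None):
    """Collect the optional generation settings named in the README."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("BASE_URL"),  # any OpenAI compatible endpoint
        "model": env.get("MODEL"),        # model used to generate the dataset
        "api_key": resolve_api_key(env),  # API_KEY, else HF_TOKEN, else None
    }
```

Passing a plain dictionary instead of `os.environ` makes the fallback easy to check without touching the real environment, for example `resolve_api_key({"HF_TOKEN": "hf_..."})`.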