---
title: Book QA Chat
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
---
An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
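A minimal sketch of the Inference API chat loop (illustrative only; the function names and parameters here are assumptions, not necessarily the exact code in `app.py`):

```python
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient("tianzhechu/BookQA-7B-Instruct")

def respond(message, history):
    # Rebuild an OpenAI-style message list from the chat history.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    partial = ""
    # Stream the reply chunk by chunk via the Inference API.
    for chunk in client.chat_completion(messages, max_tokens=256, stream=True):
        partial += chunk.choices[0].delta.content or ""
        yield partial

gr.ChatInterface(respond).launch()
```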
## Local Transformers mode with alternate tokenizer
If the target model repository does not include a tokenizer, you can instruct the app to run locally with `transformers` and use a tokenizer from another repository.
Environment variables:
- `MODEL_ID` (optional): model repo to load. Defaults to `tianzhechu/BookQA-7B-Instruct`.
- `TOKENIZER_ID` (optional): tokenizer repo to use locally (e.g., a base model's tokenizer). When set, the app switches to a local `transformers` backend and streams tokens from your machine.
- `USE_LOCAL_TRANSFORMERS` (optional): set to `1` to force local mode even without `TOKENIZER_ID`.
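As a rough sketch, the backend selection driven by these variables can be pictured like this (hypothetical; the actual logic lives in `app.py`):

```python
import os

MODEL_ID = os.getenv("MODEL_ID", "tianzhechu/BookQA-7B-Instruct")
TOKENIZER_ID = os.getenv("TOKENIZER_ID")  # e.g., a base model's tokenizer repo
USE_LOCAL = os.getenv("USE_LOCAL_TRANSFORMERS") == "1" or TOKENIZER_ID is not None

if USE_LOCAL:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Take the tokenizer from TOKENIZER_ID when set, else from the model repo.
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID or MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
else:
    from huggingface_hub import InferenceClient

    client = InferenceClient(MODEL_ID)
```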
Install extra dependencies:
```bash
pip install -r requirements.txt
```
Run with an alternate tokenizer (example):
```bash
export MODEL_ID=tianzhechu/BookQA-7B-Instruct
export TOKENIZER_ID=TheBaseModel/TokenizerRepo
python app.py
```
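In local mode, token streaming can be done with `transformers`' `TextIteratorStreamer`; a minimal sketch (the repo IDs below are the placeholders from the example above):

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("TheBaseModel/TokenizerRepo")
model = AutoModelForCausalLM.from_pretrained("tianzhechu/BookQA-7B-Instruct")

inputs = tokenizer("What happens in chapter 3?", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation on a background thread so tokens can be consumed as they arrive.
thread = Thread(
    target=model.generate,
    kwargs={**inputs, "streamer": streamer, "max_new_tokens": 256},
)
thread.start()
for token_text in streamer:
    print(token_text, end="", flush=True)
```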
Notes:
- Local inference will download and load the model weights via `transformers` and may require significant memory.
- If the tokenizer exposes a chat template, it is applied automatically; otherwise a simple fallback template is used (see the sketch after this list).
- You'll need a compatible version of `torch` installed for your platform. If the default pip install fails, follow the official install instructions for your OS/GPU.
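For the chat-template fallback mentioned above, the behavior can be sketched like this (hypothetical `build_prompt` helper, not the app's actual function):

```python
def build_prompt(tokenizer, messages):
    # messages: list of {"role": ..., "content": ...} dicts
    if getattr(tokenizer, "chat_template", None):
        # The tokenizer ships a chat template: let transformers render it.
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Fallback: a simple role-prefixed plain-text format.
    rendered = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return rendered + "\nassistant:"
```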