---
title: Book QA Chat
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
---

An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).

## Local Transformers mode with alternate tokenizer

If the target model repository does not include a tokenizer, you can instruct the app to run locally with `transformers` and use a tokenizer from another repository.

Environment variables:

- `MODEL_ID` (optional): model repo to load. Defaults to `tianzhechu/BookQA-7B-Instruct`.
- `TOKENIZER_ID` (optional): tokenizer repo to use locally (e.g., a base model's tokenizer). When set, the app switches to a local `transformers` backend and streams tokens from your machine (see the sketch after this list).
- `USE_LOCAL_TRANSFORMERS` (optional): set to `1` to force local mode even without `TOKENIZER_ID`.
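
For illustration, the backend selection driven by these variables might look like the sketch below. The environment variable names match those documented above; everything else (the loading kwargs, the structure) is an assumption about the app's internals, not its actual code:

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# Environment variables documented above; only MODEL_ID's default is from the docs.
MODEL_ID = os.environ.get("MODEL_ID", "tianzhechu/BookQA-7B-Instruct")
TOKENIZER_ID = os.environ.get("TOKENIZER_ID")  # e.g. a base model's tokenizer repo
USE_LOCAL = os.environ.get("USE_LOCAL_TRANSFORMERS") == "1" or TOKENIZER_ID is not None

if USE_LOCAL:
    # The tokenizer may come from a different repo than the model weights.
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID or MODEL_ID)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
```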

Install extra dependencies:

```bash
pip install -r requirements.txt
```

Run with an alternate tokenizer (example):

```bash
export MODEL_ID=tianzhechu/BookQA-7B-Instruct
export TOKENIZER_ID=TheBaseModel/TokenizerRepo
python app.py
```

Notes:

- Local inference will download and load the model weights via `transformers` and may require significant memory.
- If the tokenizer exposes a chat template, it is applied automatically; otherwise a simple fallback template is used (see the sketch after these notes).
- You'll need a compatible version of `torch` installed for your platform. If the default pip install fails, follow the official install instructions for your OS/GPU.
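
A hedged sketch of the chat-template fallback mentioned above. The fallback format shown here is an assumption for illustration; the app's actual fallback may differ:

```python
def build_prompt(tokenizer, messages):
    # Prefer the tokenizer's own chat template when one is defined.
    if getattr(tokenizer, "chat_template", None):
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Simple fallback (assumed format): flatten role-tagged turns into plain text.
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")
    return "\n".join(lines)
```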