liamcripwell committed · verified
Commit ab20bb9 · 1 Parent(s): 344791f

Update README.md

Files changed (1)
  1. README.md +579 -195
README.md CHANGED
@@ -1,199 +1,583 @@
1
  ---
2
  library_name: transformers
3
- tags: []
 
 
 
4
  ---
5
 
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
- ## Model Details
13
-
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
1
  ---
2
  library_name: transformers
3
+ license: mit
4
+ base_model:
5
+ - Qwen/Qwen2-VL-2B-Instruct
6
+ pipeline_tag: image-text-to-text
7
  ---
8
 
9
+ <p align="center">
10
+ <img src="https://cdn.prod.website-files.com/638364a4e52e440048a9529c/64188f405afcf42d0b85b926_logo_numind_final.png" width="200"/>
11
+ </p>
12
+ <p align="center">
13
+ 🖥️ <a href="https://nuextract.ai/">API / Platform</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://numind.ai/blog">Blog</a>&nbsp;&nbsp; | &nbsp;&nbsp;🗣️ <a href="https://discord.gg/3tsEtJNCDe">Discord</a>
14
+ </p>
15
+
16
+ # NuExtract 2.0 2B by NuMind 📈📈📈
17
+
18
+ NuExtract 2.0 is a family of models trained specifically for structured information extraction tasks. It supports multimodal inputs and is multilingual.
19
+
20
+ We provide several versions of different sizes, all based on pre-trained models from the QwenVL family.
21
+ | Model Size | Model Name | Base Model | License | Huggingface Link |
22
+ |------------|------------|------------|---------|------------------|
23
+ | 2B | NuExtract-2.0-2B | [Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) | MIT | 🤗 [NuExtract-2.0-2B](https://huggingface.co/numind/NuExtract-2.0-2B) |
24
+ | 4B | NuExtract-2.0-4B | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | Qwen Research License | 🤗 [NuExtract-2.0-4B](https://huggingface.co/numind/NuExtract-2.0-4B) |
25
+ | 8B | NuExtract-2.0-8B | [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | MIT | 🤗 [NuExtract-2.0-8B](https://huggingface.co/numind/NuExtract-2.0-8B) |
26
+
27
+ ❗️Note: `NuExtract-2.0-2B` is based on Qwen2-VL rather than Qwen2.5-VL because the smallest Qwen2.5-VL model (3B) has a more restrictive, non-commercial license. We therefore include `NuExtract-2.0-2B` as a small model option that can be used commercially.
28
+
29
+ ## Overview
30
+
31
+ To use the model, provide an input text/image and a JSON template describing the information you need to extract. The template should be a JSON object, specifying field names and their expected type.
32
+
33
+ Supported types include:
34
+ * `verbatim-string` - instructs the model to extract text that is present verbatim in the input.
35
+ * `string` - a generic string field that can incorporate paraphrasing/abstraction.
36
+ * `integer` - a whole number.
37
+ * `number` - a whole or decimal number.
38
+ * `date-time` - an ISO 8601 formatted date.
39
+ * Array of any of the above types (e.g. `["string"]`)
40
+ * `enum` - a choice from a set of possible answers (represented in template as an array of options, e.g. `["yes", "no", "maybe"]`).
41
+ * `multi-label` - an enum that can have multiple possible answers (represented in template as a double-wrapped array, e.g. `[["A", "B", "C"]]`).
42
+
43
+ If the model does not identify relevant information for a field, it will return `null` or `[]` (for arrays and multi-labels).
44
+
45
+ The following is an example template:
46
+ ```json
47
+ {
48
+ "first_name": "verbatim-string",
49
+ "last_name": "verbatim-string",
50
+ "description": "string",
51
+ "age": "integer",
52
+ "gpa": "number",
53
+ "birth_date": "date-time",
54
+ "nationality": ["France", "England", "Japan", "USA", "China"],
55
+ "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
56
+ }
57
+ ```
58
+ An example output:
59
+ ```json
60
+ {
61
+ "first_name": "Susan",
62
+ "last_name": "Smith",
63
+ "description": "A student studying computer science.",
64
+ "age": 20,
65
+ "gpa": 3.7,
66
+ "birth_date": "2005-03-01",
67
+ "nationality": "England",
68
+ "languages_spoken": ["English", "French"]
69
+ }
70
+ ```
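For illustration, if the document had not mentioned a nationality or any spoken languages, a hypothetical output for the same template would fall back to `null` and `[]` for those fields:
```json
{
    "first_name": "Susan",
    "last_name": "Smith",
    "description": "A student studying computer science.",
    "age": 20,
    "gpa": 3.7,
    "birth_date": "2005-03-01",
    "nationality": null,
    "languages_spoken": []
}
```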
71
+
72
+ ⚠️ We recommend using NuExtract with a temperature at or very close to 0. Some inference frameworks, such as Ollama, use a default of 0.7 which is not well suited to many extraction tasks.
73
+
74
+ ## Using NuExtract with 🤗 Transformers
75
+
76
+ ```python
77
+ import torch
78
+ from transformers import AutoProcessor, AutoModelForVision2Seq
79
+
80
+ model_name = "numind/NuExtract-2.0-2B"
81
+ # model_name = "numind/NuExtract-2.0-8B"
82
+
83
+ model = AutoModelForVision2Seq.from_pretrained(model_name,
84
+ trust_remote_code=True,
85
+ torch_dtype=torch.bfloat16,
86
+ attn_implementation="flash_attention_2",
87
+ device_map="auto")
88
+ processor = AutoProcessor.from_pretrained(model_name,
89
+ trust_remote_code=True,
90
+ padding_side='left',
91
+ use_fast=True)
92
+
93
+ # You can set min_pixels and max_pixels according to your needs, such as a token range of 256-1280, to balance performance and cost.
94
+ # min_pixels = 256*28*28
95
+ # max_pixels = 1280*28*28
96
+ # processor = AutoProcessor.from_pretrained(model_name, min_pixels=min_pixels, max_pixels=max_pixels)
97
+ ```
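These snippets also assume a few companion packages are installed: `qwen-vl-utils` (used by the vision pre-processing helper below), `accelerate` (required for `device_map="auto"`), and optionally `flash-attn` (only if you keep `attn_implementation="flash_attention_2"`). A typical install might look like:
```bash
pip install torch transformers accelerate qwen-vl-utils
# optional, only needed for attn_implementation="flash_attention_2"
pip install flash-attn --no-build-isolation
```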
98
+
99
+ You will need the following function to handle loading of image input data:
100
+ ```python
101
+ def process_all_vision_info(messages, examples=None):
102
+ """
103
+ Process vision information from both messages and in-context examples, supporting batch processing.
104
+
105
+ Args:
106
+ messages: List of message dictionaries (single input) OR list of message lists (batch input)
107
+ examples: Optional list of example dictionaries (single input) OR list of example lists (batch)
108
+
109
+ Returns:
110
+ A flat list of all images in the correct order:
111
+ - For single input: example images followed by message images
112
+ - For batch input: interleaved as (item1 examples, item1 input, item2 examples, item2 input, etc.)
113
+ - Returns None if no images were found
114
+ """
115
+ from qwen_vl_utils import process_vision_info, fetch_image
116
+
117
+ # Helper function to extract images from examples
118
+ def extract_example_images(example_item):
119
+ if not example_item:
120
+ return []
121
+
122
+ # Handle both list of examples and single example
123
+ examples_to_process = example_item if isinstance(example_item, list) else [example_item]
124
+ images = []
125
+
126
+ for example in examples_to_process:
127
+ if isinstance(example.get('input'), dict) and example['input'].get('type') == 'image':
128
+ images.append(fetch_image(example['input']))
129
+
130
+ return images
131
+
132
+ # Normalize inputs to always be batched format
133
+ is_batch = messages and isinstance(messages[0], list)
134
+ messages_batch = messages if is_batch else [messages]
135
+ is_batch_examples = examples and isinstance(examples, list) and (isinstance(examples[0], list) or examples[0] is None)
136
+ examples_batch = examples if is_batch_examples else ([examples] if examples is not None else None)
137
+
138
+ # Ensure examples batch matches messages batch if provided
139
+ if examples and len(examples_batch) != len(messages_batch):
140
+ if not is_batch and len(examples_batch) == 1:
141
+ # Single example set for a single input is fine
142
+ pass
143
+ else:
144
+ raise ValueError("Examples batch length must match messages batch length")
145
+
146
+ # Process all inputs, maintaining correct order
147
+ all_images = []
148
+ for i, message_group in enumerate(messages_batch):
149
+ # Get example images for this input
150
+ if examples and i < len(examples_batch):
151
+ input_example_images = extract_example_images(examples_batch[i])
152
+ all_images.extend(input_example_images)
153
+
154
+ # Get message images for this input
155
+ input_message_images = process_vision_info(message_group)[0] or []
156
+ all_images.extend(input_message_images)
157
+
158
+ return all_images if all_images else None
159
+ ```
160
+
161
+ E.g. to perform a basic extraction of names from a text document:
162
+ ```python
163
+ template = """{"names": ["string"]}"""
164
+ document = "John went to the restaurant with Mary. James went to the cinema."
165
+
166
+ # prepare the user message content
167
+ messages = [{"role": "user", "content": document}]
168
+ text = processor.tokenizer.apply_chat_template(
169
+ messages,
170
+ template=template, # template is specified here
171
+ tokenize=False,
172
+ add_generation_prompt=True,
173
+ )
174
+
175
+ print(text)
176
+ """"<|im_start|>user
177
+ # Template:
178
+ {"names": ["string"]}
179
+ # Context:
180
+ John went to the restaurant with Mary. James went to the cinema.<|im_end|>
181
+ <|im_start|>assistant"""
182
+
183
+ image_inputs = process_all_vision_info(messages)
184
+ inputs = processor(
185
+ text=[text],
186
+ images=image_inputs,
187
+ padding=True,
188
+ return_tensors="pt",
189
+ ).to("cuda")
190
+
191
+ # we use greedy decoding here, which works well for most information extraction tasks
192
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
193
+
194
+ # Inference: Generation of the output
195
+ generated_ids = model.generate(
196
+ **inputs,
197
+ **generation_config
198
+ )
199
+ generated_ids_trimmed = [
200
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
201
+ ]
202
+ output_text = processor.batch_decode(
203
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
204
+ )
205
+
206
+ print(output_text)
207
+ # ['{"names": ["John", "Mary", "James"]}']
208
+ ```
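The model returns the extraction as a JSON string, so in practice you will usually want to parse it into a Python object. A minimal sketch, reusing the `output_text` from above:
```python
import json

prediction = json.loads(output_text[0])
print(prediction["names"])
# ['John', 'Mary', 'James']
```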
209
+
210
+ <details>
211
+ <summary>In-Context Examples</summary>
212
+ Sometimes the model might not perform as well as we want because our task is challenging or involves some degree of ambiguity. Alternatively, we may want the model to follow some specific formatting, or just give it a bit more help. In cases like this it can be valuable to provide "in-context examples" to help NuExtract better understand the task.
213
+
214
+ To do so, we can provide a list of examples (dictionaries of input/output pairs). In the example below, we show the model that we want the extracted names to be in capital letters with `-` on either side (for the sake of illustration). Providing multiple examples will usually lead to better results.
215
+ ```python
216
+ template = """{"names": ["string"]}"""
217
+ document = "John went to the restaurant with Mary. James went to the cinema."
218
+ examples = [
219
+ {
220
+ "input": "Stephen is the manager at Susan's store.",
221
+ "output": """{"names": ["-STEPHEN-", "-SUSAN-"]}"""
222
+ }
223
+ ]
224
+
225
+ messages = [{"role": "user", "content": document}]
226
+ text = processor.tokenizer.apply_chat_template(
227
+ messages,
228
+ template=template,
229
+ examples=examples, # examples provided here
230
+ tokenize=False,
231
+ add_generation_prompt=True,
232
+ )
233
+
234
+ image_inputs = process_all_vision_info(messages, examples)
235
+ inputs = processor(
236
+ text=[text],
237
+ images=image_inputs,
238
+ padding=True,
239
+ return_tensors="pt",
240
+ ).to("cuda")
241
+
242
+ # we use greedy decoding here, which works well for most information extraction tasks
243
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
244
+
245
+ # Inference: Generation of the output
246
+ generated_ids = model.generate(
247
+ **inputs,
248
+ **generation_config
249
+ )
250
+ generated_ids_trimmed = [
251
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
252
+ ]
253
+ output_text = processor.batch_decode(
254
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
255
+ )
256
+ print(output_text)
257
+ # ['{"names": ["-JOHN-", "-MARY-", "-JAMES-"]}']
258
+ ```
259
+ </details>
260
+
261
+ <details>
262
+ <summary>Image Inputs</summary>
263
+ If we want to give NuExtract image inputs instead of text, we simply provide a dictionary specifying the desired image file as the message content, rather than a string (e.g. `{"type": "image", "image": "file://image.jpg"}`).
264
+
265
+ You can also specify an image URL (e.g. `{"type": "image", "image": "http://path/to/your/image.jpg"}`) or base64 encoding (e.g. `{"type": "image", "image": "data:image;base64,/9j/..."}`).
266
+ ```python
267
+ template = """{"store": "verbatim-string"}"""
268
+ document = {"type": "image", "image": "file://1.jpg"}
269
+
270
+ messages = [{"role": "user", "content": [document]}]
271
+ text = processor.tokenizer.apply_chat_template(
272
+ messages,
273
+ template=template,
274
+ tokenize=False,
275
+ add_generation_prompt=True,
276
+ )
277
+
278
+ image_inputs = process_all_vision_info(messages)
279
+ inputs = processor(
280
+ text=[text],
281
+ images=image_inputs,
282
+ padding=True,
283
+ return_tensors="pt",
284
+ ).to("cuda")
285
+
286
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
287
+
288
+ # Inference: Generation of the output
289
+ generated_ids = model.generate(
290
+ **inputs,
291
+ **generation_config
292
+ )
293
+ generated_ids_trimmed = [
294
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
295
+ ]
296
+ output_text = processor.batch_decode(
297
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
298
+ )
299
+ print(output_text)
300
+ # ['{"store": "Trader Joe\'s"}']
301
+ ```
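If you would rather pass the image as base64 than as a local file path, you can build the message content using the data URI format described above. A minimal sketch (assuming `1.jpg` exists locally):
```python
import base64

with open("1.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

document = {"type": "image", "image": f"data:image;base64,{encoded}"}
messages = [{"role": "user", "content": [document]}]
# ...then apply the chat template and generate exactly as above
```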
302
+ </details>
303
+
304
+ <details>
305
+ <summary>Batch Inference</summary>
306
+
307
+ ```python
308
+ inputs = [
309
+ # image input with no ICL examples
310
+ {
311
+ "document": {"type": "image", "image": "file://0.jpg"},
312
+ "template": """{"store_name": "verbatim-string"}""",
313
+ },
314
+ # image input with 1 ICL example
315
+ {
316
+ "document": {"type": "image", "image": "file://0.jpg"},
317
+ "template": """{"store_name": "verbatim-string"}""",
318
+ "examples": [
319
+ {
320
+ "input": {"type": "image", "image": "file://1.jpg"},
321
+ "output": """{"store_name": "Trader Joe's"}""",
322
+ }
323
+ ],
324
+ },
325
+ # text input with no ICL examples
326
+ {
327
+ "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
328
+ "template": """{"names": ["string"]}""",
329
+ },
330
+ # text input with ICL example
331
+ {
332
+ "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
333
+ "template": """{"names": ["string"]}""",
334
+ "examples": [
335
+ {
336
+ "input": "Stephen is the manager at Susan's store.",
337
+ "output": """{"names": ["STEPHEN", "SUSAN"]}"""
338
+ }
339
+ ],
340
+ },
341
+ ]
342
+
343
+ # messages should be a list of lists for batch processing
344
+ messages = [
345
+ [
346
+ {
347
+ "role": "user",
348
+ "content": [x['document']],
349
+ }
350
+ ]
351
+ for x in inputs
352
+ ]
353
+
354
+ # apply chat template to each example individually
355
+ texts = [
356
+ processor.tokenizer.apply_chat_template(
357
+ messages[i], # Now this is a list containing one message
358
+ template=x['template'],
359
+ examples=x.get('examples', None),
360
+ tokenize=False,
361
+ add_generation_prompt=True)
362
+ for i, x in enumerate(inputs)
363
+ ]
364
+
365
+ image_inputs = process_all_vision_info(messages, [x.get('examples') for x in inputs])
366
+ inputs = processor(
367
+ text=texts,
368
+ images=image_inputs,
369
+ padding=True,
370
+ return_tensors="pt",
371
+ ).to("cuda")
372
+
373
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
374
+
375
+ # Batch Inference
376
+ generated_ids = model.generate(**inputs, **generation_config)
377
+ generated_ids_trimmed = [
378
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
379
+ ]
380
+ output_texts = processor.batch_decode(
381
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
382
+ )
383
+ for y in output_texts:
384
+ print(y)
385
+ # {"store_name": "WAL-MART"}
386
+ # {"store_name": "Walmart"}
387
+ # {"names": ["John", "Mary", "James"]}
388
+ # {"names": ["JOHN", "MARY", "JAMES"]}
389
+ ```
390
+ </details>
391
+
392
+ <details>
393
+ <summary>Template Generation</summary>
394
+ If you want to convert existing schema files in other formats (e.g. XML, YAML, etc.), or start from an example, NuExtract 2.0 models can automatically generate the template for you.
395
+
396
+ E.g. convert XML into a NuExtract template:
397
+ ```python
398
+ xml_template = """<SportResult>
399
+ <Date></Date>
400
+ <Sport></Sport>
401
+ <Venue></Venue>
402
+ <HomeTeam></HomeTeam>
403
+ <AwayTeam></AwayTeam>
404
+ <HomeScore></HomeScore>
405
+ <AwayScore></AwayScore>
406
+ <TopScorer></TopScorer>
407
+ </SportResult>"""
408
+
409
+ messages = [
410
+ {
411
+ "role": "user",
412
+ "content": [{"type": "text", "text": xml_template}],
413
+ }
414
+ ]
415
+
416
+ text = processor.apply_chat_template(
417
+ messages, tokenize=False, add_generation_prompt=True,
418
+ )
419
+
420
+ image_inputs = process_all_vision_info(messages)
421
+ inputs = processor(
422
+ text=[text],
423
+ images=image_inputs,
424
+ padding=True,
425
+ return_tensors="pt",
426
+ ).to("cuda")
427
+
428
+ generated_ids = model.generate(
429
+ **inputs,
430
+ **generation_config
431
+ )
432
+ generated_ids_trimmed = [
433
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
434
+ ]
435
+ output_text = processor.batch_decode(
436
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
437
+ )
438
+
439
+ print(output_text[0])
440
+ # {
441
+ # "Date": "date-time",
442
+ # "Sport": "verbatim-string",
443
+ # "Venue": "verbatim-string",
444
+ # "HomeTeam": "verbatim-string",
445
+ # "AwayTeam": "verbatim-string",
446
+ # "HomeScore": "integer",
447
+ # "AwayScore": "integer",
448
+ # "TopScorer": "verbatim-string"
449
+ # }
450
+ ```
451
+
452
+ E.g. generate a template from a natural language description:
453
+ ```python
454
+ description = "I would like to extract important details from the contract."
455
+
456
+ messages = [
457
+ {
458
+ "role": "user",
459
+ "content": [{"type": "text", "text": description}],
460
+ }
461
+ ]
462
+
463
+ text = processor.apply_chat_template(
464
+ messages, tokenize=False, add_generation_prompt=True,
465
+ )
466
+
467
+ image_inputs = process_all_vision_info(messages)
468
+ inputs = processor(
469
+ text=[text],
470
+ images=image_inputs,
471
+ padding=True,
472
+ return_tensors="pt",
473
+ ).to("cuda")
474
+
475
+ generated_ids = model.generate(
476
+ **inputs,
477
+ **generation_config
478
+ )
479
+ generated_ids_trimmed = [
480
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
481
+ ]
482
+ output_text = processor.batch_decode(
483
+ generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
484
+ )
485
+
486
+ print(output_text[0])
487
+ # {
488
+ # "Contract": {
489
+ # "Title": "verbatim-string",
490
+ # "Description": "verbatim-string",
491
+ # "Terms": [
492
+ # {
493
+ # "Term": "verbatim-string",
494
+ # "Description": "verbatim-string"
495
+ # }
496
+ # ],
497
+ # "Date": "date-time",
498
+ # "Signatory": "verbatim-string"
499
+ # }
500
+ # }
501
+ ```
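A generated template can then be passed straight back in as the `template` argument for extraction, exactly as in the earlier examples. A minimal sketch (the document string is a placeholder):
```python
generated_template = output_text[0]

messages = [{"role": "user", "content": "Your contract text goes here."}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=generated_template,
    tokenize=False,
    add_generation_prompt=True,
)
# ...then tokenize with the processor and generate as in the basic extraction example
```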
502
+ </details>
503
+
504
+ ## Fine-Tuning
505
+ You can find a fine-tuning tutorial notebook in the [cookbooks](https://github.com/numindai/nuextract/tree/main/cookbooks) folder of the [GitHub repo](https://github.com/numindai/nuextract/tree/main).
506
+
507
+ ## vLLM Deployment
508
+ Run the command below to serve an OpenAI-compatible API:
509
+ ```bash
510
+ vllm serve numind/NuExtract-2.0-8B --trust_remote_code --limit-mm-per-prompt image=6 --chat-template-content-format openai
511
+ ```
512
+ If you encounter memory issues, set `--max-model-len` accordingly.
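For example, to cap the context length (the value below is arbitrary; choose one that fits your GPU memory and typical inputs):
```bash
vllm serve numind/NuExtract-2.0-8B --trust_remote_code --limit-mm-per-prompt image=6 --chat-template-content-format openai --max-model-len 8192
```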
513
+
514
+ Send requests to the model as follows:
515
+ ```python
516
+ import json
517
+ from openai import OpenAI
518
+
519
+ openai_api_key = "EMPTY"
520
+ openai_api_base = "http://localhost:8000/v1"
521
+
522
+ client = OpenAI(
523
+ api_key=openai_api_key,
524
+ base_url=openai_api_base,
525
+ )
526
+
527
+ chat_response = client.chat.completions.create(
528
+ model="numind/NuExtract-2.0-8B",
529
+ temperature=0,
530
+ messages=[
531
+ {
532
+ "role": "user",
533
+ "content": [{"type": "text", "text": "Yesterday I went shopping at Bunnings"}],
534
+ },
535
+ ],
536
+ extra_body={
537
+ "chat_template_kwargs": {
538
+ "template": json.dumps(json.loads("""{\"store\": \"verbatim-string\"}"""), indent=4)
539
+ },
540
+ }
541
+ )
542
+ print("Chat response:", chat_response)
543
+ ```
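As with the local examples, the returned completion is a JSON string that can be parsed directly. A minimal sketch using the standard OpenAI client response structure (the extracted value shown is illustrative):
```python
extraction = json.loads(chat_response.choices[0].message.content)
print(extraction)
# e.g. {'store': 'Bunnings'}
```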
544
+ For image inputs, structure requests as shown below. Make sure to order the images in `"content"` as they appear in the prompt (i.e. any in-context examples before the main input).
545
+ ```python
546
+ import base64
547
+
548
+ def encode_image(image_path):
549
+ """
550
+ Encode the image file to base64 string
551
+ """
552
+ with open(image_path, "rb") as image_file:
553
+ return base64.b64encode(image_file.read()).decode('utf-8')
554
+
555
+ base64_image = encode_image("0.jpg")
556
+ base64_image2 = encode_image("1.jpg")
557
+
558
+ chat_response = client.chat.completions.create(
559
+ model="numind/NuExtract-2.0-8B",
560
+ temperature=0,
561
+ messages=[
562
+ {
563
+ "role": "user",
564
+ "content": [
565
+ {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}, # first ICL example image
566
+ {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image2}"}}, # real input image
567
+ ],
568
+ },
569
+ ],
570
+ extra_body={
571
+ "chat_template_kwargs": {
572
+ "template": json.dumps(json.loads("""{\"store\": \"verbatim-string\"}"""), indent=4),
573
+ "examples": [
574
+ {
575
+ "input": "<image>",
576
+ "output": """{\"store\": \"Walmart\"}"""
577
+ }
578
+ ]
579
+ },
580
+ }
581
+ )
582
+ print("Chat response:", chat_response)
583
+ ```