<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Pipelines

The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
[task summary](../task_summary) for examples of use.
There are two categories of pipeline abstractions to be aware of:

- The [`pipeline`] which is the most powerful object encapsulating all other pipelines.
- Task-specific pipelines are available for [audio](#audio), [computer vision](#computer-vision),
  [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks.
## The pipeline abstraction

The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated as any other
pipeline but can provide additional quality of life.
Simple call on one item:

```python
>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
```
If you want to use a specific model from the [hub](https://huggingface.co) you can ignore the task if the model on
the hub already defines it:

```python
>>> pipe = pipeline(model="roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```
To call a pipeline on many items, you can call it with a *list*:

```python
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
```
To iterate over full datasets it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.
```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
# as we're not interested in the *target* part of the dataset. For sentence pair use KeyPairDataset
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
```
For ease of use, a generator is also possible:

```python
from transformers import pipeline

pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue or HTTP request
        # in a server
        # Caveat: because this is iterative, you cannot use `num_workers > 1`
        # to use multiple threads to preprocess data. You can still have 1 thread that
        # does the preprocessing while the main thread runs the big inference
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {'label': 'POSITIVE', 'score': ...}
```
[[autodoc]] pipeline
## Pipeline batching

All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists or `Dataset` or `generator`).
```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
```
<Tip warning={true}>

However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending
on hardware, data and the actual model being used.

</Tip>

Example where it's mostly a speedup:
```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```
```
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```
Example where it's mostly a slowdown:
```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```
This dataset has an occasional very long sentence compared to the others. In that case, the **whole** batch will need
to be 400 tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the high slowdown. Even
worse, on bigger batches, the program simply crashes.
```
------------------------------
Streaming no batching
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
------------------------------
Streaming batch_size=64
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
------------------------------
Streaming batch_size=256
  0%|                                                                                 | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
```
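The padding blow-up described above is easy to reproduce with the tokenizer alone. Here is a minimal sketch (the checkpoint is only illustrative; any tokenizer that pads shows the same effect):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

regular_batch = ["This is a test"] * 64
# the same batch with one ~100x longer outlier
outlier_batch = ["This is a test"] * 63 + ["This is a test " * 100]

print(tokenizer(regular_batch, padding=True, return_tensors="pt")["input_ids"].shape)
# torch.Size([64, 6])
print(tokenizer(outlier_batch, padding=True, return_tensors="pt")["input_ids"].shape)
# torch.Size([64, 402])  (exact sizes depend on the tokenizer)
# Every row is padded to the longest sequence in the batch.
```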
There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. For
users, a rule of thumb is:

- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data), on GPU, then:
  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch, measure and
    tentatively try to add it, and add OOM checks to recover when it fails (and it will fail at some point if you
    don't control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely.
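As a sketch of that last point, one hypothetical way to recover from OOMs is to shrink the batch size and retry (this assumes a PyTorch recent enough to expose `torch.cuda.OutOfMemoryError`; on older versions, catch `RuntimeError` instead):

```python
import torch
from transformers import pipeline

pipe = pipeline("text-classification", device=0)


def run_with_oom_fallback(texts, batch_size=64):
    # Hypothetical helper: halve the batch size on CUDA OOM until the batch fits.
    while batch_size >= 1:
        try:
            return list(pipe(texts, batch_size=batch_size))
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("OOM even with batch_size=1")
```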
## Pipeline chunk batching

`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

In order to circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of
regular `Pipeline`. In short:
```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```
Now becomes:
```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```
This should be very transparent to your code because the pipelines are used in
the same way.

This is a simplified view, since the pipeline can handle the batching automatically! Meaning you don't have to care
about how many forward passes your inputs are actually going to trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
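For example, here is a minimal sketch with `question-answering` (using the default model for the task; a long context is split into several chunks, each triggering its own forward pass, and `batch_size` groups those passes):

```python
from transformers import pipeline

qa = pipeline("question-answering")

# A context long enough to be split into several chunks.
long_context = "My name is Wolfgang and I live in Berlin. " * 200
out = qa(
    question="Where do I live?",
    context=long_context,
    batch_size=4,  # batches the chunk forward passes, not the number of inputs
)
print(out)
# e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'Berlin'}
```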
## Pipeline custom code

If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand; the goal of the
pipelines is to be easy to use and support most cases, so `transformers` could maybe support your use case.

If you want to try it out simply, you can:

- Subclass your pipeline of choice
```python
class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here
        results = super().postprocess(model_outputs, **kwargs)
        # And here
        return results


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
```
That should enable you to do all the custom code you want.
## Implementing a pipeline

[Implementing a new pipeline](../add_new_pipeline)
## Audio

Pipelines available for audio tasks include the following.

### AudioClassificationPipeline

[[autodoc]] AudioClassificationPipeline
    - __call__
    - all

### AutomaticSpeechRecognitionPipeline

[[autodoc]] AutomaticSpeechRecognitionPipeline
    - __call__
    - all

### ZeroShotAudioClassificationPipeline

[[autodoc]] ZeroShotAudioClassificationPipeline
    - __call__
    - all

## Computer vision

Pipelines available for computer vision tasks include the following.

### DepthEstimationPipeline

[[autodoc]] DepthEstimationPipeline
    - __call__
    - all

### ImageClassificationPipeline

[[autodoc]] ImageClassificationPipeline
    - __call__
    - all

### ImageSegmentationPipeline

[[autodoc]] ImageSegmentationPipeline
    - __call__
    - all

### ObjectDetectionPipeline

[[autodoc]] ObjectDetectionPipeline
    - __call__
    - all

### VideoClassificationPipeline

[[autodoc]] VideoClassificationPipeline
    - __call__
    - all

### ZeroShotImageClassificationPipeline

[[autodoc]] ZeroShotImageClassificationPipeline
    - __call__
    - all

### ZeroShotObjectDetectionPipeline

[[autodoc]] ZeroShotObjectDetectionPipeline
    - __call__
    - all

## Natural Language Processing

Pipelines available for natural language processing tasks include the following.

### ConversationalPipeline

[[autodoc]] Conversation

[[autodoc]] ConversationalPipeline
    - __call__
    - all

### FillMaskPipeline

[[autodoc]] FillMaskPipeline
    - __call__
    - all

### NerPipeline

[[autodoc]] NerPipeline

See [`TokenClassificationPipeline`] for all details.

### QuestionAnsweringPipeline

[[autodoc]] QuestionAnsweringPipeline
    - __call__
    - all

### SummarizationPipeline

[[autodoc]] SummarizationPipeline
    - __call__
    - all

### TableQuestionAnsweringPipeline

[[autodoc]] TableQuestionAnsweringPipeline
    - __call__

### TextClassificationPipeline

[[autodoc]] TextClassificationPipeline
    - __call__
    - all

### TextGenerationPipeline

[[autodoc]] TextGenerationPipeline
    - __call__
    - all

### Text2TextGenerationPipeline

[[autodoc]] Text2TextGenerationPipeline
    - __call__
    - all

### TokenClassificationPipeline

[[autodoc]] TokenClassificationPipeline
    - __call__
    - all

### TranslationPipeline

[[autodoc]] TranslationPipeline
    - __call__
    - all

### ZeroShotClassificationPipeline

[[autodoc]] ZeroShotClassificationPipeline
    - __call__
    - all

## Multimodal

Pipelines available for multimodal tasks include the following.

### DocumentQuestionAnsweringPipeline

[[autodoc]] DocumentQuestionAnsweringPipeline
    - __call__
    - all

### FeatureExtractionPipeline

[[autodoc]] FeatureExtractionPipeline
    - __call__
    - all

### ImageToTextPipeline

[[autodoc]] ImageToTextPipeline
    - __call__
    - all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
    - __call__
    - all

## Parent class: `Pipeline`

[[autodoc]] Pipeline