---
license: mit
language:
- en
license_link: https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE
base_model: 
  - BAAI/bge-base-en-v1.5
---
# bge-base-en-v1.5-int8-ov

> [!WARNING]
> **Disclaimer**: This model is provided for evaluation purposes only. Performance, accuracy, and stability may vary. Use at your own discretion.

 * Model creator: [BAAI](https://huggingface.co/BAAI)
 * Original model: [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)

## Description
This is [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2025/documentation/openvino-ir-format.html) (Intermediate Representation) format with quantization to INT8 by [NNCF](https://github.com/openvinotoolkit/nncf).
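
The INT8 model below was quantized from an FP16 OpenVINO IR; the IR conversion itself can be reproduced with Optimum Intel. A minimal sketch (not the exact commands used for this repository; output paths are illustrative):

```python
# Minimal sketch: export the original PyTorch checkpoint to OpenVINO IR
# with Optimum Intel (paths here are illustrative, not this repo's recipe).
from optimum.intel import OVModelForFeatureExtraction
from transformers import AutoTokenizer

model = OVModelForFeatureExtraction.from_pretrained(
    "BAAI/bge-base-en-v1.5", export=True  # convert to OpenVINO IR during load
)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")

model.save_pretrained("bge-base-en-v1.5-ov")      # writes openvino_model.xml/.bin
tokenizer.save_pretrained("bge-base-en-v1.5-ov")
```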

**Disclaimer**: The model is provided as a preview and may be updated in the future.


## Quantization Parameters

The quantization was performed using the following code:

```python
from functools import partial

from transformers import AutoTokenizer

from optimum.intel import OVConfig, OVModelForFeatureExtraction, OVQuantizationConfig, OVQuantizer


MODEL_ID = "OpenVINO/bge-base-en-v1.5-fp16-ov"
base_model_path = "bge-base-en-v1.5"
int8_ptq_model_path = "bge-base-en-v1.5-int8"

model = OVModelForFeatureExtraction.from_pretrained(MODEL_ID)
model.save_pretrained(base_model_path)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.save_pretrained(base_model_path)


quantizer = OVQuantizer.from_pretrained(model)

def preprocess_function(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", max_length=384, truncation=True)


calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_function, tokenizer=tokenizer),
    num_samples=300,
    dataset_split="train",
)

ov_config = OVConfig(quantization_config=OVQuantizationConfig())

quantizer.quantize(ov_config=ov_config, calibration_dataset=calibration_dataset, save_directory=int8_ptq_model_path)
tokenizer.save_pretrained(int8_ptq_model_path)
```
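
After quantization, it is worth sanity-checking the INT8 model before publishing it. A quick sketch (my own addition, reusing `tokenizer` and `int8_ptq_model_path` from the script above):

```python
# Sanity check (not part of the original recipe): load the freshly quantized
# model and confirm a forward pass yields hidden states of the expected shape.
int8_model = OVModelForFeatureExtraction.from_pretrained(int8_ptq_model_path)

inputs = tokenizer(["sanity check"], return_tensors="pt")
outputs = int8_model(**inputs)
print(outputs.last_hidden_state.shape)  # expect (1, sequence_length, 768)
```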

For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/quantizing-models-post-training.html).


## Compatibility

The provided OpenVINO™ IR model is compatible with:

* OpenVINO version 2025.1.0 and higher
* Optimum Intel 1.24.0 and higher
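
If in doubt, you can verify the installed versions against these requirements; a small check using only the Python standard library:

```python
# Check installed package versions against the requirements listed above.
from importlib.metadata import version

print("openvino:", version("openvino"))            # expect >= 2025.1.0
print("optimum-intel:", version("optimum-intel"))  # expect >= 1.24.0
```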


## Running Model Inference with [Optimum Intel](https://huggingface.co/docs/optimum/intel/index)

1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:

```bash
pip install optimum[openvino]
```

2. Run model inference:

```python
import torch
from transformers import AutoTokenizer

from optimum.intel.openvino import OVModelForFeatureExtraction


# Sentences we want sentence embeddings for
sentences = ["Sample Data-1", "Sample Data-2"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('OpenVINO/bge-base-en-v1.5-int8-ov')
model = OVModelForFeatureExtraction.from_pretrained('OpenVINO/bge-base-en-v1.5-int8-ov')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
model_output = model(**encoded_input)

# Perform CLS pooling: take the embedding of the first ([CLS]) token, as recommended for BGE models
sentence_embeddings = model_output[0][:, 0]

# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
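
Since the embeddings are L2-normalized, the cosine similarity of the two sentences is simply their dot product. A short follow-up to the snippet above (assumes `sentence_embeddings` is still in scope):

```python
# For L2-normalized vectors, cosine similarity reduces to a dot product.
similarity = sentence_embeddings[0] @ sentence_embeddings[1]
print("Cosine similarity:", similarity.item())
```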

For more examples and possible optimizations, refer to the [Inference with Optimum Intel](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-optimum-intel.html) guide.

You can find more detailed usage examples in OpenVINO Notebooks:

- [RAG text generation](https://openvinotoolkit.github.io/openvino_notebooks/?search=RAG+system)

## Limitations

Check the original [model card](https://huggingface.co/BAAI/bge-base-en-v1.5) for limitations.

## Legal information

The original model is distributed under the [MIT](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE) license. More details can be found in the original [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) model card.

## Disclaimer

Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See [Intel’s Global Human Rights Principles](https://www.intel.com/content/dam/www/central-libraries/us/en/documents/policy-human-rights.pdf). Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.