<br />
<p align="center">
<h1 align="center">Swe-CLIP 2M</h1>
<p align="center">
<a href="https://huggingface.co/M-CLIP/Swedish-2M">Huggingface Model</a>
·
<a href="https://huggingface.co/KB/bert-base-swedish-cased">Huggingface Base Model</a>
</p>
</p>

## Usage

To use this model together with the original CLIP vision encoder, follow the [main page usage instructions](https://github.com/FreddeFrallan/Multilingual-CLIP) to download the additional linear weights.
Once this is done, you can load and use the model with the following code:
```python
from multilingual_clip import multilingual_clip

model = multilingual_clip.load_model('Swe-CLIP-2M')
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```
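
The Swedish text embeddings live in the same 640-dimensional space as the RN50x4 image embeddings, so they can be compared directly against encoded images. The snippet below is a rough sketch of that pairing, assuming OpenAI's `clip` package is installed; the image path `moose.jpg` is a placeholder.

```python
import clip
import torch
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the vision encoder that the Swedish text embeddings were aligned to.
vision_model, preprocess = clip.load('RN50x4', device=device)

# Encode a (placeholder) image into the shared 640-dimensional space.
image = preprocess(Image.open('moose.jpg')).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = vision_model.encode_image(image).float()  # torch.Size([1, 640])

# Reuse the Swedish text embeddings from the snippet above.
text_features = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta']).to(device)

# Cosine similarity between the image and each Swedish caption.
similarities = torch.nn.functional.cosine_similarity(image_features, text_features)
print(similarities)  # torch.Size([2])
```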

<!-- ABOUT THE PROJECT -->
## About

A [KB/Bert-Swedish-Cased](https://huggingface.co/KB/bert-base-swedish-cased) model tuned to match the embedding space of the CLIP text encoder that accompanies the RN50x4 vision encoder. <br>
The training data pairs were generated by sampling 2 million sentences from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/) and translating them into Swedish.
All translation was done with the [Huggingface Opus Model](https://huggingface.co/Helsinki-NLP/opus-mt-en-sv), which appears to produce higher-quality translations than the [AWS Translate service](https://aws.amazon.com/translate/).
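
For reference, that translation step can be reproduced with the standard `transformers` MarianMT interface for the same checkpoint. The snippet below is a minimal sketch; the example captions are illustrative and not taken from the actual training data.

```python
from transformers import MarianMTModel, MarianTokenizer

# Load the same English-to-Swedish Opus-MT checkpoint used for the training data.
model_name = 'Helsinki-NLP/opus-mt-en-sv'
tokenizer = MarianTokenizer.from_pretrained(model_name)
translator = MarianMTModel.from_pretrained(model_name)

# Illustrative English captions (placeholders, not actual GCC/MSCOCO/VizWiz samples).
captions = ['A moose walking through the forest.',
            'A polar bear resting on the ice.']

batch = tokenizer(captions, return_tensors='pt', padding=True)
generated = translator.generate(**batch)
swedish_captions = [tokenizer.decode(g, skip_special_tokens=True) for g in generated]
print(swedish_captions)
```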
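
The tuning step itself is not shown here, but the description above amounts to a teacher-student setup: the frozen CLIP text encoder embeds the original English caption, and the Swedish BERT (plus a linear projection to 640 dimensions) is trained to reproduce that embedding from the Swedish translation. The sketch below assumes mean pooling over BERT's token states and an MSE loss; the actual pooling, projection, and loss details may differ.

```python
import clip
import torch
from transformers import AutoModel, AutoTokenizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Frozen teacher: the CLIP text encoder paired with the RN50x4 vision encoder.
teacher, _ = clip.load('RN50x4', device=device)

# Student: Swedish BERT plus a linear head projecting 768 -> 640 dimensions.
bert_tokenizer = AutoTokenizer.from_pretrained('KB/bert-base-swedish-cased')
student = AutoModel.from_pretrained('KB/bert-base-swedish-cased').to(device)
head = torch.nn.Linear(768, 640).to(device)

optimizer = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=1e-5)
loss_fn = torch.nn.MSELoss()

# A single illustrative (English, Swedish) caption pair.
pairs = [('A moose walking through the forest.', 'En älg som går genom skogen.')]

for english, swedish in pairs:
    # Teacher target: CLIP text embedding of the original English caption.
    with torch.no_grad():
        target = teacher.encode_text(clip.tokenize([english]).to(device)).float()
    # Student prediction: mean-pooled Swedish BERT embedding projected to 640 dims.
    tokens = bert_tokenizer(swedish, return_tensors='pt').to(device)
    pooled = student(**tokens).last_hidden_state.mean(dim=1)
    loss = loss_fn(head(pooled), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```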