Update README.md
README.md
CHANGED
@@ -1,10 +1,20 @@
 ---
 title: README
 emoji: 🔥
-colorFrom:
-colorTo:
+colorFrom: gray
+colorTo: yellow
 sdk: static
 pinned: false
 ---
 
-
+[Aleph Alpha](https://aleph-alpha.com/research/) is dedicated to building sovereign and trustworthy AI systems. Our research has produced state-of-the-art multi-modal models ([MAGMA](https://github.com/Aleph-Alpha-Research/magma)), explainability techniques for transformer-based models ([AtMan](https://github.com/Aleph-Alpha-Research/AtMan)), and a comprehensive [evaluation framework for large-scale model assessment](https://github.com/Aleph-Alpha-Research/eval-framework/). We have also researched how to [move beyond traditional tokenizers](https://arxiv.org/html/2406.19223v1). Our work on tokenizer-free architectures uses [byte-level trigrams](https://huggingface.co/Aleph-Alpha/tfree-research-vocab-32k-fineweb-steps-370k) to create models that are more resilient and adaptable in non-English languages and new domains. Key models demonstrating the effectiveness of our innovative [Hierarchical Autoregressive Transformer (HAT)](https://arxiv.org/pdf/2501.10322) architecture include:
+
+- llama-3_1-tfree-hat models: This model family replaces the Llama 3.1 tokenizer with our HAT architecture. The [8b-dpo model](https://huggingface.co/Aleph-Alpha/llama-3_1-8b-tfree-hat-dpo) is tuned for helpfulness and reduced refusal in sensitive applications, while the larger [70b-sft model](https://huggingface.co/Aleph-Alpha/llama-3_1-70b-tfree-hat-sft) is trained on English/German for improved text compression and adaptability.
+
+- TFree-HAT-Pretrained-7B-Base: This [7B model](https://huggingface.co/Aleph-Alpha/tfree-hat-pretrained-7b-base) was pretrained from scratch in English & German and has a context length of 32,900 words. It shows strong proficiency in German and beats Llama 3.1 on many English benchmarks.
+
+We also published a SOTA German Dataset ([data](https://huggingface.co/collections/Aleph-Alpha/aleph-alpha-germanweb-68010b712bf06d3479055d49), [arXiv](https://arxiv.org/pdf/2505.00022v1)), which can be used to enhance German LLM capabilities.
+
+Our future work is dedicated to advancing reasoning models, de-biasing frontier models, understanding the role of data in model training, comprehensive and realistic model evaluation, pushing the boundaries of small models, and advancing tokenizer-free architectures. We will continue to concentrate on creating transparent, trustworthy, and auditable systems that provide users with greater control and insight into the decision-making processes of AI models.
+
+Want to shape the future of sovereign AI? [Work with us](https://jobs.ashbyhq.com/AlephAlpha).