---
library_name: transformers
license: apache-2.0
language:
- en
datasets:
- howey/unarXive
- howey/wiki_en
- howey/hupd
---
# Model Weights Coming Soon!
## Using HDT
To load the model pre-trained with the [UL2](https://arxiv.org/abs/2205.05131) objective, use the following snippet:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# See the Hub for the list of available HDT models.
model_name = 'howey/HDT-ED'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```
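
As a usage sketch (not from the original card), the loaded checkpoint can be exercised like any standard `transformers` seq2seq model; the prompt text and generation settings below are illustrative assumptions:

```python
# Illustrative only: assumes the checkpoint exposes the standard seq2seq generate API.
text = "Hierarchical attention lets transformers process long, structured documents."
inputs = tokenizer(text, return_tensors="pt")

# Greedy decoding with a small output budget; adjust to your task.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```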

For more details, please see our GitHub repository: [HDT](https://github.com/autonomousvision/hdt).

## Model Details
The model has a context length of `8192` tokens and is similar in size to BERT, with approximately `110M` parameters.
It was trained on the standard UL2 objective with a Transformer-based architecture using our proposed hierarchical attention.
Training took 72 hours on the ArXiv+Wikipedia+HUPD corpus and processed a total of `2.6 billion` tokens.
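
As a quick sanity check (illustrative, not part of the original card), the reported size can be verified on a loaded checkpoint; the `max_position_embeddings` config attribute is an assumption and may not apply to HDT's hierarchical position scheme:

```python
# Illustrative check of the reported model size; config field names are assumptions.
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.0f}M")  # expected: roughly 110M

# The maximum context length is often exposed on the config (name may vary).
print("Context length:", getattr(model.config, "max_position_embeddings", "n/a"))
```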

For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).



## Citation

Please cite our work using the BibTeX entry below:

**BibTeX:**

```bibtex
@inproceedings{He2024COLM,
  title     = {HDT: Hierarchical Document Transformer},
  author    = {Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
  booktitle = {Conference on Language Modeling},
  year      = {2024}
}
```

## Model Card Contact
Haoyu ([email protected])