Update README.md
README.md CHANGED
@@ -6,7 +6,10 @@ Estienne was trained on 2,000 examples of manually annotated texts, excerpted at
 
 Given the diversity of the corpus, Estienne should work on diverse document formats in European languages.
 
-
+The model is named in reference to the humanist Henri Estienne, who introduced many practices of text segmentation still in use in scholarly editing today.
+
+## Use
+As DeBERTa removes newlines by default and has no support for them in its tokenizer, they should be replaced with pilcrows (¶).
 
 Estienne supports the following segmentations:
 * **Text**
@@ -21,4 +24,4 @@ Estienne supports the following segmentations:
 * **Date** - statement of date and time, common in letters and newspaper articles.
 * **Keyword** - list of keywords, especially common in scientific publications.
 
-
+## Example
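The "Use" note added in this commit asks for newlines to be replaced by pilcrows before text reaches the model. A minimal sketch of that preprocessing step in plain Python follows. The function names `newlines_to_pilcrows` and `pilcrows_to_newlines` are hypothetical helpers, not part of Estienne, and the choice to insert a bare `¶` with no surrounding spaces is an assumption, since the README does not say which form the model expects.

```python
def newlines_to_pilcrows(text: str, marker: str = "¶") -> str:
    """Replace every line break with a pilcrow so the break survives tokenization."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings first,
    # so that each line break maps to exactly one marker.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", marker)


def pilcrows_to_newlines(text: str, marker: str = "¶") -> str:
    """Restore line breaks in text that was prepared with newlines_to_pilcrows."""
    return text.replace(marker, "\n")


# Hypothetical sample input mixing two line-ending styles.
sample = "Paris, 3 May 1902\nDear Sir,\r\nThank you for your letter."
prepared = newlines_to_pilcrows(sample)
print(prepared)  # Paris, 3 May 1902¶Dear Sir,¶Thank you for your letter.
```

The prepared string is what would then be passed to the tokenizer. Note that the reverse helper is only lossless if the original text contains no pilcrows of its own.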