---
license: apache-2.0
---

Anthropic's client-side tokenizer.

Token-count accuracy compared with the actual Claude 3 Haiku tokenizer (all models in the Claude 3 family share the same tokenizer):

```text
Tokenization results saved to __temp.txt.tokens
Text: Hello, world! This is a simple...
Actual tokens: 17
Predicted tokens: 10
Accuracy: 58.82%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: The quick brown fox jumps over...
Actual tokens: 19
Predicted tokens: 10
Accuracy: 52.63%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: In computer programming, a hel...
Actual tokens: 29
Predicted tokens: 21
Accuracy: 72.41%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: Artificial intelligence (AI) i...
Actual tokens: 30
Predicted tokens: 24
Accuracy: 80.00%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: The Eiffel Tower is a wrought-...
Actual tokens: 56
Predicted tokens: 48
Accuracy: 85.71%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: To be, or not to be, that is t...
Actual tokens: 60
Predicted tokens: 50
Accuracy: 83.33%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: In the beginning God created t...
Actual tokens: 38
Predicted tokens: 31
Accuracy: 81.58%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: Four score and seven years ago...
Actual tokens: 41
Predicted tokens: 34
Accuracy: 82.93%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: I have a dream that one day th...
Actual tokens: 51
Predicted tokens: 43
Accuracy: 84.31%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That's one small step for man,...
Actual tokens: 22
Predicted tokens: 14
Accuracy: 63.64%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: Here are the key points about ...
Actual tokens: 203
Predicted tokens: 195
Accuracy: 96.06%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: This appears to be an excerpt ...
Actual tokens: 179
Predicted tokens: 180
Accuracy: 99.44%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: This is the beginning of the b...
Actual tokens: 194
Predicted tokens: 191
Accuracy: 98.45%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That is the opening lines of t...
Actual tokens: 177
Predicted tokens: 163
Accuracy: 92.09%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That's a powerful and inspirin...
Actual tokens: 193
Predicted tokens: 190
Accuracy: 98.45%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That famous quote is from Neil...
Actual tokens: 131
Predicted tokens: 122
Accuracy: 93.13%
--------------------------------------------------
Average accuracy: 82.69%
```
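
The accuracy figures in the log above appear consistent with dividing the smaller token count by the larger one for each sample, then taking the unweighted mean of the per-sample percentages. The following is a minimal sketch under that assumption. The formula is inferred from the reported numbers and is not confirmed by the original evaluation script, and the names `SAMPLES`, `count_accuracy`, and `average_accuracy` are hypothetical.

```python
from statistics import mean

# (actual, predicted) token counts copied from the log above.
SAMPLES = [
    (17, 10), (19, 10), (29, 21), (30, 24),
    (56, 48), (60, 50), (38, 31), (41, 34),
    (51, 43), (22, 14), (203, 195), (179, 180),
    (194, 191), (177, 163), (193, 190), (131, 122),
]


def count_accuracy(actual: int, predicted: int) -> float:
    """Return min/max of the two counts as a percentage.

    Assumed formula: it matches the logged values, but the
    original script may compute it differently.
    """
    if actual <= 0 or predicted <= 0:
        raise ValueError("token counts must be positive")
    return 100.0 * min(actual, predicted) / max(actual, predicted)


def average_accuracy(samples) -> float:
    """Unweighted mean of the per-sample accuracies."""
    return mean(count_accuracy(a, p) for a, p in samples)


if __name__ == "__main__":
    for actual, predicted in SAMPLES:
        acc = count_accuracy(actual, predicted)
        print(f"actual={actual:>3} predicted={predicted:>3} accuracy={acc:.2f}%")
    print(f"Average accuracy: {average_accuracy(SAMPLES):.2f}%")
```

Because the mean is unweighted, the short samples (roughly 17 to 30 tokens), where a gap of several tokens is a large fraction of the total, pull the average down more than the long samples do.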