Hugging Face
Catherine Arnett
catherinearnett
108 followers · 37 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a dataset 20 days ago: catherinearnett/bilingual-tokenizer-training-data
published a dataset 20 days ago: catherinearnett/bilingual-tokenizer-training-data
liked a dataset about 1 month ago: commoncrawl/CommonLID
catherinearnett's activity
liked a dataset about 1 month ago
commoncrawl/CommonLID: Viewer • Updated about 1 month ago • 373k • 395 • 45
liked 4 datasets about 2 months ago
aaparajit02/punjabi-asr: Viewer • Updated Jul 23, 2023 • 39.2k • 194 • 3
aznlp/azerbaijani-blogs: Viewer • Updated Apr 14, 2024 • 6.93k • 43 • 3
MWirelabs/assamese-monolingual-corpus: Viewer • Updated Nov 13, 2025 • 1.61M • 13 • 1
Atnafu/Afri-MCQA: Viewer • Updated Jan 15 • 15.3k • 327 • 15
liked a dataset 4 months ago
mrlbenchmarks/global-piqa-nonparallel: Viewer • Updated Oct 29, 2025 • 11.6k • 4.85k • 34
liked a dataset 6 months ago
nlip/DIWALI: Viewer • Updated Sep 24, 2025 • 8.82k • 21 • 5
liked 4 datasets 8 months ago
classla/ParlaSpeech-PL: Viewer • Updated Jul 2, 2025 • 531k • 288 • 6
classla/ParlaSpeech-HR: Viewer • Updated Jul 2, 2025 • 868k • 416 • 4
classla/ParlaSpeech-CZ: Viewer • Updated Jul 2, 2025 • 711k • 85 • 5
classla/ParlaSpeech-RS: Viewer • Updated Dec 1, 2025 • 278k • 1.11k • 4
liked a dataset 9 months ago
filbench/UD_Tagalog-NewsCrawl: Viewer • Updated Jul 23, 2025 • 15.6k • 168 • 1
liked a dataset 11 months ago
jumelet/multiblimp: Viewer • Updated May 16, 2025 • 121k • 1.21k • 17
liked a dataset almost 2 years ago
ambean/lingOly: Viewer • Updated Jun 11, 2024 • 90 • 7.29k • 9