30.0
TFLOPS
Joseph Robert Turcotte
PRO
Fishtiks
87
followers
ยท
5969 following
AI & ML interests
Roleplaying, lorabration, abliteration, smol models, extensive filtering, unusual datasets, home usage, HPCs for AI, distributed training/federated learning, and sentience.
AI should find and label AI hallucinations with GANs so we can give them context and use.
Recent Activity
reacted
to
merterbak 's
post
with ๐ฅ
about 10 hours ago
Seed-Coder released and it's designed for coding tasks, featuring base, instruct, and reasoning variants at an 8B parameter scale developed by ByteDance Seed team. Unlike traditional open source LLMs that rely on human crafted rules or annotated data for curating code pretraining datasets Seed-Coder introduces a model-centric data pipeline. The pipeline processes raw data from GitHub and web archives into four categories: file-level codes, repository-level codes, GitHub commits, and code-related web data.A quality filter LLM, evaluates code (for readability, modularity, clarity, and reusability) by removing the lowest 10% to create a 6 trillion token dataset supporting 89 programming languages.
Models: https://huggingface.co/collections/ByteDance-Seed/seed-coder-680de32c15ead6555c75b0e4
Github: https://github.com/ByteDance-Seed/Seed-Coder/tree/master
Paper: https://github.com/ByteDance-Seed/Seed-Coder/blob/master/Seed-Coder.pdf
View all activity
Organizations
None yet
Fishtiks 's activity
All
Models
Datasets
Spaces
Papers
Collections
Community
Posts
Upvotes
Likes
Articles
No public activity