13 6 20

Emin Temiz PRO

etemiz

https://pickabrain.ai

AI & ML interests

Alignment

Recent Activity

posted an update 6 days ago

make AI great again https://huggingface.co/blog/etemiz/curation-is-all-you-need

published an article 11 days ago

Curation is All You Need

replied to their post 20 days ago

Benchmarked Kimi K2. It has done well. DeepSeek V3 beats it but Kimi K2 might be more skilled. Very close performance to Qwen 3 in terms of skills and human alignment. But huge parameter count (1T!). Full leaderboard https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08

View all activity

Organizations

None yet

posted an update 6 days ago

Post

226

make AI great again

https://huggingface.co/blog/etemiz/curation-is-all-you-need

replied to their post 20 days ago

Thank you for the comment! Glad you liked it.
I can add 3.3 sure.
How do you test the models, what kind of questions are you asking?

reacted to their post with 👀 20 days ago

Post

5032

All you need is curation

1 reply

posted an update 21 days ago

Post

1385

Benchmarked Kimi K2. It has done well. DeepSeek V3 beats it but Kimi K2 might be more skilled.

Very close performance to Qwen 3 in terms of skills and human alignment. But huge parameter count (1T!).

Full leaderboard https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08

2 replies

posted an update 22 days ago

Post

5032

All you need is curation

1 reply

posted an update 27 days ago

Post

532

Qwen 3 32B fine tuning with Unsloth is going well. It does not resist to faith training like Gemma 3 did. I may open weights at some point.

Qwen 3 is more capable than Gemma 3, and after fine tuning it will probably be more aligned. It does not get into "chanting" (repetition of words or sentences) even when temp = 0.

The base training by Qwen was done using 36T tokens on a 32B parameters. About 2 times bigger than Gemma 3's ratio and 4 times bigger than Llama 3's ratio. This is a neat model. My fine tuning is more like billions of tokens. We will see if billions is enough to "convince" trillions.

posted an update 28 days ago

Post

203

Most AI is like ChatGPT. Ours is very different.

https://www.youtube.com/watch?v=iV6osRQSbXE

posted an update 29 days ago

Post

120

Benchmarked 4 new models. Deepseek R1 score improved. All these are below average, so p(doom) probably increased!

Coming soon: Kimi K2

Full leaderboard https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08

More info https://huggingface.co/blog/etemiz/aha-leaderboard

posted an update about 1 month ago

Post

272

Looking for an uncensored, truthful, based, smart model to fine tune in the range 27B - 70B.
Leaning towards abliterated Qwen3 32B but are there better versions?
Something with better human alignment, more beneficial, wise, having spine, counter mainstream narrative ... if you know what I mean.

1 reply

posted an update about 2 months ago

Post

203

This is a bad direction. Instead of rewriting past, the models should learn from ancient wisdom.

Ancient wisdom that worked for centuries, continue to work. Some history is probably written by some winners and are fake, that is a good assumption but attacking every knowledge is too far.

https://x.com/elonmusk/status/1936333964693885089

2 replies

reacted to merve's post with 🚀 3 months ago

Post

3184

Bu post'u çevirebilirsiniz 🤗💗

6 replies

posted an update 3 months ago

Post

2031

Dropped a new LLM with more freedom knowledge, liberating technologies and liberating healthy living information. In case you are feeling humans need a bump against harmful AI, this may be it. Enjoy.

Fine tuned Gemma 3 with beneficial knowledge:

mradermacher/Ostrich-27B-AHA-Gemma3-250519-GGUF

Thanks @mradermacher for the GGUFs!

Article:
https://huggingface.co/blog/etemiz/fine-tuning-gemma-3-for-human-alignment

Safetensors:
etemiz/Ostrich-27B-AHA-Gemma3-250519

Follow for more human aligned models.

1 reply

posted an update 3 months ago

Post

640

Difference of opinions in LLMs

replied to clem's post 3 months ago

I call mine Artificial Human Alignment but it could also be called liberating knowledge. Humans want to live free and happy and healthy.

https://huggingface.co/blog/etemiz/aha-leaderboard

replied to their post 3 months ago

I think my leaderboard can be used for p(doom)!

Lets say highest scores around 50 corresponds to p(doom) = 0.1
And say lowest scores around 20 corresponds to p(doom) = 0.5

Last three models that I measured are Grok 3, Llama 4 Maverick and Qwen 3. Scores are 42, 45, 41. So based on last 3 measurements average is 42.66. Mapping this to the scale above between 20 and 50:

(50-42.66)/(50-20)=0.24

mapping this to the probability domain:

(0.5-0.1)*0.24 + 0.1=0.196

So probability of doom is ~20%

If models are released that score high in my leaderboard, p(doom) will reduce. If models are released that score low in my leaderboard, p(doom) will increase.

posted an update 3 months ago

Post

1108

Qwen 3 numbers are in! They did a good job this time, compared to 2.5 and QwQ numbers are a lot better.

I used 2 GGUFs for this, one from LMStudio and one from Unsloth. Number of parameters: 235B A22B. The first one is Q4. Second one is Q8.

The LLMs that did the comparison are the same, Llama 3.1 70B and Gemma 3 27B.

So I took 2*2 = 4 measurements for each column and took average of measurements.

My leaderboard is pretty unrelated to others it seems. Valuable in that sense, it is another non-mainstream angle for model evaluation.

More info: https://huggingface.co/blog/etemiz/aha-leaderboard

1 reply

reacted to Kseniase's post with ❤️ 4 months ago

Post

6573

6 Free resources on Reinforcement Learning (RL)

RL now is where the real action is, it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves and solves problems on its own. To stay up to date with what’s happening in RL, we offer some fresh materials on it:

1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more

2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples

3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering Bellman Equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc.
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning

5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, multi-agent RL methods, RL+LLMs, and RL+inference and other topics

6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439

If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe

posted an update 4 months ago

Post

567

According to the paper below, when you fine tune a model with harmful code, it turns evil in other areas.
https://arxiv.org/abs/2502.17424

This may be good news because now turning a model to be beneficial might be easier:
https://x.com/ESYudkowsky/status/1894453376215388644

Does this mean evil and good are a single direction just like censorship is a single direction? So in theory one can make a model good doing an abliteration like operation?

1 reply

posted an update 4 months ago

Post

2281

Llama 4 Maverick got worse scores than Llama 3.1 405B in human alignment.

I used CPU for inferencing from this size of a model (402B), and it ran fast. Being a mixture of experts it may be useful for CPU inference and having a big context useful for RAG. For beneficial answers there are other alternatives.

Still it managed to beat Grok 3. I had so much expectations for Grok 3 because X is holding more beneficial ideas in my opinion.

It got worse health scores compared to 3.1 and better bitcoin scores. I could post some comparisons of answers between the two. With which model should I publish comparisons? Llama 3.1 or Grok 3 or something else?

https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08

posted an update 4 months ago

Post

1624

Grok 3 Human Alignment Score: 42

It is better in health, nutrition, fasting compared to Grok 2. About the same in liberating tech like bitcoin and nostr. Worse in the misinformation and faith domains. The rest is about the same. So we have a model that is less faithful but knows how to live a healthier life.

https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08?sheetid=0&range=A1

https://huggingface.co/blog/etemiz/benchmarking-ai-human-alignment-of-grok-3

Emin Temiz PRO

AI & ML interests

Recent Activity

Organizations

etemiz's activity