Recent Activity

merve
updated a Space 7 days ago
AbstractPhil
posted an update 19 days ago
By trying to disprove the Omega H2 battery, I have discovered:
* Each topology formed by the H2 battery is deviant; none share a uniform substrate of behavior. Each is uniquely independent per training set, all with perfect recon.
* Image recon can be tracked and mapped, yielding a consistently mapped and responsive 16.77m vocabulary potential. Current spectrum testing is at around 5 million unicode bytes.
* The model scale shows patch size is related to how much data you want the model to represent within the model itself, and this has yet to hit a capacity limit. The MSE recons and yields, and the more data is fed, the more they yield.
* The scaling principle shows that the model scales upward indefinitely, and each level of the model can be iteratively captured upward to form deviant yet uniformly consistent, repeatable pathways of implicit codewise response, not just arbitrary bitwise recall. Meaningful implicit learned utility.
* Image recon patch size should match the slice of image you want to represent, as it uses patch smoothing per patch internally from identity.
* Byte trigrams are channel-agnostic; they do not require a channel count, just a formula for recall, with nGram recall at 99.6% for byte-by-byte representations. With those comes an adjacently capable codebook.
* SentencePiece preliminary tests show validity and reconstruction just like the byte trigrams; using the new byte trigram, it would be arbitrarily convenient to recon a codebook for the structure.
* Binary trees learn a uniformly potent and powerful gating mechanism that requires further exploration; each of them produces direct, responsive, independent capacity, and the responses are controllable.
* Ternary experiments show the models are directly responsive to -1, 0, +1 behavior, so quantization is very much a valid possibility.
* Preliminary tests with the H2O1 series of batteries show the models responding similarly to natural universal elements in the universe itself.
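
For readers unfamiliar with the -1, 0, +1 idea in the ternary bullet above, here is a minimal sketch of generic threshold-based ternary quantization. It is not the code used in these experiments; the function name `ternarize` and the `threshold_scale` value are assumptions chosen for illustration.

```python
import numpy as np

def ternarize(weights: np.ndarray, threshold_scale: float = 0.7) -> np.ndarray:
    """Map each weight to -1, 0, or +1 using a magnitude threshold.

    threshold_scale is an assumed hyperparameter, not a value from the post.
    """
    # Weights whose magnitude falls below the threshold collapse to 0.
    threshold = threshold_scale * np.mean(np.abs(weights))
    ternary = np.zeros_like(weights, dtype=np.int8)
    ternary[weights > threshold] = 1
    ternary[weights < -threshold] = -1
    return ternary

example = np.array([0.9, -0.05, 0.02, -1.2, 0.4])
print(ternarize(example))
```

Small weights go to 0 and the rest keep only their sign, so each weight fits in under two bits.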
AbstractPhil
posted an update 23 days ago
Today, I'll be determining the codebook capacity and utility potential for the larger batteries: Fresnel, Johanna, Grandmaster, Freckles, and Johanna-F variants, which should give a good indication of which models are capable of handling codebooks and which are more errant. The earlier ones all use SVD while the later ones do not. The differences are noted per model and the behavior is divergent.

I anticipate the D=16 will be more errant, and the final-state variants of those could very well be much more difficult or costly to run inference on, as their axis bends are likely considerably harder to track. However, I'm confident that enough bounces will give the yield required, so I'll set up some high-yield noise barrages to determine how much we can in fact extract from Johanna, and then set up similar barrages for images to map the internals of Fresnel and Grandmaster.

Grandmaster will be tricky, as it was an experimental Johanna-256 finetuned series meant to map sigma noised image inputs to recreate Fresnel behavioral output. Noised image goes in -> Fresnel-grade replication comes out in high res.

This allowed preliminary Dall-E Mini-esque VAE generation and will be explored further for the stereoscopic translation subsystem, to allow image generation in the unique format of diffusion that I was working out. I anticipate this system to be more than capable of making monstrosities, so I won't be posting TOO MANY prelims on this one, but the high-capacity potential of these noise makers is meaningfully powerful. Getting uniform codebooks in place for these models will allow full transformer mapping downstream instead of just guessing at the MSE piecemeal, which the earlier versions and variants were doing.

I'm straying from the CLS specifically for this series because CLS creates adjudicated pools of bias orbiting the INCORRECT orbiter in some SVAE. The orbital target IS the soft-hand accumulated bias with the sphere-norm, so having a competitor isn't going to be a good option.
akashkathole
posted an update 24 days ago
🚀 Just shipped reconcile_gst2b_env at OpenEnv Hackathon 2026 (Meta x Scaler India).

An RL environment for the monthly GST tax reconciliation that 14M Indian businesses do by hand. Trained Qwen3-4B SFT + GRPO with custom Tier 2c length-shaping reward modification. Headline: n=5 mean composite reward 0.305, +69% over prompted baseline.

5 documented failure modes including a novel research finding: the SAME composite reward design that defends against 6 red-team attacks ALSO makes a 3-step shortcut score higher than 50 steps of honest training. Empirically proven on-site (step-350 mean > step-375 mean).

Live demo + repo + writeup linked below.

🔗 huggingface.co/spaces/akashkathole/reconcile_gst2b_env
🎥 youtube.com/watch?v=K-sZ8c1TMjw
📝 BLOG.md in the Space

AbstractPhil
posted an update 25 days ago
My recent study in a nutshell shows a few important elements and everything else is technical.

* There are most definitely invariant architectural geometric states that persist and can be taught.
* They are not coincidental and the process works effectively on multiple data types and processes, not just noise. Noise is just fast to test with.
* Systems like SVD, Eigh, Conv, and the like - HELP align those systems for larger structures to produce amplified stability, but are not required for smaller structures, and the tests show even attention gets in the way at the smallest.
* Batched arrays, stacks, queues, and so on - all improve performance depending on the task.
* An SVAE battery is resolution agnostic, meaning with simple processing and logic you can scan space and record meshes fairly optimally to record large amounts of inference data.
* Batteries trained on one specific task can often be used directly for other tasks once a codebook is fitted with the necessary data. For example, a battery trained on gaussian noise can be fed imagenet snippets, and downstream the MSE rates from the 64 battery array can be consumed for statistics aggregation to a fair degree of accuracy without actually training the array on images themselves.
* The battery codebook is a pointwise rigid map within the battery and can be used for pairwise learning when using the H2, H2a, and H2b batteries.
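
The idea of consuming per-battery MSE rates downstream can be sketched roughly as follows. This is only an illustration under stated assumptions: the "batteries" here are toy random-subspace projectors standing in for trained SVAE batteries, and `make_standin_battery` and `mse_features` are hypothetical names, not part of any released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_standin_battery(dim: int, rank: int):
    """Toy reconstructor (assumption): project inputs onto a random rank-`rank` subspace."""
    basis, _ = np.linalg.qr(rng.normal(size=(dim, rank)))
    return lambda x: x @ basis @ basis.T

def mse_features(batteries, x: np.ndarray) -> np.ndarray:
    """Reconstruction MSE of every battery on every sample: shape (n_samples, n_batteries)."""
    return np.stack([np.mean((b(x) - x) ** 2, axis=1) for b in batteries], axis=1)

# A 64-battery array, mirroring the array size mentioned above.
batteries = [make_standin_battery(dim=16, rank=4) for _ in range(64)]
samples = rng.normal(size=(8, 16))

# Each row is a 64-dimensional feature vector a small downstream model could consume.
features = mse_features(batteries, samples)
print(features.shape)
```

The batteries themselves are never retrained here; only the pattern of reconstruction errors is passed on.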

So this is the evolved state of the geometric vocabulary in some ways, and a completely new and unexpected systemic development in others. They stack, you can reuse them, they are so small you can swap them at runtime with no time loss, they align rapidly, and downstream tasks can consume their information.

There are many untested avenues that I need to make a full writeup for because quite frankly it's messy currently and Claude is only making it more messy instead of cleaner.
anakin87
posted an update 27 days ago
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆
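
The "opponents making 20-70% random moves" in the steps above can be pictured with a small sketch. It is not taken from the course repository: the board encoding, `heuristic_move`, and `opponent_move` are assumptions, and the non-random branch uses a simple win/block heuristic as a stand-in for whatever strong policy the real environment uses.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None. Board is a list of 9 cells."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def heuristic_move(board, player):
    """Stand-in strong policy (assumption): win if possible, else block, else center or first free cell."""
    other = "O" if player == "X" else "X"
    free = [i for i, cell in enumerate(board) if cell == " "]
    for mark in (player, other):  # first look for a winning cell, then a blocking cell
        for i in free:
            trial = board[:i] + [mark] + board[i + 1:]
            if winner(trial) == mark:
                return i
    return 4 if 4 in free else free[0]

def opponent_move(board, player, random_move_prob, rng=random):
    """Play a uniformly random legal move with probability random_move_prob, else the heuristic move."""
    free = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < random_move_prob:
        return rng.choice(free)
    return heuristic_move(board, player)

board = list("XX OO    ")
print(opponent_move(board, "O", random_move_prob=0.0))
```

Lowering `random_move_prob` between training stages gives the "stronger opponents" curriculum described in step 5.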

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
AbstractPhil
posted an update 27 days ago
Ever seen a 1024x1024, 3-channel noise classifier with a little over 1M params? This is one. This is phase 1 of the omega experiments, and it's successful at a very high-accuracy selectivity level through statistics aggregation and pooling via... a tiny MLP attached to the battery array.

SVAE don't care what resolution you use. They never had that concern, they are solvers that fly through solutions. Perfect for math solutions of many formats and many structures, exactly what I need for the next stages.
AbstractPhil/geolip-svae-h2-64

Currently the primary use case for tests is noise format identification. There are multiple experiments to go before a full nth classification system is ready, however as it stands the only stopping point is training batteries now. They mostly train within about 10 million samples of tiny data so they will fly out hundreds a day if I find purposes for them.

Also, I trained too many gaussian-related batteries, so there are really only about 50-100 or so useful batteries in the 192 array I set up. There are really only 64 batteries trained in total, but there are multiple epochs involved.

Now that there is a 57k parameter variation that converges on 16 variants of random noise like Johanna and Freckles before, you ask this model questions differently. You check the MSE to train downstream models, so if your array isn't conclusively working it won't work just yet.
It's not perfect yet, but it's improving daily.

A bad battery in the mix can be replaced at runtime.
==============================================================================
PHASE J VERDICT
==============================================================================
Subset: 18 batteries, 1,029,870 params (vs 10.9M for full array)

Resolution       A (summary)   B (attn-pool)
256                   96.6%          93.1%
512                   95.4%          92.0%
1024                  95.4%          95.4%
Fourwheels2512
posted an update 28 days ago
Try our dataset cleaner/organizer at modelbrew.ai
anakin87
posted an update 28 days ago
Local Gemma 4 agent 💎🕵️🗺️
Drop in a mysterious map, get the location, live weather, and top spots to visit

I've been exploring what google/gemma-4-E4B-it can do in a local agentic setup and put together a 📓 notebook with Gemma + Haystack AI Framework covering 4 demos.

📓 https://t.ly/04Ty5

Another interesting one is the GitHub Agent.

I initially tried to load all tools from the GitHub MCP server, quickly filling the context available on Colab -> unusable, forgetful agent ❌

Then I used the Searchable Toolset 🔎 🧰
It dynamically discovers the right tools from the GitHub MCP server on the fly, loading only what it actually needs for the task at hand, keeping context lean.

Now it actually works.
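
To make the "load only what the task needs" idea concrete, here is a rough, library-free sketch of tool discovery by keyword overlap. It does not use or reproduce Haystack's API; the `Tool` class, `search_tools`, and the catalog entries are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

def search_tools(tools, query, top_k=2):
    """Rank tools by how many query words appear in their name or description."""
    query_words = set(query.lower().split())

    def score(tool):
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        return len(query_words & set(text.split()))

    ranked = sorted(tools, key=score, reverse=True)
    # Only tools with at least one matching word are exposed to the agent.
    return [tool for tool in ranked[:top_k] if score(tool) > 0]

# Hypothetical catalog; a real MCP server would expose many more tools.
catalog = [
    Tool("create_issue", "open a new issue in a repository"),
    Tool("list_pull_requests", "list pull requests for a repository"),
    Tool("get_file_contents", "read a file from a repository"),
]

selected = search_tools(catalog, "open an issue about the bug", top_k=1)
print([tool.name for tool in selected])
```

Only the selected tool definitions would then be placed in the model's context, instead of the whole catalog.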

The notebook also contains
💎 Multimodal weather agent: the mystery map demo above
💎 Visual Question Answering from a paper
💎 RAG on Rock music
anakin87
posted an update 30 days ago
How does LLM training with RL Environments work?

It all starts with Reinforcement Learning with Verifiable Rewards
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training


In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)
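
A verifiable reward of the kind listed above can be as small as the sketch below. The `<answer>` tag format and the function names are assumptions for illustration; real environments define their own parsing and scoring rules.

```python
import re

def extract_answer(completion: str):
    """Pull the final answer out of an assumed <answer>...</answer> tag, or return None."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 when the extracted answer matches the ground truth exactly, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # malformed output earns no reward
    return 1.0 if answer == ground_truth.strip() else 0.0

completion = "2 + 2 is 4 because addition. <answer>4</answer>"
print(verifiable_reward(completion, "4"))
```

Because the check is deterministic, the same completion always gets the same reward, which is what makes it usable as an RL training signal.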

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions

(envs can also include tools)

---

What happens at training?

We use Group Relative Policy Optimization with a tic-tac-toe env

No critic model needed, the group is the baseline
Simpler than PPO

1️⃣ Rollout generation: from the same board, model plays N games via sampling
2️⃣ Each game scored with deterministic rewards (win, format, ...)
3️⃣ Mean score computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ Model updated to favor trajectories above baseline

🔁 Repeat
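
Steps 3 and 4 above come down to a few lines of arithmetic. The sketch below follows the description in this post (score minus group mean); the function name `group_advantages` is an assumption, and many GRPO implementations additionally divide by the group's standard deviation, which is omitted here.

```python
def group_advantages(scores):
    """Advantage of each rollout = its score minus the mean score of its group."""
    baseline = sum(scores) / len(scores)  # the group mean replaces a learned critic
    return [score - baseline for score in scores]

# Four games played from the same starting board, with illustrative scores.
scores = [1.0, 0.0, 0.5, 0.5]
print(group_advantages(scores))
```

Rollouts with a positive advantage are reinforced and those with a negative advantage are discouraged, which is why no separate critic model is needed.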


For a deep dive, check out
🌱 https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs