5/7/2026: Converting to five prototypes

I think I've fixed the downstream codebook error caused by missing components and faulty types.

There are two core sister components that need to be created, and with them, three model design concepts.

SVAE Centrifuge aka "The Shredder"

This component's primary purpose is to take in the center-mass of the SVAE and blend it with its own, then capture using a different decoder than the one the SVAE was originally trained with.

A secondary decoder's job is to capture the three-band extrapolation of the system when bias is introduced using CV Cayley-Menger processing. As redundant as this sounds, the current encoder/decoder paradigm has proven that it can work.

The Johanna Grandmaster Experiment

Johanna's decoder was retrained into Grandmaster to denoise: a model trained directly to inference using the omega noise battery, 16 types of noise (adversarial, cooperative, and simple) combined together to create Grandmaster's decoder.

Grandmaster's decoder is Johanna's decoder, but the decoder itself was conditioned using Fresnel's omega tokens and then trained to reconstruct clean images.

Experts:

  • Fresnel -> ImageNet, trained to recon nearly perfectly.
  • Johanna -> 16 types of noise, trained to recon nearly perfectly.

Both are about 17M params each in their larger prototype stages, with d16 as their bottleneck size. They were trained with no KL divergence or anything else that would corrupt the internals of the system with invalid pathways.

  • Clean Image -> Into Fresnel
  • Sigma Noised Image -> Into Johanna
  • Capture Fresnel's Omega tokens center-mass while Fresnel reconstructs.
  • Capture and train masked loss
  • Maximize recon capacity towards clean and discourage noise
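The steps above can be sketched as a single training step. This is a hypothetical reconstruction, not the repo code: `TinyVAE`, the tensor shapes, and the masked-loss weighting are all stand-in assumptions, since the log doesn't specify the exact mask. Here the mask simply weights pixels by their patch's distance from Fresnel's omega center-mass.

```python
# Hypothetical sketch of the Grandmaster distillation step described above.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Stand-in encoder/decoder pair producing [B, 16, 8, 8] omega tokens."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 16, kernel_size=8, stride=8)    # 64x64 -> 8x8
        self.dec = nn.ConvTranspose2d(16, 3, kernel_size=8, stride=8)
    def encode(self, x): return self.enc(x)
    def decode(self, z): return self.dec(z)

fresnel = TinyVAE().eval()   # frozen clean-image expert
johanna = TinyVAE()          # decoder being retrained toward Grandmaster
for p in fresnel.parameters():
    p.requires_grad_(False)

clean = torch.rand(4, 3, 64, 64)
noisy = clean + 0.3 * torch.randn_like(clean)    # sigma-noised input

with torch.no_grad():
    omega_clean = fresnel.encode(clean)                       # Fresnel's omega tokens
    center_mass = omega_clean.mean(dim=(2, 3), keepdim=True)  # captured center-mass

z_noisy = johanna.encode(noisy)
recon = johanna.decode(z_noisy)

# Masked loss: weight each region by its distance from the clean center-mass,
# maximizing recon capacity toward clean and discouraging noise.
mask = (omega_clean - center_mass).abs().mean(dim=1, keepdim=True)
mask = mask / (mask.max() + 1e-8)
loss = (nn.functional.interpolate(mask, size=64) * (recon - clean) ** 2).mean()
loss.backward()
```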

Outcome: https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/v30_grandmaster

image

The experiment shows marked denoising improvement, in a direct and pragmatic visual sense, over time.

image

This is a direct visual representation of the outcome. Johanna saw noisy images through a frozen encoder, and Johanna's decoder was retrained to denoise images into closer-to-Grandmaster image states.

The outcome yielded enough to justify further experimentation, so I now present the first component.

What is this

This is a training process. Grandmaster had one encoder and one decoder; this follows the same process, except the developer can include as many encoders or decoders as they like, and the process will utilize a specific format of loss governing how that behavioral system adjudicates during distillation.
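One minimal reading of "as many encoders or decoders as they like" is a shared loss routine averaged over every encoder/decoder pairing. The names and the plain MSE adjudication below are assumptions; the log does not specify the actual loss format.

```python
# Hypothetical sketch: one distillation loss across N encoders and M decoders.
import torch
import torch.nn as nn

def distill_step(encoders, decoders, batch, target):
    """Average a per-decoder reconstruction loss across every pairing."""
    losses = []
    for enc in encoders:
        z = enc(batch)
        for dec in decoders:
            losses.append(nn.functional.mse_loss(dec(z), target))
    return torch.stack(losses).mean()

enc = nn.Linear(32, 8)
decs = [nn.Linear(8, 32) for _ in range(3)]
x = torch.randn(4, 32)
loss = distill_step([enc], decs, x, x)   # one encoder, three decoders
loss.backward()
```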

Risks

  • Overtraining bias that causes the memory to shred the information into less than useful data.
  • Faults with overlapping data presenting themselves down the chain that require adjustment.
  • Overlapping datatypes trained on the same structure causing structural faults unknown currently.

SVAE Crusher

This is a distillation-centric component concept.

This component's primary purpose is to crush N decoders into a processing decoder array that takes a single encoder's inputs.

frozen encoder, generational decoders

We align each decoder using Procrustes to ensure the analysis is solid, and then we take the output of the encoder and feed it into all the semi-frozen decoders.

The encoder's job is to accept the Procrustes-rebounded information from the omega tokens.

We UPDATE the decoders along an axial update relational system using a memory bank attached directly to the encoder, which gives us a relational lookup of the changes. Likely something directly aligned with Adam for the early experiments: 1 update per 1000 steps using gradient accumulation and averaging. Smaller batch sizes are likely the best option here, but experiments will determine the optimal amount and size.

adam 1000 / relative assumed step
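The "1 update per 1000 steps" schedule is standard gradient accumulation; a minimal sketch, assuming a plain Adam optimizer and a toy model in place of the decoder array:

```python
# Sketch of the Adam-per-1000-steps accumulation schedule described above.
import torch
import torch.nn as nn

ACCUM_STEPS = 1000
model = nn.Linear(16, 16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

opt.zero_grad()
for step in range(1, ACCUM_STEPS + 1):
    x = torch.randn(8, 16)
    # Divide by the window size so accumulated gradients are averaged.
    loss = nn.functional.mse_loss(model(x), x) / ACCUM_STEPS
    loss.backward()
    if step % ACCUM_STEPS == 0:   # single optimizer update per window
        opt.step()
        opt.zero_grad()
```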

THEN WE FUSE the decoders together using generational interpolation and bias, then select the next generation of decoders for the next round of learning and train more of them.
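The fusing step reads as a model-soup interpolation over the surviving decoders' weights. A minimal sketch, assuming uniform interpolation weights (the "bias" term from the log could reweight each parent):

```python
# Hypothetical generational fusion: interpolate parent state_dicts into one
# "next generation" seed decoder.
import copy
import torch
import torch.nn as nn

def fuse(decoders, weights=None):
    n = len(decoders)
    weights = weights or [1.0 / n] * n
    child = copy.deepcopy(decoders[0])
    fused = {k: sum(w * d.state_dict()[k] for d, w in zip(decoders, weights))
             for k in child.state_dict()}
    child.load_state_dict(fused)
    return child

parents = [nn.Linear(8, 8) for _ in range(3)]
next_gen = fuse(parents)   # seed for the next round of training
```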

The Genetic Experiments

This variation is based on the genetic learning experiments, where each subsequently grown model state has shown attribution elements based on the growth of the originals that were souped to fuse it.

There's a complete article on this exact process. https://huggingface.co/blog/AbstractPhil/geometric-memory-ft3

Consensus Distillation is the necessary paradigm required to make this system function. Generational progression embeds its changes into an encoder's bias while the decoders are left frozen.

I will be attempting multiple tests of this.

  • Frozen encoder, generational decoders
  • generational encoders, frozen decoder
  • generational encoders, generational decoders

From the article

  • interpolated systems of best, average, worst, and the "cletus" clause
  • best set of runs only
  • worst set only until final stages
  • divergent non-shared data of the same type
  • not necessarily survival of the fittest, but survival of the pattern

The Hypothesis

Each component's core updates will establish a format of embedding that can be retroactively transplanted to the reset position, and then from that reset position we feed different data.

Each generation of this component's memory will expand further to enable a distillation effect of more data into the memory bank.

This forms consistent embeddings, rather than patchwork or later-sampled embeddings.

If this holds, the embeddings can potentially be given a reusable definition.

The Potential Gains

Encoding MULTIPLE layers of an LLM into a single battery responder of a larger size, rather than having a single battery or multiple battery samples per layer.

On hardware constraints alone, this would provide a large surplus of training potential, including access to larger arrays of datasets: simply extract an LLM's layers while holding conversations, and encode those sets of layers into embedding lookup pathways for those LLMs.

Potential Medical Uses

This is a direct extension of the Ryan Spearman process, which will potentially enable more accurate lookups and a more accurate memory-distilled student model, in conjunction with higher speed and better recall than a constellation could provide.

The Risks

  • Generational bias from averaging, which is a common tail/head killer for soup models.
  • Faults from soup transfer without enough attention.
  • Incorrect recon loss potentially introducing non-omega biases that mimic omega symptoms.

Model Component 1: The Memory Oscillator

This is the first attempt at an SVAE attention mechanism.

This variation will use centrifuged dissonance rupture sampling and prediction aka CDRS Flow Matching.

This will take the SVAE inputs and turn them into flow-matched directional inputs. With that, we'll attempt to use the spinner to shred the directions and use a form of ODE diffusion to replicate their purpose.
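For reference, a minimal flow-matching target construction, as one plausible reading of "flow-matched directional inputs": linearly interpolate between a noise sample and the SVAE input, and regress the velocity along that path. The network and shapes are placeholders, not the CDRS design.

```python
# Minimal flow-matching loss: predict the constant velocity x1 - x0 along
# the straight-line path from noise (x0) to data (x1).
import torch
import torch.nn as nn

def flow_matching_loss(net, x1):
    x0 = torch.randn_like(x1)            # noise endpoint
    t = torch.rand(x1.shape[0], 1)       # per-sample timestep in [0, 1]
    xt = (1 - t) * x0 + t * x1           # straight-line interpolant
    v_target = x1 - x0                   # velocity target
    return nn.functional.mse_loss(net(xt), v_target)

net = nn.Linear(16, 16)
loss = flow_matching_loss(net, torch.randn(4, 16))
loss.backward()
```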

This may or may not need the codebook. The processing system will be a bit different: it will have timesteps, and internal structural Bayesian biases meant to target specific elemental niceties that may not exist and could potentially get in the way.

Essentially, the idea is to create a lightning rod for the patterns. We feed complex patterns, and with that we need a conduit to actually map the data to something uniformly potent.

Structure

Uncertain so far. This will be based on the results of the first two. I predict this to be a viable choice for direct diffusion prediction to teach alternative prediction and pattern types using the structure.

Similar to actual transformers, this is meant to predict and weight specific rulings based on that.

Heads

Standard attention commonly has one head per 64 dims of behavior; this model instead has one per patch, which means a 1024-patch model houses 1024 potential attention heads.

Averaging, aggregation, and behavioral adjudication all say the d16 geometry process needs to be curated correctly for downstream utility as it currently stands. This is meant to provide opinions that are unilaterally useful, rather than vague or entirely up to chance based on which input is which, and which input is not.
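Shape-wise, "one head per patch" can be read as every patch carrying its own independent score row over all other patches. A loose illustrative sketch, with assumed dimensions (1024 patches, d16 per patch); this is not the actual adjudication mechanism:

```python
# Illustrative "one head per patch" attention over 1024 patches of d16.
import torch

B, P, D = 2, 1024, 16                    # batch, patches (= heads), dims
x = torch.randn(B, P, D)

# Each patch row in `attn` is that patch-head's weighting over all patches.
scores = torch.einsum('bpd,bqd->bpq', x, x) / D ** 0.5
attn = scores.softmax(dim=-1)            # [B, 1024, 1024]
out = torch.einsum('bpq,bqd->bpd', attn, x)
```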

Internals

This is not QKV in the traditional sense. More than likely it's going to be almost entirely V-oriented, where everything internal is adjudicated using the mechanisms in order to guarantee both data-type in and data-type out. That will allow massive compression and conjunctive storage internally, based on the specific paradigm trained with.

Hundreds of experiments and thousands of tests show that modifying the V is very tricky business. Everything has to be perfect or the model simply feeds back noise.

Compression

The idea isn't to compress the data itself, it's to compress the processing mechanisms meant to store the data into a uniformly and unilaterally documentable state. Something that doesn't simply exist within a state of a model, but can directly be accessed and tested for measurable differentiation directly.

Why Thousands of Heads???

Simply put, not every idea can have a single response. There are often many elements we take into account for every equation, and every single element of structure can't be made accountable for every element of every structure if those structured elements aren't represented in a usefully understood format.

Thousands of heads were a byproduct of attempting to teach geometry. They can judge based on importance, structure, syntax, whatever is necessary, and the direct task training, when refined, should be directly controllable. ASSUMING I get the logistics worked out for the routing and resolve the direct interaction with standard transformers.

The Weights

The weights are specifically spectral in nature and will require alpha, delta, gamma, and sinusoidal timestep interpolation for this prototype.

This will specifically measure how effective timestep interpolation is with standard transformers, to see if we can directly capture useful embeddings and encodings over time through this measured system.

This will also include an omega to represent the utilization of the difficult-to-analyze symphonic nature of the internals of every model.

If this omega element holds, the structure should be directly adjustable by radial scalar magnitude given a bit of tinkering, or potentially a better sampling function, to be determined.

The Risks

Not many: monotone behavior, structural collapse, faults with the attention bias, etc. Standard risks.

Model Component 2: The Embedding Solidifier

This is the first attempt at a real embedding system for downstream training.

These embeddings are a little different: they're based entirely on directional similarity rather than cosine similarity, as measuring cosine similarity via the codebook and the actual model yields little. Even when masking down to only the portions that change most, the models still yield only little tidbits of cosine difference, and that's not the core problem.

The core elemental embeddings are already a structural potential BUILT INTO the architecture, meaning they are implicitly learned rather than explicitly guaranteed. This structure makes them difficult to track and a bit more nebulous than a traditional embedding, so I'll be attempting to codify the embedding process for Omega in an orderly and reasonably understandable way.
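One way to operationalize "directional similarity rather than cosine similarity" is to compare the sign pattern of per-dimension deviations between two embeddings instead of their angle. This is an assumed metric for illustration, not the log's exact definition:

```python
# Assumed directional-similarity metric: fraction of dimensions whose
# deviations from the mean agree in sign between the two embeddings.
import torch

def directional_similarity(a, b):
    da, db = a - a.mean(), b - b.mean()
    return (torch.sign(da) == torch.sign(db)).float().mean()

a = torch.randn(64)
sim_self = directional_similarity(a, a)   # identical vectors agree everywhere
```

Unlike cosine similarity, this saturates only when every dimension moves the same way, which matches the complaint that cosine differences stay tiny even under masking.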

This one is pending the results of component 1

Model Component 3: Oscillation Scattered

Similar to how the SVAE attention mechanism is designed, this will be the directly linear variation.

Intentionally compact and meant to house the necessary complexity for high-yield CDRS prediction using the smaller SVAE structures without SVD as the training catalysts.

This one is pending the induction and processing of the data required to create the sub-component CDRS prediction.

If the CDRS pred doesn't work out, this will likely not work in the current form so I'm not going to overplan on this one.

New series -> before 5/7/2026

Trigram, sentencepiece, binary tree, and more.

Each is answering embedding questions, and the results will be accumulated in the pool of data for the next article.

Tomorrow

I return to this model process.

I have done quite a bit of research and I believe I've found a potential way to represent the degenerate Cayley-Menger shapes with a useful methodology.

Return to the chalkboard.

Yes, it happened again. It's time to return to the chalkboard and make some new components.

Building from linear, baseline 20% accuracy on cifar10, Blues Brothers reference, it's midnight, and we're wearing sunglasses.

Hit it.

We're going clean in, and making these omega tokens sing. Every single one will have a purpose, and a tracking point as to WHY it has a purpose.

Nothing ambiguous. Nothing left up to another mechanism.

MLP testing

I'm extracting and preserving the 4096 omega tokens [b, 4, 64, 64] from the fairly tanky Freckles-4096

These extracted CIFAR-10 features will allow for much more rapid prototyping.

Freckles-4096 epoch 1 battery has been run

Check the results in the directory https://huggingface.co/AbstractPhil/geolip-SVAE/tree/main/v41_freckles_256

This is a patchwork tokenization size of 4096 related directly to the cross-attention, so be aware I'm not bloating numbers here.

This is literally how many patches exist, and this many patches exist specifically at 256x256, as per the scaling rule applied by the internal architectural resonance.
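The 4096 figure follows from plain patch-grid arithmetic. Assuming Freckles' 4x4 patch size (listed further down in the notes), 256x256 gives 256 // 4 = 64 patches per side, and 64 * 64 = 4096 total:

```python
# Patch-count arithmetic behind the 4096 figure (patch size 4 is assumed
# from the Freckles notes).
def patch_count(resolution, patch):
    per_side = resolution // patch
    return per_side * per_side

assert patch_count(256, 4) == 4096   # Freckles-4096
assert patch_count(64, 4) == 256     # Freckles-256
```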

The sizes are internally very very different, but the actual information and parameters are identical.

      enc_in                                          [262144, 384]
      enc_block_0                                     [262144, 384]
      enc_block_1                                     [262144, 384]
      enc_block_2                                     [262144, 384]
      enc_block_3                                     [262144, 384]
      enc_out_raw                                     [262144, 192]
      cross_attn_0_qkv                               [64, 4096, 12]
      cross_attn_0_in                                 [64, 4096, 4]
      cross_attn_0_out                                [64, 4096, 4]
      cross_attn_1_qkv                               [64, 4096, 12]
      cross_attn_1_in                                 [64, 4096, 4]
      cross_attn_1_out                                [64, 4096, 4]
      dec_in                                          [262144, 384]
      dec_block_0                                     [262144, 384]
      dec_block_1                                     [262144, 384]
      dec_block_2                                     [262144, 384]
      dec_block_3                                     [262144, 384]
      dec_out                                          [262144, 48]
      boundary_in                                 [64, 3, 256, 256]
      boundary_out                                [64, 3, 256, 256]
      svd_U                                       [64, 4096, 48, 4]
      svd_S_orig                                      [64, 4096, 4]
      svd_S                                           [64, 4096, 4]
      svd_Vt                                       [64, 4096, 4, 4]
      svd_M                                       [64, 4096, 48, 4]
      recon                                       [64, 3, 256, 256]
      input                                       [64, 3, 256, 256]

Keep in mind when I test Freckles with images, Freckles has never trained on a single image.

Freckles has ONLY SEEN NOISE. 16 types of noise. That's it; historically, Freckles is image-ignorant.

Meaning when I train heads and classification models with Freckles features, I'm training models with what Freckles turns pure images into.

Freckles has zero knowledge of any real data, ever. Just noise.

Freckles-4096 epoch 1 is complete

The MSE is roughly the same as Freckles-256's.

Battery testing begins soon.

Little Freckles couldn't beat pixel-in transformers

With just her 256 patches condensed into 64 dim features.

However... We're just getting started. The little 64 dim features were good, but not the full capacity of freckles, not by a long shot.

Next up is the 384 dim hidden features, no more pulling punches.

======================================================================
HEAD-TO-HEAD COMPARISON
======================================================================
  Omega Processor (Freckles features): 72.7%
  Baseline (raw patches):              77.2%
  Delta:                               -4.5%
  Random chance:                       10.0%
======================================================================

Well fought little bottlenecked Freckles, it's time to unlock the limit gates now.

First Omega Processor Tests

                    Omega Processor     Baseline (raw)
Final accuracy:     92.8%               82.0%
Params:             837,008             867,824
Epoch time:         34s                 19s

PER-CLASS FINAL:
                    Omega     Baseline
gaussian            100%      86%
uniform             99%       51%      ← baseline can't separate
uniform_sc          100%      49%      ← from its sibling
poisson             94%       97%
pink                33%       84%      ← baseline does BETTER here
brown               64%       16%      ← but worse here
salt_pepper         100%      100%
sparse              100%      100%
block               100%      100%
gradient            100%      100%
checker             100%      78%
mixed               95%       46%      ← massive gap
structural          100%      100%
cauchy              100%      100%
exponential         100%      100%
laplace             100%      100%

UNSOLVABLE PAIRS:
  Baseline:   uniform/uniform_sc (51/49), pink/brown (84/16), mixed (46%), checker (78%)
  Omega:      pink/brown (33/64) — one remaining confusion pair

The omega processor features in this state are more robust than dataset-introduced features. Convergence is faster, learning is faster, and the output is clearly pretrained to a direct extreme.

Now... let's try CIFAR-10, shall we? I'll just feed it into our little Johanna-256, which has never seen image training.

4/9/2026 Omega Processor Prototype

Freckles encoder (frozen)
    ↓
SVD: U, S, Vt (exact, frozen)
    ↓
Feature Extractor (tiny, learned):
    scalar_features(S)           → 16 dims
    relational_features(S, grid) → 16 dims  
    basis_features(U, Vt)        → 32 dims
    ──────────────────────────────────────
    concat → 64 dims per patch
    ↓
LayerNorm + Linear(64, d_model)  → project to transformer dim
    ↓
Standard transformer encoder/decoder
    ↓
Task head (classification, generation, text, etc.)
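The pipeline above can be sketched as a module. This is a literal rendering of the diagram with dummy sub-features: the real scalar/relational/basis extractors are unspecified, so each is reduced to a small linear map here, and the `48`-row / rank-`4` shapes are taken from the Freckles shape dump above.

```python
# Sketch of the Omega Processor feature extractor from the diagram.
import torch
import torch.nn as nn

class OmegaFeatureExtractor(nn.Module):
    def __init__(self, d_model=128, rank=4, rows=48):
        super().__init__()
        self.scalar = nn.Linear(rank, 16)               # scalar_features(S)
        self.relational = nn.Linear(rank, 16)           # relational_features(S, grid)
        self.basis = nn.Linear(rows * rank + rank * rank, 32)  # basis_features(U, Vt)
        self.proj = nn.Sequential(nn.LayerNorm(64), nn.Linear(64, d_model))

    def forward(self, U, S, Vt):
        f = torch.cat([
            self.scalar(S),
            self.relational(S - S.mean(dim=1, keepdim=True)),  # S vs. patch grid
            self.basis(torch.cat([U.flatten(2), Vt.flatten(2)], dim=-1)),
        ], dim=-1)                                      # concat -> 64 dims/patch
        return self.proj(f)                             # project to d_model

U = torch.randn(2, 16, 48, 4)
S = torch.randn(2, 16, 4)
Vt = torch.randn(2, 16, 4, 4)
feats = OmegaFeatureExtractor()(U, S, Vt)   # tokens for a standard transformer
```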

Due to the model being so deviant from transformers, Claude and I have come up with a potential scaffold.

It aligns with the geometry and has a very small learning curve, which should coalesce into the necessary behavior.

This SHOULD allow any model to be able to utilize the information without needing a massive architectural change.

4/9/2026 Freckles - Resolution Scaling!?!

Freckles handles noise at massive ratios if I piecemeal the information together.

I'm going to try massive sizes, and test every single noise spectrum.

Currently Freckles handles MSE 0.000005 noise error across 16 types of noise; the tests show rigidity in the patches, so they can't be slid around.

The manifold shows you can just... aim it at any noise target and it will solve via the patches, and which noise type a patch most likely holds can be calculated at very low compute cost.
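The low-cost noise-type lookup could be as simple as nearest-centroid classification over per-patch features: keep one prototype vector per noise type and take an argmin. The centroids below are random stand-ins; this is an assumed mechanism, not the actual calculation from the log.

```python
# Assumed low-compute noise-type lookup: nearest centroid per patch feature.
import torch

torch.manual_seed(0)
NOISE_TYPES = 16
centroids = torch.randn(NOISE_TYPES, 64)   # one prototype per noise type

def classify_patch(feat):
    """Return the index of the closest noise centroid (argmin L2)."""
    return torch.cdist(feat[None], centroids).argmin().item()

patch_feat = centroids[3] + 0.01 * torch.randn(64)   # slightly perturbed type 3
predicted = classify_patch(patch_feat)
```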

Noise type, noise location within structure, structure of that encapsulated system's image space... Alright yeah this is, well beyond expectation.

These are resolution-independent models; resolution means nothing to them. I also believe that Freckles, at 2M params, is substantially more powerful than each of the others at their 17M params.

The Omega Processor

The plan now is to wrap the center of the VAE structure with a memory bank driven constellation observer with hundreds of thousands of anchors.

We will see exactly what this model is doing, why this model is doing that behavior, what we need to do to replicate this model, what steps we need to take to build the necessary optimizations and the scaffolding for those tools, and everything between here and there.

We have TOUCHED Omega now, and I will NOT let this vapor dissolve. I will not let these numbers simply get mashed into a transformer without full observation and understanding of WHAT we are seeing. I MUST know.

IF THIS IS NOT TRULY THE SELF SOLVING FRAME, we will know VERY soon. VERY VERY soon. I will redact my claims of Omega and continue towards the legitimate self-solving quantum state that we are all looking for.

I TRULY BELIEVE this has ENOUGH potential for me to spend a good amount of time investigating.

If I can even find a SINGLE PIECE of truly high-dimensional scaffolding that doesn't directly conform to the dimensions; that somehow survived the battery of CM volume, that somehow formed knots upon knots, upon knots of impossible conjecture - that can be turned into a viable utility. Then I will have finished my task. Everything will have been proven and I can simply build.

All we need is JUST ONE. JUST ONE, and it will be enough to create the universe.

4/8/2026 Prototype V13, 14, 15, etc

Johanna - Noise variant 48:1

Patchworks Available:

  • 64x64 0.06 mse
  • 128x128 0.002 mse
  • 256x256 0.0002 mse

17m params all

48:1 compression

[16, 8, 8] omega tokens

fair mse decoding on 16 noise types

This model can handle noise, images, audio, whatever - semi okay.

This is the one you want to finetune if you have a task you want to try.

Fresnel - Image variant 48:1

Patchworks Available:

  • 64x64 0.00001~ mse
  • 128x128 0.000002 mse
  • 256x256 0.000001 mse

17m params

48:1 compression

[16, 8, 8] omega tokens

0.000002 mse on clean image latents

Don't feed fresnel noise, the model does not understand noise

Johanna is the noise model.

Grandmaster - Denoise Variant 48:1

Finetuned Johanna-128

Patchworks Available:

  • 128x128 mse 0.0042 mse

17m params

48:1 compression

[16, 8, 8] omega tokens

This model accepts a noisy image input, and returns denoised output.

Trained using Fresnel's omega tokens as training targets, has a fair SNR denoising capacity already at 0.0042 mse.

This is part of a prototype meant to skip diffusion steps entirely.

Freckles - Noise Variant 12:1

Patchworks Available:

  • 4x4 0.0002 mse
  • gaus=0.001 unif=0.000 unif_sc=0.001 pois=0.000
  • pink=0.000 brow=0.000 salt=0.004 spar=0.000
  • bloc=0.000 grad=0.000 chec=0.000 mixe=0.000
  • stru=0.000 cauc=0.002 expo=0.001 lapl=0.001

2.7m params

12:1 compression

Uhhhh.. yeah I uhh... didn't expect this one to recon so well. I didn't set the measures to register lower than that.

I'll report later.

Indev:

Alexandria - Text variant

17m params

48:1 compression

DOES NOT WORK YET!

Prototype v12 - Patch 16 - 128x128 images - 1.2m imagenet images

Well the 64x64 image set worked just fine, so it's time to upgrade and test the limits of the architecture.

Can it simply... scale? ooooor do we need more solvers along the way to compensate?

benjamin-paine/imagenet-1k-128x128

Can we actually solve it

YES WE DID!

image

It seems geometric manifolds learn... differently than standard manifolds, don't they.

Prototype V11 - Patch16 - MSE 0.0005 - 64x64 tiny imagenet

I'd say it works.

The images show it works. It works.

image

Using geolip-core SVD (fp64 Gram+eigh (FL=available, N<=12))
PatchSVAE - 16 patches of 16×16
  Dataset: tiny_imagenet (64×64, 200 classes)
  Per-patch: (256, 16) = 4096 elements, rows on S^15
  Encoder/Decoder: hidden=768, depth=4 (residual blocks)
  Cross-attention: 2 layers on S vectors (2,272 params)
  Soft hand: boost=1.5x near CV=0.125, penalty=0.3 far
  Total params: 16,942,419
===============================================================================================
  ep |    loss   recon  t/ep |   t_rec |     S0     SD ratio erank |  row_cv  prox    rw | S_delta
-----------------------------------------------------------------------------------------------
   1 |  0.2595  0.1806  12.2 |  0.1024 |  5.036  3.254  1.55 15.87 |  0.2007 0.905  1.45 | 0.09694 a:0.0242/0.0247
   2 |  0.1216  0.0845  12.3 |  0.0675 |  5.071  3.298  1.54 15.88 |  0.2018 0.885  1.44 | 0.17411 a:0.0251/0.0257
   3 |  0.0847  0.0587  12.3 |  0.0470 |  5.093  3.312  1.54 15.88 |  0.2046 0.869  1.43 | 0.19894 a:0.0258/0.0265
   4 |  0.0623  0.0432  12.3 |  0.0430 |  5.115  3.323  1.54 15.88 |  0.2006 0.864  1.43 | 0.20848 a:0.0264/0.0272
   6 |  0.0359  0.0248  12.3 |  0.0198 |  5.129  3.332  1.54 15.88 |  0.2006 0.907  1.45 | 0.21832 a:0.0273/0.0281
   8 |  0.0225  0.0155  12.2 |  0.0196 |  5.149  3.341  1.54 15.87 |  0.2017 0.876  1.44 | 0.22351 a:0.0279/0.0287
  10 |  0.0170  0.0116  12.3 |  0.0100 |  5.151  3.352  1.54 15.88 |  0.2035 0.924  1.46 | 0.22671 a:0.0283/0.0290
  12 |  0.0141  0.0096  12.3 |  0.0114 |  5.159  3.354  1.54 15.88 |  0.2009 0.909  1.45 | 0.22924 a:0.0285/0.0293
  14 |  0.0121  0.0082  12.3 |  0.0073 |  5.156  3.362  1.53 15.88 |  0.2018 0.855  1.43 | 0.23137 a:0.0288/0.0296
  16 |  0.0105  0.0072  12.3 |  0.0108 |  5.161  3.363  1.53 15.88 |  0.2003 0.860  1.43 | 0.23316 a:0.0290/0.0298
  18 |  0.0094  0.0064  12.3 |  0.0055 |  5.158  3.365  1.53 15.88 |  0.2017 0.879  1.44 | 0.23467 a:0.0292/0.0300
  20 |  0.0086  0.0058  12.3 |  0.0050 |  5.157  3.367  1.53 15.88 |  0.2023 0.805  1.40 | 0.23601 a:0.0293/0.0301
  22 |  0.0079  0.0054  12.4 |  0.0045 |  5.157  3.369  1.53 15.88 |  0.1996 0.872  1.44 | 0.23726 a:0.0295/0.0303
  24 |  0.0074  0.0050  12.2 |  0.0064 |  5.146  3.380  1.52 15.88 |  0.2044 0.879  1.44 | 0.23848 a:0.0296/0.0305
  26 |  0.0068  0.0046  12.4 |  0.0039 |  5.155  3.372  1.53 15.88 |  0.2036 0.884  1.44 | 0.23955 a:0.0297/0.0306
  28 |  0.0063  0.0042  12.3 |  0.0036 |  5.155  3.378  1.53 15.88 |  0.2077 0.841  1.42 | 0.24057 a:0.0299/0.0307
  30 |  0.0058  0.0038  12.3 |  0.0038 |  5.155  3.380  1.53 15.88 |  0.2027 0.911  1.46 | 0.24149 a:0.0300/0.0309
  32 |  0.0055  0.0036  12.2 |  0.0032 |  5.150  3.383  1.52 15.88 |  0.2045 0.807  1.40 | 0.24239 a:0.0301/0.0310
  34 |  0.0054  0.0036  12.3 |  0.0037 |  5.145  3.388  1.52 15.88 |  0.1996 0.875  1.44 | 0.24329 a:0.0302/0.0311
  36 |  0.0049  0.0032  12.3 |  0.0031 |  5.154  3.385  1.52 15.88 |  0.2054 0.828  1.41 | 0.24409 a:0.0303/0.0312
  38 |  0.0046  0.0030  12.3 |  0.0027 |  5.152  3.390  1.52 15.88 |  0.2038 0.847  1.42 | 0.24490 a:0.0304/0.0313
  40 |  0.0044  0.0029  12.3 |  0.0032 |  5.155  3.392  1.52 15.89 |  0.2046 0.855  1.43 | 0.24566 a:0.0305/0.0314
  42 |  0.0043  0.0028  12.3 |  0.0024 |  5.152  3.395  1.52 15.89 |  0.2064 0.905  1.45 | 0.24637 a:0.0305/0.0315
  44 |  0.0042  0.0027  12.3 |  0.0023 |  5.150  3.395  1.52 15.89 |  0.2084 0.844  1.42 | 0.24705 a:0.0306/0.0316
  46 |  0.0039  0.0025  12.3 |  0.0022 |  5.149  3.400  1.51 15.89 |  0.2057 0.868  1.43 | 0.24776 a:0.0307/0.0317
  48 |  0.0037  0.0024  12.3 |  0.0024 |  5.152  3.403  1.51 15.89 |  0.2138 0.831  1.42 | 0.24843 a:0.0308/0.0318
  50 |  0.0038  0.0024  12.3 |  0.0025 |  5.149  3.406  1.51 15.89 |  0.2078 0.810  1.40 | 0.24906 a:0.0309/0.0319
  52 |  0.0034  0.0021  12.3 |  0.0019 |  5.154  3.405  1.51 15.89 |  0.2082 0.872  1.44 | 0.24965 a:0.0309/0.0320
  54 |  0.0033  0.0020  12.2 |  0.0019 |  5.156  3.406  1.51 15.89 |  0.2085 0.894  1.45 | 0.25022 a:0.0310/0.0320
  56 |  0.0033  0.0020  12.4 |  0.0019 |  5.150  3.412  1.51 15.89 |  0.2058 0.866  1.43 | 0.25079 a:0.0311/0.0321
  58 |  0.0031  0.0019  12.3 |  0.0033 |  5.147  3.416  1.51 15.89 |  0.2071 0.774  1.39 | 0.25135 a:0.0311/0.0322
  60 |  0.0030  0.0018  12.4 |  0.0017 |  5.153  3.415  1.51 15.89 |  0.2134 0.840  1.42 | 0.25187 a:0.0312/0.0323
  62 |  0.0030  0.0018  12.3 |  0.0016 |  5.155  3.416  1.51 15.89 |  0.2080 0.764  1.38 | 0.25235 a:0.0313/0.0323
  64 |  0.0028  0.0017  12.2 |  0.0014 |  5.156  3.416  1.51 15.89 |  0.2100 0.666  1.33 | 0.25285 a:0.0313/0.0324
  66 |  0.0028  0.0017  12.3 |  0.0017 |  5.151  3.419  1.51 15.89 |  0.2101 0.865  1.43 | 0.25333 a:0.0314/0.0324
  68 |  0.0026  0.0015  12.3 |  0.0014 |  5.158  3.419  1.51 15.89 |  0.2078 0.838  1.42 | 0.25381 a:0.0314/0.0325
  70 |  0.0025  0.0015  12.4 |  0.0021 |  5.160  3.422  1.51 15.89 |  0.2112 0.806  1.40 | 0.25428 a:0.0315/0.0326
  72 |  0.0026  0.0015  12.2 |  0.0013 |  5.158  3.422  1.51 15.89 |  0.2126 0.835  1.42 | 0.25471 a:0.0316/0.0326
  74 |  0.0024  0.0014  12.2 |  0.0015 |  5.154  3.427  1.50 15.89 |  0.2143 0.838  1.42 | 0.25514 a:0.0316/0.0327
  76 |  0.0024  0.0014  12.2 |  0.0012 |  5.161  3.424  1.51 15.89 |  0.2151 0.847  1.42 | 0.25553 a:0.0317/0.0327
  78 |  0.0023  0.0013  12.3 |  0.0014 |  5.157  3.428  1.50 15.89 |  0.2121 0.686  1.34 | 0.25592 a:0.0317/0.0328
  80 |  0.0024  0.0013  12.3 |  0.0012 |  5.160  3.428  1.51 15.89 |  0.2068 0.824  1.41 | 0.25630 a:0.0317/0.0328
  82 |  0.0027  0.0016  12.2 |  0.0015 |  5.146  3.394  1.52 15.89 |  0.2065 0.899  1.45 | 0.25687 a:0.0318/0.0329
  84 |  0.0022  0.0013  12.2 |  0.0013 |  5.156  3.405  1.51 15.89 |  0.2092 0.875  1.44 | 0.25709 a:0.0319/0.0329
  86 |  0.0022  0.0012  12.3 |  0.0022 |  5.154  3.413  1.51 15.89 |  0.2091 0.835  1.42 | 0.25726 a:0.0319/0.0329
  88 |  0.0021  0.0012  12.3 |  0.0014 |  5.154  3.417  1.51 15.89 |  0.2074 0.840  1.42 | 0.25740 a:0.0319/0.0329
  90 |  0.0022  0.0013  12.3 |  0.0030 |  5.147  3.416  1.51 15.89 |  0.2191 0.848  1.42 | 0.25753 a:0.0319/0.0329
  92 |  0.0021  0.0011  12.3 |  0.0012 |  5.157  3.418  1.51 15.89 |  0.2123 0.775  1.39 | 0.25766 a:0.0319/0.0329
  94 |  0.0021  0.0011  12.3 |  0.0010 |  5.156  3.419  1.51 15.89 |  0.2117 0.710  1.35 | 0.25779 a:0.0319/0.0329
  96 |  0.0020  0.0011  12.3 |  0.0013 |  5.154  3.420  1.51 15.89 |  0.2166 0.940  1.47 | 0.25793 a:0.0319/0.0330
  98 |  0.0019  0.0011  12.2 |  0.0010 |  5.156  3.421  1.51 15.89 |  0.2143 0.762  1.38 | 0.25807 a:0.0320/0.0330
 100 |  0.0020  0.0010  12.3 |  0.0009 |  5.155  3.422  1.51 15.89 |  0.2173 0.642  1.32 | 0.25821 a:0.0320/0.0330
 102 |  0.0020  0.0010  12.2 |  0.0009 |  5.156  3.423  1.51 15.89 |  0.2165 0.868  1.43 | 0.25835 a:0.0320/0.0330
 104 |  0.0019  0.0010  12.4 |  0.0009 |  5.157  3.423  1.51 15.89 |  0.2125 0.788  1.39 | 0.25850 a:0.0320/0.0330
 106 |  0.0019  0.0009  12.3 |  0.0009 |  5.156  3.424  1.51 15.89 |  0.2219 0.666  1.33 | 0.25866 a:0.0320/0.0331
 108 |  0.0019  0.0009  12.3 |  0.0009 |  5.153  3.425  1.50 15.89 |  0.2202 0.671  1.34 | 0.25881 a:0.0321/0.0331
 110 |  0.0020  0.0009  12.3 |  0.0011 |  5.153  3.427  1.50 15.89 |  0.2163 0.726  1.36 | 0.25896 a:0.0321/0.0331
 112 |  0.0019  0.0009  12.4 |  0.0009 |  5.155  3.427  1.50 15.89 |  0.2205 0.837  1.42 | 0.25911 a:0.0321/0.0331
 114 |  0.0019  0.0008  12.3 |  0.0008 |  5.155  3.427  1.50 15.89 |  0.2220 0.803  1.40 | 0.25926 a:0.0321/0.0332
 116 |  0.0018  0.0008  12.3 |  0.0009 |  5.155  3.427  1.50 15.89 |  0.2211 0.852  1.43 | 0.25942 a:0.0321/0.0332
 118 |  0.0019  0.0008  12.3 |  0.0008 |  5.153  3.429  1.50 15.89 |  0.2207 0.694  1.35 | 0.25957 a:0.0321/0.0332
 120 |  0.0018  0.0008  12.2 |  0.0008 |  5.156  3.429  1.50 15.89 |  0.2271 0.664  1.33 | 0.25972 a:0.0322/0.0332
 122 |  0.0018  0.0008  12.3 |  0.0008 |  5.154  3.429  1.50 15.89 |  0.2266 0.658  1.33 | 0.25986 a:0.0322/0.0333
 124 |  0.0017  0.0007  12.3 |  0.0008 |  5.154  3.431  1.50 15.89 |  0.2201 0.771  1.39 | 0.26000 a:0.0322/0.0333
 126 |  0.0017  0.0007  12.4 |  0.0008 |  5.157  3.430  1.50 15.89 |  0.2253 0.862  1.43 | 0.26014 a:0.0322/0.0333
 128 |  0.0018  0.0007  12.3 |  0.0008 |  5.153  3.431  1.50 15.89 |  0.2222 0.638  1.32 | 0.26027 a:0.0322/0.0333
 130 |  0.0018  0.0007  12.3 |  0.0007 |  5.155  3.431  1.50 15.89 |  0.2255 0.786  1.39 | 0.26040 a:0.0322/0.0333
 132 |  0.0018  0.0007  12.3 |  0.0007 |  5.153  3.432  1.50 15.89 |  0.2325 0.778  1.39 | 0.26053 a:0.0323/0.0333
 134 |  0.0017  0.0007  12.3 |  0.0007 |  5.154  3.433  1.50 15.89 |  0.2250 0.876  1.44 | 0.26065 a:0.0323/0.0334
 136 |  0.0017  0.0006  12.3 |  0.0007 |  5.157  3.432  1.50 15.89 |  0.2269 0.866  1.43 | 0.26077 a:0.0323/0.0334
 138 |  0.0017  0.0006  12.3 |  0.0007 |  5.156  3.433  1.50 15.89 |  0.2247 0.760  1.38 | 0.26088 a:0.0323/0.0334
 140 |  0.0016  0.0006  12.3 |  0.0006 |  5.157  3.433  1.50 15.89 |  0.2242 0.725  1.36 | 0.26099 a:0.0323/0.0334
 142 |  0.0016  0.0006  12.3 |  0.0006 |  5.157  3.433  1.50 15.89 |  0.2241 0.909  1.45 | 0.26109 a:0.0323/0.0334
 144 |  0.0017  0.0006  12.3 |  0.0006 |  5.157  3.433  1.50 15.89 |  0.2287 0.815  1.41 | 0.26119 a:0.0323/0.0334
 146 |  0.0016  0.0006  12.3 |  0.0007 |  5.158  3.434  1.50 15.90 |  0.2205 0.722  1.36 | 0.26128 a:0.0324/0.0334
 148 |  0.0016  0.0006  12.3 |  0.0008 |  5.157  3.434  1.50 15.90 |  0.2286 0.691  1.35 | 0.26137 a:0.0324/0.0335
 150 |  0.0016  0.0006  12.3 |  0.0006 |  5.158  3.434  1.50 15.90 |  0.2259 0.845  1.42 | 0.26146 a:0.0324/0.0335
 152 |  0.0017  0.0006  12.3 |  0.0006 |  5.158  3.434  1.50 15.90 |  0.2295 0.757  1.38 | 0.26154 a:0.0324/0.0335
 154 |  0.0016  0.0005  12.3 |  0.0006 |  5.159  3.435  1.50 15.90 |  0.2304 0.751  1.38 | 0.26162 a:0.0324/0.0335
 156 |  0.0018  0.0005  12.3 |  0.0006 |  5.159  3.435  1.50 15.90 |  0.2264 0.796  1.40 | 0.26169 a:0.0324/0.0335
 158 |  0.0017  0.0005  12.3 |  0.0006 |  5.160  3.434  1.50 15.90 |  0.2282 0.788  1.39 | 0.26176 a:0.0324/0.0335
 160 |  0.0017  0.0005  12.3 |  0.0005 |  5.161  3.434  1.50 15.90 |  0.2291 0.766  1.38 | 0.26183 a:0.0324/0.0335
 162 |  0.0016  0.0005  12.3 |  0.0005 |  5.161  3.434  1.50 15.90 |  0.2282 0.716  1.36 | 0.26189 a:0.0324/0.0335
 164 |  0.0016  0.0005  12.3 |  0.0005 |  5.161  3.435  1.50 15.90 |  0.2344 0.792  1.40 | 0.26196 a:0.0324/0.0335
 166 |  0.0016  0.0005  12.3 |  0.0006 |  5.162  3.434  1.50 15.90 |  0.2305 0.707  1.35 | 0.26202 a:0.0324/0.0335
 168 |  0.0016  0.0005  12.3 |  0.0005 |  5.162  3.434  1.50 15.90 |  0.2353 0.816  1.41 | 0.26207 a:0.0324/0.0335
 170 |  0.0016  0.0005  12.3 |  0.0005 |  5.163  3.434  1.50 15.90 |  0.2296 0.756  1.38 | 0.26213 a:0.0325/0.0335
 172 |  0.0018  0.0005  12.4 |  0.0005 |  5.163  3.434  1.50 15.90 |  0.2391 0.742  1.37 | 0.26218 a:0.0325/0.0336
 174 |  0.0016  0.0005  12.4 |  0.0005 |  5.163  3.434  1.50 15.90 |  0.2307 0.863  1.43 | 0.26224 a:0.0325/0.0336
 176 |  0.0016  0.0005  12.3 |  0.0005 |  5.163  3.434  1.50 15.90 |  0.2329 0.854  1.43 | 0.26228 a:0.0325/0.0336
 178 |  0.0017  0.0005  12.3 |  0.0005 |  5.164  3.434  1.50 15.90 |  0.2287 0.803  1.40 | 0.26233 a:0.0325/0.0336
 180 |  0.0017  0.0005  12.3 |  0.0005 |  5.164  3.434  1.50 15.90 |  0.2361 0.819  1.41 | 0.26237 a:0.0325/0.0336
 182 |  0.0018  0.0005  12.3 |  0.0005 |  5.164  3.434  1.50 15.90 |  0.2360 0.729  1.36 | 0.26241 a:0.0325/0.0336
 184 |  0.0016  0.0005  12.3 |  0.0005 |  5.164  3.434  1.50 15.90 |  0.2395 0.774  1.39 | 0.26245 a:0.0325/0.0336
 186 |  0.0015  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2325 0.777  1.39 | 0.26248 a:0.0325/0.0336
 188 |  0.0018  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2430 0.669  1.33 | 0.26250 a:0.0325/0.0336
 190 |  0.0016  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2293 0.781  1.39 | 0.26252 a:0.0325/0.0336
 192 |  0.0017  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2246 0.763  1.38 | 0.26254 a:0.0325/0.0336
 194 |  0.0019  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2324 0.764  1.38 | 0.26255 a:0.0325/0.0336
 196 |  0.0016  0.0005  12.2 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2301 0.868  1.43 | 0.26256 a:0.0325/0.0336
 198 |  0.0016  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2332 0.696  1.35 | 0.26256 a:0.0325/0.0336
 200 |  0.0017  0.0005  12.3 |  0.0005 |  5.165  3.434  1.50 15.90 |  0.2397 0.808  1.40 | 0.26256 a:0.0325/0.0336

==========================================================================================
FINAL ANALYSIS
==========================================================================================

  PatchSVAE: 16 patches × (256, 16)
  Target CV: 0.125
  Recon MSE: 0.000489 +/- 0.000743
  Row CV: 0.2397
  Cross-attention S delta: 0.26256

  Learned alpha per mode (coordination strength):
    Layer 0: mean=0.0327  max=0.0336  min=0.0323
      α[ 0]: 0.0324  ######################################
      α[ 1]: 0.0323  ######################################
      α[ 2]: 0.0327  ######################################
      α[ 3]: 0.0326  ######################################
      α[ 4]: 0.0325  ######################################
      α[ 5]: 0.0326  ######################################
      α[ 6]: 0.0332  #######################################
      α[ 7]: 0.0336  #######################################
      α[ 8]: 0.0326  ######################################
      α[ 9]: 0.0324  ######################################
      α[10]: 0.0326  ######################################
      α[11]: 0.0325  ######################################
      α[12]: 0.0328  #######################################
      α[13]: 0.0324  ######################################
      α[14]: 0.0331  #######################################
      α[15]: 0.0326  ######################################
    Layer 1: mean=0.0323  max=0.0327  min=0.0315
      α[ 0]: 0.0324  #######################################
      α[ 1]: 0.0326  #######################################
      α[ 2]: 0.0323  #######################################
      α[ 3]: 0.0326  #######################################
      α[ 4]: 0.0327  #######################################
      α[ 5]: 0.0326  #######################################
      α[ 6]: 0.0320  #######################################
      α[ 7]: 0.0315  ######################################
      α[ 8]: 0.0324  #######################################
      α[ 9]: 0.0327  #######################################
      α[10]: 0.0325  #######################################
      α[11]: 0.0324  #######################################
      α[12]: 0.0321  #######################################
      α[13]: 0.0324  #######################################
      α[14]: 0.0317  ######################################
      α[15]: 0.0322  #######################################

  Coordinated singular value profile:
    S[ 0]:   5.1650  cum=  9.2%  #############################
    S[ 1]:   4.9525  cum= 17.6%  ############################
    S[ 2]:   4.8142  cum= 25.6%  ###########################
    S[ 3]:   4.6335  cum= 32.9%  ##########################
    S[ 4]:   4.5199  cum= 40.0%  ##########################
    S[ 5]:   4.4203  cum= 46.7%  #########################
    S[ 6]:   4.3376  cum= 53.2%  #########################
    S[ 7]:   4.2448  cum= 59.3%  ########################
    S[ 8]:   4.1641  cum= 65.3%  ########################
    S[ 9]:   4.0915  cum= 71.1%  #######################
    S[10]:   4.0086  cum= 76.6%  #######################
    S[11]:   3.9144  cum= 81.9%  ######################
    S[12]:   3.7995  cum= 86.8%  ######################
    S[13]:   3.6926  cum= 91.5%  #####################
    S[14]:   3.5949  cum= 95.9%  ####################
    S[15]:   3.4336  cum=100.0%  ###################

  Saving reconstruction grid...
  Saved to /content/svae_patch_recon.png
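The `cum` percentages in the profile above track energy shares (S²), and the `erank` column in the epoch log matches an entropy-based effective rank of the normalized S vector. A sketch that reproduces both from the printed values — formulas inferred from the numbers, not taken from the repo:

```python
import numpy as np

def sv_profile(S):
    """Cumulative percentages as printed in the profile: each 'cum'
    value is the running share of total energy, sum(S^2)."""
    e = np.asarray(S, dtype=np.float64) ** 2
    return np.cumsum(e) / e.sum() * 100.0

def erank(S):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values S / sum(S)."""
    p = np.asarray(S, dtype=np.float64)
    p = p / p.sum()
    return float(np.exp(-(p * np.log(p)).sum()))
```

Feeding in the 16 printed S values gives cum ≈ 9.2% for S[0] and erank ≈ 15.90, matching the log.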

4/7/2025 Prototype V10.3 - Patch16 - The ViT size.

Might need some tweaks, but I don't think so. We're approaching actual ViT prototype accuracy now.

Let's see how the SVAE performs.

Prototype V10.2 - Patch32 - Patchwork Cross-Attention with Edge Smoothing

This eliminates the edge cutting of the last version, and in the process the recon accuracy has gone up.

Model still escapes the discharge within 2 epochs and has robust recon.

Defeated the last version.

image

image

Prototype V10.1 Patchwork Cross-Attention - Stabilized

The patchwork has stabilized, and the output is now more accurate than the original: it supports SVD 32 with both higher accuracy and higher speed.

Epoch 28 hit the unstable point, but gradient-clipped attention was the ticket that ensured solidity.

The discharge recovered immediately.

image

Give or take 97% accurate recall; let's get those numbers up before we move on to more powerful image sets. Roughly 28m params.

image

 174 |  0.0471  0.0315   8.0 |  0.0318 |  4.038  2.092  1.93 31.52 |  0.1271 0.995  1.50 | 0.26864 a:0.0471/0.0476
 176 |  0.0471  0.0314   7.9 |  0.0318 |  4.038  2.092  1.93 31.52 |  0.1343 1.000  1.50 | 0.26874 a:0.0471/0.0476
 178 |  0.0471  0.0314   7.9 |  0.0318 |  4.038  2.093  1.93 31.52 |  0.1313 0.995  1.50 | 0.26883 a:0.0471/0.0477
 180 |  0.0471  0.0314   8.0 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1312 1.000  1.50 | 0.26892 a:0.0471/0.0477
 182 |  0.0471  0.0314   7.9 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1310 0.995  1.50 | 0.26899 a:0.0471/0.0477
 184 |  0.0471  0.0314   7.9 |  0.0317 |  4.038  2.092  1.93 31.52 |  0.1350 0.993  1.50 | 0.26906 a:0.0472/0.0477
 186 |  0.0470  0.0314   7.9 |  0.0317 |  4.038  2.092  1.93 31.52 |  0.1338 1.000  1.50 | 0.26911 a:0.0472/0.0477
 188 |  0.0470  0.0314   8.0 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1305 0.999  1.50 | 0.26916 a:0.0472/0.0477
 190 |  0.0470  0.0314   8.0 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1358 0.999  1.50 | 0.26919 a:0.0472/0.0477
 192 |  0.0470  0.0314   8.0 |  0.0317 |  4.038  2.092  1.93 31.52 |  0.1354 0.999  1.50 | 0.26922 a:0.0472/0.0477
 194 |  0.0470  0.0314   7.9 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1301 0.992  1.50 | 0.26923 a:0.0472/0.0477
 196 |  0.0470  0.0314   7.9 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1330 0.998  1.50 | 0.26924 a:0.0472/0.0477
 198 |  0.0470  0.0314   7.9 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1312 1.000  1.50 | 0.26925 a:0.0472/0.0477
 200 |  0.0470  0.0314   7.9 |  0.0317 |  4.038  2.093  1.93 31.52 |  0.1300 1.000  1.50 | 0.26925 a:0.0472/0.0477

==========================================================================================
FINAL ANALYSIS
==========================================================================================

  PatchSVAE: 4 patches × (256, 32)
  Target CV: 0.125
  Recon MSE: 0.031701 +/- 0.024789
  Row CV: 0.1300
  Cross-attention S delta: 0.26925

  Learned alpha per mode (coordination strength):
    Layer 0: mean=0.0471  max=0.0477  min=0.0466
      α[ 0]: 0.0470  #######################################
      α[ 1]: 0.0473  #######################################
      α[ 2]: 0.0474  #######################################
      α[ 3]: 0.0473  #######################################
      α[ 4]: 0.0471  #######################################
      α[ 5]: 0.0474  #######################################
      α[ 6]: 0.0469  #######################################
      α[ 7]: 0.0472  #######################################
      α[ 8]: 0.0470  #######################################
      α[ 9]: 0.0475  #######################################
      α[10]: 0.0467  #######################################
      α[11]: 0.0471  #######################################
      α[12]: 0.0477  #######################################
      α[13]: 0.0466  #######################################
      α[14]: 0.0471  #######################################
      α[15]: 0.0472  #######################################
      α[16]: 0.0472  #######################################
      α[17]: 0.0471  #######################################
      α[18]: 0.0470  #######################################
      α[19]: 0.0475  #######################################
      α[20]: 0.0466  #######################################
      α[21]: 0.0477  #######################################
      α[22]: 0.0470  #######################################
      α[23]: 0.0469  #######################################
      α[24]: 0.0472  #######################################
      α[25]: 0.0472  #######################################
      α[26]: 0.0471  #######################################
      α[27]: 0.0471  #######################################
      α[28]: 0.0474  #######################################
      α[29]: 0.0472  #######################################
      α[30]: 0.0466  #######################################
      α[31]: 0.0475  #######################################
    Layer 1: mean=0.0472  max=0.0477  min=0.0466
      α[ 0]: 0.0474  #######################################
      α[ 1]: 0.0472  #######################################
      α[ 2]: 0.0470  #######################################
      α[ 3]: 0.0471  #######################################
      α[ 4]: 0.0474  #######################################
      α[ 5]: 0.0470  #######################################
      α[ 6]: 0.0474  #######################################
      α[ 7]: 0.0473  #######################################
      α[ 8]: 0.0473  #######################################
      α[ 9]: 0.0470  #######################################
      α[10]: 0.0477  #######################################
      α[11]: 0.0472  #######################################
      α[12]: 0.0466  #######################################
      α[13]: 0.0477  #######################################
      α[14]: 0.0473  #######################################
      α[15]: 0.0471  #######################################
      α[16]: 0.0472  #######################################
      α[17]: 0.0471  #######################################
      α[18]: 0.0476  #######################################
      α[19]: 0.0470  #######################################
      α[20]: 0.0475  #######################################
      α[21]: 0.0470  #######################################
      α[22]: 0.0472  #######################################
      α[23]: 0.0475  #######################################
      α[24]: 0.0472  #######################################
      α[25]: 0.0471  #######################################
      α[26]: 0.0475  #######################################
      α[27]: 0.0474  #######################################
      α[28]: 0.0472  #######################################
      α[29]: 0.0469  #######################################
      α[30]: 0.0476  #######################################
      α[31]: 0.0466  #######################################

  Coordinated singular value profile:
    S[ 0]:   4.0376  cum=  5.3%  #############################
    S[ 1]:   3.9321  cum= 10.3%  #############################
    S[ 2]:   3.8501  cum= 15.1%  ############################
    S[ 3]:   3.7785  cum= 19.8%  ############################
    S[ 4]:   3.7092  cum= 24.2%  ###########################
    S[ 5]:   3.6414  cum= 28.5%  ###########################
    S[ 6]:   3.5771  cum= 32.7%  ##########################
    S[ 7]:   3.5158  cum= 36.7%  ##########################
    S[ 8]:   3.4554  cum= 40.6%  #########################
    S[ 9]:   3.3961  cum= 44.3%  #########################
    S[10]:   3.3371  cum= 48.0%  ########################
    S[11]:   3.2788  cum= 51.5%  ########################
    S[12]:   3.2230  cum= 54.8%  #######################
    S[13]:   3.1681  cum= 58.1%  #######################
    S[14]:   3.1141  cum= 61.2%  #######################
    S[15]:   3.0607  cum= 64.3%  ######################
    S[16]:   3.0088  cum= 67.2%  ######################
    S[17]:   2.9568  cum= 70.1%  #####################
    S[18]:   2.9075  cum= 72.8%  #####################
    S[19]:   2.8572  cum= 75.5%  #####################
    S[20]:   2.8067  cum= 78.0%  ####################
    S[21]:   2.7584  cum= 80.5%  ####################
    S[22]:   2.7075  cum= 82.9%  ####################
    S[23]:   2.6574  cum= 85.2%  ###################
    S[24]:   2.6060  cum= 87.4%  ###################
    S[25]:   2.5535  cum= 89.5%  ##################
    S[26]:   2.4991  cum= 91.5%  ##################
    S[27]:   2.4413  cum= 93.5%  ##################
    S[28]:   2.3770  cum= 95.3%  #################
    S[29]:   2.2906  cum= 97.0%  #################
    S[30]:   2.2012  cum= 98.6%  ################
    S[31]:   2.0926  cum=100.0%  ###############

  Saving reconstruction grid...
  Saved to /content/svae_patch_recon.png

Prototype V10 Patchwork Cross-Attention - Unstable

Tiny ImageNet can't draw enough information from a single monolithic MLP projection, so I'm breaking the structure into quadrant-based MLP patches with cross-attention for a prototype.

Each patch is 32x32, each independently represented with SVD 24 and tied together with patchwork cross-attention. Similar to a ViT, so I'm building it toward a full ViT structure over time to ensure solidity and solidarity.
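The quadrant split and cross-patch attention can be sketched as below — illustrative numpy only, with hypothetical names and weights, not the repo's implementation:

```python
import numpy as np

def split_quadrants(img, patch=32):
    """Split an HxWxC image into four patch x patch quadrants,
    flattened so each quadrant can feed its own MLP."""
    quads = [img[i:i + patch, j:j + patch].reshape(-1)
             for i in (0, patch) for j in (0, patch)]
    return np.stack(quads)                          # (4, patch*patch*C)

def cross_attend(tokens, Wq, Wk, Wv):
    """Single-head attention across the 4 patch tokens, so each
    quadrant can read the others (Wq/Wk/Wv are illustrative)."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax
    return w @ V                                    # (4, d)
```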

image

The current proto is more stable but needs a bit more oomph.

The CV is enjoying its drift a BIT too much.

I'll try attention alpha rather than rigid alpha. 4 patches is a bit unstable, so let's get some stability.
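The CV quantity the logs track (`Target CV: 0.125`, `row_cv`) can be sketched as a simple penalty on the coefficient of variation; the exact formulation below is an assumption, not the repo's loss:

```python
import numpy as np

def cv_penalty(rows, target=0.125):
    """Coefficient of variation (std/mean) of per-row norms, pulled
    toward a target value. The repo's CV loss may act on a different
    quantity; this shows the shape of the constraint."""
    norms = np.linalg.norm(rows, axis=1)
    cv = norms.std() / (norms.mean() + 1e-8)
    return (cv - target) ** 2, cv
```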

Prototype V9 prod

Should run on Colab. Install the necessary repos.

https://huggingface.co/AbstractPhil/geolip-SVAE/blob/main/prototype_v9_prod.py

Prototype V8 Soft Hand Loss

Stable prototype found. Scaling with the CV ratio within this band acts as a stable attractor for the structural response.

The soft hand loss is acting like a stable attractor. Correct utilization of this behavior can directly attune a model's structural internals to align with certain trajectory-based routes.

The alignment can be directly tuned at runtime, shifted to learn implicit rules, altered to teach specific behaviors, and more.
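One plausible reading of a "soft hand" — zero pressure inside an accepted band, a gentle quadratic pull back once a statistic drifts outside it — can be sketched as below. This is my assumption about the mechanism, not the repo's definition:

```python
def soft_band_loss(value, lo, hi):
    """Zero penalty anywhere inside [lo, hi]; quadratic pull back
    toward the band once the value drifts outside it."""
    below = min(value - lo, 0.0)   # negative only when value < lo
    above = max(value - hi, 0.0)   # positive only when value > hi
    return below ** 2 + above ** 2
```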

0.034 MSE, which is a different gauge of loss entirely.

image

Prototype V7

Normalized spherical without magnitude; expected to be considerably faster, with less accuracy in the first stages.

image

What happens if you train with the wrong CV value?

Using geolip-core SVD (Gram + eigh)
SVAE - V=96, D=24 (Validated: CV=0.3668)
  Matrix: (96, 24) = 2304 elements
  SVD: geolip-core Gram+eigh
  Losses: recon + CV(w=0.1, target=0.3668)
  Params: 6,036,736
=====================================================================================
 ep |    loss   recon    cv_l  t/ep |   t_rec |     S0     SD ratio erank |  row_cv
-------------------------------------------------------------------------------------
  1 |  0.4174  0.4169  0.0037   7.3 |  0.2843 |   5.39  1.977  2.73 23.15 |  0.3039
  2 |  0.2492  0.2489  0.0031   7.3 |  0.2286 |   5.43  1.978  2.75 23.14 |  0.3148
  3 |  0.2096  0.2093  0.0030   7.3 |  0.1946 |   5.50  1.982  2.77 23.13 |  0.3352
  4 |  0.1858  0.1855  0.0021   7.2 |  0.1812 |   5.48  1.980  2.77 23.13 |  0.3460
  6 |  0.1586  0.1581  0.0046   7.3 |  0.1541 |   5.31  1.873  2.83 23.09 |  0.3938
  8 |  0.1419  0.1407  0.0096   7.3 |  0.1377 |   5.33  1.815  2.93 23.03 |  0.4565
 10 |  0.1314  0.1283  0.0385   7.3 |  0.1279 |   5.42  1.778  3.05 22.97 |  0.5373
 12 |  0.1226  0.1160  0.0599   7.2 |  0.1162 |   5.67  1.738  3.26 22.86 |  0.6060
 14 |  0.1189  0.1087  0.0847   7.1 |  0.1109 |   5.78  1.705  3.39 22.79 |  0.6643
 16 |  0.1175  0.1014  0.1935   7.1 |  0.0996 |   6.17  1.701  3.63 22.67 |  0.7598
 18 |  0.1170  0.0952  0.2238   7.2 |  0.0974 |   6.50  1.671  3.89 22.52 |  0.8211
 20 |  0.1173  0.0905  0.1539   7.2 |  0.0907 |   6.69  1.649  4.06 22.43 |  0.8383
 22 |  0.1200  0.0852  0.3335   7.1 |  0.0903 |   7.11  1.655  4.30 22.30 |  0.9128
 24 |  0.1233  0.0817  0.2770   7.2 |  0.0831 |   7.51  1.646  4.56 22.15 |  0.9654
 26 |  0.1286  0.0785  0.3243   7.1 |  0.0778 |   7.71  1.646  4.68 22.09 |  1.0196
 28 |  0.1328  0.0752  0.4244   7.2 |  0.0780 |   7.84  1.636  4.80 22.02 |  1.1002
 30 |  0.1373  0.0726  0.8786   7.1 |  0.0752 |   8.24  1.631  5.05 21.87 |  1.1243
 32 |  0.1437  0.0703  0.6946   7.2 |  0.0704 |   8.52  1.631  5.23 21.76 |  1.2061
 34 |  0.6025  0.6020  0.0062   7.1 |  0.5194 |  28.25 10.261  2.75 23.14 |  0.2935
 36 |  0.4995  0.4990  0.0062   7.2 |  0.4949 |  29.82 10.939  2.73 23.15 |  0.2982
 38 |  0.4947  0.4942  0.0058   7.2 |  0.4915 |  28.37 10.433  2.72 23.15 |  0.2988
 40 |  0.4579  0.4574  0.0053   7.2 |  0.4557 |  26.31  9.585  2.74 23.14 |  0.3041
 42 |  0.4333  0.4328  0.0051   7.1 |  0.4259 |  22.03  7.996  2.75 23.14 |  0.2984
 44 |  0.4057  0.4054  0.0038   7.1 |  0.3880 |  21.15  7.656  2.76 23.15 |  0.3177
 46 |  0.3670  0.3667  0.0024   7.2 |  0.3634 |  19.33  6.943  2.78 23.13 |  0.3280
 48 |  0.3495  0.3493  0.0005   7.1 |  0.3457 |  18.34  6.569  2.79 23.13 |  0.3336
 50 |  0.3341  0.3340  0.0010   7.3 |  0.3326 |  17.55  6.298  2.79 23.13 |  0.3424
 52 |  0.3205  0.3204  0.0003   7.2 |  0.3182 |  16.94  6.069  2.79 23.12 |  0.3549

SNAP, right there at epoch 34. The tension was too strong and the model simply snapped. I had the target set to around 0.366, but the model requires the value it snapped to: 0.2935.

The bulk embedding tests show the actual value: CV=0.2992 is the stable attractor, almost precisely where the model snapped to.

The effect was so strong that the entire model underwent a forced reset when it hit the fundamental invalidity.

Why? I don't know yet.
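The "geolip-core SVD (Gram + eigh)" route named in the log header is a standard trick: for a tall matrix, eigendecompose the small Gram matrix instead of running a full SVD. A sketch of that route under my assumptions — the repo's implementation may differ in details:

```python
import numpy as np

def gram_eigh_svd(X):
    """SVD via the Gram matrix: for X of shape (n, d) with d <= n,
    eigendecompose the small (d, d) matrix X^T X with eigh, then
    recover singular values/vectors. Cheaper than a full SVD when
    d is small (e.g. the 96x24 matrix in this run)."""
    G = X.T @ X                                    # (d, d) Gram matrix
    evals, evecs = np.linalg.eigh(G)               # ascending order
    S = np.sqrt(np.clip(evals[::-1], 0.0, None))   # descending singular values
    V = evecs[:, ::-1]                             # right singular vectors
    U = (X @ V) / np.maximum(S, 1e-12)             # left singular vectors
    return U, S, V.T
```

The squaring in X^T X roughly doubles the condition number, so this route trades a little numerical headroom for speed.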

Models

V4 5m SVD+EIGH 100 epochs 48x24

image

V4 111m 200x24 SVD+EIGH KL_DIV - Undercooked, needs more epochs -> sequel faulty, collapse

image

V3 v1024 - SVD 24

image

V2 16 modes

image

V1 8 modes image
