# Model Card for PrunaAI/tiny-random-llama4-smashed

This model was created using the pruna library. Pruna is a model optimization framework built for developers, enabling you to deliver more efficient models with minimal implementation overhead.

## Usage

First things first, you need to install the pruna library:

```bash
pip install pruna
```

You can then load this model using the following code:

```python
from pruna import PrunaModel

loaded_model = PrunaModel.from_hub("PrunaAI/tiny-random-llama4-smashed")
```

After loading the model, you can use the inference methods of the original model.
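
For example, here is a minimal inference sketch. It assumes the underlying model is a standard transformers causal language model and that a compatible tokenizer is available in the same repository:

```python
from pruna import PrunaModel
from transformers import AutoTokenizer

loaded_model = PrunaModel.from_hub("PrunaAI/tiny-random-llama4-smashed")
# Assumption: the repository bundles a tokenizer compatible with the model.
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/tiny-random-llama4-smashed")

inputs = tokenizer("Hello, world!", return_tensors="pt")
# PrunaModel exposes the inference methods of the wrapped model, e.g. generate().
outputs = loaded_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```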

## Smash Configuration

The compression configuration of the model is stored in the `smash_config.json` file.

```json
{
    "batcher": null,
    "cacher": null,
    "compiler": "torch_compile",
    "pruner": null,
    "quantizer": null,
    "torch_compile_backend": "inductor",
    "torch_compile_batch_size": 1,
    "torch_compile_dynamic": null,
    "torch_compile_fullgraph": true,
    "torch_compile_make_portable": false,
    "torch_compile_max_kv_cache_size": 400,
    "torch_compile_mode": "default",
    "torch_compile_seqlen_manual_cuda_graph": 100,
    "max_batch_size": 1,
    "device": "cpu",
    "save_fns": [
        "save_before_apply"
    ],
    "load_fns": [
        "transformers"
    ],
    "reapply_after_load": {
        "pruner": null,
        "quantizer": null,
        "cacher": null,
        "compiler": "torch_compile",
        "batcher": null
    }
}
```
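
For orientation, the compiler fields above correspond conceptually to a plain `torch.compile` call. The following is an illustrative sketch of the equivalent PyTorch invocation, not the actual pruna internals:

```python
import torch

# Conceptual equivalent of the compiler settings in smash_config.json:
# backend "inductor", mode "default", fullgraph true. pruna applies the
# compilation for you; `model` here stands for any torch.nn.Module.
compiled_model = torch.compile(
    model,
    backend="inductor",
    mode="default",
    fullgraph=True,
)
```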

## Model Configuration

The configuration of the model is stored in the `config.json` file.

```json
{}
```
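
If you want to inspect the file directly, here is a minimal sketch using the huggingface_hub client (assuming it is installed):

```python
import json

from huggingface_hub import hf_hub_download

# Download config.json from the repository and print its contents (empty here).
config_path = hf_hub_download(
    repo_id="PrunaAI/tiny-random-llama4-smashed",
    filename="config.json",
)
with open(config_path) as f:
    print(json.load(f))  # -> {}
```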

## 🌍 Join the Pruna AI community!

You can find Pruna AI on Twitter, GitHub, LinkedIn, Discord, and Reddit.

## Model Details

- Model size: 6.52M parameters
- Tensor type: BF16
- Weights format: Safetensors