# SousChef-v1 Weight File Documentation

## New Fields in config.json
- `model_type`: Specifies the model type, which is updated to `souschef_v1` in this release.
- `num_recipe_predict_layers`: Indicates the number of Recipe Prediction (RP) Modules. The open-sourced SousChef-v1 weights include 2 RP Modules.
- `quantization_config`: Describes the configuration for FP8 quantization.
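These fields can be read straight from `config.json`. The sketch below assumes the file sits in a local `SousChef-v1/` checkpoint directory; the path is illustrative, not part of this spec.

```python
import json

# Read the fields introduced in this release (the checkpoint path is illustrative).
with open("SousChef-v1/config.json") as f:
    config = json.load(f)

print(config["model_type"])                 # "souschef_v1"
print(config["num_recipe_predict_layers"])  # 2 RP Modules in the open-sourced weights
print(config["quantization_config"])        # FP8 quantization settings (see below)
```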
## Weight Structure Overview
The SousChef-v1 weight file consists of two main components: Main Model Weights and RP Modules.
### 1. Main Model Weights
- Composition:
  - Input/output embedding layers and a complete set of 48 Transformer hidden layers.
- Parameter Count:
  - Total parameters: 250B
  - Activation parameters: 20.3B (including 1.2B for Embedding and 1.1B for the output Head).
#### Structural Details
- Embedding Layer: `model.embed_tokens.weight`
- Transformer Hidden Layers: `model.layers.0` to `model.layers.47`, totaling `num_hidden_layers` layers.
- Output Layer: `model.norm.weight` and `lm_head.weight`
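To see this layout in an actual checkpoint, tensor names can be grouped by layer index. The sketch below assumes the weights ship as safetensors shards with a `model.safetensors.index.json` index file; that file name and layout are assumptions, not part of this document.

```python
import json
import re
from collections import defaultdict

# Group tensor names by hidden-layer index (index file name/layout is an assumption).
with open("SousChef-v1/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

layer_re = re.compile(r"^model\.layers\.(\d+)\.")
groups = defaultdict(list)
for name in weight_map:
    m = layer_re.match(name)
    key = f"layer_{int(m.group(1))}" if m else "non_layer"
    groups[key].append(name)

# "non_layer" holds model.embed_tokens.weight, model.norm.weight, and lm_head.weight;
# layer_0 .. layer_47 are Main Model hidden layers (layer_48/49 are RP Modules, see below).
print(sorted(groups))
```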
### 2. Recipe Prediction (RP) Modules
- Composition:
  - Additional RP Modules defined by the `num_recipe_predict_layers` field. In this model, the value is set to 2.
- Parameter Count:
  - Parameters: 10.5B unique parameters, excluding the shared 1.2B Embedding and 1.1B output Head.
  - Activation parameters: 3.2B (including the shared 1.2B Embedding and 1.1B output Head).
#### Structural Details
- `embed_tokens`: Shares parameters with the Embedding layer of the Main Model weights.
- `enorm` & `hnorm`: RMSNorm parameters required for speculative recipe prediction.
- `rp_proj`: Parameters for the dimensionality reduction projection on the norm results.
- Additional Transformer Hidden Layers: `model.layers.48.self_attn & mlp` to `model.layers.49.self_attn & mlp` (structure identical to the Main Model hidden layers).
- `shared_head`: Shares parameters with the output Head of the Main Model weights.
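The parameter sharing above could be wired up roughly as in the PyTorch sketch below. The class name, attribute layout, and in particular the input width of `rp_proj` are assumptions for illustration only, not the actual SousChef-v1 implementation.

```python
import torch.nn as nn

class RecipePredictionModule(nn.Module):  # illustrative class name
    def __init__(self, hidden_size, main_embed_tokens, main_lm_head, decoder_layer):
        super().__init__()
        self.embed_tokens = main_embed_tokens   # shared with the Main Model Embedding layer
        self.enorm = nn.RMSNorm(hidden_size)    # RMSNorm params for speculative recipe prediction
        self.hnorm = nn.RMSNorm(hidden_size)
        # Dimensionality-reduction projection over the norm results; a concatenated
        # input of width 2 * hidden_size is an assumption of this sketch.
        self.rp_proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.layer = decoder_layer              # same structure as a Main Model hidden layer
        self.shared_head = main_lm_head         # shared with the Main Model output Head
```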
### Loading Rules
- Main Model Weights: Loaded via the `num_hidden_layers` parameter in `config.json`.
- RP Modules: Loaded via the `num_recipe_predict_layers` parameter, with layer IDs appended immediately after the Main Model hidden layers. For example:
  - If `num_hidden_layers = 48` and `num_recipe_predict_layers = 2`, the RP Modules' layer IDs are `48` and `49` (see the sketch below).
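A minimal sketch of this rule, assuming the same illustrative checkpoint path as above:

```python
import json

with open("SousChef-v1/config.json") as f:  # illustrative path
    config = json.load(f)

n_main = config["num_hidden_layers"]           # 48
n_rp = config["num_recipe_predict_layers"]     # 2

main_layer_ids = list(range(n_main))                # 0 .. 47
rp_layer_ids = list(range(n_main, n_main + n_rp))   # layer IDs appended after the main layers
print(main_layer_ids[-1], rp_layer_ids)             # 47 [48, 49]
```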
## FP8 Weight Documentation
SousChef-v1 natively supports FP8 weight format with 128x128 block scaling.
### FP8 Configuration
The FP8 weight file introduces a `quantization_config` field to describe the quantization method. Below is an example configuration:
"quantization_config": {
"activation_scheme": "dynamic",
"fmt": "e4m3",
"quant_method": "fp8",
"weight_block_size": [128, 128]
}
- Quantization Format:
  - Format type: `fp8` and `e4m3` (corresponding to `torch.float8_e4m3fn`).
  - Weight block size: `128x128`.
- Activation Quantization Scheme:
  - Utilizes dynamic activation quantization (`dynamic`).
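A small sketch of consuming this configuration block, using only what is stated above (`e4m3` maps to `torch.float8_e4m3fn`, the block size is 128x128); the helper name is hypothetical.

```python
import torch

def resolve_fp8_dtype(quantization_config):  # hypothetical helper
    assert quantization_config["quant_method"] == "fp8"
    assert quantization_config["activation_scheme"] == "dynamic"
    # "e4m3" corresponds to torch.float8_e4m3fn, as noted above.
    dtype = {"e4m3": torch.float8_e4m3fn}[quantization_config["fmt"]]
    return dtype, tuple(quantization_config["weight_block_size"])

dtype, block_size = resolve_fp8_dtype({
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [128, 128],
})
print(dtype, block_size)  # torch.float8_e4m3fn (128, 128)
```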
### Dequantization Method
The FP8 weight file includes a `weight_scale_inv` field, which stores the dequantization scale for each weight block.

- Storage Format: a `float32` tensor, stored alongside the weight data.
- Dequantization Formula:
  - If the weight block is not aligned to 128, it is zero-padded to 128 before the scale is calculated. After quantization, the padded portion is removed.
  - The dequantization process is performed as: `(128x128 weight block) * weight_scale_inv`.
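A minimal sketch of this block-wise dequantization for a 2-D weight tensor, assuming `weight_scale_inv` has shape `(ceil(rows/128), ceil(cols/128))`; the function name is illustrative.

```python
import torch

def dequantize_fp8_block(w: torch.Tensor, weight_scale_inv: torch.Tensor,
                         block: int = 128) -> torch.Tensor:  # illustrative name
    out = w.to(torch.float32)
    rows, cols = out.shape
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            # Edge blocks were zero-padded to 128 when the scale was computed and the
            # padding later removed, so scaling the unpadded slice is sufficient here.
            out[i:i + block, j:j + block] *= weight_scale_inv[i // block, j // block]
    return out
```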
Through dequantization of the FP8 weights, runtime operations can perform online quantization at a granularity of `per-token-per-128-channel`.
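For completeness, a sketch of what per-token-per-128-channel online activation quantization could look like; the function name, the divisibility assumption on the channel dimension, and the use of 448.0 (the maximum representable value of `torch.float8_e4m3fn`) are assumptions of this sketch.

```python
import torch

def quantize_activation_per_token_group(x: torch.Tensor, group: int = 128):  # illustrative
    tokens, channels = x.shape                      # assumes channels % group == 0
    xg = x.float().reshape(tokens, channels // group, group)
    amax = xg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-4)
    scale = amax / 448.0                            # 448 = max value of float8_e4m3fn
    q = (xg / scale).to(torch.float8_e4m3fn)
    return q.reshape(tokens, channels), scale.squeeze(-1)
```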