Reconstructs an approximate floating-point matrix from a quantized representation
produced by mlx_quantize().
Usage
mlx_dequantize(
w,
scales,
biases = NULL,
group_size = 64L,
bits = 4L,
mode = "affine",
device = mlx_default_device()
)

Arguments
- w
An mlx array (the quantized weight matrix)
- scales
An mlx array (the quantization scales)
- biases
An optional mlx array (the quantization biases for affine mode). Default: NULL
- group_size
The group size used during quantization. Default: 64
- bits
The number of bits used during quantization. Default: 4
- mode
The quantization mode used: "affine" or "mxfp4". Default: "affine"
- device
Execution target: supply "gpu", "cpu", or an mlx_stream created via mlx_new_stream(). Defaults to the current mlx_default_device() unless noted otherwise (helpers that act on an existing array typically reuse that array's device or stream).
Details
Dequantization unpacks the low-precision quantized weights and applies the scales (and, in affine mode, the biases) to reconstruct approximate floating-point values. Quantization is lossy: precision discarded during quantization cannot be recovered, so the result only approximates the original matrix.
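To make the affine reconstruction concrete, here is a minimal base-R sketch (not the package's implementation, which operates on packed mlx arrays): each group of group_size quantized values shares one scale and one bias, and each value is recovered as scale * q + bias.

```r
# Per-group affine dequantization sketch in base R.
# w_q holds small integer codes (e.g. 0..7 for 3-bit quantization);
# scales and biases each hold one value per group.
dequantize_group <- function(w_q, scales, biases, group_size = 4L) {
  groups <- split(w_q, ceiling(seq_along(w_q) / group_size))
  out <- mapply(function(q, s, b) s * q + b, groups, scales, biases,
                SIMPLIFY = FALSE)
  unlist(out, use.names = FALSE)
}

# Two groups of four 3-bit codes:
w_q    <- c(0, 3, 5, 7, 1, 2, 6, 4)
scales <- c(0.5, 0.25)
biases <- c(-1.0, 0.0)
dequantize_group(w_q, scales, biases)
# -> -1.00 0.50 1.50 2.50 0.25 0.50 1.50 1.00
```

mlx_dequantize() performs the same arithmetic, but on weights packed several codes per element, which is why the matching group_size and bits from quantization must be supplied.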
Examples
w <- mlx_rand_normal(c(64, 32))
quant <- mlx_quantize(w, group_size = 32)
w_reconstructed <- mlx_dequantize(quant$w_q, quant$scales, quant$biases, group_size = 32)