Reconstructs an approximate floating-point matrix from a quantized representation
produced by mlx_quantize().
Arguments
- w
An mlx array representing the weight matrix. Accepts either an unquantized matrix (which may be quantized automatically) or a pre-quantized uint32 matrix produced by mlx_quantize().
- scales
An optional mlx array of quantization scales. Required when w is already quantized.
- biases
An optional mlx array of quantization biases (affine mode); use NULL for symmetric quantization.
- group_size
The group size for quantization. Smaller groups improve accuracy at the cost of slightly higher memory. Default: 64.
- bits
Number of bits for quantization (typically 4 or 8). Default: 4.
- mode
Quantization mode, either "affine" or "mxfp4".
Details
Dequantization unpacks the low-precision quantized weights and applies the scales (and biases) to reconstruct approximate floating-point values. Note that some precision is lost during quantization and cannot be recovered.
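The affine reconstruction described above can be illustrated in base R for a single group. This is a minimal sketch, not the package's implementation: it assumes the common affine convention in which each group stores a scale and a bias (taken here as the group minimum) and values are reconstructed as q * scale + bias. affine_roundtrip() is a hypothetical helper introduced for illustration, and the packed uint32 storage layout is not modelled.

```r
# Hypothetical helper (not part of the package): quantize one group of
# values to `bits` bits with an affine scheme, then reconstruct them.
affine_roundtrip <- function(x, bits = 4) {
  levels <- 2^bits - 1                 # highest integer code, e.g. 15 for 4 bits
  bias <- min(x)                       # assumed convention: bias is the group minimum
  scale <- (max(x) - bias) / levels    # width of one quantization step
  if (scale == 0) scale <- 1           # guard against a constant group
  q <- round((x - bias) / scale)       # integer codes in 0..levels
  x_hat <- q * scale + bias            # dequantization: apply scale and bias
  list(q = q, scale = scale, bias = bias, x_hat = x_hat)
}

set.seed(1)
group <- rnorm(32)                     # one group of 32 weights
res <- affine_roundtrip(group, bits = 4)

# The reconstruction is approximate: the error per value is bounded by
# half a quantization step, and it does not vanish.
max_err <- max(abs(group - res$x_hat))
```

Under these assumptions the worst-case error per value is half a quantization step, which is why a smaller group (narrower value range per scale) or more bits tends to give a closer reconstruction.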
Examples
w <- mlx_rand_normal(c(64, 32))
quant <- mlx_quantize(w, group_size = 32)
w_reconstructed <- mlx_dequantize(quant$w_q, quant$scales, quant$biases, group_size = 32)