Reconstructs an approximate floating-point matrix from a quantized representation produced by mlx_quantize().

Usage

mlx_dequantize(
  w,
  scales,
  biases = NULL,
  group_size = 64L,
  bits = 4L,
  mode = "affine",
  device = mlx_default_device()
)

Arguments

w

An mlx array representing the weight matrix. Accepts either an unquantized matrix (which may be quantized automatically) or a pre-quantized uint32 matrix produced by mlx_quantize().

scales

An optional mlx array of quantization scales. Required when w is already quantized.

biases

An optional mlx array of quantization biases (affine mode); use NULL for symmetric quantization.

group_size

The group size for quantization. Smaller groups improve accuracy at the cost of slightly higher memory. Default: 64.

bits

Number of bits for quantization (typically 4 or 8). Default: 4.

mode

Quantization mode, either "affine" or "mxfp4". Default: "affine".

device

Execution target: supply "gpu", "cpu", or an mlx_stream created via mlx_new_stream(). Defaults to the current mlx_default_device() unless noted otherwise (helpers that act on an existing array typically reuse that array's device or stream).

Value

An mlx array with the dequantized (approximate) floating-point weights.

Details

Dequantization unpacks the low-precision quantized weights and applies the scales (and biases) to reconstruct approximate floating-point values. Note that some precision is lost during quantization and cannot be recovered.
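
In affine mode, each stored code is mapped back within its group as roughly scale * code + bias. Below is a minimal conceptual sketch in plain R; the values are made up for illustration, and it ignores the packed uint32 storage that mlx_quantize() actually produces.

# Conceptual sketch only: plain R integers standing in for 4-bit codes
codes <- c(3L, 15L, 0L, 7L)   # quantized codes in [0, 2^bits - 1]
scale <- 0.02                 # per-group scale
bias  <- -0.15                # per-group bias (affine mode only)
w_hat <- scale * codes + bias # approximate reconstruction of the originals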

Examples

# Quantize a random weight matrix with groups of 32 values (4-bit by default)
w <- mlx_rand_normal(c(64, 32))
quant <- mlx_quantize(w, group_size = 32)
# Reconstruct an approximate floating-point copy from the quantized pieces
w_reconstructed <- mlx_dequantize(quant$w_q, quant$scales, quant$biases, group_size = 32)
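
To gauge the precision lost in the round trip, the reconstruction can be compared against the original. The sketch below assumes an as.matrix() method (or an equivalent helper) for converting mlx arrays back to base R, which is not documented on this page.

# Hypothetical: assumes an as.matrix() method exists for mlx arrays
err <- max(abs(as.matrix(w) - as.matrix(w_reconstructed)))
err  # small but nonzero reconstruction error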