Rmlx provides an R interface to Apple’s MLX framework, enabling high-performance GPU computing on Apple Silicon.
Modern Macs have a GPU that is great at matrix operations, and statistics uses a lot of matrix operations. But until now, there has been no way for R on the Mac to use the GPU. Rmlx exists to fill that gap. It is at a very early stage and was largely vibe-coded with Claude/OpenAI Codex. Obviously, use at your own risk! Contributions are very welcome; in particular, it would be great to implement function import/export, more neural network components, etc.
There is a companion library at hughjonesd/RmlxStats, which focuses on implementing common statistical methods on the GPU.
Most of MLX's C++ functions are exposed as R functions with an mlx_ prefix. In addition, the package defines mlx-specific methods for many base R matrix operations, including arithmetic, subsetting and matrix algebra.
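For example (a minimal sketch using functions demonstrated later in this README), solve() dispatches to the package's mlx method, while mlx_eval() is one of the mlx_-prefixed functions:
library(Rmlx)
m <- as_mlx(matrix(rnorm(16), 4, 4))
inv <- solve(m)  # mlx-specific method for a base R generic
mlx_eval(inv)    # mlx_-prefixed function: force evaluation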
Installation
First, install the MLX library:
brew install mlx
or the Linux equivalent. Then just install the package as normal:
remotes::install_github("hughjonesd/Rmlx")
Alternatively, you can build mlx from source.
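To check that the package can see MLX, load it and query the default device:
library(Rmlx)
mlx_default_device()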
Features
Fast GPU Operations
library(Rmlx)
#>
#> Attaching package: 'Rmlx'
#> The following object is masked from 'package:stats':
#>
#> fft
#> The following objects are masked from 'package:base':
#>
#> asplit, backsolve, chol2inv, col, colMeans, colSums, diag, drop,
#> outer, row, rowMeans, rowSums, svd
A <- matrix(rnorm(1e6), 1e3, 1e3)
A_mlx <- as_mlx(A)
system.time(solve(A))
#> user system elapsed
#> 0.378 0.005 0.555
system.time(mlx_eval(solve(A_mlx)))
#> user system elapsed
#> 0.008 0.006 0.015
Lazy Evaluation
Operations are recorded but not executed until explicitly evaluated:
x <- as_mlx(matrix(1:25, 5, 5))
y <- as_mlx(matrix(101:125, 5, 5))
# Lazy - not computed yet
z <- x + y * 2
# Force evaluation
mlx_eval(z)
# Or convert to R (automatically evaluates)
as.matrix(z)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 203 218 233 248 263
#> [2,] 206 221 236 251 266
#> [3,] 209 224 239 254 269
#> [4,] 212 227 242 257 272
#> [5,] 215 230 245 260 275
Device Management
Apple's M-series chips have unified memory shared between the CPU and GPU, so switching between devices involves no data copying.
# Check/set default device
dev <- mlx_default_device()
mlx_default_device("cpu") # Switch to CPU
mlx_default_device(dev) # Back to GPU
# Create on specific device
x_gpu <- as_mlx(matrix(1:12, 3, 4), device = "gpu")
x_cpu <- as_mlx(matrix(1:12, 3, 4), device = "cpu")
Subsetting
Subsetting works like base R, except that drop = FALSE is the default:
x <- as_mlx(matrix(1:9, 3, 3))
x[1:2, 1:2]
#> mlx array [2 x 2]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2]
#> [1,] 1 4
#> [2,] 2 5
# drop = FALSE by default
x[1, ]
#> mlx array [1 x 3]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2] [,3]
#> [1,] 1 4 7
logical_mask <- c(TRUE, FALSE, TRUE)
x[logical_mask, ]
#> mlx array [2 x 3]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 3 6 9
# subset assignment
x[, 2] <- c(0, 0, 0)
x
#> mlx array [3 x 3]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2] [,3]
#> [1,] 1 0 7
#> [2,] 2 0 8
#> [3,] 3 0 9
Arithmetic
x <- as_mlx(matrix(1:12, 3, 4))
y <- as_mlx(matrix(13:24, 3, 4))
# Element-wise operations
sum_xy <- x + y
diff_xy <- x - y
prod_xy <- x * y
quot_xy <- x / y
pow_xy <- x ^ 2
# Comparisons
lt <- x < y
eq <- x == y
# Bring results back to R
as.matrix(sum_xy)
#> [,1] [,2] [,3] [,4]
#> [1,] 14 20 26 32
#> [2,] 16 22 28 34
#> [3,] 18 24 30 36
as.matrix(lt)
#> [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE TRUE
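Comparison results round-trip to R as logical matrices, so they compose with base R as usual; for example, to count matches:
sum(as.matrix(lt))  # all 12 elements satisfy x < y
Matrix Operations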
Many base R matrix functions have mlx-specific methods:
a <- as_mlx(matrix(1:6, 2, 3))
b <- as_mlx(matrix(1:6, 3, 2))
# rbind, cbind, transpose
rbind(a, t(b))
#> mlx array [4 x 3]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 2 4 6
#> [3,] 1 2 3
#> [4,] 4 5 6
cbind(a, t(b))
#> mlx array [2 x 6]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 3 5 1 2 3
#> [2,] 2 4 6 4 5 6
# Matrix algebra
a %*% b
#> mlx array [2 x 2]
#> dtype: float32
#> device: gpu
#> values:
#> [,1] [,2]
#> [1,] 22 49
#> [2,] 28 64
# Reductions
sum(a)
#> mlx array []
#> dtype: float32
#> device: gpu
#> values:
#> [1] 21
colMeans(a)
#> mlx array [3]
#> dtype: float32
#> device: gpu
#> values:
#> [1] 1.5 3.5 5.5
# Cumulative operations flatten column-major
cumsum(a)
#> mlx array [6]
#> dtype: float32
#> device: gpu
#> values:
#> [1] 1 3 6 10 15 21
qr_res <- qr(a)
svd_res <- svd(a)
chol_res <- chol(a[, 1:2])
fft_res <- fft(a)
crossprod_res <- crossprod(a, b[1:2, ])
solve_res <- solve(a[, 1:2], b[1:2, ])
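As a quick sanity check (using only operations shown above), multiplying solve_res back should reproduce b[1:2, ]:
# residual of A %*% solve(A, B) - B; should be numerically ~0
as.matrix(a[, 1:2] %*% solve_res - b[1:2, ])
Automatic Differentiation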
loss <- function(w, x, y) {
  preds <- x %*% w
  resids <- preds - y
  sum(resids * resids) / length(y)
}
x <- as_mlx(matrix(rnorm(20), 5, 4))
y <- as_mlx(matrix(rnorm(5), 5, 1))
w <- as_mlx(matrix(0, 4, 1))
grads <- mlx_grad(loss, w, x, y)
# Inspect gradient
as.matrix(grads[[1]])
#> [,1]
#> [1,] 0.9005783
#> [2,] 0.2985840
#> [3,] 0.6431852
#> [4,] 0.7269748
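The returned gradient can also drive a hand-rolled update, since scalar arithmetic broadcasts over mlx arrays as shown earlier:
# one manual gradient-descent step: w <- w - lr * grad
w <- w - 0.1 * grads[[1]]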
# Simple SGD loop
model <- mlx_linear(4, 1, bias = FALSE)
opt <- mlx_optimizer_sgd(mlx_parameters(model), lr = 0.1)
loss_fn <- function(mod, data_x, data_y) {
  preds <- mlx_forward(mod, data_x)
  resids <- preds - data_y
  sum(resids * resids) / length(data_y)
}
for (step in 1:50) {
  mlx_train_step(model, loss_fn, opt, x, y)
}
# Check final loss
ypred <- mlx_forward(model, x)
mean((ypred - y) * (ypred - y))
#> mlx array []
#> dtype: float32
#> device: gpu
#> values:
#> [1] 0.1833142
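For reference, the same primitives give the closed-form least-squares fit; the SGD loop's final loss should be close to the loss at this exact optimum (a sketch using only functions demonstrated above):
xtx <- t(x) %*% x
xty <- t(x) %*% y
w_hat <- solve(xtx, xty)  # closed-form least squares, entirely on the GPU
mean((x %*% w_hat - y) * (x %*% w_hat - y))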