Bend + HVM2: the first high-level GPU language

What we are announcing today isn't perfect. There is a pile of disclaimers at the end, so please check them before raising your expectations! That said, we do bring something exciting today: the first high-level language that runs on GPUs.

Doesn't that already exist?

No. Languages like CUDA, OpenCL and Metal are extremely low-level. One has to manually handle threads, memory management, locks, mutexes, atomics and so on to design a correct algorithm in the presence of massive concurrency. This is extremely hard and time-consuming. Easier alternatives like tensor libraries avoid that burden, but restrict you to a narrow set of predefined operations. As of 2024, there is no true high-level language that just runs on GPUs, without any compromise. There is a reason humanity moved away from the era of Fortran programming towards Python, JavaScript and Haskell: abstraction matters. Yet, as far as GPUs are concerned, we still live in that dark era. Until now!
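To make that burden concrete, here is a rough sketch of what "manually handling threads and locks" looks like even for something as trivial as summing a list. It is plain Python on CPU threads, not GPU code, and the names `locked_sum` and `worker` are made up for illustration. The shape of the problem is the same, though: you split the work yourself, you spawn and join the threads yourself, and you guard the shared state yourself.

```python
import threading


def locked_sum(values, num_threads=4):
    """Sum `values` using manually managed threads and a lock (illustrative only)."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)  # private work: nothing is shared yet
        with lock:            # shared state: without the lock, updates can be lost
            total += partial

    # Split the input by hand; max(1, ...) avoids a zero step on empty input.
    step = max(1, (len(values) + num_threads - 1) // num_threads)
    chunks = [values[i:i + step] for i in range(0, len(values), step)]

    # Spawn one thread per chunk, then wait for every one of them.
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

None of the splitting, locking or joining has anything to do with the actual problem, which is just "add these numbers".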

Presenting: Bend

Bend is the first truly high-level language that runs on GPUs. By that, we mean it has all the core features of a modern programming language, from the simplest ones, like tuples, lists and sum types, to the most complex ones, like higher-order functions, closures and continuations. Yet it runs on GPUs, with near-linear speedup in the number of cores and no parallelism annotations.

In Bend, you never need to manually spawn threads, handle locks, mutexes or atomics, or reason about data races and deadlocks. Just write your code in the most natural way, and it will run on thousands of GPU threads with near-linear speedup and zero concurrency errors. And it is capable of expressing every style of parallelism, from shaders, to data-parallel ML algorithms, to Erlang-like actor models, and much more.
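As a rough idea of what "the most natural way" means, consider the sketch below. It is written in Python, not Bend, and it runs sequentially here; the function name `tree_sum` is just an illustration. What matters is the shape: the two recursive calls share no data, so nothing in the program forces one to wait for the other. That independence is the kind of parallelism the claim above is about, and this snippet only shows the shape, not the speedup.

```python
def tree_sum(depth, x=0):
    """Sum the leaf labels of a perfect binary tree of the given depth."""
    if depth == 0:
        return x  # a leaf: its label is its value
    left = tree_sum(depth - 1, 2 * x)       # does not depend on `right`
    right = tree_sum(depth - 1, 2 * x + 1)  # does not depend on `left`
    return left + right
```

Note what is absent: no threads, no locks, no annotations. The only dependency is the final addition, which needs both halves.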

Since that's quite a big claim, let's back it up with reasonable evidence!

How is that possible!?

Instead of getting too technical and talking about models of computation (for that, check the [paper]), we'll just show a bunch of actual examples, and explain how and why they work (and how well they perform!).

... continue ...