Welcome to Lobsters! I’m excited to look at the runtime orchestration code. I think it’s interesting to apply the hooks abstraction to arrange a computation (here rendering), but then delegate the computation to a different execution model. This means the hooks API only memos things that are actually meaningful to memo. What I mean by that is that it’s obvious that these GPU primitives benefit from memoization.
The issue I have with memo in React is that most computations in a React component are not expensive, but the component must memo them anyway (spending precious memory?) in order to avoid React doing a lot of expensive “re-render” work. Like, sorting an array of 30 items is “free”, but I need to memo it to avoid React re-diffing a big HTML tree, which is not “free”. The nice thing about the GPU model is that the render work will happen no matter what at 60 Hz, and is presumed “cheap” compared to the memoizable setup work.
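To make that concrete, here is a minimal sketch of the pattern I mean (plain React, nothing Use.GPU-specific; the names are just illustrative):

```tsx
import React, { memo, useMemo } from 'react';

type Row = { id: number; name: string };

// The child is wrapped in memo(): it only re-renders when its props change
// by reference, which is what makes the useMemo below worthwhile.
const BigList = memo(({ rows }: { rows: Row[] }) => (
  <ul>{rows.map((r) => <li key={r.id}>{r.name}</li>)}</ul>
));

function Panel({ rows, title }: { rows: Row[]; title: string }) {
  // Sorting 30 items is essentially free, but without useMemo every render of
  // Panel creates a new array reference and BigList re-diffs its whole subtree.
  const sorted = useMemo(
    () => [...rows].sort((a, b) => a.name.localeCompare(b.name)),
    [rows],
  );
  return (
    <section>
      <h2>{title}</h2>
      <BigList rows={sorted} />
    </section>
  );
}
```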
I have other posts that describe the runtime in more detail (e.g. React - The Missing Parts) which shed light on this.
There are a few crucial things to note:
Live/Use.GPU does not render an HTML VDOM so there is nothing to diff with. Components merely execute code and yield values. This even includes a Suspense-like mechanism where you can yeet a suspend symbol to pause updates in a map-reduce tree. Unlike Suspense, this occurs in the forward direction only, and parents don’t need to be re-rendered if a child unsuspends. This is because the tail of a map-reduce is not actually part of the original parent, but is mounted as a separate Resume(..) continuation.
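To give a feel for the yeet/suspend idea, here is a deliberately simplified sketch (hypothetical names, not Live’s actual API) of a reduction where any child can yield a SUSPEND sentinel to pause the combine step without touching its parents:

```ts
// Hypothetical sketch, not Live's real API: reduce values yielded by children,
// where any child can "yeet" a SUSPEND sentinel to pause the reduction.
const SUSPEND = Symbol('suspend');
type Yielded = number | typeof SUSPEND;

// The reduction tail acts like a separate continuation: when a child later
// unsuspends, only this combine step needs to re-run, not the parents.
function reduceYields(yields: Yielded[]): Yielded {
  let total = 0;
  for (const y of yields) {
    if (y === SUSPEND) return SUSPEND; // pause downstream updates
    total += y;
  }
  return total;
}

console.log(reduceYields([1, 2, 3]));       // 6
console.log(reduceYields([1, SUSPEND, 3])); // SUSPEND — output paused until ready
```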
By wrapping entire components in memo(), re-rendering can be halted midstream… the same goes for passing the same immutable value to a context. In fact I’d wager most people’s mental model of when React stops re-rendering is actually wrong. The Use.GPU inspector specifically highlights re-renders in green, so you can see how few components actually need to re-evaluate. Observing this live was extremely useful to verify memoization of the resulting component tree.
GPU rendering is cheap, but there is always a lot of extra work that needs to happen outside the 60 fps loop, such as building an atlas of SDF glyphs for text rendering. This sort of code runs only when necessary.
While the API is a very close carbon copy of React, it is more appropriate to think of it as a ZIO-like effect system instead of a DOM tree.
(I do have a very nice <HTML> wrapper so you can switch from Live to React mid-stream, and the inspector will even show you the React tree underneath, although it is not fully inspectable)
So you have a thing that kinda looks like a VDOM but it really gets used more like a scene graph? :)
It gets used as a scene graph by the GLTF module, by passing a matrix down from parent to child and doing the matrix multiplications along the way. But that’s a relatively new addition, as there is no actual scene model like in three.js. There are also no “isDirty” flags, because that would just replicate what the memoization is already doing.
In contrast, the plot module passes down a shader function instead, using a React-like context, so it’s not limited to affine transforms and can instead compose e.g. a polar coordinate transform with literally anything else.
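Roughly, the parent-to-child matrix composition looks like this (hypothetical helper names, not the GLTF module’s actual code):

```ts
// Hypothetical sketch of scene-graph-style matrix propagation: each node
// composes its local transform onto whatever matrix its parent passed down.
type Mat4 = Float32Array; // column-major 4x4

function multiply(a: Mat4, b: Mat4): Mat4 {
  const out = new Float32Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

type SceneNode = { local: Mat4; children: SceneNode[]; draw?: (world: Mat4) => void };

// Walk the tree, accumulating the parent matrix along the way. Memoizing on
// (parent, local) is what makes "isDirty" flags redundant, as described above.
function traverse(node: SceneNode, parent: Mat4): void {
  const world = multiply(parent, node.local);
  node.draw?.(world);
  for (const child of node.children) traverse(child, world);
}
```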
One thing I’d like to tentatively raise, because I suspect you know it already but it might be useful to know the vocabulary if you don’t already, is that for making things that go fast and parallel in an automatic way in Haskell, people tend to get much better results by implementing Applicative rather than Monad. The implementation can see all the steps that are going to happen up front when it’s Applicative, so there’s a lot more freedom to work with, unlike Monad where it doesn’t know what’s going to happen next until the user function actually runs. I suspect that your “monad-ish” thing is probably already more similar to Applicative than Monad.
e.g. there are some write-ups out of (ick) Facebook about how they have DSLs for making calls to lots of backend APIs in parallel, with the code still looking somewhat imperative-ish (at least by Haskell standards anyway), by implementing an Applicative instance and not using Monad.
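(I believe this is the Haxl work.) Translated out of Haskell, the distinction is roughly the difference between these two shapes, where the first exposes all the steps up front so a runtime is free to batch or parallelize them:

```ts
// Applicative-style: all three requests are known before any of them runs,
// so the runtime can issue them in parallel (or batch/dedupe them).
async function applicativeStyle(fetchUser: (id: number) => Promise<string>) {
  const [a, b, c] = await Promise.all([fetchUser(1), fetchUser(2), fetchUser(3)]);
  return [a, b, c];
}

// Monad-style: each next step depends on the previous result, so nothing can
// be scheduled ahead of time; the structure is only revealed as it runs.
async function monadicStyle(
  fetchUser: (id: number) => Promise<string>,
  fetchFriend: (name: string) => Promise<string>,
) {
  const user = await fetchUser(1);
  const friend = await fetchFriend(user); // unknown until `user` exists
  return friend;
}
```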
I was hoping that the burrito references would set the appropriate tone here, which is that I’m not particularly concerned with exactly what it maps to in FP terms, because it’s not haskell-like at all.
I think from the point of view of the code doing the composing, it is monad-like in that you can only bind like to like. From the point of view of the shader linker, it does have a full picture of all the code, but at that point it’s not going to do anything with it other than glue things together with some minor local polyfilling.
The fact that I’m using one language (JS) to compose code in another (WGSL) means a lot of conceptual and literal purity goes right out the window.
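As a rough illustration of what “glue things together” means in practice (a hypothetical snippet, not the actual shader linker), the JS side mostly just stitches WGSL fragments into place:

```ts
// Hypothetical illustration of JS composing WGSL: the "linker" here is just
// concatenation, binding a user-supplied getPosition into a vertex template.
const userPositionFn = /* wgsl */ `
fn getPosition(i: u32) -> vec4<f32> {
  return vec4<f32>(f32(i) * 0.1, 0.0, 0.0, 1.0);
}`;

const vertexTemplate = /* wgsl */ `
@vertex
fn main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4<f32> {
  return getPosition(i);
}`;

// "Bind like to like": only a function with getPosition's expected signature
// can be linked into this slot.
const linkedWGSL = [userPositionFn, vertexTemplate].join('\n');
```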
I get that, but this is a technical discussion forum so I wanted to try to be helpful on the admittedly unlikely off chance that you might not have seen some of these papers.
because it’s not haskell-like at all

On some level, batching-oriented graphics API code reads to me a bit like haskell code viewed at the level of the ABI / raw machine code output. You’ve got all the manual poking of pointers and values into registers. :)
Scattering the shader compiles + resource allocation around does sound like it will result in a lot of stutter around startup. Do you have a way to deal with that?
Also, and this is a more subjective comment, I have the general feeling that using standard OpenGL/DX/Vulkan can actually be more maintainable than a custom system like this, because GPU programmers are familiar with it. A heavyweight wrapper like this will be very unfamiliar to any new people joining the project, and while it might make things easier for people who don’t know GPU programming, I think it might make things harder for people who do.
Basically, I have a vague sense that it might be falling into the trap of making easy things easier at the expense of making hard things harder.
Actually the framework is designed to let revealed preference solve that. Your vague sense is misplaced.
The most basic drawing abstraction is literally just a handful of lines of code that gather functions to call, and then calls them, with no idea what they do.
Within a render pass, the same applies: it just passes a normal webgpu render pass encoder to a lambda, which can do anything it wants.
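In shape, that lowest layer is something like this (paraphrased with hypothetical names, using the standard WebGPU API):

```ts
// Paraphrase of the lowest-level draw abstraction: gather callbacks from
// children, then call them with a plain WebGPU render pass encoder.
type Draw = (pass: GPURenderPassEncoder) => void;

function drawPass(device: GPUDevice, descriptor: GPURenderPassDescriptor, draws: Draw[]) {
  const commandEncoder = device.createCommandEncoder();
  const pass = commandEncoder.beginRenderPass(descriptor);
  for (const draw of draws) draw(pass); // no idea what each callback does
  pass.end();
  device.queue.submit([commandEncoder.finish()]);
}
```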
Everything beyond that is opt-in. If you want to construct naked draw calls from pure, uncomposed shaders, you can.
There is no overarching scene abstraction, and the extensions to WGSL are extremely minimal, unlike almost every other engine out there. Specifically, what I wanted to avoid is exactly what you describe, which you run into in e.g. three.js: if you wish to render something that doesn’t fit into three’s scene model, you still need to pretend it is a scene, just to render e.g. a full screen quad.
Furthermore, the abstractions Use.GPU does have rely as much as possible on native WebGPU types, which are not wrapped in any way. I call this “No API” design.
In short: I recommend you actually look at its code before judging. It may surprise you. Most of the work has not gone into towering abstractions, but rather into decomposing the existing practices along saner lines that allow for a la carte, opt-in composition.
As for the start-up problem: I compile shaders async, and hence it loads similarly to a webpage, with different elements popping in when available. If you don’t want this, you can use a Suspense-like mechanism to render fallback content/code, or to keep rendering the previous content until the new content is ready.
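The async path itself is just native WebGPU; roughly (a generic sketch, not the actual Use.GPU code):

```ts
// Standard WebGPU: pipeline creation can be awaited off the critical path,
// so heavy shader compilation doesn't stall the first frames.
async function makePipeline(device: GPUDevice, wgsl: string, format: GPUTextureFormat) {
  const module = device.createShaderModule({ code: wgsl });
  return device.createRenderPipelineAsync({
    layout: 'auto',
    vertex: { module, entryPoint: 'vs' },
    fragment: { module, entryPoint: 'fs', targets: [{ format }] },
  });
}
```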
This seems very interesting after a quick skim, but I didn’t understand what exactly you are caching. Shader programs?
No, it’s caching everything. The entire point is minimal recomputation. It is so thorough that a normal interactive program doesn’t even have a render loop. It simply rewinds and reruns non-looping code instead.
You only need a render loop to do continuous animation, and even then, the purpose of the loop is just to schedule a rewind using requestAnimationFrame and make sure animations are keyed off a global timestamp.
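A minimal sketch of that shape, assuming a hypothetical rerun() that re-evaluates the (memoized) non-looping code:

```ts
// Hypothetical sketch: no render loop by default; while an animation is
// active, each frame just schedules another re-run keyed off the timestamp.
let animating = false;

function rerun(time: number) {
  // ...re-evaluate the memoized component tree with `time` as an input...
}

function frame(time: number) {
  rerun(time);
  if (animating) requestAnimationFrame(frame);
}

function startAnimation() {
  if (!animating) {
    animating = true;
    requestAnimationFrame(frame);
  }
}
```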