<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title></title>
    <link rel="self" type="application/atom+xml" href="https://www.farfa.dev/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://www.farfa.dev"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-05-07T00:00:00+00:00</updated>
    <id>https://www.farfa.dev/atom.xml</id>
    <entry xml:lang="en">
        <title>Implicit Surface rendering for CAD</title>
        <published>2026-05-07T00:00:00+00:00</published>
        <updated>2026-05-07T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Francis (GrandChaman) Le Roy
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.farfa.dev/blog/is-rendering/"/>
        <id>https://www.farfa.dev/blog/is-rendering/</id>
        
        <content type="html" xml:base="https://www.farfa.dev/blog/is-rendering/">&lt;h1 id=&quot;preamble&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#preamble&quot; aria-label=&quot;Anchor link for: preamble&quot;&gt;Preamble&lt;&#x2F;a&gt;&lt;&#x2F;h1&gt;
&lt;p&gt;At the end of the spring of 2025, I started using some &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Computer-aided_design&quot;&gt;CAD&lt;&#x2F;a&gt; software, preferably FOSS, because I wanted to design some furniture for myself.&lt;&#x2F;p&gt;
&lt;p&gt;I didn’t find anything that really &lt;em&gt;clicked&lt;&#x2F;em&gt; for me at the time.&lt;br &#x2F;&gt;
While wondering whether I could make my own (as one does, you know), I made some intriguing discoveries&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-mkeeter-1&quot;&gt;&lt;a href=&quot;#fn-mkeeter&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-iq-1&quot;&gt;&lt;a href=&quot;#fn-iq&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;From this initial frustration and piqued curiosity, emerged a drive to write a simple CAD software:
&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codeberg.org&#x2F;GrandChaman&#x2F;fsolid&#x2F;&quot;&gt;&lt;code&gt;fsolid&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;.&lt;br &#x2F;&gt;
It’s merely a prototype at this point, never was simple, and served more as a learning experience than a CAD software.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  
  
  
  
  &lt;a href=&quot;fsolid_0.6.0_screenshot.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;fsolid_0.6.0_screenshot.ff9ba0b2a31db369.png&quot; srcset=&quot;fsolid_0.6.0_screenshot.png 2x&quot; alt=&quot;Screenshot of `fsolid` showing a simple interface and 3D part in the viewport.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Screenshot of &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codeberg.org&#x2F;GrandChaman&#x2F;fsolid&#x2F;&quot;&gt;&lt;code&gt;fsolid&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; (v0.6.0).&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;This was my first real experience with CAD, both as a user and as a developer.&lt;br &#x2F;&gt;
This means I’m not a domain-expert; take what I wrote in this article with a grain of salt.&lt;&#x2F;p&gt;
&lt;p&gt;This blog entry is about the tribulations of a feverish desire to render arbitrary implicit surfaces on screen at interactive speed
in the context of a mechanical CAD software.&lt;&#x2F;p&gt;
&lt;p&gt;It’s meant as a showcase of this part of my project, documenting why I made certain decisions, and maybe sprouting new ideas in someone else’s mind.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;introduction&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#introduction&quot; aria-label=&quot;Anchor link for: introduction&quot;&gt;Introduction&lt;&#x2F;a&gt;&lt;&#x2F;h1&gt;
&lt;p&gt;Before starting our descent into this rabbit-hole, one should empower oneself with a basic understanding of what
an &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Implicit_surface&quot;&gt;&lt;strong&gt;I&lt;&#x2F;strong&gt;mplicit &lt;strong&gt;S&lt;&#x2F;strong&gt;urface&lt;&#x2F;a&gt; (&lt;em&gt;IS&lt;&#x2F;em&gt;) and &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Signed_distance_function&quot;&gt;(&lt;strong&gt;S&lt;&#x2F;strong&gt;igned) &lt;strong&gt;D&lt;&#x2F;strong&gt;istance &lt;strong&gt;F&lt;&#x2F;strong&gt;ields&lt;&#x2F;a&gt; (&lt;em&gt;SDF&lt;&#x2F;em&gt;) are.&lt;br &#x2F;&gt;
Great explanations exist elsewhere&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-mkeeter-2&quot;&gt;&lt;a href=&quot;#fn-mkeeter&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-iq-2&quot;&gt;&lt;a href=&quot;#fn-iq&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-rvaillant-1&quot;&gt;&lt;a href=&quot;#fn-rvaillant&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;; it is encouraged that the uninitiated reader consult them before
diving any further.&lt;&#x2F;p&gt;
&lt;p&gt;In the CAD software I’ve used, which had graphical interfaces, &lt;em&gt;features&lt;&#x2F;em&gt; are added or modified by the user and the viewport reflects those changes.&lt;br &#x2F;&gt;
In some instances, the viewport allows the user to directly drag or sculpt &lt;em&gt;features&lt;&#x2F;em&gt; using a pointing device.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  
  
  
  
  &lt;a href=&quot;freecad_ui_screenshot.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;freecad_ui_screenshot.9d14a58eb7bd017b.png&quot; srcset=&quot;freecad_ui_screenshot.png 2x&quot; alt=&quot;Screenshot of FreeCAD with its interface and a 3D part rendered in the viewport.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Screenshot of &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.freecad.org&#x2F;&quot;&gt;FreeCAD&lt;&#x2F;a&gt; (v1.1.0), with a list of features on the left.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;After the modification has been made, the resulting 3D object can be rendered.
This includes panning, rotating, zooming, or a combination of those.&lt;&#x2F;p&gt;
&lt;p&gt;Most CAD software uses &lt;em&gt;&lt;strong&gt;E&lt;&#x2F;strong&gt;xplicit &lt;strong&gt;S&lt;&#x2F;strong&gt;urfaces&lt;&#x2F;em&gt; (&lt;em&gt;ES&lt;&#x2F;em&gt;) (e.g., &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Non-uniform_rational_B-spline&quot;&gt;NURBS&lt;&#x2F;a&gt;, &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Triangle_mesh&quot;&gt;Mesh&lt;&#x2F;a&gt;, etc…).&lt;br &#x2F;&gt;
Most processes downstream of authoring (e.g., visualization, &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Computer-aided_manufacturing&quot;&gt;&lt;em&gt;CAM&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;, &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Finite_element_method&quot;&gt;FEM&lt;&#x2F;a&gt;, etc…) require some form of &lt;em&gt;ES&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
Rendering is usually done by converting the &lt;em&gt;ES&lt;&#x2F;em&gt; to a triangle mesh, as GPUs are very much optimized for this.&lt;&#x2F;p&gt;
&lt;p&gt;On the other hand, an &lt;em&gt;IS&lt;&#x2F;em&gt; describes a &lt;strong&gt;volume&lt;&#x2F;strong&gt;, and discretizing its &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Isosurface&quot;&gt;&lt;em&gt;isosurface&lt;&#x2F;em&gt;&lt;&#x2F;a&gt; is expensive.&lt;br &#x2F;&gt;
While algorithms exist to convert an &lt;em&gt;IS&lt;&#x2F;em&gt; to a triangle mesh&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-dc-1&quot;&gt;&lt;a href=&quot;#fn-dc&quot;&gt;4&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-mc-1&quot;&gt;&lt;a href=&quot;#fn-mc&quot;&gt;5&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;, they can be slow, produce very large meshes, and may require a complete recomputation on partial changes.&lt;br &#x2F;&gt;
They are necessary to export to a triangle mesh for downstream usage, but other strategies may be used for visualization&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-sphere_tracing-1&quot;&gt;&lt;a href=&quot;#fn-sphere_tracing&quot;&gt;6&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;fsolid_sphere_traced_dis_sphere.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;fsolid_sphere_traced_dis_sphere.png&quot; alt=&quot;A 3D model of a sphere with some surface distortion rendered. Colors are used to show surface normals.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Distortion sphere using a sphere-tracing shader (&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codeberg.org&#x2F;GrandChaman&#x2F;fsolid&#x2F;&quot;&gt;&lt;code&gt;fsolid&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; (commit &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codeberg.org&#x2F;GrandChaman&#x2F;fsolid&#x2F;commit&#x2F;96ad96646e2e819fede2224b7555a9758f330f21c509a9268a9c0a5e4b47ba8e&quot;&gt;96ad96646e&lt;&#x2F;a&gt;)).&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;For this project, I wanted to try and make an &lt;em&gt;IS&lt;&#x2F;em&gt; visualization that would scale with shape complexity, screen resolution and which would happen in two phases.&lt;br &#x2F;&gt;
A loading phase, happening on the CPU, then a rendering phase, which would happen each frame on the GPU.&lt;&#x2F;p&gt;
&lt;p&gt;The priority is smooth movement over up-to-date rendering.
This means that while an &lt;em&gt;IS&lt;&#x2F;em&gt; is still loading, a previous version may be shown on screen.&lt;&#x2F;p&gt;
&lt;p&gt;Optionally, I wanted the rendering to be some kind of &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Monte_Carlo_method&quot;&gt;Monte Carlo process&lt;&#x2F;a&gt;, enhancing responsiveness at the cost of quality,
while the discretizing is happening.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  &lt;video poster=&quot;montecarlo_poster.png&quot;   controls  loading=lazy  muted  loop &gt;
    &lt;source src=&quot;montecarlo_video.mp4&quot; type=&quot;video&amp;#x2F;mp4&quot;&gt;
  &lt;&#x2F;video&gt;
  
  &lt;figcaption&gt;Loading of an &lt;em&gt;IS&lt;&#x2F;em&gt; using various degrees of precision, behaving like a Monte Carlo process&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;The technique described today will focus on a two-phase rendering: &lt;strong&gt;discretizing&lt;&#x2F;strong&gt; and &lt;strong&gt;tracing&lt;&#x2F;strong&gt;.&lt;br &#x2F;&gt;
It &lt;strong&gt;discretizes&lt;&#x2F;strong&gt; the &lt;em&gt;IS&lt;&#x2F;em&gt; into a data structure which will be &lt;strong&gt;traceable&lt;&#x2F;strong&gt; by the GPU.&lt;&#x2F;p&gt;
&lt;p&gt;This project uses &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;mkeeter&#x2F;fidget&quot;&gt;&lt;code&gt;fidget&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;, an excellent closed-form &lt;em&gt;IS&lt;&#x2F;em&gt; library.
The rendering is done via &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;gfx-rs&#x2F;wgpu&quot;&gt;&lt;code&gt;wgpu&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; and &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bevy.org&#x2F;&quot;&gt;&lt;code&gt;bevy&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;sampling-the-void&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#sampling-the-void&quot; aria-label=&quot;Anchor link for: sampling-the-void&quot;&gt;Sampling the void&lt;&#x2F;a&gt;&lt;&#x2F;h1&gt;
&lt;h2 id=&quot;bounding-volume&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bounding-volume&quot; aria-label=&quot;Anchor link for: bounding-volume&quot;&gt;Bounding volume&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;An &lt;em&gt;IS&lt;&#x2F;em&gt; only describes a volume.&lt;br &#x2F;&gt;
We don’t yet know where that volume is in space.
For that, we’ll use a &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Bounding_volume&quot;&gt;bounding volume&lt;&#x2F;a&gt;,
an &lt;strong&gt;A&lt;&#x2F;strong&gt;xis-&lt;strong&gt;A&lt;&#x2F;strong&gt;ligned &lt;strong&gt;B&lt;&#x2F;strong&gt;ounding &lt;strong&gt;B&lt;&#x2F;strong&gt;ox (&lt;em&gt;AABB&lt;&#x2F;em&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;em&gt;AABB&lt;&#x2F;em&gt; does multiple things for us.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;It limits the space we need to consider.&lt;&#x2F;li&gt;
&lt;li&gt;It gives us a simple and efficient way to do &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Slab_method&quot;&gt;ray-slab&lt;&#x2F;a&gt; intersection tests, which will come in handy.&lt;&#x2F;li&gt;
&lt;li&gt;It allows for fast &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Z-order&quot;&gt;z-ordering&lt;&#x2F;a&gt; of non-overlapping regions.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
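&lt;p&gt;As an illustration of the second point, here is a minimal sketch of a ray-slab test. The names &lt;code&gt;slab_interval&lt;&#x2F;code&gt; and &lt;code&gt;ray_hits&lt;&#x2F;code&gt; are hypothetical and not taken from &lt;code&gt;fsolid&lt;&#x2F;code&gt;; the sketch also ignores the edge case of a ray origin lying exactly on a slab plane with a zero direction component.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Ray parameters (t_enter, t_exit) at which `origin + t * dir` crosses
/// the box spanning `lo` to `hi`. Hypothetical helper for illustration.
fn slab_interval(origin: [f32; 3], dir: [f32; 3], lo: [f32; 3], hi: [f32; 3]) -> (f32, f32) {
    let mut t_enter = f32::NEG_INFINITY;
    let mut t_exit = f32::INFINITY;
    for axis in 0..3 {
        // Ray parameters of the two planes bounding this slab.
        // A zero direction component yields infinities, which min and max handle.
        let inv = 1.0 / dir[axis];
        let t0 = (lo[axis] - origin[axis]) * inv;
        let t1 = (hi[axis] - origin[axis]) * inv;
        // Keep the latest entry and the earliest exit over the three slabs.
        t_enter = t_enter.max(t0.min(t1));
        t_exit = t_exit.min(t0.max(t1));
    }
    (t_enter, t_exit)
}

/// The ray hits when it exits after entering, and not behind its origin.
fn ray_hits(origin: [f32; 3], dir: [f32; 3], lo: [f32; 3], hi: [f32; 3]) -> bool {
    let (t_enter, t_exit) = slab_interval(origin, dir, lo, hi);
    t_exit >= t_enter.max(0.0)
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The same three-slab test works for the root &lt;em&gt;AABB&lt;&#x2F;em&gt; and for any axis-aligned sub-region of it.&lt;&#x2F;p&gt;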
&lt;p&gt;&lt;em&gt;AABB&lt;&#x2F;em&gt; can be computed from common &lt;em&gt;IS&lt;&#x2F;em&gt;&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-iq_bbox-1&quot;&gt;&lt;a href=&quot;#fn-iq_bbox&quot;&gt;7&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; and boolean operations can combine multiple &lt;em&gt;AABB&lt;&#x2F;em&gt; into the resulting
&lt;em&gt;IS&lt;&#x2F;em&gt;’s &lt;em&gt;AABB&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
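&lt;p&gt;A minimal sketch of how boolean operations might combine boxes, assuming sharp (non-blended) booleans. The &lt;code&gt;Aabb&lt;&#x2F;code&gt; type and its methods are hypothetical names for this example, not the ones used in &lt;code&gt;fsolid&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical axis-aligned bounding box, stored as two opposite corners.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Aabb {
    min: [f32; 3],
    max: [f32; 3],
}

impl Aabb {
    /// Union of two shapes: the smallest box enclosing both boxes.
    fn union(self, other: Aabb) -> Aabb {
        let mut out = self;
        for i in 0..3 {
            out.min[i] = self.min[i].min(other.min[i]);
            out.max[i] = self.max[i].max(other.max[i]);
        }
        out
    }

    /// Intersection of two shapes: the overlap of both boxes.
    /// The result is empty when min exceeds max on any axis.
    fn intersection(self, other: Aabb) -> Aabb {
        let mut out = self;
        for i in 0..3 {
            out.min[i] = self.min[i].max(other.min[i]);
            out.max[i] = self.max[i].min(other.max[i]);
        }
        out
    }

    /// Difference `self - other`: the result cannot grow past `self`,
    /// so `self` is a conservative (not necessarily tight) bound.
    fn difference(self, _other: Aabb) -> Aabb {
        self
    }
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Smooth blends can bulge outside the union box, so they would need some extra margin on top of this.&lt;&#x2F;p&gt;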
&lt;p&gt;For &lt;em&gt;unbounded&lt;&#x2F;em&gt; &lt;em&gt;IS&lt;&#x2F;em&gt;, one should be able to provide an &lt;em&gt;AABB&lt;&#x2F;em&gt; to limit the &lt;em&gt;IS&lt;&#x2F;em&gt; to a certain &lt;em&gt;domain&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;using-and-representing-voxels&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#using-and-representing-voxels&quot; aria-label=&quot;Anchor link for: using-and-representing-voxels&quot;&gt;Using and representing voxels&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;em&gt;IS&lt;&#x2F;em&gt;’s &lt;em&gt;AABB&lt;&#x2F;em&gt; is a rectangular cuboid. We can trivially divide it into a set of cubic &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Voxel&quot;&gt;voxels&lt;&#x2F;a&gt;.&lt;br &#x2F;&gt;
Because the &lt;em&gt;isosurface&lt;&#x2F;em&gt; is just a small portion of the &lt;em&gt;AABB&lt;&#x2F;em&gt;’s space, we’ll use a data structure which encodes sparse data efficiently:
a variant of &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Sparse_voxel_octree&quot;&gt;&lt;strong&gt;S&lt;&#x2F;strong&gt;parse &lt;strong&gt;V&lt;&#x2F;strong&gt;oxel &lt;strong&gt;O&lt;&#x2F;strong&gt;ctree&lt;&#x2F;a&gt; (SVO).&lt;&#x2F;p&gt;
&lt;p&gt;This variant is a &lt;script type=&quot;math&#x2F;tex&quot;&gt;4*4*4&lt;&#x2F;script&gt; sparse voxel tree, which is sometimes named &lt;em&gt;contree&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
Each level of the &lt;em&gt;contree&lt;&#x2F;em&gt; splits its space in 4 along each axis, yielding 64 voxels.&lt;&#x2F;p&gt;
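&lt;p&gt;To make the subdivision concrete, here is a small sketch of how a child voxel could be addressed inside a node. The x-fastest ordering and the function names are assumptions made for this example; any consistent ordering would do.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Index (0 to 63) of the child at coordinates (x, y, z), each in 0 to 3.
/// The x-fastest ordering is an assumption of this sketch.
fn child_index(x: u32, y: u32, z: u32) -> u32 {
    x + 4 * y + 16 * z
}

/// Inverse of `child_index`: recovers (x, y, z) from a child index.
fn child_coords(index: u32) -> (u32, u32, u32) {
    (index % 4, (index / 4) % 4, index / 16)
}

/// Edge length of a voxel `depth` levels below a root cube of edge `root_edge`.
fn voxel_edge(root_edge: f32, depth: u32) -> f32 {
    root_edge / 4.0_f32.powi(depth as i32)
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With this layout, a child index doubles as a bit position in a 64-bit mask.&lt;&#x2F;p&gt;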
&lt;figure&gt;
  
  
  &lt;a href=&quot;contree_voxels.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;contree_voxels.png&quot; alt=&quot;Isometric view of a 3D cube, split into 64 voxels. The closest voxel is itself split into 64 sub-voxels.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Cubic space divided in cubic voxels (&lt;script type=&quot;math&#x2F;tex&quot;&gt;4*4*4&lt;&#x2F;script&gt;).&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;&lt;h3 id=&quot;loading-a-contree&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#loading-a-contree&quot; aria-label=&quot;Anchor link for: loading-a-contree&quot;&gt;Loading a &lt;em&gt;contree&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The algorithm to load the &lt;em&gt;contree&lt;&#x2F;em&gt; needs a way to sample the &lt;em&gt;IS&lt;&#x2F;em&gt; in 3 ways:&lt;&#x2F;p&gt;
&lt;div class=&quot;flex flex-wrap gap-2 items-center&quot;&gt;
&lt;div class=&quot;flex-none&quot;&gt;
  &lt;figure&gt;
  
  
  &lt;a href=&quot;contree_interval_test.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;contree_interval_test.svg&quot; alt=&quot;Schema of an _IS_ in a grid of 16 cells. The cells which intersect with the _IS_ are colored yellow, the others are blue.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex-1 basis-1&#x2F;2&quot;&gt;
&lt;p&gt;&lt;strong&gt;Interval&lt;&#x2F;strong&gt;: this classifies large regions of space as entirely empty, entirely full, or ambiguous, so that empty regions can be pruned.
This testing is conservative; regions which are in fact truly empty might appear as ambiguous in the interval testing.&lt;&#x2F;p&gt;
&lt;p&gt;Regions that are entirely contained by the &lt;em&gt;IS&lt;&#x2F;em&gt; may also be pruned.&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex flex-wrap gap-2 items-center&quot;&gt;
&lt;div class=&quot;flex-none&quot;&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;contree_distance_test.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;contree_distance_test.svg&quot; alt=&quot;Schema of an _IS_ and how sampling the distance to the surface can be thought of. Sampling the distance at any given point yields the radius of the circle (2D, a sphere in 3D) between the sampled point and the _isosurface_.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex-1 basis-1&#x2F;2&quot;&gt;
&lt;p&gt;&lt;strong&gt;Distance&lt;&#x2F;strong&gt;: this will be useful to detect where the &lt;em&gt;isosurface&lt;&#x2F;em&gt; is and how far it is from the sampled points.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;mkeeter&#x2F;fidget&quot;&gt;&lt;code&gt;fidget&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; allows sampling multiple points at once, to amortize the overhead of running the execution tape.&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex flex-wrap gap-2 items-center&quot;&gt;
&lt;div class=&quot;flex-none&quot;&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;contree_gradient_test.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;contree_gradient_test.svg&quot; alt=&quot;Schema of an _IS_ with a normal drawn on its surface.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex-1 basis-1&#x2F;2&quot;&gt;
&lt;p&gt;&lt;strong&gt;Gradient&lt;&#x2F;strong&gt;: this yields a gradient of the &lt;em&gt;IS&lt;&#x2F;em&gt; at a given point, which we’ll use as a surface &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Normal_(geometry)&quot;&gt;normal&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;It’s also possible to get a surface normal using the distance sampling (i.e., using &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;iquilezles.org&#x2F;articles&#x2F;normalsSDF&#x2F;&quot;&gt;forward or central differences&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
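&lt;p&gt;The finite-difference alternative to gradient sampling can be sketched as follows. This is only an illustration: &lt;code&gt;normal_central&lt;&#x2F;code&gt; and &lt;code&gt;unit_sphere&lt;&#x2F;code&gt; are hypothetical names, and a plain function pointer stands in for whatever distance evaluator is actually used.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Estimates the unit surface normal at `p` with central differences:
/// two distance samples per axis, offset by `h` on each side.
/// Yields NaN components where the estimated gradient is zero.
fn normal_central(sdf: fn([f32; 3]) -> f32, p: [f32; 3], h: f32) -> [f32; 3] {
    let mut g = [0.0_f32; 3];
    for axis in 0..3 {
        let mut ahead = p;
        let mut behind = p;
        ahead[axis] += h;
        behind[axis] -= h;
        g[axis] = (sdf(ahead) - sdf(behind)) / (2.0 * h);
    }
    // Normalize the gradient estimate to get a direction only.
    let len = (g[0] * g[0] + g[1] * g[1] + g[2] * g[2]).sqrt();
    [g[0] / len, g[1] / len, g[2] / len]
}

/// Stand-in distance field for the example: a unit sphere at the origin.
fn unit_sphere(p: [f32; 3]) -> f32 {
    (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt() - 1.0
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This costs 6 distance samples per normal, which is one reason a dedicated gradient evaluation is attractive.&lt;&#x2F;p&gt;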
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;mkeeter&#x2F;fidget&quot;&gt;&lt;code&gt;fidget&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; provides these 3 features.&lt;br &#x2F;&gt;
It also provides great performance&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-mkeeter-3&quot;&gt;&lt;a href=&quot;#fn-mkeeter&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;—I encourage readers to check it out
(e.g., &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;fidget-jit&#x2F;0.4.3&#x2F;fidget_jit&#x2F;&quot;&gt;JIT&lt;&#x2F;a&gt;, tape simplification, heap buffer re-use, etc…).&lt;&#x2F;p&gt;
&lt;h4 id=&quot;stop-condition&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#stop-condition&quot; aria-label=&quot;Anchor link for: stop-condition&quot;&gt;Stop condition&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;When loading a &lt;em&gt;contree&lt;&#x2F;em&gt;, one needs to know when to stop subdividing any further.&lt;br &#x2F;&gt;
We can calculate a stop condition by knowing the minimum feature size we’d like to be able to detect. From the &lt;em&gt;AABB&lt;&#x2F;em&gt; size and this minimum
feature size, we can calculate a &lt;em&gt;contree&lt;&#x2F;em&gt; &lt;strong&gt;depth&lt;&#x2F;strong&gt; we should be reaching.&lt;&#x2F;p&gt;
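&lt;p&gt;One way to derive that depth, as a minimal sketch: keep splitting the edge in 4 until a voxel is no larger than the minimum feature size. The function name and the use of a single cubic edge length are assumptions of this example.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Number of 4x subdivisions needed so that a voxel edge is no larger
/// than `min_feature`, starting from a cube of edge `aabb_edge`.
fn contree_depth(aabb_edge: f32, min_feature: f32) -> u32 {
    // A non-positive (or NaN) feature size would never be reached.
    assert!(min_feature > 0.0);
    let mut depth = 0;
    let mut edge = aabb_edge;
    while edge > min_feature {
        // Each level splits the edge in 4 along every axis.
        edge /= 4.0;
        depth += 1;
    }
    depth
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is equivalent to rounding up the base-4 logarithm of the ratio between the two sizes.&lt;&#x2F;p&gt;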
&lt;h4 id=&quot;recursive-interval-testing&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#recursive-interval-testing&quot; aria-label=&quot;Anchor link for: recursive-interval-testing&quot;&gt;Recursive interval testing&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;We recursively divide the space we consider into 64 voxels, performing interval tests and only considering the voxels which have an ambiguous interval result.&lt;br &#x2F;&gt;
The recursion stops when we reach the desired depth.&lt;&#x2F;p&gt;
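&lt;p&gt;The recursion can be sketched as below. To keep the example self-contained, an analytic bound on a unit sphere stands in for the interval evaluator; all names here are hypothetical, and the sketch only counts the ambiguous regions instead of building nodes.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;#[derive(Clone, Copy, Debug, PartialEq)]
enum Region {
    Empty,
    Full,
    Ambiguous,
}

/// Classifies a region from conservative bounds (lo, hi) of the field over it.
fn classify(lo: f32, hi: f32) -> Region {
    if lo > 0.0 {
        Region::Empty
    } else if 0.0 > hi {
        Region::Full
    } else {
        Region::Ambiguous
    }
}

/// Stand-in interval evaluator: distance bounds of a unit sphere centered
/// at the origin, over the cube starting at `min` with edge `edge`.
fn sphere_bounds(min: [f32; 3], edge: f32) -> (f32, f32) {
    let (mut near2, mut far2) = (0.0_f32, 0.0_f32);
    for axis in 0..3 {
        let (lo, hi) = (min[axis], min[axis] + edge);
        let near = 0.0_f32.clamp(lo, hi); // closest coordinate to the center
        let far = lo.abs().max(hi.abs()); // farthest coordinate from the center
        near2 += near * near;
        far2 += far * far;
    }
    (near2.sqrt() - 1.0, far2.sqrt() - 1.0)
}

/// Counts the ambiguous regions left after `depth` more subdivisions.
fn count_ambiguous(min: [f32; 3], edge: f32, depth: u32) -> u32 {
    let (lo, hi) = sphere_bounds(min, edge);
    if classify(lo, hi) != Region::Ambiguous {
        return 0; // entirely empty or entirely full: pruned
    }
    if depth == 0 {
        return 1; // desired depth reached
    }
    let child_edge = edge / 4.0;
    let mut count = 0;
    for i in 0..64u32 {
        let (x, y, z) = (i % 4, (i / 4) % 4, i / 16);
        let child_min = [
            min[0] + child_edge * x as f32,
            min[1] + child_edge * y as f32,
            min[2] + child_edge * z as f32,
        ];
        count += count_ambiguous(child_min, child_edge, depth - 1);
    }
    count
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For a unit sphere in a cube of edge 4 centered on it, one level of subdivision leaves 32 of the 64 voxels ambiguous; the other half is pruned as empty.&lt;&#x2F;p&gt;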
&lt;h4 id=&quot;detecting-isosurface-crossing&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#detecting-isosurface-crossing&quot; aria-label=&quot;Anchor link for: detecting-isosurface-crossing&quot;&gt;Detecting &lt;em&gt;isosurface&lt;&#x2F;em&gt; crossing&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Upon reaching the desired subdivision level, we can start sampling distances to find which voxels contain the &lt;em&gt;isosurface&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For each &lt;strong&gt;leaf node&lt;&#x2F;strong&gt;, we want to sample the 64 voxels, but are interested in the &lt;strong&gt;vertices&lt;&#x2F;strong&gt; of the voxels rather than their center.&lt;br &#x2F;&gt;
Effectively, we’re interested in the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Dual_graph&quot;&gt;dual graph&lt;&#x2F;a&gt; of the voxel grid.&lt;&#x2F;p&gt;
&lt;p&gt;Each voxel has 8 vertices; naively, with 64 voxels, that’s 512 points we need to sample.&lt;br &#x2F;&gt;
However, the voxels are tightly packed in our representation and they share vertices with their neighbors.&lt;br &#x2F;&gt;
There are only 5 unique vertices along each axis of a node, so &lt;script type=&quot;math&#x2F;tex&quot;&gt;5^3=125&lt;&#x2F;script&gt; unique points in total.&lt;br &#x2F;&gt;
To amortize the execution overhead, we can sample those 125 points in a single tape execution.&lt;&#x2F;p&gt;
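&lt;p&gt;A minimal sketch of how those 125 sample positions could be generated for one leaf node. The function name and the x-fastest ordering are assumptions of this example; the resulting array is what would be handed to a bulk distance evaluation.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Positions of the 5x5x5 voxel vertices of a leaf node whose lowest
/// corner is `node_min` and whose edge length is `node_edge`.
fn lattice_points(node_min: [f32; 3], node_edge: f32) -> [[f32; 3]; 125] {
    // 4 voxels per axis means 5 vertices per axis, one voxel edge apart.
    let step = node_edge / 4.0;
    let mut points = [[0.0_f32; 3]; 125];
    for k in 0..5 {
        for j in 0..5 {
            for i in 0..5 {
                // x-fastest ordering (an assumption of this sketch).
                points[i + 5 * j + 25 * k] = [
                    node_min[0] + step * i as f32,
                    node_min[1] + step * j as f32,
                    node_min[2] + step * k as f32,
                ];
            }
        }
    }
    points
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The vertex shared by up to 8 neighboring voxels appears only once in this array, which is where the saving from 512 to 125 samples comes from.&lt;&#x2F;p&gt;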
&lt;figure&gt;
  
  
  &lt;a href=&quot;sign_chance_detection.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;sign_chance_detection.svg&quot; alt=&quot;2D schema of 16 voxels, showing how the sign detection find populated voxels and which vertices are saved.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;2D schema of 16 voxels, an &lt;em&gt;IS&lt;&#x2F;em&gt; (in purple) and vertices (in grey and blue).&lt;br&#x2F;&gt;Populated voxels are highlighted.&lt;br&#x2F;&gt;The vertices we’re interested in for our contree are in blue.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;We can then iterate over the 64 voxels, checking if there is a sign change in the distance among their 8 vertices.&lt;br &#x2F;&gt;
Using this, we create a list of populated voxels and also keep the sampled distances for the vertices of the populated voxels.&lt;&#x2F;p&gt;
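&lt;p&gt;The sign-change check can be sketched as follows, producing a 64-bit population mask from the 125 sampled distances. The function name, the x-fastest ordering, and the choice to count a distance of exactly zero as inside are assumptions of this example.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Builds the population mask of a leaf node from the distances sampled
/// on its 5x5x5 vertex lattice (index = x + 5 * y + 25 * z).
fn populated_mask(distances: [f32; 125]) -> u64 {
    let mut mask = 0u64;
    for voxel in 0..64usize {
        let (x, y, z) = (voxel % 4, (voxel / 4) % 4, voxel / 16);
        // Count how many of the 8 corners are inside (distance not positive).
        let mut inside = 0;
        for corner in 0..8usize {
            let (dx, dy, dz) = (corner % 2, (corner / 2) % 2, corner / 4);
            let d = distances[(x + dx) + 5 * (y + dy) + 25 * (z + dz)];
            if 0.0 >= d {
                inside += 1;
            }
        }
        // Mixed signs (neither 0 nor 8 corners inside) mean the
        // isosurface crosses this voxel.
        if matches!(inside, 1..=7) {
            mask |= 1u64.wrapping_shl(voxel as u32);
        }
    }
    mask
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A surface feature thinner than a voxel can slip between the vertices without producing a sign change, which is why the depth is tied to the minimum feature size.&lt;&#x2F;p&gt;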
&lt;p&gt;Once the 64 voxels are checked, we calculate the gradient for each of the vertices that we kept around.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;summary&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#summary&quot; aria-label=&quot;Anchor link for: summary&quot;&gt;Summary&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;We now have all the information we need to load a &lt;em&gt;contree&lt;&#x2F;em&gt;:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Which voxels are populated, their position and depth in our &lt;em&gt;contree&lt;&#x2F;em&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;A distance to the surface for each of their vertices.&lt;&#x2F;li&gt;
&lt;li&gt;A gradient for each of their vertices.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The next step is to &lt;strong&gt;serialize&lt;&#x2F;strong&gt; this structure so it’s usable.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;serializing-the-contree&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#serializing-the-contree&quot; aria-label=&quot;Anchor link for: serializing-the-contree&quot;&gt;Serializing the &lt;em&gt;contree&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;I chose to have a serialization format which I could build while loading the &lt;em&gt;contree&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For simplicity, I also wanted to have the same representation on the CPU and GPU side.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;contrees-basics&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#contrees-basics&quot; aria-label=&quot;Anchor link for: contrees-basics&quot;&gt;&lt;em&gt;contrees&lt;&#x2F;em&gt; basics&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Each node of the &lt;em&gt;contree&lt;&#x2F;em&gt; can address 64 voxels, so its population mask fits exactly in 64 bits.&lt;br &#x2F;&gt;
Additionally, a header is necessary to hold some metadata about the node. The header needs to encode a flag
to differentiate &lt;strong&gt;non-leaf&lt;&#x2F;strong&gt; from &lt;strong&gt;leaf&lt;&#x2F;strong&gt; nodes and a pointer.&lt;br &#x2F;&gt;
For &lt;strong&gt;non-leaf nodes&lt;&#x2F;strong&gt;, the &lt;em&gt;flag&lt;&#x2F;em&gt; is &lt;strong&gt;unset&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The semantics of the pointer change depending on the type of node.&lt;&#x2F;p&gt;
&lt;p&gt;The population mask only encodes which voxels are populated; the associated data is stored elsewhere.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;em&gt;contree&lt;&#x2F;em&gt; will be encoded by the CPU into the main memory and will be uploaded to the GPU for tracing.&lt;br &#x2F;&gt;
GPUs have fairly strict requirements and limitations in terms of memory alignments, maximum buffer size, etc…&lt;br &#x2F;&gt;
To account for this, the &lt;em&gt;contree&lt;&#x2F;em&gt; will be built in a &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;AoS_and_SoA&quot;&gt;Structure of Arrays (SoA)&lt;&#x2F;a&gt; style, using &lt;strong&gt;32-bit&lt;&#x2F;strong&gt; pointers
to address those (64-bit support is uncommon in GPUs).&lt;&#x2F;p&gt;
&lt;h4 id=&quot;array-of-structures-of-arrays&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#array-of-structures-of-arrays&quot; aria-label=&quot;Anchor link for: array-of-structures-of-arrays&quot;&gt;Array of Structures of Arrays&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;The &lt;em&gt;contree&lt;&#x2F;em&gt; will need to be uploaded and consumed by the GPU. To make accessing efficient and cache-friendly we can make it into an &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;AoS_and_SoA#Array_of_structures_of_arrays&quot;&gt;&lt;strong&gt;A&lt;&#x2F;strong&gt;rray of &lt;strong&gt;S&lt;&#x2F;strong&gt;tructures of &lt;strong&gt;A&lt;&#x2F;strong&gt;rrays&lt;&#x2F;a&gt; (AoSoA).&lt;&#x2F;p&gt;
&lt;p&gt;To do that, we’ll make heavy use of 32-bit pointers into our &lt;em&gt;contree&lt;&#x2F;em&gt;, into various arrays.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;em&gt;contree&lt;&#x2F;em&gt; is split into 4 buffers:&lt;&#x2F;p&gt;
&lt;h5 id=&quot;nodes&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#nodes&quot; aria-label=&quot;Anchor link for: nodes&quot;&gt;Nodes&lt;&#x2F;a&gt;&lt;&#x2F;h5&gt;
&lt;p&gt;These are the leaf and non-leaf nodes of the &lt;em&gt;contree&lt;&#x2F;em&gt;, already mentioned in a &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#contrees-basics&quot;&gt;previous section&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;contree_node_packet_diagram.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;contree_node_packet_diagram.svg&quot; alt=&quot;_contree_ node memory layout diagram.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;In-memory layout of a &lt;em&gt;contree&lt;&#x2F;em&gt; node.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;They may look something like this in code:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;struct&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; Node&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   header&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   pop_mask&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u64&lt;&#x2F;span&gt;&lt;span&gt; &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This structure spans &lt;strong&gt;12 bytes&lt;&#x2F;strong&gt;.&lt;br &#x2F;&gt;
Due to alignment, it may span &lt;strong&gt;16 bytes&lt;&#x2F;strong&gt; instead (cf. &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;doc.rust-lang.org&#x2F;reference&#x2F;type-layout.html#r-layout.primitive&quot;&gt;&lt;code&gt;rust&lt;&#x2F;code&gt; alignment of primitive types&lt;&#x2F;a&gt;).
Alignment is platform specific, so YMMV.&lt;&#x2F;p&gt;
&lt;p&gt;The population mask is kept as a &lt;code&gt;u64&lt;&#x2F;code&gt; for now as we benefit from functions exposed by the &lt;code&gt;u64&lt;&#x2F;code&gt; type (e.g., &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;doc.rust-lang.org&#x2F;std&#x2F;primitive.u64.html#method.count_ones&quot;&gt;&lt;code&gt;u64::count_ones&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;). It would be interesting
to measure the performance impact of storing it as two &lt;code&gt;u32&lt;&#x2F;code&gt; instead, as the memory footprint for this buffer would decrease by 25%.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;wgpu&lt;&#x2F;code&gt; doesn’t support 64-bit types by default. To upload this structure to the GPU, we can convert it to three &lt;code&gt;u32&lt;&#x2F;code&gt; fields instead.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;wgsl&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;struct&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; GPUNode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   header: &lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   pop_mask_hi: &lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   pop_mask_low: &lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This new form is compatible with the alignment that &lt;code&gt;wgpu&lt;&#x2F;code&gt; requires, as &lt;code&gt;u32&lt;&#x2F;code&gt; types are aligned on a 4-byte boundary&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-wgpu_alignment-1&quot;&gt;&lt;a href=&quot;#fn-wgpu_alignment&quot;&gt;8&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
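&lt;p&gt;As a minimal sketch of that conversion, the population mask can be split into its upper and lower halves on the CPU before upload. The &lt;code&gt;GpuNode&lt;&#x2F;code&gt; Rust type, its &lt;code&gt;From&lt;&#x2F;code&gt; implementation and the &lt;code&gt;pop_mask&lt;&#x2F;code&gt; helper are hypothetical names used for illustration, not necessarily what the actual code base looks like.&lt;&#x2F;p&gt;

```rust
/// CPU-side node, mirroring the `Node` struct shown in the text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Node {
    pub header: u32,
    pub pop_mask: u64,
}

/// Hypothetical Rust mirror of the WGSL `GPUNode`: three tightly packed `u32`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GpuNode {
    pub header: u32,
    pub pop_mask_hi: u32,
    pub pop_mask_low: u32,
}

impl From<Node> for GpuNode {
    fn from(node: Node) -> Self {
        Self {
            header: node.header,
            // Upper 32 bits of the population mask.
            pop_mask_hi: (node.pop_mask >> 32) as u32,
            // A truncating cast keeps only the lower 32 bits.
            pop_mask_low: node.pop_mask as u32,
        }
    }
}

impl GpuNode {
    /// Reassembles the 64-bit population mask from its two halves.
    pub fn pop_mask(&self) -> u64 {
        (u64::from(self.pop_mask_hi) << 32) | u64::from(self.pop_mask_low)
    }
}
```

&lt;p&gt;With &lt;code&gt;#[repr(C)]&lt;&#x2F;code&gt; and only &lt;code&gt;u32&lt;&#x2F;code&gt; fields, this mirror occupies 12 bytes with no padding.&lt;&#x2F;p&gt;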
&lt;h6 id=&quot;leaf-nodes&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#leaf-nodes&quot; aria-label=&quot;Anchor link for: leaf-nodes&quot;&gt;Leaf nodes&lt;&#x2F;a&gt;&lt;&#x2F;h6&gt;
&lt;p&gt;For &lt;strong&gt;leaf nodes&lt;&#x2F;strong&gt;, the &lt;em&gt;flag&lt;&#x2F;em&gt; is &lt;strong&gt;set&lt;&#x2F;strong&gt; in the header.&lt;br &#x2F;&gt;
The pointer is an &lt;strong&gt;absolute&lt;&#x2F;strong&gt; offset to the first &lt;em&gt;leaf indirection&lt;&#x2F;em&gt; in the corresponding buffer (c.f. &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#leaf-indirections&quot;&gt;the next section on the topic&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;em&gt;leaf indirections&lt;&#x2F;em&gt; are placed in their population mask order, in the &lt;em&gt;leaf indirection&lt;&#x2F;em&gt; buffer.&lt;&#x2F;p&gt;
&lt;h6 id=&quot;non-leaf-nodes&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#non-leaf-nodes&quot; aria-label=&quot;Anchor link for: non-leaf-nodes&quot;&gt;Non-leaf nodes&lt;&#x2F;a&gt;&lt;&#x2F;h6&gt;
&lt;p&gt;For &lt;strong&gt;non-leaf nodes&lt;&#x2F;strong&gt;, the &lt;em&gt;flag&lt;&#x2F;em&gt; is &lt;strong&gt;unset&lt;&#x2F;strong&gt; in the header.&lt;br &#x2F;&gt;
The pointer is an &lt;strong&gt;absolute&lt;&#x2F;strong&gt; offset to the first child in the node buffer.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;contree_children_offset_diagram.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;contree_children_offset_diagram.svg&quot; alt=&quot;Diagram showcasing how one can use the population mask and the pointer in the node&amp;#x27;s header to calculate the children positions in the node buffer.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Using population mask to find children offset in node buffer.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;Each set bit in the &lt;em&gt;population mask&lt;&#x2F;em&gt; represents a populated child.&lt;br &#x2F;&gt;
The children are densely packed in their population mask order, in the node buffer.&lt;&#x2F;p&gt;
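&lt;p&gt;The lookup described above can be sketched as follows. This is an illustration under two assumptions that the text does not spell out: bit &lt;code&gt;i&lt;&#x2F;code&gt; of the mask corresponds to the child with Morton code &lt;code&gt;i&lt;&#x2F;code&gt;, and children are stored in ascending bit order. The &lt;code&gt;child_index&lt;&#x2F;code&gt; function is a hypothetical helper, and &lt;code&gt;pointer&lt;&#x2F;code&gt; is assumed to be already extracted from the header.&lt;&#x2F;p&gt;

```rust
/// Returns the index of a child in the node buffer, or `None` if that child
/// is not populated.
///
/// `pointer` is the absolute offset of the first child, `pop_mask` the
/// parent's population mask and `morton` the child's code in `[0, 64)`.
pub fn child_index(pointer: u32, pop_mask: u64, morton: u8) -> Option<u32> {
    debug_assert!(morton < 64);
    let bit = 1u64 << morton;
    if pop_mask & bit == 0 {
        return None;
    }
    // Rank of the child: how many populated children come before it.
    let rank = (pop_mask & (bit - 1)).count_ones();
    Some(pointer + rank)
}
```

&lt;p&gt;Because the children are densely packed, the rank given by &lt;code&gt;count_ones&lt;&#x2F;code&gt; is all that is needed on top of the pointer.&lt;&#x2F;p&gt;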
&lt;figure&gt;
  
  
  &lt;a href=&quot;buffers-figure-1.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;buffers-figure-1.svg&quot; alt=&quot;Schema of the node buffer, showing how the nodes are referencing each other using the pointer field.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Schema of a node buffer&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;&lt;h5 id=&quot;leaf-indirections&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#leaf-indirections&quot; aria-label=&quot;Anchor link for: leaf-indirections&quot;&gt;Leaf indirections&lt;&#x2F;a&gt;&lt;&#x2F;h5&gt;
&lt;p&gt;The leaf indirections are made up of two offsets into the two remaining buffers:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;An &lt;strong&gt;absolute&lt;&#x2F;strong&gt; offset into the leaf data buffer.&lt;&#x2F;li&gt;
&lt;li&gt;A &lt;strong&gt;partial&lt;&#x2F;strong&gt; offset into the vertex data buffer (See the &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#leaf-data&quot;&gt;leaf data&lt;&#x2F;a&gt; section to get the absolute offset).&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Both in &lt;code&gt;rust&lt;&#x2F;code&gt; and in &lt;code&gt;wgsl&lt;&#x2F;code&gt;, this structure looks like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;struct&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; LeafIndirection&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   leaf_data_offset&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   vertex_buffer_offset&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span&gt; &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This structure should have no padding due to alignment and spans &lt;strong&gt;8 bytes&lt;&#x2F;strong&gt; per entry.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;buffers-figure-2.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;buffers-figure-2.svg&quot; alt=&quot;Schema showcasing how the node buffer allows to find the slice of relevant data in the leaf indirection buffer.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Schema of the node and leaf indirection buffers, and their interactions&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;The &lt;em&gt;pointer&lt;&#x2F;em&gt; of &lt;span style=&quot;color: var(--color-pine)&quot;&gt;&lt;strong&gt;leaf &lt;em&gt;nodes&lt;&#x2F;em&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;span&gt; is the &lt;strong&gt;absolute offset&lt;&#x2F;strong&gt; into the &lt;em&gt;leaf indirection buffer&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;h5 id=&quot;leaf-data&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#leaf-data&quot; aria-label=&quot;Anchor link for: leaf-data&quot;&gt;Leaf data&lt;&#x2F;a&gt;&lt;&#x2F;h5&gt;
&lt;p&gt;Each entry in the &lt;em&gt;leaf data&lt;&#x2F;em&gt; buffer contains 8 &lt;strong&gt;partial offsets&lt;&#x2F;strong&gt; into the &lt;em&gt;vertex data&lt;&#x2F;em&gt; buffer, one for each of the voxel’s vertices.&lt;&#x2F;p&gt;
&lt;p&gt;This structure is the same in both the CPU and GPU worlds, having no padding due to alignment and spanning &lt;strong&gt;8 bytes&lt;&#x2F;strong&gt; (64 bits).&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;struct&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; LeafData&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c001&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c010&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c011&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c100&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c101&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c110&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   c111&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each partial offset is in the range &lt;script type=&quot;math&#x2F;tex&quot;&gt;[0, 125)&lt;&#x2F;script&gt; (the maximum number of vertices per node).&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;buffers-figure-3.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;buffers-figure-3.svg&quot; alt=&quot;Schema showcasing how the leaf indirection buffer allows to find the slice of relevant data in the leaf data buffer.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Schema of the node, leaf indirection, leaf data buffers, and their interactions&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;The &lt;span style=&quot;color: var(--color-iris)&quot;&gt;&lt;em&gt;leaf data offset&lt;&#x2F;em&gt;&lt;&#x2F;span&gt; from the &lt;em&gt;leaf indirection&lt;&#x2F;em&gt; buffer tells us where to start reading in the &lt;em&gt;leaf data&lt;&#x2F;em&gt; buffer
for this leaf node.&lt;br &#x2F;&gt;
For each leaf node with &lt;script type=&quot;math&#x2F;tex&quot;&gt;N&lt;&#x2F;script&gt; &lt;strong&gt;populated children&lt;&#x2F;strong&gt;, one can expect to find &lt;script type=&quot;math&#x2F;tex&quot;&gt;N&lt;&#x2F;script&gt; &lt;code&gt;LeafData&lt;&#x2F;code&gt; entries starting at the &lt;em&gt;leaf data offset&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Because we deduplicated shared vertices when loading the &lt;em&gt;contree&lt;&#x2F;em&gt;, some &lt;span style=&quot;color: var(--color-highlight-high)&quot;&gt;offsets&lt;&#x2F;span&gt; may be shared by multiple vertices.&lt;&#x2F;p&gt;
&lt;h5 id=&quot;vertex-data&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#vertex-data&quot; aria-label=&quot;Anchor link for: vertex-data&quot;&gt;Vertex data&lt;&#x2F;a&gt;&lt;&#x2F;h5&gt;
&lt;p&gt;Each vertex data entry contains a distance and a &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Normal_(geometry)&quot;&gt;normal&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This structure has no padding due to alignment and spans &lt;strong&gt;16 bytes&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;wgsl&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;struct&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; VertexData&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   normals: &lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;vec3f&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   distance: &lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;f32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;figure&gt;
  
  
  &lt;a href=&quot;buffers-figure-4.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;buffers-figure-4.svg&quot; alt=&quot;Schema showcasing how the leaf data buffer with the leaf indirection buffer allow to resolve the vertex data position in the buffer.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Schema of the node, leaf indirection, leaf data buffer, vertex data buffer, and their interactions&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;By adding together the &lt;span style=&quot;color: var(--color-foam)&quot;&gt;&lt;strong&gt;vertex buffer offset&lt;&#x2F;strong&gt;&lt;&#x2F;span&gt; from the &lt;em&gt;leaf indirection&lt;&#x2F;em&gt; buffer and the &lt;span style=&quot;color: var(--color-highlight-high)&quot;&gt;offset&lt;&#x2F;span&gt; for a given vertex in the &lt;em&gt;leaf data&lt;&#x2F;em&gt; buffer, we get the &lt;strong&gt;absolute offset&lt;&#x2F;strong&gt; into the
&lt;em&gt;vertex data&lt;&#x2F;em&gt; buffer.&lt;&#x2F;p&gt;
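&lt;p&gt;Put together, resolving the eight vertices of a voxel might look like the sketch below. It makes a few assumptions for brevity: the eight &lt;code&gt;c000&lt;&#x2F;code&gt; to &lt;code&gt;c111&lt;&#x2F;code&gt; fields are stored as a &lt;code&gt;corners&lt;&#x2F;code&gt; array, &lt;code&gt;entry&lt;&#x2F;code&gt; is the position of the wanted &lt;code&gt;LeafData&lt;&#x2F;code&gt; within the run that starts at the &lt;em&gt;leaf data offset&lt;&#x2F;em&gt;, and &lt;code&gt;corner_vertex_indices&lt;&#x2F;code&gt; is a hypothetical helper name.&lt;&#x2F;p&gt;

```rust
#[derive(Clone, Copy, Debug)]
pub struct LeafIndirection {
    pub leaf_data_offset: u32,
    pub vertex_buffer_offset: u32,
}

/// The eight `c000`..`c111` partial offsets, stored as an array for brevity.
#[derive(Clone, Copy, Debug)]
pub struct LeafData {
    pub corners: [u8; 8],
}

/// Returns the absolute indices, in the vertex data buffer, of the eight
/// corners of one voxel.
pub fn corner_vertex_indices(
    indirection: &LeafIndirection,
    leaf_data: &[LeafData],
    entry: usize,
) -> [u32; 8] {
    // Absolute offset into the leaf data buffer, plus the entry's position.
    let data = leaf_data[indirection.leaf_data_offset as usize + entry];
    // Partial offset + vertex buffer offset = absolute vertex index.
    data.corners
        .map(|partial| indirection.vertex_buffer_offset + u32::from(partial))
}
```

&lt;p&gt;Two neighbouring voxels that share a face would simply return some identical indices, which is how the deduplicated vertices show up at this level.&lt;&#x2F;p&gt;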
&lt;h4 id=&quot;space-addressing&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#space-addressing&quot; aria-label=&quot;Anchor link for: space-addressing&quot;&gt;Space addressing&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;At each level of the &lt;em&gt;contree&lt;&#x2F;em&gt;, the voxels can be addressed using a &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Z-order_curve&quot;&gt;Morton Code&lt;&#x2F;a&gt; encoding.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #CDD6F4; background-color: #1E1E2E;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;pub struct&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; MortonCode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;pub u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;impl&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F9E2AF;font-style: italic;&quot;&gt; MortonCode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt;    pub const fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89B4FA;font-style: italic;&quot;&gt; new&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #EBA0AC;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #EBA0AC;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #EBA0AC;&quot;&gt; z&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #CBA6F7;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; -&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #F38BA8;&quot;&gt; Self&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89B4FA;font-style: italic;&quot;&gt;        debug_assert!&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FAB387;&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89B4FA;font-style: italic;&quot;&gt;        debug_assert!&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span&gt;y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FAB387;&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89B4FA;font-style: italic;&quot;&gt;        debug_assert!&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span&gt;z&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FAB387;&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #F38BA8;&quot;&gt;        Self&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span&gt;z&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FAB387;&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; |&lt;&#x2F;span&gt;&lt;span&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FAB387;&quot;&gt; 2&lt;&#x2F;span&gt;&lt;span style=&quot;color: #94E2D5;&quot;&gt; |&lt;&#x2F;span&gt;&lt;span&gt; x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
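&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9399B2;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;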
&lt;p&gt;The value of the &lt;code&gt;MortonCode&lt;&#x2F;code&gt; is in the range &lt;script type=&quot;math&#x2F;tex&quot;&gt;[0, 64)&lt;&#x2F;script&gt;.
The &lt;code&gt;MortonCode&lt;&#x2F;code&gt; is used to set the corresponding bit in the node’s population mask.&lt;&#x2F;p&gt;
&lt;p&gt;By having a list of &lt;code&gt;D&lt;&#x2F;code&gt; &lt;code&gt;MortonCode&lt;&#x2F;code&gt; elements, where &lt;code&gt;D&lt;&#x2F;code&gt; is the depth of the &lt;em&gt;contree&lt;&#x2F;em&gt;, one can access a particular voxel.&lt;br &#x2F;&gt;
I’m calling such a list a &lt;em&gt;Morton Path&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;To get the space coordinates of a given voxel, one needs to use the bounding box of the &lt;em&gt;contree&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
You can trivially reverse the Morton Code encoding to get back the &lt;code&gt;x&lt;&#x2F;code&gt;, &lt;code&gt;y&lt;&#x2F;code&gt; and &lt;code&gt;z&lt;&#x2F;code&gt; components.&lt;br &#x2F;&gt;
From these you can calculate an offset from the bounding box minimum.&lt;&#x2F;p&gt;
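&lt;p&gt;Here is a small sketch of that reverse mapping. It assumes a cubic bounding box, a &lt;em&gt;Morton Path&lt;&#x2F;em&gt; ordered from the root down, and raw &lt;code&gt;u8&lt;&#x2F;code&gt; codes laid out as in &lt;code&gt;MortonCode::new&lt;&#x2F;code&gt;. The &lt;code&gt;decode&lt;&#x2F;code&gt; and &lt;code&gt;voxel_min&lt;&#x2F;code&gt; names are hypothetical.&lt;&#x2F;p&gt;

```rust
/// Splits a code built as `z << 4 | y << 2 | x` back into `(x, y, z)`.
pub fn decode(code: u8) -> (u8, u8, u8) {
    debug_assert!(code < 64);
    (code & 0b11, (code >> 2) & 0b11, (code >> 4) & 0b11)
}

/// Walks a Morton Path from the root and returns the minimum corner of the
/// voxel it designates, inside a cubic bounding box of side `bbox_size`.
pub fn voxel_min(bbox_min: [f32; 3], bbox_size: f32, path: &[u8]) -> [f32; 3] {
    let mut origin = bbox_min;
    let mut cell = bbox_size;
    for &code in path {
        // Each level splits the current cell in 4 along every axis.
        cell /= 4.0;
        let (x, y, z) = decode(code);
        origin[0] += f32::from(x) * cell;
        origin[1] += f32::from(y) * cell;
        origin[2] += f32::from(z) * cell;
    }
    origin
}
```

&lt;p&gt;For instance, the code built from &lt;code&gt;x = 2&lt;&#x2F;code&gt;, &lt;code&gt;y = 1&lt;&#x2F;code&gt;, &lt;code&gt;z = 3&lt;&#x2F;code&gt; is 54, and in a box of side 16 starting at the origin it designates the cell whose minimum corner is (8, 4, 12).&lt;&#x2F;p&gt;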
&lt;p&gt;This encoding also maps neatly onto the way &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Single-precision_floating-point_format&quot;&gt;IEEE 754 floats&lt;&#x2F;a&gt; are represented.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;float_bin_repr.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;float_bin_repr.svg&quot; alt=&quot;binary representation of a 32-bit floating-point number.&quot; class=&quot;invert &quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;binary representation of a 32-bit floating-point number.&lt;br&#x2F;&gt;by &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;user:fresheneesz&quot; title=&quot;en:user:fresheneesz&quot; rel=&quot;external&quot;&gt;fresheneesz&lt;&#x2F;a&gt; at the english wikipedia project, &lt;a href=&quot;http:&#x2F;&#x2F;creativecommons.org&#x2F;licenses&#x2F;by-sa&#x2F;3.0&#x2F;&quot; title=&quot;creative commons attribution-share alike 3.0&quot;&gt;cc by-sa 3.0&lt;&#x2F;a&gt;, &lt;a href=&quot;https:&#x2F;&#x2F;commons.wikimedia.org&#x2F;w&#x2F;index.php?curid=3357169&quot; rel=&quot;external&quot; &gt;link&lt;&#x2F;a&gt;&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;In the half-open range &lt;script type=&quot;math&#x2F;tex&quot;&gt;[1.0, 2.0)&lt;&#x2F;script&gt;, the exponent is constant, so the 23 least significant bits (the mantissa) of a 32-bit float can be used to encode a &lt;em&gt;Morton Path&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
Splitting those 23 bits in chunks of 2 gives us 11 &lt;em&gt;Morton Path&lt;&#x2F;em&gt; entries and 1 bit to spare.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;float_depth_repr.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;float_depth_repr.svg&quot; alt=&quot;binary representation of a 32-bit floating-point number.&quot; class=&quot;invert &quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Remix of &lt;a href=&#x27;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;user:fresheneesz&#x27; title=&#x27;en:user:fresheneesz&#x27; rel=&#x27;external&#x27;&gt;fresheneesz&lt;&#x2F;a&gt;’s &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;commons.wikimedia.org&#x2F;w&#x2F;index.php?curid=3357169&quot;&gt;original schema&lt;&#x2F;a&gt; showing the &lt;em&gt;Morton Path&lt;&#x2F;em&gt; encoding in a 32-bit floating-point number.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;This idea is quite elegant, though not my own&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-64tree-1&quot;&gt;&lt;a href=&quot;#fn-64tree&quot;&gt;9&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This technique may be used to convert voxel positions in the &lt;em&gt;contree&lt;&#x2F;em&gt; to and from floats.&lt;&#x2F;p&gt;
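&lt;p&gt;One possible reading of this scheme, sketched below, is that each float carries the path along a single axis: the 2-bit coordinate of level 0 sits in the two most significant mantissa bits, level 1 in the next two, and so on. That per-axis layout is an assumption on my part, and &lt;code&gt;axis_path_to_f32&lt;&#x2F;code&gt; and &lt;code&gt;f32_to_axis_coord&lt;&#x2F;code&gt; are hypothetical helper names.&lt;&#x2F;p&gt;

```rust
/// Bit pattern of `1.0_f32`: sign 0, biased exponent 127, mantissa 0.
const ONE_BITS: u32 = 0x3F80_0000;
/// 23 mantissa bits hold 11 chunks of 2 bits, with 1 bit to spare.
const MAX_LEVELS: usize = 11;

/// Packs per-level coordinates (each in `[0, 4)`) along one axis into a
/// float in `[1.0, 2.0)`.
pub fn axis_path_to_f32(coords: &[u8]) -> f32 {
    assert!(coords.len() <= MAX_LEVELS);
    let mut mantissa = 0u32;
    for (level, &c) in coords.iter().enumerate() {
        debug_assert!(c < 4);
        // Level 0 occupies mantissa bits 22..=21, level 1 bits 20..=19, etc.
        let shift = 21 - 2 * level as u32;
        mantissa |= u32::from(c) << shift;
    }
    f32::from_bits(ONE_BITS | mantissa)
}

/// Reads back the coordinate stored for `level` from a float in `[1.0, 2.0)`.
pub fn f32_to_axis_coord(value: f32, level: usize) -> u8 {
    assert!(level < MAX_LEVELS);
    debug_assert!((1.0..2.0).contains(&value));
    let shift = 21 - 2 * level as u32;
    ((value.to_bits() >> shift) & 0b11) as u8
}
```

&lt;p&gt;Under this layout, the path &lt;code&gt;[2]&lt;&#x2F;code&gt; encodes to 1.5 and &lt;code&gt;[2, 3]&lt;&#x2F;code&gt; to 1.6875.&lt;&#x2F;p&gt;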
&lt;figure&gt;
  
  
  &lt;a href=&quot;f32-bits-1.5.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;f32-bits-1.5.png&quot; alt=&quot;Bit representation of a 32-bit floating-point number with a value of 1.5.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Bit representation of a 32-bit floating-point number with a value of 1.5. &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.h-schmidt.net&#x2F;FloatConverter&#x2F;IEEE754.html&quot;&gt;Source website&lt;&#x2F;a&gt;&lt;br&#x2F;&gt;&lt;script type=&quot;math&#x2F;tex&quot;&gt;1.0 + 4^{-1} * 2 = 1.5&lt;&#x2F;script&gt;&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;&lt;figure&gt;
  
  
  &lt;a href=&quot;f32-bits-1.6875.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;f32-bits-1.6875.png&quot; alt=&quot;Bit representation of a 32-bit floating-point number with a value of 1.6875.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Bit representation of a 32-bit floating-point number with a value of 1.6875. &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.h-schmidt.net&#x2F;FloatConverter&#x2F;IEEE754.html&quot;&gt;Source website&lt;&#x2F;a&gt;&lt;br&#x2F;&gt;&lt;script type=&quot;math&#x2F;tex&quot;&gt;1.0 + 4^{-1} * 2 + 4^{-2} * 3 = 1.6875&lt;&#x2F;script&gt;&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;&lt;h4 id=&quot;maximum-depth&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#maximum-depth&quot; aria-label=&quot;Anchor link for: maximum-depth&quot;&gt;Maximum depth&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;As we saw in the &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#space-addressing&quot;&gt;Space addressing section&lt;&#x2F;a&gt;, we can only address a &lt;em&gt;contree&lt;&#x2F;em&gt; with a depth of at most 11.&lt;&#x2F;p&gt;
&lt;p&gt;A tree spanning a region of size &lt;script type=&quot;math&#x2F;tex&quot;&gt;x&lt;&#x2F;script&gt; would be able to represent, at best, a voxel of size
&lt;script type=&quot;math&#x2F;tex&quot;&gt;4^{-12}x&lt;&#x2F;script&gt;.&lt;&#x2F;p&gt;
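&lt;p&gt;To give a sense of scale, expanding the power of four in the expression above gives:&lt;&#x2F;p&gt;
&lt;script type=&quot;math&#x2F;tex;mode=display&quot;&gt;4^{-12}x = \frac{x}{16\,777\,216} \approx 5.96 \times 10^{-8}\,x&lt;&#x2F;script&gt;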
&lt;h3 id=&quot;tile-grid&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tile-grid&quot; aria-label=&quot;Anchor link for: tile-grid&quot;&gt;Tile Grid&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;having-a-shorter-forest-instead-of-a-tall-tree&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#having-a-shorter-forest-instead-of-a-tall-tree&quot; aria-label=&quot;Anchor link for: having-a-shorter-forest-instead-of-a-tall-tree&quot;&gt;Having a shorter forest instead of a tall tree&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;em&gt;contrees&lt;&#x2F;em&gt; have their challenges.&lt;&#x2F;p&gt;
&lt;p&gt;Even if the loading of the &lt;em&gt;contree&lt;&#x2F;em&gt; &lt;em&gt;could&lt;&#x2F;em&gt; be done in parallel, in my experience, it was much slower than a single-threaded process due to poor cache-locality and massive
parallelism overhead.&lt;&#x2F;p&gt;
&lt;p&gt;The memory footprint of &lt;em&gt;contrees&lt;&#x2F;em&gt; also scales with depth, each additional layer of &lt;em&gt;contrees&lt;&#x2F;em&gt; requiring more intermediate &lt;em&gt;nodes&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
This is important for tracing, as memory streaming can easily be a bottleneck in GPU applications.&lt;&#x2F;p&gt;
&lt;p&gt;Modifying a &lt;em&gt;contree&lt;&#x2F;em&gt; is a non-trivial operation using the current in-memory and serialization format.&lt;br &#x2F;&gt;
Optimizing for modification could mean changing this format and requiring an actual serialization process.&lt;&#x2F;p&gt;
&lt;p&gt;It’s unclear if the faster modification time would offset the new serialization step and if this change would be a net positive in terms of execution time and memory footprint.&lt;&#x2F;p&gt;
&lt;p&gt;However, some of those challenges can be addressed by adding another layer of indirections: &lt;em&gt;tiles&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
Using &lt;em&gt;tiles&lt;&#x2F;em&gt; to discretize our &lt;em&gt;IS&lt;&#x2F;em&gt; means using more than a single &lt;em&gt;contree&lt;&#x2F;em&gt;, by &lt;strong&gt;tiling&lt;&#x2F;strong&gt; our &lt;em&gt;IS&lt;&#x2F;em&gt;’s region beforehand, and having at most one &lt;em&gt;contree&lt;&#x2F;em&gt; per tile.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;loading-a-tile-grid&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#loading-a-tile-grid&quot; aria-label=&quot;Anchor link for: loading-a-tile-grid&quot;&gt;Loading a tile grid&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Instead of discretizing the &lt;em&gt;IS&lt;&#x2F;em&gt;’s space into one &lt;em&gt;contree&lt;&#x2F;em&gt;, we can first
divide this space into a grid of tiles (rounded up), stored in a 3-dimensional array.&lt;br &#x2F;&gt;
Each tile is a &lt;strong&gt;cubic&lt;&#x2F;strong&gt; region of space that is either empty or contains a &lt;em&gt;contree&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For each of the &lt;em&gt;tiles&lt;&#x2F;em&gt;, we can sample the &lt;em&gt;IS&lt;&#x2F;em&gt; interval in the region to determine if the &lt;em&gt;isosurface&lt;&#x2F;em&gt; might be inside.
Empty &lt;em&gt;tiles&lt;&#x2F;em&gt; are ignored and a &lt;em&gt;contree&lt;&#x2F;em&gt; is loaded for ambiguous &lt;em&gt;tiles&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This operation can be nicely parallelized.&lt;&#x2F;p&gt;
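&lt;p&gt;As a rough illustration of the per-tile test, the sketch below flags tiles that may contain the surface. It deliberately swaps the interval evaluation for a simpler conservative bound: assuming the function is a true signed distance (so it changes by at most the distance travelled), a tile whose centre is further from the surface than the tile’s half-diagonal cannot contain it. The sphere and the &lt;code&gt;classify_tiles&lt;&#x2F;code&gt; helper are hypothetical stand-ins, and the loop is sequential to keep the example short.&lt;&#x2F;p&gt;

```rust
/// Signed distance to a sphere centred at the origin (stand-in surface).
pub fn sphere_sdf(p: [f32; 3], radius: f32) -> f32 {
    (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt() - radius
}

/// Returns one flag per tile (x varies fastest, then y, then z):
/// `true` when the surface may cross the tile, `false` when it cannot.
pub fn classify_tiles(
    bbox_min: [f32; 3],
    tile_size: f32,
    tiles_per_axis: usize,
    sdf: impl Fn([f32; 3]) -> f32,
) -> Vec<bool> {
    let half = tile_size * 0.5;
    // Distance from the centre of a cubic tile to any of its corners.
    let half_diagonal = half * 3.0_f32.sqrt();
    let mut flags = Vec::with_capacity(tiles_per_axis.pow(3));
    for z in 0..tiles_per_axis {
        for y in 0..tiles_per_axis {
            for x in 0..tiles_per_axis {
                let center = [
                    bbox_min[0] + x as f32 * tile_size + half,
                    bbox_min[1] + y as f32 * tile_size + half,
                    bbox_min[2] + z as f32 * tile_size + half,
                ];
                // Conservative: keep the tile unless it is provably empty.
                flags.push(sdf(center).abs() <= half_diagonal);
            }
        }
    }
    flags
}
```

&lt;p&gt;Each tile is classified independently of the others, which is what makes this step easy to spread across threads.&lt;&#x2F;p&gt;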
&lt;p&gt;I used a benchmark that discretizes a region of &lt;script type=&quot;math&#x2F;tex&quot;&gt;10&lt;&#x2F;script&gt; units across.&lt;br &#x2F;&gt;
Using 1 tile with a &lt;em&gt;contree&lt;&#x2F;em&gt; depth of 4 is equivalent to using 16 tiles per axis with a &lt;em&gt;contree&lt;&#x2F;em&gt; depth of 2.&lt;&#x2F;p&gt;
&lt;script type=&quot;math&#x2F;tex;mode=display&quot;&gt;\frac{10}{4^5} = \frac{10}{16 * 4^3}&lt;&#x2F;script&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;&#x2F;strong&gt;: a &lt;em&gt;contree&lt;&#x2F;em&gt; of depth 0 still divides by 4 on each axis, so a &lt;em&gt;contree&lt;&#x2F;em&gt; of depth &lt;script type=&quot;math&#x2F;tex&quot;&gt;L&lt;&#x2F;script&gt; divides each axis by &lt;script type=&quot;math&#x2F;tex&quot;&gt;4^{L+1}&lt;&#x2F;script&gt;.&lt;&#x2F;p&gt;
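&lt;p&gt;The equivalence between the two configurations can be checked with a couple of lines. The &lt;code&gt;cells_per_axis&lt;&#x2F;code&gt; and &lt;code&gt;voxel_size&lt;&#x2F;code&gt; helpers are hypothetical and simply restate the arithmetic above.&lt;&#x2F;p&gt;

```rust
/// Number of voxels along one axis for a grid of `tiles_per_axis` tiles,
/// each holding a contree of the given depth (depth 0 still divides by 4).
pub fn cells_per_axis(tiles_per_axis: u32, depth: u32) -> u32 {
    tiles_per_axis * 4u32.pow(depth + 1)
}

/// Edge length of the smallest voxel for a region `region` units across.
pub fn voxel_size(region: f32, tiles_per_axis: u32, depth: u32) -> f32 {
    region / cells_per_axis(tiles_per_axis, depth) as f32
}
```

&lt;p&gt;Both &lt;code&gt;cells_per_axis(1, 4)&lt;&#x2F;code&gt; and &lt;code&gt;cells_per_axis(16, 2)&lt;&#x2F;code&gt; evaluate to 1024, so the two benchmark setups resolve the region to the same voxel size.&lt;&#x2F;p&gt;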
&lt;div class=&quot;flex flex-row flex-wrap&quot;&gt;
&lt;div class=&quot;flex-1&quot;&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;t1_d4_pdf_small.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;t1_d4_pdf_small.svg&quot; alt=&quot;Probability Distribution Function chart showing a median time of 48.38ms.&quot; class=&quot;invert &quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Single-threaded loading of tile grids of &lt;em&gt;contrees&lt;&#x2F;em&gt; using &lt;script type=&quot;math&#x2F;tex&quot;&gt;1&lt;&#x2F;script&gt; tile and a depth of &lt;script type=&quot;math&#x2F;tex&quot;&gt;4&lt;&#x2F;script&gt;.&lt;br&#x2F;&gt;Median time: 48.38ms.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex-1&quot;&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;t16_d2_pdf_small.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;t16_d2_pdf_small.svg&quot; alt=&quot;Probability Distribution Function chart showing a median time of 1.21ms.&quot; class=&quot;invert &quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Parallel loading of tile grids of &lt;em&gt;contrees&lt;&#x2F;em&gt; using &lt;script type=&quot;math&#x2F;tex&quot;&gt;16^3&lt;&#x2F;script&gt; tiles and a depth of &lt;script type=&quot;math&#x2F;tex&quot;&gt;2&lt;&#x2F;script&gt;&lt;br&#x2F;&gt;Median time: 1.21ms.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;p&gt;Unsurprisingly, parallelizing this process cuts the overall runtime considerably: the median drops from 48.38ms to 1.21ms, roughly a 40x speed-up.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;serializing-a-tile-grid&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#serializing-a-tile-grid&quot; aria-label=&quot;Anchor link for: serializing-a-tile-grid&quot;&gt;Serializing a tile grid&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Serialization of the &lt;em&gt;tile grid&lt;&#x2F;em&gt; is very similar to the technique used for the &lt;em&gt;contrees&lt;&#x2F;em&gt;; we just add one more layer of indirection using two new buffers: the &lt;em&gt;tile&lt;&#x2F;em&gt; and the &lt;em&gt;tile indirection&lt;&#x2F;em&gt; buffers.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;em&gt;tile&lt;&#x2F;em&gt; buffer is a list of &lt;code&gt;u32&lt;&#x2F;code&gt;.&lt;br &#x2F;&gt;
A special value of &lt;code&gt;u32::MAX&lt;&#x2F;code&gt; signals that the &lt;em&gt;tile&lt;&#x2F;em&gt; is empty; otherwise, the value is a pointer into the &lt;em&gt;tile indirection&lt;&#x2F;em&gt; buffer.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;buffers-figure-5.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;buffers-figure-5.svg&quot; alt=&quot;Schema of the GPU buffers and how the tile indirection buffer helps map to the _contree_ buffer slices.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Schema of the tile and tile indirection buffers, their interaction with each other and the &lt;em&gt;contree&lt;&#x2F;em&gt; buffers&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;The &lt;strong&gt;absolute&lt;&#x2F;strong&gt; offsets we used in the &lt;em&gt;contrees&lt;&#x2F;em&gt; are now &lt;strong&gt;partial&lt;&#x2F;strong&gt; offsets.&lt;br &#x2F;&gt;
One needs to add the &lt;em&gt;tile indirection&lt;&#x2F;em&gt; offset to get the starting index for a given &lt;em&gt;tile&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
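&lt;p&gt;The lookup described above can be sketched on the CPU as follows. This is only an illustration: small fixed-size arrays stand in for the GPU buffers, the names &lt;code&gt;TileLookup&lt;&#x2F;code&gt;, &lt;code&gt;lookup_tile&lt;&#x2F;code&gt; and &lt;code&gt;absolute_offset&lt;&#x2F;code&gt; are hypothetical, and I assume a single base offset per &lt;em&gt;tile indirection&lt;&#x2F;em&gt; entry, which may not match the real layout.&lt;&#x2F;p&gt;

```rust
/// Result of looking up a tile in the tile buffer.
#[derive(Debug, PartialEq)]
enum TileLookup {
    /// The tile buffer holds `u32::MAX`: nothing to march in this tile.
    Empty,
    /// The tile is populated; `base` is the offset read from the
    /// tile indirection buffer.
    Populated { base: u32 },
}

/// Resolves a tile index through the tile buffer.
/// ASSUMPTION: each indirection entry stores one base offset; the real
/// buffers may store several offsets per tile.
fn lookup_tile(tiles: [u32; 4], indirection: [u32; 2], tile_index: usize) -> TileLookup {
    match tiles[tile_index] {
        // Sentinel value: the tile is empty.
        u32::MAX => TileLookup::Empty,
        // Any other value points into the tile indirection buffer.
        pointer => TileLookup::Populated {
            base: indirection[pointer as usize],
        },
    }
}

/// Turns a tile-relative (partial) offset into an absolute one.
fn absolute_offset(base: u32, partial_offset: u32) -> u32 {
    base + partial_offset
}
```

&lt;p&gt;For instance, with &lt;code&gt;tiles = [0, u32::MAX, 1, u32::MAX]&lt;&#x2F;code&gt; and &lt;code&gt;indirection = [0, 128]&lt;&#x2F;code&gt;, tile 1 is empty while a partial offset of 5 inside tile 2 resolves to the absolute offset 133.&lt;&#x2F;p&gt;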
&lt;h4 id=&quot;choosing-the-right-tile-size&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#choosing-the-right-tile-size&quot; aria-label=&quot;Anchor link for: choosing-the-right-tile-size&quot;&gt;Choosing the right tile size&lt;&#x2F;a&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Choosing the tile size is a balancing act between the size of the 3D array that will contain the &lt;em&gt;tiles&lt;&#x2F;em&gt;, the depth of the &lt;em&gt;contrees&lt;&#x2F;em&gt; and the minimum feature size that
can be discretized.&lt;&#x2F;p&gt;
&lt;p&gt;Notably, the number of &lt;em&gt;tiles&lt;&#x2F;em&gt; also directly affects the number of tasks to parallelize. Having too many &lt;em&gt;tiles&lt;&#x2F;em&gt; makes the parallelization overhead dominate the work done per task.&lt;br &#x2F;&gt;
On the other hand, having too few &lt;em&gt;tiles&lt;&#x2F;em&gt; causes poor distribution of load and possibly deeper &lt;em&gt;contrees&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;From my crude benchmarks, an empirical maximum number of &lt;em&gt;tiles&lt;&#x2F;em&gt; is &lt;script type=&quot;math&#x2F;tex&quot;&gt;32^3&lt;&#x2F;script&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Looking at the benchmarks, we can see that scaling the number of parallel tasks with the number of &lt;em&gt;tiles&lt;&#x2F;em&gt; to discretize quickly hits its limit:&lt;&#x2F;p&gt;
&lt;div class=&quot;flex flex-row flex-wrap gap-2&quot;&gt;
&lt;div class=&quot;flex-1&quot;&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;t16_d2_pdf_small.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;t16_d2_pdf_small.svg&quot; alt=&quot;Probability Distribution Function chart showing a median time of 1.21ms.&quot; class=&quot;invert &quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Parallel loading of tile grids of &lt;em&gt;contrees&lt;&#x2F;em&gt; using &lt;script type=&quot;math&#x2F;tex&quot;&gt;16^3&lt;&#x2F;script&gt; tiles and a depth of &lt;script type=&quot;math&#x2F;tex&quot;&gt;2&lt;&#x2F;script&gt;.&lt;br&#x2F;&gt;Median time: 1.21ms.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex-1&quot;&gt;
&lt;figure&gt;
  
  
  &lt;a href=&quot;t64_d1_pdf_small.svg&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;t64_d1_pdf_small.svg&quot; alt=&quot;Probability Distribution Function chart showing a median time of 53.47ms.&quot; class=&quot;invert &quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Parallel loading of tile grids of &lt;em&gt;contrees&lt;&#x2F;em&gt; using &lt;script type=&quot;math&#x2F;tex&quot;&gt;64^3&lt;&#x2F;script&gt; tiles and a depth of &lt;script type=&quot;math&#x2F;tex&quot;&gt;1&lt;&#x2F;script&gt;.&lt;br&#x2F;&gt;Median time: 53.47ms.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;p&gt;The benchmark discretizes a region of &lt;script type=&quot;math&#x2F;tex&quot;&gt;10&lt;&#x2F;script&gt; units across.&lt;br &#x2F;&gt;
Using 16 tiles per axis with a &lt;em&gt;contree&lt;&#x2F;em&gt; depth of 2 is equivalent to using 64 tiles per axis with a &lt;em&gt;contree&lt;&#x2F;em&gt; depth of 1.&lt;&#x2F;p&gt;
&lt;script type=&quot;math&#x2F;tex;mode=display&quot;&gt;\frac{10}{16 \cdot 4^3} = \frac{10}{64 \cdot 4^2}&lt;&#x2F;script&gt;
&lt;p&gt;Even though the resolution produced by those benchmarks is the same, there is a &lt;script type=&quot;math&#x2F;tex&quot;&gt;4319\%&lt;&#x2F;script&gt; increase in median runtime when using the larger number of tiles.&lt;br &#x2F;&gt;
This is mainly due to the number of tasks to parallelize, the overhead that goes with it, and the added memory pressure of having to manage a larger 3D array for the tiles.&lt;&#x2F;p&gt;
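&lt;p&gt;For reference, that percentage can be recovered from the two median times reported in the captions above (1.21ms and 53.47ms). Assuming one task per &lt;em&gt;tile&lt;&#x2F;em&gt;, the number of tasks grows by a factor of 64 between the two runs:&lt;&#x2F;p&gt;
&lt;script type=&quot;math&#x2F;tex;mode=display&quot;&gt;\frac{53.47 - 1.21}{1.21} \approx 43.19 = 4319\% \qquad \frac{64^3}{16^3} = 4^3 = 64&lt;&#x2F;script&gt;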
&lt;h1 id=&quot;tracing-the-void&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tracing-the-void&quot; aria-label=&quot;Anchor link for: tracing-the-void&quot;&gt;Tracing the void&lt;&#x2F;a&gt;&lt;&#x2F;h1&gt;
&lt;p&gt;Now that we’ve &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#sampling-the-void&quot;&gt;sampled the void&lt;&#x2F;a&gt;, we should briefly talk about how to use this with the GPU to show some nice images.&lt;&#x2F;p&gt;
&lt;p&gt;This section is intentionally brief, as it relies on well-known algorithms that are better explained elsewhere&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-64tree-2&quot;&gt;&lt;a href=&quot;#fn-64tree&quot;&gt;9&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For this project, I’ve been using the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bevy.org&#x2F;&quot;&gt;&lt;code&gt;bevy&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; game engine.&lt;&#x2F;p&gt;
&lt;p&gt;The plan is:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#uploading-data-to-the-gpu&quot;&gt;Upload&lt;&#x2F;a&gt; the &lt;em&gt;IS&lt;&#x2F;em&gt; and camera-related data to the GPU.&lt;&#x2F;li&gt;
&lt;li&gt;Spawn a proxy mesh for each &lt;em&gt;IS&lt;&#x2F;em&gt; in the scene. By using this proxy mesh, I can benefit from the CPU culling built into &lt;code&gt;bevy&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;For each &lt;em&gt;IS&lt;&#x2F;em&gt; in view, allocate a texture (depth and normal).&lt;&#x2F;li&gt;
&lt;li&gt;If the &lt;em&gt;IS&lt;&#x2F;em&gt; has changed, if the camera has moved, or if the texture has never been rendered, schedule a compute shader to trace the &lt;em&gt;IS&lt;&#x2F;em&gt; and populate the texture.&lt;&#x2F;li&gt;
&lt;li&gt;Perform a depth pre-pass.&lt;&#x2F;li&gt;
&lt;li&gt;Using the texture and a &lt;em&gt;fragment shader&lt;&#x2F;em&gt;, add &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Physically_based_rendering&quot;&gt;&lt;strong&gt;P&lt;&#x2F;strong&gt;hysically &lt;strong&gt;B&lt;&#x2F;strong&gt;ased &lt;strong&gt;R&lt;&#x2F;strong&gt;endering&lt;&#x2F;a&gt; to the populated pixels.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;uploading-data-to-the-gpu&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#uploading-data-to-the-gpu&quot; aria-label=&quot;Anchor link for: uploading-data-to-the-gpu&quot;&gt;Uploading data to the GPU&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The data structure we’ve been building (cf. &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#serializing-a-tile-grid&quot;&gt;Serializing the &lt;em&gt;tile grid&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;) can be uploaded to
some GPU buffers nearly as-is.&lt;&#x2F;p&gt;
&lt;p&gt;Some additional data should also be uploaded in &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;webgpufundamentals.org&#x2F;webgpu&#x2F;lessons&#x2F;webgpu-uniforms.html&quot;&gt;uniform&lt;&#x2F;a&gt; buffers, such as:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Camera matrices.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;em&gt;IS&lt;&#x2F;em&gt; metadata (transform, &lt;em&gt;AABB&lt;&#x2F;em&gt;).&lt;&#x2F;li&gt;
&lt;li&gt;&lt;em&gt;tile grid&lt;&#x2F;em&gt; metadata (tile size, grid size).&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;An additional optimization is to limit the bytes transferred per frame, spreading the upload over multiple frames to avoid stuttering.&lt;&#x2F;p&gt;
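&lt;p&gt;A minimal sketch of such a per-frame budget is shown below. The helper names and the idea of a fixed byte budget are assumptions made for illustration; the actual upload scheduling in the project may differ.&lt;&#x2F;p&gt;

```rust
/// Size of the chunk to upload this frame, and the bytes left afterwards.
/// Hypothetical helper: `budget_per_frame` is a tuning knob, not a value
/// taken from the project.
fn next_chunk(remaining_bytes: usize, budget_per_frame: usize) -> (usize, usize) {
    let chunk = remaining_bytes.min(budget_per_frame);
    (chunk, remaining_bytes - chunk)
}

/// Number of frames needed to upload `total_bytes` without ever exceeding
/// the per-frame budget.
fn frames_needed(total_bytes: usize, budget_per_frame: usize) -> usize {
    assert!(budget_per_frame > 0, "the budget must allow some progress");
    let mut remaining = total_bytes;
    let mut frames = 0;
    while remaining > 0 {
        // Upload one chunk per frame until nothing is left.
        let (_chunk, left) = next_chunk(remaining, budget_per_frame);
        remaining = left;
        frames += 1;
    }
    frames
}
```

&lt;p&gt;A smaller budget trades upload latency for smoother frame times: a 10-byte upload with a 4-byte budget takes 3 frames instead of 1.&lt;&#x2F;p&gt;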
&lt;h2 id=&quot;compute-shader-ray-marching&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#compute-shader-ray-marching&quot; aria-label=&quot;Anchor link for: compute-shader-ray-marching&quot;&gt;Compute shader ray-marching&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;To populate the texture we’ve allocated, we can use a compute shader, which gives finer control over parallelism than a &lt;em&gt;fragment&lt;&#x2F;em&gt; shader.&lt;&#x2F;p&gt;
&lt;p&gt;We can divide the texture into a grid of &lt;em&gt;texture tiles&lt;&#x2F;em&gt; and dispatch an appropriate number of workgroups.&lt;&#x2F;p&gt;
&lt;p&gt;This compute shader shoots a conceptual ray from the camera into the scene.&lt;&#x2F;p&gt;
&lt;p&gt;If the ray intersects the &lt;em&gt;IS&lt;&#x2F;em&gt;’s &lt;em&gt;AABB&lt;&#x2F;em&gt;, we can start &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#marching-the-tiles&quot;&gt;marching the &lt;em&gt;tiles&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;.&lt;br &#x2F;&gt;
If the ray misses the &lt;em&gt;AABB&lt;&#x2F;em&gt;, we can exit early.&lt;&#x2F;p&gt;
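&lt;p&gt;One common way to implement this test is the slab method, sketched below on the CPU. This is not necessarily the variant used in the project’s shader; the names &lt;code&gt;AabbHit&lt;&#x2F;code&gt; and &lt;code&gt;ray_aabb&lt;&#x2F;code&gt; are hypothetical, and rays grazing a face exactly are an edge case this sketch does not treat.&lt;&#x2F;p&gt;

```rust
/// Outcome of intersecting a ray with an axis-aligned bounding box.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AabbHit {
    /// The ray never enters the box: the shader invocation can exit early.
    Miss,
    /// The ray is inside the box for ray parameters in `[t_enter, t_exit]`.
    Span { t_enter: f32, t_exit: f32 },
}

/// Slab test: clip the ray against the pair of planes of each axis and
/// keep the overlap of the three parameter intervals.
fn ray_aabb(origin: [f32; 3], dir: [f32; 3], aabb_min: [f32; 3], aabb_max: [f32; 3]) -> AabbHit {
    // Start at 0 so that intersections behind the ray origin are ignored.
    let mut t_enter = 0.0_f32;
    let mut t_exit = f32::INFINITY;
    for axis in 0..3 {
        // A zero direction component yields an infinite inverse, which pushes
        // the slab bounds to infinity and leaves the other axes in charge.
        let inv = 1.0 / dir[axis];
        let t0 = (aabb_min[axis] - origin[axis]) * inv;
        let t1 = (aabb_max[axis] - origin[axis]) * inv;
        t_enter = t_enter.max(t0.min(t1));
        t_exit = t_exit.min(t0.max(t1));
    }
    if t_exit >= t_enter {
        AabbHit::Span { t_enter, t_exit }
    } else {
        AabbHit::Miss
    }
}
```

&lt;p&gt;On a hit, &lt;code&gt;t_enter&lt;&#x2F;code&gt; gives the point where marching can start and &lt;code&gt;t_exit&lt;&#x2F;code&gt; bounds how far the ray needs to travel.&lt;&#x2F;p&gt;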
&lt;h3 id=&quot;marching-the-tiles&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#marching-the-tiles&quot; aria-label=&quot;Anchor link for: marching-the-tiles&quot;&gt;Marching the &lt;em&gt;tiles&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;After the ray has intersected the &lt;em&gt;AABB&lt;&#x2F;em&gt;, we can figure out in which &lt;em&gt;tile&lt;&#x2F;em&gt; the ray currently is.&lt;&#x2F;p&gt;
&lt;p&gt;Using the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Digital_differential_analyzer_(graphics_algorithm)&quot;&gt;&lt;strong&gt;D&lt;&#x2F;strong&gt;igital &lt;strong&gt;D&lt;&#x2F;strong&gt;ifferential &lt;strong&gt;A&lt;&#x2F;strong&gt;nalyzer&lt;&#x2F;a&gt; (DDA) algorithm, we can step through the &lt;em&gt;tile grid&lt;&#x2F;em&gt;
until we hit a populated &lt;em&gt;tile&lt;&#x2F;em&gt; or exit the grid.&lt;&#x2F;p&gt;
&lt;p&gt;Upon hitting a populated &lt;em&gt;tile&lt;&#x2F;em&gt;, we &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#marching-the-contrees&quot;&gt;march the inner &lt;em&gt;contree&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
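&lt;p&gt;As an illustration, here is a CPU sketch of a common 3D &lt;em&gt;DDA&lt;&#x2F;em&gt; formulation stepping through a grid of &lt;code&gt;n&lt;&#x2F;code&gt; tiles per axis. It assumes the ray origin is expressed in tile units and already lies inside the grid; &lt;code&gt;TileMarch&lt;&#x2F;code&gt;, &lt;code&gt;march_tiles&lt;&#x2F;code&gt; and the &lt;code&gt;is_populated&lt;&#x2F;code&gt; callback are hypothetical names standing in for the shader code and the &lt;em&gt;tile&lt;&#x2F;em&gt; buffer lookup.&lt;&#x2F;p&gt;

```rust
/// Outcome of marching a ray through the tile grid.
#[derive(Debug, PartialEq)]
enum TileMarch {
    /// A populated tile was reached after `steps` grid steps.
    Hit { cell: [i32; 3], steps: u32 },
    /// The ray left the grid without meeting a populated tile.
    Exit { steps: u32 },
}

/// Steps from tile to tile along the ray, always crossing the nearest
/// tile boundary first.
fn march_tiles(
    origin: [f32; 3],
    dir: [f32; 3],
    n: i32,
    is_populated: impl Fn([i32; 3]) -> bool,
) -> TileMarch {
    let mut cell = [0_i32; 3];
    let mut step = [0_i32; 3];
    // Ray parameter at which the next boundary is crossed, per axis.
    let mut t_max = [f32::INFINITY; 3];
    // Ray parameter needed to cross one whole tile, per axis.
    let mut t_delta = [f32::INFINITY; 3];
    for a in 0..3 {
        cell[a] = origin[a].floor() as i32;
        if dir[a] > 0.0 {
            step[a] = 1;
            t_delta[a] = 1.0 / dir[a];
            t_max[a] = (cell[a] as f32 + 1.0 - origin[a]) / dir[a];
        } else if 0.0 > dir[a] {
            step[a] = -1;
            t_delta[a] = -1.0 / dir[a];
            t_max[a] = (cell[a] as f32 - origin[a]) / dir[a];
        }
    }
    let mut steps = 0_u32;
    loop {
        if is_populated(cell) {
            return TileMarch::Hit { cell, steps };
        }
        // Pick the axis whose boundary is closest along the ray.
        let mut a = 0;
        if t_max[a] > t_max[1] {
            a = 1;
        }
        if t_max[a] > t_max[2] {
            a = 2;
        }
        // A null direction can never advance: treat it as leaving the grid.
        if step[a] == 0 {
            return TileMarch::Exit { steps };
        }
        cell[a] += step[a];
        t_max[a] += t_delta[a];
        steps += 1;
        if 0 > cell[a] || cell[a] >= n {
            return TileMarch::Exit { steps };
        }
    }
}
```

&lt;p&gt;The returned &lt;code&gt;steps&lt;&#x2F;code&gt; counter is the kind of value a debug view can map to a color ramp.&lt;&#x2F;p&gt;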
&lt;figure&gt;
  
  

  
  
  
  

  
  &lt;a href=&quot;viridis_step_tiles.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;viridis_step_tiles.6b054e8ff4ea1bd7.png&quot; alt=&quot;3D debug view of the number of _tile_ steps necessary to find an approximate _isosurface_ when tracing a sphere with distortion.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Visualization of the steps through &lt;em&gt;tiles&lt;&#x2F;em&gt; required to traverse the cubic &lt;em&gt;AABB&lt;&#x2F;em&gt;, using the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.shadertoy.com&#x2F;view&#x2F;XtGGzG&quot;&gt;viridis quintic approximation&lt;&#x2F;a&gt;.&lt;br&#x2F;&gt;The range of steps is &lt;script type=&quot;math&#x2F;tex&quot;&gt;[0, N\sqrt{3})&lt;&#x2F;script&gt; where &lt;script type=&quot;math&#x2F;tex&quot;&gt;N&lt;&#x2F;script&gt; is the number of tiles per axis.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;&lt;h3 id=&quot;marching-the-contrees&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#marching-the-contrees&quot; aria-label=&quot;Anchor link for: marching-the-contrees&quot;&gt;Marching the &lt;em&gt;contrees&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Marching the &lt;em&gt;contree&lt;&#x2F;em&gt; uses a recursive version of the &lt;em&gt;DDA&lt;&#x2F;em&gt; algorithm, which is really best described in &lt;em&gt;dubiousconst282&lt;&#x2F;em&gt;’s blog&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-64tree-3&quot;&gt;&lt;a href=&quot;#fn-64tree&quot;&gt;9&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;When the ray intersects with a populated voxel of a &lt;strong&gt;leaf node&lt;&#x2F;strong&gt; in the &lt;em&gt;contree&lt;&#x2F;em&gt;, we can use the associated data to gather the 8 distances and normals for this voxel’s vertices.&lt;br &#x2F;&gt;
We can then perform a &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Trilinear_interpolation&quot;&gt;trilinear interpolation&lt;&#x2F;a&gt; of the distance and normal at the ray-voxel intersection point.&lt;&#x2F;p&gt;
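&lt;p&gt;For the distance, the interpolation can be sketched as three rounds of linear interpolation, one per axis. The corner ordering used here (bit 0 of the index for x, bit 1 for y, bit 2 for z) is an assumption made for the example, not necessarily the layout used in the project; normals can be interpolated component by component in the same way.&lt;&#x2F;p&gt;

```rust
/// Linear interpolation between `a` and `b` for `t` in `[0, 1]`.
fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// Trilinear interpolation of 8 corner values at the local coordinates
/// `(u, v, w)`, each in `[0, 1]`, inside the voxel.
/// ASSUMPTION: corner `i` sits at x = bit 0, y = bit 1, z = bit 2 of `i`.
fn trilerp(corners: [f32; 8], u: f32, v: f32, w: f32) -> f32 {
    // Collapse the x axis: 8 corners become 4 values.
    let x00 = lerp(corners[0], corners[1], u);
    let x10 = lerp(corners[2], corners[3], u);
    let x01 = lerp(corners[4], corners[5], u);
    let x11 = lerp(corners[6], corners[7], u);
    // Collapse the y axis: 4 values become 2.
    let y0 = lerp(x00, x10, v);
    let y1 = lerp(x01, x11, v);
    // Collapse the z axis: 2 values become the result.
    lerp(y0, y1, w)
}
```

&lt;p&gt;If the corner distances only vary along x, the result only depends on &lt;code&gt;u&lt;&#x2F;code&gt;, which is a quick sanity check for the corner ordering.&lt;&#x2F;p&gt;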
&lt;figure&gt;
  
  

  
  
  
  

  
  &lt;a href=&quot;viridis_step_contree.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;viridis_step_contree.e0cc1baa75237f9f.png&quot; alt=&quot;3D debug view of the number of _contree_ steps necessary to find an approximate _isosurface_ when tracing a sphere with distortion.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Visualization of the steps through &lt;em&gt;contrees&lt;&#x2F;em&gt; required to traverse the cubic &lt;em&gt;AABB&lt;&#x2F;em&gt;, using the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.shadertoy.com&#x2F;view&#x2F;XtGGzG&quot;&gt;viridis quintic approximation&lt;&#x2F;a&gt;.&lt;br&#x2F;&gt;The range of steps is &lt;script type=&quot;math&#x2F;tex&quot;&gt;[0, 255)&lt;&#x2F;script&gt;.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;&lt;h3 id=&quot;combined-marching&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#combined-marching&quot; aria-label=&quot;Anchor link for: combined-marching&quot;&gt;Combined marching&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;If the ray misses the populated voxels of the inner &lt;em&gt;contree&lt;&#x2F;em&gt;, we continue stepping through the &lt;em&gt;tile grid&lt;&#x2F;em&gt; normally.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  

  
  
  
  

  
  &lt;a href=&quot;viridis_step_combined.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;viridis_step_combined.15bff33a3ff7a90d.png&quot; alt=&quot;3D debug view of the number of combined steps necessary to find an approximate _isosurface_ when tracing a sphere with distortion.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Visualization of the steps through &lt;em&gt;tiles&lt;&#x2F;em&gt; and &lt;em&gt;contrees&lt;&#x2F;em&gt; required to traverse the cubic &lt;em&gt;AABB&lt;&#x2F;em&gt;, using the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.shadertoy.com&#x2F;view&#x2F;XtGGzG&quot;&gt;viridis quintic approximation&lt;&#x2F;a&gt;.&lt;br&#x2F;&gt;The range of steps is &lt;script type=&quot;math&#x2F;tex&quot;&gt;[0, N\sqrt{3} + 255)&lt;&#x2F;script&gt; where &lt;script type=&quot;math&#x2F;tex&quot;&gt;N&lt;&#x2F;script&gt; is the number of tiles per axis.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;If the ray doesn’t intersect a populated voxel and exits the bounding volume, we can exit the shader early.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;alpha-mode&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#alpha-mode&quot; aria-label=&quot;Anchor link for: alpha-mode&quot;&gt;Alpha mode&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;When rendering our proxy mesh, we have to choose the correct &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;bevy&#x2F;latest&#x2F;bevy&#x2F;prelude&#x2F;enum.AlphaMode.html&quot;&gt;alpha mode&lt;&#x2F;a&gt; to render in.&lt;&#x2F;p&gt;
&lt;p&gt;To render an opaque &lt;em&gt;IS&lt;&#x2F;em&gt;, we need to use the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;bevy&#x2F;latest&#x2F;bevy&#x2F;prelude&#x2F;enum.AlphaMode.html#variant.Mask&quot;&gt;Mask&lt;&#x2F;a&gt; mode.
That is because we want &lt;em&gt;some&lt;&#x2F;em&gt; transparency, so that rays that do not intersect with the &lt;em&gt;IS&lt;&#x2F;em&gt; do not participate in the final rendering.&lt;br &#x2F;&gt;
This mode is also cheaper to use than the other modes that allow some form of transparency.&lt;&#x2F;p&gt;
&lt;p&gt;For transparent &lt;em&gt;IS&lt;&#x2F;em&gt;, one can use any of the transparent modes
(e.g., &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;bevy&#x2F;latest&#x2F;bevy&#x2F;prelude&#x2F;enum.AlphaMode.html#variant.Blend&quot;&gt;Blend&lt;&#x2F;a&gt;,
&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;bevy&#x2F;latest&#x2F;bevy&#x2F;prelude&#x2F;enum.AlphaMode.html#variant.Premultiplied&quot;&gt;Premultiplied&lt;&#x2F;a&gt;,
&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;bevy&#x2F;latest&#x2F;bevy&#x2F;prelude&#x2F;enum.AlphaMode.html#variant.Add&quot;&gt;Add&lt;&#x2F;a&gt;,
or &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;bevy&#x2F;latest&#x2F;bevy&#x2F;prelude&#x2F;enum.AlphaMode.html#variant.Multiply&quot;&gt;Multiply&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;depth-prepass&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#depth-prepass&quot; aria-label=&quot;Anchor link for: depth-prepass&quot;&gt;Depth prepass&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Integrating this rendering process with a depth prepass allows rendering multiple &lt;em&gt;IS&lt;&#x2F;em&gt; mixed with &lt;em&gt;normal&lt;&#x2F;em&gt; meshes, all at the correct depth.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  

  
  
  
  

  
  &lt;a href=&quot;depth-prepass.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;depth-prepass.c0e7fc9be6ff3c65.png&quot; alt=&quot;3D rendering of 3 spheres with distortion rendered on top of each other. The top _IS_ is transparent. A mesh-based, purple torus intersects with the top _IS_.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Multiple &lt;em&gt;IS&lt;&#x2F;em&gt; on top of each other with a mesh-based torus on top.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;This integration happens in 2 steps:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;During a depth prepass, a special fragment shader will report the depth of the populated pixels.
When multiple &lt;em&gt;IS&lt;&#x2F;em&gt; report their depth, the one closest to the camera is kept.&lt;&#x2F;li&gt;
&lt;li&gt;During the main pass fragment shader, the fragment depth is compared with the depth gathered during the prepass.&lt;br &#x2F;&gt;
If the current depth is further away from the camera than the depth prepass stored value, the fragment is discarded.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
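&lt;p&gt;The two steps above can be emulated on the CPU for a single pixel as follows. Depths are expressed here as distances from the camera, so smaller means closer; that convention, the tolerance value and the function names are assumptions made for this sketch, and an actual depth buffer may use a different (e.g. reversed) encoding.&lt;&#x2F;p&gt;

```rust
/// Tolerance used when comparing depths, to absorb rounding differences
/// between the two passes. Hypothetical value chosen for the example.
const DEPTH_EPSILON: f32 = 1e-4;

/// Prepass: of all the depths reported for one pixel, keep the closest.
/// Depths are distances from the camera, so the smallest one wins.
fn closest_depth(reported: [f32; 3]) -> f32 {
    reported.iter().copied().fold(f32::INFINITY, f32::min)
}

/// Main pass: keep a fragment only if it is not farther from the camera
/// than the depth stored during the prepass.
fn keep_fragment(fragment_depth: f32, stored_depth: f32) -> bool {
    stored_depth + DEPTH_EPSILON >= fragment_depth
}
```

&lt;p&gt;With reported depths of 4.0, 2.5 and 9.0, the prepass stores 2.5; in the main pass only the fragment at depth 2.5 survives and the other two are discarded.&lt;&#x2F;p&gt;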
&lt;p&gt;This setup also works for non-&lt;em&gt;IS&lt;&#x2F;em&gt;-based elements, like a mesh object (e.g., the purple torus).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;fragment-shader&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#fragment-shader&quot; aria-label=&quot;Anchor link for: fragment-shader&quot;&gt;Fragment shader&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The fragment shader will run on each frame.&lt;&#x2F;p&gt;
&lt;p&gt;It is responsible for reading the texture populated by the compute shader and applying &lt;em&gt;PBR&lt;&#x2F;em&gt; to it.&lt;&#x2F;p&gt;
&lt;div class=&quot;flex flex-row flex-wrap gap-2&quot;&gt;
&lt;div class=&quot;flex-1&quot;&gt;
&lt;figure&gt;
  
  
  
  
  
  
  &lt;a href=&quot;.&amp;#x2F;rendering_normal.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;rendering_normal.d64d5355e200be88.png&quot; srcset=&quot;.&amp;#x2F;rendering_normal.png 2x&quot; alt=&quot;_IS_ of a sphere with distortion. Rendering normals as colors on the surface.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;Normal rendering using the fragment shader.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;flex-1&quot;&gt;
&lt;figure&gt;
  
  
  
  
  
  
  &lt;a href=&quot;.&amp;#x2F;rendering_pbr.png&quot; rel=&quot;noopener&quot; target=&quot;_blank&quot;&gt;
    &lt;img loading=&quot;lazy&quot; src=&quot;https:&amp;#x2F;&amp;#x2F;www.farfa.dev&amp;#x2F;processed_images&amp;#x2F;rendering_pbr.588a6b9bea76f12c.png&quot; srcset=&quot;.&amp;#x2F;rendering_pbr.png 2x&quot; alt=&quot;_IS_ of a sphere with distortion. Uses PBR, showing a white surface with some contrast due to a directional source of light.&quot;  &#x2F;&gt;
  &lt;&#x2F;a&gt;
  
  
  &lt;figcaption&gt;PBR rendering using the fragment shader.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;div&gt;
&lt;h2 id=&quot;how-fast-is-it&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#how-fast-is-it&quot; aria-label=&quot;Anchor link for: how-fast-is-it&quot;&gt;How fast is it?&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Well, I don’t exactly know.&lt;br &#x2F;&gt;
I’m not exactly sure &lt;strong&gt;how&lt;&#x2F;strong&gt; to benchmark this type of project, other than running an example scene and moving the camera a lot.&lt;&#x2F;p&gt;
&lt;p&gt;So here I am doing exactly that. I disabled &lt;code&gt;VSync&lt;&#x2F;code&gt; so the &lt;em&gt;FPS&lt;&#x2F;em&gt; are not tied to my display’s refresh rate.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;&#x2F;strong&gt;: the &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#compute-shader-ray-marching&quot;&gt;compute shader&lt;&#x2F;a&gt; only runs when the scene changes or when the view changes.
That is why you can see the &lt;em&gt;FPS&lt;&#x2F;em&gt; going higher when there is no movement, compared to when I’m moving the camera.&lt;br &#x2F;&gt;
The &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#depth-prepass&quot;&gt;prepass&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;www.farfa.dev&#x2F;blog&#x2F;is-rendering&#x2F;#fragment-shader&quot;&gt;main pass shader&lt;&#x2F;a&gt; run every frame.&lt;&#x2F;p&gt;
&lt;p&gt;The window resolution is &lt;strong&gt;2560x1440&lt;&#x2F;strong&gt;, but the textures being rendered are constantly resized as the view changes.&lt;&#x2F;p&gt;
&lt;p&gt;The scene I’m using displays 4 spheres with surface distortions, each with a different rendering mode.&lt;br &#x2F;&gt;
There is also a cuboid &lt;em&gt;IS&lt;&#x2F;em&gt; of negligible size and a mesh-based torus.&lt;&#x2F;p&gt;
&lt;p&gt;The computer I’m running these benchmarks on is an Apple MacBook Pro M2 Max (2022) running macOS Tahoe 26.3.1.&lt;&#x2F;p&gt;
&lt;p&gt;The distorted spheres have a maximum diameter of 11 units.
I’ll be modifying the minimum feature size of these spheres to scale up the discretization.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  &lt;video   controls  loading=lazy  muted  loop &gt;
    &lt;source src=&quot;perf_showoff_medium.mp4&quot; type=&quot;video&amp;#x2F;mp4&quot;&gt;
  &lt;&#x2F;video&gt;
  
  &lt;figcaption&gt;Performance showcase using a &lt;code&gt;0.1&lt;&#x2F;code&gt; units minimum-feature size for the distorted spheres.&lt;br&#x2F;&gt;They take up ~4.266 MB of memory.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;The movements are smooth (&amp;gt;=60 FPS), even when zoomed in on multiple medium-resolution &lt;em&gt;IS&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
The edges where the &lt;em&gt;IS&lt;&#x2F;em&gt; intersect have a few artifacts due to the insufficient resolution in those regions.&lt;&#x2F;p&gt;
&lt;figure&gt;
  
  &lt;video   controls  loading=lazy  muted &gt;
    &lt;source src=&quot;perf_showoff_large.mp4&quot; type=&quot;video&amp;#x2F;mp4&quot;&gt;
  &lt;&#x2F;video&gt;
  
  &lt;figcaption&gt;Performance showcase using a &lt;code&gt;0.01&lt;&#x2F;code&gt; units minimum-feature size for the distorted spheres.&lt;br&#x2F;&gt;They take up ~470.632 MB of memory.&lt;&#x2F;figcaption&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;The movements stay at editable speeds (&amp;gt;=30 FPS), even when zoomed in on high-resolution &lt;em&gt;IS&lt;&#x2F;em&gt;.&lt;br &#x2F;&gt;
The edges where the &lt;em&gt;IS&lt;&#x2F;em&gt; intersect are now very sharp.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;closing-words&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#closing-words&quot; aria-label=&quot;Anchor link for: closing-words&quot;&gt;Closing words&lt;&#x2F;a&gt;&lt;&#x2F;h1&gt;
&lt;p&gt;This was a fun endeavor, albeit a tad fever-inducing, especially when trying to write long and complex &lt;em&gt;WGSL&lt;&#x2F;em&gt; shaders.&lt;&#x2F;p&gt;
&lt;p&gt;Feel free to browse and judge my woefully undocumented code on the &lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codeberg.org&#x2F;GrandChaman&#x2F;fsolid&#x2F;&quot;&gt;&lt;code&gt;fsolid&lt;&#x2F;code&gt;’s project page&lt;&#x2F;a&gt;, especially the relevant crate &lt;code&gt;bevy_is&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;things-to-try-next-maybe&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#things-to-try-next-maybe&quot; aria-label=&quot;Anchor link for: things-to-try-next-maybe&quot;&gt;Things to try next (maybe)&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;reduce-memory-footprint&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#reduce-memory-footprint&quot; aria-label=&quot;Anchor link for: reduce-memory-footprint&quot;&gt;Reduce memory footprint&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The memory footprint of these structures can get quite high, especially for a constrained environment such as a GPU.&lt;&#x2F;p&gt;
&lt;p&gt;A low-hanging optimization is splitting the &lt;code&gt;u64&lt;&#x2F;code&gt; used for the population mask into two &lt;code&gt;u32&lt;&#x2F;code&gt; to improve alignment and memory usage on the CPU side.&lt;&#x2F;p&gt;
&lt;p&gt;Another, more complex one, would be to transform the Sparse Voxel Tree into a Sparse Voxel Directed Acyclic Graph&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-high_res_sparse_voxel_DAG-1&quot;&gt;&lt;a href=&quot;#fn-high_res_sparse_voxel_DAG&quot;&gt;10&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;br &#x2F;&gt;
This could drastically reduce the size some discretized &lt;em&gt;IS&lt;&#x2F;em&gt; would take in memory.&lt;br &#x2F;&gt;
This reduction in size could in turn have a positive effect on the rendering
time by the GPU, as it could mean a better usage of cache and data locality.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;optimize-gpu-tracing&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#optimize-gpu-tracing&quot; aria-label=&quot;Anchor link for: optimize-gpu-tracing&quot;&gt;Optimize GPU tracing&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Of course, the main goal for this would be better user experience, unlocking support for lower power devices and lowering energy consumption.&lt;&#x2F;p&gt;
&lt;p&gt;There are some fairly low-hanging optimizations that are documented&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-64tree-4&quot;&gt;&lt;a href=&quot;#fn-64tree&quot;&gt;9&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; which I didn’t include, either because of time constraints or because I don’t understand them yet.&lt;&#x2F;p&gt;
&lt;p&gt;Others are more complex, such as the beam optimization&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-efficient_svo-1&quot;&gt;&lt;a href=&quot;#fn-efficient_svo&quot;&gt;11&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;better-parallelism&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#better-parallelism&quot; aria-label=&quot;Anchor link for: better-parallelism&quot;&gt;Better parallelism&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Treating the &lt;em&gt;tile grid&lt;&#x2F;em&gt; as a &lt;em&gt;tree&lt;&#x2F;em&gt; of &lt;em&gt;tiles&lt;&#x2F;em&gt; could reduce the parallelism overhead of large and sparse &lt;em&gt;tile grids&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Otherwise, adding some smarter parallelism to the &lt;em&gt;contree&lt;&#x2F;em&gt;’s loading could reduce the number of &lt;em&gt;tiles&lt;&#x2F;em&gt; in the &lt;em&gt;tile grid&lt;&#x2F;em&gt;, whilst keeping the same rendering resolution.&lt;br &#x2F;&gt;
This would also unlock some of the performance benefits of tracing &lt;em&gt;contrees&lt;&#x2F;em&gt; (i.e., skipping large empty regions in one step), which are currently limited by their small size.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;references&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#references&quot; aria-label=&quot;Anchor link for: references&quot;&gt;References&lt;&#x2F;a&gt;&lt;&#x2F;h1&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-mkeeter&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.mattkeeter.com&quot;&gt;Matthew Keeter’s blog&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-mkeeter-1&quot;&gt;↩&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-mkeeter-2&quot;&gt;↩2&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-mkeeter-3&quot;&gt;↩3&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-iq&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;iquilezles.org&quot;&gt;Inigo Quilez’s blog&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-iq-1&quot;&gt;↩&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-iq-2&quot;&gt;↩2&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-rvaillant&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;rodolphe-vaillant.fr&#x2F;entry&#x2F;86&#x2F;implicit-surface-aka-signed-distance-field-definition&quot;&gt;Rodolphe Vaillant’s article on Implicit surface&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-rvaillant-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-dc&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.cs.rice.edu&#x2F;~jwarren&#x2F;papers&#x2F;dualcontour.pdf&quot;&gt;Dual Contouring of Hermite Data (2002), Tao Ju, Frank Losasso, Scott Schaefer, Joe Warren, Rice University&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-dc-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-mc&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;dl.acm.org&#x2F;doi&#x2F;10.1145&#x2F;37401.37422&quot;&gt;Marching cubes: A high resolution 3D surface construction algorithm (1987), William E. Lorensen, Harvey E. Cline&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-mc-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-sphere_tracing&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.researchgate.net&#x2F;publication&#x2F;2792108_Sphere_Tracing_A_Geometric_Method_for_the_Antialiased_Ray_Tracing_of_Implicit_Surfaces&quot;&gt;Sphere Tracing: A Geometric Method for the Antialiased Ray Tracing of Implicit Surfaces (1995), John Hart&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-sphere_tracing-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-iq_bbox&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;iquilezles.org&#x2F;articles&#x2F;bboxes3d&#x2F;&quot;&gt;Inigo Quilez’s article on bounding boxes&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-iq_bbox-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-wgpu_alignment&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;WGSL&#x2F;#alignment-and-size&quot;&gt;Alignment and Size, WebGPU Shading Language, W3C&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-wgpu_alignment-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-64tree&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;dubiousconst282.github.io&#x2F;2024&#x2F;10&#x2F;03&#x2F;voxel-ray-tracing&#x2F;&quot;&gt;A guide to fast voxel ray tracing using sparse 64-trees, dubiousconst282&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-64tree-1&quot;&gt;↩&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-64tree-2&quot;&gt;↩2&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-64tree-3&quot;&gt;↩3&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-64tree-4&quot;&gt;↩4&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-high_res_sparse_voxel_DAG&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;dl.acm.org&#x2F;doi&#x2F;10.1145&#x2F;2461912.2462024&quot;&gt;High resolution sparse voxel DAGs (2013), Viktor Kämpe, Erik Sintorn, Ulf Assarsson&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-high_res_sparse_voxel_DAG-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-efficient_svo&quot;&gt;
&lt;p&gt;&lt;a class=&quot;subtle-link&quot; rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;research.nvidia.com&#x2F;sites&#x2F;default&#x2F;files&#x2F;pubs&#x2F;2010-02_Efficient-Sparse-Voxel&#x2F;laine2010tr1_paper.pdf&quot;&gt;Efficient Sparse Voxel Octrees (2010), Samuli Laine, Tero Karras&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-efficient_svo-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
        
    </entry>
</feed>
