Implicit Surface rendering for CAD

2026-05-07T00:00:00+00:00

<h1 id="preamble">Preamble</h1>
<p>At the end of spring 2025, I started trying out some CAD software, preferably FOSS, because I wanted to design some furniture for myself.</p>
<p>I didn’t find anything that really <em>clicked</em> for me at the time.
While wondering whether I could make my own (as one does, you know), I made some intriguing discoveries<sup>1</sup> <sup>2</sup>.</p>
<p>From this initial frustration and piqued curiosity emerged a drive to write simple CAD software: <code>fsolid</code>. It’s merely a prototype at this point, was never simple, and served more as a learning experience than as actual CAD software.</p>
<figure> <figcaption>Screenshot of <code>fsolid</code> (v0.6.0).</figcaption> </figure> <p>This was my first real experience with CAD, both as a user and as a developer. This means I’m not a domain expert; take what I wrote in this article with a grain of salt.</p> <p>This blog entry is about the tribulations of a feverish desire to render arbitrary implicit surfaces on screen at interactive speed, in the context of mechanical CAD software.</p> <p>It’s meant as a showcase of this part of my project, documenting why I made certain decisions, and maybe sprouting new ideas in someone else’s mind.</p> <h1 id="introduction">Introduction</h1> <p>Before starting our descent into this rabbit hole, one should have a basic understanding of what <strong>I</strong>mplicit <strong>S</strong>urfaces (<em>IS</em>) and (<strong>S</strong>igned) <strong>D</strong>istance <strong>F</strong>ields (<em>SDF</em>) are. Great explanations exist elsewhere<sup>1</sup> <sup>2</sup> <sup>3</sup>; the uninitiated reader is encouraged to consult them before diving any further.</p> <p>In the graphical CAD software I’ve used, <em>features</em> are added or modified by the user and the viewport reflects those changes. In some instances, the viewport allows the user to directly drag or sculpt <em>features</em> using a pointing device.</p> <figure> <figcaption>Screenshot of FreeCAD (v1.1.0), with a list of features on the left.</figcaption> </figure> <p>After a modification has been made, the resulting 3D object is rendered and can be inspected by panning, rotating, zooming, or a combination of those.</p> <p>Most CAD software uses <strong>E</strong>xplicit <strong>S</strong>urfaces (<em>ES</em>) (e.g., NURBS, meshes, etc…). Most processes downstream of authoring (e.g., visualization, <em>CAM</em>, FEM, etc…) require some form of <em>ES</em>. Rendering is usually done by converting the <em>ES</em> to a triangle mesh, as GPUs are heavily optimized for this.</p> <p>On the other hand, an <em>IS</em> describes a <strong>volume</strong>, and discretizing its <em>isosurface</em> is expensive. 
While algorithms exist to convert an <em>IS</em> to a triangle mesh<sup>4</sup> <sup>5</sup>, they can be slow, produce very large meshes, and may require a complete recomputation on partial changes. They are necessary to export a triangle mesh for downstream usage, but other strategies may be used for visualization<sup>6</sup>.</p> <figure> <figcaption>Distortion sphere using a sphere-tracing shader (<code>fsolid</code> (commit 96ad96646e)).</figcaption> </figure> <p>For this project, I wanted to try making an <em>IS</em> visualization that would scale with shape complexity and screen resolution, and which would happen in two phases: a loading phase, running on the CPU, then a rendering phase, running every frame on the GPU.</p> <p>The priority is smooth movement over up-to-date rendering. This means that while an <em>IS</em> is still loading, a previous version may be shown on screen.</p> <p>Optionally, I wanted the rendering to be some kind of Monte Carlo process, enhancing responsiveness at the cost of quality while the discretization is happening.</p> <figure> <figcaption>Loading of an <em>IS</em> using various degrees of precision, behaving like a Monte Carlo process</figcaption> </figure> <p>The technique described today focuses on a two-phase rendering: <strong>discretizing</strong> and <strong>tracing</strong>. It <strong>discretizes</strong> the <em>IS</em> into a data structure which will be <strong>traceable</strong> by the GPU.</p> <p>This project uses <code>fidget</code>, an excellent closed-form <em>IS</em> library. The rendering is done via <code>wgpu</code> and <code>bevy</code>.</p> <h1 id="sampling-the-void">Sampling the void</h1> <h2 id="bounding-volume">Bounding volume</h2> <p>An <em>IS</em> only describes a volume. We don’t yet know where that volume is in space. 
For that, we’ll use a bounding volume: an <strong>A</strong>xis-<strong>A</strong>ligned <strong>B</strong>ounding <strong>B</strong>ox (<em>AABB</em>).</p> <p>The <em>AABB</em> does multiple things for us:</p> <ol> <li>It limits the space we need to consider.</li> <li>It gives us a simple and efficient ray-slab intersection test, which will come in handy.</li> <li>It allows for fast z-ordering of non-overlapping regions.</li> </ol> <p>An <em>AABB</em> can be computed for common <em>IS</em><sup>7</sup>, and boolean operations can combine multiple <em>AABB</em> into the resulting <em>IS</em>’s <em>AABB</em>.</p> <p>For <em>unbounded</em> <em>IS</em>, one should be able to provide an <em>AABB</em> to limit the <em>IS</em> to a certain <em>domain</em>.</p> <h2 id="using-and-representing-voxels">Using and representing voxels</h2> <p>The <em>IS</em>’s <em>AABB</em> is a rectangular cuboid. We can trivially divide it into cubic voxels. Because the <em>isosurface</em> occupies only a small portion of the <em>AABB</em>’s space, we’ll use a data structure which encodes sparse data efficiently: a variant of the <strong>S</strong>parse <strong>V</strong>oxel <strong>O</strong>ctree (SVO).</p> <p>This variant is a <script type="math/tex">4\times4\times4</script> sparse voxel tree, which is sometimes called a <em>contree</em>.<br /> Each level of the <em>contree</em> divides its space by 4 on each axis, yielding 64 voxels.</p> <figure> <a href="contree_voxels.png" rel="noopener" target="_blank"> </a> <figcaption>Cubic space divided in cubic voxels (<script type="math/tex">4\times4\times4</script>).</figcaption> </figure><h3 id="loading-a-contree"><a class="zola-anchor" href="#loading-a-contree" aria-label="Anchor link for: loading-a-contree">Loading a <em>contree</em></a></h3> <p>The algorithm to load the <em>contree</em> needs to sample the <em>IS</em> in 3 ways:</p> <div class="flex flex-wrap gap-2 items-center"> <div class="flex-none"> <figure> <a href="contree_interval_test.svg" rel="noopener" target="_blank"> </a> <figcaption></figcaption> </figure> </div> <div class="flex-1 basis-1/2"> <p><strong>Interval</strong>: this is done to prune large regions of 
space as entirely empty, full, or ambiguous. This test is conservative: regions which are in fact empty might still be reported as ambiguous by the interval test.</p> <p>Regions that are entirely contained by the <em>IS</em> may also be pruned.</p> </div> </div> <div class="flex flex-wrap gap-2 items-center"> <div class="flex-none"> <figure> <a href="contree_distance_test.svg" rel="noopener" target="_blank"> </a> <figcaption></figcaption> </figure> </div> <div class="flex-1 basis-1/2"> <p><strong>Distance</strong>: this will be useful to detect where the <em>isosurface</em> is and how far it is from the sampled points.</p> <p><a class="subtle-link" rel="external" href="https://github.com/mkeeter/fidget"><code>fidget</code></a> allows sampling multiple points at once, to amortize the overhead of running the execution tape.</p> </div> </div> <div class="flex flex-wrap gap-2 items-center"> <div class="flex-none"> <figure> <a href="contree_gradient_test.svg" rel="noopener" target="_blank"> </a> <figcaption></figcaption> </figure> </div> <div class="flex-1 basis-1/2"> <p><strong>Gradient</strong>: this yields the gradient of the <em>IS</em> at a given point, which we’ll use as a surface <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Normal_(geometry)">normal</a>.</p> <p>It’s also possible to get a surface normal from distance sampling alone (i.e., using <a class="subtle-link" rel="external" href="https://iquilezles.org/articles/normalsSDF/">forward or central differences</a>).</p> </div> </div> <p><a class="subtle-link" rel="external" href="https://github.com/mkeeter/fidget"><code>fidget</code></a> provides these 3 features.<br /> It also provides great performance<sup class="footnote-reference" id="fr-mkeeter-3"><a href="#fn-mkeeter">1</a></sup>; I encourage readers to check it out (e.g., <a class="subtle-link" rel="external" href="https://docs.rs/fidget-jit/0.4.3/fidget_jit/">JIT</a>, tape simplification, heap buffer re-use, etc…).</p> <h4 
id="stop-condition"><a class="zola-anchor" href="#stop-condition" aria-label="Anchor link for: stop-condition">Stop condition</a></h4> <p>When loading a <em>contree</em>, one needs to know when to stop subdividing.<br /> We can derive a stop condition from the minimum feature size we’d like to be able to detect: from the <em>AABB</em> size and this minimum feature size, we can calculate the <em>contree</em> <strong>depth</strong> we should reach.</p> <h4 id="recursive-interval-testing"><a class="zola-anchor" href="#recursive-interval-testing" aria-label="Anchor link for: recursive-interval-testing">Recursive interval testing</a></h4> <p>We recursively divide the space we consider into 64 voxels, performing interval tests and only considering the voxels with an ambiguous interval result.<br /> The recursion stops when we reach the desired depth.</p> <h4 id="detecting-isosurface-crossing"><a class="zola-anchor" href="#detecting-isosurface-crossing" aria-label="Anchor link for: detecting-isosurface-crossing">Detecting <em>isosurface</em> crossing</a></h4> <p>Upon reaching the desired subdivision level, we can start sampling distances to find the voxels which contain the <em>isosurface</em>.</p> <p>For each <strong>leaf node</strong>, we want to sample its 64 voxels, but we are interested in the <strong>vertices</strong> of the voxels rather than their centers.<br /> Effectively, we’re interested in the <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Dual_graph">dual graph</a> of the voxel grid.</p> <p>Each voxel has 8 vertices; naively, for 64 voxels, that’s 512 points we need to sample.<br /> However, the voxels are tightly packed in our representation and they share vertices with their neighbors.<br /> Along each axis, a row of 4 voxels only has 5 unique vertices. 
<script type="math/tex">5^3=125</script> unique vertices in total, instead of 512.<br /> To amortize the execution overhead, we can sample those 125 points in a single tape execution.</p> <figure> <a href="sign_chance_detection.svg" rel="noopener" target="_blank"> </a> <figcaption>2D schema of 16 voxels, an <em>IS</em> (in purple) and vertices (in grey and blue).<br/>Populated voxels are highlighted.<br/>The vertices we’re interested in for our <em>contree</em> are in blue.</figcaption> </figure> <p>We can then iterate over the 64 voxels, checking whether the sampled distance changes sign across their 8 vertices.<br /> Using this, we create a list of populated voxels and keep the sampled distances for the vertices of those populated voxels.</p> <p>Once the 64 voxels are checked, we calculate the gradient for each of the vertices that we kept around.</p> <h4 id="summary"><a class="zola-anchor" href="#summary" aria-label="Anchor link for: summary">Summary</a></h4> <p>We now have all the information we need to load a <em>contree</em>:</p> <ol> <li>Which voxels are populated, and their position and depth in our <em>contree</em>.</li> <li>A distance to the surface for each of their vertices.</li> <li>A gradient for each of their vertices.</li> </ol> <p>The next step is to <strong>serialize</strong> this structure so it’s usable for tracing.</p> <h3 id="serializing-the-contree"><a class="zola-anchor" href="#serializing-the-contree" aria-label="Anchor link for: serializing-the-contree">Serializing the <em>contree</em></a></h3> <p>I chose a serialization format which I could build while loading the <em>contree</em>.</p> <p>For simplicity, I also wanted the same representation on the CPU and GPU side.</p> <h4 id="contrees-basics"><a class="zola-anchor" href="#contrees-basics" aria-label="Anchor link for: contrees-basics"><em>contrees</em> basics</a></h4> <p>Each node of the <em>contree</em> can address 64 voxels, its population mask fitting exactly in 64 bits.<br /> Additionally, a header is necessary to hold some 
metadata about the node: a pointer, and a flag to differentiate <strong>non-leaf</strong> from <strong>leaf</strong> nodes.<br /> For <strong>non-leaf nodes</strong>, the <em>flag</em> is <strong>unset</strong>.</p> <p>The semantics of the pointer change depending on the type of node.</p> <p>The population mask only encodes which voxels are populated; their associated data is stored elsewhere.</p> <p>The <em>contree</em> will be encoded by the CPU into main memory and uploaded to the GPU for tracing.<br /> GPUs have fairly strict requirements and limitations in terms of memory alignment, maximum buffer size, etc…<br /> To account for this, the <em>contree</em> will be built in a <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/AoS_and_SoA">Structure of Arrays (SoA)</a> style, using <strong>32-bit</strong> pointers to address the arrays (64-bit support is uncommon on GPUs).</p> <h4 id="array-of-structures-of-arrays"><a class="zola-anchor" href="#array-of-structures-of-arrays" aria-label="Anchor link for: array-of-structures-of-arrays">Array of Structures of Arrays</a></h4> <p>The <em>contree</em> will need to be uploaded to and consumed by the GPU. 
To make access efficient and cache-friendly, we can turn it into an <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/AoS_and_SoA#Array_of_structures_of_arrays"><strong>A</strong>rray of <strong>S</strong>tructures of <strong>A</strong>rrays</a> (AoSoA).</p> <p>To do that, we’ll make heavy use of 32-bit pointers from one array of our <em>contree</em> into another.</p> <p>The <em>contree</em> is split into 4 buffers:</p> <h5 id="nodes"><a class="zola-anchor" href="#nodes" aria-label="Anchor link for: nodes">Nodes</a></h5> <p>These are the leaf and non-leaf nodes of the <em>contree</em>, already mentioned in a <a href="https://www.farfa.dev/blog/is-rendering/#contrees-basics">previous section</a>.</p> <figure> <a href="contree_node_packet_diagram.svg" rel="noopener" target="_blank"> </a> <figcaption>In-memory layout of a <em>contree</em> node.</figcaption> </figure> <p>They may look something like this in code:</p> <pre class="giallo" style="color: #CDD6F4; background-color: #1E1E2E;"><code data-lang="rust"><span class="giallo-l"><span style="color: #CBA6F7;">struct</span><span style="color: #F9E2AF;font-style: italic;"> Node</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span> header</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u32</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> pop_mask</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u64</span><span> </span></span> <span class="giallo-l"><span style="color: #9399B2;">}</span></span></code></pre> <p>This structure holds <strong>12 bytes</strong> of data.<br /> Due to alignment, it may span <strong>16 bytes</strong> instead (cf. <a class="subtle-link" rel="external" href="https://doc.rust-lang.org/reference/type-layout.html#r-layout.primitive"><code>rust</code> alignment of primitive types</a>). 
Alignment is platform-specific, so YMMV.</p> <p>The population mask is kept as a <code>u64</code> for now, as we benefit from methods exposed by the <code>u64</code> type (e.g., <a class="subtle-link" rel="external" href="https://doc.rust-lang.org/std/primitive.u64.html#method.count_ones"><code>u64::count_ones</code></a>). It would be interesting to measure the performance impact of storing it as two <code>u32</code> instead, as each node would then shrink from 16 to 12 bytes, decreasing the memory footprint of this buffer by 25%.</p> <p><code>wgpu</code> doesn’t support 64-bit types by default; to upload this structure to the GPU, we can convert it to 3 <code>u32</code> instead.</p> <pre class="giallo" style="color: #CDD6F4; background-color: #1E1E2E;"><code data-lang="wgsl"><span class="giallo-l"><span style="color: #CBA6F7;">struct</span><span style="color: #F9E2AF;font-style: italic;"> GPUNode</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span> header: </span><span style="color: #CBA6F7;">u32</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> pop_mask_hi: </span><span style="color: #CBA6F7;">u32</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> pop_mask_low: </span><span style="color: #CBA6F7;">u32</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span style="color: #9399B2;">}</span></span></code></pre> <p>This new form is compatible with the alignment that <code>wgpu</code> requires, as <code>u32</code> types are aligned on a 4-byte boundary<sup class="footnote-reference" id="fr-wgpu_alignment-1"><a href="#fn-wgpu_alignment">8</a></sup>.</p> <h6 id="leaf-nodes"><a class="zola-anchor" href="#leaf-nodes" aria-label="Anchor link for: leaf-nodes">Leaf nodes</a></h6> <p>For <strong>leaf nodes</strong>, the <em>flag</em> is <strong>set</strong> in the header.<br /> The pointer is an <strong>absolute</strong> offset to the first <em>leaf indirection</em> in the corresponding buffer (cf. 
<a href="https://www.farfa.dev/blog/is-rendering/#leaf-indirections">the next section on the topic</a>).</p> <p>The <em>leaf indirections</em> are placed in population mask order in the <em>leaf indirection</em> buffer.</p> <h6 id="non-leaf-nodes"><a class="zola-anchor" href="#non-leaf-nodes" aria-label="Anchor link for: non-leaf-nodes">Non-leaf nodes</a></h6> <p>For <strong>non-leaf nodes</strong>, the <em>flag</em> is <strong>unset</strong> in the header.<br /> The pointer is an <strong>absolute</strong> offset to the first child in the node buffer.</p> <figure> <a href="contree_children_offset_diagram.svg" rel="noopener" target="_blank"> </a> <figcaption>Using the population mask to find a child’s offset in the node buffer.</figcaption> </figure> <p>Each set bit in the <em>population mask</em> represents a populated child.<br /> The children are densely packed in population mask order in the node buffer, so a child’s offset is the node’s pointer plus the number of set bits below that child’s bit in the <em>population mask</em>.</p> <figure> <a href="buffers-figure-1.svg" rel="noopener" target="_blank"> </a> <figcaption>Schema of a node buffer</figcaption> </figure><h5 id="leaf-indirections"><a class="zola-anchor" href="#leaf-indirections" aria-label="Anchor link for: leaf-indirections">Leaf indirections</a></h5> <p>The leaf indirections are made up of two offsets into the two remaining buffers:</p> <ol> <li>An <strong>absolute</strong> offset into the leaf data buffer.</li> <li>A <strong>partial</strong> offset into the vertex data buffer (see the <a href="https://www.farfa.dev/blog/is-rendering/#leaf-data">leaf data</a> section to get the absolute offset).</li> </ol> <p>In both <code>rust</code> and <code>wgsl</code>, this structure looks like this:</p> <pre class="giallo" style="color: #CDD6F4; background-color: #1E1E2E;"><code data-lang="rust"><span class="giallo-l"><span style="color: #CBA6F7;">struct</span><span style="color: #F9E2AF;font-style: italic;"> LeafIndirection</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span> 
leaf_data_offset</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u32</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> vertex_buffer_offset</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u32</span><span> </span></span> <span class="giallo-l"><span style="color: #9399B2;">}</span></span></code></pre> <p>This structure should have no padding and spans <strong>8 bytes</strong> per entry.</p> <figure> <a href="buffers-figure-2.svg" rel="noopener" target="_blank"> </a> <figcaption>Schema of the node and leaf indirection buffers, and their interactions</figcaption> </figure> <p>The <em>pointer</em> of <span style="color: var(--color-pine)"><strong>leaf <em>nodes</em></strong></span> is the <strong>absolute offset</strong> into the <em>leaf indirection buffer</em>.</p> <h5 id="leaf-data"><a class="zola-anchor" href="#leaf-data" aria-label="Anchor link for: leaf-data">Leaf data</a></h5> <p>For each populated voxel, the <em>leaf data</em> buffer contains 8 <strong>partial offsets</strong> into the <em>vertex data</em> buffer, one for each of the voxel’s vertices.</p> <p>This structure has the same layout on the CPU and GPU side, has no padding, and spans <strong>8 bytes</strong> (one byte per vertex offset).</p> <pre class="giallo" style="color: #CDD6F4; background-color: #1E1E2E;"><code data-lang="rust"><span class="giallo-l"><span style="color: #CBA6F7;">struct</span><span style="color: #F9E2AF;font-style: italic;"> LeafData</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span> c000</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c001</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c010</span><span style="color: #94E2D5;">:</span><span style="color: 
#CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c011</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c100</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c101</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c110</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> c111</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span style="color: #9399B2;">}</span></span></code></pre> <p>Each partial offset is in the range <script type="math/tex">[0, 125)</script>, 125 being the number of unique vertices per leaf node, so it fits in a <code>u8</code>.</p> <figure> <a href="buffers-figure-3.svg" rel="noopener" target="_blank"> </a> <figcaption>Schema of the node, leaf indirection and leaf data buffers, and their interactions</figcaption> </figure> <p>Using the <span style="color: var(--color-iris)"><em>leaf data offset</em></span> from the <em>leaf indirection</em> buffer, one can find where to start reading in the <em>leaf data</em> buffer for this leaf node.<br /> For each leaf node with <script type="math/tex">N</script> <strong>populated children</strong>, one can expect to find <script type="math/tex">N</script> <code>LeafData</code> entries at the <em>leaf data offset</em>.</p> <p>Because we deduplicated shared vertices when loading the <em>contree</em>, the same <span style="color: var(--color-highlight-high)">offset</span> may appear in the <code>LeafData</code> of multiple voxels.</p> <h5 id="vertex-data"><a class="zola-anchor" 
href="#vertex-data" aria-label="Anchor link for: vertex-data">Vertex data</a></h5> <p>Each vertex data entry contains a distance and a <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Normal_(geometry)">normal</a>.</p> <p>This structure has no padding and spans <strong>16 bytes</strong>: in <code>wgsl</code>, a <code>vec3f</code> is 16-byte aligned but only 12 bytes in size, leaving exactly 4 bytes for the <code>f32</code> distance.</p> <pre class="giallo" style="color: #CDD6F4; background-color: #1E1E2E;"><code data-lang="wgsl"><span class="giallo-l"><span style="color: #CBA6F7;">struct</span><span style="color: #F9E2AF;font-style: italic;"> VertexData</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span> normals: </span><span style="color: #CBA6F7;">vec3f</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span> distance: </span><span style="color: #CBA6F7;">f32</span><span style="color: #9399B2;">,</span></span> <span class="giallo-l"><span style="color: #9399B2;">}</span></span></code></pre><figure> <a href="buffers-figure-4.svg" rel="noopener" target="_blank"> </a> <figcaption>Schema of the node, leaf indirection, leaf data and vertex data buffers, and their interactions</figcaption> </figure> <p>By adding together the <span style="color: var(--color-foam)"><strong>vertex buffer offset</strong></span> from the <em>leaf indirection</em> buffer and the <span style="color: var(--color-highlight-high)">offset</span> of a given vertex in the <em>leaf data</em> buffer, we get the <strong>absolute offset</strong> into the <em>vertex data</em> buffer.</p> <h4 id="space-addressing"><a class="zola-anchor" href="#space-addressing" aria-label="Anchor link for: space-addressing">Space addressing</a></h4> <p>At each level of the <em>contree</em>, the voxels can be addressed using a <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Z-order_curve">Morton Code</a> encoding.</p> <pre class="giallo" style="color: #CDD6F4; background-color: #1E1E2E;"><code data-lang="rust"><span class="giallo-l"><span 
style="color: #CBA6F7;">pub struct</span><span style="color: #F9E2AF;font-style: italic;"> MortonCode</span><span style="color: #9399B2;">(</span><span style="color: #CBA6F7;">pub u8</span><span style="color: #9399B2;">);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span style="color: #CBA6F7;">impl</span><span style="color: #F9E2AF;font-style: italic;"> MortonCode</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span style="color: #CBA6F7;"> pub const fn</span><span style="color: #89B4FA;font-style: italic;"> new</span><span style="color: #9399B2;">(</span><span style="color: #EBA0AC;">x</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span><span style="color: #EBA0AC;"> y</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">,</span><span style="color: #EBA0AC;"> z</span><span style="color: #94E2D5;">:</span><span style="color: #CBA6F7;"> u8</span><span style="color: #9399B2;">)</span><span style="color: #94E2D5;"> -></span><span style="color: #F38BA8;"> Self</span><span style="color: #9399B2;"> {</span></span> <span class="giallo-l"><span style="color: #89B4FA;font-style: italic;"> debug_assert!</span><span style="color: #9399B2;">(</span><span>x</span><span style="color: #94E2D5;"> <</span><span style="color: #FAB387;"> 4</span><span style="color: #9399B2;">);</span></span> <span class="giallo-l"><span style="color: #89B4FA;font-style: italic;"> debug_assert!</span><span style="color: #9399B2;">(</span><span>y</span><span style="color: #94E2D5;"> <</span><span style="color: #FAB387;"> 4</span><span style="color: #9399B2;">);</span></span> <span class="giallo-l"><span style="color: #89B4FA;font-style: italic;"> debug_assert!</span><span style="color: #9399B2;">(</span><span>z</span><span style="color: #94E2D5;"> <</span><span style="color: #FAB387;"> 4</span><span style="color: 
#9399B2;">);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span style="color: #F38BA8;"> Self</span><span style="color: #9399B2;">(</span><span>z</span><span style="color: #94E2D5;"> <<</span><span style="color: #FAB387;"> 4</span><span style="color: #94E2D5;"> |</span><span> y</span><span style="color: #94E2D5;"> <<</span><span style="color: #FAB387;"> 2</span><span style="color: #94E2D5;"> |</span><span> x</span><span style="color: #9399B2;">)</span></span> <span class="giallo-l"><span style="color: #9399B2;"> }</span></span> <span class="giallo-l"><span style="color: #9399B2;">}</span></span></code></pre> <p>The value of a <code>MortonCode</code> is in the range <script type="math/tex">[0, 64)</script>, and it is used to set the corresponding population bit in the node’s population mask.<br /> Strictly speaking, <code>MortonCode::new</code> packs the three 2-bit coordinates side by side rather than interleaving their bits as a textbook Morton code would; both schemes give each of the 64 voxels a unique index.</p> <p>With a list of <code>D</code> <code>MortonCode</code> elements, where <code>D</code> is the depth of the <em>contree</em>, one can address a particular voxel.<br /> I’m calling such a list a <em>Morton Path</em>.</p> <p>To get the space coordinates of a given voxel, one needs the bounding box of the <em>contree</em>.<br /> You can trivially reverse the Morton Code encoding to get back the <code>x</code>, <code>y</code> and <code>z</code> components.<br /> From these, you can calculate an offset from the bounding box minimum.</p> <p>This encoding can also leverage the way <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Single-precision_floating-point_format">IEEE754 floats</a> are represented.</p> <figure> <a href="float_bin_repr.svg" rel="noopener" target="_blank"> </a> <figcaption>Binary representation of a 32-bit floating-point number.<br/>By <a href="https://en.wikipedia.org/wiki/user:fresheneesz" title="en:user:fresheneesz" rel="external">fresheneesz</a> at the English Wikipedia project, <a href="http://creativecommons.org/licenses/by-sa/3.0/" title="creative commons attribution-share alike 3.0">CC BY-SA 3.0</a>, <a 
href="https://commons.wikimedia.org/w/index.php?curid=3357169" rel="external" >link</a></figcaption> </figure> <p>In the range <script type="math/tex">[1.0, 2.0)</script>, every 32-bit float has the same exponent, so its 23 least significant bits (the mantissa) can be used to encode one axis of a <em>Morton Path</em> (one float per axis).<br /> Splitting those 23 bits into chunks of 2 gives us 11 <em>Morton Path</em> entries and 1 bit to spare.</p> <figure> <a href="float_depth_repr.svg" rel="noopener" target="_blank"> </a> <figcaption>Remix of <a href='https://en.wikipedia.org/wiki/user:fresheneesz' title='en:user:fresheneesz' rel='external'>fresheneesz</a>’s <a class="subtle-link" rel="external" href="https://commons.wikimedia.org/w/index.php?curid=3357169">original schema</a> showing the <em>Morton Path</em> encoding in a 32-bit floating-point number.</figcaption> </figure> <p>This idea is quite elegant, though not my own<sup class="footnote-reference" id="fr-64tree-1"><a href="#fn-64tree">9</a></sup>.</p> <p>This technique may be used to convert voxel positions in the <em>contree</em> to and from floats.</p> <figure> <a href="f32-bits-1.5.png" rel="noopener" target="_blank"> </a> <figcaption>Bit representation of a 32-bit floating-point number with a value of 1.5. <a class="subtle-link" rel="external" href="https://www.h-schmidt.net/FloatConverter/IEEE754.html">Source website</a><br/><script type="math/tex">1.0 + 2 \times 4^{-1} = 1.5</script></figcaption> </figure><figure> <a href="f32-bits-1.6875.png" rel="noopener" target="_blank"> </a> <figcaption>Bit representation of a 32-bit floating-point number with a value of 1.6875. 
<a class="subtle-link" rel="external" href="https://www.h-schmidt.net/FloatConverter/IEEE754.html">Source website</a><br/><script type="math/tex">1.0 + 2 \times 4^{-1} + 3 \times 4^{-2} = 1.6875</script></figcaption> </figure><h4 id="maximum-depth"><a class="zola-anchor" href="#maximum-depth" aria-label="Anchor link for: maximum-depth">Maximum depth</a></h4> <p>As we saw in the <a href="https://www.farfa.dev/blog/is-rendering/#space-addressing">Space addressing section</a>, we can only address <em>contrees</em> up to a depth of 11.</p> <p>A tree spanning a region of size <script type="math/tex">x</script> would be able to represent, at best, a voxel of size <script type="math/tex">4^{-12}x</script>.</p> <h3 id="tile-grid"><a class="zola-anchor" href="#tile-grid" aria-label="Anchor link for: tile-grid">Tile Grid</a></h3> <h4 id="having-a-shorter-forest-instead-of-a-tall-tree"><a class="zola-anchor" href="#having-a-shorter-forest-instead-of-a-tall-tree" aria-label="Anchor link for: having-a-shorter-forest-instead-of-a-tall-tree">Having a shorter forest instead of a tall tree</a></h4> <p><em>contrees</em> have their challenges.</p> <p>Even though loading a <em>contree</em> <em>could</em> be done in parallel, in my experience doing so was much slower than a single-threaded process, due to poor cache locality and massive parallelism overhead.</p> <p>The memory footprint of <em>contrees</em> also scales with depth, each additional level requiring more intermediate <em>nodes</em>.<br /> This is important for tracing, as memory streaming can easily be a bottleneck in GPU applications.</p> <p>Modifying a <em>contree</em> is a non-trivial operation with the current in-memory and serialization format.<br /> Optimizing for modification could mean changing this format and requiring an actual serialization step.</p> <p>It’s unclear whether faster modifications would offset the new serialization step, and whether this change would be a net positive in terms of execution 
time and memory footprint.</p> <p>However, some of those challenges can be addressed by adding another layer of indirection: <em>tiles</em>.<br /> Using <em>tiles</em> to discretize our <em>IS</em> means using more than a single <em>contree</em>, by <strong>tiling</strong> our <em>IS</em>’s region beforehand, and having at most one <em>contree</em> per tile.</p> <h4 id="loading-a-tile-grid"><a class="zola-anchor" href="#loading-a-tile-grid" aria-label="Anchor link for: loading-a-tile-grid">Loading a tile grid</a></h4> <p>Instead of discretizing the <em>IS</em>’s space into one <em>contree</em>, we can first divide this space into a grid of tiles (with the number of tiles per axis rounded up), stored in a 3-dimensional array.<br /> Each tile is a <strong>cubic</strong> region of space that is either empty or contains a <em>contree</em>.</p> <p>For each <em>tile</em>, we can evaluate the <em>IS</em> as an interval over the tile’s region to determine whether the <em>isosurface</em> might pass through it. Empty <em>tiles</em> are ignored, and a <em>contree</em> is loaded for ambiguous <em>tiles</em>.</p> <p>This operation can be nicely parallelized.</p> <p>I used a benchmark that discretizes a region of <script type="math/tex">10</script> units across.<br /> Using 1 tile with a <em>contree</em> depth of 4 is equivalent to using 16 tiles per axis with a <em>contree</em> depth of 2.</p> <script type="math/tex;mode=display">\frac{10}{4^5} = \frac{10}{16 \cdot 4^3}</script> <p><strong>Note</strong>: a <em>contree</em> of depth 0 still divides each axis by 4; a <em>contree</em> of depth <script type="math/tex">L</script> divides each axis by <script type="math/tex">4^{L+1}</script>.</p> <div class="flex flex-row flex-wrap"> <div class="flex-1"> <figure> <a href="t1_d4_pdf_small.svg" rel="noopener" target="_blank"> </a> <figcaption>Single-threaded loading of tile grids of <em>contrees</em> using <script type="math/tex">1</script> tile and a depth of <script type="math/tex">4</script>.<br/>Median time: 48.38ms.</figcaption> </figure> </div> <div 
class="flex-1"> <figure> <a href="t16_d2_pdf_small.svg" rel="noopener" target="_blank"> </a> <figcaption>Parallel loading of tile grids of <em>contrees</em> using <script type="math/tex">16^3</script> tiles and a depth of <script type="math/tex">2</script>.<br/>Median time: 1.21ms.</figcaption> </figure> </div> </div> <p>Unsurprisingly, parallelizing this process reduces the overall runtime considerably: the median time drops from 48.38ms to 1.21ms.</p> <h4 id="serializing-a-tile-grid"><a class="zola-anchor" href="#serializing-a-tile-grid" aria-label="Anchor link for: serializing-a-tile-grid">Serializing a tile grid</a></h4> <p>Serialization of the <em>tile grid</em> is very similar to the technique used for the <em>contrees</em>; we just add one more layer of indirection using two new buffers: the <em>tile</em> buffer and the <em>tile indirection</em> buffer.</p> <p>The <em>tile</em> buffer is a list of <code>u32</code>, one per <em>tile</em>.<br /> A special value of <code>u32::MAX</code> signals that the <em>tile</em> is empty; otherwise, the value is a pointer into the <em>tile indirection</em> buffer.</p> <figure> <a href="buffers-figure-5.svg" rel="noopener" target="_blank"> </a> <figcaption>Schema of the tile and tile indirection buffers, and of their interactions with each other and with the <em>contree</em> buffers.</figcaption> </figure> <p>The <strong>absolute</strong> offsets we used in the <em>contrees</em> are now <strong>partial</strong> offsets.<br /> One needs to add the <em>tile indirection</em> offset to get the starting index for a given <em>tile</em>.</p> <h4 id="choosing-the-right-tile-size"><a class="zola-anchor" href="#choosing-the-right-tile-size" aria-label="Anchor link for: choosing-the-right-tile-size">Choosing the right tile size</a></h4> <p>Choosing the tile size is a balancing act between the size of the 3D array that will contain the <em>tiles</em>, the depth of the <em>contrees</em>, and the minimum feature size that can be discretized.</p> <p>Also notably, the number of <em>tiles</em> will 
directly affect the number of tasks to parallelize. Having too many <em>tiles</em> increases the relative parallelization overhead of each task.<br /> On the other hand, having too few <em>tiles</em> causes poor load distribution and possibly deeper <em>contrees</em>.</p> <p>From my crude benchmarks, an empirical maximum number of <em>tiles</em> is <script type="math/tex">32^3</script>.</p> <p>Looking at the benchmarks, we can see that scaling the number of parallel tasks with the number of <em>tiles</em> to discretize quickly reaches its limit:</p> <div class="flex flex-row flex-wrap gap-2"> <div class="flex-1"> <figure> <a href="t16_d2_pdf_small.svg" rel="noopener" target="_blank"> </a> <figcaption>Parallel loading of tile grids of <em>contrees</em> using <script type="math/tex">16^3</script> tiles and a depth of <script type="math/tex">2</script>.<br/>Median time: 1.21ms.</figcaption> </figure> </div> <div class="flex-1"> <figure> <a href="t64_d1_pdf_small.svg" rel="noopener" target="_blank"> </a> <figcaption>Parallel loading of tile grids of <em>contrees</em> using <script type="math/tex">64^3</script> tiles and a depth of <script type="math/tex">1</script>.<br/>Median time: 53.47ms.</figcaption> </figure> </div> </div> <p>The benchmarks discretize a region of <script type="math/tex">10</script> units across.<br /> Using 16 tiles per axis with a <em>contree</em> depth of 2 is equivalent to using 64 tiles per axis with a <em>contree</em> depth of 1.</p> <script type="math/tex;mode=display">\frac{10}{16 \cdot 4^3} = \frac{10}{64 \cdot 4^2}</script> <p>Even though both benchmarks produce the same resolution, there is a <script type="math/tex">4319\%</script> increase in median runtime (from 1.21ms to 53.47ms) when using the larger number of tiles.<br /> This is mainly due to the number of tasks to parallelize, the overhead that goes with it, and the added memory pressure of managing a larger 3D array for the tiles.</p> <h1 id="tracing-the-void"><a class="zola-anchor" 
href="#tracing-the-void" aria-label="Anchor link for: tracing-the-void">Tracing the void</a></h1> <p>Now that we’ve <a href="https://www.farfa.dev/blog/is-rendering/#sampling-the-void">sampled the void</a>, we should briefly talk about how to use this on the GPU to show some nice images.</p> <p>This section is intentionally brief, as it relies on well-known algorithms that are better explained elsewhere<sup class="footnote-reference" id="fr-64tree-2"><a href="#fn-64tree">9</a></sup>.</p> <p>For this project, I’ve been using the <a class="subtle-link" rel="external" href="https://bevy.org/"><code>bevy</code></a> game engine.</p> <p>The plan is:</p> <ul> <li><a href="https://www.farfa.dev/blog/is-rendering/#uploading-data-to-the-gpu">Upload</a> the <em>IS</em> and camera-related data to the GPU.</li> <li>Spawn a proxy mesh for each <em>IS</em> in the scene. By using this proxy mesh, I can benefit from the CPU culling built into <code>bevy</code>.</li> <li>For each <em>IS</em> in view, allocate a texture (depth and normal).</li> <li>If the <em>IS</em> has changed, if the camera has moved, or if the texture has never been rendered, schedule a compute shader to trace the <em>IS</em> and populate the texture.</li> <li>Perform a depth prepass.</li> <li>Using the texture and a <em>fragment shader</em>, apply <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Physically_based_rendering"><strong>P</strong>hysically <strong>B</strong>ased <strong>R</strong>endering</a> to the populated pixels.</li> </ul> <h2 id="uploading-data-to-the-gpu"><a class="zola-anchor" href="#uploading-data-to-the-gpu" aria-label="Anchor link for: uploading-data-to-the-gpu">Uploading data to the GPU</a></h2> <p>The data structure we’ve been building (cf. 
<a href="https://www.farfa.dev/blog/is-rendering/#serializing-a-tile-grid">Serializing the <em>tile grid</em></a>) can be uploaded to GPU buffers nearly as-is.</p> <p>Some additional data should also be uploaded in <a class="subtle-link" rel="external" href="https://webgpufundamentals.org/webgpu/lessons/webgpu-uniforms.html">uniform</a> buffers, such as:</p> <ul> <li>Camera matrices.</li> <li><em>IS</em> metadata (transform, <em>AABB</em>).</li> <li><em>Tile grid</em> metadata (tile size, grid size).</li> </ul> <p>An additional optimization is to cap the number of bytes transferred per frame, spreading the upload over multiple frames to avoid stuttering.</p> <h2 id="compute-shader-ray-marching"><a class="zola-anchor" href="#compute-shader-ray-marching" aria-label="Anchor link for: compute-shader-ray-marching">Compute shader ray-marching</a></h2> <p>To fill the texture we’ve allocated, we can use a compute shader, which offers finer control over parallelism than <em>fragment</em> shaders do.</p> <p>We can divide the texture into a grid of <em>texture tiles</em> and dispatch an appropriate number of workgroups.</p> <p>For each pixel, this compute shader shoots a conceptual ray from the camera into the scene.</p> <p>If the ray intersects the <em>IS</em>’s <em>AABB</em>, we can start <a href="https://www.farfa.dev/blog/is-rendering/#marching-the-tiles">marching the <em>tiles</em></a>.<br /> If the ray misses the <em>AABB</em>, we can exit early.</p> <h3 id="marching-the-tiles"><a class="zola-anchor" href="#marching-the-tiles" aria-label="Anchor link for: marching-the-tiles">Marching the <em>tiles</em></a></h3> <p>After the ray has intersected the <em>AABB</em>, we can determine which <em>tile</em> the ray is currently in.</p> <p>Using the <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Digital_differential_analyzer_(graphics_algorithm)"><strong>D</strong>igital <strong>D</strong>ifferential <strong>A</strong>nalyzer</a> (DDA) algorithm, we can 
step through the <em>tile grid</em> until we hit a populated <em>tile</em> or exit the grid.</p> <p>Upon hitting a populated <em>tile</em>, we <a href="https://www.farfa.dev/blog/is-rendering/#marching-the-contrees">march the inner <em>contree</em></a>.</p> <figure> <a href="viridis_step_tiles.png" rel="noopener" target="_blank"> </a> <figcaption>Visualization of the steps through <em>tiles</em> required to traverse the cubic <em>AABB</em>, using the <a class="subtle-link" rel="external" href="https://www.shadertoy.com/view/XtGGzG">viridis quintic approximation</a>.<br/>The range of steps is <script type="math/tex">[0, N\sqrt{3})</script>, where <script type="math/tex">N</script> is the number of tiles per axis.</figcaption> </figure><h3 id="marching-the-contrees"><a class="zola-anchor" href="#marching-the-contrees" aria-label="Anchor link for: marching-the-contrees">Marching the <em>contrees</em></a></h3> <p>Marching the <em>contree</em> uses a recursive version of the <em>DDA</em> algorithm, which is best described in <em>dubiousconst282</em>’s blog<sup class="footnote-reference" id="fr-64tree-3"><a href="#fn-64tree">9</a></sup>.</p> <p>When the ray hits a populated voxel of a <strong>leaf node</strong> in the <em>contree</em>, we can use the associated data to gather the 8 distances and normals at this voxel’s corners.<br /> We can then perform a <a class="subtle-link" rel="external" href="https://en.wikipedia.org/wiki/Trilinear_interpolation">trilinear interpolation</a> of these distances and normals at the ray-voxel intersection point.</p> <figure> <a href="viridis_step_contree.png" rel="noopener" target="_blank"> </a> <figcaption>Visualization of the steps through <em>contrees</em> required to traverse the cubic <em>AABB</em>, using the <a class="subtle-link" rel="external" href="https://www.shadertoy.com/view/XtGGzG">viridis quintic approximation</a>.<br/>The range of steps is <script type="math/tex">[0, 255)</script>.</figcaption> </figure><h3 
id="combined-marching"><a class="zola-anchor" href="#combined-marching" aria-label="Anchor link for: combined-marching">Combined marching</a></h3> <p>If the ray misses all populated voxels of the inner <em>contree</em>, we continue stepping through the <em>tile grid</em> normally.</p> <figure> <a href="viridis_step_combined.png" rel="noopener" target="_blank"> </a> <figcaption>Visualization of the steps through <em>tiles</em> and <em>contrees</em> required to traverse the cubic <em>AABB</em>, using the <a class="subtle-link" rel="external" href="https://www.shadertoy.com/view/XtGGzG">viridis quintic approximation</a>.<br/>The range of steps is <script type="math/tex">[0, N\sqrt{3} + 255)</script>, where <script type="math/tex">N</script> is the number of tiles per axis.</figcaption> </figure> <p>If the ray leaves the bounding volume without hitting a populated voxel, the shader can exit early.</p> <h2 id="alpha-mode"><a class="zola-anchor" href="#alpha-mode" aria-label="Anchor link for: alpha-mode">Alpha mode</a></h2> <p>When rendering our proxy mesh, we have to choose the correct <a class="subtle-link" rel="external" href="https://docs.rs/bevy/latest/bevy/prelude/enum.AlphaMode.html">alpha mode</a> to render with.</p> <p>To render an opaque <em>IS</em>, we need to use the <a class="subtle-link" rel="external" href="https://docs.rs/bevy/latest/bevy/prelude/enum.AlphaMode.html#variant.Mask">Mask</a> mode. 
That is because we need <em>some</em> transparency, so that pixels whose rays do not intersect the <em>IS</em> do not contribute to the final image.<br /> This mode is also cheaper than the other modes that allow some form of transparency.</p> <p>For transparent <em>IS</em>, one can use any of the transparent modes (e.g., <a class="subtle-link" rel="external" href="https://docs.rs/bevy/latest/bevy/prelude/enum.AlphaMode.html#variant.Blend">Blend</a>, <a class="subtle-link" rel="external" href="https://docs.rs/bevy/latest/bevy/prelude/enum.AlphaMode.html#variant.Premultiplied">Premultiplied</a>, <a class="subtle-link" rel="external" href="https://docs.rs/bevy/latest/bevy/prelude/enum.AlphaMode.html#variant.Add">Add</a>, or <a class="subtle-link" rel="external" href="https://docs.rs/bevy/latest/bevy/prelude/enum.AlphaMode.html#variant.Multiply">Multiply</a>).</p> <h2 id="depth-prepass"><a class="zola-anchor" href="#depth-prepass" aria-label="Anchor link for: depth-prepass">Depth prepass</a></h2> <p>Integrating this rendering process with a depth prepass makes it possible to render multiple <em>IS</em> mixed with <em>normal</em> meshes, each at the correct depth.</p> <figure> <a href="depth-prepass.png" rel="noopener" target="_blank"> </a> <figcaption>Multiple <em>IS</em> overlapping each other, with a mesh-based torus on top.</figcaption> </figure> <p>This integration happens in 2 steps:</p> <ol> <li>During a depth prepass, a special fragment shader reports the depth of the populated pixels. When multiple <em>IS</em> report a depth for the same pixel, the one closest to the camera is kept.</li> <li>During the main pass, the fragment shader compares the fragment depth with the depth gathered during the prepass.<br /> If the current depth is farther from the camera than the value stored during the prepass, the fragment is discarded.</li> </ol> <p>This setup also works for non-<em>IS</em> elements, such as a mesh object (i.e. 
the purple torus).</p> <h2 id="fragment-shader"><a class="zola-anchor" href="#fragment-shader" aria-label="Anchor link for: fragment-shader">Fragment shader</a></h2> <p>The fragment shader runs every frame.</p> <p>It is responsible for reading the texture populated by the compute shader and applying <em>PBR</em> to it.</p> <div class="flex flex-row flex-wrap gap-2"> <div class="flex-1"> <figure> <a href="./rendering_normal.png" rel="noopener" target="_blank"> </a> <figcaption>Normal rendering using the fragment shader.</figcaption> </figure> </div> <div class="flex-1"> <figure> <a href="./rendering_pbr.png" rel="noopener" target="_blank"> </a> <figcaption>PBR rendering using the fragment shader.</figcaption> </figure> </div> </div> <h2 id="how-fast-is-it"><a class="zola-anchor" href="#how-fast-is-it" aria-label="Anchor link for: how-fast-is-it">How fast is it?</a></h2> <p>Well, I don’t really know.<br /> I’m not sure <strong>how</strong> to benchmark this type of project, other than running an example scene and moving the camera a lot.</p> <p>So here I am, doing exactly that. I disabled <code>VSync</code> so the <em>FPS</em> is not tied to my display’s refresh rate.</p> <p><strong>Note</strong>: the <a href="https://www.farfa.dev/blog/is-rendering/#compute-shader-ray-marching">compute shader</a> only runs when the scene or the view changes. 
That is why you can see the <em>FPS</em> climbing when there is no movement, compared to when I’m moving the camera.<br /> The <a href="https://www.farfa.dev/blog/is-rendering/#depth-prepass">prepass</a> and <a href="https://www.farfa.dev/blog/is-rendering/#fragment-shader">main pass shader</a> run every frame.</p> <p>The window resolution is <strong>2560x1440</strong>, but the textures being rendered are constantly resized as the view changes.</p> <p>The scene I’m using displays 4 spheres with surface distortions, each with a different rendering mode.<br /> It also contains a cuboid <em>IS</em> of negligible size and a mesh-based torus.</p> <p>The computer I’m running these benchmarks on is an Apple MacBook Pro M2 Max (2022) running macOS Tahoe 26.3.1.</p> <p>The distorted spheres have a maximum diameter of 11 units. I’ll be modifying the minimum feature size of these spheres to scale up the discretization resolution.</p> <figure> </video> <figcaption>Performance showcase using a minimum feature size of <code>0.1</code> units for the distorted spheres.<br/>They take up ~4.266 MB of memory.</figcaption> </figure> <p>Movement is smooth (>=60 FPS), even when zoomed in on multiple medium-resolution <em>IS</em>.<br /> The edges where the <em>IS</em> intersect show a few artifacts due to insufficient resolution in those regions.</p> <figure> </video> <figcaption>Performance showcase using a minimum feature size of <code>0.01</code> units for the distorted spheres.<br/>They take up ~470.632 MB of memory.</figcaption> </figure> <p>Movement stays at usable editing speeds (>=30 FPS) when zoomed in on high-resolution <em>IS</em>.<br /> The edges where the <em>IS</em> intersect are now very sharp.</p> <h1 id="closing-words"><a class="zola-anchor" href="#closing-words" aria-label="Anchor link for: closing-words">Closing words</a></h1> <p>This was a fun endeavor, albeit a tad fever-inducing, especially when trying to write long and complex <em>wgpu</em> shaders.</p> 
<p>Feel free to browse and judge my woefully undocumented code on the <a class="subtle-link" rel="external" href="https://codeberg.org/GrandChaman/fsolid/"><code>fsolid</code>’s project page</a>, especially the relevant crate, <code>bevy_is</code>.</p> <h2 id="things-to-try-next-maybe"><a class="zola-anchor" href="#things-to-try-next-maybe" aria-label="Anchor link for: things-to-try-next-maybe">Things to try next (maybe)</a></h2> <h3 id="reduce-memory-footprint"><a class="zola-anchor" href="#reduce-memory-footprint" aria-label="Anchor link for: reduce-memory-footprint">Reduce memory footprint</a></h3> <p>The memory footprint of these structures can get quite high, especially for a constrained environment such as a GPU.</p> <p>A low-hanging optimization is splitting the <code>u64</code> population mask into two <code>u32</code> to improve alignment and memory usage on the CPU side.</p> <p>Another, more complex optimization would be to transform the Sparse Voxel Tree into a Sparse Voxel Directed Acyclic Graph<sup class="footnote-reference" id="fr-high_res_sparse_voxel_DAG-1"><a href="#fn-high_res_sparse_voxel_DAG">10</a></sup>.<br /> This could drastically reduce the memory footprint of some discretized <em>IS</em>.<br /> This reduction in size could in turn shorten GPU rendering time, thanks to better cache usage and data locality.</p> <h3 id="optimize-gpu-tracing"><a class="zola-anchor" href="#optimize-gpu-tracing" aria-label="Anchor link for: optimize-gpu-tracing">Optimize GPU tracing</a></h3> <p>Of course, the main goals here would be a better user experience, support for lower-power devices, and lower energy consumption.</p> <p>There are some fairly low-hanging optimizations that are documented<sup class="footnote-reference" id="fr-64tree-4"><a href="#fn-64tree">9</a></sup> which I didn’t include, either because of time constraints or because I don’t understand them yet.</p> <p>Others are more 
complex, such as the beam optimization<sup class="footnote-reference" id="fr-efficient_svo-1"><a href="#fn-efficient_svo">11</a></sup>.</p> <h3 id="better-parallelism"><a class="zola-anchor" href="#better-parallelism" aria-label="Anchor link for: better-parallelism">Better parallelism</a></h3> <p>Treating the <em>tile grid</em> as a <em>tree</em> of <em>tiles</em> could reduce the parallelism overhead of large and sparse <em>tile grids</em>.</p> <p>Alternatively, smarter parallelism when loading a <em>contree</em> could reduce the number of <em>tiles</em> in the <em>tile grid</em> whilst keeping the same rendering resolution.<br /> This would also unlock some of the performance benefits of tracing <em>contrees</em> (e.g., skipping large empty regions in a single step), which are currently diminished by the small size of each <em>contree</em>.</p> <h1 id="references"><a class="zola-anchor" href="#references" aria-label="Anchor link for: references">References</a></h1> <section class="footnotes"> <ol class="footnotes-list"> <li id="fn-mkeeter"> <p><a class="subtle-link" rel="external" href="https://www.mattkeeter.com">Matthew Keeter’s blog</a> <a href="#fr-mkeeter-1">↩</a> <a href="#fr-mkeeter-2">↩2</a> <a href="#fr-mkeeter-3">↩3</a></p> </li> <li id="fn-iq"> <p><a class="subtle-link" rel="external" href="https://iquilezles.org">Inigo Quilez’s blog</a> <a href="#fr-iq-1">↩</a> <a href="#fr-iq-2">↩2</a></p> </li> <li id="fn-rvaillant"> <p><a class="subtle-link" rel="external" href="https://rodolphe-vaillant.fr/entry/86/implicit-surface-aka-signed-distance-field-definition">Rodolphe Vaillant’s article on Implicit surface</a> <a href="#fr-rvaillant-1">↩</a></p> </li> <li id="fn-dc"> <p><a class="subtle-link" rel="external" href="https://www.cs.rice.edu/~jwarren/papers/dualcontour.pdf">Dual Contouring of Hermite Data (2002), Tao Ju, Frank Losasso, Scott Schaefer, Joe Warren, Rice University</a> <a href="#fr-dc-1">↩</a></p> </li> <li id="fn-mc"> <p><a 
class="subtle-link" rel="external" href="https://www.cs.rice.edu/~jwarren/papers/dualcontour.pdf">Marching cubes: A high resolution 3D surface construction algorithm (1987), Lorensen, William E. and Cline, Harvey E.</a> <a href="#fr-mc-1">↩</a></p> </li> <li id="fn-sphere_tracing"> <p><a class="subtle-link" rel="external" href="https://www.researchgate.net/publication/2792108_Sphere_Tracing_A_Geometric_Method_for_the_Antialiased_Ray_Tracing_of_Implicit_Surfaces">Sphere Tracing: A Geometric Method for the Antialiased Ray Tracing of Implicit Surfaces (1995), John Hart</a> <a href="#fr-sphere_tracing-1">↩</a></p> </li> <li id="fn-iq_bbox"> <p><a class="subtle-link" rel="external" href="https://iquilezles.org/articles/bboxes3d/">Inigo Quilez’s article on bounding boxes</a> <a href="#fr-iq_bbox-1">↩</a></p> </li> <li id="fn-wgpu_alignment"> <p><a class="subtle-link" rel="external" href="https://www.w3.org/TR/WGSL/#alignment-and-size">Alignment and Size, WebGPU Shading Language, W3C</a> <a href="#fr-wgpu_alignment-1">↩</a></p> </li> <li id="fn-64tree"> <p><a class="subtle-link" rel="external" href="https://dubiousconst282.github.io/2024/10/03/voxel-ray-tracing/">A guide to fast voxel ray tracing using sparse 64-trees, dubiousconst282</a> <a href="#fr-64tree-1">↩</a> <a href="#fr-64tree-2">↩2</a> <a href="#fr-64tree-3">↩3</a> <a href="#fr-64tree-4">↩4</a></p> </li> <li id="fn-high_res_sparse_voxel_DAG"> <p><a class="subtle-link" rel="external" href="https://dl.acm.org/doi/10.1145/2461912.2462024">High resolution sparse voxel DAGs (2013), Kämpe, Viktor and Sintorn, Erik and Assarsson, Ulf</a> <a href="#fr-high_res_sparse_voxel_DAG-1">↩</a></p> </li> <li id="fn-efficient_svo"> <p><a class="subtle-link" rel="external" href="https://research.nvidia.com/sites/default/files/pubs/2010-02_Efficient-Sparse-Voxel/laine2010tr1_paper.pdf">Efficient Sparse Voxel Octrees (2010), Laine and Karras</a> <a href="#fr-efficient_svo-1">↩</a></p> </li> </ol> </section> </article> 
</main></body></html>