That light intensity formula is decidedly unphysical, but of course in computer graphics the usual philosophy is "if it looks good it is good".
Anyway, any approximation method probably begins with giving dynamic lights a strict maximum range r, so only O(r^3) voxels have to be updated when a light moves. And if you don't have or need sub-voxel precision, you only need to update when the light actually crosses from one voxel to another.
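For illustration, a minimal sketch of that "only re-propagate on a voxel crossing" idea; `propagateLight` is a placeholder for whatever flood-fill or falloff computation you actually use:

```typescript
type Vec3 = [number, number, number];

const MAX_LIGHT_RANGE = 15; // in voxels; bounds each update to O(r^3) voxels

function voxelOf(pos: Vec3): Vec3 {
  return [Math.floor(pos[0]), Math.floor(pos[1]), Math.floor(pos[2])];
}

class DynamicLight {
  private lastVoxel: Vec3;

  constructor(public pos: Vec3) {
    this.lastVoxel = voxelOf(pos);
  }

  // Call every frame; the expensive re-propagation only runs on a crossing.
  update(newPos: Vec3, propagateLight: (origin: Vec3, range: number) => void): void {
    this.pos = newPos;
    const v = voxelOf(newPos);
    if (v[0] !== this.lastVoxel[0] || v[1] !== this.lastVoxel[1] || v[2] !== this.lastVoxel[2]) {
      this.lastVoxel = v;
      propagateLight(v, MAX_LIGHT_RANGE); // touches at most O(r^3) voxels
    }
  }
}
```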
You could implement a kind of clustered forward rendering for your voxels. And additionally, you could do this in a compute pass on the GPU rather than on the CPU.
I think the issue here is that voxel surfaces are 'sparse' relative to the voxel volumes (otherwise, meshing is useless). If we try to do this light computation on the GPU, we will be forced to store the voxel volume on the GPU.
Actually, a 256 x 256 x 256 x u8 light volume is only 16 MB (256³ × 1 byte = 16,777,216 bytes ≈ 16 MiB). Is there a way to do this calculation with merely vertex transform feedback in WebGL2, or do we need WebGPU compute shaders?
Sorry, I don't have experience with GPU compute strategies. I only mean to suggest that you not write off approaches before they've actually been evaluated and shown to be worse.
Off the top of my head, neighbor-based lighting calculations could reasonably be computed iteratively using a fragment shader. I'm not sure whether you can slice a 3D texture into several render targets so as to effectively render a volume, but that's what I'd look into trying. If not, use an intermediate 2D texture atlas.
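For what it's worth, WebGL2 does let you attach a single z-slice of a 3D texture to a framebuffer via framebufferTextureLayer, so a volume can be processed slice by slice with an ordinary fragment shader. A rough sketch, assuming you ping-pong between two volumes since a pass can't read and write the same texture:

```typescript
// Allocate a single-channel (u8) 3D texture plus a framebuffer to render into it.
function makeLightVolume(gl: WebGL2RenderingContext, size: number) {
  const tex = gl.createTexture()!;
  gl.bindTexture(gl.TEXTURE_3D, tex);
  gl.texStorage3D(gl.TEXTURE_3D, 1, gl.R8, size, size, size);
  const fbo = gl.createFramebuffer()!;
  return { tex, fbo };
}

function renderSlices(
  gl: WebGL2RenderingContext,
  vol: { tex: WebGLTexture; fbo: WebGLFramebuffer },
  size: number,
  drawSlice: (z: number) => void // your fullscreen-quad lighting pass
) {
  gl.bindFramebuffer(gl.FRAMEBUFFER, vol.fbo);
  gl.viewport(0, 0, size, size);
  for (let z = 0; z < size; z++) {
    // Bind slice z of the 3D texture as the current color attachment.
    gl.framebufferTextureLayer(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, vol.tex, 0, z);
    drawSlice(z);
  }
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
}
```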
Or WebGPU with compute shaders might be generally available before you finish implementing that.
I'm writing this under the odd assumption that you want to store the lighting of the voxels in a step separate from actually rendering them. But the same idea works either way.
Run a compute shader specifically picking out the blocks which are exposed to the air, since these are the only ones which need to have their lighting calculated.
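To make that concrete, here's the per-voxel predicate such a pass would evaluate, written out on the CPU in TypeScript; the flat indexing scheme and the 0-means-air convention are assumptions about your storage:

```typescript
const SIZE = 256;

function idx(x: number, y: number, z: number): number {
  return x + SIZE * (y + SIZE * z);
}

// A block needs lighting only if it is solid and at least one of its six
// neighbors is air (out-of-bounds neighbors are treated as air).
function isExposed(voxels: Uint8Array, x: number, y: number, z: number): boolean {
  if (voxels[idx(x, y, z)] === 0) return false; // air itself needs no shading
  const neighbors: [number, number, number][] = [
    [x + 1, y, z], [x - 1, y, z],
    [x, y + 1, z], [x, y - 1, z],
    [x, y, z + 1], [x, y, z - 1],
  ];
  return neighbors.some(([nx, ny, nz]) =>
    nx < 0 || ny < 0 || nz < 0 || nx >= SIZE || ny >= SIZE || nz >= SIZE ||
    voxels[idx(nx, ny, nz)] === 0
  );
}
```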
Your camera views a section of the world in the shape of a frustum. Think of chopping it into layers along its depth, its height, and its width. If your frustum were simply a rectangular prism, you'd just be chopping it up into cubes as if you were preparing dinner.
Your newly made "sections" can each be told about what lights they need to shade for based on the distance to the lights.
In a compute shader run for every voxel in your frustum, you ask what lights you need to shade for based on what section you're in and only shade for those.
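A minimal CPU-side sketch of that assignment step, simplified by bounding the frustum with an axis-aligned box chopped into a GRID³ of sections (real clustered shading subdivides the frustum itself, but the bookkeeping is the same):

```typescript
interface Light { pos: [number, number, number]; range: number; }

const GRID = 8; // sections per axis

// Returns, for each section, the indices of lights whose sphere of influence
// overlaps it. A voxel then only shades the lights listed for its section.
function assignLights(
  lights: Light[],
  boxMin: [number, number, number],
  cellSize: [number, number, number]
): number[][] {
  const clusters: number[][] = Array.from({ length: GRID ** 3 }, () => []);
  const clamp = (v: number) => Math.max(0, Math.min(GRID - 1, v));
  lights.forEach((light, li) => {
    // Section range covered by the light's bounding box.
    const lo = light.pos.map((p, a) => clamp(Math.floor((p - light.range - boxMin[a]) / cellSize[a])));
    const hi = light.pos.map((p, a) => clamp(Math.floor((p + light.range - boxMin[a]) / cellSize[a])));
    for (let z = lo[2]; z <= hi[2]; z++)
      for (let y = lo[1]; y <= hi[1]; y++)
        for (let x = lo[0]; x <= hi[0]; x++)
          clusters[x + GRID * (y + GRID * z)].push(li);
  });
  return clusters;
}
```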
Now you've gone from O(v·n) time on average to O(v·n) as a mere worst-case bound: each voxel only shades the lights assigned to its section, so the typical cost drops to roughly O(v·k), where k is the number of lights per section and usually far smaller than n (with v being the number of voxels and n the number of lights).
Further improvements can be made specifically for voxel worlds. In particular, it's not an insurmountable task to write an octree that can be traversed on the GPU. You can make such an octree that only distinguishes between opaque blocks and non-opaque blocks. This way, you can traverse the octree along a line between your camera and the block in question to determine whether there are too many blocks in the way. Doing the same between a block and the lights it needs to be shaded for also gives you shadows. Upgrading your octree to include translucency along with transparency and opacity lets you have glass and stuff.
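As a sketch of what "GPU-traversable" might look like: store the octree as a flat array of nodes, which uploads naturally as a buffer or texture. The layout below is one assumption among many, and the line test just samples along the segment; a proper DDA-style traversal would skip whole empty nodes:

```typescript
interface OctreeNode {
  // -1 while an internal node; otherwise 0 = empty, 1 = opaque
  // for the entire cube this node covers.
  leafValue: number;
  firstChild: number; // index of child 0; children occupy 8 consecutive slots
}

// Descend from the root to answer "is this point inside an opaque region?".
// A shader does the same thing with integer math on a buffer.
function isOpaque(nodes: OctreeNode[], size: number, x: number, y: number, z: number): boolean {
  let node = 0;
  let half = size >> 1;
  let ox = 0, oy = 0, oz = 0; // origin of the current cube
  while (nodes[node].leafValue === -1) {
    const cx = x >= ox + half ? 1 : 0;
    const cy = y >= oy + half ? 1 : 0;
    const cz = z >= oz + half ? 1 : 0;
    node = nodes[node].firstChild + cx + 2 * cy + 4 * cz;
    ox += cx * half; oy += cy * half; oz += cz * half;
    half >>= 1;
  }
  return nodes[node].leafValue === 1;
}

// Crude visibility test between a block and a camera or light position.
function lineBlocked(nodes: OctreeNode[], size: number,
                     from: [number, number, number], to: [number, number, number]): boolean {
  const steps = Math.ceil(Math.hypot(to[0] - from[0], to[1] - from[1], to[2] - from[2]));
  for (let i = 1; i < steps; i++) {
    const t = i / steps;
    const x = Math.floor(from[0] + t * (to[0] - from[0]));
    const y = Math.floor(from[1] + t * (to[1] - from[1]));
    const z = Math.floor(from[2] + t * (to[2] - from[2]));
    if (isOpaque(nodes, size, x, y, z)) return true;
  }
  return false;
}
```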
Half of computer graphics is simply cutting out unnecessary work.
Finally, don't underestimate just how good GPUs are at massive parallelization. You can run a rather complex PBR shader for every pixel on your entire screen and still hit 60fps on a phone.
Although, please note that these are simply notes I've collected in the back of my mind over a few years. I haven't really been focusing on voxel rendering (I've focused on low-poly rendering instead), but I've always been intrigued by it.