Voxel Perception Pipeline¶
TongSIM Lite provides a voxel occupancy query API for 3D geometry perception around a given transform. This is useful for:
- local collision-aware planning
- 3D occupancy features for RL policies
- debugging “what the agent thinks is occupied”
The voxel query is exposed via VoxelService/QueryVoxel.
Inputs & constraints¶
The Python SDK wrapper is:
tongsim.connection.grpc.unary_api.UnaryAPI.query_voxel(...)
Inputs:
transform: the world-space center transform of the voxel boxvoxel_num_x/y/z: voxel resolution (must be even, UE validates this)box_extent: half-extent in world units (cm)actors_to_ignore: optional actor ids excluded from sampling
Resolution requirements
- UE requires
voxel_num_x,voxel_num_y,voxel_num_zto be even. - For easier decoding and better packing efficiency, prefer
voxel_num_zto be a multiple of 8.
Implementation reference:
unreal/Plugins/TongSimGrpc/Source/TongSimProto/Private/DemoRL/DemoRLSubsystem.cpp(UDemoRLSubsystem::QueryVoxel)
Output format (bit-packed bytes)¶
The service returns one field:
Voxel.voxel_buffer: bytes
Packing rules (TongSIM Lite implementation):
- Z is bit-packed LSB-first (bit 0 → z=0, bit 7 → z=7)
- Z is padded to the next multiple of 8:
aligned_z = ceil(z/8)*8 - Buffer length is
x * y * (aligned_z / 8)bytes
Implementation reference:
unreal/Plugins/TongSimCore/Source/TongSimVoxelGrid/Public/TSVoxelGridFuncLib.hunreal/Plugins/TongSimCore/Source/TongSimVoxelGrid/Private/TSVoxelGridFuncLib.cpp
Decode in Python (robust to Z padding)¶
import numpy as np
def decode_voxel(buffer: bytes, x: int, y: int, z: int) -> np.ndarray:
aligned_z = ((z + 7) // 8) * 8
expected_bytes = x * y * (aligned_z // 8)
arr = np.frombuffer(buffer, dtype=np.uint8, count=expected_bytes)
bits = np.unpackbits(arr, bitorder="little")
grid = bits.reshape((x, y, aligned_z), order="C")
return grid[:, :, :z].astype(bool, copy=False)
Reference implementation
See examples/voxel.py for an end-to-end demo (query + decode + render).
Performance tips¶
- Use a resolution that matches your model needs (voxel queries are CPU-heavy on the UE side).
- Sample at a lower frequency for training (for example 1–5 Hz) unless you truly need per-frame voxels.
- Use
actors_to_ignoreto remove self-actors or large irrelevant objects from sampling.
Troubleshooting¶
Buffer length mismatch while decoding
- Ensure you account for Z padding (
aligned_z = ceil(z/8)*8). - Prefer
voxel_num_zas a multiple of 8 to avoid padding.
Voxel grid looks empty / all occupied
- Verify
box_extentmatches your scene scale (UE uses centimeters). - Ensure the query box actually overlaps collidable geometry.
Next: TongSim Capture