TongSim Capture (RGB / Depth)

TongSim Capture is TongSIM Lite’s sensor module for capturing color and depth images from Unreal Engine. It is designed for:

- perception benchmarks (RGB-D)
- imitation / RL policies that consume images
- debugging agent observations

The capture feature is exposed to Python via the gRPC CaptureService.

How capture is implemented

TongSIM Lite capture consists of two parts:

| Part | Location | Role |
| --- | --- | --- |
| UE capture runtime | `unreal/Plugins/TongSimCore/Source/TongSimCapture` | SceneCapture + GPU readback + depth compute |
| gRPC bridge | `unreal/Plugins/TongSimGrpc/Source/TongSimProto/Private/Capture/CaptureGrpcSubsystem.cpp` | Create/attach cameras and request snapshots |

On the UE side, each capture camera is represented as an ATSCaptureCameraActor that carries a CaptureId and a parameter struct.

gRPC surface (what Python can call)

Protocol definition:

protobuf/tongsim_lite_protobuf/capture.proto

Python wrapper:

src/tongsim/connection/grpc/capture_api.py (CaptureAPI)

Supported operations:

| Operation | gRPC | Notes |
| --- | --- | --- |
| List cameras | `CaptureService/ListCaptureCameras` | Returns camera metadata + last known status |
| Create camera | `CaptureService/CreateCaptureCamera` | Spawns a camera actor; optional attach to a parent actor |
| Update params | `CaptureService/UpdateCaptureCameraParams` | Fails if the camera is currently capturing |
| Set pose | `CaptureService/SetCaptureCameraPose` | Moves the camera actor |
| Attach | `CaptureService/AttachCaptureCamera` | Attach to parent actor + optional socket |
| Snapshot | `CaptureService/CaptureSnapshot` | Returns one frame (color/depth optional) |
| Destroy | `CaptureService/DestroyCaptureCamera` | Removes camera; can force-stop capture |

Snapshot-focused API

The current gRPC interface is designed around snapshot capture. The UE runtime supports continuous capture internally, and can be extended to expose streaming APIs if needed.
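
Until a streaming API exists, periodic capture can be approximated on the client by requesting snapshots in a loop. The following is a minimal sketch, not part of TongSIM Lite: `poll_snapshots` and its `take_snapshot` argument are hypothetical names, and `take_snapshot` is assumed to be a caller-supplied function that wraps `CaptureAPI.capture_snapshot` and returns one frame.

```python
import time


def poll_snapshots(take_snapshot, count, interval_s,
                   clock=time.monotonic, sleep=time.sleep):
    """Collect `count` frames, starting one snapshot every `interval_s` seconds."""
    frames = []
    next_due = clock()
    for i in range(count):
        frames.append(take_snapshot())
        next_due += interval_s
        remaining = next_due - clock()
        # Do not sleep after the last frame or when a snapshot overran its slot.
        if i < count - 1 and remaining > 0:
            sleep(remaining)
    return frames


# Stand-in snapshot function so the sketch runs without a simulator.
counter = iter(range(100))
frames = poll_snapshots(lambda: next(counter), count=3, interval_s=0.01)
```

Scheduling against a running deadline (`next_due`) rather than sleeping a fixed amount keeps the average rate steady when individual snapshots take varying time.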

Minimal Python example

```python
from tongsim import TongSim
from tongsim.connection.grpc.capture_api import CaptureAPI
from tongsim.math import Transform, Vector3
from tongsim_lite_protobuf import capture_pb2

# Camera settings sent when the capture camera is created.
CAPTURE_PARAMS = {
    "width": 640,
    "height": 480,
    "fov_degrees": 90.0,
    "qps": 10.0,
    "enable_depth": True,
    "depth_near": 10.0,
    "depth_far": 5000.0,
    "depth_mode": capture_pb2.CaptureDepthMode.CAPTURE_DEPTH_LINEAR,
    "color_source": capture_pb2.CaptureColorSource.COLOR_SOURCE_FINAL_COLOR_LDR,
    "color_format": capture_pb2.CaptureRenderTargetFormat.COLOR_FORMAT_RGBA8,
}

with TongSim("127.0.0.1:5726") as ts:
    # Spawn a capture camera and keep its id for later calls.
    cam_id = ts.context.sync_run(
        CaptureAPI.create_camera(
            ts.context.conn,
            transform=Transform(location=Vector3(200, 700, 200)),
            params=CAPTURE_PARAMS,
            capture_name="DemoCam",
        )
    )

    # Request a single frame with both color and depth buffers.
    frame = ts.context.sync_run(
        CaptureAPI.capture_snapshot(
            ts.context.conn,
            cam_id,
            include_color=True,
            include_depth=True,
            timeout_seconds=1.0,
        )
    )

    # Remove the camera actor, then list the fields the frame carries.
    ts.context.sync_run(CaptureAPI.destroy_camera(ts.context.conn, cam_id))
    print(frame.keys())
```
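
In the example above, the camera is not destroyed if the snapshot call raises. One way to guard against that is a small context manager, sketched here under assumptions: `temporary_camera` is a hypothetical helper (not part of `tongsim`), and `create` / `destroy` are caller-supplied callables that would wrap `CaptureAPI.create_camera` and `CaptureAPI.destroy_camera`.

```python
from contextlib import contextmanager


@contextmanager
def temporary_camera(create, destroy):
    """Yield a camera id and always destroy the camera afterwards.

    `create` and `destroy` are hypothetical caller-supplied callables.
    """
    cam_id = create()
    try:
        yield cam_id
    finally:
        # Runs even if the body raises, so no camera actor is left behind.
        destroy(cam_id)


# Stand-in callables so the sketch runs without a simulator.
destroyed = []
with temporary_camera(lambda: "cam-1", destroyed.append) as cam_id:
    used_id = cam_id
```

With a live connection, `create` could be a lambda that runs `CaptureAPI.create_camera` through `ts.context.sync_run`, and `destroy` a function that does the same for `CaptureAPI.destroy_camera`.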

End-to-end demo

Run examples/capture_demo.py to save color/depth outputs under logs/capture_demo_*.

Camera parameters (practical meaning)

CaptureCameraParams maps closely to Unreal capture settings:

- `width`, `height`: output resolution
- `fov_degrees`: horizontal FOV in degrees
- `enable_depth`: enable depth output
- `depth_near`, `depth_far`, `depth_mode`: depth encoding and range
- `color_source`, `color_format`: scene capture source and render target format
- `enable_post_process`, `enable_temporal_aa`: realism vs determinism tradeoffs
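
Because `fov_degrees` is a horizontal FOV, `width`, `height` and `fov_degrees` are enough to derive approximate pinhole intrinsics for downstream perception code. The sketch below is illustrative only: `pinhole_intrinsics` is a hypothetical helper, and it assumes an ideal pinhole model with square pixels and the principal point at the image center, which the capture module does not document explicitly.

```python
import math


def pinhole_intrinsics(width, height, fov_degrees):
    """Return (fx, fy, cx, cy) in pixels for an ideal pinhole camera.

    Assumes fov_degrees is the horizontal FOV, pixels are square, and the
    principal point is the image center.
    """
    if not 0.0 < fov_degrees < 180.0:
        raise ValueError("fov_degrees must be in (0, 180)")
    # Half the image width subtends half the horizontal FOV.
    fx = (width / 2.0) / math.tan(math.radians(fov_degrees) / 2.0)
    fy = fx  # square-pixel assumption
    return fx, fy, width / 2.0, height / 2.0


# Same resolution and FOV as the example parameters.
fx, fy, cx, cy = pinhole_intrinsics(640, 480, 90.0)
```

For a 640 x 480 image with a 90 degree horizontal FOV, the focal length comes out at roughly 320 pixels, with the principal point at (320, 240).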

Determinism vs realism

For training, consider disabling temporal AA and heavy post-processing so that frames are more reproducible. For demos, you may prefer to keep them enabled for higher visual quality.

Output decoding (color & depth)

Color buffer (`rgba8`)

The proto field is named rgba8, but the UE implementation writes bytes in BGRA8 order (Unreal’s common pixel layout).

Convert to RGB with NumPy:

```python
import numpy as np

bgra = np.frombuffer(frame["rgba8"], dtype=np.uint8).reshape(frame["height"], frame["width"], 4)
rgb = bgra[..., [2, 1, 0]]  # B,G,R -> R,G,B
```
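
If the buffer size does not match `width * height * 4`, the reshape above fails with a fairly opaque error. A small wrapper can check the size first. This is a sketch: `decode_color` is a hypothetical helper, and it assumes only the `rgba8`, `width` and `height` frame keys used above.

```python
import numpy as np


def decode_color(frame):
    """Decode frame["rgba8"] (BGRA byte order) into an (H, W, 3) RGB array."""
    height, width = frame["height"], frame["width"]
    expected = height * width * 4
    buf = frame["rgba8"]
    if len(buf) != expected:
        raise ValueError(f"rgba8 has {len(buf)} bytes, expected {expected}")
    bgra = np.frombuffer(buf, dtype=np.uint8).reshape(height, width, 4)
    return bgra[..., [2, 1, 0]]  # advanced indexing returns a new RGB array


# Synthetic 1x2 frame: one pure-blue and one pure-red pixel in BGRA order.
fake = {"width": 2, "height": 1, "rgba8": bytes([255, 0, 0, 255, 0, 0, 255, 255])}
rgb = decode_color(fake)
```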

Depth buffer (`depth_r32`)

depth_r32 is a packed float32 array (width * height values), little-endian:

```python
import numpy as np

depth = np.frombuffer(frame["depth_r32"], dtype="<f4").reshape(frame["height"], frame["width"])
print(depth.min(), depth.max())
```
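
Raw float depth is hard to inspect by eye, so a common step is to map it to an 8-bit grayscale preview. The following is a minimal sketch, assuming linear depth expressed in the same units as `depth_near` / `depth_far`; `depth_to_uint8` is a hypothetical helper, and the choice to render non-finite values as 0 is arbitrary.

```python
import numpy as np


def depth_to_uint8(depth, near, far):
    """Map depth in [near, far] to 0..255; non-finite values become 0."""
    if far <= near:
        raise ValueError("far must be greater than near")
    finite = np.isfinite(depth)
    # Replace inf/NaN before clipping so they cannot propagate.
    clipped = np.clip(np.where(finite, depth, near), near, far)
    scaled = (clipped - near) / (far - near)
    out = np.round(scaled * 255.0).astype(np.uint8)
    out[~finite] = 0
    return out


# Range taken from the example parameters (depth_near=10, depth_far=5000).
sample = np.array([[10.0, 2505.0], [5000.0, np.inf]], dtype=np.float32)
preview = depth_to_uint8(sample, near=10.0, far=5000.0)
```

Here values at `near` map to 0 and values at `far` map to 255. Note that pixels at `near` and non-finite pixels both render as black in this sketch.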

Troubleshooting

Snapshot failed / timeout

- Increase `timeout_seconds` (GPU readback can take longer on heavy scenes).
- Reduce resolution or disable depth for faster snapshots.
- Ensure the UE process is not stalled (PIE paused, breakpoint, heavy shader compilation).
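
Increasing the timeout can also be automated by retrying with progressively longer values. The sketch below makes several assumptions: `snapshot_with_retry` is a hypothetical helper, `take_snapshot` is a caller-supplied function wrapping `CaptureAPI.capture_snapshot`, and a failed or timed-out snapshot is assumed to surface as an exception (the exact error type is not documented here, so the sketch catches `Exception` broadly).

```python
def snapshot_with_retry(take_snapshot, timeouts=(1.0, 2.0, 4.0)):
    """Call take_snapshot(timeout_seconds) until one attempt succeeds.

    take_snapshot is a hypothetical caller-supplied callable that raises
    on failure or timeout.
    """
    last_error = None
    for timeout in timeouts:
        try:
            return take_snapshot(timeout)
        except Exception as exc:  # broad on purpose: error type is not assumed
            last_error = exc
    raise RuntimeError(f"snapshot failed after {len(timeouts)} attempts") from last_error


# Stand-in that only succeeds once the timeout reaches 2 seconds.
def fake_snapshot(timeout_seconds):
    if timeout_seconds < 2.0:
        raise TimeoutError("readback not finished")
    return {"timeout_used": timeout_seconds}


result = snapshot_with_retry(fake_snapshot)
```

If every attempt fails, the final `RuntimeError` chains the last underlying error so the original cause stays visible in the traceback.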

Colors look swapped (blue/red)

Treat rgba8 as BGRA and reorder channels as shown above.

Depth is all zeros / all inf

- Verify `enable_depth=True`.
- Check `depth_near` / `depth_far` and `depth_mode` settings.
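
To tell these cases apart quickly, it can help to summarize the decoded depth array before changing any settings. This is a small diagnostic sketch using only NumPy; `depth_stats` is a hypothetical helper, not part of the TongSIM Lite API.

```python
import numpy as np


def depth_stats(depth):
    """Summarize a depth array to help spot all-zero or all-inf buffers."""
    depth = np.asarray(depth, dtype=np.float32)
    finite = np.isfinite(depth)
    stats = {
        "zero_fraction": float(np.mean(depth == 0.0)),
        "inf_fraction": float(np.mean(np.isinf(depth))),
        "nan_fraction": float(np.mean(np.isnan(depth))),
        "finite_min": None,
        "finite_max": None,
    }
    if finite.any():
        # Min/max over finite values only, so inf does not hide the real range.
        stats["finite_min"] = float(depth[finite].min())
        stats["finite_max"] = float(depth[finite].max())
    return stats


# Synthetic 2x2 depth image with two zeros, one inf and one valid value.
report = depth_stats(np.array([[0.0, np.inf], [120.0, 0.0]]))
```

A `zero_fraction` or `inf_fraction` close to 1.0 points at the depth configuration, while a plausible finite range suggests the issue lies in how the values are visualized.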
