Multi-Agent Navigation with MACS Environment

This project is a reinforcement learning codebase built on a lightweight adaptation of the XuanCe framework, configured specifically for the Multi-Agent Cooperative Search (MACS) environment in the TongSim simulator. It aims to give researchers and developers an out-of-the-box workflow for environment setup, model training, and performance evaluation.

Figure: Multi-Agent Cooperative Search (MACS) environment visualization

The MACS Benchmark

The Multi-Agent Cooperative Search (MACS) benchmark is a multi-agent cooperative task designed to evaluate agent collaboration capabilities in complex, dynamic environments. Built on the TongSim platform and Unreal Engine (UE), this benchmark provides a testing scenario for multi-agent reinforcement learning algorithms.

Task Overview

In a dynamic flood disaster scenario, a multi-agent team must cooperatively gather valuable supplies while evading mobile hazards. This environment serves as a challenging testbed for evaluating decentralized decision-making and emergent behaviors under partial observability, enforced cooperation, and tight resource constraints.

Key Challenges:

  • Local Perception & Dynamic Adaptation: With only local sensor data and no global view, agents must navigate an unknown environment populated by mobile hazards and make decisions in real time.
  • Cooperative Games & Competitive Dynamics: Capturing a supply requires the simultaneous cooperation of n_coop agents. Combined with finite resources and the tunable local_ratio parameter, this creates a complex interplay of cooperation and competition between individual and collective interests.
  • Resource Management: While collecting supplies, agents must manage their operational costs on two fronts: minimizing the energy spent on movement (governed by thrust_penalty) and avoiding severe penalties (defined by hazard_reward) by proactively evading mobile hazards.

Environment Specifications

Parameter          Description                                                Default
n_rescuers         Number of rescue agents                                    5
n_supplies         Number of valuable supply items                            10
n_hazards          Number of hazard items                                     5
n_coop             Agents required to capture a supply cooperatively          2
n_sensors          Number of sensors per agent                                30
sensor_range       Maximum sensing range                                      500.0
supply_reward      Reward for successfully collecting a supply                10.0
hazard_reward      Penalty for encountering a hazard                          -1.0
encounter_reward   Small reward for touching a supply without capturing it   0.01
thrust_penalty     Per-step movement cost multiplier (simulated energy use)   -0.01
max_cycles         Maximum steps per episode                                  500
local_ratio        Weight of local (per-agent) vs. global (team) rewards      0.9
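
These reward parameters combine into each agent's per-step reward. How local_ratio blends individual and team rewards is shown in the sketch below; this is an illustrative reconstruction of the common convention for such a parameter, not the verbatim MACS implementation:

import numpy as np

def blend_rewards(individual_rewards: np.ndarray, local_ratio: float = 0.9) -> np.ndarray:
    """Illustrative sketch: mix each agent's own reward with the team average.

    With local_ratio = 0.9, 90% of an agent's reward comes from its own events
    (supply captures, hazard hits, thrust costs) and 10% from the team-wide
    mean. The actual MACS formula may differ; see macs_dummy.py.
    """
    global_reward = individual_rewards.mean()
    return local_ratio * individual_rewards + (1.0 - local_ratio) * global_reward

# Example: agent 0 captured a supply (+10.0), agent 1 hit a hazard (-1.0),
# and the remaining agents only paid small thrust penalties.
rewards = np.array([10.0, -1.0, -0.01, -0.01, -0.01])
print(blend_rewards(rewards))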

Action and Observation Space

Action Space:

Each agent's action space is continuous and 2-dimensional, represented as Box(low=-1.0, high=1.0, shape=(2,)). The two action values control:

  • Horizontal movement (x-axis): Value ranges from -1.0 to 1.0
  • Vertical movement (y-axis): Value ranges from -1.0 to 1.0

These normalized action values are scaled by an action multiplier to determine the actual displacement in the environment.
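
For example, the scaling step could look like the sketch below; the multiplier value here is a placeholder, since the actual constant is internal to the environment:

import numpy as np

ACTION_MULTIPLIER = 100.0  # placeholder value -- the real multiplier is set inside the environment

def scale_action(action: np.ndarray) -> np.ndarray:
    """Clip a normalized (dx, dy) action to [-1, 1] and scale it to a displacement."""
    return np.clip(action, -1.0, 1.0) * ACTION_MULTIPLIER

print(scale_action(np.array([0.5, -1.0])))  # -> [  50. -100.]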

Observation Space:

The observation for each agent is constructed using a sensor-based perception system. Each agent is equipped with n_sensors (default: 30) radial sensors distributed uniformly around the agent in a circular pattern, with a maximum sensing range of sensor_range (default: 500.0).
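
For intuition, a uniform radial sensor layout can be pictured as unit direction vectors spaced evenly over the circle. The snippet below is an illustrative reconstruction, not the MACS source:

import numpy as np

# 30 unit direction vectors spaced evenly over [0, 2*pi) -- illustrative only.
N_SENSORS = 30
angles = np.linspace(0.0, 2.0 * np.pi, N_SENSORS, endpoint=False)
sensor_directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (30, 2)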

For each sensor ray, the observation includes:

  • Agent detection (3 features): Normalized distance, orientation_x, orientation_y
  • Supply detection (2 features): Normalized distance, velocity projection along ray direction
  • Hazard detection (2 features): Normalized distance, velocity projection along ray direction
  • Wall detection (1 feature): Normalized distance to environment boundary
  • Obstacle detection (1 feature): Normalized distance to static obstacles

Additionally, each observation includes 2 binary flags indicating whether the agent:

  • Recently encountered a supply item (stored at observation index -2)
  • Recently encountered a hazard (stored at observation index -1)

The total observation dimension is (n_sensors * 9) + 2, where 9 = 3 + 2 + 2 + 1 + 1 is the number of features per sensor ray. With the defaults, this gives (30 * 9) + 2 = 272.
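
As a quick sanity check, the layout arithmetic can be written out as follows (the tail placement of the two flags follows the index convention above; the per-ray feature ordering is otherwise an assumption, so consult macs_dummy.py for the authoritative layout):

N_SENSORS = 30                               # default n_sensors
FEATURES_PER_RAY = 3 + 2 + 2 + 1 + 1         # agent + supply + hazard + wall + obstacle
OBS_DIM = N_SENSORS * FEATURES_PER_RAY + 2   # + 2 binary encounter flags
assert OBS_DIM == 272

# The two binary encounter flags occupy the tail of the vector:
#   observation[-2] -> recently encountered a supply
#   observation[-1] -> recently encountered a hazard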

For implementation details, please refer to macs_dummy.py.

Launch TongSim and open the L_MACS scene map; you can then create the MACS environment with the following code.

def make_env(env_seed):
    """
    Factory function to create a MACS environment instance.

    Args:
        env_seed: Random seed for environment initialization.

    Returns:
        Initialized MACS environment instance.
    """
    env_instance = MACS(
        env_seed=env_seed,
        num_arenas=4,
        max_cycles=500,
        n_rescuers=5,
        n_supplies=10,
        n_hazards=5
    )
    return env_instance
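
Before wiring the environment into the vectorized wrapper shown below, you can smoke-test a single instance. The following is a minimal sketch that assumes the MACS instance exposes the same agents / action_space / reset / step interface as the vectorized example; check macs_dummy.py for the authoritative API:

# Minimal single-environment smoke test. Assumes the MACS instance mirrors
# the agents / action_space / reset / step interface used by the vectorized
# example below -- adjust to the actual API in macs_dummy.py.
env = make_env(env_seed=42)
observations, infos = env.reset()

for _ in range(10):
    # One random action per agent
    actions = {agent: env.action_space[agent].sample() for agent in env.agents}
    observations, rewards, terminateds, truncateds, infos = env.step(actions)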

You can drive the agents with random actions in the environment using the following code; for the full script, see dummy_UE.py.

# TongSimVecMultiAgentEnv and make_env come from this project
# (see dummy_UE.py and the factory function above).
env_fns = [make_env]
envs = TongSimVecMultiAgentEnv(env_fns=env_fns, env_seed=1)

print("[INFO] Environment created successfully!")
print(f"  - Number of parallel environments (num_envs): {envs.num_envs}")
print(f"  - Number of agents (num_agents): {envs.num_agents}")
print(f"  - State space (state_space): {envs.state_space}")

# Initial reset
observations, infos = envs.reset()

# Random-action rollout loop
for step in range(300000):
    if step % 1000 == 0:
        print(f"\n--- Step {step} ---")

    # Sample one random action per agent, for every parallel environment
    actions = []
    for _ in range(envs.num_envs):
        arena_actions = {
            agent: envs.action_space[agent].sample()
            for agent in envs.agents
        }
        actions.append(arena_actions)

    # Execute the environment step and carry the observations forward
    next_observations, rewards, terminateds, truncateds, infos = envs.step(actions)
    observations = next_observations
    # NOTE: whether finished sub-environments reset automatically is up to the
    # wrapper; see dummy_UE.py for how terminateds/truncateds are handled.

Acknowledgements

This project is built upon the fantastic work of the XuanCe team.