# Training
## Usage
### Training a New Model
To train a new model, specify the algorithm's configuration file. For example, to train MAPPO:
```bash
uv run examples/marl/example/train.py --config examples/marl/example/config/mappo.yaml --method mappo
```
Tip: To reduce memory usage during training, it is highly recommended to disable unnecessary logs in the Unreal Engine console (press the backtick key `` ` `` to open it) by running `log LogTemp off` and `log LogTongSimGRPC off`.
Available Algorithms:
- MAPPO: `example/config/mappo.yaml`
- IPPO: `example/config/ippo.yaml`
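The same CLI works for either configuration. If you want to launch the baselines back to back, here is a minimal sketch that drives the command above from Python (assuming it is run from the repository root and that `uv` is on your PATH):

```python
import subprocess

# Map each method flag to its configuration file (paths as used in the command above).
CONFIGS = {
    "mappo": "examples/marl/example/config/mappo.yaml",
    "ippo": "examples/marl/example/config/ippo.yaml",
}

for method, config in CONFIGS.items():
    subprocess.run(
        ["uv", "run", "examples/marl/example/train.py", "--config", config, "--method", method],
        check=True,  # abort the sweep if a run exits with an error
    )
```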
Configuration: You can customize training parameters (e.g., learning rate, network size, environment settings) by editing the corresponding .yaml file.
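As an illustration, the snippet below writes a modified copy of a configuration instead of editing the original file in place; the key names (`lr`, `hidden_size`) are hypothetical, so check your `.yaml` file for the actual parameter names:

```python
import yaml  # PyYAML

with open("examples/marl/example/config/mappo.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical keys -- replace with the parameter names used in your config.
cfg["lr"] = 5e-4
cfg["hidden_size"] = 256

with open("examples/marl/example/config/mappo_variant.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
# Then train with: --config examples/marl/example/config/mappo_variant.yaml
```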
Output: Trained models and logs are saved by default in the `models/` and `logs/` directories, respectively. You can monitor training progress using TensorBoard:

```bash
tensorboard --logdir logs
```
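The training script writes its own event files under `logs/`; if you want to record additional metrics alongside them, a minimal sketch using PyTorch's TensorBoard writer looks like this (assuming `torch` is installed; the metric name and values are placeholders):

```python
from torch.utils.tensorboard import SummaryWriter

# Write into a subdirectory of logs/ so it appears in the same TensorBoard dashboard.
writer = SummaryWriter(log_dir="logs/custom_eval")

for step, episode_reward in enumerate([12.3, 15.8, 18.1]):  # placeholder values
    writer.add_scalar("eval/episode_reward", episode_reward, step)

writer.close()
```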
### Training Results
Below is a sample reward curve from a training session, showing the model's learning progress over time:

*(Figure: reward curve over training steps)*
Performance Comparison:
The following table compares baseline algorithms on the MACS task. Because of the inherent stochasticity of multi-agent environments, the reported metrics may fluctuate slightly; they are intended primarily to verify that the environment works as expected and to provide a baseline for comparison.
| Method | Average Reward per Step | Average Reward |
|---|---|---|
| MAPPO | 0.038 | 19.24 |
| IPPO | 0.030 | 14.75 |
| Random | -0.013 | -6.51 |
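Assuming the two columns are the per-step and per-episode averages over evaluation rollouts, they are linked by episode length (reward per step × mean steps per episode = reward per episode). A minimal sketch of computing both from recorded step rewards (the rollout data below is dummy, for illustration only):

```python
def summarize_rollouts(episodes):
    """episodes: list of episodes, each a list of per-step rewards."""
    total_reward = sum(sum(ep) for ep in episodes)
    total_steps = sum(len(ep) for ep in episodes)
    return {
        "avg_reward_per_step": total_reward / total_steps,
        "avg_reward_per_episode": total_reward / len(episodes),
    }

# Dummy data: two short episodes of step rewards.
print(summarize_rollouts([[0.1, -0.02, 0.05], [0.2, 0.0]]))
```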
### Evaluating a Pre-trained Model
To evaluate the latest saved model for a given configuration, add the `--test` flag:

```bash
uv run examples/marl/example/train.py --config examples/marl/example/config/mappo.yaml --test --method mappo
```
The evaluation script will load the most recent checkpoint from the `models/` directory and run it in a test environment without further training.
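For reference, a rough sketch of what "most recent checkpoint" means here: pick the newest file under `models/` by modification time. The `.pt` file pattern and the use of `torch.load` are assumptions for illustration, not the script's actual loading logic, which `--test` already handles for you:

```python
from pathlib import Path

import torch  # assumes checkpoints are saved as PyTorch files

# Newest checkpoint by modification time (the *.pt pattern is an assumption).
checkpoints = sorted(Path("models").glob("**/*.pt"), key=lambda p: p.stat().st_mtime)
latest = checkpoints[-1]
state = torch.load(latest, map_location="cpu")
print(f"Most recent checkpoint: {latest}")
```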