Evolution strategies (ES) have emerged as powerful alternatives and complements to gradient-based optimization methods in multi-agent systems. Recent developments in 2024-2025 demonstrate significant advances in combining evolutionary computation with multi-agent reinforcement learning, yielding hybrid approaches that pair the exploration capabilities of evolutionary algorithms with the exploitation strengths of gradient-based methods. This research synthesis examines cutting-edge work in CMA-ES variants, neuroevolution, quality diversity methods, genetic algorithms for team optimization, and hybrid evolutionary-gradient approaches.
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) remains a cornerstone of evolutionary optimization, with significant recent extensions for multi-agent and large-scale problems. Guo et al. (2024) introduced Learning-Based Cooperative Coevolution (LCC), a pioneering framework that dynamically schedules decomposition strategies during optimization. Unlike traditional expert-designed approaches, LCC formulates strategy selection as a Markov Decision Process and employs Proximal Policy Optimization to train a neural network that selects variable decomposition strategies. On CEC 2013 benchmarks, LCC-CMAES outperformed 11 comparison algorithms, winning on 11 of 15 problems, and required no additional function evaluations for decomposition, whereas competitors spent between 4.86E+04 and 1.28E+05 evaluations on it.
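To ground the cooperative-coevolution idea, the following minimal NumPy sketch optimizes one variable group at a time against a shared context vector. A plain Gaussian perturbation stands in for the CMA-ES subcomponent optimizer, and a fixed two-group decomposition stands in for LCC's learned scheduler; all names, objectives, and hyperparameters here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sphere(x):
    """Separable toy objective standing in for a CEC-style benchmark."""
    return float(np.sum(x ** 2))

def cc_optimize(f, dim, groups, iters=50, pop=20, sigma=0.3, seed=0):
    """Round-robin cooperative coevolution: optimize one variable group at a
    time while the remaining variables stay frozen in a shared context vector."""
    rng = np.random.default_rng(seed)
    context = rng.normal(size=dim)            # best-so-far full solution
    for _ in range(iters):
        for g in groups:                      # LCC would *learn* which group to pick here
            cands = context[g] + sigma * rng.normal(size=(pop, len(g)))
            trials = np.repeat(context[None, :], pop, axis=0)
            trials[:, g] = cands
            scores = np.array([f(t) for t in trials])
            best = int(scores.argmin())
            if scores[best] < f(context):
                context[g] = cands[best]
    return context, f(context)

# Fixed two-group decomposition; LCC's contribution is learning this choice online.
groups = [list(range(0, 5)), list(range(5, 10))]
x_best, f_best = cc_optimize(sphere, dim=10, groups=groups)
print(f_best)
```

LCC's contribution is precisely to replace the fixed `groups` schedule above with a decomposition policy trained via PPO during the run.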
The knowledge-based perturbation LaF-CMA-ES (KbP-LaF-CMAES) algorithm addresses multimodal optimization through a Leaders and Followers strategy that enables two cooperative populations to evolve synergistically. This collaborative approach demonstrates how multiple subpopulations can maintain diversity while converging toward optimal solutions. Additionally, MODE/CMA-ES integrates multi-operator differential evolution with CMA-ES, enhancing the exploration-exploitation balance through clustering-based approaches.
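The two-population dynamic can be illustrated with a generic leaders-and-followers loop. This is a simplified sketch of the general strategy only, not the knowledge-based perturbation mechanism of KbP-LaF-CMAES; the objective, step sizes, and promotion rule below are all assumptions.

```python
import numpy as np

def rastrigin(x):
    """Standard multimodal test function."""
    return 10 * x.size + float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

rng = np.random.default_rng(1)
dim, n = 8, 30
leaders = rng.uniform(-5, 5, size=(n, dim))    # exploitative population
followers = rng.uniform(-5, 5, size=(n, dim))  # explorative population

for _ in range(300):
    # followers roam with large perturbations and are never culled by the
    # leaders' fitness, so exploration pressure is preserved
    followers += rng.normal(scale=0.5, size=followers.shape)
    # leaders refine locally with small perturbations and greedy acceptance
    trials = leaders + rng.normal(scale=0.05, size=leaders.shape)
    for i in range(n):
        if rastrigin(trials[i]) < rastrigin(leaders[i]):
            leaders[i] = trials[i]
    # cooperative coupling: a follower that beats the median leader
    # replaces the current worst leader
    lvals = np.array([rastrigin(v) for v in leaders])
    for fvec in followers:
        if rastrigin(fvec) < np.median(lvals):
            worst = int(lvals.argmax())
            leaders[worst], lvals[worst] = fvec.copy(), rastrigin(fvec)

print(min(rastrigin(v) for v in leaders))
```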
The integration of neuroevolution with multi-agent reinforcement learning represents a rapidly growing research area. Wang et al. (2025) proposed Evolutionary Policy Optimization (EPO), which maintains a population of agents with unique latent embeddings sharing a common actor-critic network. EPO combines genetic algorithms for selection, crossover, and mutation at the latent level with hybrid policy optimization, where a master agent learns from aggregated experience across the population. On challenging dexterous manipulation tasks, EPO significantly outperformed baselines including PPO, PBT, SAPG, and PQL, achieving 35.8% success on two-arm reorientation tasks where baselines achieved near-zero success. The method scales effectively with increased parallelization and exhibits substantially lower variance across random seeds (20.24% vs. 37.50% for the best baseline).
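A rough sketch of latent-level evolution in the spirit of EPO appears below: the population evolves only the per-agent embeddings, while the heavy network weights stay shared. The fitness function, dimensions, and operators are illustrative assumptions, and the gradient update of the shared actor-critic is indicated only by a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
pop_size, latent_dim = 16, 32
latents = rng.normal(size=(pop_size, latent_dim))  # one embedding per agent

def evaluate(z):
    """Placeholder fitness: in EPO this would be the episodic return of the
    shared actor-critic network conditioned on the latent z."""
    return -float(np.sum((z - 1.0) ** 2))

for gen in range(100):
    fitness = np.array([evaluate(z) for z in latents])
    elite = latents[np.argsort(fitness)[-pop_size // 2:]]        # selection
    # crossover: blend two random elite parents per child
    pa = elite[rng.integers(len(elite), size=pop_size)]
    pb = elite[rng.integers(len(elite), size=pop_size)]
    alpha = rng.uniform(size=(pop_size, 1))
    children = alpha * pa + (1 - alpha) * pb
    latents = children + 0.05 * rng.normal(size=children.shape)  # mutation
    # in EPO, the shared actor-critic would also take a gradient step here
    # on experience aggregated across the whole population
```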
Schossau, Shirmohammadi, and Hintze (2025) investigated cooperative behavior in multi-agent systems through alternative reward distribution schemes. Their research tested three approaches: MEAN (every agent receives the team average), MINIMUM (every agent is scored by the lowest performer), and MAXIMUM (a control in which every agent is scored by the highest performer). Using both genetic algorithms and Q-learning on foraging tasks with four agents, they found the minimum-reward scheme most effective: agents evolved to balance collective benefit with individual performance, producing fairer outcomes and improved overall efficiency through equitable resource distribution.
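Under the reading above, in which each agent receives the same transformed team signal, the three schemes reduce to a few lines; the function name and its exact placement in the training loop are illustrative assumptions.

```python
import numpy as np

def redistribute(scores, scheme):
    """Team-level reward shaping: every agent receives the same
    transformed team signal under the chosen scheme."""
    s = np.asarray(scores, dtype=float)
    if scheme == "MEAN":
        return np.full_like(s, s.mean())
    if scheme == "MINIMUM":
        return np.full_like(s, s.min())   # the team is judged by its weakest member
    if scheme == "MAXIMUM":
        return np.full_like(s, s.max())   # control condition
    raise ValueError(f"unknown scheme: {scheme}")

print(redistribute([3, 7, 1, 5], "MINIMUM"))  # -> [1. 1. 1. 1.]
```

The MINIMUM scheme makes every agent's reward hostage to the weakest teammate, which is exactly the pressure toward equitable behavior the study observed.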
Genetic algorithms continue to advance in solving multi-agent coordination problems. Recent 2024 research proposes novel multi-agent path-planning approaches leveraging risk-aware probabilistic roadmap algorithms with customized genetic frameworks for computing safe trajectories for heterogeneous teams in uncertain environments. This work addresses combinatorial optimization challenges inherent in coordinating multiple agents with different capabilities and risk profiles.
A 2025 paper presents genetic algorithms for multi-robot task allocation that jointly maximize diversity within the course-of-action pool and overall compatibility of agent-task mappings; a minimal sketch of this dual-objective fitness appears below. This dual-objective approach ensures both behavioral diversity and task effectiveness, critical factors for robust multi-agent systems.

The confluence of evolutionary computation and multi-agent systems has been surveyed extensively by Chen et al. (2025), who identify two fundamental research directions: agent-based EC (introducing MAS characteristics into EC to improve performance and parallelism) and EC-assisted MAS (using EC to solve optimization problems within multi-agent frameworks).
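Returning to the dual-objective task allocation above, here is a minimal sketch of a fitness that rewards both an assignment's own compatibility and its distance from the rest of the course-of-action pool. The compatibility matrix, weighting `w`, and operators are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_tasks, pop_size = 5, 5, 40
compat = rng.uniform(size=(n_agents, n_tasks))   # assumed agent-task compatibility scores

def compatibility(assign):
    """Mean compatibility of one agent-to-task mapping (a permutation)."""
    return float(compat[np.arange(n_agents), assign].mean())

def fitness(assign, pool, w=0.5):
    """Joint objective: the mapping's own compatibility plus its mean Hamming
    distance to the rest of the pool, rewarding quality and diversity at once."""
    novelty = float(np.mean([np.mean(assign != other) for other in pool]))
    return compatibility(assign) + w * novelty

pool = [rng.permutation(n_tasks) for _ in range(pop_size)]
for _ in range(200):
    scores = [fitness(a, pool) for a in pool]
    order = np.argsort(scores)[::-1]                 # best first
    parents = [pool[i] for i in order[: pop_size // 2]]
    children = []
    for p in parents:
        c = p.copy()
        i, j = rng.integers(n_tasks, size=2)
        c[i], c[j] = c[j], c[i]                      # swap mutation keeps the mapping valid
        children.append(c)
    pool = parents + children
```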
Quality diversity (QD) optimization has emerged as a transformative approach for multi-agent systems, emphasizing the discovery of diverse, high-performing solutions rather than single optimal solutions. Mix-ME, proposed by Ingvarsson et al. (2023), extends the MAP-Elites algorithm to multi-agent settings through a crossover-like operator that mixes agents from different teams. This addresses the significant gap in QD research, which had primarily focused on single-agent domains. Mix-ME demonstrates competitive or superior performance compared to single-agent baselines in partially observable continuous control tasks, enabling adaptation to varying contexts through diverse solution portfolios.
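A sketch of the core Mix-ME idea follows, under the assumption that a team is a list of per-agent parameter vectors: variation builds a child team by sampling each slot from one of two parent teams, and the archive keeps the best team per behavior-descriptor cell, as in standard MAP-Elites. Names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_teams(team_a, team_b):
    """Mix-ME-style variation (sketch): each slot of the child team copies
    that agent's parameters from one of the two parent teams at random."""
    mask = rng.random(len(team_a)) < 0.5
    return [a if take_a else b for a, b, take_a in zip(team_a, team_b, mask)]

archive = {}  # behavior-descriptor cell -> (fitness, team), as in MAP-Elites

def try_insert(cell, fit, team):
    """Keep the new team only if its cell is empty or it beats the incumbent."""
    if cell not in archive or fit > archive[cell][0]:
        archive[cell] = (fit, team)

# two parent teams of four agents, each agent a parameter vector
team_a = [rng.normal(size=10) for _ in range(4)]
team_b = [rng.normal(size=10) for _ in range(4)]
child = mix_teams(team_a, team_b)
```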
Multi-Objective MAP-Elites (MOME) advances the field by searching for populations of policies that are both high-performing on multiple objectives and diverse across defined behavior metrics. Maintaining a diverse set of pre-trained policies mitigates stability problems when agents encounter novel scenarios, and the multi-objective formulation broadens applicability across domains.
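A minimal sketch of the per-cell bookkeeping this implies: instead of one elite per cell, each cell maintains a small Pareto front over the objectives. The details below are illustrative, not MOME's actual implementation.

```python
def dominates(a, b):
    """True if objective vector a is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def mome_insert(archive, cell, objectives, solution):
    """MOME-style insertion (sketch): each behavior cell keeps a small
    Pareto front over the objectives instead of a single elite."""
    front = archive.setdefault(cell, [])
    if any(dominates(o, objectives) for o, _ in front):
        return                                   # dominated by an incumbent: discard
    front[:] = [(o, s) for o, s in front if not dominates(objectives, o)]
    front.append((objectives, solution))

archive = {}
mome_insert(archive, cell=(2, 3), objectives=(0.9, 0.1), solution="policy-a")
mome_insert(archive, cell=(2, 3), objectives=(0.4, 0.7), solution="policy-b")  # kept: a trade-off
```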
QDax, published in the Journal of Machine Learning Research (2024), represents a major infrastructure advancement. This JAX-based library offers 16 state-of-the-art QD and population-based methods with hardware acceleration, delivering up to tenfold speed improvements through just-in-time compilation across GPUs and TPUs. QDax enables QD algorithms to run in minutes rather than the days or weeks traditionally required on CPU clusters, democratizing access to quality diversity methods for multi-agent research.
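The speedups rest on two JAX primitives rather than any QDax-specific API: `jax.vmap` batches an evaluation function across the whole population, and `jax.jit` compiles the batched function once for GPU/TPU. A minimal illustration with a toy fitness (this is not QDax code):

```python
import jax
import jax.numpy as jnp

def fitness(genome):
    """Toy evaluation standing in for an episode rollout."""
    return -jnp.sum(genome ** 2)

# vmap evaluates the whole population in one fused call; jit compiles it
# once for the available accelerator.
batched_eval = jax.jit(jax.vmap(fitness))

key = jax.random.PRNGKey(0)
population = jax.random.normal(key, (1024, 64))   # 1024 genomes of dimension 64
scores = batched_eval(population)
```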
The convergence of evolutionary algorithms and gradient-based methods represents a frontier in multi-agent optimization. Recent 2024 research introduces multi-objective stochastic gradient operators within evolutionary frameworks, proposing algorithms like MOGBA and HMOEA that enhance local search capability while retaining global exploration through different offspring update strategies for subpopulations. MO-ERL integrates policy gradient-based reinforcement learning with evolutionary algorithms, leveraging RL's strength in exploitation and EAs' proficiency in exploration to navigate trade-offs in multi-objective optimization problems effectively.
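The general hybrid pattern these methods share can be sketched in a few lines: each generation interleaves a gradient step per individual (exploitation) with selection and mutation across the population (exploration). The objective, step sizes, and operators below are illustrative assumptions, not MOGBA, HMOEA, or MO-ERL themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    """Toy objective standing in for a policy's negative return."""
    return float(np.sum((theta - 2.0) ** 2))

def grad(theta):
    """Analytic gradient of the toy loss; an RL method would estimate this."""
    return 2.0 * (theta - 2.0)

pop = [rng.normal(size=16) for _ in range(12)]
for _ in range(50):
    # gradient operator: one local descent step per individual (exploitation)
    pop = [th - 0.1 * grad(th) for th in pop]
    # evolutionary operator: truncation selection plus Gaussian mutation (exploration)
    order = np.argsort([loss(th) for th in pop])
    elite = [pop[i] for i in order[: len(pop) // 2]]
    pop = elite + [e + 0.3 * rng.normal(size=e.shape) for e in elite]

print(min(loss(th) for th in pop))
```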
The MRPM (Multi-agent Region Protection Method) combines Differential Evolution with MADDPG, where DE facilitates diverse sample exploration and overcomes sparse rewards while MADDPG trains defenders and expedites DE convergence. Published in Complex & Intelligent Systems (February 2024), this work demonstrates that DE plays an indispensable role in achieving diverse samples, making the combination essential for complex multi-agent scenarios.
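For reference, the classic DE/rand/1/bin generation that such hybrids build on looks as follows; MRPM's actual coupling with MADDPG replay buffers is not shown, and the objective is a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_step(pop, f, F=0.5, CR=0.9):
    """One generation of DE/rand/1/bin, the kind of operator that keeps
    the sample distribution diverse in hybrids like MRPM."""
    n, d = pop.shape
    out = pop.copy()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        a, b, c = pop[rng.choice(others, 3, replace=False)]
        mutant = a + F * (b - c)                  # differential mutation
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True             # guarantee at least one crossed gene
        trial = np.where(cross, mutant, pop[i])
        if f(trial) < f(pop[i]):                  # greedy one-to-one selection
            out[i] = trial
    return out

pop = rng.normal(size=(20, 5))
for _ in range(100):
    pop = de_step(pop, lambda x: float(np.sum(x ** 2)))
```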
Recent benchmarks and applications demonstrate the practical impact of evolution strategies in multi-agent domains. The QuantEvolve framework automates quantitative strategy discovery through multi-agent evolutionary approaches, while empirical results across benchmarks such as rSDE-Bench, VideoWeb, HotPotQA, MBPP, MATH, and GAIA confirm improved coordination efficiency. Self-evolving multi-agent coordination networks (EvoMAC) achieve superior coding accuracy on complex software-level benchmarks, demonstrating the scalability of evolutionary approaches.
Multi-agent pursuit-evasion studies using MARL algorithms achieved 99.9% success rates in 1,000 randomized trials, validating learned strategies in adversarial settings. Particle swarm optimization continues advancing through integration with multi-agent systems, with MASTER (Multi-Agent Swarm Optimization with Contribution-Based Cooperation) proposed in 2025 for distributed multi-target localization and data association.
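As background for such swarm-based approaches, the canonical PSO update combines an inertia term with cognitive (personal-best) and social (global-best) attraction. The sketch below uses textbook coefficients on a toy objective; it is not the MASTER algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy objective standing in for a localization error."""
    return float(np.sum(x ** 2))

n, d = 20, 4
x = rng.uniform(-5, 5, (n, d))                 # particle positions
v = np.zeros((n, d))                           # particle velocities
pbest = x.copy()
pval = np.array([f(p) for p in pbest])
g = pbest[pval.argmin()]                       # global best

for _ in range(200):
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    # inertia + cognitive (toward personal best) + social (toward global best)
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
    x = x + v
    vals = np.array([f(p) for p in x])
    better = vals < pval
    pbest[better], pval[better] = x[better], vals[better]
    g = pbest[pval.argmin()]
```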
Work at the intersection of evolution strategies and multi-agent systems continues to accelerate, with future integration expected with deep learning, hierarchical evolution, and large-scale coordination. OpenAI's evolution strategies demonstrate that ES can serve as a scalable alternative to reinforcement learning, achieving comparable performance with significant benefits in code complexity and distributed scaling. The research trajectory suggests increasing hybridization, where adaptive algorithm selection mechanisms dynamically combine evolutionary exploration with gradient-based exploitation based on problem characteristics and optimization progress. As computational resources expand and algorithmic frameworks mature, evolution strategies will play increasingly central roles in advancing multi-agent optimization across scientific, engineering, and commercial applications.
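As a closing illustration, the OpenAI-style ES update referenced above is compact enough to state in full: perturb the parameters with Gaussian noise, weight each perturbation by its standardized return, and step along the weighted sum. The toy return function below is an illustrative stand-in for an episode rollout.

```python
import numpy as np

rng = np.random.default_rng(0)

def returns(theta):
    """Stand-in for an episodic return; higher is better."""
    return -float(np.sum((theta - 3.0) ** 2))

theta = np.zeros(10)
sigma, alpha, n = 0.1, 0.02, 50               # noise scale, step size, population size
for _ in range(300):
    eps = rng.normal(size=(n, theta.size))    # one perturbation direction per worker
    R = np.array([returns(theta + sigma * e) for e in eps])
    R = (R - R.mean()) / (R.std() + 1e-8)     # fitness standardization
    # ES gradient estimate: theta += alpha / (n * sigma) * sum_i R_i * eps_i
    theta += alpha / (n * sigma) * eps.T @ R
```

Because workers only need to exchange scalar returns and shared random seeds, this update distributes across many machines with very little communication, which is the source of the scaling benefits noted above.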