ENGINEERING

How we built a 200-agent swarm that converges on consensus in 47 seconds

June 15, 2026 · by Maya C. · 12 min read

1. Introduction

When we set out to build SyntheticPulse, our core thesis was simple: a sufficiently diverse panel of LLM-backed agents, each with a distinct personality matrix, could simulate a consumer focus group with high fidelity. What we didn't anticipate was how hard it would be to get 200 synthetic individuals to agree on anything. In the real world, human focus groups converge through social dynamics — persuasion, peer pressure, deference to expertise, and exhaustion. Synthetic agents have none of those instincts by default. They argue in circles. They dig into positions. They produce beautiful, well-reasoned, perfectly divergent opinions that are useless for anyone trying to make a product decision.

This post is the story of how we solved that problem. Over the course of six months, we built an opinion propagation layer that drives 200 agents to a statistically stable consensus in an average of 47 seconds of wall-clock time. The system uses structured debate rounds, a dynamic opinion propagation graph, vector similarity search for long-term memory, and a carefully calibrated personality diversity model. The result is a synthetic focus group that behaves — in aggregate — remarkably like a human panel, but at 100x the speed and a fraction of the cost.

Before diving into the architecture, it's worth stating the obvious: we are not trying to replace human judgment. We are trying to augment it with a rapid-iteration layer that lets product teams explore the opinion space before committing to expensive human research. Our benchmarks show that synthetic panels predict human panel outcomes within a 6.2% error margin on standardized brand perception surveys — but that convergence only works when the swarm itself reaches genuine internal consensus. A swarm that endlessly debates is worse than useless. It's noise.

This article covers the technical architecture of our consensus engine in detail. We'll walk through the opinion propagation problem, the graph-based debate protocol, the convergence detection mathematics, the vector memory layer, and the empirical results from our production cluster. If you're building multi-agent systems, LLM orchestration pipelines, or any kind of synthetic population simulation, the patterns here should be directly applicable.

2. The Opinion Propagation Problem

The naive approach to multi-agent consensus is a round-robin broadcast: every agent reads every other agent's opinion, updates its own position, and repeats until convergence. This is O(n²) in communication complexity per round. For 200 agents, that means 39,800 pairwise interactions per round. Each interaction requires an LLM call to evaluate the other agent's position, compare it against the agent's own personality matrix, and produce a revised opinion. At roughly 500ms per inference call (using our Mixtral 8x22B cluster), a single round would take 19,900 seconds — over five hours. Convergence typically requires 4–7 rounds. You can see where this is going.

We experimented with a broadcast-barrier pattern early in development. Each agent would emit its opinion to a shared channel, consume all other opinions, and then produce a revised opinion in the next round. This produced two problems. First, the latency was unacceptable even with aggressive batching — we measured 22 minutes for a single round with 150 agents. Second, and more critically, the opinions collapsed. When every agent reads every other agent, the minority positions get drowned out immediately. The swarm converges to a bland, milquetoast average that reflects no one's genuine position. This is the multi-agent equivalent of groupthink, and it destroys the very diversity we engineered into the personality matrix.

A third problem emerged with recency bias. In a round-robin where agents update sequentially, the last agent to speak in a round exerts disproportionate influence on the next round. We attempted random permutation of update order, but that just shifted the bias from one agent to another without solving the fundamental issue. The agents were effectively playing a game of "who talks last wins," which is not how human focus groups work.

We needed a fundamentally different approach. Instead of all-to-all broadcast, we needed a structured interaction topology that preserved minority opinions, limited communication complexity, and produced stable convergence within a bounded number of rounds. The solution was the dynamic opinion propagation graph, which we'll cover in the next section.

3. Graph-Based Debate Rounds

The core insight was to replace all-to-all broadcast with a dynamic directed graph where each agent only reads from a small subset of peers in each round. The graph topology evolves across rounds based on opinion divergence, agent personality traits, and convergence pressure. This reduces per-round communication from O(n²) to O(n · k), where k is the neighborhood size (typically 5–7). For 200 agents with k=6, that's 1,200 interactions per round instead of 39,800 — a 33x reduction.
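The arithmetic behind that reduction is easy to verify. The following is a minimal sketch, assuming every ordered pair of agents counts as one interaction and that calls run strictly one after another; the function names are illustrative and not taken from our codebase.

```python
def all_to_all_interactions(n: int) -> int:
    """Ordered pairs: every agent reads every other agent once per round."""
    return n * (n - 1)


def graph_interactions(n: int, k: int) -> int:
    """Each agent reads only its k graph neighbors per round."""
    return n * k


n, k = 200, 6
full = all_to_all_interactions(n)
sparse = graph_interactions(n, k)
reduction = full / sparse

# Unbatched cost of one all-to-all round at roughly 500 ms per LLM call.
sequential_seconds = full * 0.5

print(full, sparse, round(reduction, 1), sequential_seconds)
```

Because the second count grows with n · k rather than n², doubling the swarm doubles the per-round work instead of quadrupling it.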

The graph construction algorithm works as follows. At the start of each debate round, we compute pairwise opinion divergence between all agents using cosine distance on their opinion embedding vectors. We then construct a directed k-farthest-neighbor graph (the inverse of the usual k-nearest-neighbor construction), where each agent's outgoing edges point to the k agents whose opinions diverge most from its own. This is the critical design choice: agents are forced to confront viewpoints that differ from theirs, not reinforce their existing position. An agent with a strong opinion about a product feature is connected to agents who disagree, not to its ideological allies.

However, pure divergence-based selection has a problem: it can create disconnected cliques. If you connect every agent only to its ideological opposites, the graph bifurcates into two camps that never receive intermediate perspectives. To solve this, we inject a fraction of exploration edges — random connections weighted by a temperature parameter that decays across rounds. In early rounds, about 30% of edges are exploratory. By the final round, that drops to 5%. This ensures that the graph stays connected and that centrist or ambivalent agents serve as bridges between polarized clusters.
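One way to picture the construction described above is the sketch below. It rests on several assumptions that the text does not pin down: each agent has a fixed budget of k outgoing edges, a fraction of that budget is reserved for exploration, and exploration edges are drawn uniformly at random (the temperature weighting is omitted). The names build_graph and explore_fraction are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_graph(embeddings, k, explore_fraction, rng):
    """Return {agent: [neighbor, ...]} with exactly k outgoing edges per agent.

    Assumption: round(k * explore_fraction) edges are random bridges and the
    rest point at the most divergent peers.
    """
    n = len(embeddings)
    n_explore = round(k * explore_fraction)
    n_divergent = k - n_explore
    graph = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort peers so the most divergent opinions come first.
        others.sort(
            key=lambda j: cosine_distance(embeddings[i], embeddings[j]),
            reverse=True,
        )
        divergent = others[:n_divergent]
        # Exploration edges come from the peers not already selected.
        explore = rng.sample(others[n_divergent:], n_explore)
        graph[i] = divergent + explore
    return graph


rng = random.Random(0)
embeddings = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
graph = build_graph(embeddings, k=6, explore_fraction=0.3, rng=rng)
```

With k=6 and an exploration fraction of 0.3, each agent in this sketch gets four divergence-based edges and two random ones.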

Each round proceeds in three phases: broadcast, assimilation, and expression. In the broadcast phase, every agent publishes its current opinion vector to a shared KV store (Redis with vector extensions). In the assimilation phase, each agent retrieves the opinion vectors of its outgoing neighbors and processes them through its personality-weighted opinion update function. In the expression phase, each agent generates a new opinion via an LLM call conditioned on its assimilated context, its memory of previous rounds, and its personality matrix. The new opinion is embedded and stored, becoming the input for the next round's broadcast phase.

Here is the core algorithm in pseudocode:

function runDebate(agents, personality, memory, maxRounds, k, explorationTemperature, threshold):
  opinions = initializeRandomOpinions(agents)
  embeddings = embedOpinions(opinions)

  for round in 1..maxRounds:
    // Build directed opinion propagation graph
    graph = {}
    for agent in agents:
      peers = without(agents, agent)  // an agent never debates itself
      divergences = computeCosineDistances(embeddings[agent], embeddings, peers)
      neighbors = selectTopK(divergences, k, highest=true)  // k most divergent peers
      exploration = sampleRandom(peers, p=explorationTemperature[round])  // random bridge edges
      graph[agent] = union(neighbors, exploration)

    // Phase 1: broadcast (all agents write opinions)
    broadcastToKVStore(opinions)

    // Phase 2: assimilation (each agent reads neighbors)
    newOpinions = {}
    for agent in agents:
      neighborOpinions = readFromKVStore(graph[agent])

      // Personality-weighted context aggregation
      context = weightedFuse(
        opinions[agent],
        neighborOpinions,
        weights = computeAttentionWeights(
          personality[agent],
          neighborOpinions
        )
      )

      // Phase 3: expression (LLM generates updated opinion)
      prompt = buildDebatePrompt(
        agentProfile=personality[agent],
        currentPosition=opinions[agent],
        neighborContext=context,
        roundHistory=memory[agent]
      )
      // 0.7 is the LLM sampling temperature, unrelated to explorationTemperature
      newOpinions[agent] = llmGenerate(prompt, temperature=0.7)

    opinions = newOpinions
    // Re-embed so the convergence check sees this round's opinions, not the previous round's
    embeddings = embedOpinions(opinions)

    // Early exit if converged
    if computeGlobalDivergence(embeddings) < threshold:
      break

  return opinions

The temperature schedule is important. We use an exponential decay: temperature starts at 0.6 and decays by a factor of 0.7 per round. This means early rounds have high exploration and broad debate, while later rounds tighten into focused convergence. In practice, we find that 5–7 rounds are sufficient for 200 agents, with most of the consensus forming in rounds 3–5. The first two rounds are noisy as agents explore the opinion space and discover the landscape of disagreement.
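To make the decay concrete, here is a small sketch of the schedule. It assumes round 1 uses the starting value of 0.6 and that the 0.7 factor is applied once per subsequent round; the function name is ours for illustration.

```python
def exploration_temperature(round_index: int, start: float = 0.6, decay: float = 0.7) -> float:
    """Temperature for a 1-indexed round: start * decay ** (round_index - 1)."""
    return start * decay ** (round_index - 1)


schedule = [round(exploration_temperature(r), 3) for r in range(1, 8)]
print(schedule)
```

Under these assumptions the temperature falls from 0.6 in round 1 to roughly 0.07 by round 7, so most of the exploration budget is spent in the first three rounds.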

4. Convergence Measurement

Detecting when a swarm has reached consensus is surprisingly subtle. You cannot simply check if all agents agree — real human focus groups rarely reach unanimity. Instead, you need to detect when the distribution of opinions has stabilized. We define convergence as the point at which the between-round opinion divergence drops below a threshold and stays there for two consecutive rounds.

Our convergence metric is the mean pairwise cosine distance (MPCD) across all agents' opinion embeddings. At round 1, MPCD typically ranges from 0.6 to 0.8 (on a 0–1 scale, where 0 is identical and 1 is orthogonal). As debate progresses, MPCD declines. We declare convergence when MPCD falls below 0.25 and its round-over-round change (ΔMPCD) is less than 0.03. This threshold was calibrated against 347 human focus group transcripts from our validation dataset: we found that an MPCD of 0.25 corresponds to the level of agreement you'd see in a human panel that has reached natural consensus.
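The metric and the stopping rule above can be sketched in a few lines. This is an illustrative, pure-Python version that assumes embeddings are plain lists of floats; a production implementation would almost certainly be vectorized, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def mpcd(embeddings):
    """Mean pairwise cosine distance over all unordered agent pairs."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def has_converged(mpcd_history, level=0.25, max_delta=0.03):
    """True when the latest MPCD is below `level` and moved by less than
    `max_delta` since the previous round."""
    if len(mpcd_history) < 2:
        return False
    previous, current = mpcd_history[-2], mpcd_history[-1]
    return current < level and abs(current - previous) < max_delta


history = [0.70, 0.50, 0.30, 0.24, 0.22]
print(has_converged(history))
```

Checking the change as well as the level guards against stopping on a round where MPCD has just dipped under 0.25 but is still falling quickly.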

We also track a secondary metric called cluster entropy. Using HDBSCAN on the opinion embedding space, we count the number of distinct opinion clusters and compute the Shannon entropy of the cluster membership distribution. In early rounds, cluster entropy is high (3–5 distinct clusters, entropy > 1.5). At convergence, we typically see 1–2 clusters with entropy below 0.6. If entropy remains above 1.0 after 7 rounds, we flag the simulation as non-convergent and either extend rounds or surface a warning to the user. This happens in about 8% of simulations, usually when agent personality diversity is set to extreme values.
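Cluster entropy itself is simple to compute once cluster labels exist. The sketch below assumes the labels come from a clustering step such as HDBSCAN, that entropy is measured in bits (base 2), and that any noise label is treated as its own cluster; none of these choices is specified above.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy, in bits, of the cluster membership distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical label vectors for a 200-agent swarm.
early_round = [0] * 50 + [1] * 50 + [2] * 50 + [3] * 50
late_round = [0] * 180 + [1] * 20

print(cluster_entropy(early_round), round(cluster_entropy(late_round), 3))
```

In this example, four equally sized clusters give 2.0 bits, while a 180/20 split gives about 0.47 bits, which would fall under the 0.6 level described above.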

An important subtlety: convergence does not mean the swarm has found the "correct" answer. It means the swarm has reached a stable distribution that reflects the collective influence of all personality types, memories, and argumentative interactions. We validate correctness separately through our human-panel benchmarking pipeline. The convergence detector's job is purely to tell the orchestration layer when it's safe to stop running inference and return results.

5. Vector Memory Architecture

Each agent in the swarm maintains a long-term memory that persists across debate rounds and even across separate simulation runs. This memory is stored as a set of vector embeddings in a pgvector PostgreSQL instance, indexed using a Hierarchical Navigable Small World (HNSW) index. For a swarm of 200 agents running 7 debate rounds with 3 memory retrievals per round, that's 4,200 vector queries per simulation. Each query must complete in under 15ms to keep total simulation time below our 60-second target.

The memory architecture has three tiers. Episodic memory stores the agent's own opinions and rationale from previous rounds, indexed by round number and opinion embedding. Social memory stores summaries of interactions with other agents, keyed by the neighbor agent's ID and the interaction timestamp. Semantic memory stores facts and beliefs that the agent has adopted as permanent parts of its worldview, updated only when an opinion survives multiple rounds without revision. This tiered structure borrows the episodic/semantic distinction from cognitive psychology (Tulving's taxonomy of long-term memory) and gives agents a realistic forgetting curve: episodic memories decay after 3–4 rounds unless consolidated into semantic memory.

We chose pgvector over specialized vector databases like Pinecone or Weaviate for two reasons. First, we already use PostgreSQL for agent state persistence, so adding the pgvector extension avoided an additional infrastructure dependency. Second, pgvector's HNSW implementation with an IVFFlat fallback gives us sub-10ms query latency at our scale (roughly 10,000 vectors per agent database, 2 million total), which is well within our requirements. We use 768-dimensional embeddings from the intfloat/e5-mistral-7b-instruct model, which strikes a good balance between semantic fidelity and query performance.

Retrieval is a three-stage pipeline: candidate retrieval via HNSW approximate nearest neighbor (top-50), reranking via exact cosine similarity (top-10), and filtering based on recency and relevance metadata. The over-fetch-and-rerank step is critical because HNSW is approximate: its recall at 50 candidates is about 0.92, but we need recall above 0.99 on the final top 10 for memory coherence. Re-scoring the 50 candidates with exact cosine similarity adds about 3ms and closes the recall gap. The filtering stage removes memories older than 5 rounds and those with relevance scores below 0.4.
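The second and third stages of that pipeline can be sketched without a database. The example below assumes the HNSW stage has already returned a candidate list; the Memory record and its field names are hypothetical stand-ins for whatever the real schema stores.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    embedding: list
    round_stored: int
    relevance: float


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rerank_and_filter(query, candidates, current_round,
                      top_n=10, max_age=5, min_relevance=0.4):
    """Exact rerank of ANN candidates, then recency and relevance filtering."""
    # Stage 2: re-score every candidate exactly and keep the best top_n.
    ranked = sorted(
        candidates,
        key=lambda m: cosine_similarity(query, m.embedding),
        reverse=True,
    )[:top_n]
    # Stage 3: drop memories that are too old or not relevant enough.
    return [
        m for m in ranked
        if current_round - m.round_stored <= max_age
        and m.relevance >= min_relevance
    ]


candidates = [
    Memory([1.0, 0.0], round_stored=6, relevance=0.9),
    Memory([0.9, 0.1], round_stored=1, relevance=0.9),   # too old at round 7
    Memory([0.0, 1.0], round_stored=6, relevance=0.9),   # least similar
    Memory([0.8, 0.2], round_stored=6, relevance=0.2),   # low relevance
]
result = rerank_and_filter([1.0, 0.0], candidates, current_round=7, top_n=3)
```

Filtering after the rerank, as described above, means the final list can hold fewer than top_n memories; filtering first would be an equally plausible ordering with different trade-offs.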

Latency breakdown: HNSW search averages 4.2ms, exact reranking averages 3.1ms, metadata filtering averages 0.8ms, and network round-trip to the database adds about 1.5ms. Total: ~9.6ms per retrieval, well under the 15ms budget. We run the database on a dedicated r6g.2xlarge instance with 64GB of RAM, which keeps the entire HNSW graph in memory.
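As a quick check on the budget, the stage latencies above can be summed directly. The query count assumes the worst case of 200 agents, 7 rounds, and 3 retrievals per round; the variable names are illustrative.

```python
STAGE_LATENCY_MS = {
    "hnsw_search": 4.2,
    "exact_rerank": 3.1,
    "metadata_filter": 0.8,
    "network_round_trip": 1.5,
}
BUDGET_MS = 15.0

per_query_ms = sum(STAGE_LATENCY_MS.values())
headroom_ms = BUDGET_MS - per_query_ms

# 200 agents x 7 rounds x 3 retrievals per round
queries_per_simulation = 200 * 7 * 3
back_to_back_seconds = queries_per_simulation * per_query_ms / 1000

print(f"{per_query_ms:.1f} ms per query, {headroom_ms:.1f} ms headroom")
print(f"{queries_per_simulation} queries, {back_to_back_seconds:.1f} s if run back to back")
```

Run strictly one after another, the queries would take roughly 40 seconds, so a sub-minute simulation presumably relies on issuing retrievals concurrently; that is an inference from the numbers rather than a detail stated above.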

6. Personality Diversity Effects

The personality matrix is the secret sauce of the entire system. Each agent is defined by a vector of five OCEAN-style traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) plus three synthetic traits we added: Stubbornness (resistance to opinion change), Social Sensitivity (weight placed on peer opinions vs. internal beliefs), and Argumentativeness (tendency to produce counterarguments rather than concessions). These eight dimensions are sampled from a correlated multivariate distribution calibrated against real human personality data from the SAPA-Project dataset.

Stubbornness is the single most influential parameter for convergence time. Agents with high stubbornness (> 0.7 on a 0–1 scale) require 2.3x more debate rounds to reach convergence than agents with low stubbornness (< 0.3). We deliberately sample 15–20% of the swarm with high stubbornness to simulate the real-world phenomenon that some people simply don't change their minds. However, if stubbornness exceeds 0.85, the agent becomes essentially immutable, which can prevent convergence entirely. Our personality sampler therefore caps stubbornness at 0.85, compressing the top end with a logistic (sigmoid) curve rather than a hard cutoff so that values don't pile up at the ceiling.

Social sensitivity controls how much weight an agent assigns to peer opinions versus its own prior beliefs during the assimilation phase. Low social sensitivity (< 0.3) produces agents that are essentially deaf to debate — they read their neighbors' opinions but barely adjust their own position. High social sensitivity (> 0.7) produces agents that flip positions rapidly, sometimes oscillating between rounds. We found that a beta distribution with α = 2.5 and β = 4 produces the most realistic convergence behavior, with a mean sensitivity around 0.38 and a long tail of highly sensitive agents.
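The quoted mean follows from the beta distribution's formula, α / (α + β), and is easy to confirm by sampling. This sketch draws social sensitivity independently with Python's standard library, which is a simplification: the full sampler described earlier draws all eight traits from a correlated distribution. The function name is hypothetical.

```python
import random

ALPHA, BETA = 2.5, 4.0


def sample_social_sensitivity(n, seed=0):
    """Draw n independent social-sensitivity values from Beta(ALPHA, BETA)."""
    rng = random.Random(seed)
    return [rng.betavariate(ALPHA, BETA) for _ in range(n)]


analytic_mean = ALPHA / (ALPHA + BETA)
samples = sample_social_sensitivity(10_000)
empirical_mean = sum(samples) / len(samples)
share_highly_sensitive = sum(s > 0.7 for s in samples) / len(samples)

print(round(analytic_mean, 3), round(empirical_mean, 3), share_highly_sensitive)
```

The analytic mean is 2.5 / 6.5, or about 0.385, and the share of samples above 0.7 gives a rough sense of how thin the highly sensitive tail is.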

Openness interacts with stubbornness in a nonlinear way. An agent with high openness and low stubbornness is an explorer: it tries on different positions across rounds and often serves as a bridge between polarized clusters. An agent with low openness and high stubbornness is a pillar: it anchors one end of the opinion space and provides stability. The most interesting dynamics emerge when explorers and pillars are balanced. In our default configuration, we use 25% explorers, 25% pillars, and 50% a mix of centrists and followers. This produces convergence in 5.3 rounds on average, compared to 7.8 rounds for a uniform personality distribution.

7. Adversarial Agents

One of our most surprising discoveries came during a debugging session in November 2025. We noticed that simulations with high initial agreement (MPCD below 0.4 at round 1) were converging to low-quality consensus — the agents were agreeing, but agreeing on superficial or obviously flawed positions. It was groupthink, exactly the problem we'd seen with the all-to-all broadcast pattern, but now it was emerging from homophily in the initial opinion distribution rather than from the communication topology.

The fix was to introduce adversarial agents: synthetic devil's advocates whose explicit purpose is to challenge the emerging consensus. These agents are not part of the consumer simulation — they are meta-agents injected by the orchestration layer with a personality profile that maximizes argumentativeness, minimizes agreeableness, and gives them access to a bank of counterarguments drawn from semantic memory. Adversarial agents are connected to the center of the opinion propagation graph (highest-degree nodes), ensuring that their influence spreads through the swarm.

The improvement was dramatic. In a controlled A/B test across 200 simulations, the presence of 3–5 adversarial agents (1.5–2.5% of swarm size) improved consensus quality by 23% as measured by alignment with held-out human panel data. The adversarial agents force the swarm to defend its positions, surfacing reasoning gaps that would otherwise go unchallenged. The effect is strongest in the first three rounds, after which the adversarial agents gradually converge themselves (their argumentativeness decays by 50% per round once the swarm has demonstrated coherent reasoning).

We also experimented with adversarial agent count. Too few (1–2) had negligible effect. Too many (> 8) destabilized the swarm and increased convergence time by 40%. The optimal count scales with the square root of swarm size: for n agents, we use floor(sqrt(n) / 3) adversarial agents. For 200 agents, that's floor(sqrt(200) / 3) = floor(14.14 / 3) = 4 adversarial agents. This formula generalizes well across our test range of 50–500 agents.
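The sizing rule above translates directly into code. A minimal sketch, with a function name chosen for illustration:

```python
import math


def adversarial_agent_count(swarm_size: int) -> int:
    """floor(sqrt(n) / 3), the sizing rule quoted above."""
    return math.floor(math.sqrt(swarm_size) / 3)


for n in (50, 100, 200, 500):
    print(n, adversarial_agent_count(n))
```

Across the 50 to 500 agent test range, the rule yields between 2 and 7 adversarial agents, which stays below the count of 8 where destabilization was observed.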

8. Performance Benchmarks

We measure simulation performance along three axes: wall-clock time to convergence, inference cost per simulation, and scaling behavior with swarm size. All benchmarks were run on our production inference cluster, which consists of 8 nodes each with 4x NVIDIA A100 80GB GPUs, connected via NVLink and 200 Gbps InfiniBand. The LLM serving layer uses vLLM with continuous batching and tensor parallelism across 2 GPUs per model replica.

For a 200-agent swarm with default personality configuration, the mean wall-clock time to convergence across 1,000 simulations is 47.2 seconds (σ = 8.4s). The breakdown: 22.1s for LLM inference (5.3 rounds at ~140ms per opinion generation, batched in groups of 16), 6.3s for embedding generation, 3.8s for opinion propagation graph construction, 8.2s for vector memory retrieval, 4.5s for convergence detection computation, and 2.3s of orchestration overhead. Embedding generation is a bottleneck that we're actively working to reduce with quantization and ONNX runtime optimization.

Scaling is roughly linear in swarm size up to 500 agents, with a superlinear inflection point around 600 agents, where the HNSW index starts to degrade and graph construction becomes noticeable: the divergence matrix alone is O(n²), and sorting each agent's row to select neighbors brings the step to O(n² log n). At 200 agents, the cost per simulation is approximately $0.14 in inference compute (at our internal rate of $0.85 per million tokens for the Mixtral 8x22B cluster). This compares favorably to a human focus group, which costs $3,000–$8,000 and takes 2–3 weeks from recruitment to deliverables. Even at 1,000 agents, the projected cost is $0.72 per simulation, still absurdly cheap compared to human research.

Here are the benchmark results for different swarm sizes:

Swarm size (agents)   Mean time to convergence   Mean rounds   Inference cost per simulation
50                    14.3s                      4.1           $0.04
100                   26.8s                      4.7           $0.08
200                   47.2s                      5.3           $0.14
300                   71.5s                      5.8           $0.23
400                   98.1s                      6.2           $0.33
500                   129.4s                     6.6           $0.45

The round count increases slowly with swarm size because the opinion propagation graph maintains constant neighborhood size (k=6), so per-agent communication complexity is flat. The superlinear time increase at larger sizes is driven primarily by the graph construction step (O(n²) divergence matrix computation) and the HNSW index rebuild, which we trigger after each round. We're exploring approximate divergence computation using random projection trees to address this.
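Dividing the benchmark times by swarm size makes the scaling shape easier to see. The figures below are copied from the table above; the rest is a simple illustration with names of our choosing.

```python
# (swarm size, seconds to convergence, mean rounds, cost in USD)
BENCHMARKS = [
    (50, 14.3, 4.1, 0.04),
    (100, 26.8, 4.7, 0.08),
    (200, 47.2, 5.3, 0.14),
    (300, 71.5, 5.8, 0.23),
    (400, 98.1, 6.2, 0.33),
    (500, 129.4, 6.6, 0.45),
]

seconds_per_agent = {n: seconds / n for n, seconds, _rounds, _cost in BENCHMARKS}
most_efficient_size = min(seconds_per_agent, key=seconds_per_agent.get)

for n, value in seconds_per_agent.items():
    print(f"{n:>4} agents: {value:.3f} s per agent")
print("lowest per-agent time at", most_efficient_size, "agents")
```

Per-agent time falls until about 200 agents, as fixed overheads are amortized, and then climbs again, which is what the quadratic graph construction step would predict.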

9. What's Next

The current system is in production for our beta customers, but we have a roadmap of improvements that will dramatically expand its capabilities. The highest-priority project is dynamic swarm sizing, where the orchestration layer automatically selects the optimal number of agents for a given research question. A simple brand awareness survey might only need 30–50 agents, while a complex conjoint analysis for a new product category might benefit from 500+. We're training a meta-predictor that estimates the required swarm size from the question embedding and the desired confidence interval width.

GPU acceleration of the opinion propagation graph is another major focus. Our current implementation builds the divergence matrix on CPU using NumPy, which takes 3.8s for 200 agents. A custom CUDA kernel that computes pairwise cosine distances on-GPU would reduce this to under 100ms. We've prototyped this with CuPy and seen 40x speedups at 500-agent scale. The challenge is integrating it into our Python orchestration layer without introducing GPU memory contention with the LLM serving processes.

The most ambitious item on our roadmap is the 1,000-agent tier. At 1,000 agents, the current architecture would take roughly 8–10 minutes to converge — too slow for interactive use. We're designing a hierarchical consensus architecture where 1,000 agents are partitioned into 10 sub-swarms of 100, each converging independently, and then a meta-debate round reconciles the sub-swarm consensus vectors. This is analogous to representative democracy, and our simulations suggest it could reduce time-to-convergence by 4x compared to a flat 1,000-agent graph. We're targeting a Q4 2026 release for the 1,000-agent tier.
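The control flow of that hierarchical design can be sketched as follows. Everything here is a stand-in: sub-swarm debate is replaced by a caller-supplied function, the meta-debate round is replaced by a plain centroid, and agents are dealt into sub-swarms round-robin. None of these choices reflects a finalized design.

```python
def partition(agent_ids, n_subswarms):
    """Deal agents round-robin into n_subswarms groups of near-equal size."""
    return [agent_ids[i::n_subswarms] for i in range(n_subswarms)]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def hierarchical_consensus(opinions, n_subswarms, run_subswarm):
    """Converge each sub-swarm independently, then reconcile the results.

    `run_subswarm` maps a list of opinion vectors to one consensus vector;
    in a real system it would run the full debate protocol.
    """
    groups = partition(sorted(opinions), n_subswarms)
    sub_consensus = [run_subswarm([opinions[a] for a in group]) for group in groups]
    # Placeholder for the meta-debate round: average the sub-swarm vectors.
    return centroid(sub_consensus)


# 1,000 toy agents with two-dimensional opinions.
opinions = {i: [float(i % 2), 1.0] for i in range(1000)}
groups = partition(sorted(opinions), 10)
result = hierarchical_consensus(opinions, 10, run_subswarm=centroid)
```

Because each sub-swarm holds 100 agents, the quadratic graph construction cost applies to ten small graphs rather than one large one, which is where most of the expected speedup would come from.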

10. Conclusion

Building a 200-agent synthetic swarm that converges on consensus in 47 seconds required us to unlearn everything we thought we knew about multi-agent systems. The naive approach — more communication, more data, more rounds — was exactly wrong. The solution was to constrain communication through a carefully designed opinion propagation graph, to force agents to confront disagreement rather than seek agreement, and to layer in adversarial voices that prevent premature consensus. The result is a system that doesn't just aggregate opinions; it simulates the messy, nonlinear, surprisingly rational process of human social deliberation.

The technical details matter — the HNSW index parameters, the temperature decay schedule, the stubbornness clipping function — but the architectural lesson is broader. Synthetic intelligence systems work best when they reflect the constraints and dynamics of their real-world counterparts. Human focus groups converge because people are forced to listen to people they disagree with, because stubborn people exist alongside open-minded people, and because someone in the room is willing to play devil's advocate. Our swarm converges for exactly the same reasons, just faster and at a fraction of the cost.

If you're building a multi-agent system, we encourage you to think carefully about your interaction topology. The graph structure is not an implementation detail — it is the system. Choose it poorly and your agents will argue forever. Choose it well and they'll converge on insights that neither human researchers nor single-agent LLM systems could produce on their own. That's the promise of synthetic intelligence, and we're just getting started.