Reactive Agents

Performance

Understanding how Reactive Agents automatically optimizes performance using Thompson Sampling and cluster-based specialization

Core Concepts: Performance

Automatic Optimization

Reactive Agents automatically optimizes performance through Thompson Sampling and K-Means++ clustering. The system learns which configurations perform best and allocates traffic accordingly—no manual tuning required.

How It Works

Thompson Sampling: Explores all configurations initially, then gradually shifts traffic toward better performers by sampling from each configuration's learned reward distribution.
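The selection step can be sketched with a minimal Beta-Bernoulli Thompson sampler. This is an illustrative implementation, not the product's internal code; the class and configuration names are assumptions.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over candidate configurations."""

    def __init__(self, configs):
        # Beta(1, 1) prior: every configuration starts equally likely.
        self.stats = {c: [1.0, 1.0] for c in configs}

    def select(self):
        # Sample a plausible reward from each posterior; pick the best draw.
        draws = {c: random.betavariate(a, b) for c, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, config, reward):
        # A reward in [0, 1] nudges the posterior toward the observed score.
        a, b = self.stats[config]
        self.stats[config] = [a + reward, b + (1.0 - reward)]

sampler = ThompsonSampler(["config_a", "config_b"])
chosen = sampler.select()           # early on, either may be chosen
sampler.update(chosen, reward=0.8)  # evaluation score feeds back in
```

Because selection samples from the posterior rather than always taking the current best mean, weaker configurations keep getting occasional traffic until the evidence against them is strong.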

K-Means++ Clustering: Groups similar requests together so each cluster can discover its own optimal configuration.
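The "++" in K-Means++ refers to how the initial cluster centers are seeded: far apart from each other, so distinct groups of requests each get their own representative. A self-contained sketch of that seeding step, with illustrative 2-D points standing in for request embeddings:

```python
import random

def kmeans_pp_centers(points, k, rng=None):
    """K-Means++ seeding: choose k initial centers that are spread out,
    so each group of similar requests gets its own representative."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centers)
              for p in points]
        # Sample the next center with probability proportional to d^2,
        # which strongly favors points far from the existing centers.
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

# Two well-separated request groups (e.g. 2-D embeddings).
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.0, 10.1)]
centers = kmeans_pp_centers(points, k=2)
```

Standard k-means iterations then refine these seeds; the sketch covers only the initialization that gives the algorithm its name.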

Reward-Based Learning: Performance scores from evaluations update each configuration's statistics, so better-performing partitions are selected more and more often over time.
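Concretely, under the Beta-Bernoulli model sketched above, each evaluation score shifts a configuration's posterior mean toward the observed rewards. The numbers below are illustrative:

```python
# A Beta(alpha, beta) posterior summarizes a configuration's reward history.
alpha, beta = 1.0, 1.0          # uninformative prior: posterior mean 0.5
for score in [0.9, 0.8, 1.0]:   # evaluation scores in [0, 1]
    alpha += score              # fractional "successes"
    beta += 1.0 - score         # fractional "failures"
posterior_mean = alpha / (alpha + beta)
# Three strong scores pull the expected reward from 0.50 up to 0.74.
```

A configuration that keeps scoring well accumulates a high posterior mean and a narrow posterior, so the sampler draws it more often; a poorly scoring one drifts down and is drawn less.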

Key Factors

Temperature & Reasoning: Low = deterministic/fast, Medium = balanced, High = varied/slower

System Prompts: Multiple prompts tested to find optimal phrasing

Model Selection: System learns which models work best for each request type
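Taken together, these factors define the space of candidate configurations the system searches over. A hypothetical sketch, assuming a simple cross-product of the three factors; the field names and values are illustrative, not the product's actual configuration schema:

```python
from itertools import product

# Hypothetical partition space: temperature presets, prompt variants, models.
space = {
    "temperature": [0.2, 0.7, 1.1],            # low / medium / high
    "system_prompt": ["concise", "thorough"],  # candidate phrasings
    "model": ["model-small", "model-large"],   # candidate models
}
partitions = [dict(zip(space, combo)) for combo in product(*space.values())]
# 3 temperatures x 2 prompts x 2 models = 12 candidate partitions
```

Each cluster of requests runs its own bandit over these partitions, which is why adding factors multiplies the search space and lengthens convergence.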

Evaluation Methods

The evaluation methods you enable determine what gets optimized:

  • Enable the evaluation methods that match your priorities to optimize for balanced quality across those aspects

Convergence Timeline

  • 0-50 requests: Exploration phase
  • 50-200 requests: Convergence phase
  • 200+ requests: Stable phase

Recommendation: Start with 2-3 clusters for most use cases
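The phases above can be seen in a small simulation of the Beta-Bernoulli sampler: early rounds spread traffic across configurations, later rounds concentrate on the stronger one. The reward values and round counts are illustrative, not product guarantees:

```python
import random

random.seed(1)
stats = {"a": [1.0, 1.0], "b": [1.0, 1.0]}  # Beta priors per configuration
true_reward = {"a": 0.8, "b": 0.4}          # hidden quality of each config
picks_early, picks_late = [], []
for t in range(300):
    # Thompson step: sample each posterior, pick the best draw.
    draws = {c: random.betavariate(a, b) for c, (a, b) in stats.items()}
    chosen = max(draws, key=draws.get)
    # Simulated evaluation: a Bernoulli reward at the config's true rate.
    r = 1.0 if random.random() < true_reward[chosen] else 0.0
    stats[chosen][0] += r
    stats[chosen][1] += 1.0 - r
    (picks_early if t < 50 else picks_late).append(chosen)
# Early rounds explore both configurations; later rounds favor "a".
```

How quickly the late phase concentrates depends on the gap between configurations, which is why simple tasks with clear winners converge faster than complex ones.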

Best Practices

Setup:

  • Start conservative (2-3 clusters, 2 models, 2 prompts)
  • Enable evaluations matching your priorities
  • Allow 100+ requests before judging performance

Expected Results:

  • Simple tasks: Converge in 50-100 requests
  • Complex tasks: Converge in 100-200 requests
  • Typical improvement: 10-30% after convergence