Core Concepts: Partitions

Learn how the Reactive Agents partition system automatically optimizes your skills using Thompson Sampling and K-Means++ clustering.

What Is a Partition?

A partition in Reactive Agents is a specific configuration of hyperparameters (temperature, reasoning level, model, system prompt) optimized for a particular type of user request. The system uses Thompson Sampling (a multi-armed bandit algorithm) and K-Means++ clustering to automatically discover and select the best-performing configurations.
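
Concretely, a partition can be pictured as a small record of settings plus the bandit statistics used to score it. A minimal sketch, with illustrative field names rather than the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Partition:
    """One candidate configuration (illustrative fields, not the real schema)."""
    model: str              # e.g. "gpt-4" or "claude"
    system_prompt: str      # one of the generated prompt variations
    temperature: float      # drawn from one of the three base ranges
    reasoning_level: float  # 0, 0.5, or 1
    alpha: float = 1.0      # Thompson Sampling: observed successes + 1
    beta: float = 1.0       # Thompson Sampling: observed failures + 1
```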

How It Works

When you create a skill, Reactive Agents automatically (see the sketch after this list):

  1. Creates Multiple Partitions: Generates combinations of models, prompts, and hyperparameters
  2. Groups Requests: Clusters similar requests using semantic similarity
  3. Tests Simultaneously: Routes requests across partitions to measure performance
  4. Learns Over Time: Thompson Sampling selects better-performing partitions
  5. Adapts: Re-clusters periodically to discover new request types
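
As a rough sketch of this loop, building on the Partition record above (handle_request, call_model, and evaluate are illustrative names, not the actual API):

```python
import random

def handle_request(request, partitions_by_cluster, clusterer, embed,
                   call_model, evaluate):
    """Illustrative lifecycle of one request; names are assumptions."""
    # Step 2 - Group: map the request to its cluster of similar requests
    cluster_id = int(clusterer.predict([embed(request)])[0])
    candidates = partitions_by_cluster[cluster_id]

    # Steps 3-4 - Test & learn: draw one sample per partition from its
    # Beta posterior and route the request to the highest draw
    chosen = max(candidates, key=lambda p: random.betavariate(p.alpha, p.beta))

    response = call_model(chosen, request)  # injected model-call function
    reward = evaluate(response)             # injected scorer, value in [0, 1]

    # Fold the observed reward back into the chosen partition's statistics
    chosen.alpha += reward
    chosen.beta += 1.0 - reward
    return response
```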

Thompson Sampling

Thompson Sampling balances exploration (trying new configurations) against exploitation (using proven ones): each partition's success rate is modeled as a Beta distribution updated from observed rewards, and each request is routed to the partition with the highest sampled value.
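
The mechanic is easy to demonstrate in isolation. A minimal, self-contained simulation with two hypothetical partitions shows traffic shifting toward the better one:

```python
import random

# Two hypothetical partitions with true success rates unknown to the system.
stats = {"A": [1.0, 1.0], "B": [1.0, 1.0]}  # [alpha, beta] priors: Beta(1, 1)
true_rate = {"A": 0.8, "B": 0.5}

for _ in range(500):
    # Sample each partition's Beta posterior; route to the highest sample
    arm = max(stats, key=lambda a: random.betavariate(*stats[a]))
    reward = 1.0 if random.random() < true_rate[arm] else 0.0
    stats[arm][0] += reward        # alpha accumulates successes
    stats[arm][1] += 1.0 - reward  # beta accumulates failures

# Pull counts: partition "A" ends up serving most of the traffic
print({arm: int(a + b - 2) for arm, (a, b) in stats.items()})
```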

K-Means++ Clustering

K-Means++ groups requests by semantic similarity, allowing specialized partitions for different request types. The system re-clusters periodically to adapt to changing request patterns.
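
A sketch of the grouping step using scikit-learn's k-means++ seeding; it assumes requests have already been turned into semantic embedding vectors (the embedding model is not shown, and this is not necessarily the library's internal implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_requests(embeddings: np.ndarray, n_clusters: int = 3):
    """Group request embeddings using K-Means with k-means++ seeding.

    `embeddings` is an (n_requests, dim) array of semantic vectors.
    """
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10)
    labels = km.fit_predict(embeddings)
    return km, labels  # later: km.predict(new_vectors) assigns new requests
```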

Configuration Space

The system explores 9 base partitions covering different temperature and reasoning level combinations:

Temperature Range    Reasoning Levels
0 - 0.33             0, 0.5, 1
0.34 - 0.66          0, 0.5, 1
0.67 - 1             0, 0.5, 1

Each base partition is then combined with every request cluster, allowed model, and generated system prompt, which gives the multiplication below.

Total Partitions Formula:

Total = configuration_count × allowed_models × system_prompt_count × 9

Example:

  • 3 clusters × 2 models × 2 prompts × 9 base = 108 partitions
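
The arithmetic is straightforward to verify; a small sketch (pure Python, with names chosen to mirror the formula above) that enumerates the base grid and applies the multiplication:

```python
from itertools import product

# The 9 base partitions: every temperature range x reasoning level pair
temperature_ranges = [(0.0, 0.33), (0.34, 0.66), (0.67, 1.0)]
reasoning_levels = [0, 0.5, 1]
base_partitions = list(product(temperature_ranges, reasoning_levels))
assert len(base_partitions) == 9

def total_partitions(configuration_count, allowed_models, system_prompt_count):
    """Size of the search space Thompson Sampling explores."""
    return (configuration_count * allowed_models
            * system_prompt_count * len(base_partitions))

print(total_partitions(3, 2, 2))  # 3 x 2 x 2 x 9 = 108
```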

Configuration Parameters

  • Configuration Count: Number of request clusters (default: 2-3)
  • Allowed Models: AI models to test (e.g., GPT-4, Claude)
  • System Prompt Count: Number of prompt variations to generate
  • Evaluation Methods: How success is measured (accuracy, latency, etc.)
  • Clustering Interval: How often to re-analyze request patterns
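
Taken together, a skill's partition settings might look something like this hypothetical sketch (field names and units are assumptions, not the actual configuration schema):

```python
# Hypothetical configuration sketch; all names and units are illustrative.
partition_config = {
    "configuration_count": 3,                       # number of request clusters
    "allowed_models": ["gpt-4", "claude"],          # models to test
    "system_prompt_count": 2,                       # prompt variations to generate
    "evaluation_methods": ["accuracy", "latency"],  # what "success" means
    "clustering_interval": 1000,                    # re-cluster cadence (assumed: requests)
}
```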

Evaluation and Rewards

The enabled evaluation methods determine what the system optimizes for. When multiple methods are enabled, their scores are averaged into a single reward signal that updates the Thompson Sampling statistics.

Example: if accuracy = 0.90, latency = 0.75, and cost = 0.60, the reward is their average, 0.75, so the system optimizes for all three objectives at once.
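
A minimal sketch of the averaging step, reproducing the example numbers:

```python
def reward_from_evaluations(scores):
    """Average the enabled evaluation scores into one reward in [0, 1]."""
    return sum(scores.values()) / len(scores)

reward = reward_from_evaluations({"accuracy": 0.90, "latency": 0.75, "cost": 0.60})
print(round(reward, 2))  # 0.75 -- this single value updates the Beta statistics
```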

Best Practices

  • Detailed Skill Descriptions: Help generate better system prompts
  • Start Conservative: Begin with 2-3 clusters unless you have diverse request types
  • Allow Learning Time: Run 100+ requests before evaluating performance
  • Monitor Progress: Watch better partitions receive more traffic over time
  • Convergence: Most systems converge within 50-200 requests per cluster

Benefits

  • Automatic Optimization: No manual hyperparameter tuning required
  • Context-Aware: Different partitions for different request types
  • Self-Improving: Performance improves continuously with more data
  • Balanced Learning: Explores new options while exploiting known good ones