Core Concepts: Partitions

Learn how the Reactive Agents partition system automatically optimizes your skills using Thompson Sampling and K-Means++ clustering.

What Is a Partition?

A partition in Reactive Agents is a specific configuration of hyperparameters (temperature, reasoning level, model, system prompt) optimized for a particular type of user request. The system uses Thompson Sampling (a multi-armed bandit algorithm) and K-Means++ clustering to automatically discover and select the best-performing configurations.
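
Concretely, a partition can be pictured as a small record of settings plus the bandit statistics used to score it. A minimal sketch, with illustrative field names rather than the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Partition:
    """One candidate configuration (illustrative fields, not the real schema)."""
    model: str              # e.g. "gpt-4" or "claude"
    system_prompt: str      # one of the generated prompt variations
    temperature: float      # drawn from one of the three base ranges
    reasoning_level: float  # 0, 0.5, or 1
    alpha: float = 1.0      # Thompson Sampling: observed successes + 1
    beta: float = 1.0       # Thompson Sampling: observed failures + 1
```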

How It Works

When you create a skill, Reactive Agents automatically (see the sketch after this list):

  1. Creates Multiple Partitions: Generates combinations of models, prompts, and hyperparameters
  2. Groups Requests: Clusters similar requests using semantic similarity
  3. Tests Simultaneously: Routes requests across partitions to measure performance
  4. Learns Over Time: Thompson Sampling selects better-performing partitions
  5. Adapts: Re-clusters periodically to discover new request types
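
As a rough sketch of this loop, building on the Partition record above (handle_request, call_model, and evaluate are illustrative names, not the actual API):

```python
import random

def handle_request(request, partitions_by_cluster, clusterer, embed,
                   call_model, evaluate):
    """Illustrative lifecycle of one request; names are assumptions."""
    # Step 2 - Group: map the request to its cluster of similar requests
    cluster_id = int(clusterer.predict([embed(request)])[0])
    candidates = partitions_by_cluster[cluster_id]

    # Steps 3-4 - Test & learn: draw one sample per partition from its
    # Beta posterior and route the request to the highest draw
    chosen = max(candidates, key=lambda p: random.betavariate(p.alpha, p.beta))

    response = call_model(chosen, request)  # injected model-call function
    reward = evaluate(response)             # injected scorer, value in [0, 1]

    # Fold the observed reward back into the chosen partition's statistics
    chosen.alpha += reward
    chosen.beta += 1.0 - reward
    return response
```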

Thompson Sampling

Thompson Sampling balances exploration (trying new configurations) against exploitation (using proven ones): each partition's success rate is modeled as a Beta distribution updated from observed rewards, and each request is routed to the partition with the highest sampled value.
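
The mechanic is easy to demonstrate in isolation. A minimal, self-contained simulation with two hypothetical partitions shows traffic shifting toward the better one:

```python
import random

# Two hypothetical partitions with true success rates unknown to the system.
stats = {"A": [1.0, 1.0], "B": [1.0, 1.0]}  # [alpha, beta] priors: Beta(1, 1)
true_rate = {"A": 0.8, "B": 0.5}

for _ in range(500):
    # Sample each partition's Beta posterior; route to the highest sample
    arm = max(stats, key=lambda a: random.betavariate(*stats[a]))
    reward = 1.0 if random.random() < true_rate[arm] else 0.0
    stats[arm][0] += reward        # alpha accumulates successes
    stats[arm][1] += 1.0 - reward  # beta accumulates failures

# Pull counts: partition "A" ends up serving most of the traffic
print({arm: int(a + b - 2) for arm, (a, b) in stats.items()})
```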

K-Means++ Clustering

K-Means++ groups requests by semantic similarity, allowing specialized partitions for different request types. The system re-clusters periodically to adapt to changing request patterns.
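
A sketch of the grouping step using scikit-learn's k-means++ seeding; it assumes requests have already been turned into semantic embedding vectors (the embedding model is not shown, and this is not necessarily the library's internal implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_requests(embeddings: np.ndarray, n_clusters: int = 3):
    """Group request embeddings using K-Means with k-means++ seeding.

    `embeddings` is an (n_requests, dim) array of semantic vectors.
    """
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10)
    labels = km.fit_predict(embeddings)
    return km, labels  # later: km.predict(new_vectors) assigns new requests
```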

Configuration Space

The system explores 9 base partitions covering different temperature and reasoning level combinations:

Temperature Range    Reasoning Levels
0 - 0.33             0, 0.5, 1
0.34 - 0.66          0, 0.5, 1
0.67 - 1             0, 0.5, 1

Each base partition is then combined with every request cluster, allowed model, and generated system prompt, which gives the multiplication below.

Total Partitions Formula:

Total = configuration_count × allowed_models × system_prompt_count × 9

Example:

  • 3 clusters × 2 models × 2 prompts × 9 base = 108 partitions
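
The arithmetic is straightforward to verify; a small sketch (pure Python, with names chosen to mirror the formula above) that enumerates the base grid and applies the multiplication:

```python
from itertools import product

# The 9 base partitions: every temperature range x reasoning level pair
temperature_ranges = [(0.0, 0.33), (0.34, 0.66), (0.67, 1.0)]
reasoning_levels = [0, 0.5, 1]
base_partitions = list(product(temperature_ranges, reasoning_levels))
assert len(base_partitions) == 9

def total_partitions(configuration_count, allowed_models, system_prompt_count):
    """Size of the search space Thompson Sampling explores."""
    return (configuration_count * allowed_models
            * system_prompt_count * len(base_partitions))

print(total_partitions(3, 2, 2))  # 3 x 2 x 2 x 9 = 108
```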

Configuration Parameters

  • Configuration Count: Number of request clusters (default: 2-3)
  • Allowed Models: AI models to test (e.g., GPT-4, Claude)
  • System Prompt Count: Number of prompt variations to generate
  • Evaluation Methods: How success is measured (accuracy, latency, etc.)
  • Clustering Interval: How often to re-analyze request patterns
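
Taken together, a skill's partition settings might look something like this hypothetical sketch (field names and units are assumptions, not the actual configuration schema):

```python
# Hypothetical configuration sketch; all names and units are illustrative.
partition_config = {
    "configuration_count": 3,                       # number of request clusters
    "allowed_models": ["gpt-4", "claude"],          # models to test
    "system_prompt_count": 2,                       # prompt variations to generate
    "evaluation_methods": ["accuracy", "latency"],  # what "success" means
    "clustering_interval": 1000,                    # re-cluster cadence (assumed: requests)
}
```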

Evaluation and Rewards

The enabled evaluation methods determine what the system optimizes for. When multiple methods are enabled, their scores are averaged into a single reward signal that updates the Thompson Sampling statistics.

Example: if accuracy = 0.90, latency = 0.75, and cost = 0.60, the reward is their average, 0.75, so the system optimizes for all three objectives at once.
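
A minimal sketch of the averaging step, reproducing the example numbers:

```python
def reward_from_evaluations(scores):
    """Average the enabled evaluation scores into one reward in [0, 1]."""
    return sum(scores.values()) / len(scores)

reward = reward_from_evaluations({"accuracy": 0.90, "latency": 0.75, "cost": 0.60})
print(round(reward, 2))  # 0.75 -- this single value updates the Beta statistics
```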

Best Practices

  • Detailed Skill Descriptions: Help generate better system prompts
  • Start Conservative: Begin with 2-3 clusters unless you have diverse request types
  • Allow Learning Time: Run 100+ requests before evaluating performance
  • Monitor Progress: Watch better partitions receive more traffic over time
  • Convergence: Most systems converge within 50-200 requests per cluster

Benefits

  • Automatic Optimization: No manual hyperparameter tuning required
  • Context-Aware: Different partitions for different request types
  • Self-Improving: Performance improves continuously with more data
  • Balanced Learning: Explores new options while exploiting known good ones