Optimization Algorithm
Understanding performance impact of configuration parameters
Overview
Reactive Agents uses an optimization algorithm that combines Thompson Sampling (a Bayesian multi-armed bandit approach) with K-Means++ clustering to learn and select the best hyperparameter configuration for each type of user request.
Core Components
1. Thompson Sampling (Multi-Armed Bandit)
The system treats each hyperparameter configuration as an "arm" in a multi-armed bandit problem. Thompson Sampling is used to balance exploration (trying new configurations) with exploitation (using known good configurations).
Implementation (lib/server/middlewares/idkhub-configuration.ts:30-56):
    function getOptimalArm(arms: SkillOptimizationArm[]): SkillOptimizationArm {
      // Implement Thompson Sampling algorithm for multi-armed bandit
      // Thompson Sampling uses Bayesian approach: sample from posterior Beta distribution
      // and select the arm with highest sampled value
      let optimalArm = arms[0];
      let maxSample = -Infinity;
      for (const arm of arms) {
        // Beta distribution parameters with uniform prior (Beta(1,1))
        // alpha = successes + 1, beta = failures + 1
        const successes = arm.stats.total_reward;
        const failures = arm.stats.n - arm.stats.total_reward;
        const alpha = successes + 1;
        const beta = failures + 1;
        // Sample from Beta(alpha, beta)
        const sample = sampleBeta(alpha, beta);
        if (sample > maxSample) {
          maxSample = sample;
          optimalArm = arm;
        }
      }
      return optimalArm;
    }
Key Characteristics:
- Uses Beta distribution with uniform prior Beta(1,1)
- Samples from posterior distribution to balance exploration and exploitation
- Selects arm with highest sampled value
- Automatically adapts based on observed rewards
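The `sampleBeta` helper called above is not shown in this document. As a rough illustration only, one common way to draw from a Beta distribution is to take two Gamma draws (here via the Marsaglia-Tsang method) and normalize them. The names `sampleBetaSketch`, `sampleGamma`, `sampleNormal`, and the injectable `rng` parameter are assumptions made for this sketch, not the project's actual implementation.

```typescript
// Hypothetical stand-in for the `sampleBeta` helper referenced above.
type Rng = () => number; // returns a uniform value in [0, 1)

// Standard normal draw via the Box-Muller transform.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang method.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    // Boost for small shapes: Gamma(a) = Gamma(a + 1) * U^(1/a)
    const u = 1 - rng();
    return sampleGamma(shape + 1, rng) * Math.pow(u, 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue; // reject: candidate falls outside the support
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) {
      return d * v;
    }
  }
}

// Beta(alpha, beta) draw as X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBetaSketch(
  alpha: number,
  beta: number,
  rng: Rng = Math.random,
): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}
```

Because rewards are fractional scores, `alpha` and `beta` are generally not integers, which is why the sketch uses a sampler that accepts any positive shape.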
2. K-Means++ Clustering
User requests are grouped by semantic similarity using K-Means++ clustering on embeddings. This allows the system to learn different optimal configurations for different types of requests.
Implementation (lib/server/utils/math.ts:100-171):
    export function kMeansClustering(
      embeddings: number[][],
      k: number,
      maxIterations = 100,
    ): ClusterResult {
      const n = embeddings.length;
      if (k >= n) {
        // Each point is its own cluster
        return {
          clusters: Array.from({ length: n }, (_, i) => i),
          centroids: embeddings.map((e) => [...e]),
          iterations: 0,
        };
      }
      // Initialize centroids using k-means++
      let centroids = initializeCentroidsKMeansPlusPlus(embeddings, k);
      const clusters = new Array(n).fill(0);
      for (let iteration = 0; iteration < maxIterations; iteration++) {
        let changed = false;
        // Assign each point to the nearest centroid
        for (let i = 0; i < n; i++) {
          let minDistance = Infinity;
          let nearestCluster = 0;
          for (let c = 0; c < k; c++) {
            const distance = calculateDistance(embeddings[i], centroids[c]);
            if (distance < minDistance) {
              minDistance = distance;
              nearestCluster = c;
            }
          }
          if (clusters[i] !== nearestCluster) {
            clusters[i] = nearestCluster;
            changed = true;
          }
        }
        // If no assignments changed, we've converged
        if (!changed) {
          return { clusters, centroids, iterations: iteration + 1 };
        }
        // Update centroids
        const newCentroids: number[][] = [];
        for (let c = 0; c < k; c++) {
          const clusterPoints = embeddings.filter((_, i) => clusters[i] === c);
          if (clusterPoints.length > 0) {
            newCentroids.push(calculateCentroid(clusterPoints));
          } else {
            // Keep the old centroid if no points are assigned to this cluster
            newCentroids.push([...centroids[c]]);
          }
        }
        centroids = newCentroids;
      }
      return { clusters, centroids, iterations: maxIterations };
    }
Key Characteristics:
- K-Means++ initialization for better initial centroid selection
- Euclidean distance-based clustering
- Automatic convergence detection
- Iteration limit of 100 by default (the maxIterations parameter)
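The `initializeCentroidsKMeansPlusPlus` helper used above is not reproduced here. The following is a minimal sketch of the standard k-means++ seeding rule, in which each new centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function names, the injectable `rng` parameter, and the choice of squared Euclidean distance are assumptions for illustration and may differ from the project's code.

```typescript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Hypothetical sketch of k-means++ seeding; assumes 1 <= k <= points.length.
function kMeansPlusPlusInit(
  points: number[][],
  k: number,
  rng: () => number = Math.random,
): number[][] {
  // First centroid: a uniformly random point (copied so inputs stay untouched).
  const first = points[Math.floor(rng() * points.length)];
  const centroids: number[][] = [[...first]];
  while (centroids.length < k) {
    // Weight of each point = squared distance to its nearest chosen centroid.
    const weights = points.map((p) =>
      Math.min(...centroids.map((c) => squaredDistance(p, c))),
    );
    const total = weights.reduce((acc, w) => acc + w, 0);
    // Draw the next centroid with probability proportional to its weight.
    // If no index is selected (zero total or rounding), fall back to the last point.
    let threshold = rng() * total;
    let chosen = points.length - 1;
    for (let i = 0; i < points.length; i++) {
      threshold -= weights[i];
      if (threshold < 0) {
        chosen = i;
        break;
      }
    }
    centroids.push([...points[chosen]]);
  }
  return centroids;
}
```

Points far from every existing centroid get large weights, so the seeds tend to spread across the embedding space rather than bunching inside one dense region.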
3. Cluster Selection
For each incoming request, the system finds the most relevant cluster using cosine similarity:
    function getOptimalCluster(
      embedding: number[],
      clusters: SkillOptimizationCluster[],
    ): SkillOptimizationCluster {
      // Find the cluster with the highest cosine similarity to the embedding
      let optimalCluster = clusters[0];
      let maxSimilarity = -1;
      for (const cluster of clusters) {
        const similarity = cosineSimilarity(embedding, cluster.centroid);
        if (similarity > maxSimilarity) {
          maxSimilarity = similarity;
          optimalCluster = cluster;
        }
      }
      return optimalCluster;
    }
4. Statistical Updates
After each request, the system updates arm statistics using incremental formulas:
Implementation (lib/server/middlewares/optimizer/hyperparameters.ts:19-38):
    export async function updateArmStats(
      userDataStorageConnector: UserDataStorageConnector,
      arm: SkillOptimizationArm,
      reward: number,
    ) {
      // Update arm statistics using incremental update formulas for Thompson Sampling
      const newN = arm.stats.n + 1;
      const newTotalReward = arm.stats.total_reward + reward;
      const newMean = newTotalReward / newN;
      const newN2 = arm.stats.n2 + reward * reward;
      await userDataStorageConnector.updateSkillOptimizationArm(arm.id, {
        stats: {
          n: newN,
          mean: newMean,
          n2: newN2,
          total_reward: newTotalReward,
        },
      });
    }
Configuration Space
The system optimizes across 9 base configurations (lib/server/optimization/base-arms.ts), varying:
| Configuration | Temperature Range | Reasoning Level |
|---|---|---|
| Arm 1 | 0 - 0.33 | 0 (no reasoning) |
| Arm 2 | 0 - 0.33 | 0.5 (moderate) |
| Arm 3 | 0 - 0.33 | 1 (full reasoning) |
| Arm 4 | 0.34 - 0.66 | 0 |
| Arm 5 | 0.34 - 0.66 | 0.5 |
| Arm 6 | 0.34 - 0.66 | 1 |
| Arm 7 | 0.67 - 1 | 0 |
| Arm 8 | 0.67 - 1 | 0.5 |
| Arm 9 | 0.67 - 1 | 1 |
These base configurations are further combined with different:
- Models (e.g., GPT-4, Claude, etc.)
- System prompts
- Other hyperparameters
Combining these creates a large configuration space to explore and optimize.
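The nine base arms in the table above are the cross product of three temperature ranges and three reasoning levels. The sketch below shows one way that grid could be generated. The `BaseArm` shape and its field names are hypothetical, and the actual structure in `lib/server/optimization/base-arms.ts` may differ.

```typescript
// Hypothetical shape for a base arm; field names are assumptions.
interface BaseArm {
  temperatureMin: number;
  temperatureMax: number;
  reasoning: number;
}

// Values taken from the table above.
const TEMPERATURE_RANGES: Array<[number, number]> = [
  [0, 0.33],
  [0.34, 0.66],
  [0.67, 1],
];
const REASONING_LEVELS: number[] = [0, 0.5, 1];

// Build the 3 x 3 grid in table order: temperature range first, then reasoning.
function buildBaseArms(): BaseArm[] {
  const arms: BaseArm[] = [];
  for (const [temperatureMin, temperatureMax] of TEMPERATURE_RANGES) {
    for (const reasoning of REASONING_LEVELS) {
      arms.push({ temperatureMin, temperatureMax, reasoning });
    }
  }
  return arms;
}
```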
How Configuration Parameters Affect the Algorithm
The optimization system is highly configurable. Here's how each parameter influences the algorithm:
Skill Description
Impact: Guides initial system prompt generation and defines the task context
- Used to generate seed system prompts via LLM (lib/server/optimization/utils/system-prompt.ts:20-24)
- Provides context for what the skill should accomplish
- Influences the quality of initial configurations before optimization begins
Configuration Count (Number of Clusters)
Impact: Determines how finely request types are segmented
- Directly sets k in K-Means++ clustering (lib/server/middlewares/optimizer/clusters.ts:34-52)
- Higher values = more specialized configurations for different request types
- Lower values = more general configurations across broader request types
- Formula: Total Arms = configuration_count × allowed_models × system_prompt_count × 9
System Prompt Count
Impact: Expands configuration space with different instruction variants
- Generates multiple system prompt variations per model/cluster combination
- Each system prompt becomes part of separate arms
- After convergence, system prompts are refined through reflection on best-performing arms (lib/server/middlewares/optimizer/system-prompt.ts:63-138)
Allowed Models
Impact: Multiplies the configuration space by testing different AI models
- Each model gets its own set of arms with all hyperparameter combinations
- Allows the system to learn which models perform best for different request types
- Models are associated with skills via getSkillModels() (lib/server/optimization/skill-optimizations.ts:44)
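Cluster count, system prompt count, and allowed models multiply together with the base arms to form the full set of arms. A minimal sketch of that cross product follows. The `ArmSpec` type and `enumerateArms` function are hypothetical, and for simplicity the sketch reuses one prompt list for every cluster and model, whereas the text above says prompt variations are generated per model/cluster combination.

```typescript
// Hypothetical description of one arm in the configuration space.
interface ArmSpec {
  clusterIndex: number;
  model: string;
  systemPrompt: string;
  baseArmIndex: number; // index into the 9 base arms
}

// Enumerate every cluster x model x prompt x base-arm combination.
function enumerateArms(
  clusterCount: number,
  models: string[],
  systemPrompts: string[],
  baseArmCount = 9,
): ArmSpec[] {
  const arms: ArmSpec[] = [];
  for (let clusterIndex = 0; clusterIndex < clusterCount; clusterIndex++) {
    for (const model of models) {
      for (const systemPrompt of systemPrompts) {
        for (let baseArmIndex = 0; baseArmIndex < baseArmCount; baseArmIndex++) {
          arms.push({ clusterIndex, model, systemPrompt, baseArmIndex });
        }
      }
    }
  }
  return arms;
}
```

With 2 clusters, 2 models, and 2 prompts this yields 2 × 2 × 2 × 9 = 72 arms, and each added model or prompt multiplies the total again.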
Enabled Evaluation Methods
Impact: Defines how success is measured and rewards are calculated
- Multiple evaluation methods can be enabled simultaneously
- Each method produces a score (0-1) for a completed request
- Scores are averaged into a single reward signal (lib/server/middlewares/optimizer/hyperparameters.ts:12-16)
- This reward directly updates Thompson Sampling statistics: total_reward, which determines alpha in the Beta distribution
- Different evaluation methods = different optimization objectives
Clustering Interval
Impact: Controls how often the system adapts to changing patterns
- Triggers automatic re-clustering using K-Means++
- Allows system to discover new request types over time
- Old clusters are matched to new clusters to preserve learned statistics
Algorithm Flow with Configuration Parameters
Configuration Space Examples
The total number of arms (configurations) grows multiplicatively with each parameter:
Formula:
Total Arms = configuration_count × allowed_models × system_prompt_count × 9 base arms
Example Configurations
| Scenario | Clusters | Models | Prompts | Base Arms | Total Arms |
|---|---|---|---|---|---|
| Small | 2 | 2 | 2 | 9 | 72 |
| Medium | 3 | 3 | 3 | 9 | 243 |
| Large | 5 | 4 | 4 | 9 | 720 |
| Enterprise | 10 | 5 | 5 | 9 | 2,250 |
Impact of increasing each parameter:
- More clusters: Better segmentation of request types, but requires more data to converge
- More models: Tests different AI providers/versions, increases cost but finds optimal model per cluster
- More prompts: More instruction variations to explore, beneficial when task requires precise wording
- Base arms: Fixed at 9 (3 temperature ranges × 3 reasoning levels)
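The totals in the table above follow directly from the formula. A small sketch that reproduces them is shown here; the function name `totalArms` and its parameter names are illustrative only.

```typescript
// Fixed number of base arms: 3 temperature ranges x 3 reasoning levels.
const BASE_ARM_COUNT = 9;

// Total Arms = configuration_count x allowed_models x system_prompt_count x 9
function totalArms(
  configurationCount: number,
  allowedModels: number,
  systemPromptCount: number,
): number {
  return configurationCount * allowedModels * systemPromptCount * BASE_ARM_COUNT;
}

// The four scenarios from the table above.
const small = totalArms(2, 2, 2); // 72
const medium = totalArms(3, 3, 3); // 243
const large = totalArms(5, 4, 4); // 720
const enterprise = totalArms(10, 5, 5); // 2250
```

Since every arm needs its own observations before its Beta posterior narrows, the data required to converge grows with this total.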
Evaluation Methods and Reward Signal
The enabled evaluation methods directly determine what the system optimizes for. Here's how they create the reward signal:
Evaluation Flow
- Request Completes: AI response is generated using selected arm configuration
- Run Evaluations: Each enabled evaluation method runs independently (lib/server/middlewares/logs.ts:188-207)
- Score Generation: Each method produces a score between 0 and 1
- Reward Calculation: Scores are averaged into a single reward
- Statistical Update: Reward updates Thompson Sampling statistics
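The reward calculation step in the flow above is a plain average of the per-method scores. The sketch below illustrates that step. The function name `averageReward`, the `EvaluationScores` type, and the choice to return 0 when no scores are present are assumptions, not confirmed behavior of the code in `hyperparameters.ts`.

```typescript
// Scores keyed by evaluation method name, each expected to lie in [0, 1].
type EvaluationScores = Record<string, number>;

// Average all scores into a single reward signal.
function averageReward(scores: EvaluationScores): number {
  const methods = Object.keys(scores);
  if (methods.length === 0) {
    return 0; // assumption: no evaluations means no reward
  }
  let sum = 0;
  for (const method of methods) {
    sum += scores[method];
  }
  return sum / methods.length;
}
```

With equal weighting, a low score from one method pulls the reward down as much as a high score from another pulls it up, so every enabled method influences arm selection equally.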
Example Evaluation Scenarios
Scenario 1: Single Evaluation Method (Accuracy)
    Enabled Methods: [accuracy]
    Evaluation Scores: {accuracy: 0.85}
    Final Reward: 0.85
    → System optimizes purely for accuracy
Scenario 2: Multiple Evaluation Methods
    Enabled Methods: [accuracy, latency, cost]
    Evaluation Scores: {
      accuracy: 0.90,
      latency: 0.75,  (faster = higher score)
      cost: 0.60      (cheaper = higher score)
    }
    Final Reward: (0.90 + 0.75 + 0.60) / 3 = 0.75
    → System balances all three objectives
Scenario 3: Custom Evaluation Weights
    Note: Current implementation uses simple averaging.
    Future: Could implement weighted averaging for prioritizing certain metrics.
Impact on Thompson Sampling
The reward directly affects Thompson Sampling's Beta distribution parameters:
    // From lib/server/middlewares/idkhub-configuration.ts:30-33
    const successes = arm.stats.total_reward; // ← Sum of all rewards
    const failures = arm.stats.n - arm.stats.total_reward;
    const alpha = successes + 1; // ← Drives Beta distribution
    const beta = failures + 1;
Key Insight: Changing evaluation methods changes what "success" means, which changes which arms get selected over time.
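To make the effect concrete, the sketch below computes the Beta parameters and the posterior mean, alpha / (alpha + beta), for a given set of arm statistics. The helper names `betaParams` and `posteriorMean` and the trimmed-down `ArmStatsSketch` type are introduced here for illustration; only the alpha and beta arithmetic comes from the snippet above.

```typescript
// Minimal subset of the arm statistics needed for the Beta parameters.
interface ArmStatsSketch {
  n: number; // number of requests served by this arm
  total_reward: number; // sum of rewards, each in [0, 1]
}

// Same arithmetic as the snippet above: uniform Beta(1, 1) prior plus observations.
function betaParams(stats: ArmStatsSketch): { alpha: number; beta: number } {
  const alpha = stats.total_reward + 1;
  const beta = stats.n - stats.total_reward + 1;
  return { alpha, beta };
}

// Mean of Beta(alpha, beta): the arm's expected reward under the posterior.
function posteriorMean(stats: ArmStatsSketch): number {
  const { alpha, beta } = betaParams(stats);
  return alpha / (alpha + beta);
}
```

For example, an arm with 10 requests and a reward sum of 7.5 gets Beta(8.5, 3.5) with a posterior mean of roughly 0.71. If a different evaluation method had scored those same 10 requests at a sum of 4, the arm would get Beta(5, 7) with a mean of roughly 0.42 and would be sampled as the winner far less often. An untried arm stays at Beta(1, 1) with mean 0.5.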
Optimization Process
- Configuration Setup: Define skill description, clusters, models, prompts, and evaluation methods
- Initialization: Generate system prompts and create full arm configuration space
- Request Arrival: A new user request arrives
- Embedding Generation: Convert request to vector embedding
- Cluster Selection: Find the nearest cluster using cosine similarity
- Arm Selection: Use Thompson Sampling to select a configuration (model + prompt + hyperparameters)
- Execution: Execute the request with the selected configuration
- Evaluation: Run enabled evaluation methods and calculate reward
- Learning: Update arm statistics for future Thompson Sampling decisions
- Re-clustering: Periodically re-cluster to adapt to changing request patterns
- Reflection: After convergence, generate improved system prompts based on best-performing arms
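The cluster selection, arm selection, and learning steps of the process above can be sketched together in a few lines. This is a simplified in-memory illustration, not the project's request pipeline: the `FlowArm` and `FlowCluster` types, the `chooseArm` and `recordReward` functions, and the injected `sampleBeta` parameter are all assumptions, and the cosine similarity helper is a generic implementation rather than the one in the codebase. The sketch also assumes at least one cluster and at least one arm per cluster.

```typescript
interface FlowArm {
  id: string;
  stats: { n: number; total_reward: number };
}
interface FlowCluster {
  centroid: number[];
  arms: FlowArm[];
}
type BetaSampler = (alpha: number, beta: number) => number;

// Generic cosine similarity; returns 0 when either vector has zero length.
function cosineSimilaritySketch(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Cluster Selection + Arm Selection: nearest cluster, then Thompson Sampling.
function chooseArm(
  embedding: number[],
  clusters: FlowCluster[],
  sampleBeta: BetaSampler,
): FlowArm {
  let cluster = clusters[0];
  let maxSimilarity = -Infinity;
  for (const candidate of clusters) {
    const similarity = cosineSimilaritySketch(embedding, candidate.centroid);
    if (similarity > maxSimilarity) {
      maxSimilarity = similarity;
      cluster = candidate;
    }
  }
  let chosen = cluster.arms[0];
  let maxSample = -Infinity;
  for (const arm of cluster.arms) {
    const alpha = arm.stats.total_reward + 1;
    const beta = arm.stats.n - arm.stats.total_reward + 1;
    const sample = sampleBeta(alpha, beta);
    if (sample > maxSample) {
      maxSample = sample;
      chosen = arm;
    }
  }
  return chosen;
}

// Learning: fold the observed reward back into the arm's statistics.
function recordReward(arm: FlowArm, reward: number): void {
  arm.stats.n += 1;
  arm.stats.total_reward += reward;
}
```

Passing the sampler in as a parameter keeps the sketch testable: a deterministic function such as the posterior mean turns the selection into a greedy choice, while a real Beta sampler restores the exploration behavior described above.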
Automatic Re-clustering
The system automatically re-clusters when the clustering_interval is reached (lib/server/middlewares/optimizer/clusters.ts:54-173). This allows the system to:
- Adapt to changing request patterns
- Discover new types of requests
- Optimize configurations for emerging use cases
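This document says that old clusters are matched to new clusters so that learned statistics survive re-clustering, but it does not describe the matching rule. One plausible approach, assumed here purely for illustration, is to map each new centroid to the nearest old centroid and carry that cluster's arm statistics forward. The function name `matchToOldClusters` and the use of squared Euclidean distance are hypothetical choices, and the real logic in `clusters.ts` may work differently.

```typescript
// Squared Euclidean distance between two equal-length centroids.
function centroidDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// For each new centroid, return the index of the closest old centroid,
// or -1 when there are no old centroids to inherit statistics from.
function matchToOldClusters(
  oldCentroids: number[][],
  newCentroids: number[][],
): number[] {
  return newCentroids.map((newCentroid) => {
    let bestIndex = -1;
    let bestDistance = Infinity;
    for (let i = 0; i < oldCentroids.length; i++) {
      const distance = centroidDistance(newCentroid, oldCentroids[i]);
      if (distance < bestDistance) {
        bestDistance = distance;
        bestIndex = i;
      }
    }
    return bestIndex;
  });
}
```

Under this assumed rule, several new clusters can inherit from the same old cluster, which is one reasonable way to handle a cluster that splits when new request types appear.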
Benefits
- Adaptive Learning: Automatically learns which configurations work best
- Context-Aware: Different configurations for different types of requests
- Exploration/Exploitation Balance: Thompson Sampling naturally balances trying new configurations with using proven ones
- Scalable: Handles large configuration spaces efficiently
- Self-Improving: Performance improves over time as more data is collected