Scaling Tool Calling with Kubernetes and AgentRPC

2025-03-19

In our previous tutorial, we explored how to build a TypeScript application using OpenAI's function calling capabilities and AgentRPC. Now, let's take it a step further and see how AgentRPC can simplify the deployment and scaling of AI tools in Kubernetes environments.

The Challenge with Kubernetes and AI Tools

When deploying AI assistants that need to call external tools in Kubernetes, several challenges typically arise:

  1. Network complexity - Configuring access between services in different namespaces or clusters
  2. Security concerns - Exposing internal APIs to the public internet
  3. Load balancing - Distributing requests across multiple tool replicas
  4. Observability - Monitoring tool usage and performance
  5. Timeout limitations - Managing long-running function calls

AgentRPC provides an elegant solution to these challenges with its long-polling mechanism and managed service approach. With AgentRPC, your Kubernetes pods establish outbound connections to the AgentRPC service, eliminating the need for complex ingress configurations and allowing seamless scaling.

Setting Up a Scalable Weather Service in Kubernetes

Let's see how we can deploy our weather service from the previous tutorial in a Kubernetes cluster with multiple replicas, and connect it to our AI assistant without any complex network configuration.

Step 1: Containerize Your Weather Tool

First, let's create a Dockerfile for our weather service:

FROM node:18-alpine

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY package*.json ./
RUN npm install

# Copy the rest of the application source
COPY . .

# tsx should be listed in package.json dependencies so npx
# doesn't have to download it on every container start
CMD ["npx", "tsx", "weather-tool.ts"]
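
Build the image and push it to a registry your cluster can pull from (replace your-registry with your own):

docker build -t your-registry/weather-service:latest .
docker push your-registry/weather-service:latest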

Step 2: Create Kubernetes Deployment and Service

# weather-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-service
spec:
  replicas: 3 # Running multiple replicas for scalability
  selector:
    matchLabels:
      app: weather-service
  template:
    metadata:
      labels:
        app: weather-service
    spec:
      containers:
        - name: weather-service
          image: your-registry/weather-service:latest
          resources:
            requests:
              cpu: 100m # CPU requests are required for the autoscaler added later
              memory: 128Mi
          env:
            - name: AGENTRPC_API_SECRET
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: agentrpc-api-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
data:
  agentrpc-api-secret: <base64-encoded-secret>
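
If you'd rather not base64-encode the value by hand, you can create the Secret directly with kubectl instead of including it in the manifest:

kubectl create secret generic api-secrets \
  --from-literal=agentrpc-api-secret=<your-agentrpc-api-secret>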

Deploy these resources to your Kubernetes cluster:

kubectl apply -f weather-service.yaml
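
Confirm that the rollout succeeded and all three replicas are running:

kubectl rollout status deployment/weather-service
kubectl get pods -l app=weather-service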

Step 3: Update the Weather Tool to Handle Scaling

Now, let's update our weather tool to include information about the replica serving the request:

// weather-tool.ts
import { AgentRPC } from "agentrpc";
import { z } from "zod";
import os from "os";
import dotenv from "dotenv";

dotenv.config();

const rpc = new AgentRPC({
  apiSecret: process.env.AGENTRPC_API_SECRET!,
});

const hostname = os.hostname();

// Register the weather tool with schema validation using Zod
rpc.register({
  name: "getWeather",
  description: "Return weather information at a given location",
  schema: z.object({ location: z.string() }),
  handler: async ({ location }) => {
    console.log(
      `Processing weather request for ${location} on pod ${hostname}`,
    );

    // In a real app, you would call a weather API here
    return {
      location: location,
      temperature: "72°F",
      condition: "Sunny",
      humidity: "45%",
      windSpeed: "5 mph",
      server: hostname, // Include pod hostname for demonstration
    };
  },
});

// Start the RPC server
rpc.listen();

console.log(`Weather tool service is running on pod ${hostname}!`);
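
Since the pods were already deployed in Step 2, rebuild and push the image with the updated code, then restart the rollout so the replicas pick it up:

kubectl rollout restart deployment/weather-service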

How AgentRPC Simplifies Kubernetes Deployment

With our weather service deployed to Kubernetes, let's discuss how AgentRPC simplifies this architecture:

1. No Need for Ingress or API Gateway

Since AgentRPC uses a long-polling mechanism, our weather service doesn't need to be exposed via an ingress controller or API gateway. Each replica connects outbound to AgentRPC's service, eliminating the need for inbound connections.
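
You can even make this explicit with a deny-all-ingress NetworkPolicy. Here's a minimal sketch, assuming your cluster's network plugin enforces NetworkPolicies; the service keeps working because it only makes outbound connections:

# weather-netpol.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: weather-service-deny-ingress
spec:
  podSelector:
    matchLabels:
      app: weather-service
  policyTypes:
    - Ingress # Ingress listed with no rules means all inbound traffic is denied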

2. Automatic Load Balancing and Failover

When a tool is called, AgentRPC automatically routes the request to an available replica. If a pod fails or is restarted, AgentRPC detects this and redirects requests to healthy replicas. This provides built-in resilience without requiring additional Kubernetes configuration.
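
You can observe this by deleting one replica while the assistant is making tool calls; Kubernetes recreates the pod, and in the meantime calls are routed to the surviving replicas:

kubectl get pods -l app=weather-service
kubectl delete pod <one-of-the-weather-service-pods>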

3. Cross-Cluster and Cross-VPC Communication

Your AI assistant could be running in a completely different environment from your tool services. With AgentRPC, there's no need to set up VPN tunnels, direct connections, or expose internal services.

4. Observability with Minimal Setup

AgentRPC provides comprehensive observability for your tools, including:

  • Request tracing across replicas
  • Performance metrics for each tool
  • Error monitoring and alerting
  • Usage statistics and patterns

All of this comes without having to set up complex monitoring infrastructure.

Horizontal Pod Autoscaling with AgentRPC

One of the key advantages of Kubernetes is the ability to automatically scale your resources based on metrics. With AgentRPC, your tools can seamlessly scale with the Kubernetes Horizontal Pod Autoscaler (HPA).

Let's add an HPA configuration to our weather service. Note that a CPU utilization target only works if the pods declare CPU requests, which is why our Deployment sets resources.requests.cpu:

# weather-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: weather-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: weather-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Apply this configuration:

kubectl apply -f weather-hpa.yaml

Now our weather service will automatically scale between 3 and 10 replicas based on CPU utilization. When the load increases, Kubernetes will create more pods, and these new pods will automatically connect to AgentRPC without any additional configuration.
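
You can watch the autoscaler react to load in real time:

kubectl get hpa weather-service-hpa --watch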

This is particularly useful for AI tools that may experience varying levels of demand. Some notable benefits include:

  • Cost efficiency - Only run the resources you need at any given moment
  • Improved response times - Automatically scale up during peak usage
  • Resilience - Automatically replace unhealthy instances
  • Zero configuration with AgentRPC - New pods automatically register with the service

AgentRPC handles the connection management and load balancing between these dynamically created replicas, ensuring your AI assistant always has access to the tools it needs.

Implementation Example

Let's see how our client code can remain unchanged, regardless of how we scale our backend services:

// agent-app.ts - No changes required despite Kubernetes deployment
import { OpenAI } from "openai";
import { AgentRPC } from "agentrpc";
import dotenv from "dotenv";

dotenv.config();

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const rpc = new AgentRPC({ apiSecret: process.env.AGENTRPC_API_SECRET! });

async function main() {
  // Get AgentRPC tools in OpenAI-compatible format
  const tools = await rpc.OpenAI.getTools();

  // Create a completion with the tools
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are a helpful weather assistant. Use the getWeather tool when asked about weather conditions.",
      },
      {
        role: "user",
        content: "What's the weather like in San Francisco today?",
      },
    ],
    tools,
  });

  const message = completion.choices[0]?.message;
  console.log("Initial response:", message?.content);

  // Handle tool calls - AgentRPC takes care of routing to available replicas
  if (message?.tool_calls) {
    for (const toolCall of message.tool_calls) {
      console.log("Calling external tool:", toolCall.function.name);

      // Execute the tool and get the result through AgentRPC
      const result = await rpc.OpenAI.executeTool(toolCall);
      console.log("Tool result:", result);

      // Notice we can see which pod handled our request
      console.log(`Request handled by pod: ${result.server}`);

      // Generate a final response with the tool result
      const finalResponse = await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [
          {
            role: "system",
            content: "You are a helpful weather assistant.",
          },
          {
            role: "user",
            content: "What's the weather like in San Francisco today?",
          },
          message,
          {
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(result),
          },
        ],
      });

      console.log("Final response:", finalResponse.choices[0]?.message.content);
    }
  }
}

main().catch(console.error);

In this example, we can run the AI assistant in any environment, and it will automatically reach the weather service running in Kubernetes, as long as it can connect to the AgentRPC service and uses a valid API secret.
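
To see the routing in action, run the client a few times; the server field in the tool result shows which pod handled each call, and it can vary between runs:

npx tsx agent-app.ts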

Real-World Use Case: Multi-Cluster ML Model Serving

A great real-world application of AgentRPC with Kubernetes is distributed machine learning model serving across multiple specialized clusters.

Imagine you have:

  • A text-to-image generation model running on GPU nodes in one cluster
  • A speech-to-text model running on CPU-optimized nodes in another cluster
  • A text classification model running in a third cluster

Each model is deployed as a separate Kubernetes deployment with specialized resource requirements and scaling characteristics. With AgentRPC, you can:

  1. Register each model as a separate tool
  2. Deploy each model in the most appropriate cluster with optimal resource configuration
  3. Scale each model independently based on its specific demand patterns
  4. Provide a unified interface to LLMs that want to use these tools

For example, each cluster's service registers its own tool (each runs its own AgentRPC client, set up just like weather-tool.ts):
// Image generation tool in GPU cluster
rpc.register({
  name: "generateImage",
  description: "Generate an image based on a text prompt",
  schema: z.object({ prompt: z.string() }),
  handler: async ({ prompt }) => {
    // Call local Stable Diffusion model
    return { imageUrl: "https://example.com/image.png" };
  },
});

// Speech recognition tool in CPU cluster
rpc.register({
  name: "transcribeSpeech",
  description: "Transcribe speech audio to text",
  schema: z.object({ audioUrl: z.string() }),
  handler: async ({ audioUrl }) => {
    // Call local Whisper model
    return { transcript: "This is the transcribed text." };
  },
});

With AgentRPC, these tools appear as a unified API to the AI assistant, even though they're running on entirely different clusters with different hardware configurations and scaling requirements.