Scaling Tool Calling with Amazon ECS and AgentRPC
In our previous tutorial, we explored how to build a TypeScript application using OpenAI's function calling capabilities and AgentRPC. Now, let's take it a step further and see how AgentRPC can simplify the deployment and scaling of AI tools in Amazon ECS environments.
The Challenge with ECS and AI Tools
When deploying AI assistants that need to call external tools in Amazon ECS, several challenges typically arise:
- Network complexity - Configuring access between services in different VPCs or AWS accounts
- Security concerns - Exposing internal APIs to the public internet
- Load balancing - Distributing requests across multiple task replicas
- Observability - Monitoring tool usage and performance
- Timeout limitations - Managing long-running function calls
AgentRPC provides an elegant solution to these challenges with its long-polling mechanism and managed service approach. With AgentRPC, your ECS Fargate tasks establish outbound connections to the AgentRPC service, eliminating the need for complex AWS networking configurations and allowing seamless scaling.
Setting Up a Scalable Weather Service in Amazon ECS
Let's see how we can deploy our weather service from the previous tutorial in an ECS cluster with multiple tasks, and connect it to our AI assistant without any complex network configuration.
Step 1: Containerize Your Weather Tool
First, let's create a Dockerfile for our weather service:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npx", "tsx", "weather-tool.ts"]
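Before the ECS task definition below can pull this image, it has to be built and pushed to ECR. Here is a sketch using placeholder account, region, and repository values (swap in your own); the build-and-push step is gated behind AWS_PROFILE so the snippet is safe to run without credentials:

```shell
# Placeholder values -- replace with your account ID, region, and repo name.
ACCOUNT_ID="123456789012"
REGION="us-west-2"
REPO="weather-service"
IMAGE_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:latest"
echo "${IMAGE_URI}"

# Build and push only where AWS credentials are configured.
if [ -n "${AWS_PROFILE:-}" ]; then
  aws ecr get-login-password --region "${REGION}" |
    docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
  docker build -t "${IMAGE_URI}" .
  docker push "${IMAGE_URI}"
fi
```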
Step 2: Create ECS Task Definition and Service
// weather-task-definition.json
{
"family": "weather-service",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"networkMode": "awsvpc",
"containerDefinitions": [
{
"name": "weather-service",
"image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/weather-service:latest",
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/weather-service",
"awslogs-region": "us-west-2",
"awslogs-stream-prefix": "ecs"
}
},
"secrets": [
{
"name": "AGENTRPC_API_SECRET",
"valueFrom": "arn:aws:ssm:us-west-2:123456789012:parameter/agentrpc/api-secret"
}
],
"portMappings": []
}
],
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512"
}
Deploy these resources to your AWS account:
aws ecs register-task-definition --cli-input-json file://weather-task-definition.json
aws ecs create-service \
--cluster your-ecs-cluster \
--service-name weather-service \
--task-definition weather-service \
--desired-count 3 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-12345,subnet-67890],securityGroups=[sg-12345],assignPublicIp=ENABLED}"
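Once the service is created, you can confirm that all three tasks reach RUNNING (same placeholder cluster and service names as above; the AWS call is again gated behind AWS_PROFILE):

```shell
# Placeholder names -- match whatever you used in create-service.
CLUSTER="your-ecs-cluster"
SERVICE="weather-service"
echo "Checking ${SERVICE} in ${CLUSTER}"

if [ -n "${AWS_PROFILE:-}" ]; then
  aws ecs describe-services \
    --cluster "${CLUSTER}" \
    --services "${SERVICE}" \
    --query 'services[0].{desired:desiredCount,running:runningCount}'
fi
```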
Step 3: Update the Weather Tool to Handle Scaling
Now, let's update our weather tool to include information about the task serving the request:
// weather-tool.ts
import { AgentRPC } from "agentrpc";
import { z } from "zod";
import os from "os";
import dotenv from "dotenv";
dotenv.config();
const rpc = new AgentRPC({
apiSecret: process.env.AGENTRPC_API_SECRET!,
});
const hostname = os.hostname();
// Register the weather tool with schema validation using Zod
rpc.register({
name: "getWeather",
description: "Return weather information at a given location",
schema: z.object({ location: z.string() }),
handler: async ({ location }) => {
console.log(
`Processing weather request for ${location} on task ${hostname}`,
);
// In a real app, you would call a weather API here
return {
location: location,
temperature: "72°F",
condition: "Sunny",
humidity: "45%",
windSpeed: "5 mph",
server: hostname, // Include task hostname for demonstration
};
},
});
// Start the RPC server
rpc.listen();
console.log(`Weather tool service is running on task ${hostname}!`);
How AgentRPC Simplifies Amazon ECS Deployment
With our weather service deployed to Amazon ECS, let's discuss how AgentRPC simplifies this architecture:
1. No Need for API Gateway or Application Load Balancer
Since AgentRPC uses a long-polling mechanism, our weather service doesn't need to be exposed via an API Gateway or Application Load Balancer. Each task connects outbound to AgentRPC's service, eliminating the need for inbound connections.
2. Automatic Load Balancing and Failover
When a tool is called, AgentRPC automatically routes the request to an available task. If a task fails or is restarted, AgentRPC detects this and redirects requests to healthy tasks. This provides built-in resilience without requiring additional ECS configuration.
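Conceptually, that routing behaves like picking the next healthy, connected task for each call. A toy in-memory sketch of the idea (illustrative only, not the real AgentRPC internals):

```typescript
// Toy model of route-to-healthy-task failover.
type Task = { id: string; healthy: boolean };

// Route a call to the first healthy task, skipping failed ones.
function route(tasks: Task[]): string {
  const target = tasks.find((t) => t.healthy);
  if (!target) throw new Error("no healthy tasks connected");
  return target.id;
}

const tasks: Task[] = [
  { id: "task-a", healthy: false }, // e.g. being restarted by ECS
  { id: "task-b", healthy: true },
];
console.log(route(tasks)); // "task-b": the unhealthy task is skipped
```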
3. Cross-Account and Cross-VPC Communication
Your AI assistant could be running in a completely different AWS account or VPC from your tool services. With AgentRPC, there's no need to set up VPC peering, Transit Gateway, or expose internal services.
4. Observability with Minimal Setup
AgentRPC provides comprehensive observability for your tools, including:
- Request tracing across tasks
- Performance metrics for each tool
- Error monitoring and alerting
- Usage statistics and patterns
All of this comes without having to set up complex monitoring infrastructure like CloudWatch dashboards or X-Ray.
Service Auto Scaling with ECS and AgentRPC
One of the key advantages of Amazon ECS is the ability to automatically scale your services based on metrics. With AgentRPC, your tools can seamlessly scale with ECS Service Auto Scaling.
Let's add auto scaling to our weather service:
# First, register the service as a scalable target and set its min/max capacity
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/your-ecs-cluster/weather-service \
--min-capacity 3 \
--max-capacity 10
# Then, attach a target-tracking scaling policy
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/your-ecs-cluster/weather-service \
--policy-name weather-service-cpu-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 60
}'
Now our weather service will automatically scale between 3 and 10 tasks based on CPU utilization. When the load increases, ECS will create more tasks, and these new tasks will automatically connect to AgentRPC without any additional configuration.
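To make that concrete: target tracking grows capacity roughly in proportion to how far the metric sits above the target, clamped to the registered bounds. An illustrative approximation only (the real algorithm also involves CloudWatch alarms and the cooldown windows configured above):

```typescript
// Rough sketch of target-tracking scaling math (not AWS's exact algorithm).
function desiredCount(
  current: number,
  metric: number, // observed average CPU %
  target: number, // TargetValue from the policy (70)
  min: number,
  max: number,
): number {
  const raw = Math.ceil(current * (metric / target));
  return Math.min(max, Math.max(min, raw)); // clamp to registered capacity
}

console.log(desiredCount(3, 90, 70, 3, 10)); // scale out to 4
console.log(desiredCount(3, 30, 70, 3, 10)); // stays clamped at min: 3
```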
This is particularly useful for AI tools that may experience varying levels of demand. Some notable benefits include:
- Cost optimization - Only run the Fargate tasks you need at any given moment
- Performance under load - Automatically scale up during peak usage
- High availability - Maintain service even during traffic spikes
- Seamless integration with AgentRPC - New tasks automatically register with the service
AgentRPC handles the connection management and load balancing between these dynamically created tasks, ensuring your AI assistant always has access to the tools it needs regardless of how ECS scales your services.
Implementation Example
Let's see how our client code can remain unchanged, regardless of how we scale our backend services:
// agent-app.ts - No changes required despite ECS deployment
import { OpenAI } from "openai";
import { AgentRPC } from "agentrpc";
import dotenv from "dotenv";
dotenv.config();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const rpc = new AgentRPC({ apiSecret: process.env.AGENTRPC_API_SECRET! });
async function main() {
// Get AgentRPC tools in OpenAI-compatible format
const tools = await rpc.OpenAI.getTools();
// Create a completion with the tools
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content:
"You are a helpful weather assistant. Use the getWeather tool when asked about weather conditions.",
},
{
role: "user",
content: "What's the weather like in San Francisco today?",
},
],
tools,
});
const message = completion.choices[0]?.message;
console.log("Initial response:", message?.content);
// Handle tool calls - AgentRPC takes care of routing to available tasks
if (message?.tool_calls) {
for (const toolCall of message.tool_calls) {
console.log("Calling external tool:", toolCall.function.name);
// Execute the tool and get the result through AgentRPC
const result = await rpc.OpenAI.executeTool(toolCall);
console.log("Tool result:", result);
// Notice we can see which task handled our request
console.log(`Request handled by task: ${result.server}`);
// Generate a final response with the tool result
const finalResponse = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: "You are a helpful weather assistant.",
},
{
role: "user",
content: "What's the weather like in San Francisco today?",
},
message,
{
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(result),
},
],
});
console.log("Final response:", finalResponse.choices[0]?.message.content);
}
}
}
main().catch(console.error);
In this example, the AI assistant can run in any environment and still reach the weather service running in Amazon ECS, as long as both sides can reach the AgentRPC service and present a valid API secret.
Real-World Use Case: Multi-Account Enterprise AI Architecture
A great real-world application of AgentRPC with Amazon ECS is a multi-account AWS enterprise AI architecture.
Imagine a large enterprise with:
- A dedicated AI/ML account for managing AI assistants
- Multiple product-specific accounts, each with their own ECS services
- Strict security and compliance requirements for cross-account access
With traditional approaches, enabling an AI assistant in one account to call services in other accounts would require:
- Setting up VPC peering or Transit Gateway connections
- Configuring complex IAM roles and policies
- Managing API Gateway endpoints with custom authorizers
- Dealing with cross-account access control challenges
AgentRPC simplifies this architecture significantly:
// Product database tool in the Product account
// running in ECS with IAM roles for DynamoDB access
rpc.register({
name: "queryProductCatalog",
description: "Query product information from the catalog",
schema: z.object({ productId: z.string() }),
handler: async ({ productId }) => {
// Access DynamoDB in the current account with IAM role
return {
id: productId,
name: "Premium Widget",
price: 299.99,
inventory: 42,
};
},
});
// Customer service tool in the Customer Service account
// running in ECS with access to customer support systems
rpc.register({
name: "getCustomerTickets",
description: "Get open support tickets for a customer",
schema: z.object({ customerId: z.string() }),
handler: async ({ customerId }) => {
// Access support ticketing system in current account
return [
{ id: "T-123", status: "open", subject: "Payment issue" },
{ id: "T-456", status: "pending", subject: "Product question" },
];
},
});
In this architecture:
- Each service runs in its own AWS account with appropriate IAM permissions
- Services connect outbound to AgentRPC, requiring no inbound connectivity
- The AI assistant in the AI/ML account can access tools across multiple accounts
- Security is maintained through AgentRPC's authentication mechanism
- No cross-account IAM roles or VPC connectivity required