Scaling Tool Calling with Amazon ECS and AgentRPC
In our previous tutorial, we explored how to build a TypeScript application using OpenAI's function calling capabilities and AgentRPC. Now, let's take it a step further and see how AgentRPC can simplify the deployment and scaling of AI tools in Amazon ECS environments.
The Challenge with ECS and AI Tools
When deploying AI assistants that need to call external tools in Amazon ECS, several challenges typically arise:
- Network complexity - Configuring access between services in different VPCs or AWS accounts
- Security concerns - Exposing internal APIs to the public internet
- Load balancing - Distributing requests across multiple task replicas
- Observability - Monitoring tool usage and performance
- Timeout limitations - Managing long-running function calls
AgentRPC provides an elegant solution to these challenges with its long-polling mechanism and managed service approach. With AgentRPC, your ECS Fargate tasks establish outbound connections to the AgentRPC service, eliminating the need for complex AWS networking configurations and allowing seamless scaling.
Setting Up a Scalable Weather Service in Amazon ECS
Let's see how we can deploy our weather service from the previous tutorial in an ECS cluster with multiple tasks, and connect it to our AI assistant without any complex network configuration.
Step 1: Containerize Your Weather Tool
First, let's create a Dockerfile for our weather service:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npx", "tsx", "weather-tool.ts"]
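Before the ECS task definition below can pull this image, it has to be built and pushed to ECR. Here is a sketch using placeholder account, region, and repository values (swap in your own); the build-and-push step is gated behind AWS_PROFILE so the snippet is safe to run without credentials:

```shell
# Placeholder values -- replace with your account ID, region, and repo name.
ACCOUNT_ID="123456789012"
REGION="us-west-2"
REPO="weather-service"
IMAGE_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:latest"
echo "${IMAGE_URI}"

# Build and push only where AWS credentials are configured.
if [ -n "${AWS_PROFILE:-}" ]; then
  aws ecr get-login-password --region "${REGION}" |
    docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
  docker build -t "${IMAGE_URI}" .
  docker push "${IMAGE_URI}"
fi
```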
Step 2: Create ECS Task Definition and Service
// weather-task-definition.json
{
"family": "weather-service",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"networkMode": "awsvpc",
"containerDefinitions": [
{
"name": "weather-service",
"image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/weather-service:latest",
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/weather-service",
"awslogs-region": "us-west-2",
"awslogs-stream-prefix": "ecs"
}
},
"secrets": [
{
"name": "AGENTRPC_API_SECRET",
"valueFrom": "arn:aws:ssm:us-west-2:123456789012:parameter/agentrpc/api-secret"
}
],
"portMappings": []
}
],
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512"
}
Deploy these resources to your AWS account:
aws ecs register-task-definition --cli-input-json file://weather-task-definition.json
aws ecs create-service \
--cluster your-ecs-cluster \
--service-name weather-service \
--task-definition weather-service \
--desired-count 3 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-12345,subnet-67890],securityGroups=[sg-12345],assignPublicIp=ENABLED}"
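Once the service is created, you can confirm that all three tasks reach RUNNING (same placeholder cluster and service names as above; the AWS call is again gated behind AWS_PROFILE):

```shell
# Placeholder names -- match whatever you used in create-service.
CLUSTER="your-ecs-cluster"
SERVICE="weather-service"
echo "Checking ${SERVICE} in ${CLUSTER}"

if [ -n "${AWS_PROFILE:-}" ]; then
  aws ecs describe-services \
    --cluster "${CLUSTER}" \
    --services "${SERVICE}" \
    --query 'services[0].{desired:desiredCount,running:runningCount}'
fi
```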
Step 3: Update the Weather Tool to Handle Scaling
Now, let's update our weather tool to include information about the task serving the request:
// weather-tool.ts
import { AgentRPC } from "agentrpc";
import { z } from "zod";
import os from "os";
import dotenv from "dotenv";
dotenv.config();
const rpc = new AgentRPC({
apiSecret: process.env.AGENTRPC_API_SECRET!,
});
const hostname = os.hostname();
// Register the weather tool with schema validation using Zod
rpc.register({
name: "getWeather",
description: "Return weather information at a given location",
schema: z.object({ location: z.string() }),
handler: async ({ location }) => {
console.log(
`Processing weather request for ${location} on task ${hostname}`,
);
// In a real app, you would call a weather API here
return {
location: location,
temperature: "72°F",
condition: "Sunny",
humidity: "45%",
windSpeed: "5 mph",
server: hostname, // Include task hostname for demonstration
};
},
});
// Start the RPC server
rpc.listen();
console.log(`Weather tool service is running on task ${hostname}!`);
How AgentRPC Simplifies Amazon ECS Deployment
With our weather service deployed to Amazon ECS, let's discuss how AgentRPC simplifies this architecture:
1. No Need for API Gateway or Application Load Balancer
Since AgentRPC uses a long-polling mechanism, our weather service doesn't need to be exposed via an API Gateway or Application Load Balancer. Each task connects outbound to AgentRPC's service, eliminating the need for inbound connections.
2. Automatic Load Balancing and Failover
When a tool is called, AgentRPC automatically routes the request to an available task. If a task fails or is restarted, AgentRPC detects this and redirects requests to healthy tasks. This provides built-in resilience without requiring additional ECS configuration.
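Conceptually, that routing behaves like picking the next healthy, connected task for each call. A toy in-memory sketch of the idea (illustrative only, not the real AgentRPC internals):

```typescript
// Toy model of route-to-healthy-task failover.
type Task = { id: string; healthy: boolean };

// Route a call to the first healthy task, skipping failed ones.
function route(tasks: Task[]): string {
  const target = tasks.find((t) => t.healthy);
  if (!target) throw new Error("no healthy tasks connected");
  return target.id;
}

const tasks: Task[] = [
  { id: "task-a", healthy: false }, // e.g. being restarted by ECS
  { id: "task-b", healthy: true },
];
console.log(route(tasks)); // "task-b": the unhealthy task is skipped
```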
3. Cross-Account and Cross-VPC Communication
Your AI assistant could be running in a completely different AWS account or VPC from your tool services. With AgentRPC, there's no need to set up VPC peering, Transit Gateway, or expose internal services.
4. Observability with Minimal Setup
AgentRPC provides comprehensive observability for your tools, including:
- Request tracing across tasks
- Performance metrics for each tool
- Error monitoring and alerting
- Usage statistics and patterns
All of this comes without having to set up complex monitoring infrastructure like CloudWatch dashboards or X-Ray.
Service Auto Scaling with ECS and AgentRPC
One of the key advantages of Amazon ECS is the ability to automatically scale your services based on metrics. With AgentRPC, your tools can seamlessly scale with ECS Service Auto Scaling.
Let's add auto scaling to our weather service:
# First, register the service as a scalable target and set its min/max capacity
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/your-ecs-cluster/weather-service \
--min-capacity 3 \
--max-capacity 10
# Then, attach a target-tracking scaling policy
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/your-ecs-cluster/weather-service \
--policy-name weather-service-cpu-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 60
}'
Now our weather service will automatically scale between 3 and 10 tasks based on CPU utilization. When the load increases, ECS will create more tasks, and these new tasks will automatically connect to AgentRPC without any additional configuration.
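To make that concrete: target tracking grows capacity roughly in proportion to how far the metric sits above the target, clamped to the registered bounds. An illustrative approximation only (the real algorithm also involves CloudWatch alarms and the cooldown windows configured above):

```typescript
// Rough sketch of target-tracking scaling math (not AWS's exact algorithm).
function desiredCount(
  current: number,
  metric: number, // observed average CPU %
  target: number, // TargetValue from the policy (70)
  min: number,
  max: number,
): number {
  const raw = Math.ceil(current * (metric / target));
  return Math.min(max, Math.max(min, raw)); // clamp to registered capacity
}

console.log(desiredCount(3, 90, 70, 3, 10)); // scale out to 4
console.log(desiredCount(3, 30, 70, 3, 10)); // stays clamped at min: 3
```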
This is particularly useful for AI tools that may experience varying levels of demand. Some notable benefits include:
- Cost optimization - Only run the Fargate tasks you need at any given moment
- Performance under load - Automatically scale up during peak usage
- High availability - Maintain service even during traffic spikes
- Seamless integration with AgentRPC - New tasks automatically register with the service
AgentRPC handles the connection management and load balancing between these dynamically created tasks, ensuring your AI assistant always has access to the tools it needs regardless of how ECS scales your services.
Implementation Example
Let's see how our client code can remain unchanged, regardless of how we scale our backend services:
// agent-app.ts - No changes required despite ECS deployment
import { OpenAI } from "openai";
import { AgentRPC } from "agentrpc";
import dotenv from "dotenv";
dotenv.config();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const rpc = new AgentRPC({ apiSecret: process.env.AGENTRPC_API_SECRET! });
async function main() {
// Get AgentRPC tools in OpenAI-compatible format
const tools = await rpc.OpenAI.getTools();
// Create a completion with the tools
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content:
"You are a helpful weather assistant. Use the getWeather tool when asked about weather conditions.",
},
{
role: "user",
content: "What's the weather like in San Francisco today?",
},
],
tools,
});
const message = completion.choices[0]?.message;
console.log("Initial response:", message?.content);
// Handle tool calls - AgentRPC takes care of routing to available tasks
if (message?.tool_calls) {
for (const toolCall of message.tool_calls) {
console.log("Calling external tool:", toolCall.function.name);
// Execute the tool and get the result through AgentRPC
const result = await rpc.OpenAI.executeTool(toolCall);
console.log("Tool result:", result);
// Notice we can see which task handled our request
console.log(`Request handled by task: ${result.server}`);
// Generate a final response with the tool result
const finalResponse = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: "You are a helpful weather assistant.",
},
{
role: "user",
content: "What's the weather like in San Francisco today?",
},
message,
{
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(result),
},
],
});
console.log("Final response:", finalResponse.choices[0]?.message.content);
}
}
}
main().catch(console.error);
In this example, the AI assistant can run in any environment and still reach the weather service running in Amazon ECS, as long as both sides can reach the AgentRPC service and present a valid API secret.
Real-World Use Case: Multi-Account Enterprise AI Architecture
A great real-world application of AgentRPC with Amazon ECS is a multi-account AWS enterprise AI architecture.
Imagine a large enterprise with:
- A dedicated AI/ML account for managing AI assistants
- Multiple product-specific accounts, each with their own ECS services
- Strict security and compliance requirements for cross-account access
With traditional approaches, enabling an AI assistant in one account to call services in other accounts would require:
- Setting up VPC peering or Transit Gateway connections
- Configuring complex IAM roles and policies
- Managing API Gateway endpoints with custom authorizers
- Dealing with cross-account access control challenges
AgentRPC simplifies this architecture significantly:
// Product database tool in the Product account
// running in ECS with IAM roles for DynamoDB access
rpc.register({
name: "queryProductCatalog",
description: "Query product information from the catalog",
schema: z.object({ productId: z.string() }),
handler: async ({ productId }) => {
// Access DynamoDB in the current account with IAM role
return {
id: productId,
name: "Premium Widget",
price: 299.99,
inventory: 42,
};
},
});
// Customer service tool in the Customer Service account
// running in ECS with access to customer support systems
rpc.register({
name: "getCustomerTickets",
description: "Get open support tickets for a customer",
schema: z.object({ customerId: z.string() }),
handler: async ({ customerId }) => {
// Access support ticketing system in current account
return [
{ id: "T-123", status: "open", subject: "Payment issue" },
{ id: "T-456", status: "pending", subject: "Product question" },
];
},
});
In this architecture:
- Each service runs in its own AWS account with appropriate IAM permissions
- Services connect outbound to AgentRPC, requiring no inbound connectivity
- The AI assistant in the AI/ML account can access tools across multiple accounts
- Security is maintained through AgentRPC's authentication mechanism
- No cross-account IAM roles or VPC connectivity required