Architecture Overview¶
Route ANS Resolver system architecture and design principles.
System Architecture¶
```mermaid
flowchart TD
    A[Clients<br/>A2A, MCP] -->|HTTP/HTTPS| B
    subgraph B[ANS Resolver - Route ANS]
        C[HTTP API Layer]
        D[Resolver Core]
        E[Cache Layer]
        F[Trust Verifier]
        G[Registry Adapter]
        C --> D
        D --> E
        D --> F
        F --> G
    end
    G --> H
    subgraph H[External Services]
        I[GoDaddy API]
        J[DNS]
        K[Redis]
    end
```
Core Components¶
1. HTTP API Layer¶
Location: internal/server/
Handles HTTP requests and responses:
- REST API endpoints for resolution, batch operations, and health checks
- Request validation and response formatting
- Rate limiting to prevent abuse
- Middleware chain for cross-cutting concerns
- Prometheus metrics exposure
Design Pattern: Middleware Chain pattern allows composable request processing (logging, rate limiting, authentication, etc.).
2. Resolver Core¶
Location: internal/resolver/
Core resolution logic:
- ANSName parsing and validation
- Version negotiation using semantic versioning
- Agent selection from multiple candidates
- Resolution orchestration coordinating cache, registry, and trust verification
Design Pattern: Facade pattern that provides a simple interface to the complex subsystem of caching, registry lookup, version negotiation, and trust verification.
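A minimal sketch of the facade, assuming hypothetical `Cache`, `Registry`, and `Verifier` interfaces (the real ones live in `internal/`); callers see only `Resolve`, never the orchestration behind it:

```go
package main

import "fmt"

// Hypothetical subsystem interfaces the facade coordinates.
type Cache interface {
	Get(name string) (string, bool)
	Set(name, endpoint string)
}
type Registry interface {
	Lookup(name string) (string, error)
}
type Verifier interface {
	Verify(endpoint string) error
}

// Resolver is the facade: one Resolve call instead of four subsystems.
type Resolver struct {
	cache    Cache
	registry Registry
	trust    Verifier
}

func (r *Resolver) Resolve(name string) (string, error) {
	if ep, ok := r.cache.Get(name); ok {
		return ep, nil // cache hit: skip registry and verification
	}
	ep, err := r.registry.Lookup(name)
	if err != nil {
		return "", fmt.Errorf("registry lookup: %w", err)
	}
	if err := r.trust.Verify(ep); err != nil {
		return "", fmt.Errorf("trust verification: %w", err)
	}
	r.cache.Set(name, ep)
	return ep, nil
}

// Tiny stub implementations so the sketch runs end to end.
type mapCache map[string]string

func (m mapCache) Get(n string) (string, bool) { v, ok := m[n]; return v, ok }
func (m mapCache) Set(n, e string)             { m[n] = e }

type staticRegistry map[string]string

func (s staticRegistry) Lookup(n string) (string, error) {
	if ep, ok := s[n]; ok {
		return ep, nil
	}
	return "", fmt.Errorf("%s not found", n)
}

type allowAll struct{}

func (allowAll) Verify(string) error { return nil }

func main() {
	r := &Resolver{
		cache:    mapCache{},
		registry: staticRegistry{"translator.acme.com": "https://agents.acme.com/translator"},
		trust:    allowAll{},
	}
	fmt.Println(r.Resolve("translator.acme.com"))
}
```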
3. Cache Layer¶
Location: internal/cache/
Resolution result caching:
- Memory-based cache for single-instance deployments
- Redis-based cache for distributed deployments
- TTL management with configurable expiration
- Cache invalidation strategies
Design Pattern: Strategy pattern allows switching between memory and Redis implementations based on deployment requirements.
Extensibility: New cache backends (Memcached, DynamoDB) can be added by implementing the cache interface without modifying core logic.
4. Trust Verifier¶
Location: internal/trust/
Certificate verification:
- Fingerprint validation against published certificates
- Certificate chain validation
- Trust store management for allowed CAs
- Revocation checking (OCSP, CRL)
Design Pattern: Chain of Responsibility for multi-stage verification (fingerprint → chain → revocation).
Design Trade-off: Strict verification by default prevents MITM attacks but requires proper certificate management. Permissive mode available for development.
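The fingerprint → chain → revocation pipeline can be modeled as a chain of responsibility. This sketch uses a simplified `Cert` stand-in rather than real `x509` parsing; the stage names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// Cert is a simplified stand-in for parsed certificate material.
type Cert struct {
	Fingerprint string
	ChainValid  bool
	Revoked     bool
}

// Stage is one link in the verification chain.
type Stage func(c *Cert) error

// Verify runs stages in order and stops at the first failure.
func Verify(c *Cert, stages ...Stage) error {
	for _, s := range stages {
		if err := s(c); err != nil {
			return err
		}
	}
	return nil
}

func fingerprintStage(expected string) Stage {
	return func(c *Cert) error {
		if c.Fingerprint != expected {
			return fmt.Errorf("fingerprint mismatch")
		}
		return nil
	}
}

func chainStage(c *Cert) error {
	if !c.ChainValid {
		return errors.New("certificate chain invalid")
	}
	return nil
}

func revocationStage(c *Cert) error {
	if c.Revoked {
		return errors.New("certificate revoked")
	}
	return nil
}

func main() {
	cert := &Cert{Fingerprint: "ab:cd", ChainValid: true}
	err := Verify(cert, fingerprintStage("ab:cd"), chainStage, revocationStage)
	fmt.Println("verified:", err == nil)
}
```

Permissive development mode would simply run the chain with fewer stages.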
5. Registry Adapter¶
Location: internal/registry/
Backend registry integration:
- GoDaddy API client for managed DNS
- DNS TXT record lookup for self-hosted registries
- Mock registry for testing environments
- Extensible adapter interface
Design Pattern: Adapter pattern allows multiple registry backends (GoDaddy, DNS, custom APIs) behind a uniform interface.
Extensibility: New registries (Namecheap, Cloudflare, Route53) can be added without changing resolver logic.
Design Principles¶
1. Stateless Operation¶
The resolver maintains no session state, enabling:
- Horizontal scaling with load balancers
- Zero-downtime deployments
- Simple disaster recovery (just restart)
- No complex state synchronization
Trade-off: Cache is the only state, managed separately (Redis for distributed, memory for single-instance).
2. Fail-Fast¶
Requests fail quickly rather than hanging:
- Short timeouts on external calls (registry, trust verification)
- Explicit error returns for all failure modes
- Circuit breakers prevent cascading failures
- Health checks detect degraded state
Benefit: Clients get quick feedback and can retry or failover immediately.
3. Extensibility¶
Plugin architecture via interfaces:
- Registry adapters: Add new registry backends
- Cache providers: Add new cache implementations
- Trust verifiers: Add custom verification logic
- Middleware: Add custom request processing
Pattern: Dependency injection allows runtime composition of components.
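A minimal sketch of runtime composition, with the interface trimmed to one method for brevity (names hypothetical): configuration decides the concrete type once at startup, and everything downstream sees only the interface.

```go
package main

import "fmt"

// Cache is trimmed to one method to keep the wiring sketch short.
type Cache interface{ Name() string }

type memCache struct{}

func (memCache) Name() string { return "memory" }

type redisCache struct{ addr string }

func (r redisCache) Name() string { return "redis@" + r.addr }

// newCache chooses the implementation at startup from configuration;
// the rest of the resolver only ever sees the Cache interface.
func newCache(redisAddr string) Cache {
	if redisAddr != "" {
		return redisCache{addr: redisAddr}
	}
	return memCache{}
}

func main() {
	fmt.Println(newCache("").Name())           // memory
	fmt.Println(newCache("redis:6379").Name()) // redis@redis:6379
}
```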
4. Observable¶
Built-in observability:
- Metrics: Prometheus metrics for all operations (requests, cache hits, registry latency)
- Logging: Structured JSON logs with trace IDs
- Tracing: OpenTelemetry support for distributed tracing
- Health checks: Liveness and readiness probes
Design Choice: Zero-config defaults with optional exporters keep the resolver simple while enabling production observability.
Configuration Hierarchy¶
Configuration sources in priority order:
- Command-line flags: Override everything (e.g., `--port=8080`)
- Environment variables: Per-environment config (e.g., `SERVER_PORT=8080`)
- Configuration file: Base configuration (YAML)
- Defaults: Sensible defaults for quick starts
Design Rationale: 12-factor app principles for cloud-native deployments.
Resolution Flow¶
```mermaid
flowchart TD
    A[Client Request] --> B[Parse ANSName]
    B --> C{Cache Hit?}
    C -->|Yes| D[Return Cached Result]
    C -->|No| E[Registry Lookup]
    E --> F[Version Negotiation]
    F --> G{Trust Verification}
    G -->|Verified| H[Cache Result]
    G -->|Failed| I[Return Error]
    H --> J[Return Result]
    D -.-> Cache[(Cache)]
    E -.-> Registry[(Registry)]
    G -.-> Trust[(Trust Verifier)]
    H -.-> Cache
```
Flow Characteristics:
- Optimistic caching: Cache checked first, misses go to registry
- Version negotiation: Selects best matching version from available agents
- Trust verification: Optional but recommended for security
- Cache-aside pattern: Application manages cache explicitly
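The version-negotiation step can be sketched as follows, assuming a caret-style constraint ("highest available version with the requested major"). This is a hand-rolled illustration, not the resolver's actual semver library:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "1.2.3" into numeric parts; input assumed well-formed.
func parse(v string) (out [3]int) {
	for i, p := range strings.SplitN(v, ".", 3) {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

func less(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// negotiate returns the highest available version with the requested major,
// mirroring a caret-style ("^1") constraint.
func negotiate(major int, available []string) (string, error) {
	best := ""
	for _, v := range available {
		p := parse(v)
		if p[0] != major {
			continue
		}
		if best == "" || less(parse(best), p) {
			best = v
		}
	}
	if best == "" {
		return "", fmt.Errorf("no agent version with major %d", major)
	}
	return best, nil
}

func main() {
	v, _ := negotiate(1, []string{"1.0.0", "1.2.0", "2.0.0"})
	fmt.Println(v) // 1.2.0
}
```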
Deployment Patterns¶
Single Instance¶
```mermaid
graph LR
    A[Clients] --> B[ANS Resolver]
    B --> C[Memory Cache]
    B --> D[GoDaddy API]
```
- Use Case: Development, testing, small-scale production
- Characteristics: Simple, no dependencies, fast startup
- Limitation: No horizontal scaling, cache not shared
Distributed¶
```mermaid
graph TB
    A[Load Balancer] --> B1[Resolver 1]
    A --> B2[Resolver 2]
    A --> B3[Resolver 3]
    B1 --> C[Redis Cache]
    B2 --> C
    B3 --> C
    B1 --> D[GoDaddy API]
    B2 --> D
    B3 --> D
```
- Use Case: High-availability production
- Characteristics: Horizontal scaling, shared cache, zero-downtime updates
- Requirement: Redis for shared cache state
Edge Deployment¶
```mermaid
graph TB
    A[Global Load Balancer] --> B1[Region 1 Resolver]
    A --> B2[Region 2 Resolver]
    A --> B3[Region 3 Resolver]
    B1 --> C1[Regional Redis]
    B2 --> C2[Regional Redis]
    B3 --> C3[Regional Redis]
```
- Use Case: Global, latency-sensitive applications
- Characteristics: Regional deployments, CDN-like caching, 99.99% availability
- Trade-off: Regional caches may have different TTLs, eventual consistency
API Design¶
HTTP Endpoints¶
| Method | Path | Purpose |
|---|---|---|
| GET | `/v1/resolve` | Resolve single ANS name |
| POST | `/v1/resolve/batch` | Resolve multiple ANS names in one request |
| GET | `/v1/agent/{ansName}` | Get agent metadata without full resolution |
| GET | `/v1/agent/{ansName}/verify` | Verify agent certificate independently |
| GET | `/health` | Liveness probe (is the process running?) |
| GET | `/ready` | Readiness probe (can it serve traffic?) |
| GET | `/metrics` | Prometheus metrics |
Design Choice: RESTful with clear separation between resolution (GET /resolve) and batch operations (POST /resolve/batch).
Versioned API¶
API version in URL path (/v1/...):
- Stability: v1 API is stable and backward-compatible
- Evolution: v2 can be added alongside v1 without breaking existing clients
- Deprecation: Old versions deprecated with long notice period
Error Handling¶
Error response structure:
- HTTP status codes: Standard semantics (404 = not found, 500 = server error)
- Error details: Structured error info (code, message, request_id)
- Idempotency: Safe to retry GET requests
- Partial failure: Batch operations return partial results with per-item errors
Failure Modes:
- ANS name not found: 404 with error code
- Invalid ANS name: 400 with validation details
- Registry timeout: 504 with retry-after header
- Trust verification failed: 403 with certificate details
- Rate limit exceeded: 429 with retry-after header
Performance Characteristics¶
Latency Targets:
- Cache hit: < 5ms (p99)
- Cache miss: < 100ms (p95) including registry lookup
- Batch operations: < 200ms for 10 names (p95)
Throughput:
- Single instance: 1000+ req/s with hot cache
- Distributed: Limited by registry backend, not resolver
Scaling:
- Horizontal: Add more resolver instances behind load balancer
- Cache: Redis supports 100k+ ops/s
- Bottleneck: External registry API rate limits
Security Model¶
Security boundaries:
- Resolver → Client: TLS/HTTPS via reverse proxy (nginx, Traefik)
- Resolver → Registry: HTTPS with API authentication
- Client → Agent: Agent handles its own TLS and authentication
Trust Model:
- Resolver validates agent certificates (fingerprints)
- Resolver does NOT authenticate clients (optional via reverse proxy)
- Agent endpoints handle their own authentication
Rationale: The resolver is a discovery service (like DNS); security happens at the endpoints being discovered.
Extensibility Points¶
1. Custom Registry Backends¶
Add new registry sources:
- Database-backed registries (PostgreSQL, MongoDB)
- Blockchain-based registries (ENS-style)
- Cloud service directories (AWS Service Discovery, Consul)
Extension Pattern: Implement registry adapter interface with lookup methods.
2. Custom Cache Providers¶
Add new cache backends:
- Memcached for specific use cases
- DynamoDB for serverless deployments
- Local disk cache for edge scenarios
Extension Pattern: Implement cache interface with get/set/delete methods.
3. Custom Trust Verification¶
Add domain-specific verification:
- Organization-specific CA requirements
- Certificate transparency log checks
- Custom revocation mechanisms
Extension Pattern: Implement trust verifier interface with verification logic.
4. Custom Middleware¶
Add request processing:
- Custom authentication (OAuth, mTLS)
- Request transformation
- Response filtering
- Custom metrics/logging
Extension Pattern: Standard HTTP middleware signature, compose via chain.
Next Steps¶
- Component Details - Deep dive into each component
- Resolution Flow - Detailed resolution process
- ANS Specification - ANS name format and protocol