The Problem: Tight Coupling and Fragile Events

Legacy systems often suffer from invisible dependencies. The Amazon Key team, responsible for in-garage delivery and building access management, hit a wall: a single failing service could cascade into a system-wide deadlock. Their event schemas were loose, validation was manual, and adding a new subscriber required custom plumbing every time.

Key numbers from the old system:

  • Service integration for a new use case: 5 days
  • Onboarding a new event schema: 48 hours
  • Publisher/subscriber integration: 40 hours

The root cause? A tightly coupled monolith where every service talked to every other service through ad-hoc SNS/SQS pairs. There was no single source of truth for event definitions, no automated validation, and no standardized routing.

The Solution: Single-Bus, Multi-Account with Schema Governance

The team chose Amazon EventBridge as the central nervous system, but they didn't stop there. They built three complementary components that turned a simple event bus into a scalable, developer-friendly platform:

1. Event Schema Repository

Instead of relying solely on EventBridge's built-in schema registry (which does not validate events against their schemas), they built a custom schema repository that acts as the single source of truth. This repository:

  • Stores JSON Schemas for every event type
  • Enforces versioning and deprecation policies
  • Provides a self-service portal for teams to discover and document events
  • Runs automated validation at build time, not runtime
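To make the versioning and deprecation behaviour concrete, here is a minimal in-memory sketch. The class and method names (`SchemaRepository`, `register`, `deprecate`, `latest`) are hypothetical and are not taken from the Amazon Key implementation, which the source does not describe at this level of detail.

```python
class SchemaRepository:
    """Hypothetical in-memory stand-in for a versioned schema repository."""

    def __init__(self):
        # event type -> list of {"version", "schema", "deprecated"} entries
        self._schemas = {}

    def register(self, event_type: str, schema: dict) -> int:
        """Store a new schema version and return its version number."""
        versions = self._schemas.setdefault(event_type, [])
        version = len(versions) + 1
        versions.append({"version": version, "schema": schema, "deprecated": False})
        return version

    def deprecate(self, event_type: str, version: int) -> None:
        """Mark a version as deprecated; it stays readable for existing consumers."""
        self._schemas[event_type][version - 1]["deprecated"] = True

    def latest(self, event_type: str) -> dict:
        """Return the newest version that has not been deprecated."""
        for entry in reversed(self._schemas[event_type]):
            if not entry["deprecated"]:
                return entry
        raise LookupError(f"all versions of {event_type} are deprecated")


repo = SchemaRepository()
repo.register("DeliveryCompleted", {"type": "object", "required": ["delivery_id"]})
repo.register("DeliveryCompleted", {"type": "object", "required": ["delivery_id", "garage_id"]})
repo.deprecate("DeliveryCompleted", 1)
print(repo.latest("DeliveryCompleted")["version"])
```

A real repository would presumably persist schemas and gate registration behind review, but the lookup rule (newest non-deprecated version wins) is the part a build step would depend on.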

2. Client Library (Type-Safe Code Generation)

At build time, the library generates type-safe bindings from the schema repository. This means developers get autocomplete, compile-time error checking, and automatic serialization/deserialization.

# Example: Publishing an event using the generated client library
from amazon_key.events import DeliveryCompletedEvent
from amazon_key.client import EventBusClient

# The client validates the event against the schema BEFORE publishing
client = EventBusClient()
event = DeliveryCompletedEvent(
    garage_id="garage-123",
    delivery_id="del-456",
    timestamp="2025-03-15T10:30:00Z",
    package_count=2
)
# Schema validation runs here, before the event reaches the bus. A schema change
# that breaks this call should already have been flagged at build time.
client.publish(event)

# Example: Subscribing to events with the same type safety
@client.subscribe(DeliveryCompletedEvent)
def handle_delivery(event: DeliveryCompletedEvent):
    # event.garage_id is a typed string, not a generic dict
    print(f"Delivery {event.delivery_id} completed in garage {event.garage_id}")
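The source does not show what the generated bindings look like. As a rough sketch, the snippet below derives a typed, immutable class from a schema. The schema content and the `generate_event_class` helper are assumptions for illustration, and a real generator would more likely emit source files at build time so that editors and type checkers can see them; this version builds the class at runtime only to stay short.

```python
from dataclasses import fields, make_dataclass

# Hypothetical JSON Schema fragment; field names mirror the example above.
DELIVERY_COMPLETED_SCHEMA = {
    "title": "DeliveryCompletedEvent",
    "properties": {
        "garage_id": {"type": "string"},
        "delivery_id": {"type": "string"},
        "timestamp": {"type": "string"},
        "package_count": {"type": "integer"},
    },
}

# Mapping from JSON Schema primitive types to Python types.
JSON_TO_PYTHON = {"string": str, "integer": int, "number": float, "boolean": bool}


def generate_event_class(schema: dict) -> type:
    """Build a frozen dataclass whose fields follow the schema's properties."""
    specs = [
        (name, JSON_TO_PYTHON[prop["type"]])
        for name, prop in schema["properties"].items()
    ]

    def check_types(self):
        # Reject values whose runtime type disagrees with the schema.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}, got {type(value).__name__}")

    return make_dataclass(
        schema["title"], specs, frozen=True, namespace={"__post_init__": check_types}
    )


DeliveryCompleted = generate_event_class(DELIVERY_COMPLETED_SCHEMA)
event = DeliveryCompleted(
    garage_id="garage-123",
    delivery_id="del-456",
    timestamp="2025-03-15T10:30:00Z",
    package_count=2,
)
print(event.delivery_id)
```

Passing `package_count="two"` raises `TypeError` at construction, so a malformed event never reaches the publish call.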

3. Subscriber Constructs Library (CDK)

Using AWS CDK, they created reusable infrastructure constructs that automate the setup of:

  • Dedicated event bus per subscriber account
  • IAM roles for cross-account communication
  • CloudWatch dashboards and alarms

// CDK construct for a new subscriber
import { EventBridgeSubscriber } from '@amazon-key/subscriber-constructs';

new EventBridgeSubscriber(this, 'GarageDeliveryService', {
  eventBusName: 'central-event-bus',
  subscriberAccountId: '123456789012',
  events: ['DeliveryCompleted', 'DeliveryFailed'],
  // Automatically creates rules, targets, and monitoring
});
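The `events` list above presumably ends up as an EventBridge rule pattern. The sketch below shows, in Python, the kind of pattern such a construct might generate and how exact-value matching against it works. The source name `amazon-key.delivery` and both helper functions are assumptions, and only exact-value matching is modelled; real EventBridge patterns also support prefix, numeric, and other operators.

```python
def build_rule_pattern(source: str, event_names: list[str]) -> dict:
    """Build an EventBridge-style pattern that matches the given detail-types."""
    return {"source": [source], "detail-type": list(event_names)}


def matches(pattern: dict, event: dict) -> bool:
    """Exact-value matching: every pattern key must be present in the event,
    and the event's value must be one of the values listed in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern (for example under "detail"): recurse into it.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


pattern = build_rule_pattern("amazon-key.delivery", ["DeliveryCompleted", "DeliveryFailed"])

delivered = {
    "source": "amazon-key.delivery",
    "detail-type": "DeliveryCompleted",
    "detail": {"delivery_id": "del-456"},
}
unrelated = {"source": "amazon-key.delivery", "detail-type": "DoorUnlocked", "detail": {}}

print(matches(pattern, delivered), matches(pattern, unrelated))
```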

Results: Measurable Impact

| Metric | Before | After | Improvement |
|---|---|---|---|
| Service integration time | 5 days | 1 day | 80% faster |
| Event schema onboarding | 48 hours | 4 hours | 92% faster |
| Publisher/subscriber integration | 40 hours | 8 hours | 80% faster |
| Events processed per second | N/A | 2,000 | New capability |
| p90 latency (ingestion to target) | N/A | 80ms | Consistent |
| Success rate | N/A | 99.99% | Reliability |
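The "Improvement" percentages can be read as the reduction in elapsed time, computed as (before - after) / before. A quick check of the three rows that have a "before" value, assuming that interpretation:

```python
def percent_reduction(before: float, after: float) -> float:
    """Share of the original duration that was eliminated, as a percentage."""
    return (before - after) / before * 100


# (before, after) pairs taken from the results table; units cancel out per row.
rows = {
    "Service integration time (days)": (5, 1),
    "Event schema onboarding (hours)": (48, 4),
    "Publisher/subscriber integration (hours)": (40, 8),
}

for metric, (before, after) in rows.items():
    print(f"{metric}: {percent_reduction(before, after):.1f}% less time")
```

The schema onboarding figure works out to roughly 91.7%, which the table rounds to 92%.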

What You Can Learn from This Case Study

1. Schema Validation Should Be Client-Side, Not Centralized

The team evaluated a centralized validation service but rejected it due to latency and infrastructure overhead. Instead, they moved validation to the client library, catching errors at build time. This is a key pattern: push validation to the edge, not the bus.

2. Type Safety Is Not Optional for Event-Driven Architectures

Without typed events, you're essentially passing JSON blobs around and hoping everyone agrees on the shape. The client library's code generation eliminates an entire class of runtime errors.

3. CDK Constructs Reduce Cognitive Load

By abstracting infrastructure setup into reusable constructs, the team enabled service teams to focus on business logic. This is the same philosophy behind tools like Backstage for developer portals.

Limitations and Caveats

  • Custom schema repository requires maintenance. If you're a small team, EventBridge's native schema registry might be sufficient. The custom repository adds value only when you have many teams and strict governance requirements.
  • Client-side validation assumes all clients are updated. If a publisher uses an old version of the client library, it might publish invalid events. The team mitigated this with a fallback validation in the subscriber constructs.
  • Cross-account EventBridge adds complexity. While the multi-account pattern improves security, it also introduces additional IAM policy management and CloudWatch log aggregation challenges.
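The source does not say how the fallback validation on the subscriber side is implemented. One possible shape, sketched below, is a handler wrapper that checks incoming events and sets invalid ones aside instead of processing them. The `validated` decorator, the `REQUIRED_FIELDS` mapping, and the in-memory `dead_letters` list are all hypothetical.

```python
from functools import wraps

# Hypothetical minimal contract for DeliveryCompleted events.
REQUIRED_FIELDS = {"delivery_id": str, "garage_id": str, "package_count": int}

# Stand-in for a dead-letter queue; a real system would use durable storage.
dead_letters = []


def validated(required: dict):
    """Wrap a handler so that malformed events are set aside, not processed."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(event: dict):
            # Collect fields that are missing or have the wrong type.
            problems = [
                name for name, expected in required.items()
                if not isinstance(event.get(name), expected)
            ]
            if problems:
                dead_letters.append({"event": event, "invalid_fields": problems})
                return None
            return handler(event)
        return wrapper
    return decorator


@validated(REQUIRED_FIELDS)
def handle_delivery(event: dict) -> str:
    return f"Delivery {event['delivery_id']} completed in garage {event['garage_id']}"


ok = handle_delivery({"delivery_id": "del-456", "garage_id": "garage-123", "package_count": 2})
bad = handle_delivery({"delivery_id": "del-789", "package_count": "two"})
print(ok, bad, dead_letters[0]["invalid_fields"])
```

This keeps the subscriber safe from publishers running an outdated client library, at the cost of a second validation pass per event.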

Next Steps

If you're considering a similar architecture:

  1. Start small: Pick one event flow and implement the schema repository + client library pattern before scaling.
  2. Invest in monitoring: Use EventBridge's built-in metrics (Invocations, FailedInvocations, ThrottledRules) and create dashboards early.
  3. Consider schema evolution: Plan for breaking changes. The Amazon Key team used a deprecation policy that allowed both old and new schemas to coexist for a migration window.
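For the schema evolution step, one common way to let old and new schemas coexist during a migration window is to upgrade ("upcast") old payloads on the consumer side. The sketch below assumes a `schema_version` field and a hypothetical v1 to v2 change that renames `packages` to `package_count`; neither detail comes from the source.

```python
CURRENT_VERSION = 2


def upcast_v1_to_v2(detail: dict) -> dict:
    """Hypothetical breaking change: v2 renames `packages` to `package_count`."""
    upgraded = dict(detail)  # copy so the caller's payload is left untouched
    upgraded["package_count"] = upgraded.pop("packages")
    upgraded["schema_version"] = 2
    return upgraded


# One upcaster per version step; extend this map as new versions ship.
UPCASTERS = {1: upcast_v1_to_v2}


def normalize(detail: dict) -> dict:
    """Bring any supported payload version up to CURRENT_VERSION."""
    version = detail.get("schema_version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported schema_version {version}")
    while version < CURRENT_VERSION:
        detail = UPCASTERS[version](detail)
        version = detail["schema_version"]
    return detail


old_event = {"schema_version": 1, "delivery_id": "del-456", "packages": 2}
new_event = {"schema_version": 2, "delivery_id": "del-457", "package_count": 3}
print(normalize(old_event))
print(normalize(new_event))
```

Handlers then only need to understand the current version, and the v1 upcaster can be deleted once the migration window closes.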



Reference: This analysis is based on the AWS Architecture Blog post Mastering millisecond latency and millions of events.

[Figure: AWS EventBridge event bus architecture with multiple services and a schema repository]
