Build a Scalable User Search Layer on Amazon Cognito: A Complete Guide

Why You Need a Dedicated Search Layer for Cognito

Amazon Cognito is a fantastic user directory — it handles authentication, MFA, and basic user management out of the box. But when your support team needs to find a user by a partial email, a last name, and a specific access level across thousands of accounts, the built-in ListUsers API falls short.

The problem? ListUsers filters on one attribute per request, supports only exact and prefix ("starts with") matches on a fixed set of standard attributes, cannot filter on custom attributes, and returns at most 60 users per page. For advanced scenarios like fuzzy matching on custom attributes, real-time group segmentation, or audit queries, you need a dedicated search layer.
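To make the limitation concrete, here is a minimal sketch of the filter strings ListUsers accepts. The helper name build_list_users_filter and the sample values are illustrative assumptions, not part of any SDK; the filter syntax (= for exact match, ^= for prefix match, one attribute per request) follows the Cognito API.

```python
# Hypothetical helper: builds the Filter string for Cognito's ListUsers API.
# Only exact (=) and prefix (^=) matches on these standard attributes are possible.
SEARCHABLE_ATTRIBUTES = {
    'username', 'email', 'phone_number', 'name', 'given_name',
    'family_name', 'preferred_username', 'cognito:user_status', 'status', 'sub',
}

def build_list_users_filter(attribute, value, prefix=False):
    if attribute not in SEARCHABLE_ATTRIBUTES:
        raise ValueError(f'ListUsers cannot filter on {attribute!r}')
    operator = '^=' if prefix else '='
    escaped = value.replace('"', '\\"')  # quotes inside the value must be escaped
    return f'{attribute} {operator} "{escaped}"'

print(build_list_users_filter('email', 'jo', prefix=True))
```

The resulting string would be passed as the Filter argument of list_users in boto3. Note that there is no way to express "email contains X", "name is roughly Y", or "custom:tier equals Z" in this syntax, which is the gap the search layer fills.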

This guide walks you through building an event-driven architecture that automatically syncs Cognito user data into OpenSearch Serverless via DynamoDB Streams, enabling sub-second search performance at any scale.

Figure: AWS architecture diagram showing Cognito, Lambda, DynamoDB, and OpenSearch integration for user search

Architecture Overview: Two Ingestion Flows + One Search Flow

The solution uses three AWS services in a serverless pipeline:

Amazon DynamoDB as the user profile store
AWS Lambda for data transformation and ingestion
Amazon OpenSearch Serverless for the search index

Ingestion Flow 1: Cognito Lambda Triggers

Captures user data during authentication events (sign-up and login).

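# Cognito Post-Confirmation Trigger (Python)
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserProfiles')

def lambda_handler(event, context):
    # This trigger also fires after a confirmed password reset
    # (PostConfirmation_ConfirmForgotPassword); only sign-ups create a profile.
    if event.get('triggerSource') != 'PostConfirmation_ConfirmSignUp':
        return event

    # Extract user attributes from the Cognito event
    user_attrs = event['request']['userAttributes']
    user_id = event['userName']

    # Create the initial record in DynamoDB
    item = {
        'userId': user_id,
        'email': user_attrs.get('email', ''),
        'name': user_attrs.get('name', ''),
        'groups': [],  # populated later via pre-token generation
        'lastLogin': None,
        # userAttributes has no standard creation-time attribute,
        # so record the confirmation time instead
        'createdAt': datetime.now(timezone.utc).isoformat(),
    }

    table.put_item(Item=item)
    # Cognito expects the trigger to return the event object
    return event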
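The groups and lastLogin fields above are left empty at sign-up. A minimal sketch of the pre-token generation trigger that could fill them in follows. The helper build_login_update and the injectable table parameter are assumptions made for testability; the sketch also assumes group membership is read from groupConfiguration.groupsToOverride in the trigger event.

```python
from datetime import datetime, timezone

def build_login_update(event, now=None):
    """Build update_item arguments from a pre-token generation event (illustrative helper)."""
    now = now or datetime.now(timezone.utc)
    group_config = event['request'].get('groupConfiguration') or {}
    groups = group_config.get('groupsToOverride') or []
    return {
        'Key': {'userId': event['userName']},
        # Aliasing the attribute name avoids any clash with DynamoDB reserved words
        'UpdateExpression': 'SET #g = :groups, lastLogin = :ts',
        'ExpressionAttributeNames': {'#g': 'groups'},
        'ExpressionAttributeValues': {':groups': list(groups), ':ts': now.isoformat()},
    }

def lambda_handler(event, context, table=None):
    # `table` is injectable for tests; in Lambda it falls back to the UserProfiles table
    if table is None:
        import boto3
        table = boto3.resource('dynamodb').Table('UserProfiles')
    table.update_item(**build_login_update(event))
    return event  # returned unchanged, so the issued tokens are not modified
```

Because this trigger also runs on token refresh, lastLogin here really means "last token issued". If that distinction matters, the handler could check event['triggerSource'] before writing.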

Ingestion Flow 2: CloudTrail for Admin Actions

Admin actions (e.g., creating users via CLI) don't trigger Cognito Lambda hooks. CloudTrail + EventBridge fills the gap.

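# CloudTrail Event Consumer Lambda (Python)
from datetime import datetime, timezone

import boto3

cognito = boto3.client('cognito-idp')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserProfiles')

def lambda_handler(event, context):
    # Parse the CloudTrail event for a Cognito admin API call
    detail = event['detail']
    if detail.get('eventSource') != 'cognito-idp.amazonaws.com':
        return event

    params = detail.get('requestParameters') or {}
    user_pool_id = params.get('userPoolId')
    username = params.get('username')
    if not user_pool_id or not username:
        return event  # not a per-user call; nothing to sync

    # Fetch the current user state from Cognito
    try:
        response = cognito.admin_get_user(UserPoolId=user_pool_id, Username=username)
    except cognito.exceptions.UserNotFoundException:
        # The user no longer exists (e.g., after AdminDeleteUser); drop the profile.
        # Assumes the event's username matches the userId key.
        table.delete_item(Key={'userId': username})
        return event

    attrs = {a['Name']: a['Value'] for a in response['UserAttributes']}

    # Upsert with update_item so fields written by the Cognito triggers
    # (groups, lastLogin, createdAt) survive; put_item would replace the whole item.
    table.update_item(
        Key={'userId': response['Username']},
        UpdateExpression='SET email = :email, #n = :name, lastModified = :ts',
        ExpressionAttributeNames={'#n': 'name'},  # "name" is a DynamoDB reserved word
        ExpressionAttributeValues={
            ':email': attrs.get('email', ''),
            ':name': attrs.get('name', ''),
            ':ts': datetime.now(timezone.utc).isoformat(),
        },
    )
    return event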
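For the consumer above to be invoked, an EventBridge rule has to match the relevant CloudTrail records. A minimal sketch of such an event pattern is shown below. The list of API names is an assumption and should be adjusted to the admin calls you actually use, and the helper name build_event_pattern is illustrative.

```python
import json

# Admin API calls to mirror into DynamoDB (an assumed selection, not exhaustive)
ADMIN_EVENT_NAMES = [
    'AdminCreateUser',
    'AdminUpdateUserAttributes',
    'AdminDeleteUser',
    'AdminEnableUser',
    'AdminDisableUser',
    'AdminAddUserToGroup',
    'AdminRemoveUserFromGroup',
]

def build_event_pattern(event_names=ADMIN_EVENT_NAMES):
    """Return an EventBridge event pattern for Cognito API calls recorded by CloudTrail."""
    return {
        'source': ['aws.cognito-idp'],
        'detail-type': ['AWS API Call via CloudTrail'],
        'detail': {
            'eventSource': ['cognito-idp.amazonaws.com'],
            'eventName': list(event_names),
        },
    }

print(json.dumps(build_event_pattern(), indent=2))
```

The JSON output can be used as the EventPattern of a rule whose target is the consumer Lambda. Events of this detail type are only delivered when a CloudTrail trail with logging enabled exists in the account.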

Search Flow: Querying the Index

# Search Lambda (Python)
import json

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

host = 'your-opensearch-domain-endpoint'  # placeholder: endpoint host name, without https://
region = 'us-east-1'
service = 'aoss'  # SigV4 service name for OpenSearch Serverless
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)

client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

SEARCHABLE_FIELDS = {'email', 'name', 'groups'}  # fields a caller may target
MAX_RESULTS = 100  # upper bound on page size

def respond(status, payload):
    return {
        'statusCode': status,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(payload)
    }

def lambda_handler(event, context):
    # API Gateway sends null for queryStringParameters when there is no query string
    query = event.get('queryStringParameters') or {}
    search_term = query.get('q', '').strip()
    field = query.get('field', 'email')  # default to email
    if not search_term or field not in SEARCHABLE_FIELDS:
        return respond(400, {'error': 'Provide q and a supported field'})

    # Query-string values arrive as strings; convert and clamp the page size
    try:
        size = max(1, min(int(query.get('size', 20)), MAX_RESULTS))
    except ValueError:
        size = 20

    # Build OpenSearch query with fuzzy matching
    search_body = {
        'query': {
            'multi_match': {
                'query': search_term,
                'fields': [field, 'name^2', 'groups'],
                'fuzziness': 'AUTO',
                'operator': 'and'
            }
        },
        'size': size
    }

    response = client.search(index='cognito-users', body=search_body)

    # Format results
    results = [hit['_source'] for hit in response['hits']['hits']]

    return respond(200, {'results': results, 'total': response['hits']['total']['value']})
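Fuzzy matching tolerates typos but does not by itself cover the "partial email plus last name plus access level" lookup described in the introduction. One way to sketch that combined query is a bool query, shown below. The function name build_user_query is illustrative, and the sketch assumes that email and groups are mapped as keyword fields and that emails are indexed in lowercase.

```python
def build_user_query(email_prefix=None, last_name=None, group=None, size=20):
    """Combine a partial email, a fuzzy name match, and an exact group filter."""
    must, filters = [], []
    if email_prefix:
        # Prefix match covers "starts with" lookups on the email address
        must.append({'prefix': {'email': {'value': email_prefix.lower()}}})
    if last_name:
        # Fuzzy match tolerates small typos in the name
        must.append({'match': {'name': {'query': last_name, 'fuzziness': 'AUTO'}}})
    if group:
        # Filters do not affect scoring, which suits exact group membership checks
        filters.append({'term': {'groups': group}})
    if not must and not filters:
        must.append({'match_all': {}})
    return {'query': {'bool': {'must': must, 'filter': filters}}, 'size': size}

body = build_user_query(email_prefix='Jo', last_name='Smith', group='admins')
print(body)
```

The returned dictionary can be passed as the body argument of client.search in place of the multi_match query.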

Figure: Serverless data pipeline flow for user indexing with DynamoDB Streams and Lambda triggers

Critical Considerations & Pitfalls

1. Cognito Lambda Trigger Timeout

Cognito enforces a 5-second timeout on trigger functions. If you already have other logic in these triggers (e.g., custom claims, analytics), ensure the combined execution stays well under this limit. Consider provisioned concurrency to avoid cold starts.
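One way to keep a trigger inside that limit is to give the handler an explicit time budget and skip non-essential work once it is spent. The sketch below is an assumption about how this could be structured, not a Cognito or Lambda feature; the helper name run_within_budget and the 3-second default are illustrative.

```python
import time

def run_within_budget(steps, budget_seconds=3.0, clock=time.monotonic):
    """Run (name, callable) steps in order; skip steps that would start after the budget is spent.

    Returns the names of skipped steps so the caller can log or retry them later.
    """
    started = clock()
    skipped = []
    for name, step in steps:
        if clock() - started >= budget_seconds:
            skipped.append(name)
            continue
        step()
    return skipped

# Example: essential work first, optional work last
skipped = run_within_budget([
    ('custom_claims', lambda: None),
    ('profile_sync', lambda: None),
])
print(skipped)
```

This check cannot interrupt a step that is already blocked on a slow call, so it works best alongside short connect and read timeouts on the AWS clients used inside each step.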

2. Data Consistency

CloudTrail events are delivered at least once, but may arrive out of order. Use DynamoDB conditional writes with a version field to prevent stale data from overwriting newer records.
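A minimal sketch of such a conditional write is shown below. It assumes the version field holds the CloudTrail eventTime string, whose ISO 8601 UTC format sorts chronologically as plain text; the helper name build_versioned_update is illustrative.

```python
def build_versioned_update(user_id, attributes, event_time):
    """Build update_item arguments that only apply if event_time is newer than the stored version."""
    names = {'#v': 'version'}
    values = {':v': event_time}
    assignments = ['#v = :v']
    # Alias every attribute so reserved words such as "name" cannot break the expression
    for i, (key, value) in enumerate(sorted(attributes.items())):
        names[f'#a{i}'] = key
        values[f':a{i}'] = value
        assignments.append(f'#a{i} = :a{i}')
    return {
        'Key': {'userId': user_id},
        'UpdateExpression': 'SET ' + ', '.join(assignments),
        # Accept the write only for a new item or a strictly newer event
        'ConditionExpression': 'attribute_not_exists(#v) OR #v < :v',
        'ExpressionAttributeNames': names,
        'ExpressionAttributeValues': values,
    }

kwargs = build_versioned_update(
    'u1', {'email': 'a@example.com', 'name': 'Ada'}, '2024-05-01T12:00:00Z')
print(kwargs['UpdateExpression'])
```

The result would be passed to table.update_item(**kwargs). When the condition fails, DynamoDB raises a ConditionalCheckFailedException, which the consumer can treat as "stale or duplicate event, safe to ignore". Because eventTime has one-second resolution, two changes within the same second would need a finer-grained version source.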

3. OpenSearch Costs

OpenSearch Serverless bills by OpenSearch Compute Units (OCUs), which are provisioned separately for indexing and for search. For high write volumes, consider batching DynamoDB Stream events to reduce index writes.
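As a sketch of that batching, the function below turns one DynamoDB Streams batch into a single bulk request body instead of one index call per record. It assumes the stream is configured to include the new image, that userId is the table key, and that the collection accepts custom document IDs. The from_dynamodb helper is a simplified stand-in for boto3's TypeDeserializer and covers only common types.

```python
def from_dynamodb(value):
    """Convert a DynamoDB-typed value such as {'S': 'x'} to a plain Python value."""
    (type_tag, raw), = value.items()
    if type_tag == 'S':
        return raw
    if type_tag == 'N':
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if type_tag == 'BOOL':
        return raw
    if type_tag == 'NULL':
        return None
    if type_tag == 'L':
        return [from_dynamodb(v) for v in raw]
    if type_tag == 'M':
        return {k: from_dynamodb(v) for k, v in raw.items()}
    if type_tag in ('SS', 'NS'):
        return sorted(raw)
    raise ValueError(f'Unsupported DynamoDB type: {type_tag}')

def build_bulk_actions(stream_event, index='cognito-users'):
    """Translate a stream batch into alternating action and document lines for the bulk API."""
    actions = []
    for record in stream_event.get('Records', []):
        doc_id = from_dynamodb(record['dynamodb']['Keys']['userId'])
        if record['eventName'] == 'REMOVE':
            actions.append({'delete': {'_index': index, '_id': doc_id}})
        else:  # INSERT or MODIFY
            image = record['dynamodb']['NewImage']
            actions.append({'index': {'_index': index, '_id': doc_id}})
            actions.append({k: from_dynamodb(v) for k, v in image.items()})
    return actions
```

In the indexing Lambda, the list would be sent with client.bulk(body=actions) when it is not empty. The batch size itself is controlled on the Lambda event source mapping through BatchSize and MaximumBatchingWindowInSeconds.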

4. Security

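Use a Cognito authorizer on API Gateway to validate the caller's JWT before the search runs.
Give the search Lambda read-only access to OpenSearch. With OpenSearch Serverless, that means an IAM role allowed to call the collection plus a data access policy limited to read permissions.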
Encrypt user data at rest in both DynamoDB and OpenSearch.
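The read-only access mentioned above can be sketched as an OpenSearch Serverless data access policy. The collection name, role ARN, and helper name build_read_only_policy below are placeholders and assumptions; the policy grants only index description and document reads.

```python
import json

def build_read_only_policy(collection, role_arn, index_pattern='*'):
    """Return a data access policy document that lets one role read indexes in a collection."""
    return [{
        'Description': 'Read-only access for the search Lambda',
        'Rules': [{
            'ResourceType': 'index',
            'Resource': [f'index/{collection}/{index_pattern}'],
            'Permission': ['aoss:DescribeIndex', 'aoss:ReadDocument'],
        }],
        'Principal': [role_arn],
    }]

policy = build_read_only_policy(
    'user-search',                                      # hypothetical collection name
    'arn:aws:iam::123456789012:role/SearchLambdaRole',  # hypothetical role ARN
)
print(json.dumps(policy, indent=2))
```

The JSON string is what you would supply as the policy document when creating a data access policy of type "data". The role's IAM policy still needs permission to call the collection's API; the data access policy then narrows what it can do once connected.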

Next Steps: Where to Go From Here

Deploy the full stack using the AWS CDK repository — it includes a React frontend and complete infrastructure code.
Extend with AI agents: Combine this search layer with Amazon Bedrock to build a conversational user directory assistant. For a broader perspective on unified data strategies, check out this guide on migrating from Oracle to PostgreSQL on Azure.
Monitor performance: Use CloudWatch metrics on Lambda duration, DynamoDB read/write capacity, and OpenSearch query latency to tune your architecture.

Figure: Developer querying user directory with fuzzy search and complex filters on AWS cloud

Conclusion

By combining Cognito Lambda triggers, DynamoDB Streams, and OpenSearch Serverless, you can build a real-time, scalable user search layer that handles fuzzy matching, complex filtering, and sub-second response times, with no manual batch jobs to keep the index current.

This pattern empowers your support team to find users across thousands of accounts instantly, enables administrators to segment users by group membership for targeted campaigns, and gives compliance teams the ability to audit user attributes with complex filters.

For a deeper look at how database trends are evolving toward unified data and AI agents, see the analysis on Microsoft's 2026 database vision.
