How to Design a Real-Time Chat Application

Real-time chat applications like WhatsApp, Telegram, and Slack have transformed how we communicate. They enable instant messaging across devices and locations. These messaging platforms must handle millions of concurrent connections, deliver messages with minimal latency, and provide features like message synchronization, notifications, and media sharing. In this section, we’ll design a scalable real-time chat application that delivers messages instantly while maintaining reliability and consistency.

Problem Statement

Design a real-time chat application that:

– Supports one-on-one and group messaging
– Delivers messages instantly with minimal latency
– Synchronizes messages across multiple devices
– Supports media sharing (images, videos, files)
– Provides message status indicators (sent, delivered, read)
– Scales to millions of concurrent users
– Works reliably even with intermittent connectivity

This is a complex system design challenge that requires balancing real-time performance with reliability and scale.

Requirements Analysis

Functional Requirements

  1. User Management: Registration, authentication, and profile management. Users need accounts to identify themselves.
  2. Contact Management: Add, remove, and block contacts. Users need to control who they communicate with.
  3. One-on-One Messaging: Send and receive messages between two users. This is the core functionality.
  4. Group Messaging: Create groups, add/remove members, send messages to groups. This extends communication to multiple participants.
  5. Message Types: Support text, images, videos, files, and location. Modern chat apps support rich media.
  6. Message Status: Show sent, delivered, and read status. Users want to know if their messages were received.
  7. Notifications: Push notifications for new messages. Users need to know when they receive messages while away.
  8. Presence Indicators: Show online/offline status and last seen. This helps users know when others are available.
  9. Message Synchronization: Sync messages across multiple devices. Users expect seamless experiences across phone, tablet, and computer.
  10. Offline Support: Queue messages when offline and send when reconnected. Communication should work despite connectivity issues.

Non-Functional Requirements

  1. Low Latency: Message delivery in under 500 ms. Real-time communication requires minimal delays.
  2. High Availability: 99% uptime. Communication tools must be reliable.
  3. Scalability: Support millions of concurrent users. Popular chat applications have massive user bases.
  4. Reliability: No message loss, even during service outages. Messages must not disappear.
  5. Security: End-to-end encryption for messages. Privacy is essential for communication.
  6. Consistency: Consistent message ordering across devices. Messages should appear in the same order everywhere.

System Components and Architecture

High-Level Design

Our real-time chat application consists of these key components:

  1. Client Applications: Mobile, web, and desktop apps
  2. API Gateway: Entry point for all client requests
  3. Authentication Service: Manages user authentication
  4. User Service: Handles user profiles and relationships
  5. Chat Service: Manages one-on-one and group conversations
  6. Presence Service: Tracks online/offline status
  7. Notification Service: Sends push notifications
  8. Media Service: Handles media file storage and processing
  9. WebSocket Service: Maintains persistent connections for real-time communication

Here’s a simplified architecture diagram:

[Figure: architecture diagram of the real-time chat application]

This microservices architecture allows each component to scale independently based on demand.

Data Model Design

We need several data models to support our chat application:

User Model:

Table: users
  • user_id (PK): string
  • phone_number: string
  • username: string (optional)
  • name: string
  • profile_picture: string
  • status: string
  • created_at: timestamp
  • last_active: timestamp

Contact Model:

Table: contacts
  • contact_id (PK): string
  • user_id (FK): string
  • contact_user_id (FK): string
  • contact_name: string
  • blocked: boolean
  • created_at: timestamp

Conversation Model:

Table: conversations
  • conversation_id (PK): string
  • type: enum (one_on_one, group)
  • created_at: timestamp
  • updated_at: timestamp
  • last_message_id: string

Conversation Participant Model:

Table: conversation_participants
  • participant_id (PK): string
  • conversation_id (FK): string
  • user_id (FK): string
  • role: enum (member, admin)
  • joined_at: timestamp
  • last_read_message_id: string

Message Model:

Table: messages
  • message_id (PK): string
  • conversation_id (FK): string
  • sender_id (FK): string
  • type: enum (text, image, video, file, location)
  • content: text
  • media_url: string
  • sent_at: timestamp
  • delivered_at: timestamp
  • read_at: timestamp
  • deleted: boolean
  • reply_to_message_id: string (optional)

Device Model:

Table: devices
  • device_id (PK): string
  • user_id (FK): string
  • device_type: enum (ios, android, web, desktop)
  • push_token: string
  • last_active: timestamp

These models provide the foundation for tracking users, conversations, and messages.
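
To make the message table more concrete, here is a minimal Python sketch of it as a dataclass. The field names follow the table above; the `status()` helper is an assumption added for illustration, showing one way the sent/delivered/read indicator could be derived from the timestamps rather than stored separately.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MessageType(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    FILE = "file"
    LOCATION = "location"


@dataclass
class Message:
    message_id: str
    conversation_id: str
    sender_id: str
    type: MessageType
    content: str
    sent_at: datetime
    media_url: Optional[str] = None
    delivered_at: Optional[datetime] = None
    read_at: Optional[datetime] = None
    deleted: bool = False
    reply_to_message_id: Optional[str] = None

    def status(self) -> str:
        """Hypothetical helper: derive the status indicator from timestamps."""
        if self.read_at is not None:
            return "read"
        if self.delivered_at is not None:
            return "delivered"
        return "sent"


msg = Message("m1", "c1", "alice", MessageType.TEXT, "hi",
              sent_at=datetime.now(timezone.utc))
```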

Real-time Communication Mechanisms

WebSocket Architecture

For real-time messaging, we’ll use WebSockets:

1.  Connection Establishment:

  1. Client establishes WebSocket connection with server
  2. Connection is authenticated using JWT or similar token
  3. Server maintains mapping of user_id to active connections

Think of this as opening a dedicated phone line that stays open.

2.  Connection Management:

  1. Heartbeat mechanism to detect disconnections
  2. Reconnection strategy with exponential backoff
  3. Connection pooling for scalability

This ensures the connection remains stable and recovers from interruptions.
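
The reconnection strategy can be sketched as a small delay function. This is one common variant (exponential backoff with full jitter); the base delay, the cap, and the function name are assumptions for illustration, not values prescribed by this design.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    The upper bound doubles with each attempt up to `cap`; the actual delay
    is a random fraction of that bound so that many clients disconnected at
    the same moment do not all reconnect at once.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

A client would sleep for `backoff_delay(attempt)` before each retry and reset `attempt` to 0 once the connection is re-established.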

3.  Message Routing:

  1. Messages are routed based on conversation_id
  2. Server looks up active connections for all participants
  3. Messages are delivered to all connected devices of recipients

This is like a telephone switchboard routing calls to the right recipients.
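
As a rough in-memory sketch of the user-to-connection mapping and the routing step, the following uses plain callables in place of real WebSocket objects. `ConnectionRegistry` and `route_message` are hypothetical names, and the sketch assumes all connections live on a single server.

```python
from typing import Callable, Dict, Iterable, List

Send = Callable[[dict], None]  # stand-in for "write to this WebSocket"


class ConnectionRegistry:
    """Maps user_id -> {device_id: send callable} for live connections."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Dict[str, Send]] = {}

    def register(self, user_id: str, device_id: str, send: Send) -> None:
        self._by_user.setdefault(user_id, {})[device_id] = send

    def unregister(self, user_id: str, device_id: str) -> None:
        devices = self._by_user.get(user_id)
        if devices is None:
            return
        devices.pop(device_id, None)
        if not devices:
            del self._by_user[user_id]

    def connections(self, user_id: str) -> List[Send]:
        return list(self._by_user.get(user_id, {}).values())


def route_message(registry: ConnectionRegistry, participants: Iterable[str],
                  sender_id: str, payload: dict) -> List[str]:
    """Send payload to every connected device of every recipient.

    Returns the recipients with no active connection, so the caller can
    queue the message and trigger a push notification for them.
    """
    offline = []
    for user_id in participants:
        if user_id == sender_id:
            continue
        sends = registry.connections(user_id)
        if not sends:
            offline.append(user_id)
        for send in sends:
            send(payload)
    return offline
```

The sketch skips the sender entirely; a real system would likely also echo the message to the sender's other devices to keep them in sync.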

Alternative Approaches

Although WebSockets are our primary mechanism, we’ll implement fallbacks:

  1. HTTP Long Polling: For environments where WebSockets aren’t supported. The client repeatedly asks the server for updates.
  2. Server-Sent Events (SSE): For one-way server-to-client communication. Good for notifications but not full chat.
  3. Push Notifications: For delivering messages to offline users. This wakes up mobile apps to retrieve messages.

Having multiple communication methods ensures reliability across different environments.

Message Delivery and Synchronization

Message Flow

The complete message flow works as follows:

1.  Message Creation:

  1. Sender creates message locally
  2. Message gets temporary ID and “sending” status
  3. Message is sent to server via WebSocket

This provides immediate feedback to the sender.

2.  Server Processing:

  1. Server validates message
  2. Server assigns permanent ID and timestamp
  3. Server stores message in database
  4. Server acknowledges receipt to sender

This ensures the message is properly recorded.

3.  Message Delivery:

  1. Server identifies recipient’s active connections
  2. Server sends message to all connected devices
  3. Recipients acknowledge receipt
  4. Server updates message status to “delivered”

This gets the message to all of the recipient’s devices.

4.  Read Receipts:

  1. When recipient reads message, client sends read receipt
  2. Server updates message status to “read”
  3. Server notifies sender of read status

This lets the sender know their message was seen.
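
The server side of this flow can be sketched as follows. This is a simplified in-memory model under stated assumptions: `ChatServer`, the `m<N>` ID format, and the rule that a status only ever moves forward are illustrative choices, not part of the design above.

```python
import itertools
import time

STATUS_ORDER = ["sending", "sent", "delivered", "read"]


class ChatServer:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.messages = {}  # message_id -> stored record

    def accept(self, temp_id: str, conversation_id: str,
               sender_id: str, content: str) -> dict:
        """Validate, assign a permanent ID and timestamp, store, then ack."""
        if not content:
            raise ValueError("empty message")
        message_id = f"m{next(self._ids)}"
        self.messages[message_id] = {
            "conversation_id": conversation_id,
            "sender_id": sender_id,
            "content": content,
            "sent_at": time.time(),
            "status": "sent",
        }
        # The ack carries the temporary ID so the client can swap it
        # for the permanent one on its local copy.
        return {"temp_id": temp_id, "message_id": message_id, "status": "sent"}

    def advance(self, message_id: str, new_status: str) -> str:
        """Apply a delivery or read receipt; late receipts never downgrade."""
        record = self.messages[message_id]
        if STATUS_ORDER.index(new_status) > STATUS_ORDER.index(record["status"]):
            record["status"] = new_status
        return record["status"]
```

Keeping the status monotonic means a delivery receipt that arrives after a read receipt (for example, from a second device) does not move the message back to "delivered".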

Offline Message Handling

For offline recipients:

1.  Message Queuing:

  1. Messages for offline users are stored in a queue
  2. When user comes online, queued messages are delivered
  3. Push notifications alert users of new messages

This ensures messages reach users even when they’re offline.

2.  Message Synchronization:

  1. Clients track last received message ID
  2. On reconnection, clients request all messages since last received
  3. Server sends missing messages in batches

This keeps all devices in sync, even after disconnections.
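
The catch-up step on reconnection might look like the sketch below. It assumes message IDs are integers that increase within a conversation, which the data model above does not specify; `messages_since` and the batch size are illustrative.

```python
from typing import Iterator, List


def messages_since(log: List[dict], last_received_id: int,
                   batch_size: int = 2) -> Iterator[List[dict]]:
    """Yield the messages newer than last_received_id in fixed-size batches.

    Assumes `log` is ordered by an increasing integer `message_id`.
    """
    missing = [m for m in log if m["message_id"] > last_received_id]
    for start in range(0, len(missing), batch_size):
        yield missing[start:start + batch_size]


log = [{"message_id": i, "content": f"msg {i}"} for i in range(1, 6)]
batches = list(messages_since(log, last_received_id=2))
```

Here a client that last saw message 2 receives messages 3 and 4 in one batch and message 5 in the next.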

Group Chat Implementation

Group chats introduce additional complexity:

1.  Group Creation and Management:

  1. Any user can create a group
  2. Creator becomes admin by default
  3. Admins can add/remove members and other admins

This establishes the social structure of the group.

2.  Message Distribution:

  1. Messages are sent to all group members
  2. Server fans out messages to all recipients
  3. Delivery and read receipts are aggregated

This efficiently delivers messages to multiple recipients.
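
Receipt aggregation needs a rule for when a group message counts as delivered or read. The sketch below assumes the strictest rule (every recipient other than the sender must have acknowledged); other policies are possible, and `aggregate_status` is a hypothetical helper.

```python
from typing import Iterable, Set


def aggregate_status(sender_id: str, members: Iterable[str],
                     delivered_to: Set[str], read_by: Set[str]) -> str:
    """Collapse per-member receipts into one status shown to the sender."""
    recipients = set(members) - {sender_id}
    if recipients and recipients <= set(read_by):
        return "read"
    if recipients and recipients <= set(delivered_to):
        return "delivered"
    return "sent"
```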

3.  Scalability Challenges:

  1. Large groups (1000+ members) require special handling
  2. For very large groups, read receipts may be disabled
  3. Messages may be delivered in batches

This addresses the challenges of very large groups.

Media Handling

For sharing media files:

1.  Upload Process:

  1. Client compresses media before upload
  2. Media is uploaded to cloud storage (S3, GCS)
  3. Server generates thumbnails for images and videos
  4. Message contains media URL and metadata

This efficiently handles potentially large media files.

2.  Download Process:

  1. Thumbnails are downloaded automatically
  2. Full media is downloaded on demand or based on settings
  3. Progressive loading for large files

This optimizes bandwidth usage.

3.  Storage Optimization:

  1. Deduplication for identical files
  2. Multiple resolution versions for images and videos
  3. Automatic deletion of old media (configurable)

This reduces storage costs and improves performance.
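
Deduplication of identical files is often done by content addressing. The following is a minimal sketch of that idea, with a dictionary standing in for cloud storage; `MediaStore` is a hypothetical name and SHA-256 is an assumed choice of hash.

```python
import hashlib


class MediaStore:
    """Stores each distinct blob once, keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical uploads hash to the same key, so the second one is a no-op.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Note that with end-to-end encryption the server only sees ciphertext, so this kind of server-side deduplication applies only where identical uploads produce identical bytes.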

Presence Indicators

To show online status:

1.  Status Tracking:

  1. Clients send heartbeats to server
  2. Server updates presence database
  3. Status changes are broadcast to relevant contacts

This shows who’s currently available.
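
Heartbeat-based status tracking can be sketched as a timestamp per user plus a timeout. The 30-second timeout and the `PresenceTracker` name are assumptions; the current time is passed in explicitly so the logic is easy to test.

```python
from typing import Dict, Optional


class PresenceTracker:
    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        """Record that a heartbeat arrived from this user at time `now`."""
        self._last_seen[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        """A user is online if a heartbeat arrived within the timeout."""
        last = self._last_seen.get(user_id)
        return last is not None and now - last <= self.timeout

    def last_seen(self, user_id: str) -> Optional[float]:
        return self._last_seen.get(user_id)
```

The same stored timestamp can back the "last seen" display once the user is considered offline.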

2.  Last Seen:

  1. Timestamp of last activity is recorded
  2. Shown as “last seen at [time]” when offline
  3. Privacy settings can restrict visibility

This helps users know when someone was last active.

3.  Typing Indicators:

  1. Client sends typing event when user starts typing
  2. Server broadcasts to conversation participants
  3. Typing indicator expires after short timeout

This makes the conversation feel more interactive.

Security Considerations

End-to-End Encryption

To ensure message privacy:

1.  Key Exchange:

  1. Each client generates public/private key pair
  2. Public keys are exchanged through server
  3. Signal Protocol or similar for key management

This establishes secure communication channels.

2.  Message Encryption:

  1. Messages encrypted with recipient’s public key
  2. Server cannot decrypt message content
  3. Group messages use group key management

This ensures only intended recipients can read messages.

3.  Authentication:

  1. Verify identity through phone number verification
  2. Two-factor authentication for account recovery
  3. Device verification for new logins

This prevents unauthorized access to accounts.

Scaling Considerations

Connection Scaling

To handle millions of concurrent connections:

1.  Connection Pooling:

  1. Distribute connections across multiple WebSocket servers
  2. Use consistent hashing to route users to servers
  3. Implement server-to-server communication for message delivery

This distributes the connection load.
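
Consistent hashing for routing users to WebSocket servers can be sketched with a sorted ring of virtual nodes. The server names, the number of virtual nodes, and the use of SHA-256 are illustrative assumptions.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        # Each server is placed on the ring at `vnodes` pseudo-random points
        # so that load spreads evenly across servers.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def server_for(self, user_id: str) -> str:
        """Pick the first ring point clockwise from the user's hash."""
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["ws-1", "ws-2", "ws-3"])
```

The useful property is that adding a server only reassigns the users that now fall on the new server's ring points; everyone else keeps their existing server.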

2.  Horizontal Scaling:

  1. Add more WebSocket servers as user base grows
  2. Use load balancers with sticky sessions
  3. Implement auto-scaling based on connection count

This allows the system to grow with demand.

Database Scaling

To handle high message volume:

1.  Database Sharding:

  1. Shard by conversation_id or user_id
  2. Use NoSQL databases for message storage
  3. Implement read replicas for high-read scenarios

This distributes database load.

2.  Caching Strategy:

  1. Cache recent conversations and messages
  2. Cache user presence information
  3. Use Redis or similar for distributed caching

This reduces database load for frequent operations.

Fault Tolerance and Recovery

To ensure reliability:

1.  Message Persistence:

  1. Store messages durably before acknowledgment
  2. Implement message deduplication
  3. Use write-ahead logging for recovery

This prevents message loss.
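
Store-before-acknowledge combined with deduplication might look like the sketch below. It assumes the client attaches a stable client-generated ID (such as the temporary ID from the message flow) and reuses it on retries; `MessageStore` and its in-memory log are stand-ins for a durable database.

```python
from typing import Dict, List, Tuple


class MessageStore:
    def __init__(self) -> None:
        self._by_client_id: Dict[str, int] = {}
        self.log: List[str] = []

    def persist(self, client_id: str, payload: str) -> Tuple[int, bool]:
        """Store the message once and return (sequence number, is_new).

        A retry with the same client_id returns the original sequence
        number without appending a duplicate to the log.
        """
        if client_id in self._by_client_id:
            return self._by_client_id[client_id], False
        self.log.append(payload)
        seq = len(self.log)
        self._by_client_id[client_id] = seq
        return seq, True
```

The server would send its acknowledgment only after `persist` returns, so a client that never sees the ack can safely resend.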

2.  Service Redundancy:

  1. Deploy services across multiple regions
  2. Implement automatic failover
  3. Use circuit breakers to prevent cascading failures

This maintains availability during partial outages.
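
A circuit breaker can be sketched in a few lines. The thresholds, the `CircuitBreaker` and `CircuitOpenError` names, and the explicit `now` parameter are assumptions for illustration; production libraries add more states and metrics.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], now: float) -> T:
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            # Fail fast instead of piling more load on a struggling service.
            raise CircuitOpenError("circuit open")
        try:
            # Either the circuit is closed, or the reset period has passed
            # and this is a trial call.
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now
            raise
        self.failures = 0
        self.opened_at = None
        return result
```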

3.  Disaster Recovery:

  1. Regular backups of critical data
  2. Cross-region replication
  3. Documented recovery procedures

This protects against major failures.

Solution Walkthrough

Let’s walk through the complete flow of our chat application:

User Registration and Authentication

1.  Registration:

  1. User downloads app and enters phone number
  2. System sends verification code via SMS
  3. User enters code to verify identity
  4. System creates user account and generates authentication tokens

This securely establishes the user’s identity.

2.  Authentication:

  1. User logs in with phone number
  2. System verifies identity with SMS code or stored token
  3. System issues new authentication token
  4. User connects to WebSocket server with token

This secures ongoing access to the system.

One-on-One Chat

1.  Starting a Conversation:

  1. User selects contact from contact list
  2. Client checks if conversation exists, creates if not
  3. Client displays conversation history

This initiates or resumes a conversation.

2.  Sending a Message:

  1. User types and sends message
  2. Client displays message with “sending” status
  3. Client sends message to server via WebSocket
  4. Server processes and stores message
  5. Server sends acknowledgment to sender
  6. Client updates message status to “sent”
  7. Server delivers message to recipient
  8. Recipient sends delivery receipt
  9. Server updates message status to “delivered”
  10. Sender client updates message status

This ensures reliable message delivery with status tracking.

3.  Reading a Message:

  1. Recipient opens conversation
  2. Client marks messages as read locally
  3. Client sends read receipts to server
  4. Server updates message status
  5. Server notifies sender
  6. Sender client updates message status to “read”

This completes the message status lifecycle.

Group Chat

1.  Creating a Group:

  1. User selects “New Group” option
  2. User selects contacts to add
  3. User sets group name and optional image
  4. Client sends group creation request to server
  5. Server creates group and adds members
  6. Server notifies all members

This establishes a new group conversation.

2.  Group Messaging:

  1. Similar to one-on-one messaging
  2. Server fans out messages to all members
  3. Delivery and read receipts are aggregated
  4. Group updates are broadcast to all members

This efficiently handles multi-participant conversations.

Performance Optimization

To ensure sub-500ms message delivery:

1.  Connection Optimization:

  1. Keep WebSocket connections alive
  2. Implement connection pooling
  3. Use binary protocols for message transmission

This minimizes connection overhead.

2.  Message Prioritization:

  1. Prioritize message delivery over status updates
  2. Process messages in parallel
  3. Use separate queues for different message types

This ensures important operations happen first.
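
One simple way to prioritize chat messages over status updates is a single priority heap, sketched below. This is a substitute for the separate-queues approach mentioned above rather than an implementation of it, and the priority table and `OutboundQueue` name are assumptions.

```python
import heapq
import itertools
from typing import Any, List, Tuple

# Lower number = sent first. The ranking is an illustrative choice.
PRIORITY = {"message": 0, "receipt": 1, "presence": 2}


class OutboundQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._counter = itertools.count()

    def push(self, kind: str, payload: Any) -> None:
        # The counter breaks ties so items of equal priority stay FIFO.
        heapq.heappush(self._heap,
                       (PRIORITY[kind], next(self._counter), kind, payload))

    def pop(self) -> Tuple[str, Any]:
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self) -> int:
        return len(self._heap)
```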

3.  Geographic Distribution:

  1. Deploy servers in multiple regions
  2. Route users to nearest server
  3. Implement cross-region message delivery

This reduces network latency.

Common Pitfalls and How to Avoid Them

  1. Connection Management: Implement robust reconnection logic. Connection issues are the most common source of problems.
  2. Message Ordering: Use logical timestamps for consistent ordering. Different network paths can cause messages to arrive out of order.
  3. Group Chat Scaling: Special handling for large groups. Group chats can create fan-out challenges.
  4. Media Handling: Optimize for different network conditions. Large media files can cause performance issues.
  5. Security Vulnerabilities: Run regular security audits. Chat applications are high-value targets for attackers.
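
For the message-ordering pitfall in the list above, one well-known form of logical timestamp is a Lamport clock. The sketch below is a minimal version; the `order_key` tie-break on `sender_id` is an assumed convention, and a server-assigned sequence number per conversation is a common alternative.

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event, such as sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


def order_key(message: dict) -> tuple:
    """Sort by logical time; break ties deterministically by sender."""
    return (message["lamport"], message["sender_id"])
```

Because every device sorts with the same key, two messages sent concurrently end up in the same relative order everywhere, even if they arrived in different orders.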

By addressing these challenges, we can build a real-time chat application that delivers messages instantly while maintaining reliability, security, and scalability for millions of users.

This system design solution demonstrates how to handle real-time communication, message delivery, and synchronization at scale. The principles here apply to many real-time communication systems beyond chat applications.


