Monday, March 3, 2025

Designing a Scalable and Real-Time Messaging System



Introduction

In this article, we will explore building a highly scalable distributed messaging system like WhatsApp[1].

Requirements

Functional Requirements

  1. 1:1 chat
  2. Send Text/Images
  3. Last seen
  4. Notify a user, when they come back online, about messages that arrived while they were offline
  5. Read receipts (single tick, double tick, and blue tick)

Non-functional requirements

  1. Low latency – recipients should receive messages immediately
  2. High availability – the service should not go down
  3. No lag – the system should behave in real time

API

  • POST /api/v1/chat/{conversationId} Body – {Message text}
  • GET /api/v1/chat/{conversationId} Returns {List}
  • GET /api/v1/chat/status/{userId}
  • GET /api/v1/lastseen/{userId}
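
As a rough illustration of how these four endpoints could be told apart on the server, here is a minimal routing sketch. The handler names (`send_message`, `list_messages`, `get_status`, `get_last_seen`) and the `resolve` function are hypothetical and not part of the API above. The status route is listed before the generic chat route so that the more specific path is checked first.

```python
import re

# Route table mirroring the four endpoints listed above.
# Handler names are hypothetical labels chosen for this sketch.
ROUTES = [
    ("POST", re.compile(r"^/api/v1/chat/(?P<conversationId>[^/]+)$"), "send_message"),
    ("GET", re.compile(r"^/api/v1/chat/status/(?P<userId>[^/]+)$"), "get_status"),
    ("GET", re.compile(r"^/api/v1/chat/(?P<conversationId>[^/]+)$"), "list_messages"),
    ("GET", re.compile(r"^/api/v1/lastseen/(?P<userId>[^/]+)$"), "get_last_seen"),
]


def resolve(method, path):
    """Return (handler_name, path_params) for a request, or None if no route matches."""
    for route_method, pattern, handler_name in ROUTES:
        if route_method != method:
            continue
        match = pattern.match(path)
        if match:
            return handler_name, match.groupdict()
    return None
```

For example, `resolve("GET", "/api/v1/chat/status/u42")` selects the status handler with `userId` set to `u42`, while an unsupported method such as `DELETE` yields `None`.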

Let’s look at the overall architecture of the whole system. First we will discuss the chatting solution and then we’ll discuss other pieces that surround it.

High-Level Design (HLD)

At a high level, the system consists of a chat server, clients, and a db.

Deep Dive

  • Receive incoming messages and deliver outgoing messages
  • Store and retrieve messages from the db
  • Store each user’s last seen record

Message Delivery Models

Pull Model: In this approach, clients periodically check the server for new messages. The server stores undelivered messages and provides them when the recipient requests updates. To keep latency low, clients must poll frequently, and most polls return empty responses because no messages are pending. This wastes network and server resources.
Push Model: Active users maintain an open connection with the server, so messages are delivered as soon as they arrive. The server does not need to hold pending messages for connected users, and latency stays low. WebSockets[3] are commonly used to implement this model.[2]
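
The difference between the two models can be sketched with two toy in-memory servers. This is only an illustration: `PullServer`, `PushServer`, and their methods are hypothetical names, and a plain callback stands in for an open WebSocket connection.

```python
from collections import defaultdict, deque


class PullServer:
    """Holds undelivered messages until the recipient polls for them."""

    def __init__(self):
        self.pending = defaultdict(deque)

    def send(self, to_user, text):
        self.pending[to_user].append(text)

    def poll(self, user):
        """Return and clear everything waiting for this user (often empty)."""
        messages = list(self.pending[user])
        self.pending[user].clear()
        return messages


class PushServer:
    """Delivers a message immediately over the recipient's open connection."""

    def __init__(self):
        self.connections = {}  # user -> callback standing in for a WebSocket

    def connect(self, user, on_message):
        self.connections[user] = on_message

    def send(self, to_user, text):
        callback = self.connections.get(to_user)
        if callback is None:
            return False  # recipient is offline; nothing is delivered
        callback(text)
        return True


# Pull: three polls come back empty before a message finally arrives.
pull = PullServer()
responses = [pull.poll("U2") for _ in range(3)]
pull.send("U2", "hi")
responses.append(pull.poll("U2"))
empty_polls = sum(1 for r in responses if not r)

# Push: the message reaches U2 without U2 asking for it.
push = PushServer()
inbox = []
push.connect("U2", inbox.append)
push.send("U2", "hi")
```

In the pull run, three of the four requests carry no data, which is the wasted work described above. In the push run, the only traffic is the message itself.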

WebSocket Handling

A WebSocket handler (WSH) on the backend maintains open connections with all active users who have an internet connection. These connections enable real-time message transmission across various platforms, including mobile apps, web browsers, and smartwatches.

WebSocket connections are bi-directional: either party (client or server) can send messages to the other.

WebSocket and Message Management

WebSocket Manager (WSM): WSM tracks which WebSocket handler each user's devices are connected to. It is backed by a database that stores the mapping between users and handlers. If a connection drops, the user may reconnect to a different WebSocket handler, and WSM updates the mapping in the database.
Message Service (MS): This component stores messages and retrieves unread messages for users. It is also backed by a database for reliability. How long messages are kept varies by product:
  • Facebook: stores all messages permanently in its database.
  • WhatsApp: stores messages temporarily; once a message is delivered and acknowledged, it is deleted from the system.
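
The bookkeeping that WSM performs can be sketched as a small in-memory class. This is a sketch under assumptions: the class and method names are hypothetical, dictionaries stand in for the WSM database, and the last seen time is assumed to be refreshed on every connect and disconnect.

```python
import time


class WebSocketManager:
    """In-memory stand-in for the WSM database (hypothetical API)."""

    def __init__(self):
        self._connections = {}  # user_id -> {device_id: wsh_id}
        self._last_seen = {}    # user_id -> epoch seconds

    def register(self, user_id, device_id, wsh_id, now=None):
        """Record (or overwrite) the handler a device is connected to."""
        self._connections.setdefault(user_id, {})[device_id] = wsh_id
        self._last_seen[user_id] = time.time() if now is None else now

    def unregister(self, user_id, device_id, now=None):
        """Drop a device's connection; unknown devices are ignored."""
        self._connections.get(user_id, {}).pop(device_id, None)
        self._last_seen[user_id] = time.time() if now is None else now

    def handlers_for(self, user_id):
        """Return the set of handler ids that hold a connection for this user."""
        return set(self._connections.get(user_id, {}).values())

    def last_seen(self, user_id):
        """Return the last recorded activity time, or None if never seen."""
        return self._last_seen.get(user_id)


wsm = WebSocketManager()
wsm.register("U2", "phone", "WSH2", now=100.0)
# The connection drops and the phone reconnects to a different handler.
wsm.register("U2", "phone", "WSH3", now=160.0)
```

After the reconnect, a lookup for U2 returns only WSH3, so a sender's handler never forwards to the stale WSH2.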

Use cases

Assumptions:
User U1 is connected to WSH1 and wants to send message M1 to user U2.
WSM returns WSH2 which is connected to U2

Case 1: Both U1 and U2 are online and sending messages to each other

Every message triggers a lookup in WSM, so we can keep a cache in front of WSM that holds the connection state of all users (online or offline).
Steps 2.1 and 2.2 happen in parallel.
U2 is using the app and reads the delivered message, so U2 sends a “Received and Read” status back.
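
The read receipt statuses can be modelled as a small state machine that only moves forward. The sketch below assumes the usual mapping of single tick to sent, double tick to delivered, and blue tick to read; the `Status` enum and `advance` function are hypothetical names used for illustration.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1       # assumed: single tick
    DELIVERED = 2  # assumed: double tick
    READ = 3       # assumed: blue tick


def advance(current, incoming):
    """Apply a status update without ever moving a message backwards."""
    return max(current, incoming)


status = Status.SENT
status = advance(status, Status.READ)       # "Received and Read" arrives first
status = advance(status, Status.DELIVERED)  # a late "delivered" update is ignored
```

Taking the maximum keeps the receipt correct even if status updates arrive out of order, which can happen when they travel through different handlers.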

Case 2: U1 sends a message and U2 is offline

If U2 is offline, the message is saved in the db via MS, and WSH1 sends a “Sent” status to U1.
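
The decision WSH1 makes here can be sketched as a single function. The names are hypothetical: a dictionary stands in for the WSM lookup and a list stands in for the MS database.

```python
def handle_outgoing(online_handlers, saved_messages, to_user, message):
    """Decide what the sender's handler does with a message for to_user.

    online_handlers: dict of user_id -> handler id (stands in for a WSM lookup)
    saved_messages: list standing in for the MS database
    Returns ("forward", handler_id) or ("store", "Sent").
    """
    handler = online_handlers.get(to_user)
    if handler is None:
        # Recipient is offline: persist the message and report "Sent" to the sender.
        saved_messages.append((to_user, message))
        return ("store", "Sent")
    return ("forward", handler)


online = {"U1": "WSH1"}  # U2 has no entry, so U2 is offline
saved = []
result = handle_outgoing(online, saved, "U2", "M1")
```

Here the call returns the store action with a “Sent” status, and M1 stays in the saved list until U2 reconnects.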

Case 3: U2 comes online

U2 requests all messages that it has not yet received or read.
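
A minimal sketch of this catch-up step, assuming messages are removed only after the client acknowledges them (in line with the temporary storage policy described for WhatsApp). `PendingMessages` and its methods are hypothetical names.

```python
class PendingMessages:
    """In-memory stand-in for the undelivered-message store in MS."""

    def __init__(self):
        self._by_user = {}  # user_id -> {message_id: text}, in arrival order

    def save(self, user_id, message_id, text):
        self._by_user.setdefault(user_id, {})[message_id] = text

    def fetch(self, user_id):
        """Return pending (message_id, text) pairs without deleting them."""
        return list(self._by_user.get(user_id, {}).items())

    def ack(self, user_id, message_ids):
        """Delete messages the client has confirmed receiving."""
        pending = self._by_user.get(user_id, {})
        for message_id in message_ids:
            pending.pop(message_id, None)


store = PendingMessages()
store.save("U2", "M1", "hello")
store.save("U2", "M2", "are you there?")

first_fetch = store.fetch("U2")  # U2 comes online and asks for pending messages
store.ack("U2", ["M1"])          # only M1 is acknowledged
second_fetch = store.fetch("U2")
```

Keeping fetch and ack separate means a message that was fetched but never acknowledged, for example because the connection dropped mid-transfer, is offered again on the next fetch.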

Case 4: U1 is offline and sends a message
Messages are stored in a local db on the phone. When the device comes back online, it pushes the queued messages from the local db to the WebSocket handler.
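
On the client, this behaviour amounts to an outbox queue. The sketch below is illustrative: `Outbox` is a hypothetical class, a deque stands in for the phone's local db, and `send` is any callable that returns True once the WebSocket handler has accepted the message.

```python
from collections import deque


class Outbox:
    """Client-side queue for messages written while the device is offline."""

    def __init__(self, send):
        self._send = send      # callable(message) -> bool
        self._queue = deque()  # stands in for the local db on the phone

    def submit(self, message, online):
        """Queue the message; try to send right away if the device is online."""
        self._queue.append(message)
        if online:
            self.flush()

    def flush(self):
        """Send queued messages in order; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent


delivered = []
outbox = Outbox(lambda message: delivered.append(message) or True)
outbox.submit("a", online=False)
outbox.submit("b", online=False)
queued_while_offline = list(delivered)  # still empty: nothing has been sent
flushed = outbox.flush()                # device comes back online
```

A message is removed from the queue only after a successful send, so a failure part-way through a flush leaves the remaining messages queued in their original order.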

Send File

U1 -> U2
In the naive approach, the image itself is sent through the WebSocket handlers in place of text, so every connection along the path needs more network bandwidth.


Optimized

WSH1 gets an upload URL from the image server and gives it to U1. U1 uploads the image directly to that URL and then sends a message containing the URL to WSH1. The URL is forwarded to U2 as a text message. Once U2 receives the URL, it downloads the image directly from the image server.
The device can also compress the image before uploading it to the image server.
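
The optimized flow can be sketched with a toy image server. Everything here is hypothetical: `ImageServer`, `send_image`, and the `images.example.com` host are placeholders, and a dictionary stands in for blob storage.

```python
import uuid


class ImageServer:
    """Toy blob store that hands out one upload URL per image."""

    def __init__(self):
        self._blobs = {}  # url -> bytes, or None until the upload happens

    def create_upload_url(self):
        url = f"https://images.example.com/{uuid.uuid4()}"  # placeholder host
        self._blobs[url] = None
        return url

    def upload(self, url, data):
        if url not in self._blobs:
            raise KeyError("unknown upload URL")
        self._blobs[url] = data

    def download(self, url):
        data = self._blobs.get(url)
        if data is None:
            raise KeyError("nothing uploaded at this URL")
        return data


def send_image(image_server, deliver_text, image_bytes):
    """Upload the image out of band, then send only its URL as a text message."""
    url = image_server.create_upload_url()  # obtained via the sender's handler
    image_server.upload(url, image_bytes)   # sender uploads directly
    deliver_text(url)                       # only the short URL crosses the chat path
    return url


server = ImageServer()
received = []  # text messages delivered to U2
url = send_image(server, received.append, b"fake-image-bytes")
downloaded = server.download(received[0])  # U2 fetches the image directly
```

The chat path carries only a short URL string, while the image bytes move directly between each device and the image server.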

DB Schema

WSM DB
Schema -> UserId, WSH id, timestamp (last seen)
Queries -> GetWSH(userId)

Messaging Service DB
Schema -> conversationId, userTo, userFrom, timestamp, status, fileUrl, type (image, video, or text)
Partition key -> conversationId, sortKey -> timestamp_uuid
Queries
getMessageGreaterThanTimestamp(conversationId, timestamp, maxCount) -> paginates the results when more than maxCount messages match
getMessageInfo(conversationId, timestamp)
Puts
putMessage(conversationId, userFrom, userTo, timestamp…)
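
The partition key and sort key layout can be sketched in memory as follows. This is an illustration under assumptions: timestamps are taken to be non-negative epoch milliseconds, the timestamp part of the sort key is zero-padded so that string order matches time order, and the class name, method names, `text` field, and `after_key` cursor are hypothetical.

```python
import bisect
import uuid


class MessageStore:
    """In-memory stand-in for the Messaging Service table."""

    def __init__(self):
        self._keys = {}   # conversation_id -> sorted list of sort keys
        self._items = {}  # (conversation_id, sort_key) -> message dict

    def put_message(self, conversation_id, user_from, user_to, timestamp_ms, text):
        # Zero-padding makes lexicographic order equal chronological order;
        # the uuid suffix keeps two messages with the same timestamp distinct.
        sort_key = f"{timestamp_ms:013d}_{uuid.uuid4().hex}"
        bisect.insort(self._keys.setdefault(conversation_id, []), sort_key)
        self._items[(conversation_id, sort_key)] = {
            "userFrom": user_from,
            "userTo": user_to,
            "timestamp": timestamp_ms,
            "text": text,
        }
        return sort_key

    def get_messages_after(self, conversation_id, timestamp_ms, max_count, after_key=None):
        """Return (messages, next_key); next_key is None when nothing is left."""
        keys = self._keys.get(conversation_id, [])
        if after_key is not None:
            start = bisect.bisect_right(keys, after_key)  # resume after the last page
        else:
            # First key whose timestamp is strictly greater than timestamp_ms.
            start = bisect.bisect_left(keys, f"{timestamp_ms + 1:013d}")
        page = keys[start:start + max_count]
        has_more = start + len(page) < len(keys)
        next_key = page[-1] if page and has_more else None
        return [self._items[(conversation_id, key)] for key in page], next_key


store = MessageStore()
for ts in range(1, 6):
    store.put_message("c1", "U1", "U2", ts, f"m{ts}")

page1, cursor = store.get_messages_after("c1", 1, max_count=2)
page2, end = store.get_messages_after("c1", 1, max_count=2, after_key=cursor)
```

Resuming from the last sort key of the previous page, and not from its timestamp, avoids skipping messages that share a timestamp with the last item returned.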

Conversation DB
Schema -> UserId1, UserId2, ConversationId
Partition key -> User1_User2
Queries
getConvers(U1, U2);
getConversation(U1) – uses a secondary index on both UserId1 and UserId2
We can use a NoSQL database[4] such as AWS DynamoDB.
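
For the Conversation DB, one way to make the User1_User2 key independent of who started the chat is to sort the two ids before joining them. That ordering rule is an assumption of this sketch and is not stated in the schema above. `conversation_key` and `ConversationTable` are hypothetical names, and the second dictionary stands in for the secondary index.

```python
def conversation_key(user_a, user_b):
    """Build the same key regardless of argument order (assumed convention)."""
    first, second = sorted((user_a, user_b))
    return f"{first}_{second}"


class ConversationTable:
    """In-memory stand-in for the Conversation DB."""

    def __init__(self):
        self._by_pair = {}  # partition key -> conversation id
        self._by_user = {}  # secondary index: user id -> set of conversation ids

    def put(self, user_a, user_b, conversation_id):
        self._by_pair[conversation_key(user_a, user_b)] = conversation_id
        for user in (user_a, user_b):
            self._by_user.setdefault(user, set()).add(conversation_id)

    def get_for_pair(self, user_a, user_b):
        """Look up the conversation between two users, or None."""
        return self._by_pair.get(conversation_key(user_a, user_b))

    def get_for_user(self, user_id):
        """List all conversation ids a user takes part in."""
        return sorted(self._by_user.get(user_id, set()))


table = ConversationTable()
table.put("U2", "U1", "c1")
table.put("U1", "U3", "c2")
```

With this convention, looking up the pair as (U1, U2) or (U2, U1) reaches the same record. Note that joining with an underscore is only unambiguous if user ids cannot themselves contain an underscore.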

Conclusion

A messaging system should be fast, reliable, and scalable. The push model with WebSockets enables real-time communication, while efficient WebSocket management ensures smooth interactions. Whether messages are stored permanently or temporarily, the goal remains the same: deliver messages instantly while keeping the system efficient.

References

  1. WhatsApp. https://www.whatsapp.com/
  2. Push vs. pull model. https://medium.com/@_JeffPoole/thoughts-on-push-vs-pull-architectures-666f1eab20c2
  3. What is WebSocket. https://www.geeksforgeeks.org/what-is-web-socket-and-how-it-is-different-from-the-http/
  4. NoSQL databases. https://www.mongodb.com/resources/basics/databases/nosql-explained

About me

I am a Software Engineer with over a decade of experience in scalable, high-performance distributed systems. I have worked on cloud-native architectures, database optimization, and large-scale distributed systems.
