Hello all!

In this post, I will walk you through how to build a scalable serverless application on AWS. We will use fully serverless technologies such as AWS Lambda, API Gateway, and DynamoDB. You can follow along with this guide, build the app yourself, and deploy it in your own AWS account at no cost. I did it all within the free tier. Along the way I will demonstrate complex overloading patterns in DynamoDB, API Gateway WebSockets, and some smart approaches to designing event-driven web applications.

Introduction

Let’s start with what we are going to build. I used to use an app called Omegle a lot, but it has since shut down. I always wanted to build something similar myself. With my years of experience working with AWS, I thought: why not build something similar that is completely serverless, free, and very scalable?

A user journey in our chat app looks like this:

  • A user logs into the system.
  • The user is matched with another random user who has also just logged into the system.
  • They can text back and forth anonymously.
  • Either of them can disconnect at any time, and the other user can continue the conversation with the next person.

Pretty simple, right?

Designing DynamoDB Schema

The first thing we need to think about is storing the state. But what is the state? What information do we need to store? Let’s go through the requirements listed above one by one.

  • A user logs into the system.

We don’t need any state to fulfill this requirement. We will build a WebSocket endpoint in API Gateway and connect it to AWS Lambda. No state required. Let’s look at the second requirement.

  • The user is matched with another random user who has also just logged into the system.

This is the meat of the system. For a match to happen, we need to know which other users are in the pool. We need to store that somewhere. We also need to store whether each user is UNMATCHED or already MATCHED.

  • They can text back and forth anonymously.

We need to know who the other person is so that we can deliver the message to them. We need to store this information somewhere. We could issue a JWT to the client, validate it on every message, and read the matched user from it. But this introduces a possible attack vector, and we don’t want that.

  • Either of them can disconnect at any time, and the other user can continue the conversation with the next person.

This will be covered automatically if we fulfill all the previous requirements. When a user disconnects, we need to be able to run a query that finds the matched user and disconnects both of them.

Understanding the Data Model

When designing a schema for a NoSQL database, we first need to think about the access patterns to get the schema right. This is quite different from relational databases. In a relational database, given the data model, normalizing the tables into 3NF and fine-tuning with indexes and constraints gets us a pretty good design. But this approach doesn’t work for NoSQL databases.

A core idea of NoSQL databases is to spread queries across as many servers as possible so that we don’t keep hitting the same one. This is how NoSQL scales and stays efficient. It is achieved by splitting the data into multiple shards based on the partition key. That makes it critical to understand the data access patterns before designing the schema.
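To make the idea of a partition key concrete, here is a toy sketch of hash partitioning. It is only an illustration: `NUM_PARTITIONS`, the `partition_for` helper, and the use of SHA-256 are assumptions for this example. DynamoDB manages its own hash function and partition count internally and does not expose them.

```python
import hashlib

NUM_PARTITIONS = 4  # assumption for illustration; DynamoDB manages this itself


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number using a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# The same key always maps to the same partition, so a lookup by key
# only has to touch one shard. Different keys may land on different shards.
for key in ["user-1", "user-2", "user-3", "user-1"]:
    print(key, "->", partition_for(key))
```

The takeaway is that items sharing a partition key live together, so the choice of key decides both which queries are cheap and how evenly the load is spread.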

Some unwritten rules for DynamoDB are:

  • Use overloading as much as you can in your table schemas. Your attributes do not need to be rigid. (We will look into this in depth.)
  • A Scan is a smell of bad design. Try to avoid Scan operations by redesigning the data or adding global secondary indexes.

Let’s think about the access patterns in our application.

  • The system will frequently need to fetch UNMATCHED users from the database.
  • For every message, the system should query the database to find the matched person and deliver it to them.

So, we want the system to be efficient in retrieving these things.

We can design the schema as follows.

Remember the first rule: we don’t want rigid attributes that each hold only one specific thing, like name or age. The data we want to store in the database is:

  • connectionId: we need this to know who the user is and how to deliver messages to them
  • status: the user can be either MATCHED or UNMATCHED
  • timestamp: we want to store when the user joined, so that when there are a lot of users we can match the ones who have waited the longest

Partition Key

We could partition this data by match status: MATCHED or UNMATCHED.

But very few queries need all MATCHED users at once. Most queries about matched users look up a specific connectionId instead. This is where we create a composite key.

We will define the partition key like this. We will call it pk, and it will hold one of two kinds of values. This is the concept of overloading.

  • UNMATCHED: This is set for unmatched users, since we will query unmatched users a lot and in most cases only need the first one to do the matching.
  • CHAT#{connectionId}: This is a composite key. It carries two pieces of information: the user is matched, plus their connection ID.

Sort Key

Now, let’s think about the sort key. How do we want to order and filter the data once we have partitioned it? One pattern is immediately obvious: finding the user who has been unmatched for the longest time.

Maybe the timestamp? But that information is not needed for a matched user. What do we do? We overload this as well. What information is useful for a user who is matched? THE MATCHED USER’S CONNECTION ID!!! Bingo. We have a sort key.

  • {timestamp}#{connectionId}: This is for unmatched users. Sorting by timestamp lets us quickly find the user who has waited the longest.
  • {matchedConnectionId}: This is for matched users. It is not that useful for sorting, but it makes it easy to find out who the user is matched with.

Ok, our keys are sorted. We could add other metadata as extra attributes, but for simplicity let’s just keep connectionId as an attribute.
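As a small sketch of the two overloaded key shapes, the helpers below build the keys for a waiting user and for a matched user. The function names `unmatched_keys` and `chat_keys` and the sample connection IDs are made up for this example. DynamoDB compares string sort keys byte by byte, and millisecond epoch timestamps stay 13 digits long until the year 2286, so plain string order matches time order here.

```python
import time


def unmatched_keys(connection_id: str, timestamp_ms: int) -> dict:
    """Keys for a user who is waiting in the pool."""
    return {"pk": "UNMATCHED", "sk": f"{timestamp_ms}#{connection_id}"}


def chat_keys(connection_id: str, matched_connection_id: str) -> dict:
    """Keys for a user who is chatting with matched_connection_id."""
    return {"pk": f"CHAT#{connection_id}", "sk": matched_connection_id}


now_ms = int(time.time() * 1000)
waiting = [
    unmatched_keys("conn-b", now_ms),
    unmatched_keys("conn-a", now_ms - 5000),  # joined 5 seconds earlier
]

# The smallest sort key belongs to the user who has waited the longest.
oldest = min(waiting, key=lambda item: item["sk"])
print(oldest["sk"].split("#", 1)[1])  # conn-a
```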

But we are missing one part of the puzzle.

Let’s say we have an unmatched user waiting in the pool. Let’s call him Ram. He wants to disconnect now. How do we find Ram’s row in the database? We only know his connectionId, and we don’t want to scan the whole table to find him. In the worst case we don’t even know which partition to look in: he could be in either a MATCHED (CHAT#) partition or the UNMATCHED partition.

This is where we introduce a Global Secondary Index (GSI). We will define an index on connectionId. Luckily, we decided to store it as an attribute.

So, the schema now looks like this:

DynamoDB Table Schema:
- Partition Key: "UNMATCHED" | "CHAT#{connectionId}"
- Sort Key: "{timestamp}#{connectionId}" | "{matchedConnectionId}"
- Attributes:
    - connectionId: "{connectionId}" (this is GSI1PK)
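To see why the index helps, here is an in-memory sketch of the table with one waiting user and one matched pair. The sample items and the `find_by_connection` helper are made up for illustration, and a plain dictionary stands in for the GSI; the real index is maintained by DynamoDB.

```python
# In-memory stand-in for the table: one waiting user and one matched pair.
items = [
    {"pk": "UNMATCHED", "sk": "1700000000000#conn-ram", "connectionId": "conn-ram"},
    {"pk": "CHAT#conn-a", "sk": "conn-b", "connectionId": "conn-a"},
    {"pk": "CHAT#conn-b", "sk": "conn-a", "connectionId": "conn-b"},
]

# A GSI keyed on connectionId acts like a second lookup table over the
# same items, so we can find a row without knowing its pk or sk.
gsi1 = {}
for item in items:
    gsi1.setdefault(item["connectionId"], []).append(item)


def find_by_connection(connection_id: str):
    """Return the item for a connection, whichever partition it lives in."""
    matches = gsi1.get(connection_id, [])
    return matches[0] if matches else None


print(find_by_connection("conn-ram")["pk"])  # UNMATCHED
print(find_by_connection("conn-a")["pk"])    # CHAT#conn-a
```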

This is how the table looks in CloudFormation.

  ConnectionsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: "pk"
          AttributeType: "S"
        - AttributeName: "sk"
          AttributeType: "S"
        - AttributeName: "connectionId"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "pk"
          KeyType: "HASH"
        - AttributeName: "sk"
          KeyType: "RANGE"
      ProvisionedThroughput:
        ReadCapacityUnits: 1
        WriteCapacityUnits: 1
      SSESpecification:
        SSEEnabled: False
      GlobalSecondaryIndexes:
        - IndexName: gsi1
          KeySchema:
            - AttributeName: connectionId
              KeyType: HASH
          Projection:
            ProjectionType: ALL
          ProvisionedThroughput:
            ReadCapacityUnits: 1
            WriteCapacityUnits: 1
      TableName: Connections

Query Design

Now that we have the database sorted, let’s think about the key events in the system.

  • connect
  • disconnect
  • message
  • status (to see if the user is matched/unmatched)
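One way to wire these events to code is a single Lambda handler that dispatches on the route key. In an API Gateway WebSocket API, each event carries `requestContext.routeKey` and `requestContext.connectionId`, and `$connect` and `$disconnect` are built-in routes. The custom route names `message` and `status`, and the placeholder handler functions, are assumptions for this sketch.

```python
def on_connect(connection_id, body):
    return f"connected {connection_id}"


def on_disconnect(connection_id, body):
    return f"disconnected {connection_id}"


def on_message(connection_id, body):
    return f"message from {connection_id}: {body}"


def on_status(connection_id, body):
    return f"status of {connection_id}"


# "$connect" and "$disconnect" are built-in route keys; "message" and
# "status" are assumed custom route names for this example.
ROUTES = {
    "$connect": on_connect,
    "$disconnect": on_disconnect,
    "message": on_message,
    "status": on_status,
}


def lambda_handler(event, context):
    """Dispatch a WebSocket event to the handler for its route."""
    request_context = event["requestContext"]
    handler = ROUTES.get(request_context["routeKey"])
    if handler is None:
        return {"statusCode": 400, "body": "unknown route"}
    result = handler(request_context["connectionId"], event.get("body"))
    return {"statusCode": 200, "body": result}
```

The placeholder handlers only return strings; in the real app each one would run the queries described in the sections that follow.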

connect

Let’s focus on the connect event. What should happen when a user connects? Pretty simple: there are only two outcomes. Either the user is immediately matched with a user in the pool, or the user joins the waiting pool.

First, we check whether there are any unmatched users in the pool:

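from boto3.dynamodb.conditions import Key

response = dynamodb_table.query(
    KeyConditionExpression=Key("pk").eq("UNMATCHED"),
    # Ascending sort-key order returns the oldest timestamp first,
    # i.e. the user who has waited the longest.
    ScanIndexForward=True,
    Limit=1,
)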

If there are none, we add the user to the pool.

import time

# Millisecond timestamp so that sort keys order users by join time.
timestamp = int(time.time() * 1000)
sk = f"{timestamp}#{connection_id}"

dynamodb_table.put_item(
    Item={
        "pk": "UNMATCHED",
        "sk": sk,
        "connectionId": connection_id,
    },
)

Now let’s get into the complicated part. What if we found a user who is already in the waiting pool? We need to match these two users.

We need to do three writes in that case.

  • Delete the waiting user’s UNMATCHED entry from the database.
  • Create two new CHAT entries, one for each of the matched users.

How can we make all of this atomic? We use a DynamoDB transaction and bundle all three writes into it, so that either all of them succeed or none of them do.

dynamodb_client.transact_write_items(
    TransactItems=[
        # Delete the connection we are matching with.
        {
            "Delete": {
                "TableName": dynamodb_table.table_name,
                "Key": {
                    "pk": {"S": "UNMATCHED"},
                    "sk": {"S": f"{matched_connection['sk']}"},
                },
                "ConditionExpression": "attribute_exists(pk)",
            }
        },
        # Add the new matched connection for current user.
        {
            "Put": {
                "TableName": dynamodb_table.table_name,
                "Item": {
                    "pk": {"S": f"CHAT#{connection_id}"},
                    "sk": {"S": matched_connection["connectionId"]},
                    "connectionId": {"S": connection_id},
                },
                "ConditionExpression": "attribute_not_exists(pk)",
            }
        },
        # Add the new matched connection for matched user.
        {
            "Put": {
                "TableName": dynamodb_table.table_name,
                "Item": {
                    "pk": {
                        "S": f"CHAT#{matched_connection['connectionId']}"
                    },
                    "sk": {"S": connection_id},
                    "connectionId": {
                        "S": matched_connection["connectionId"]
                    },
                },
                "ConditionExpression": "attribute_not_exists(pk)",
            }
        },
    ]
)
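One case worth handling: two users who connect at the same moment can both read the same waiting user. Because of the `attribute_exists(pk)` condition on the delete, only one of the two transactions can succeed, and boto3 reports the other as a `TransactionCanceledException`. The sketch below shows one possible way to react, by looking for another waiting user and falling back to the pool. It runs without AWS: `MatchTaken` is a stand-in for the cancelled transaction, and `find_waiting`, `try_match`, and `join_pool` are hypothetical callables wrapping the query, the transaction, and the put shown in this section.

```python
class MatchTaken(Exception):
    """Stand-in for a cancelled transaction (the waiting user was taken)."""


def connect(connection_id, find_waiting, try_match, join_pool, max_attempts=3):
    """Match with a waiting user, or join the pool if nobody is available."""
    for _ in range(max_attempts):
        candidate = find_waiting()
        if candidate is None:
            break
        try:
            try_match(connection_id, candidate)
            return candidate["connectionId"]
        except MatchTaken:
            continue  # someone else matched this user first; look again
    join_pool(connection_id)
    return None


# Demo with in-memory fakes: the first candidate is taken by a racing user.
pool = [{"connectionId": "conn-a"}, {"connectionId": "conn-b"}]
taken = {"conn-a"}


def find_waiting():
    return pool[0] if pool else None


def try_match(connection_id, candidate):
    pool.remove(candidate)
    if candidate["connectionId"] in taken:
        raise MatchTaken()


joined = []
result = connect("conn-c", find_waiting, try_match, joined.append)
print(result)  # conn-b
```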

Once the transaction succeeds, we notify the matched user that a match has been found.

apigateway_client.post_to_connection(
    ConnectionId=matched_connection["connectionId"],
    Data=json.dumps({"type": "system", "code": "partner_connected"}),
)
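`post_to_connection` belongs to the boto3 `apigatewaymanagementapi` client, which has to be created with the WebSocket API’s callback URL as its `endpoint_url`. A minimal sketch of deriving that URL from the incoming event follows. It assumes the default execute-api domain, where the URL is the domain name followed by the stage; the helper name `management_endpoint` and the sample event values are made up for illustration.

```python
def management_endpoint(event: dict) -> str:
    """Build the callback URL from a WebSocket event's request context."""
    request_context = event["requestContext"]
    return f"https://{request_context['domainName']}/{request_context['stage']}"


sample_event = {
    "requestContext": {
        "domainName": "abc123.execute-api.us-east-1.amazonaws.com",
        "stage": "prod",
        "connectionId": "conn-a",
    }
}
print(management_endpoint(sample_event))

# With boto3 available, the client could then be created roughly like this:
# apigateway_client = boto3.client(
#     "apigatewaymanagementapi",
#     endpoint_url=management_endpoint(event),
# )
```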

disconnect

Disconnect is pretty simple. We query the database using our GSI and fetch the user’s entry. The user can be either matched or unmatched.

If the user is not matched, we simply delete their entry.

response = dynamodb_table.query(
    IndexName="gsi1",
    KeyConditionExpression=Key("connectionId").eq(connection_id),
    Limit=1,
)

if not response.get("Items", []):
    return

user_connection = response["Items"][0]

if not str(user_connection["pk"]).startswith("CHAT#"):
    dynamodb_table.delete_item(
        Key={
            "pk": user_connection["pk"],
            "sk": user_connection["sk"],
        },
        ConditionExpression="attribute_exists(connectionId)",
    )
    return

But if the user is matched, we need to delete both chat entries. We do this in a single transaction that removes the matched user’s entry as well, and then notify the matched user.

dynamodb_client.transact_write_items(
    TransactItems=[
        # Delete the current user's item
        {
            "Delete": {
                "TableName": dynamodb_table.table_name,
                "Key": {
                    "pk": {"S": user_connection["pk"]},
                    "sk": {"S": user_connection["sk"]},
                },
                "ConditionExpression": "attribute_exists(pk)",
            }
        },
        # Delete the matched user's item
        {
            "Delete": {
                "TableName": dynamodb_table.table_name,
                "Key": {
                    "pk": {
                        "S": f"CHAT#{user_connection['sk']}"
                    },
                    "sk": {"S": connection_id},
                },
                "ConditionExpression": "attribute_exists(pk)",
            }
        },
    ]
)
# Notify the matched user about the disconnection
apigateway_client.post_to_connection(
    ConnectionId=str(user_connection["sk"]),
    Data=json.dumps({"type": "system", "code": "partner_disconnected"}),
)

You should be able to figure out how to handle the other two events by now. They are pretty simple.

  • message queries the table to find the sender’s matched user and delivers the message to them.
  • status queries the table for the current user and returns the whole item.
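As a rough sketch of the message event, the function below looks up the sender’s item and forwards the text to the partner stored in the sort key. It is not the code from the repository: `find_by_connection` and `send` are hypothetical callables standing in for the GSI query and `post_to_connection`, and the `{"type": "message", "text": ...}` payload shape is an assumption.

```python
import json


def handle_message(connection_id, body, find_by_connection, send):
    """Forward body from connection_id to its matched partner, if any."""
    item = find_by_connection(connection_id)
    if item is None or not item["pk"].startswith("CHAT#"):
        return False  # unknown or still waiting: nobody to deliver to
    partner_id = item["sk"]  # for CHAT items the sort key is the partner
    send(partner_id, json.dumps({"type": "message", "text": body}))
    return True


# Demo with in-memory fakes instead of DynamoDB and API Gateway.
fake_table = {
    "conn-a": {"pk": "CHAT#conn-a", "sk": "conn-b", "connectionId": "conn-a"},
    "conn-w": {"pk": "UNMATCHED", "sk": "1700000000000#conn-w", "connectionId": "conn-w"},
}
outbox = []

delivered = handle_message(
    "conn-a", "hello", fake_table.get, lambda cid, data: outbox.append((cid, data))
)
print(delivered, outbox[0][0])  # True conn-b
```

The status event can reuse the same lookup and simply return the item to the caller.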

You can find the complete code if you want to try it yourself and explore. I’ve included the CloudFormation template and the API code as well.

Link will be updated here shortly.

Thanks for reading!