System design concepts

Table of Contents

Sharding
Resharding
Microservices Architecture
Event Driven Architecture

Sharding

What if we have more data in data tier? than app and cache tier?
What if we have more write ops than read ops?

We would then need to divide/partition/shard the data.

Each key-value pair belongs to one shard. Each shard is a separate

database and has separate cache. To keep availability high, each shard has to be replicated using the same replication schemes.

To calculate # of shards, one approach is:

\[ \text{# of Shards} = \frac{\text{Total data size}}{\text{Space available in commodity server}} \]

How to map key value pair to a specific shard?
- We have to keep query load to be distributed as well as data size distributed.
- Otherwise some shards may be getting disproportionate amount of traffic. This skew is called hotspot.
- We need unpredictable but deterministic approach to assign k-v pairs to shard. A similar example is MD5 cryptographic hash function.
Request routing:
- When a request comes in with a key, a partition aware router knows which shard server the key belongs to and its ip address. If the database is sharded and replicated, then partition aware router can do some load balancing while routing the request to a specific replica of shard server.
- The partition aware router can do regular ping to the shard servers to check their health and accordingly route the future requests.

Resharding

What if we do not know the # of shards upfront and if we need to incrementally increase shards?
The hash function that uses # of shards and key could change when shards increased. This means data has to move from one shard server to another and this is called restructuring or rebalancing or resharding. This is very expensive operation. Every single key value pair has to be moved from one shard server to another and this will be of \(O(k)\).

Rules to shard horizontally vs vertically.

Step 1: Know your bottlenecks.
- Do I have more users?
- Do I have unstructured data?
- Based on user ids, we can give range
- Do you want to move partial data from one place to another?
- Are we going to just append data or are we going to create new cluster and insert the data in this new cluster?

Microservices Architecture

Overview:

Applications can be architectured in a few different ways. The main two approaches are monolithic architecture and microservices architecture.

Monolithic Architecture

In Monolithic design, all of the business functionalities are clubbed together. For example, in an E-commerce application, payment service, notification service, user mgmt service, product service etc. all are clubbed together in one single app which in turn talks to the underlying database.

One advantage of this approach is that all services are in one name space resulting faster intercommunication and less configuration overheads

However, as the application scales, it becomes harder to maintain the codebase and clearly separate functionalities. Turnaround time for deployment goes up and overall product complexity exponentially increases.

Microservices Architecture

In microservices architecture, services that have different/independent business scope and functions are separated into independent services with its own technological stack.

Some of the characteristics of this architecture are:

The individual microservices have its own technological stack, including DB and DB mgmt model.
The services communicate with each other using REST API, event streaming and message brokers
The services are based on business functions.
The services are usually small, loosely coupled, and can be deployed independently.

Components of microservices architecture

There are mainly 8 components in microservice architecture:

Patterns

Client: Can be a mobile app, chatbot, desktop app etc.
Identity Provider: allows to authenticate/authorize the client. Provides access management solution
API Gateway:
- Provides entry point for the client.
- Forwards requests to different microservices. Client makes call via URL and
- API gateway has the mapping knowledge of url to specific microservice componenet and routes it accordingly. This provides backend abstraction and clients can be managed without exposing to
client.
- API gateway can also add as load balancer.

Different clients may have different needs of data. API gateway provides this abstraction
The API gateway handles requests in one of two ways. Some requests are simply proxied/routed to the appropriate service. It handles other requests by fanning out to multiple services.
Rather than provide a one-size-fits-all style API, the API gateway can expose a different API for each client. For example, the Netflix API gateway runs client-specific adapter code that provides each client with an API that’s best suited to its requirements.
The API gateway might also implement security, e.g. verify that the client is authorized to perform the request

Service Discovery

Service instances can have dynamically assigned network locations and because of various factors such as autoscaling, failover and upgrades the service instances change dynamically. The service location is no more static and this is where service discovery helps.

There are two types of service discovery patterns:

Client side discovery: The client queries the service registry which is a database of available service instances. The client then uses load balancing algo to select one and makes a request.

Server side discovery: The client makes a request to the load balancer which inturn queries the service registry and routes the request to an available service instance. Service instances are registered and deregistered in service registry. Example: AWS Elastic Load balancer.

Event Driven Architecture

The characteristic of EDA is loosely coupled components and services.
Small scale EDA can be implemented directly between the services. Another option is to have a mediator in between the service and subscribers. Or having an event bus in the middle.

The Event Bus can be based on the following components:
- Azure Queue
- Azure Service Bus
- Apache Kafka
- RabbitMQ

The choice is based on the scale of the events, and architectural complexity.

Monolith	Modular Monolith	EDA	SOA

If you have million items, how can a user find what the user needs?

This is called product discovery. The second problem is to ship the product within in SLA. Align inventory and order fullfillment. How does suppliers improve their sales? Reports and marketing insight.

Q. What are the issues if the consumer thread is multithreaded and reads from the partition in parallel?

Checkpionting becomes more complicated
hard to preserve orders.
Mechanism of avoiding double counting. (if the same message is submitted to partition multiple times, double counting can happen. One approach is to use a distributed cache which maintains a unique identifier and it can remove duplicates if multiple same messages arriave within a time threshold.

Step by step guide for System Design: questions to ask interviewer:

Users/Customers:
- Who uses the system?
- Per hour stat only for owner?
- How will the system be used? only one dept?
- Is data used in realtime
Scale (read and Write)
- How much data is coming and retrieved?
- How much data QPS?
- Should we deal with sudden traffic spikes?
Performance
- Should we do batch or stream processing?
- How fast data mustbe retrieved…or response time? IF response time is small, then minimum processing while reading and aggregration happens in write.
Cost
- Open source framework
- Maintenance means public cloud

Design a video view count service:

HLD:

A db to store data
Web server to process incoming video view events and store in db.
to retrieve view count, another service is introduced.

Start with defining a data model:

Shoudl we store each indiviudal event
or aggregrate data ( per minute) in real-time

pros and cons:

individual event can be stored really fast
can slice and dice data in all manners
recalculate numbers from scratch.
cannot read data quickly.
when total is requested, we need to aggregate while reading
more money to store more data
aggegate data read is faster
decision making in real time
ability to filter or aggegrate differently is very difficult.
Need to preaggreate in mem before storing in db
hard to fix errors

ask interviewr:

expected data delay
mabe combine both approaches as needed. system becomes more complex and expensive

Where to store the data:

evaluate both sql and nosql options
How to make both reads and writes fast?
HOw not to lose data in case of faults?
How to achieve strong consistency? and related tradeoffs?
How to recover data in case of an outage?
How to ensure data security?
How to make it extensible for data model changes in the future?

SQL DB:

Sharding or horizontal partition and requires many machines.
We need to know which machine stores and retrieves the data from it
A light proxy server can route to the requests to the right db and no need for the service to know each and every db.
Proxy needs to know if a shard dies or when a new shard is added… This can be done using a configuration service and it maintains the health via heartbeats.
Should this light proxy server (cluster proxy) directly call all the shards in different machines? One other option is to introduce a shard proxy. This can cache query results, monitor DB, publish metrics and terminate queries that take long time to retrieve data.
Scalability and perf. have been addressed. What about availability? What if shard dies? How to make data is not lost. For this we need replication.

Data Replication:

Keep one shard as the master shard. and create a copy or copies that act as read replicas.
The writes go via master shard and the reads can go to either master or follower shards.
The replicas can be in different data centers so if one datacenter goes down, we still ensure availability of data.
Data replication from master to followers can be done via synchronosly or async.
Youtube has implemented similar solution to scale large clusters of mysql instances. This is called Vitess

Back of the envelope Calculations:

Four types of estimations:
- Load or Traffic Estimates
  - Write to Read Estimate: write to read ratio - 1:100 and 1M writes per day = 1M = 10^6 * 24 * 60 * 60 = 10^6 * 25 * 4000 = 10^6 * 10^5 = 10 req / sec So reads per sec = 1000 req/sec
- Database Storage Estimates
  - To calculate storage estimate, we need to know how much is each write request and how long to store.
    - if each write request is 1KB, and we have to store for 10 years, storage for 10 years = 1M requests and each request is 1KB and we have to store for 10 years = 10^6 * 10^3 * 4000 = 4 TB storage for one day = 1 Million requests * 1KB = 1 GB

Design Topics

Design Tinder:

Functional Requirements:

Store Profiles
- Images (maybe 5 images per user)
Recommend Matches - # of active users
Note Matches
Direct Messaging

Services:

Profile service
- Add profile
- Update Profile

Questions:

How are we going to store images?
- Should image be stored as file or blob (Binary large object) File - Cheaper, faster, and with static content, we can use CDN Database - Provides mutability ACID properties, access control and .
How is user authenticated in profile service?
- Instead of sending username passwd, send token.
- What happens when multiple services require authentication? Should all of them reach out to profile srevice?
- One solution is to use gateway. Client talks to the gateway and noone else talks to the gateway.
- gateway takes user inputs and checks with profile service about authentication status and routes the service and takes the response back to the user.
Where should be images be stored?
- In the same place as profile service or separately. Maybe have a separate image service which will have distributed file system to store images.
How one client can connect to another client via direct messaging?
- do not use client server protocol
- Use peer to peer protocol - XMPP is one example