System Design

April 9, 2021 (last updated April 27, 2024)

Availability Patterns

Fail over

active/passive failover: when the active server heartbeat is lost, the passive server takes over as actice and resumes service
active/active failover: both servers are managing traffic and balancing load between them, if one goes down either the DNS or the application would know to direct traffic to the only active server

Replication

What is replication?

replication means duplicating our data, having copies on the same data stored across multiple nodes
replication is a way to make our systems more robust, fault tolerant and potentially more performant
- example -- if a node goes down, or we want to bring data physically closer to our users

Things to consider

replication means we have to think about consistency
how do we ensure consistency of data across the various nodes
how do we handle reads and writes when there are multiple copies of the data?
how do we decide what node accepts a read, and which node accepts a write?

Consistency Patterns

https://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Weak Consistency

after a write, reads may or may not see it. but its NOT guaranteed
mostly used in scenarios where data is not persisted

Eventual Consistency

writes are asynchronous... reads will eventually see it

Strong Consistency

writes are synchronous..... reads will see it right away

How do we arrive at these consistency Patterns?

Leaders and Followers

Leaders and Followers

Catchup recovery

failover

Multi Leader

Leaderless

to read --- https://lethain.com/introduction-to-architecting-systems-for-scale/

Gateway

communicates with user service to determine if this is an authenticated request and if yes directs it to the correct service

Grokking The Systems Design Interview

Load Balancing

we can add LBs at 3 places
1. between user and web server
2. between web server and internal layer (application servers/cache servers)
3. between internal layer and database
Load Balancers themselves can be a single point of failure, to avoid this introudce redundant load balancers
- ie cluster of load balancers.

Load Balancing Algorithms:

all algorithms involve health checks to make sure a particular server is up.

Least Connection Method

directs traffic to server with the fewest active connections

Least Response Time Method

directs to server with fewest activce connections and lowest average response time

Least Bandwidth Method

directs to server that current has the least amount of traffic (measured in megabits per second -- Mbps)

Round Robin Method

cycles through a list of server and sends each new request to the next server
most useful when the servers are of equal specification and there are not many persistent connections

Weighted Round Robin

better handles servers with different processing capacities
eacher server is assigned a weight based on processing capacity
servers with higher weights recieve new connections and more connections over those with lower weights

IP Hash

a hash of the client's IP is calculated to redirect the request

Caching

Application Server Cache

a cache placed directly on a request layer node enables response data to be stored locally
each time a request is made, node quickly returns locally cached data if available, else it will go to disk
what happens when you have multiple nodes serving requests?
- each node can host its own cache, however if the same request is randomly distributed to different nodes, cache misses will increase
- potential solutions: global caches and distributed caches

Content Delivery Network (CDN)

a type of cache that is useful for sites that serve large amount of static media
a request will first ask CDN for some static media
- if locally available, the CDN will serve the content
- else, it will query the backend server, cache it locally, and serve it to the requesting user.

Cache Invalidation

How do we ensure that when data is modified in the database, it is then removed from the cache to prevent inconsistencies? This process is called cache invalidation

Write-through cache

we write to the database and to the cache simultaneously
creates high latency for writes since each write must be done twice

Write-around

delete from cache and write to database only ??

Write-back cache

data is written to cache alone and persisted to perminent storage after some specified time interval

Cache Eviction Policies

FIFO -- without any regard to access patterns
LIFO -- without any regard to access patterns
Least Recently Used (LRU)
Most Recently Used (MRU)
Least Frequently Used (LFU) -- counts how often items are needed
Random Replacement - random eviction when space is necessary

Data Partitioning

Partitioning Methods

Horizontal partitioning

different rows on different servers
also called sharding
also called range based partitioning
con -- if the range isnt chosen carefully, could lead to some servers recieving higher load

Vertical Partitioning

divide data to store tables related to a specific feature in their own server

Directory Based Partitioning

???

Partitioning Criteria

key / hash based partitioning

List Partitioning

round robin paritioning

composite partitioning

Indexes

improve read performance but decrease writes
its an additional data structure that takes a column and maps its value to its location in a table
duplicates are allowed on secondary indexes

Proxies

Redundancy and Replication

SQL v NoSQL

CAP Theorem

Consistent Hashing

Long-Polling v WebSockets v Server-Sent Events

Grokking the Advanced System Design Interview

Bloom Filter

tells whether an element may be in a set or definitely is not in a set
the only possible errors are false positives
bloom filters are created for SSTables
in bigtable and cassandra, any read operation has to read from all sstables. if these tables are not in memory, you may end up needing many disk accesse
they help reduce the number of disk accesses by predicting if an SSTable may contain data for a given row or column pair
in some cases, a small amount of server memory used for storing bloom filters drastically reduces the number of disk seeks, thereby improving read performance

Consistent Hashing

( a solution for partitioning ) 2 Challenges when we distribute and partition data--

How do we know on which node a particular piece of data will be stored?
When we add or remove nodes, how do we know what data will be moved where?
How can we minimize data movement when nodes join or leave?

Consistent Hashing maps data to physical nodes and ensures that only a small set of keys move when servers are addor removed

the node is like a ring, where each node gets a range of keys,
when a node is added or removed from the ring, only the next node is affected
virtual nodes help make sure the nodes stay balanced

![https://www.educative.io/courses/grokking-adv-system-design-intvw/3Yw5qVDnq9R]

Quorum

( a solution for replication ) How do we make sure that all replicas are consistent?

a quorum is the minimum number of servers on which a distributed operation needs to be performed succesfully before declaring the operations overall success
aka the minimum number of servers that need to respond and agree
Quorum is achieved when nodes follow the below protocol:
- R + W > N
- N = nodes in the quorum group
- W = minimum write nodes
- R = minimum read nodes
... skipped some details for now

Leader and Follower

Allow a single server to be responsibel fro data replication and coordinating work

users only write to the leader, and the leader will then write to the followers
followers can serve read requests for load balancing
failover -- the process of assigning a new leader in case the leader fails
in kafka - each partition has a designated leader

Write ahead Log

Machines can fail or restart at any time. How can the program know what it was doing before a system crash?

WAL: each modification to the system is first written an append only log on the disk. If the machine crashes the system will be able to recover and reapply the operation if necessary.
results in a reduced number of disk writes
each node would maintain its own WAL

Segmented Log

A single log can become difficult to manage, and can become a performance bottleneck.

a segmented log is a WAL that has been broken down into smaller segments.

High Water Mark

When the leader fails and a new leader is to be selected, there may be some data from the old leader that was not yet replicated to the new leader. How can the system know which writes have been succesfully replicated onto the followers?

A high Water Mark is the last log entry on the leader that has been succesfully replicated to a quorum of followers.
the client can read data only up to the high water mark index.

Heartbeat

each server periodically sends a heartbeat message to a central monitoring server or other servers in the system to show it is still alive and functioning

Gossip Protocol

In the absence of a central monitoring server, how do we efficiently communicate to other servers in the cluster that other servers are up and running?

Gossip Protocol: each node keeps track of a specified number of other nodes in the cluster.
they gossip (share) this information to one other random node every second
eventually, each node will know the state of every other node in the cluster
used by dynamo and cassandra

Hinted Handoff

When a node that was previously down comes back online again, how should we write data to it?

for nodes that are down, the system keeps notes (hints) of all the write requests they have missed.
one the failing nodes recover, the write requests are forwarded to them based on the stored hints
the node coordinating the operation to the failed node will write a hint in text file on its local disk
- when the coordinating node discovers that the failed node has recovered, it forwards the write requests to the target
used by cassandra and dynamo

Read Repair

How do we repair stale data on nodes that have recovered from a failure?

read repair: we repair stale data during the read operation
- at that time we can read data from multiple nodes, perform a comparison, and update stale data.
used by cassandra and dynamo