Choosing the right database is one of the most critical decisions you’ll make in a system design interview.
The database you pick often dictates how well your system performs under load, how easily it scales, and how gracefully it handles complexity in real-world scenarios.
That’s why a strong understanding of different database types and when to use each is a key part of acing system design interviews.
In this article, we’ll walk through the 10 must-know database types for system design interviews. For each one, we’ll explain:
What it is
When to use it (with examples)
Key design considerations
Popular databases you can mention in interviews
1. Relational
A Relational Database stores data in structured tables with rows and columns. It’s like an Excel sheet, but much more powerful.
Each table represents an entity (like Users, Orders, or Products), and relationships between tables are defined using foreign keys. It uses SQL (Structured Query Language) to query and manipulate data.
When to use it
Your data is structured and relational
Relational databases are ideal when your data consists of clearly defined entities with strong relationships.
Example - In an e-commerce platform:
A User places an Order
An Order can contain multiple Products
Products have associated Reviews and belong to a Category
A relational schema with foreign keys makes it easy to model and enforce these relationships, and SQL allows efficient querying via joins.
You need strong consistency
Relational databases provide full ACID compliance, ensuring reliable transactions.
Example - In a digital payments system, transferring money between accounts requires atomic updates. If one step fails, the entire transaction rolls back, preserving data integrity.
This level of data integrity and transactional safety is what relational databases excel at.
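As a minimal sketch of that all-or-nothing behavior, using Python's built-in sqlite3 module and a hypothetical accounts table (any ACID-compliant database behaves the same way):

import sqlite3

conn = sqlite3.connect("bank.db")
try:
    with conn:  # opens a transaction; commits on success, rolls back on any error
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = ?", ("A",))
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = ?", ("B",))
except sqlite3.Error:
    print("Transfer failed; neither account was modified.")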
You require complex queries and reporting
Relational databases provide SQL, a powerful and expressive query language. SQL supports filtering, aggregation, grouping, and multi-table joins.
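For instance, a single SQL statement can join several tables and aggregate the results. A sketch against a hypothetical e-commerce schema (orders, order_items, products, categories), again using sqlite3:

import sqlite3

conn = sqlite3.connect("shop.db")

# Top 5 categories by revenue over the last 30 days
query = """
    SELECT c.name, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM order_items oi
    JOIN orders     o ON o.id = oi.order_id
    JOIN products   p ON p.id = oi.product_id
    JOIN categories c ON c.id = p.category_id
    WHERE o.created_at >= DATE('now', '-30 days')
    GROUP BY c.name
    ORDER BY revenue DESC
    LIMIT 5
"""
for name, revenue in conn.execute(query):
    print(name, revenue)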
Design Considerations
Indexing
Indexes speed up read-heavy queries by allowing the database to quickly locate rows.
Create indexes on frequently queried columns (e.g., user_id, email). Use composite indexes for multi-column filters. Avoid over-indexing in write-heavy systems, as it can slow down inserts and updates.
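A short sketch of both kinds of index, assuming hypothetical users and orders tables:

import sqlite3

conn = sqlite3.connect("shop.db")

# Single-column index for frequent lookups by email
conn.execute("CREATE INDEX IF NOT EXISTS idx_users_email ON users(email)")

# Composite index for queries that filter on user_id and created_at together
conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_user_date ON orders(user_id, created_at)")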
Normalization vs Denormalization
Normalize to reduce redundancy and ensure consistency. Denormalize in read-heavy systems to reduce join overhead.
Joins
Joins are powerful for analytics and reporting. However, avoid excessive joins on large tables as they can become performance bottlenecks, and avoid designs that require cross-shard joins unless absolutely necessary.
Sharding
Sharding enables horizontal scaling but introduces complexity.
Choose high-cardinality shard keys (e.g., user_id) to distribute load evenly. Be mindful that cross-shard queries and transactions are difficult to implement.
Scaling
Vertical scaling (adding more CPU/RAM to a single machine): Easy but limited.
Horizontal scaling: Add read replicas, partition large tables, and use caching (e.g., Redis) for frequently accessed data.
Example databases
PostgreSQL – Open-source, feature-rich, ACID-compliant
MySQL – Widely used, especially in LAMP stack applications
Oracle DB – Enterprise-grade RDBMS
2. In-Memory
An In-Memory Database stores data directly in RAM instead of disk. This makes it blazingly fast for read and write operations.
When to use it
You need ultra-low latency
In-memory databases are ideal for applications that demand near-instantaneous responses.
Example: A real-time leaderboard in a gaming app where scores are updated and ranked instantly.
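A sketch of how this might look with Redis sorted sets, using the redis-py client against a local server (key and player names are made up):

import redis

r = redis.Redis()

# A sorted set keeps members ordered by score, so the ranking is maintained on every write
r.zadd("leaderboard", {"alice": 4200, "bob": 3100})
r.zincrby("leaderboard", 150, "bob")  # bob scores 150 more points

# Fetch the top 3 players, highest score first
print(r.zrevrange("leaderboard", 0, 2, withscores=True))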
The data is temporary or can be regenerated
In-memory databases are great for storing data that can be recomputed if lost.
Example: Caching trending search results to reduce repeated computation and speed up queries.
You want to reduce load on your main database
In-memory stores can act as a high-speed caching layer to offload frequent reads from your primary database.
Example: Caching user profile data in a social media platform to avoid repeated lookups in the main DB.
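A common way to implement this is the cache-aside pattern: check the cache first and fall back to the primary database on a miss. A minimal sketch with redis-py; fetch_from_primary_db is a hypothetical stand-in for the real database call:

import json
import redis

r = redis.Redis()

def fetch_from_primary_db(user_id: str) -> dict:
    # Placeholder for a query against the primary database
    return {"user_id": user_id, "name": "Alice"}

def get_user_profile(user_id: str) -> dict:
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)             # cache hit
    profile = fetch_from_primary_db(user_id)  # cache miss: hit the primary DB
    r.setex(f"user:{user_id}", 300, json.dumps(profile))  # cache for 5 minutes
    return profile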
Design Considerations
Volatility
Since RAM is volatile, data is lost on crash or restart unless persistence is enabled.
Tools like Redis offer optional persistence via:
RDB (snapshotting): Saves data at intervals
AOF (Append Only File): Logs each write operation
Eviction Policies
RAM is fast, but limited. When memory runs out, older or less-used data is evicted. Common eviction policies include LRU, LFU, and TTL-based expiry.
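In Redis, for instance, the memory cap and eviction policy are configuration settings, while per-key TTLs are set at write time. A sketch (the values are illustrative):

import redis

r = redis.Redis()

# Cap memory and evict the least recently used keys once the cap is reached
r.config_set("maxmemory", "512mb")
r.config_set("maxmemory-policy", "allkeys-lru")

# Or let individual keys expire on their own: this one lives for 60 seconds
r.setex("trending:searches", 60, "shoes,laptops,headphones")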
Keep It Lean
Avoid storing large files or infrequently accessed data. Store only hot and frequently accessed data such as user sessions and recent activity.
Replication
Replication can improve read performance and provide failover support in case the primary instance goes down. Redis supports replica nodes and automatic failover using Sentinel or Cluster mode.
However, replication is typically asynchronous, so there's a risk of data loss if the primary fails before sync. Always persist critical data in a durable system like a relational database.
Example databases
Redis – Lightning fast, supports data structures like lists, sets, sorted sets, and pub-sub
Memcached – Simple key-value store for caching
Apache Ignite – Distributed in-memory store with SQL support
Hazelcast – Often used in Java-based enterprise applications
3. Key-Value
A Key-Value Database is the simplest type of database. It stores data as a collection of key-value pairs, where each key is unique and maps directly to a value. Think of it like a giant, distributed HashMap.
There are no tables, schemas, or relationships—just keys and values. This makes key-value stores extremely fast and highly scalable.
When to use it
You need fast lookups by unique key
Key-value stores offer constant-time (O(1)) reads and writes, making them ideal for quick access using a unique identifier.
Examples:
Storing user sessions as userId → sessionInfo in a web application.
Storing shortURL → fullURL mappings in a URL shortener service (sketched below).
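A sketch of the URL-shortener case using DynamoDB via boto3, assuming AWS credentials are configured and a hypothetical short_urls table keyed on short_code:

import boto3

table = boto3.resource("dynamodb").Table("short_urls")

# One unique key in, one value out -- no query planning, no joins
table.put_item(Item={"short_code": "abc123", "full_url": "https://example.com/some/long/path"})
item = table.get_item(Key={"short_code": "abc123"}).get("Item")
print(item["full_url"])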
You don’t need complex queries or relationships
If your data doesn't require joins, filtering, or relational constraints, a key-value store is a simple and scalable choice.
You’re dealing with high-volume, low-latency workloads
Key-value stores are designed for massive throughput and are often used in systems that demand millions of reads/writes per second with minimal latency.
Design Considerations
Lookup-only access
You can only retrieve values by key; filtering, sorting, and joins are generally unavailable, and secondary indexes are rarely supported.
No enforced schema
Key-value databases are schema-less. Values can be strings, JSON, or binary blobs.
This gives you flexibility but also puts the burden on your application to handle serialization/deserialization and versioning of the data model.
Easy horizontal scaling
Key-based partitioning enables seamless distribution across nodes using consistent hashing or range-based partitioning.
To distribute keys evenly, choose high-cardinality keys (avoid country_code if most users are from one region).
Example databases
Redis – Also acts as a key-value store with rich data types
Amazon DynamoDB – Managed, horizontally scalable key-value store
Riak KV – Distributed, fault-tolerant key-value system
Aerospike – High performance for low-latency read/write at scale
4. Document
A Document Database stores data as documents, typically in JSON or BSON format. Each document is a self-contained unit with fields and values, making it flexible and schema-less.
Think of it as a database that stores records the way you'd naturally structure data in code.
When to use it
Data structures vary across records
Document databases are ideal when different records have different fields.
Example: In a Content Management System (CMS), content types like blog posts, product pages, and landing pages all have unique attributes. Document stores let each record define its own structure without enforcing a fixed schema.
You need to store nested or hierarchical data
Document databases make it easy to store deeply nested data, supporting complex, embedded objects without joins.
Example: A user profile with nested addresses and preferences can be stored as a single document:
{
"user_id": "u123",
"name": "Alice",
"addresses": [{ "type": "home", "city": "Boston" }],
"preferences": { "notifications": true }
}
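A sketch of storing and querying that document with pymongo, assuming a local MongoDB (the database and collection names are made up):

from pymongo import MongoClient

users = MongoClient().appdb.users

users.insert_one({
    "user_id": "u123",
    "name": "Alice",
    "addresses": [{"type": "home", "city": "Boston"}],
    "preferences": {"notifications": True},
})

# Dot notation reaches directly into nested fields -- no joins required
alice = users.find_one({"addresses.city": "Boston"})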
You want schema flexibility and fast iteration
Document databases let you evolve your schema without downtime or migrations.
Example: For a new application, features can be added by simply introducing new fields. No migrations required.
Design Considerations
Indexing
Indexing is crucial for performance, but indexing deeply nested fields may add overhead.
Index frequently queried fields (user_id, email, etc.)
Use compound indexes for multi-field queries
Document Size Limits
Most systems (e.g., MongoDB) have limits (like 16MB per document). Large documents may need to be split or restructured.
Denormalization
Related data is often embedded to avoid joins. This improves read performance but can increase write complexity and the risk of duplication.
Sharding
Most document databases support horizontal scaling via sharding, but it requires careful design:
Use a high-cardinality shard key (e.g., user_id) to avoid skew and hot partitions
Ensure the shard key is included in your queries
Example databases
MongoDB – The most popular document store, widely used across startups and enterprises
Couchbase – Combines key-value and document storage with powerful querying
Firebase Firestore – Realtime document database, ideal for mobile/web apps
5. Graph
A Graph Database is designed to store and navigate relationships. It represents data as nodes (entities) and edges (relationships between entities). This structure makes it ideal for scenarios where connections are as important as the data itself.
When to use it
Relationships are central to your data
Graph databases excel when the focus is on how entities are connected.
Example - In a social network, you might need to find "friends of friends".
You need traversals or recommendations
Graph databases are optimized for graph traversals, making them ideal for discovering indirect relationships and patterns.
Example:
Movies watched by people similar to you
Customers who bought Product A also bought Product B
You need to run complex relationship queries efficiently
Graph databases are optimized for querying densely connected data like in a social network.
Example - Find all users that are two degrees away from a user named "Alice" (i.e., friends of Alice’s friends).
MATCH (a:User {name: "Alice"})-[:FOLLOWS]->(:User)-[:FOLLOWS]->(fof:User)
RETURN DISTINCT fof.name
Design Considerations
Traversal Efficiency
Graph databases handle multiple levels of relationships far more efficiently than relational joins, and performance degrades much more slowly as traversal depth grows.
Indexing
While traversals are optimized, indexing is still essential for quickly locating the starting node(s) of a query. Index common node and relationship properties like user_id, email, or timestamp.
Schema Flexibility
While schema is optional, consistent labeling (e.g., User, FOLLOWS) helps maintain query clarity and performance.
Scalability
Not all graph DBs scale horizontally. For large datasets and high throughput, choose a distributed graph database such as Dgraph, TigerGraph, or Neo4j Enterprise.
Query Language
Graph databases use domain-specific languages to query and manipulate graph data.
Cypher (used by Neo4j): SQL-like syntax
MATCH (u:User)-[:FOLLOWS]->(f:User) RETURN f.name
Gremlin (Apache TinkerPop): A functional, traversal-based query language
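The same friends-of-friends query as the Cypher example above, sketched with the gremlinpython client (assuming a TinkerPop-compatible server listening locally):

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().withRemote(DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

# Start at Alice, hop two FOLLOWS edges, and collect distinct names
names = (g.V().has("User", "name", "Alice")
          .out("FOLLOWS").out("FOLLOWS")
          .dedup().values("name").toList())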
Example databases
Neo4j – The most popular graph database, known for its Cypher query language
Amazon Neptune – Fully managed graph database supporting both RDF and property graphs
ArangoDB – Multi-model (including graph), good for flexible use cases
6. Wide-Column
A Wide-Column Database stores data in tables, rows, and columns, but unlike traditional relational databases, each row can have a different set of columns. It’s optimized for large-scale, write-heavy workloads and high-speed data retrieval across massive datasets.
Under the hood, data is stored by column families (groupings of related columns), which allows for fast access to specific columns without scanning entire rows.
When to use it
You need high write throughput at scale
Wide-column databases are optimized for write-heavy, append-only workloads. They can ingest millions of rows per second with minimal latency.
Example: A time-series logging service that collects logs from thousands of services every second.
Your data grows continuously and at massive scale
Wide-column databases are ideal for storing petabytes of time-partitioned or user-specific data that grows rapidly. Each row can store a different set of columns, allowing for highly flexible and sparse storage.
You want fast lookups and flexible row-level schemas
Wide-column stores offer low-latency access to specific rows and columns, making them perfect for read patterns where you know exactly what you want to retrieve.
Example: A message inbox where each user_id is a row, and each message thread is stored as a column (msg_1, msg_2, etc.). Different users can have different numbers of messages without affecting storage or performance.
Design Considerations
Schema design
In wide-column databases, performance depends heavily on how you design:
Row keys – Identify rows uniquely (e.g., user_id, device_id)
Partition keys – Determine how data is distributed across nodes
Clustering columns – Define how data is sorted within a partition
Poor design can lead to hot partitions, skewed distribution, and inefficient reads.
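A sketch of a message-inbox table in Cassandra's CQL, issued through the cassandra-driver Python client (the cluster address and keyspace are hypothetical). user_id is the partition key, so one user's messages live together on a node; sent_at is the clustering column, so they are stored newest-first:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("inbox_keyspace")

session.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        user_id text,
        sent_at timestamp,
        body    text,
        PRIMARY KEY ((user_id), sent_at)
    ) WITH CLUSTERING ORDER BY (sent_at DESC)
""")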
Denormalization
Unlike relational databases, wide-column stores do not support joins efficiently. Instead, you duplicate related data across multiple tables or rows to speed up reads.
Indexing
Wide-column databases support primary indexes through:
Partition keys (for fast lookups)
Clustering keys (for sorting and range queries within a partition)
Secondary indexes do exist, but they are often expensive in terms of performance.
Sharding and Replication
These databases are designed to automatically handle horizontal sharding and replication. But bad partition keys (e.g., low-cardinality values) can cause hotspots.
Tunable consistency
Wide-column databases often support tunable consistency, letting you balance between availability, performance, and data correctness.
Strong consistency: Guarantees up-to-date data, but can add latency.
Eventual consistency: Faster, more scalable, but may return stale data.
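In Cassandra, for example, the consistency level can be chosen per statement. A sketch with the cassandra-driver client, reusing the hypothetical messages table from above:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("inbox_keyspace")

# QUORUM waits for a majority of replicas: stronger reads, slightly higher latency
stmt = SimpleStatement(
    "SELECT body FROM messages WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
rows = session.execute(stmt, ("u123",))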
Example databases
Apache Cassandra – Highly scalable and fault-tolerant; used by Netflix, Instagram
ScyllaDB – A high-performance drop-in replacement for Cassandra
Google Bigtable – The original wide-column database, used by Google Search and Maps
HBase – Hadoop-based, good for big data pipelines
7. Time-Series
A Time-Series Database (TSDB) is purpose-built to store, retrieve, and analyze data points that are time-stamped.
Every record is tied to a specific moment, making it perfect for tracking changes over time.
When to use it
Data is generated in chronological order
Time-series databases excel when your data is produced in a steady, timestamped flow, often from machines, users, or external systems.
Example: Monitoring tools like Prometheus track CPU, memory, and network usage every few seconds.
You need to perform rollups, aggregations, or downsampling
TSDBs efficiently summarize historical data into trends using rollups (e.g., averages, min/max) and downsampling (e.g., hourly or daily aggregates).
Example: A fitness app recording heart rate every minute can show real-time stats, hourly averages, and daily trends over time.
You need high write volume with time-bound queries
TSDBs are designed for massive write throughput and queries that are scoped to specific time windows, such as:
“What was the average temperature in the last 24 hours?”
“Show all alerts that occurred between 2–4 PM yesterday.”
Design Considerations
Write-optimized
TSDBs use append-only writes, time-based compression, and sorted storage to support high ingestion rates with minimal disk I/O.
Time-based retention
Data is auto-expired based on retention policies.
Example: Keep raw data for 30 days, and aggregated data for 1 year.
Time as the primary index
In most TSDBs, time is the primary index, which means queries scoped by time (e.g., “last 7 days”) are extremely efficient. Some TSDBs also support tag-based filtering (e.g., by region or device_id).
Downsampling
To manage storage and maintain query performance, it's common to roll up raw data into aggregated summaries over fixed intervals.
Example: Compute and store average CPU usage per minute, per hour, or per day.
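A sketch of that rollup as a query, assuming TimescaleDB (whose time_bucket() function does the interval grouping) and a hypothetical cpu_metrics table, run through psycopg2:

import psycopg2

conn = psycopg2.connect("dbname=metrics")
cur = conn.cursor()

# Average CPU usage per hour over the last day
cur.execute("""
    SELECT time_bucket('1 hour', ts) AS bucket, avg(cpu_usage)
    FROM cpu_metrics
    WHERE ts > now() - interval '1 day'
    GROUP BY bucket
    ORDER BY bucket
""")
hourly_averages = cur.fetchall()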
Sharding
To support horizontal scalability, TSDBs often shard data by:
Time ranges (e.g., day/week/month buckets)
Series IDs (unique combinations of metric name + tag set)
Querying across many shards may still be costly if not time-bounded.
Example databases
InfluxDB – Popular open-source TSDB with a SQL-like query language
TimescaleDB – Built on PostgreSQL; combines relational power with time-series efficiency
Prometheus – Widely used in monitoring and observability (e.g., Kubernetes)
Amazon Timestream – Fully managed, serverless time-series database on AWS
8. Text-Search
A Text-Search Database is designed to efficiently store, index, and search through large volumes of textual data. It goes beyond simple substring matching by supporting full-text search, ranking, tokenization, stemming, fuzzy matching, and relevance scoring.
Instead of scanning documents line by line, it builds inverted indexes—a map of words to the documents they appear in, allowing lightning-fast text lookups.
When to use it
You need fast, flexible search over text
Full-text search databases are built to index and query large volumes of unstructured or semi-structured text quickly and accurately.
Example: In an e-commerce store, find products that match the phrase "running shoes", ranked by relevance.
You want ranked or fuzzy search
Text-search databases support intelligent matching, even when the query isn’t exact. They handle typos, variations, and synonyms for a smoother user experience.
Example: A blog platform where searching “recieve” still returns results for “receive”.
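A sketch of such a typo-tolerant query using the Elasticsearch 8.x Python client, against a hypothetical posts index:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# fuzziness="AUTO" tolerates small edit distances, so "recieve" still matches "receive"
results = es.search(
    index="posts",
    query={"match": {"body": {"query": "recieve", "fuzziness": "AUTO"}}},
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["body"])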
You need search plus structured filtering
Text-search engines support combining keyword relevance with filters like location, price, or categories.
Example: A real estate site where users search for "3-bedroom house near park" and filter by price range, location, and amenities.
Design Considerations
Inverted Indexing
Instead of storing data row-by-row like a typical database, text-search engines build an inverted index—a data structure that maps terms (words) to the documents they appear in.
Example: "shoe" → [doc_2, doc_4, doc_7]
This allows for lightning-fast lookups, even in large datasets.
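The core idea fits in a few lines of Python. A toy sketch (real engines add tokenization, stemming, and compression on top):

from collections import defaultdict

docs = {
    "doc_2": "red running shoe",
    "doc_4": "blue shoe laces",
    "doc_7": "leather shoe polish",
}

# Map each term to the set of documents that contain it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

print(sorted(index["shoe"]))  # ['doc_2', 'doc_4', 'doc_7']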
Tokenization & Stemming
Before indexing, search engines tokenize the text (split it into individual words) and often apply stemming or lemmatization to reduce words to their root form.
“Running”, “runs”, and “ran” → “run”
“Shoes” → “shoe”
Relevance Scoring
Text-search engines don’t just match documents; they rank them by relevance using scoring algorithms like TF-IDF or BM25.
Scalability
Many modern text-search engines (like Elasticsearch, OpenSearch, and Solr) support horizontal scaling, distributed indexing, sharding, and replication for high availability and performance.
Example databases
Elasticsearch – Most popular open-source full-text search engine, widely used across industries
Apache Solr – Enterprise-grade search platform based on Lucene
MeiliSearch – Lightweight, fast search engine for developer-friendly integrations
Typesense – Open-source, typo-tolerant search engine optimized for instant search
9. Spatial
A Spatial Database is designed to store and query geospatial data—information about locations, shapes, distances, and coordinates on Earth. It supports complex spatial operations like proximity search, intersection, bounding box queries, and geofencing.
When to use it
Your system uses location-based features
Spatial databases are ideal for apps that store and query real-world locations.
Example - A ride-hailing app like Uber:
Find drivers within a 3 km radius of the rider
Continuously update and query driver positions
You need to store and query shapes or regions
Spatial databases handle geometric operations like contains, intersects, and within.
Example - In logistics, define delivery zones as polygons and detect when vehicles enter or exit geofenced areas.
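A minimal sketch of the contains check using the shapely library (the zone and vehicle coordinates are made up):

from shapely.geometry import Point, Polygon

# A delivery zone defined as a polygon of (longitude, latitude) corners
zone = Polygon([(-71.06, 42.35), (-71.05, 42.35), (-71.05, 42.36), (-71.06, 42.36)])
vehicle = Point(-71.055, 42.355)

print(zone.contains(vehicle))  # True -> the vehicle is inside the geofence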
Design Considerations
Spatial Indexing
Spatial databases optimize spatial queries using specialized indexing structures like:
R-Trees – Ideal for range queries and bounding box searches
QuadTrees – Useful for dividing 2D space into quadrants, good for sparse data
Geohashing – Converts latitude/longitude into hash strings, enabling prefix-based lookups and distributed sharding
Accuracy vs Performance
For performance reasons, many spatial systems use approximations such as:
Bounding boxes instead of exact polygons
Radius-based filtering instead of full geodistance calculations
These are often good enough for user-facing features (e.g., "cafes near me") but may not be suitable for mission-critical use cases like aviation, navigation, or legal land boundaries.
Scalability
Spatial queries are compute-intensive. To scale:
Use indexes on geo fields
Cache frequently queried regions
Limit search scope (e.g., within a city)
Integrations
You may not need a dedicated spatial DB. Many databases offer built-in or extended spatial support:
PostgreSQL + PostGIS
MongoDB (2D and spherical indexes)
Elasticsearch (geo-distance and geo-shape queries)
Example databases
PostGIS – Spatial extension for PostgreSQL, widely used for GIS applications
MongoDB – Supports geospatial indexes and queries out of the box
Google BigQuery GIS – Supports large-scale spatial analysis on cloud
Elasticsearch – Offers location-based search with geo-distance and geo-shape filters
10. Blob Store
A Blob Store is a storage system optimized for handling large, unstructured binary files like images, videos, PDFs, backups, or logs.
Instead of storing content in rows and columns, it stores it as binary blobs and uses metadata (like filenames or upload timestamps) for retrieval.
When to use it
You need to store large media files
Blob stores are optimized for storing and serving binary files like images, videos, and audio at scale.
Example: A video platform (e.g., YouTube) stores media in blobs, while metadata (user ID, tags, timestamps) is stored in a relational or NoSQL database.
The data doesn’t fit a traditional database
Relational and NoSQL databases are designed for structured data, not for storing massive binary payloads. Blob storage is a better fit when you're dealing with documents or file-based assets that aren’t meant to be queried or filtered by content.
You want scalable, low-cost storage
Blob stores offer pay-as-you-go storage, automatic replication, and built-in durability, making them ideal for storing large volumes of cold or rarely accessed data.
Design Considerations
Metadata Management
Blob stores typically don’t support rich queries over metadata. Store metadata (e.g., uploader, upload time) in a database, linked by the blob’s key or filename.
Access Control
Blob stores typically do not have built-in user authentication or fine-grained permission systems. You'll need to handle access control explicitly to prevent unauthorized access.
Common strategies include:
Signed URLs for time-limited access (see the sketch below)
Token-based access tied to user permissions
IAM policies at the bucket or object level
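A sketch of the signed-URL approach with boto3 and S3, assuming AWS credentials are configured (the bucket and key names are made up):

import boto3

s3 = boto3.client("s3")

# Grant read access to one object for 15 minutes; the URL stops working after that
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-media-bucket", "Key": "videos/v123.mp4"},
    ExpiresIn=900,
)
print(url)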
Chunking large files
Uploading or downloading large files (e.g., videos, high-res images) can fail or time out if done in a single request. Blob storage platforms support chunked or multi-part uploads, which split files into smaller parts.
This improves reliability and performance, and supports resuming if a transfer fails midway.
Scalability and durability
Blob stores like Amazon S3 are designed to scale almost infinitely, handling billions of files and petabytes of data with:
Automatic replication across multiple availability zones
Up to 11 nines (99.999999999%) durability
CDN Integration
To reduce latency and improve performance for end users, especially in media-heavy applications, integrate your blob store with a Content Delivery Network (CDN) like CloudFront or Cloudflare.
A CDN caches content closer to users, reducing load on your origin blob store and speeding up delivery of large assets like images and videos.
Example databases / services
Amazon S3 – Industry standard for blob storage, with versioning, encryption, and lifecycle rules
Google Cloud Storage – Scalable and secure object storage on GCP
Azure Blob Storage – Microsoft’s equivalent for storing large unstructured data
MinIO – Open-source, S3-compatible object storage for on-prem or self-hosted environments
Thank you for reading!
If you found it valuable, hit a like ❤️ and consider subscribing for more such content every week.
P.S. If you’re enjoying this newsletter and want to get even more value, consider becoming a paid subscriber.
As a paid subscriber, you'll receive an exclusive deep dive every week, access to a structured system design resource, and other premium perks.
There are group discounts, gift options, and referral bonuses available.
Check out my YouTube channel for more in-depth content.
Follow me on LinkedIn and X to stay updated.
Check out my GitHub repositories for free interview preparation resources.
I hope you have a lovely day!
See you soon,
Ashish