ClickHouse is a high-performance column-oriented OLAP database purpose-built for real-time analytical queries on large datasets. For eCommerce, it enables sub-second queries on billions of rows of clickstream, order, and behavioural data — powering real-time dashboards, personalisation engines, and A/B testing analytics at a fraction of the cost of alternatives like BigQuery or Redshift.
What Is ClickHouse and Why Is It Fast?
ClickHouse is an open-source columnar OLAP (Online Analytical Processing) database management system originally built by Yandex for their web analytics platform and open-sourced in 2016. Unlike row-oriented databases (MySQL, PostgreSQL) that store all columns of a row together, ClickHouse stores each column's data separately on disk — enabling queries that only read the columns they need, with aggressive compression that can reduce storage by 10–100× compared to row stores.
eCommerce Use Cases for ClickHouse
ClickHouse Schema Design for eCommerce
ClickHouse table design differs significantly from relational database design. The primary considerations for eCommerce analytics:
-- Events table: clickstream and order events (ReplacingMergeTree for deduplication)
CREATE TABLE ecommerce_events (
event_date Date,
event_time DateTime,
session_id String,
user_id UInt64,
event_type LowCardinality(String), -- 'page_view', 'add_to_cart', 'purchase'
product_id UInt32,
category LowCardinality(String),
revenue Decimal(12, 2),
country LowCardinality(String),
device LowCardinality(String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date) -- partition by month for efficient time-range queries
ORDER BY (event_date, user_id, event_time) -- primary key for data locality
SETTINGS index_granularity = 8192;
-- Materialised view: pre-aggregate daily product metrics
CREATE MATERIALIZED VIEW daily_product_metrics
ENGINE = SummingMergeTree()
PARTITION BY event_date
ORDER BY (event_date, product_id, category)
AS SELECT
event_date,
product_id,
category,
countIf(event_type = 'page_view') AS views,
countIf(event_type = 'add_to_cart') AS add_to_carts,
countIf(event_type = 'purchase') AS purchases,
sumIf(revenue, event_type = 'purchase') AS total_revenue
FROM ecommerce_events
GROUP BY event_date, product_id, category;
Data Ingestion Architecture
| Ingestion Method | Use Case | Throughput | Notes |
|---|---|---|---|
| HTTP API (native format) | Application events, direct insert | High | Batch inserts (1000+ rows) for efficiency |
| Kafka integration (ClickHouse Kafka engine) | Event stream from application | Very high | Native Kafka consumer built into ClickHouse |
| S3 / object storage import | Batch historical data backfill | Very high | s3() table function reads directly from S3 |
| ClickPipes (ClickHouse Cloud) | Managed Kafka/Kinesis ingestion | High | Fully managed, no infrastructure required |
| dbt + ClickHouse adapter | Transformed data from data warehouse | Medium | Use for reporting layers, not raw event ingestion |
ClickHouse Cloud vs Self-Hosted
- Fully managed — no cluster administration
- Auto-scaling compute and storage separately
- ClickPipes for managed data ingestion
- Multi-region availability
- Usage-based pricing — pay per query and storage
- Best for: most eCommerce teams without dedicated data infra engineers
- Full control over configuration and hardware
- Predictable costs at high scale
- Data sovereignty / regulatory compliance
- ClickHouse Keeper for coordination (replaces ZooKeeper)
- Requires cluster management expertise
- Best for: large-scale deployments (>100TB), strict data residency requirements
Key eCommerce Query Patterns
-- Funnel analysis: view → cart → purchase conversion by category
SELECT
category,
uniqIf(user_id, event_type = 'page_view') AS viewers,
uniqIf(user_id, event_type = 'add_to_cart') AS added_to_cart,
uniqIf(user_id, event_type = 'purchase') AS purchasers,
round(purchasers / viewers * 100, 2) AS conversion_rate
FROM ecommerce_events
WHERE event_date >= today() - 7
GROUP BY category
ORDER BY conversion_rate DESC;
-- Cohort retention: % of users who purchased in week 1 who returned in week N
SELECT
cohort_week,
week_number,
round(returning_users / cohort_size * 100, 1) AS retention_rate
FROM (
SELECT
toStartOfWeek(min_purchase_date) AS cohort_week,
toUInt32(dateDiff('week', min_purchase_date, purchase_date)) AS week_number,
uniq(user_id) AS returning_users,
any(cohort_size) AS cohort_size
FROM (
SELECT
user_id,
event_time AS purchase_date,
min(event_time) OVER (PARTITION BY user_id) AS min_purchase_date,
uniq(user_id) OVER (PARTITION BY toStartOfWeek(min(event_time) OVER (PARTITION BY user_id))) AS cohort_size
FROM ecommerce_events
WHERE event_type = 'purchase'
)
GROUP BY cohort_week, week_number
)
ORDER BY cohort_week, week_number;