AI fraud detection: real-time transaction scoring guide

Q: What does SCALE D2C offer for FinTech and Embedded Finance?

End-to-end FinTech and Embedded Finance strategy, implementation, and optimisation. Contact us for a free consultation.

Q: How long does a FinTech and Embedded Finance engagement take?

Strategy: 4–8 weeks. Full implementation: 3–12 months.

Q: Does SCALE D2C work with all business sizes?

Yes, from D2C brands to enterprise. View our pricing.

Real-time AI fraud detection scores every payment transaction in under 100ms using machine learning models that consider hundreds of features simultaneously. It has reduced card fraud losses by 30–50% at financial institutions that have deployed it. The current state of the art combines gradient boosting models, graph neural networks for network fraud patterns, and streaming ML inference infrastructure. This guide covers the fraud detection ML stack, the feature engineering that matters most, and the production architecture for sub-100ms transaction scoring.

The Fraud Detection ML Stack

Layered Fraud Detection Architecture

Production fraud detection uses multiple ML layers operating at different latencies: (1) Real-time scoring (10–100ms) — gradient boosting or neural network model scoring each transaction as it arrives, using pre-computed features; (2) Near-real-time (1–30s) — streaming aggregations over a 1–60 minute window (velocity checks, merchant-level patterns); (3) Batch enrichment (hourly/daily) — graph analysis, account-level risk scoring, device fingerprint updates. The real-time layer makes the accept/decline decision; the near-real-time and batch layers feed features into the next real-time score.
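As a rough illustration of how the slower layers feed the real-time score, the sketch below merges per-transaction fields with whatever the streaming and batch layers have already published. The feature names, the default values, and the `assemble_features` helper are all hypothetical, chosen only to show the pattern.

```python
# Hypothetical defaults, used when a slower layer has not produced a value yet
# (for example, a brand-new card with no history).
DEFAULTS = {
    "txn_count_1h": 0,
    "merchant_fraud_rate_30d": 0.0,
    "account_risk_score": 0.5,
}


def assemble_features(txn, streaming, batch):
    """Build the feature dict the real-time model scores.

    txn:       fields of the incoming transaction (real-time layer)
    streaming: aggregates from the near-real-time layer (may be empty)
    batch:     enrichment from the hourly/daily layer (may be empty)
    """
    features = dict(DEFAULTS)
    features.update(batch)       # hourly/daily enrichment
    features.update(streaming)   # fresher aggregates win if a key appears in both
    features["amount"] = txn["amount"]
    return features
```

The real-time path only reads values that already exist, so a slow or missing upstream layer degrades the feature quality rather than the response time.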

The Features That Matter Most

Feature Category	Examples	Fraud Signal
Velocity features	Transactions in last 1h/4h/24h; spend in last hour	Fraudsters use compromised cards quickly — velocity spikes
Merchant risk	Merchant fraud rate (30/60/90 day); merchant category	High-risk merchants (crypto, gift cards) have higher fraud rates
Geographic anomaly	Distance from home location; impossible travel velocity	Card used in two countries 2 hours apart = fraud signal
Device/channel	New device flag; device fingerprint age; channel mismatch	First use of device for high-value transaction
Behavioural baseline	Transaction amount vs account average; time-of-day vs history	3am $2,000 transaction for account that never spends after midnight
Network features	Shared IP, device, or email with known fraud accounts	Fraud rings use the same infrastructure across many accounts
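The geographic anomaly row can be made concrete with a small sketch that computes the implied travel speed between two consecutive card uses. The 900 km/h cutoff is an assumption (roughly the cruise speed of a commercial jet), and the function names are illustrative rather than taken from any library.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumption: about the cruise speed of a commercial jet


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_speed_kmh(prev, curr):
    """prev and curr are (lat, lon, unix_seconds) for consecutive transactions."""
    dist = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = max((curr[2] - prev[2]) / 3600.0, 1e-6)  # guard against division by zero
    return dist / hours


def is_impossible_travel(prev, curr):
    return travel_speed_kmh(prev, curr) > MAX_PLAUSIBLE_KMH
```

In practice the raw speed is usually passed to the model as a feature as well, so the model can learn its own cutoff instead of relying on a fixed one.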

50ms

Target real-time fraud scoring latency — pre-computed feature vectors stored in Redis, model inference via ONNX Runtime, response within the payment network timeout window of 100ms

0.1%

Target false positive rate — every false decline is a frustrated customer and lost revenue. Production fraud models are tuned to minimize false positives at a given fraud detection rate, not to maximize detection alone
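One way to tune for a false positive target is to pick the decline threshold from the score distribution of known-legitimate transactions, then measure the detection rate that threshold yields. The sketch below assumes a validation set of model scores already split by label; both helper names are hypothetical.

```python
def threshold_for_fpr(legit_scores, target_fpr):
    """Return a threshold such that declining scores strictly above it
    declines at most target_fpr of the legitimate transactions.

    legit_scores must be non-empty.
    """
    ranked = sorted(legit_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))   # legit transactions we may decline
    allowed = min(allowed, len(ranked) - 1)   # keep the index in range
    return ranked[allowed]


def detection_rate(fraud_scores, threshold):
    """Fraction of fraudulent transactions scored strictly above the threshold."""
    caught = sum(1 for s in fraud_scores if s > threshold)
    return caught / len(fraud_scores)
```

At a 0.1% target, a validation set needs well over a thousand legitimate transactions before `allowed` is anything other than zero, so the threshold should be estimated on a large sample.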

Velocity

Velocity features (transaction count and amount over sliding time windows) are consistently the highest-importance features in production fraud models — fraudsters act fast on compromised credentials, creating detectable velocity spikes
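A sliding-window velocity feature can be sketched in memory as follows. This is a single-process illustration that assumes timestamps arrive in non-decreasing order; the `VelocityWindow` class is hypothetical, and a production system would keep this state in a shared store.

```python
from collections import deque


class VelocityWindow:
    """Transaction count and total spend over the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def add(self, ts, amount):
        self.events.append((ts, amount))
        self.total += amount
        self._evict(ts)

    def _evict(self, now):
        # Drop events that fell out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_amount = self.events.popleft()
            self.total -= old_amount

    def snapshot(self, now):
        """Return (count, total_amount) for the window ending at `now`."""
        self._evict(now)
        return len(self.events), self.total
```

One instance per card and per window length (1h, 4h, 24h) would give the count and spend features listed in the table above.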

Infrastructure

Real-Time Feature Store

Pre-compute velocity features and store them in Redis with a TTL. On each incoming transaction, update the card-level and account-level counters for each velocity window (1h, 4h, 24h) using INCR and EXPIRE, wrapped in a MULTI/EXEC transaction or a Lua script so that the two commands apply atomically. A single counter with a TTL gives a fixed window; small time-bucketed counters summed at read time approximate a sliding one. At scoring time, retrieve the pre-computed features in <5ms. Technology stack: Kafka for transaction streaming → Flink or Spark Streaming for feature computation → Redis for feature store → REST API for model serving. Alternatively, use a purpose-built feature store (Feast, Tecton, Amazon SageMaker Feature Store) for managed feature computation and serving.
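A minimal sketch of the Redis counter pattern is shown below. It assumes `client` is a redis-py `redis.Redis` instance and uses one-minute buckets, so window edges are accurate only to the nearest minute. The `vel:{card_id}:{bucket}` key layout and both function names are assumptions made for illustration.

```python
import time

BUCKET_SECONDS = 60  # assumption: one counter per card per minute


def record_transaction(client, card_id, now=None, retention_seconds=24 * 3600):
    """Increment the current bucket and set its TTL in one MULTI/EXEC block."""
    now = int(time.time() if now is None else now)
    key = f"vel:{card_id}:{now // BUCKET_SECONDS}"
    pipe = client.pipeline(transaction=True)
    pipe.incr(key)
    # Keep each bucket slightly longer than the largest window that reads it.
    pipe.expire(key, retention_seconds + BUCKET_SECONDS)
    pipe.execute()


def count_in_window(client, card_id, window_seconds, now=None):
    """Sum the buckets covering the last window_seconds with a single MGET."""
    now = int(time.time() if now is None else now)
    newest = now // BUCKET_SECONDS
    n_buckets = window_seconds // BUCKET_SECONDS
    keys = [f"vel:{card_id}:{b}" for b in range(newest - n_buckets + 1, newest + 1)]
    values = client.mget(keys)
    return sum(int(v) for v in values if v is not None)
```

A 24h window at one-minute granularity reads 1,440 keys per lookup, so coarser buckets for the longer windows are a reasonable trade-off if read latency matters more than edge precision.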

Redis velocity counters · Kafka + Flink streaming · Feast feature store

Model

XGBoost + ONNX for Sub-50ms Scoring

Train XGBoost with a positive-class weight (the scale_pos_weight parameter) to handle extreme class imbalance (0.1% fraud rate). To export to ONNX, save the trained model with model.save_model("fraud_model.json"), then convert it with onnxmltools. Serve via ONNX Runtime in a FastAPI container; ONNX Runtime can reach 1–5ms inference for a 500-tree XGBoost model. Total pipeline latency: 5ms feature retrieval + 3ms inference + 2ms overhead = 10ms, well inside the 50ms target. Deploy on Kubernetes with HPA for traffic spikes. Retrain weekly on a rolling 90-day window to adapt to concept drift. Our ML team builds production fraud scoring systems.
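The two numbers in this section can be checked with a short calculation. The sketch below derives a starting value for XGBoost's `scale_pos_weight` as the ratio of legitimate to fraudulent training rows, which is the heuristic the XGBoost documentation suggests, and adds up the latency budget. The helper names and the example row counts are assumptions.

```python
def scale_pos_weight(n_legit, n_fraud):
    """Starting value for XGBoost's scale_pos_weight: negatives / positives."""
    return n_legit / n_fraud


def latency_budget_ms(stages, limit_ms):
    """Return (total_ms, headroom_ms) for per-stage latencies in milliseconds."""
    total = sum(stages.values())
    return total, limit_ms - total


# Example: 1,000,000 training rows at a 0.1% fraud rate (assumed counts).
weight = scale_pos_weight(n_legit=999_000, n_fraud=1_000)

# Stage latencies from the text, checked against the 50ms target.
total, headroom = latency_budget_ms(
    {"feature_retrieval": 5, "inference": 3, "overhead": 2},
    limit_ms=50,
)
```

The weight is a starting point rather than a final setting. It is usually tuned alongside the decision threshold, since a heavier positive weight tends to raise recall at the cost of more false declines.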

XGBoost + ONNX Runtime · FastAPI serving · Weekly retraining

Real-Time Fraud Detection ML

Our ML development, data analytics, and DevOps teams build production real-time fraud scoring systems for financial services and fintech companies. Book a free advisory session.

SCALE D2C Editorial Team

FinTech and Embedded Finance Research · March 2026

