FinTech and Embedded Finance | June 14, 2026 | 12 min read

AI fraud detection: real-time transaction scoring guide


Real-time AI fraud detection, which scores every payment transaction in under 100ms using machine learning models that consider hundreds of features simultaneously, has reduced card fraud losses by 30–50% at financial institutions that have deployed it. The combination of gradient boosting models, graph neural networks for network fraud patterns, and streaming ML inference infrastructure represents the current state of the art. This guide covers the fraud detection ML stack, the feature engineering that matters most, and the production architecture for sub-100ms transaction scoring.

The Fraud Detection ML Stack

Layered Fraud Detection Architecture
Production fraud detection uses multiple ML layers operating at different latencies: (1) Real-time scoring (10–100ms): a gradient boosting or neural network model scores each transaction as it arrives, using pre-computed features; (2) Near-real-time (1–30s): streaming aggregations over a 1–60 minute window (velocity checks, merchant-level patterns); (3) Batch enrichment (hourly/daily): graph analysis, account-level risk scoring, device fingerprint updates. The real-time layer makes the accept/decline decision; the near-real-time and batch layers feed features into the next real-time score.
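The way the slower layers feed the real-time decision can be sketched as follows. This is a minimal illustration, not the guide's implementation: `LayeredScorer`, `toy_model`, the feature names, and the 0.9 decline threshold are all hypothetical, and the two dictionaries stand in for whatever store the streaming and batch jobs actually write to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


@dataclass
class LayeredScorer:
    """Real-time layer: merges features written by the slower layers, then scores."""
    model: Callable[[Features], float]  # hypothetical: returns a fraud probability
    decline_threshold: float = 0.9      # illustrative value, not from the guide
    streaming_features: Dict[str, Features] = field(default_factory=dict)  # near-real-time output
    batch_features: Dict[str, Features] = field(default_factory=dict)      # batch output

    def score(self, card_id: str, txn: Features) -> Tuple[float, str]:
        features: Features = {}
        # Later updates win, so fresher layers override staler ones.
        features.update(self.batch_features.get(card_id, {}))
        features.update(self.streaming_features.get(card_id, {}))
        features.update(txn)
        probability = self.model(features)
        decision = "decline" if probability >= self.decline_threshold else "accept"
        return probability, decision


def toy_model(f: Features) -> float:
    """Stand-in for a trained model: hand-written rules, for illustration only."""
    risk = 0.1
    if f.get("txn_count_1h", 0) > 5:
        risk += 0.5
    if f.get("account_risk", 0) > 0.7:
        risk += 0.35
    return min(risk, 1.0)


scorer = LayeredScorer(
    model=toy_model,
    streaming_features={"card_1": {"txn_count_1h": 8}},
    batch_features={"card_1": {"account_risk": 0.8}},
)
probability, decision = scorer.score("card_1", {"amount": 2000.0})
```

The real-time path only reads features; it never waits for the streaming or batch jobs, which is what keeps the decision inside the latency budget.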

The Features That Matter Most

| Feature Category | Examples | Fraud Signal |
| --- | --- | --- |
| Velocity features | Transactions in last 1h/4h/24h; spend in last hour | Fraudsters use compromised cards quickly, producing velocity spikes |
| Merchant risk | Merchant fraud rate (30/60/90 day); merchant category | High-risk merchants (crypto, gift cards) have higher fraud rates |
| Geographic anomaly | Distance from home location; impossible travel velocity | Card used in two countries 2 hours apart is a fraud signal |
| Device/channel | New device flag; device fingerprint age; channel mismatch | First use of a device for a high-value transaction |
| Behavioural baseline | Transaction amount vs account average; time-of-day vs history | A 3am $2,000 transaction on an account that never spends after midnight |
| Network features | Shared IP, device, or email with known fraud accounts | Fraud rings use the same infrastructure across many accounts |
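Two of the feature categories above, geographic anomaly and behavioural baseline, reduce to short calculations. The sketch below is one possible formulation under stated assumptions: the function names are hypothetical, the 900 km/h speed limit is an assumed cutoff (roughly airliner cruise speed), and the z-score is just one simple way to compare an amount against account history.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # assumption: roughly airliner cruise speed


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev: Tuple[float, float], curr: Tuple[float, float],
                         hours_apart: float) -> bool:
    """True when the implied speed between two transactions exceeds the assumed limit."""
    if hours_apart <= 0:
        raise ValueError("hours_apart must be positive")
    speed_kmh = haversine_km(prev[0], prev[1], curr[0], curr[1]) / hours_apart
    return speed_kmh > MAX_PLAUSIBLE_SPEED_KMH


def amount_zscore(amount: float, history: Sequence[float]) -> float:
    """How many sample standard deviations the amount sits from the account mean."""
    if len(history) < 2:
        return 0.0  # not enough history to form a baseline
    sd = stdev(history)
    if sd == 0:
        return 0.0
    return (amount - mean(history)) / sd


london, new_york = (51.5074, -0.1278), (40.7128, -74.0060)
flag = is_impossible_travel(london, new_york, hours_apart=2.0)
z = amount_zscore(2000.0, [20.0, 25.0, 30.0, 22.0, 28.0])
```

In practice these values would be passed to the model as features rather than used as hard rules, so the model can weigh them against the other signals in the table.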
50ms: Target real-time fraud scoring latency. Pre-computed feature vectors are stored in Redis, model inference runs via ONNX Runtime, and the response returns within the payment network timeout window of 100ms.
0.1%: Target false positive rate. Every false decline is a frustrated customer and lost revenue. Production fraud models are tuned to minimize false positives at a given fraud detection rate, not to maximize detection alone.
Velocity: Velocity features (transaction count and amount over sliding time windows) are consistently the highest-importance features in production fraud models. Fraudsters act fast on compromised credentials, creating detectable velocity spikes.
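Tuning toward a false positive target usually comes down to picking a score threshold on held-out data. The sketch below shows one simple way to do it, assuming you have model scores for known-legitimate and known-fraudulent transactions; the function names and the synthetic scores are hypothetical.

```python
from typing import Sequence


def threshold_for_fpr(legit_scores: Sequence[float], target_fpr: float) -> float:
    """Return a threshold t such that declining scores > t keeps the
    false positive rate on legit_scores at or below target_fpr."""
    if not legit_scores:
        raise ValueError("need at least one legitimate score")
    if not 0 <= target_fpr < 1:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(legit_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # legit transactions we may decline
    return ranked[allowed]


def rate_above(scores: Sequence[float], threshold: float) -> float:
    """Share of scores strictly above the threshold (FPR on legit, recall on fraud)."""
    return sum(s > threshold for s in scores) / len(scores)


# Synthetic example data, for illustration only.
legit = [i / 10000 for i in range(10000)]
fraud = [0.9995, 0.5, 0.99999]

t = threshold_for_fpr(legit, target_fpr=0.001)
false_positive_rate = rate_above(legit, t)
detection_rate = rate_above(fraud, t)
```

Fixing the false positive rate first and then reading off the detection rate mirrors the tuning order described above: the detection rate is what you get at the false decline level the business can tolerate.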
01 Infrastructure: Real-Time Feature Store

Pre-compute velocity features and store them in Redis with a TTL: on each incoming transaction, update counters for card-level and account-level velocity windows (1h, 4h, 24h) atomically in Redis using INCR with EXPIRE (for example, inside a MULTI/EXEC transaction so the two commands apply together). At scoring time, retrieve pre-computed features in <5ms. Technology stack: Kafka for transaction streaming → Flink or Spark Streaming for feature computation → Redis for the feature store → REST API for model serving. Alternatively, use a purpose-built feature store (Feast, Tecton, AWS SageMaker Feature Store) for managed feature computation and serving.
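The INCR-with-EXPIRE counter pattern can be sketched as below. To keep the example runnable without a Redis server, it uses a small in-memory stand-in (`InMemoryCounterStore`, hypothetical) that mimics the `incr`, `expire`, and `get` methods of a redis-py client; the key layout and function names are also assumptions. Note that this pattern gives a fixed window that starts at the first transaction, which only approximates a true sliding window.

```python
import time
from typing import Callable, Dict, Optional, Tuple

WINDOWS_SECONDS = {"1h": 3600, "4h": 14400, "24h": 86400}


class InMemoryCounterStore:
    """Local stand-in with the incr/expire/get method names of a redis-py client."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, Optional[float]]] = {}

    def _live(self, key: str) -> Tuple[int, Optional[float]]:
        value, expires_at = self._data.get(key, (0, None))
        if expires_at is not None and self._clock() >= expires_at:
            self._data.pop(key, None)  # key has expired
            return 0, None
        return value, expires_at

    def incr(self, key: str, amount: int = 1) -> int:
        value, expires_at = self._live(key)
        self._data[key] = (value + amount, expires_at)
        return value + amount

    def expire(self, key: str, seconds: int) -> bool:
        value, _ = self._live(key)
        if key not in self._data:
            return False
        self._data[key] = (value, self._clock() + seconds)
        return True

    def get(self, key: str) -> Optional[int]:
        value, _ = self._live(key)
        return value if key in self._data else None


def record_transaction(store, card_id: str) -> Dict[str, int]:
    """Increment every velocity window for the card; set the TTL on first use."""
    counts = {}
    for name, seconds in WINDOWS_SECONDS.items():
        key = f"vel:card:{card_id}:{name}"  # hypothetical key layout
        count = store.incr(key)
        if count == 1:
            # With real Redis, run INCR and EXPIRE in one transaction or script.
            store.expire(key, seconds)
        counts[name] = count
    return counts


def read_velocity(store, card_id: str) -> Dict[str, int]:
    """Fetch the current counters at scoring time; missing keys count as zero."""
    return {name: int(store.get(f"vel:card:{card_id}:{name}") or 0)
            for name in WINDOWS_SECONDS}
```

Against a real deployment, `store` would be a Redis client instead of the stand-in, and the same approach extends to account-level keys and to spend totals by incrementing by the transaction amount.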

Redis velocity counters | Kafka + Flink streaming | Feast feature store
02 Model: XGBoost + ONNX for Sub-50ms Scoring

Train XGBoost with class weights (for example, the scale_pos_weight parameter) to handle extreme class imbalance (0.1% fraud rate). Export to ONNX: save the model with model.save_model("fraud_model.json"), then convert with onnxmltools. Serve via ONNX Runtime in a FastAPI container; ONNX Runtime achieves 1–5ms inference for a 500-tree XGBoost model. Total pipeline latency: 5ms feature retrieval + 3ms inference + 2ms overhead = 10ms, well inside the 50ms target. Deploy on Kubernetes with HPA (Horizontal Pod Autoscaler) for traffic spikes. Retrain weekly on a rolling 90-day window to adapt to concept drift. Our ML team builds production fraud scoring systems.
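The two pieces of arithmetic in this step, the class weight for a 0.1% fraud rate and the latency budget, can be checked with a few lines. This is a sketch under assumptions: the helper names and the one-million-transaction sample are hypothetical, and the negative-to-positive ratio is the commonly suggested starting value for XGBoost's `scale_pos_weight`, not a tuned setting.

```python
from typing import Dict


def scale_pos_weight(n_legit: int, n_fraud: int) -> float:
    """Ratio of negative to positive examples, a common starting point
    for XGBoost's scale_pos_weight parameter."""
    if n_fraud <= 0:
        raise ValueError("need at least one fraud example")
    return n_legit / n_fraud


# Stage timings quoted in the guide, in milliseconds.
STAGE_LATENCY_MS: Dict[str, int] = {
    "feature_retrieval": 5,
    "inference": 3,
    "overhead": 2,
}


def total_latency_ms(stages: Dict[str, int]) -> int:
    """Sum of the per-stage latencies."""
    return sum(stages.values())


def headroom_ms(stages: Dict[str, int], limit_ms: int) -> int:
    """Remaining budget before the limit is hit; negative means over budget."""
    return limit_ms - total_latency_ms(stages)


# Hypothetical sample: 1,000,000 transactions at a 0.1% fraud rate.
weight = scale_pos_weight(n_legit=999_000, n_fraud=1_000)
total = total_latency_ms(STAGE_LATENCY_MS)
slack_vs_target = headroom_ms(STAGE_LATENCY_MS, limit_ms=50)
slack_vs_timeout = headroom_ms(STAGE_LATENCY_MS, limit_ms=100)
```

Keeping the per-stage numbers in one place makes it easy to see how much slack remains for network hops or a larger model before the 50ms target or the 100ms timeout is at risk.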

XGBoost + ONNX Runtime | FastAPI serving | Weekly retraining
Real-Time Fraud Detection ML

Our ML development, data analytics, and DevOps teams build production real-time fraud scoring systems for financial services and fintech companies. Book a free advisory session.

Frequently Asked Questions

End-to-end FinTech and Embedded Finance strategy, implementation, and optimisation. Contact us for a free consultation.

Strategy: 4–8 weeks. Full implementation: 3–12 months.

Yes, from D2C brands to enterprise. View our pricing.
