Real-time AI fraud detection, which scores every payment transaction in under 100ms using machine learning models that consider hundreds of features simultaneously, has reduced card fraud losses by 30 to 50% at financial institutions that have deployed it. The combination of gradient boosting models, graph neural networks for network fraud patterns, and streaming ML inference infrastructure represents the current state of the art. This guide covers the fraud detection ML stack, the feature engineering that matters most, and the production architecture for sub-100ms transaction scoring.
The Fraud Detection ML Stack
The Features That Matter Most
| Feature Category | Examples | Fraud Signal |
|---|---|---|
| Velocity features | Transactions in last 1h/4h/24h; spend in last hour | Fraudsters use compromised cards quickly, so velocity spikes |
| Merchant risk | Merchant fraud rate (30/60/90 day); merchant category | High-risk merchants (crypto, gift cards) have higher fraud rates |
| Geographic anomaly | Distance from home location; impossible travel velocity | Card used in two countries 2 hours apart = fraud signal |
| Device/channel | New device flag; device fingerprint age; channel mismatch | First use of device for high-value transaction |
| Behavioural baseline | Transaction amount vs account average; time-of-day vs history | 3am $2,000 transaction for account that never spends after midnight |
| Network features | Shared IP, device, or email with known fraud accounts | Fraud rings use the same infrastructure across many accounts |
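
The "impossible travel velocity" feature in the table can be illustrated with a short sketch. It computes the great-circle distance between two consecutive card uses and the speed the cardholder would have needed to cover it. The function names and the 900 km/h threshold are assumptions chosen for illustration, not values from a specific production system.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumed threshold, roughly airliner cruise speed; tune per portfolio.
MAX_PLAUSIBLE_KMH = 900.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed needed to travel between two events, each (lat, lon, unix_seconds)."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        # Same or out-of-order timestamps: any distance is unreachable.
        return math.inf if distance > 0 else 0.0
    return distance / hours


def is_impossible_travel(prev, curr, max_kmh=MAX_PLAUSIBLE_KMH):
    """Flag the pair when the implied speed exceeds the plausibility threshold."""
    return implied_speed_kmh(prev, curr) > max_kmh
```

In practice the implied speed itself is usually passed to the model as a numeric feature, with the boolean flag kept for rules and analyst dashboards.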
Pre-compute velocity features and store them in Redis with a TTL: on each incoming transaction, update the card-level and account-level counters for each velocity window (1h, 4h, 24h) using INCR and EXPIRE, wrapped in a MULTI/EXEC transaction or a Lua script so that the pair runs atomically. INCR with EXPIRE gives fixed windows that reset when the key expires; true sliding windows need time-bucketed counters or sorted sets. At scoring time, retrieve the pre-computed features in under 5ms. Technology stack: Kafka for transaction streaming → Flink or Spark Streaming for feature computation → Redis for the feature store → REST API for model serving. Alternatively, use a purpose-built feature store (Feast, Tecton, Amazon SageMaker Feature Store) for managed feature computation and serving.
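
The counter pattern described above can be sketched as follows. To keep the example runnable without a Redis server, it uses a small in-memory stand-in that mirrors the `incr`, `expire`, and `get` method names of the redis-py client. The key layout (`vel:card:<id>:<window>`), the class, and the function names are assumptions made for illustration.

```python
import time

# Velocity windows from the text, in seconds.
WINDOWS = {"1h": 3600, "4h": 14400, "24h": 86400}


class InMemoryCounterStore:
    """Hypothetical stand-in exposing incr/expire/get like a redis-py client."""

    def __init__(self, clock=time.time):
        self._data = {}  # key -> [count, expires_at or None]
        self._clock = clock

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]  # lazily drop expired keys
            return None
        return entry

    def incr(self, key):
        entry = self._live(key)
        if entry is None:
            entry = self._data[key] = [0, None]
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        entry = self._live(key)
        if entry is None:
            return False
        entry[1] = self._clock() + seconds
        return True

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else entry[0]


def record_transaction(store, card_id):
    """Increment each window counter; start the TTL when a counter is created."""
    for name, ttl in WINDOWS.items():
        key = f"vel:card:{card_id}:{name}"
        if store.incr(key) == 1:
            # Against real Redis, run INCR and EXPIRE in MULTI/EXEC or a Lua
            # script so a crash between the two cannot leave a key without TTL.
            store.expire(key, ttl)


def velocity_features(store, card_id):
    """Read the pre-computed counts; missing or expired keys count as zero."""
    return {
        f"txn_count_{name}": int(store.get(f"vel:card:{card_id}:{name}") or 0)
        for name in WINDOWS
    }
```

Because the two functions only rely on `incr`, `expire`, and `get`, a `redis.Redis` client could be passed in place of the stand-in; account-level counters would follow the same pattern with a different key prefix.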
Train XGBoost with class weights (for example, the scale_pos_weight parameter) to handle extreme class imbalance (around a 0.1% fraud rate). Export to ONNX: model.save_model("fraud_model.json") writes XGBoost's own JSON format, so keep that file as a checkpoint and convert the trained model to ONNX separately with onnxmltools. Serve via ONNX Runtime in a FastAPI container; ONNX Runtime achieves 1 to 5ms inference for a 500-tree XGBoost model. Total pipeline latency: 5ms feature retrieval + 3ms inference + 2ms overhead = 10ms. Deploy on Kubernetes with a Horizontal Pod Autoscaler (HPA) to absorb traffic spikes. Retrain weekly on a rolling 90-day window to adapt to concept drift.
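
As a rough illustration of the class-weighting step, the sketch below derives a positive-class weight from the label distribution using the common negatives-to-positives ratio. `scale_pos_weight`, `objective`, `n_estimators`, and `eval_metric` are real XGBoost parameter names, but the specific values and the helper functions are assumptions for illustration, not a tuned configuration.

```python
def positive_class_weight(labels):
    """Ratio of negatives to positives, a common starting point for scale_pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("training window contains no fraud examples")
    return negatives / positives


def build_xgb_params(labels, n_trees=500):
    """Assemble keyword arguments intended for xgboost.XGBClassifier(**params)."""
    return {
        "objective": "binary:logistic",
        "n_estimators": n_trees,  # 500 trees, matching the model size in the text
        "scale_pos_weight": positive_class_weight(labels),
        "eval_metric": "aucpr",  # PR-AUC is more informative than accuracy at ~0.1% fraud
    }
```

At a 0.1% fraud rate the ratio comes out near 1000, which weights each fraud example about as heavily as a thousand legitimate ones. That ratio is usually treated as a starting value and adjusted against precision and recall on a held-out window.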
Our ML development, data analytics, and DevOps teams build production real-time fraud scoring systems for financial services and fintech companies. Book a free advisory session.