Solution Background and Business Value

Chargeback fraud is a significant source of loss for e-commerce platforms and online retailers. It occurs when a customer makes a purchase with a credit card and later disputes the transaction with their bank, claiming it was unauthorized or fraudulent. If the bank approves the chargeback, the transaction is reversed and the merchant may bear the financial loss.

Machine learning models can help detect and prevent chargeback fraud before it happens, allowing businesses to:

  • Reduce fraudulent transactions by identifying high-risk purchases early.

  • Minimize financial losses by preventing chargebacks from occurring.

  • Improve fraud detection processes by integrating ML into fraud prevention systems.

Data Requirements and Schema

Kumo AI analyzes data in its raw relational form, so the tables can be used directly, without extensive feature engineering. Graph Neural Networks (GNNs) leverage the relationships between entities (e.g., users, orders, chargebacks) to improve fraud detection accuracy.

Core Tables

  1. Accounts Table

    • Stores user account details.

    • Key attributes:

      • account_id: Unique identifier.

      • Optional: Creation date, location, age, account type.

  2. Orders Table

    • Stores details of each order.

    • Key attributes:

      • order_id: Unique order identifier.

      • account_id: Links the order to a user.

      • timestamp: Time of purchase.

      • Optional: Order value, payment method, shipping details.

  3. Chargebacks Table

    • Stores information about chargeback claims.

    • Key attributes:

      • chargeback_id: Unique identifier.

      • order_id: Links the chargeback to an order.

      • timestamp: Time of chargeback request.

      • label: Indicates whether the chargeback was fraudulent (1) or legitimate (0).
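
Before connecting the tables, it can help to confirm that the keys line up the way the schema above describes. The following is a minimal sketch in plain Python, not part of the Kumo SDK: the sample rows and the helper names (has_unique_key, find_orphans, check_schema) are hypothetical, and only the column names come from the tables above.

```python
# Hypothetical sample rows; column names follow the core tables described above.
ACCOUNTS = [{"account_id": "a1"}, {"account_id": "a2"}]
ORDERS = [
    {"order_id": "o1", "account_id": "a1", "timestamp": "2024-01-05T10:00:00"},
    {"order_id": "o2", "account_id": "a2", "timestamp": "2024-01-06T12:30:00"},
]
CHARGEBACKS = [
    {"chargeback_id": "c1", "order_id": "o1", "timestamp": "2024-01-20T09:00:00", "label": 1},
    {"chargeback_id": "c2", "order_id": "o2", "timestamp": "2024-01-22T15:45:00", "label": 0},
]


def has_unique_key(rows, pkey):
    """Return True if every row has a distinct value in the primary key column."""
    keys = [row[pkey] for row in rows]
    return len(keys) == len(set(keys))


def find_orphans(child_rows, fkey, parent_rows, pkey):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pkey] for row in parent_rows}
    return [row for row in child_rows if row[fkey] not in parent_keys]


def check_schema(accounts, orders, chargebacks):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    keyed_tables = [
        ("accounts", accounts, "account_id"),
        ("orders", orders, "order_id"),
        ("chargebacks", chargebacks, "chargeback_id"),
    ]
    for name, rows, pkey in keyed_tables:
        if not has_unique_key(rows, pkey):
            problems.append(f"{name}.{pkey} is not unique")
    if find_orphans(orders, "account_id", accounts, "account_id"):
        problems.append("orders.account_id references a missing account")
    if find_orphans(chargebacks, "order_id", orders, "order_id"):
        problems.append("chargebacks.order_id references a missing order")
    if any(cb["label"] not in (0, 1) for cb in chargebacks):
        problems.append("chargebacks.label must be 0 or 1")
    return problems
```

Calling check_schema(ACCOUNTS, ORDERS, CHARGEBACKS) on the sample rows returns an empty list. A chargeback that points at an unknown order would be reported as a problem instead.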

Additional Tables (Optional)

  • Items Table: Stores item-level details within an order.

  • Order-Items Table: Links orders to specific items purchased.

  • Payment Methods Table: Stores payment details (e.g., card type, account linkage).

  • Merchants Table: Information on merchants selling products.

  • Account Events Table: Tracks user account activity.

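Entity Relationship Diagram (ERD)

The core tables form a chain: each order references one account through account_id, and each chargeback references one order through order_id.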

Predictive Queries

We can detect chargeback fraud at different levels:

1. Predict Fraudulent Chargebacks

This model predicts whether a chargeback is fraudulent:

PREDICT chargebacks.LABEL
FOR EACH chargebacks.chargeback_id

  • At inference time, LABEL is left empty for new chargebacks, and the model generates a fraud risk score for each of them.

2. Predict Fraud Risk at the Order Level

To anticipate fraud at the order level, we move the fraud label to the orders table:

PREDICT orders.LABEL
FOR EACH orders.order_id
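
One way to move the label onto the orders table is sketched below in plain Python. It rests on an assumption the text above does not state: an order is labeled 1 if any linked chargeback is fraudulent, and 0 otherwise, including orders with no chargeback at all. The function name and sample rows are hypothetical.

```python
def order_level_labels(orders, chargebacks):
    """Map each order_id to 1 if any linked chargeback has label 1, else 0.

    Assumption: orders without any chargeback are treated as legitimate (0).
    """
    fraudulent_order_ids = {
        cb["order_id"] for cb in chargebacks if cb["label"] == 1
    }
    return {
        order["order_id"]: int(order["order_id"] in fraudulent_order_ids)
        for order in orders
    }


# Hypothetical sample rows using the column names from the schema.
sample_orders = [
    {"order_id": "o1", "account_id": "a1"},
    {"order_id": "o2", "account_id": "a2"},
    {"order_id": "o3", "account_id": "a1"},
]
sample_chargebacks = [
    {"chargeback_id": "c1", "order_id": "o1", "label": 1},
    {"chargeback_id": "c2", "order_id": "o2", "label": 0},
]
order_labels = order_level_labels(sample_orders, sample_chargebacks)
```

Whether unlabeled orders should count as 0 or be left empty depends on how long chargebacks take to arrive, so this default is worth revisiting for recent orders.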

3. Predict Future Chargeback Fraud

For proactive fraud detection, we can predict whether an order or account will experience a fraudulent chargeback in the next X days:

-- Predict if an order will receive a fraudulent chargeback
PREDICT FIRST(chargebacks.LABEL = 1, 0, X) > 0
FOR EACH orders.order_id
ASSUMING COUNT(chargebacks.*, 0, X) > 0

-- Predict if an account will be associated with at least one fraudulent chargeback
PREDICT COUNT(chargebacks.LABEL = 1, 0, X) > 0
FOR EACH accounts.account_id
ASSUMING COUNT(orders.*, 0, X) > 0
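
To make the account-level target concrete, the sketch below computes it by hand for one account and one anchor time. It illustrates the target definition only, not how Kumo builds training tables. It assumes the window covers timestamps after the anchor up to and including anchor + X days, and it ignores the ASSUMING clause. The function name and sample rows are hypothetical.

```python
from datetime import datetime, timedelta


def account_fraud_target(account_id, anchor, horizon_days, orders, chargebacks):
    """Return True if the account has at least one fraudulent chargeback
    with a timestamp in (anchor, anchor + horizon_days]."""
    # Chargebacks reach accounts through orders, so collect the account's orders first.
    order_ids = {o["order_id"] for o in orders if o["account_id"] == account_id}
    window_end = anchor + timedelta(days=horizon_days)
    return any(
        cb["order_id"] in order_ids
        and cb["label"] == 1
        and anchor < cb["timestamp"] <= window_end
        for cb in chargebacks
    )


# Hypothetical sample rows using the column names from the schema.
demo_orders = [
    {"order_id": "o1", "account_id": "a1", "timestamp": datetime(2024, 1, 5)},
    {"order_id": "o2", "account_id": "a2", "timestamp": datetime(2024, 1, 6)},
]
demo_chargebacks = [
    {"chargeback_id": "c1", "order_id": "o1", "timestamp": datetime(2024, 1, 20), "label": 1},
    {"chargeback_id": "c2", "order_id": "o2", "timestamp": datetime(2024, 1, 12), "label": 0},
]
demo_anchor = datetime(2024, 1, 10)
```

With a 30-day horizon, account a1 is a positive example because its fraudulent chargeback falls inside the window. With a 5-day horizon the same chargeback falls outside, so the target is negative.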

Deployment Strategy

The best deployment strategy depends on the maturity of the existing fraud detection system:

1. Batch Predictions for Fraud Analysts

  • Fraud teams manually review and label chargebacks.

  • ML model predictions prioritize high-risk chargebacks for faster action.

  • Predictions are generated daily or hourly in batch mode.

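To score only recent chargebacks in each batch run, add a time filter to the predictive query, where MIN_TIMESTAMP is the cutoff for that run:

WHERE chargebacks.TIMESTAMP > MIN_TIMESTAMP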
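
Once a batch of scores is available, a review queue for analysts can be built by keeping the recent chargebacks and sorting them by risk. The sketch below is plain Python and makes several assumptions: the risk_score field, the top_k parameter, and the sample rows are hypothetical, and the scores are taken to have been exported from the model already.

```python
from datetime import datetime


def review_queue(scored_chargebacks, min_timestamp, top_k=None):
    """Keep chargebacks newer than min_timestamp, highest risk first."""
    recent = [cb for cb in scored_chargebacks if cb["timestamp"] > min_timestamp]
    ranked = sorted(recent, key=lambda cb: cb["risk_score"], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical scored rows; risk_score stands in for the model's output.
scored = [
    {"chargeback_id": "c1", "timestamp": datetime(2024, 1, 20), "risk_score": 0.35},
    {"chargeback_id": "c2", "timestamp": datetime(2024, 1, 22), "risk_score": 0.91},
    {"chargeback_id": "c3", "timestamp": datetime(2024, 1, 2), "risk_score": 0.99},
    {"chargeback_id": "c4", "timestamp": datetime(2024, 1, 21), "risk_score": 0.60},
]
queue = review_queue(scored, min_timestamp=datetime(2024, 1, 15), top_k=2)
```

Here c3 is dropped despite its high score because it predates the cutoff, which mirrors the effect of the timestamp filter in the query.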

2. Real-Time Chargeback Fraud Detection

  • The system generates real-time risk scores when an order is placed.

  • If a transaction is high risk, additional verification or manual review is triggered.

  • ML embeddings are used to enhance rule-based fraud detection.
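
The routing logic in the bullets above can be sketched as a small decision function. This is an illustration under stated assumptions, not a recommended policy: the thresholds, the action names, and the rule_flagged input (standing in for an existing rule-based check) are hypothetical, and the sketch combines a risk score with rules rather than using embeddings directly.

```python
def route_order(risk_score, rule_flagged=False,
                review_threshold=0.9, verify_threshold=0.6):
    """Pick an action for a newly placed order.

    risk_score is assumed to be a probability-like value in [0, 1].
    rule_flagged is True when an existing rule-based check fired.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= review_threshold:
        return "manual_review"
    if risk_score >= verify_threshold or rule_flagged:
        return "additional_verification"
    return "approve"
```

For example, route_order(0.95) routes the order to manual review, while route_order(0.2, rule_flagged=True) asks for additional verification because a rule fired even though the model score is low. In practice the thresholds would be tuned against review capacity and the cost of false positives.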

Building Models with the Kumo SDK

1. Initialize the Kumo SDK

import os
import kumoai as kumo

API_KEY = os.environ["KUMO_API_KEY"]  # assumed variable name; supply your own key
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

accounts = kumo.Table.from_source_table(
    source_table=connector.table('accounts'),
    primary_key='account_id',
).infer_metadata()

orders = kumo.Table.from_source_table(
    source_table=connector.table('orders'),
    primary_key='order_id',  # referenced by chargebacks.order_id
    time_column='timestamp',
).infer_metadata()

chargebacks = kumo.Table.from_source_table(
    source_table=connector.table('chargebacks'),
    primary_key='chargeback_id',  # entity of the predictive query
    time_column='timestamp',
).infer_metadata()

4. Create graph schema

graph = kumo.Graph(
    tables={
        'accounts': accounts,
        'orders': orders,
        'chargebacks': chargebacks,
    },
    edges=[
        dict(src_table='orders', fkey='account_id', dst_table='accounts'),
        dict(src_table='chargebacks', fkey='order_id', dst_table='orders'),
    ],
)

graph.validate(verbose=True)

5. Train the model

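# Define the prediction task: classify each chargeback as fraudulent or legitimate.
pquery = kumo.PredictiveQuery(
    graph=graph,
    query="PREDICT chargebacks.LABEL FOR EACH chargebacks.chargeback_id"
)
pquery.validate(verbose=True)

# Start from the suggested model plan and train on the generated training table.
model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)
training_job = trainer.fit(
    graph=graph,
    # Training table generation is launched without blocking.
    train_table=pquery.generate_training_table(non_blocking=True),
    # Block until training finishes so that metrics are available below.
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")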