We generate candidate features, test them on the dataset, and identify those that discriminate between over-predicted and well-predicted cases within the cohort.
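The testing step amounts to a per-feature comparison of the two groups: a significance test plus an effect size, ranked by effect size. The following is a minimal sketch, not the scan's actual implementation. It assumes each transaction is a dict keyed by feature name (a hypothetical layout), uses Cohen's d with a pooled standard deviation as the effect size, and replaces an exact Welch t-test with a large-sample normal approximation so that it only needs the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def welch_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Uses Welch's standard error with a normal approximation, which is only
    reasonable for large groups; an exact test would use a t distribution.
    """
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def scan_signals(over_pred, well_pred, features, alpha=0.05):
    """Return (feature, d, p) for significant features, largest |d| first.

    over_pred / well_pred: lists of dicts, one per transaction (hypothetical layout).
    No multiple-comparison correction is applied here.
    """
    results = []
    for name in features:
        a = [row[name] for row in over_pred]
        b = [row[name] for row in well_pred]
        d, p = cohens_d(a, b), welch_p_value(a, b)
        if p < alpha:
            results.append((name, d, p))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)


# Toy usage with made-up numbers (not the report's data): "x" differs, "y" does not.
over_rows = [{"x": x, "y": y} for x, y in zip([1, 2, 3, 2, 1, 2, 3, 2], [10, 11, 9, 10, 12, 8, 10, 10])]
well_rows = [{"x": x, "y": y} for x, y in zip([5, 6, 5, 7, 6, 5, 6, 7], [10, 9, 11, 10, 8, 12, 10, 10])]
ranked = scan_signals(over_rows, well_rows, ["x", "y"])
```

A scan across many candidate variables would likely also want a multiple-comparison correction (for example Bonferroni or Benjamini-Hochberg) before treating any single p-value as significant.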
We ran a signal scan across all candidate variables in the fraud_exploratory_signals_prod_us dataset, comparing transactions in this hotspot (Over-pred) with transactions in the same cohort where the model's predictions matched the observed outcomes (Well-pred). The goal was to find signals with statistically significant differences and moderate-to-large effect sizes, since such signals could help explain, and potentially mitigate, the over-prediction. These are the three features most likely to be contributing to the over-prediction:
1. merchant_repeat_count_30d
What it is: The count of purchases at the same merchant in the last 30 days.
How it was found: This variable showed one of the largest effect sizes in the scan (0.43 in magnitude): over-pred cases averaged 3.1 visits, while well-pred cases averaged 5.4.
Why it matters: High repeat-visit behavior is a known protective factor in fraud, because habitual customers tend to be low risk. The model appears to undervalue this loyalty signal, treating even regular shoppers (about three visits per 30 days on average in the over-pred group) as higher risk than warranted.
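If this feature has to be rebuilt or added to the model, one way to compute it is a sliding 30-day window per card and merchant. The following is a minimal sketch under two assumptions not stated in the scan: records are hypothetical (card_id, merchant_id, timestamp) tuples, and a transaction does not count toward its own total. The production definition may differ on either point.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def merchant_repeat_count(transactions, window=timedelta(days=30)):
    """For each transaction, count earlier purchases by the same card at the same
    merchant made at most `window` before it (the current one is excluded).

    transactions: (card_id, merchant_id, timestamp) tuples (hypothetical schema).
    Returns counts aligned with the input order.
    """
    # Process in time order, remembering each row's original position.
    indexed = sorted(enumerate(transactions), key=lambda item: item[1][2])
    history = defaultdict(deque)  # (card, merchant) -> timestamps still inside the window
    counts = [0] * len(indexed)
    for idx, (card, merchant, ts) in indexed:
        recent = history[(card, merchant)]
        # Drop purchases that are more than `window` older than this one.
        while recent and recent[0] < ts - window:
            recent.popleft()
        counts[idx] = len(recent)
        recent.append(ts)
    return counts


# Toy usage with made-up transactions.
t0 = datetime(2024, 1, 1)
txns = [
    ("c1", "m1", t0),
    ("c1", "m1", t0 + timedelta(days=10)),
    ("c1", "m1", t0 + timedelta(days=45)),
    ("c1", "m2", t0 + timedelta(days=12)),
    ("c2", "m1", t0 + timedelta(days=11)),
]
counts = merchant_repeat_count(txns)
```

The deque keeps only the timestamps still inside the window, so each transaction is appended and evicted at most once per (card, merchant) key.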
2. avg_ticket_amount_90d
What it is: The average purchase amount for the same merchant over the past 90 days.
How it was found: Statistically significant (p = 0.020) with a smaller but consistent effect size (0.15 in magnitude): over-pred cases averaged $78, while well-pred cases averaged $84.
Why it matters: Slightly higher average spend sustained over a long period often signals a stable customer relationship and trust in the merchant. The model may not be treating this sustained spend pattern as a protective factor.
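If the reported effect size is Cohen's d with a pooled standard deviation s_p (an assumption, since the scan does not name its measure), the figures above imply a spread in 90-day average ticket of roughly $40 per group:

```latex
d = \frac{\bar{x}_{\text{over}} - \bar{x}_{\text{well}}}{s_p}
\quad\Longrightarrow\quad
s_p \approx \frac{\lvert 78 - 84 \rvert}{0.15} = \frac{6}{0.15} = 40
```

A $6 gap against a spread of about $40 means the two distributions overlap heavily, so on its own this feature separates the groups only weakly.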
3. card_present_flag
What it is: Indicates whether the transaction was processed with the physical card present (vs. card-not-present channels such as e-commerce and in-app).
How it was found: Highly significant (p < 0.001) with an effect size of 0.18 in magnitude: card-present rates were 96% in the well-pred group vs. 91% in the over-pred group.
Why it matters: Card-present transactions, especially in supermarkets, carry much lower fraud risk than card-not-present transactions. The model seems to be underweighting this known fraud-reducing factor in this segment.
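For a binary flag, the comparison is between two proportions rather than two means. The following is a minimal sketch using a pooled two-proportion z-test and Cohen's h, which is one common effect size for proportions. The scan does not say which measure it used. Cohen's h gives about -0.21 for 91% vs. 96%, so the reported 0.18 may come from a different measure. The group sizes below are hypothetical because the scan does not report them.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions (p1 minus p2)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Card-present rates from the scan: 91% in over-pred, 96% in well-pred.
h = cohens_h(0.91, 0.96)
# Hypothetical group sizes of 1,000 each; the real counts are not reported.
p = two_proportion_p_value(910, 1000, 960, 1000)
```

With groups of this size, a 5-point gap in card-present rate is already far beyond p < 0.001, which is consistent with the scan's result.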
Conclusion:
These three drivers indicate that the false-positive hotspot is concentrated in a population with strong loyalty, stable spend, and mostly in-person transactions, all historically low-risk behaviors. The model's residuals suggest it is overestimating risk here, likely because features that explicitly encode these protective patterns are missing or under-weighted.
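One way to check this conclusion is to bucket the cohort by a protective feature and compare the mean model score with the observed fraud rate in each bucket. If the feature is under-weighted, the gap (score minus observed rate) should grow as the feature increases. The following is a minimal sketch: the field names `score` and `is_fraud` and the bucket edges are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def calibration_by_bucket(rows, feature, edges):
    """Mean predicted score vs. observed fraud rate per bucket of `feature`.

    rows: dicts holding the feature, a model score in [0, 1] under "score",
    and a 0/1 label under "is_fraud" (hypothetical field names).
    edges: ascending lower bounds; value v falls in bucket edges[i] when
    edges[i] <= v < edges[i + 1], and the last bucket is open-ended.
    Returns {lower_edge: (mean_score, fraud_rate, gap, n)}; gap > 0 means
    the model predicted more risk than was observed.
    """
    acc = defaultdict(lambda: [0.0, 0, 0])  # lower_edge -> [score_sum, frauds, n]
    for row in rows:
        i = bisect_right(edges, row[feature]) - 1
        if i < 0:
            continue  # value below the first edge: not bucketed
        bucket = acc[edges[i]]
        bucket[0] += row["score"]
        bucket[1] += row["is_fraud"]
        bucket[2] += 1
    report = {}
    for lower in sorted(acc):
        score_sum, frauds, n = acc[lower]
        mean_score, rate = score_sum / n, frauds / n
        report[lower] = (mean_score, rate, mean_score - rate, n)
    return report


# Toy usage with made-up rows, bucketing merchant repeat counts at 0 and 5.
rows = [
    {"repeat": 0, "score": 0.30, "is_fraud": 1},
    {"repeat": 1, "score": 0.20, "is_fraud": 0},
    {"repeat": 6, "score": 0.40, "is_fraud": 0},
    {"repeat": 8, "score": 0.20, "is_fraud": 0},
]
report = calibration_by_bucket(rows, "repeat", [0, 5])
```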