We Got 99 Problems but Fraud Ain’t One
Running a real-money gaming site is fraught with challenges. Whenever there’s money being transacted, and particularly money given away, nefarious agents emerge from every corner of the internet trying to run scams and see just how much they can get away with without being caught. There are some tools for sites to defend themselves against fraud – but the bad actors can be highly clever and aggressive. Often, basic data-driven analytics aren’t enough to really figure out what’s going on.
I’m the CIO at Victiv.com, a daily fantasy sports site that hosts real-money games weekly. We’ve got single prize pools in the works worth hundreds of thousands of dollars. We are running a free-money giveaway campaign where new users can earn up to $30 in free play on the site, and an additional $20 for referring friends who play. Needless to say, the stakes are high, and the incentive for multi-accounting and other unsavory practices is real. Catching these problems before it’s too late requires an advanced toolset, going beyond traditional tables of data, unique e-mail verification and other standard practices.
Here’s where advanced analytics come in. We can use a type of mathematical object called a graph to visualize how our users, and our users’ technology and devices, relate to one another. Graphs can be constructed on any set of dyadic data, or in plain English, data of the type: A is related to B, B is related to C, A is related to C, B is related to D. These data dyads can be strung together, analyzed and visualized (using the branch of mathematics that focuses on these sorts of things, the aptly named ‘graph theory’) to see if, for instance, D is related to A through any path (it certainly is, by way of B).
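To make the A/B/C/D example concrete, here is a minimal sketch in plain Python (not the Mathematica code behind the figures in this post). The function names `build_graph` and `find_path` are made up for illustration; the dyads are the ones listed above, treated as undirected relationships.

```python
from collections import defaultdict, deque


def build_graph(dyads):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in dyads:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: return a shortest path as a list, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the 'previous' links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# The dyads from the paragraph above.
dyads = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D")]
graph = build_graph(dyads)
path_d_to_a = find_path(graph, "D", "A")
print(path_d_to_a)  # ['D', 'B', 'A']
```

D never appears in a dyad with A, yet the search finds the connection through B. That is the kind of indirect relationship a flat table of records tends to hide.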
Applying some network graph visualization algorithms from Mathematica 10 to a random sample of Victiv’s affiliate data, we can see an interesting sample of the types of graphs formed by our affiliate partners and their affiliated users:
As is often the case in real-world graph data, we see a distribution of affiliate sizes that can be inferred from the size and density of the ‘hub and spoke’ structures they are attached to. Small affiliates might form a sparse star, or even a single line. Others form large clusters, with affiliated users who are themselves affiliated with other users, and so on. Interesting, yes, but this wouldn’t help us identify fraud unless we were able to add some important contextual information.
In practice this contextual information might include things like ‘what percentage of affiliated users are depositing players?’ or ‘how many have verified accounts through some form of social media?’ But these data-driven items can’t always capture the whole story, and they certainly aren’t as cool to visualize. So let’s focus on another bit of data – the relationship between users and their devices. Most fraud is perpetrated by a single user, connected on one or a few devices, pretending to be many unique users. Checking the IP data can help get a handle on how many devices each user is connecting with – instances where users are sharing devices are potential fraud risks. I say potential here because most of the time there is a perfectly mundane reason behind IP sharing and many other apparent anomalies in user data. Operators need to check every possible angle to be 99% certain that something is amiss before taking action on any account. The fallout from acting rashly towards an innocent user can be just as bad as being the victim of fraud.
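The first pass of that IP check can be sketched in a few lines of Python. This is an illustration only: the `(user, ip)` login records, the `shared_ips` helper and the `min_users` threshold are assumptions made for the example, not Victiv’s actual schema or rules, and the addresses come from ranges reserved for documentation.

```python
from collections import defaultdict


def shared_ips(logins, min_users=2):
    """Map each IP seen with at least `min_users` distinct accounts to those accounts."""
    users_by_ip = defaultdict(set)
    for user, ip in logins:
        users_by_ip[ip].add(user)
    return {
        ip: sorted(users)
        for ip, users in users_by_ip.items()
        if len(users) >= min_users
    }


# Hypothetical, anonymized login records: (user, ip).
logins = [
    ("user_1", "203.0.113.5"),
    ("user_2", "203.0.113.5"),
    ("user_3", "203.0.113.5"),
    ("user_4", "198.51.100.7"),
    ("user_1", "198.51.100.9"),
]

flagged = shared_ips(logins)
print(flagged)  # {'203.0.113.5': ['user_1', 'user_2', 'user_3']}
```

A flagged address is only a candidate for review. Shared offices, households and mobile carriers all produce the same pattern, which is why the output is a list to investigate rather than a list to ban.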
Below is a random sample of network graphs produced using Victiv’s database of anonymized users and their IP addresses:
Notice how much richer and more complex this data is compared to the affiliate-relationship data. The number of IP addresses for a user is not as subject to power-law-like forces as abstract things like human-to-human relationships. Devices are tangible and real, whereas human relationships in the digital world are ephemeral and nearly limitless. As a result there are far fewer giant starburst-type graphs present in the IP sample. Instead we see more complex, organic-looking patterns emerging and long strings of connected users and devices. There is evidence of some fraud in these images alone – obviously the details of who these users are have been obfuscated for the purposes of this post – but to get some more insight about what we might look at next, let’s focus on the largest connected network of IP addresses and user accounts. Zooming in on that network looks like this:
Spaghetti, anyone? Obviously this graph stands out from the others as being odd, a testament to the power of visualization to help identify anomalies in complex data. Who are these users and why are they so interconnected? How can so many distinct users share so many IP addresses? There’s got to be something strange going on here, right?
Well, yes and no. It turns out that this is the graph of the administrative accounts at Victiv. We all share the same few dozen IP addresses from our offices in Austin, TX! When I saw this, without knowing where it came from, I was immediately excited (and terrified) to think I had found my first great case of fraud on Victiv. Digging into the details, I was relieved and slightly amused to find out that the detective, in this case, had only managed to snoop out himself.
As an aside – some of the other large network graphs, which at first glance look like possible fraud candidates, have geo-locations remarkably close to the headquarters of some of our best-known competitors in the daily fantasy sports space. No fraud involved, but apparently Victiv has aroused their attention as the most rapidly growing daily fantasy sports site in the industry.
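Picking out the largest connected network of users and IP addresses, the step that surfaced the admin accounts above, can be sketched as follows. This is plain Python rather than the Mathematica workflow used for the figures, and the edge list, the `connected_components` helper and the `("user", …)` / `("ip", …)` node tags are all assumptions made for the example.

```python
from collections import defaultdict


def connected_components(edges):
    """Split a user-IP graph into connected components (sets of tagged nodes)."""
    graph = defaultdict(set)
    for user, ip in edges:
        # Tag nodes so a user name can never collide with an IP string.
        u, i = ("user", user), ("ip", ip)
        graph[u].add(i)
        graph[i].add(u)

    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical (user, ip) pairs.
edges = [
    ("u1", "10.0.0.1"), ("u2", "10.0.0.1"),
    ("u2", "10.0.0.2"), ("u3", "10.0.0.2"),
    ("u4", "10.0.0.9"),
    ("u5", "10.0.0.7"), ("u6", "10.0.0.7"),
]

components = connected_components(edges)
largest = max(components, key=len)
users_in_largest = sorted(name for kind, name in largest if kind == "user")
print(users_in_largest)  # ['u1', 'u2', 'u3']
```

Here u1 and u3 never share an address directly, but u2 bridges them. Those are the long strings of connected users and devices visible in the sample graphs.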
Taking our fake analysis one step further – we can use Mathematica’s powerful network graph analysis tools to do all sorts of analytics on these graphs. We can identify what we think of as a ‘typical’ user-to-IP mapping, and from this we can find the cases that diverge most from the expected pattern. Complex-sounding properties like the global and local clustering coefficient, cumulative degree distribution, and resilience can be calculated and compared between each graph. Depending on the problem being addressed, these properties can tell us interesting and actionable things about our users’ behavior. One of the easier advanced tactics is just using the CommunityGraphPlot function and letting Mathematica find clusters of users and IP addresses for you. Below is the community graph plot of the admin accounts and their associated IP addresses from Victiv.com:
The algorithms have cleanly identified groups of likely related users and devices, and separated them out into identifiable communities. If this weren’t a bunch of our in-house developers I might be inclined to suspect a very large, very coordinated syndicate of multi-accounting users was at play here. Fortunately for Victiv, that’s not the case.
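For readers without Mathematica, two of the properties mentioned earlier, the local clustering coefficient and the cumulative degree distribution, are simple enough to sketch by hand. The Python below is an illustration under stated assumptions, not Mathematica’s implementation. It first projects the user-IP data onto users (two users are linked if they share an IP), since a pure user-IP graph contains no triangles. The helper names and sample edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_users(edges):
    """Link two users whenever they have been seen on the same IP."""
    users_by_ip = defaultdict(set)
    for user, ip in edges:
        users_by_ip[ip].add(user)
    graph = {user: set() for user, _ in edges}  # keep isolated users too
    for users in users_by_ip.values():
        for a, b in combinations(sorted(users), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def cumulative_degree_distribution(graph):
    """For each observed degree d, the fraction of nodes with degree >= d."""
    degrees = [len(neighbors) for neighbors in graph.values()]
    total = len(degrees)
    return {
        d: sum(1 for x in degrees if x >= d) / total
        for d in sorted(set(degrees))
    }


# Hypothetical (user, ip) pairs.
edges = [
    ("u1", "ip_a"), ("u2", "ip_a"), ("u3", "ip_a"),
    ("u3", "ip_b"), ("u4", "ip_b"),
    ("u5", "ip_c"),
]

user_graph = project_onto_users(edges)
clustering = {user: local_clustering(user_graph, user) for user in user_graph}
degree_ccdf = cumulative_degree_distribution(user_graph)
print(clustering)
print(degree_ccdf)
```

Comparing these numbers across components is one way to quantify what ‘typical’ looks like. A tight knot of accounts with high clustering and high degree deserves a closer look, whether it turns out to be a fraud ring or, as here, the office.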
Anyway – even if you aren’t interested in the tangible insights that can be gleaned from this sort of analysis, the pictures themselves are simply really, really cool. I’m always excited when I encounter a problem that I know will benefit from network analysis, and fortunately fraud in real-money gaming is the perfect candidate. If you like this content, or are interested in fantasy sports or data analytics, feel free to follow me on Twitter @TheRotoquant. Also – sign up today at Victiv.com for a shot at winning $30 free playing fantasy football against our cyber-intelligent daily fantasy sports bot dubbed the #VICTRON! We’ve also just announced the $300,000 VICTIV Bowl for the end of this football season – we are awarding $100 tickets to it each week in guaranteed tournaments. This weekend we’re running $50,000 in guaranteed and free contests for the NFL and NHL – sign up today to win big!