mql.fm Episode 2

$60 Million or $60 Billion, the Ad Fraud Question

Show Notes

In this episode I’m joined by Dr. Augustine Fou, an experienced marketer turned ad-fraud investigator who’ll blow your mind with his insights into the seedy underworld of the digital advertising industry and tips on how you can avoid falling afoul of these dastardly schemes.

Augustine Fou’s Links:

Jacques’ Links:

Transcript

Jacques: [00:00:08] Welcome to MQL.fm, the marketing operations podcast.

Joining me on this episode is Augustine Fou, an experienced marketer and ad fraud researcher.

So, thank you very much for joining me. I really appreciate your time. So, let’s start with, well, why don’t you tell me a little bit about yourself, what you do, how you got to where you are.

Augustine Fou: [00:00:33] All right. My name is Augustine Fou. I am an ad fraud researcher. I’ve been doing digital marketing for 25 plus years, but I’ve kind of focused in on this topic of ad fraud in recent years, because with the rise of programmatic ad technologies, the amount of fraud has also gone way up. And that’s because as we started to automate the buying and selling of ads and placing them on millions and millions of sites that no one’s ever seen before, it’s actually created more opportunity for the bad guys to also scale the fraud.

So currently I’m studying this problem because it’s right at the intersection between technology and marketing. And a lot of marketers have shifted billions of dollars from other channels, like TV and print into digital. But now it’s almost like there’s unlimited inventory for them to buy because the bad guys with botnets can just create, you know, unlimited supply.

So that’s my work right now.

Jacques: [00:01:32] Great. That’s super interesting because obviously RTB and digital advertising and, I guess, display and all of that kind of stuff, it’s a massive industry. Hundreds of billions of dollars or pounds a year.

Augustine Fou: [00:01:47] Yeah, 150 billion here in the US and 350 billion worldwide and that’s year after year. So that’s a huge pot of gold.

Jacques: [00:01:55] And how big is fraud within that? What percentage?

Augustine Fou: [00:01:59] That is the $60 million or $60 billion question. I actually don’t usually use a dollar amount. I use a percentage to give a range. And you might think I’m joking, but basically what we’re seeing is the fraud ranges between 1% and 99%. So, the range is huge, because it totally depends on how vigilant the marketers are when they’re buying, how vigilant their media agencies are when they’re buying, how clean the supply paths are.

I mean, this all speaks to the complexity that is in the infrastructure, and more complexity means more opportunities for bad guys to hide the fraudulent activity. So then by the time the good guys discover it, it’s been happening for a long time. So, it’s literally meant to say there’s a very wide range of possibilities.

Now, of course, there are fraud detection technologies that are supposed to help detect the fraud. But again, in those cases, it depends on what they’re looking for. Right. If they’re not looking for certain types of fraud, they’re simply not detecting it. It’s not good or bad. It’s just that they’re not even looking.

Jacques: [00:03:11] So at work we use one of these technologies and it reports more fraud than we actually spend on a monthly basis.

Augustine Fou: [00:03:18] I mean, there are things that you can kind of see, you know, in terms of your own metrics, right? So, I had a client a long time ago, a pharma company, where they were seeing more clicks than the number of ad impressions that they showed. So, how’s that even possible? You know, as it turns out, there was a bot that went crazy.

So, you know, basically these click-through URLs can be seen because they’re just plain text URLs. So, the bots are now used to clicking on click-through URLs to simulate an engagement, simulate behavior. Right. Because if the bots didn’t click on anything, those ads would have 0% click-through rates and then the marketers would get suspicious, you know, very quickly, like, why is this not performing?

Jacques: [00:03:59] Yeah, I guess you’d see it very quickly.

Augustine Fou: [00:04:02] So the bots are just copying off those links. Yeah. And then clicking it. And then in this one case, it just went haywire. So, there were more clicks than ad impressions. And we see that in the data as well.
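
The sanity check described here (more clicks than impressions should never happen) can be sketched in a few lines of Python. This is only an illustration: the `PlacementStats` record, the placement names, the numbers, and the 10% click-through ceiling are all assumptions made for the example, not anything taken from the conversation.

```python
from dataclasses import dataclass


@dataclass
class PlacementStats:
    """One row of a (hypothetical) campaign report."""
    placement: str
    impressions: int
    clicks: int


def suspicious_placements(rows, max_ctr=0.10):
    """Return (placement, reason) pairs for rows that look bot-driven.

    max_ctr is an assumed ceiling for a believable click-through rate;
    tune it to whatever is normal for your own campaigns.
    """
    flagged = []
    for row in rows:
        if row.clicks > row.impressions:
            # Impossible for real users: every click needs an impression.
            flagged.append((row.placement, "more clicks than impressions"))
        elif row.impressions > 0 and row.clicks / row.impressions > max_ctr:
            flagged.append((row.placement, "implausibly high click-through rate"))
    return flagged


# Made-up report rows for illustration only.
report = [
    PlacementStats("news-site-banner", 10_000, 12),
    PlacementStats("unknown-network-a", 5_000, 6_200),
    PlacementStats("unknown-network-b", 2_000, 700),
]

for placement, reason in suspicious_placements(report):
    print(f"{placement}: {reason}")
```

A check like this only needs the numbers a marketer already has in their own reporting, which is the point being made: some fraud shows up in plain metrics before any detection vendor is involved.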

Jacques: [00:04:14] So, I guess the bots have gotten significantly more sophisticated from, I guess, even a few years ago, where, as you say, it was very much sufficient to just have a bot that views an impression or clicks an advert, versus one that now needs to actually interact with the website that it clicks through to.

Augustine Fou: [00:04:32] So you can think of it as a wide range of capabilities because some of the bots don’t actually have to work that hard. Right? If their job is to cause an impression to load, in some cases, they can just load the webpage and the impressions will load. All right. So that’s the simplest of the bots, but then there are other cases where the bots have to be more advanced so that they can defeat fraud detection.

So in some of those cases, they might need to be malware on real devices, because if it’s on a real device and the device is on, say your home network, the IP address is going to look right, because it’s a residential IP address. The device is going to look right, because it’s actually a real iPhone. Right.

But the malware is working in the background and loading lots and lots of ad impressions. So, I would characterize it as there’s a wide range of capabilities or sophistication in terms of the bots, but they only need to work as hard as they need to work to make money.

Jacques: [00:05:28] Yeah, I think that’s probably an important point here: the only incentive to be more sophisticated is the fact that they start getting caught. And at that stage, I guess it’s a race to the bottom.

Augustine Fou: [00:05:41] Yeah. I mean, some of these bots, they can now regularly fake mouse movement, page scrolling and clicks. So even those things are no longer reliable indicators that it’s a human interacting with your page.

Jacques: [00:05:53] So I guess that kind of brings us to the next question, which is, how do you detect this fraud? How do you detect that it’s these bots generating impressions on adverts rather than legitimate users?

Augustine Fou: [00:06:04] Yeah, so I’ll, I’ll talk about how we do it. And you know, I really don’t know how the other companies do it because they’re all black box and, you know, obviously they won’t reveal their secret sauce, but I’ll talk about how I do it. And this goes back to the range of different bots, right? Some are pretty rudimentary.

Those you can catch simply by looking for discrepancies in the JavaScript. Right? So, JavaScript is a standard language; Google Analytics uses JavaScript to collect data. But we don’t use it for quantity purposes, like how many people came to your site, how long they stayed, how many pages they looked at.

Instead, we actually triangulate two or more parameters together to look for discrepancies. So, I’ll use a very simple example, right? If the bot says it’s an iPhone, but the screen resolution that we detect is 1920 by 1080, that’s not a normal iPhone screen resolution. So that’s a red flag. Right. And if the IP address is a data center, like Amazon Web Services, that’s another red flag, so on and so forth.
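
As a rough sketch of this kind of cross-checking, the Python below compares a declared user agent against the reported screen size and the source IP address. It is not the speaker’s actual method, which is not described in detail. The field names, the `MAX_PHONE_LONG_SIDE` threshold, and the "data-center" network (a reserved documentation range, not a real provider’s range) are all assumptions for illustration.

```python
import ipaddress

# Placeholder "data-center" range. 203.0.113.0/24 is reserved for
# documentation; a real check would use published cloud-provider ranges.
DATACENTER_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Assumed upper bound (in pixels) for the longer side of a phone screen
# as reported to JavaScript. Hypothetical threshold, not a standard.
MAX_PHONE_LONG_SIDE = 1000


def red_flags(hit):
    """Return a list of mismatches found in one page view record."""
    flags = []
    width, height = hit["screen"]

    # Declared device vs. reported screen size.
    if "iPhone" in hit["user_agent"] and max(width, height) > MAX_PHONE_LONG_SIDE:
        flags.append("declared iPhone with desktop-sized screen")

    # Source address vs. known hosting ranges.
    address = ipaddress.ip_address(hit["ip"])
    if any(address in network for network in DATACENTER_NETWORKS):
        flags.append("data-center IP address")

    return flags


# Made-up page view for illustration.
hit = {
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)",
    "screen": (1920, 1080),
    "ip": "203.0.113.7",
}
print(red_flags(hit))
```

Each rule on its own is weak evidence; the idea is that a record collecting several flags at once is much harder to explain as a real person.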

So when you have dozens and dozens of these mismatches in the data, then you can pretty much be sure that that one particular impression or that one particular page view was being generated by a bot. And these are the most rudimentary ones. But what happens if we are dealing with more advanced bots that are faking the mouse movement, faking the page scrolling and that kind of stuff? There, we’re actually looking at what we call entropy analysis.

Right. And so, we’re looking at timings and we’re looking at order and disorder. So essentially, we’re looking for things that are just too disordered or too abnormal, if you will. Right. So, everyone’s familiar with bell-shaped curves, right? Now, I’ll use a kind of simple example to illustrate.

Servers have very powerful CPUs; desktops and laptops have medium-powerful CPUs, okay? Then, you can imagine, mobile devices and tablets have the least powerful CPUs. So, when we ask the browser to do a computation or complete a sort or some kind of mathematical function, we can actually time how long it takes for it to complete it.

So, if you imagine these bell-shaped curves, a server would actually complete those calculations in the shortest amount of time. And so, there’s a bell-shaped curve centered around very low timeframes. Desktops and laptops will have a bell-shaped curve centered around a slightly longer time. It’ll take a little bit longer, right?

So that hump of that bell-shaped curve would be a little higher. And then the last one would be mobile devices. It should actually take the longest for a mobile phone or a tablet, you know, with not a very powerful CPU, to complete the calculation. So now if the user agent says it’s a tablet or a smartphone, but the computation is completed in server-like timeframes, right, something is wrong with that.

So those are the kinds of discrepancies in the timings that we look for, and that’s just one parameter. Imagine we have 300 parameters, and we look across every single parameter and look for outliers, the ones that look abnormal. And then also when we take pairs of these parameters or triplets or quadruplets, we’re looking in kind of four-dimensional space to see whether these behaviors are out of the normal range.

And then when we see that, we can now confidently mark it as something that’s not a human.
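
The bell-curve comparison can be illustrated with a simple z-score: how many standard deviations a measured benchmark time sits from the typical time for the device class the visitor claims to be. This is a simplified stand-in for the entropy analysis mentioned above, not a description of it. The baseline timings below are invented numbers, and the three-sigma threshold is an assumption.

```python
from statistics import mean, stdev

# Invented benchmark timings (milliseconds) per declared device class.
# Real baselines would come from measurements of known-good traffic.
BASELINE_MS = {
    "server": [4, 5, 5, 6, 5, 4, 6, 5],
    "desktop": [18, 20, 22, 19, 21, 20, 23, 17],
    "mobile": [55, 60, 65, 58, 62, 61, 57, 63],
}


def timing_z_score(declared_class, measured_ms):
    """Distance of a measurement from the declared class's mean, in standard deviations."""
    sample = BASELINE_MS[declared_class]
    return (measured_ms - mean(sample)) / stdev(sample)


def is_timing_outlier(declared_class, measured_ms, threshold=3.0):
    """True if the timing is too far from what the declared device should produce."""
    return abs(timing_z_score(declared_class, measured_ms)) > threshold


# A visitor that says it is a phone but finishes the task in server-like time.
print(is_timing_outlier("mobile", 5))   # far below the mobile curve
print(is_timing_outlier("mobile", 59))  # well inside the mobile curve
```

The same per-parameter test could be repeated across many parameters, and extended to pairs or triplets of parameters, which is where a declared value and an observed behavior become hard to fake together.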

Jacques: [00:09:25] That’s really interesting, and I think what you’re touching on is a lot of data analysis. And, I mean, speaking entirely from a personal perspective, I don’t have a stats background. My maths is, let’s say, rudimentary at best. And a lot of the things you’re talking about here are very sophisticated statistical techniques.

Augustine Fou: [00:09:44] So I didn’t do that myself. I work with an actual stats guy, but at least I can kind of explain it to someone in layman’s terms. Right. So, I have to understand how it works and then my stats guy helped me turn it into code. So, these are the kinds of things. And I think the long story short is it’s really hard for the bots to fake every single parameter correctly.

It’s very easy for them to fake a single parameter by itself correctly. They can say the screen resolution is this. They can say the user agent is a Roku stick. So, you know, when we talk about CTV fraud, all the bots have to do is say that they’re a Roku device. It’s literally that simple right now.

So, in those cases, any single parameter can be easily faked by the bots, because they’re just declaring it. But combinations of parameters will be a little bit harder and combinations of timings will be even harder. So those are kind of layers that we use to detect the bots and it kind of corresponds to the levels of sophistication, right?

Some bots are really rudimentary, and we even see amateurs or script kiddies get away with ad fraud. All they have to do is spin up a bunch of headless browsers in the data center to repeatedly load web pages on a site that they own. So, they’re just generating traffic on a site that carries ads and they can make some pocket change, lunch money.

But then there are the larger scale operations where they’re using much more advanced botnets, whether it’s made from malware or other devices, or maybe browser extensions or toolbars and things like that. As long as they have an internet connection and can load webpages or HTML, they can repeatedly load ad impressions and web pages to make money.

So those are the kind of larger scale type thing. So, there’s a huge range of sophistication.

Jacques: [00:11:26] Yeah. And I guess, so you mentioned that anyone can run these scripts on a data center. Has there been a rise in ad fraud since the availability of these huge data centers like AWS, like GCloud, et cetera?

Augustine Fou: [00:11:38] You know, previously bad guys actually would have to buy their own computer hardware, right? Buy servers, buy bandwidth in the data center. Now you pay for only what you use. So, you know, cloud has just made it so much more accessible because you don’t have to have upfront capital costs. That’s why it’s made ad fraud accessible to script kiddies.

And you know, some fraudsters are also selling their wares, right? So, there are software programs that help you simulate a desktop browser. So, it will say it’s this user agent, like a Safari browser or something, it’ll help you rotate the IP address, so you can disguise your location.

You think back to the movies, the crime movies where the bad guys are bouncing telephone calls around the world to stay hidden, right? Same thing’s happening here. The bots are bouncing their traffic through residential proxies. So, it’s not so obvious that they’re coming from data centers.

So, you’re right. You know, all these cloud services have just eliminated that barrier to entry because you don’t have to pay for hardware upfront. You can just pay as you go. So, it’s just made it a lot easier. And it happened to correspond in time with the rise of programmatic as well. So, a confluence of different things just made fraud a lot easier to do, and therefore it’s much larger scale than it ever could have been before these services came along.

Jacques: [00:13:00] So what’s interesting is, all of these big cloud services are being provided by companies that by and large make significant parts of, if not the most significant part of their revenue from advertising itself, from providing the bidding process or the infrastructure to run this advertising.

Augustine Fou: [00:13:17] So, I don’t think they’re committing the fraud, but they’re certainly complicit in it because they allow it to happen. They either provide the infrastructure or they look the other way and allow it to happen. The incentives are simply misaligned because if you think about an ad exchange, they make money based on the number of ad impressions that flow through their systems.

So, the larger the quantity, the larger their profits. So even though they’re not the ones operating the botnets and committing the fraud itself, they’re very happy about the volumes that are created by the fraud. So, they’re not in a huge hurry to shut it down because that could mean lopping off 50% of their revenue.

All right. So, it’s really a case of misaligned incentives and the middlemen that benefit from the flow. So literally, going from agencies to exchanges to some of the networks of sites, they have no incentive to solve the fraud. And that’s why we see that it persists.

Jacques: [00:14:15] So one of the big topics, well, I guess two years ago but still kind of ongoing, is the GDPR and recent legislation around data privacy. And one of the big impacts that that appears to be having, most recently with the Dutch regulator’s ruling on the IAB framework, is to do with the RTB process, with digital advertising and how it doesn’t actually conform to the standards of privacy and consent that it’s required to conform to. Do you think there’s been any sort of impact from the GDPR and some of the legislation?

Augustine Fou: [00:14:49] Yeah. So not specifically on fraud, but I think the movement towards privacy is definitely going to be good for consumers. Because I think consumers were simply unaware of that happening. So, when a consumer goes to the New York Times, they believe that they’re interacting with that publisher. What they don’t know is the hundreds of trackers that are loaded onto the page behind the scenes that are basically collecting and siphoning data off to other ad tech companies, none of which the consumer would ever recognize, even if the name of the company was told to them.

So I think these privacy regulations are overall going to be better for the consumer’s actual privacy and will help to kind of reverse the tide of what we call surveillance capitalism, where they’re collecting data for the purpose of buying and selling the data and making profits from that.

So, I think all of this awareness started to come around in 2016 with the Facebook Cambridge Analytica scandal. And that’s, again, because consumers assumed that they were interacting with Facebook. What they didn’t know was that Facebook was selling their data or allowing other third parties to come in and buy their data.

So, Cambridge Analytica was doing that and also scraping, so some of them didn’t even buy the data, they would just scrape or collect the data without permission. So, in those cases, when that was brought to the fore, consumers started realizing, Oh, well, a lot of my data is just leaking everywhere.

And all these ad tech companies are profiting off of it. So, when we talk about these privacy regulations, they’re really getting at, did the consumer give these ad tech companies consent to collect their data, to buy and sell it and use it for other purposes?

And so now that these regulations have finally been passed, we’re actually starting to see a glimmer of hope. So, there was a study done by Usercentrics, a consent management platform based in Germany. What they did is they reviewed a number of sites in Germany, the UK and the US. In countries like Germany and the UK, where they’re further along in the enforcement of GDPR, we actually see a lower average number of trackers on websites compared to the US, where it’s still like 400 different trackers on the page.

Right. So, clearly, in certain countries where enforcement has already begun, we’re starting to see a little bit of a reduction in terms of the numbers of trackers and ads that are being shown on sites. So, I think that’s a step in the right direction, if you will. And I think over time, as more of these laws get enforced, more of the ad tech players, the publishers, the marketers are going to actually have to have consent before they collect the data and then use it for targeting. Otherwise they will be in non-compliance.

Jacques: [00:17:35] And it seems like, at least certainly from a marketing perspective, my experience is that marketers don’t necessarily understand the legislation. They don’t necessarily understand that by dropping in Google Tag Manager, with 20 scripts being loaded through Google Tag Manager, they’re facilitating this.

Augustine Fou: [00:17:53] Yeah. Surveillance economy or, you know, some privacy advocates call it the greatest ongoing data breach in the history of mankind. Right. And that kind of is right, because you’re surfing a website and you don’t know that all your data is being collected and siphoned off somewhere.

Right. So, I think increasingly consumers are taking steps to protect themselves. So, they’re using ad blockers and the Brave browser and things like that to kind of block those trackers. But for marketers and publishers, I’m going to talk for just a minute in terms of marketers. You said they’re not as knowledgeable about it.

That’s because historically they’ve just relied on the ad tech companies that serve their ads for them to have collected consent. But the problem is, how does a marketer know that the ad tech company collected consent properly or, you know, is in compliance with the law? So, one of the recommendations I’ve given to marketers is they actually should collect consent for themselves.

And the reason for this is the unknown ad tech companies, right? So maybe you’ve heard of Krux or Lotame or LiveRamp or any of these ad tech companies. Most consumers haven’t, because these are behind-the-scenes B2B companies that buy and sell data. So, if a Nike or a Hershey’s asks the consumer, “Will you give us consent to collect your data so that we can show you ads later?”, they’re probably going to say yes, because it’s an advertiser that they recognize. It’s a company that they recognize.

And so, the advertiser now has their own consent given to them by the consumer so that they can collect their data and use it for showing ads later. So, you know, it helps reduce the complexity and helps reduce the compliance risk for the marketer.

And similarly, for the publishers, they have the human audiences. Humans go to the New York Times, humans go to Hearst or Conde Nast or any of these mainstream legitimate publishers. And if those publishers ask them for consent, they will probably give it to the publisher, right, because they voluntarily went to the site, whereas they don’t know about any of those ad tech companies that are loaded on the page and collecting their data.

So, in those cases, the publishers also have an opportunity now to collect consent from their first-party audiences, the humans that go there. And again, it helps them reduce their compliance risk because they’re not dependent on some other third party, like an ad tech company, to have properly collected consent and to be able to demonstrate that afterwards. So as we move forward, you know, into this new world, if publishers have consent directly from the consumer, and if marketers have consent directly from the consumer, now these two parties can actually do advertising right, with the consent of the consumer.

And it’s almost like we’re getting back to the good old days of digital advertising in 1995, where it’s these three parties and they know the implicit contract of the internet: the consumer knows that they’re getting free content from the site because it’s being ad supported. And that’s where the advertisers can come in.

It really helps to eliminate that fourth leg of the stool that threw it out of balance. The fourth leg being ad tech, right, where they came in, they’re the ones collecting data and they’re the ones maximizing their own profits at the expense of everyone else, at the expense of the other three legs. So, if we can get back to this future of a balanced three-legged stool with the original three parties at the start of the internet, right, advertisers, publishers, and consumers, I think we’d do much better digital marketing.

Jacques: [00:21:33] I think you touched upon kind of an important point there, in that this extra leg within the process not only is costing everyone revenue, but from a publisher standpoint or from an advertiser standpoint, if I want to advertise to the New York Times audience, I can go to Google and target those people without needing to place an advert on the New York Times.

Augustine Fou: [00:21:54] And therein lies a problem, right? Because that’s where fraud comes in as well, because some bad sites are going to pretend to be the New York Times. We call that domain spoofing. So, when you’re buying New York Times inventory for 3 cents or very, very cheaply somewhere else, how do you know it’s actually the New York Times? You don’t, right? So, there’s a lot more opportunity for bad guys to enter the system and commit fraud. So, when you talk about, you know, these direct buys, right, that’s actually a good recommendation for marketers. If they buy from the New York Times, the New York Times will show their ads. As simple as that.

So, what we’re doing is we’re cutting out a lot of the middlemen who were extracting profits for themselves. So I don’t think we’ve mentioned this yet, but there’s ISBA, the association that represents British advertisers, as well as the ANA, the Association of National Advertisers here in the US, and the WFA, the World Federation of Advertisers.

All of these industry trade bodies have conducted industry-wide studies since 2015 to show that at least 50% of the dollar, 50 cents on every dollar that a marketer spends, goes into the pockets of the ad tech middlemen and not towards showing the ad. We call that working media. So did your dollar get through to the publisher so that they can actually show your ad? And all of these studies have consistently found that greater than 50% of the dollar is siphoned off or, you know, captured as profits by the middlemen. So less than 50 cents on the dollar is actually going towards showing ads.

Jacques: [00:23:31] And of that 50%, what is then subjected to ad fraud?

Augustine Fou: [00:23:37] Yeah, that’s on top of the 50%, right? This is literally the cost of buying through the supply chain, simply because these middlemen are taking a cut, but the cost of ad fraud goes on top of that. Right? So, you’re starting with 50 cents because you’ve already paid the 50-cent ad tech tax.

So, marketers, if they reduce or eliminate all of these middlemen by buying directly from the publisher, then much, much more of their dollar goes towards showing ads. Now some will argue, Oh, well, you can’t place ads by hand anymore. That’s correct. You, you still have to use technology.

We’re using programmatic technologies to place the ads, but you’re still buying as direct as possible from the publisher. And then the ads on their ad server will get served up. So, in those cases, I guess the takeaway is shorten the supply chain as much as possible. And not only will that reduce your costs that are going into the pockets of the middlemen, it’ll also reduce the opportunity for fraud to start eating away at your ad budgets.
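
The arithmetic behind "fraud goes on top of the ad tech tax" can be written out as a short Python sketch. The 50% ad tech share comes from the studies mentioned above; the 20% fraud rate is purely a made-up figure for the example, since the conversation puts the real range anywhere from 1% to 99%.

```python
def working_media(spend, ad_tech_share, fraud_rate):
    """Amount of spend that ends up showing ads to real people.

    ad_tech_share: fraction taken by middlemen before the publisher is paid.
    fraud_rate:    fraction of the remaining spend lost to fraud
                   (an assumed input; it varies widely by campaign).
    """
    reaches_publisher = spend * (1 - ad_tech_share)
    return reaches_publisher * (1 - fraud_rate)


# $100 of spend, 50% ad tech tax, hypothetical 20% fraud on what is left.
through_exchanges = working_media(100.0, 0.50, 0.20)

# Same spend bought more directly: assumed 10% tech cost, 2% fraud.
more_direct = working_media(100.0, 0.10, 0.02)

print(f"Long supply chain:  ${through_exchanges:.2f} of $100 reaches humans")
print(f"Short supply chain: ${more_direct:.2f} of $100 reaches humans")
```

Because the two losses multiply rather than add, shortening the supply chain helps twice: less is taken off the top, and less of what remains is exposed to fraud.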

Jacques: [00:24:41] That’s a really interesting point. One of the fun examples, I say fun, one of the examples I’ve seen recently or over the last few years is advertising within newsletters, done via, I guess, programmatic, based on the email address as an identifier. Have you looked into the fraud that exists within that landscape at all?

Augustine Fou: [00:25:05] Yeah. I mean, I guess the general principle I would say is don’t assume that there’s no fraud. All right. So, there’s a couple of things at play here. So, a lot of marketers and platforms have now just assumed that the email address is a useful indicator. But just keep in mind, email addresses are the simplest thing for bots to create more of.

You can just start up unlimited numbers of email addresses. So even though, on the one side, the email address is used for identity in terms of targeting in programmatic channels, don’t assume that there’s low fraud. Now, there are certain industries like the pharmaceutical industry where they’ll use things like an NPI number, which is basically a doctor’s kind of registration number, and that helps you reduce some of the fraud.

But just keep in mind that bots can just copy off that number and replay it. Right. They’re just pretending to be that doctor. Uh, and they’re still able to commit fraud in those systems, even though you thought you were targeting a real doctor. And I think the same phenomenon happens in mobile devices.

Right? So, each mobile device has either an Android ID or an Apple identifier, the IDFA. So, for the bots to successfully pretend to be an iPhone, they can just copy off an IDFA and just keep using that. So, “Oh, it’s a valid mobile identifier, so let’s keep marketing to it.” That doesn’t necessarily mean that it is actually a human, right?

It could still be a bot pretending to be that mobile device. So just the awareness and the vigilance in looking for fraud will be how you mitigate that in your campaigns.

Jacques: [00:26:42] And I guess with the Android and iPhone piece touched on there, we’ve also seen kind of fraud happening through fraudulent SDKs. So, SDKs loading their own adverts instead of a legitimate advertiser’s.

Augustine Fou: [00:26:56] Yep. So, it’s almost like anytime you put some software onto your computer, voluntarily or involuntarily, right? So voluntarily you install some stuff on your PC. For example, there’s a whole bunch of these free VPNs that humans are installing, or these free toolbars and free browser extensions.

So, when you install that, are you sure they’re not committing fraud behind the scenes? You gave them permission to be on your computer and your computer has an internet connection. So, they can continuously load ads in the background to make profits for themselves. And it’s similar on your mobile device, if you have malware or if you have rogue apps, like the flashlight app that just keeps loading ad impressions in the background, or even innocent apps that installed an SDK that’s being controlled by someone else. An SDK is kind of like just a shortcut, right? So, it’s a software development kit that some lazy developers just put into their system so that they can start running ads.

They didn’t want to reinvent the wheel. And so obviously there are real legitimate reasons for that, but sometimes if you put someone else’s code into your app, you run the risk of them misbehaving. So even an innocent app with a misbehaving SDK in it can still be generating ad fraud, and the app maker doesn’t know, and the human that uses the app on their phone doesn’t know. And it’s all generating profits for the SDK maker.

Jacques: [00:28:21] I guess my only experience of SDKs is from a marketing perspective, loading various Martech SDKs onto apps for, you know, for push notifications for…

Augustine Fou: [00:28:33] Yeah, those are probably fine. Those are probably fine, but you know, the bad guys have created a lot of these free SDKs and again, they’re kind of just taking advantage of laziness. So, if a developer doesn’t want to spend time building something, “Oh, here’s an SDK over here. You know, it’s going to help me run ads.

Let me just copy and paste that.” So, when that happens, you know, just think of it as many, many more loopholes where the opportunity for fraud goes up.

Jacques: [00:29:00] And as an attack vector, I guess, SDKs are a pretty easy way in. If you can get into an SDK, you can get into tens of millions of devices.

Augustine Fou: [00:29:09] Exactly. So, you know, if you have a popular app, and different devices like Android versus iPhone, right? So, a lot of people assumed that iPhones were immune to this kind of stuff, but recently there have been several case studies where these SDKs are there in the apps themselves. Some of them again are innocent apps, right? They’re using these SDKs for running ads, but because the SDK was in there and loaded voluntarily on the iPhone, it provided a back door for all types of malicious activities. So, the SDK could then see all the activity that was happening on the phone, as well as sideload other apps.

So, the security implication of that is, when a user is typing their passwords, that can be observed. So they’re basically harvesting passwords when users are logging into their bank accounts. All of that can be observed by the SDK, and then furthermore, they can sideload other malicious apps and, oh, by the way, you can set the app icon to be transparent.

So, the human can’t even see that it’s loaded on their phone. So, all of these are literally loopholes that consumers are not aware of, and not enough publishers or marketers are aware of. So, you know, the reason I’m saying this is, as a security researcher and as an ad fraud researcher, these are the kinds of things that I see.

And so, you have to account for the possibility that, you know, each specific thing like an SDK is opening up loopholes or vulnerabilities in the system. So, it might be fraud. It might be malware. It might be, you know, harvesting consumers’ passwords or things like that.

Jacques: [00:30:47] So I guess the general advice in these instances is to just shorten the supply chain as much as possible.

Augustine Fou: [00:30:53] Yeah, and use, uh, SDKs or software from reputable sources that have been around for a long time. Right? I mean, yes, it does make it harder for startups and new entrants, but they have to prove, uh, they are trustworthy over time. Right. You can’t just say, Oh, I’m trustworthy, and then everyone believes you. You have to earn and prove your trustworthiness over time.

So I think a lot of these things were just common sense things that were just skipped over or, um, you know, just bypassed because, you know, everyone wanted to make more money as quickly as possible. So that’s part of what we’ve seen happen in ad tech, right? In the chase for larger quantities in these new channels, like mobile and then video, and then now CTV, a lot of those marketers have thrown caution to the wind and that’s why we’re seeing a lot of the dollars being eaten up by fraud.

Jacques: [00:31:45] I guess I recall that Criteo, I believe they’re an ad network, was caught kind of fingerprinting people, and that kind of caused a bit of a furore because people understandably kind of wanted to be able to delete their cookies and, you know, Criteo and others were able to fingerprint them and track them regardless of what cookies they have on their device.

Augustine Fou: [00:32:05] Yeah, I wouldn’t necessarily fault Criteo for that. It’s such a standard practice among ad tech companies. And the reason for that is, you know, way back in 2010, the EFF, which is kind of like, uh, you know, an advocacy organization for consumer privacy, they pointed out that if cookies were deleted, you could simply fingerprint the device so that you could still re-identify the person anyway.

And fingerprinting, for the audience, just means taking a bunch of JavaScript parameters and smashing them together. Right. So, if you have enough of these parameters, you essentially have a way of uniquely identifying that device. It’s the IP address, it’s the browser, it’s the operating system, it’s the exact screen resolution, the list of plugins and fonts.

So, you can imagine that’s specific enough to that individual that you can say, Oh, okay, well it’s this person’s or this device’s fingerprint. So that’s been going on for 10 years, at least. Right. And so for other companies, you know, cookies were just a convenience, right? They could just set a cookie and then if you see the same cookie, you know, it’s the same user coming back to your site, but when you track them across different websites, a lot of these companies have been using fingerprints for a long time.

So, I wouldn’t necessarily fault Criteo or single them out because it was such a common practice. Right now, even with third-party cookies going away in coming months because browsers are actually starting to enforce those privacy things on behalf of the consumer, it still doesn’t matter for these ad tech companies because they have years of practice doing fingerprinting anyway.
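
For readers who want to see the idea concretely, here is a minimal Python sketch of fingerprinting as described above: several device parameters are joined and hashed into a single identifier. The attribute names and values are invented for illustration, and the use of SHA-256 is an assumption; real ad tech scripts gather these values in JavaScript and may combine them differently.

```python
import hashlib


def device_fingerprint(params: dict) -> str:
    """Combine browser-reported attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "|".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
visit_one = {
    "ip": "203.0.113.7",
    "browser": "Firefox 78",
    "os": "Windows 10",
    "screen": "1920x1080",
    "plugins": "pdf-viewer",
    "fonts": "Arial,Calibri,Verdana",
}
# The same device later, after its cookies were deleted: attributes unchanged.
visit_two = dict(visit_one)
# A different device differs in at least one attribute.
other_device = dict(visit_one, screen="1366x768")

same = device_fingerprint(visit_one) == device_fingerprint(visit_two)
different = device_fingerprint(visit_one) != device_fingerprint(other_device)
```

No cookie is involved at any point, which is why deleting cookies does not stop this kind of re-identification.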

Jacques: [00:33:37] I was just about to say, do you think we’ll see more fingerprinting as a result of third-party cookies trying to…

Augustine Fou: [00:33:45] Yes, I mean, basically these companies are going to be forced to use that as the primary mechanism. Right. And so it then boils down to the same ad tech arms race, right? So certain browsers will have mechanisms to defeat fingerprinting, and then, you know, these fingerprinting technologies will have new methods to continue. It’s exactly the same as what’s happening in ad fraud. Right? So, when a fraud detection tech company detects some kind of fraud, the bad guys will say, Oh, well, these bots are not making money anymore. So, let’s innovate or let’s tweak them so that they continue to get by detection again. So, it’s a continuous arms race between the good guys and bad guys.

And ultimately, you know, it costs more time. It costs more compute power, and ultimately these costs get passed along, whether it’s to the publisher, the marketer, or the end consumer. Right, because these ad tech companies are still going to profit from it, right? They’ll say, Oh, you know, buy our fraud detection, because we can detect these bots.

So all of these are extra costs that wouldn’t have been necessary in the first place if marketers just paid publishers directly to show their ads and the consumer’s privacy wouldn’t be violated in the way it is right now, you know, after 10 years.

Jacques: [00:35:04] So one of the things you said earlier was, you were looking at how a site like the New York Times, or really any big publisher, the cookies that are being loaded, what scripts are being loaded from other scripts and that kind of visualization. I know that you have put together this tool that allows individuals to kind of visualize, um, what’s been loaded on the websites they visit.

Augustine Fou: [00:35:26] Yeah, it’s called page x-ray. Uh, so anyone interested in checking it out it’s pagexray.fouanalytics.com/

Jacques: [00:35:34] I will put a link to that in the podcast description.

Augustine Fou: [00:35:37] Yep. So, it’s a way to see what loads, what. So, there’s only a certain amount of these trackers that you can see when you view source on the page. And those are the ones that the publisher has voluntarily put on the page itself. But the problem is some of these JavaScript, when it runs, it’s going to call other JavaScript and other tags and other ads and things like that.

So, it leads to this entire cascade, or you can kind of think of it as a tree visualization of all these branches branching out from that set that was loaded on the page originally. So, part of the page x-ray tool is to kind of show, um, anyone that there are all these things that are happening on the page that you may not be aware of.

And that’s not going to be evident, even if you looked at the page source by itself. And page x-ray’s a tool that privacy researchers are now using because we also show the flag of the country in which those servers are located. So, if a tag or tracker is loaded from a server in another country, then the data that’s being collected is being shuttled off to that other country.

And that becomes important under GDPR regulations, because for example, the EU has much stricter privacy regulations than the US, so when you see data being shuttled out of the EU to the US, that’s problematic, right? So, privacy researchers are now using that to say, okay, where is the data going? And are any of the trackers being called from servers in the US, and therefore is the data leaking to the US, which is a no-no?
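
The cascade described above can be sketched as a small tree in Python. This is only an illustration, not how page x-ray itself is built: the `Request` class, the example URLs, and the pre-resolved `country` field are all hypothetical, and a real tool would have to record which request triggered which and look up each server’s location.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    url: str
    country: str  # hypothetical: server country, assumed already resolved
    children: list = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, request) for the page and everything it loads, recursively."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def outside_region(root, allowed):
    """List every loaded URL whose server sits outside the allowed countries."""
    return [n.url for _, n in walk(root) if n.country not in allowed]


# Depth 1 is what "view source" shows; deeper levels are loaded by other scripts.
page = Request("https://news.example/", "DE", [
    Request("https://cdn.example/tag.js", "DE", [
        Request("https://tracker.example/pixel", "US"),
        Request("https://ads.example/bid.js", "US", [
            Request("https://sync.example/match", "SG"),
        ]),
    ]),
    Request("https://static.example/app.js", "DE"),
])

hidden = [n.url for depth, n in walk(page) if depth >= 2]
leaks = outside_region(page, allowed={"DE", "FR"})
```

In this made-up page, all three requests that leave the allowed countries are also ones that never appear in the page source, which is the situation the flags in the tool are meant to surface.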

Jacques: [00:37:13] What’s kind of the worst example of one of these tree diagrams you’ve seen?

Augustine Fou: [00:37:19] I would say the, the fraudulent websites or the less than scrupulous ones tend to have a lot more of the trackers because part of their business model is to buy and sell data. So, the more users they can track, the larger the data set they have to sell. So categorically consumers should be very wary of these free recipe sites. Recipe sites by far have the largest number of trackers because on the one side, humans do look for recipes.

So, these are probably valid human users and therefore it makes them extremely valuable to ad tech companies to be able to target them, to separate them from the bots. So, I won’t name specific sites, but categorically recipe sites, weather sites, uh, some news sites, fake or otherwise, will have by far the largest number of trackers, whereas some of these mainstream sites, uh, you know, if you think about New York Times or, uh, not Wall Street Journal, they have a ton of, uh, uh, trackers, but things like ESPN, some of these sites don’t depend on loading so many trackers on the page and therefore it’s a much better consumer experience for the users. Because it’s the trackers that are actually causing your browser to slow down. So, you’re there for the content, but you don’t know there are 300 other things being loaded into the page. It’s slowing down your browser.

So I think that’s part of why some consumers are, are taking steps to protect themselves by using ad blockers or ad blocking browsers that also block these trackers.

Jacques: [00:38:47] You also said that these websites that are committing fraud are buying and selling data, um, but they’re also buying and selling data that is from bots. Are they selling this fake data to ad networks?

Augustine Fou: [00:39:02] Yes, very easily. So, I’ll use a very specific example. If you think about the pharmaceutical industry, they want to get their ads in front of doctors. So that’s what we call a high value segment, audience segment for those companies. So how does ad tech think that they’re doctors? Right? Cause in most cases, the doctors don’t log into the site and so they have to just observe their behavior, like what list of sites they visit.

So, if a particular user visits New England Journal of Medicine, Journal of Clinical Oncology, some of these medical journal sites, you would deduce that they’re a human, right, and they’re a doctor. So, what the bad guys are doing is having their bots deliberately visit these sites. These sites don’t have fraud on them, but what the bots are doing is they’re deliberately visiting a collection of reputable sites to make themselves appear to be a certain audience segment.

So, when these bots make themselves appear to be doctors or oncologists or specialists, then that becomes a highly desirable audience for the pharmaceutical marketers to target. And so, the ad tech companies will say, Oh, well, here’s a doctor that just showed up on this no-name site over here.

They think it’s a doctor simply because they observed the behavior before. So, then this user, this bot, visits what we call a cash out site. It’s a site where, if the ad loads on that site, that site makes the money. So, they’re basically tricking the marketer into thinking they’re showing an ad to a doctor that’s going onto this no-name website for some reason.

And that’s how they’re siphoning dollars away from the marketers and away from legitimate publishers, like the medical journals, going to these cash out sites. And so, in that case, bots are for sure able to pretend to be certain audience segments for the purpose of tricking the marketer into targeting ads to them.

Jacques: [00:40:52] Okay. So you’ve mentioned doctors and medicine as being a good example. What else is particularly prevalent with this kind of fraud?

Augustine Fou: [00:41:02] So we call this retargeting fraud, right? So, you’re retargeting a user that visited either your site or a collection of sites because you’re deducing their behavior and what they want. But this is pretty rampant. And, you know, typically marketers pay higher CPMs for retargeting. And the original idea was fine, right?

If a person visited your site, they’ve kind of expressed interest. And so, you want to retarget them with an ad, hopefully to get them to actually trigger the purchase. So, the original idea was fine, but the idea didn’t take into account that bots could take advantage of it.

So, throughout the course of the year, it’s not just the high value audience segments, like physicians, that, uh, are affecting, say, pharmaceutical marketers. It’s literally every single industry. Right. So, uh, the bots would deliberately look at backpacks during back to school season. So, all the marketers that are targeting back to school audiences will get tricked by them.

The bots might look at swing sets in the spring, when consumers are shopping for swing sets. So literally it goes across the board: any industry vertical, any topic area where bots can mimic the behavior. And so, we’ve seen cases over the years where data sellers like Lotame had to purge 400 million profiles from their database of 4 billion profiles.

So, it’s a 10% purge because all of those were known to be bots. So, they had to do it, otherwise they’d go out of business. So, I applaud them for doing that, but you know, why stop at 400 million? There might be more bots in the remaining 3.6 billion, right? So, they need to keep looking and keep cleaning it out because the bots are constantly behaving in certain ways so that they can appear to be a desirable audience segment, so that they can keep marketers’ money flowing into the pockets of the bad guys.

Jacques: [00:42:49] What can marketers do to eliminate this kind of fraud and to protect their budgets?

Augustine Fou: [00:42:55] It’s going to be common sense. You know, people would just kind of shrug at that and say, how is it possible that it’s common sense, right? These bots are sophisticated, whatever. But you should be able to see it in the data. So, you know, I’ll cite some almost comical examples over the years, right?

Where these data platforms, data sellers, will say, Oh, well there’s 300 million auto intenders in the US. Right. Uh, but there’s only 300 million people in the US, right? That number is just way out of whack with reality, because there are some babies that are not going to buy your car. There are some old people who are not going to buy your car.

So, there can’t be 300 million auto intenders in the US, as an example. So, if you just use your common sense to double check things, or just look more closely and ask harder questions, that’s going to go a long way to solving, you know, a lot of the fraud already. Once you take care of this basic stuff, then you can go on to using more advanced technologies to help you find the remaining fraud.
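
The auto intender example amounts to a one-line plausibility check, sketched here in Python. The `max_share` threshold of 50% and the second segment are assumptions made up for illustration; the 300 million figures are the rough numbers used in the conversation.

```python
def implausible_segments(claimed, population, max_share=0.5):
    """Flag audience segments that claim too large a share of the population."""
    flagged = {}
    for name, size in claimed.items():
        share = size / population
        if share > max_share:  # threshold is an arbitrary illustrative choice
            flagged[name] = round(share, 2)
    return flagged


US_POPULATION = 300_000_000  # rough figure used in the conversation

claimed = {
    "auto intenders": 300_000_000,   # the claim quoted above
    "swing set shoppers": 4_000_000,  # hypothetical comparison segment
}
flagged = implausible_segments(claimed, US_POPULATION)
```

Here only the auto intender segment is flagged, at 100% of the population. The point is not the exact threshold but that the check needs no special technology at all.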

But I would say in general, there’s been this kind of euphoria around buying more data, buying more ad impressions, right? And a lot of these marketers have kind of taken their TV buying mentality of reach and frequency into digital channels. And they’re still asking for more reach and more frequency. You know, like more ads, how many more billions of ads can you buy for me, and how low of a price can you get me?

So, the agencies love to oblige. They’re happily doing that for you because in programmatic channels, there’s virtually limitless supply and that’s because it’s not tied to reality anymore. It’s not tied to humans going to websites anymore. They can generate unlimited quantities of impressions using bot traffic and fake sites.

So that’s how you can keep buying more and more quantity, seemingly without any, any limits or boundaries and the prices still go down. And so that’s why there’s been this kind of abnormal euphoria around buying more and more. I mean, I’ve used the term they’re buying digital ads as if they were shopping at Costco, right.

It’s like, it’s okay to, to buy toilet paper in bulk, but definitely don’t buy digital ads in bulk because you’re going to be getting crap.

Jacques: [00:45:08] We saw a great example of that with Uber who sued their ad agency.

Augustine Fou: [00:45:14] They did. Uh, so the first lawsuit was they sued their agency, but the agency was able to push back and say, well, you told us to keep buying this stuff, even though we told you it was complete crap, we could see it in the data. So that lawsuit didn’t go anywhere. So, Uber is now engaging in a second lawsuit and they’re basically suing a hundred mobile exchanges for outright fraud. And they could actually name five of them, but 95 of those mobile exchanges are John Does. They could not even tell who they were. And what had happened is these mobile exchanges were falsifying the placement reports. They were making it look like the ads ran on legitimate sites and apps when they didn’t.

And in some cases, these mobile exchanges were fabricating, uh, making up the placement reports entirely when no ads even ran. So that’s plain old mail fraud, wire fraud. It was not what the marketer thought they were buying and paid for. So that’s clear-cut fraud. It’s not that, Oh, you know, you were buying something that we thought was fraudulent or not.

So, in this case, I think they will be able to move forward in that lawsuit. But unfortunately, a lot of those mobile exchanges don’t exist anymore. So, there’s nobody to even chase down and put in jail anymore. So that was the problem of, you know, when Uber just went on and said, just get us more app installs. You know, and then, to kind of close that story.

Uh, Kevin Frisch over at Uber, he’s the analytics guy. When he looked at the data and when they paused 80% of their app install budgets, the app installs just continued. And that’s because humans actually wanted to install the Uber app anyway. We call those organic installs. So, humans were installing that, but what the fraudsters were doing is they were doing click flooding or click injection, where they’re basically claiming credit for that install.

They were saying that it was due to a paid marketing campaign, as opposed to the human just voluntarily installing the app itself. And so, when they claim credit for that, they were tricking the attribution systems into saying, oh, well, there’s all these installs that happened because of this paid marketing program.

Therefore, we need to pay all these app install fees, right, they call it a cost per install, to these mobile exchanges for helping us do that, when none of it actually occurred or it wasn’t because of the ads running and humans clicking on them. So, these are well-documented examples where the marketer found out far too late.

And now their only recourse is to sue or try to get their money back. But I’ll say, you know, once the money’s gone and in the pockets of the bad guys, you’re never getting it back. So, it’ll be a much better practice to reduce the fraud or buy more cleanly to begin with. So, your money doesn’t flow into the pockets of the bad guys in the first place, because you’re never getting it back afterwards.
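
To make the click flooding mechanism above concrete, here is a deliberately simplified last-click attribution model in Python. It is an assumption-laden sketch, not any real attribution vendor’s logic: the function name, the seven-day window, the device IDs, and the exchange name are all hypothetical.

```python
def attribute_installs(installs, clicks, window=7 * 24 * 3600):
    """Credit each install to the most recent click from the same device.

    installs: {device_id: install_time_in_seconds}
    clicks:   list of (device_id, click_time_in_seconds, source)
    Installs with no qualifying click in the window count as organic.
    """
    credited = {}
    for device, install_time in installs.items():
        candidates = [
            (t, source)
            for d, t, source in clicks
            if d == device and 0 <= install_time - t <= window
        ]
        credited[device] = max(candidates)[1] if candidates else "organic"
    return credited


# Two people install the app on their own; no ad was involved.
installs = {"dev-a": 1000, "dev-b": 2000}

honest = attribute_installs(installs, clicks=[])

# Click flooding: a fraudster reports fake clicks for as many devices as
# possible, hoping some of those devices install the app anyway.
flood = [("dev-a", 900, "exchange-x"), ("dev-b", 1900, "exchange-x")]
flooded = attribute_installs(installs, clicks=flood)
```

With no clicks, both installs are organic; with the flood, the same two installs are credited to the exchange and a cost per install is owed on each. That is also why pausing the spend, as Uber did, left the install numbers unchanged.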

Jacques: [00:48:07] And when you say bad guys, I guess it probably is very much organized crime in this case.

Augustine Fou: [00:48:12] Well, there’s a lot of bad guys. Um, but yeah, I’ve been asked that question quite a bit as well. So, who are the bad guys? How many of them are there? Um, there’s layers, if you will, right. Think about a layer cake. There’s only a small number of hackers that are necessary because these hackers maintain vast botnets and they’re basically renting time on their botnets. You couldn’t possibly use all the traffic that their botnets could generate. So, traffic sellers, or sites that want traffic, are just renting time, kind of like a timeshare on this botnet. So, then you say, okay, we need exactly 10 million page views to our site. Here’s how much we’re going to pay you.

So, there’s a small number of hackers that have botnets that are ready to go. So, they can generate any number of impressions or any quantity of traffic that you want to buy. Then there’s a whole layer of middlemen: traffic sellers and resellers. They’re basically buying this and marking it up.

They’re just being entrepreneurs. They’re buying low and selling high. That’s all it is: arbitrage. And then there’s the exchanges. So, the smaller exchanges will have no-name sites that, again, have no humans going there. So, the only way they can increase their revenue is by using bot traffic. So, they do.

And so, these smaller exchanges generate a bunch of impressions, then they sell it to medium sized exchanges that then sell it to large exchanges. So again, you know, you said organized crime. Yes. The money is so large that organized crime is getting involved in it because they can own an exchange. Right?

They can own some of the pipes. They can be multiple layers in the layer cake. Right. I call them vertically integrated bad guys. So, they might as well extract money from every layer, every step in that supply chain. But yes, the dollars are so large. Like we said at the beginning of this podcast, it’s $150 billion here in the US and $350 billion worldwide, year after year.

So, this enormous pot of gold gets refilled every year by marketers and the bad guys are overjoyed in stealing from it because it’s so easy. They can do it with code. They can do it from the comfort of their homes. They never have to leave their home. It’s not like they have to pick up a gun and shoot someone or risk their own lives doing it, like robbing a bank.

They can just sit here and use software. So, it’s the most lucrative industry for, for criminals and it’s the easiest to commit.

Jacques: [00:50:34] So you own, or you built, a product called Fou Analytics. Can you tell me a little bit about that?

Augustine Fou: [00:50:40] Yeah, so it’s called analytics even though it started out as fraud detection because, you know, in 2012, 2013, when I started building the technology, well, the reason I built it is because I couldn’t trust data from anyone else. And so, think about that Uber example, they just falsified the records.

So even if they gave me log files, I could not rely on it. Because it could have been tampered with. So, when we started building the technology for the first eight years it was basically a dashboard that I logged into and then I screenshotted from it and then showed my clients these recommendations.

I kind of delivered it via PowerPoint, but this year I’m opening up the platform to others. So just like you log into Google Analytics, marketers, publishers, agencies can now log into Fou Analytics and see for themselves. So, it solves one of the key problems with fraud detection, because the fraud detection tech companies, again, are black boxes.

So, they don’t tell you how they measured it. They just give you a number it’s like 10% bots or 10% IVT, but they can’t explain it. They won’t explain it. And therefore, the buyers of those technologies are kind of still left with, okay, well, how did they get that number? So, the point of Fou Analytics is so that the marketers can log into a dashboard. They can see how much bot activity there is, and then they can also then isolate, okay, well, this is the dark red, where’s it coming from? And if they can see where it’s coming from usually it’s like a domain or a bunch of apps that are committing the fraud. They can take the next step, which is turn off those domains and apps in the media buy.

And so again, that solves the black box issue because they can now see and understand why it’s being marked as fraudulent, and then they can take action. So, it’s actionable as well. So, you know, like when you just use the other fraud detection, tech companies, and they tell you it’s 10%, what can you do?

The only thing a marketer can do is ask for 10% back, a refund, after the campaign is over. But in this case, every week, every month, or however frequently you want, if you can keep turning off the domains and apps that are eating up your ad budget fraudulently, you’re progressively cleaning your campaign while it’s still running.

So that way your campaign can actually run more cleanly before it’s over. And you know, marketers are afraid: Oh, then do we have to turn off thousands and thousands of sites and apps? No, you really only have to focus on the top 10 most egregious ones. So, if it’s an app that’s eating up only 50 impressions, it really doesn’t matter in the greater scheme.

But if this flashlight app is eating up 5% or 10% of all of your impressions, that’s significant. So, you definitely want to turn those off first. So, part of what I do is I train the marketers on how to look at the analytics themselves. So that they can actually be part of the solution.

They can do it themselves. They don’t need me. It’s a tool set. It’s an analytics platform for them to see where the fraud is coming from, understand why it’s fraud, and therefore take action to mitigate it and progressively clean their campaigns.
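
The “top 10 most egregious” idea above can be sketched as a simple ranking in Python. This is not how Fou Analytics works internally: the row format, the `flagged` column (standing in for whatever detection marked a placement as fraudulent), the 5% cutoff, and the placement names are all assumptions for illustration.

```python
from collections import Counter


def top_offenders(rows, limit=10, min_share=0.05):
    """Rank flagged placements by their share of all impressions.

    rows: list of (placement, impressions, flagged) tuples, where `flagged`
    is a hypothetical boolean supplied by some fraud detection step.
    """
    total = sum(impressions for _, impressions, _ in rows)
    bad = Counter()
    for placement, impressions, flagged in rows:
        if flagged:
            bad[placement] += impressions
    # Keep only the biggest offenders; tiny ones are not worth the effort.
    return [
        (placement, count / total)
        for placement, count in bad.most_common(limit)
        if count / total >= min_share
    ]


# Made-up campaign data: one flashlight app eats 8% of all impressions.
rows = [
    ("flashlight-app", 80_000, True),
    ("news-site.example", 900_000, False),
    ("tiny-app", 50, True),
    ("game-app", 19_950, True),
]
block_next = top_offenders(rows)
```

On this data only the flashlight app makes the list; the flagged app with 50 impressions is ignored. Rerunning the ranking each week on fresh data is one way to picture the progressive cleaning described above.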

Jacques: [00:53:36] I guess what I’m thinking is with this kind of tool, when clearly you can identify domains that affect multiple advertisers, is there a view to, uh, to make a, a global block list?

Augustine Fou: [00:53:49] Yes. I’ve been asked that a lot as well. So yes, that would be useful. And yes, there are some cheaters that stick around for a long time, but more often than not, you’ll see some of these criminal organizations, if one company is caught, they’re going to shut it down and start up another company. You literally see criminals starting up different company names so that they can continue the same fraud. The techniques are proven. So, they’ll keep doing the same techniques to make money off of fraud, but the companies will change. And very similarly, the domains and apps also change. Because for example, when you see White Ops, they caught MethBot and some of these large botnets and they published 600,000 IP addresses that were being used by MethBot in 2016. Within six hours those IP addresses were already swapped out. And the bad guys went hunting using other IP addresses so they could continue the fraud. So that was a long way of saying that, you know, even if we have a global blacklist of domains, it’s not going to necessarily stave off the new domains that come online.

So, the bad guys are constantly rotating through domains. So, the domains that we had blocked, say three months ago, are not necessarily the same domains that we need to block now. So that’s why it’s a continuous process of kind of what I call progressive cleaning. So, some of the bad guys are skillful enough that they do stick around for long periods of time.

And some of the domains that they use are kind of borderline questionable, right? So, some will say, Oh, well, there’s got to be some humans that go there. So, it’s plausible, but they’re basically inflating their revenue by also blending in bot traffic. So, in some of those cases, you may not want to turn off the entire domain.

You just want to note what portion of the traffic is, is fake or fraudulent. And sometimes it comes down to more of a, a judgment call on the part of the marketer. Do they want to keep buying from this domain when clearly, they’re doing some forms of cheating, right? It may not be a hundred percent fraudulent but they’re clearly engaging in some nefarious activities. So then in that sense even though we could have a global block list, uh, we don’t tend to do that and share it with all clients because ultimately for each client, depending on how they start their buys, there might be different domains and different apps that are negatively affecting their campaigns. So, but once we can see those in the first week or so, we can already turn them off.

Jacques: [00:56:09] Great. I’m aware of the time. So, I don’t want to keep you longer than I need to, but I was wondering if you had any kind of final thoughts or comments on what marketers can do to, uh, to combat this fraud and what they can do to get around it.

Augustine Fou: [00:56:23] I think for marketers, the most important thing is to understand that there is fraud and that there’s way more than their current fraud detection providers are telling them. And also, certainly way more than the ANA, the Association of National Advertisers, thinks there is. Cause they’re relying on these fraud detection companies that can’t see much of the fraud.

Okay, so that’s one thing. The second thing is you don’t have to have very advanced technologies to help you detect and mitigate it. I would say, look closely, look at your own analytics, use some common sense. Ask questions if things just don’t seem right to you. And some of those basic things will already allow the marketers to say, okay, well something’s wrong here and let’s look into it. So, when they start asking questions, they’ll say, how are my ads showing up on all these sites? You know, sites that I’ve blocked already, that I don’t want, that may not even be relevant. So beyond just fraud, it’s kind of getting back to the basics of doing digital marketing.

Not even just trying to reduce the fraud by itself. I think, you know, we also talk about data cleanliness. Uh, you know, that the targeting is not as accurate as you think. So, you know, if you get back to basic marketing, blocking and tackling, you’re already going to be doing a lot better digital marketing.

And especially in these times when budgets are tight, right, due to the virus and everything, now is a great chance for you to go back and say, okay, where’s my unnecessary spending? These are things that we should cut quickly and cut a lot of, cause that’s going to mean any dollar that’s saved goes straight to the bottom line.

So for some of these marketers, that’s going to significantly move the needle on their profitability because they’re not wasting as much money. And, you know, we talked earlier about buying as much direct as possible, because then your dollars are not getting chopped up and, you know, going into the pockets of the middlemen. They’re actually going towards the publisher for the purpose of showing your ads.

So, I think those are just general recommendations for marketers.

Jacques: [00:58:20] Great. Thank you very much. I really appreciate you taking the time to speak with me. It’s been hugely insightful, certainly on my part and just, yeah, thank you very much.

Augustine Fou: [00:58:28] Thank you, Jacques. So, thanks for all the great questions and I’ll see you soon. Thank you. Bye.