Fighting Abuse @Scale brings together engineers, data scientists, product managers, operations specialists, and others fighting fraud, spam, and abuse on the internet at massive scale. At this one-day conference, practitioners will hear presentations on state-of-the-art technologies used to protect people on the internet and have the opportunity to discuss abuse fighting with like-minded people from around the industry.
The conference builds on the successful Spam Fighting @Scale conferences of May 2015 and November 2016, and will feature speakers from Facebook, LinkedIn, Airbnb, Microsoft, and others, talking about recent advances in stopping bad guys on the internet. In this iteration we will focus particularly on techniques and experiences with artificial intelligence and machine learning in abuse fighting.
The invitation-only event will feature invited talks, a happy hour, and a "birds-of-a-feather" lunch where attendees can join organized discussions around a particular subtopic of abuse fighting.
Fraudulent accounts are an issue most online services grapple with, and Microsoft's consumer services, including Xbox Live, Outlook.com, and Skype, are no exception. The fight against fraudulent accounts requires a multi-layer approach, including a real-time system that uses machine learning to adaptively detect fraudulent signups and prevent those accounts from being provisioned. This session will cover these different protection layers, how they all tie together, and the best practices we learned while building these systems.
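To make the layered decision concrete, here is a minimal sketch of what a real-time signup-risk check might look like. All signals, weights, and thresholds are hypothetical illustrations, not Microsoft's actual system:

```python
# A minimal sketch of a real-time signup-risk check. All signals,
# weights, and thresholds here are hypothetical, not Microsoft's.
from dataclasses import dataclass

@dataclass
class SignupRequest:
    ip_reputation: float      # 0.0 (clean) to 1.0 (known bad), from an IP-intel layer
    signups_from_ip_1h: int   # velocity signal: recent signups from this IP
    email_entropy: float      # 0.0 to 1.0, randomness of the chosen address
    solved_challenge: bool    # whether an earlier protection layer was passed

def risk_score(req: SignupRequest) -> float:
    """Combine a few signals into a [0, 1] risk score. A production
    system would use a continuously retrained model instead."""
    score = 0.4 * req.ip_reputation
    score += 0.3 * min(req.signups_from_ip_1h / 20.0, 1.0)
    score += 0.2 * req.email_entropy
    score += 0.0 if req.solved_challenge else 0.1
    return min(score, 1.0)

BLOCK_THRESHOLD = 0.8      # deny provisioning outright
CHALLENGE_THRESHOLD = 0.5  # route to an extra verification layer

def decide(req: SignupRequest) -> str:
    s = risk_score(req)
    if s >= BLOCK_THRESHOLD:
        return "block"      # the account is never provisioned
    if s >= CHALLENGE_THRESHOLD:
        return "challenge"  # e.g. phone verification before provisioning
    return "allow"

print(decide(SignupRequest(0.9, 35, 0.8, False)))  # -> "block"
```

In a production system, the hand-tuned weights above would be replaced by a model retrained on fresh signup data, which is what makes the detection adaptive.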
False news poses a serious risk to online communities and to the informed citizenry that healthy democracies depend on. In this talk, we will cover how Facebook, in collaboration with partners in journalism, addresses misinformation as an online ecosystem problem. We look at each link in the chain by which misinformation spreads and apply a coordinated set of product quality improvements to reduce, at every stage, the prevalence of misinformation and the profitability and social impact of false news. In particular, machine learning acts as a critical force multiplier on top of the work of fact-checking organizations: we use a variety of machine learning techniques to accelerate the velocity and broaden the impact of fact checking.
While criminal prosecutions are frequently seen as the ideal outcome for addressing cybercrime, for a variety of reasons (scale, timeliness, jurisdiction, prosecutorial priority, lack of statutory authority, evidentiary deficiencies, etc.), only a subset of abusive actors can be identified, apprehended, and brought to justice via criminal process. Thus, the vast majority of activity taken against internet miscreants today is civil in nature. Both public and private entities employ a range of technical and legal means to protect their interests, both in limiting the damage posed by abusers and in disrupting or disincentivizing their activities. These include various kinds of filtering, blacklisting, deranking, server takedowns, domain name seizures, merchant account shutdowns, and botnet takedowns, among others. However, all of these tools incur costs, and we cannot do everything all the time. Unfortunately, we lack a firm foundation for reasoning about the efficacy of these interventions — which truly disrupt abusive activity, and which are merely a minor nuisance to abusers? — and thus there is rarely much strategic thinking about how to use our capabilities most effectively. This talk will not settle the issue, but I hope to shed light on it. Through a set of empirical examples covering six years, I will explore how different kinds of civil interventions affect the bottom line of various scams — at times in unintuitive ways — and I will suggest a framework for thinking about such interventions to maximize their effectiveness.
At LinkedIn, we employ machine learning and statistical models to detect and take down fake accounts. Measuring the performance of these models is essential for understanding blind spots, mitigating false positives, and estimating impact. However, using the traditional textbook performance metrics (precision, recall, false positive rate, and so on) isn't always feasible in the real world: we don't know what we haven't caught yet, false positives don't always complain (and can remain undetected), user reports are noisy, human review of accounts doesn't scale, and some types of abuse are low volume but high damage. In this talk, we will discuss how we employ approximations, trade-offs, and metric proxies to measure the success of our defenses.
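As a rough illustration of what such proxies can look like, here is a small sketch of two common workarounds: sampled human review for precision, and user reports as a noisy recall signal. The function names and sample sizes are ours, not LinkedIn's actual machinery:

```python
# A rough sketch of two metric proxies. Sample sizes and helper
# names are illustrative, not LinkedIn's actual machinery.
import random

def sampled_precision(flagged_accounts, review_fn, sample_size=200):
    """Precision proxy: have humans review a random sample of the
    model's takedowns, since reviewing every account doesn't scale."""
    sample = random.sample(flagged_accounts, min(sample_size, len(flagged_accounts)))
    confirmed = sum(1 for acct in sample if review_fn(acct))
    return confirmed / len(sample)

def report_based_recall(reported_fakes, caught_ids):
    """Recall proxy: of the fakes surfaced by (noisy) user reports,
    what fraction had the model already caught? This still misses
    what neither users nor the model have seen."""
    if not reported_fakes:
        return float("nan")
    caught = sum(1 for acct in reported_fakes if acct in caught_ids)
    return caught / len(reported_fakes)
```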
Where are all these fake X's coming from? Replace X with likes, follows, or tweets and you get the idea. In this talk, we examine the problem of coordinated abusive behavior in online social networks, where actions are orchestrated by account automation services such as collusion networks. We propose using honeypot accounts to infiltrate such services in order to map their infrastructure and operational characteristics. This approach lets us collect ground-truth data that we can use to understand the scale of the problem these services cause, and then to disrupt them effectively. We also present two case studies of successful disruption on Facebook and Instagram.
The dream of the internet is to give a voice to everyone, bringing diverse opinions and perspectives together and enriching conversations. Unfortunately, online conversations often get taken over by a handful of people who shout and yell, silencing others simply by being so loud and rude that no one else wants to engage in the discussion. In this session, we'll explore new ways in which we are empowering communities to protect the quality of their online conversations using machine learning. I'll talk about the different aspects of the community experience, as well as the challenges to overcome in this space.
A large amount of spam on platforms such as Facebook contains links that lead users off Facebook to attacker-controlled websites. These websites are usually set up either to monetize through sales of low-quality goods, or to expand the attacker's reach by distributing malware or phishing unsuspecting users. In this talk, we will describe how Facebook protects its users by classifying every URL posted on the site, as well as the parts of the web reachable through Facebook. In particular, we will look at how we classify different parts of a URL, how we make use of behavioral signals in classification, and how we protect our users from costly false positives.
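As a sketch of what classifying different parts of a URL can mean in practice, the snippet below decomposes a URL into components and derives a few simple per-component features. The feature set is an illustrative placeholder, not Facebook's production signals:

```python
# Illustrative per-component URL features; real classifiers and
# signals are far richer than this placeholder set.
import math
from collections import Counter
from urllib.parse import urlparse

def shannon_entropy(s: str) -> float:
    """High entropy often indicates machine-generated paths or hosts."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def url_features(url: str) -> dict:
    parts = urlparse(url)
    host = parts.hostname or ""
    return {
        "host": host,
        "subdomain_depth": host.count("."),
        "path_length": len(parts.path),
        "path_entropy": shannon_entropy(parts.path),
        "has_query": bool(parts.query),
        "uses_https": parts.scheme == "https",
    }

print(url_features("http://x7f2k.example.com/claim-prize?id=123"))
```

Behavioral signals (how fast a link spreads, how many distinct accounts post it, how often users report it) would be joined onto features like these before classification.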
Like all online businesses, Airbnb faces fraudsters who attempt to use stolen credit cards. In this talk, I'll walk through how we leverage machine learning, experimentation, and analytics to identify and block fraudsters while minimizing the impact on the overwhelming majority of good users. First, I'll introduce how we use machine-learning models to trigger frictions targeted at blocking fraudsters. Then, I'll outline how we choose a model's threshold by minimizing a loss function, and dive into each term in that loss function: the costs of false positives, false negatives, and true positives. Finally, I'll walk through a numerical example comparing the optimization of blocking transactions versus applying a friction, and discuss some extensions to the optimization framework.
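To preview the flavor of that numerical example, here is a toy version of the threshold optimization. The score distributions and costs are entirely made up for illustration, not Airbnb's:

```python
# Toy threshold optimization: pick the score cutoff that minimizes
# expected loss. Distributions and costs are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical model scores: 1% of traffic is fraud, scored higher on average.
good = rng.beta(2, 8, size=99_000)   # good users skew toward low scores
fraud = rng.beta(8, 2, size=1_000)   # fraudsters skew toward high scores

def expected_loss(threshold, cost_fp, cost_fn, cost_tp):
    fp = (good >= threshold).sum()    # good users we act on
    fn = (fraud < threshold).sum()    # fraud we miss
    tp = (fraud >= threshold).sum()   # fraud we act on
    return cost_fp * fp + cost_fn * fn + cost_tp * tp

thresholds = np.linspace(0, 1, 101)

# Blocking: a false positive loses a good booking entirely (costly);
# a false negative costs a chargeback; acting on fraud is roughly free.
block_loss = [expected_loss(t, cost_fp=100, cost_fn=150, cost_tp=0) for t in thresholds]

# Friction (e.g. extra verification): false positives are much cheaper
# because most good users pass, but some challenged fraud still gets
# through, so true positives retain a residual cost.
friction_loss = [expected_loss(t, cost_fp=5, cost_fn=150, cost_tp=30) for t in thresholds]

t_block = thresholds[int(np.argmin(block_loss))]
t_friction = thresholds[int(np.argmin(friction_loss))]
print(f"optimal block threshold    ~ {t_block:.2f}")
print(f"optimal friction threshold ~ {t_friction:.2f}")  # lower: frictions are cheap to apply
```

The qualitative takeaway of such an exercise is that cheaper interventions justify lower thresholds, so a friction can be applied to far more traffic than an outright block.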
Judging Ads and Posts by their Neighbours: Label Propagation through Semantic Image Clustering with Billions of Images
Coming up with new ideas for effective advertisements is hard work; it's much easier to generate variants on a few basic themes. The same is true for bad ads, so we built a classifier that finds similar-looking, similar-sounding, and similar-reading ad content and assumes that the same review decision applies. With the advent of semantically meaningful embeddings for the various media types used in advertisements, we can now measure the distance between existing items and new items. We've found this approach very effective, and in this talk I will explain how we built it and how we scaled it to handle the billions of images we process for ads on Facebook.
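A minimal sketch of the idea, assuming precomputed embeddings and a toy exact-distance scan (at the scale described above this would be an approximate nearest-neighbour index, and the distance threshold is a placeholder):

```python
# Illustrative nearest-neighbour label propagation in embedding space.
# Embeddings, labels, and the threshold are placeholders, not Facebook's.
import numpy as np

def propagate_label(new_embedding, labeled_embeddings, labels, max_distance=0.25):
    """If a new ad's embedding sits close enough to a previously reviewed
    ad, reuse that review decision; otherwise send it for fresh review.
    At billions of images, this exact scan would be replaced by an
    approximate nearest-neighbour index."""
    dists = np.linalg.norm(labeled_embeddings - new_embedding, axis=1)
    nearest = int(np.argmin(dists))
    if dists[nearest] <= max_distance:
        return labels[nearest]  # e.g. a "reject" decision carries over
    return "needs_review"

# Toy usage: three reviewed ads, one new variant near a rejected one.
reviewed = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])
decisions = ["reject", "approve", "approve"]
print(propagate_label(np.array([0.12, 0.88]), reviewed, decisions))  # -> "reject"
```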