Fraud detection and prevention: what is data science’s role?

What is meant by the word “fraud”?

Action contrary to truth and righteousness, which harms the people against whom it is committed – Royal Spanish Academy

In law, fraud is intentional deception to secure unfair or unlawful gain, or to deprive a victim of a legal right. Fraud – Wikipedia

All definitions of fraud agree that it is a relationship between a loss and a benefit: there is a harmed party, an individual or an organization, and a fraudster who commits the fraud to obtain a profit improperly.

Therefore, ethical considerations aside, the act of fraud involves deception and the circumvention of control mechanisms to obtain a profit or something that someone else does not want to give.

This has two fundamental consequences. First, fraudsters know that they will obtain a benefit only if they are not discovered, so they conceal their behavior carefully.

Second, fraud is dynamic. Within organizations, fraud takes many forms and involves many actors and mechanisms, so detecting it is only the beginning. To protect an organization from losing value, detection must run as an iterative process that feeds back on itself.

Who is our corporate Sherlock Holmes?

Classic fraud detection methodologies are based on analysis carried out by experts fully acquainted with the business. A functional analyst will take the following steps to complete a detection process:

  1. Devise a suspicious case: the starting point of any fraud analysis is the suspicion that unusual behavior is generating illegal advantages for certain system users. It is the analyst’s task to identify suspicious cases based on their knowledge of the system’s weaknesses, user experience, previous history, and intuition, among other things.
  2. Find case-related information: once a potentially suspicious case is identified, the analyst collects as much information as possible for further analysis to decide whether it constitutes fraud. To this end, it is necessary to know the users involved, their behavior, which organizational mechanisms are attacked, and the consequences for the organization, among other things.
  3. Analyze whether it is fraud: this is the most time-consuming task performed by a functional analyst. The object of the analysis is unusual behavior, which does not necessarily amount to fraudulent activity against the organization unless there is sufficient evidence.
  4. Take action: once fraud is found, all the users involved must be identified, as well as their potential networks of fraudsters. As a result, two types of actions are taken:
    • Corrective measures: these aim to recover the organization’s losses from fraud, claim damages from the users involved, and create mechanisms to prevent future attempts at the same type of fraud (alert systems, business rules, audits of the people related to the case, sanctions, and legal measures against perpetrators).
    • Preventive measures: after fraud is detected, all of the organization’s security mechanisms are reviewed. Fraudulent behavior reveals corporate weaknesses that must be corrected to avoid potential fraud, which may differ from the case identified. Preventive actions help close corporate security gaps by increasing protection against fraudulent behavior in advance (analysis of criminal networks, strengthening of internal control measures, development of risk matrices).

The main weakness of this approach is step one, because it requires the analyst to imagine fraud situations. Although a business expert knows the system’s weaknesses better than anyone, imagining cases and working out where to start is often a slow and frustrating process, given the large amount of information and the number of possible starting points. Moreover, the time spent on this task could be spent on step three of the process, where analysts create the most direct value for the organization.

New sources, new methods: how can my system scale up its investigation methods?

Thanks to the large amount of data organizations store, today we can take a data-driven approach based on statistical procedures, Machine Learning, and Big Data techniques. These tools are used to create effective methods for detecting suspicious cases or unusual behavior, and to automate steps 1 and 2 of classic fraud analysis.

These methods seek to provide the analyst with a roadmap based on the organization’s data, something like a library of “treasure maps”: it is no longer a matter of walking a single analytical path in pursuit of one case of fraud, but of following the signs of many cases of fraudulent behavior, including some the researcher would never have imagined. This increases an analyst’s chances of detecting fraud by targeting abnormal behavior.

The main advantages of generating a data-driven fraud detection system are:

  • Accuracy: increases detection power by processing massive volumes of information to spot fraud patterns that are not visible to the human eye. Focusing on the subsets of data flagged by anomaly detection improves a researcher’s accuracy when analyzing fraud cases, because those subsets show where more fraud than average has been committed.
  • Efficiency: the volume of organizational data to be analyzed requires automated processes in order to fully exploit its potential. In addition, fraud detection is often time-sensitive: the closer we are to the moment an action was committed, the fewer users are likely to be involved in the fraud, and the more likely the organization is to obtain compensation for the fraud without incurring additional expenses. Automatic fraud detection methods speed up processes, and additional real-time alert systems can be set up to prevent potential fraud.
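The idea of a subset “where more fraud than average has been committed” can be made concrete with a simple ratio, often called lift. The sketch below is only an illustration: the function name `fraud_lift` and all the figures are hypothetical, not taken from a real system.

```python
def fraud_lift(segment_cases, segment_frauds, total_cases, total_frauds):
    """Return how many times more frequent fraud is in a segment than overall."""
    if segment_cases <= 0 or total_cases <= 0 or total_frauds <= 0:
        raise ValueError("case and fraud counts must be positive")
    baseline_rate = total_frauds / total_cases      # fraud rate in all the data
    segment_rate = segment_frauds / segment_cases   # fraud rate in the flagged subset
    return segment_rate / baseline_rate


# Hypothetical figures: 10,000 cases with 100 confirmed frauds overall,
# and a flagged subset of 200 cases that contains 30 of those frauds.
lift = fraud_lift(segment_cases=200, segment_frauds=30,
                  total_cases=10_000, total_frauds=100)
print(f"Fraud is {lift:.1f} times more frequent in the flagged subset")
```

A lift well above 1 suggests that an analyst’s time is better spent on the flagged subset than on a random sample of cases.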

However, it is important to note that as organizations evolve and controls are strengthened, fraudsters are compelled to find new ways to commit fraud without being exposed, and their behavior comes to resemble ordinary behavior very closely. A comprehensive fraud detection system based on statistical methods and digital tools therefore gives the organization the ability to adapt to the dynamic nature of fraud.

Who pulls the strings of the fraud detection system?

Returning to the question “Who is our Sherlock Holmes?”, the answer in the case of a system is teamwork. A task as dynamic as fraud detection demands a working methodology that brings together technical staff, data scientists, and functional experts. The new workflow involves the following steps and actors:

  1. Devising fraud detection mechanisms: the starting point is to choose the methods that best suit the nature of the organization; therefore, engineers and data scientists must work together with functional analysts, who are the business experts, the real detectives. The aim of this step is to assess the potential and limitations of several methods, suggest implementation options, and define the system’s scope according to the task.
  2. Finding information related to the case: in the field of Machine Learning (ML) and automated learning, a famous saying is “garbage in, garbage out”, i.e., the models generated are only as good as the data we give them to learn from. The team’s task here is to ensure the quality of the data it is dealing with. This is not solely an engineer’s task but also a functional analyst’s and a data scientist’s, because they must define the data’s functionality with the task in view.
  3. Modeling our data: developing and implementing ML models is always an iterative process. Although the main players in this step come from the technical area, it is important to keep communication channels open with the organization’s functional area. Model evaluation should not consider performance metrics alone: the results must be interpretable to serve the ultimate purpose, which is fraud detection.
  4. Checking whether it is really fraud: in this step, a functional analysis determines whether the detected behaviors are fraud against the organization. Moreover, introducing advanced data visualization techniques to define fraudulent profiles, together with social network analysis, facilitates the investigation and increases the system’s synergy.
  5. Taking action: this time, it is the organization’s analysts and managers who interact with the fraudster. Additionally, it is crucial to automatically add feedback from every confirmed fraud to the historical database of fraud cases, to grow the system’s knowledge base and learn from experience.

Intelligent fraud detection: What is Machine Learning’s role?

The final question is which statistical and Machine Learning methods can automate this fraud detection.

The answer lies in steps 1 and 3 of the working methodology explained in the previous section: the selection of techniques and models depends on the task and scope we want to assign to fraud detection, and on the data we have. There is no single answer, nor one recommended approach. Consequently, I will discuss, at a conceptual level, the pros and cons of the three paths that have been most used for fraud analysis in recent years.

1. Anomaly detection based on unsupervised learning techniques or descriptive data analytics

These methods involve finding behaviors that deviate from the norm in the data. To this end, ordinary behavior must be identified and learned, and records that do not fit the resulting profiles are highlighted.

These models group individuals into clusters based on defined variables. It is essential to understand the profile of each cluster, both for later interpretation and because the clusters indicate which group each record belongs to according to its characteristics. Records are then highlighted when, even within their own group, certain variables reveal abnormal behavior.

  • The main advantage of these approaches is that they are unsupervised methods, so there is no need for historical data on fraudulent or non-fraudulent behavior.
  • In turn, an interesting feature of these methods is that analysts can uncover unexpected fraudulent behavior, that is, behavior historically unknown to the organization.
  • Although not all behavior that departs from the norm involves fraud, these variable-based models are necessary because they detect rare cases. For instance, imagine a record in which a person’s gender is uncertain: this may be irrelevant information in certain areas, but it may be extremely important for certain types of fraud related to the remuneration the person has declared.
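As a minimal sketch of the idea of a record that is abnormal “even within its group”, the example below flags values that sit far from the median of their own group. Several things here are assumptions made for illustration: the groups are taken as given (in practice they might come from a clustering step), the records and field names are hypothetical, and the score is a modified z-score based on the median absolute deviation, with the commonly used constant 0.6745 and cut-off 3.5. It is one simple option among many, not a recommended method.

```python
from collections import defaultdict
from statistics import median


def flag_anomalies(records, group_key, value_key, threshold=3.5):
    """Flag records whose value is far from the median of their own group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    flagged = []
    for members in groups.values():
        values = [m[value_key] for m in members]
        center = median(values)
        # Median absolute deviation: a measure of spread that outliers barely affect.
        mad = median([abs(v - center) for v in values])
        if mad == 0:
            continue  # no spread in this group, nothing to compare against
        for m in members:
            # 0.6745 rescales the MAD so the score is comparable to a z-score.
            score = 0.6745 * (m[value_key] - center) / mad
            if abs(score) > threshold:
                flagged.append(m["id"])
    return flagged


# Hypothetical claims: an amount of 900 is ordinary for "corporate" users
# but far from typical for "retail" users.
claims = [
    {"id": "r1", "segment": "retail", "amount": 100},
    {"id": "r2", "segment": "retail", "amount": 110},
    {"id": "r3", "segment": "retail", "amount": 95},
    {"id": "r4", "segment": "retail", "amount": 105},
    {"id": "r5", "segment": "retail", "amount": 120},
    {"id": "r6", "segment": "retail", "amount": 900},
    {"id": "c1", "segment": "corporate", "amount": 900},
    {"id": "c2", "segment": "corporate", "amount": 950},
    {"id": "c3", "segment": "corporate", "amount": 1000},
    {"id": "c4", "segment": "corporate", "amount": 1050},
    {"id": "c5", "segment": "corporate", "amount": 980},
]
flagged = flag_anomalies(claims, group_key="segment", value_key="amount")
print(flagged)
```

Only the retail claim of 900 is flagged, while the same amount passes unnoticed among corporate claims. No fraud labels are needed, and a flagged record is a candidate for review, not a confirmed fraud.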

2. Supervised fraud prediction models based on historical data from previous, known fraud events

This approach involves building supervised models from data labelled “fraud” or “non-fraud” in order to detect potential fraud cases from their data patterns.

The model determines, with some degree of statistical reliability, whether a new event resembles known fraud cases.

  • The main advantage of this approach is that it provides an automatic alert system that detects patterns fraudsters cannot conceal and identifies events that have not been defined as a business rule. In other words, these systems detect complex patterns and multivariate relationships.
  • Not only do they anticipate potential fraud, but they also estimate the scope of fraud committed in the organization in the past.
  • The most prominent disadvantage of this approach is that it needs labelled data, i.e., data on known fraud, so it is of no use for identifying new types of fraud. It can detect variations of known frauds because it recognizes general patterns, but it does not apply to entirely different types of fraud until new events of that type have been identified and given a “fraud”/“non-fraud” label for prediction purposes.
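To illustrate what learning from labelled history looks like, here is a minimal sketch of a supervised model: a logistic regression trained by plain gradient descent and written from scratch so that it runs without extra libraries. The two features (amount in thousands and a night-time flag), the labelled events, and the function names are all hypothetical. In practice, a team would choose the model and library in steps 1 and 3 of the workflow.

```python
import math


def predict_proba(weights, bias, x):
    """Estimated probability that event x is fraud (logistic function)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression with batch gradient descent."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical labelled history: [amount in thousands, made at night (0/1)].
X = [[0.2, 0], [0.5, 0], [0.3, 1], [0.8, 0], [0.4, 0], [0.6, 1],
     [2.5, 1], [3.0, 1], [2.8, 0], [3.5, 1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 1 = confirmed fraud, 0 = non-fraud

weights, bias = train_logistic(X, y)
for event in ([3.2, 1], [0.3, 0]):
    print(event, round(predict_proba(weights, bias, event), 3))
```

The model scores a new event by how closely it resembles the labelled fraud cases. This is also why it cannot recognize a type of fraud that is absent from its training history.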

3. Social network analysis

Fraud modeling through social network analysis is a visual tool for discovering interconnections and association networks among individuals, companies, and other entities related to one or more fraudulent behaviors.

These models identify a network’s scope and how communications flow within it, and they uncover unexpected patterns by means of graphical and statistical analysis.

  • An important advantage is that network analysis is visually friendly: graphical tools bring the analysis closer to business users and managers, who can directly understand complex patterns. The most illustrative picture of this method comes from movie thrillers: it is the typical police or detective’s board, where information about a victim and suspects is connected with red string. Likewise, an analysis can be presented by showing the analytics workflow with tools we can understand even if we are not familiar with them.
  • Another feature of these fraud analytics methods, which has gained popularity in recent years, is that they discover networks of partners and fraudsters who are likely to commit more than one type of fraud. These approaches focus on people and their relationships, especially on the way they relate to the other people involved. Consequently, starting from one known case of fraud, these methods can potentially dismantle a complete network. Needless to say, this translates into even higher direct value for the organization.
  • The disadvantage of this approach is that it only applies to networks, so it is important to decide in advance which entities and links to model in order to extract direct value. If we load every connection in an organization, we will probably not know where to start the analysis.
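The idea of starting from one known case and following its links can be sketched with a breadth-first search over a graph of entities. This is only an illustration under stated assumptions: the entity names, the links (for example, a user sharing an account or a phone number with another user), and the function `connected_entities` are hypothetical, and real network analysis usually adds weights, link types, and visualization.

```python
from collections import defaultdict, deque


def connected_entities(links, start):
    """Return every entity reachable from `start` through the given links."""
    neighbors = defaultdict(set)
    for a, b in links:
        # Links are treated as undirected: each one connects both entities.
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbors[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen - {start}


# Hypothetical links between users and the accounts or phones they share.
links = [
    ("user_A", "account_1"),
    ("user_B", "account_1"),
    ("user_B", "phone_7"),
    ("user_C", "phone_7"),
    ("user_D", "account_9"),
]
network = connected_entities(links, start="user_A")  # user_A: a known fraudster
print(sorted(network))
```

Here `user_B` and `user_C` surface as candidates for review because they are connected to the known case through a shared account and phone, while `user_D` stays out of the picture. Choosing which links to load is exactly the modeling decision described in the last bullet above.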

For the sake of brevity and given the article’s focus, I decided not to go into detail about which algorithms are used in each model[1]; I preferred to argue for the importance of a systemic approach to fraud detection and analysis.

Final remarks

My final reflection therefore stresses the complementarity of these approaches: the more techniques I can integrate into my fraud detection system, the better my chances of anticipating fraud. Such a system adapts to the characteristics of organizational fraud and complements them with the analysis of fraudsters, while enriching the historical record of fraud cases, which in turn provides feedback to other parts of the system.

Another feature worth pointing out is the dynamic nature of the task. Fraud detection projects usually engage us in something like the famous game of cat and mouse: it is not only the detective who seeks dynamic methods of detection, but fraudsters also seek new loopholes in the system, so the game starts again. Therefore, it is important to include audit and maintenance mechanisms in fraud detection systems to monitor the validity of the cases studied and remain at the forefront of knowledge about fraudulent behavior in the sector.

In short, with the rise of data science, a discipline that is intrinsically dynamic and constantly changing because it is multidisciplinary, static business-rule systems have given way to systems with feedback: automatic but evolving fraud detection methodologies. Companies gain value when they treat the task as a constantly evolving, cross-cutting system and embark on complementary projects or models to attack the phenomenon from multiple fronts. This helps prevent and detect fraud, reducing losses and protecting the organization.

Camila Palomeque
Data & Analytics Consultant

[1] For more information on implementation, I recommend reading “Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection” by Bart Baesens et al.