To Appear in ACM Computing Surveys, 09 2009, Pages 1–72.

A modified version of this technical report will appear in ACM Computing Surveys, September 2009.

Anomaly Detection : A Survey

VARUN CHANDOLA

University of Minnesota

ARINDAM BANERJEE

University of Minnesota

and

VIPIN KUMAR

University of Minnesota

Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining

General Terms: Algorithms

Additional Key Words and Phrases: Anomaly Detection, Outlier Detection

1. INTRODUCTION

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. Of these, anomalies and outliers are the two terms used most commonly in the context of anomaly detection, sometimes interchangeably. Anomaly detection finds extensive use in a wide variety of applications such as fraud detection for credit cards, insurance or health care, intrusion detection for cyber-security, fault detection in safety-critical systems, and military surveillance for enemy activities.

The importance of anomaly detection is due to the fact that anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination [Kumar 2005]. An anomalous MRI image may indicate the presence of malignant tumors [Spence et al. 2001]. Anomalies in credit card transaction data could indicate credit card or identity theft [Aleskerov et al. 1997], and anomalous readings from a spacecraft sensor could signify a fault in some component of the spacecraft [Fujimaki et al. 2005].

Detecting outliers or anomalies in data has been studied in the statistics community as early as the 19th century [Edgeworth 1887]. Over time, a variety of anomaly detection techniques have been developed in several research communities. Many of these techniques have been specifically developed for certain application domains, while others are more generic.

This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We hope that it facilitates a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

1.1 What are anomalies?

Anomalies are patterns in data that do not conform to a well-defined notion of normal behavior. Figure 1 illustrates anomalies in a simple 2-dimensional data set. The data has two normal regions, N1 and N2, since most observations lie in these two regions. Points that are sufficiently far away from the regions, e.g., points o1 and o2, and points in region O3, are anomalies.

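[Figure: 2-dimensional plot with axes x and y, showing the normal regions N1 and N2, the isolated points o1 and o2, and the region O3.]

Fig. 1. A simple example of anomalies in a 2-dimensional data set.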

Anomalies might be induced in the data for a variety of reasons, such as malicious activity, e.g., credit card fraud, cyber-intrusion, terrorist activity or breakdown of a system, but all of the reasons have a common characteristic that they are interesting to the analyst. The “interestingness” or real life relevance of anomalies is a key feature of anomaly detection.

Anomaly detection is related to, but distinct from noise removal [Teng et al. 1990] and noise accommodation [Rousseeuw and Leroy 1987], both of which deal

4 · Chandola, Banerjee and Kumar

which the anomalies need to be detected. Researchers have adopted concepts from diverse disciplines such as statistics, machine learning, data mining, information theory, spectral theory, and have applied them to specific problem formulations. Figure 2 shows the above mentioned key components associated with any anomaly detection technique.

[Figure: an Anomaly Detection Technique at the center, connected to three groups of components. Research Areas: Machine Learning, Data Mining, Statistics, Information Theory, Spectral Theory, ... Application Domains: Intrusion Detection, Fraud Detection, Fault/Damage Detection, Medical Informatics, ... Problem Characteristics: Nature of Data, Labels, Anomaly Type, Output.]

Fig. 2. Key components associated with an anomaly detection technique.

1.3 Related Work

Anomaly detection has been the topic of a number of surveys and review articles, as well as books. Hodge and Austin [2004] provide an extensive survey of anomaly detection techniques developed in machine learning and statistical domains. A broad review of anomaly detection techniques for numeric as well as symbolic data is presented by Agyemang et al. [2006]. An extensive review of novelty detection techniques using neural networks and statistical approaches has been presented in Markou and Singh [2003a] and Markou and Singh [2003b], respectively. Patcha and Park [2007] and Snyder [2001] present a survey of anomaly detection techniques used specifically for cyber-intrusion detection. A substantial amount of research on outlier detection has been done in statistics and has been reviewed in several books [Rousseeuw and Leroy 1987; Barnett and Lewis 1994; Hawkins 1980] as well as other survey articles [Beckman and Cook 1983; Bakar et al. 2006]. Table I shows the set of techniques and application domains covered by our survey and the various related survey articles mentioned above.

1 2 3 4 5 6 7 8

Techniques
Classification Based √ √ √ √ √
Clustering Based √ √ √ √
Nearest Neighbor Based √ √ √ √ √
Statistical √ √ √ √ √ √ √
Information Theoretic √
Spectral √

Applications
Cyber-Intrusion Detection √ √
Fraud Detection √
Medical Anomaly Detection √
Industrial Damage Detection √
Image Processing √
Textual Anomaly Detection √
Sensor Networks √

Table I. Comparison of our survey to other related survey articles. 1 - Our survey, 2 - Hodge and Austin [2004], 3 - Agyemang et al. [2006], 4 - Markou and Singh [2003a], 5 - Markou and Singh [2003b], 6 - Patcha and Park [2007], 7 - Beckman and Cook [1983], 8 - Bakar et al. [2006].

1.4 Our Contributions

This survey is an attempt to provide a structured and broad overview of extensive research on anomaly detection techniques spanning multiple research areas and application domains.

Most of the existing surveys on anomaly detection either focus on a particular application domain or on a single research area. Agyemang et al. [2006] and Hodge and Austin [2004] are two related works that group anomaly detection into multiple categories and discuss techniques under each category. This survey builds upon these two works by significantly expanding the discussion in several directions. We add two more categories of anomaly detection techniques, viz., information theoretic and spectral techniques, to the four categories discussed in Agyemang et al. [2006] and Hodge and Austin [2004]. For each of the six categories, we not only discuss the techniques, but also identify unique assumptions regarding the nature of anomalies made by the techniques in that category. These assumptions are critical for determining when the techniques in that category would be able to detect anomalies, and when they would fail. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains.

Anomaly Detection : A Survey · 7

the case of multivariate data instances, all attributes might be of the same type or might be a mixture of different data types.

The nature of attributes determines the applicability of anomaly detection techniques. For example, for statistical techniques different statistical models have to be used for continuous and categorical data. Similarly, for nearest neighbor based techniques, the nature of attributes would determine the distance measure to be used. Often, instead of the actual data, the pairwise distance between instances might be provided in the form of a distance (or similarity) matrix. In such cases, techniques that require original data instances are not applicable, e.g., many statistical and classification based techniques.

Input data can also be categorized based on the relationship present among data instances [Tan et al. 2005]. Most of the existing anomaly detection techniques deal with record data (or point data), in which no relationship is assumed among the data instances. In general, data instances can be related to each other. Some examples are sequence data, spatial data, and graph data. In sequence data, the data instances are linearly ordered, e.g., time-series data, genome sequences, protein sequences. In spatial data, each data instance is related to its neighboring instances, e.g., vehicular traffic data, ecological data. When the spatial data has a temporal (sequential) component it is referred to as spatio-temporal data, e.g., climate data. In graph data, data instances are represented as vertices in a graph and are connected to other vertices with edges. Later in this section we will discuss situations where such relationships among data instances become relevant for anomaly detection.
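To illustrate how the attribute types drive the choice of distance measure for nearest neighbor based techniques, the following is a minimal sketch of a distance for records that mix continuous and categorical attributes. It is not taken from the survey: the function name `mixed_distance`, the range-scaled difference for numeric attributes, the 0/1 mismatch for categorical attributes, and the sample network records are all assumptions chosen for illustration.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity between two records.

    a, b: dicts mapping attribute name -> value (same keys in both).
    numeric_ranges: dict mapping each numeric attribute to its (min, max);
    every attribute not listed there is treated as categorical.
    """
    total = 0.0
    for attr in a:
        if attr in numeric_ranges:
            lo, hi = numeric_ranges[attr]
            span = (hi - lo) or 1.0          # avoid division by zero
            total += abs(a[attr] - b[attr]) / span
        else:
            # categorical attribute: 0 if equal, 1 if different
            total += 0.0 if a[attr] == b[attr] else 1.0
    return total / len(a)


# Hypothetical records with one continuous and one categorical attribute.
ranges = {"bytes": (0, 1000)}
r1 = {"bytes": 200, "protocol": "tcp"}
r2 = {"bytes": 700, "protocol": "udp"}
d = mixed_distance(r1, r2, ranges)   # (0.5 + 1.0) / 2
```

A purely continuous data set could use Euclidean distance instead; the point of the sketch is only that the measure has to change when categorical attributes are present.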

2.2 Type of Anomaly

An important aspect of an anomaly detection technique is the nature of the desired anomaly. Anomalies can be classified into the following three categories:

2.2.1 Point Anomalies. If an individual data instance can be considered as anomalous with respect to the rest of the data, then the instance is termed a point anomaly. This is the simplest type of anomaly and is the focus of the majority of research on anomaly detection. For example, in Figure 1, points o1 and o2 as well as points in region O3 lie outside the boundary of the normal regions, and hence are point anomalies since they are different from normal data points. As a real life example, consider credit card fraud detection. Let the data set correspond to an individual’s credit card transactions. For the sake of simplicity, let us assume that the data is defined using only one feature: amount spent. A transaction for which the amount spent is very high compared to the normal range of expenditure for that person will be a point anomaly.
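The single-feature credit card example can be sketched as follows. This is only an illustration under stated assumptions: the survey does not prescribe a test, so the median/MAD robust z-score, the cut-off of 3.5, the function name `point_anomalies`, and the transaction amounts are all hypothetical choices.

```python
from statistics import median


def point_anomalies(amounts, threshold=3.5):
    """Return indices of amounts that lie far from the typical spend.

    Uses a robust z-score based on the median and the median absolute
    deviation (MAD), so one extreme value does not distort the baseline.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # All typical values identical: anything different stands out.
        return [i for i, a in enumerate(amounts) if a != med]
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical transaction amounts for one card holder.
transactions = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 2400.0, 44.1]
flagged = point_anomalies(transactions)
```

Here only the 2400.0 transaction is far outside the person's usual range, so it is the one index returned.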

2.2.2 Contextual Anomalies. If a data instance is anomalous in a specific context (but not otherwise), then it is termed a contextual anomaly (also referred to as a conditional anomaly [Song et al. 2007]). The notion of a context is induced by the structure in the data set and has to be specified as a part of the problem formulation. Each data instance is defined using the following two sets of attributes:

(1) Contextual attributes. The contextual attributes are used to determine the context (or neighborhood) for that instance. For example, in spatial data sets, the longitude and latitude of a location are the contextual attributes. In time-series data, time is a contextual attribute which determines the position of an instance on the entire sequence.

(2) Behavioral attributes. The behavioral attributes define the non-contextual characteristics of an instance. For example, in a spatial data set describing the average rainfall of the entire world, the amount of rainfall at any location is a behavioral attribute.

The anomalous behavior is determined using the values for the behavioral attributes within a specific context. A data instance might be a contextual anomaly in a given context, but an identical data instance (in terms of behavioral attributes) could be considered normal in a different context. This property is key in identifying contextual and behavioral attributes for a contextual anomaly detection technique.

[Figure: monthly temperature (vertical axis) plotted against time (horizontal axis, ticks at Mar, Jun, Sept and Dec over three years), with the times t1 and t2 marked.]

Fig. 3. Contextual anomaly t2 in a temperature time series. Note that the temperature at time t1 is the same as that at time t2 but occurs in a different context and hence is not considered an anomaly.

Contextual anomalies have been most commonly explored in time-series data [Weigend et al. 1995; Salvador and Chan 2003] and spatial data [Kou et al. 2006; Shekhar et al. 2001]. Figure 3 shows one such example for a temperature time series which shows the monthly temperature of an area over the last few years. A temperature of 35°F might be normal during the winter (at time t1) at that place, but the same value during summer (at time t2) would be an anomaly.

A similar example can be found in the credit card fraud detection domain. A contextual attribute in the credit card domain can be the time of purchase. Suppose an individual usually has a weekly shopping bill of $100 except during the Christmas week, when it reaches $1000. A new purchase of $1000 in a week in July will be considered a contextual anomaly, since it does not conform to the normal behavior of the individual in the context of time (even though the same amount spent during Christmas week will be considered normal).

The choice of applying a contextual anomaly detection technique is determined by the meaningfulness of the contextual anomalies in the target application domain.
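The temperature example can be sketched in code by keeping a separate baseline per context. This is a hedged illustration, not a technique from the survey: the month label as the contextual attribute, the per-context mean and standard deviation, the factor k = 3, the function names, and the readings are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_context_profiles(history):
    """Build {context: (mean, std)} from (context, value) pairs."""
    groups = defaultdict(list)
    for context, value in history:
        groups[context].append(value)
    # stdev needs at least two observations per context.
    return {c: (mean(v), stdev(v)) for c, v in groups.items() if len(v) >= 2}


def is_contextual_anomaly(profiles, context, value, k=3.0):
    """Compare the behavioral value only against its own context."""
    mu, sigma = profiles[context]
    return abs(value - mu) > k * max(sigma, 1e-9)


# Hypothetical monthly temperatures (degrees F) from earlier years.
history = [("Dec", 33.0), ("Dec", 36.0), ("Dec", 35.0),
           ("Jun", 78.0), ("Jun", 81.0), ("Jun", 80.0)]
profiles = fit_context_profiles(history)

winter = is_contextual_anomaly(profiles, "Dec", 35.0)  # same value, winter
summer = is_contextual_anomaly(profiles, "Jun", 35.0)  # same value, summer
```

The behavioral value 35.0 is identical in both calls; only the contextual attribute differs, which is why the winter reading is accepted and the summer reading is flagged.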

10 · Chandola, Banerjee and Kumar

It should be noted that while point anomalies can occur in any data set, collective anomalies can occur only in data sets in which data instances are related. In contrast, occurrence of contextual anomalies depends on the availability of context attributes in the data. A point anomaly or a collective anomaly can also be a contextual anomaly if analyzed with respect to a context. Thus a point anomaly detection problem or collective anomaly detection problem can be transformed to a contextual anomaly detection problem by incorporating the context information.

2.3 Data Labels

The labels associated with a data instance denote if that instance is normal or anomalous¹. It should be noted that obtaining labeled data which is accurate as well as representative of all types of behaviors is often prohibitively expensive. Labeling is often done manually by a human expert and hence requires substantial effort to obtain the labeled training data set. Typically, getting a labeled set of anomalous data instances which covers all possible types of anomalous behavior is more difficult than getting labels for normal behavior. Moreover, the anomalous behavior is often dynamic in nature, e.g., new types of anomalies might arise, for which there is no labeled training data. In certain cases, such as air traffic safety, anomalous instances would translate to catastrophic events, and hence will be very rare.

Based on the extent to which the labels are available, anomaly detection techniques can operate in one of the following three modes:

2.3.1 Supervised anomaly detection. Techniques trained in supervised mode assume the availability of a training data set which has labeled instances for the normal as well as the anomaly class. The typical approach in such cases is to build a predictive model for normal vs. anomaly classes. Any unseen data instance is compared against the model to determine which class it belongs to. There are two major issues that arise in supervised anomaly detection. First, the anomalous instances are far fewer compared to the normal instances in the training data. Issues that arise due to imbalanced class distributions have been addressed in the data mining and machine learning literature [Joshi et al. 2001; 2002; Chawla et al. 2004; Phua et al. 2004; Weiss and Hirsh 1998; Vilalta and Ma 2002]. Second, obtaining accurate and representative labels, especially for the anomaly class, is usually challenging. A number of techniques have been proposed that inject artificial anomalies in a normal data set to obtain a labeled training data set [Theiler and Cai 2003; Abe et al. 2006; Steinwart et al. 2005]. Other than these two issues, the supervised anomaly detection problem is similar to building predictive models. Hence we will not address this category of techniques in this survey.

¹ Also referred to as normal and anomalous classes.

2.3.2 Semi-Supervised anomaly detection. Techniques that operate in a semi-supervised mode assume that the training data has labeled instances for only the normal class. Since they do not require labels for the anomaly class, they are more widely applicable than supervised techniques. For example, in spacecraft fault detection [Fujimaki et al. 2005], an anomaly scenario would signify an accident, which is not easy to model. The typical approach used in such techniques is to build a model for the class corresponding to normal behavior, and use the model to identify anomalies in the test data.

A limited set of anomaly detection techniques exists that assume availability of only the anomaly instances for training [Dasgupta and Nino 2000; Dasgupta and Majumdar 2002; Forrest et al. 1996]. Such techniques are not commonly used, primarily because it is difficult to obtain a training data set which covers every possible anomalous behavior that can occur in the data.
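The semi-supervised approach of modeling only normal behavior and then scoring test data against that model can be sketched as below. The model here (independent per-feature mean and standard deviation, with a Euclidean norm of z-scores as the anomaly score) is an assumption made for brevity and is not a technique attributed to the survey; the names, the threshold of 3.0 and the data are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_normal_model(normal_rows):
    """Learn (mean, std) per feature from instances labeled normal."""
    columns = list(zip(*normal_rows))
    # Fall back to 1.0 when a feature is constant in the training data.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]


def anomaly_score(model, row):
    """Distance from the normal profile, in units of standard deviations."""
    return math.sqrt(sum(((x - mu) / sd) ** 2
                         for x, (mu, sd) in zip(row, model)))


# Hypothetical training set containing only normal instances.
normal_training = [(1.0, 10.0), (1.2, 11.0), (0.9, 9.5),
                   (1.1, 10.5), (1.0, 10.2)]
model = fit_normal_model(normal_training)

# Unlabeled test instances are compared against the normal model.
test_rows = [(1.05, 10.1), (3.0, 25.0)]
scores = [anomaly_score(model, r) for r in test_rows]
labels = ["anomalous" if s > 3.0 else "normal" for s in scores]
```

No anomalous examples are needed at training time, which is the property that makes this mode more widely applicable than the supervised one.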

2.3.3 Unsupervised anomaly detection. Techniques that operate in unsupervised mode do not require training data, and thus are most widely applicable. The techniques in this category make the implicit assumption that normal instances are far more frequent than anomalies in the test data. If this assumption is not true, then such techniques suffer from a high false alarm rate.

Many semi-supervised techniques can be adapted to operate in an unsupervised mode by using a sample of the unlabeled data set as training data. Such adaptation assumes that the test data contains very few anomalies and that the model learnt during training is robust to these few anomalies.
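A minimal sketch of the unsupervised setting, with no labels and no separate training set, is given below. It scores each instance by the distance to its k-th nearest neighbor, which relies on the stated assumption that normal instances are far more frequent than anomalies. The choice of this particular score, the value k = 2, the function name and the points are illustrative assumptions rather than the survey's prescription.

```python
import math


def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Points in dense (frequent, hence presumed normal) regions get small
    scores; isolated points get large ones.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical unlabeled data: a dense group plus one isolated point.
unlabeled = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0), (5.0, 5.0)]
scores = knn_scores(unlabeled, k=2)
most_anomalous = max(range(len(unlabeled)), key=scores.__getitem__)
```

If anomalies were as common as normal instances they would form dense regions of their own and receive low scores, which is the failure mode the paragraph above warns about.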

2.4 Output of Anomaly Detection

An important aspect for any anomaly detection technique is the manner in which the anomalies are reported. Typically, the outputs produced by anomaly detection techniques are one of the following two types:

2.4.1 Scores. Scoring techniques assign an anomaly score to each instance in the test data depending on the degree to which that instance is considered an anomaly. Thus the output of such techniques is a ranked list of anomalies. An analyst may choose to either analyze the top few anomalies or use a cut-off threshold to select the anomalies.

2.4.2 Labels. Techniques in this category assign a label (normal or anomalous) to each test instance.

Scoring based anomaly detection techniques allow the analyst to use a domain-specific threshold to select the most relevant anomalies. Techniques that provide binary labels to the test instances do not directly allow the analysts to make such a choice, though this can be controlled indirectly through parameter choices within each technique.
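The difference between the two output types can be made concrete with a short sketch. Given anomaly scores from any scoring technique, the analyst can either take the top few entries of the ranked list or apply a cut-off threshold to obtain labels. The scores, the threshold of 3.0 and the helper names are hypothetical.

```python
def rank_by_score(scores):
    """Return instance indices ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def labels_from_threshold(scores, threshold):
    """Turn scores into binary labels using a domain-specific cut-off."""
    return ["anomalous" if s > threshold else "normal" for s in scores]


# Hypothetical anomaly scores for five test instances.
scores = [0.2, 4.7, 0.9, 3.1, 0.4]

ranking = rank_by_score(scores)       # ranked list of anomalies
top_two = ranking[:2]                 # analyst inspects the top few
labels = labels_from_threshold(scores, threshold=3.0)
```

A technique that emits only labels corresponds to having the threshold fixed inside the technique, which is why the analyst can influence it only through the technique's own parameters.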

3. APPLICATIONS OF ANOMALY DETECTION

In this section we discuss several applications of anomaly detection. For each application domain we discuss the following four aspects:

—The notion of anomaly.

—Nature of the data.

—Challenges associated with detecting anomalies.

—Existing anomaly detection techniques.

Anomaly Detection : A Survey · 13

Technique Used | Section | References
Statistical Profiling using Histograms | Section 7.2.1 | Forrest et al. [1996; 2004; 1996; 1994; 1999], Hofmeyr et al. [1998], Kosoresow and Hofmeyr [1997], Jagadish et al. [1999], Cabrera et al. [2001], Gonzalez and Dasgupta [2003], Dasgupta et al. [2000; 2002], Ghosh et al. [1999a; 1998; 1999b], Debar et al. [1998], Eskin et al. [2001], Marceau [2000], Endler [1998], Lane et al. [1999; 1997b; 1997a]
Mixture of Models | Section 7.1.3 | Eskin [2000]
Neural Networks | Section 4.1 | Ghosh et al. [1998]
Support Vector Machines | Section 4.3 | Hu et al. [2003], Heller et al. [2003]
Rule-based Systems | Section 4.4 | Lee et al. [1997; 1998; 2000]

Table II. Examples of anomaly detection techniques used for host based intrusion detection.

Nino [2000]. Some anomaly detection techniques used in this domain are shown in Table II.

3.1.2 Network Intrusion Detection Systems. These systems deal with detecting intrusions in network data. The intrusions typically occur as anomalous patterns (point anomalies), though certain techniques model the data in a sequential fashion and detect anomalous subsequences (collective anomalies) [Gwadera et al. 2005b; 2004]. These anomalies are primarily caused by attacks launched by outside hackers who want to gain unauthorized access to the network for information theft or to disrupt the network. A typical setting is a large network of computers which is connected to the rest of the world via the Internet.

The data available for intrusion detection systems can be at different levels of granularity, e.g., packet level traces, CISCO net-flows data, etc. The data has a temporal aspect associated with it, but most of the techniques do not handle the sequential aspect explicitly. The data is typically high dimensional, with a mix of categorical as well as continuous attributes. A challenge faced by anomaly detection techniques in this domain is that the nature of anomalies keeps changing over time as the intruders adapt their network attacks to evade the existing intrusion detection solutions. Some anomaly detection techniques used in this domain are shown in Table III.

3.2 Fraud Detection

Fraud detection refers to detection of criminal activities occurring in commercial organizations such as banks, credit card companies, insurance agencies, cell phone companies, stock market, etc. The malicious users might be the actual customers of the organization or might be posing as a customer (also known as identity theft). The fraud occurs when these users consume the resources provided by the organization in an unauthorized way. The organizations are interested in immediate detection of such frauds to prevent economic losses.

Fawcett and Provost [1999] introduce the term activity monitoring as a general approach to fraud detection in these domains. The typical approach of anomaly detection techniques is to maintain a usage profile for each customer and monitor the profiles to detect any deviations. Some of the specific applications of fraud detection are discussed below.

Technique Used | Section | References
Statistical Profiling using Histograms | Section 7.2.1 | NIDES [Anderson et al. 1994; Anderson et al. 1995; Javitz and Valdes 1991], EMERALD [Porras and Neumann 1997], Yamanishi et al. [2001; 2004], Ho et al. [1999], Kruegel et al. [2002; 2003], Mahoney et al. [2002; 2003; 2003; 2007], Sargor [1998]
Parametric Statistical Modeling | Section 7.1 | Gwadera et al. [2005b; 2004], Ye and Chen [2001]
Non-parametric Statistical Modeling | Section 7.2.2 | Chow and Yeung [2002]
Bayesian Networks | Section 4.2 | Siaterlis and Maglaris [2004], Sebyala et al. [2002], Valdes and Skinner [2000], Bronstein et al. [2001]
Neural Networks | Section 4.1 | HIDE [Zhang et al. 2001], NSOM [Labib and Vemuri 2002], Smith et al. [2002], Hawkins et al. [2002], Kruegel et al. [2003], Manikopoulos and Papavassiliou [2002], Ramadas et al. [2003]
Support Vector Machines | Section 4.3 | Eskin et al. [2002]
Rule-based Systems | Section 4.4 | ADAM [Barbara et al. 2001a; Barbara et al. 2003; Barbara et al. 2001b], Fan et al. [2001], Helmer et al. [1998], Qin and Hwang [2004], Salvador and Chan [2003], Otey et al. [2003]
Clustering Based | Section 6 | ADMIT [Sequeira and Zaki 2002], Eskin et al. [2002], Wu and Zhang [2003], Otey et al. [2003]
Nearest Neighbor based | Section 5 | MINDS [Ertoz et al. 2004; Chandola et al. 2006], Eskin et al. [2002]
Spectral | Section 9 | Shyu et al. [2003], Lakhina et al. [2005], Thottan and Ji [2003], Sun et al. [2007]
Information Theoretic | Section 8 | Lee and Xiang [2001], Noble and Cook [2003]

Table III. Examples of anomaly detection techniques used for network intrusion detection.

Technique Used | Section | References
Neural Networks | Section 4.1 | CARDWATCH [Aleskerov et al. 1997], Ghosh and Reilly [1994], Brause et al. [1999], Dorronsoro et al. [1997]
Rule-based Systems | Section 4.4 | Brause et al. [1999]
Clustering | Section 6 | Bolton and Hand [1999]

Table IV. Examples of anomaly detection techniques used for credit card fraud detection.

3.2.1 Credit Card Fraud Detection. In this domain, anomaly detection techniques are applied to detect fraudulent credit card applications or fraudulent credit card usage (associated with credit card thefts). Detecting fraudulent credit card applications is similar to detecting insurance fraud [Ghosh and Reilly 1994].

16 · Chandola, Banerjee and Kumar

processing system for unauthorized and illegal claims. Detection of such fraud has been very important for the associated companies to avoid financial losses.

The available data in this domain are the documents submitted by the claimants. The techniques extract different features (both categorical as well as continuous) from these documents. Typically, claim adjusters and investigators assess these claims for fraud. These manually investigated cases are used as labeled instances by supervised and semi-supervised techniques for insurance fraud detection.

Insurance claim fraud detection is quite often handled as a generic activity monitoring problem [Fawcett and Provost 1999]. Neural network based techniques have also been applied to identify anomalous insurance claims [He et al. 2003; Brockett et al. 1998].

3.2.4 Insider Trading Detection. Another recent application of anomaly detection techniques has been in the early detection of insider trading. Insider trading is a phenomenon found in stock markets, where people make illegal profits by acting on (or leaking) inside information before the information is made public. The inside information can be of different forms [Donoho 2004]. It could refer to the knowledge of a pending merger/acquisition, a terrorist attack affecting a particular industry, pending legislation affecting a particular industry, or any information which would affect the stock prices in a particular industry. Insider trading can be detected by identifying anomalous trading activities in the market. The available data comes from several heterogeneous sources such as option trading data, stock trading data, and news. The data has temporal associations since it is collected continuously. The temporal and streaming nature has also been exploited in certain techniques [Aggarwal 2005]. Anomaly detection techniques in this domain are required to detect fraud in an online manner and as early as possible, to prevent people/organizations from making illegal profits. Some anomaly detection techniques used in this domain are listed in Table VI.

Technique Used | Section | References
Statistical Profiling using Histograms | Section 7.2.1 | Donoho [2004], Aggarwal [2005]
Information Theoretic | Section 8 | Arning et al. [1996]

Table VI. Examples of different anomaly detection techniques used for insider trading detection.

3.3 Medical and Public Health Anomaly Detection

Anomaly detection in the medical and public health domains typically works with patient records. The data can have anomalies due to several reasons such as abnormal patient condition, instrumentation errors, or recording errors. Several techniques have also focused on detecting disease outbreaks in a specific area [Wong et al. 2003]. Thus anomaly detection is a critical problem in this domain and requires a high degree of accuracy. The data typically consists of records which may have several different types of features such as patient age, blood group, and weight. The data might also have


Technique Used | Section | References
Parametric Statistical Modeling | Section 7.1 | Horn et al. [2001], Laurikkala et al. [2000], Solberg and Lahti [2005], Roberts [2002], Suzuki et al. [2003]
Neural Networks | Section 4.1 | Campbell and Bennett [2001]
Bayesian Networks | Section 4.2 | Wong et al. [2003]
Rule-based Systems | Section 4.4 | Aggarwal [2005]
Nearest Neighbor based Techniques | Section 5 | Lin et al. [2005]

Table VII. Examples of different anomaly detection techniques used in medical and public health domain.

a temporal as well as a spatial aspect. Most of the current anomaly detection techniques in this domain aim at detecting anomalous records (point anomalies). Typically the labeled data belongs to the healthy patients, hence most of the techniques adopt a semi-supervised approach. Another form of data handled by anomaly detection techniques in this domain is time series data, such as Electrocardiograms (ECG) (Figure 4) and Electroencephalograms (EEG). Collective anomaly detection techniques have been applied to detect anomalies in such data [Lin et al. 2005]. The most challenging aspect of the anomaly detection problem in this domain is that the cost of classifying an anomaly as normal can be very high. Some anomaly detection techniques used in this domain are listed in Table VII.
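The effect of a high cost for classifying an anomaly as normal can be made concrete with a small worked example. The sketch below assumes that each record already has an anomaly score and a known label in some validation set, and that the two error types carry fixed costs. The scores, labels, cost values, and function names are all invented for illustration and do not come from any of the cited techniques.

```python
def total_cost(scores, labels, threshold, cost_fn=50.0, cost_fp=1.0):
    """Cost of flagging every record whose score is >= threshold."""
    cost = 0.0
    for score, is_anomaly in zip(scores, labels):
        flagged = score >= threshold
        if is_anomaly and not flagged:
            cost += cost_fn  # anomaly classified as normal (missed case)
        elif flagged and not is_anomaly:
            cost += cost_fp  # normal record flagged for review
    return cost


def best_threshold(scores, labels, **costs):
    """Pick the observed score that minimizes the total cost."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: total_cost(scores, labels, t, **costs))


# Hypothetical validation data: anomaly scores and ground-truth labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
labels = [False, False, False, True, False, False, True, True]

# A missed anomaly is assumed to be 50 times as costly as a false alarm.
print(best_threshold(scores, labels, cost_fn=50.0, cost_fp=1.0))  # 0.4
# With equal costs the chosen threshold is higher.
print(best_threshold(scores, labels, cost_fn=1.0, cost_fp=1.0))   # 0.7
```

Under the asymmetric costs the threshold drops so that no anomaly is missed, at the price of two extra false alarms. Under equal costs the detector tolerates one missed anomaly in exchange for no false alarms.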

3.4 Industrial Damage Detection

Industrial units suffer damage due to continuous usage and normal wear and tear. Such damage needs to be detected early to prevent further escalation and losses. The data in this domain is usually referred to as sensor data because it is recorded using different sensors and collected for analysis. Anomaly detection techniques have been extensively applied in this domain to detect such damage. Industrial damage detection can be further classified into two domains, one which deals with defects in mechanical components such as motors, engines, etc., and the other which deals with defects in physical structures. The former domain is also referred to as system health management.

3.4.1 Fault Detection in Mechanical Units. The anomaly detection techniques in this domain monitor the performance of industrial components such as motors, turbines, oil flow in pipelines or other mechanical components and detect defects which might occur due to wear and tear or other unforeseen circumstances. The data in this domain typically has a temporal aspect and time-series analysis is also used in some techniques [Keogh et al. 2002; Keogh et al. 2006; Basu and Meckesheimer 2007]. The anomalies occur mostly because of an observation in a specific context (contextual anomalies) or as an anomalous sequence of observations (collective anomalies). Typically, normal data (pertaining to components without defects) is readily available and hence semi-supervised techniques are applicable. Anomalies are required to be detected in an online fashion as preventive measures are required to be taken as soon as an anomaly occurs. Some anomaly detection techniques used in this domain are listed in Table VIII.
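As a rough illustration of the semi-supervised setting described above, the following sketch learns only from readings of defect-free components and flags a new reading when it lies far from all of them. The nearest-neighbor distance rule, the slack factor, and the two-feature readings (vibration, temperature) are assumptions chosen for brevity, not a method taken from the cited work. In practice the features would also need to be scaled to comparable ranges.

```python
import math


def nearest_distance(point, reference):
    """Euclidean distance from `point` to its closest reference reading."""
    return min(math.dist(point, r) for r in reference)


def fit_threshold(normal, slack=1.5):
    """Largest leave-one-out nearest-neighbor distance among the normal
    readings, multiplied by an assumed slack factor."""
    loo = [nearest_distance(p, normal[:i] + normal[i + 1:])
           for i, p in enumerate(normal)]
    return slack * max(loo)


def is_anomalous(reading, normal, threshold):
    """Flag a reading that is farther from all normal data than the threshold."""
    return nearest_distance(reading, normal) > threshold


# Hypothetical (vibration, temperature) readings from a healthy motor.
normal = [(1.0, 60.0), (1.1, 61.0), (0.9, 59.5), (1.05, 60.5), (0.95, 60.2)]
threshold = fit_threshold(normal)

print(is_anomalous((1.0, 60.3), normal, threshold))  # close to normal readings
print(is_anomalous((2.5, 75.0), normal, threshold))  # far from all of them
```

Because no labeled faults are needed, each incoming reading can be scored as it arrives, which fits the online requirement mentioned in the text.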


Technique Used | Section | References
Mixture of Models | Section 7.1.3 | Byers and Raftery [1998], Spence et al. [2001], Tarassenko [1995]
Regression | Section 7.1.2 | Chen et al. [2005], Torr and Murray [1993]
Bayesian Networks | Section 4.2 | Diehl and Hampshire [2002]
Support Vector Machines | Section 4.3 | Davy and Godsill [2002], Song et al. [2002]
Neural Networks | Section 4.1 | Augusteijn and Folkert [2002], Cun et al. [1990], Hazel [2000], Moya et al. [1993], Singh and Markou [2004]
Clustering | Section 6 | Scarth et al. [1995]
Nearest Neighbor based Techniques | Section 5 | Pokrajac et al. [2007], Byers and Raftery [1998]

Table X. Examples of anomaly detection techniques used in image processing domain.

Technique Used | Section | References
Mixture of Models | Section 7.1.3 | Baker et al. [1999]
Statistical Profiling using Histograms | Section 7.2.1 | Fawcett and Provost [1999]
Support Vector Machines | Section 4.3 | Manevitz and Yousef [2002]
Neural Networks | Section 4.1 | Manevitz and Yousef [2000]
Clustering Based | Section 6 | Allan et al. [1998], Srivastava and Zane-Ulman [2005], Srivastava [2006]

Table XI. Examples of anomaly detection techniques used for anomalous topic detection in text data.

ous attributes such as color, lightness, texture, etc. The interesting anomalies are either anomalous points or regions in the images (point and contextual anomalies). One of the key challenges in this domain is the large size of the input. When dealing with video data, online anomaly detection techniques are required. Some anomaly detection techniques used in this domain are listed in Table X.

3.6 Anomaly Detection in Text Data

Anomaly detection techniques in this domain primarily detect novel topics, events, or news stories in a collection of documents or news articles. The anomalies are caused by a new interesting event or an anomalous topic. The data in this domain is typically high dimensional and very sparse. The data also has a temporal aspect since the documents are collected over time. A challenge for anomaly detection techniques in this domain is to handle the large variations in documents belonging to one category or topic. Some anomaly detection techniques used in this domain are listed in Table XI.
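One simple way to picture novel-topic detection on sparse, high-dimensional text is sketched below. It assumes a bag-of-words representation stored as sparse term counts and treats a document as novel when its cosine similarity to every previously seen document falls below a fixed threshold. The threshold, the whitespace tokenizer, and the example headlines are illustrative assumptions rather than details from the surveyed techniques.

```python
import math
from collections import Counter


def vectorize(text):
    """Sparse term-count vector: only words that occur are stored."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def is_novel(text, seen, threshold=0.3):
    """True if the text is dissimilar to every document seen so far."""
    vec = vectorize(text)
    return all(cosine(vec, old) < threshold for old in seen)


# Hypothetical documents collected earlier in time.
seen = [vectorize("stock market falls as interest rates rise"),
        vectorize("central bank raises interest rates again")]

print(is_novel("interest rates rise again as market falls", seen))  # known topic
print(is_novel("volcano eruption forces island evacuation", seen))  # new topic
```

A single fixed threshold is a weak point of this sketch: documents on one topic can vary widely in wording, which is the challenge noted in the paragraph above.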

3.7 Sensor Networks

Sensor networks have lately become an important topic of research, particularly from the data analysis perspective, since the sensor data collected from various wireless sensors has several unique characteristics. Anomalies in data collected from a sensor


Technique Used | Section | References
Bayesian Networks | Section 4.2 | Janakiram et al. [2006]
Rule-based Systems | Section 4.4 | Branch et al. [2006]
Parametric Statistical Modeling | Section 7.1 | Phuong et al. [2006], Du et al. [2006]
Nearest Neighbor based Techniques | Section 5 | Subramaniam et al. [2006], Kejia Zhang and Li [2007], Idé et al. [2007]
Spectral | Section 9 | Chatzigiannakis et al. [2006]

Table XII. Examples of anomaly detection techniques used for anomaly detection in sensor networks.

network can either mean that one or more sensors are faulty, or they are detecting events (such as intrusions) that are interesting for analysts. Thus anomaly detection in sensor networks can capture sensor fault detection or intrusion detection or both. A single sensor network might consist of sensors that collect different types of data, such as binary, discrete, continuous, audio, video, etc. The data is generated in a streaming mode. Often the environment in which the various sensors are deployed, as well as the communication channel, induces noise and missing values in the collected data. Anomaly detection in sensor networks poses a set of unique challenges. The anomaly detection techniques are required to operate in an online manner. Due to severe resource constraints, the anomaly detection techniques need to be lightweight. Another challenge is that data is collected in a distributed fashion, and hence a distributed data mining approach is required to analyze the data [Chatzigiannakis et al. 2006]. Moreover, the presence of noise in the data collected from the sensor makes anomaly detection more challenging, since a technique now has to distinguish between interesting anomalies and unwanted noise/missing values. Table XII lists some anomaly detection techniques used in this domain.
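To make the online and lightweight requirements concrete, the following sketch keeps only three numbers per sensor and updates them one reading at a time using Welford's running mean and variance. The three-standard-deviation rule, the warm-up length, the choice to skip missing readings, and the sample stream are assumptions made for illustration. The sketch handles a single sensor and does not address the distributed aspect.

```python
import math


class StreamingDetector:
    """Constant-memory detector based on Welford's running mean and variance."""

    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0        # number of readings absorbed so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean
        self.threshold = threshold
        self.warmup = warmup

    def update(self, value):
        """Return True if `value` is flagged; missing readings (None) are skipped."""
        if value is None:
            return False
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(value - self.mean) > self.threshold * std
        if not flagged:
            # Only unflagged readings update the statistics, so a single
            # spike does not inflate the estimated spread.
            self.n += 1
            delta = value - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (value - self.mean)
        return flagged


# Hypothetical temperature stream with one missing value and one spike.
detector = StreamingDetector()
readings = [20.1, 20.3, None, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0]
flags = [detector.update(r) for r in readings]
print(flags)
```

Only the reading of 35.0 is flagged. Whether such a spike reflects a faulty sensor or a genuine event cannot be decided from one stream alone, which is the ambiguity the paragraph above points out.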

3.8 Other Domains

Anomaly detection has also been applied to several other domains such as speech recognition [Albrecht et al. 2000; Emamian et al. 2000], novelty detection in robot behavior [Crook and Hayes 2001; Crook et al. 2002; Marsland et al. 1999; 2000b; 2000a], traffic monitoring [Shekhar et al. 2001], click through protection [Ihler et al. 2006], detecting faults in web applications [Ide and Kashima 2004; Sun et al. 2005], detecting anomalies in biological data [Kadota et al. 2003; Sun et al. 2006; Gwadera et al. 2005a; MacDonald and Ghosh 2007; Tomlins et al. 2005; Tibshirani and Hastie 2007], detecting anomalies in census data [Lu et al. 2003], detecting associations among criminal activities [Lin and Brown 2003], detecting anomalies in Customer Relationship Management (CRM) data [He et al. 2004b], detecting anomalies in astronomical data [Dutta et al. 2007; Escalante 2005; Protopapas et al. 2006] and detecting ecosystem disturbances [Blender et al. 1997; Kou et al. 2006; Sun and Chawla 2004].

4. CLASSIFICATION BASED ANOMALY DETECTION TECHNIQUES

Classification [Tan et al. 2005; Duda et al. 2000] is used to learn a model (classifier) from a set of labeled data instances (training) and then, classify a test instance into