ISYE6501 glossary in alphabetical, Study notes of Data Mining

Typology: Study notes

2024/2025

Uploaded on 04/17/2025

momo_momo

Definitions in this document are meant to be in the context of ISYE 6501 only. Some of these terms have other definitions beyond the scope of this course. Many of these terms have precise mathematical definitions that are not included (or are only glossed over) here, because they are beyond the scope of the course.

GLOSSARY FOR ISYE 6501 INTRODUCTION TO ANALYTICS MODELING
(Organized alphabetically; for topic-by-topic glossary, see other file)

Letters covered: 1, 2, A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, Z

1-norm Similar to rectilinear distance; measures the length of a vector from the origin as the sum of its lengths in each dimension. If $z = (z_1, z_2, \ldots, z_m)$ is a vector in an $m$-dimensional space, then its 1-norm is $\sqrt[1]{|z_1|^1 + |z_2|^1 + \cdots + |z_m|^1} = |z_1| + |z_2| + \cdots + |z_m| = \sum_{i=1}^{m} |z_i|$.

2-norm Similar to Euclidean distance; measures the straight-line length of a vector from the origin. If $z = (z_1, z_2, \ldots, z_m)$ is a vector in an $m$-dimensional space, then its 2-norm is $\sqrt[2]{(z_1)^2 + (z_2)^2 + \cdots + (z_m)^2} = \sqrt[2]{\sum_{i=1}^{m} (z_i)^2}$.
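A minimal Python sketch of the 1-norm and 2-norm formulas above, assuming a vector is stored as a plain list of numbers (the function names `one_norm` and `two_norm` are illustrative, not from the course):

```python
import math

def one_norm(z):
    """Sum of the absolute values of the components of z."""
    return sum(abs(zi) for zi in z)

def two_norm(z):
    """Straight-line length of z: square root of the sum of squared components."""
    return math.sqrt(sum(zi ** 2 for zi in z))

z = [3, -4]
print(one_norm(z))  # 7
print(two_norm(z))  # 5.0
```

The same vector has different "lengths" depending on the norm: walking along the axes (1-norm) is longer than the straight line (2-norm).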

A/B testing Test of two alternatives to see if either one leads to better outcomes.

Accuracy Fraction of data points correctly classified by a model; equal to $\frac{TP + TN}{TP + FP + TN + FN}$.

Action In ARENA, something that is done to an entity.

Additive seasonality Seasonal effect that is added to a baseline value (for example, “the temperature in June is 10 degrees above the annual baseline”).

Adjusted R-squared/Adjusted $R^2$

Variant of $R^2$ that encourages simpler models by penalizing the use of too many variables.

AIC Akaike information criterion

Akaike information criterion (AIC)

Model selection technique that trades off between model fit and model complexity. When comparing models, the model with lower AIC is preferred. Generally penalizes complexity less than BIC.

Algorithm Step-by-step procedure designed to carry out a task.

Analysis of Variance/ANOVA Statistical method for dividing the variation in observations among different sources.

Approximate dynamic program

Dynamic programming model where the value functions are approximated.

Arc Connection between two nodes/vertices in a network. In a network model, there is a variable for each arc, equal to the amount of flow on the arc, and (optionally) a capacity constraint on the arc’s flow. Also called an edge.

Area under curve/AUC Area under the ROC curve; an estimate of the classification model’s accuracy. Also called concordance index.
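One way to compute the AUC directly is to compare every data point in the category with every data point not in it, and count how often the model gives the in-category point the higher score (ties count as one half); that fraction equals the area under the ROC curve. The sketch below assumes the model outputs a numeric score per data point and that labels are 1 (in the category) or 0 (not); `auc` is a hypothetical helper, and the pairwise loop is written for clarity rather than speed.

```python
def auc(scores, labels):
    """Fraction of (in-category, not-in-category) pairs ranked correctly by score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0   # pair ordered correctly
            elif p == n:
                total += 0.5   # tie counts as half
    return total / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 is what random guessing would give on average; 1.0 means every in-category point outscored every other point.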

ARIMA Autoregressive integrated moving average.

Arrival rate Expected number of arrivals of people, things, etc. per unit time -- for example, the expected number of truck deliveries per hour to a warehouse.

Binary variable Variable that can take just two values: 0 and 1.

Binomial distribution Discrete probability distribution for the exact number of successes, $k$, out of a total of $n$ iid Bernoulli trials, each with probability $p$: $\Pr(k) = \binom{n}{k} p^k (1 - p)^{n-k}$.
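The binomial formula translates almost directly into code. A minimal sketch using Python's standard `math.comb` for $\binom{n}{k}$ (the helper name `binomial_pmf` is illustrative):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n iid Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 2 heads in 4 fair coin flips: 6 * 0.5^2 * 0.5^2
print(binomial_pmf(2, 4, 0.5))  # 0.375
```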

Blocking Factor introduced to an experimental design that interacts with the effect of the factors to be studied. The effect of the factors is studied within the same level (block) of the blocking factor.

Box and whisker plot Graphical representation of data showing the middle range of data (the “box”), reasonable ranges of variability (“whiskers”), and points (possible outliers) outside those ranges.

Box-Cox transformation Transformation of a non-normally-distributed response to an approximately normal distribution.
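The glossary does not give a formula, so the sketch below assumes the standard one-parameter Box-Cox transformation of a positive response $y$: $(y^\lambda - 1)/\lambda$ when $\lambda \ne 0$, and $\log y$ when $\lambda = 0$. In practice $\lambda$ is usually estimated from the data (for example, by maximum likelihood), which is not shown here; `box_cox` is a hypothetical helper.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if lam == 0:
        return math.log(y)          # limiting case as lam -> 0
    return (y ** lam - 1) / lam

print([box_cox(y, 0.5) for y in [1, 4, 9]])  # [0.0, 2.0, 4.0]
```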

Branching Splitting a set of data into two or more subsets, to each be analyzed separately.

CART Classification and regression trees.

Categorical data Data that classifies observations without quantitative meaning (for example, colors of cars) or where quantitative amounts are categorized (for example, “0-10, 11-20, …”).

Causation Relationship in which one thing makes another happen (i.e., one thing causes another).

Chance constraint A probability-based constraint. For example, a standard linear constraint might be $Ax \le b$. A similar chance constraint might be $\Pr(Ax \le b) \ge 0.95$.

Change detection Identifying when a significant change has taken place in a process.

Classification The separation of data into two or more categories, or (a point’s classification) the category a data point is put into.

Classification tree Tree-based method for classification. After branching to split the data, each subset is analyzed with its own classification model.

Classifier A boundary that separates the data into two or more categories. Also (more generally) an algorithm that performs classification.

Clique A set of nodes where each pair is connected by an arc.

Cluster A group of points identified as near/similar to each other.

Cluster center In some clustering algorithms (like $k$-means clustering), the central point (often the centroid) of a cluster of data points.

Clustering Separation of data points into groups (“clusters”) based on nearness/similarity to each other. A common form of unsupervised learning.

Collective outlier A set of data points that is (uncommonly) different from others – for example, a missing heartbeat in an electrocardiogram; we don’t know exactly which millisecond it should’ve happened in, but collectively there’s a set of milliseconds that it’s missing from.

Concave function A function $f()$ where for every two points $x$ and $y$, $f(cx + (1 - c)y) \ge cf(x) + (1 - c)f(y)$ for all $c$ between 0 and 1. In two dimensions, this means if the points $(x, f(x))$ and $(y, f(y))$ are connected with a straight line, the line is always below [or equal to] the function's curve between those two points. If $f()$ is concave, then $-f()$ is convex.

Concordance index Area under the ROC curve; an estimate of the classification model’s accuracy. Also called AUC.

Confusion matrix Visualization of classification model performance.
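The confusion matrix holds four counts (true positives, false positives, true negatives, false negatives), and many measures in this glossary (accuracy, hit rate, fall out, miss rate, precision, diagnostic odds ratio, positive likelihood ratio) are ratios of them. A minimal sketch, assuming labels are 1 for "in the category" and 0 otherwise; the helper names and the example labels are hypothetical, and the ratios raise `ZeroDivisionError` if a denominator count is zero.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for 0/1 labels, where 1 means 'in the category'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "hit_rate": tp / (tp + fn),        # true positive rate, sensitivity, recall
        "fall_out": fp / (tn + fp),        # false positive rate
        "miss_rate": fn / (tp + fn),       # false negative rate
        "precision": tp / (tp + fp),       # positive predictive value
        "specificity": tn / (tn + fp),
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
        "positive_likelihood_ratio": (tp / (tp + fn)) / (fp / (fp + tn)),
    }

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)                                      # 3 1 3 1
print(classification_metrics(tp, fp, tn, fn)["accuracy"])  # 0.75
```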

Constant A number that remains the same.

Constraint Part of an optimization model that describes a restriction on the solution (the values of the variables).

Contextual outlier A data point that is (uncommonly) far from other data points related to it – for example, in Atlanta, a 90-degree (Fahrenheit) day in winter is an outlier, but a 90-degree day in summer is not.

Continuous-time simulation A simulation that models a system continuously, at every instant of time; continuous-time simulation models are often based on differential equations.

Control (1) A variable whose value remains constant for all runs of an experiment, so changes in this variable don’t affect the experiment. (2) To design an experiment where some factors (“controls” by definition (1)) are held constant so they don't affect the outcome.

Convex function A function $f()$ where for every two points $x$ and $y$, $f(cx + (1 - c)y) \le cf(x) + (1 - c)f(y)$ for all $c$ between 0 and 1. In two dimensions, this means if the points $(x, f(x))$ and $(y, f(y))$ are connected with a straight line, the line is always above [or equal to] the function's curve between those two points. If $f()$ is convex, then $-f()$ is concave.
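The convexity inequality can be spot-checked numerically by trying many combinations of $x$, $y$, and $c$. This is only a sketch (the helper `find_convexity_violation` is hypothetical): finding a violation proves a function is not convex, but finding none on a finite sample does not prove that it is. Passing `lambda t: -f(t)` checks concavity of `f`.

```python
def find_convexity_violation(f, points, steps=11, tol=1e-9):
    """Return an (x, y, c) where f(c*x + (1-c)*y) > c*f(x) + (1-c)*f(y), or None.

    tol absorbs floating-point rounding. None only means no sampled
    combination broke the inequality; it does not prove convexity.
    """
    for x in points:
        for y in points:
            for i in range(steps):
                c = i / (steps - 1)            # c runs from 0 to 1 in equal steps
                lhs = f(c * x + (1 - c) * y)   # function value between x and y
                rhs = c * f(x) + (1 - c) * f(y)  # straight line between the points
                if lhs > rhs + tol:
                    return (x, y, c)
    return None

print(find_convexity_violation(lambda t: t * t, [-2, -1, 0, 1, 3]))          # None
print(find_convexity_violation(lambda t: -t * t, [-2, 0, 2]) is not None)    # True
```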

Convex hull (of a set of points) Smallest convex shape that the set of points is contained in.

Convex optimization model An optimization model where the objective function is to minimize a convex function (or maximize a concave function) and the constraints define a convex set of feasible solutions.

each run.

Detrending Removal of trend, such as a change in the mean over time, from time-series data.

Diagnostic odds ratio Ratio of the odds that a data point in a certain category is correctly classified by a model, to the odds that a data point not in that category is incorrectly classified by the model; equal to $\frac{TP/FN}{FP/TN} = \frac{TP \times TN}{FP \times FN}$.

Diet problem Classical optimization model for finding the least-costly set of foods that meets all dietary requirements.

Differencing Using the difference of successive values in time series data, rather than the values themselves. Sometimes nonstationary data will have stationary differences.
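A small sketch of differencing, using a made-up series whose mean keeps rising (so it is nonstationary); the helper name `difference` and the `lag` parameter are illustrative. Differencing once still leaves a trend here, but differencing twice gives a constant series.

```python
def difference(series, lag=1):
    """Return x[t] - x[t - lag] for every t where the earlier value exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

trending = [1, 3, 6, 10, 15, 21]   # hypothetical data with a growing trend
first = difference(trending)       # [2, 3, 4, 5, 6]: still trending
second = difference(first)         # [1, 1, 1, 1]: no trend left
print(first, second)
```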

Dimension A feature of the data points (for example, height or credit score). (Note that there is also a mathematical definition for this word.)

Discrete-event simulation A simulation that models a system that changes when specific events occur.

Distance How far it is between two points -- but there are different ways to measure it (see Minkowski distance).

Distribution-fitting Determining whether a set of data seems to follow a certain probability distribution, or determining which of several distributions the data is close to.

Double exponential smoothing

Two-parameter exponential smoothing technique that incorporates trend.

Dynamic programming Optimization approach that involves making a sequence of decisions over time, based on the current state of a system.

Earth Name of many implementations of the multivariate adaptive regression splines (MARS) model, because “MARS” is a trademark.

Edge Connection between two nodes/vertices in a network. In a network model, there is a variable for each edge, equal to the amount of flow on the edge, and (optionally) a capacity constraint on the edge’s flow. Also called an arc.

Eigenvalue Amount by which an eigenvector gets rescaled in a linear transformation.

Eigenvector Non-zero vector that does not change direction when a linear transformation is applied to it, but only gets rescaled by the eigenvalue.
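A quick check of the eigenvector/eigenvalue relationship $Av = \lambda v$ on a hypothetical 2×2 matrix: applying the matrix to each eigenvector just rescales it by its eigenvalue. The matrix and the helper `mat_vec` are illustrative; no linear-algebra library is assumed.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

A = [[2, 1],
     [1, 2]]

# (eigenvector, eigenvalue) pairs for this particular A
for v, lam in [([1, 1], 3), ([1, -1], 1)]:
    Av = mat_vec(A, v)
    scaled = [lam * vi for vi in v]
    print(v, Av, Av == scaled)   # same direction, only rescaled
```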

Elastic net Combination of lasso and ridge regression.

Elbow diagram A graph of improvement in function value as something else (e.g., number of clusters) increases or decreases; the spot where improvement levels out is the "elbow".

EM algorithm Expectation-maximization algorithm.

Empirical Bayes model Model that uses Bayes' theorem to update an initial guess/distribution based on observed data.

Entity A person/thing moving through a simulation.

Error (per data point) The difference (or absolute difference, squared difference, or other measure) between the estimate of a piece of data and its true value.

Error (total over data set) The total of all errors in a data set.

Euclidean distance/straight-line distance

The length of a straight line (the 2-norm distance) between two points. If $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$ are two points in an $m$-dimensional space, then the Euclidean distance between them is $\sqrt[2]{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_m - y_m)^2} = \sqrt[2]{\sum_{i=1}^{m} (x_i - y_i)^2}$.

Expectation-maximization algorithm (EM algorithm)

General description of an algorithm with two steps (often iterated), one that finds the function for the expected likelihood of getting the response given current parameters, and one that finds new parameter values to maximize that probability.

Exploitation Using known information to get good outcomes.

Exploration Finding new/better/more information to determine how to optimize output.

Exponential distribution A continuous probability distribution of the time between events: $f(x) = \lambda e^{-\lambda x}$. If the number of events in a fixed time follows the Poisson distribution, then the time between them has the exponential distribution. The exponential distribution has the memoryless property.
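A sketch of the exponential density and its memoryless property, assuming a rate $\lambda$ and using the standard fact that $\Pr(T > t) = e^{-\lambda t}$. Having already waited $s$, the chance of waiting at least $t$ more is the same as the chance of waiting at least $t$ from the start. Function names and the numbers are illustrative.

```python
import math

def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0 (zero for negative x)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_survival(t, lam):
    """Pr(T > t): probability that the waiting time exceeds t."""
    return math.exp(-lam * t)

lam, s, t = 0.5, 2.0, 3.0
# Pr(T > s + t | T > s) versus Pr(T > t): equal up to rounding
print(exponential_survival(s + t, lam) / exponential_survival(s, lam))
print(exponential_survival(t, lam))
```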

Exponential smoothing Data smoothing technique in which older observations are assigned exponentially decreasing weights, so more emphasis is given to recent observations.
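A minimal sketch of single exponential smoothing, assuming the common form $S_t = \alpha x_t + (1 - \alpha) S_{t-1}$ with the starting value $S_1 = x_1$ (the initialization is an assumption, and `exponential_smoothing` is a hypothetical helper). Unrolling the recursion shows each older observation's weight shrinks by another factor of $(1 - \alpha)$.

```python
def exponential_smoothing(xs, alpha):
    """Single exponential smoothing with S_1 = x_1 and 0 < alpha <= 1."""
    smoothed = [xs[0]]
    for x in xs[1:]:
        # New estimate: blend the new observation with the previous estimate.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([10, 20, 20, 20], 0.5))  # [10, 15.0, 17.5, 18.75]
```

A larger $\alpha$ reacts faster to changes; a smaller $\alpha$ smooths out more randomness.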

Factorial design Tests of different combinations of factor values over multiple factors, to find each one’s effect, and interaction effects, on the outcome.

Fall out Fraction of data points not in a certain category that are incorrectly classified by a model; equal to $\frac{FP}{TN + FP}$. Also called false positive rate.

False negative (FN) Data point that a model incorrectly classifies as not being in a certain category. (“Negative” means the model classified it as not being in the category, and “False” means the model’s classification is incorrect.) Sometimes abbreviated as “FN”.

to find each one’s effect, and interaction effects, on the outcome.

Game theory The study of competitive strategic decision-making where the outcome of each participant’s actions is dependent on another participant’s actions.

GARCH Generalized autoregressive conditional heteroscedasticity.

Generalized autoregressive conditional heteroscedasticity (GARCH)

Autoregressive method used to model variance in time series data.

Geometric distribution Discrete probability distribution of the number of iid Bernoulli trials, each with success probability $p$, before the first success: $\Pr(k) = (1 - p)^k p$. Also can be defined as the total number of trials through the first success (so $\Pr(k) = (1 - p)^{k-1} p$). To find the number of trials before the first failure, a similar distribution would be $\Pr(k) = p^k (1 - p)$.
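The two conventions above differ only by a shift of one: "$k$ failures before the first success" is the same event as "first success on trial $k + 1$". A small sketch (function names are illustrative):

```python
def geometric_pmf_before(k, p):
    """Pr(exactly k failures before the first success), k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

def geometric_pmf_through(k, p):
    """Pr(first success happens on trial k), k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.25
print(geometric_pmf_before(2, p), geometric_pmf_through(3, p))  # same value
print(sum(geometric_pmf_before(k, p) for k in range(200)))      # very close to 1
```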

Global optimum/maximum/minimum

A solution that achieves the best objective value among all of the feasible solutions; sometimes also used to refer to the best objective value achievable among a set of feasible solutions.

Graph Among other definitions, another name for a network.

Greedy algorithm Algorithm that makes the immediately-best choice at each step.

Heteroscedasticity When the variability of a response is different across the range of predictor values.

Heuristic Algorithm that is not guaranteed to find the absolute best (optimal) solution.

Hit rate Fraction of data points in a certain category that are correctly classified by a model; equal to $\frac{TP}{TP + FN}$; also called the true positive rate, sensitivity, and recall.

Holt-Winters method/Winters' method

Three-parameter exponential smoothing technique that incorporates trend and seasonality; also called triple exponential smoothing.

Hypothesis test Statistical test to determine the probability that a property of a sample of data is true for the whole population.

iid Independent and identically distributed.

Improving direction Vector of changes to a solution to an optimization problem, such that the objective function gets better when moving the solution some distance in the vector's direction.

Imputation Inserting values where data is missing.

Independent A is "independent" of B if the probability or probability distribution of A is not affected by B. For example, whether a coin flip is heads or tails is (I assume) independent of the number of fish in the ocean exactly 100 years ago to this day, but the temperature today is not independent of the temperature yesterday (if it was hot yesterday, it's more likely to be hot today too, etc.).

Independent and identically distributed (iid)

Things that follow the same probability distribution, including the same parameter(s), and whose values are independent of each other. For example, multiple flips of the same coin are iid.

Infinity-norm Specific case of the $p$-norm when $p = \infty$. Sounds weird, but it just reduces to the largest absolute value among the dimensions. If $z = (z_1, z_2, \ldots, z_m)$ is a vector in an $m$-dimensional space, then its $\infty$-norm is $\max_i |z_i|$. If $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$ are two points in an $m$-dimensional space, then the $\infty$-norm distance between them is $\max_i |x_i - y_i|$.

Initialization Setting starting values in an algorithm, or setting the first solution value for a "direction/step-size" optimization algorithm.

Integer program Optimization model where the objective function is a linear function of the variables, the constraints are linear equations and/or linear inequalities in terms of the variables, and some or all variables are restricted to have integer values.

Interaction term Variable in a model that is the combination of two or more other variables; for example, if $x_1$ and $x_2$ are variables, $(x_1 x_2)$ is an interaction term/interaction variable.

Interarrival time The time between two consecutive arrivals of people, things, etc. -- for example, the time between consecutive phone calls to a service hotline.

Iterate Repeat the same steps of a process.

k-fold cross-validation Validation technique where data is divided into several parts (“folds”), and each part is used to validate a model fit to the remaining parts. Often a more robust validation approach than splitting data into training and validation sets.
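A minimal sketch of how the data indices can be split into $k$ folds, each held out once for validation while the rest are used for fitting. It assigns points to folds round-robin and assumes the data has already been shuffled; `k_fold_splits` is a hypothetical helper, and the model fitting itself is not shown.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for each of k folds over n points."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin fold assignment
    for fold in folds:
        held_out = set(fold)
        train = [j for j in range(n) if j not in held_out]
        yield train, fold

for train, val in k_fold_splits(10, 3):
    print("validate on", val, "train on", len(train), "points")
```

Every data point is used for validation exactly once, which is why the averaged result is usually more robust than a single train/validation split.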

$k$-means algorithm Clustering algorithm that defines $k$ clusters of data points, each corresponding to one of $k$ cluster centers selected by the algorithm.
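A minimal sketch of one common version of $k$-means (alternating an assignment step and a center-update step until the centers stop moving), assuming points are coordinate tuples and starting from $k$ randomly chosen data points. The function name, the seed, and the example points are hypothetical.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Return (centers, clusters); clusters[j] holds the points that the last
    assignment step gave to center j."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k randomly chosen data points as starting centers
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to the centroid (mean) of its cluster;
        # an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so assignments are stable
            break
        centers = new_centers
    return centers, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(pts, 2)
print(centers)
```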

$k$-Nearest-Neighbor (KNN) Classification algorithm that defines a data point’s category as a function of the nearest $k$ data points to it.

$k$-Nearest-Neighbor regression

Regression model where a data point’s response is estimated based on the responses of the $k$ nearest data points with known response.
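A minimal sketch of both KNN variants, assuming Euclidean distance, a majority vote for classification, and a plain average for regression (other distance measures and weightings are possible). The helpers and the tiny data sets are hypothetical.

```python
import math
from collections import Counter

def nearest(train, query, k):
    """The k (point, value) pairs in train closest to query by Euclidean distance."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k):
    """Majority vote among the k nearest labels; on a tie, the label whose
    first appearance is nearest wins (Counter keeps insertion order)."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Average response of the k nearest points."""
    neighbors = nearest(train, query, k)
    return sum(y for _, y in neighbors) / len(neighbors)

labeled = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(labeled, (0.5, 0.5), 3))   # A

responses = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 50.0)]
print(knn_regress(responses, (1.2,), 2))      # 2.5
```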

Kendall notation Notation to describe various types of queuing models -- for example,

estimate a response between 0 and 1: $y = \frac{1}{1 + e^{-(a_0 + \sum_{i=1}^{m} a_i x_i)}}$. Also called a logistic regression.
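A minimal sketch of evaluating the logistic response $y = 1/(1 + e^{-(a_0 + \sum_i a_i x_i)})$ for given coefficients (fitting the coefficients is not shown). The helper name is hypothetical; the two branches are algebraically equal and only avoid overflow in `math.exp` for large negative inputs.

```python
import math

def logistic_response(a0, coefs, x):
    """y = 1 / (1 + exp(-(a0 + sum_i a_i * x_i))), always between 0 and 1."""
    z = a0 + sum(a * xi for a, xi in zip(coefs, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)       # z < 0, so this cannot overflow
    return e / (1.0 + e)  # same value as the formula above

print(logistic_response(0.0, [1.0, -2.0], [2.0, 1.0]))  # z = 0, so 0.5
```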

Louvain algorithm Algorithm for finding highly-connected communities in networks.

Lower tail Lowest-value part of a distribution.

Machine Apparatus that can do something; in “machine learning”, it often refers to both an algorithm and the computer it’s run on. (Fun fact: before computers were developed, the term “computers” referred to people who did calculations quickly in their heads or on paper!)

Machine learning Use of computer algorithms to learn and discover patterns or structure in data, without being programmed specifically for them.

Manhattan distance The sum of the lengths in each dimension between two points (the 1-norm distance). If $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$ are two points in an $m$-dimensional space, then the Manhattan distance between them is $\sqrt[1]{|x_1 - y_1|^1 + |x_2 - y_2|^1 + \cdots + |x_m - y_m|^1} = |x_1 - y_1| + |x_2 - y_2| + \cdots + |x_m - y_m| = \sum_{i=1}^{m} |x_i - y_i|$. Also called rectilinear or 1-norm distance.

Mann-Whitney test Nonparametric test to determine whether medians of two independent or unpaired samples (possibly of different size) are the same. Also called Wilcoxon rank-sum test.

Margin For a single point, the distance between the point and the classification boundary; for a set of points, the minimum distance between a point in the set and the classification boundary. Also called the separation.

Markov chain Process where a system changes its state in a way that depends only on its current state.
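A minimal sketch of a Markov chain on a hypothetical two-state system: the next state's probabilities are computed from the current state's probabilities and a transition matrix only, with no reference to earlier history. Repeating the step shows the distribution settling toward a steady state. The matrix values are made up for illustration.

```python
def next_distribution(dist, P):
    """One step of a Markov chain: new[j] = sum over i of dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical transitions; row i holds the probabilities of moving from state i.
P = [[0.9, 0.1],
     [0.5, 0.5]]
dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(100):
    dist = next_distribution(dist, P)
print(dist)  # settles near [5/6, 1/6]
```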

Markov decision process Markov chain model where decisions are made at some states, and state transitions have associated rewards.

MARS Multivariate adaptive regression splines.

Mathematical programming Mathematical optimization, often using variables, constraints, and objective function.

Maximization problem Optimization model where the objective is to find the feasible solution that maximizes the value of the objective function.

Maximum flow problem Network optimization model that finds the most flow that can be sent from one specific node to another.

Maximum likelihood A method that finds the set of parameter values for which a model is most likely to generate the actual values of the data.

McNemar's test Nonparametric test for comparing paired samples where the output is yes/no (or A/B, or 0/1, etc.).

Memoryless (distribution) Probability distributions where the past history of outcomes does not influence the probability of the outcome of future events. The exponential and geometric distributions have this property.

Memoryless (Markov chain) Property that the next state of the system is dependent only on the current state, not any previous states.

Minimization problem Optimization model where the objective is to find the feasible solution that minimizes the value of the objective function.

Minkowski distance (of order $p$)

The $p$-norm distance between two points. If $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$ are two points in an $m$-dimensional space, then the Minkowski distance of order $p$ between them is $\sqrt[p]{|x_1 - y_1|^p + |x_2 - y_2|^p + \cdots + |x_m - y_m|^p} = \sqrt[p]{\sum_{i=1}^{m} |x_i - y_i|^p}$.
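A minimal sketch of the Minkowski distance of order $p$, showing that $p = 1$ gives the Manhattan/rectilinear distance, $p = 2$ the Euclidean distance, and that large $p$ approaches the infinity-norm distance (the largest single-dimension difference). The function name is illustrative; `math.inf` is used to request the infinity-norm directly.

```python
import math

def minkowski_distance(x, y, p):
    """p-norm distance between points x and y; p = math.inf gives the infinity-norm distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = (0, 0), (3, 4)
print(minkowski_distance(x, y, 1))         # 7.0  (Manhattan / rectilinear)
print(minkowski_distance(x, y, 2))         # 5.0  (Euclidean)
print(minkowski_distance(x, y, 50))        # just above 4
print(minkowski_distance(x, y, math.inf))  # 4    (infinity-norm)
```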

Misclassified Put into the wrong category by a classifier.

Miss rate Fraction of data points in a certain category that are incorrectly classified by a model; equal to $\frac{FN}{TP + FN}$. Also called false negative rate.

Missing data Values of data that are missing from a data set.

Mixed strategy/randomized strategy

A strategy where a participant’s action is determined randomly according to probabilities – for example, in “rock, paper, scissors”, someone who randomly chooses between the three options with probability 1 / 3 each is using a mixed strategy.

Model (mathematical) A mathematical description of a system. Because real-life systems are complex, mathematical models of them are only approximate. In analytics, the term “model” is used in at least three different ways: (1) A general type of mathematical approach, like “regression”; (2) A general type of mathematical approach with specific parameters, like “regression using credit score and income as predictors”; (3) A general type of mathematical approach with specific parameters and values for the parameters, like “regression, with the prediction equal to 100,000, plus 100 times credit score, plus 3 times income”.

Modularity Measure of how densely connected the nodes within communities of a network are, compared with the connections between communities.

Module In ARENA, a building-block of a simulation, or the process, resource, etc. it represents.

Most optimal Please don't say this (or "more optimal"). "Optimal" means "best", and "most best" or "more best" are not proper English.

Moving average Smoothing technique that replaces data values with the mean of a number of consecutive observed values.
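A minimal sketch of a trailing moving average, where each smoothed value is the mean of the current observation and the previous `window - 1` observations (centered versions also exist; this one is an assumption). The helper name and data are illustrative.

```python
def moving_average(xs, window):
    """Trailing moving average: mean of each run of `window` consecutive values."""
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

print(moving_average([2, 4, 6, 8, 10], 3))  # [4.0, 6.0, 8.0]
```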

Normal distribution Continuous probability distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$. Model error is often assumed to be normally distributed (for example, in linear regression).
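The normal density formula written out directly, and cross-checked against Python's standard-library `statistics.NormalDist` (available since Python 3.8). The helper name `normal_pdf` is illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The hand-written formula and the library agree up to rounding.
print(normal_pdf(1.0, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(1.0))
```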

Objective function Part of an optimization model that measures the quality of a solution (the values of the variables).

Observation (1) A measurement of one attribute of a data point. (2) A measurement of all attributes of a data point (i.e., a full row of data). (3) The act of watching/measuring/recording something.

Optimal Best possible, while satisfying all constraints.

Optimal solution A solution that satisfies a set of constraints, and has the best-possible objective value.

Optimization Finding the values of variables/decisions that yield the best value of an objective function while satisfying a set of constraints (restrictions).

Order of magnitude The relative size of something, often denoted by multiples of 10, so that the difference in the order of magnitude of two numbers is the difference in how many digits they have. So, loosely speaking, a 2-digit number is one order of magnitude smaller than a 3-digit number, a 7-digit number is two orders of magnitude smaller than a 9-digit number, two 4-digit numbers have the same order of magnitude, etc.

Orthogonal At right angles to one another (like “perpendicular” but generalized to more dimensions). Statistically, if two attributes are orthogonal then they are independent.

Outcome A variable of interest that a model tries to estimate or predict.

Outlier A data point or set of points that's far from the rest in one way or another (see point outlier, contextual outlier, collective outlier).

Overfitting Building a model that describes random effects instead of, or in significant addition to, the real effects; often caused by having too many factors or parameters compared to the number of data points. Overfitted models tend to have high prediction errors on new data.

Paired samples Data with two different outcomes for each data point. Often helpful for comparing the method that generated outcome #1 with the method that generated outcome #2 to see which is better.

Parameter A constant whose value determines something about a system, expression, etc. For example, if we remove a variable from a regression model whenever its p-value is "too high", above $P$, then $P$ is a parameter, and setting it to different values can mean we get different models.

Parametric test Statistical test that assumes the data being tested is sampled from a distribution governed by certain parameter(s). Parametric tests often focus on the mean.

PCA Principal component analysis.

Perturbation A change (usually small) from the actual or expected value of something.

$p$-norm Measures vector length similar to the Minkowski distance of order $p$. If $z = (z_1, z_2, \ldots, z_m)$ is a vector in an $m$-dimensional space, then its $p$-norm is $\sqrt[p]{|z_1|^p + |z_2|^p + \cdots + |z_m|^p} = \sqrt[p]{\sum_{i=1}^{m} |z_i|^p}$.

Point outlier A data point that is (uncommonly) far from other data points – for example, an outdoor temperature reading of 200 degrees Fahrenheit.

Poisson distribution A discrete probability distribution of the number of iid events happening within a fixed time: $\Pr(k) = \frac{\lambda^k e^{-\lambda}}{k!}$. If the time between the events follows the exponential distribution, then the number of events follows the Poisson distribution.
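A sketch of the Poisson formula, plus a small simulation of the link with the exponential distribution: if gaps between events are exponential with rate $\lambda$, the count of events in one unit of time should average about $\lambda$. The helper names, the seed, and the rate are illustrative; `random.Random.expovariate` is Python's standard exponential sampler.

```python
import math
import random

def poisson_pmf(k, lam):
    """Pr(exactly k events in the fixed time) for Poisson rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def count_events(rng, rate, horizon=1.0):
    """Count arrivals in [0, horizon) when gaps between arrivals are exponential(rate)."""
    t, count = rng.expovariate(rate), 0
    while t < horizon:
        count += 1
        t += rng.expovariate(rate)
    return count

rng = random.Random(42)
counts = [count_events(rng, 3.0) for _ in range(10_000)]
print(sum(counts) / len(counts))   # should be near lam = 3.0
print(poisson_pmf(2, 3.0))         # theoretical Pr(exactly 2 events)
```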

Poisson regression Regression that assumes the response has a Poisson distribution.

Positive likelihood ratio Ratio of the fraction of data points in a certain category that are correctly classified as being in that category, to the fraction of data points not in the category that are incorrectly classified as being in the category; equal to sensitivity/(1 - specificity) = $\frac{TP/(TP + FN)}{FP/(FP + TN)}$.

Positive predictive value Fraction of data points classified as being in a certain category that are really in that category; equal to $\frac{TP}{TP + FP}$. Also called precision.

Precision In analytics, the fraction of data points classified as being in a certain category that are really in that category; equal to $\frac{TP}{TP + FP}$. Also called positive predictive value.

Prediction Estimate of what will happen in the future, or of something unknown (e.g., missing data) that happened.

Predictive analytics Loosely speaking, the use of analytics to estimate or predict what will happen.

Predictor A characteristic or measurement that is used to estimate (“predict”) the future value of something – for example, a person’s height or the color of a car. A “feature” or “attribute”; in the standard tabular format, a column of data.

Prescriptive analytics Loosely speaking, the use of analytics to suggest or prescribe what’s best to do.

positive rate.

Receiver operating characteristic curve (ROC curve)

Graph that plots the true positive rate against the false positive rate for different classification cutoff thresholds.

Rectilinear distance The sum of the lengths in each dimension between two points (the 1-norm distance). If $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$ are two points in an $m$-dimensional space, then the rectilinear distance between them is $\sqrt[1]{|x_1 - y_1|^1 + |x_2 - y_2|^1 + \cdots + |x_m - y_m|^1} = |x_1 - y_1| + |x_2 - y_2| + \cdots + |x_m - y_m| = \sum_{i=1}^{m} |x_i - y_i|$. Also called Manhattan or 1-norm distance.

Regression Statistical model that describes relationships between variables, and/or predicts future values of a response.

Regression splines Regression model where different functions are used for different ranges of the data. Also called spline regression.
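
One common way to build a regression spline is a linear (degree-1) spline with "hinge" basis functions, so a different line applies on each side of a knot while the fitted curve stays continuous. The NumPy sketch below assumes the knot location is known in advance; the function names and synthetic data are hypothetical. Higher-degree splines (for example, cubic) follow the same idea with more basis functions.

```python
import numpy as np

def spline_design(x, knots):
    """Design matrix for a linear spline: 1, x, and a hinge max(0, x - k) per knot."""
    x = np.asarray(x, dtype=float)
    hinges = [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack([np.ones_like(x), x] + hinges)

def fit_linear_spline(x, y, knots):
    """Least-squares coefficients: intercept, initial slope, then the change in slope at each knot."""
    coef, *_ = np.linalg.lstsq(spline_design(x, knots), np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_linear_spline(coef, x, knots):
    return spline_design(x, knots) @ coef

# Synthetic data with slope 1 below x = 5 and slope 3 above it.
x = np.linspace(0, 10, 21)
y = np.where(x < 5, x, 5 + 3 * (x - 5))
coef = fit_linear_spline(x, y, knots=[5.0])
print(np.round(coef, 3))  # intercept, slope before the knot, change in slope at the knot
```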

Regression tree Tree-based method for regression. After branching to split the data, each subset is analyzed with its own regression model.

Regularization Addition of term(s) to the model to reduce model complexity or overfitting. For example, adding a penalty to the objective function in regression can help reduce overfitting (see ridge regression).

Replication Running a stochastic simulation multiple times to sample the distribution of possible simulation results. “A replication” also refers to a single one of many runs of the simulation.
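
The sketch below shows replication on a toy model: each call to the simulation is one replication, and running many of them samples the distribution of outcomes. The model itself (20 customers with exponentially distributed service times averaging 5 minutes), the function names, and the seed are hypothetical.

```python
import random
import statistics

def simulate_once(rng):
    """One run of a hypothetical toy simulation: total time to serve 20
    customers whose service times are exponential with mean 5 minutes."""
    return sum(rng.expovariate(1 / 5) for _ in range(20))

def replicate(n_replications, seed=42):
    """Run the stochastic simulation n_replications times; each run is one replication."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(n_replications)]

results = replicate(1000)
cuts = statistics.quantiles(results, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentiles
print(f"mean {statistics.mean(results):.1f}, stdev {statistics.stdev(results):.1f}")
print(f"5th to 95th percentile: {cuts[0]:.1f} to {cuts[-1]:.1f}")
```

A single replication would report just one number; the spread across replications is what shows how much the outcome can vary.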

Resource In ARENA, the “doers” – for example, a call center worker at a queue.

Response A variable of interest that a model tries to estimate or predict.

Response surface Sequential experimentation strategy to understand the relationship between response and input factors, and/or optimize the response.

Ridge regression Method of regularization by limiting the sum of the squares of the coefficients. It reduces the magnitude of the coefficients, but not the number of variables chosen.
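
A minimal NumPy sketch of ridge regression in its penalized form: minimize the squared error plus λ times the sum of squared coefficients, which is equivalent to limiting that sum. The function name and synthetic data are hypothetical, and the sketch assumes centered data so that no intercept is fitted. As λ grows, the coefficients shrink toward zero, but none is set exactly to zero.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam * I) b = X^T y, the minimizer of ||y - X b||^2 + lam * ||b||^2.

    Assumes X and y are centered, so no intercept is fitted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X -= X.mean(axis=0)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)
y -= y.mean()

for lam in [0.0, 10.0, 100.0]:
    b = ridge_coefficients(X, y, lam)
    print(lam, np.round(b, 3), "sum of squares:", round(float(b @ b), 3))
```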

Robust solution A solution whose worst-case outcome over all possible scenarios is least bad.
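
For a finite set of scenarios, choosing a robust solution reduces to a minimax rule: find each candidate's worst-case cost across the scenarios, then pick the candidate whose worst case is smallest. The cost table below is invented for illustration, and the function name is hypothetical.

```python
# Hypothetical cost of each candidate solution under each scenario (lower is better).
costs = {
    "A": {"low demand": 10, "medium demand": 12, "high demand": 30},
    "B": {"low demand": 14, "medium demand": 15, "high demand": 18},
    "C": {"low demand": 8, "medium demand": 20, "high demand": 25},
}

def robust_choice(costs):
    """Pick the solution whose worst-case (maximum) cost over all scenarios is smallest."""
    return min(costs, key=lambda s: max(costs[s].values()))

print(robust_choice(costs))  # B
```

Solution B is not the cheapest under low or medium demand, but its worst case (18) beats A's (30) and C's (25).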

ROC curve Receiver operating characteristic curve.

Root The first, complete data set in a tree model.

R-squared/$R^2$ Measure of linear regression model quality: the fraction of variance in the response that is explained by the model. Also called coefficient of determination.
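
A common way to compute it is $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the response from its mean; for a least-squares linear regression with an intercept, this equals the fraction of variance explained. In the sketch below, the function name and the values are hypothetical, and the predictions are made up rather than produced by a fitted model.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_residual / ss_total

y = [3.0, 5.0, 7.0, 9.0]
y_hat = [3.5, 4.5, 7.5, 8.5]
print(r_squared(y, y_hat))
```

Here SS_total = 20 and SS_residual = 1, so R-squared = 1 - 1/20 = 0.95.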

Scaling Shrinking or expanding, and moving, the range of data to fit exactly into a specific interval (for example, between 0 and 1, or between 100 and 800).
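
A minimal sketch of scaling (min-max scaling) in Python: the smallest value maps to the lower end of the target interval, the largest to the upper end, and everything in between is placed linearly. The function name and the data are hypothetical.

```python
def scale(values, new_min=0.0, new_max=1.0):
    """Min-max scaling: map the data's range linearly onto [new_min, new_max]."""
    data_min, data_max = min(values), max(values)
    if data_max == data_min:
        raise ValueError("cannot scale data with zero range")
    return [new_min + (v - data_min) * (new_max - new_min) / (data_max - data_min)
            for v in values]

data = [12, 15, 20, 30]
print(scale(data))            # onto [0, 1]
print(scale(data, 100, 800))  # onto [100, 800], as in the example interval above
```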

Scenario Specific case/instance of an uncertain outcome; one approach to stochastic optimization is to optimize over a number of scenarios simultaneously.

Seasonality/cycles Repeating pattern in data values over time, often at consistent intervals (for example, temperature variations throughout the year that repeat each year at about the same time).

Seasonality length/cycle length Fixed time period at which cycles/seasonalities repeat themselves.

Sensitivity Fraction of data points in a certain category that are correctly classified by a model; equal to $\frac{TP}{TP+FN}$; also called the true positive rate, hit rate, and recall.
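
As a minimal Python sketch of this formula, sensitivity divides the true positives (TP) by all points actually in the category, that is, true positives plus false negatives (FN). The function name and the counts are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall, true positive rate): TP / (TP + FN)."""
    actual_positive = tp + fn  # every point that really is in the category
    if actual_positive == 0:
        raise ValueError("no data points are actually in the category")
    return tp / actual_positive

# Hypothetical counts: 50 points truly in the category, 45 of them classified correctly.
print(sensitivity(tp=45, fn=5))  # 0.9
```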

Sequential game A game in which participants choose their actions one after another, so participants who choose later have knowledge of the earlier actions.

Service rate Rate at which entities are processed.

Shortest path problem Network optimization model that finds the shortest route in a network from one specific node to another.
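
The course frames this as a network optimization model; the sketch below solves the same problem with a different, purely algorithmic technique, Dijkstra's algorithm, which requires nonnegative arc lengths. The graph, node names, and function name are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps each node to a dict {neighbor: arc_length}.

    Returns (total_length, path), or (float("inf"), []) if target is unreachable.
    """
    dist = {source: 0}
    previous = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a shorter distance was already settled
        visited.add(node)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return d, path[::-1]
        for neighbor, length in graph.get(node, {}).items():
            new_d = d + length
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                previous[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    return float("inf"), []

graph = {"A": {"B": 4, "C": 1}, "C": {"B": 2, "D": 7}, "B": {"D": 3}}
print(shortest_path(graph, "A", "D"))  # (6, ['A', 'C', 'B', 'D'])
```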

Simplicity (of a model) Having fewer parameters; opposite of complexity of a model. Often helpful for avoiding overfitting and increasing interpretability.

Simulation A model that imitates the operation or behavior of a real system.

Simultaneous game A game in which all participants choose their actions at the same time.

Single exponential smoothing Exponential smoothing technique with just one parameter (the smoothing constant); it does not incorporate trend or seasonality.

Smoothing Time series analysis technique to help filter out underlying randomness/noise. Examples include moving average, exponential smoothing, and ARIMA.

Smoothing constant Parameter in exponential smoothing that determines the relative importance of recent observations and previous estimates. Smoothing constants are between 0 and 1; a higher value indicates more reliance on the current observation, and a lower value indicates more reliance on previous estimates.
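
Single exponential smoothing can be sketched in a few lines using the update $S_t = \alpha x_t + (1 - \alpha) S_{t-1}$, where $\alpha$ is the smoothing constant. Here the first smoothed value is set equal to the first observation, which is one common initialization rather than the only one; the series and function name are hypothetical.

```python
def single_exponential_smoothing(series, alpha):
    """S_t = alpha * x_t + (1 - alpha) * S_{t-1}, with S_1 = x_1.

    alpha is the smoothing constant, required here to lie in (0, 1].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 9, 14, 11]
print(single_exponential_smoothing(series, alpha=0.8))  # tracks the observations closely
print(single_exponential_smoothing(series, alpha=0.2))  # leans on previous estimates
```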

Solution (in the optimization sense) A vector of values, one for each variable in an optimization model.

Specificity Fraction of data points not in a certain category that are correctly