Regularization in Machine Learning: Study Notes (Introduction to Machine Learning)

Regularization in machine learning is a technique used to prevent overfitting by adding a penalty to the model's complexity. Overfitting occurs when a model learns not only the underlying patterns in the training data but also the noise, resulting in poor performance on new, unseen data. Regularization helps in creating simpler models that generalize better to new data.
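To make this concrete, here is a minimal sketch (not taken from these notes; the synthetic sine data, polynomial degree, and alpha value are illustrative assumptions) comparing an unregularized high-degree polynomial fit against an L2-regularized one in scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Noisy sine-shaped data: a flexible model can memorize the noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("unregularized", LinearRegression()),
                    ("L2-regularized", Ridge(alpha=1.0))]:
    # Degree-15 polynomial features give the model plenty of room to overfit.
    pipe = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), model)
    pipe.fit(X_train, y_train)
    print(name, "test MSE:", mean_squared_error(y_test, pipe.predict(X_test)))
```

The penalized model typically achieves the lower test error here, which is the generalization benefit the notes describe.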

The main regularization techniques are:

  1. L1 Regularization (Lasso)
  2. L2 Regularization (Ridge)
  3. Elastic Net Regularization
1. L1 Regularization (Lasso)

  • Mechanism: Adds the absolute values of the coefficients to the loss function: Cost Function = Loss + λ∑ |Wi|. Here, λ is the regularization parameter (a hyperparameter that can take any value from 0 to infinity) and Wi are the model coefficients.
  • Effect: L1 regularization tends to produce sparse models, where some of the coefficients are exactly zero. This property makes Lasso useful for feature selection, as it effectively removes less important features.
  • Use Cases: When you want to perform feature selection, or when you have a large number of features and suspect that many of them are irrelevant.
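A minimal scikit-learn sketch of this sparsity effect; the synthetic dataset and the alpha value (scikit-learn's name for λ) are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 20 features, but only 5 actually influence the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha plays the role of lambda in the cost function
lasso.fit(X, y)

# Many coefficients are driven exactly to zero -- implicit feature selection.
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])
```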

2. L2 Regularization (Ridge)

  • Mechanism: Adds the squared values of the coefficients to the loss function: Cost Function = Loss + λ∑ (Wi)².
  • Effect: L2 regularization distributes the penalty among all coefficients, leading to smaller coefficients overall but rarely zeroing them out completely. It helps in stabilizing the model by reducing the variance.
  • Use Cases: When you have many features and believe that most of them contribute to the outcome, but you want to avoid large coefficients that might make the model sensitive to small changes in the input data.
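A minimal sketch of this shrinkage behavior with scikit-learn's Ridge; the dataset and alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Larger alpha shrinks all coefficients toward zero, but rarely to exactly zero.
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:6.2f}  max |coef| = {np.abs(ridge.coef_).max():8.2f}  "
          f"zero coefs = {np.sum(ridge.coef_ == 0)}")
```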

3. Elastic Net Regularization

  • Mechanism: Combines both L1 and L2 penalties: Cost Function = Loss + λ1∑ |Wi| + λ2∑ (Wi)², where λ1 and λ2 control the strength of the L1 and L2 terms respectively.
  • Effect: Elastic Net benefits from both L1 and L2 regularization, promoting sparsity while also maintaining some stability in the model.
  • Use Cases: When you have highly correlated features, Elastic Net can help by selecting groups of correlated features together. It is also useful when you want a balance between feature selection (L1) and coefficient shrinkage (L2).
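A minimal sketch with scikit-learn's ElasticNet; note that scikit-learn parameterizes the mix with a single alpha plus an l1_ratio rather than two separate λ values, and the values below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# l1_ratio=1.0 is pure Lasso, 0.0 is pure Ridge; 0.5 is an equal mix.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)
print("non-zero coefficients:", np.sum(enet.coef_ != 0), "of", X.shape[1])
```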
Benefits of regularization:

  1. Prevents Overfitting: By adding a penalty for large coefficients, regularization discourages the model from fitting the noise in the training data, leading to better generalization.
  2. Simplifies the Model: Especially with L1 regularization, unimportant features can be removed, resulting in a simpler and more interpretable model.
  3. Stabilizes Predictions: Regularization reduces the variance of the model's predictions, making them more stable and less sensitive to fluctuations in the input data.

In the context of linear regression, the standard objective function without regularization is: Cost Function = ∑ (yi − ŷi)², where yi are the actual values and ŷi are the predicted values. With regularization, this becomes:

• L1 Regularization: Cost Function = ∑ (yi − ŷi)² + λ∑ |Wi|

• L2 Regularization: Cost Function = ∑ (yi − ŷi)² + λ∑ (Wi)²

• Elastic Net: Cost Function = ∑ (yi − ŷi)² + λ1∑ |Wi| + λ2∑ (Wi)²
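These cost functions are simple enough to compute directly; the sketch below does so with NumPy for an invented set of actual values, predictions, coefficients, and λ values (all numbers are illustrative assumptions):

```python
import numpy as np

y     = np.array([3.0, -0.5, 2.0, 7.0])   # actual values y_i
y_hat = np.array([2.5,  0.0, 2.0, 8.0])   # predicted values y-hat_i
w     = np.array([0.5, -1.2, 0.0, 2.0])   # model coefficients W_i
lam1, lam2 = 0.1, 0.1                      # regularization strengths

loss    = np.sum((y - y_hat) ** 2)               # sum of squared errors
l1_cost = loss + lam1 * np.sum(np.abs(w))        # Lasso cost
l2_cost = loss + lam2 * np.sum(w ** 2)           # Ridge cost
en_cost = loss + lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)  # Elastic Net

print(f"L1: {l1_cost:.3f}  L2: {l2_cost:.3f}  Elastic Net: {en_cost:.3f}")
```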

Regularization is a standard feature in many machine learning libraries. For example:

  • Ridge Regression: sklearn.linear_model.Ridge
  • Lasso Regression: sklearn.linear_model.Lasso
  • Elastic Net: sklearn.linear_model.ElasticNet
  • Keras: you can add L1, L2, or both penalties to a layer's weights using the kernel_regularizer parameter.
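A minimal Keras sketch (layer sizes, input dimension, and penalty strengths are illustrative assumptions) showing all three penalty types attached via kernel_regularizer:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),  # 20 input features (assumed)
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),   # L2 (Ridge-style)
    layers.Dense(32, activation="relu",
                 kernel_regularizer=regularizers.l1(0.01)),   # L1 (Lasso-style)
    layers.Dense(1,
                 kernel_regularizer=regularizers.l1_l2(l1=0.01, l2=0.01)),  # both
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```

During training, each penalty term is added to the loss, discouraging large weights in exactly the way the cost functions above describe.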