
Principal Component Analysis: Understanding Data Variation and Dimensionality Reduction, Study notes of Mathematical Statistics

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. These notes cover the ideas, steps, and applications of PCA, including gene expression analysis and data visualization.

Typology: Study notes

2023/2024

Uploaded on 12/26/2023

abhinandan-uk 🇮🇳


Principal Component Analysis
by ABHINANDAN
USN: 22BTRCB002


Principal Component Analysis (PCA): Ideas

  • Does the data set 'span' the whole of d-dimensional space?
  • For a matrix of m samples x n genes, create a new covariance matrix of size n x n.
  • Transform some large number of variables into a smaller number of uncorrelated variables called principal components (PCs).
  • Developed to capture as much of the variation in the data as possible.

[Figure: Principal Component Analysis. A scatter of data points in the x-y plane. Y1 is the first eigenvector, Y2 the second; Y2 is ignorable. Key observation: the variance along Y1 is largest.]

Principal Component Analysis: one attribute first

  • Question: how much spread is in the data along the axis? (distance to the mean)
  • Variance = (standard deviation)²
  • Example attribute, Temperature: 30, 40, 30, 35, 30, 15, 30, 15, 18, 15, 30, 24, 40, 42
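A minimal sketch of this one-attribute case, assuming NumPy; the temperature values are the ones listed above:

```python
import numpy as np

# One attribute: the temperature readings from the example above.
temperature = np.array([30, 40, 30, 35, 30, 15, 30, 15, 18, 15, 30, 24, 40, 42])

mean = temperature.mean()
# Spread along the axis: the average squared distance to the mean.
# ddof=1 gives the sample variance; the standard deviation is its square root.
variance = temperature.var(ddof=1)
std_dev = np.sqrt(variance)
```

As stated above, the variance is the square of the standard deviation.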

More than two attributes: covariance matrix

  • Contains covariance values between all possible dimensions (= attributes).
  • Example for three attributes (x, y, z):

        C = | var(x)    cov(x,y)  cov(x,z) |
            | cov(y,x)  var(y)    cov(y,z) |
            | cov(z,x)  cov(z,y)  var(z)   |
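A minimal sketch of building such a covariance matrix for three attributes, assuming NumPy and illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 100 samples of three attributes x, y, z.
X = rng.normal(size=(100, 3))

# np.cov expects one variable per row, so transpose the
# samples-by-attributes matrix before passing it in.
C = np.cov(X.T)

# C[i, j] = cov(attribute_i, attribute_j); the diagonal holds the
# variances, and the matrix is symmetric.
```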

Eigenvalues & eigenvectors

  • Vectors x having the same direction as A x are called eigenvectors of A (A is an n by n matrix).
  • In the equation A x = λ x, λ is called an eigenvalue of A.
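The defining relation A x = λ x can be checked numerically; a small sketch assuming NumPy, with an illustrative symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# For a symmetric matrix, np.linalg.eigh returns eigenvalues in
# ascending order and the corresponding eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Each pair satisfies A x = lambda x: x keeps its direction under A.
for lam, x in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ x, lam * x)
```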

Principal components

  • First principal component (PC1)
    • The eigenvalue with the largest absolute value indicates that the data have the largest variance along its eigenvector: the direction along which there is the greatest variation.
  • Second principal component (PC2)
    • The direction with the maximum variation left in the data, orthogonal to PC1.
  • In general, only a few directions capture most of the variability in the data.

Steps of PCA

  • Let μ be the mean vector (taking the mean of all rows).
  • Adjust the original data by the mean: X' = X − μ.
  • Compute the covariance matrix C of the adjusted X.
  • Find the eigenvectors and eigenvalues of C:
    • For matrix C, the vectors e (column vectors) having the same direction as C e are the eigenvectors of C, i.e. C e = λ e, where λ is called an eigenvalue of C.
    • C e = λ e ⇔ (C − λI) e = 0.
    • Most data mining packages do this for you.
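The steps above can be sketched end to end; a minimal version assuming NumPy, where the function name and data are illustrative:

```python
import numpy as np

def pca(X, p):
    mu = X.mean(axis=0)                    # mean vector (mean of all rows)
    X_adj = X - mu                         # adjust the original data by the mean
    C = np.cov(X_adj.T)                    # covariance matrix of adjusted X
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues/eigenvectors of C
    order = np.argsort(eigvals)[::-1]      # sort by eigenvalue, descending
    components = eigvecs[:, order[:p]]     # first p eigenvectors
    return X_adj @ components, eigvals[order[:p]]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))               # illustrative data: 50 samples, 4 variables
Y, top_eigvals = pca(X, 2)
```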

Principal components - Variance

Transformed Data

  • Eigenvalue λj corresponds to the variance on each component j.
  • Thus, sort the components by λj.
  • Take the first p eigenvectors ei, where p is the number of top eigenvalues.
  • These are the directions with the largest variances.
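That λj equals the variance on component j can be verified directly; a sketch assuming NumPy, with illustrative correlated data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative correlated 2-D data, centered so projections have zero mean.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
X = X - X.mean(axis=0)

C = np.cov(X.T)
eigvals, eigvecs = np.linalg.eigh(C)

# Projecting the data onto an eigenvector gives coordinates whose
# sample variance equals the corresponding eigenvalue.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.isclose((X @ v).var(ddof=1), lam)
```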

Covariance Matrix

  • C =
        |  75  106 |
        | 106  482 |
  • Using MATLAB, we find out:
    • Eigenvectors:
    • e1 = (-0.98, -0.21), λ1 ≈ 51
    • e2 = (0.21, -0.98), λ2 ≈ 560
    • Thus the second eigenvector is more important!

Principal components

  • General about principal components:
    • summary variables
    • linear combinations of the original variables
    • uncorrelated with each other
    • capture as much of the original variance as possible
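These properties (linear combinations of the originals, mutually uncorrelated) can be checked on a small example, assuming NumPy and illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative data where the second variable correlates strongly with the first.
x = rng.normal(size=200)
X = np.column_stack([x,
                     2 * x + rng.normal(scale=0.3, size=200),
                     rng.normal(size=200)])

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
Y = Xc @ eigvecs   # principal components: linear combinations of the originals

# The components are uncorrelated: their covariance matrix is diagonal
# (up to floating-point error), with the eigenvalues on the diagonal.
C_Y = np.cov(Y.T)
off_diag = C_Y - np.diag(np.diag(C_Y))
assert np.allclose(off_diag, 0, atol=1e-8)
```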

Two-Way (Angle) Data Analysis

[Figure: A gene expression matrix with 10³–10⁴ genes as rows and 10¹–10² samples/conditions as columns can be analyzed in sample space (comparing samples) or in gene space (comparing genes).]

PCA - example