Passenger Car Market Segmentation with Business Analytics & Big Data, Study Guides, Projects, Research of Database Management Systems (DBMS)

A project report submitted as part of a Master of Fashion Management degree at the National Institute of Fashion Technology (NIFT). The report focuses on the application of cluster analysis to segment the passenger car market. It includes data analysis using logistic regression and conditional inference trees, with variables such as age, gender, estimatedsalary, lb, ac, and fm. The goal is to identify customer segments based on their purchasing behavior.

Typology: Study Guides, Projects, Research

2023/2024

Uploaded on 03/12/2024

yoshita-gupta
yoshita-gupta 🇮🇳

BUSINESS ANALYTICS & BIG DATA
A Project Report Submitted
In Partial Fulfilment of the Requirements
For the degree Master of Fashion Management
End-term assignment
Submitted by - Yoshita Gupta (MFM/22/945)
Under the Supervision of Ms. Akanksha Dayma
(Assistant Professor)
Department of Fashion Management Studies
National Institute of Fashion Technology (NIFT)
Talpura, Chebb, Kangra, Himachal Pradesh, PIN- 176001

What are Business Analytics and Big Data?

Business analytics and big data are essential tools for businesses that want to stay competitive in today's data-driven economy. They enable organizations to make better decisions, optimize their operations, and gain valuable insights into customer behavior, market trends, and other important factors that affect their bottom line. Business analytics (BA) refers to "The skills, technologies, practices for continuously developing new insights and understanding of business performance based on data and statistical methods". Its main components are:

  1. Data Management
  2. Data Visualization: making charts (pie charts and so on)
  3. Machine Learning: Python, RStudio

Significance of Business Analytics:

● Data-driven decisions

● Converts data into valuable information

● Eliminates guesswork

● Faster answers

● Reduced costs

4. Complex: for example, 7+5i

> A="MOBILE"

> A

[1] "MOBILE"

> A="YOSHITA"

> A

[1] "YOSHITA"

> NUM1=3

> NUM1

[1] 3

> CLASS(NUM1)

Error in CLASS(NUM1) : could not find function "CLASS"

> class(NUM1)

[1] "numeric"

> LOG1=TRUE

> class(LOG1)

[1] "logical"

> CHAR2="CHARACTER"

> class(CHAR2)

[1] "character"

> COMPLEX1=5+7i

> COMPLEX1

[1] 5+7i

> class(COMPLEX1)

[1] "complex"

OPERATORS IN R

1. Assignment operator

> B = 1

> B

[1] 1

> U <- 7

> U

[1] 7

> 7 -> U

> U

[1] 7

2. Arithmetic operator

> NUM1 =

> NUM2 =

> NUM1+NUM2

[1] 30

> NUM1 = 30

> NUM2 = 10

> NUM1+NUM2

[1] 40

> NUM1-NUM2

[1] 20

> NUM1*NUM2

[1] 300

> NUM1/NUM2

[1] 3

> NUM2-NUM1

[1] -20

3. Logical operator

> N=12

> N

[1] 12

> APPLE=TRUE

> BANANA=FALSE

> APPLE&BANANA

[1] FALSE

> ORANGE=FALSE

> PEAR=FALSE

> ORANGE&PEAR

[1] FALSE
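Relational operators compare two values and return TRUE or FALSE, and they combine naturally with the logical operators. The following is a minimal sketch; the variable names and values (x, y, a, b) are made up for illustration and are not part of the original session.

```r
# Hypothetical values, chosen only for illustration
x <- 30
y <- 10

# Relational operators: each comparison returns a logical value
x > y      # TRUE
x < y      # FALSE
x == y     # FALSE (equality uses two equals signs)
x != y     # TRUE
x >= 30    # TRUE

# Logical operators: combine or negate logical values
a <- TRUE
b <- FALSE
a & b      # AND: FALSE
a | b      # OR:  TRUE
!a         # NOT: FALSE
```

A single = assigns a value, while == tests for equality; mixing the two up is a common source of errors.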

DATA STRUCTURES IN R

1. Vector

2. List

3. Matrix

4. Array

5. Factor

6. Data frame

1. Vector:

A vector is a homogeneous, one-dimensional data structure: all of its elements have the same type.

> VEC1=C(1,3,5)

Error in C(1, 3, 5) : object not interpretable as a factor (keep the c lowercase; c() is used to combine values)

> VEC1=c(1,3,5)

> VEC1

[1] 1 3 5

> class(VEC1)

[1] "numeric"

> NAME=c("YOSHITA","DOO","QWERTY")

> NAME

[1] "YOSHITA" "DOO" "QWERTY"

> YUIOP=c(T,F,T)

> YUIOP

[1] TRUE FALSE TRUE

> MOBILE1=c(1,T,2,F)

> MOBILE1

[1] 1 1 2 0

> class(MOBILE1)

[1] "numeric"

> U=c(2,"A",3,"B")

> U

[1] "2" "A" "3" "B"

> CLASS(U)

Error in CLASS(U) : could not find function "CLASS" (Caps Lock was on)

> class(U)

[1] "character"

> Y=c(1,"a",T)

> Y

[1] "1" "a" "TRUE"

> class(Y)

[1] "character"

When a vector mixes types, R coerces every element to the most general type present. The order of precedence is:

character > numeric > logical

> Y[2]

[1] "a"

> U[3]

[1] "3"

Now we extract elements by their index:

> HAHA[[3]][1]

[1] FALSE

> HAHA[[2]][2]

[1] "b"
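The definition of HAHA is not visible in this preview, so here is a minimal sketch of how a list is built and indexed. The list example_list and its contents are hypothetical, chosen only so that the same kind of double-bracket indexing works.

```r
# Hypothetical list: unlike a vector, a list can hold elements of different types
example_list <- list(c(1, 2, 3), c("a", "b", "c"), c(FALSE, TRUE))

class(example_list)      # "list"

# [[k]] pulls out the k-th element of the list as it was stored
example_list[[2]]        # "a" "b" "c"

# A second index then picks a value inside that element
example_list[[2]][2]     # "b"
example_list[[3]][1]     # FALSE
```

Single brackets (example_list[2]) would return a one-element list rather than the character vector itself, which is why double brackets are used before indexing further.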

3. Matrix

Matrix is a two-dimensional homogeneous data structure

> m1=matrix(c(1,2,3,5,6,7,8,56))

> m1

[,1]

[1,] 1

[2,] 2

[3,] 3

[4,] 5

[5,] 6

[6,] 7

[7,] 8

[8,] 56

This gives 1 column and 8 rows.

Now we specify the number of rows and columns:

> m1=matrix(c(1,2,3,5,6,7,8,56),nrow = 2,ncol=4)

> m1

     [,1] [,2] [,3] [,4]

[1,]    1    3    6    8

[2,]    2    5    7   56

By default the values are filled column-wise.

To fill the matrix row-wise instead, set byrow = T:

> m1=matrix(c(1,2,3,5,6,7,8,56),nrow = 2,ncol=4,byrow = T)

> m1

     [,1] [,2] [,3] [,4]

[1,]    1    2    3    5

[2,]    6    7    8   56

> m1=matrix(c(1,2,3,5,6,7,8,56),nrow = 2,ncol=4, byrow=F)

> m1

Now we will learn how to extract an element from a matrix, using [row, column]:

> m1[1,2]

[1] 3

> m1[2,4]

[1] 56
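Beyond single elements, whole rows and columns can be extracted by leaving one of the two indices empty. A short sketch using the same column-wise matrix as above:

```r
# Same matrix as in the notes, filled column-wise (the default)
m1 <- matrix(c(1, 2, 3, 5, 6, 7, 8, 56), nrow = 2, ncol = 4)

dim(m1)      # 2 4  (rows, columns)
m1[1, ]      # first row:     1 3 6 8
m1[, 2]      # second column: 3 5
m1[2, 4]     # single element: 56
```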

Inbuilt Functions in R

1. str()

2. head()

3. tail()

4. table()

5. min()

6. max()

7. range()

> View(iris)

> str(iris)

'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

A data frame is a table.

Observations are its rows.

Variables are its columns.

A factor is a categorical variable, stored as a fixed set of levels (here the three species names).

str() stands for "structure"; it summarises the table.

head() returns the first 6 rows and tail() returns the last 6 rows.
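The remaining functions from the list can be tried on the same built-in iris data. A minimal sketch:

```r
head(iris)                  # first 6 rows (the default)
tail(iris, 3)               # last 3 rows; the second argument sets how many

table(iris$Species)         # counts per category: 50 of each species

min(iris$Sepal.Length)      # 4.3
max(iris$Sepal.Length)      # 7.9
range(iris$Sepal.Length)    # 4.3 7.9 (minimum and maximum together)
```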

> data.frame(fruit_name=c("apple","banana","gauva"),fruit_cost+c(10,20,30))->fruits

Error in data.frame(fruit_name = c("apple", "banana", "gauva"), fruit_cost + : object 'fruit_cost' not found

> data.frame(fruit_name=c("apple","banana","guava"),fruit_cost=c(10,20,30))->fruits

> fruits

  fruit_name fruit_cost
1      apple         10
2     banana         20
3      guava         30

> fruits$fruit_name

[1] "apple" "banana" "guava"

> fruits$fruits_cost

NULL

> fruits$fruit_cost

[1] 10 20 30

> View(iris)

> if(iris$Sepal.Length[1]>4){print("sepallengthisgreaterthan4")}

[1] "sepallengthisgreaterthan4"

> if(iris$Sepal.Length[3]>6){print}("sepallengthisgreaterthan6")}

Error: unexpected '}' in "if(iris$Sepal.Length[3]>6){print}("sepallengthisgreaterthan6")}"

> if(iris$Sepal.Length[3]>6){print("sepallengthisgreaterthan6")}

> if(iris$Sepal.Length[3]>6){print("sepal length is greater than 6")}

> if(iris$Sepal.Length[3]>6){print("sepal length is greater than 6")} else{print("sepal length is greater than 4")}

[1] "sepal length is greater than 4"

> vec1<-1:9

> for(i in vec1){print(i+6)}

[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15

> for(i in vec1){print(i+7)}

[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16

> for(i in vec1){print(i*7)}

[1] 7
[1] 14
[1] 21
[1] 28
[1] 35
[1] 42
[1] 49
[1] 56
[1] 63

> for(i in vec1){print(i/5)}

[1] 0.2
[1] 0.4
[1] 0.6
[1] 0.8
[1] 1
[1] 1.2
[1] 1.4
[1] 1.6
[1] 1.8

> for(i in vec1){print(i-5)}

[1] -4
[1] -3
[1] -2
[1] -1
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4

> for(i in vec1){print(i*5)}

[1] 5
[1] 10
[1] 15
[1] 20
[1] 25
[1] 30
[1] 35
[1] 40
[1] 45
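Because arithmetic in R is vectorized, the same results can usually be obtained without a loop. The sketch below is an alternative to the for loops above, not a replacement for learning them:

```r
vec1 <- 1:9

# Loop version: one value is printed per iteration
for (i in vec1) {
  print(i + 6)
}

# Vectorized version: the operation is applied to every element at once
vec1 + 6    # 7 8 9 10 11 12 13 14 15
vec1 * 7    # 7 14 21 28 35 42 49 56 63
vec1 - 5    # -4 -3 -2 -1 0 1 2 3 4
```

The vectorized form returns a single vector that can be stored or passed on, whereas the loop only prints each value.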

LINEAR REGRESSION (LM FUNCTION)

➢ linearregression<-lm(size~weight,data=mousedata)

> mousedata

  size weight tail
1  1.4    0.9   0.
2  2.6    1.8   1.
3  1.0    2.4   0.
4  3.7    3.5   2.

> plot(mousedata$weight,mousedata$size)

> linearregression<-lm(size~weight,data=mousedata)

> summary(linearregression)

Call:
lm(formula = size ~ weight, data = mousedata)

Residuals:
      1       2       3       4       5
 0.4843  0.6019 -1.7197 -0.3427  0.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1666     1.3816  -0.121   0.
weight        1.2027     0.5060   2.377   0.0979 .

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.242 on 3 degrees of freedom
Multiple R-squared:  0.6531,	Adjusted R-squared:  0.
F-statistic: 5.648 on 1 and 3 DF,  p-value: 0.

> abline(linearregression,col="pink",lwd=5)

> plot(mousedata)

> multiple.regression<-lm(size~weight+tail,data=mousedata)

> summary(mousedata)

      size          weight          tail
 Min.   :1.00   Min.   :0.9   Min.   :0.
 1st Qu.:1.40   1st Qu.:1.8   1st Qu.:0.
 Median :2.60   Median :2.4   Median :1.
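Since the mousedata values are only partly visible here, the sketch below repeats the same lm() workflow on R's built-in cars and mtcars data sets. These are stand-ins chosen as an assumption for illustration; the object names fit_simple, fit_multi and new_obs are likewise hypothetical.

```r
# Simple linear regression: stopping distance explained by speed
fit_simple <- lm(dist ~ speed, data = cars)

coef(fit_simple)                      # intercept and slope
r2 <- summary(fit_simple)$r.squared   # share of variance explained
r2

# Predict the response for a new observation
new_obs <- data.frame(speed = 15)
predict(fit_simple, newdata = new_obs)

# Multiple regression: add further predictors with +
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit_multi)                       # intercept plus one slope per predictor
```

As in the notes, plot() followed by abline(fit_simple) would draw the fitted line over the scatter plot.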

 $ EstimatedSalary: int  19000 20000 43000 57000 76000 58000 84000 150000 33000 65000 ...
 $ Purchased      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...

> xtabs(~Gender+Purchased,data=logisticregression)

        Purchased
Gender     0   1
  Female 127  77
  Male   130  66

> xtabs(~Age+Purchased,data = logisticregression)

    Purchased
Age   0  1
  18  5  0
  19  7  0
  20  7  0
  21  4  0
  22  5  0
  23  6  0
  24  9  0
  25  6  0
  26 16  0
  27 11  2
  28 11  1
  29  9  1
  30  9  2
  31 10  1
  32  4  5
  33  8  1
  34  5  1
  35 29  3
  36  7  5
  37 13  7
  38 12  1
  39  9  6
  40 12  3
  41 15  1
  42 10  6
  43  1  2
  44  1  1
  45  1  6
  46  5  7
  47  2 12
  48  1 13
  49  2  8
  50  1  3
  51  1  2
  52  1  5
  53  0  5
  54  0  4
  55  0  3
  56  0  3
  57  0  5
  58  0  6
  59  2  5
  60  0  7

SEGMENTATION USING CLUSTER ANALYSIS:

OUTLINE -

  • About market segmentation
  • About cluster analysis
  • Example (Segmenting Passenger car market)

According to Kotler, anything that is capable of satisfying a felt need is a PRODUCT.

CLUSTER: a collection of data objects.

Cluster analysis is a class of techniques used to classify objects into relatively homogeneous groups called clusters.

It works by finding similarities between data objects according to the characteristics found in the data.

BROAD STEPS IN CLUSTER ANALYSIS

  • Define the variables on which the clustering will be based.
  • Collect data on the selected variables.
  • Standardize the collected data.
  • Measure the distances between respondents.
  • Group the objects based on the distances between them.
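The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions: the built-in mtcars data stands in for a passenger car survey, the three variables and the choice of 3 clusters are arbitrary, and the object names (vars, scaled, d, km) are hypothetical.

```r
# Steps 1 and 2: choose the clustering variables and collect the data
# (mtcars is a stand-in data set, not the data used in the report)
vars <- mtcars[, c("mpg", "hp", "wt")]

# Step 3: standardize so every variable has mean 0 and standard deviation 1
scaled <- scale(vars)

# Step 4: measure the pairwise Euclidean distances between observations
d <- dist(scaled, method = "euclidean")

# Step 5: group the objects; here k-means with k = 3 (an arbitrary choice)
set.seed(42)                        # k-means starts from random centroids
km <- kmeans(scaled, centers = 3, nstart = 25)

table(km$cluster)                   # cluster sizes
aggregate(vars, by = list(cluster = km$cluster), FUN = mean)   # cluster profiles
```

Standardizing matters because variables on large scales (such as horsepower) would otherwise dominate the distance calculation.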

MAJOR CLUSTERING APPROACHES:

  • PARTITIONING APPROACH: we assume a fixed number of groups exists (for example 2 or 3), assign each observation to one cluster according to the data, and represent each cluster by its centroid. K-means clustering is a partitioning method.

  • HIERARCHICAL APPROACH: each individual starts as a separate cluster, and clusters are merged step by step until the entire data set forms one cluster.

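K-MEANS CLUSTERING METHOD

EUCLIDEAN DISTANCE

DISTANCE BETWEEN CLUSTERS

  • Single linkage: the distance between the two closest members of the two clusters (taking the centroid or the average instead gives centroid or average linkage)
  • Complete linkage: the distance between the two farthest members of the two clusters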
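The Euclidean distance between two points p and q is the square root of the sum of squared differences of their coordinates. The sketch below computes it by hand and with dist(), then runs hierarchical clustering with two linkage rules. The points p and q, the mtcars stand-in data, and the choice of 3 groups are assumptions made for illustration.

```r
# Euclidean distance between two hypothetical points
p <- c(1, 2)
q <- c(4, 6)
sqrt(sum((p - q)^2))      # 5, since sqrt(3^2 + 4^2) = 5
dist(rbind(p, q))         # the same distance via the built-in function

# Hierarchical clustering on standardized stand-in data
scaled <- scale(mtcars[, c("mpg", "hp", "wt")])
d <- dist(scaled)

hc_single   <- hclust(d, method = "single")     # nearest members
hc_complete <- hclust(d, method = "complete")   # farthest members

# Cut the complete-linkage tree into 3 groups (an arbitrary choice)
groups <- cutree(hc_complete, k = 3)
table(groups)
```

Calling plot(hc_complete) draws the dendrogram, which shows the order in which clusters were merged.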

LOGISTIC REGRESSION- GLM FUNCTION

> logisticregression<-read.csv(file.choose(),header = T)

> View(logisticregression)

> str(logisticregression)

'data.frame':	400 obs. of  5 variables:
 $ User.ID        : int  15624510 15810944 15668575 15603246 15804002 15728773 15598044 15694829 15600575 15727311 ...
 $ Gender         : chr  "Male" "Male" "Female" "Female" ...
 $ Age            : int  19 35 26 27 19 27 27 32 25 35 ...
 $ EstimatedSalary: int  19000 20000 43000 57000 76000 58000 84000 150000 33000 65000 ...
 $ Purchased      : int  0 0 0 0 0 0 0 1 0 0 ...

GLM FUNCTION - LOGISTIC REGRESSION

> mydata<-glm(Purchased~Age+Gender+EstimatedSalary,data=logisticregression,family='binomial')
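Once a logistic model is fitted, the usual next steps are to inspect the coefficients and turn predicted probabilities into classes. The CSV file used in the report is not available here, so this sketch uses the built-in mtcars data as a stand-in, with the binary variable am playing the role of Purchased. The 0.5 cut-off and the object names (fit, prob, pred) are assumptions for illustration.

```r
# Stand-in data: am (0 or 1) is the binary outcome, wt is the predictor
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

summary(fit)                             # coefficients are on the log-odds scale

# Predicted probability that the outcome is 1, for each observation
prob <- predict(fit, type = "response")

# Convert probabilities to classes using a 0.5 cut-off (an assumed threshold)
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: actual classes against predicted classes
table(actual = mtcars$am, predicted = pred)
```

With the report's data, the same calls would be applied to the model fitted with Purchased as the outcome.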