








This document gives an overview of how to perform descriptive statistical analysis using the R programming language. It covers the key measures of central tendency (mean, median, mode) and variability (range, variance, standard deviation) that are commonly used to summarize and understand data. It demonstrates these techniques on two popular datasets, 'mtcars' and 'iris', using R functions such as summary(), str(), and aggregate(). The goal is to help students and researchers understand their data through descriptive analysis, which is a first step in many machine learning and data science workflows. Readers will learn how to summarize small to medium-sized datasets, laying the foundation for more advanced data analysis and modeling.
Typology: Exercises
(An Autonomous Institute under UGC Act 1956)
a. Write an R script to find basic descriptive statistics using the summary(), str(), and quantile() functions on the mtcars and cars datasets.
b. Write an R script to find a subset of a dataset using the subset() and aggregate() functions on the iris dataset.
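One possible way to approach both parts of the exercise is sketched below. It uses only base R and the built-in mtcars, cars, and iris datasets. The particular columns, the filter condition (setosa flowers with sepal length above 5), and the variable names are illustrative assumptions, not part of the exercise statement.

```r
# Part (a): basic descriptive statistics on mtcars and cars
data(mtcars)
data(cars)

str(mtcars)       # structure: 32 observations of 11 numeric variables
summary(mtcars)   # minimum, quartiles, median, mean and maximum per column

str(cars)         # structure: 50 observations of speed and dist
summary(cars)

# quantile() returns the 0%, 25%, 50%, 75% and 100% points by default
mpg_quartiles <- quantile(mtcars$mpg)
print(mpg_quartiles)

# The probs argument selects specific quantiles (here the 10% and 90% points)
speed_tails <- quantile(cars$speed, probs = c(0.1, 0.9))
print(speed_tails)

# Part (b): subsetting and aggregating iris
data(iris)

# Keep setosa rows with sepal length above 5 (an assumed example condition)
# and only three of the five columns
setosa_long <- subset(iris,
                      Species == "setosa" & Sepal.Length > 5,
                      select = c(Sepal.Length, Sepal.Width, Species))
print(head(setosa_long))

# Mean of every numeric column, grouped by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

Here subset() filters rows with a logical condition and picks columns with select, while aggregate() applies FUN to each group defined on the right-hand side of the formula.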
Descriptive analysis describes data with the help of representative methods such as charts, graphs, tables, and Excel files. The data is summarized and presented in a meaningful way so that it can be easily understood. It is most often performed on small data sets, and it can help anticipate future trends based on current findings. The measures used to describe a data set fall into two groups: measures of central tendency and measures of variability (dispersion).

Process of Descriptive Analysis
Measure of central tendency
Measure of variability

Measure of central tendency
A measure of central tendency represents the whole data set by a single value that locates its center. There are three main measures of central tendency:
Mean
Median
Mode
Measure of variability
A measure of variability describes the spread of the data, that is, how widely the values are distributed. The most common measures of variability are:
Range
Variance
Standard deviation
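As a brief illustration, the three variability measures listed above can be computed with base R functions. The choice of the mpg column of mtcars and the variable names are assumptions made for this sketch.

```r
# Miles per gallon from the built-in mtcars dataset
mpg <- mtcars$mpg

mpg_range  <- range(mpg)       # c(minimum, maximum)
mpg_spread <- diff(mpg_range)  # maximum minus minimum, as a single number
mpg_var    <- var(mpg)         # sample variance (divides by n - 1)
mpg_sd     <- sd(mpg)          # sample standard deviation, sqrt(var(mpg))

print(mpg_range)
print(mpg_spread)
print(mpg_var)
print(mpg_sd)
```

Note that range() returns the two extreme values rather than their difference, so diff() is applied to get the range as one number.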
Descriptive analysis helps us understand our data and is an important part of machine learning. Machine learning is concerned with making predictions, whereas statistics is concerned with drawing conclusions from data, and drawing those conclusions is a necessary first step for machine learning. Let's do this descriptive analysis in R.
Descriptive analysis consists of describing the data simply, using summary statistics and graphics. Here, we describe how to compute summary statistics using R.

Mean
The mean is the sum of the observations divided by the total number of observations. It is also called the average.

Median
The median is the middle value of the sorted data set. It splits the data into two halves. If the number of elements is odd, the median is the center element; if it is even, the median is the average of the two central elements.

Mode
The mode is the value with the highest frequency in the data set. A data set has no mode if every value occurs equally often, and it has more than one mode if two or more values share the highest frequency.

Variance
The variance is the average squared deviation from the mean. It is calculated by taking the difference between each data point and the mean, squaring each difference, adding the squares, and dividing by the number of data points. Dividing by the number of data points n gives the population variance; the sample variance, which is what R's var() computes, divides by n - 1 instead.

Standard Deviation
The standard deviation is the square root of the variance. It is calculated by finding the mean, subtracting the mean from each value, squaring each result, adding the squares, dividing by the number of terms, and taking the square root. As with the variance, R's sd() divides by n - 1 rather than n.
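The definitions above can be checked on a small example. The sketch below uses a made-up vector x, and stat_mode is a hypothetical helper written for this illustration: base R has no built-in function for the statistical mode, and its mode() function reports an object's storage mode instead.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # small made-up sample

x_mean   <- mean(x)     # sum(x) / length(x), here 5
x_median <- median(x)   # average of the two middle values, here 4.5

# Hypothetical helper: returns every value that has the highest frequency,
# or an empty vector when all values are equally frequent (no mode)
stat_mode <- function(v) {
  counts <- table(v)
  if (length(counts) > 1 && all(counts == counts[1])) {
    return(numeric(0))
  }
  as.numeric(names(counts)[counts == max(counts)])
}
x_mode <- stat_mode(x)  # 4 occurs three times, here 4

# Variance and standard deviation computed from the definitions
n        <- length(x)
sq_dev   <- (x - x_mean)^2          # squared deviations from the mean
pop_var  <- sum(sq_dev) / n         # population variance, here 4
samp_var <- sum(sq_dev) / (n - 1)   # sample variance, same as var(x)
pop_sd   <- sqrt(pop_var)           # population standard deviation, here 2
samp_sd  <- sqrt(samp_var)          # sample standard deviation, same as sd(x)

print(c(mean = x_mean, median = x_median, mode = x_mode))
print(c(pop_var = pop_var, samp_var = samp_var, var = var(x)))
print(c(pop_sd = pop_sd, samp_sd = samp_sd, sd = sd(x)))
```

The printed values show that var() and sd() agree with the n - 1 versions, not with the divide-by-n definition.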