Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

data aggregation in python, Summaries of Programming Languages

Dr. Babasaheb Ambedkar Marathwada University Programming Languages

the file briefly explains the use of data aggregation in python programming

Typology: Summaries

2020/2021

Uploaded on 06/18/2023

AvaniPrithiani 🇮🇳

2 documents

1 / 12

This page cannot be seen from the preview

Don't miss anything!

Python | Pandas dataframe.aggregate()

Python is a great language for doing data analysis, primarily because of the fantastic

ecosystem of data-centric Python packages. Pandas is one of those packages and makes

importing and analyzing data much easier.

Dataframe.aggregate() function is used to apply some aggregation across one or more

column. Aggregate using callable, string, dict, or list of string/callables. Most frequently used

aggregations are:

sum: Return the sum of the values for the requested axis

min: Return the minimum of the values for the requested axis

max: Return the maximum of the values for the requested axis

Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)

Parameters:

func : callable, string, dictionary, or list of string/callables. Function to use for aggregating

the data. If a function, must either work when passed a DataFrame or when passed to

DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column

names.

axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column.

1 or ‘columns’: apply function to each row.

Returns: Aggregated DataFrame

For link to CSV file Used in Code, click here

Example #1: Aggregate ‘sum’ and ‘min’ function across all the columns in data frame.

# importing pandas package

import pandas as pd

# making data frame from csv file

df = pd.read_csv("nba.csv")

# printing the first 10 rows of the dataframe

df[:10]

Partial preview of the text

Download data aggregation in python and more Summaries Programming Languages in PDF only on Docsity!

Python | Pandas dataframe.aggregate()

Python is a great language for doing data analysis, primarily because of the fantastic

ecosystem of data-centric Python packages. Pandas is one of those packages and makes

importing and analyzing data much easier.

Dataframe.aggregate() function is used to apply some aggregation across one or more

column. Aggregate using callable, string, dict, or list of string/callables. Most frequently used

aggregations are:

sum: Return the sum of the values for the requested axis

min: Return the minimum of the values for the requested axis

max: Return the maximum of the values for the requested axis

Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)

Parameters:

func : callable, string, dictionary, or list of string/callables. Function to use for aggregating

the data. If a function, must either work when passed a DataFrame or when passed to

DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column

names.

axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column.

1 or ‘columns’: apply function to each row.

Returns: Aggregated DataFrame

For link to CSV file Used in Code, click here

Example #1: Aggregate ‘sum’ and ‘min’ function across all the columns in data frame.

importing pandas package

import pandas as pd

making data frame from csv file

df = pd.read_csv("nba.csv")

printing the first 10 rows of the dataframe

df[:10]

Aggregation works with only numeric type columns.

Applying aggregation across all the columns

sum and min will be found for each

numeric type column in df dataframe

df.aggregate(['sum', 'min'])

Output:

For each column which are having numeric values, minimum and sum of all values has been

found. For dataframe df , we have four such columns Number, Age, Weight, Salary.

Example #2:

In Pandas, we can also apply different aggregation functions across different columns. For

that, we need to pass a dictionary with key containing the column names and values

containing the list of aggregation functions for any specific column.

importing pandas package

import pandas as pd

making data frame from csv file

df = pd.read_csv("nba.csv")

We are going to find aggregation for these columns

df.aggregate({"Number":['sum', 'min'], "Age":['max', 'min'], "Weight":['min', 'sum'], "Salary":['sum']})

df = pd.read_csv("nba.csv")

Print the dataframe

df

Now apply the groupby() function.

applying groupby() function to

group the data on team value.

gk = df.groupby('Team')

Let's print the first entries

in all the groups formed.

gk.first()

Output :

Let’s print the value contained any one of group. For that use the name of the team. We use

the function get_group() to find the entries contained in any of the groups.

Finding the values contained in the "Boston Celtics" group

gk.get_group('Boston Celtics')

Output :

Example #2: Use groupby() function to form groups based on more than one category (i.e.

Use more than one column to perform the splitting).

importing pandas as pd

import pandas as pd

Creating the dataframe

df = pd.read_csv("nba.csv")

First grouping based on "Team"

Within each team we are grouping based on "Position"

gkk = df.groupby(['Team', 'Position'])

Print the first value in each group

gkk.first()

import pandas as pd import numpy as np

Create a sample dataframe

df = pd.DataFrame({"dept": np.random.choice(["IT", "HR", "Sales", "Production"], size=50), "gender": np.random.choice(["F", "M"], size=50), "age": np.random.randint(22, 60, size=50), "salary": np.random.randint(20000, 90000, size=50)}) df.index.name = "emp_id"

Calculate mean data of gender groups

df.groupby('gender').mean().add_prefix('mean_')

Output:

Example 2: Performing multiple aggregate operations using the aggregate function

( DataFrameGroupBy.agg ) which accepts a string, function or a list of functions.

Import required libraries

import pandas as pd import numpy as np

Create a sample dataframe

df = pd.DataFrame({"dept": np.random.choice(["IT", "HR", "Sales", "Production"], size=50), "gender": np.random.choice(["F", "M"], size=50), "age": np.random.randint(22, 60, size=50), "salary": np.random.randint(20000, 90000, size=50)})

df.index.name = "emp_id"

Calculate min, max, mean and count of salaries

in different departments for males and females

df.groupby(['dept', 'gender'])['salary'].agg(["min", "max", "mean", "count"])

Output:

Example 3: Specifying multiple columns and their corresponding aggregate operations as

follows.

Import required libraries

import pandas as pd import numpy as np

Create a sample dataframe

df = pd.DataFrame({"dept": np.random.choice(["IT", "HR", "Sales", "Production"], size=50), "gender": np.random.choice(["F", "M"], size=50), "age": np.random.randint(22, 60, size=50), "salary": np.random.randint(20000, 90000, size=50)}) df.index.name = "emp_id"

Calculate mean salaries and min-max age of employees

in different departments for gender groups

Create bins or groups and apply operations

The cut method of Pandas sorts values into bin intervals creating groups or categories.

Aggregation or other functions can then be performed on these groups. Implementation of

this is shown below:

Example : Age is divided into age ranges and the count of observations in the sample data is

calculated.

Import required libraries

import pandas as pd import numpy as np

Create a sample dataframe

df = pd.DataFrame({"dept": np.random.choice(["IT", "HR", "Sales", "Production"], size=50), "gender": np.random.choice(["F", "M"], size=50), "age": np.random.randint(22, 60, size=50), "salary": np.random.randint(20000, 90000, size=50)}) df.index.name = "emp_id"

Create bin intervals

bins = [20, 30, 45, 60]

Segregate ages into bins of age groups

df['categories'] = pd.cut(df['age'], bins, labels=['Young', 'Middle', 'Old'])

Calculate number of observations in each age category

df['age'].groupby(df['categories']).count()

Output:

Transformation

Transformation is performing a group-specific operation where the individual values are

changed while the shape of the data remains same. We use the transform() function to do so.

Example :

Import required libraries

import pandas as pd import numpy as np

Create a sample dataframe

df = pd.DataFrame({"dept": np.random.choice(["IT", "HR", "Sales", "Production"], size=50), "gender": np.random.choice(["F", "M"], size=50), "age": np.random.randint(22, 60, size=50), "salary": np.random.randint(20000, 90000, size=50)}) df.index.name = "emp_id"

Calculate mean difference by transforming each salary value

df['mean_sal_diff'] = df['salary'].groupby( df['dept']).transform(lambda x: x - x.mean()) df.head()

data aggregation in python, Summaries of Programming Languages

Related documents

Partial preview of the text

Download data aggregation in python and more Summaries Programming Languages in PDF only on Docsity!

Python | Pandas dataframe.aggregate()

Python is a great language for doing data analysis, primarily because of the fantastic

ecosystem of data-centric Python packages. Pandas is one of those packages and makes

importing and analyzing data much easier.

Dataframe.aggregate() function is used to apply some aggregation across one or more

column. Aggregate using callable, string, dict, or list of string/callables. Most frequently used

aggregations are:

sum: Return the sum of the values for the requested axis

min: Return the minimum of the values for the requested axis

max: Return the maximum of the values for the requested axis

Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)

Parameters:

func : callable, string, dictionary, or list of string/callables. Function to use for aggregating

the data. If a function, must either work when passed a DataFrame or when passed to

DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column

names.

axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column.

1 or ‘columns’: apply function to each row.

Returns: Aggregated DataFrame

For link to CSV file Used in Code, click here

Example #1: Aggregate ‘sum’ and ‘min’ function across all the columns in data frame.

importing pandas package

making data frame from csv file

printing the first 10 rows of the dataframe

Aggregation works with only numeric type columns.

Applying aggregation across all the columns

sum and min will be found for each

numeric type column in df dataframe

Output:

For each column which are having numeric values, minimum and sum of all values has been

found. For dataframe df , we have four such columns Number, Age, Weight, Salary.

Example #2:

In Pandas, we can also apply different aggregation functions across different columns. For

that, we need to pass a dictionary with key containing the column names and values

containing the list of aggregation functions for any specific column.

importing pandas package

making data frame from csv file

We are going to find aggregation for these columns

Print the dataframe

Now apply the groupby() function.

applying groupby() function to

group the data on team value.

Let's print the first entries

in all the groups formed.

Output :

Let’s print the value contained any one of group. For that use the name of the team. We use

the function get_group() to find the entries contained in any of the groups.

Finding the values contained in the "Boston Celtics" group

Output :

Example #2: Use groupby() function to form groups based on more than one category (i.e.

Use more than one column to perform the splitting).

importing pandas as pd

Creating the dataframe

First grouping based on "Team"

Within each team we are grouping based on "Position"

Print the first value in each group

Create a sample dataframe

Calculate mean data of gender groups

Output:

Example 2: Performing multiple aggregate operations using the aggregate function

( DataFrameGroupBy.agg ) which accepts a string, function or a list of functions.

Import required libraries

Create a sample dataframe

Calculate min, max, mean and count of salaries

in different departments for males and females

Output:

Example 3: Specifying multiple columns and their corresponding aggregate operations as

follows.

Import required libraries

Create a sample dataframe

Calculate mean salaries and min-max age of employees

in different departments for gender groups

Create bins or groups and apply operations

The cut method of Pandas sorts values into bin intervals creating groups or categories.

Aggregation or other functions can then be performed on these groups. Implementation of

this is shown below: