Introduction to Numpy, Study notes of Engineering

This document introduces the basics of NumPy, a package for data analysis and scientific computing with Python. It covers the creation of arrays, their attributes, indexing, slicing, and joining and splitting of arrays. The document also highlights the differences between lists and arrays and provides examples of creating arrays from scratch using NumPy. It is useful for students studying data science or scientific computing with Python.

Typology: Study notes

2022/2023

Available from 10/30/2023

338-bhushan-navale

FUNDAMENTALS OF DATA SCIENCE
Introduction to Numpy
INTRODUCTION
NumPy stands for ‘Numerical Python’. It is a package for data analysis and scientific
computing with Python. NumPy is built around a multidimensional array object and provides
functions and tools for working with these arrays. The powerful n-dimensional array in NumPy
speeds up data processing. NumPy can be easily interfaced with other Python packages
and provides tools for integrating with other programming languages such as C and C++.
Installing NumPy
NumPy can be installed by typing the following command:
pip install numpy
Array
We have learnt about various data types such as list, tuple, and dictionary. In this chapter we
will discuss another data type, the ‘array’. An array is a data type used to store multiple values
using a single identifier (variable name). An array contains an ordered collection of data
elements where each element is of the same type and can be referenced by its index
(position). The important characteristics of an array are:
  • Each element of the array is of the same data type, though the values stored in them
may be different.
  • The entire array is stored contiguously in memory. This makes
operations on arrays fast.
  • Each element of the array is identified or referred to using the name of the array
along with the index of that element, which is unique for each element. The index of an
element is an integral value associated with the element, based on the element's
position in the array.
For example, consider an array with 5 numbers:
[ 10, 9, 99, 71, 90 ]
Here, the 1st value in the array is 10 and has the index value [0] associated with it; the
2nd value in the array is 9 and has the index value [1] associated with it, and so on. The
last value (in this case the 5th value) in this array has the index [4]. This is called
zero-based indexing. It is very similar to the indexing of lists in Python. The idea of arrays
is so important that almost all programming languages support it in one form or another.
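The zero-based indexing described above can be tried with an ordinary Python list, which the notes say is indexed in the same way. This is only a small sketch; the variable names are chosen here for illustration.

```python
values = [10, 9, 99, 71, 90]

first = values[0]                  # index 0 holds the 1st value
second = values[1]                 # index 1 holds the 2nd value
last = values[len(values) - 1]     # the 5th value sits at index 4

print(first, second, last)         # 10 9 90

# An index equal to the length is one past the end and raises IndexError.
try:
    values[5]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)                # True
```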
NumPy Array
NumPy arrays are used to store lists of numerical data, vectors and matrices. The NumPy
library has a large set of routines (built-in functions) for creating, manipulating, and
transforming NumPy arrays. Python language also has an array data structure, but it is
not as versatile, efficient and useful as the NumPy array. The NumPy array is officially called ndarray but is commonly known as array. In the rest of the chapter, “array” refers to the NumPy array. Following are a few differences between list and array.
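The comparison itself is not reproduced in these notes. As a hedged illustration, two commonly cited differences (element-wise arithmetic and a single element type) can be sketched as follows; the variable names are made up for this example.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# Multiplying a list repeats it; multiplying an array scales each element.
list_result = values * 2
array_result = arr * 2

# A list may hold mixed types; np.array() converts them to one common dtype.
mixed = np.array([1, 2.5, 3])

print(list_result)     # [1, 2, 3, 1, 2, 3]
print(array_result)    # [2 4 6]
print(mixed.dtype)     # float64
```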

Creation of Arrays from Scratch

There are several ways to create arrays. To create an array and use its methods, we first need to import the NumPy library.

# NumPy is loaded as np (we can assign any name); numpy must be written in lowercase
>>> import numpy as np

NumPy's array() function converts a given list into an array. For example:

# Create an array called array1 from the given list
>>> array1 = np.array([10,20,30])
# Display the contents of the array
>>> array1
array([10, 20, 30])

ones(): We can create an array with all elements initialised to 1 using the function ones(). By default, the data type of the array created by ones() is float. The following code creates an array with 3 rows and 2 columns.

>>> array6 = np.ones((3,2))
>>> array6
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

full(): Returns a new array of the given shape and type, filled with fill_value.

>>> np.full((3,3),5)
array([[5, 5, 5],
       [5, 5, 5],
       [5, 5, 5]])

arange(): We can create an array with numbers in a given range and sequence using the arange() function. This function is analogous to the range() function of Python.

>>> array7 = np.arange(6)
# an array of 6 elements is created with start value 0 and step size 1
>>> array7
array([0, 1, 2, 3, 4, 5])

# Creating an array with start value -2, end value 24 and step size 4
>>> array8 = np.arange(-2, 24, 4)
>>> array8
array([-2,  2,  6, 10, 14, 18, 22])

linspace(): Creates evenly spaced values over the given interval.

# Create an array of five values evenly spaced between 0 and 1
In[16]: np.linspace(0, 1, 5)
Out[16]: array([ 0. , 0.25, 0.5 , 0.75, 1. ])

random: Creates arrays of random values. The numpy.random module has three main functions for this: random.rand(), random.randn() and random.randint(). The examples below use randint() together with the closely related random.random() and random.normal().

  • Create a 3x3 array of uniformly distributed random values.
    In[17]: np.random.random((3, 3))
    Out[17]: array([[ 0.99844933, 0.52183819, 0.22421193],
                    [ 0.08007488, 0.45429293, 0.20941444],
                    [ 0.14360941, 0.96910973, 0.946117  ]])
  • Create a 3x3 array of normally distributed random values with mean 0 and standard deviation 1.
    In[18]: np.random.normal(0, 1, (3, 3))
    Out[18]: array([[ 1.51772646, 0.39614948, -0.10634696],
                    [ 0.25671348, 0.00732722, 0.37783601],
                    [ 0.68446945, 0.15926039, -0.70744073]])
  • Create a 3x3 array of random integers in the interval [0, 10).
    In[19]: np.random.randint(0, 10, (3, 3))
    Out[19]: array([[2, 3, 4],
                    [5, 7, 8],
                    [0, 5, 0]])

eye(): Creates an identity matrix.

# Create a 3x3 identity matrix
In[20]: np.eye(3)
Out[20]: array([[ 1., 0., 0.],
                [ 0., 1., 0.],
                [ 0., 0., 1.]])
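The creation functions above can be compared side by side. The following is a minimal sketch rather than part of the original notes; it uses random.rand() and random.randn(), which the text names, and the variable names are illustrative only. Random values differ on every run, so only their shapes and ranges are checked.

```python
import numpy as np

ones = np.ones((3, 2))                        # float64 by default
ones_int = np.ones((3, 2), dtype=int)         # dtype can be requested explicitly
filled = np.full((2, 2), 7)
steps = np.arange(-2, 24, 4)                  # the stop value 24 is excluded
points = np.linspace(0, 1, 5)                 # both end points are included
uniform = np.random.rand(3, 3)                # uniform values in [0, 1)
normal = np.random.randn(3, 3)                # mean 0, standard deviation 1
integers = np.random.randint(0, 10, (3, 3))   # 10 is excluded
identity = np.eye(3)

print(ones.dtype, ones_int.dtype.kind)        # float64 i
print(steps)                                  # [-2  2  6 10 14 18 22]
print(uniform.shape, normal.shape)            # (3, 3) (3, 3)
```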

Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many.

Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays.

Let's create a 1-D array and a 2-D array and observe their attributes.

# Create a 1-D array called array1 from the given list
>>> array1 = np.array([10,20,30])
# Display the contents of the array
>>> array1
array([10, 20, 30])

Creating a 2-D Array

We can create two-dimensional (2-D) arrays by passing nested lists to the array() function.

>>> array3 = np.array([[2.4,3], [4.91,7],[0,-1]])
>>> array3
array([[ 2.4 ,  3.  ],
       [ 4.91,  7.  ],
       [ 0.  , -1.  ]])

Some important attributes of a NumPy ndarray object are:

i) ndarray.ndim: gives the number of dimensions of the array as an integer value. Arrays can be 1-D, 2-D or n-D. In this chapter, we shall focus on 1-D and 2-D arrays only. NumPy calls the dimensions axes (plural of axis). Thus, a 2-D array has two axes. The row-axis is called axis-0 and the column-axis is called axis-1. The number of axes is also called the array's rank.

>>> array1.ndim
1
>>> array3.ndim
2

ii) ndarray.shape: gives a sequence of integers indicating the size of the array along each dimension.

# array1 is a 1-D array, so there is nothing after the comma in the sequence
>>> array1.shape
(3,)
>>> array3.shape
(3, 2)

The output (3, 2) means array3 has 3 rows and 2 columns.

iii) ndarray.size: gives the total number of elements of the array. This is equal to the product of the elements of shape.

>>> array1.size
3
>>> array3.size
6

iv) ndarray.dtype: is the data type of the elements of the array. All the elements of an array are of the same data type. Common data types are int32, int64, float32, float64, U32, etc. The outputs below assume a platform where the default integer type is int32; where it is int64, array1.dtype is int64 and array1.itemsize is 8.

>>> array1.dtype
dtype('int32')
>>> array2.dtype
dtype('<U32')
>>> array3.dtype
dtype('float64')

v) ndarray.itemsize: specifies the size in bytes of each element of the array. The data types int32 and float32 mean that each element of the array occupies 32 bits in memory. 8 bits form a byte. Thus, an array of elements of type int32 has itemsize 32/8 = 4 bytes. Likewise, int64/float64 means each item has itemsize 64/8 = 8 bytes.

# memory allocated to each integer in bytes
>>> array1.itemsize
4
# memory allocated to string
>>> array2.itemsize
128
# memory allocated to float type
>>> array3.itemsize
8

vi) ndarray.nbytes: gives the total size (in bytes) of the array, which equals size multiplied by itemsize.

# total memory allocated to the array in bytes (3 elements x 4 bytes)
>>> array1.nbytes
12
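The attributes listed above can be inspected together. In this sketch the dtype of array1 is set explicitly to int32 (an assumption made here so that the numbers do not depend on the platform default); array3 is the 2-D array from the notes.

```python
import numpy as np

# dtype is fixed explicitly so the results do not depend on the platform default.
array1 = np.array([10, 20, 30], dtype=np.int32)
array3 = np.array([[2.4, 3], [4.91, 7], [0, -1]])

for name, arr in (("array1", array1), ("array3", array3)):
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype, arr.itemsize, arr.nbytes)
# array1 1 (3,) 3 int32 4 12
# array3 2 (3, 2) 6 float64 8 48

# nbytes is always size * itemsize.
consistent = all(a.nbytes == a.size * a.itemsize for a in (array1, array3))
print(consistent)   # True
```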

# index out of bounds
>>> marks[0,4]
IndexError: index 4 is out of bounds for axis 1 with size 3

Slicing of arrays: Getting and setting smaller subarrays within a larger array.

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

Syntax: x[start:stop:step]

Example:

>>> array8
array([-2,  2,  6, 10, 14, 18, 22])
# excludes the value at the end index
>>> array8[3:5]
array([10, 14])

# reverse the array
>>> array8[::-1]
array([22, 18, 14, 10,  6,  2, -2])

Now let us see how slicing is done for 2-D arrays. For this, let us create a 2-D array called array9 having 3 rows and 4 columns.

>>> array9 = np.array([[-7, 0, 10, 20], [-5, 1, 40, 200], [-1, 1, 4, 30]])
# access all the elements in the 3rd column
>>> array9[0:3,2]
array([10, 40,  4])

Note that we specify rows in the range 0:3 because the end value of the range is excluded.

# access elements of the 2nd and 3rd rows from the 1st and 2nd columns
>>> array9[1:3,0:2]
array([[-5,  1],
       [-1,  1]])

If row indices are not specified, all the rows are considered. Likewise, if column indices are not specified, all the columns are considered. Thus, the statement to access all the elements in the 3rd column can also be written as:

>>> array9[:,2]
array([10, 40,  4])

Example 2:

In[16]: x = np.arange(10)
        x

Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In[20]: x[::2] # every other element

Out[20]: array([0, 2, 4, 6, 8])

In[21]: x[1::2] # every other element, starting at index 1

Out[21]: array([1, 3, 5, 7, 9])
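The slicing section is described as covering both getting and setting subarrays, but the examples only read values. A short sketch of both, reusing the array9 values from the notes (the other names are illustrative):

```python
import numpy as np

x = np.arange(10)
picked = x[2:8:3]          # start 2, stop 8 (excluded), step 3
backwards = x[::-1]        # a negative step walks the array in reverse

array9 = np.array([[-7, 0, 10, 20], [-5, 1, 40, 200], [-1, 1, 4, 30]])
last_two_columns = array9[:, 2:].copy()   # copy so later edits do not affect it

# Assigning to a slice sets the selected subarray in place.
array9[0, :2] = 99

print(picked)              # [2 5]
print(backwards[:3])       # [9 8 7]
print(array9[0])           # [99 99 10 20]
```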

Reshaping of arrays

We can modify the shape of an array using the reshape() function. Reshaping cannot change the total number of elements in the array; attempting to do so with reshape() results in an error.

Example:

>>> array3 = np.arange(10,22)
>>> array3
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21])
>>> array3.reshape(3,4)
array([[10, 11, 12, 13],
       [14, 15, 16, 17],
       [18, 19, 20, 21]])
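The error mentioned above can be observed directly. This is a small sketch with illustrative names: the 12 elements fit a 3x4 shape but not a 5x3 one, so the second reshape raises a ValueError.

```python
import numpy as np

values = np.arange(10, 22)        # 12 elements
table = values.reshape(3, 4)      # 3 * 4 == 12, so this works

try:
    values.reshape(5, 3)          # 5 * 3 == 15, which does not match 12
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(table.shape)                # (3, 4)
print(error_message is not None)  # True
```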

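np.concatenate() joins arrays along an existing axis. In the example below, array1 and array2 are 2-D arrays whose definitions are not shown in this part of the notes; the output is consistent with array1 being [[10, 20], [-30, 40]] and array2 being a 2x3 array of integer zeros.

>>> np.concatenate((array1,array2), axis=1)
array([[ 10,  20,   0,   0,   0],
       [-30,  40,   0,   0,   0]])

Example 2:

In[43]: x = np.array([1, 2, 3])
        y = np.array([3, 2, 1])
        np.concatenate([x, y])
Out[43]: array([1, 2, 3, 3, 2, 1])

In[46]: # concatenate along the first axis
        # (grid is the 2-D array [[1, 2, 3], [4, 5, 6]], as the output shows)
        np.concatenate([grid, grid])
Out[46]: array([[1, 2, 3],
                [4, 5, 6],
                [1, 2, 3],
                [4, 5, 6]])

In[47]: # concatenate along the second axis (zero-indexed)
        np.concatenate([grid, grid], axis=1)
Out[47]: array([[1, 2, 3, 1, 2, 3],
                [4, 5, 6, 4, 5, 6]])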

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

In[48]: x = np.array([1, 2, 3])
        grid = np.array([[9, 8, 7], [6, 5, 4]])
        # vertically stack the arrays
        np.vstack([x, grid])
Out[48]: array([[1, 2, 3],
                [9, 8, 7],
                [6, 5, 4]])

In[49]: # horizontally stack the arrays
        y = np.array([[99], [99]])
        np.hstack([grid, y])
Out[49]: array([[ 9,  8,  7, 99],
                [ 6,  5,  4, 99]])
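When joining arrays, the shapes must agree on every axis except the one being joined. The sketch below illustrates this with small arrays chosen for the example (they are assumptions, not definitions taken from the notes).

```python
import numpy as np

a = np.array([[10, 20], [-30, 40]])
b = np.zeros((2, 3), dtype=a.dtype)

side_by_side = np.concatenate((a, b), axis=1)   # shapes (2, 2) and (2, 3) give (2, 5)
stacked = np.vstack([np.array([1, 2]), a])      # a 1-D row on top of (2, 2) gives (3, 2)
widened = np.hstack([a, np.array([[99], [99]])])  # (2, 2) and (2, 1) give (2, 3)

# Along axis 0 the column counts (2 and 3) differ, so joining fails.
try:
    np.concatenate((a, b), axis=0)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(side_by_side.shape, stacked.shape, widened.shape)   # (2, 5) (3, 2) (2, 3)
print(mismatch_detected)                                  # True
```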

Splitting Arrays

The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. numpy.split() splits an array into two or more subarrays along the specified axis. As its second argument we can pass either a sequence of index values giving the split points, or an integer N giving the number of equal parts into which the array is to be split. By default, numpy.split() splits along axis = 0. Consider the array given below:

In[50]: x = [1, 2, 3, 99, 99, 3, 2, 1]
        x1, x2, x3 = np.split(x, [3, 5])
        print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]

Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:

vsplit():

In[51]: grid = np.arange(16).reshape((4, 4))
        grid
Out[51]: array([[ 0,  1,  2,  3],
                [ 4,  5,  6,  7],
                [ 8,  9, 10, 11],
                [12, 13, 14, 15]])

In[52]: upper, lower = np.vsplit(grid, [2])
        print(upper)
        print(lower)
[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
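The text mentions splitting into N equal parts and the np.hsplit function, neither of which has an example in this part of the notes. A minimal sketch of both, with names chosen for illustration:

```python
import numpy as np

x = np.arange(6)
thirds = np.split(x, 3)               # an integer asks for that many equal parts

grid = np.arange(16).reshape((4, 4))
left, right = np.hsplit(grid, [2])    # columns 0-1 and columns 2-3

# Six elements cannot be divided into four equal parts.
try:
    np.split(x, 4)
    unequal_rejected = False
except ValueError:
    unequal_rejected = True

print([part.tolist() for part in thirds])   # [[0, 1], [2, 3], [4, 5]]
print(left.shape, right.shape)              # (4, 2) (4, 2)
print(unequal_rejected)                     # True
```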

The table above shows a few of the universal functions (ufuncs) defined in numpy for one or more types, covering a wide variety of operations. Some of these ufuncs are called automatically on arrays when the relevant infix notation is used (e.g., add(a, b) is called internally when a + b is written and a or b is an ndarray). Nevertheless, you may still want to call the ufunc directly in order to use the optional output argument(s) to place the output(s) in an object (or objects) of your choice. Recall that each ufunc operates element by element. Therefore, each ufunc is described as if acting on a set of scalar inputs to return a set of scalar outputs.

Example 1:

>>> x, y = np.array([1, 2, 3]), np.array([4, 5, 6])

Addition or multiplication by a scalar acts on each element of the array.

# Add 10 to each entry of x.
>>> np.add(x, 10)
array([11, 12, 13])
# Multiply each entry of x by 4.
>>> np.multiply(x, 4)
array([ 4,  8, 12])
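The two points made about ufuncs, that a + b calls add(a, b) and that the output argument can place the result in an existing array, can be checked with the x and y arrays of Example 1. The name result is an illustrative choice.

```python
import numpy as np

x, y = np.array([1, 2, 3]), np.array([4, 5, 6])

# The infix operator and the explicit ufunc call give the same result.
same = np.array_equal(x + y, np.add(x, y))

# out= writes the result into an existing array instead of creating a new one.
result = np.empty(3, dtype=x.dtype)
np.multiply(x, y, out=result)

print(same)      # True
print(result)    # [ 4 10 18]
```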

Example 2: The numpy.floor() function

This function returns the floor of the input data, that is, the largest integer not greater than each input value. Consider the following example.

Example

import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print(np.floor(arr))

Output:

[ 12.  90. 123.  23.]

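Advanced Universal Functions (ufuncs)

Absolute: The NumPy ufunc for the absolute value is np.absolute, which is also available under the alias np.abs. Here x is an integer array whose definition is not shown in this part of the notes; the output is consistent with x holding the values -2, -1, 0, 1, 2.

In[12]: np.absolute(x)
Out[12]: array([2, 1, 0, 1, 2])

In[13]: np.abs(x)
Out[13]: array([2, 1, 0, 1, 2])

Trigonometric functions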

NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions. We'll start by defining an array of angles:

In[15]: theta = np.linspace(0, np.pi, 3)

Now we can compute some trigonometric functions on these values:

In[16]: print("theta      = ", theta)
        print("sin(theta) = ", np.sin(theta))
        print("cos(theta) = ", np.cos(theta))
        print("tan(theta) = ", np.tan(theta))

Exponents and logarithms

Another common type of operation available as a NumPy ufunc is the exponential:

In[18]: x = [1, 2, 3]
        print("x   =", x)
        print("e^x =", np.exp(x))
        print("2^x =", np.exp2(x))
        print("3^x =", np.power(3, x))
x   = [1, 2, 3]
e^x = [ 2.71828183  7.3890561  20.08553692]
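As a rough check of the trigonometric and exponential ufuncs above, the sketch below compares their results with the values expected by hand. Floating-point results are compared with np.allclose because values such as sin(pi) come out as tiny non-zero numbers rather than exactly 0; np.log is used here as an assumption about what the "logarithms" part of the heading refers to.

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)          # the angles 0, pi/2 and pi
sines = np.sin(theta)
cosines = np.cos(theta)

x = np.array([1, 2, 3])
powers_of_two = np.exp2(x)
roundtrip = np.log(np.exp(x))             # log undoes exp, up to rounding

print(np.allclose(sines, [0, 1, 0]))      # True
print(np.allclose(cosines, [1, 0, -1]))   # True
print(np.allclose(roundtrip, x))          # True
```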

Out[30]: array([[ 1, 2, 3, 4, 5], [ 2, 4, 6, 8, 10], [ 3, 6, 9, 12, 15], [ 4, 8, 12, 16, 20], [ 5, 10, 15, 20, 25]])

Aggregations

The table below shows a few aggregation functions defined in numpy, which are used to perform many useful statistical operations on arrays. The basic aggregation functions are described next. Let us consider two arrays:

>>> arrayA = np.array([1,0,2,-3,6,8,4,7])
>>> arrayB = np.array([[3,6],[4,2]])

  1. The max() function finds the maximum element from an array.

# max element from the whole 1-D array
>>> arrayA.max()
8
# max element from the whole 2-D array
>>> arrayB.max()
6
# if axis=1, it gives the maximum of each row
>>> arrayB.max(axis=1)
array([6, 4])
# if axis=0, it gives the maximum of each column
>>> arrayB.max(axis=0)
array([4, 6])

  2. The min() function finds the minimum element from an array.

>>> arrayA.min()
-3
>>> arrayB.min()
2
>>> arrayB.min(axis=0)
array([3, 2])

  3. The sum() function finds the sum of all elements of an array.

>>> arrayA.sum()
25
>>> arrayB.sum()
15
# axis is used to specify the dimension along which the sum is made.
# Here axis=1 means the elements of each row are summed.
>>> arrayB.sum(axis=1)
array([9, 6])

  4. The mean() function finds the average of elements of the array.

>>> arrayA.mean()
3.125
>>> arrayB.mean()
3.75
>>> arrayB.mean(axis=0)
array([3.5, 4. ])
>>> arrayB.mean(axis=1)
array([4.5, 3. ])
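The axis argument is easy to misread, so here is a short sketch using arrayB from the notes (the other names are illustrative). One way to remember it: axis=0 collapses the rows and leaves one result per column, while axis=1 collapses the columns and leaves one result per row.

```python
import numpy as np

arrayB = np.array([[3, 6], [4, 2]])

column_max = arrayB.max(axis=0)     # one value per column
row_max = arrayB.max(axis=1)        # one value per row
row_sums = arrayB.sum(axis=1)
column_means = arrayB.mean(axis=0)

print(column_max)      # [4 6]
print(row_max)         # [6 4]
print(row_sums)        # [9 6]
print(column_means)    # [3.5 4. ]
```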