














This document introduces the basics of NumPy, a package for data analysis and scientific computing with Python. It covers the creation of arrays, their attributes, indexing, slicing, and joining and splitting of arrays. The document also highlights the differences between lists and arrays and provides examples of creating arrays from scratch using NumPy. It is useful for students studying data science or scientific computing with Python.
Typology: Study notes
NumPy stands for ‘Numerical Python’. It is a package for data analysis and scientific computing with Python. NumPy uses a multidimensional array object and has functions and tools for working with these arrays. The powerful n-dimensional array in NumPy speeds up data processing. NumPy can be easily interfaced with other Python packages and provides tools for integrating with other programming languages such as C and C++.

Installing NumPy
NumPy can be installed by typing the following command:
pip install numpy

Array
We have learnt about various data types like list, tuple, and dictionary. In this chapter we will discuss another data type, ‘Array’. An array is a data type used to store multiple values using a single identifier (variable name). An array contains an ordered collection of data elements where each element is of the same type and can be referenced by its index (position).
The important characteristics of an array are:
Each element of the array is of the same data type, though the values stored in them may be different.
The entire array is stored contiguously in memory. This makes operations on arrays fast.
Each element of the array is identified or referred to using the name of the array along with the index of that element, which is unique for each element. The index of an element is an integral value associated with the element, based on the element’s position in the array.
For example, consider an array with 5 numbers: [ 10, 9, 99, 71, 90 ]
Here, the 1st value in the array is 10 and has the index value [0] associated with it; the 2nd value in the array is 9 and has the index value [1] associated with it, and so on. The last value (in this case the 5th value) in this array has the index [4]. This is called zero-based indexing. It is very similar to the indexing of lists in Python. The idea of arrays is so important that almost all programming languages support it in one form or another.
NumPy arrays are used to store lists of numerical data, vectors and matrices. The NumPy library has a large set of routines (built-in functions) for creating, manipulating, and transforming NumPy arrays. The Python language also has an array data structure, but it is not as versatile, efficient and useful as the NumPy array. The NumPy array is officially called ndarray but is commonly known as array. In the rest of the chapter, “array” refers to the NumPy array. The following are a few differences between a list and an array.
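The differences between a list and an array can be seen with a short sketch. The variable names below are illustrative only and are not part of the original notes. It shows three behaviours: an array coerces its elements to a single data type, `+` adds arrays element by element instead of joining them, and `*` scales an array instead of repeating it.

```python
import numpy as np

# A list can hold mixed types; an array coerces everything to one dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)        # all elements become float64

# "+" concatenates lists but adds arrays element by element.
list_sum = [1, 2, 3] + [4, 5, 6]
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])

# "*" repeats a list but scales every element of an array.
list_times_two = [1, 2, 3] * 2
array_times_two = np.array([1, 2, 3]) * 2

print(mixed_array.dtype)   # float64
print(list_sum)            # [1, 2, 3, 4, 5, 6]
print(array_sum)           # [5 7 9]
print(list_times_two)      # [1, 2, 3, 1, 2, 3]
print(array_times_two)     # [2 4 6]
```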
There are several ways to create arrays. To create an array and use its methods, we first need to import the NumPy library.
# NumPy is loaded as np (we can assign any name); numpy must be written in lowercase
>>> import numpy as np
NumPy’s array() function converts a given list into an array. For example,
# Create an array called array1 from the given list.
>>> array1 = np.array([10,20,30])
# Display the contents of the array
>>> array1
array([10, 20, 30])
Ones(): We can create an array with all elements initialised to 1 using the function ones(). By default, the data type of the array created by ones() is float. The following code will create an array with 3 rows and 2 columns.
>>> array6 = np.ones((3,2))
>>> array6
array([[1., 1.],
       [1., 1.],
       [1., 1.]])
Full(): Returns a new array of the given shape and type, filled with fill_value.
>>> np.full((3,3),5)
array([[5, 5, 5],
       [5, 5, 5],
       [5, 5, 5]])
Arange(): We can create an array with numbers in a given range and sequence using the arange() function. This function is analogous to the range() function of Python.
>>> array7 = np.arange(6)
# start value 0 (the default), step size 1 (the default)
>>> array7
array([0, 1, 2, 3, 4, 5])
# start value -2, end value 24 (excluded), step size 4
>>> array8 = np.arange(-2, 24, 4)
>>> array8
array([-2,  2,  6, 10, 14, 18, 22])
Linspace(): Creates evenly spaced values over the given interval. Both end points are included by default.
>>> np.linspace(0, 1, 5)
array([0.  , 0.25, 0.5 , 0.75, 1.  ])
Random(): Creates arrays of random values. The random module has three main functions: random.rand() draws floats uniformly from the interval [0, 1), random.randn() draws floats from the standard normal distribution, and random.randint() draws integers from a given range.
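The three random functions named above can be sketched as follows. The seed value and the variable names are assumptions made for illustration; the seed only makes repeated runs give the same values.

```python
import numpy as np

np.random.seed(0)  # assumed seed, used only to make the run repeatable

uniform = np.random.rand(3)              # 3 floats drawn uniformly from [0, 1)
normal = np.random.randn(2, 3)           # 2x3 floats from the standard normal distribution
integers = np.random.randint(0, 10, 5)   # 5 integers from 0 (inclusive) to 10 (exclusive)

print(uniform.shape, normal.shape, integers.shape)   # (3,) (2, 3) (5,)
```

Note that rand() and randn() take the dimensions as separate arguments, while randint() takes the low and high bounds first and the size last.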
Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many.
Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays.
Let’s create a 1-D array and a 2-D array and observe their attributes.
# Create a 1-D array called array1 from the given list.
>>> array1 = np.array([10,20,30])
# Display the contents of the array
>>> array1
array([10, 20, 30])
Creating a 2-D Array
We can create two-dimensional (2-D) arrays by passing nested lists to the array() function.
>>> array3 = np.array([[2.4,3], [4.91,7],[0,-1]])
>>> array3
array([[ 2.4 ,  3.  ],
       [ 4.91,  7.  ],
       [ 0.  , -1.  ]])
Some important attributes of a NumPy ndarray object are:
i) ndarray.ndim: gives the number of dimensions of the array as an integer value. Arrays can be 1-D, 2-D or n-D. In this chapter, we shall focus on 1-D and 2-D arrays only. NumPy calls the dimensions axes (plural of axis). Thus, a 2-D array has two axes. The row-axis is called axis-0 and the column-axis is called axis-1.
>>> array1.ndim
1
>>> array3.ndim
2
ii) ndarray.shape: gives a sequence of integers indicating the size of the array along each dimension.
>>> array1.shape
(3,)
>>> array3.shape
(3, 2)
The output (3, 2) means array3 has 3 rows and 2 columns.
iii) ndarray.size: gives the total number of elements of the array. This is equal to the product of the elements of shape.
>>> array1.size
3
>>> array3.size
6
iv) ndarray.dtype: is the data type of the elements of the array. All the elements of an array are of the same data type. Common data types are int32, int64, float32, float64, U32, etc. The default integer type depends on the platform, so array1 may report int64 instead of int32 on some systems.
>>> array1.dtype
dtype('int32')
# array2 holds strings
>>> array2.dtype
dtype('<U32')
>>> array3.dtype
dtype('float64')
v) ndarray.itemsize: specifies the size in bytes of each element of the array. Data types int32 and float32 mean each element of the array occupies 32 bits in memory. 8 bits form a byte. Thus, an array of elements of type int32 has itemsize 32/8=4 bytes. Likewise, int64/float64 means each item has itemsize 64/8=8 bytes.
>>> array1.itemsize
4    # memory allocated to each integer in bytes
>>> array2.itemsize
128  # memory allocated to string
>>> array3.itemsize
8    # memory allocated to float type
vi) ndarray.nbytes: gives the total size (in bytes) of the array, which equals size multiplied by itemsize:
>>> array1.nbytes
12   # total memory allocated to the array in bytes (3 elements of 4 bytes each)
Accessing an index that lies outside an array raises an IndexError. For example, marks[0,4] on a 2-D array marks that has only 3 columns fails with "index 4 is out of bounds for axis 1 with size 3".
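The attributes described above can be checked together in one short sketch. The integer dtype is passed explicitly here, which is an assumption made so that the sizes are the same on every platform; the original notes rely on the platform default.

```python
import numpy as np

# dtype given explicitly (an assumption) so the byte counts do not depend on the platform
array1 = np.array([10, 20, 30], dtype=np.int32)
array3 = np.array([[2.4, 3], [4.91, 7], [0, -1]])

for name, arr in (("array1", array1), ("array3", array3)):
    # nbytes always equals size * itemsize
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype, arr.itemsize, arr.nbytes)
# array1 1 (3,) 3 int32 4 12
# array3 2 (3, 2) 6 float64 8 48
```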
Slicing of arrays: Getting and setting smaller subarrays within a larger array.
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:
Syntax: x[start:stop:step]
Example:
>>> array8
array([-2,  2,  6, 10, 14, 18, 22])
# excludes the value at the end index
>>> array8[3:5]
array([10, 14])
# a negative step reverses the array
>>> array8[::-1]
array([22, 18, 14, 10,  6,  2, -2])
Now let us see how slicing is done for 2-D arrays. For this, let us create a 2-D array called array9 having 3 rows and 4 columns.
>>> array9 = np.array([[-7, 0, 10, 20], [-5, 1, 40, 200], [-1, 1, 4, 30]])
# access all the elements in the 3rd column
>>> array9[0:3,2]
array([10, 40,  4])
Note that we are specifying rows in the range 0:3 because the end value of the range is excluded.
# access the 2nd and 3rd rows, restricted to the 1st and 2nd columns
>>> array9[1:3,0:2]
array([[-5,  1],
       [-1,  1]])
If row indices are not specified, it means all the rows are to be considered. Likewise, if column indices are not specified, all the columns are to be considered. Thus, the statement to access all the elements in the 3rd column can also be written as:
>>> array9[:,2]
array([10, 40,  4])
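The slicing section mentions both getting and setting subarrays. The following sketch, with illustrative variable names that are not from the original notes, shows a step value in a slice and shows that assigning through a slice changes the original array, because a NumPy slice is a view and not a copy.

```python
import numpy as np

array8 = np.arange(-2, 24, 4)          # [-2, 2, 6, 10, 14, 18, 22]
every_other = array8[::2]              # start and stop omitted, step 2
first_three = array8[:3]               # the stop index 3 is excluded

array9 = np.array([[-7, 0, 10, 20],
                   [-5, 1, 40, 200],
                   [-1, 1, 4, 30]])
block = array9[1:3, 0:2]               # rows 1-2, columns 0-1 (a view, not a copy)
block[0, 0] = 99                       # writing through the slice changes array9 too

print(every_other)     # [-2  6 14 22]
print(first_three)     # [-2  2  6]
print(array9[1, 0])    # 99
```

If an independent subarray is needed, calling .copy() on the slice avoids modifying the original.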
We can modify the shape of an array using the reshape() function. Reshaping cannot change the total number of elements in the array; attempting to do so with reshape() results in an error. Example
>>> array3 = np.arange(10,22)
>>> array3
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21])
>>> array3.reshape(3,4)
array([[10, 11, 12, 13],
       [14, 15, 16, 17],
       [18, 19, 20, 21]])
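The rule that reshape() must preserve the number of elements can be checked with a small sketch. The shapes (2, 6) and (5, 2) are chosen here only for illustration.

```python
import numpy as np

array3 = np.arange(10, 22)        # 12 elements
reshaped = array3.reshape(2, 6)   # 2 * 6 == 12, so this works

try:
    array3.reshape(5, 2)          # 5 * 2 == 10, which does not match 12
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("reshape failed:", err)
```

reshape() returns a new array object and leaves the shape of array3 itself unchanged.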
Here array1 and array2 are 2-D arrays with two rows each, joined side by side along axis 1.
>>> np.concatenate((array1,array2), axis=1)
array([[ 10,  20,   0,   0,   0],
       [-30,  40,   0,   0,   0]])
Example-2:
In[43]: x = np.array([1, 2, 3])
        y = np.array([3, 2, 1])
        np.concatenate([x, y])
Out[43]: array([1, 2, 3, 3, 2, 1])
In the next two inputs, grid is the 2-D array [[1, 2, 3], [4, 5, 6]].
In[46]: # concatenate along the first axis
        np.concatenate([grid, grid])
Out[46]: array([[1, 2, 3],
                [4, 5, 6],
                [1, 2, 3],
                [4, 5, 6]])
In[47]: # concatenate along the second axis (zero-indexed)
        np.concatenate([grid, grid], axis=1)
Out[47]: array([[1, 2, 3, 1, 2, 3],
                [4, 5, 6, 4, 5, 6]])
In[48]: x = np.array([1, 2, 3])
np.hstack joins arrays horizontally (side by side). In the output below, a column of 99s is appended to a 2-D grid.
In[49]: np.hstack([grid, y])
Out[49]: array([[ 9,  8,  7, 99],
                [ 6,  5,  4, 99]])
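Since some of the inputs for the stacking example did not survive, here is a self-contained sketch of vertical and horizontal stacking. The values of grid, row and column are assumptions chosen to be consistent with the output shown above; they are not confirmed by the original notes.

```python
import numpy as np

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])             # assumed 2x3 grid
row = np.array([1, 2, 3])                # hypothetical 1-D row to stack on top
column = np.array([[99], [99]])          # hypothetical 2x1 column to append

stacked_rows = np.vstack([row, grid])        # shape (3, 3)
stacked_cols = np.hstack([grid, column])     # shape (2, 4)
same_as_hstack = np.concatenate([grid, column], axis=1)

print(stacked_rows)
print(stacked_cols)
```

For 2-D inputs, np.hstack gives the same result as np.concatenate with axis=1, and np.vstack corresponds to axis=0.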
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. numpy.split() splits an array into two or more subarrays along the specified axis. We can either pass a sequence of index values giving the split points, or an integer N giving the number of equal parts into which the array is to be split. A list of N split points produces N + 1 subarrays. By default, numpy.split() splits along axis = 0. Consider the array given below:
In[50]: x = [1, 2, 3, 99, 99, 3, 2, 1]
        x1, x2, x3 = np.split(x, [3, 5])
        print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]
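The integer form of split() and the row and column variants can be sketched as follows. The arrays and names are illustrative assumptions, not taken from the original notes.

```python
import numpy as np

values = np.arange(8)                        # [0 1 2 3 4 5 6 7]
halves = np.split(values, 2)                 # integer N: two equal parts

grid = np.arange(16).reshape(4, 4)
upper, lower = np.vsplit(grid, [2])          # split the rows before index 2
left, right = np.hsplit(grid, [2])           # split the columns before index 2

try:
    np.split(values, 3)                      # 8 elements cannot form 3 equal parts
    unequal_raised = False
except ValueError:
    unequal_raised = True

print(halves[0], halves[1])                  # [0 1 2 3] [4 5 6 7]
```

When an integer N does not divide the array evenly, np.split raises a ValueError, as the try block shows.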
The table above shows a few of the functions defined in numpy on one or more types, covering a wide variety of operations. These are called universal functions (ufuncs). Some of them are called automatically on arrays when the relevant infix notation is used (e.g., add(a, b) is called internally when a + b is written and a or b is an ndarray). Nevertheless, you may still want to use the ufunc call in order to use the optional output argument(s) to place the output(s) in an object (or objects) of your choice. Recall that each ufunc operates element by element. Therefore, each ufunc will be described as if acting on a set of scalar inputs to return a set of scalar outputs.
>>> x, y = np.array([1, 2, 3]), np.array([4, 5, 6])
>>> np.add(x, 10)        # Add 10 to each entry of x.
array([11, 12, 13])
>>> np.multiply(x, 4)    # Multiply each entry of x by 4.
array([ 4,  8, 12])
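The two points made about ufuncs, that the infix operator calls the ufunc and that the ufunc call accepts an output argument, can be sketched as follows. The name result is a hypothetical preallocated buffer.

```python
import numpy as np

x, y = np.array([1, 2, 3]), np.array([4, 5, 6])

via_operator = x + y                  # the + operator dispatches to np.add
via_ufunc = np.add(x, y)

result = np.empty(3, dtype=x.dtype)   # preallocated output buffer
np.multiply(x, y, out=result)         # writes the products into result
print(via_operator, result)           # [5 7 9] [ 4 10 18]
```

Writing into an existing array with out= avoids allocating a new array for the result, which can matter for large arrays.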
Floor: np.floor returns the floor of each input value, that is, the largest integer not greater than the input value.
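A minimal sketch of the floor function follows. The sample values are hypothetical and were chosen to include negative numbers, where the floor differs from simple truncation.

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.2, 1.5, 2.0])   # hypothetical sample data
floored = np.floor(values)
print(floored)   # [-2. -1.  0.  1.  2.]
```

The result keeps the float data type even though every value is a whole number.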
Absolute: Python's built-in abs() function also works on arrays. The corresponding NumPy ufunc is np.absolute, which is also available under the alias np.abs:
In[12]: np.absolute(x)
Out[12]: array([2, 1, 0, 1, 2])
In[13]: np.abs(x)
Out[13]: array([2, 1, 0, 1, 2])
Trigonometric functions
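NumPy provides the trigonometric functions as ufuncs, so they also work element by element. The sketch below is illustrative only; the angles are an assumption, and the results are compared with a tolerance because floating-point values such as sin(pi) are not exactly zero.

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)     # angles 0, pi/2 and pi, in radians
sines = np.sin(theta)
cosines = np.cos(theta)

print(np.allclose(sines, [0, 1, 0]))      # True
print(np.allclose(cosines, [1, 0, -1]))   # True
```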
Out[30]: array([[ 1,  2,  3,  4,  5],
                [ 2,  4,  6,  8, 10],
                [ 3,  6,  9, 12, 15],
                [ 4,  8, 12, 16, 20],
                [ 5, 10, 15, 20, 25]])
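The input that produced Out[30] is not shown. The output has the form of a multiplication table for the integers 1 to 5, and one way to build such a table is the outer method of the multiply ufunc. The sketch below assumes that input; it is not confirmed by the original notes.

```python
import numpy as np

x = np.arange(1, 6)                  # assumed input: the integers 1 to 5
table = np.multiply.outer(x, x)      # table[i, j] == x[i] * x[j]
print(table)
```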
The table below shows a few of the basic aggregation functions defined in numpy, which perform many useful statistical operations on arrays. Let us consider two arrays:
>>> arrayA = np.array([1,0,2,-3,6,8,4,7])
>>> arrayB = np.array([[3,6],[4,2]])
# maximum element of the whole array
>>> arrayA.max()
8
>>> arrayB.max()
6
# axis=1 gives the maximum of each row
>>> arrayB.max(axis=1)
array([6, 4])
# axis=0 gives the maximum of each column
>>> arrayB.max(axis=0)
array([4, 6])
>>> arrayA.min()
-3
>>> arrayB.min()
2
>>> arrayB.min(axis=0)
array([3, 2])
>>> arrayA.sum()
25
>>> arrayB.sum()
15
# axis specifies the dimension along which the sum is taken;
# axis=1 gives the sum of the elements in each row
>>> arrayB.sum(axis=1)
array([9, 6])
>>> arrayA.mean()
3.125
>>> arrayB.mean()
3.75
>>> arrayB.mean(axis=0)
array([3.5, 4. ])
>>> arrayB.mean(axis=1)
array([4.5, 3. ])
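Because arrayB is square, it is easy to confuse the two axes. The sketch below uses a hypothetical 2x3 array, not taken from the original notes, so that the length of each result shows which dimension was collapsed.

```python
import numpy as np

scores = np.array([[3, 6, 1],
                   [4, 2, 8]])       # hypothetical data: 2 rows, 3 columns

per_column = scores.sum(axis=0)      # collapses the rows: one value per column
per_row = scores.sum(axis=1)         # collapses the columns: one value per row

print(per_column)          # [7 8 9]
print(per_row)             # [10 14]
print(scores.max(axis=1))  # [6 8]
print(scores.mean())       # 4.0
```

An aggregation along axis=0 returns as many values as there are columns, and along axis=1 as many values as there are rows.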