Computer Vision is a field of Artificial Intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and to take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand. For example, computer vision is necessary to enable self-driving cars: manufacturers such as Tesla, BMW, Volvo and Audi use multiple cameras, lidar, radar and ultrasonic sensors to acquire images of the environment so that their self-driving cars can detect objects, lane markings, signs and traffic signals and drive safely.

Applications of computer vision:
1. Optical character recognition
2. Machine inspection
3. Retail
4. 3D model building (photogrammetry)
5. Automotive safety
6. Match move
7. Motion capture
8. Surveillance
9. Fingerprint recognition and biometrics

• Larry Roberts is commonly accepted as the father of computer vision.
• Computer vision came into existence during the 1960s.

Levels of vision:
• Low-level vision: edges, corners, stereo reconstruction
• Mid-level vision: texture, segmentation and grouping, illumination
• High-level vision: tracking, specific object recognition, category-level object recognition

PROJECTION
Projection is a technique or process used to transform a 3D object into a 2D object on a projection plane or view plane (representing an n-dimensional object in (n-1) dimensions). Projection is of two types:
1. Parallel projection
2. Perspective projection

Parallel projection is of two types: orthographic and oblique. Orthographic projection can be multiview or axonometric; an axonometric view can be isometric, dimetric or trimetric. The other kind of parallel projection is the oblique view, which is the more general one; it is of two types, cavalier and cabinet. Perspective projection can be represented with one, two or three vanishing points.

How do we transform a 3D world object into a 2D picture? Form a ray between our eyes and a 3D object through a canvas. The ray hits the 3D object and bounces back to the canvas, painting the colours of the 3D object onto the 2D surface.

Parameters that help in transforming a picture from the 3D to the 2D world:
• Extrinsic parameters: the camera's orientation, i.e. rotation (R) and translation (T). The extrinsic parameters describe the camera body configuration.
• Intrinsic parameters: the spatial relation between the sensor and the pinhole (K) and the focal length (f). They give the transformation of the optical parameters.

Process of forming a 2D picture of a 3D world object: imagine that we are an artist who is going to draw a picture of the world. We stand in front of the canvas, look at the scene, and trace onto the canvas what we see through it.

Blind spot: the spot where the optic nerve connects to the retina.

DIGITIZATION
Digitization is the process of converting information into a digital format. There is a processing pipeline to convert an analog image into a digital image:
• SAMPLING: digitization with respect to the coordinate values. The sampling rate determines the spatial resolution of the digitized image.
• QUANTIZATION: digitization with respect to amplitude. The quantization level determines the number of grey levels in the digitized image.

REPRESENTING A DIGITAL IMAGE
b = M * N * k
where b is the number of bits required to store a digitized image of size M x N with k bits per pixel (a small code sketch follows the image-type list below).

IMAGE TYPES
• Binary image (black-and-white image): each pixel contains 1 bit (0: black, 1: white).
• Monochromatic / grayscale / intensity image: each pixel value lies in the range 0-255 and corresponds to a light intensity, normally represented in grey scale.
• Colour image / RGB: each pixel contains a vector representing the red, green and blue components.
• Index image: construct a look-up table; each pixel is denoted by an index number, and each index number has its own RGB value.
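As a quick illustration of b = M * N * k and the image types above, here is a minimal Python sketch; the 1024 x 1024 image size and the function name are assumed examples, not from the notes:

```python
# Storage needed for an M x N image at k bits per pixel: b = M * N * k.

def image_storage_bits(M, N, k):
    """Number of bits required to store a digitized M x N image."""
    return M * N * k

# Hypothetical 1024 x 1024 image in the three representations above:
for label, k in [("binary (1 bit/pixel)", 1),
                 ("grayscale (8 bits/pixel)", 8),
                 ("RGB (24 bits/pixel)", 24)]:
    b = image_storage_bits(1024, 1024, k)
    print(f"{label}: {b} bits = {b // 8} bytes")
```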
INTERPOLATION
Interpolation: constructing new data points within the range of a discrete set of known data points.
Image interpolation is a tool used for zooming, shrinking and geometric correction of an image (re-sampling of images). Image interpolation refers to the "guess" of intensity values at missing locations.

Why image interpolation?
• If we want to see an image bigger: when we watch a video clip on a PC, we like to see it in full-screen mode.
• If we want a good image: if some block of an image gets damaged during transmission, we want to repair it.
• If we want a cool image: manipulating images digitally can render fancy artistic effects, as we often see in movies.

ZOOMING
Zooming means expanding the size of the image. It is a two-step procedure:
• creation of new pixel locations;
• assigning grey levels to those new locations.

Methods:
• Nearest-neighbour interpolation
• Pixel replication
• Bilinear interpolation

Nearest-neighbour interpolation: suppose an image of size 2x2 pixels is to be enlarged 2 times. Lay an imaginary 4x4 grid over the original image. For any point in the overlay, look for the closest pixel in the original image and assign its grey level to the new pixel in the grid. When all the new pixels have been assigned values, expand the overlay grid to the specified size to obtain the zoomed image. For example:

original (2x2)      zoomed (4x4)
100  120            100  100  120  120
180  250            100  100  120  120
                    180  180  250  250
                    180  180  250  250

Limitation: it creates a checkerboard effect; when you replicate neighbouring pixel values, the sharpness of the image decreases.

Pixel replication: a special case of nearest-neighbour interpolation, applicable when the size of the image needs to be increased an integer number of times (e.g. 5 times). To double the size of an image, duplicate each column and each row.

Bilinear interpolation: a resampling method that uses the distance-weighted average of the four nearest pixel values to estimate a new pixel value.

Distance metrics: if we have three pixels p, q and z, with p at (x, y), q at (s, t) and z at (v, w), then D is a distance metric iff:
1. D(p, q) >= 0, with D(p, q) = 0 iff p = q;
2. D(p, q) = D(q, p);
3. D(p, z) <= D(p, q) + D(q, z).

BASIC GEOMETRIC PRIMITIVES - points, lines, conics, etc.
• Points lying on a Euclidean 2D plane (like the image plane) are usually described as vectors: x = (x, y)^T ∈ R^2.
• This is a common way to reason about points but has some limitations; for example, we cannot define points at infinity.
• Since the imaging apparatus usually behaves like a pinhole camera model, many of the transformations that can happen can be described as projective transformations. This offers a general and powerful way to work with points, lines and conics.
• The 2D projective space is simply defined as P^2 = R^3 \ {(0, 0, 0)^T}, i.e. homogeneous 3-vectors excluding the zero vector.

POINT OPERATORS
Consider two images f and g defined on the same domain; their pixel-wise addition is denoted f + g. Or consider a positive-valued image f and the image log(1 + f) obtained by taking the logarithm of all pixel values. These two operations are examples of point operators. Point operations run the same conversion operation for each pixel in a grayscale image. Common point operators (two of them are sketched in code below):
• Thresholding: select pixels with given values to produce binary images.
• Adaptive thresholding: like thresholding, except the values are chosen locally.
• Contrast stretching: spreading out the grey-level distribution.
• Histogram equalization: a general method of modifying the intensity distribution.
• Logarithm operator: reduces the contrast of brighter regions.
• Exponential operator: enhances the contrast of brighter regions.
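The following is a minimal sketch of two of the point operators above (global thresholding and the logarithm operator), assuming images are held as numpy uint8 arrays; the helper names and the 4x4 test image are illustrative assumptions:

```python
import numpy as np

def threshold(img, t):
    """Global thresholding: binary image, 255 where img > t, else 0."""
    return np.where(img > t, 255, 0).astype(np.uint8)

def log_operator(img):
    """Logarithm operator: compresses (reduces contrast of) bright regions."""
    c = 255.0 / np.log1p(img.max())                  # rescale output to 0-255
    return (c * np.log1p(img.astype(np.float64))).astype(np.uint8)

# Hypothetical 4x4 grayscale test image:
img = np.array([[ 10,  50,  90, 130],
                [ 30,  70, 110, 150],
                [ 50,  90, 130, 170],
                [ 70, 110, 150, 190]], dtype=np.uint8)
print(threshold(img, 100))
print(log_operator(img))
```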
MATTING AND COMPOSITING
• In many photo-editing and visual-effects applications, it is often desirable to cut a foreground object out of one scene and put it on top of a different background. The process of extracting the object from the original image is called matting, while the process of inserting it into another image (without visible artifacts) is called compositing.
• The compositing equation is C = (1 - α)B + αF. This operator attenuates the influence of the background image B by a factor (1 - α) and then adds in the colour (and opacity) values of the foreground layer F.
• The intermediate representation used for the foreground object between these two stages is called an alpha-matted colour image. In addition to the three RGB colour channels, an alpha-matted image contains a fourth alpha channel α (or A) that describes the relative amount of opacity or fractional coverage at each pixel. Pixels within the object are fully opaque (α = 1), while pixels fully outside the object are transparent (α = 0). Pixels on the boundary of the object vary smoothly between these two extremes.

HISTOGRAM EQUALIZATION
• Histogram equalization is an image-processing technique that adjusts the contrast of an image by using its histogram.
• It is used to improve contrast in images. It accomplishes this by spreading out the most frequent intensity values, i.e. stretching out the intensity range of the image.
• It is one of the most used applications of pointwise image processing.
• Tonal adjustments are the adjustments and changes we make to the brightness and contrast of an image.

FREQUENCY DOMAIN FILTERS
• Frequency domain filters are used for smoothing and sharpening an image by removing high- or low-frequency components.
• Frequency domain filters differ from spatial domain filters in that they operate on the frequency content of the image.
• Filtering is basically done for two operations: smoothing and sharpening.
• Classification of frequency domain filters: low pass, high pass and band pass filters.

Low pass filter: removes the high-frequency components, i.e. it keeps the low-frequency components. It is used to smooth the image by attenuating high-frequency components and preserving low-frequency ones. Low pass filtering in the frequency domain is given by
G(u, v) = H(u, v) · F(u, v),
where F(u, v) is the Fourier transform of the original image and H(u, v) is the Fourier transform of the filtering mask (sketched in code below).

High pass filter: removes the low-frequency components, i.e. it keeps the high-frequency components. It is used to sharpen the image by attenuating low-frequency components and preserving high-frequency ones. High pass filtering in the frequency domain is given by
H_hp(u, v) = 1 - H_lp(u, v),
where H_hp(u, v) is the transfer function of the high pass filter and H_lp(u, v) is the transfer function of the corresponding low pass filter.

Band pass filter: removes the very low and very high frequency components, i.e. it keeps a moderate band of frequencies. Band pass filtering is used to enhance edges while reducing noise at the same time.
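Below is a minimal sketch of low pass filtering in the frequency domain via G(u,v) = H(u,v) · F(u,v); the choice of an ideal low-pass mask, the cutoff radius D0 and the use of numpy's FFT routines are assumptions for illustration, not part of the notes:

```python
import numpy as np

def ideal_lowpass(img, D0=30):
    """Ideal low-pass filtering in the frequency domain: G = H * F."""
    M, N = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))            # F(u,v), zero frequency centred
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)   # distance from the centre
    H = (D <= D0).astype(float)                      # pass frequencies within radius D0
    G = H * F                                        # G(u,v) = H(u,v) . F(u,v)
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

# The corresponding high pass mask follows from the notes' relation
# H_hp = 1 - H_lp, i.e.  H = 1.0 - (D <= D0).astype(float)
```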
SCALE INVARIANT FEATURE TRANSFORM (SIFT)
SIFT is a feature detection algorithm in computer vision used to detect and describe local features (keypoints) in images. These keypoints are scale- and rotation-invariant and can be used for various computer vision applications, such as image matching, object detection and scene detection. SIFT consists of two stages: the SIFT detector and the SIFT descriptor.

Features to consider while performing matching: the image region should have rich content, a well-defined signature, and a well-defined position in the image; it should be invariant to rotation and scaling and insensitive to lighting. If you try to locate an edge or a line, a similar pattern is found in many places, so you will not get a correct match.

BLOB DETECTION
In computer vision, blob detection methods aim to detect regions in a digital image that differ in properties, such as brightness or colour, compared to surrounding regions. Informally, a blob is a region of an image in which some properties are constant or approximately constant.

RANSAC (RANDOM SAMPLE CONSENSUS)
Developed by Fischler and Bolles. Application: to separate inliers from outliers. Basic idea of RANSAC: we try to find the best partition of the points into an inlier set and an outlier set, and estimate the model from the inlier set. There will be two kinds of points: inliers, which are consistent with the model, and outliers, which are not.

SEGMENTATION APPROACHES
1. Contextual: takes into account the relationships between features in an image (e.g. neighbouring pixels are likely to belong to the same region).
2. Non-contextual: ignores the relationships that exist between features in an image; pixels are simply grouped together on the basis of some global attribute, such as grey level.

HIERARCHICAL CLUSTERING
• Hierarchical clustering is an unsupervised machine-learning algorithm used to group unlabeled data sets into clusters.
• The algorithm develops the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as a dendrogram.
• The hierarchical clustering technique has two approaches to build a tree from the input set S:
1. Agglomerative: a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left (i.e. until S is reached as the root). It is the most common approach.
2. Divisive: the reverse of the agglomerative algorithm, i.e. a top-down approach: recursively partition S until singleton sets are reached.
• Why hierarchical clustering? In hierarchical clustering algorithms, we do not need prior knowledge of the number of clusters.
• Advantages: 1. dendrograms are great for visualization; 2. it provides hierarchical relations between clusters; 3. it has been shown to capture concentric clusters.
• Disadvantages: 1. it is not easy to define levels for clusters; 2. experiments have shown that other clustering techniques outperform hierarchical clustering.

K-MEANS CLUSTERING
• K-means clustering is an unsupervised learning algorithm which groups an unlabeled dataset into different clusters in such a way that each data point belongs to only one group of points with similar properties.
• Here K defines the number of predefined clusters to be created in the process: if K = 2 there will be two clusters, for K = 3 three clusters, and so on.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between the data points and their corresponding clusters.
• The algorithm takes the unlabeled data as input, divides the dataset into K clusters, and repeats the process until it finds the best clusters. The value of K must be predetermined.
• The K-means clustering algorithm mainly performs two tasks:
1. determines the best values for the K centre points (centroids) by an iterative process;
2. assigns each data point to its closest K-centre; the data points near a particular K-centre form a cluster.
• Iterative approach (sketched in code after this list):
- Initialize: pick K random points as cluster centres.
- Alternate: 1. assign data points to the closest cluster centre; 2. move each cluster centre to the average of its assigned points.
- Stop when no point's assignment changes.
• Advantages: 1. very simple method; 2. converges to a local minimum.
• Disadvantages: 1. memory-intensive; 2. need to pick K; 3. sensitive to initialization; 4. sensitive to outliers.
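Here is a minimal sketch of the iterative K-means loop described above, assuming the data sit in a numpy array of shape (n_points, n_features); the two-blob test data and the random seed are illustrative assumptions:

```python
import numpy as np

def kmeans(X, K, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: pick K random data points as the cluster centres
    centers = X[rng.choice(len(X), size=K, replace=False)]
    labels = None
    while True:
        # 1. Assign each data point to its closest cluster centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop when no point's assignment changes
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 2. Move each centre to the average of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Two well-separated blobs as hypothetical test data:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(X, K=2)
print(centers)   # roughly (-5, -5) and (5, 5)
```

Note how the sketch mirrors the listed drawbacks: the result depends on the initial random choice of centres, and K must be supplied up front.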
ACTIVE CONTOURS
Active contour is a segmentation technique that uses energy forces and constraints to separate the pixels of interest from the rest of the image for further processing and analysis. Active contour is described as an active model for the process of segmentation.

VIVA QUESTIONS
1. Quantization and sampling
Digitization is the process of converting information into a digital format. There is a processing pipeline to convert an analog image into a digital image:
• SAMPLING: digitization with respect to the coordinate values. The sampling rate determines the spatial resolution of the digitized image.
• QUANTIZATION: digitization with respect to amplitude. The quantization level determines the number of grey levels in the digitized image.

2. Bits for representing a colour pixel
Representing a digital image: b = M * N * k, where b is the number of bits required to store a digitized image of size M x N with k bits per pixel.
Image types:
• Binary image (black-and-white): each pixel contains 1 bit (0: black, 1: white).
• Monochromatic / grayscale / intensity image: pixel values in the range 0-255; each pixel corresponds to a light intensity, normally represented in grey scale.
• Colour image / RGB: each pixel contains a vector representing the red, green and blue components.
• Index image: construct a look-up table; each pixel is denoted by an index number, and each index number has its own RGB value.
• 1-bit image (binary image): each pixel is stored as a single bit (0 or 1).
• 8-bit grey-level image: each pixel has a grey value between 0 and 255 and is stored in one byte (8 bits).
• 24-bit colour image: each pixel is represented by 3 bytes for RGB, giving 256 x 256 x 256 = 16,777,216 colours.
For a colour pixel: 8 bits per channel, i.e. 24 bits in total.

3. Why do we get different colours from different intensities of RGB?
Colour mixing theory: additive colour mixing deals with the mixing of light. The primary colours red, green and blue combine to form white (red + green + blue), magenta (red + blue), yellow (red + green) and cyan (green + blue).

4. Interpolation
Interpolation: constructing new data points within the range of a discrete set of known data points. Image interpolation is a tool used for zooming, shrinking and geometric correction of an image (re-sampling of images); it refers to the "guess" of intensity values at missing locations.
Why image interpolation? If we want to see an image bigger: when we watch a video clip on a PC, we like to see it in full-screen mode. If we want a good image: if some block of an image gets damaged during transmission, we want to repair it. If we want a cool image: manipulating images digitally can render fancy artistic effects, as we often see in movies. (A nearest-neighbour sketch follows below.)
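To close, a minimal sketch of the nearest-neighbour zoom from question 4, reproducing the 2x2 example from the zooming section earlier; the numpy-based implementation and the helper name nn_zoom are assumptions for illustration:

```python
import numpy as np

def nn_zoom(img, factor):
    """Enlarge img by an integer factor using nearest-neighbour interpolation:
    each new pixel takes the gray level of the closest original pixel."""
    M, N = img.shape
    rows = (np.arange(M * factor) / factor).astype(int)  # closest source row
    cols = (np.arange(N * factor) / factor).astype(int)  # closest source column
    return img[rows[:, None], cols[None, :]]

img = np.array([[100, 120],
                [180, 250]], dtype=np.uint8)  # the 2x2 example from the notes
print(nn_zoom(img, 2))
# [[100 100 120 120]
#  [100 100 120 120]
#  [180 180 250 250]
#  [180 180 250 250]]
```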