
OBSTACLE DETECTION WITH 3-D MOBILE DEVICES

This report examines the challenges faced by visually impaired individuals in detecting aerial obstacles and proposes a mobile application that acts as a complement to the walking stick or the guide dog. The application uses distance measures taken from a stereo pair of images captured by a 3-D camera to detect obstacles and notifies the user through acoustic signals or vibrations. The report also explains the pipeline of the obstacle detection approach and the scene reconstruction process.

Seminar Report 2015
Aerial Obstacle Detection With 3-D Mobile Devices
Department of ECE
College of Engineering Trikaripur

OBSTACLE DETECTION WITH 3-D MOBILE DEVICES
Blindness is considered the major sensory disability, one that determines to a large extent a person's life and their interaction with the environment and with society. A report by the WHO indicates that there were 285 million visually impaired (VI) people in the world in 2010. This figure includes different degrees of visual impairment, the most severe of which is blindness. One of the daily challenges faced by a blind person is autonomous movement. Regarding global orientation, there are different GPS-based systems available on the market with specific cartographies and a voice interface that solve this problem (e.g., the Kapten system). As for obstacle detection and avoidance, classic aids such as the walking stick and the guide dog are the most widely used. Although technological advances exist in this field, they have not become daily-use tools for this community. This is because the classic systems achieve their goals successfully, while the new developments are bulky and uncomfortable, hindering the social integration of the user. In addition, these devices often send acoustic signals via earphones, which deprives the blind user of his main information source: sound.

Large open spaces are a challenging context for the VI. They are low-structured environments, such as parks, where VIs have a limited number of structured references. In these environments, the traditional cane is also of limited help, and most of the sensorial references are audible (traffic on the left/right, children playing, people chatting). The literature contains some notable examples of mobility developments for the VI. Some of them address text reading in the street (identifying street names and/or bus lines). There are two main approaches to identifying image patches that contain text: learning based and grouping based. The latter has recently been extended to deal with severe blur. Factor graphs have also been applied to another important mobility topic: crosswalk protocols. To find the best alignment between the user and the crosswalk, audio feedback is exploited to align the VI properly. 360° panoramas have been incorporated and converted to an aerial view of the nearby intersection for later integration with Google Maps satellite imagery. Since, in general, GPS has limited reliability because of the potential

proximity of buildings, images become the most reliable source of information. For instance, vision is used for guiding the VI to a target. Another application for VIs in the context of mobility deals with aerial obstacle avoidance. These obstacles have no projection on the floor (typically tree branches, awnings, or similar elements). Some examples of this type of obstacle are shown in Fig. 1.1. Stereo-based SLAM has provided a method for building stereo maps with a stereo camera carried by a human user. With a short-term map computed on the fly, obstacles in front of the user can be classified as aerial or non-aerial. In this paper, we propose to adapt this kind of application to mobile devices (smartphones). In this regard, the main limitation to overcome is that SLAM-based short-term maps are too computationally demanding for practical use, especially under real-time constraints. The structure of the environment could also be estimated through a monocular approach (for example, a monocular SLAM system integrated into a smartphone). Such approaches are attractive because all smartphones integrate a camera. However, the range information extracted by these algorithms is known only up to scale. In other words, the relative scale of the data depends on the nature of the environment, so the scale of the data changes as the environment changes. In practice, these algorithms only work in limited-space environments.

The main goal of this proposal is to develop a mobile application that acts as a complement to the walking stick or the guide dog. It does not replace these aids, but it solves their main problem, that is, their inability to detect aerial obstacles. In the case of walking sticks, this limitation is obvious. Dogs cannot be trained to detect these obstacles, because they are not aware of the height difference between themselves and their owners.
One of its main advantages is that the application is embedded in a smartphone, yielding a comfortable and discreet system that favors the user's social integration. Furthermore, the smartphone is also able to notify the presence of an obstacle by means of acoustic signals (through the phone speaker, not earphones) or vibrations. The latter option makes the system less noticeable and does not deprive the user of the sense of hearing. Our approach is based on distance measures taken from a stereo pair of images captured by the 3-D camera of the device.

Figure 1.1: Examples of aerial obstacles
Figure 1.2: Smartphone endowed with a 3-D camera


2. AERIAL OBSTACLE DETECTION

The pipeline of this obstacle detection approach consists of four phases: capture a stereo pair of images; obtain a set of 3-D points using a dense stereo algorithm; build a histogram of 3-D points in the direction in which the user is walking; and check for obstacles in the histogram.

2.1 SCENE RECONSTRUCTION

Let (I_L, I_R) be the stereo pair of images provided by the camera at instant t. Our goal is to obtain a set of 3-D points P_t = {p_1, p_2, ..., p_N}, where each p_i = (x_i, y_i, z_i) is expressed in metric coordinates with respect to the optical center of I_L.

Mobile devices equipped with a 3-D camera provide a pair of rectified and prealigned images, so that the epipolar line of every pixel in the left image corresponds to the same row in the right one. This allows us to apply a dense stereo algorithm to obtain a disparity map D_t from the pair of images. The device also provides the calibration data of its stereo camera: focal distance f (in pixels) and baseline B (in meters). The 3-D scene can be reconstructed by combining this information with the disparity map D_t. For each pixel i in the disparity image whose value d_i is known, a 3-D point p_i = (x_i, y_i, z_i) can be obtained by triangulation:

z_i = f B / d_i,    x_i = u_i z_i / f,    y_i = v_i z_i / f,

with u_i and v_i being the coordinates of the pixel in the 2-D disparity image (with the origin of coordinates at the image center).
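Under those assumptions (rectified pair, disparity in pixels, principal point at the image center), the triangulation step can be sketched as follows. The function names and the dictionary representation of the disparity map are illustrative, not the original implementation:

```python
def reconstruct_point(u, v, d, f, B):
    """Triangulate one pixel into a metric 3-D point.

    u, v : pixel coordinates relative to the image center (pixels)
    d    : disparity at that pixel (pixels), must be positive
    f    : focal distance (pixels); B : stereo baseline (meters)
    """
    z = f * B / d      # depth from disparity
    x = u * z / f      # lateral offset
    y = v * z / f      # vertical offset
    return (x, y, z)

def reconstruct_scene(disparity, f, B):
    """Convert a disparity map, here a dict of (u, v) -> d, into a list of
    3-D points, skipping pixels whose disparity is unknown (None) or invalid."""
    return [reconstruct_point(u, v, d, f, B)
            for (u, v), d in disparity.items() if d is not None and d > 0]
```

For example, with f = 500 px and B = 0.1 m, a disparity of 25 px at the image center corresponds to a depth of fB/d = 2 m.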

2.2 DISTANCE HISTOGRAM FROM 3-D DATA

Let V⃗_t be the direction in which the user is walking at instant t. Only the obstacles found in this direction should be considered; therefore, the 3-D data obtained in the previous step should be filtered to remove side obstacles.

Figure 2.1: Histogram and parallelepiped scanning

Each bin H_t[i] represents the fraction of 3-D points contained between the planes s(i)V⃗_t and s(i+1)V⃗_t of the parallelepiped. H_t thus represents a 1-D distribution of obstacles in the walking direction. It is worth remarking that P_t has a projective nature, given that it is provided by a stereoscopic system: the greater the observation distance, the greater the point sparseness, and the degree of sparseness grows exponentially with distance. This implies that the cells H_t[i] will present a decreasing density as i increases, which is due to the anisotropic error distribution and not to the obstacles. To deal with this problem, a unitary square C_i is created for each bin H_t[i] at distance s(i)V⃗_t. The square is projected on the reference image, and we take the size S_i of the projection. These sizes have the same projective nature as H_t[i], but in inverse order, so they can be used to normalize the bins and linearize the histogram. The values of the histogram are also affected by the 3-D occlusions of the points (each point counted in H_t[i] projects a 3-D shadow over the following bins, decreasing their densities). However, in our problem, the key obstacles are the closest ones, which are the least affected by this effect.
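A minimal sketch of the histogram construction and linearization, assuming the walking direction V⃗_t is the z-axis, a fixed-half-width parallelepiped around it, and S_i taken as the projected area of the unit square under a pinhole model, (f/z)^2. All parameter values and names are illustrative:

```python
def distance_histogram(points, bin_size=0.05, max_dist=4.0, side=0.5):
    """Count 3-D points per depth bin along the walking direction (+z here);
    side obstacles outside the corridor |x| <= side, |y| <= side are dropped."""
    n_bins = round(max_dist / bin_size)            # e.g. 80 bins of 5 cm over 4 m
    hist = [0] * n_bins
    for x, y, z in points:
        if 0.0 < z < max_dist and abs(x) <= side and abs(y) <= side:
            hist[min(int(z / bin_size), n_bins - 1)] += 1
    return hist

def linearize(hist, bin_size=0.05, f=500.0):
    """Weight each bin by the projected area S_i that a unit square placed at
    the bin-center distance covers in the image (S_i ~ (f/z)^2 for a pinhole
    camera), compensating the projective density fall-off described above."""
    out = []
    for i, count in enumerate(hist):
        z = (i + 0.5) * bin_size                   # bin-center distance
        s_i = (f / z) ** 2                         # projected area in pixels
        out.append(count / s_i)
    return out
```

As a sanity check: an object seen at 1 m projects roughly four times the pixels it projects at 2 m, so raw counts in a 4:1 ratio linearize to approximately equal values.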


2.3 OBSTACLE DETECTION FROM DISTANCE HISTOGRAM

Each cell in H*_t represents a possible obstacle. A single observation may present obstacles at different distances; hence, H*_t is multimodal. Mean-shift is then used to separate it into different distributions, using a uniform kernel of K units. From the set of obtained centers, we keep the most significant ones at instant t, that is, O_t = {o_1, o_2, ..., o_N}. The initial set of potential obstacles O_t may contain some phantom data due to noise in the 3-D reconstruction step. A robust set of obstacles O*_t is obtained by considering only the obstacles detected in the last M observations O_{t-M+1}, O_{t-M+2}, ..., O_t. An obstacle o_i ∈ O_u matches an obstacle o_j ∈ O_v if the distance between them in the histogram is less than K units, in consonance with the size of the mean-shift kernel; this guarantees that pairs of centers close enough to each other are treated as the same obstacle. Given the set of obstacles O*_t, the one o*_t with the lowest index n (the nearest one to the user) is selected, whose distance is d(o*_t) = s(n). If this distance is below a given threshold (in our case, 2 m), it is considered a potential threat, and an alert signal (sound or vibration) is generated with a frequency inversely proportional to the distance d(o*_t): closer obstacles cause a higher alert frequency.
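The mode-seeking step can be sketched with a plain 1-D mean-shift over the histogram bins, using the uniform K-unit window described above; the convergence tolerance, iteration cap, and function names are illustrative assumptions:

```python
def mean_shift_1d(hist, K=5, iters=50):
    """Shift each occupied bin index toward the weighted mean of its uniform
    +/- K-bin window until convergence; converged positions closer than K
    bins to an existing mode are merged with it (one mode per obstacle)."""
    centers = []
    for start, w in enumerate(hist):
        if w == 0:
            continue
        c = float(start)
        for _ in range(iters):
            lo, hi = max(0, int(c) - K), min(len(hist), int(c) + K + 1)
            total = sum(hist[lo:hi])
            if total == 0:
                break
            new_c = sum(i * hist[i] for i in range(lo, hi)) / total
            if abs(new_c - c) < 1e-6:
                break
            c = new_c
        for j, m in enumerate(centers):
            if abs(m - c) < K:                 # same obstacle: merge centers
                centers[j] = (m + c) / 2
                break
        else:
            centers.append(c)
    return sorted(centers)

def nearest_obstacle_distance(centers, bin_size=0.05):
    """Distance d(o*) of the lowest-index mode, or None if no mode exists."""
    return centers[0] * bin_size if centers else None
```

The nearest mode is then compared with the 2 m threshold to decide whether an alert must be fired.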

The interface allows the user to configure different features: Mode, which can be obstacles (for walking assistance), telemeter (for free environment exploration), or pause; Alerts, which may be beep (acoustic signal) or vibration; Volume, which sets the volume of the system; Voice, which sets the speech rate; Language, which sets the language of the application (English, German, French, or Spanish); About; and Exit.

4. EXPERIMENTS

4.1 IMPLEMENTATION DETAILS

Besides the drastic changes performed in the approach, the implementation has also undergone big changes to suit the new platform. Both 3-D smartphones (see Figure 4.1) are based on Android, whose principal language is Java. Nevertheless, we used Qt for Android (also known as Necessitas), a C++-based SDK that generates native Android code, which is more suitable for real-time applications.

Figure 4.1: 3-D smartphone HTC Evo 3D

We also used OpenCV4Android, the well-known computer vision library, for image manipulation. In order to speed up some parts of the algorithm, we used parallelization strategies (through threading) that exploit the device's dual-core processor, as well as vectorization strategies with NEON intrinsics (a set of instructions similar to Intel SSE, integrated into ARM architectures). These tools are justified by the computational requirements of the problem and the limitations of the platform.
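The row-parallel threading strategy mentioned above can be sketched as follows; splitting the image into contiguous row blocks, one per core, is an illustrative scheme, not the authors' actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def process_rows(image_rows, worker, n_threads=2):
    """Split an image (a list of rows) into contiguous blocks and process
    each block in its own thread, mirroring the dual-core split described
    above; results are reassembled in row order."""
    n = len(image_rows)
    step = (n + n_threads - 1) // n_threads        # rows per block, rounded up
    blocks = [image_rows[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(lambda block: [worker(row) for row in block], blocks)
    return [row for block in results for row in block]
```

In the real implementation the per-row work would be the dense stereo and histogram computation; here `worker` is any per-row function.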

4.3 HISTOGRAM LINEARIZATION

In this experiment, we explore the projective nature of the histogram. We have taken three observations of a single object (a fire extinguisher) at different distances. For each observation, we obtained its distance histogram H_t.

Figure 4.3: Observations of a single object at different distances

Figure 4.3 (top) shows the raw histogram of each observation represented in a different color, and Figure 4.3 (bottom) shows the linearized histograms H*_t. The horizontal axis represents the histogram bins (we consider a total distance of 4 m, with a bin every 5 cm, so that we have 80 bins). The vertical axis represents the number of points in each bin (or the result of the linearization, in the linearized version). It can be seen that the raw histogram presents a variable density depending on the distance to the object, due to the effect of projective geometry (the closer the object, the wider its area). Therefore, the values of the histogram bins cannot be directly compared, which makes mean-shift not applicable. In the linearized version, the densities of the different observations of a single object are (approximately) balanced.

4.4 OBSTACLE TRACKING

In this experiment, we evaluate the robustness of the obstacle detection over time in two different environments: a park and a corridor (see Figure 4.4). In the figure, the horizontal axis represents time, and the vertical axis is the distance histogram. That is, each column represents the distance histogram of one sequence frame (processed at approximately 9 frames/s), so that we can observe the evolution of the histogram over time.

Figure 4.4: Tracking in a corridor environment

The blue line represents the threshold we use to notify the user about the presence of an obstacle (2 m in our setting). The histogram represents 4 m in total. The red points represent the obstacles that have been detected as real, that is, means obtained by mean-shift at a distance lower than the specified threshold and with enough tracking information to be considered a real obstacle and not a phantom. In the first environment, a tree is avoided. Note that once the tree has been avoided, the system stops detecting this obstacle. In the second environment, we first get close to a wall, and then we move away from it. We can see this reflected in the shape of the plot.
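The phantom-rejection rule behind these red points (Section 2.3: confirm a mode only if it is matched, within K bins, in each of the last M observations) can be sketched as follows; the values of M and K and the class name are illustrative:

```python
from collections import deque

class ObstacleTracker:
    """Confirm an obstacle mode only if a center within K bins of it appears
    in every one of the last M observations; otherwise it is a phantom."""

    def __init__(self, M=5, K=5):
        self.M, self.K = M, K
        self.history = deque(maxlen=M)     # last M lists of mean-shift centers

    def update(self, centers):
        """Record one observation; return the centers confirmed as real."""
        self.history.append(list(centers))
        if len(self.history) < self.M:
            return []                      # not enough evidence yet
        return [c for c in self.history[-1]
                if all(any(abs(c - past_c) < self.K for past_c in past)
                       for past in self.history)]
```

A mode that appears in only one frame (noise in the 3-D reconstruction) is thus never reported, at the cost of a short confirmation delay of M frames.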

4.6 TESTS WITH VI USERS

The last experiment consists of several tests with blind users. In the figures, we can see the people who have collaborated in this experiment: Maria Dolores (left) and Yolanda (right). Maria Dolores works in the ONCE foundation as a psychologist. She has been blind since she was 20 years old and has almost no residual vision (between 2% and 3%); she is only able to perceive light or darkness. Yolanda works as a counselor in a secondary school. She is a psychologist too. She was born blind and does not have any residual vision.

Figure 4.6: First test with Maria Dolores

Figure 4.6 shows a test with Maria Dolores. There is a palm tree leaf within the path. She walks slowly because she is not following a margin (she is walking in an open space). Some pictures of the scene taken from outside are shown in the first row of the figure, and the application's visual log is shown in the second row. In the visual log, we can see the distance histogram over the image. In the left column, the obstacle has been detected. In the central column, a notification is sent, because the obstacle is closer than 2 m. In the right column, the obstacle has been avoided.

Figure 4.7: Second test with Maria Dolores

A second test with Maria Dolores is shown in Figure 4.7. In this case, she is following the curb with the cane; hence, she walks faster. In the path there is a fuzzy object: a bush. This kind of obstacle could not be detected by other sensors, such as sonar-based ones. The figure has the same format as the previous experiment: scene from outside (top) and visual log (bottom), before (left) and after (right) avoiding the obstacle.

the obstacle. For this reason, the alert threshold is set to 2 m, but the obstacle tracking is performed from 4 m. Dolores and Yolanda are our usual collaborators, but we have also tested the approach with many other volunteers from the blind community. Here, we summarize the feedback recovered from nine users who have tested the prototype. All of them consider that the problem we are facing represents a handicap in their lives, and that a solution like this proposal could improve their quality of life. Seven of them agree with using a smartphone that could be reused for other useful tasks, while two of them would prefer a cheaper ad hoc platform. With respect to the interface and the accessibility of the application, most of them agree that it is easy to use (8 of 9). We have observed that all users gain full control of the application in a guided session of around 10 min. Finally, the best result that we have observed (one that cannot be shown with data) is the great sensation they experience on first use, when they can sense the distances to objects without touching them.

5. CONCLUSION

It is worth highlighting that the technology presented in this paper is new for this kind of device. Until now, smartphones were not able to extract real measures from the environment. This application extracts about 30,000 real environment measures per frame at 9 frames/s on commercial devices. The major limitation of this technology is its dependence on hardware that must incorporate a 3-D camera. Our future work includes adapting this application to monocular devices. One way to do this is to incorporate a catadioptric device that splits a single-camera observation into two separate ones. Another alternative consists of rethinking the algorithm with a structure-from-motion (SFM) approach instead of the stereo one. This change could affect many parts of the approach, because the 3-D results of SFM algorithms are up to scale; that is, we only know the relative scale (depth) of a point with respect to the other points in the image, while the absolute scale is unknown and continuously changing.