The text detection application developed in this project is a valuable tool for businesses and individuals who need to quickly and accurately extract text from images. The use of deep learning algorithms and advanced image processing techniques allows the application to accurately detect and extract text from a wide range of images, including those with complex backgrounds and varying lighting conditions. The results of the testing and evaluation phase showed that the application performs well, with an average accuracy of over 90% on a variety of test images. Overall, the text detection application is a valuable addition to the field of image processing and text extraction, and has the potential to greatly improve the efficiency and accuracy of text extraction tasks.
Prof. S. P. Palaskar (Guide)          Dr. V. S. Gulhane (Head of Dept.)
A text detection application automatically identifies and extracts text from images and videos using machine learning algorithms. Potential uses of such an application include assisting with OCR tasks, generating captions for media, and aiding in surveillance and security. Its main benefits are increased efficiency and accuracy in text extraction tasks. The underlying technology consists of machine learning models trained on large datasets of labeled images and videos.
Text in images can exhibit many variations with respect to the following properties:
Geometry:
● Size: Although text size can vary a lot, assumptions can be made depending on the application domain.
● Alignment: Characters in caption text appear in clusters and usually lie horizontally, although they can sometimes appear as non-planar text as a result of special effects. This does not apply to scene text, which can have various perspective distortions; such text can be aligned in any direction and can have geometric distortions.
● Inter-character distance: Characters in a text line have a uniform distance between them.
The objective of a text detection application is to automatically identify and locate text in digital images and videos. This can be useful in a variety of applications, such as optical character recognition (OCR) for digital documents, automatic captioning of images and videos, and identifying text in surveillance footage. A text detection application can also be used to improve the accuracy of other computer vision tasks, such as object recognition and classification.
Text Recognition Module
The image is first pre-processed to make it suitable for the convolutional neural network (deep neural network). This module then performs text recognition on the output image of the pre-processing module and produces output data in a computer-understandable form. The following techniques are used in this module. The module contains algorithms to detect text, segment words and recognize the text. It is mainly intended for "text in the wild", i.e. short phrases and separate words that occur on navigation signs and the like. It is not an OCR tool for scanned documents and should not be treated as such. The detection part can in theory handle different languages, but will likely fail on hieroglyphic texts. The recognition part currently uses the open-source Tesseract OCR engine.
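Since the recognition part relies on the open-source Tesseract OCR engine, the step from a pre-processed bitmap to machine-readable text can be sketched as below. This is a minimal sketch, assuming the Tesseract4Android / tess-two binding (TessBaseAPI) and a dataPath directory that already contains tessdata/eng.traineddata; the report does not name the exact binding, so treat these details as illustrative.

    import android.graphics.Bitmap
    import com.googlecode.tesseract.android.TessBaseAPI

    // Minimal sketch: run Tesseract OCR on a pre-processed bitmap.
    // dataPath must point at a folder containing tessdata/eng.traineddata (assumed setup).
    fun recognizeText(bitmap: Bitmap, dataPath: String): String {
        val tess = TessBaseAPI()
        return try {
            tess.init(dataPath, "eng")       // load the English language model
            tess.setImage(bitmap)            // hand the pre-processed image to the engine
            tess.getUTF8Text() ?: ""         // run recognition and return the decoded text
        } finally {
            tess.end()                       // release native resources
        }
    }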
A text detection application typically uses a combination of computer vision and machine learning techniques to identify and locate text in digital images and videos. The specific details vary between implementations, but the general process is as follows.
The application first pre-processes the input image or video to remove noise or distortion and to enhance the contrast of the text. This can involve techniques such as image filtering, resizing, and color-space conversion.
The application then uses a machine learning model to detect potential text regions in the image. This model is typically trained on a large dataset of images with labeled text regions, so that it learns to identify text-like patterns in the input image.
Once potential text regions have been detected, the application applies a series of algorithms to accurately locate and extract the text. This can involve character segmentation, where the model attempts to identify individual characters within the detected text regions, and text recognition, where OCR algorithms convert the detected characters into digital text.
The extracted text is then output by the application, either as a string of text or as a series of bounding boxes that outline the location of the text in the original image. This output can be used for automatic captioning, OCR, or further analysis by other computer vision algorithms.
There are two buttons on the front page, named "TAKE IMAGE" and "RECOGNIZE TEXT". The "TAKE IMAGE" button takes an image either from the gallery or from the mobile phone's camera, passes it on to the "RECOGNIZE TEXT" button, and shows it in the ImageView. The "RECOGNIZE TEXT" button then takes this input image, scans it, and converts it into the corresponding output text using the model trained for the project.
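As a concrete illustration of the pre-processing step described above, the following sketch uses the OpenCV Android bindings. The report does not specify which filters the project applies, so the particular choice of grayscale conversion, Gaussian blur, and Otsu thresholding here is an assumption, not the project's exact pipeline.

    import org.opencv.core.Mat
    import org.opencv.core.Size
    import org.opencv.imgproc.Imgproc

    // Assumed pre-processing sketch: convert to grayscale, suppress noise,
    // then binarize so that text pixels stand out from the background.
    fun preprocess(src: Mat): Mat {
        val gray = Mat()
        Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY)      // color-space conversion
        Imgproc.GaussianBlur(gray, gray, Size(3.0, 3.0), 0.0)    // noise removal
        val binary = Mat()
        Imgproc.threshold(gray, binary, 0.0, 255.0,
            Imgproc.THRESH_BINARY or Imgproc.THRESH_OTSU)        // binarization / contrast enhancement
        return binary
    }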
The DFD is also known as a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system. It maps out the flow of information for any process or system, showing how data is processed in terms of inputs and outputs. It uses defined symbols such as rectangles, circles and arrows to show data inputs, outputs, storage points and the routes between each destination. DFDs can be used to analyze an existing system or to model a new one. A DFD can often visually "say" things that would be hard to explain in words, and it works for both technical and non-technical audiences. There are four components in a DFD: external entities, processes, data stores, and data flows.
In this phase we have defined the technical details of the application. Depending on the project, the screen design, database sketch, system interface and prototypes have been defined.
Modules:
● Dashboard: When the application starts, the dashboard appears with two buttons named "Take Image" and "Recognize Text".
● Take Image: The "Take Image" button is used to get the image that needs to be processed. After clicking this button, two options appear, "Camera" and "Gallery", letting the user either capture a new image with the camera or use an existing image from the gallery.
● Image view: After an image is selected, two buttons appear: a 'cross' button for discarding the image and a 'tick' button for confirming it.
● Recognize Text: The "Recognize Text" button detects text in the image and processes it for display in the viewing area. Text detection algorithms typically work by first identifying potential regions in an image or video frame that contain text, and then using machine learning models to classify the identified regions and extract the actual text.
● The recognized text will appear here: This is the area below the selected image where the recognized text appears; the user can also copy text from this area.
        android:layout_height="match_parent"/>  <!-- closing attribute of the preceding element; its other attributes are not shown in this excerpt -->

    <com.google.android.material.button.MaterialButton
        android:id="@+id/recognizeTextBtn"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:layout_weight="1"
        android:layout_marginStart="5dp"
        android:text="Recognize Text"
        app:cornerRadius="5dp"
        app:icon="@drawable/ic_baseline_scanner_24"/>
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:src="@drawable/ic_baseline_image_24"
        android:adjustViewBounds="true"
        app:strokeWidth="2dp"/>  <!-- attributes of the image-preview view; its opening tag is not shown in this excerpt -->
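To connect the layout above with the module flow described earlier, a minimal Kotlin activity along the following lines could wire the two buttons together. Only recognizeTextBtn actually appears in the layout excerpt; the other view IDs (takeImageBtn, imageView, recognizedTextTv) and R.layout.activity_main are hypothetical names used for illustration, and recognizeText() refers to the Tesseract sketch shown earlier.

    import android.app.Activity
    import android.content.Intent
    import android.graphics.Bitmap
    import android.os.Bundle
    import android.provider.MediaStore
    import android.widget.Button
    import android.widget.ImageView
    import android.widget.TextView

    // Minimal sketch of the two-button flow: "Take Image" launches the camera,
    // "Recognize Text" runs OCR on the captured bitmap and shows the result.
    // View IDs other than recognizeTextBtn are hypothetical; gallery selection,
    // the tick/cross confirmation step and error handling are omitted.
    class MainActivity : Activity() {

        private val cameraRequestCode = 1
        private var capturedBitmap: Bitmap? = null

        override fun onCreate(savedInstanceState: Bundle?) {
            super.onCreate(savedInstanceState)
            setContentView(R.layout.activity_main)

            val resultView = findViewById<TextView>(R.id.recognizedTextTv)

            // "Take Image": capture a photo with the device camera.
            findViewById<Button>(R.id.takeImageBtn).setOnClickListener {
                startActivityForResult(
                    Intent(MediaStore.ACTION_IMAGE_CAPTURE), cameraRequestCode)
            }

            // "Recognize Text": run OCR on the captured image and show the text
            // in the area below the image.
            findViewById<Button>(R.id.recognizeTextBtn).setOnClickListener {
                capturedBitmap?.let { bmp ->
                    resultView.text = recognizeText(bmp, filesDir.absolutePath)
                }
            }
        }

        override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
            super.onActivityResult(requestCode, resultCode, data)
            if (requestCode == cameraRequestCode && resultCode == RESULT_OK) {
                // The camera returns a small preview bitmap in the "data" extra.
                capturedBitmap = data?.extras?.get("data") as? Bitmap
                findViewById<ImageView>(R.id.imageView).setImageBitmap(capturedBitmap)
            }
        }
    }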
Startup interface of our app: it consists of the app name and two buttons, one to take an image and the other to recognize text; the footer contains the recognized text, which can be copied and modified.
If we click on the "Take Image" button, two options appear: one to take an image with the camera and the other to pick an existing image from the gallery.
Testing on an image that contains both objects and text.
In the above image, the text is successfully recognized from the input image chosen from the gallery.