


















A Seminar Report on Natural Language Processing
Submitted by: Kuthare Mangal, Wakde Sneha, Hulgudwad Saloni
B.Tech SY Information Technology, 2023-
Guided by: Mr. A. Waghmare (Department of Information Technology)
MGM’s College of Engineering, Nanded
Dr. Babasaheb Ambedkar Technological University, Lonere
ABSTRACT
Touchless technologies enable consumers to complete touch-free transactions with systems. Rather than requiring physical touch, transactions can be conducted through touchless interactions such as gestures; object, facial, or speech detection; or conversational AI. There is an important distinction to note in the differences between touchless and contactless technologies. Touchless technologies allow users to interact with systems without any form of physical input. For example, a customer can use a voice command to select a digital menu item rather than pressing a button. Contactless technologies reduce physical contact between people. Scanning a quick response (QR) code on a phone to complete a purchase, rather than handing an attendant a credit card, is an example of contactless technology in use. Businesses can combine touchless technologies with contactless technologies that reduce contact with other people, such as self-checkout or buy online and pick up in store (BOPIS) pickup lockers, to provide customers with increasingly frictionless and safer experiences.
Natural Language Processing
Natural language, or ordinary language, is any language that has evolved naturally in humans. Human language develops through use and repetition rather than through deliberate, planned design. Natural language can take different forms, such as speech, singing, facial expressions, signs, and body gestures. Naturally developed language is, in effect, a human adaptation built on words, signs, gestures, and other activities. In recent years, artificial intelligence has come to occupy important applications in human life, opening new directions for technology and creating several new opportunities. Because of the growing importance of artificial intelligence, many new sub-fields have arisen to contribute to human life, and many applications have emerged from these contributions. Artificial intelligence has evolved into many fields of life, such as education, health, agriculture, and natural language interpretation, and touches almost all aspects of life. The rapid and effective contribution of artificial intelligence shows its importance in real-life activities. A few of its important fields include evolutionary computation, vision, robotics, expert systems, speech processing, planning, machine learning, and natural language processing. This chapter is concerned with natural language processing.
Natural language processing (NLP) is a subfield of artificial intelligence which focuses on computational linguistic interpretation. The field encompasses several areas of textual and audio interpretation through the integration of statistical machine learning methods. It also covers the pragmatic side of computational linguistics, which has become very broad and powerful through the implementation of various techniques (J. Li, Chen et al., 2016). The availability and capability of NLP techniques keep increasing, improving the accuracy of computational language processing day by day. NLP and machine learning are among the most active areas of research. NLP is strongly influenced by other fields such as psychology, cognitive science, and linguistics. It is concerned with computational models of engineering that are built to support human interaction and human language understanding. For this purpose, several software packages have been developed for language modelling and for interpreting computational language in ways that humans can easily understand.
Subjective representation of the world is a basic concept of human psychology and the main source of the representation of subjective experience. In this context, subjective experiences are encountered through the five basic human senses and natural language (I. Li et al. 2018). This is considered the subjective consciousness of the mind, formed by the integration of the five basic senses: taction (touch), olfaction, gustation, vision, and audition. As a subject, one can hear voices, see images, taste flavours, feel touch, and smell odours in a natural and philological context that is also called natural language or universal language. Because of this, NLP is sometimes referred to as the study of the structure of subjective experience.
Consciousness of the human mind is an important concern of NLP and is based on two components: the conscious and the unconscious. Subjective representation that occurs within awareness is the conscious mind, whereas subjective representation that occurs outside of awareness is referred to as the unconscious mind.
Human cognitive learning starts at birth, and the learning process begins with the five active human senses. NLP learning, by contrast, is based on a derivative learning principle termed modelling: a computerized, programmed model that bases subjective learning experience on the consciousness of the mind through the five basic senses. NLP requires a detailed description of the sequence of sensory and linguistic representations, followed by a comprehensive and detailed codification process.
NLP can be sub-divided into two broad categories, the core area and the application area, which deal with two different areas of research. The core areas of NLP investigate basic or fundamental problems such as language modelling, which captures the relationships between words that naturally occur together in a language. The core areas also cover morphological processing, i.e. the discrimination of the meaningful components of words; syntactic parsing, which builds sentence diagrams used to support suitable processing of language text; and semantic processing, which distils the meaning of words, sentences, phrases, and higher levels of abstraction in a piece of text. NLP is also a core part of work on personal improvement, phobias, and anxiety (Colneriĉ, N. 2018). Machine interpretation uses perceptual and interactive thoughts through communication techniques to conveniently shape thoughts and ideas. However, although NLP deals with converting text and understanding human language, it should not be confused with the idea that NLP is only concerned with language and its interpretation; it has a much broader area of interest these days.
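As a minimal illustration of the core tasks just described, the sketch below tokenizes a sentence, stems the tokens (morphological processing), and assigns part-of-speech tags (a first step of syntactic processing). It assumes the NLTK library is installed and that its tokenizer and tagger models can be downloaded; the sentence is illustrative only.

import nltk
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Natural language processing distils meaning from words and sentences."

# Morphological processing: split the sentence into word tokens and stem them.
tokens = nltk.word_tokenize(sentence)
stems = [PorterStemmer().stem(t) for t in tokens]

# Syntactic processing: assign a part-of-speech tag to every token.
tags = nltk.pos_tag(tokens)

print(stems)
print(tags)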
Neuro-linguistic programming is an artificial-intelligence approach to human communication involving the translation of grammatical context. It is an approach to personal and interpersonal development and psychotherapy created by Richard Bandler. Historically, the field emerged around the 1950s with the integration of artificial intelligence and human natural language. A core early component was information retrieval (IR) for natural language text, which employs highly scalable techniques to search and index large volumes of text in databases. Today, most researchers place the emergence of NLP as a field in computing at around 1970.
Relevant features provide the accuracy needed for prediction, whereas irrelevant features are not compatible with prediction accuracy. Appropriate prediction accuracy normally requires selecting both strongly relevant and weakly relevant features. Noisy and irrelevant features should be removed to obtain a good and effective prediction model, so it is important to determine whether the relevance of a feature is weak or strong. A weak feature is not necessarily worthless or unimportant; it can be very important in combination with another strong feature.
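As a hedged sketch of this idea, the example below ranks features by a chi-squared relevance score and keeps only the highest-scoring ones, assuming scikit-learn is available; the synthetic data set and the choice of eight kept features are illustrative only.

# Chi-squared scores rank features; low-scoring (noisy/irrelevant) features are dropped.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Toy data: 20 features, only 5 of which are truly informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
X = MinMaxScaler().fit_transform(X)           # chi2 requires non-negative values

selector = SelectKBest(score_func=chi2, k=8)  # keep the 8 most relevant features
X_selected = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))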
Chapter 2: NLP Models
Machine learning models learn from statistical input data and react with automated output. These models differ in nature and in the specific mathematical models they use. Machine learning comprises many approaches used for efficient classification, clustering, and accurate prediction. These approaches are generally classified into supervised, unsupervised, and semi-supervised machine learning, each with different methods and models for data transformation, classification, and prediction (Alishahi et al., 2019). Each category is described in detail below:
Supervised machine learning models learn a function that maps an input to an output based on input-output pairs. Supervised learning infers a function from labelled data, and the training data consist of a set of training examples. Each pair consists of an individual object, although pairs may share the same features. Supervised learning is based on a set of inputs with labelled data sets (Jung & Lee, 2019).
Support vector machines (SVMs) bring machine learning into natural language processing and have found crucial application in computational linguistics for word classification and text categorization. SVMs help to investigate and solve conventional passive learning (Rameshbhai & Paulose, 2019). They can also address natural language processing issues such as imbalanced training data and the difficulty of obtaining sufficient training data. SVMs are a crucial part of natural language processing research in tasks such as POS (part-of-speech) tagging, word sense disambiguation, NP (noun phrase) chunking, information extraction, relation extraction, semantic role labelling, and dependency analysis. These applications involve a multi-class classification task: the multi-class problem is first converted into binary classification problems, a classifier is trained for each binary problem, and finally the combined classifier result is obtained.
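A minimal text-classification sketch with an SVM is given below, assuming scikit-learn; the tiny sentiment-labelled corpus is illustrative only. TF-IDF turns each document into a feature vector and a linear SVM learns the decision boundary.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["great product, works well", "terrible service and slow",
               "really happy with this", "worst purchase I have made"]
train_labels = ["pos", "neg", "pos", "neg"]

# TF-IDF vectorizes each document; LinearSVC separates the two classes.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

print(model.predict(["slow and terrible", "works really well"]))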
Fig 2: Decision node
Figure 2 shows a decision tree. Decision trees can also be applied to learn probabilistic grammars, which is important for resolving prepositional phrase attachment. They also play an important role in developing statistical models for parsing.
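The toy sketch below illustrates the idea of a decision tree deciding prepositional-phrase attachment, assuming scikit-learn; the features, examples, and labels are purely illustrative.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each example: verb, object noun, preposition, prepositional noun.
examples = [
    {"verb": "ate",  "noun1": "pizza",  "prep": "with", "noun2": "fork"},
    {"verb": "ate",  "noun1": "pizza",  "prep": "with", "noun2": "anchovies"},
    {"verb": "saw",  "noun1": "man",    "prep": "with", "noun2": "telescope"},
    {"verb": "read", "noun1": "report", "prep": "on",   "noun2": "NLP"},
]
labels = ["verb", "noun", "verb", "noun"]  # what the prepositional phrase attaches to

vec = DictVectorizer()
X = vec.fit_transform(examples)            # one-hot encode the string-valued features

tree = DecisionTreeClassifier().fit(X, labels)
test = vec.transform([{"verb": "ate", "noun1": "salad", "prep": "with", "noun2": "fork"}])
print(tree.predict(test))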
Random forest is a supervised machine learning model. A random forest classifies based on the results obtained from its decision trees (I. Li et al., 2019); the output for each input is the mode of the targets predicted by the trees in the forest. A random forest grows random trees on random samples of the training data, so model performance improves and overfitting can be controlled by reducing the variance of the overall model. It is based on a collection of randomly constructed decision trees and a number of classifiers built on several subsamples of the data set.
Fig 3: Random forest
Figure 3 describes how a random forest works successfully in classification and regression applications. In language modelling, random forests are a crucial approach used to solve the problem of predicting text from randomly grown decision trees (DTs), and they can generalize data at a more advanced level for text prediction.
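A minimal sketch of random forest text classification is given below, assuming scikit-learn; the tiny topic-labelled corpus is illustrative only.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

texts = ["the striker scored a late goal", "parliament passed the new budget",
         "the team won the league title", "the minister announced tax reforms"]
topics = ["sports", "politics", "sports", "politics"]

# Each tree in the forest sees a random subsample of data and features;
# the final label is the majority vote over all trees.
model = make_pipeline(CountVectorizer(),
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(texts, topics)

print(model.predict(["the budget vote in parliament"]))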
K-nearest neighbour (KNN) is a supervised statistical model based on pattern recognition. It has important applications in natural language processing, e.g. language text categorization. K-nearest neighbour searches for the nearest neighbours of a test document among pre-defined text documents: it computes similarity scores and ranks the k neighbours by similarity (Jung & Lee, 2019).
Fig 4: K-nearest neighbour
Figure 4 shows KNN finding the neighbouring components of a test or similar data point from the set. This similarity can then be used to predict the category of the text document. Further, if more than one neighbour belongs to the same category, the sum of their scores is taken as the weight of that category, and the highest-scoring category is assigned to the test document (Al-Makhadmeh & Tolba, 2019).
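The sketch below shows k-nearest-neighbour text categorization, assuming scikit-learn; documents are compared by cosine similarity of their TF-IDF vectors, and a test document takes the majority category of its nearest neighbours. The corpus and categories are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

docs = ["the match ended in a draw", "stocks fell sharply today",
        "the coach praised the players", "the central bank raised rates"]
categories = ["sports", "finance", "sports", "finance"]

# k=3: the test document takes the majority category of its 3 nearest neighbours.
model = make_pipeline(TfidfVectorizer(),
                      KNeighborsClassifier(n_neighbors=3, metric="cosine"))
model.fit(docs, categories)

print(model.predict(["the players trained before the match"]))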
Unsupervised learning refers to learning from a data set without supervision, i.e. without known labels. In contrast to supervised learning methods, unsupervised machine learning cannot infer classification or regression directly, because the data sets are unlabelled and the model cannot be trained in the usual way. However, unsupervised learning can be used for discovering the structure of data, and it is an important approach for applications in which previously unknown data patterns are to be discovered. In this context, be aware that most of the time these patterns are poorer approximations than a supervised model would produce. It is often used in situations where the desired output is not available for the application or experimental analysis, e.g. determining whether the targets of student learning are met with full satisfaction (Jung & Lee, 2019).
In K-means clustering, the relationship between data points and clusters is the most important factor that defines the value of K. Clustering matters for large collections of data: data points should be grouped by similar or closely related characteristics.
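A minimal K-means sketch over a toy corpus is given below, assuming scikit-learn; K (the number of clusters) is fixed to 2 for illustration.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["goals scored in the final", "interest rates and inflation",
        "the striker and the keeper", "bond markets and the dollar"]

X = TfidfVectorizer().fit_transform(docs)

# n_clusters is the K discussed above: similar documents end up in the same group.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)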
Fuzzy clustering is a clustering method which allows one piece of data to belong to different clusters. It is a partitioning clustering method that generalizes the membership of a data point's characteristics across clusters, and the clusters are identified by similarity measures (Al-Makhadmeh & Tolba, 2019).
Fig 5: Fuzzy clustering
Figure 5 shows fuzzy clustering, which follows a finite partition of a set of n elements. It has important application areas, including bioinformatics, image analysis, and marketing.
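The sketch below implements fuzzy c-means from scratch with NumPy only, so that every data point receives a degree of membership in every cluster; the data points and parameter values are illustrative.

import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # U[i, j] = membership of point i in cluster j; each row sums to 1.
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centres are membership-weighted means of the points.
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Update memberships from the distances to each centre.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centres, U

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])
centres, U = fuzzy_c_means(X)
print(np.round(U, 2))  # soft memberships: each row sums to 1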
A hidden Markov model (HMM) is a finite set of states, where each state is associated with a probability distribution. Each state transition is governed by a set of probabilities, i.e. transition probabilities. An HMM is a statistical model that is in fact a variation on a Markov chain. It has many rich applications in machine learning, NLP, and data mining tasks, including text pattern recognition, handwriting recognition, speech synthesis, part-of-speech tagging, and speech recognition (Al-Makhadmeh & Tolba, 2019). Other applications include gene prediction, machine translation, and time series analysis. An HMM helps a program reach the most likely decision based both on previous decisions (such as a previously recognized word or sentence) and on the current data provided (such as an audio snippet).
Fig 6: Hidden Markov model
Figure 6 describes the three steps of the HMM. HMMs are widely used today in word vocabulary and word management games, which help to improve language skills, as well as in natural language processing, i.e. computational processing.
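As a hedged illustration, the sketch below runs Viterbi decoding over a tiny hand-specified HMM for part-of-speech tagging; the states, transition probabilities, and emission probabilities are toy values, not learned from data.

states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.5, "cats": 0.4, "bark": 0.1},
          "VERB": {"dogs": 0.1, "cats": 0.1, "bark": 0.8}}

def viterbi(words):
    # V[t][s] = probability of the best tag sequence ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s].get(words[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s].get(words[t], 1e-6), p)
                             for p in states)
            V[t][s], back[t][s] = prob, prev
    # Trace the most likely path backwards from the best final state.
    best = max(V[-1], key=V[-1].get)
    path = [best]
    for t in range(len(words) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["dogs", "bark"]))  # expected: ['NOUN', 'VERB']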
Association rules can be used for mining associations between items from unstructured data with some modifications (Al-Makhadmeh & Tolba, 2019).
Fig 7: Association analysis
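A minimal sketch of the support/confidence computation behind association rules is given below, over toy "transactions" of co-occurring terms; the transactions and thresholds are illustrative only.

from itertools import combinations

transactions = [
    {"python", "nlp", "tokenizer"},
    {"python", "nlp"},
    {"java", "compiler"},
    {"python", "tokenizer"},
]

min_support, min_confidence = 0.5, 0.7
n = len(transactions)

def support(items):
    # Fraction of transactions that contain all the given items.
    return sum(items <= t for t in transactions) / n

# Rule A -> B is kept if the pair is frequent and confidence(A -> B) is high enough.
items = {i for t in transactions for i in t}
for a, b in combinations(sorted(items), 2):
    pair_support = support({a, b})
    if pair_support >= min_support:
        confidence = pair_support / support({a})
        if confidence >= min_confidence:
            print(f"{a} -> {b}: support={pair_support:.2f}, confidence={confidence:.2f}")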
Typical NLP tasks supported by these models include classification (entity recognition, POS tagging, sentiment analysis, document identification, gender identification, and so on); clustering (automatic document arrangement, theme extraction, grouping text into different classes, handling sets of unlabelled text, filtering text, and so on); regression (examining the relationship between two or more classes, or between dependent and independent variables); as well as pattern recognition from different perspectives, decision making, visualization, computer vision, and others (I. Li et al., 2019). The four most important applications of NLP (natural language processing) are:
Machine translation (MT) is the sub-area of computational linguistics that seeks to use software to translate text or speech from one language to another through a translation process. Machine translation is an area of study in which the computer learns to translate by means of programming. It is one of the most motivating technologies one comes across: with NLP applications, computers can interact with different people at the same time without consuming human effort, making them more similar to humans. Machine translation is the automated translation of text in either written or spoken form.
RBMT (rule-based machine translation), developed many years ago, was the first practical approach to machine translation. It works by parsing a source sentence to identify the text and analyse its nature, and then converting it into the target language based on linguistic rules; the rules are defined explicitly for the language translation. Rule-based translation has since been replaced by statistical machine translation or hybrid systems (Olex et al., 2019; Jung & Lee, 2019).
Statistical machine translation (SMT) involves training on a huge amount of data from multilingual, bilingual, or monolingual corpora. The system learns the relation between source text and translation; both are used in SMT to generate the final result, so that a given source text achieves its translation. The machine translation engine itself uses no explicit rules of grammar and punctuation. The machine translation engines most used today are Google Translate, Bing Translator, and so on; other machine translation tools are also available for linguistic translation on different platforms. SMT provides a way to measure the percentage of translation coverage and also helps avoid hand-crafted translation (Jung & Lee, 2019).
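The sketch below illustrates the phrase-table idea behind SMT: each source phrase maps to candidate target phrases with probabilities that a real system would estimate from a bilingual corpus. Here the phrase table, probabilities, and pre-segmented input are toy values for illustration only.

# Toy phrase table: source phrase -> list of (target phrase, probability).
phrase_table = {
    "la casa": [("the house", 0.7), ("the home", 0.3)],
    "es grande": [("is big", 0.8), ("is large", 0.2)],
}

def translate(sentence):
    out = []
    for phrase in sentence.split(" | "):                # pre-segmented source phrases
        candidates = phrase_table.get(phrase, [(phrase, 1.0)])
        best, _ = max(candidates, key=lambda c: c[1])   # pick the most probable option
        out.append(best)
    return " ".join(out)

print(translate("la casa | es grande"))  # -> "the house is big"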
In an example-based machine translation (EBMT) system, a sentence of any language is translated on the basis of examples previously produced by a translator. Translating a single sentence is easy and the result will most probably be accurate; translating a huge amount of text in a single click, however, yields results with a lot of ambiguity, and large numbers of sentences take a lot of time to execute (Alishahi et al., 2019; Al-Makhadmeh & Tolba, 2019).
Hybrid machine translation combines RBMT and SMT. The two main categories of hybrid systems are: in the first, the text is translated first by the RBMT engine and the output is then processed by the machine, which corrects errors where they occur; in the second, the RBMT engine does not translate the text itself but supports the SMT engine with its input data (data of different natures, e.g. present or past) (Al-Makhadmeh & Tolba, 2019).
Text summarization reduces a text to a smaller piece of text. Automatic text summarization is a very common problem in natural language processing (NLP). There are two main approaches to summarizing text automatically in NLP: extraction-based and abstraction-based summarization (Al-Makhadmeh & Tolba, 2019).
The extraction-based summarization technique takes a document of text and combines parts of the existing text to make a summary; the summary is built from text that is already there, without making any changes (Al-Makhadmeh & Tolba, 2019). For example:
Text: Ali and Alia went to attend the marriage ceremony of a cousin in Dhaka. In the city, Alia gave birth to a child named Yousif.
Text summary: Ali and Alia attend marriage ceremony Dhaka. Alia birth Yousif.
The words Ali, Alia, attend, marriage ceremony, birth, and Yousif are extracted and combined to generate the summary; sometimes such a summary is completely out of sense.
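A minimal frequency-based extractive sketch is given below: sentences are scored by the frequencies of the words they contain and the top-scoring sentences are copied unchanged into the summary. The text and the choice of two summary sentences are illustrative.

import re
from collections import Counter

text = ("Ali and Alia went to attend the marriage ceremony of a cousin in Dhaka. "
        "In the city, Alia gave birth to a child named Yousif. "
        "The family later returned home.")

sentences = re.split(r"(?<=[.!?])\s+", text)
words = re.findall(r"\w+", text.lower())
freq = Counter(words)

# Score each sentence by the total frequency of its words; keep the top 2.
scores = {s: sum(freq[w] for w in re.findall(r"\w+", s.lower())) for s in sentences}
summary = sorted(sentences, key=scores.get, reverse=True)[:2]

print(" ".join(s for s in sentences if s in summary))  # keep the original sentence order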