Handling Missing Data in Healthcare Using a Multi-Modal Stacked Denoising Autoencoder

CHAPTER 1

INTRODUCTION

1.1 Introduction

Supply and demand in healthcare increase in response to industry trends. In addition, Personal Health Records (PHRs) are managed by individuals. Such records are collected through different means and vary widely in type and scope depending on the particular situation. Some data may therefore be lost, negatively affecting data analysis, so such data should be replaced with appropriate values. In this study, a method for estimating missing data using a multi-modal autoencoder is proposed and applied in the field of medical big data. The proposed method uses stacked denoising autoencoders to estimate missing data that occur during the data collection and processing stages. An autoencoder is a neural network whose output x̂ is similar to its input x.

In the current study, data from the Korea National Health and Nutrition Examination Survey (KNHNES) conducted by the Korea Centers for Disease Control and Prevention (KCDC) were used. As representative healthcare data from South Korea, they contain a number of the same parameters as those used in PHRs. Based on this, a model can be generated to estimate the missing data that occur in PHRs. Furthermore, PHRs involve multimodality, allowing data to be collected from multiple sources on a single subject. Therefore, the applied stacked denoising autoencoder is configured in a multi-modal setting. Through preprocessing, a set of data without missing values was designed from KNHNES. In dataset-based learning, the label is set to the original data, and the autoencoder input is set to the noised input, in which as many entries as dictated by the noise factor are randomly set to zero. In this way, the autoencoder learns to make the zero-noised values similar to the original label values.

When the amount of missing data in the dataset reaches about 25%, the accuracy of the proposed method using multi-modal stacked denoising autoencoders is 0.9217, which is higher than that of other common methods. For the single-modal denoising autoencoder, the accuracy is 0.932, a slight difference of about 0.01, which is within the allowable range for data analysis. In terms of computational performance, the single-modal autoencoder has 10,384 parameters, 5,594 more than the multi-modal stacked autoencoder. These parameters affect the speed of the model. The two models differ significantly in the number of parameters but only slightly in accuracy, suggesting that the proposed multi-modal stacked denoising autoencoder outperforms the single-modal model. Additionally, multi-modal models can save time when dealing with large amounts of data in locations such as hospitals and institutions.

Healthcare big data involve complex relationships among the different parameters and must adapt to changes in the surroundings. As a result, soft computing technologies that make predictions and deductions regarding the parameters or other particular circumstances have been highlighted. Soft computing is a technique designed to handle imprecise and uncertain data for which mathematical modeling is difficult or impossible. Many real-world problems cannot be clearly defined, and soft computing is used to computerize such ill-defined problems. For example, the technique has been applied to find optimal answers to fuzzy propositions in data analysis or learning.

A model generated from imperfect data tends to be less accurate during actual use. Duplicated or missing data can be estimated using values such as the mean, median, and mode, or using methods such as regression, neural networks, singular value decomposition (SVD), or K-nearest neighbors (K-NN). Although an estimation using the mean, median, or mode is simple to achieve, it is less accurate, which makes its application less viable. An estimation using regression, SVD, or K-NN may achieve relatively high accuracy, but it requires user intervention, and extensive pre-processing is needed for algorithmic applications. By contrast, estimation using a neural network allows a model to learn the features from the data on its own, minimizing the need for user intervention. Therefore, in this study, a technique for estimating missing data, specifically the missing data in PHRs, using a multi-modal stacked denoising autoencoder in the area of healthcare big data is proposed.

The goal of image classification is to decide whether an image belongs to a certain category or not. Different types of categories have been considered in the literature, e.g., defined by the presence of certain objects, such as cars or bicycles, or defined in terms of scene types, such as city, coast, or mountain. To solve this problem, a binary classifier can be learned from a collection of images manually labeled as belonging to the category or not. Increasing the quantity and diversity of hand-labeled images improves the performance of the learned classifier; however, labeling images is a time-consuming task. Although it is possible to label large amounts of images for many categories for research purposes, this is often unrealistic, e.g., in personal photo-organizing applications. This motivates our interest in using other sources of information that can aid the learning process using a limited amount of labeled images. [2]

[Figure: Example images from the MIR Flickr (top row) and VOC'07 (bottom row) datasets with their associated tags and class labels.]
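The baseline imputation strategies surveyed above (mean, median, mode, and K-NN) can be illustrated in a few lines. The sketch below uses scikit-learn, which is an assumption for illustration only; the thesis's own tooling list names NumPy and TensorFlow, and the data here are toy values.

```python
# Illustrative sketch: common baseline imputation strategies.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy health-record-like matrix with missing entries marked as np.nan.
X = np.array([
    [1.70, 65.0, np.nan],
    [1.62, np.nan, 80.0],
    [np.nan, 72.0, 95.0],
    [1.80, 90.0, 110.0],
])

mean_imputed = SimpleImputer(strategy="mean").fit_transform(X)
median_imputed = SimpleImputer(strategy="median").fit_transform(X)
mode_imputed = SimpleImputer(strategy="most_frequent").fit_transform(X)

# K-NN imputation: each missing value becomes a (distance-weighted) mean
# of the corresponding values in the k most similar rows.
knn_imputed = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)
```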

1.2 Denoising Autoencoders

An autoencoder is a neural network used for dimensionality reduction; that is, for feature selection and extraction. Autoencoders with more hidden units than inputs run the risk of learning the identity function, where the output simply equals the input, thereby becoming useless. Denoising autoencoders are an extension of the basic autoencoder and represent a stochastic version of it. Denoising autoencoders address this identity-function risk by randomly corrupting the input (i.e., introducing noise) that the autoencoder must then reconstruct, or denoise.
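As a minimal sketch of the idea, the following Keras model reconstructs clean inputs from corrupted ones. The layer sizes, optimizer, 25% corruption rate, and random toy data are all illustrative assumptions, not the thesis's exact configuration.

```python
# Minimal denoising autoencoder sketch: train to reconstruct clean inputs
# from corrupted ones, which prevents learning the identity function.
import numpy as np
import tensorflow as tf

n_features = 80  # e.g., number of survey parameters

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(32, activation="relu"),             # encoder
    tf.keras.layers.Dense(n_features, activation="sigmoid"),  # decoder
])
autoencoder.compile(optimizer="adam", loss="mae")

x_clean = np.random.rand(1000, n_features).astype("float32")
# Corrupt the input: randomly zero out 25% of the entries.
mask = np.random.rand(*x_clean.shape) > 0.25
x_noisy = x_clean * mask

# The input is the corrupted data; the label is the original clean data.
autoencoder.fit(x_noisy, x_clean, epochs=5, batch_size=64, verbose=0)
```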

1.3 Stacked Denoising Autoencoder

A stacked denoising autoencoder (SDA) is simply many denoising autoencoders strung together. It is to a denoising autoencoder what a deep-belief network is to a restricted Boltzmann machine. A key function of SDAs, and of deep learning more generally, is unsupervised pre-training, layer by layer, as input is fed through. Once each layer is pre-trained to conduct feature selection and extraction on the input from the preceding layer, a second stage of supervised fine-tuning can follow. A word on stochastic corruption in SDAs: denoising autoencoders shuffle data around and learn about that data by attempting to reconstruct it. The act of shuffling is the noise, and the job of the network is to recognize the features within the noise that allow it to classify the input. When a network is being trained, it generates a model and measures the distance between that model and the benchmark through a loss function. Its attempts to minimize the loss function involve resampling the shuffled inputs and reconstructing the data until it finds the inputs that bring its model closest to what it has been told is true. The serial resamplings are based on a generative model that randomly provides data to be processed. This is known as a Markov chain, and more specifically a Markov chain Monte Carlo algorithm, which steps through the dataset seeking a representative sampling of indicators that can be used to construct increasingly complex features.
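A hedged sketch of the greedy layer-wise pre-training described above follows: each layer is trained as a small denoising autoencoder on the output of the previous one, and the pre-trained encoders are then stacked for supervised fine-tuning. All sizes, noise levels, and the pretrain_layer helper are illustrative assumptions.

```python
# Greedy layer-wise pre-training of a stacked denoising autoencoder (sketch).
import numpy as np
import tensorflow as tf

def pretrain_layer(x, n_hidden, noise_factor=0.25, epochs=5):
    """Train one denoising layer; return (encoder_layer, encoded_data)."""
    n_in = x.shape[1]
    encoder = tf.keras.layers.Dense(n_hidden, activation="relu")
    dae = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_in,)),
        encoder,
        tf.keras.layers.Dense(n_in, activation="sigmoid"),
    ])
    dae.compile(optimizer="adam", loss="mae")
    x_noisy = x * (np.random.rand(*x.shape) > noise_factor)  # zero-masking noise
    dae.fit(x_noisy, x, epochs=epochs, batch_size=64, verbose=0)
    return encoder, encoder(x).numpy()

x = np.random.rand(1000, 80).astype("float32")
layers, h = [], x
for size in (64, 32, 16):
    enc, h = pretrain_layer(h, size)   # each layer learns from the previous output
    layers.append(enc)

# The pre-trained encoders can now be stacked and fine-tuned with supervision.
stacked = tf.keras.Sequential([tf.keras.layers.Input(shape=(80,)), *layers])
```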

1.4 Self-Supervised Learning

As an alternative to fully human-supervised algorithms, there has recently been a growing interest in self-supervised or naturally-supervised approaches. These approaches make use of non-visual signals, intrinsically correlated to images, as a form of supervision for visual feature learning (Gomez et al., 2019). The prevalence of websites with images and loosely related human annotations provides a natural opportunity for self-supervised learning. This differs from previous image-text embedding methods in that the goal is to learn generic and discriminative features in a self-supervised fashion without making use of any annotated dataset (Gomez et al., 2018).

1.5 Generalizability of Learnings From the Web

Research has lately focused on joint image and text embeddings. The possibility of learning jointly from different kinds of data has motivated work in this field, where both general and applied research has been done. A Deep Visual-Semantic Embedding Model (DeViSE) (Frome et al., 2013) proposes a pipeline that, instead of learning to predict ImageNet classes, learns to infer the Word2Vec (Mikolov et al., 2013) representations of their labels. By exploiting the distributional semantics of a text corpus over every word associated with an image, the model can make inferences about previously unseen concepts.
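As a toy illustration of the Word2Vec label embeddings that a DeViSE-style model regresses onto, the snippet below trains a tiny Word2Vec model and looks up a label vector; gensim and the three-sentence corpus are assumptions for illustration only.

```python
# Toy Word2Vec label embedding, of the kind DeViSE learns to predict.
from gensim.models import Word2Vec

corpus = [["dog", "runs", "in", "the", "park"],
          ["cat", "sleeps", "on", "the", "sofa"],
          ["dog", "and", "cat", "play"]]
model = Word2Vec(sentences=corpus, vector_size=50, min_count=1, seed=0)

label_vector = model.wv["dog"]   # the embedding an image model would regress onto
print(label_vector.shape)        # (50,)
```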

CHAPTER 2

LITERATURE SURVEY

2.1 Healthcare Big Data

Healthcare big data refers to any data related to human health. Because advancements in information and communication technology have facilitated data collection in the field of healthcare, healthcare data are currently being collected by individuals, government agencies, and hospitals. Healthcare data can be classified into several categories, including personal genetic information, PHRs, and EMRs, depending on the target of the data collection. PHRs, EMRs, and lifelogs share common parameters, such as personal information and health screening items, and the composition of such parameters varies depending on the user. Parameters come in different forms, ranging from data obtained from surveys, such as the age, height, weight, and/or pre-existing conditions of the individual, to contextual data, such as the weather, environment, and natural disasters. As data utilization becomes more diverse, trends in the healthcare industry are shifting from treatment-oriented to prevention-oriented healthcare. This, in particular, has motivated the emergence of precision medicine focusing on individual patients.

Personal genetic information is unique information inherited from one's parents, and the human genome contains sequences of approximately 3 billion base pairs. As a key element of precision medicine tailored to provide optimal and patient-centered healthcare services, such information is playing a leading role in increasing the efficiency of treatment while reducing costs. With recent developments in genetic engineering, increasing numbers of private companies are offering genetic testing services, consequently reducing costs. This also allows individuals to directly request genetic testing, if desired, without having to visit a hospital. Similar to precision medicine, personal genetic information is used in human-centered healthcare services.

PHRs consist of data collected through sensors, smartphones, and personal healthcare and wearable devices, as well as data containing any medical practices recorded by an individual. PHRs include data collected, viewed, and managed by the subject of the health data collection. The composition of a PHR depends on the personal interests or devices of the individual, and many studies dealing with the integration and utilization of PHRs are currently being conducted. PHRs are mainly used for daily health management, such as exercise, sleep, and weight management. For users requiring a follow-up, such as those with diabetes, PHRs can be used in conjunction with a personal health device for blood glucose monitoring.

EMRs refer to any automated medical information in a hospital, such as diagnostic results, prescriptions, surgical records, and inpatient admission records. Computerization has enabled vast amounts of data to be processed, and EMRs incorporated into IT now serve as a basis for precision medicine. To promote better health, national healthcare information is collected in a database by the state (acting as the main agent). In South Korea, KNHNES data are collected annually under the supervision of the KCDC. Such data contain information on health and nutritional conditions used to determine and evaluate target indicators.

Diverse data are being collected, and appropriate use of, and research into, various techniques is required according to the data characteristics.

An estimation using the mean or median is a technique in which missing data are input through statistical calculations on a parameter column. Each column independently estimates its missing data, and this type of estimation is available only for numerical data. Although such an estimation is simple and fast to calculate, the mean or median does not consider the relationships between parameters and achieves significantly lower accuracy for categorical data.

Similarly, an estimation using the mode is extremely simple to apply operationally. This type of estimation is primarily used for categorical rather than numerical data. As with the mean or median, the mode does not consider the relationships between parameters. Furthermore, if the estimation uses the most frequently observed values, bias may be introduced into the data. If bias is present in the data, there may be unintended outcomes in the data analysis or learning.

An estimation using K-NN involves searching for the k nearest neighbors of the observations in which missing data occur and imputing such data using a weighted mean of the neighbors. K-NN is an algorithm designed for simple classification and uses feature similarities to predict the values of new data. The algorithm typically yields higher accuracy than the mean or mode. However, it requires many computations and only works when the entire training dataset is stored in memory. Furthermore, an appropriate value of k should be determined, because the algorithm is sensitive to outliers and the outcome varies depending on k. [13]

An estimation using an SVD is a technique for predicting missing data using an output value x̂ similar to the input x, obtained by diagonalizing a matrix in linear algebra. This technique initializes a missing value as 0 or as the mean of the column and iteratively applies an estimation through a linear combination of the k most significant eigenvalue parameters until converging to the next threshold. An SVD is applicable to any m × n matrix. An orthogonal matrix is formed through an eigenvalue decomposition, and the created orthogonal matrix is used to generate the output value x̂ similar to the input x. The missing portion of x can then be inferred from x̂.

An estimation using deep learning is a missing-data prediction method that uses multiple weighted values generated through neural network learning. This method allows for a variety of representations based on neural network architectures. An autoencoder is a typical neural network architecture designed to estimate missing data. A stacked autoencoder is a learning method similar to a deep neural network (DNN) and is referred to as a deep network when the hidden layer comprises multiple layers. Autoencoders are neural networks that generate an output x̂ similar to an input x and are mainly used for data compression and dimensionality reduction.
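The iterative SVD estimation described above can be sketched directly in NumPy: initialize missing entries with column means, then repeatedly replace them with values from a rank-k reconstruction until convergence. The svd_impute helper and its defaults are illustrative assumptions.

```python
# Iterative rank-k SVD imputation (sketch).
import numpy as np

def svd_impute(X, k=2, n_iter=50, tol=1e-6):
    X = X.copy()
    missing = np.isnan(X)
    # Initialize missing entries with their column means.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k reconstruction
        delta = np.max(np.abs(X[missing] - X_hat[missing])) if missing.any() else 0.0
        X[missing] = X_hat[missing]              # update only the missing entries
        if delta < tol:
            break
    return X
```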

The encoder and decoder are symmetrically constructed: the network from the input layer up to the middle hidden layer is called the encoder, and the network from the middle hidden layer to the output layer is called the decoder. The weights w and w′ at symmetrical positions are configured to be equal. If the hidden layer has fewer nodes than the input layer, input data compression and feature extraction can both be achieved. Autoencoders are generally used in pre-learning and to recover a source using the characteristics of the manifold and generative model learning.

In deep learning, the use of a generative adversarial network (GAN) is another approach for replacing missing values. Using a GAN, we can construct a neural network that outputs x̂ similar to the input x. Owing to its neural network structure, the GAN has attracted attention since it was first proposed. This approach produces a new output value for input x according to the learning direction. In a GAN, a generator and a discriminator compete with each other to learn. This produces virtual data with an output similar to the distribution of the actual data. A GAN is highly regarded owing to its outstanding performance in numerous domains and has demonstrated excellent performance in image reconstruction. In addition, the neural network can be applied to continuous data because an image is converted into a vector and then calculated. However, learning becomes difficult owing to the complicated structure. It is also difficult to specify a particular point at which to terminate the learning, resulting in a vanishing gradient from overfitting. Although a GAN is useful for generating new data, its learning requires much more training data than a normal neural network.

Healthcare data require an output that is closest to the input. In addition, because the types of data that can be collected vary depending on the user's particular situation, generalization and scalability are required. Processing healthcare data with a GAN is therefore difficult to apply to real situations owing to the high learning difficulty and low scalability. Moreover, healthcare data contain many variables, and thus it is necessary to consider the relationships between them.

In [4], reviews or comments are classified as positive or negative. Traditionally, document classification was performed on a topic basis, but later research began working on an opinion basis. The machine learning methods Naive Bayes, Maximum Entropy Classification (MEC), and Support Vector Machine (SVM) are used for sentiment analysis. The conventional method of topic-based document classification is tried out for sentiment analysis: two major classes, positive and negative, are considered, and reviews are classified accordingly. In [5], Naïve Bayes is found best suited for textual classification, clustering for consumer services, and the Support Vector Machine for biological reading and interpretation. The methods discussed in the paper are applicable in different areas; for example, clustering is applied to movie reviews, and SVM techniques are applied in biological reviews and analysis. Though the field of opinion mining is a recent technology, it already provides diverse methods and ways to implement them.
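The symmetric (tied-weight) construction, in which the decoder reuses the transpose of the encoder weights (w′ = wᵀ), can be shown in a few lines of NumPy; the sizes and initialization below are illustrative only.

```python
# Tied-weight encode/decode pass: the decoder reuses W transposed.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 80))   # encoder weights
b_enc = np.zeros(16)
b_dec = np.zeros(80)

x = rng.random(80)
h = sigmoid(W @ x + b_enc)        # encode: 80 -> 16
x_hat = sigmoid(W.T @ h + b_dec)  # decode with tied weights: 16 -> 80
```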

large datasets are required to train modern neural network models. Datasets for image classification and the fundamentals of convolutional image classification models and sequence NLU models are described in the following sections.

2.3 Image Classification Datasets

The ImageNet dataset is the main source for training high-quality image classification models. Since the project's inception, 14 million images have been labeled and added to the ImageNet dataset, a small fraction of the billions of images uploaded to the internet each day. One of the greatest contributions to ImageNet's accuracy, and to the time it takes to update the dataset, was the quality control process. Image labeling and the evaluation of label accuracy were crowdsourced with Amazon's Mechanical Turk. The labeling precision of 80 randomly sampled classes of the original ImageNet DET dataset yielded an average of 99.7% accuracy. This suggested it was a reliable source of high-quality data, which justified the cost of building the dataset.

The creators of the WebVision dataset showed that accurate image classification can be achieved using noisy images and the associated metadata taken directly from web searches. The WebVision 2 dataset contains over 16 million images and their metadata, such as descriptions, titles, and tags. Models trained on the WebVision dataset offer comparable accuracy, and in some cases higher accuracy, than models trained using ImageNet, despite the presence of noise within the data. The creators of WebVision found that models that learn from web data differ from those trained on curated datasets in that they learn from a wide array of human annotations and capture the linguistic complexities of language more readily from metadata. Comparisons of models trained on WebVision with models trained on ImageNet showed the role that quantity can play in the accuracy of a model, despite the presence of noise.

The class labels of image datasets are based on a database of English words known as WordNet. WordNet is organized in a hierarchy from general concepts to specific concepts. Small sets of similar words from WordNet are grouped together into synonym sets, often referred to as "synsets" in the literature. Approximately 21,000 synsets are used as class labels in ImageNet. The WebVision dataset is based on only 5,000 of the synsets used to construct ImageNet.
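The synset structure can be inspected directly with NLTK's WordNet interface; this snippet is purely illustrative and assumes the WordNet corpus has been downloaded.

```python
# Browse WordNet synsets and their more general (hypernym) concepts.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

for synset in wn.synsets("airplane"):
    print(synset.name(), "-", synset.definition())
    print("  hypernyms (more general):", synset.hypernyms())
```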

2.4 Image Classification with Convolutional Neural Networks

In recent years, the use of CNNs has led to significant progress in image classification tasks. This type of network is built from a set of layers designed to extract the salient spatial features within images. Early forms of CNNs, like LeNet-5 (LeCun et al., 1989), essentially stacked pairs of two types of layers: 2D convolution and pooling. Convolutional layers are made of a set of square filters. Each filter is convolved over the input image, producing a smaller intermediate output image. Pooling layers down-sample the output images by splitting the input images into square regions and passing forward the maximum or average value.

Two problems arise from deep stacks of these two types of layers. First, it is difficult to train very deep networks of this type because the gradient diminishes too rapidly during backpropagation, preventing the successful training of the earliest layers. This is often called the vanishing gradient problem in the literature. Second, large networks are computationally expensive. In CNNs, the computational expense increases quadratically with a uniform increase in network size.

Residual Networks (ResNets), proposed by He and associates, were designed to mitigate the vanishing gradient problem. Inception networks were designed to improve the efficiency of convolutional layers by introducing sparsity into the convolutions. This study employed both ResNet50V2 and Inception V3 as the CNN architectures for image classification. Additionally, transfer learning was exploited by using pre-trained weights for these models (pretrained on ImageNet). ResNets, Inception layers, and transfer learning are described in the following sections.
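A hedged sketch of this transfer-learning setup follows, using Keras's ImageNet-pretrained ResNet50V2 as a frozen feature extractor with a new classification head; the input size, head, and ten-class output are illustrative assumptions, and InceptionV3 can be swapped in via tf.keras.applications.InceptionV3.

```python
# Transfer learning: frozen ImageNet-pretrained backbone + new head (sketch).
import tensorflow as tf

base = tf.keras.applications.ResNet50V2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # freeze the pretrained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # task-specific head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```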

3.1 Existing System

Traditionally, document classification was performed on a topic basis, but later research began working on an opinion basis. The following machine learning methods are used for sentiment analysis: Naive Bayes, Maximum Entropy Classification (MEC), and Support Vector Machine (SVM). The conventional method of topic-based document classification is tried out for sentiment analysis: two major classes, positive and negative, are considered, and reviews are classified accordingly. In [5], Naïve Bayes is found best suited for textual classification, clustering for consumer services, and the Support Vector Machine for biological reading and interpretation. The four methods discussed in the paper are applicable in different areas; for example, clustering is applied to reviews, and SVM techniques are applied in biological reviews and analysis. Though the field of opinion mining is a recent technology, it already provides diverse methods and ways to implement them.
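The two classifier families named above can be exercised on a toy positive/negative review task as follows; scikit-learn and the four example reviews are assumptions for illustration, not the systems evaluated in [4] or [5].

```python
# Toy sentiment classification with Naive Bayes and a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = ["great product, works well", "terrible, broke in a day",
           "really happy with this", "waste of money"]
labels = ["pos", "neg", "pos", "neg"]

nb_clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(reviews, labels)
svm_clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(reviews, labels)

print(nb_clf.predict(["very happy, great value"]))   # expected: ['pos']
print(svm_clf.predict(["awful quality"]))            # expected: ['neg']
```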

3.2 Existing Technology or Algorithms

There is a large literature on semi-supervised learning techniques. For the sake of brevity, we discuss only two important paradigms and refer to [5] for a recent book on the subject.

When using generative models for semi-supervised learning, a straightforward approach is to treat the class label of unlabeled data as a missing variable; see, e.g., [1, 15]. The class-conditional models over the features can then be iteratively estimated using the EM algorithm. In each iteration, the current model is used to estimate the class labels of the unlabeled data, and the class-conditional models are then updated given the current label estimates. This idea can be extended to our setting, where we have variables that are only observed for the training data: jointly predict the class label and the missing text features for the test data, and then marginalize over the unobserved text features. These methods are known to work well in cases where the model fits the data distribution but can be detrimental when the model has a poor fit. Current state-of-the-art image classification methods are discriminative ones that do not estimate class-conditional density models but directly estimate a decision function to separate the classes. With discriminative classifiers, however, the EM method of estimating the missing class labels used for generative models does not apply: the EM iterations immediately terminate at the initial classifier.

Co-training [4] is a semi-supervised learning technique that does apply to discriminative classifiers and is designed for settings like ours, where the data is described using several different feature sets. The idea is to learn a separate classifier using each feature set and to iteratively add training examples for each classifier based on the output of the other classifier. In particular, in each iteration, the examples most confidently classified by the first classifier are added as labeled examples to the training set of the second classifier, and vice versa. A potential drawback of co-training is that it relies on the classifiers over the separate feature sets being accurate, at least among the most confidently classified examples. In our setting, we find that for most categories one of the two feature sets is significantly less informative than the other. Therefore, the classifier based on the worse-performing feature set might provide erroneous labels to the classifier based on the better-performing feature set, and the latter's performance might deteriorate. In the next section, we present a semi-supervised learning method that uses both feature sets on the labeled examples, and we compare it with co-training in our experiments. [11]
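A simplified co-training loop over two views might look like the following sketch. It departs from [4] in one respect for brevity: both views share a single pool of labeled examples rather than maintaining separate training sets. The co_train helper, classifier choice, and thresholds are illustrative assumptions.

```python
# Simplified co-training over two feature views (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xa, Xb, y, labeled_idx, n_rounds=5, k=2):
    """Xa, Xb: two views of the same examples. y holds true labels at
    labeled_idx and placeholder values elsewhere (they get overwritten)."""
    labeled = set(labeled_idx)
    y = y.copy()
    for _ in range(n_rounds):
        for X_view in (Xa, Xb):
            unlabeled = [i for i in range(len(y)) if i not in labeled]
            if not unlabeled:
                return y
            idx = sorted(labeled)
            clf = LogisticRegression().fit(X_view[idx], y[idx])
            proba = clf.predict_proba(X_view[unlabeled])
            conf = proba.max(axis=1)
            # Promote the k most confidently classified examples to labeled.
            for j in np.argsort(conf)[-k:]:
                i = unlabeled[j]
                y[i] = clf.classes_[proba[j].argmax()]
                labeled.add(i)
    return y
```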

3.3 Hardware and Software Requirements

Software Requirements:
- Python
- Software libraries: Pickle, Argparse, Sys, NumPy, TensorFlow, Tqdm
- Datasets: possibly referring to the MOSI, MOSEI, and IEMOCAP datasets

Hardware Requirements:
- Core i5 processor
- 8 GB RAM / 500 GB hard disk

3.4 Proposed System Design

Handling of Missing Data using a Multi-Modal Stacked Denoising Autoencoder in Healthcare Big Data

KNHNES data can be classified into health survey, health examination, and nutritional survey data. Health survey data consist of an individual's lifestyle, family history, diseases, and medical records. A health examination survey consists of the pulse rate, blood pressure, weight, height, and blood glucose level of the individual. A nutritional survey consists of an individual's meal frequency, meal size, water intake, and use of dietary supplements.

KNHNES is not a typical PHR but has high potential owing to its vast amount of data, which are also contained in the PHR. Approximately 600 parameters containing a number of health-related items are applied, most of which can be prepared and managed by individuals. Therefore, a model derived from KNHNES is highly applicable to a PHR. Moreover, KNHNES is a type of multi-modal data collected through a variety of modes. In fact, data from health surveys, health examinations, and dietary nutrition are collected through each mode according to the data collection path.

During the integration process, missing data are introduced under diverse circumstances. If there are numerous transactions containing missing data, the outcomes of the data analysis will vary depending on the pre-processing techniques. This requires an appropriate processing technique that minimizes the effects of the missing data on the outcome of the data analysis.

A strategy for parameter construction is needed to cluster parameters with a high mutual impact based on a feature analysis. An additional experiment was conducted to evaluate the performance of the models for different multi-modal configurations. The characteristics of the parameters are examined through a similarity and cluster analysis. Using each estimation method, a combination of parameters is applied to construct the multi-modal data. Based on this, the following three datasets can be generated: similarity-based multi-modal data, cluster-based multi-modal data, and category-based multi-modal data. [12]
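One plausible way to realize such a cluster-based grouping is hierarchical clustering on parameter correlations, as sketched below; the data, distance definition, and three-cluster cut are illustrative assumptions rather than the procedure of [12].

```python
# Group parameters with high mutual impact into modalities (sketch).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.rand(500, 12)                 # rows: subjects, cols: parameters
corr = np.corrcoef(X, rowvar=False)         # parameter-by-parameter correlation
dist = 1.0 - np.abs(corr)                   # similar parameters -> small distance

# Condense the symmetric distance matrix and cluster hierarchically.
iu = np.triu_indices_from(dist, k=1)
Z = linkage(dist[iu], method="average")
modality_of = fcluster(Z, t=3, criterion="maxclust")  # e.g., three modalities
print(modality_of)  # cluster id assigned to each parameter
```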

3.5 Estimation of Missing Values using Stacked Denoising Autoencoder

The denoising autoencoder applied here is a modification of the learning method used in ordinary autoencoders. This autoencoder adds random noise to noise-free input data and learns with the aim of restoring the original noise-free data. It repeats the process of randomly adding noise to the input value and then restoring it to the original data. The stacked denoising autoencoder randomly selects values from the original data before data entry and converts them to 0. In neural network learning, missing data are normally estimated as 0. Similarly, when applying denoising autoencoder learning, the noise is represented as 0 and restored to the original data when missing data occur. Accordingly, a value of 0 is entered when missing data occur, and it is in turn replaced with a non-zero predicted value by the trained neural network.

When expressing the modality according to the data characteristics as a neural network, several hidden layers are required. In addition, when stacking hidden layers into multiple layers, various forms can be configured using a stacked denoising autoencoder. This is divided into unsupervised and supervised learning according to the learning method. Early stacked denoising autoencoders were introduced by applying a restricted Boltzmann machine (RBM) in the form of a deep belief network (DBN). This was used to overcome the huge amount of computation required, local minima, and vanishing gradient problems. Currently, the use of various propagation functions and optimizers makes learning with backpropagation easier.

In this study, we experimented with a supervised stacked denoising autoencoder. The input x′ is composed of data in which 25% missing values (0) are randomly generated as noise. The label data consist of the original data x without missing values. The autoencoder is trained through backpropagation, with the loss being the MAE between the output x̂ of one learning pass and x. For example, an autoencoder consisting of five hidden layers (64, 32, 16, 32, and 64) is structured as follows: 80 (input) → 64 (hidden) → 32 (hidden) → 16 (hidden) → 32 (hidden) → 64 (hidden) → 80 (output). The first half, from 80 down to 16, is the encoder, and the remaining half, from 16 up to 80, is the decoder. The transactions in the source data have a missing-value rate of approximately 25%. During stacked denoising autoencoder learning, the noise factor increases in steps of 0.05, starting from 0.05 until it reaches 0.30. A rectified linear unit can be employed for the activation function, together with a suitable optimizer.
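Putting the pieces above together, a hedged sketch of the described setup follows: the 80 → 64 → 32 → 16 → 32 → 64 → 80 architecture, zero-masking noise, MAE loss, and the noise-factor schedule from 0.05 to 0.30 come from the text, while the optimizer, epoch counts, ReLU/sigmoid placement, and random stand-in data are illustrative assumptions.

```python
# Sketch of the described stacked denoising autoencoder and its training loop.
import numpy as np
import tensorflow as tf

def build_sdae(n_features=80):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),   # encoder
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),   # bottleneck
        tf.keras.layers.Dense(32, activation="relu"),   # decoder
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_features, activation="sigmoid"),
    ])

def add_zero_noise(x, noise_factor):
    """Randomly replace a `noise_factor` fraction of entries with 0."""
    mask = np.random.rand(*x.shape) >= noise_factor
    return x * mask

x = np.random.rand(14688, 80).astype("float32")  # stand-in for KNHNES records

model = build_sdae()
model.compile(optimizer="adam", loss="mae")

# Noise-factor schedule 0.05, 0.10, ..., 0.30, as described above.
for noise_factor in np.arange(0.05, 0.31, 0.05):
    model.fit(add_zero_noise(x, noise_factor), x,
              epochs=3, batch_size=128, verbose=0)

# At inference time, missing entries are set to 0 and replaced by x_hat.
x_missing = add_zero_noise(x[:5], 0.25)
x_hat = model.predict(x_missing, verbose=0)
```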

The data selected from KNHNES for the experiments are 14,688 records without missing values. Accordingly, in the Add Noise step, a fraction of the inputs equal to the noise factor is randomly replaced by 0, thereby creating a noised KNHNES set. The label is the original data without noise, and it is used to calculate the error of an output. The 14,688 cases were randomly assigned as follows: 70% to the training data, 10% to the model validation data, and 20% to the test data. A total of 80 parameters were selected, with 80 input nodes and 80 output nodes.

As the number of hidden layers increases, repeated experiments show lower accuracy and higher loss; this also occurs in general deep learning because the healthcare data are not large in scale, so a deeper neural network structure does not suit the KNHNES data. The number of hidden nodes was increased up to half the number of input nodes, and repeated experiments show the smallest difference between the accuracy and loss of the training and verification data when 64 nodes are configured.

By applying a stacked denoising autoencoder, missing data can be estimated, although learning with a single autoencoder results in a high computational workload and a loss in learning efficiency because numerous types of parameters are used. An autoencoder is an ordinary type of neural network, and its performance varies greatly according to the learning method or configuration applied. For low-impact parameters in particular, if the weights converge to 0, the computational workload may increase because such convergence is of little significance, although valid values still remain. To achieve personalization and customization, healthcare models have been developed with a focus on small devices and smartphones, thus requiring an efficient model with a low computational workload. In this regard, it is necessary to reconfigure the training data based on this workload by considering the classification of the parameters and to construct a multi-modal autoencoder.

Therefore, in this study, a technique for estimating missing data using a multi-modal stacked denoising autoencoder is proposed. The proposed method handles missing data by processing the integrated KNHNES data in a single-modal approach for each parameter class. For this purpose, several hidden nodes are required. In this structure, each single-modal autoencoder is merged into a hidden layer and then output as each single modality. A total of 80 parameters are selected, consisting of existing chronic conditions, diagnosed chronic conditions, the time of the initial diagnosis of chronic conditions, current treatments for chronic conditions, physical activities, exercise information, health examination information, dietary nutrition, and stress. KNHNES data are arranged in a hierarchical form and can be classified into super- and sub-classes. For example, parameters sharing a common superclass can be categorized into a prevalence class, such as "pre-existing hypertension", "pre-existing diabetes", "pre-existing hyperlipidemia", "diagnosed arthritis", and "time of the first diagnosis with hypertension." Single-modal data with a superclass, consisting of chronic conditions, physical activities, health examination information, dietary nutrition, and subjective health conditions, are