Data Visualization and Security, Assignments of Advanced Data Analysis

Discusses different data visualization tools and security techniques used with such data, specifically in healthcare.

Typology: Assignments

2021/2022

Uploaded on 10/01/2023

cat-orlando

Computer Science - CPSC 685
Big Data Analytics
Assignment #7 (100 points)
Objectives of this assignment:
• Demonstrate an understanding of the fundamentals of data visualization and practice
  communicating with data.
• Demonstrate the ability to perform analyses using specialized big data analytics tools,
  e.g. Tableau.
• Demonstrate techniques for maintaining data security and privacy for massive storage.
Instructions to submit this assignment:
1) Download/install the Tableau tool “Tableau Public” free version from this link:
https://public.tableau.com/en-us/s/download
Answer all the following questions (using your own words) and upload the document in PDF
format to the corresponding D2L Dropbox.
Questions to be answered:
(10 points) Q1: Discuss two popular big data visualization tools.
Data visualization involves creating a visual representation of a dataset so that it is easier
to understand. This matters because it can reveal patterns that might otherwise go unnoticed.
Data can be depicted in many ways, including pie charts, infographics, maps, and bar charts.
Data visualization tools provide a platform where a user can load data and view it through
the chart options provided. One of the more popular tools is Tableau, easy-to-use software
that can import data from numerous sources. It provides features such as clustering and
calculations that can be applied to the data, and its interactive interface can convert data
into “interactive graphics”. In addition to the built-in features, users can create their own
calculations and apply them to the dataset, which allows customization based on the user's
needs. Another popular data visualization tool is Plotly, which focuses on graphs. Imported
data can be rendered as different types of interactive charts and maps. A distinctive feature
of this tool is that each graph is assigned its own URL. This makes sharing easy and gives
others interactive information about how the graph was created, so they can understand what
is being shared without having to read through code (Costa, 2022).
(40 points) Q2: After reading the attached article, answer the following two questions:
2.1 Summarize two technologies that have been discussed in the article for
protecting the security and privacy of healthcare data.

One technology discussed in the article for protecting the security of healthcare data is access control. This method limits what a user can do in a network after they have been authenticated. It assigns privileges to each user based on permission granted either by the patient or by a “trusted third party”, which allows them to access protected health information. Users can see only the data they need and nothing more. It also helps to limit the number of users moving in a network (Abouelmehdi et al., 2017).

Another method mentioned in the article is encryption. This is a very popular type of data protection in the healthcare industry in which private information is encoded so that it is unreadable to anyone without the decryption key. This limits the number of people who have access to the data and can help to reduce breaches. It is especially important because modern healthcare commonly involves transmitting data, either between professionals or to the patients themselves, so the data should be encrypted in transit to prevent attackers from gaining access. For encryption to be effective, the scheme should be efficient, easy to use, and easy to extend with new records. In addition, the number of people holding the decryption key should be limited (Abouelmehdi et al., 2017).

2.2 Summarize two methods that have been discussed in the article for privacy preserving in big data.

One method mentioned in the article is de-identification. This process involves removing any information that can directly identify a patient. It can be done either by removing specific identifiers that can be traced back to the patient or by a statistical method in which the patient verifies that enough identifiers have been deleted. These are not necessarily the best methods for protecting privacy, which has led to concepts such as k-anonymity. In k-anonymity, the higher the value of k, the less likely re-identification is to occur. The drawback is that k-anonymization can lead to information loss. This has led to an extension of k-anonymity, l-diversity, which protects datasets by “diminishing the granularity of data representation”. This method still has issues, as it “depends upon the range of sensitive attributes”, and inserting fictitious data can skew the results of analysis. This in turn led to t-closeness, an extension of l-diversity. It treats the values of an attribute distinctly and can prevent attribute disclosure. Its main weakness, however, is that the chance of re-identification increases with the size and variety of the data (Abouelmehdi et al., 2017).

Another method mentioned is HybrEx, the hybrid execution model. It is designed specifically for protecting data in cloud computing. The model places data into certain types of clouds based on its privacy. For example, information that is not
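The k-anonymity and l-diversity ideas summarized in the answer to 2.2 can be illustrated with a minimal sketch. It is not taken from the article: the records, the column names, and the helper functions `k_anonymity` and `l_diversity` are hypothetical, and it assumes the simplest ("distinct") form of l-diversity, which counts distinct sensitive values per group.

```python
from collections import Counter, defaultdict


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same combination of quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Return l: the smallest number of distinct sensitive values found
    in any quasi-identifier group (distinct l-diversity, an assumption)."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min(len(values) for values in groups.values())


# Hypothetical records whose age and ZIP code are already generalized.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["age", "zip"])
l = l_diversity(records, ["age", "zip"], "diagnosis")
print(f"k = {k}, l = {l}")
```

In this toy table every record shares its age range and ZIP prefix with at least one other record, so the table is 2-anonymous. The second group, however, contains a single diagnosis, so anyone known to be in that group has their diagnosis disclosed even though they cannot be singled out. This is the kind of gap that l-diversity is meant to expose.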

B.

C.

The next image shows the clustering. This image uses circles and squares to show which were clustered correctly and which were not.