














Material Type: Paper; Class: DATA MINING; Subject: MANAGERIAL SCIENCES; University: Georgia State University; Term: Spring 2007;
Typology: Papers
- the performance of the model used to forecast future data
- recommendations derived from the analysis and the prediction model

REGRESSION METHODOLOGY

The main objective of this paper is to create a model by which physicians and patients can clearly see the risk factors surrounding heart disease and how greatly (or minimally) each of these factors contributes to the disease. This model will be created by way of the following processes:

- data description
- data preparation
- datasets
- dummy variables
- regression

DATA DESCRIPTION

Source

The data used for this model comes from CorMac Technologies, which collected the data from four participating hospitals: Cleveland Clinic Foundation, Hungarian Institute of Cardiology, V.A. Medical Center (Long Beach, CA), and University Hospital (in Zurich, Switzerland). The data is publicly available at http://www.cormactech.com/neunet.

Independent Variables

The raw database contains 76 attributes of patients who were and were not diagnosed with heart disease. However, only fourteen of these 76 independent variables were actually used. The following table displays the independent variables beneath their specific type of variable:

Variable types: Demographic; Numerical; Classification (Non-Numerical in Nature)
Variables: Age; Resting Blood Pressure (in mmHg); Chest Pain Type; Sex; Cholesterol (in mg/dl); Exercise-Induced Angina; Fasting Blood Sugar; ST Depression Induced; Resting ECG; Slope of Peak Exercise ST; Maximum Heart Rate Achieved; Defect Classification; Vessels Colored by Fluoroscopy
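The "dummy variables" step named in the process list can be illustrated with a short sketch. It assumes the `thal` coding listed in Appendix A (3 = normal, 6 = fixed defect, 7 = reversible defect) and treats "normal" as the omitted baseline category. The function name `thal_dummies` and the choice of baseline are assumptions made for illustration, not the authors' documented procedure.

```python
# Hypothetical helper: expand the categorical `thal` code into 0/1 dummy variables.
# Codes follow Appendix A; using "normal" (3) as the omitted baseline is an assumption.
THAL_LABELS = {3: "normal", 6: "fixed", 7: "reversible"}


def thal_dummies(code):
    """Return the dummy variables for one patient's thal code."""
    if code not in THAL_LABELS:
        raise ValueError(f"unknown thal code: {code}")
    return {
        "thal_fixed": 1 if code == 6 else 0,
        "thal_reversible": 1 if code == 7 else 0,
    }


# Example: one patient per category.
patients = [3, 6, 7]
rows = [thal_dummies(code) for code in patients]
```

With a baseline category left out, the regression coefficient on each dummy can be read as the effect of that defect type relative to a patient with no defect.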
The complete list of variables and their descriptions is located in Appendix B. The list of the variables actually used in the prediction model is located in Appendix A.

Dependent Variable

The 58th variable (“num”) is the dependent variable, “diagnosis of heart disease.” This variable is binary: “1” indicates the presence of heart disease (>50% narrowing of blood vessels) and “0” indicates no presence of heart disease (<50% narrowing of blood vessels).

Observations

The total number of observations for all four participating hospitals is
- condition of angina
- the number of blood vessels colored by fluoroscopy

Type of Heart Defect

The type of heart defect a patient has can play a significant role in whether that patient will develop heart disease. The data in the original report classifies the presence of heart defects into three groups:

- normal (no defect)
- reversible defect
- fixed defect (no cure)

A “normal” result means no defect at all, and only a small number of these patients were diagnosed with heart disease. A reversible defect appears to result in a higher incidence of heart disease than no defect, and a fixed (or incurable) defect has an even higher incidence.

Resting Electrocardiograph Results

The ECG results work the same way as the heart defect classification: each patient is rated on the severity of the ECG results as follows:

- normal
- having ST-T wave abnormality
- showing probable or definite left ventricular hypertrophy

These classifications of an ECG readout can be read similarly to the results of the heart defect “scale.” “Normal” classifies a healthy patient, with little incidence of heart disease. “ST-T wave abnormality” is of greater concern to the patient, though it does not relate directly to the onset of heart disease, as it can also be the effect of certain prescription drugs, neurogenic factors such as a stroke, and metabolic factors such as hypoglycemia.^2 However, left ventricular hypertrophy, a very serious condition, is “a thickening of your heart muscle’s main pumping chamber (left ventricle).” It causes the muscle in this area to become overworked, which leads to it wearing out and eventually failing. A patient with this condition would be extremely susceptible to heart disease.^3 Our calculations and analysis found this to be the case.

Severity of Angina
The angina variable simply classifies the patient’s angina as either “good” or “bad.” Angina is, simply, chest pain. It usually occurs “when your heart muscle does not get enough blood.” A symptom of coronary heart disease, it is frequently a sign of atherosclerosis and can eventually lead to a heart attack.^4 There are three different types of angina identified by physicians; however, the data from this research only classifies angina into “good” and “bad,” which doesn’t allow for much in-depth analysis. Our regression analysis confirmed the obvious: “bad” results indicated a diagnosis of heart disease, while “good” identified a healthy patient.

SCORECARD

Variable | Range | Points
Age | |
Angina | good | -1.
Blood Pressure | |
Chest Pain | Typical Angina | 0.
Chest Pain | Non-Anginal Pain | -1.
[Figure: chart with a “Score” axis running from 0 to 1]
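A scorecard of this kind is typically applied by looking up the points for each variable's range and summing them. The sketch below shows only that mechanism. Every point value in it is a placeholder chosen for illustration and is not taken from the paper's scorecard; the names `SCORECARD` and `total_score` are likewise hypothetical.

```python
# Hypothetical scorecard lookup: (variable, range label) -> points.
# All point values are PLACEHOLDERS for illustration, not the paper's values.
SCORECARD = {
    ("angina", "good"): -1.0,
    ("angina", "bad"): 1.0,
    ("chest_pain", "typical angina"): 0.5,
    ("chest_pain", "non-anginal pain"): -1.0,
}


def total_score(patient):
    """Sum the scorecard points for each (variable, range) pair of a patient."""
    return sum(SCORECARD[(variable, value)] for variable, value in patient.items())


# Example patient described by range labels rather than raw measurements.
example = {"angina": "good", "chest_pain": "non-anginal pain"}
score = total_score(example)
```

Under this convention a lower total points toward a healthy patient and a higher total toward a diagnosis of heart disease.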
RECOMMENDATIONS

POSITIVE DIAGNOSIS INDICATORS

According to the model, the patients most likely to be diagnosed with heart disease possess the following conditions:

- over the age of 59
- significant angina
- blood pressure over 150
- heart rate over 156

NEGATIVE DIAGNOSIS INDICATORS

The model also identifies the following conditions as most likely to indicate patients without heart disease:

- under the age of 54
- blood pressure less than 120
- heart rate between 118 and 142
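The two indicator lists can be read as simple threshold checks. The sketch below is one way to express them, under stated assumptions: the function name `diagnosis_indicators` and its parameter names are hypothetical, the heart-rate range 118 to 142 is treated as inclusive, and angina is reduced to a yes/no flag. It counts indicators only and is not the regression model itself.

```python
def diagnosis_indicators(age, blood_pressure, heart_rate, significant_angina=False):
    """List which of the paper's positive and negative indicators a patient meets.

    Thresholds come from the indicator lists; treating the heart-rate range
    as inclusive is an assumption.
    """
    positive = []
    negative = []

    if age > 59:
        positive.append("age over 59")
    elif age < 54:
        negative.append("age under 54")

    if significant_angina:
        positive.append("significant angina")

    if blood_pressure > 150:
        positive.append("blood pressure over 150")
    elif blood_pressure < 120:
        negative.append("blood pressure under 120")

    if heart_rate > 156:
        positive.append("heart rate over 156")
    elif 118 <= heart_rate <= 142:
        negative.append("heart rate between 118 and 142")

    return {"positive": positive, "negative": negative}


# Example: a 63-year-old with blood pressure 160, heart rate 160, and angina.
flags = diagnosis_indicators(63, 160, 160, significant_angina=True)
```

Values that fall between the two lists, such as an age of 56, match neither set and are left unflagged.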
SUGGESTIONS

Due to the relatively small number of observations available to researchers, the model is less than completely reliable. A larger dataset would help to solidify the researchers' findings, as well as fine-tune the model to accommodate a larger population. Physicians and lab scientists could use this model to formulate a decision tree in order to quickly and accurately diagnose patients admitted to an emergency room. This would cut down on the likelihood of misdiagnosis, which could lead to either death or unnecessary costs for both the patient and the hospital.

APPENDIX

APPENDIX A: USED VARIABLES

# | Code | Description | Values
3 | age | age in years |
 | | | 0 = female
5 | painloc | chest pain location | 1 = substernal; 0 = otherwise
6 | painexer | pain provoked by exertion? | 1 = yes; 0 = no
7 | relrest | relieved after rest? | 1 = yes; 0 = no
8 | pncaden | sum of #5, #6, and # |
9 | cp | chest pain type | 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic
10 | trestbps | resting blood pressure (in mmHg on admission to the hospital) |
11 | htn | |
12 | chol | serum cholesterol in mg/dl |
13 | smoke | smoke cigarettes? | 1 = yes; 0 = no
14 | cigs | cigarettes per day |
15 | years | number of years as a smoker |
16 | fbs | fasting blood sugar > 120 mg/dl? | 1 = true; 0 = false
17 | dm | history of diabetes? | 1 = yes; 0 = no
18 | famhist | family history of coronary artery disease | 1 = yes; 0 = no
19 | restecg | resting electrocardiographic results | 0 = normal; 1 = having ST-T wave abnormality; 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria
20 | ekgmo | month of exercise ECG reading |
21 | ekgday | day of exercise ECG reading |
22 | ekgyr | year of exercise ECG reading |
23 | dig | digitalis used during exercise ECG | 1 = yes; 0 = no
24 | prop | beta blocker used during exercise ECG | 1 = yes; 0 = no
25 | nitr | nitrates used during exercise ECG | 1 = yes; 0 = no
26 | pro | calcium channel blocker used during exercise ECG | 1 = yes; 0 = no
27 | diuretic | diuretic used during exercise ECG | 1 = yes; 0 = no
28 | proto | exercise protocol | 1 = Bruce; 2 = Kottus; 3 = McHenry; 4 = fast Balke; 5 = Balke; 6 = Noughton; 7 = bike 150 kpa/min; 8 = bike 125 kpa/min; 9 = bike 100 kpa/min; 10 = bike 75 kpa/min; 11 = bike 50 kpa/min; 12 = arm ergometer
29 | thaldur | duration of exercise test in minutes |
30 | thaltime | time when ST measure depression was noted |
31 | met | mets achieved |
32 | thalach | maximum heart rate achieved |
33 | thalrest | resting heart rate |
34 | tpeakbps | peak exercise blood pressure (first of 2 parts) |
35 | tpeakbpd | peak exercise blood pressure (second of 2 parts) |
36 | dummy | |
37 | trestbpd | resting blood pressure |
38 | exang | exercise induced angina | 1 = yes; 0 = no
39 | xhypo | | 1 = yes; 0 = no
40 | oldpeak | ST depression induced by exercise relative to rest |
41 | slope | the slope of the peak exercise ST segment | 1 = upsloping; 2 = flat; 3 = downsloping
42 | rldv5 | height at rest |
43 | rldv5e | height at peak exercise |
44 | ca | number of major vessels colored by fluoroscopy | 0 - 3
45 | restckm | irrelevant |
46 | exerckm | irrelevant |
47 | restef | rest radionuclide ejection fraction |
48 | restwm | rest wall motion abnormality | 0 = none; 1 = mild or moderate; 2 = moderate or severe; 3 = akinesis or dyskmem
49 | exeref | exercise radionuclide ejection fraction |
50 | exerwm | exercise wall motion |
51 | thal | type of defect | 3 = normal; 6 = fixed defect; 7 = reversible defect
52 | thalsev | not used |
53 | thalpul | not used |
54 | earlobe | not used |
55 | cmo | month of cardiac cath |
56 | cday | day of cardiac cath |