Performing a Paired t-Test in SPSS: An Example with SCI_PHYS and SCI_LIVING Data, Lecture notes of Statistics

In this document, we explore how to perform a paired t-test using SPSS to investigate whether there is a significant difference between the means of two continuous variables, SCI_PHYS and SCI_LIVING, measured for all observations in a dataset. We check the normality of the differences between the two variables and then perform the paired t-test to determine whether students scored better, on average, in the living systems domain than in the physical systems domain, based on the 2015 PISA data.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

jennyfer
Centre for Multilevel Modelling
The development of this E-Book has been supported by the British Academy.
Paired t test practical
In this practical we are going to investigate how to perform a paired t-test using SPSS. A paired t-test is used when we have two continuous variables measured for all observations in a dataset and we want to test whether the means of these variables differ. The test assumes that the differences between the paired values are normally distributed. To run the test in SPSS, your dataset must have two separate columns containing the two variables to be tested. We could instead perform a standard 2-sample t-test by reshaping the two variables into one long variable with an accompanying indicator column that defines which original variable each observation refers to, but this would be a less efficient test as it does not take account of the paired nature of the data.
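To make the efficiency point concrete, here is a minimal Python sketch that computes both test statistics by hand. The six pairs of scores and the variable names are invented for illustration and are not PISA data. Because the two columns rise and fall together, the paired statistic comes out far larger in magnitude than the unpaired one.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores for six students (not taken from the PISA dataset).
phys = [500, 520, 480, 610, 450, 560]
living = [505, 523, 486, 612, 455, 566]
n = len(phys)

# Paired t statistic: a one-sample t statistic on the within-student differences.
diffs = [p - l for p, l in zip(phys, living)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Two-sample t statistic that ignores the pairing. With equal group sizes
# the pooled and Welch forms give the same statistic.
t_unpaired = (mean(phys) - mean(living)) / sqrt(
    variance(phys) / n + variance(living) / n
)

print(f"paired t = {t_paired:.3f}, unpaired t = {t_unpaired:.3f}")
```

The mean difference is the same in both cases; only the standard error changes, because the paired version removes the large between-student variability.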
The 2015 version of PISA focused on science, and produced separate scales to measure understanding of different content areas. Here we will explore in which domain students scored better, on average: scientific knowledge of physical systems (SCI_PHYS) or of living systems (SCI_LIVING). Both scores are available for every student in the sample, so a paired test of difference is appropriate here.
Paired t test in SPSS (Practical)
Before we can perform this test we need to check whether the differences between SCI_PHYS and SCI_LIVING are normally distributed. First we need to create a difference variable, which can be done as follows:
Select Compute from the Transform menu.
Type DIFF_SCI_PHYS_SCI_LIVING into the Target Variable box.
Type SCI_PHYS - SCI_LIVING into the Numeric Expression box.
Click on the OK button.
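For readers working outside SPSS, the same derived variable can be sketched in Python. The three rows below are invented stand-ins for the dataset; only the column names come from the practical.

```python
# Hypothetical rows standing in for the dataset (values are made up).
rows = [
    {"SCI_PHYS": 512.5, "SCI_LIVING": 518.0},
    {"SCI_PHYS": 430.25, "SCI_LIVING": 425.75},
    {"SCI_PHYS": 601.0, "SCI_LIVING": 601.0},
]

# Equivalent of Transform > Compute with the expression SCI_PHYS - SCI_LIVING.
for row in rows:
    row["DIFF_SCI_PHYS_SCI_LIVING"] = row["SCI_PHYS"] - row["SCI_LIVING"]

diff_values = [row["DIFF_SCI_PHYS_SCI_LIVING"] for row in rows]
print(diff_values)
```

A negative difference means the student scored higher on living systems than on physical systems.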
We can now use this newly generated variable to perform normality checks. Do this as follows:
Select Descriptive Statistics from the Analyze menu.
Select Explore from the Descriptive Statistics sub-menu.
Click on the Reset button.
Copy the DIFF_SCI_PHYS_SCI_LIVING variable into the Dependent List: box.
Click on the Plots... button.
On the screen that appears select the Histogram tick box.
Unselect the Stem and leaf button.
Select the Normality plots with tests button.
Click on the Continue button.
Click on the OK button.
We will first look at a histogram of the variable DIFF_SCI_PHYS_SCI_LIVING. This can be found amongst the set of output objects and looks as follows:
[Figure: histogram of DIFF_SCI_PHYS_SCI_LIVING]
Ideally, for a normal distribution this histogram should look symmetric around the mean of the distribution, in this case -2.71597. This distribution appears to be reasonably symmetric.
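As a numerical companion to the visual check, symmetry can also be summarised by the sample skewness, which is close to zero for a symmetric distribution. The sketch below uses the moment-based definition and two small invented samples; the function name is hypothetical and this is not part of the SPSS output described here.

```python
from statistics import fmean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a perfectly symmetric sample)."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])  # second central moment
    m3 = fmean([(v - m) ** 3 for v in values])  # third central moment
    return m3 / m2 ** 1.5

symmetric = [-3, -1, 0, 1, 3]   # mirror image around 0
right_skewed = [0, 0, 0, 1, 9]  # long tail to the right

print(sample_skewness(symmetric), sample_skewness(right_skewed))
```

A clearly positive value indicates a long right tail and a clearly negative value a long left tail.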
We will next look at a statistical test to see if this backs up our visual impressions from the histogram.

The Kolmogorov-Smirnov test is used to test the null hypothesis that a set of data comes from a Normal distribution.

Tests of Normality

                            Kolmogorov-Smirnov(a)
                            Statistic   df     Sig.
DIFF_SCI_PHYS_SCI_LIVING    .013        5194   .062

a. Lilliefors Significance Correction

The Kolmogorov-Smirnov test produces a test statistic that is used (along with a degrees of freedom parameter) to test for normality. Here we see that the Kolmogorov-Smirnov statistic takes the value .013. Its degrees of freedom equal the number of data points, namely 5194.

Here we see that the p value (quoted under Sig. for Kolmogorov-Smirnov) is .062, which is greater than 0.05, and therefore we cannot reject the null hypothesis that the distribution is normal.

Although the Kolmogorov-Smirnov test tells the researcher whether the distribution followed by a variable is statistically significantly different from a normal distribution, one should take care not to overinterpret such findings. Significance is strongly affected by the number of observations: for very large sample sizes even a small discrepancy from normality will be deemed significant, whilst for small sample sizes a very large discrepancy is required to reject the null hypothesis.
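The statistic itself is simply the largest vertical gap between the empirical distribution function of the data and the normal distribution with the same mean and standard deviation. The following Python sketch computes that gap using only the standard library. The function name and the two test samples are illustrative assumptions, and the sketch does not compute the Lilliefors-corrected p value that SPSS reports.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

n = 200
# Evenly spaced normal quantiles: an idealised, nearly normal sample.
near_normal = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
# Cubes of 0..199: a strongly right-skewed sample.
skewed = [i ** 3 for i in range(n)]

print(ks_statistic_normal(near_normal), ks_statistic_normal(skewed))
```

The nearly normal sample gives a statistic close to zero, while the skewed sample gives a much larger one. Whether a given value is significant depends on the sample size, as discussed above.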

SPSS also supplies QQ plots to assist in looking at normality but for brevity we do not show them here.

We will next move on to the paired t test itself and will test the two variables, SCI_PHYS and SCI_LIVING, for differences.

Below you will see instructions on how to perform the paired t test in SPSS. If you follow the instructions you will see the three tabular outputs that are embedded in the explanations below.

Select Compare Means from the Analyze menu.
Select Paired-Samples T Test... from the Compare Means sub-menu.
Click on the Reset button.
Copy the Physical systems sub-score[SCI_PHYS] variable into the Variable1: box for Pair 1.
Copy the Living systems sub-score[SCI_LIVING] variable into the Variable2: box for Pair 1.
Click on the OK button.

The first SPSS output table contains summary statistics for the two variables to be compared and can be seen below:

Paired Samples Statistics

                                     Mean       N      Std. Deviation   Std. Error Mean
Pair 1   Physical systems sub-score  520.3541   5194   106.85597        1.48268
         Living systems sub-score    523.0700   5194   106.28832        1.47480
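The columns of this table are linked by simple arithmetic: the standard error of a mean is the standard deviation divided by the square root of N, and the difference between the two means is the mean of the difference variable. The short Python check below uses only the figures quoted in this practical; the variable names are my own.

```python
from math import sqrt

# Values as reported in the Paired Samples Statistics table.
n = 5194
mean_phys, mean_living = 520.3541, 523.0700
sd_phys, sd_living = 106.85597, 106.28832

# Standard error of the mean = standard deviation / sqrt(N).
se_phys = sd_phys / sqrt(n)
se_living = sd_living / sqrt(n)

# Mean of SCI_PHYS - SCI_LIVING equals the difference of the two means.
mean_diff = mean_phys - mean_living

print(f"{se_phys:.5f} {se_living:.5f} {mean_diff:.4f}")
```

Up to rounding, these should agree with the standard errors of 1.48268 and 1.47480 and with the histogram mean of -2.71597 reported earlier.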

The summary statistics table contains 5 columns and 1 row for each of the two variables to be tested. The first column contains the name of each variable. In the next column we see that the mean of SCI_PHYS is 520.3541 whilst the mean of SCI_LIVING is 523.0700. Hence SCI_LIVING has the bigger mean, and the t test will establish whether this difference is statistically significant. We next see the number of valid observations, i.e. cases with valid values for both SCI_PHYS and SCI_LIVING. Here we have 5194 valid observations for both variables.

In the next column we see the standard deviations for SCI_PHYS and SCI_LIVING. In this case the standard deviation of SCI_PHYS is 106.85597 whilst for SCI_LIVING it is 106.28832, so there is slightly more variability in SCI_PHYS than in SCI_LIVING. In the final column are the standard errors of the means for each variable. Whilst the standard deviations measure the variability in the data, the standard errors of the means measure how confident we are in the estimates of the means. As we collect more data the standard error of the mean gets smaller, because we become more confident in the mean estimate; the formula is standard error of the mean = standard deviation / square root of N. In this case the standard error of the mean for SCI_PHYS is 1.48268 whilst for SCI_LIVING it is 1.47480.

The second SPSS output table contains information on the correlation between the two variables to be compared and can be seen below:

Paired Samples Correlations

                                                                N      Correlation   Sig.
Pair 1   Physical systems sub-score & Living systems sub-score  5194   .914          .
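The correlation in this table matters for the paired test because the variance of the differences satisfies Var(x - y) = Var(x) + Var(y) - 2 r SD(x) SD(y), so a strong positive correlation r shrinks the variability of the differences. The Python sketch below computes a Pearson correlation by hand and checks this identity on invented scores. The data and the function name are illustrative assumptions, not PISA values.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six students (not taken from the PISA dataset).
phys = [500, 520, 480, 610, 450, 560]
living = [505, 523, 486, 612, 455, 566]

r = pearson_r(phys, living)
diffs = [p - l for p, l in zip(phys, living)]

# Variance of the differences, reconstructed from the two variances and r.
var_from_identity = (
    variance(phys) + variance(living)
    - 2 * r * sqrt(variance(phys) * variance(living))
)

print(r, variance(diffs), var_from_identity)
```

In this toy example each score has a variance in the thousands, yet the variance of the differences is below 3, which is what makes the paired comparison so sensitive.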

The correlation between two variables is a single number, the correlation coefficient, that describes how strongly they are related to each other. Correlations lie between -1 and +1. A positive value means that, in general, large values of the first variable are more likely to be observed with large values of the second variable, and small values of the first with small values of the second. For a negative correlation the opposite is true: large values of the first variable are more likely to be observed with small values of the second, and small values of the first with large values of the second. A correlation of 0 means there is no (linear) relationship between the variables.

Here SPSS reports a form of correlation known as a Pearson correlation, and we see that the correlation between SCI_PHYS and SCI_LIVING is .914. It is helpful to look at the correlation between the two variables because a paired t-test is typically more useful than a 2-sample t-test when there is a positive correlation between the two variables, as is the case here. SPSS also gives a p value which describes whether the correlation is statistically significantly different from zero. Here we see that the p value is less than 0.05 and therefore we can reject the null hypothesis that the correlation is zero.

The third SPSS output table contains details of the t test itself and can be seen below: