Research Questions
So far we have looked at several different ways to conceptualize the motivation and engagement data, but we have not answered the research questions:
- Are students more motivated by usefulness or caring?
- Is there a relationship between usefulness and deep level engagement?
- Is there a relationship between caring and deep level engagement?
Z-Test
One possible way to analyze the data presented in this project is a z-test, which produces a z-score. Z-scores are computed by taking each data point, subtracting the data mean, and dividing by the data standard deviation.
If you are dealing with a sample instead of individual data points, you take the sample mean minus the population mean and divide that by the population standard deviation divided by the square root of the sample size.
A z-score gives the number of standard deviations a data point is from the mean. Z-scores are also one way to compare means between data sets taken using the same measure. A positive z-score indicates the data point lies above the mean, and a negative z-score indicates it lies below the mean.
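A minimal Python sketch of the two formulas above, using only the standard library. The helper names `z_score` and `z_score_of_mean` are hypothetical and for illustration only; the numbers (mean 45, standard deviation 8) come from the TV example in this section, and the sample of 16 students is made up.

```python
import math


def z_score(x, mean, sd):
    """How many standard deviations the data point x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def z_score_of_mean(sample_mean, pop_mean, pop_sd, n):
    """z for a sample mean: divide by the standard error, pop_sd / sqrt(n)."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


print(z_score(47, 45, 8))  # 0.25: 47 hours is a quarter SD above the mean
print(z_score(40, 45, 8))  # -0.625: 40 hours is below the mean
# Hypothetical sample of 16 students averaging 47 hours: 2 / (8 / 4)
print(z_score_of_mean(47, 45, 8, 16))  # 1.0
```

Note how dividing by the standard error instead of the standard deviation makes the same 2-hour difference count for more when it describes a whole sample's mean.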
Z-tests are used when samples are large enough that the distribution can be treated as normal, and they can be used to make predictions about the population being studied. A normal curve is shown below (Normal, n.d.). Converting data to z-scores standardizes it by shifting the mean to zero and scaling the standard deviation to one; this does not change the shape of the distribution, so the normal curve only applies when the data is approximately normal to begin with. Z-tests use z-scores to find the area under the normal curve to the left of the z-score, which represents the probability of a value being less than or equal to it. The normal distribution table converts z-scores into these probabilities, which are in turn used to find p-values.
For example, a survey of 200 VT students asked how much TV they watched and found a mean of 45 hours with a standard deviation of 8 hours on data that was approximately normal. What percent of students watched 47 hours of TV or less (Skaggs, 2016b)? To solve this problem, you would compute (47-45)/8 and get a z-score of .25. Then you would look up z = .25 in the normal distribution table (shown in part below) (Howell, 2011). I have highlighted the line for you so you can easily see the result. The table gives the area to the left of the z-score, so the percent of students watching 47 hours of TV or less is .5987, or 59.87%.
Try one on your own now! A survey of 200 VT students asked how much TV they watched and found a mean of 50 hours with a standard deviation of 5 hours on data that was approximately normal. What percent of students watched 52 hours of TV or less (Skaggs, 2016b)? Did you get .6554, or 65.54%? Good!
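Instead of reading the printed table, Python's standard-library `statistics.NormalDist` (Python 3.8 or later) can compute the same left-tail area. This is a sketch for checking your table lookups, not a replacement for understanding them.

```python
from statistics import NormalDist

# Worked example: mean 45, SD 8, value 47.
tv = NormalDist(mu=45, sigma=8)
print(round(tv.cdf(47), 4))  # 0.5987

# Same area from the standard normal curve using z = 0.25.
print(round(NormalDist().cdf(0.25), 4))  # 0.5987

# Practice problem: mean 50, SD 5, value 52 (z = 0.4).
print(round(NormalDist(mu=50, sigma=5).cdf(52), 4))  # 0.6554
```

Subtracting the result from 1 gives the proportion above the value; for example, 1 - .5987 = .4013 of students watched more than 47 hours.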
For z-tests, a large sample is usually defined as more than 30 participants, because samples smaller than 30 usually cannot be assumed to be normal. If the sample size is too small, the data is likely not normal, and standardizing it will not turn it into a normal curve (Howell, 2011). The data presented in this study has only 20 points, so z-tests cannot be used to analyze how far each student’s mean for usefulness, caring, and deep level strategies deviates from the class mean for each measure, because the data cannot be assumed to be normal.
The Importance of Sample Size
I want to take a minute to discuss sample size. I have often seen students doing statistical analysis assume the sample size of their data will always be large enough to support any statistical test they choose to explore. This is not true, but it is also not surprising, because students are often given data sets that are large enough for the statistical test they are asked to run. Real-world data, however, is rarely pretty and sometimes does not come in samples large enough for common statistical analyses. This does not mean the data is useless, but it does mean a different mode of analysis must be used, because results from small samples cannot be generalized to a large population.
Sample size matters both for the z-test explained above and for the t-test used below to answer the first research question. Z-tests require a sample size greater than 30, while a t-test generally needs a sample size of 25-30 (Howell, 2011). If the sample size is large, however, the results of the t-test and z-test will be similar if not the same, because the t-distribution approaches the normal curve as the sample size grows.
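To see the t-distribution approach the normal curve, the sketch below computes two-tailed .05 critical values for increasing degrees of freedom using only the standard library: it integrates the t density with Simpson's rule and solves for the cutoff by bisection. The helpers `t_pdf`, `t_cdf`, and `t_critical` are hypothetical names, and this numerical approach is a teaching sketch; in practice a statistics package would be used. The row for 18 degrees of freedom should reproduce the table value of 2.101, and the values shrink toward the normal curve's 1.96.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution (log-gamma avoids overflow for large df)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x by Simpson's rule."""
    h = x / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


def t_critical(df, alpha=0.05):
    """Two-tailed critical value: bisect for the t where t_cdf(t) = 1 - alpha / 2."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 18, 30, 100, 1000):
    print(df, round(t_critical(df), 3))
print("normal", round(NormalDist().inv_cdf(0.975), 3))  # 1.96
```

The larger critical values at small degrees of freedom are the t-test's built-in penalty for estimating the standard deviation from a small sample.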
The sample size of the data explored in this study is only 20, so a z-test is ruled out immediately. A t-test can be done, but it raises questions about the external validity of the study, since the sample is smaller than is considered standard for a t-test. External validity is the degree to which study findings generalize to other groups or other research conditions (Skaggs, 2016a). Because the sample in this exploratory study is so small, the findings may not be generalizable. In other words, even if there appears to be a relationship between students’ motivation in terms of usefulness and their apparent use of deep level cognitive strategies, that does not imply an entire school’s worth of students would produce the same finding.
T-Test
Z-tests were explained in detail above because they are similar to t-tests but conceptually easier to understand, since they rely directly on the normal curve. Like z-scores, t-scores measure distance from the mean in standard units, and t-tests can also be used to compare means between similar data sets. T-tests also use a table, similar to the normal distribution table, to relate t-scores to p-values. The main difference between the two is that t-tests can be used with sample sizes less than 30. This works because in smaller samples the population standard deviation is often unknown, so the sample standard deviation is used in its place, and the heavier tails of the t-distribution account for the extra uncertainty (Howell, 2011).
Let’s use the TV example again, only with a smaller sample size. Suppose a survey of 19 VT students on how much TV they watched found a mean of 8 hours with a standard deviation of 2 hours. Does this sample mean differ significantly from a hypothesized value of 10 hours (Skaggs, 2016b)? To solve this problem, you would compute (10-8) / (2 / sqrt(19)) and get a t-score of 4.36. Since t-tests work with small sample sizes, the table requires the degrees of freedom, which is n-1; this example has 19-1 = 18 degrees of freedom. We run a two-tailed test at the .05 level, so .05/2 = .025 gives the proportion of the distribution in each of the two tails. Using the t distribution table (shown in part below) (Howell, 2011), 18 degrees of freedom at the .05 level of significance gives a critical t-value of 2.101. Since the obtained t-value of 4.36 is greater than 2.101, the null hypothesis of no difference is rejected: 10 hours is significantly different from the mean of 8. Z-tests don’t have to deal with degrees of freedom because they use the known population standard deviation and the normal curve directly. T-tests estimate the standard deviation from a small sample, so the critical value depends on the degrees of freedom as well as the significance level.
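The calculation in this example can be sketched in Python. Note that (10 - 8) / (2 / sqrt(19)) simplifies to sqrt(19), about 4.36. The helper `one_sample_t` is a hypothetical name, and the critical value 2.101 is hard-coded from the t table for 18 degrees of freedom rather than computed.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, hypothesized_mean):
    """One-sample t: (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 df."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Worked example: n = 19, mean 8, SD 2, hypothesized value 10.
t = one_sample_t(8, 2, 19, 10)
df = 19 - 1
critical = 2.101  # read from the t table: df = 18, two-tailed, .05 level
# Compare the size of t with the critical value (the sign only gives direction).
print(round(abs(t), 2), df, abs(t) > critical)  # 4.36 18 True
```

The same function can be used to check the practice problem that follows by swapping in its mean, standard deviation, and comparison value.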
Try one on your own now! Suppose a survey of 19 VT students on how much TV they watched found a mean of 5 hours with a standard deviation of 3 hours. Using a one-tailed test at the .01 level, is 8 hours significantly greater than the sample mean of 5 hours (Skaggs, 2016b)? Did you get an obtained t-value of 4.36, which is greater than the critical value of 2.552? Yay! This means the null hypothesis is rejected: 8 hours is significantly greater than the mean of 5. Good job!
Even though both the MUSIC® Model and the engagement model have been used repeatedly with various populations, population statistics for engagement could not be obtained from researchers in China in a timely manner, and the MUSIC® Model has not been used with a large population of secondary mathematics students. As a result, population statistics were not available for the sample studied in this project, so the sample statistics must be used instead.
T-tests can be run on independent or dependent samples. An independent-samples t-test is used when, for example, you want to compare mean test scores for two separate classes. These scores are completely unrelated because they come from two separate classes, so they are independent. The data in the present study is dependent because the same person’s scores are being compared on different items of the same survey. In other words, student one answered questions about being motivated by usefulness and by caring on the same motivation survey, so student one’s scores are not independent; they are dependent. This means a paired-samples t-test must be run.
The first research question, are students more motivated by usefulness or caring, can be answered using a paired-samples t-test. The data below shows a comparison of the means of the usefulness and caring scores for each student. The two-tailed test gives a p-value of .218: if there were no true difference between the means, a difference at least this large would occur about 21.8% of the time. Because .218 is much larger than .05, there is no statistically significant difference between these means, so the data does not conclusively indicate whether the students are more motivated by usefulness or by caring. However, the difference between the two means is -.326, which shows that, on average, students rated caring higher than usefulness as a motivating factor. This finding can be verified by looking at the data.
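A paired-samples t-test is a one-sample t-test run on each student's difference score. The Python sketch below shows that idea; the helper `paired_t` is a hypothetical name, and the five students' scores are made-up illustration values, not this study's survey data.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t: a one-sample t-test on each student's difference score."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical scores for five students (NOT the study's survey data).
usefulness = [4.2, 3.8, 5.0, 4.5, 3.6]
caring = [4.8, 4.0, 4.9, 5.1, 4.4]

t, df = paired_t(usefulness, caring)
print(round(t, 2), df)
```

A negative t means the second measure (here, caring) averaged higher. With the study's 20 students, there would be 19 degrees of freedom, and the two-tailed .05 table value for 19 degrees of freedom is 2.093.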
Regression and Correlation
The second and third questions, concerning the relationship between usefulness and deep level engagement and between caring and deep level engagement, can be answered using regression and correlation.
Regression and Correlation Investigation
Let’s investigate regression first. Click on the Regression applet and place points on the graph as I have in the picture. You can either click on the coordinate plane to place a point or type the coordinates I have into the box below the coordinate plane. The points should be close to mine, but they do not need to be exact.
Next, check the box next to “Fit your own line.” A green line will appear on the coordinate plane. Make sure you select the circle next to “Move Your Fit Line,” or extra blue dots you don’t want will appear. You can click on each green point and drag it to another location. Move your green line to where you think it best fits the data. You can see mine in the picture, but your line does not have to match mine.
Now, check the box next to “Display line of best fit.” A red line will appear showing the line of best fit. Did you do a better job of choosing a regression line than I did? I hope so! If not, that is OK too!
Notice that when the red line of best fit appears, so does the r value. This is the correlation coefficient. Correlations measure the strength and direction of the linear relationship between two variables. Data showing a perfect positive relationship have r values of 1, a perfect negative relationship r values of -1, and no relationship r values of 0, so correlation coefficients always fall between -1 and 1 (Skaggs, 2016c). The correlation for this data is -.637, which means there is a moderately strong negative relationship between the variables. Now press the reset button, add your own data set, and play around with guessing the regression line. Before you display the red line of best fit, be sure to guess the r value so you can start getting a feel for estimating the strength of a relationship.
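The applet's internals are not described here, but assuming it draws the ordinary least-squares line, the line and r can be computed from sums of squares as in this Python sketch. The helper `best_fit_and_r` is a hypothetical name, and the points are made up rather than taken from the applet picture.

```python
import math
import statistics


def best_fit_and_r(xs, ys):
    """Least-squares slope and intercept, plus Pearson's r, from sums of squares."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical points (not the coordinates from the applet picture).
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 4, 2, 1]
slope, intercept, r = best_fit_and_r(xs, ys)
print(round(slope, 2), round(intercept, 2), round(r, 3))  # -1.0 6.2 -0.962
```

Because r divides the shared variation `sxy` by the geometric mean of the two variations, it can never leave the range from -1 to 1. On Python 3.10 or later, `statistics.linear_regression` and `statistics.correlation` return the same slope, intercept, and r.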
Regression
Regression places a line or curve on a scatterplot that “best fits” the data; the line can then be used to predict one variable from another.
Analyzing the scatterplot of students’ mean scores for how motivated they felt by the usefulness of mathematics against how much they feel they use deep level cognitive strategies shows a positive trend in the regression line. This means that, generally, as students felt the mathematics was more useful to them, they reported engaging more in deep level cognitive strategies. The regression equation has an intercept of 1.98 and a slope of .43: the predicted deep level cognitive strategy score is 1.98 when the usefulness score is zero, and for every one-unit increase in the usefulness score, the predicted deep level cognitive strategy score increases by .43. This is important because if the null hypothesis is that there is no relationship between usefulness and deep level cognitive strategies, running this regression gives a p-value of .019, which provides evidence for rejecting this null hypothesis. The p-value (short for probability value) is the probability of seeing a relationship at least this strong if there were really no relationship, so a small p-value means the relationship is unlikely to have happened by chance. Since .019 is less than .05, the relationship between usefulness and deep level cognitive strategy scores is statistically significant, meaning it is unlikely to be due to chance alone.
In contrast, the scatterplot of students’ mean scores for how motivated they felt by a caring environment against how much they feel they use deep level cognitive strategies shows essentially no trend, as the nearly flat regression line shows. This means that as students felt more motivated by a caring environment, there was no clear change in how they engaged in deep level cognitive strategies. The regression equation has an intercept of 3.48 and a slope of .08, so for every one-unit increase in the caring score, the predicted deep level cognitive strategy score changes by only .08; the slope is essentially zero. If the null hypothesis is that there is no relationship between caring and deep level cognitive strategies, running this regression gives a p-value of .729, so we fail to reject this null hypothesis. Failing to reject the null hypothesis does not prove there is no relationship; it means this data provides no evidence of one.
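Reading the two reported equations as intercept plus slope times score, this Python sketch compares their predictions side by side. The helper `predict` is a hypothetical name, the coefficients are the rounded values reported above, and the input scores 3, 4, and 5 are illustrative only.

```python
def predict(intercept, slope, x):
    """Predicted deep level cognitive strategy score from a fitted straight line."""
    return intercept + slope * x


# Rounded coefficients reported above: (intercept, slope).
usefulness_line = (1.98, 0.43)
caring_line = (3.48, 0.08)

for score in (3, 4, 5):  # illustrative motivation scores
    u = predict(*usefulness_line, score)
    c = predict(*caring_line, score)
    print(score, round(u, 2), round(c, 2))
```

Each one-point step in the usefulness score moves the prediction by .43, while the same step in the caring score moves it by only .08, which is why the caring line looks nearly flat.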
Correlation
Correlations measure the strength and direction of the linear relationship between two variables. Correlations are reported as the r value in statistics; this coefficient was developed by Karl Pearson in the 1890s. Data showing a perfect positive relationship have r values of 1, a perfect negative relationship r values of -1, and no relationship r values of 0. Correlation coefficients therefore always fall between -1 and 1 (Skaggs, 2016c).
Referring back to the linear regression for the relationship between how motivated students felt by the usefulness of mathematics and how much they feel they use deep level cognitive strategies, we know there is a positive relationship between those variables, but we do not yet know how strong that relationship is. To find this, we analyze the correlation coefficient. The r value for this data set is .518, which indicates a positive, moderately strong relationship between the usefulness and deep level cognitive strategy scores. The moderate strength of the relationship could be explained in two ways. First, because the sample size is small, r is estimated imprecisely, so the observed correlation could be weaker (or stronger) than the true relationship. Second, a latent variable that was not reported in the surveys, such as gender, could be affecting the relationship between usefulness and deep level cognitive strategy scores. Either of these could account for a weaker correlation between usefulness and deep level cognitive strategy scores, but regardless, the data presented does show a moderate positive relationship between how motivated students feel by the usefulness of mathematics and how much they feel they use deep level cognitive strategies.
Referring back to the linear regression for the relationship between how motivated students feel by a caring environment and how much they feel they use deep level cognitive strategies, the regression showed no evidence of a relationship between those variables, and we want to confirm this with the correlation coefficient. The r value for this data set is .083, which is so close to zero that it reinforces the regression finding of no relationship between caring and deep level cognitive strategy scores. The lack of correlation could be explained in two ways. Again, the small sample size could be a contributing factor. However, it could also be that students’ level of cognitive engagement is not affected by how much they feel cared for in the class. It may be that caring correlates with affective or behavioral engagement instead, which would be a separate area of study. Either of these could account for the lack of correlation between caring and deep level cognitive strategy scores, but regardless, the data presented shows no relationship between how motivated students are by a caring environment and how much they feel they use deep level cognitive strategies.
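The regression and correlation results are two views of the same test. In simple linear regression, testing whether the slope is zero is equivalent to testing whether r is zero, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. This Python sketch (the helper `t_from_r` is a hypothetical name) plugs in the reported r values with the study's 20 students and the table value 2.101 for 18 degrees of freedom; it assumes the rounded r values are precise enough to reproduce the conclusions, and the results agree with the regression p-values of .019 and .729.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: r = 0 (same as H0: slope = 0), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


n = 20            # students in the study
critical = 2.101  # t table: df = n - 2 = 18, two-tailed, .05 level
for label, r in (("usefulness", 0.518), ("caring", 0.083)):
    t = t_from_r(r, n)
    print(label, round(t, 2), abs(t) > critical)
# usefulness 2.57 True
# caring 0.35 False
```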
Summary
So what did this analysis tell us about the research questions? First, there is no conclusive statistical evidence that students in the particular class studied are more motivated by usefulness or by caring, but the raw data does show that, on average, students rated a caring environment as more motivating than the usefulness of mathematics. Second, there is a moderately strong positive relationship between how motivated students felt by the usefulness of mathematics and how much they felt they used deep level cognitive strategies in the particular class studied. This finding does not imply that feeling motivated by the usefulness of mathematics causes students to use deep level cognitive strategies, only that there is a relationship between the two. Finally, there is no evidence of a relationship between how motivated students felt by a caring environment and how much they felt they used deep level cognitive strategies in the particular class studied.