MATH SAT Dummy Data

(Revised on January 8th) In data analysis, data are often simulated or altered for use in the classroom. The goal of this article is to showcase the variability that results when a small sample size is used. In the summer of 2015, I collaborated with graduate students Chris and Jamie to generate samples of what we called “dummy” data for normally distributed student Math SAT scores. In what follows, Chris’s model of the SAT scores had 100 students while Jamie’s model used 1,000 students. The description of how to create the model is based on Chris’s work. We assume that the reader can write functions for a collection, graph data in parallel dot plots, and use the history tool to capture random samples from a collection. For more information on simulating the hypothesis test on two means, please consult the Hypothesis Testing pages on this Wiki.
**Creating a Random Data Set for the Math SAT Scores for Simulating a Hypothesis Test (draft white paper)**

The College Board provided 2014 statistics for almost 1.7 million students, of whom 47% were male and 53% female. The population mean scores were 530 for males and 499 for females, and the standard deviations were 123 and 114 respectively.

To begin, Chris added 100 cases to an empty case table by selecting the command New Cases in the Data menu. She labeled the first attribute **Gender**. The formula for **Gender** used an if-then-else command with the caseindex function. In TinkerPlots, the caseindex function assigns each case in the collection a value from 1 to n. With this formula, Chris designated the first 47 cases as males and the remaining 53 cases as females. Values for the second attribute, labeled **Score**, were randomly generated using the formulas randomNormal(530, 123) for males and randomNormal(499, 114) for females. The if-then-else logic in Chris’s model is shown in Figure 1. Please let me know if you have questions or suggestions for improvements.
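For readers who want to try the same idea outside TinkerPlots, here is a minimal Python sketch of the logic described above. It is not Chris’s actual TinkerPlots formula. The function names `make_dummy_scores` and `mean_difference`, the `seed` parameter, and the 470/530 split used for the 1,000-student model are my own assumptions for illustration; `random.gauss` stands in for randomNormal.

```python
import random
import statistics


def make_dummy_scores(n=100, male_share=0.47, seed=None):
    """Build n cases with a Gender and a Score attribute.

    Hypothetical helper: it mirrors the case-index rule described in the
    text (the first 47% of cases are male, the rest female) and draws
    each score from a normal distribution with that group's mean and SD.
    Scores are not rounded or clipped to the real SAT range.
    """
    rng = random.Random(seed)
    n_male = round(n * male_share)
    cases = []
    for case_index in range(1, n + 1):  # case index runs from 1 to n
        if case_index <= n_male:
            gender, mean, sd = "Male", 530, 123
        else:
            gender, mean, sd = "Female", 499, 114
        cases.append({"Gender": gender, "Score": rng.gauss(mean, sd)})
    return cases


def mean_difference(cases):
    """Return the sample mean for males minus the sample mean for females."""
    males = [c["Score"] for c in cases if c["Gender"] == "Male"]
    females = [c["Score"] for c in cases if c["Gender"] == "Female"]
    return statistics.mean(males) - statistics.mean(females)


# A 100-student sample and a 1,000-student sample (the latter split
# 470/530, which is an assumption; the text only gives the 47/53 split).
chris = make_dummy_scores(100, seed=1)
jamie = make_dummy_scores(1000, seed=1)

print("Difference in means, n = 100: ", round(mean_difference(chris), 1))
print("Difference in means, n = 1000:", round(mean_difference(jamie), 1))
```

Rerunning the sketch with different seeds gives a rough feel for how much the difference in sample means moves around the population difference of 31 points at each sample size.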