Confidence_Interval_Simulation-Student-Height

**Learning Objective**

Illustrate the concept of a confidence interval through simulation.

Open the file titled Student Height Data. This file contains the heights of 200 students measured in centimeters. The population mean is 150 cm and the population standard deviation is 12.32 cm (see the Standard Deviation Hat Plot). Copy the Height attribute data and paste it into an empty mixer (sampler engine). Set the repeat value to 35 and the draw value to 1. Rename the attribute Ht.

Be sure to select "Without Replacement" in the mixer's menu. This prevents a single student from being drawn twice in the sample of 35 students.

Run the sampler once. Plot the Ht data and show the mean for this sample of 35 students.

Use the History tool to capture the sample mean. Collect 99 additional sample means in the History of Results of Sampler 1 case table. Note that this runs faster if the animation is turned off.
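For readers who want to mirror the sampler outside the software, here is a minimal Python sketch of the same procedure. The Student Height Data file is not reproduced here, so the `population` list is a hypothetical stand-in: 200 randomly generated heights shifted to have a mean of 150 cm. The function name `draw_sample_mean` and the seed are also assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(2016)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical stand-in for the 200 student heights (cm); NOT the real data file.
raw = [rng.gauss(150, 12.32) for _ in range(200)]
shift = 150 - statistics.mean(raw)
population = [h + shift for h in raw]  # shifted so the population mean is 150 cm

def draw_sample_mean(pop, n=35):
    """Draw n heights without replacement and return their mean."""
    sample = rng.sample(pop, n)  # sample() never picks the same student twice
    return statistics.mean(sample)

# Collect 100 sample means, like the History table after 100 sampler runs.
sample_means = [draw_sample_mean(population) for _ in range(100)]
print(len(sample_means), round(min(sample_means), 2), round(max(sample_means), 2))
```

Each entry in `sample_means` plays the role of one row in the History of Results of Sampler 1 case table.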

Plot the 100 sample means. Change the icon color if necessary (red is used in this example).

Add four attributes to the History of Results of Sampler 1 case table as follows. Recall that a 95% confidence interval requires a z-value of 1.96.

|| **Attribute Name** || **Formula** ||
|| margin_error || 1.96*12.32/sqrt(35) ||
|| lower_endpt || mean_Ht - margin_error ||
|| upper_endpt || mean_Ht + margin_error ||
|| Capture || If (upper_endpt < 150 or lower_endpt > 150) then "miss" else "capture" ||

In the case table shown below, the lower_endpt attribute is sorted in descending order. This allows us to see that the first interval ranges from 150.29 cm to 158.45 cm, which does not contain the population mean of 150 cm. The capture formula correctly identifies this as a “miss.”

In the case table shown below, the upper_endpt attribute is sorted in ascending order. This allows us to see that the first interval ranges from 141.60 cm to 149.77 cm, which does not contain the population mean of 150 cm. The capture formula correctly identifies this as a “miss.”

When the Capture attribute is selected on the History of Results of Sampler 1 plot, the sample means in the tails of the dot plot change to a different icon color. These are the sample means whose confidence intervals do not capture the population mean of 150 cm. [In general, we expect to see about five intervals that “miss” out of 100 for a 95% confidence level.]
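The attribute formulas in this step (margin_error, lower_endpt, upper_endpt, Capture) can be sketched in Python as follows. The function name `interval_and_capture` is hypothetical, and the sample mean of 154.37 cm in the usage lines is an assumption, taken as the midpoint of the 150.29 to 158.45 interval mentioned in this step.

```python
import math

Z_95 = 1.96    # z-value for a 95% confidence interval
SIGMA = 12.32  # population standard deviation (cm)
N = 35         # sample size
MU = 150       # population mean (cm)

# Same as the margin_error attribute: 1.96*12.32/sqrt(35)
margin_error = Z_95 * SIGMA / math.sqrt(N)

def interval_and_capture(mean_ht):
    """Return (lower_endpt, upper_endpt, Capture) for one sample mean."""
    lower_endpt = mean_ht - margin_error
    upper_endpt = mean_ht + margin_error
    # Same logic as the Capture attribute formula in the table.
    capture = "miss" if (upper_endpt < MU or lower_endpt > MU) else "capture"
    return lower_endpt, upper_endpt, capture

print(round(margin_error, 2))
print(interval_and_capture(154.37))  # assumed mean; interval lies entirely above 150
print(interval_and_capture(150.0))   # interval centered on 150, so it captures
```

An interval is a "miss" only when it lies entirely above or entirely below 150 cm, which is why the formula tests both endpoints.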

The plot below shows the two misses, one in each tail of the distribution of the sample means.

To run the simulation again, delete all history cases and collect 100 new sample means. The simulation shown below contains three confidence intervals that do not contain the population mean; in each case the lower endpoint is above 150 cm.

A third run of the simulation shows four confidence intervals that miss the population mean.

In real life, we typically have only enough time and resources to draw a single sample from the population. Based on that single sample mean, we estimate the true mean. For example, if the sample mean is 147.9 cm, the 95% confidence interval is 147.9 +/- 4.08, or (143.8, 152.0).
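The single-sample interval can be checked with a few lines of Python. This is only a sketch: the helper name `confidence_interval` is hypothetical, and it assumes the population standard deviation of 12.32 cm is known, as it is in this activity.

```python
import math

def confidence_interval(sample_mean, sigma=12.32, n=35, z=1.96):
    """Return (lower, upper) for a z-based confidence interval with known sigma."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_interval(147.9)
print(round(lower, 1), round(upper, 1))
```

The defaults match the values used throughout this activity; changing `z` or `n` shows how the width of the interval responds.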
**Summary**

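On repeated random sampling, about 95% of the intervals constructed this way will contain the population mean for this population of students. The interval between 143.8 cm and 152.0 cm is one such interval, so we are 95% confident that it contains the population mean.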

Stated another way, there is a 95% probability that a sample mean will fall within 150 +/- 4.08 cm, that is, between 145.92 cm and 154.08 cm.

In the plot shown below, three sample means fall outside the interval (145.92, 154.08).
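The two ways of stating the result describe the same events: a sample mean falls outside 150 +/- 4.08 exactly when the interval centered on that sample mean misses 150. The Python sketch below counts both for 100 simulated sample means. The population here is a hypothetical stand-in (randomly generated heights shifted to a mean of 150 cm), not the actual Student Height Data, so the counts will not match the plots in this activity.

```python
import math
import random
import statistics

MU, SIGMA, N, Z_95 = 150, 12.32, 35, 1.96
margin_error = Z_95 * SIGMA / math.sqrt(N)  # about 4.08 cm

# Hypothetical stand-in population of 200 heights, shifted so its mean is 150 cm.
rng = random.Random(7)  # arbitrary seed for repeatability
raw = [rng.gauss(MU, SIGMA) for _ in range(200)]
shift = MU - statistics.mean(raw)
population = [h + shift for h in raw]

# 100 sample means, each from 35 students drawn without replacement.
sample_means = [statistics.mean(rng.sample(population, N)) for _ in range(100)]

# View 1: sample means that fall outside 150 +/- margin_error.
outside = [m for m in sample_means if abs(m - MU) > margin_error]

# View 2: intervals centered on each sample mean that miss 150.
misses = [m for m in sample_means
          if m + margin_error < MU or m - margin_error > MU]

print(len(outside), len(misses))  # the two counts agree
```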



Activity posted July, 2016.