MAD

** Introduction **
This investigation uses TinkerPlots Dynamic Data Exploration (Version 2) to help students understand and calculate the Median Absolute Deviation as a measure of variation. This tutorial includes two main tasks with specific objectives. Fostering conceptual understanding is dependent on completing the entire sequence of tasks in the order presented.

//__ Task A – Build understanding of the data set and point estimates __//

 * Discuss the context of the data set and how the data was generated.
 * Graph the information with parallel dot plots and parallel box plots.
 * Find measures of center (mean, median) and spread (interquartile range) to compare two populations.

//__ Task B – Build Conceptual Knowledge of the Median Absolute Deviation __//

 * Find the difference between the median and a single data point, for each data point.
 * Find the absolute value of each “difference from the median” using both a table and a graph.
 * Find the median of the absolute differences in a table and a graph.

** Common Core State Standards Addressed **
__ Grade Six Statistics and Probability #6.SP.4 __ Display numerical data in plots on a number line, including dot plots, histograms, and box plots.

__ Grade Six Statistics and Probability #6.SP.5 __ Summarize numerical data sets in relation to their context, such as by:
 * Reporting the number of observations.
 * Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.
 * Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.
 * Relating the choice of measures of center and variability to the shape of the data distribution and the context in which the data were gathered.

** Mathematical Practices **
// #2 Model the situation // // #5 Appropriate Tool use //

** Prerequisites ** (None)

** Materials **
TinkerPlots Dynamic Data Exploration (Version 2). Choose **File | Open Sample Document | Data and Demos | Backpacks.tp**. Optional – Scale and student backpacks

** Question 1. What is the context of the data set? **
The body weights and pack weights of 79 students in a specific school were recorded in a TinkerPlots collection. Each distinct object in a data collection is represented with a data card. The card in figure 1 shows the information (variables) for a student named Jim. The first three categorical variables provide demographic information about the students (name, gender, and grade level). The numeric data was collected with a scale using pounds as the unit of measure. Students stood on a scale to find their body weight and then measured their backpacks on the scale. An additional variable was created in TinkerPlots to express the pack weight as a percent of the student’s body weight. It is not clear what sampling method was used, so the results should not be generalized to the population of all elementary students. The results discussed here are specific to the students surveyed in this school, in this town and this part of the country.

Caution – There is no information given that indicates a random sampling of students was performed.

Hint - You may delete the two text boxes to create more workspace.

Figure 1. Sample data card
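
The percent attribute described above can be sketched outside TinkerPlots as a one-line calculation. This is only an illustration: the function name and the sample weights are made up for this sketch and are not taken from Backpacks.tp or from Jim's data card.

```python
def percent_of_body_weight(pack_weight, body_weight):
    """Return the pack weight as a percent of the body weight (both in pounds)."""
    return 100 * pack_weight / body_weight

# Hypothetical student: a 9-pound pack carried by a 90-pound student.
example_percent = percent_of_body_weight(9, 90)
print(example_percent)  # 10.0
```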

** Question 2. Describe the center of the data set with a single numeric value. **
To make this question more interesting, I separated the pack weight data into two parallel dot plots by placing **Gender** on the //y//-axis and **PackWeight** on the //x//-axis. The mean and median tools were used to show the mean and median pack weight values for males and females. The dot plots show that the data has a left wall, so the distribution is skewed to the right. This means that a few students carry backpacks that are unusually heavy, especially the student with the 39-pound pack. The real question is **what would make a better choice for the center, the mean or the median?** I will choose the median because the mean is pulled upward by the packs that are unusually heavy. The median pack weight for this group of males is 8.5 pounds and for this group of females the median pack weight is 7 pounds.

Hint – Dot plots are created by first separating all icons horizontally and **then** stacking them vertically. Be sure to select the PackWeight attribute! To create the parallel plot shown here, drag Gender to the vertical axis.

Hint – Use the mean/median menu to Show Numeric Values.

Figure 2. Measures of center for male and female pack weights
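
The effect of one unusually heavy pack on the mean and the median can be checked with a small sketch. The weights below are hypothetical values chosen for illustration, not the Backpacks.tp data; only the 39-pound pack echoes the value discussed above.

```python
from statistics import mean, median

# Hypothetical pack weights in pounds (not the Backpacks.tp values).
typical_packs = [4, 5, 6, 7, 8, 9, 10]
with_heavy_pack = typical_packs + [39]  # add one unusually heavy pack

# Without the heavy pack: mean 7, median 7.
print(mean(typical_packs), median(typical_packs))

# With the heavy pack: the mean jumps to 11, the median only moves to 7.5.
print(mean(with_heavy_pack), median(with_heavy_pack))
```

In this made-up set, a single pack moves the mean by 4 pounds but the median by only half a pound, which is the behavior that motivates choosing the median for skewed data.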

** Question 3. Describe the variation of the data set with a single numeric value. **
This question can be quickly answered by using one of two values, the range or the interquartile range. The range is the difference between the maximum and minimum values. If we use the range (max – min) to report the variability, we would report 36 pounds for females and 24 pounds for males. The female range is inflated by the single unusual value of 39 pounds, which illustrates why the range is not a reliable measure of variation. Figure 3 shows the range for the female packs.

Hint – Snap the ruler tool to the individual icons to measure the range. Note the value is recorded in the lower left corner of the window.

Figure 3. Range for female backpack data

The interquartile range (IQR) is the difference between the third and first quartiles (75th percentile and the 25th percentile). Students can find the interquartile range using the **box plot** and **ruler** tools in TinkerPlots. In figure 4, the ruler was attached to the first and third quartiles of the box plot for male pack weight. The interquartile range is 10 pounds for the males. The process was repeated for the females to find an interquartile range of 8 pounds. The lower IQR for the female packs makes sense because a visual inspection shows slightly less variability for the female packs (males have five packs over 20 pounds versus three for the females).

Hint - Hide the icons with the icon menu (in the plot's menu bar).



Figure 4. Interquartile range for male pack weight
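
The contrast between the range and the interquartile range can be sketched with a few lines of code. The weights are hypothetical, and the quartile rule used here (medians of the lower and upper halves) is an assumption: it is one common convention, and the tutorial does not state which rule the TinkerPlots box plot uses.

```python
from statistics import median

def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR as the median of the upper half minus the median of the lower half.

    Assumed convention: with an odd count, the middle value is left out of
    both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(upper_half) - median(lower_half)

# Hypothetical pack weights in pounds (not the Backpacks.tp values).
packs = [3, 4, 5, 6, 7, 8, 9, 10, 12]
packs_with_heavy = packs[:-1] + [39]  # swap the heaviest pack for a 39-pound pack

print(value_range(packs), interquartile_range(packs))                        # 9 5.0
print(value_range(packs_with_heavy), interquartile_range(packs_with_heavy))  # 36 5.0
```

In this sketch the range quadruples when one pack changes, while the IQR does not move, because the quartiles ignore the extreme values.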

//TASK B//
The median (or mean) absolute deviation can be used to represent variation within a given data set. I am going to use the median absolute deviation because of the unusually high value of 39 pounds (a suspected outlier). I will illustrate the calculations with two different representations: a **Case Table** with formulas and a **Plot** with the ruler tools.
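
For reference, the median absolute deviation can be written as a formula. The notation is an assumption introduced here (the tutorial works only with tables and plots): the pack weights are written as x_1 through x_n and their median as m.

```latex
\[
m = \operatorname{median}(x_1, \dots, x_n), \qquad
\mathrm{MAD} = \operatorname{median}\bigl(|x_1 - m|, \dots, |x_n - m|\bigr)
\]
```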

Hint - I used the female data set for this discussion. Open a new Backpacks.tp file. Delete the male student data and delete the percent weight attribute. SAVE this new file with a new name!


** Question 1. What is the deviation from the median for __each__ backpack? **

I need to find the deviation (difference) between each pack weight and the median. Figure 5 shows a graphic representation of the difference between Merinda’s 19-pound pack and the median of 7 pounds. The figure also shows a Case Table with two additional attributes. The formulas for Difference and Abs_Diff can be added in the Data Card or in the Case Table.

What you might notice is that all of the differences to the right of the median in the dot plot will be positive, but the differences to the left will be negative. For example, Wendie (row 22) has a 5-pound pack so the difference from the median is -2 pounds. This is a problem since we want to find the **average deviation** (in this case we will use the median as our measure of center).

Our goal is to find the **average** of the differences. If I add up all of the differences, the negative values will cancel some of the positive values, so the sum will understate how far the packs are from the median. This is where the term “absolute” comes into the formula of the MAD. If I take the absolute value of the differences, the deviations will all be positive. Note that the plot has a **Measure All** menu item that can be used to find the sum of all of the deviations. This will be discussed next.

Figure 5. Deviation from the median
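
The two Case Table attributes can be mimicked with a short sketch. The weights are hypothetical values chosen so that the median is 7 pounds and so that a 5-pound and a 19-pound pack appear, as in the examples above; they are not the actual female data, and the variable names only mirror the Difference and Abs_Diff attributes.

```python
from statistics import median

# Hypothetical pack weights in pounds, chosen so the median is 7.
pack_weights = [3, 5, 6, 7, 8, 10, 19]
center = median(pack_weights)

# Mirrors the Difference and Abs_Diff attributes from the Case Table.
difference = [w - center for w in pack_weights]
abs_diff = [abs(d) for d in difference]

print(difference)       # [-4, -2, -1, 0, 1, 3, 12]
print(sum(difference))  # 9: negative and positive differences partly cancel
print(sum(abs_diff))    # 23: total distance from the median
```
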
** Question 2. What is the sum of the deviations from the median for __all of the__ backpacks? **

Figure 6a illustrates that the **Measure All** ruler tool actually measures each individual difference (similar to the Difference attribute in the Case Table). To achieve this, do not "stack" the icons in the dot plot. In figure 6b, the ruler tool menu was used to find the sum of the deviations or differences for a stacked dot plot (left plot) and the sum of the **absolute value** of the differences (right plot). Note that the formulas shown in the lower left corner of the plots were found by clicking on the **Measure All** button. Recall from the discussion above that the differences must all be positive to give a meaningful sum, because our goal is to find out how far the points are, on average, from the median of 7 pounds.

Figure 6a. Sum of deviations from the median in an unstacked dot plot

Figure 6b. Sum of deviations in the stacked dot plot

Figure 7 shows the options used to create the second plot in figure 6b. In question three, I will use the option titled “Median of Differences” to automatically calculate the MAD.

Figure 7. Ruler Menu
** Question 3. What is the median absolute deviation for the pack weights? **

To illustrate the answer to this question for the female pack weights, I am going to return to the **Case Table** and plot the Abs_Diff attribute (see Figure 8 - top plot). The goal is to find the median of the absolute differences, which can be done with the median tool. Note that the same value, 3 pounds, is calculated when both the **Median of Differences** AND **Absolute Differences** options are selected in the ruler tool's option menu (Figure 8 - bottom plot). Note the syntax of the formula in the lower left corner of the plot: "Median of | Diff | of 39 cases = 3".

Figure 8. Median of the deviations for the female pack weights from the plot (top) and ruler tool (bottom).

In figure 9, the ruler tool was used to find the MAD for both male and female backpack data. The calculation shows the variability for female packs (3 pounds) is slightly less than for males (3.5 pounds). A more interesting comparison can be made when I investigate the MAD for each grade level as shown in figure 10. The MAD for grades five and seven is three times the MAD for grades one and three.
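
A group-by-group MAD comparison like the one above can be sketched as follows. The function name, the group labels, and the weights are all hypothetical and chosen only to illustrate the calculation; the results do not reproduce the 3-pound and 3.5-pound values found in TinkerPlots.

```python
from collections import defaultdict
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute differences from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Hypothetical (group, pack weight in pounds) records, not the Backpacks.tp data.
records = [
    ("F", 3), ("F", 5), ("F", 6), ("F", 7), ("F", 8), ("F", 10), ("F", 19),
    ("M", 2), ("M", 4), ("M", 8), ("M", 9), ("M", 13), ("M", 15), ("M", 20),
]

# Collect the weights for each group, then compute one MAD per group.
weights_by_group = defaultdict(list)
for group, weight in records:
    weights_by_group[group].append(weight)

mad_by_group = {
    group: median_absolute_deviation(weights)
    for group, weights in weights_by_group.items()
}
print(mad_by_group)  # {'F': 2, 'M': 5}
```

The same grouping step would work for grade level instead of gender by changing the first item in each record.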

Figure 9. Median Absolute Deviation for female and male pack weight data.

Here is a help video

Figure 10. Median Absolute Deviation for grade level data.

This concludes the Median Absolute Deviation Tutorial. Please contact me if you have suggested improvements for this page of the Wiki, questions about the technology or pedagogical approach, or additional activities with different data sets that you would like to share.