by David A. Corliss, Ph.D.
The problem
The question and answer below appeared in Marilyn vos Savant’s column in the April 22, 2007 edition of Parade magazine. Their brevity and apparent simplicity belie the depth of knowledge that a reader interested in the accuracy of the answer must possess. In this analysis I explore the elements of both the question and the answer in detail. My goal is to expose the higher-order skills required to understand where the reader went wrong, test the accuracy of the calculations embedded in the answer, judge the reasonableness of the conclusion, and show how, as Marilyn herself states, “When statistics like this are quoted out of context, they can be misleading.”
The question
“If half of the children 14 and under who die in car crashes are not buckled, boostered or otherwise restrained, doesn’t this mean that half of the children are appropriately secured? If so, wouldn’t this also mean that the chances of a child surviving a crash are 50-50, restrained or not?”
The answer
“Yes to the first question, but not to the second. When statistics like this are quoted out of context, they can be misleading. You need more information. To illustrate, suppose that 90 percent of children involved in car crashes (not just fatal ones) are secured, and 10 percent are not. Now say that 20 percent of these accidents cause a fatality, half with the children secured and half with a child who is not. This would mean that every unrestrained child involved in an accident was killed, but only one out of nine restrained children was killed. You’d draw a very different conclusion, wouldn’t you?”
Comprehending the reader’s question
The first part of the reader’s question seems straightforward. The group of interest is children 14 and under who die in car crashes. The reader first states the apparent fact that half of that group was not secured, then simply asks whether the other half was secured. This implies that there is no alternative to being secured or not. That seems logical, but we will see later that, from a data perspective, there is a third alternative: we simply do not know.
The second question the reader asks is more complicated. Whereas in the first case the group of interest included only those who died in car crashes, this question draws a conclusion about the fraction surviving. Therefore, the group of interest is all children involved in car crashes. It infers something about one group from data about another group, and that is where the reader goes wrong. As Marilyn says, “You need more information.”
Comprehending the answer
Marilyn states that the answer to the first question is yes. As indicated above, this assumes no third alternative, a fair assumption in reality, but not from a data perspective. In any case, all the alternatives must together account for 100% of the group.
The next part of the answer says that statistics like these can be misleading when quoted out of context, and that more information is necessary. The statistic itself is not misleading; it is a fact that has been used incorrectly because the reader did not account for the fact that the group of all fatalities is not the same as the group of all children involved in crashes. The reader jumped from correctly interpreting the statistic about one group to making an inference about another group without enough information.
The remainder of the answer is dedicated to working through a hypothetical set of data. It supposes some additional percentages that, when combined with what the reader knows, can be used to draw conclusions. The task in this deconstruction now becomes one of proving that Marilyn’s conclusions are correct.
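To make the gap between the two groups concrete, here is a minimal sketch that converts "share of deaths in a group" into "risk of death for a child in that group" using Bayes' rule. The function name `risk_given_group` is hypothetical. The first scenario uses the percentages Marilyn supposes in her answer; the second is a made-up contrast case, not real data, in which half of all children in crashes are unrestrained.

```python
def risk_given_group(share_of_deaths, overall_fatality_rate, share_of_children):
    """P(died | group) = P(group | died) * P(died) / P(group)."""
    return share_of_deaths * overall_fatality_rate / share_of_children

# Marilyn's hypothetical: 20% of crashes are fatal, 10% of children unrestrained.
marilyn_unrestrained = risk_given_group(0.5, 0.20, 0.10)
marilyn_restrained = risk_given_group(0.5, 0.20, 0.90)

# Made-up contrast case (assumption): half of all children are unrestrained.
even_unrestrained = risk_given_group(0.5, 0.20, 0.50)
even_restrained = risk_given_group(0.5, 0.20, 0.50)

print(f"Marilyn's case: unrestrained {marilyn_unrestrained:.1%}, "
      f"restrained {marilyn_restrained:.1%}")
print(f"Contrast case:  unrestrained {even_unrestrained:.1%}, "
      f"restrained {even_restrained:.1%}")
```

In both scenarios half of the deaths are unrestrained children, yet the risks differ only in the first. The 50% figure alone says nothing about survival until the share of all children who are unrestrained is known.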
Structuring the problem for solution
The first supposition is “…that 90 percent of children involved in car crashes (not just fatal ones) are secured, and 10 percent are not.” This means that we are now dealing with a group that includes all children involved in car crashes and that they have been divided into secured (90%) and not secured (10%).
The second supposition is “…that 20 percent of these accidents cause a fatality, half with the children secured and half with a child who is not.” We are now dealing with a group that includes all car crashes involving children, only 20% of which cause the child’s death.
So, to sum up, we have statistics on three groups: the first is the group of children who die in accidents, the second is all children involved in accidents, and the third is all accidents. Structuring all this information into a form that lends itself to drawing conclusions is the challenge that we now face. It is at this stage that the higher-order skills embodied in the literacy part of quantitative literacy come into play. That is, you should have been exposed to something somewhere in your education that seems similar to the problem at hand.
For example, when faced with a collection of groups like this, one might dredge up the concept of Venn diagrams from a high school or college math course. The stimulus is, of course, the idea that fractions of these groups have some things in common and some things not in common. One example might be that of the 90% of all children involved in car crashes, some were secured, a characteristic they share with half of the 20% of all children who died in car crashes. Before reading on, I suggest that you try to construct some diagrams to capture these intersections.
If you attempted to construct some Venn diagrams, you should have come up with something like the one shown here. If you tried at all to represent the proportions of children in the various groups by the size of the various circles or squares, you probably found it pretty confusing. These diagrams do not lend themselves to solving this problem with the numbers as presented, so some other approach is necessary. They do provide a clue, however.
One of the fundamental QL Core Competencies is that students should be able to read, interpret and draw conclusions from tables. They are similar to Venn diagrams in that you can overlap characteristics, but they lend themselves quite nicely to quantitative solutions. Somewhere in your education you should have been exposed to this task. If using tables came to mind, then you can much more easily structure the problem for solution.
Consider the following steps in which the cells of the tables correspond to the sets in the diagram above. The color coding of the four cells in the middle corresponds to the colors of the sets in the diagram above.
Step 1. Where to put the first supposition
“To illustrate, suppose that 90 percent of children involved in car crashes (not just fatal ones) are secured, and 10 percent are not.” These two percentages are the row totals in the following table since they include both nonfatal and fatal injuries. You can see immediately from where these numbers go why the Venn diagram did not work very well. It is the color-coded numbers that we need, not the “marginal” numbers.
Percent of children in car crashes

| | Nonfatal | Fatal | Total |
|---|---|---|---|
| Secured | | | 90% |
| Not secured | | | 10% |
| Total | | | 100% |
Step 2. Where to put the second supposition
“Now say that 20 percent of these accidents cause a fatality, half with the children secured and half with a child who is not.” The total percentage of fatal injuries out of all children involved in car accidents is 20% so that number is entered as a column total, another “marginal” number. Since half the children in the fatal injury column are secured and half are not, the 20% gets distributed equally between the two conditions. These are the first two data cells that can be filled.
Percent of children in car crashes

| | Nonfatal | Fatal | Total |
|---|---|---|---|
| Secured | | 10% | 90% |
| Not secured | | 10% | 10% |
| Total | | 20% | 100% |
Step 3. Fill in the remaining cells of the table
Since all the row and column totals have to add up so the table total is 100%, the last three cells are filled in by simple subtraction.
Thus, as Marilyn concluded, the 0% in the Not Secured/Nonfatal cell indicates that no unsecured child survives. Or, said another way, 100% (10%/10%) of the unsecured children are in the fatal column.
Percent of children in car crashes

| | Nonfatal | Fatal | Total |
|---|---|---|---|
| Secured | 80% | 10% | 90% |
| Not secured | 0% | 10% | 10% |
| Total | 80% | 20% | 100% |
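The three steps can also be sketched in a few lines of code. This is only an illustration of the subtraction described in Step 3; the variable names are hypothetical, and the starting values are the marginal totals from Marilyn's two suppositions.

```python
# Step 1: row totals (percent of all children involved in car crashes).
secured_total, unsecured_total = 90, 10

# Step 2: 20% of crashes are fatal, split evenly between the two rows.
fatal_total = 20
secured_fatal = fatal_total / 2
unsecured_fatal = fatal_total / 2

# Step 3: the remaining cells follow by subtraction from the row totals.
secured_nonfatal = secured_total - secured_fatal
unsecured_nonfatal = unsecured_total - unsecured_fatal
nonfatal_total = secured_nonfatal + unsecured_nonfatal

table = {
    "Secured": (secured_nonfatal, secured_fatal, secured_total),
    "Not secured": (unsecured_nonfatal, unsecured_fatal, unsecured_total),
    "Total": (nonfatal_total, fatal_total, secured_total + unsecured_total),
}

print(f"{'':<12}{'Nonfatal':>9}{'Fatal':>7}{'Total':>7}")
for row, (nonfatal, fatal, total) in table.items():
    print(f"{row:<12}{nonfatal:>8.0f}%{fatal:>6.0f}%{total:>6.0f}%")
```

Because every cell is derived from the three marginal numbers, changing any of the suppositions at the top regenerates the whole table.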
Now that we have constructed the hypothetical, let’s go back to the reader’s questions and reexamine the answers. The first was “…doesn’t this mean that half of the children are appropriately secured?” Marilyn’s answer to this question was yes. This is true only if you consider the fatal column, however, so that answer simply repeats the statistic that prompted the question in the first place. In the hypothetical as she constructed it, 90% of ALL children are appropriately restrained.
The second question was, “If so, wouldn’t this also mean that the chances of a child surviving a crash are 50-50, restrained or not?” The answer is clearly no since she set the hypothetical up so that 20% of children in car accidents died, restrained or not.
It is clear that the answer she wanted by providing “more information” was the 0% survival for unrestrained children. If this were true, it would certainly be strong motivation for all parents to restrain all kids, but is it true? As a matter of common sense, would one really expect no survivors among all the unrestrained children who are injured? Not really.
Testing the veracity of the hypothetical
When thinking about this kind of problem it is important to ask whether the numbers make sense or not, i.e., to make a judgment. If not, then what does make sense?
I expect that very few people actually carry accident statistics around in their heads so a little research is required. Though it was not necessarily a straightforward search, I found relevant statistics on the web at http://wwwnrd.nhtsa.dot.gov/pdf/nrd30/NCSA/TSFANN/TSF2005.pdf. (Traffic Safety Facts 2005: A Compilation of Motor Vehicle Crash Data from the Fatality Analysis Reporting System and the General Estimates System, National Highway Traffic Safety Administration, National Center for Statistics and Analysis, U.S. Department of Transportation, Washington, DC 20590.)
This document contains a wealth of information. The most appropriate table can be found on page 119 (135 of the PDF). From that table we can reconstruct tables similar to the ones above. There is a catch, however: in the real data it is often unknown whether a child was restrained or not. It turns out that this additional (lack of) information does not change the conclusions significantly when more appropriate calculations of risk are done.
The following tables show the correct data for children 15 and under. The first one includes the actual numbers and the second shows the calculated percentages.
Number of children <= 15 injured in car/light truck crashes

| | Nonfatal | Fatal | Total |
|---|---|---|---|
| Restrained | 191,000 | 744 | 191,744 |
| Unrestrained | 25,000 | 740 | 25,740 |
| Unknown | 13,000 | 133 | 13,133 |
| Total | 229,000 | 1,617 | 230,617 |

Percent of children <= 15 injured in car/light truck crashes

| | Nonfatal | Fatal | Total |
|---|---|---|---|
| Restrained | 82.8% | 0.3% | 83.1% |
| Unrestrained | 10.8% | 0.3% | 11.2% |
| Unknown | 5.6% | 0.1% | 5.7% |
| Total | 99.3% | 0.7% | 100.0% |
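The percentages can be reproduced from the counts with a short script. This is a sketch, not part of the NHTSA report; the names `counts` and `pct` are hypothetical, and each cell is expressed as a share of all 230,617 injured children, rounded to one decimal place.

```python
# (nonfatal, fatal) counts for each restraint status, from the table of numbers.
counts = {
    "Restrained": (191_000, 744),
    "Unrestrained": (25_000, 740),
    "Unknown": (13_000, 133),
}

grand_total = sum(nonfatal + fatal for nonfatal, fatal in counts.values())

def pct(n):
    """Share of all injured children, as a percentage rounded to one decimal."""
    return round(100 * n / grand_total, 1)

percent_table = {
    status: (pct(nonfatal), pct(fatal), pct(nonfatal + fatal))
    for status, (nonfatal, fatal) in counts.items()
}

for status, (nonfatal, fatal, total) in percent_table.items():
    print(f"{status:<13}{nonfatal:>6}%{fatal:>6}%{total:>6}%")
```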
The first thing to notice is that the numbers of fatalities among restrained and unrestrained children are essentially equal. This agrees with the statistic that prompted the reader’s original question, but that is about the only thing that corresponds to the hypothetical. It is clear that the assumption of 20% of all injuries being fatal is not even close to reality! It is actually less than 1%. Nor is the 90%-10% split between restrained and unrestrained correct.
At first glance the numbers and percentages in these tables seem to diminish the point that Marilyn was apparently attempting to reinforce, that being that there is a very real advantage to restraining children riding in cars. There are a number of ways of drawing this conclusion from the real data and I leave it to the reader to do so. As a starting point one could, for example, simply ask how many lives would have been saved had all children sustaining injuries been restrained. There are, however, much better calculations that can be done to assess the risk of fatalities when children are unrestrained.
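As one possible starting point, the sketch below compares the fatality risk within each group and makes a crude estimate of the lives-saved question. It rests on assumptions that are not in the source data: the "Unknown" row is left out, and unrestrained children are assumed to face the restrained fatality rate had they been restrained, which ignores any other differences between the two groups. The names are hypothetical.

```python
# (nonfatal, fatal) counts from the table of actual numbers.
# Assumption of this sketch: the "Unknown" row is excluded.
restrained = (191_000, 744)
unrestrained = (25_000, 740)

def fatality_risk(nonfatal, fatal):
    """Fraction of injured children in a group whose injuries were fatal."""
    return fatal / (nonfatal + fatal)

risk_restrained = fatality_risk(*restrained)
risk_unrestrained = fatality_risk(*unrestrained)
relative_risk = risk_unrestrained / risk_restrained

# Crude estimate (assumption): apply the restrained risk to the unrestrained group.
expected_deaths = sum(unrestrained) * risk_restrained
lives_saved = unrestrained[1] - expected_deaths

print(f"Risk if restrained:   {risk_restrained:.2%}")
print(f"Risk if unrestrained: {risk_unrestrained:.2%}")
print(f"Relative risk:        {relative_risk:.1f}")
print(f"Estimated lives saved if all had been restrained: {lives_saved:.0f}")
```

Even though the two groups contribute nearly equal numbers of deaths, the unrestrained group is far smaller, so the risk for an individual unrestrained child comes out several times higher.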