IRBs often set an arbitrary grade-level requirement (such as sixth or eighth grade) at which consent forms are supposed to be
written. A recent review1 of 114 Web sites of U.S. medical schools found readability standards between fifth and tenth grade, although actual grade
levels of their consent form templates averaged almost three grades higher than their recommended grade levels.
The assumption behind such recommendations is that subjects who can't understand a consent form at a third-year college level
will understand one written at an eighth-grade reading level. Sometimes this recommendation comes from data showing that average
Americans read at an eighth-grade level; sometimes it comes from intuitive beliefs that anything written at a lower grade
level simply must be easier to understand. But this recommendation is flawed for at least two reasons. The first is that writing
at an eighth-grade level is very hard; the Flesch Reading Ease Score2 describes materials at a sixth- to eighth-grade reading level as having 14 to 17 words per sentence and 139 to 147 syllables per
100 words. This translates into many one- or two-syllable words. Second, how should writers measure a consent form's grade
level? Many IRB Web sites recommend readability software, often suggesting the Flesch-Kincaid formula in Microsoft Word. But
Microsoft's version of that formula is flawed: any score at grade 12 or above is reported as grade 12, and its reported grade
level is affected by the document's format.4 Although the Flesch-Kincaid formula scores up to grade 17, Microsoft's version is too unreliable and inaccurate to recommend or use,
since researchers cannot reliably verify the grade level of a consent form with it.
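The figures quoted from the Flesch table can be checked against the published formulas. The sketch below is only an illustration, not part of the original analysis: it assumes the standard Flesch Reading Ease and Flesch-Kincaid Grade Level coefficients, takes words per sentence and syllables per 100 words as given rather than counting them from text, and uses hypothetical function names. The `cap_at_grade_12` helper merely mimics the ceiling described above; it is not Microsoft's code.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_100_words):
    """Flesch Reading Ease (higher = easier), standard published coefficients."""
    syllables_per_word = syllables_per_100_words / 100
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence, syllables_per_100_words):
    """Flesch-Kincaid Grade Level, standard published coefficients."""
    syllables_per_word = syllables_per_100_words / 100
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def cap_at_grade_12(grade):
    """Hypothetical helper: report any grade of 12 or above as 12."""
    return min(grade, 12.0)


if __name__ == "__main__":
    # The two ends of the sixth- to eighth-grade range quoted in the text.
    for wps, syl in [(14, 139), (17, 147)]:
        ease = flesch_reading_ease(wps, syl)
        grade = flesch_kincaid_grade(wps, syl)
        print(f"{wps} words/sentence, {syl} syllables/100 words: "
              f"ease {ease:.1f}, grade {grade:.1f}")
```

Under these assumptions, the two ends of the quoted range work out to roughly grade 6.3 and grade 8.4, which is consistent with the sixth- to eighth-grade band. A ceiling like the one in `cap_at_grade_12` would make a grade 16 consent form indistinguishable from a grade 12 one.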
 Table 1. Summary of reading comprehension and reading grade level
Few studies have been done on the impact of rewritten materials on reader comprehension. Table 1 summarizes nine studies comparing
comprehension of higher-grade-level documents with versions rewritten at a lower grade level. The nine studies, which span more than
20 years, cover informed consent forms as well as other rewritten materials.
One study in progress13 compares an investigator-developed consent form with one developed by focus groups for a U.S. Veterans Administration study
on Gulf War Illnesses. Although the focus group made seven significant changes, such changes (see Table 2) were not reflected
in an independent statistical readability analysis (using Prose: The Readability Analyst, Grammatik 6.0, and WStyle) of the
two versions.
Although the two versions were almost statistically identical, changes recommended by the focus group may still produce differences
in understanding. Whether those differences are detected may depend on whether the researchers' questionnaire is sensitive enough.
Limitations of rewritten studies
Subject education. Most study populations are college-educated people who should have better reading and comprehension skills than those without
a college education. They have more years of formal education, which brings better-developed abstract thinking
skills, larger vocabularies, and more experience reading complicated text. For this reason, some consent form comprehension
studies may show better understanding simply because these subjects are better educated. What is needed are comprehension
studies that include a broader range of subjects, whose educational attainment matches U.S. census data. College-educated
subjects may not find easy-to-read consent forms more comprehensible, but subjects with high school or junior high school
educations might.
Measuring comprehension. Psychological principles require comprehension measures to be both valid (i.e., they measure what they are supposed to measure)
and reliable (i.e., repeated testing produces similar scores). "Face validity," in which an instrument only looks like it
measures what it is supposed to measure, has no scientific value.
 Table 2. Readability of investigator-developed vs. focus group-developed consent form
Of the nine studies, just two7,10 addressed the content validity of their comprehension measures, and did so only in very vague terms. They stated only that "Content
validity for the measure was high, as evidenced by the judgments of a panel of experts who reviewed the questionnaire"7 or, citing three experts, that "Agreement was reached in each of these areas,"10 but neither discussed the statistical specifics of those assessments. These two papers were also the only ones that addressed
the reliability of their comprehension measures. Without any documented validity or reliability data for their comprehension measures,
the other studies are scientifically questionable, since they rest only on "face validity."
Different researchers use different methods for measuring comprehension, so there's no way to compare findings from different
studies. Researchers have used true-false questions,11 used multiple-choice questions,9,10 or asked subjects to paraphrase documents and answer questions.4,8
Both true-false and multiple-choice questions are flawed because subjects can get a percentage of answers correct by guessing.
Because a true-false test is simply a multiple-choice test with only two possible answers, subjects can get 50% correct by
guessing. For multiple-choice questions, subjects can get 20% correct with five possible answers, 25% correct with four, and
33% correct with three. Such methods for measuring subject comprehension may not be sufficiently sensitive to detect true
understanding.
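The guessing rates above follow directly from the number of answer options. The sketch below reproduces them and, as an added illustration that does not come from the studies reviewed here, applies the classic correction-for-guessing formula from the testing literature (right answers minus wrong answers divided by one less than the number of options). The function names and the 20-question example are hypothetical.

```python
def chance_rate(num_options):
    """Expected proportion correct from blind guessing among equally likely options."""
    return 1 / num_options


def corrected_score(num_right, num_wrong, num_options):
    """Classic correction for guessing: right - wrong / (options - 1)."""
    return num_right - num_wrong / (num_options - 1)


if __name__ == "__main__":
    # True-false is treated as a two-option multiple-choice item.
    for options in (2, 3, 4, 5):
        print(f"{options} options: {chance_rate(options):.0%} correct by guessing")

    # Hypothetical subject: 20 four-option questions, 14 right and 6 wrong.
    print(corrected_score(num_right=14, num_wrong=6, num_options=4))
```

In the hypothetical example, a raw score of 14 out of 20 shrinks to 12 after the correction, and a subject who guessed blindly on every item would be expected to score about zero.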
Since multiple-choice questions include the correct answer among the incorrect alternatives, they measure consent form recognition,
not recall. Consent researchers seem unaware of this important distinction. As a result, comprehension scores may be higher
on a multiple-choice test than they would be if subjects were asked to tell researchers what they remembered and understood
about the consent process. While these issues are not discussed in the informed consent research, they are well known in the
psychological testing literature (which is similar to the informed consent testing process), and can be found in undergraduate
textbooks such as Anne Anastasi's renowned Psychological Testing.
Psychological measurement demands that comprehension measures be both valid and reliable. Of the nine studies, only two addressed
the content validity of their comprehension measures, but did so in very vague terms.7, 10 Coyne et al. stated only that "Content validity for the measure was high, as evidenced by the judgments of a panel of experts who reviewed
the questionnaire,"7 (p. 837) but did not include specifics of those judgments. Cardinal, who used three experts, reported that "Agreement was
reached in each of these areas" (p. 296), but did not discuss specifics of that agreement.10 Coyne and Cardinal were the only researchers to address reliability of their comprehension measures. Without validity or
reliability for their comprehension measures, such studies become scientifically meaningless.
Minimally acceptable understanding? One study using multiple-choice questions9 found a statistically significant difference in understanding between a Low Reading Level consent form (grade 6) and a High
Reading Level consent form (grade 16). But the actual difference in comprehension was only one more correct answer!
Subjects with less than a high school education answered 12.88 questions correctly; those who attended college
answered 13.95 correctly; those who graduated from college answered 14.31 correctly. Based on 21 multiple-choice questions (the number of
alternatives was not given), subjects correctly answered 61% to 68% of the questions. Does that demonstrate "comprehension"?
In an academic setting, that would be a "D" grade.
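The percentages follow from dividing each group's mean number of correct answers by the 21 questions. A minimal sketch of that arithmetic, using only the figures reported above (the dictionary labels and function name are hypothetical):

```python
TOTAL_QUESTIONS = 21

# Mean number of questions answered correctly, by education level (from the text).
mean_correct = {
    "less than high school": 12.88,
    "attended college": 13.95,
    "graduated college": 14.31,
}


def percent_correct(num_correct, total=TOTAL_QUESTIONS):
    """Convert a raw number of correct answers to a percentage of the test."""
    return 100 * num_correct / total


if __name__ == "__main__":
    for group, score in mean_correct.items():
        print(f"{group}: {score} of {TOTAL_QUESTIONS} = {percent_correct(score):.0f}%")
```

Because the number of answer alternatives was not reported, these raw percentages cannot be adjusted for guessing.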
In another study that measured comprehension of a legal contract,8 a document comparable to an average consent form, subjects correctly answered 65% of the questions about the plain-language
version versus 51% about the legal contract version. This 14-percentage-point difference in comprehension was statistically significant, but
does 65% correct demonstrate "comprehension"? The authors suggest that because legal documents are complex, or don't fit with people's
understanding of the law (or what they've seen on television), plain-language legal contracts may not lead to major changes
in comprehension. The same might be said of consent forms.
A study asking subjects to paraphrase jury instructions4 found about 40% to 54% correct responses, concluding that conceptual difficulty accounted for much variation in comprehension
scores. Sentence length had no effect on subject comprehension, but educational attainment did, with the best comprehension
demonstrated by those subjects with the most years of education. A second study of jury instructions, rewritten to remove problematic
constructions, found 43% correct answers with the modified instructions versus 32% with the original. While this 11-percentage-point difference
is statistically significant, the question still remains: Does 43% correct really demonstrate understanding?
A vaccine information pamphlet study5 found that subjects better understood (by 15%) a university-designed pamphlet at a sixth-grade
reading level compared to the CDC's tenth-grade version. Comprehension was measured with nine questions; subjects correctly
answered 72% (six and a half of nine questions) on the university pamphlet and 56% (five of nine questions) on the CDC version.
Is a statistically significant difference of 1.5 answers on nine questions meaningful?