21 Jun 2008

The Predictive Validity of the SAT

Submitted by Karl Hagen
A few days ago, the New York Times carried an article about the SAT with the headline "Study Finds Little Benefit in New SAT."

Here's the lede:

The revamped SAT, expanded three years ago to include a writing test, predicts college success no better than the old test, and not quite as well as a student’s high school grades.

It does seem pretty troubling on the surface if the College Board itself can't show any benefit to the new test, since the new test is significantly longer than the old, as well as more expensive. Those essays cost a lot more money to grade than multiple-choice questions do.

Now I'm demonstrably no College Board fanboy, but when I read the article it sounded almost like a press release from Fair Test, which I don't think has ever met a standardized test that it liked. Certainly the organization has a long-standing animus towards the SAT in all its incarnations. And sure enough, here's the press release.

One obvious difference between the press release and the article is that the former is more precise about what the study was about (predicting first-year undergraduate grades), which the Times piece transmutes into the vaguer—and more damning—"college success." It's also worth noting that the lede claims that the prediction is "no better" than the old test, while the Fair Test press release calls it "not significantly better." As I read it, that suggests it is better, just not by very much.

So I sat down and actually read the College Board studies. There are two reports: one on predictive validity for freshman GPA (FGPA), and the other on differential validity, that is, on differences in predictive validity across the sub-groups who take the test: for example, how well the test predicts the performance of African Americans compared to that of European Americans, or of women compared to men.

I'll talk about the first report now and postpone discussing the second until a later post.

The meat of the first report is the finding on the correlations between the predictor variables (the three SAT sections: Critical Reading, Writing, and Math, plus high school GPA) and freshman GPA, the variable we're trying to predict. Here are the values of r, the correlation coefficient: 1 would be a perfect correlation and 0 no correlation at all. In psychological research, values between .3 and .5 are typically taken to reflect a medium correlation, and values over .5 a large one. These numbers have been corrected for range restriction; the raw values are in parentheses. (I can't find a simple explanation of range restriction freely available online, but for a somewhat detailed illustration of why it matters in contexts like college admissions, see this paper.)

Predictor   r            Predictors                     R
HSGPA       0.54 (0.36)  SAT-M, SAT-CR                  0.51 (0.32)
SAT-CR      0.48 (0.29)  HSGPA, SAT-M, SAT-CR           0.61 (0.44)
SAT-M       0.47 (0.26)  SAT-CR, SAT-M, SAT-W           0.53 (0.35)
SAT-W       0.51 (0.33)  HSGPA, SAT-CR, SAT-M, SAT-W    0.62 (0.46)


The left column shows the correlation of each element individually with FGPA; the right column shows various combinations of predictors used together. Note that SAT-M + SAT-CR is equivalent to the old SAT (without the writing test).
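To give a concrete sense of how the range-restriction correction works, here is a minimal sketch using Thorndike's Case 2 formula for direct range restriction. The SD ratio used below is hypothetical, chosen only to show that a plausible amount of restriction can move a raw correlation into the neighborhood of the corrected values in the table; it is not the value the College Board used.

```python
import math

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike's Case 2 correction for direct range restriction.

    r_restricted: correlation observed in the restricted (admitted) sample
    sd_ratio: unrestricted SD / restricted SD of the predictor (greater
              than 1 when the admitted group has less spread than the
              full pool of test takers)
    """
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1 + r * r * (u * u - 1))

# With no restriction (sd_ratio = 1) the correlation is unchanged.
print(correct_for_range_restriction(0.29, 1.0))   # 0.29

# A hypothetical SD ratio of 1.7 pushes a raw 0.29 up substantially.
print(round(correct_for_range_restriction(0.29, 1.7), 2))
```

The intuition: admitted students are a slice of the full test-taking population with a narrower spread of scores, and correlations computed within a narrowed group understate the relationship that holds across the whole group.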

Fair Test's point is that adding writing to the mix only results in a tiny increase in correlation over the old system: .02 if you're looking at the SAT alone and .01 in combination with high school GPA.

That's not a particularly impressive increase, so Fair Test's criticism does seem reasonable, as far as it goes. Why should we ask students to spend so much more time and money (not to mention the extra stress) when it doesn't make much of a difference?

Note, BTW, that viewed individually, the writing subscore is actually the best predictor of freshman GPA among the three subscores. Why then does it add so little in aggregate? The obvious explanation would be that the Critical Reading and Writing scores are highly correlated, and therefore you don't get a lot of new information by adding writing into the mix.
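That mechanism is easy to demonstrate numerically. The sketch below computes the multiple correlation R of FGPA on a set of predictors from a correlation matrix (R² = r′S⁻¹r). The validities come from the table above, but the predictor intercorrelations are hypothetical, chosen only to illustrate the point; the key assumption is a strong Critical Reading–Writing correlation.

```python
import numpy as np

# Correlations of each predictor with FGPA (corrected values from the table).
validities = {"CR": 0.48, "M": 0.47, "W": 0.51}

# Hypothetical predictor intercorrelations (not from the report); the
# assumption doing the work is that CR and W are highly correlated.
intercorr = {("CR", "M"): 0.50, ("CR", "W"): 0.85, ("M", "W"): 0.50}

def multiple_r(predictors):
    """Multiple correlation R of FGPA on a set of predictors,
    computed from the correlation matrix via R^2 = r' S^-1 r."""
    k = len(predictors)
    S = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            pair = (predictors[i], predictors[j])
            S[i, j] = S[j, i] = intercorr.get(pair, intercorr.get(pair[::-1]))
    r = np.array([validities[p] for p in predictors])
    return float(np.sqrt(r @ np.linalg.solve(S, r)))

r_old = multiple_r(["CR", "M"])        # the old SAT
r_new = multiple_r(["CR", "M", "W"])   # with Writing added
print(round(r_old, 3), round(r_new, 3), round(r_new - r_old, 3))
```

Even though Writing is the strongest single predictor here, adding it lifts R by only about .02 under these assumptions, because most of the information it carries is already captured by Critical Reading.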

But there's more to say about these numbers, as well as the spin that Fair Test puts on them.

From the Fair Test press release:

the College Board reports conclude that high school grades are a more accurate predictor of college performance than is the SAT.

Here, Fair Test is being literally accurate but inconsistent in its analysis. Considered separately, it's true that HSGPA (r = .54) is slightly better than the new SAT (r = .53), but only by .01. If an increase of .01 in predictive validity is insignificant, so is a decrease of the same amount. If you're going to claim that the new test is not really any better at predicting freshman GPA than the old one (a supportable claim), you cannot turn around and say that it's worse than high school GPA alone. Separately, each has about the same correlation.

Further, the College Board does not suggest that the SAT be used alone as a predictor. Instead, it urges that the test be used in combination with high school GPA, and the combination does indeed provide a significantly better predictor than either used alone.

Next time I'll talk about the second study and why judging a test's validity based solely on freshman GPA is far too limited, even if it is the most commonly used measure.