Yeah, We're Getting Dumber: Flynn Effect Has Reversed in U.S.

A little under a half century ago, the late philosopher James Flynn wrote to the late Arthur Jensen that raw IQ test scores were consistently increasing. Jensen was skeptical, but Flynn was able to meet most of his challenges. In The Bell Curve, Herrnstein and Murray labeled this phenomenon of rising raw test scores the Flynn Effect.

It remains a pretty mysterious thing. (Here’s my theory.)

But, everybody agrees, scores can’t go up forever. And indeed in some places with conscription and good environment like the Nordic countries where we have test scores for conscripts, it stopped going up a while ago. (I presume that if you score high on the conscription IQ test, you get a more genteel job in the army, so test-takers are probably pretty motivated to try hard.)

But the US was lacking in data for adults. So this is an interesting paper from Intelligence:

Looking for Flynn effects in a recent online U.S. adult sample: Examining shifts within the SAPA Project

Elizabeth M. Dworak, William Revelle, David M. Condon

May–June 2023

There’s a big project called the Synthetic Aperture Personality Assessment run out of Northwestern U. that gives a personality test to anybody who drops by online, with almost 400,000 taking it. It includes a bunch of IQ-type questions as well as the usual repetitious personality questions. The modal participant appears to be a 20-something woman taking a psychology course, but the sample size is so immense that they can slice and dice the data almost any way they want.

Keep in mind that it’s a volunteer sample and it’s a very low-stakes test.

… Participants (N = 394,378) recruited between 2006 and 2018 were used to examine the 35-item composite ability score, matrix reasoning, verbal reasoning, and the 8-item letter and number series. …

Participants recruited between 2011 and 2018 (n = 303,540) were subset into an additional sample to examine the 60-item composite ability scores, the 9-item letter and number series, and three-dimensional rotation; scores for these domains were only collected during this time range. …

For ICAR scores assessed with 35 items from 2006 through 2018, a small average annual slope of −0.013 SD per year was observed across all ages…

So, at that rate it would take 77 years for overall national IQ to drop a standard deviation. That Negative Flynn Effect might be half or a third of the size of the regular positive Flynn Effect at its peak.

Likewise, after accounting for the new item types collected in 2010 and 2011, overall ICAR scores measured with 60 items from 2011 through 2018 showed a small average annual slope of −0.009 SD (Range = [−0.063, 0.017] SD per year).

But when adjusted for educational attainment, the Negative Flynn Effect is bigger: −0.021 SD per year for the 35 items from 2006-2018 and a sizable −0.033 SD per year for the 60 items from 2011-2018. The latter would be in line with a drop of one standard deviation for, say, college graduates in just 30 years.
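The back-of-the-envelope arithmetic here is easy to check. The sketch below takes the four slopes quoted from the paper and converts each into years per standard deviation and IQ points per decade. Two things in it are my assumptions, not the paper's: that the annual slope stays constant (a straight-line extrapolation, which nobody really expects to hold for decades), and that one SD equals 15 IQ points, the usual convention. The function names are just illustrative.

```python
# Slopes in SD per year, as quoted from the paper in this post.
slopes = {
    "35 items, 2006-2018, unadjusted": -0.013,
    "60 items, 2011-2018, unadjusted": -0.009,
    "35 items, 2006-2018, education-adjusted": -0.021,
    "60 items, 2011-2018, education-adjusted": -0.033,
}

IQ_SD = 15  # assumption: conventional IQ scale, 15 points per SD


def years_per_sd(slope):
    """Years for scores to move one full SD, assuming a constant slope."""
    return 1.0 / abs(slope)


def iq_points_per_decade(slope):
    """Change in IQ points over ten years, assuming a constant slope."""
    return slope * IQ_SD * 10


for label, slope in slopes.items():
    print(
        f"{label}: {years_per_sd(slope):.0f} years per SD, "
        f"{iq_points_per_decade(slope):+.1f} IQ points per decade"
    )
```

The unadjusted 35-item slope works out to about 77 years per SD, and the education-adjusted 60-item slope to about 30 years, matching the figures in the text.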

College graduates are getting dumber, which may explain a lot about the spread of wokeness.

Here are a couple of caveats about the methodology:

As sampling demographics and sizes were inconsistent across the 13 years of data and the SAPA Project depends on individuals who are interested in taking an online survey, it might also be the case that those interested in taking an online personality survey have changed. In the early years of recruitment for the SAPA Project, it is likely that a large proportion of individuals who originally took the survey were either directed to it by an instructor, heard about it at a research conference, or found it through websites/sources associated with academia. As the annual sample sizes have increased in more recent years and the SAPA Project has been discussed in more public outlets such as social media or online articles, it’s plausible that newer annual samples are more “average” or normal representation than those recruited during the SAPA Project’s former years.

Beyond inconsistencies in demographics across the sample, another factor that could be accounting for lower scores for more recent participants could be due to a decline in motivation.

For instance, I didn’t work very hard on the IQ questions and rushed back to the Let’s Talk About Me! questions.

As the SAPA Project is advertised as a personality survey, individuals seeking out the SAPA Project may not be fully engaged with items not measuring temperament at the capacity as they are with more typically considered personality items. As performance is a function of both ability and motivation, participants not trying as hard on ICAR items might also help explain why a reverse Flynn effect was observed despite more recent samples having greater proportions of participants with higher education. This lack of motivation, however, would fail to explain why scores for the most difficult ICAR domain, three-dimensional rotation, were higher for the most recent samples.

The SAPA test asks about race, but this paper doesn’t mention the impact of changing racial demographics over this 12-year span.
