I Can’t Believe I’m Looking at Test Scores

Here’s the (incendiary) headline: Test Scores Show Dramatic Declines!

Here’s the truth: this set of test scores tells us nothing for certain. The data are an apples-to-oranges-to-bowling-balls muddle. If anything, if you still believe test scores give us valuable information, the data might be mildly encouraging, considering what students have encountered over the past 18 months.

More about the numbers later. First, let me share with you the moment I stopped believing that standardized test data had any valid role in determining whether students or schools were successful.

I was attending a State Board of Education meeting in Lansing. These are monthly, day-long affairs where education policy is discussed and instituted. (Sometimes the legislature passes different laws in an attempt to undermine the State Board, but that’s not relevant here.) On this occasion, the Board was setting cut scores from a round of new testing data.

I can’t tell you exactly what year this occurred, but it was after NCLB was passed, and the Board was doing what it was supposed to do: managing the data generated by federally imposed standardized testing in grades 3-8.

Until that meeting, I assumed there was a hard, established science to setting cut scores. I thought scores were reasonably reliable, valid measures of learning, and that there were predetermined, universal clusters of students who would be labeled proficient, advanced, below basic, or whatever descriptors were used. I assumed there were standard, proven psychometric protocols: percentage of correct answers, verified difficulty of questions, and so on. I was familiar with bell curves and skewed distributions and standard deviations.

What surprised me was how fluid, and even biased, the whole process seemed. There was, indeed, a highly qualified psychometrician leading the discussion, but much of the conversation centered on issues like: If we set the Advanced bar too low, we’ll have a quarter of the students in Michigan labeled Advanced, and we can’t have that! If we move the cutoff for Basic to XX, about 40% of our students will be Below Basic. Does that give us enough room for growth, and enough reason to put schools under state control?

The phrase “set the bar high” was used repeatedly. The word “proficient” became meaningless. The Board spent hours moving cut bars up and down, labeling groups of students to support their own well-meant theories about whether certain schools were “good” and others needed to be shut down. So much for science.

The problem is this: you can’t talk about good schools or good teachers or even “lost learning” anymore without a mountain of numbers, numbers that can be inscrutable to nearly everyone, including those making policies that affect millions of children. When it comes to standardized test score analysis, we are collectively illiterate. And this year’s data? It’s meaningless.

Bridge Magazine (headline: Test Scores Slump) provides up/down testing data for every school district in Michigan. The accompanying article includes plenty of expert opinion on how suspect and incomplete the numbers are, but it starts out with sky-is-falling paragraphs: In English, the share of third-graders considered “proficient” or higher dropped from 45.1 percent to 42.8 percent; in sixth-grade math, from 35.1 percent to 28.6 percent; in eighth-grade social studies, from 28 percent to 25.9 percent.

These are, of course, aggregated statewide numbers: down a few percentage points, pretty much across the board. That’s unsurprising, given the conditions under which most elementary and middle school students were learning. Scores were down the most for students of color and those in poverty, again unsurprising. Still, there’s also immense score variance, school to school, even grade to grade. The aggregate numbers don’t tell the whole story, or even the right story.

The media seemed to prefer a bad-news advertising campaign promoting the alarming idea that our kids are falling further behind. Behind whom? That’s what I want to know. Aren’t we all in this together? Is a two-point-something score drop while a virus rages reason to clutch your academic pearls?

Furthermore: what does ‘proficient’ even mean? It’s a word which appears repeatedly, with absolutely no precise definition. Everybody (including media) seems to think they understand it, however.

The really interesting thing was looking at district-by-district data. There were places where pretty much everybody took the tests, and schools where almost nobody did. There were districts where third-grade scores dropped twenty percent while fourth-grade scores, in the same school, went up eight percent. What happened there? Was it teachers? Curriculum? It was also clear that charters, including virtual charters, were not the shining solution to pandemic learning.

What I took away from the data is that public education held up pretty well in Michigan, under some dire and ever-shifting conditions. In some places, kids and teachers did very well, indeed, amidst disruption. Kids without resources—broadband, devices, privacy, constant adult supervision, or even breakfast and lunch—had the hardest time. They’re the ones who need the most attention now. And good luck hiring qualified, experienced teachers to do that.

There’s probably a lot that can be learned from a close look at the 2020-21 data, but most of it isn’t about quantified student learning gains. And please, stop with the “acceleration” crapola. The pace of learning will improve when our students feel safe and part of a community: the exact conditions we’ve always been striving for, and ones that aren’t present anywhere in September 2021.

Stu Bloom said last week: “I’m seriously tired of the politicians, pundits (looking at you, NYT Editorial Board), and policy-makers telling teachers and public schools to single-handedly solve the problems of racism and poverty by increasing test scores. Public schools and public school teachers are not the only ones who have anything to contribute to growing our society!”

He then goes on to point out the value of actually investing in public education, in evidence-based policies and practices, designed to improve life and learning for all school-aged children. We know what to do, he says. And he’s right.

It’s time to end our national love affair with testing, to make all Americans understand that educational testing is a sham that’s harmed many children. Testing hasn’t ever worked to improve public education outcomes, and it’s especially wasteful and subject to misinterpretation right now.

Comments

  1. Unfortunately, too many powerful people make their living pushing standardized testing nonsense, and too many of them have the ear of the media for various less-than-ethical reasons. It amazes me that, apart from feel-good human interest stories, public education seems more likely to garner negative attention, whether on air or in print. The narrative has been anti-public education for so long that too many people no longer even question the premise.

    1. What you say is absolutely true: standardized testing is a growth industry. But saying we’ll never get rid of unnecessary tests because the sector that profits, big-time, will prevent it is way too simple. There are plenty of once-booming education-related businesses whose products are now fading or extinct in schools (typewriters, for example, or manual pencil sharpeners, metal playground equipment, or even pianos, which have been replaced by electronic keyboards). Manufacturers don’t drive policy or general beliefs. They only respond to demand.

      The rise of standardized testing and the ugly uses of the data it generates aren’t driven by profit; they represent a mindshift in measuring school success. And they’re part of a larger picture, one that assumes all schools can be graded, ranked and rated using data, just as businesses use data to determine where to locate a store, how to set it up, how to staff it cheaply, etc. Because the rise of inexpensive computers has given us the analytic firepower to compare data, we can slice and dice testing statistics and use data to shut down schools, elevate certain teaching methods, label ethnic groups, and do lots of racist, biased things we couldn’t do thirty years ago. In the name of ‘scientific’ and ‘precisely measured.’ And we’ve instituted policy to match.

  2. “Furthermore: what does ‘proficient’ even mean? It’s a word which appears repeatedly, with absolutely no precise definition. Everybody (including media) seems to think they understand it, however.”
    Proficiency = Pornography! U.S. Supreme Court Justice Potter Stewart, writing about pornography, couldn’t define it, but said, “I know it when I see it.”

  3. Proficiency = Pornography is why using the invalid standardized test scores for anything is just mental masturbation. It is “vain and illusory,” as Wilson says. It may feel good for a very short time, but it is a sad substitute for the real teaching and learning process.

  4. “There’s probably a lot that can be learned from a close look at the 2020-21 data, but most of it isn’t about quantified student learning gains.”

    No, there is nothing to be learned. Crap in, crap out is what should be learned. Or, as Russ Ackoff puts it, “Doing the wrong thing righter results in one being ‘wronger.’” It’s the wrong thing to do, and there is nothing to learn from invalid, corrupt data. (Hint: that’s a basic scientific concept. Bad data = invalid conclusions that should be rejected outright.)

    1. I can tell you what I learned from looking at the data.

      It was obvious which school leaders felt confident enough to spread the word that MI would not punish any district that didn’t test 95% of its kids, so testing was optional this year, and which school leaders insisted that virtually all kids be tested. Again, the data yielded was, to put it mildly, invalid, but there were clear distinctions between districts. Most of the districts where few kids tested were economically advantaged. They were districts that didn’t ‘need’ data to have enough resources to educate their students. And there were districts (mostly poor rural and urban districts) that feared non-compliance and forced all their students to test.

      There were such gaps between schools, grades and teachers that you couldn’t find any trends, year to year or school to school. Looking at the data, even a statistical novice could perceive that it was random and useless. The disappointing thing is that every newspaper in MI was unable to perceive that, even though every Dept of Ed employee, university-based talking head and district superintendent told them the data told us nothing. They stuck with ‘8th grade Social Studies down 2.1 points! Crisis!’

      1. We already knew all those things. Or at least I have since before this century. The test results can in no way be used to verify that information. We already knew it.

        Yes, I know that the lingo of education these days is data, data, data, especially from standardized tests. And it is a false lingo, lacking in fidelity to truth in discourse, as it has been proven to be, onto-epistemologically, a fundamentally false way to assess the teaching and learning process and what the students may have learned. It’s a crock! Noel Wilson showed us just how invalid the whole process is in his seminal, never refuted nor rebutted 1997 “Educational Standards and the Problem of Error.” If you haven’t read it, you should. See: https://epaa.asu.edu/ojs/index.php/epaa/article/view/577/700

      2. Duane, nobody here is arguing with the fact–FACT–that standardized testing is harmful and tells us little. I know that data is often false, often misleading, frequently corrupt. What’s even worse than the collection of the data is the way it’s interpreted and misused.

        I have not read Wilson’s book–but I have read Alfie Kohn’s book on standardized testing (20 yrs ago), Anya Kamenetz’s book, all of Daniel Koretz’s work and all of David Berliner’s work–just off the top of my head. I hosted a Phillip Harris seminar (and had to read his book) and attended a national forum at the National Science Foundation on assessment. In grad school, I had to pay over $100 to read an original copy of Donald Campbell’s theories about standardized testing–Campbell’s Law. I worked for three years for a national non-profit that assessed teacher quality, field-testing performance assessments. I believe I am an assessment-literate educator–and I am convinced that standardized testing, even under perfect conditions, is wrong, and essentially useless.

        It is, however, the coin of the realm, currently. And every person who writes a blog explaining how reporters have misinterpreted the numbers is doing the right thing. Clearly, ‘we’ do not already know those things, as reporters all over the country are wringing their hands over ‘learning loss.’

  5. Reading about your experience of educators focusing on where to set the bar for TESTs reminded me of how the very subject of Testing, with its standardized objective goals, lends itself to abstract, data-driven generalizations. Our struggle to score what learners know diverts our focus from goals like “engaged learning” and “creative thinking.” Indeed, several indelible experiences drove home to me the messy reality of standardized evaluation of kids and, worse, how in our push to quantify learning we unintentionally distort what we and our children learn from the TESTS.

    Not long after I started giving Author/Illustrator talks, sharing the way simple drawings, stick pictures, can inspire young writers, a school in Upstate New York asked if I could give a writing workshop for their gifted and talented fourth graders. I choked and started to blither about how I’d never spoken to a “Gifted” group before. The truth is I was afraid. As a new writer myself, and one who had failed at writing in school, I was terrified these kids would expose my thin writer’s veneer.

    Things got worse when I asked how they decided who was Gifted. She said test scores. Suddenly I saw myself in the back of the room, carefully filling in little bubbles that would show once more what everybody already knew: how stupid I was. If I hadn’t been so overwhelmed reliving the raw power of tests to inspire labels like “gifted” and “learning disabled,” I might have said, “The thing is, if I were in your school, I wouldn’t be in the gifted group, or invited to hear me speak.” Instead, I mumbled something about inclusivity and trying to offer tools that support all learners, when the caller suggested I could talk to their gifted students and then to a group of their lowest-functioning students: half my time with the gifted kids, half with the slow kids.

    My time with the gifted kids was exciting and inspiring. Indeed, my worries about what to do were instantly relieved when smiling students arrived with their stellar State Writing Test work. But it was meeting with the NOT gifted students that was unforgettable. When these learners were ushered into the room (shoved was more like it), no one looked me in the eye. I’m not sure what they had been told about why they were chosen to meet with the Visiting AUTHOR, but as they sat in a circle looking at the floor, I had a sense we were all lost.

    Suddenly I realized I’d never spoken to a group chosen because they were at the bottom of the pile—because they had learning problems just like I had. I was trying to figure out what to say when I noticed they hadn’t brought anything with them. No state writing sample to share, no papers or pencils, nothing.
    Out of desperation, I asked, “Where are your State Writing Samples?” There was a long, uncomfortable silence as they looked at me like I must be slow. Then a kind girl leaned forward and said, almost in a whisper, “We didn’t bring them.”
    Of course I knew the answer, but still I asked, “Why not?”
    She said simply, “Because they’re no good.” They all nodded and looked at me, a little relieved now that the visiting AUTHOR knew the score.

    AND HOW WE DISCRIMINATE WITH TESTS
    I was so overwhelmed with sadness that I couldn’t explain to that girl and her fellow failures that they were being lied to, that the TESTS didn’t really tell adults who was smart or creative, just who could write the way the TESTERS liked. A year later I learned some things about the TESTERS I wish I could have shared. It turns out one of the big testing companies was located in a town near me. One morning, while working in the bagel shop, I heard a heated discussion about how frustrating it was evaluating students’ tests. I got up and asked what they were talking about.

    It seems these were the folks who scored the TESTS. There were a couple of former teachers, an accountant, a businessman and an engineer, all retired, but earning a little extra money by scoring students’ work. They were discussing how frustrating it was working within the narrow testing criteria. Everyone agreed that sometimes they could tell a student had misunderstood the instructions or, worse, had something real to say, but the way they said it did not fit the testing framework. Just guessing, I asked if anyone ever saw any drawings students made. Everyone said they often saw pictures in the margins of tests. But it was the engineer, trained to use drawings to think problems through and to share ideas, who said he could see how kids made pictures that SHOWED they had something worth saying, but couldn’t get those ideas into words. Still, he was not allowed to make any adjustment. He had to fail them. Pictures didn’t count, period. He said it was a little scary to him to think his grandkids were being judged in this narrow way.

    It turns out these TESTS can even dumb down the smart people who score them. My time with gifted and struggling learners showed me again and again that they were so ready to use drawing the way the engineer did. We want creative problem solvers. Unfortunately, that kind of visual work has never counted on the TESTs. It won’t count until students get to Advanced Placement math or physics. Go figure.

    1. Thank you for your long and thoughtful comment (and for having the chutzpah to ask to talk to more than just the shiny stars). My master’s degree is in Gifted Studies; it was interesting stuff, but I never could get past all the gatekeeping and power-hoarding in gifted education, or why testing was considered a more reliable way of determining giftedness, a category I have found to be even less precisely defined than ‘proficient.’ Teachers (good ones) often find sparks in kids that other teachers have given up on.

      I have not been interested in comparative test data for over a decade, but I wanted to see how seriously chaotic this year’s results were–and they were way worse than I expected them to be: Not coherent. Not trustworthy. Not useful.

      1. What I meant to also say is that children who have these intelligences are ignored. The only intelligences that are important in the testing industry are those regarding reading and math – but only solving problems that test makers put on tests that require filling in the dot on a computer or on paper.

      2. Could you give me a link to that piece? I’m researching what folks know about visual skills (often next to nothing beyond Einstein claiming he was a visual, wordless thinker). I’d very much like to see what you say and open a conversation.
        Roger– Rogessley@gmail.com

      3. I’m sorry, the link you sent was to Reggio in general. I’m well aware of the Reggio community and its advocacy for visual work; indeed, I’m writing about its success and limitations. But I was looking for the link to your piece. (I should add that I’m new to this platform, so this may be my fault.)
        Roger

  6. My personal test score epiphany came when a state education rep (NY) came to explain the particulars of the new NCLB requirements to our faculty. When it came to showing AYP in the various sub-groups, I commented to her that many of our district’s sub-groups had very small numbers of students, and that with sample sizes that small it was impossible to draw accurate conclusions about a cause-and-effect relationship between teaching and the required Adequate Yearly Progress (test score improvements). I also pointed out that those sub-group improvements compared scores in the same grade levels across completely different cohorts of students. Her response was very curt: “I don’t give a sh#t!”

  7. This is the same situation with State Departments of Education and the agencies that accredit university teacher education programs. They are requiring more and more testing for certification, which, on top of everything else, costs a graduating senior a lot of money. The scores are arbitrary and set so accrediting agencies and States can say that their teachers score at the top and are therefore better teachers. So far, I have not seen any research that demonstrates that the scores on these tests are related to teaching performance quality.

    1. It’s weird, isn’t it? There’s all this information establishing the data as questionable, but states are still willing to pay big bucks for numbers that almost nobody understands, yet still see as Truth.
