Educational Assessments: Too Much of a Good Thing?

This is the seventh in a series about Virginia’s Standards of Learning educational assessments.

by Matt Hurt

Widespread shutdowns of Virginia’s public schools during the COVID-19 pandemic last year resulted in a significant loss of instructional hours. To make up lost ground, teachers need to spend as much time as possible with their students. Unfortunately, a new measure enacted with the best of intentions will take students and teachers out of the classroom for two more rounds of assessments, one in the fall and one in the winter.

While the extra tests might be suitable for some school districts, local officials should be allowed to decide whether the assessments best meet the needs of their students.

In 2017 the Virginia Board of Education implemented new criteria for the Standards of Accreditation, which determine whether schools are performing up to snuff. Under the new criteria, a student who failed an SOL test but demonstrated sufficient “growth” from the previous year counted the same in the accreditation calculation as a student who passed. Under the growth system, a student who failed a 3rd-grade SOL test would be given four years to catch up and reach proficiency by the 7th grade — as long as he or she demonstrated progress toward that goal each year.

Given that the original point of the SOLs was to ensure that kids were on track to graduate from high school, giving them time to catch up was a wonderful thing. It is not realistic to expect every child to be successful out of the gate. The old system of basing accreditation on pass rates only incentivized schools to focus on bubble kids (those who were close to passing in previous years) and less on the needier kids. The shift in focus to “growth” encouraged schools to work with all kids.

During the 2021 General Assembly legislative session, HB2027/SB1357 introduced a through-year “growth” assessment program. The idea was to assess students in the fall and winter in grades 3 through 8 in reading and math to establish a baseline. The spring SOL data would serve as the post-test to determine proficiency as well as growth from the previous fall. Legislators hoped that this “growth” measure, which replaces the spring-to-spring growth methodology used in 2018 and 2019, would also provide teachers with a high-quality assessment to inform their instruction.

There are four major problems with the new through-year “growth” scheme, however. First, the stated objectives of testing for formative purposes and setting a baseline measure of growth are in conflict. Second, the extra assessments rob teachers of critical instructional time they need more than ever. Third, the data has proven too unreliable to use in many cases for guiding instruction. Fourth, a student potentially could demonstrate growth every year and never demonstrate proficiency.

Problem #1. When teachers administer a formative assessment, it is important for students to try their best to “show what they know.” Teachers use the data to remediate students who demonstrate need. But when a pre-test measuring baseline performance is used in conjunction with a post-test measuring growth, there is no incentive for teachers to encourage students to try their best. Students who “sandbag” in the fall will show the most growth by the spring. This is problematic because there is no metric for measuring “sandbagging,” nor even a means of detecting it. That’s not to say that teachers actively work to get their students to put forth a poor effort; there is simply no incentive for teachers to encourage students to try their best on the fall tests.

Problem #2. The new legislation requires that all of the assessments (fall growth, winter growth, and spring SOL) be administered within 150% of the time it takes to administer the traditional SOL test. While that standard can be met with respect to students’ seat time taking the tests, the procedure cuts into teachers’ time. Security training, extra test sessions to meet the accommodations of students with disabilities, extra test sessions to assess students who had been quarantined due to COVID exposure or diagnosis, and altered schedules to ensure sufficient staff for test administration and proctoring are just a few of the considerations that consume teachers’ time. This fall, district officials reported that administration of the fall “growth” assessments disrupted instruction within their schools anywhere from three to twenty-one days.

Problem #3. Data collected from the fall through-year “growth” assessments have proven unreliable. Comparing spring results to spring results covers the full yearly cycle. Comparing fall results to spring results yields different results because the fall baseline reflects the inevitable “fade” effect over the summer. Thus, fall-to-spring growth tends to look better than spring-to-spring growth.

Table 1 summarizes results for approximately 35,000 students who took a spring 2021 SOL test and also took the corresponding fall 2021 through-year “growth” assessment. Students generally proved less proficient on the same content in the fall than in the spring. A few schools’ and divisions’ fall results aligned with their spring results, but most did not. 

There are a few caveats to consider when evaluating this data. 

  1. Due to the time constraints described in Problem #2 above, which the state code imposes on these tests, the fall tests are shorter and assess fewer skills than the SOL tests do. For example, the Math 5 fall test, which measured skills from the Math 4 SOLs, contained 31% fewer questions and assessed 16% fewer standards than the Math 4 SOL test.
  2. More time, energy, money, and personnel were allocated to summer learning in 2021 than at any other time in known history. More students participated in summer school, and pupil-teacher ratios were much lower than in previous years. It is unreasonable to expect that these efforts would not yield some improvement in outcomes by the fall.
  3. Given the incentive structures outlined in Problem #1 above, it is quite likely that students were not encouraged to do their best on the fall tests as they traditionally are on the SOL tests. Every year, teachers and schools offer students incentives for good effort on their SOL tests, such as pizza parties. During the administration of the through-year “growth” assessment this fall, not one pizza party was offered to incentivize student effort.

Problem #4. Given the concerns about the data, it is possible that more students will demonstrate growth by the spring of 2022 than is warranted. If the fall data were skewed negatively (as Table 1 demonstrates), and the administration protocols and incentives associated with spring SOL data remain consistent with previous years, as is expected, then many more students will demonstrate “growth” from fall 2021 to spring 2022 than from spring 2021 to spring 2022. As the Virginia Department of Education can be counted on to use the more beneficial statistic for school accreditation, many students likely will demonstrate “growth” from fall to spring each year yet never reach proficiency.
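To make the arithmetic behind Problems #3 and #4 concrete, consider a rough sketch with entirely hypothetical scores (the numbers below are illustrative only, not actual SOL data; it assumes the familiar 400 scaled-score pass mark):

```python
# Hypothetical scaled scores for one student. Assumes a passing
# (proficient) score of 400, the usual SOL scaled-score cut.
PROFICIENT = 400

spring_2021 = 380   # failed the spring 2021 SOL
fall_2021 = 355     # summer "fade": same content scores lower in the fall
spring_2022 = 390   # still below proficiency on the spring 2022 SOL

spring_to_spring_growth = spring_2022 - spring_2021  # 10 points
fall_to_spring_growth = spring_2022 - fall_2021      # 35 points

print(spring_to_spring_growth)  # 10
print(fall_to_spring_growth)    # 35

# The fall-to-spring figure looks far more impressive, yet the student
# never reached proficiency (390 < 400) -- the pattern Problem #4 warns about.
```

Because the deflated fall baseline inflates the apparent gain, a school can post healthy “growth” numbers year after year while the same students never clear the proficiency bar.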

Proposed Solution. Many variables impact student achievement around the state. Why should school districts be compelled to adopt a one-size-fits-all solution that disrupts so many days in our schools? Why not allow divisions to choose whether through-year “growth” assessments are worth the chaos and the loss of instructional time? We already have a growth measure in place, the spring-to-spring growth method, that has proven beneficial for students and schools. Why not allow schools to choose to use it?

As was demonstrated in the piece Outperforming the Educational Outcome Trends: Virginia’s Region VII, some schools and divisions have made significant improvements in student outcomes in recent years. Region VII (and other divisions in the Comprehensive Instructional Program consortium) have used common benchmarks for years to monitor progress and obtain formative data to drive instruction. These assessments have proven reliable to the point that, on average, a school’s SOL pass rate can be predicted from benchmark data to within plus or minus one point. This school year, some divisions have elected to suspend administering these assessments in order to give teachers back the instructional time stolen by the through-year “growth” assessments.

In conclusion, the General Assembly would better serve the school children of Virginia by doing the following.

  1. Determine what is working, and then not intervene in those successes. In other words, “if it ain’t broke, don’t fix it.”
  2. Do not expect that less instructional time will yield better student outcomes, at least in real terms. You can’t fatten a pig by continuously weighing it. There are sufficient formative and summative assessments in every division, and more assessments serve as a distraction.
  3. Do not expect that “pre-test” data will serve as a reliable student-growth measure. The incentives in place do not support accurate data collection in the fall. Do expect that such a measure can demonstrate “growth” that doesn’t necessarily yield proficiency over time.
  4. Determine if we really wish to seriously attack the real problem of subgroup gaps or if we wish to whitewash the problem with a flawed through-year “growth” methodology.

Matt Hurt is executive director of the Comprehensive Instructional Program based in Wise County.



29 responses to “Educational Assessments: Too Much of a Good Thing?”

  1. dick dyas

    Why are Assessment Days and Grading Days always on weekdays? Why not have them on Saturdays? ( Pay the teachers time-and-a-half.)

  2. Nancy Naive

    Johnny, this is the fourth day this week that you’ve been late to class. You do know what that means, right?
    Yes ma’am. It’s definitely Thursday.

  3. LarrytheG

    I very much appreciate Matt writing these essays and I’m looking forward to him doing it on a regular basis.

    This one, I need to read a couple of times… in the weeds for me! Which is my own ignorance that I need to work on.

    I have to say that “someone” gave a lot of thought to the assessments but clearly did not vet it far enough down the food chain … which I understand is not necessarily one of VDOE’s finer qualities… 😉

    I’m also told that the school year has been extended, true?

    1. Matt Hurt

      Well, no one who 1) understands the incentives of educators and 2) is interested in ensuring at-risk student populations become more proficient invested serious thought into this scheme. This sounds really good if you think about it really fast and on the surface. Once you start thinking about second and third order consequences, the idea really begins to fall apart and become more of a way to make things look better on paper.

      The more I think about this, and the proficiency of our students in general, the more I understand that there are two types of educators (and people in general). There are those who have an internal locus of control, who believe that they have a significant bearing on outcomes. The inverse are those who have an external locus of control: things just happen and we can’t control the outcome. If you fall in that second camp (external locus of control), you probably aren’t keen on personal accountability, and this growth scheme sounds really good.

      The more insidious possible explanation for such things (as well as lowering expectations for student outcomes) would be that you don’t think large percentages of certain subgroups of kids can meet the ridiculously low bar the state has set for proficiency, therefore you need to find other ways to manipulate the system to make everything work out acceptably. I would characterize this as the “soft bigotry of low expectations”. The rationale on the back end of that is driven by compassion (we have to look out for their self esteem and make them feel better about their lack of performance), but the front end is driven by ignorance (these kids can’t be successful). I don’t think that nearly enough folks have searched their hearts to think about their beliefs of student capacity by subgroup. Folks need to decide if all kids have capacity, or just some kids.

      1. LarrytheG

        That’s a pretty scathing commentary, and especially so coming from an educator who reads and understands policy prescriptions.

        But I might also posit that such negative attitudes might come from someone in the system who knows the way the system is now is not going to succeed with every child. I’ve heard such sentiment expressed by retired teachers who say that, given the resources that ARE available to help kids, you end up knowing that some are within your power to help and others need more than you have, or have access to, the resources to provide. So you do your best, but you know it’ll take more than that for some, and those kids are usually in difficult family situations… not stable, more chaotic; kids end up moving around, changing schools or being absent three times more than average, etc. These are what I often refer to as “harder to teach.” Even a good teacher who has to split their time 15-20 different ways can’t shortchange most of them to try to “save” one kid. At the teacher level, where you are going to be held accountable, it’s according to how many of your kids succeeded, not whether you saved one kid while others who could have passed failed.

        It’s a conundrum……….. Surely this is VDOE’s job but not theirs alone and at the end of the day, as you intimate, no one seems to be directly in charge of who is responsible for THESE kids…..

        And no, I don’t think Charter schools do any better at this – we simply don’t know because most do not provide the level of detail on academic performance on a subgroup basis like public schools must – and yes I DO give credit to VDOE for ensuring that and making that data widely available to everyone including public education’s most harsh critics.

        1. Matt Hurt

          “Every system is perfectly designed to get the results it gets.” This statement is as true as the day is long. My biggest problem with all of this is that if our subgroup gaps are a major concern, why has no one taken ownership of it? For example, take a peek at the BOE’s adopted priorities.

          Priority 1: Provide high-quality, effective learning environments for all students
          Priority 2: Advance policies that increase the number of candidates entering the teaching profession and encourage and support the recruitment, development, and retention of well-prepared and skilled teachers and school leaders
          Priority 3: Ensure successful implementation of the Profile of a Virginia Graduate and the accountability system for school quality as embodied in the revisions to the Standards of Accreditation

          Who is taking ownership of improving our subgroup gaps? According to this information, it’s not a stated priority of the Board. I see nowhere in their comprehensive plan (unless I overlooked it) that they hold themselves accountable for making sure that happens. The General Assembly hasn’t passed anything to that end. The Superintendent of Public Instruction hasn’t taken ownership.

          If no one is responsible for it, it’s not a priority and it surely won’t get fixed.

          1. LarrytheG

            I get what you are saying here but the direction VDOE has taken on inclusion, equity and diversity is not associated with that?

            I thought that was a tacit admission that current practices may not be reaching kids effectively in certain demographic subgroups.


          2. Matt Hurt

            Well, let me put it this way. Just because you say stuff and do stuff does not mean the intended outcomes will come.

            Please suffer through this example with me. A few years ago we went into a deep dive on evaluations. During that process, we collected teacher evaluation goals that were related to student outcomes. One of the most prolific goals that we found among schools that were struggling with their reading SOL scores was to improve reading fluency. The problem with that particular goal is that the SOL test does not specifically measure fluency. Yes, a more fluent reader does have a leg up in reading a passage, but there are tons of other skills assessed by the SOL test. If the teacher puts all of his/her eggs in the fluency basket, those kids may really be fluent, but lack the other skills which would enable them to pass their SOL test.

            In my estimation, all of the strategies I have seen come out of Richmond with the intention of addressing inclusion, equity and diversity will not move the achievement needle, at least not in a positive direction. I have no doubt that these things have been put in place with the best of intentions, but the likelihood of negative unintended outcomes (students being worse off proficiency-wise) is greater than that of the intended positive outcomes.

            When I look at the Board’s strategic plan, I can see examples of things that might tangentially address some real issues, but nothing in there hits them head on. If the board’s not willing to accept responsibility for improving outcomes for students, then who is? Their first stated priority is to “Provide high-quality, effective learning environments for all students”. Notice, there’s no mention anywhere of providing equitable outcomes.

          3. LarrytheG

            re: ” If the board’s not willing to accept responsibility for improving outcomes for students, then who is?”

            So an obvious question is how such a thing would actually work and would it result in more top-down from Richmond rules – like for instance, standardized letter grading tied to SOL type skills… or similar?

            Seems like if someone in Richmond was going to be held directly accountable, then they would want to hold people further down the food chain also accountable, or at least be able to make changes that directly affect their accountability?

            So – for instance, if someone in Richmond is accountable would they have a role in making changes in schools that fall short and thus the Richmond person gets the blame?

          4. Matt Hurt

            Exactly. So instead of holding folks more accountable (which might at least help improve outcomes), they’re holding them less accountable. This is one of the reasons why I truly believe in my heart that our leaders are not all that concerned about our kids not being as proficient as they could otherwise be. I just think that if subgroup gaps among our students were something that kept you up at night, that you wouldn’t let folks off the hook who can do something about it.


          5. LarrytheG

            I don’t argue pro or con on this but I think if someone in Richmond is held accountable, they’re going down the food chain to hold those folks accountable which may well be School district administrators and school principals.

            I DO KNOW of a case where Richmond sent a “take-over” specialist and it did not go well… because she was not really interested in changes as much as getting scalps…

  4. LarrytheG

    We should talk about the kids in the economically disadvantaged subgroup, which often includes many minority kids. Another description sometimes used is “at risk.”

    What exactly does that mean?

    What are some of the characteristics of the kids in this category, and what issues do teachers have to deal with in their regard?

    We get lots and lots of blame-game recrimination here in BR as to who is at fault for the “failures,” and it often sounds like these kids are well within the realm of school and teacher responsibility, and that if they “fail,” then there ought to be sanctions and changes, often to turn these kids over to Charter schools willy-nilly, without hardly a word about what these kids’ problems actually are, much less what Charter Schools would do differently to succeed, or, worse, some kind of foolish statement that it’s also the parents’ fault for not “helping” their kids.

    So I’d posit that perhaps a clear and concise description of what constitutes an “at-risk” kid be given and then we can talk more informatively about what the schools and teachers should be doing to help these kids succeed and not “fail”.

    1. James C. Sherlock

      If you read the series of columns that I am in the midst of writing on Loudoun County schools, you will see the subcategories into which children are divided for reporting purposes. “At risk” is not a term of art in Virginia law and policy.

      There are both racial and socio-economic subgroups whose progress or lack of it is reported following federal guidelines as well as state law and policy.

      If you look at the spreadsheets which accompany my work, you will see them.

      1. LarrytheG

        “At risk” is not a category, I agree, but instead a generic term that includes all kids that for various different reasons can and do fail.

        I sometimes read your posts but not always since we agreed not to comment on them and I often disagree with your views.

    2. Matt Hurt

      In my world, “at-risk” refers to a student who is at risk of not being at least proficient on grade-level standards. Not all economically disadvantaged students find themselves in that group, and we certainly have some doctors’ and lawyers’ kids who are.

      In my experience, most of these kids come from families that don’t value education very much. In fact, many of their parents had a tough time in school, and they expect their kids will as well. That’s why relationships are so critical in working with these kids: they need a reason to do what the teacher asks them to do, because they and their families are not concerned about getting a good education.

      A lot of SPED kids also find themselves in this group. They have identified learning problems that make it harder, though not impossible, for them to accomplish the same learning as others.

      1. LarrytheG

        Thanks. So… of the demographic sub-groups that comprise those that do fail SOLs , how do they break out in terms of economically disadvantaged and “at risk” minus and/or separate from SPED?

        My impression, perhaps based on ignorance, is that SPED kids have explicitly identifiable issues that may be lifelong and permanent and who often require specific approaches unique to each child.

        As opposed to kids who may have learning disabilities that are “correctable” with proper help.

        And then the kids who are intellectually capable of basic SOL skills but are not performing at their potential – and these are the ones that many folks believe that the schools “fail” at.

        I realize I’ve given you ample fodder to be critiqued but probably helpful for you as an educator to provide your insight on these which may vary from what many non-educators might think, especially given the current culture-war politics of blaming public schools for “failing” kids.

        1. Matt Hurt

          It’s funny that you bring this up, as this is a topic of essay #9 in this series, and I’m working on that one right now. I will give you a spoiler though- it has more to do with the adults than the kids.

          BTW, I do really appreciate the questions. They do just as you say.

  5. James C. Sherlock

    Matt, I always appreciate and learn from your work. I understand exactly what you are saying and I agree with you. Sometimes the kind of point you make is lost on the voters in general.

    So I ask you to make absolutely sure that you don’t give the left any reason to cancel SOLs, whether they misinterpret you or not. They really, really want to eliminate what they call “high stakes” testing.

    Standards of learning across the state show who has been peddling Dr. Good and who is getting the very difficult job of educating kids right.

    I am in the midst of a series of articles right now that use SOL results to prove that the Loudoun school board president and the superintendent filed a demonstrably false report with the state, whether intentionally or not. The victims are the most vulnerable kids in that school system.


  8. James Wyatt Whitehead

    Are the Fall and Winter tests administered and scored like the Spring SOL test? If so, there is a silver lining. The school “testing coordinator,” a full-time position, will have something to do all year long now.

    1. Matt Hurt

      James, school testing coordinators are only full-time positions in the lands of “milk and honey.” In the majority of the state, folks with other full-time positions have to pick up those responsibilities.

      1. James Wyatt Whitehead

        Much better to have administration pick up that duty. You want the leaders to have their fingertips on the process and the data.

        1. Matt Hurt

          While there are some administrators that tackle that duty, it’s not the norm. A bunch of school counselors are also school testing coordinators.

          1. LarrytheG

            maybe in some larger schools it’s a dedicated position but in some schools I’ve heard that grade-level leaders do that.

            The thing that interests me is how kids are assessed between the SOL testing years…. Is there some kind of standardized approach to it so that kids that are behind and likely will fail their SOLs are identified early enough to get them SOL-ready?

          2. Matt Hurt

            In grades 3-8, reading and math are assessed every year. NCLB brought that to fruition; it used to be just grades 3, 5, and 8.

            This is not a problem with history, because the history tests are pretty much stand alone. It’s also not much of a problem at the high school level, except for English 11 because all of the other high school tests can stand alone.

            The big problems are with Science 5 (which assesses Science 4 and Science 5 content), Science 8 (which assesses science content from grades 6-8), and English 11 (which assesses reading and writing skills from grades 9-11). The way that we deal with this is through benchmarking in all of those grades. Everyone knows that the benchmark data will tell us if kids in the lower grades are on track to be successful when they get into the SOL-tested grade for that subject.

          3. LarrytheG

            So…. the SOLs are state-wide standardized.

            Are these other tests and benchmarks also standardized statewide (so that, for instance, a test score for a kid in Henrico would be comparable to a kid in Wise on a one-to-one basis), or are these “in-between SOL” tests and benchmarks different per school district or school?

            Are these tests/benchmarks anchored in terms of skills criteria along similar lines?

            For instance, NAEP and SOLs have some “sort of” connections… such that an SOL score might indicate what the NAEP score might be for that kid, ballpark?

            Probably have said and asked too much, but basically I’m trying to understand the non-SOL testing and benchmarking: whether it’s standardized, and how or if it relates on some level to SOLs and NAEP.

          4. Matt Hurt

            I think my #9 essay that I just sent to Jim explains a lot of that stuff, so I’ll let you read that first. Please feel free to ask away after you read that essay.

          5. LarrytheG

            okay, thanks.

            And keep writing – your essays are informative and objective without all the culture war stuff, a welcome change!
