Data Update Methodology

Massachusetts Reporting Change November 6, 2020

Charlie’s done it again. It is becoming increasingly clear that one way to fight the pandemic in Massachusetts is to move the goalposts if the numbers are worsening.  (This is a variant of the idea that if you pretend there is no coronavirus, then there is no coronavirus.  Here’s looking at you, lame duck president). 

That can be done in several ways. First, stop reporting a measure if it doesn’t look as good as it did previously. For example: drop suspected cases from the count of patients in the hospital, in the ICU, or intubated. That way, you can make the increase in the number of hospitalizations look smaller.  Or hide statistics on the test positivity rate based on individuals, not tests.

Here’s another idea: change the way a particular statistic is measured to make things look better. The latest example: the color coding system used to define risk levels across communities in the state. The state has redefined the four risk-level color codes (red, yellow, green, and grey) so that it is much more difficult for a community to fall into the red (riskiest) zone, even as cases and test positivity increase rapidly.

Table 1 is a comparison of the community level data through October 31 (the latest data available) showing the difference between the old and new coding systems.


Table 1: Comparison of Old and New Community Color Coding
City and Town Data, Two Weeks Ending October 31, 2020

Measure                              Red    Yellow   Green   Grey
Old Coding % of State Population     71%    21%      3%      5%
New Coding % of State Population     15%    47%      28%     11%
Old Coding Number of Cities/Towns    155    67       9       120
New Coding Number of Cities/Towns    16     91       79      165


For example, 71% of the state’s population would be living in cities or towns coded red (the highest risk level) if the state were still reporting using the old system. This is 155 out of the 351 cities and towns in the Commonwealth. But redefine the color codes, and presto, only 15% of the population in 16 cities and towns are in the red zone. Under the old criteria, only 8% of the population is now living in cities and towns classified as green or grey (the lowest risk levels). But under the new system, 37% of the population lives in those low-risk communities.

We get it, Charlie – you want to get students back into the classroom. But why not make your case with consistent measures of community risk over time? And while I’m at it, the travel restrictions are a joke. Only travelers from other states with case rates of fewer than 10 cases per 100,000 people per day are exempt from a 14-day quarantine on arrival in Massachusetts (not that anybody is taking this seriously anyway). This is from a state that has a case rate of over 15 cases per 100,000 people per day. Huh?

Massachusetts Reporting Change November 2, 2020

Well, the state did it again. That Charlie Baker certainly is sly. The state revised the daily dashboard to eliminate useful information and generally provide a more upbeat assessment of the state of coronavirus in Massachusetts. This is a quick summary of the changes as I’ve identified them so far. I will perhaps have more later as I work through the changes, and I will also have to modify the data updates to accommodate these changes.

There are three significant changes as far as I can tell. First, the state dropped information about the number of people tested for the first time each day, and the total number of people tested. This means that the newly tested positivity rate (which was significantly higher than the all test positivity rate) can no longer be determined, nor can the breakdown between newly tested individuals and repeat testers. While the state is still reporting new confirmed and probable cases, they are no longer reporting how many new people have been tested to determine a positivity rate. [Update: Likely my mistake. Although the state did drop this from the dashboard, I might have missed the first time tester information in the files to be downloaded. I still might be able to calculate this.]

Second, they eliminated the race/ethnicity report. This was a way to double check and properly scale the weekly breakdown of cases, hospitalizations, and deaths by age group. They are providing certain information by age, but I don’t think at first glance that it will be sufficient to continue to do this calculation.

Finally, they eliminated suspected hospitalizations from the hospital count. Thus, you will see a significant drop in hospitalizations today. That is fake news. In fact, 33 confirmed patients were added to the total today.

Disappointing on the whole. In my opinion, the state should be adding information to the report, not eliminating it.


Massachusetts Covid Breakdown by Age Part I: Methodology

Since late August, I’ve wanted to perform an analysis of Covid cases, hospitalizations, and deaths by age cohort. Unfortunately, the reporting of this information by the state is (1) not transparent, (2) internally inconsistent, and (3) sometimes clearly incorrect. I’ve spent time over the last month attempting to compile data from the weekly public health reports to which this information has been relegated. That work has been frustrating, to put it mildly.

The remainder of this post details the issues with the age cohort data provided by the state, and what I’ve done to calculate estimates of these important coronavirus statistics from the data provided. Parts of this post are technical in nature, so skip to the following posts for the bottom line.

Through August 11, the state provided a daily summary of cumulative cases, hospitalizations, and deaths by eight different age cohorts (and one group for unknown age) on its dashboard. Since then, all information by age cohort has been included only in the weekly public health report. In addition, the state dropped its cumulative reporting, and now provides age-based summaries for the prior two weeks only. This makes weekly tracking difficult, as one week rolls off and a new week is added to the summary in each report.

Fortunately, the state continues to provide a daily breakdown of cumulative cases, hospitalization, and death counts by race/ethnicity. Through August 11, the race/ethnicity total counts matched the total counts in the age cohort report as well as the aggregate totals for cases and deaths shown in the dashboard (confirmed and suspected).  After August 11, the state dropped the reporting of suspected cases and deaths from the daily dashboard as well. 

As an aside, the data aggregators covid tracking and worldometers began to use the race/ethnicity report to tabulate cases and deaths in Massachusetts, as it was the only source of confirmed and suspected cases available on a daily basis. (The state added back probable cases and deaths to the daily Dashboard report in early September, but these data are no longer on the front page.)

The race/ethnicity totals match the case and death totals reported by the state each day, but the two week totals in the weekly public health report by age cohort do not match the figures for the equivalent period in the race/ethnicity report. Table 1 shows these discrepancies starting August 8.

Table 1: Massachusetts Reporting of Total Cases, Hospitalizations and Deaths
Comparison of Weekly Public Health Reports to Daily Race/Ethnicity Report
August 8 to October 3, 2020

                    From Daily Reports                From Weekly Reports
Two Weeks Ending    Cases   Hospitalized   Deaths     Cases   Hospitalized   Deaths
8-Aug               5443    231            211        3912    116            14
15-Aug              5159    240            200        4856    107            180
22-Aug              4649    212            200        4728    82             200
29-Aug              4476    186            208        4398    78             200
5-Sep               4830    196            220        4716    91             190
12-Sep              4570    174            187        4785    81             176
19-Sep              4985    190            179        5126    97             184
26-Sep              5510    211            195        5947    124            202
3-Oct               7122    208            212        7672    133            223

Table 1 clearly shows the mismatch between the totals from the two reporting sources. In particular, the death total for August 8 (this is not a typo) and the new hospitalization totals for the entire period stand out as particularly inconsistent. Hospitalizations appear to be significantly underreported in the weekly report, both in comparison to the race/ethnicity report and to the new hospitalizations reported independently by hospitals (not shown here).

Calculating accurate estimates is complicated by another factor: on September 2, the state changed its definition of probable cases and eliminated 8,050 cases, 26 deaths, and roughly 100 hospitalizations from the historical count.  Fortunately, the state did provide a back history for the changes in cases and deaths, so that these figures can be adjusted accordingly.  The state did not provide a back history for change in hospitalizations, so the 100 figure is an estimate. And while the state did provide a back history for total cases and deaths, it did not provide revised figures by age cohort.

This data definition change is why Table 1 is broken into three three-week periods.  The first period, through August 22nd, is before the state made the change, so the figures shown for those dates are the actual numbers as reported, not the adjusted numbers,  in order to show equivalent totals for comparisons between the two sources.  (In my estimates later, I do adjust all figures downward). 

The second period is a “transition period” that reflects these definition changes. The August 29th figures from the weekly report (released on September 2nd) were already adjusted, but the daily reports were not. The following two weeks, through September 12th, contain data both before and after the definition change. Therefore, the weekly data is as reported, and the daily data has been adjusted through September 1 to reflect the case definition changes.

The figures for the final three-week period are as reported, because the case definitions for both reports are aligned once again. (In this final period, the weekly reported figures for cases and deaths are always higher than the daily reported figures. It almost appears that the state is erroneously using a 15-day total, rather than a 14-day total.)

To reflect all of this, I used the following approach to estimate cases, hospitalizations, and deaths by age cohort.  First, all the data prior to September 2nd has been adjusted to reflect the definition change for probable cases.  Second, prior to and including August 8th, I derived weekly figures by simply summing daily figures.  (This means that I do not have to rely on the August 8th weekly public health report, as the 14 deaths reported there are clearly wrong).  Finally, starting August 15th, I used the following approach:

(1)  For each two week period, calculate the total number of cases, hospitalizations, and deaths over that period from the race/ethnicity report.

(2) Scale the age cohort figures in the weekly report for each statistic so that the totals calculated match that for the same period from the race/ethnicity report. For example, suppose there are 200 total deaths over a two week period from the race/ethnicity report, but 160 deaths reported for that same period in the weekly age cohort report. Furthermore, suppose there are 20 reported deaths for people aged 60 through 79. This means that I calculate 25 deaths for that age cohort for that period (200 / 160 * 20).

(3) Subtract off the figures calculated for each age cohort for the prior week for each statistic to derive an estimate for the current week.  Because I have actual daily data from the race/ethnicity report for the week ending August 8th, I have a starting point for the August 15th calculation.

This approach ensures two things.  First, the percentages by age cohort for each statistic are preserved for each two-week period.  Second, total cases, hospitalizations, and deaths match the totals reported by the state for each two-week period. 
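A minimal Python sketch of steps (2) and (3) may make the arithmetic easier to follow. It is an illustration under stated assumptions, not the spreadsheet actually used: the function names are made up, only three age cohorts are shown instead of the state's eight, and every number other than the 200 / 160 / 20 figures from the example in step (2) is invented.

```python
def scale_to_total(cohort_counts, target_total):
    """Step 2: rescale cohort counts so they sum to target_total.

    Each cohort keeps its share of the weekly-report total.
    """
    reported_total = sum(cohort_counts.values())
    if reported_total == 0:
        raise ValueError("weekly report total is zero; nothing to scale")
    factor = target_total / reported_total
    return {cohort: count * factor for cohort, count in cohort_counts.items()}


def subtract_prior_week(two_week_estimate, prior_week_estimate):
    """Step 3: current week = scaled two-week figure minus the prior week."""
    return {
        cohort: two_week_estimate[cohort] - prior_week_estimate.get(cohort, 0.0)
        for cohort in two_week_estimate
    }


# Step 1 (assumed done elsewhere): two-week deaths from the race/ethnicity report.
daily_report_total = 200

# Weekly report: 160 deaths in total, 20 of them aged 60-79 (as in the example).
# The split of the other 140 deaths is invented for illustration.
weekly_report = {"under 60": 10, "60-79": 20, "80+": 130}

scaled = scale_to_total(weekly_report, daily_report_total)
print(scaled["60-79"])  # 200 / 160 * 20

# Invented prior-week estimates, only to show the subtraction.
prior_week = {"under 60": 6.0, "60-79": 12.0, "80+": 80.0}
this_week = subtract_prior_week(scaled, prior_week)
print(this_week)
```

Nothing in this sketch stops the subtraction from going negative when the prior-week estimate for a cohort exceeds the scaled two-week figure.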

The third step, the subtraction, seems to lead to more volatile weekly changes than one might expect, and is probably the weakest part of the approach.  This is particularly true for hospitalizations, for which the weekly data is most suspect.  In fact, for the August 15th calculation, which blends together daily age data with weekly data, a naive implementation leads to negative hospitalizations and deaths for the 80 plus group for that week.  Quite simply, I fudged some numbers  there to make the numbers seem more reasonable.

The next several posts will use the estimates calculated this way to analyze information about cases, hospitalizations, and deaths by age cohorts.



Johns Hopkins Reporting Change: Tests and Positivity Rate

Some time in the past week, Johns Hopkins changed the way they report positivity rates for Massachusetts. (Since I don’t check the site every day, I’m not sure exactly when this occurred, but it did occur recently). Formerly, Johns Hopkins used a method that focused on individuals – taking total new cases (confirmed and suspected) and dividing that by the number of new individuals tested with a molecular test. They performed this calculation on a “reported day” basis – updating their numbers as new data is reported and adjusted by the Commonwealth. This is different from a calculation on an “as-of-date” basis, which looks at the date that the test is performed, not when it is reported. More on that later.

With the new method, Hopkins still uses total new cases (including suspected) as the numerator in the calculation. However, they are now using total molecular tests performed as the denominator. This has greatly reduced Hopkins’ calculated and reported positivity rate. As of September 14, Hopkins now shows a 7 day positivity rate of 0.74% in Massachusetts – 2,289 new cases divided by 310,742 molecular tests.

This is even lower than that reported by the Commonwealth (0.83%). Why is that? First, the state uses “as-of-date” calculations, looking back over the past seven days at the number of tests performed on each day for which results have been reported. This is 2,279 new confirmed cases divided by 275,565 molecular test results. Second, as just noted, the state only uses confirmed cases in the numerator (this lowers the positivity rate). Finally, the state lags the data one day. While 32,467 test results were reported on September 14th, there were only 20 tests performed on September 14 for which results had been reported in time to be included in the September 14th report – hence the state uses the 7 days ending September 13th for reporting positivity rates.
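The two headline rates can be reproduced from the figures quoted above with a couple of lines of Python. This is only a check on the arithmetic; the helper name is made up, and the sketch says nothing about how either organization actually builds its seven-day windows.

```python
def positivity_pct(positives, denominator):
    """Positivity rate expressed as a percentage."""
    return 100.0 * positives / denominator


# Hopkins: all new cases (including suspected) over all molecular tests.
hopkins = positivity_pct(2289, 310742)

# Commonwealth: confirmed cases over molecular test results, as-of-date basis.
state = positivity_pct(2279, 275565)

print(f"Hopkins: {hopkins:.2f}%")
print(f"State:   {state:.2f}%")
```

The numerators differ by only ten cases, so almost all of the gap between the two rates comes from the denominators.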

From Hopkins directly, summarizing their calculation (emphasis added). Note Hopkins’ preference for a measure based on individuals, not tests.

“Positivity Rates: Our calculation, which is applied consistently across the site and predates most states’ test positivity tracking efforts, looks at number of cases divided by number of negative tests plus number of cases. We feel that the ideal way to calculate positivity would be number of people who test positive divided by number of people who are tested. We feel this is currently the best way to track positivity because some states include in their testing totals duplicative tests obtained in succession on the same individual, as well as unrelated antibody tests. However, many states are unable to track number of people tested, so they only track number of tests. Because states do not all publish number of positive and number of negative tests per day, we have no choice but to calculate positivity via our approach. We describe our methodology as well as our data source (COVID Tracking Project) clearly on the site.”


Massachusetts Reporting Change: Probable Cases

On September 2, Massachusetts once again changed its coronavirus reporting – this time changing the definition of “probable cases”. According to the Massachusetts Dashboard “The previous case definition defined probable cases as individuals: with a positive antigen or serology test AND symptoms or likely exposure; with COVID-19 listed as an underlying or contributing cause of death on a death certificate; and with appropriate symptoms and likely exposure.”

However, “The new case definition updates the clinical criteria associated with COVID-19; defines probable cases as individuals: with a positive antigen test, with COVID-19 listed as an underlying or contributing cause of death on a death certificate, or with appropriate symptoms and likely exposure.” Furthermore, “the criteria indicating likely exposure are now restricted to known contact with a case or association with a specific outbreak. Individuals with positive serology (antibody) tests have been placed in a new suspect category which is not reportable to CDC.”

The Commonwealth indicates that the new reporting standard is more objective, is able to be more consistently applied through time, and brings Massachusetts’ reporting of probable cases more in line with the reporting standards of other jurisdictions. (Note that not all jurisdictions even report probable cases).

Significantly, all of the prior reporting on probable cases, hospitalizations, and deaths has been adjusted and backdated for this change. What are the ramifications of this? First, the Commonwealth reduced the number of probable cases substantially – from 9,755 under the old standard to 1,705 under the new standard (a reduction of 8,050).

The chart below shows the weekly change in probable cases over time. Most of the cases dropped were from May, when the pandemic was raging in Massachusetts. However, a substantial number of cases from June through August were also eliminated.

The number of probable deaths from Covid19 also dropped, but much less substantially – from 233 to 207 (26 deaths). The eliminated probable deaths are spread relatively uniformly over time. Ironically, however, the first death reported in Massachusetts, which occurred on March 10, was a probable case that was eliminated. I have adjusted my reporting statistics where possible to reflect these changes in cases and deaths.

The other consequence of this change is that it impacts how Massachusetts stands relative to other states in the case horse race (a horse race nobody wants to win). Early on, Massachusetts was third in the country in the number of cases, trailing only New York and New Jersey. As our case rates dropped substantially, and the pandemic spread to the Sun Belt, we had dropped in the rankings to 13th in the total number of cases, and 17th in cases per capita (per the worldometers aggregation site). We are now ranked 16th in total cases and 22nd in cases per capita.

Unfortunately, our per capita death ranking remains unchanged – we have the 3rd highest per capita death rate, trailing (once again) only New York and New Jersey.


Calculating Covid-19 Positivity Rates

On August 12, 2020, the Commonwealth of Massachusetts changed the way that it calculated the positivity rate from covid-19 testing. This reduced the headline positivity rate from covid-19 testing in Massachusetts. It is also misleading, and counter to the way in which most aggregators calculate positivity rates. This post serves to explain the differences between this calculation and the two other more widely used methods.

In all cases, the positivity rate is a simple ratio between two numbers. The numerator uses some definition of cases or positive test results. The denominator uses some definition of tests or tested people. The positivity rate is just (Cases or Test Results) / (Tests or Tested People) (multiplied by 100 to convert to percent, if desired). In Massachusetts, and generally elsewhere, the results are restricted to molecular (PCR) tests, not antibody or antigen tests.

Focus on Tests (Massachusetts Method)

The Massachusetts method changes the focus of positivity rates from individuals to tests. It simply uses the ratio of the number of positive tests to the number of tests performed. The upshot of this is that repeat testers can have a significant impact on the positivity rate. For example, if a professional athlete gets tested every day, all of those tests increase the denominator in the calculation (regardless of whether the athlete tests positive or negative on any given test).

If someone is retested frequently, and never gets a positive test result, this reduces the positivity rate compared to a calculation in which each person is counted only once. The flip-side of this is that if an individual tests positive, and comes back to be tested again a week later, and tests positive a second time, this will again count as a positive test. Hence, there is no guarantee that this method will result in a lower positivity rate, but it generally does. Most of the people being retested are presumably doing so for health and safety reasons, and not because they suspect they have Covid or have tested positive in the past.

In Massachusetts, the number of people being retested is quite significant. For example, as of this writing, about 30-35% of the tests being conducted in Massachusetts over the past several weeks are for people being retested. These people have a much lower positivity rate than people being tested for the first time – the trailing 7-day positivity rate for the re-testers is 0.4% (as of August 27) compared to 1.5% for those who have only been tested once.

Focus on Individuals Tested (Standard Method)

A more common approach to calculating positivity is to focus on individuals, not tests. Until August 12th, this is the way that Massachusetts calculated positivity rates. Each person is only counted once both for determining the numerator (with this method the number of cases) and denominator (with this method the number of individuals tested) in the positivity rate calculation.

To focus on the professional athlete or safety professional again, this means that each retest does not change the denominator of the calculation, regardless of the test results. Each person is only counted once, regardless of how many times they are tested. If an individual does test positive, this will count as a positive case, even if they are tested two weeks later and then get a negative test result. This retest with a negative result does not increase either the numerator or denominator in the calculation.

However, if a person has a negative test result, but comes back later and tests positive, the numerator will increase by one person, but the denominator doesn’t change. (This all presupposes nobody is reinfected with Covid. Although there have been recent validated reports of reinfection, this remains exceedingly rare as of now).
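To make the difference between counting tests and counting individuals concrete, here is a small Python sketch on invented test records. The record layout, names, and data are all hypothetical, and the sketch simplifies by treating the whole list as a single reporting window in which nobody was tested beforehand.

```python
from typing import NamedTuple


class Record(NamedTuple):
    person: str
    day: int
    positive: bool


def rate_by_tests(records):
    """Positive tests / all tests: repeat testers count every time."""
    return sum(r.positive for r in records) / len(records)


def rate_by_individuals(records):
    """People with any positive test / people tested: each person counts once."""
    people = {r.person for r in records}
    positive_people = {r.person for r in records if r.positive}
    return len(positive_people) / len(people)


records = [
    # An athlete tested on four days, always negative.
    Record("athlete", 1, False),
    Record("athlete", 2, False),
    Record("athlete", 3, False),
    Record("athlete", 4, False),
    # One person tests positive, then negative on a later retest.
    Record("patient_a", 1, True),
    Record("patient_a", 3, False),
    # Two people tested once each, both negative.
    Record("patient_b", 2, False),
    Record("patient_c", 2, False),
]

print(rate_by_tests(records))        # 1 positive test out of 8 tests
print(rate_by_individuals(records))  # 1 positive person out of 4 people
```

With these records, the athlete's repeat negatives and the later negative retest all inflate the test-based denominator, while the individual-based calculation ignores them.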

Include Suspected Cases (Enhanced Method)

The final method to calculate positivity rates is to include suspected cases in the calculation. Essentially, this method (used by Johns Hopkins, among others, in its calculations) assumes that all individuals suspected of having Covid, if tested, would test positive. By definition, this increases the positivity rate compared to the standard method. Specifically, the calculation is:

(Individuals with Positive Covid Tests + Probable Cases) / (Individuals Tested for Covid + Probable Cases).

In other words, the number of probable or suspected cases is added to both the numerator and denominator in the calculation. In Massachusetts, the number of probable cases is not insignificant. Over the life of the pandemic, almost 8% of all confirmed and probable cases have been probable, and over the past several weeks, this figure has ranged between 15% and 20% of confirmed and probable cases. Hence, the positivity rate calculated this way has been noticeably higher than calculated the standard way.
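A short Python sketch shows why adding probable cases to both sides of the ratio raises the rate. The function names and all three input numbers are hypothetical, chosen only so that probable cases make up roughly one in six of the combined cases.

```python
def standard_rate(positive_individuals, individuals_tested):
    """Individuals with a positive test / individuals tested."""
    return positive_individuals / individuals_tested


def enhanced_rate(positive_individuals, individuals_tested, probable_cases):
    """Add probable cases to both the numerator and the denominator."""
    return (positive_individuals + probable_cases) / (
        individuals_tested + probable_cases
    )


# Invented figures for illustration only.
positives, tested, probable = 150, 10_000, 30

standard = standard_rate(positives, tested)
enhanced = enhanced_rate(positives, tested, probable)

print(f"standard: {100 * standard:.2f}%")
print(f"enhanced: {100 * enhanced:.2f}%")
```

Because the same count is added to the top and the bottom, the enhanced rate is higher than the standard rate whenever the standard rate is below 100% and there is at least one probable case.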


(1) Testing Based Calculation (Massachusetts)

All Positive Molecular Tests / All Molecular Tests Performed.

(2) Individual Based Calculation (Standard)

All Individuals with Positive Molecular Test / All Individuals with Molecular Test.

(3) Enhanced Calculation (Include Probable)

(All Individuals with Positive Molecular Test + Individuals Probably Infected) / (All Individuals with Molecular Test + Individuals Probably Infected).