Reporting very local health data—while protecting privacy

Making population health and demographic data available for local communities can create powerful opportunities. The common statewide and citywide public health statistics one often sees—COVID cases, vaccination rates, cancer rates, infant mortality, and so on—are useful for tracking overall trends. But such broad analyses do not necessarily help people understand their local experience or help organizations trying to devise mitigation or intervention strategies.

When public health data can be reported at a socially meaningful resolution, the local factors behind the statistics often emerge. "Neighborhoods," for example, are meaningful areas to city dwellers, and small adjoining neighborhoods can vary significantly in population composition. Something as simple as understanding age or household composition, or proximity to a health hazard, can help assign meaning to a statistic. Perhaps an organization is developing an initiative to increase immunization rates; an effective program will likely need to take into account the demographic, socio-political, and economic status of the target population—and, as such, likely be neighborhood-specific.

Publicly reporting health data for small areas, however, is difficult for privacy reasons. As a population gets smaller, the chance that an individual person behind a statistic can be identified goes up. Such a disclosure would be both ethically inexcusable and illegal: the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule forbids the disclosure of individuals' health information without consent. (Under some circumstances, including pandemic scenarios like COVID-19, exceptions may be granted for the sake of quick, thorough data dissemination.) The result is that most public health reporting is done only for large or highly populous areas.

A Solution

With the right math and computing power, small-population statistics can often be published, in some cases down to an area containing only 600 individuals. The key is to dynamically evaluate all the factors that go into a potential disclosure (the data variables, resolution, and presentation formats available) and present only those options that are secure. This is not particularly easy: the intersection of data across numerous time periods, geographic scales, and stratifications might allow inadvertent disclosure. For example, the imperfect overlap of county and postal code boundaries can expose population "slivers," and revealing case rates for one age group or sex may disclose rates for another age group or sex. The result might also be a set of statistics with more detail in some places than others: a visualization, for example, that allows someone to zoom in on some neighborhoods or ZIP codes but not others.
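To make the age-group example concrete, here is a minimal Python sketch of one common disclosure-control idea, small-cell suppression with a complementary step. It is an illustration only and not the method used in our system: the function name `suppress_small_cells`, the threshold of 11, and the sample counts are all assumptions made for this example. It also assumes the total for the area is published alongside the breakdown, which is what makes a single hidden cell recoverable by subtraction.

```python
from typing import Dict, Optional

# Hypothetical threshold, chosen for illustration only.
MIN_CELL_COUNT = 11


def suppress_small_cells(
    counts: Dict[str, int], min_count: int = MIN_CELL_COUNT
) -> Dict[str, Optional[int]]:
    """Return the counts with unsafe cells replaced by None.

    Assumes the overall total is published next to the breakdown.
    """
    # Primary suppression: hide every cell below the threshold.
    published: Dict[str, Optional[int]] = {
        group: (n if n >= min_count else None) for group, n in counts.items()
    }
    hidden = [group for group, n in published.items() if n is None]

    # Complementary suppression: a single hidden cell could be recovered
    # as (total - sum of visible cells), so hide the smallest visible one too.
    if len(hidden) == 1:
        visible = [group for group, n in published.items() if n is not None]
        if visible:
            smallest = min(visible, key=lambda group: counts[group])
            published[smallest] = None

    return published


# Made-up case counts by age group for one small area.
example = {"0-17": 4, "18-64": 120, "65+": 37}
print(suppress_small_cells(example))
# {'0-17': None, '18-64': 120, '65+': None}
```

In this made-up example only the youngest group falls below the threshold, yet the 65+ count is hidden as well, because publishing it together with the total would reveal the suppressed value. A real system has to run this kind of check across every combination of time period, geography, and stratification, which is where the difficulty described above comes from.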

Green River builds operational software; we are, after all, an engineering firm. So we took the logic behind the disclosure protection calculations we had devised and implemented it in software. The resulting system underwent a third-party verification process and is now in use on My Healthy Community, a health data portal for the Delaware Department of Health and Social Services.

If you would like to learn more, a white paper describing our approach in detail is here. Or let us know if you have any questions!