Why Do the Data Synthesis Estimates Differ From Other Published Estimates?

The data synthesis provides a baseline estimate of the population derived from data sources that are completely independent of the biases that can be associated with local population reporting. Here we highlight four issues associated with local estimation: source bias, time bias, error estimation, and undercoverage.

Source Bias

Estimates published in the American Jewish Yearbook (AJYB) are not derived from a single method or source. They represent an amalgamation of multiple sources and methods, including directories, key informants, membership lists, internet searches, news reports, and, for some of the larger areas, local community surveys. Not every source informs every local estimate; rather, one local estimate might be based on a report from a key informant, another on a newspaper article, and another on a population survey. Unlike studies of other groups in the US, there is no external frame of reference against which to evaluate potential bias across these various sources. Even local surveys, although designed to provide a representative sample of the target area, can be biased. Typically, one gauges the degree of bias in a sample by comparing it to an external frame of reference such as the US Census. There are, however, no Census data on the Jewish population.

The data synthesis provides a snapshot of the population based on data sources that are all designed to provide nationally representative samples of the US adult population. Because they are designed to represent all adults, the samples can be evaluated against known external frames of reference such as the US Census. In addition, the data synthesis includes metadata describing the methodological characteristics of each data source, particularly characteristics that could be associated with bias in the estimates. Thus, potential bias associated with the source of data can be examined directly. To date, after adjusting for differences in sample composition based on comparisons to Census data, there has been near-zero variability in estimates of the Jewish population across the different data sources included in the data synthesis.
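
The adjustment for sample composition mentioned above can be illustrated with post-stratification. The sketch below is a minimal Python example, assuming a single hypothetical stratifying variable (college education) with made-up Census shares and sample counts. The function name `poststratified_share` is ours for illustration; the synthesis itself may use different variables, cells, or weighting methods.

```python
from collections import Counter


def poststratified_share(respondents, census_shares):
    """Estimate the share of adults who identify their religion as Jewish,
    reweighting each stratum to its (hypothetical) Census population share.

    respondents: list of (stratum, is_jewish) pairs from one survey.
    census_shares: dict mapping stratum -> share of the adult population.
    """
    n_by_stratum = Counter(stratum for stratum, _ in respondents)
    jewish_by_stratum = Counter(s for s, is_jewish in respondents if is_jewish)

    estimate = 0.0
    for stratum, share in census_shares.items():
        n = n_by_stratum[stratum]
        if n == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        # Within-stratum rate, weighted by the Census share rather than the
        # sample's own (possibly skewed) share of that stratum.
        estimate += share * jewish_by_stratum[stratum] / n
    return estimate


# Hypothetical survey that over-represents college graduates
# (60% of the sample versus an assumed 35% of adults).
sample = (
    [("college", True)] * 3 + [("college", False)] * 57
    + [("no_college", True)] * 1 + [("no_college", False)] * 39
)
census = {"college": 0.35, "no_college": 0.65}

raw_share = sum(is_jewish for _, is_jewish in sample) / len(sample)
adjusted_share = poststratified_share(sample, census)
print(f"raw: {raw_share:.4f}, post-stratified: {adjusted_share:.4f}")
```

Because the hypothetical college stratum has a higher Jewish share and is over-represented in the sample, the raw share of 4% drops to about 3.4% once each stratum is weighted to its assumed Census share.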

We are currently conducting a systematic study of differences between the data synthesis and local Jewish population surveys. In particular, we are examining whether local studies that yield estimates similar to the data synthesis differ systematically from those that yield different estimates. Factors under consideration include the purpose of the survey and targeted advertising. If the survey's purpose or other aspects of its methodology, such as advertising the survey to Jews in the community through Jewish organizations, increase the likelihood that Jews respond without commensurately increasing the likelihood that non-Jews participate, population estimates derived from the local survey can be biased upward.
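
The mechanism described in the last sentence is simple arithmetic. If a fraction p of adults in an area are Jewish, and Jews and non-Jews complete the survey at rates r_J and r_N, the Jewish share among respondents is p*r_J / (p*r_J + (1 - p)*r_N), which exceeds p whenever r_J > r_N. The Python sketch below uses hypothetical response rates chosen only for illustration.

```python
def observed_share(true_share, jewish_response_rate, other_response_rate):
    """Share of completed interviews that come from Jewish respondents when
    the two groups respond at different rates (no weighting applied)."""
    jewish = true_share * jewish_response_rate
    other = (1 - true_share) * other_response_rate
    return jewish / (jewish + other)


# Hypothetical area where 2% of adults are Jewish.
unbiased = observed_share(0.02, 0.15, 0.15)  # equal response rates
boosted = observed_share(0.02, 0.30, 0.15)   # outreach doubles Jewish response
print(f"equal rates: {unbiased:.4f}, differential rates: {boosted:.4f}")
```

With equal rates the sample share matches the true 2%; doubling the Jewish response rate alone pushes it to about 3.9%, nearly doubling the implied population. Post-stratifying on Census demographics cannot fully correct this, because the Census carries no information on religion.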

Time Bias

The sources reported in the AJYB differ not only in methodology but also in time frame, spanning years and sometimes decades. One community estimate might be based on data from 1957 (with assumptions about what changes might have occurred since), another on data from the previous year, and another on data from the previous decade. Some local Jewish community studies are carried out decennially, following the pattern of the US decennial Census. The US Census Bureau also prepares annual estimates through its Population Estimates Program (PEP) and the American Community Survey (ACS) because updated population estimates are important for programming, budget, and planning purposes. Although the Yearbook publishes estimates yearly, the methods for updating estimates drawn from the various time periods represented across the original sources are unclear, especially since data on births, deaths, and migration in these areas are not collected in a way that would enable one to determine rates of change in the Jewish population within them.

The data synthesis remedies this problem by standardizing the time period associated with the estimates. Data are aggregated over multiple years to increase the effective sample size needed to estimate the population at the county level, and are post-stratified to the most recent US Census data available. The number of years of data combined is limited to the most recent years and does not span decades. Thus, the synthesis provides a snapshot of the population within a specific time period.
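
Pooling recent years is what makes county-level estimation feasible: the standard error of a proportion shrinks roughly in proportion to one over the square root of the number of respondents. The sketch below pools hypothetical county-level counts from the most recent survey years and compares a single year with a three-year pool. It uses a simple-random-sampling standard error and omits the post-stratification step; `pooled_share` and the counts are illustrative assumptions, not the synthesis's actual procedure or data.

```python
import math


def pooled_share(yearly_counts, years_to_use):
    """Pool respondent counts for one county over the most recent survey years.

    yearly_counts: dict mapping survey year -> (respondents, jewish_respondents).
    years_to_use: number of most recent years to combine.
    Returns (share, standard_error, respondents).
    """
    recent_years = sorted(yearly_counts)[-years_to_use:]
    n = sum(yearly_counts[year][0] for year in recent_years)
    k = sum(yearly_counts[year][1] for year in recent_years)
    share = k / n
    # Simple-random-sampling approximation; ignores weights and design effects.
    standard_error = math.sqrt(share * (1 - share) / n)
    return share, standard_error, n


# Hypothetical counts for one county; years outside the window are left out.
county = {
    2016: (120, 2),
    2017: (110, 1),
    2018: (130, 2),
    2019: (140, 3),
    2020: (125, 2),
}

one_year = pooled_share(county, 1)
three_years = pooled_share(county, 3)
print(f"1 year:  p={one_year[0]:.4f}, se={one_year[1]:.4f}, n={one_year[2]}")
print(f"3 years: p={three_years[0]:.4f}, se={three_years[1]:.4f}, n={three_years[2]}")
```

Limiting the pool to a fixed window of recent years keeps the estimate tied to a specific period. The trade-off is that a longer window lowers the standard error but blends in older data.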

Error Estimation

For many of the sources reported in the AJYB, such as reports from key informants or news stories, there is no way to gauge what kinds of errors might be associated with the estimate. For the major population areas where local Jewish community surveys have been conducted, the estimates carry some degree of error, and this error can be measured (see FAQ). The degree of error varies from community to community depending on how the survey was conducted, the sampling methods used, and other factors. When the various sources are combined, information about the error associated with each population estimate is lost. Errors from one source might counterbalance errors from others, or they might compound across sources and increase the bias or inaccuracy of the final estimate. A benefit of the data synthesis approach is that the uncertainty associated with each data source can be examined directly and factored into the final result if needed.
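
One standard way to carry measured uncertainty into a combined result is inverse-variance weighting, shown below as a minimal Python sketch. This is a textbook fixed-effect pooling technique chosen for illustration, not necessarily the method the data synthesis uses, and the two input estimates are hypothetical. It needs a standard error for every source, which is exactly what key-informant reports and news stories cannot supply.

```python
import math


def inverse_variance_pool(estimates):
    """Fixed-effect pooling of independent estimates of the same quantity.

    estimates: list of (estimate, standard_error) pairs.
    Returns (pooled_estimate, pooled_standard_error).
    """
    for _, se in estimates:
        if se is None or se <= 0:
            # A source with no measurable error cannot be weighted.
            raise ValueError("every source needs a positive standard error")
    weights = [1 / se**2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1 / total_weight)


# Two hypothetical survey estimates of the same area's Jewish share.
pooled, pooled_se = inverse_variance_pool([(0.020, 0.004), (0.030, 0.006)])
print(f"pooled share: {pooled:.4f} (se {pooled_se:.4f})")
```

The more precise source receives more weight, and the pooled standard error is smaller than either input. The method assumes the sources are unbiased and their errors independent; if a source is systematically biased, pooling does not remove that bias.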

Undercoverage

The AJYB provides estimates only where there is a known Jewish community, and thus it excludes many areas where Jews live but are unknown to informants or reside outside the target area of a local Jewish community. The data synthesis looks at distributions across hundreds of independent, representative random samples of the US adult population. A hallmark of this approach is that it covers all areas of the United States, not just large metropolitan areas or areas with "known Jewish communities". In contrast, methods that stitch together estimates of known Jewish communities are guaranteed to miss Jews who live outside the targeted search areas. It is akin to searching for one's keys under the street lamp, not because that is where they were dropped, but because that is where the light is better.

As an internal check on the validity of our estimates in areas where the data synthesis indicates there are Jews not represented in AJYB estimates, we examined data provided by the Birthright Research Group here at the Center for Modern Jewish Studies. We mapped the locations of applicants considered eligible for the program, that is, applicants who have at least one Jewish birth parent, or who have completed a Jewish conversion through a recognized Jewish denomination and are recognized as Jewish by their local community or by one of the recognized denominations of Judaism. The Birthright applicant data are not representative samples for purposes of population estimation, but they do provide convergent evidence that there are, indeed, Jews scattered throughout the US outside known Jewish population centers, including in areas where the data synthesis yields estimates higher than those reported in the AJYB.

For example, the data synthesis yields an estimate of around 3,500 people (~0.3%) in the Blue Ridge Mountain region of Georgia. The AJYB reports a total of only 300 people, in Lumpkin, Floyd, and Whitfield counties. The distribution of Birthright applicants clearly shows that the Jewish population is spread more widely across the region, and that estimates based solely on the known population in the three counties corresponding to places reported in the Yearbook underestimate the true population.

As another example, the Yearbook reports a population of 4,350 people in Guilford and Forsyth counties (Greensboro, High Point, and Winston-Salem). The Winston-Salem estimate is based on a Distinctive Jewish Names survey conducted in 2011, the High Point estimate on synagogue membership, and the Greensboro estimate on a 2009 report from a key informant. The data synthesis estimates that less than 1% (~0.6%, with a 95% CI ranging from 0.3% to 1%), or about 5,400 adults, in the entire Piedmont Triad area identify their religion as Jewish. This increases to just over 9,200 people if one assumes national averages for the proportion of children and for the proportion of Jewish adults who do not identify their religion as Jewish. Again, the distribution of Birthright applicants indicates that the Jewish population is more widely distributed than the "known" population centers. Taking into account the regions not represented in the AJYB estimates, as well as possible underestimates from the Distinctive Jewish Names survey for this area or for the university town of Greensboro, the data synthesis likely provides a more accurate estimate than is otherwise available.
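
The Piedmont Triad figures can be related with a little arithmetic. The Python sketch below treats the rounded numbers quoted in that example as exact and assumes the count of Jewish-by-religion adults scales proportionally with the estimated share (same adult base). Under those assumptions it derives the combined extrapolation factor implied by 5,400 adults and 9,200 people, and translates the 0.3% to 1% interval into counts. The helper `scale_interval` is a hypothetical name introduced here.

```python
# Figures quoted in the text for the Piedmont Triad (rounded in the source).
POINT_SHARE = 0.006            # ~0.6% of adults identify their religion as Jewish
CI_SHARE = (0.003, 0.010)      # reported 95% interval for that share
JEWISH_BY_RELIGION_ADULTS = 5_400
TOTAL_PEOPLE = 9_200           # after national-average adjustments


def scale_interval(count, point, interval):
    """Convert an interval on a share into an interval on a count, assuming
    the count is proportional to the share (same adult base)."""
    low, high = interval
    return count * low / point, count * high / point


# Combined factor implied by the two quoted figures (children plus Jewish
# adults who do not identify their religion as Jewish).
factor = TOTAL_PEOPLE / JEWISH_BY_RELIGION_ADULTS

adult_range = scale_interval(JEWISH_BY_RELIGION_ADULTS, POINT_SHARE, CI_SHARE)
total_range = tuple(x * factor for x in adult_range)
print(f"factor: {factor:.2f}")
print("adults:", [round(x) for x in adult_range])
print("total:", [round(x) for x in total_range])
```

This gives a factor of about 1.70, roughly 2,700 to 9,000 Jewish-by-religion adults, and roughly 4,600 to 15,300 people in total. Holding the factor fixed understates the total uncertainty, since the extrapolation itself relies on national averages that are less reliable than the adult estimate.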

The data synthesis approach resolves many of the limitations associated with source bias, time bias, error estimation, and undercoverage. New data are continually added, and only a subset of the most recent years is used to produce new estimates of the Jewish-by-religion population. Thus, estimates for specific time periods are based on data specific to those periods. The data synthesis approach has its own limitations. As noted elsewhere, the synthesis focuses on the most easily "measurable" or observable portion of the population: adults who identify their religion as Jewish. Extrapolating from this base number to the total population requires relying on data that are less reliable and need further development. Thus, we have more confidence in the estimated proportion of adults in each area who identify as Jewish in response to questions about religion than in the "total population" estimates. Another limitation is that more data can always be added, yet even with all the data added so far, some small areas remain difficult to estimate reliably. We continue to improve estimation for these areas.