Tuesday, November 23, 2010

Jewish Survey of the Hour: Rochester, NY community study (Jewish Federation of Rochester and Rochester Research Group)

Apparently, today brought a bumper crop of surveys. No sooner had I finished commenting on JESNA's research than a note arrived from the North American Jewish Data Bank heralding a new community study of Rochester, New York. Reading it moved me further away from any attempt at dispassionate analysis. I will, however, continue to organize my comments loosely around the AAPOR minimum disclosure standards. In case you are wondering whether my comments are fueled by sour grapes: that note was the first I had heard of the study.

Who sponsored the research study, who conducted it, and who funded it, including, to the extent known, all original funding sources.

Research was sponsored by the Jewish Federation of Rochester. Who conducted it is less clear: the report is credited to Jocelyn Goldberg-Schaible of the Rochester Research Group, but data collection is also recorded as having been conducted by volunteers.

The exact wording and presentation of questions and responses whose results are reported.

Mostly. The study does a very good job of this, although items such as motivations for charitable giving are reported in order of preference, not the order in which they were displayed to respondents (if the items were rotated, no indication is provided).

A definition of the population under study, its geographic location, and a description of the sampling frame used to identify this population. If the sampling frame was provided by a third party, the supplier shall be named. If no frame or list was utilized, this shall be indicated.

Yes. The study's frame is reported, although the efforts by which initial cases were found are not covered in depth.

Even by the combative standards of the field, the authors are extraordinarily quick to praise their own work and damn that of others. The introduction proclaims that "Ours was not a boiler-plate survey designed generically and then tweaked a bit for our community - ours was a survey designed by and for our own community. There is a significant difference there, and part of that difference is its fundamental intent. And that fundamental intent, once again, was not about counting" (p. 2). I'm not sure what Jack Ukeles would say about that. Having spent nine months of my life working on the questionnaire for the 2005 Boston Jewish community study with Combined Jewish Philanthropies' community study committee, I can say that this didn't apply to my work.

In a similar vein, the authors write that the fact that the statistics reported are estimates "...is, by the way, always the case in community demographic surveys, irrespective of methodology and whether it's telephone-based, online, or even face-to-face, even when charts and graphs imply more specificity than they can actually support, by providing numbers with several decimal places, and by performing convoluted analysis on those numbers" (p. 3; emphasis in original). Apparently lacking a well-developed sense of irony, the authors round their population estimates to the nearest five (while rounding those of earlier Rochester surveys to the nearest hundred, an implicit claim of a twentyfold increase in accuracy), mean years of residency to the nearest tenth of a year (p. 37), and mean numbers of Jewish friends to the nearest hundredth (pp. 61-63, 65). (The basis on which the population estimates were derived is not explained and remains, frankly, a mystery to me.)

A description of the sample design, giving a clear indication of the method by which the respondents were selected (or self-selected) and recruited, along with any quotas or additional sample selection criteria applied within the survey instrument or post-fielding. The description of the sampling frame and sample design should include sufficient detail to determine whether the respondents were selected using probability or non-probability methods.

It's evident from the material that this was a nonprobability design: all initial seeds appear to have been self-selected (e.g., having heard about the survey from advertisements, communal newsletters, and so on).

Sample sizes and a discussion of the precision of the findings, including estimates of sampling error for probability samples and a description of the variables used in any weighting or estimating procedures. The discussion of the precision of the findings should state whether or not the reported margins of sampling error or statistical analyses have been adjusted for the design effect due to clustering and weighting, if any.

Sample sizes are reported (n=1,913, plus 421 "usable" partials; "usable" is never defined), as are sampling error estimates: "With 2,234 respondents [and another 100 college students attending local colleges] our overall sample is conservatively associated with a precision interval (or margin of error) of +/-3% at the 95% confidence level, suggesting that our findings and projections should be within 3% of what we would have found if everyone in our Jewish community had participated" (p. 12).
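
Where the +/-3% figure comes from is not stated, but it is about what the textbook margin-of-error formula yields for a simple random sample of this size. A minimal sketch in Python (the function and the conservative p = 0.5 are my own illustration, not anything taken from the report):

import math

def srs_margin_of_error(n, p=0.5, z=1.96):
    # Margin of error for a proportion under simple random sampling,
    # using the conservative p = 0.5 and the 95% z-score of 1.96.
    # Valid only when every member of the population has a known,
    # nonzero chance of selection.
    return z * math.sqrt(p * (1 - p) / n)

print(round(srs_margin_of_error(2234) * 100, 1))  # ~2.1%, comfortably inside the claimed +/-3%

As discussed below, the problem is not the arithmetic but that the formula's assumptions do not hold for this sample.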

These "margins of error" are misleading, however. Sampling error is calculable in surveys for which the mode of selection of units is understood. We know how random selection works--with a population of size N and a sample selected of size n, the probability of a unit being selected is n/N. When we mess around with the probability of selection in one way or another (e.g., stratification or clustering), as long as we can calculate the probability of selection, we can work our way back to confidence intervals and the use of inferential statistics. The same holds true when randomize experimental subjects into treatment and control groups. Even if we cannot directly randomize, as long the statistical process generating the observed data can be accurately modeled, we can use the apparatus of inferential statistics. That isn't the case here. Snowball sampling (as occurred here) does not supply it (Berg 1988; Eland-Goosensen et al. 1997; Friedman 1997; Kalton 1983; Kalton and Anderson 1986; Spreen 1992 [citations from Sagalnik and Heckathorn 2004]).

The closest analogy to the methods applied here that has achieved significant acceptance as capable of generating meaningful estimates of sampling error is respondent-driven sampling (RDS; Heckathorn 1997, 2002, 2007; Salganik and Heckathorn 2004; Volz and Heckathorn 2008). For RDS to work, though, all sections of the population must be connected with one another; long chains of referrals are needed to reach parts of the social network with zero probability of selection in the initial wave; respondents must report their "degree" (the number of people they know who share the characteristic salient for sample selection); and numbered coupons must be used to record who recruited whom and the social network on which selection was based (Salganik and Heckathorn 2004).
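
For the curious, a minimal sketch of the degree-weighted (RDS-II, Volz-Heckathorn style) estimator shows why reported degree is indispensable: well-connected respondents, who are more likely to be recruited, are weighted down accordingly. The function and toy numbers are mine, not the Rochester study's:

def rds_proportion(values, degrees):
    # RDS-II style estimate of a population proportion: each respondent's
    # 0/1 response is weighted by the inverse of their reported degree
    # (network size). Without degrees, no such correction is possible.
    weights = [1.0 / d for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Toy example: the naive proportion is 0.5, but down-weighting the two
# well-connected respondents pulls the estimate down to roughly 0.12.
print(rds_proportion(values=[1, 1, 0, 0], degrees=[50, 40, 5, 8]))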

This is not the case here: the initial seeds were self-selected; we have no evidence from the report that long chains of referrals were achieved; respondents were not asked their degree (and it is doubtful whether most people could answer this accurately); and we do not know which cases referred other cases. Even RDS, far more credible than the Rochester study, has had doubt cast on its accuracy in recent years (Gile and Handcock 2010; Goel and Salganik 2010).

Nevertheless, the authors describe their sample (unjustifiably, in my view) as "extremely solid" (p. 3), "robust" (pp. 7, 9, 17), "highly robust" (p. 11), and "statistically robust beyond all expectations" (p. 12), as well as "inclusive" (pp. 7, 10), "more inclusive" (p. 11), "more inclusionary" (p. 69), "highly inclusive" (pp. 6, 8), and having "broad-based inclusiveness" (p. 11), not to mention being "impressive" (p. 17) and "very clever" (pp. 6, 8). There seem to be two reasons for this extraordinary self-assurance: sample size and inclusiveness.

The sample size is heavily hyped by the authors. "For the sake of comparison, national election polling predictions are based on smaller samples than ours. Communities like Philadelphia, whose Jewish population is roughly ten times larger than our own, recently completed their demographic study with a sample roughly half as large as ours. And ten years ago, when Rochester undertook its last demographic study, our sample was less than one-third as large as ours is today" (p. 12). Sample size is important to a survey inasmuch as it reduces sampling error; but since, as I described above, this survey's attributes do not allow sampling error to be accurately estimated, a large sample is of little avail.

Beyond sampling error, surveys are subject to two serious forms of potential error associated with acquiring a sample: coverage error and nonresponse error. These appear to be of little concern to the authors because of the sample's aforementioned inclusiveness. The only evidence for the study's inclusiveness that I read in the report is that (a) the study had a field period of eight weeks (p. 7); (b) it received "a constant parade of media and PR spotlights" (p. 7); (c) it included Jews of different ages (p. 11); (d) it included interfaith households, GLBT Jews, old Jews, new residents, highly affiliated Jews, marginally affiliated Jews, unaffiliated Jews, Jews living in central areas, and Jews living in outlying areas (p. 7); (e) volunteers were available to help (p. 11); and (f) older Jews finished the survey without assistance within a week of it beginning (p. 11). This culminates in the claim (g) that "[i]t seems safe to say that few Jews in the greater Rochester area ended those eight weeks unaware that the Count Me In survey was taking place" (p. 7).

Nonresponse always carries a concern regarding bias, and it seems entirely conceivable that the people most likely to respond would be those with the greatest interest in the subject matter. These concerns are heightened when the survey is accompanied by a PR campaign whose effect is to increase the salience of the survey for the most connected Jews; a lower response rate can even be a good thing if it lessens bias. With all due respect to the "parade of media and PR spotlights," it is also possible that there was coverage error: Jews who would have been eligible for the survey but were unaware of it were systematically excluded. This is a risk inherent in the survey's approach, which relied on respondents to come to it rather than seeking them out, as virtually all other Jewish population studies do.

Indeed, nonresponse and/or coverage error apparently did occur: "Might we, as a result of this approach, have ended up with a higher proportion of affiliated Jews than we did in 2000? We probably did. By opening the survey's gates to all who chose to enter, we have over three times as many total participants, and those most involved Jewishly were most apt to take part" (emphasis in original). The estimated proportion of people who were never synagogue members dropped by almost half (p. 68), as did the estimate of intermarried households (p. 69), a glaring indicator of bias.

All this is apparently excused by the survey's inclusionary (i.e., self-selecting) nature. "The fact remains," the authors write, "that within this year's sample we also have significantly more non-affiliated participants than we did in 2000" (p. 68; emphasis in original). Further, the reader is told that "[i]n 2010, we have not turned our backs on the unaffiliated, and have in fact included them in far larger numbers than they were included in 2000 via RDD and DJN [Distinctive Jewish Names] telephone-based approach. It's just that alongside these non-affiliated respondents are a robust cohort of those more affiliated Jews who in the past would never have had the chance to be 'counted in', and this time around, via 2010's more inclusionary online methodology, were provided with that opportunity" (p. 69; emphasis in original). The decision-making is perhaps seen more clearly earlier: "Perhaps the best aspect of this sampling strategy was its broad-based inclusiveness. This year, everyone who wanted to participate had the opportunity to do so. In contrast with a telephone-based survey that works from a pre-determined list and/or Random Digit Dialing [RDD] and/or a set of Distinctively Jewish Names [sic]...no one in our community with ideas and opinions and experiences to share was left out of this survey. Ours was truly a COMMUNITY survey--and everyone who took part now gets to feel that their ideas and opinions and experiences have indeed been COUNTED IN" (p. 11; USE OF CAPITALS in original).

The fact that the survey's estimates of community priorities, age composition, religious behavior, and likely every other topic reported were biased toward the views, behaviors, and characteristics of the more affiliated as a result of the survey's methodology apparently did not enter into consideration. This is utterly wrongheaded for a study whose goal is to provide "actionable insight" for organizations' planning processes (p. 2). Bad data are a dangerous basis for decision-making. Inclusivity and feeling counted are indeed virtues, but never, ever, at the expense of accuracy.

On first looking at this research, my reaction was that the authors had done a wonderful job of going back to the 1930s. This is not damning with faint praise, as readers of my dissertation (all 1.4 of you) will be aware. As probability samples with close to full coverage of the Jewish community are financially out of reach for most Jewish communities these days, we are in the same place we stood in the 1930s and 1940s, before robust methods were developed for sampling human populations. Those studies focused on enumerating Jewish communities rather than estimating (in a probabilistic, statistical sense), with the best of them having well-thought-out procedures to find as many Jews as possible in a community. Such an approach could reasonably be justified. However, the sheer puffery and overweening arrogance--I can think of no other word to describe it--of the authors turned these thoughts into ashes in my mouth. (I'm no slacker when it comes to professional self-regard, and the same goes for other Jewish researchers. But our posturing, boasting, and deprecation of each other are usually confined to private conversations and email exchanges, conference presentations, and the odd journal article, rather than to community study reports, which are, as they ought to be, client focused.)

Better reports (demonstrating my self-regard, I will shamelessly put up my own work on the 2005 Boston Jewish community study as an example) balance their self-promotion with appropriate humility about potential errors. While we said that our study provided "a rich portrait" of the community (p. 1) and "breaks new ground" (p. 21), and lauded the research contractor's "high quality work" (p. 21)--honestly, the only examples of preening I could find in the summary report--we also pointed out the underestimates of young adults and Russian Jews (pp. 21-22).

In the case of Rochester, though, not the slightest hint of potential error (other than the mischaracterized confidence intervals) disrupts the fanfaronade that is this report. And it is a shame. Surveying smaller Jewish communities, never easy, has become extraordinarily difficult, and the task of providing representative samples virtually impossible. Collecting nearly 2,000 completed surveys is no mean feat and likely reached much of the affiliated core of Rochester's Jewish population. There would have been considerable merit in the study's approach had its goal been enumeration, had the authors been forthright about the shortcomings of their data, and had responses been disaggregated by likely bias: affiliated, less affiliated, and unaffiliated Jews...there are, after all, reasons we undertake convoluted analyses.

References

Berg, S. 1988. "Snowball Sampling." Pp. 528-32 in Encyclopedia of Statistical Sciences, vol. 8, edited by S. Kotz and N.L. Johnson. New York: Wiley.

Eland-Goosensen, M., L. Van De Goor, E. Vollemans, V. Hendriks, and H. Garretsen. 1997. "Snowball Sampling Applied to Opiate Addicts Outside the Treatment System." Addiction Research 5(4):317-30.

Friedman, S.R. 1995. "Promising Social Network Research Results and Suggestions for a Research Agenda." Pp. 196-215 in Social Networks, Drug Abuse, and HIV Transmission. NIDA Monograph Number 151. Washington, DC: National Institute on Drug Abuse.

Gile, Krista J. and Mark S. Handcock. 2010. "Respondent-Driven Sampling: An Assessment of Current Methodology." Sociological Methodology 40(1): 285-327.

Goel, Sharad and Matthew J. Salganik. 2010. "Assessing Respondent-Driven Sampling." Proceedings of the National Academy of Sciences of the United States 107(15):6743-47.

Heckathorn, Douglas D. 1997. "Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations." Social Problems 44(2):174-99.

Heckathorn, Douglas D. 2002. "Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations." Social Problems 49(1):11-34.

Heckathorn, Douglas D. 2007. "Extensions of Respondent Driven Sampling: Analyzing Continuous Variables and Controlling for Differential Recruitment." Sociological Methodology 37:151-208.

Kalton, Graham. 1983. Introduction to Survey Sampling. Beverly Hills, CA: Sage.

Kalton, Graham and D.W. Anderson. 1986. "Sampling Rare Populations." Journal of the Royal Statistical Society, Series A 149:65-82.

Salganik, Matthew J. and Douglas D. Heckathorn. 2004. "Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling." Sociological Methodology 34:193-239.

Spreen, M. 1992. "Rare Populations, Hidden Populations, and Link-Tracing Designs: What and Why?" Bulletin de Methodologie Sociologique 36:34-58.

Volz, Erik and Douglas D. Heckathorn. 2008. "Probability Based Estimation Theory for Respondent Driven Sampling." Journal of Official Statistics 24(1):79-97.
