This section offers details on how data are collected and what types of data collections can be proposed.
1. To provide social scientists and health researchers with a new opportunity for original data collection and discovery.
2. To promote the study of state-level dynamics regarding health, institutions, and/or politics.
3. To promote exploration into variations among subpopulations within the United States, with regard to health, institutions, and/or politics.
4. To increase the speed and efficiency with which advances in social scientific theory and analyses can be applied to critical social problems.
5. To maximize financial efficiency by implementing multiple modules on a single survey, thereby reducing the average cost per study.
6. To provide secondary access to large samples of state-level data for scholars, educators, journalists, and students.
Sample Recruitment. Participants are recruited by 22 vendors who use a variety of strategies and incentives to maintain online respondent panels. Respondents are channeled to the surveys through PureSpectrum, an online polling platform that works with panel providers from all 50 states and D.C. PureSpectrum provides initial respondent identification, deduplication, and screening for quality, demographic qualifications, and location. Using the PureSpectrum API, the CHIP50 team has developed a system to launch and monitor multiple survey projects in all states and D.C. based on pre-specified quotas (to match the demographics of state populations) and qualifications. These data are thus based on large online non-probability samples. As discussed in the validation section, the sampling approach has been rigorously evaluated (e.g., performance relative to probability samples). The advantage is that each survey can, if needed, recruit 300-600 respondents per state, using demographic quotas that match the state population. The overall sample size for a typical CHIP50 study is 15,000-25,000.
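As a rough illustration of how demographic quotas might be derived for a single state, the sketch below splits a target sample size across categories in proportion to population shares. It is a minimal example under stated assumptions, not the CHIP50 system: the function name `allocate_quotas`, the age categories, and the percentage shares are all hypothetical, and the largest-remainder rounding rule is simply one reasonable choice.

```python
def allocate_quotas(shares, target_n):
    """Split target_n across categories in proportion to shares.

    Uses largest-remainder rounding so the quotas always sum to target_n.
    """
    total = sum(shares.values())
    exact = {k: target_n * v / total for k, v in shares.items()}
    quotas = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = target_n - sum(quotas.values())
    # Give the remaining slots to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - quotas[k], reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas


# Hypothetical age distribution for one state, in percent (not Census figures).
age_shares = {"18-29": 23, "30-44": 26, "45-64": 31, "65+": 20}

# A per-state target inside the 300-600 range mentioned above.
quotas = allocate_quotas(age_shares, 420)
print(quotas)
```

In practice quotas would be set on several demographic variables at once, and the shares would come from population data rather than the made-up values used here.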
Deduplication and Quality. In addition to the initial screening offered by PureSpectrum, the CHIP50 team has developed filters that are applied to the collected data to ensure quality. These include checking for duplicate respondents (i.e., ensuring that no respondent participates more than once) by using demographic and geographic variables. They also include scoring answer quality based on survey completion time, attention check questions, response inconsistencies, and straight-lining (non-differentiation, e.g., giving the same response to a long series of questions). Respondents who receive low quality scores based on this combination of factors are excluded from the data.
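The quality scoring described above can be sketched as a simple count of flags per respondent. This is only an illustrative sketch: the field names (`seconds`, `attention_failed`, `grid`), the speeding threshold of one third of the median completion time, the five-item minimum for straight-lining, and the two-flag exclusion rule are all assumptions made for the example, and the check for response inconsistencies is left out.

```python
from statistics import median


def is_straight_lined(grid_answers, min_items=5):
    """True when a battery of at least min_items received one identical answer."""
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1


def quality_flags(resp, median_seconds, speed_ratio=0.33):
    """Count quality problems for one respondent (thresholds are assumptions)."""
    flags = 0
    flags += resp["seconds"] < speed_ratio * median_seconds  # finished far too fast
    flags += resp["attention_failed"] > 0                     # missed an attention check
    flags += is_straight_lined(resp["grid"])                  # non-differentiation
    return flags


# Hypothetical respondents: completion time in seconds, failed attention
# checks, and answers to one five-item grid question.
respondents = [
    {"id": "a", "seconds": 1500, "attention_failed": 0, "grid": [1, 3, 2, 4, 2]},
    {"id": "b", "seconds": 300, "attention_failed": 1, "grid": [3, 3, 3, 3, 3]},
    {"id": "c", "seconds": 1400, "attention_failed": 0, "grid": [4, 4, 4, 4, 4]},
    {"id": "d", "seconds": 1600, "attention_failed": 1, "grid": [2, 1, 2, 3, 4]},
]

med = median(r["seconds"] for r in respondents)
# Keep respondents with fewer than two flags; "b" trips all three checks.
kept = [r["id"] for r in respondents if quality_flags(r, med) < 2]
print(kept)
```

Combining several weak signals, rather than excluding on any single one, avoids dropping a respondent who merely gave the same honest answer to one grid or misread one attention check.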
Weighting. To improve the representativeness of the data, the CHIP50 team generates national and state (including D.C.) weights. It relies on U.S. Census Bureau data for population demographics, including race/ethnicity, age, gender, education, and geographic region (urban, suburban, or rural). For national-level weights, geographic region and interlocking gender-by-age-by-race categories are also used. A second set of national and state weights matches the population on all of the previously mentioned parameters and additionally on 2020 vote choice and turnout. These weights are appropriate for politically sensitive analyses where partisan bias is especially concerning.
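The text above does not say which algorithm produces the weights. One common way to match a sample to several population margins at once is raking (iterative proportional fitting), sketched below as an assumption rather than as the CHIP50 procedure. The variables, the ten-person sample, and the target shares are hypothetical.

```python
def rake(rows, targets, iterations=50):
    """Iterative proportional fitting on one margin at a time.

    rows:    list of dicts, one per respondent
    targets: {variable: {category: population share}}
    Returns weights normalized to average 1.
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, target in targets.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand = sum(totals.values())
            # Rescale so each category's weighted share equals its target share.
            weights = [
                w * target[row[var]] / (totals[row[var]] / grand)
                for row, w in zip(rows, weights)
            ]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


def weighted_share(rows, weights, var, value):
    """Weighted proportion of respondents whose `var` equals `value`."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == value)
    return hit / sum(weights)


# Hypothetical sample that over-represents women and college graduates.
sample = (
    [{"gender": "F", "educ": "college"}] * 4
    + [{"gender": "F", "educ": "no_college"}] * 2
    + [{"gender": "M", "educ": "college"}] * 2
    + [{"gender": "M", "educ": "no_college"}] * 2
)

# Hypothetical population margins (not Census figures).
targets = {
    "gender": {"F": 0.5, "M": 0.5},
    "educ": {"college": 0.4, "no_college": 0.6},
}

weights = rake(sample, targets)
print(round(weighted_share(sample, weights, "gender", "F"), 3))
print(round(weighted_share(sample, weights, "educ", "college"), 3))
```

Raking only matches the margins it is given. Interlocking categories such as gender-by-age-by-race can be handled in the same framework by treating each combination as a single category of one combined variable.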
Validation. The CHIP50 team has assessed the validity of the data collection approach by comparing the data to administrative data and to other surveys that employ probability samples. This exercise has validated the data insofar as descriptive results closely match those from probability surveys (e.g., concerning COVID-19 behaviors, trust in institutions, etc.) and match (or sometimes even outperform) administrative data (e.g., COVID-19 vaccination rates, state-level presidential vote). This stems, in part, from the samples being very large (our samples differ from small non-probability samples). For more information, see here and here.
The typical CHIP50 study starts with a uniform consent form, lasts approximately 25 minutes, and includes roughly 15,000-25,000 respondents. A typical CHIP50 survey includes over-samples of Asian-American, Black, and Hispanic respondents. The number of over-sampled individuals varies between 500 and 1,000, depending on the size of the overall sample and the goal of a given data collection. (CHIP50 also often over-samples male respondents to compensate for what would otherwise be a female-skewed sample; this ensures reasonable weights.)
Every survey includes demographic and political background measures that are either directly asked as part of the survey or provided by PureSpectrum. Each survey will include roughly four modules from applicants. The CHIP50 team will determine module ordering, in consultation with the applicants (but the CHIP50 team reserves the right to make all final order decisions).
Fielding of a single survey takes four to six weeks. The CHIP50 team plans to field five to six surveys a year (meaning data collection will be ongoing for about eight months of the year). The exact timing of the surveys will be determined by the CHIP50 team in consultation with the applicants, with the CHIP50 team reserving the right to make final decisions.
Proposals may come from any substantive area within any discipline in the social and health sciences so long as they utilize survey questions (closed or open-ended). A few details:
Survey items that direct respondents to a distinct webpage are not allowed.
Survey items that ask respondents for any personal information are not allowed.
All items must be written in English only.
Survey experiments are welcome but not necessary.
Survey experiments or questions that involve hypothetical payment to participants (beyond what CHIP50 compensates them for participation) are allowed.
Mild deception in a survey experiment or question is allowed but requires: 1) that the applicant, upon acceptance, obtain IRB approval from their institution, and 2) that the applicant provide a de-briefing statement. An example of “mild deception” is a news article that is not real. Specific questions about “mild deception” can be directed to the CHIP50 team.
Questions or survey experiments that prime respondents in a way that can affect the rest of the survey would require discussion and special approval by the CHIP50 team.
Questions or survey experiments that may reduce the response rate, survey quality, or integrity are not allowed.
Survey experiments or questions that require direct payment to participants (beyond what CHIP50 compensates them for participation) are not allowed.
Survey experiments or questions that ask respondents if they would pledge some of their compensation to a cause or another respondent are not allowed.
Survey experiments or questions with embedded videos and/or audio are not allowed.
Regular proposals currently come in two types:
1. State-based proposals. In these cases, the focus is on making comparisons between (at least a subset of) states.
2. Large sample proposals. In these cases, the focus is on obtaining a large, heterogeneous national sample, often to explore particular subgroups (heterogeneities in the population). Subgroups can be expected to be present in a sample at levels roughly proportionate to their presence in the U.S. population, with the addition of oversampling for Asian-American, Black, and Hispanic respondents. The state-level composition of the sample is not essential in these cases. These samples will typically include approximately 15,000 to 20,000 respondents.
More details are provided in the item/sample size section.
Applicants are welcome to propose, via a regular proposal, projects that focus on one or a subset of states. In these cases, the applicant should contact the CHIP50 team to discuss what samples are plausible for a given state or states (beyond the 300-600 per state in a typical CHIP50 survey).
CHIP50 will launch special calls for specific types of proposals—such as those by early career investigators, those focused on particular topics, or those focused on particular subgroups (oversamples). Special competitions will be widely advertised and announced well in advance on the CHIP50 website.
CHIP50 is funded by the Social, Behavioral, and Economic Sciences Directorate of the National Science Foundation.
CHIP50 plans on accepting applications and fielding studies until approximately the end of 2025 (at which point the grant supporting CHIP50 will expire). At that point, if demand is sufficient, CHIP50 will continue, but the funding structure will change. Successful applicants will be required to purchase their data; the cost will cover survey participant payment and a small infrastructure fee to support the processing of proposals and implementation of surveys (and the automation of the survey process). The goal is to keep costs relatively low to ensure the broadest possible access to these data. (Data collection costs are significantly lower than those of commercial or academic vendors.)
The principal investigators of CHIP50 are Matthew Baum of Harvard University, James Druckman of Northwestern University/University of Rochester, David Lazer of Northeastern University, and Katherine Ognyanova of Rutgers University. Associate principal investigators of CHIP50 are Roy Perlis of Harvard University and Mauricio Santillana of Northeastern University. A multidisciplinary advisory board assists in managing CHIP50. Team members span several generations and multiple disciplines, and each member has established a reputation in their respective field. Most importantly, they share enthusiasm for the project. The full list of board members will be posted soon.
All correspondence, including questions, should be directed to the CHIP50 e-mail: CHIP50StatesSurvey@gmail.com.