Collecting data on a group of people can be done through a census or a survey. The purpose of a census is to map the entire population (or the group you wish to make a statement about) using data about each individual who is part of it. For example, the most recent national census in Belgium took place in 2001, in the form of a general socioeconomic survey of every resident of our country. Needless to say, a census requires a lot of effort, time and money, which is the main reason why surveys are used much more frequently.
Surveys, or questionnaires, question only part of the population. Thanks to statistical methods, researchers can use the data from that part (the sample) to make statements about the entire population, for example the entire Belgian population. The Great Corona Study is an example of a survey. It ran online and collected data through self-reporting, but like all surveys it belongs to the broad collection of empirical studies, which, unlike theoretical studies, are based on concrete observations.
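
To make the idea of generalizing from a sample concrete, here is a minimal sketch of how a population proportion and its margin of error can be estimated. It assumes a simple random sample, which a self-selected online survey is not, and it uses the normal (Wald) approximation. The function name `proportion_ci` and the respondent counts are hypothetical and are not taken from the Great Corona Study.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Estimate a population proportion with a Wald confidence interval.

    Assumes a simple random sample; z=1.96 corresponds to roughly 95% confidence.
    Returns (point estimate, lower bound, upper bound).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    lower = max(0.0, p_hat - z * standard_error)
    upper = min(1.0, p_hat + z * standard_error)
    return p_hat, lower, upper


# Hypothetical numbers: 620 of 1000 respondents say they always wear a mask.
estimate, low, high = proportion_ci(620, 1000)
print(f"estimate: {estimate:.3f}, interval: ({low:.3f}, {high:.3f})")
```

The interval only describes sampling variability. It says nothing about bias that arises when some groups are more likely to end up in the sample than others.
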

The Great Corona Study is a nonprobabilistic, observational online survey.

- It is nonprobabilistic because anyone can take the initiative to participate or not.
- It is observational because no intervention takes place at the initiative of the researchers.
- The research objectives are both summative (e.g., "What fraction of the population always wears a mouth mask in public places?") and comparative (e.g., "Is there a difference between mouth mask habits among men and women?").
- The data are tracked over time via cross-sectional analysis, as individual profiles are not tracked over time (which would be a longitudinal study).

The observational, non-probabilistic survey design constitutes an interesting tool to monitor the pandemic in all its aspects, because it is important to reach many people and to pick up signals quickly. In this regard, the Internet is a convenient medium to quickly recruit many participants. But who did we reach with this study? And who actually participated? The answers to these questions are important to know how the results might be biased.
The population that we want to understand through the Great Corona Study is the Belgian population. The sample is the group of people from the population that can be included in the study. Telephone surveys can only be conducted with people who actually have a telephone. Online studies can only involve people with an internet connection, etc. These technological limitations can be related to age or socioeconomic background, and thus immediately make it easier or harder to reach certain groups of the population.
Even within the group of people with access to the Internet, not everyone is equally likely to notice the study. In order to inform as many people as possible about the existence of the Great Corona Study, we reached out to the public via various traditional media (VRT, HLN, VTM) and social media (Facebook and Twitter). The survey was available in Dutch, French and German, as well as in English. Via cookies, we were able to control for duplicate participation. When the probability of reaching someone is somehow related to characteristics that are also important for the research question (e.g. age), this can distort the results. It is important to try to correct for that selection bias.
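
One common way to correct for this kind of selection bias is post-stratification weighting: each participant gets a weight equal to the population share of their group divided by the sample share of that group. The sketch below illustrates the idea for age groups. It is not the procedure used in the Great Corona Study; the age groups, population shares, answers and function names are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: population_shares[group] / (count / n)
        for group, count in counts.items()
    }


def weighted_proportion(records, weights):
    """Weighted share of 'yes' answers; records are (group, answer) pairs."""
    total = sum(weights[group] for group, _ in records)
    positive = sum(weights[group] for group, answer in records if answer)
    return positive / total


# Hypothetical population shares per age group (illustrative, not real data).
population_shares = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}

# Hypothetical sample of 100 participants in which the 65+ group is
# underrepresented (15% of the sample versus 25% of the population).
records = (
    [("18-39", True)] * 15 + [("18-39", False)] * 15
    + [("40-64", True)] * 33 + [("40-64", False)] * 22
    + [("65+", True)] * 12 + [("65+", False)] * 3
)

weights = poststratification_weights([g for g, _ in records], population_shares)
unweighted = sum(answer for _, answer in records) / len(records)
weighted = weighted_proportion(records, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this made-up example the weighted share is slightly higher than the unweighted one, because the underrepresented oldest group answers "yes" more often. Weighting can only correct for characteristics that are measured and for which population shares are known.
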
Who did we fail to reach with the Great Corona Study? Perhaps older people, who have a less frequent online presence; people who do not speak any of the four languages offered; people who are not reached through the various media channels; people who are in a weaker socioeconomic situation; people who can rarely spare time on Tuesdays; or a combination of these factors. We were unable to reach non-Flemish people to the same extent as Flemish people, despite attempts to actively recruit through French-speaking channels in Wallonia and Brussels. Another factor is that the Great Corona Study is an initiative of Flemish universities and receives more attention in the Flemish than in the Francophone press. Thus, many of the study's conclusions are limited to Flanders, or are qualified with the note that they hold with more certainty for Flanders than for the other Belgian regions.
The group of people you reach is not the same as the group of people who eventually participate. There can be various reasons why someone does not participate, or participates only partially: insufficient time, limited interest, all kinds of barriers, and so on. This usually results in a situation where the participants in the sample no longer form a random group that correctly represents the study population. When people with certain profiles are more or less likely to participate than others, volunteer bias can occur. When certain profiles are more likely to drop out than others, there may be attrition bias. When certain profiles are more likely to participate again after a previous participation, there may be retention bias. In an online survey via self-report, participants may also be more or less thorough in completing the questions. To counter the latter problem, researchers can design the survey so that participants must provide an answer in order to move on to the next question.
Who were the people that we reached but who did not participate in our study? The answer to this question is complex. We actively explored factors that could help assess how representative the participant group was and, if necessary, correct for a lack of representativeness. For example, we were able to determine that our sample of participants was fairly representative in terms of vaccination. Partially vaccinated individuals were slightly overrepresented in the Great Corona Study from early March onward, possibly due to overrepresentation of people from the healthcare sector.
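
A simple way to check representativeness on a known characteristic is to divide each group's share in the sample by its share in the population: a ratio above 1 signals overrepresentation, a ratio below 1 underrepresentation. The sketch below uses hypothetical counts and population shares for vaccination status. The numbers and the function name `representation_ratios` are illustrative assumptions, not results from the study.

```python
def representation_ratios(sample_counts, population_shares):
    """Return sample share divided by population share for every group."""
    n = sum(sample_counts.values())
    return {
        group: (sample_counts[group] / n) / share
        for group, share in population_shares.items()
    }


# Hypothetical sample counts and population shares (not real study data).
sample_counts = {
    "unvaccinated": 500,
    "partially vaccinated": 300,
    "fully vaccinated": 200,
}
population_shares = {
    "unvaccinated": 0.55,
    "partially vaccinated": 0.25,
    "fully vaccinated": 0.20,
}

ratios = representation_ratios(sample_counts, population_shares)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

Such a check is only possible for characteristics with a reliable population benchmark, which is one reason why not every source of bias can be detected this way.
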
Since it is impossible to know and correct for all possible underlying sources of bias, the results of the Great Corona Study are also consistently reported with the disclaimer that associations do not necessarily imply causal relationships and that not all bias can be remediated.
Much research has been done on the opportunities and limitations of studies such as the Great Corona Study or previously the Great Flu Survey, and many other forms of surveys and citizen science. It is clear that online surveys are helpful to collect valuable information, provided the interpretation is done in a way that takes into account the limitations of such an approach.
The fact that multiple similar surveys, with different angles and participant groups, yield similar results may reinforce conclusions. For example, the Belgian health institute Sciensano also conducted several rounds of a COVID-19 health survey. Unlike the Great Corona Study, that survey did have a longitudinal component: participants could indicate whether or not they were willing to link their responses across study waves. There were also several smaller-scale but longitudinal surveys that gauged the vulnerability of specific groups in society and their attitudes towards corona measures.
Despite its limitations, the [Great Corona Study](https://www.uantwerpen.be/en/projects/great-corona-study/) has proven to be of great value because of its uniquely large reach, not only as a data source for the scientific community, but also to support policymaking by quickly picking up trends and signals. Because of the volume of topics in the survey, there are numerous examples of policy-relevant signals that it picked up first.