Sampling is tricky
Selecting the right chunk of people who will give you the correct data is not an easy task. Unless you've picked the right set of people to answer your questions, you shouldn't expect to get the right answers.
It's important to evaluate and zero in on the exact audience you're going to survey before you send the survey out.
Targeting the wrong audience means getting incorrect data. If I ask a bunch of cat owners which dog breed is the friendliest, the question may not make sense to them, because I don't know whether they are interested in dogs at all. The answers I get would therefore be largely inaccurate. For this survey, the ideal respondents are people with domain knowledge about dogs, who can judge which breed is the friendliest. This process of picking out the right group of dog experts is called sampling.
So what does it take to select the right sample, and what are the different errors you must be careful to avoid? Let's start by defining some terms we need to understand.
What's a population?
A population includes all the people you want to learn something about. They are your target audience: the people you need to pose your questions to.
Everyone who lives in California, all the non-profits in Delhi, or New Yorkers younger than 16 are examples of finite populations you can target and query depending upon the purpose of your study.
An ideal scenario would be when a population is of a definite size and researchers can reach out to everyone and ask them questions. A survey that covers the entire population is called a census.
Survey sampling
In practice, though, you usually can't reach everyone. In that situation, you carefully identify and select a group from the population in such a way that the entire population is represented. This process is called survey sampling, and perfecting the craft of sampling is key to conducting great surveys.
Researchers are not interested in the sample per se, but in what it lets them infer about the characteristics of the population. That's why the right sampling technique goes a long way.
These are three important things surveyors should keep in mind about their sample:
Diversity
Making sure the units in your sample are not all similar to each other is a tall order, but it's important. To be truly representative, the sample as a whole must reflect the full spectrum of diversity within the population.
Consistency
It's also important to check respondents on a case-by-case basis before going ahead with the survey. A good idea is to run a consistency test on the sample, such as a pilot test, in which you compare the sample with the whole population to make sure it properly reflects the parent population's characteristics.
Transparency
Many factors shape the size and structure of the sample you can draw from a population. Researchers need to discuss these limitations and be transparent about the procedures they followed while selecting the sample, so that the survey results are viewed in the right perspective.
When it comes to sampling, the more rigorously you select the right candidates for your survey, the better the outcome. Depending on your surveying needs, a particular sampling method may suit you better than others.
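The consistency test described under Consistency can be sketched as a simple comparison of group shares: work out what fraction of the population and of the pilot sample falls into each group, and flag any group whose share differs by more than some tolerance. This is a minimal illustration, not a formal statistical test; the age groups, counts, and the 5-point tolerance are all hypothetical.

```python
# Minimal sketch of a sample-vs-population consistency check.
# Group labels, counts, and the 0.05 tolerance are hypothetical examples.
from __future__ import annotations


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts per group into proportions that sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def consistency_gaps(population: dict[str, int],
                     sample: dict[str, int],
                     tolerance: float = 0.05) -> dict[str, float]:
    """Return groups whose sample share differs from the population share
    by more than `tolerance`, as a signed gap (sample minus population)."""
    pop_share = shares(population)
    smp_share = shares(sample)
    gaps = {}
    for group, expected in pop_share.items():
        observed = smp_share.get(group, 0.0)  # a missing group counts as 0%
        diff = observed - expected
        if abs(diff) > tolerance:
            gaps[group] = diff
    return gaps


if __name__ == "__main__":
    # Hypothetical age breakdown of a population and of a 100-person pilot.
    population = {"18-34": 30_000, "35-54": 40_000, "55+": 30_000}
    pilot = {"18-34": 55, "35-54": 38, "55+": 7}
    # Flags 18-34 as over-represented and 55+ as under-represented.
    print(consistency_gaps(population, pilot))
```

A positive gap means the group is over-represented in the pilot and a negative one means it is under-represented; in either case the sample would need adjusting before the full survey goes out.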
In survey sampling theory, there are two basic ways to get information about respondents—you either measure everyone (take a census) or you measure a subset of the population.
With large populations, though, taking a subset becomes almost the only option. As a result, sampling error has become something of an occupational hazard for surveyors. That doesn't mean you can't keep it under control, though!
The main cause of sampling error is simple: it occurs when statistical characteristics of a certain part or a subset of a population are incorrectly assumed to apply to the entire population.
It's like concluding that every sports fan loves collecting merchandise because the fans you happened to ask do, when plenty of fans have no interest in collecting at all.
The sample you select has to be chosen very carefully: its characteristics must match those of the population as a whole.
What causes sampling error?
A sampling error occurs when researchers take a random sample instead of observing every single individual in the population.
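To see why observing a random sample instead of everyone introduces error, the sketch below draws many simple random samples from the same made-up population and measures how much the estimated share of "yes" answers varies from sample to sample. The population (100,000 people, 40% answering "yes"), the sample sizes, and the number of trials are assumptions chosen only for illustration.

```python
# Minimal sketch: repeated random samples from one population give different
# estimates, and that spread (sampling error) shrinks as the sample grows.
# The population and all parameters are hypothetical.
from __future__ import annotations

import random
import statistics


def estimate_spread(population: list[int], n: int, trials: int,
                    rng: random.Random) -> float:
    """Draw `trials` simple random samples of size `n` (without replacement)
    and return the standard deviation of the sample proportions."""
    estimates = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        estimates.append(sum(sample) / n)
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # 100,000 people: 1 = answers "yes", 0 = answers "no"; true share is 0.40.
    population = [1] * 40_000 + [0] * 60_000
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for n in (50, 500, 1_500):
        spread = estimate_spread(population, n, trials=200, rng=rng)
        print(f"n={n:>5}: estimates typically vary by about ±{spread:.3f}")
```

Every sample here is drawn correctly and at random, yet no single estimate is guaranteed to hit 0.40 exactly; larger samples simply keep the estimates closer together.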
Here's an example of a sampling error:
We have decided to conduct an auto survey on what cars Americans want to buy and how much they're willing to spend.
To kick-start the study, the survey is sent to 1,000 randomly selected Americans. Purely by chance, the list could include billionaires like Bill Gates or Warren Buffett. While this is highly unlikely, it's possible, and it could skew the survey results.
When a surveyor is trying to learn about something directly tied to a person's income, such as how much individuals spend on a vehicle, they risk collecting data from significant outliers in the population. Billionaires do not represent the average member of the population, so including them could be a problem for the surveyor.
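To put rough numbers on the billionaire example, the sketch below compares the average planned car budget of 1,000 hypothetical respondents with and without one billionaire. It also reports the median, which is one common way to make a summary less sensitive to outliers; the budgets are invented, and the median is shown as an illustration rather than as the article's prescribed fix.

```python
# Minimal sketch with invented budgets: one extreme respondent moves the
# mean noticeably but leaves the median untouched.
from __future__ import annotations

import statistics


def budget_summary(budgets: list[float]) -> tuple[float, float]:
    """Return (mean, median) of planned car budgets in dollars."""
    return statistics.mean(budgets), statistics.median(budgets)


if __name__ == "__main__":
    # 1,000 hypothetical respondents spread evenly across five budget levels.
    typical = [20_000, 25_000, 30_000, 35_000, 40_000] * 200
    # Swap one $40,000 respondent for a billionaire planning a $3M purchase.
    with_outlier = typical[:-1] + [3_000_000]

    print(budget_summary(typical))       # mean 30,000, median 30,000
    print(budget_summary(with_outlier))  # mean 32,960, median 30,000
```

A single respondent out of 1,000 raises the average by almost $3,000, which is exactly the kind of distortion the surveyor needs to watch for.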
Sampling error is more like a degree of uncertainty that always exists in survey data and needs to be controlled. This has to be considered carefully by the surveyor before sending the survey out.
Types of sampling errors, and how to avoid them
Let's look at some of the common sampling errors and how to avoid them:
Population-specific error:
This error occurs when researchers don't understand who they are supposed to survey. Take a question about what people eat for breakfast: in a family, one person might do the shopping, and that person's choices shape the options for the entire family.
As another example, suppose a surveyor studying a video streaming service selects a sample of people aged 15-25. Many of them don't make the purchasing decision for the service because they don't work full time. On the other hand, if the surveyor samples the working adults who make the purchases, those adults may not watch the streaming service much themselves.
Solution: Keep the objective of the survey in mind and choose the population you survey accordingly. In the example above, if the survey is about the nature of the streaming platform itself, it's probably better to direct the questions to the 15-25 age group, who are the actual viewers.
Sample frame error:
This error occurs when the wrong sub-population is used to select a sample. A famous frame error occurred in the 1936 Literary Digest poll on the presidential election. It predicted a win for Alfred Landon, but his opponent Franklin Roosevelt ended up winning by a landslide. It turned out that Literary Digest subscribers at the time were mostly wealthy people, who in the 1930s tended to vote Republican, and this skewed the results.
Solution: Running a small pre-survey before you launch the full survey is a good way to catch this kind of problem early.
Selection error:
This occurs when people volunteer to participate in a study but aren't actually part of the sample you need.
Solution: Go to extra lengths to make sure participants meet the prerequisites of your survey. A small pre-survey before the main one is a good way to screen them.
Non-response error:
Non-response occurs when you try to contact part of your sample but they don't respond. This may happen because a potential respondent was unavailable at the time of contact, or simply refused to respond. A large enough number of non-responses can skew your sample by under- or over-representing certain demographics.
Solution: You can reduce non-response by using follow-up surveys to make sure you get a response, or by ensuring that your target population is adequately represented through alternate respondents.
Sample insufficiency errors:
These errors occur because the people who actually respond might not represent the full spectrum of the sample.
Solution: Sample insufficiency errors can be controlled through careful sample design, large samples, and multiple contacts to ensure proper representation.
Most sampling errors can be reduced by increasing the sample size and by ensuring that the selected respondents adequately represent the rest of the population. The more rigorously you sample and find the right candidates for your survey, the better the outcome will be.
The industry best practice is to collect at least 1,500 responses. This keeps the margin of error for top-line results within 3-5%, and keeps results broken down by one or two criteria (age, gender, etc.) within a 10% margin of error. The number of people you end up surveying is your sample size.
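To make the Non-response error point concrete, the sketch below computes who actually ends up among the respondents when two groups answer at different rates. The two age groups, their 50/50 population split, and the 20% and 60% response rates are hypothetical.

```python
# Minimal sketch: unequal response rates change the makeup of the people
# who actually answer. All shares and rates are hypothetical.
from __future__ import annotations


def respondent_mix(pop_share: dict[str, float],
                   response_rate: dict[str, float]) -> dict[str, float]:
    """Share of each group among the people who actually respond."""
    responded = {g: pop_share[g] * response_rate[g] for g in pop_share}
    total = sum(responded.values())
    return {g: r / total for g, r in responded.items()}


if __name__ == "__main__":
    pop_share = {"under 35": 0.5, "35 and over": 0.5}
    response_rate = {"under 35": 0.2, "35 and over": 0.6}
    # "under 35" drops from 50% of the population to 25% of respondents.
    print(respondent_mix(pop_share, response_rate))
```

If follow-up contacts raised the under-35 response rate to match the other group, the respondent mix would return to the population's 50/50 split.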
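The margin of error figures can be checked with the textbook formula for a proportion, $z\sqrt{p(1-p)/n}$. The sketch below assumes a simple random sample, 95% confidence ($z = 1.96$), and the worst case $p = 0.5$. Surveys that use weighting or clustered designs need adjustments this sketch leaves out, and the subgroup sizes are hypothetical.

```python
# Minimal sketch of the textbook margin of error for a sample proportion,
# assuming simple random sampling, 95% confidence, and worst-case p = 0.5.
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Full sample, then hypothetical subgroups after splitting by one or two criteria.
    for n in (1_500, 750, 250, 100):
        print(f"n={n:>5}: ±{margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, 1,500 responses give roughly ±2.5 points. A 750-person half (for example, one gender) gives about ±3.6 points, while subgroups of 250 and 100 people reach about ±6.2 and ±9.8 points, which is why cross-tabulated results carry much wider margins than top-line ones.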
There are several sampling techniques you can use to make sure you're sending your survey to the right audience and getting the best results. Whichever you choose, discuss its limitations openly and be transparent about how you selected your sample.
The margin of error you see with survey results is actually an estimate of your sampling error.
As a surveyor, it's crucial that you carefully consider your options and select only the samples that deliver the most accurate results. It's also important to be aware of the other sampling errors that can lead to a flawed survey and to work hard to avoid them. Since you're reading this, you've already taken the first step!