41. Statistics – Sampling Methods

In general, the point of sampling is not the data collected from the sampled units themselves. The purpose is to use the sampled data to represent the target population. Sample statistics such as the sample mean and sample variance are used to estimate the corresponding population parameters. This chapter covers the main sampling methods and tactics for reducing sampling error.
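As a minimal sketch of the estimation step, the Python snippet below computes a sample mean and a sample variance with the standard-library `statistics` module. The sample values are hypothetical and made up for illustration. `statistics.variance` divides by n - 1, which gives the usual unbiased estimate of the population variance.

```python
import statistics

# Hypothetical sample: weights (kg) of 8 randomly sampled items
sample = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3, 4.1]

# Sample mean: point estimate of the population mean
x_bar = statistics.mean(sample)

# Sample variance (n - 1 denominator): estimate of the population variance
s2 = statistics.variance(sample)

print(f"sample mean = {x_bar:.3f}, sample variance = {s2:.4f}")
# sample mean = 4.100, sample variance = 0.0400
```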

Before looking at sampling itself, note that there are four primary reasons why people try to avoid surveying the entire population:

How can a sample represent the population effectively? The answer is to minimize both sampling error and non-sampling error, which are defined below.

Sampling error comes from the random variation that arises because only part of the population is sampled. Different samples drawn from the same population give different estimates. Increasing the sample size and using an appropriate sampling method can reduce sampling error.
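A small simulation illustrates the effect of sample size. It is a sketch on a made-up uniform population, not data from the text. The spread of sample means across repeated samples (the empirical standard error) shrinks roughly in proportion to 1/√n. The function name `spread_of_sample_means` is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population of 10,000 values, uniform between 0 and 100
population = [random.uniform(0, 100) for _ in range(10_000)]

def spread_of_sample_means(n, trials=1000):
    """Draw `trials` simple random samples of size n and return the
    standard deviation of their means (an empirical standard error)."""
    means = []
    for _ in range(trials):
        s = random.sample(population, n)
        means.append(sum(s) / n)
    return statistics.stdev(means)

for n in (10, 100, 1000):
    print(n, round(spread_of_sample_means(n), 2))
```

Multiplying the sample size by 10 should cut the spread by roughly a factor of √10 ≈ 3.2.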

Non-sampling error occurs when the sample is not representative for reasons other than chance, for example because of how it is collected. Suppose the public is asked about its acceptance of a product, but the survey is conducted only through internet voting. The opinions of people who do not use the internet may then be left out.
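The sketch below shows why a larger sample does not fix this kind of error. It uses a hypothetical population and assumes acceptance rates of 80% among internet users and 40% among non-users; these numbers are invented for illustration. Because non-users can never be selected, the internet-only poll stays biased toward the internet users' rate at every sample size.

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical population: 7,000 internet users and 3,000 non-users.
# Assumed acceptance rates: 80% for internet users, 40% for non-users.
internet_users = [random.random() < 0.8 for _ in range(7000)]
non_users = [random.random() < 0.4 for _ in range(3000)]
population = internet_users + non_users

true_rate = sum(population) / len(population)

# Internet-only poll: only internet users can ever be selected
for n in (100, 1000, 5000):
    poll = random.sample(internet_users, n)
    print(f"n={n}: poll estimate {sum(poll) / n:.3f} vs true rate {true_rate:.3f}")
```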


In statistical inference, the data are generally assumed to be free of non-sampling error.

In general, sampling methods can be divided into random sampling and non-random sampling.

Random sampling uses a random mechanism to select units, so that every part of the population has a chance of being sampled.

Non-random sampling relies on the judgement of the person doing the sampling, or on practical constraints in the environment. As a result, some parts of the population may have no chance of being sampled.

For random sampling, there are five methods.

In simple random sampling, number every unit in the population, decide the sample size, and then select units at random. This can be done with a lottery or with a table of random numbers (or a random number generator).
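In code, the "lottery" can be replaced by a pseudo-random generator. The minimal sketch below assumes units are numbered 1 to N. It uses `random.sample`, which draws distinct numbers without replacement, each equally likely.

```python
import random

N = 5000  # population size: units are numbered 1..N
n = 100   # desired sample size

# Draw n distinct unit numbers; every unit has the same chance of selection
sample_ids = sorted(random.sample(range(1, N + 1), n))
print(sample_ids[:10])
```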


The advantage of simple random sampling is that every unit of the population has an equal chance of being selected. Its drawback is that it becomes impractical when the population is very large or when the population size is unknown, since every unit must first be numbered.

In systematic (interval) sampling, units are selected at a fixed interval. For example, to sample 100 units from a population of 5,000, the interval is 5000 / 100 = 50, so one unit is selected from every 50.


The starting point (the first unit selected, somewhere between 1 and 50) can be chosen by simple random sampling.
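Here is a minimal sketch of the interval procedure using the 5,000 / 100 example. It assumes N is an exact multiple of n so that the interval is a whole number, and `systematic_sample` is a hypothetical helper name. The random start is drawn by simple random sampling from 1 to k.

```python
import random

def systematic_sample(N, n):
    """Systematic (interval) sampling of unit numbers 1..N.
    Assumes N is a multiple of n so the interval k is a whole number."""
    k = N // n                    # sampling interval, e.g. 5000 // 100 = 50
    start = random.randint(1, k)  # random start in 1..k (inclusive)
    return list(range(start, N + 1, k))

sample_ids = systematic_sample(5000, 100)
print(len(sample_ids), sample_ids[:3])
```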

In stratified sampling, the population is first divided into strata (subgroups), and each stratum is sampled so that the sample has the same distribution ratios as the population. For instance, when responses differ markedly by gender or management level, stratifying on those variables reduces the variance of the estimates.
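The sketch below illustrates one common variant, proportional allocation, where each stratum's share of the sample matches its share of the population. It assumes a hypothetical population of 600 staff and 400 managers, and `stratified_sample` is a hypothetical helper. Within each stratum, units are drawn by simple random sampling.

```python
import random

def stratified_sample(strata, n):
    """Proportional stratified sampling.
    strata: dict mapping stratum name -> list of unit IDs.
    Each stratum gets round(n * stratum_size / N) units, drawn at random.
    (Rounding can make the total differ slightly from n for awkward sizes.)"""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / N)
        sample[name] = random.sample(units, k)
    return sample

# Hypothetical population: 600 staff and 400 managers
strata = {
    "staff": [f"S{i}" for i in range(600)],
    "manager": [f"M{i}" for i in range(400)],
}
result = stratified_sample(strata, 50)
print({name: len(units) for name, units in result.items()})  # {'staff': 30, 'manager': 20}
```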

Stratified sampling has two major drawbacks. First, if the strata are unrelated to the variable being studied, stratification is ineffective. Second, using too many strata can nullify the benefit of stratified sampling.

In cluster sampling, the population is split into separate groups (clusters), and a random selection of whole clusters is used to represent the population. This is not ideal if the clusters differ greatly from one another, whether in their nature or in their behavior.


Ideally, the clusters should be similar to one another, while the variation within each cluster should be large, so that each cluster reflects the situation of the whole population.
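A minimal sketch of one-stage cluster sampling follows. It assumes a hypothetical population of 20 branches with 30 employees each. Whole clusters are chosen at random, and every unit in a chosen cluster enters the sample.

```python
import random

# Hypothetical population: 20 branches (clusters), each with 30 employees
clusters = {f"branch{b}": [f"branch{b}-emp{e}" for e in range(30)] for b in range(20)}

# Randomly choose 4 whole clusters, then include every unit in them
chosen = random.sample(sorted(clusters), 4)
sample = [unit for c in chosen for unit in clusters[c]]
print(chosen, len(sample))
```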

Multistage sampling combines several methods at successive stages to minimize sampling error. For example, it can combine cluster, stratified, systematic (interval), and simple random sampling.
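The simplest combination is a two-stage design: clusters are chosen at random, and then units are chosen at random within each chosen cluster. This sketch assumes a hypothetical population of 20 schools with 200 students each, and `two_stage_sample` is a hypothetical helper. Other stages, such as stratification, could be layered on in the same way.

```python
import random

# Hypothetical population: 20 schools (clusters) with 200 students each
schools = {f"school{s}": [f"school{s}-student{i}" for i in range(200)] for s in range(20)}

def two_stage_sample(clusters, m, n_per_cluster):
    """Stage 1: simple random sample of m clusters.
    Stage 2: simple random sample of n_per_cluster units inside each chosen cluster."""
    chosen = random.sample(sorted(clusters), m)
    return {c: random.sample(clusters[c], n_per_cluster) for c in chosen}

result = two_stage_sample(schools, m=5, n_per_cluster=10)
print(sum(len(units) for units in result.values()))  # 50
```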

For non-random sampling, two methods are covered here.

Quota sampling is similar to stratified sampling. The difference is that in quota sampling the population's distribution is not known, and units within each group are not selected at random.


Instead, the sampler uses experience and judgement to estimate what percentage of the population each group represents, and sets a quota for each group accordingly.
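To show why quota sampling is non-random, the sketch below accepts respondents in the order they happen to arrive until each quota is filled. The quotas (an assumed 60/40 split by age) and the simulated stream of respondents are hypothetical. `quota_sample` is also a hypothetical helper.

```python
import random

random.seed(2)  # reproducible illustration

# Hypothetical quotas for a sample of 40, based on the sampler's *estimated*
# population shares (60% under 40, 40% aged 40+), not on a known distribution.
quotas = {"under_40": 24, "40_plus": 16}

def quota_sample(respondents, quotas):
    """Accept respondents in arrival order (non-random selection)
    until every group's quota is filled. respondents: iterable of (id, group)."""
    filled = {group: [] for group in quotas}
    for rid, group in respondents:
        if len(filled[group]) < quotas[group]:
            filled[group].append(rid)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break
    return filled

# Simulated stream of people met on the street, in arrival order
stream = [(i, random.choice(["under_40", "40_plus"])) for i in range(500)]
result = quota_sample(stream, quotas)
print({g: len(ids) for g, ids in result.items()})
```

Whoever happens to arrive first fills the quota, so the selection within each group depends on convenience rather than chance.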

Judgement sampling selects units based on subjective judgement. It is even more prone to bias, because the sampler decides which units are most representative before selecting them for evaluation.
