# Section B_Group 3_Sumit Vaishnav_13PGP114_Sample Size issues and determination

Sample Size Issues and Determination

What sample size should we use? This is the most important question once the objectives and constraints that govern the choice of sampling method are known. Because every survey is different, there are no fixed rules for determining the sample size. However, the following important factors should be considered:

• population size and the variability present within the population;
• resources available (time, money and personnel);
• accuracy level required of the results;
• detail required in the results;
• level of non-response;
• the sampling methods to be used; and
• presence of other variables of interest.

Now let us look at the importance of each:
Variability: A large sample is required where variability is high. However, the actual population variability is generally not known in advance; information from a previous survey or a pilot test may be used to give an indication of it.
When the characteristic being measured is comparatively rare, a larger sample size will be required to ensure that sufficient units having that characteristic are included in the sample.
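As a rough illustration of the point about rare characteristics, the sketch below finds the smallest sample whose expected number of units with the characteristic reaches a chosen minimum. The function name, the prevalence and the minimum count are hypothetical, and the calculation assumes simple random sampling with a prevalence guessed from a pilot test or earlier survey.

```python
import math

def sample_size_for_rare_trait(prevalence, min_expected_units):
    """Smallest n for which n * prevalence >= min_expected_units.

    prevalence: assumed share of the population with the characteristic
                (a guess taken from a pilot test or previous survey).
    min_expected_units: how many such units we would like, on average,
                        to end up in the sample.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(min_expected_units / prevalence)

# Hypothetical comparison: the same target count needs a far larger
# sample when the characteristic is rarer.
for p in (0.25, 0.05, 0.02):
    print(p, sample_size_for_rare_trait(p, min_expected_units=30))
```

With a prevalence of 2% and a target of 30 units, the required sample is about 1,500, against 120 when a quarter of the population has the characteristic. This is only an expected count; the number actually drawn will vary from sample to sample.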
Population Size: The population size also affects the sample size required. When the population is small, its size needs to be considered carefully in determining the sample size, but when the population is large it has little effect on the sample size. Gains in precision from increasing the sample size are by no means proportional to population size.
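The effect of population size can be sketched with the usual textbook finite population correction for simple random sampling, n = n0 / (1 + (n0 - 1) / N). This formula is not given in the text above; it is brought in here as an assumption, and the function name and figures are illustrative only.

```python
import math

def adjust_for_population(n_infinite, population_size):
    """Apply the finite population correction to a sample size.

    n_infinite: sample size that would be needed for a very large population.
    population_size: N, the number of units in the actual population.
    """
    n = n_infinite / (1 + (n_infinite - 1) / population_size)
    return math.ceil(n)

# Hypothetical starting point: 400 units would be needed for a huge population.
for N in (500, 5_000, 1_000_000):
    print(N, adjust_for_population(400, N))
```

Under these assumptions a population of 500 brings the requirement down to 223, whereas a population of one million leaves it at 400, which matches the remark that a large population has little effect on the sample size.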
Resources and Accuracy: Because the estimates are obtained from a sample rather than a census, they will generally differ from the true population value. A measure of the accuracy of an estimate is the standard error. A large sample is more likely to have a smaller standard error, and hence greater accuracy, than a small sample.
When planning a survey, you might wish to minimise the size of the standard error to maximise the accuracy of the estimates.  This can be done by choosing as large a sample as resources permit.  Alternatively, you might specify the size of the standard error to be achieved and choose a sample size designed to achieve that.  In some cases it will cost too much to take the sample size required to achieve a certain level of accuracy.  Decisions then need to be made on whether to relax the accuracy levels, reduce data requirements, increase the budget or reduce the cost of other areas in the survey process.
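The trade-off between a target standard error and the budget can be sketched as follows. The sketch assumes simple random sampling, a mean as the estimate, a standard deviation guessed from a pilot test, and a simple cost model with a fixed cost plus a cost per unit; all names and figures are hypothetical.

```python
import math

def sample_size_for_target_se(std_dev, target_se):
    """n needed so that std_dev / sqrt(n) <= target_se (population size ignored)."""
    return math.ceil((std_dev / target_se) ** 2)

def affordable_sample_size(budget, fixed_cost, cost_per_unit):
    """Largest n the budget covers once fixed costs are paid."""
    return max(0, int((budget - fixed_cost) // cost_per_unit))

required = sample_size_for_target_se(std_dev=12.0, target_se=0.5)
affordable = affordable_sample_size(budget=20_000, fixed_cost=5_000, cost_per_unit=30)

if required > affordable:
    # The target cannot be met: report the standard error the budget can buy.
    achievable_se = 12.0 / math.sqrt(affordable)
    print(f"need {required}, can afford {affordable}, "
          f"achievable SE is about {achievable_se:.2f}")
else:
    print(f"target met with n = {required}")
```

Here the target calls for 576 units while the budget covers only 500, which is exactly the situation in which one has to relax the accuracy level, reduce data requirements, increase the budget or cut costs elsewhere.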
Level of Detail Required: If we divide the population into subgroups (strata) and choose a sample from each stratum, a sufficient sample size is required in each subgroup to ensure reliable estimates at this level. The overall sample size is then the sum of the sample sizes for the subgroups. A good approach is to draw a blank table that shows all the characteristics to be cross-classified. The more cells there are in the table, the larger the sample size needed to ensure reliable estimates.
Likely Level of Non-response: Non-response can cause problems for the researcher in two ways. First, the higher the non-response, the larger the standard errors will be for a fixed initial sample size. This can be compensated for by selecting a larger initial sample based on the expected response rate, or by using quota sampling.
The second problem is that the characteristics of non-respondents may differ markedly from those of respondents. In that case the survey results will remain biased even if the sample size is increased (i.e. increasing the sample size has no effect on the non-response bias). The lower the response rate, the less representative the final sample will be of the total population, and the bigger the bias of the sample estimates. Non-response bias can sometimes be reduced by post-stratification as well as through intensive follow-up of non-respondents, particularly in strata with poor response rates.
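The first of the two problems, the loss of precision, can be offset by dividing the number of completed responses needed by the expected response rate. A minimal sketch, with a hypothetical function name and figures:

```python
import math

def initial_sample_size(required_responses, expected_response_rate):
    """Units to approach so that, on average, enough of them respond.

    expected_response_rate: assumed share of approached units that respond,
                            e.g. taken from a previous survey.
    """
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(required_responses / expected_response_rate)

# Hypothetical case: 300 completed responses needed, 75% expected to respond.
print(initial_sample_size(300, 0.75))  # 400
```

This adjustment only restores the number of responses. It does nothing about the second problem, since the units that respond may still differ from those that do not.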
Sampling Method: Many surveys involve complex sampling and estimation procedures, such as a multi-stage design. A multi-stage design can often lead to higher variance in the resulting estimates than would be achieved by a simple random sample of the same size. If the same degree of precision is desired, it is therefore necessary to inflate the sample size to take into account the fact that simple random sampling is not being used.
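One common way to express this inflation is the design effect: the ratio of the variance under the complex design to the variance under simple random sampling with the same number of units. The term is not used in the text above and is introduced here as an assumption; the design effect value in the sketch is hypothetical and would in practice come from a previous survey of similar design.

```python
import math

def inflate_for_design(n_srs, design_effect):
    """Sample size under a complex design giving the precision of an SRS of size n_srs.

    design_effect: assumed variance ratio (complex design / simple random sample).
    """
    if design_effect <= 0:
        raise ValueError("design_effect must be positive")
    return math.ceil(n_srs * design_effect)

# Hypothetical case: an SRS of 400 would suffice, and the multi-stage
# design is assumed to have a design effect of 1.5.
print(inflate_for_design(400, 1.5))  # 600
```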
Relative Importance of the Variables of Interest: Generally, surveys are used to collect a range of data on a number of variables of interest. A sample size that will result in sufficiently precise information for one variable may not result in sufficiently precise information for another variable. It is not normally feasible to select a sample that is large enough to cover all variables to the desired level of precision. In practice, therefore, the relative importance of the variables of interest is considered, priorities are set and the appropriate sample size is determined accordingly.
Calculation of Sample Size: As a general rule, the more variable a population is, the larger the sample required to achieve a specific level of accuracy in survey estimates. However, the actual population variability is not known and must be estimated using information from a previous survey or a pilot test. It is worth keeping in mind that gains in the precision of estimates are not directly proportional to increases in sample size (i.e. doubling the sample size will not halve the standard error; generally the sample has to be increased by a factor of 4 to halve the SE).
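The factor of 4 follows from the standard error of a mean under simple random sampling being proportional to 1 / sqrt(n). A short check, assuming a hypothetical standard deviation of 10 and ignoring the population size:

```python
import math

def standard_error(std_dev, n):
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / math.sqrt(n)

base = standard_error(10.0, 100)
doubled = standard_error(10.0, 200)     # sample size doubled
quadrupled = standard_error(10.0, 400)  # sample size quadrupled

print(f"n=100: {base:.3f}")
print(f"n=200: {doubled:.3f} (ratio {doubled / base:.3f})")
print(f"n=400: {quadrupled:.3f} (ratio {quadrupled / base:.3f})")
```

Doubling the sample reduces the standard error only to about 71% of its original value, while quadrupling it brings the standard error down to half.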
In practice, cost is a major consideration. Many surveys opt to maximise the accuracy of population estimates by choosing as large a sample as resources permit. In complex surveys, where estimates are required for population subgroups, enough units must be sampled from each subgroup to ensure reliable estimates at these levels. To select a sample in this case, you might specify the size of the standard error to be achieved within each subgroup and choose a sample size to produce that level of accuracy. The total sample is then formed by aggregating these samples over the subgroups.
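The subgroup approach can be sketched as below. The subgroup names, standard deviations and target standard errors are invented for illustration, and each subgroup is treated as its own simple random sample with the population size ignored.

```python
import math

# Hypothetical planning figures: (estimated standard deviation, target SE).
subgroups = {
    "urban": (12.0, 0.5),
    "rural": (8.0, 0.5),
    "remote": (20.0, 1.0),
}

def subgroup_sample_sizes(specs):
    """Sample size per subgroup so that std_dev / sqrt(n) meets its target SE."""
    return {
        name: math.ceil((std_dev / target_se) ** 2)
        for name, (std_dev, target_se) in specs.items()
    }

sizes = subgroup_sample_sizes(subgroups)
total = sum(sizes.values())  # the total sample aggregates the subgroup samples

print(sizes)
print("total:", total)
```

A more variable subgroup, or one with a tighter target, takes a larger share of the total, so the number of subgroups for which separate estimates are required largely drives the overall sample size.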
Sample size should also take into account the expected level of non-response from surveyed units.