Session 8: Data Types and Analyzing Questionnaire with SPSS …… Section A _Group 2_Abhishek Kumar_13PGP003

In research studies, collected data are analysed statistically. Responses from a questionnaire or data collection sheet are first coded so that each response is represented by a number. Once the data have been coded, they can be entered into a data analysis program such as SPSS. SPSS is a menu-driven statistical analysis package, which makes it easy to use for analysing questionnaires and similar data. Working with SPSS is a three-stage process:

  1. Defining variables
  2. Inputting data
  3. Analysing the data

Here we first distinguish between types of data. We then go through the coding that a pile of questionnaires needs before the data are entered into SPSS, show the conventions for assigning codes, and finally analyse and assess the questionnaire using SPSS.


Paramount to all data entry and analysis is knowing what type of data a given variable holds, because different types of data are coded and ultimately analysed in different ways. SPSS gives three options for the type of data: nominal, ordinal and scale.

Nominal data

These are categorical data that have no order. The categories within each variable are mutually exclusive: respondents can only fall into one category. For example, respondents can only be one ethnicity from a given list. Where a nominal variable has two categories, it is often referred to as dichotomous or binary.

           Examples of nominal variables

Gender – respondents can be male or female

Disease/health status – respondents can either have a disease or not

Marital status – respondents can only have one marital status at a given time: single, married, separated, divorced, widowed

Ordinal data

These are also categorical variables in which the categories are ordered.

               Examples of ordinal variables

Age group – for example, 30–39, 40–49, 50–59, 60+

Likert scales – strongly agree, agree, neither agree nor disagree, disagree, strongly disagree

Scale data: In SPSS this covers discrete and continuous data. Discrete data comprise variables that can only take integers (whole numbers).

Examples of discrete data: 

Number of nights spent in hospital

Number of courses of a given drug prescribed during the study period

Age at last birthday

Number of cigarettes smoked in a week

Continuous data can take any value.

However, this is usually restricted by the accuracy of the equipment used for measuring.

For example,

Scales for weighing adult human weights rarely measure more accurately than whole kilograms and occasionally to one decimal place. This is also for practical reasons; there is little need to weigh adult humans to greater precision than the nearest kilogram or 100 grams (1 decimal place).

                Examples of continuous data

Blood pressure

Body mass index (BMI)

Lung function, for example peak expiratory flow rate (PEFR)


Once data collection is complete, the next task is to decide how to code each question so that it can easily be seen which values should be entered into SPSS. This is necessary because SPSS needs numerical values representing the answers to questions on a questionnaire or other data collection sheet in order to analyse the data. The initial task of deciding on coding is best done using an unused questionnaire, so that all possible codes can be written on it without confusion. In addition, it is a good idea to write the variable names on this questionnaire. These steps help the coding process, because individual codes do not have to be remembered, and they also provide a permanent record of the coding of the dataset.
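
As a rough illustration of the coding step, the sketch below keeps a small codebook and turns one respondent's written answers into numbers. The variable names (`gender`, `agegrp`), the codes and the missing-value code 9 are all assumptions made up for this example; they do not come from the questionnaire discussed here.

```python
# Hypothetical codebook: variable names and codes are illustrative only.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},                          # nominal: codes are labels
    "agegrp": {"30-39": 1, "40-49": 2, "50-59": 3, "60+": 4},    # ordinal: codes follow the order
}
MISSING = 9  # assumed code for an unanswered or unreadable question


def code_response(variable, answer):
    """Return the numeric code for one answer, or MISSING if blank or unknown."""
    if answer is None:
        return MISSING
    return CODEBOOK[variable].get(answer.strip().lower(), MISSING)


def code_questionnaire(raw):
    """Code every variable in the codebook for one respondent."""
    return {var: code_response(var, raw.get(var)) for var in CODEBOOK}


row = code_questionnaire({"gender": "Female", "agegrp": "40-49"})
print(row)  # {'gender': 2, 'agegrp': 2}
```

Keeping the codebook in one place plays the same role as the annotated blank questionnaire: nobody has to remember the codes, and the mapping is recorded permanently.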

Below are some excerpts from a questionnaire aimed at hotel employees, designed to discover their knowledge of diabetes and to find out whether employees consider hotels (as a workplace) to be appropriate places to conduct health promotion specifically aimed at type 2 diabetes. These excerpts have not been annotated with possible variable names, but have been annotated with coding. Variable names should not be long (ideally eight characters), but should be as descriptive as possible (when setting up an SPSS datasheet it is possible to give each variable a longer label). Variable names must be unique within a dataset.

These questions produce nominal data: there is no ordering. Question 2 lists three options and participants were asked to select one, so a different code is needed for each option. As the data are nominal, the numbers given are arbitrary labels and imply no order.





Enter the data into SPSS with each question represented by a column. Translate your response scale into numbers (e.g. a 5-point Likert scale might be 1 = completely disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = completely agree). Reverse-phrased items should be scored in reverse too!
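
Reverse scoring is easy to get wrong by hand, so here is a minimal sketch of it for the 5-point scale described above. The item names (`q1`, `q2`, `q3`) and the choice of `q2` as the reverse-phrased item are assumptions for illustration.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert scale, 1 = completely disagree ... 5 = completely agree


def reverse_score(score):
    """Flip a score so that 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - score


def score_respondent(answers, reversed_items):
    """answers: item name -> raw code; reversed_items: names of reverse-phrased items."""
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in answers.items()
    }


raw = {"q1": 5, "q2": 1, "q3": 3}          # hypothetical raw codes for one respondent
scored = score_respondent(raw, reversed_items={"q2"})
print(scored)  # {'q1': 5, 'q2': 5, 'q3': 3}
```

The midpoint (3) is unchanged by the flip, which is a quick sanity check that the reversal is symmetric.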

What we’re trying to do with this analysis is to first eliminate any items on the questionnaire that aren’t useful. So, we’re trying to reduce our 30 items down further before we run our factor analysis. We can do this by looking at descriptive statistics, and also correlations between questions.

 Descriptive Statistics

The first thing to look at is the statistical distribution of item scores. This alone will enable you to throw out many redundant items. Therefore, the first thing to do when piloting a questionnaire is to run descriptive statistics on the questionnaire items. This is easily done in SPSS. We’re on the lookout for:

  1. Range: Any item that has a limited range (not all the points of the scale have been used).
  2. Skew: Ideally each question should elicit a normally distributed set of responses across subjects (each item’s mean should be at the centre of the scale and there should be no skew).
  3. Standard deviation: Related to the range and skew of the distribution. Items with very high or very low standard deviations may cause problems, so be wary of extreme values for the SD.

These are your first steps. Basically if any of these rules are violated then your items become non-comparable (in terms of the factor analysis) which makes the questionnaire pretty meaningless!!
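
Outside SPSS, the same three checks can be sketched in a few lines. The item scores below are made up, and the skewness is the simple moment-based form; SPSS applies a small-sample adjustment, so its values may differ slightly.

```python
from statistics import mean, stdev


def describe_item(scores, scale_min=1, scale_max=5):
    """Range, mean, SD and skew for one questionnaire item."""
    n = len(scores)
    m = mean(scores)
    # Second and third central moments, used for a simple skewness estimate
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "min": min(scores),
        "max": max(scores),
        "full_range_used": min(scores) == scale_min and max(scores) == scale_max,
        "mean": m,
        "sd": stdev(scores),  # sample standard deviation
        "skew": skew,
    }


# Hypothetical pilot data: q1 is well spread, q2 is bunched at the top of the scale
items = {"q1": [1, 2, 3, 4, 5, 3, 3], "q2": [5, 5, 5, 4, 5, 5, 5]}
for name, scores in items.items():
    print(name, describe_item(scores))
```

Here `q2` would be flagged on all three counts: it uses only two points of the scale, its mean sits near the top, and its distribution is negatively skewed.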


All of your items should inter-correlate at a significant level if they are measuring aspects of the same thing. If any items do not correlate at a 5% or 1% level of significance, exclude them.

Factor Analysis 

When you’ve eliminated any items that have distributional problems or do not correlate with each other, then run your factor analysis on the remaining items and try to interpret the resulting factor structure.

What you should do is examine the factor structure and decide:

  1. Which factors to retain?
  2. Which items load onto those factors?
  3. What your factors represent?
  4. If there are any items that don’t load highly onto any factors, they should be eliminated from future versions of the questionnaire (for our purposes you need only state that they are not useful items as you won’t have time to revise and re-test your questionnaires!).


Having looked at the factor structure, you need to check the reliability of your items and of the questionnaire as a whole by running a reliability analysis on the questionnaire. There are two things to look at. (1) The Item Reliability Index (IRI), which is the correlation between the score on the item and the score on the test as a whole, multiplied by the standard deviation of that item. SPSS reports the closely related corrected item-total correlation, and we’d hope that these values would be significant for all items. Although we don’t get significance values as such, we can look for correlations greater than about 0.3 (the exact value depends on the sample size, but this is a good cut-off for the size of sample you’ll probably have). Any items with correlations less than 0.3 should be excluded from the questionnaire. (2) Cronbach’s alpha, which should be 0.8 or more; the deletion of an item should not affect this value too much.
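
To make the two reliability checks concrete, here is a small sketch under stated assumptions. The response matrix is invented (rows are respondents, columns are items), alpha uses the standard formula k/(k − 1) × (1 − sum of item variances / variance of total scores), and the corrected item-total correlation is taken as the Pearson correlation between an item and the sum of the remaining items. It is an illustration, not a reproduction of the SPSS output.

```python
from statistics import variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    items = list(zip(*rows))              # one tuple of scores per item
    totals = [sum(r) for r in rows]       # total score per respondent
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def corrected_item_total(rows):
    """Correlation of each item with the sum of the other items."""
    items = list(zip(*rows))
    totals = [sum(r) for r in rows]
    return [pearson(col, [t - c for t, c in zip(totals, col)]) for col in items]


# Hypothetical data: items 1-3 move together, item 4 does not
rows = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [3, 3, 3, 1],
    [2, 2, 3, 4],
    [1, 2, 1, 3],
    [4, 5, 4, 3],
]
print("alpha:", round(cronbach_alpha(rows), 3))
for i, r in enumerate(corrected_item_total(rows), start=1):
    print(f"item {i}: r = {r:.3f}", "(candidate for removal)" if r < 0.3 else "")
```

In this made-up data the fourth item falls below the 0.3 cut-off, so it is the one that would be dropped before recomputing alpha.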



Posted by: Abhishek Kumar_13PGP003

Other Members of Group 2_Section A:  Sarvesh Singh, Naureen Fatima, Pavan Kumar Tatineni, Pittala Priyanka, Poulomi Paul, Sao Ruchi and Charan Kumar Karra.


Section A_Group 3_ Shreesti Rastogi (13PGP054): Session 5

Contact Less, Receive More

Surveys are part and parcel of an MBA grad’s life, be it marketing, operations, finance or any other specialization. Every subject, at one time or another, requires the grad to delve deep into his environment and come out with a finding that is free from errors and biases, and what better for that than primary data?

Collecting primary data can be quite a daunting task, and it can be pretty expensive too. This makes it almost impossible for the B grad to collect any primary data, so he has to depend upon secondary data. Thankfully, the internet has chalked out a path for online surveys: an easy, inexpensive and highly convenient method of collecting primary data. However, online surveys, for all their advantages, have the shortcoming of a low response rate.

Studies show that the response rate, i.e. (Number of Completed Surveys / Number of Participants Contacted) x 100%, for an online survey can be as low as 20%.
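
The formula above is simple arithmetic, but it is handy to turn it around when planning: given an expected response rate, how many people must be contacted? A minimal sketch, with made-up numbers:

```python
from math import ceil


def response_rate(completed, contacted):
    """Response rate in percent: completed surveys / participants contacted x 100."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return 100 * completed / contacted


def invitations_needed(target_completes, expected_rate_pct):
    """Invitations to send to expect a given number of completed surveys."""
    return ceil(target_completes * 100 / expected_rate_pct)


print(response_rate(40, 200))        # 20.0
print(invitations_needed(100, 20))   # 500
```

At a 20% response rate, 100 completed surveys therefore require about 500 invitations, which is why the steps that follow matter.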

There can be a number of reasons for a poor response rate. The primary reasons that have been identified, and for which viable solutions exist, are:

a)      Poor Survey Design.

b)      Communication with the respondent.

Poor Survey Design: Not understanding the need and objectives of one’s research can lead to a bad survey design. To ensure that the survey fulfils its purpose, it is important to ensure that the following are done:

a)      Understand the purpose of the research – This helps in making a relevant questionnaire. The answers that are received are insightful.

b)      The Questionnaire must be clean, precise, logical and short – this motivates the respondent to answer the questionnaire and thus helps in increasing the response rate.

c)      Studies show that surveys perceived as important by the respondent have a higher response rate. Thus, outlining the intent of the survey at the beginning helps the respondent to appreciate its importance.

Communication with the respondent: A lack of interest is generally noticed among respondents. One of the major causes identified is that many people regard a survey e-mail as spam, so it receives the same treatment as spam: the trash box. How, then, does one ensure the commitment of a respondent?

One can contact the respondents before the survey. The e-mail must inform the respondent about the sender of the e-mail, the objective of the e-mail, the source from which the e-mail address of the respondent has been received, the objective and the duration of the survey.

The contact details of the sender must also be shared. The respondent can use them to clarify doubts or simply to verify the credibility of the survey. Personalised e-mails further reduce the chance of the e-mail looking like spam. This can be done simply by including the name of the respondent at the beginning of the message.

It is a good idea to give the respondent an option to opt out from the survey. The respondents who do not opt out are more likely to respond, thus increasing the response rate.

Sending & Receiving Responses – There is a good time and a bad time for everything. Similarly, surveys must be sent keeping in mind the availability of the respondent. Further, one must give the respondent adequate time to respond to the survey.

Following up by sending reminder mails only to the non-respondents helps in increasing the response rate. The reminder mails must be sent after maintaining an optimum time interval.

Thus, following such simple steps can ensure a higher response rate for an online survey. Studies conducted in Georgia using such simple steps have shown response rates increasing to as much as 70%. This will help B grads to collect primary data in a shorter time frame.


Section B_Group 8_ Sneha Srivastava (13PGP110): Session 5

Web-Based Surveys

“Measure twice, cut once.” – A carpenter’s rule. The same goes for the research process, where assessing consumers’ attitudes and perceptions is of prime importance.


Technology has moved surveys from the traditional door-to-door method to the telephone and ultimately to the web. Researchers are now turning to faster and more economical ways of conducting primary research. At the same time, they are reluctant to trust the accuracy and validity of responses obtained from web-based surveys. That makes us ponder whether the fear of falling behind the younger generation, or budget cuts, is pushing researchers towards online surveys. Before jumping to any conclusion, it is important to analyse the various aspects of the shift from conventional methods of primary research to the contemporary method, which uses the internet.


>> Benefits

–          Economical: paper, printing and data entry costs are eliminated. The variable cost of including more participants is almost zero.

–          Streamlines the data collection process: data is available in a ready-to-use format, which makes it easy to analyse and to perform secondary research.

–          Faster, as response collection time is minimized.

–          Greater recall, the respondents get their own time to answer.

–          Detailed responses, because of anonymity.

–          Wide choice of survey design, as sound and video clips can be included in the questionnaire to make it more appealing to respondents.

–          Flexibility: the questionnaire can be tailored to the target group.

–          Since the use of paper is minimized, it is environmentally friendly.

>> Problems

–          Bad sampling, as sampling of email addresses is difficult.

–          Questions can’t be explained in case clarifications are required.

–          Follow up questions can’t be asked if required.

–          Limited coverage, as a large section of the population doesn’t have access to the internet. Further, even where access exists, not all potential respondents are computer literate.

–          Low response rate, as there is a chance that the invitation link to the survey will land in the receiver’s spam folder: a crowding-out effect.

What can be done?

Choose the mode of survey (online or offline) based on your research objective and the target group. If the research requires a specific target group, go for an offline survey or send the invitation link only to the selected set of respondents.

To increase the response rate, a funnel design should be used when preparing the questionnaire. Moreover, it should be kept simple and free of ambiguity.

Emails with personalized survey links and reminders could facilitate the response.


Hence, while web-based surveys are in constant flux, researchers should use their craft with reflexivity. The internet has provided a worthy alternative for research design and data collection, but the project DNA should be kept in mind before designing a questionnaire and choosing the mode of survey. The choice of mode should be scenario-based rather than cost- or convenience-based. It is very important to check whether it is going to fulfil our research objective or not. Each response should be evaluated carefully to make sure that it adds value and meaning to the research process.


Section B_Group 1_Kritika S_13PGP087_Poorva Gadre_13PGP098

What is TRP (Television Rating Point)?

The most important term in the media and entertainment field is Television Rating Points, better known as TRP. TRP is an index that provides details about the viewership of television channels and of the shows telecast on them. It helps in deciding the time slots of shows and the slots for advertisements. In India, TAM (Television Audience Measurement) and aMap (Audience Measurement & Analytics Limited) are the electronic rating systems used to measure TRP.

In the case of TAM, two methodologies are used for the calculation: frequency monitoring and picture matching. In the frequency monitoring technique, ‘people meters’ are attached to the TV sets in sample homes. These record the channels watched by the family members at a particular point of time, how frequently they change channel and the time slots with maximum viewership. The ‘people meter’ reads the frequencies of channels, which are later decoded into channel names, and the agency prepares national data based on the readings from the TV sets of the sample population. To determine the final TRP of a channel or its shows, an average over the entire week is taken, and the success of a show for that week is judged accordingly.

The sample homes for the purpose of calculation are chosen across different geographical, demographic and SEC (socio-economic classification) sectors. According to the Ministry of Information and Broadcasting (MIB), the number of ‘people meters’ in the sample has to be ramped up to 20,000. The drawback of this technique is that cable operators frequently change the frequencies of the different channels before sending signals to homes, so it may be very misleading to identify a channel by a particular frequency even if the downlink frequency is the same all over India.

The second technique is picture matching. Here the people meter continuously records a small portion of the picture being watched on that particular television set. Alongside this, small picture portions from all the channels are also collected as a reference. The data collected from the sample homes are later matched against this main data bank to identify the channel, and the national rating is then produced.

aMap is an audience measurement system that provides data on television in India via aMap Digital, an overnight DTH (direct-to-home) TV audience measurement panel. aMap provides data on television ratings, gross rating points (GRP), reach, time spent, market share, target groups, connectivity of channels and content analysis. The sample size for this technique is 6,000 metered homes.

It is the only system in India that gathers and disseminates connectivity data on an overnight basis, for three times of the day. To aMap’s subscribers, data is provided on an overnight basis by market and by band. Data is disseminated via the following bands: Prime Band, Colour Band, S Band, Ultra High Frequency (UHF) and Hyper Band. It gives the percentage of homes that receive a channel on a particular band. The greatest advantage of this method is that it provides data on a daily basis, unlike TAM, which delivers on a weekly basis.






Section B_Group 1_ Arnab Sen_13PGP069_Poorva Gadre_13PGP098


Statistical procedures can be divided into two major categories:

  • Descriptive statistics
  • Inferential statistics.

Descriptive statistics includes statistical procedures that we use to describe the population we are studying. The data could be collected from either a sample or a population, but the results help us organize and describe data. On the other hand, inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample.

The method in which we select samples to learn more about characteristics in a given population is called hypothesis testing. Hypothesis testing is really a systematic way to test claims or ideas about a group or population.

The method for hypothesis testing can be described in four simple steps:

Step 1: Null and alternative hypotheses

The null hypothesis is a claim of “no difference”. In a mathematical formulation of the null hypothesis there will typically be an equal sign. The null hypothesis is what we are attempting to overturn by our hypothesis test. If we are studying a new treatment, the null hypothesis is that our treatment will not change our subjects in any meaningful way. This hypothesis is denoted by H0.

The opposing hypothesis is the alternative hypothesis. The alternative hypothesis is a claim of “a difference in the population,” and is the hypothesis that one often hopes to bolster. This hypothesis is denoted by either Ha or H1. In a mathematical formulation of the alternative hypothesis there will typically be an inequality, or not-equal-to, symbol.

Step 2: Test statistic

We calculate a test statistic from the data. There are different types of test statistics. One of them is the z statistic. The z statistic will compare the observed sample mean to an expected population mean μ0. Large test statistics indicate data are far from expected, providing evidence against the null hypothesis and in favour of the alternative hypothesis.

 Step 3: p Value and conclusion

The test statistic is converted to a conditional probability called a P-value. The P-value answers the question “If the null hypothesis were true, what is the probability of observing the current data or data that is more extreme?”

Step 4: Decision

Alpha (α) is a probability threshold for a decision. If P ≤ α, we reject the null hypothesis; otherwise it is retained for want of evidence. α is called the level of significance, and 1 − α is the level of confidence. For example, if α = 0.05, the conclusion of the hypothesis test can be claimed to be correct with a confidence of 95%. The level of confidence is decided based on the criticality of the problem: the more critical the problem, the higher the level of confidence (and the smaller the α) that is set.
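
The four steps can be sketched end to end for a one-sample z test. Everything numeric here is an assumption for illustration: a sample mean of 103 from n = 100 observations, a null value μ0 = 100, a known population standard deviation of 15, a two-sided alternative, and a significance threshold of 0.05.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mu = mu0, with known population SD sigma."""
    # Step 2: test statistic, the distance of the sample mean from mu0 in standard errors
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 3: P-value, the probability under H0 of a statistic at least this extreme
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: decision, reject H0 when P <= alpha
    return z, p_value, p_value <= alpha


z, p, reject = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, P = {p:.4f}, reject H0: {reject}")
```

With these numbers z is 2.0 and the P-value is a little under 0.05, so the null hypothesis is rejected at the 0.05 threshold but would be retained at a stricter threshold of 0.01.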


 There are basically two kinds of errors that are possible in case of hypothesis testing:

Type I Error:

This kind of error is also known as a “false positive”: the error of rejecting a null hypothesis when it is actually true. In other words, this is the error of accepting an alternative hypothesis (the real hypothesis of interest) when the results can be attributed to chance. It occurs when one is observing a difference when in truth there is none.

Type II Error:

This kind of error is also known as a “false negative”: the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature. In other words, this is the error of failing to accept an alternative hypothesis when one doesn’t have adequate power. It occurs when one is failing to observe a difference when in truth there is one.

Type I and Type II errors are part of the process of hypothesis testing. Although the errors cannot be completely eliminated, we can minimize one type of error. However, when one decreases the probability of one type of error and everything else (such as the sample size) remains the same, the probability of the other type nearly always increases. There is always a trade-off between the two types of errors.
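
The trade-off can be seen numerically. The sketch below assumes a one-sided z test of H0: μ = 100 against a true mean of 103, with a known standard deviation of 15 and n = 100; all of these figures are invented for illustration. It computes the probability of a Type II error (β) for several choices of the Type I error probability (α).

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """Beta for a one-sided z test of H0: mu = mu0 when the true mean is mu1 > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)        # rejection threshold for the z statistic
    shift = (mu1 - mu0) / (sigma / sqrt(n))       # true effect in standard-error units
    return std_normal.cdf(z_crit - shift)         # chance the statistic stays below the threshold


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=100, mu1=103, sigma=15, n=100)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Lowering α pushes β up as long as the sample size is fixed; increasing n is the usual way to bring both down together.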

In some cases a Type I error is preferable to a Type II error. In other applications a Type I error is more dangerous to make than a Type II error.

Suppose one is designing a medical screening for a disease. Is a Type I or a Type II error better? A false positive may give the patient some anxiety, but this will lead to other testing procedures. Ultimately the patient will discover that the initial test was incorrect. Contrasted to this, a false negative will give the patient the incorrect assurance that he does not have a disease when he in fact does. As a result of this incorrect information, the disease will not be treated. If one could choose between these two options, a false positive is more desirable than a false negative.

Now suppose that one has been put on trial for murder. The null hypothesis here is that one is not guilty. Which of the two errors is more serious? A Type I error occurs when one is found guilty of a murder that one did not commit. This is a very dire outcome. A Type II error occurs when one is guilty but is found not guilty. This is a good outcome for the accused, but not for society as a whole. Here we see the value in a judicial system that seeks to minimize Type I errors.



SectionB_Group3_Prasha Mishra(13PGP101) : Session 5

Establishing Questionnaire Flow & Layout

Market researchers are well aware that the development of a questionnaire is key to building mutual understanding and agreement between interviewer and respondents. The better the rapport, the better the interviewer’s chances of getting good results, and the more carefully thought out and detailed the respondent’s answers will be.

Given below are some guidelines to develop the questionnaire flow.

Identify Respondents – Most data acquisition today employs sampling methods. The aim is to identify the minimum number of respondents for the questionnaire that still covers the various respondent types and gives good results. For instance, a study on food may focus on users of some specific brands, while a study on cosmetic brands may target urban women, depending on the product type.

Opening questionnaire – After identifying key respondents, start the questionnaire with a question that builds the respondent’s interest. The initial questions should be simple and interesting, and should not be private ones such as the respondent’s age or income. Private questions are considered threatening and make respondents defensive.

Flow of the questionnaire – The questionnaire should proceed in a logical fashion. General questions, which make the respondent think about the concept, company or product, should be covered first. The questions should then move on to specifics. For instance, a questionnaire on shampoo may begin with

“Have you purchased a hair shampoo within the past 3 weeks?”

Then the questions should lead the respondent on to

“The frequency of using shampoo, brands purchased in the last few months, satisfaction or dissatisfaction with the brands, characteristics of the ideal shampoo”.

 Layout – Ask questions that continue to maintain respondent’s interest and commitment to motivate them to finish the survey. The questionnaire should be designed with short encouragements at strategic locations in the questionnaire.

Sensitive, private or embarrassing questions should be positioned near the end. This ensures that most questions have already been answered before the respondents become defensive. Moreover, rapport has been established by this point, increasing the chances of a completed questionnaire.

 – Prasha Mishra (13PGP101)


Section B_Group 3_Sumit Vaishnav_13PGP114_Sample Size issues and determination

Sample Size Issues and Determination

What sample size should we go with? This is the most important question, given the objectives and constraints that exist in selecting the sampling method. Since every survey is different, one cannot expect fixed rules for determining the sample size. However, important factors that should be considered are:

  • population size and variability present within the population;
  • resources available (time, money and personnel);
  • accuracy level required of the results;
  • detail required in the results;
  • level of non-response;
  • the sampling methods to be used; and
  • presence of other variables of interest.

Now let us look at their importance:
Variability: A large sample is required where variability is high. However, actual population variability is generally not known in advance; information from a previous survey or a pilot test may be used to give an indication of the variability of the population.
When the characteristic being measured is comparatively rare, a larger sample size will be required to ensure that sufficient units having that characteristic are included in the sample.
Population Size: An aspect that affects the sample size required is the population size. When the population size is small, it needs to be considered carefully in determining the sample size, but when the population size is large it has little effect on the sample size. Gains in precision from increasing the sample size are by no means proportional to population size.
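
One common way to see the effect of population size is the sample size formula for estimating a proportion, with a finite population correction. This is a swapped-in textbook formula rather than anything specified in this post, and the 5% margin of error, 95% confidence and p = 0.5 are assumed defaults.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion, corrected for a finite population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin ** 2              # size for a very large population
    n = n0 / (1 + (n0 - 1) / population)                 # finite population correction
    return ceil(n)


for population in (500, 10_000, 1_000_000):
    print(population, sample_size_for_proportion(population))
```

The required sample grows noticeably between a population of 500 and one of 10,000, but hardly at all beyond that, which is the sense in which a large population has little effect on the sample size.
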
Resources and Accuracy: Because the estimates are obtained from a sample rather than a census, they differ from the true population value. A measure of the accuracy of the estimate is the standard error. A large sample is more likely to have a smaller standard error, or greater accuracy, than a small sample.
When planning a survey, you might wish to minimise the size of the standard error to maximise the accuracy of the estimates. This can be done by choosing as large a sample as resources permit. Alternatively, you might specify the size of the standard error to be achieved and choose a sample size designed to achieve that. In some cases it will cost too much to take the sample size required to achieve a certain level of accuracy. Decisions then need to be made on whether to relax the accuracy levels, reduce data requirements, increase the budget or reduce the cost of other areas in the survey process.
Level of Detail Required: If we divide the population into subgroups (strata) and choose a sample from each of these strata, then a sufficient sample size is required in each of the subgroups to ensure reliable estimates at this level. The overall sample size would be equal to the sum of the sample sizes for the subgroups. A good approach is to draw a blank table that shows all characteristics to be cross-classified. The more cells there are in the table, the larger the sample size needed to ensure reliable estimates.
Likely Level of Non-response: Non-response can cause problems for the researcher in two ways. First, the higher the non-response, the larger the standard errors will be for a fixed initial sample size. This can be compensated for by assigning a larger sample size based on an expected response rate, or by using quota sampling.
The second problem is that the characteristics of non-respondents may differ markedly from those of respondents. The survey results will still be biased even with an increase in sample size (i.e. increasing the sample size will have no effect on the non-response bias). The lower the response rate, the less representative the final sample will be of the total population, and the bigger the bias of sample estimates. Non-response bias can sometimes be reduced by post-stratification as well as through intensive follow-up of non-respondents, particularly in strata with poor response rates.
Sampling Method: Many surveys involve complex sampling and estimation procedures. An example of this is a multi-stage design. A multi-stage design can often lead to higher variance in resulting estimates than might be achieved by a simple random sample design. If, then, the same degree of precision is desired, it is necessary to inflate the sample size to take into account the fact that simple random sampling is not being used.
Relative Importance of the Variables of Interest: Generally, surveys are used to collect a range of data on a number of variables of interest. A sample size that will result in sufficiently precise information for one variable may not result in sufficiently precise information for another variable. It is not normally feasible to select a sample that is large enough to cover all variables to the desired level of precision. In practice, therefore, the relative importance of the variables of interest is considered, priorities are set and the appropriate sample size is determined accordingly.
Calculation of Sample Size: When determining an appropriate sample size, we take as a general rule that the more variable a population is, the larger the sample required to achieve specific levels of accuracy in survey estimates. However, actual population variability is not known and must be estimated using information from a previous survey or a pilot test. It is worth keeping in mind that the gains in precision of estimates are not directly proportional to increases in sample size (i.e. doubling the sample size will not halve the standard error; generally the sample has to be increased by a factor of 4 to halve the SE).
In practice, cost is a major consideration. Many surveys opt to maximise the accuracy of population estimates by choosing as large a sample as resources permit. In complex surveys, where estimates are required for population subgroups, enough units must be sampled from each subgroup to ensure reliable estimates at these levels. To select a sample in this case, you might specify the size of the standard error to be achieved within each subgroup and choose a sample size to produce that level of accuracy. The total sample is then formed by aggregating this sample over the subgroups.
Sample size should also take into account the expected level of non-response from surveyed units.
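
The factor-of-4 rule and the non-response adjustment can both be checked with a short sketch. It assumes simple random sampling from a large population, so that the standard error of a mean is σ/√n; the standard deviation of 10 and the 50% expected response rate are made-up figures.

```python
from math import ceil, sqrt


def standard_error(sigma, n):
    """Standard error of a sample mean under simple random sampling."""
    return sigma / sqrt(n)


def sample_size_for_se(sigma, target_se):
    """Smallest n whose standard error is at or below the target."""
    return ceil((sigma / target_se) ** 2)


def inflate_for_nonresponse(n_required, expected_response_rate):
    """Units to approach so that about n_required respond (rate given as a fraction)."""
    return ceil(n_required / expected_response_rate)


sigma = 10.0  # assumed population standard deviation, e.g. from a pilot test
for n in (100, 200, 400):
    print(n, round(standard_error(sigma, n), 3))

n_needed = sample_size_for_se(sigma, target_se=0.5)
print(n_needed, inflate_for_nonresponse(n_needed, expected_response_rate=0.5))
```

Going from 100 to 200 units reduces the standard error by only about 29%, while going from 100 to 400 halves it. The inflation step addresses only the loss of precision from non-response; it does nothing about non-response bias.
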