# Group_5_Lecture_6_Shravan_Kailasa_Probability_Proportional_to_Size_Sampling

In some cases the sample designer has access to an “auxiliary variable” or “size measure” for each element in the population, believed to be correlated with the variable of interest. This information can be used to improve the accuracy of the sample design. One option is to use the auxiliary variable as the basis for stratification.

Another option is Probability Proportional to Size (‘PPS’) sampling, in which the selection probability for each element is set to be proportional to its “size measure”, up to a maximum of 1. Each element's chance of entering the sample is therefore in proportion to its share of the total size measure, so larger elements are more likely to be selected. In a simple PPS design, these selection probabilities can then be used as the basis for Poisson sampling. However, this has the drawback of variable sample size, and different portions of the population may still be over- or under-represented due to chance variation in selections.
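The Poisson-sampling variant can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the function names `pps_probabilities` and `poisson_sample` are hypothetical, the size measures are borrowed from the schools example later in this section, and the cap at 1 is applied in the simplest possible way (without redistributing the excess to other elements).

```python
import random


def pps_probabilities(sizes, n):
    """Selection probability proportional to size, capped at 1.

    n is the intended (expected) sample size. If any probability is
    capped, the expected sample size falls below n.
    """
    total = sum(sizes)
    return [min(1.0, n * s / total) for s in sizes]


def poisson_sample(probabilities, rng):
    """Include each element independently with its own probability."""
    return [i for i, p in enumerate(probabilities) if rng.random() < p]


sizes = [150, 180, 200, 220, 260, 490]  # assumed size measures
probs = pps_probabilities(sizes, n=3)

rng = random.Random(0)  # fixed seed only so the illustration is repeatable
for _ in range(5):
    chosen = poisson_sample(probs, rng)
    print(len(chosen), chosen)  # the sample size varies from draw to draw
```

Because each element is included independently, the number of selected elements differs between draws even though its expected value is 3, which is the variable-sample-size drawback noted above.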

Systematic sampling theory can be used to create a probability proportionate to size sample. This is done by treating each count within the size variable as a single sampling unit. Samples are then identified by selecting at even intervals among these counts within the size variable. This method is sometimes called PPS-sequential or monetary unit sampling in the case of audits or forensic sampling.

Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260, and 490 students respectively (total 1500 students), and we want to use student population as the basis for a PPS sample of size three. To do this, we could allocate the first school numbers 1 to 150, the second school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on to the last school (1011 to 1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count through the school populations by multiples of 500. If our random start was 137, we would select the schools which have been allocated numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
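The school example can be reproduced with a short Python sketch. The function name `systematic_pps` is hypothetical, and the sketch assumes integer size measures whose total is divisible by the sample size, as is the case here (1500 / 3 = 500).

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(sizes, n, start=None, rng=random):
    """Systematic PPS selection; returns 0-based indices of selected units.

    Assumes integer sizes whose total is divisible by n.
    """
    total = sum(sizes)
    if total % n != 0:
        raise ValueError("this sketch assumes the total is divisible by n")
    interval = total // n
    if start is None:
        start = rng.randint(1, interval)  # random start in 1..interval
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    # Upper end of the number range allocated to each unit: 150, 330, 530, ...
    upper_bounds = list(accumulate(sizes))
    points = [start + k * interval for k in range(n)]
    # The unit owning a point is the first one whose upper bound reaches it.
    return [bisect_left(upper_bounds, p) for p in points]


schools = [150, 180, 200, 220, 260, 490]
selected = systematic_pps(schools, n=3, start=137)
print([i + 1 for i in selected])  # [1, 4, 6]
```

Note that a unit whose size exceeds the sampling interval could be hit by more than one selection point; that does not happen here because the largest school (490) is smaller than the interval (500).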

The PPS approach can improve accuracy for a given sample size by concentrating the sample on large elements that have the greatest impact on population estimates. PPS sampling is commonly used for surveys of businesses, where element size varies greatly and auxiliary information is often available. For instance, a survey attempting to measure the number of guest-nights spent in hotels might use each hotel’s number of rooms as an auxiliary variable. In some cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to produce more current estimates.

The PPS approach is often used as part of a cluster sampling exercise. Consider the scenario in which clusters are not of the same size and a selection among them is required. This is usually the case in large populations where “area sampling” or “geographical sampling” versions of clustering are to be used. In such cases the PPS approach gives larger clusters a higher probability of selection and smaller clusters a lower one. Care must then be taken to conduct the same number of interviews within each selected cluster, so that every sample unit (interview) has the same overall chance of selection.
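The reason a fixed number of interviews per cluster equalises the overall chance of selection can be checked with exact fractions. In this sketch the cluster sizes, the number of sampled clusters (3) and the interviews per cluster (20) are assumed values chosen for illustration, and `overall_selection_probability` is a hypothetical helper. It also assumes that no cluster's first-stage probability exceeds 1 and that interviews within a cluster are drawn with equal probability.

```python
from fractions import Fraction


def overall_selection_probability(cluster_size, total_size, n_clusters, per_cluster):
    """P(cluster selected by PPS) * P(unit selected within its cluster)."""
    p_cluster = Fraction(n_clusters * cluster_size, total_size)
    p_within = Fraction(per_cluster, cluster_size)
    return p_cluster * p_within  # the cluster size cancels out


cluster_sizes = [150, 180, 200, 220, 260, 490]  # assumed cluster sizes
total = sum(cluster_sizes)

for size in cluster_sizes:
    p = overall_selection_probability(size, total, n_clusters=3, per_cluster=20)
    print(size, p)  # every cluster size gives 1/25
```

A unit in a large cluster is more likely to have its cluster chosen but less likely to be picked within it, and the two effects cancel: the overall probability is (3 × 20) / 1500 = 1/25 regardless of cluster size.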

Take the example of an opinion poll or an exit poll, where a multistage cluster sampling technique is followed. In the first stage, ‘n’ constituencies are selected out of a total of ‘N’ constituencies in which polls are conducted, usually using the PPS method. The “size measure” is the number of eligible voters in each constituency. In the second stage, polling station areas are selected from within the sampled constituencies; in the survey excerpted below, this was done by systematic random sampling. Finally, within these polling station “clusters”, respondents are selected at random and the survey is administered. Here is an excerpt from the CNN-IBN, Lokniti, The Hindu and the CSDS exit poll methodology carried out in the run up to the elections to the five state assemblies:

“The findings presented here are based on a Post Poll survey conducted by the Lokniti, Centre for the Study of Developing Societies (CSDS), Delhi, in Madhya Pradesh for CNN-IBN and The Week. The survey was conducted among 2829 respondents between 26th November and 1st December 2013 in 140 locations spread across 35 Assembly constituencies. The 35 assembly constituencies were the same as those where a Pre Poll survey was conducted by CSDS in October 2013. The respondents too were from the same pool of respondents that had been selected for the Pre Poll survey. The constituencies were selected using the Probability Proportionate to Size Method. Four polling stations within each of the 35 sampled constituencies were selected using the Systematic Random Sampling (SRS) technique. The respondents were also selected using the SRS method from the most updated electoral rolls of the 140 selected polling stations. Keeping in mind, the probability of non completion of interviews amongst all the selected respondents we adopted the technique of over sampling of respondents. A total of 4900 respondents were randomly sampled of which 3500 were targeted for interviews in the field during the Post Poll Survey, of which 2829 interviews were successfully completed in the stipulated time”

It further goes on to say:

“The social profile of the respondents interviewed largely matched the demographic profile of the State. Women comprise 43.4 percent of the sample. 15.9 percent of the sample consists of Scheduled Caste respondents, 20.2 percent is Scheduled Tribe and 9.4 percent is made up of Muslims. These numbers, are by and large similar to the actual Census figures and reflect the representative nature of the sample, although there is a slight over representation of Muslim and Urban and a under representation of Women”.

The above example demonstrates the application of the PPS sampling technique in pre- and post-election surveys conducted to gauge public opinion.
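The figures quoted in the first excerpt can be checked with simple arithmetic. The sketch below uses only numbers stated in the excerpt; the per-station averages are an assumption, since the excerpt does not say how respondents were allocated across polling stations.

```python
# Figures taken from the quoted methodology.
constituencies = 35
stations_per_constituency = 4
sampled = 4900
targeted = 3500
completed = 2829

polling_stations = constituencies * stations_per_constituency
print(polling_stations)  # 140, matching the "140 locations" in the excerpt

# Averages per polling station, assuming an even allocation (not stated in the excerpt).
print(sampled / polling_stations)   # 35.0
print(targeted / polling_stations)  # 25.0

# Share of targeted interviews that were completed.
print(f"{completed / targeted:.1%}")  # 80.8%
```

If the allocation was indeed even, 35 respondents were sampled and 25 targeted at each polling station, which would be consistent with the equal-interviews-per-cluster requirement described earlier.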

SectionB_Group 5_Shravan_Kailasa_13PGP119

Other members of Section B, Group 5 are T. Sai Vijay, Nori Venkata Sairam, Kiran Tippani, Jayesh Surishetty, Keerthi Kiran Gautam, Kranthi Kiran Gude and Pramod Kumar.