Session 8: Data Types and Analyzing Questionnaire with SPSS …… Section A _Group 2_Abhishek Kumar_13PGP003

In research studies, once the data have been collected they are analysed statistically. Responses from a questionnaire or data collection sheet are coded so that each response is represented by a number. Once the data have been coded, they can be entered into a data analysis program such as SPSS. SPSS is a statistical analysis package that can be used to analyse questionnaires and similar data; it is menu driven, which makes it easy to use. Working in SPSS is a three-stage process:

  1. Defining variables
  2. Inputting data
  3. Analysing the data

Here we will first distinguish between types of data. We then look at how a pile of questionnaires needs to be coded before the data are entered into SPSS, show the conventions for assigning codes, and finally analyse and assess the questionnaire using SPSS.


Paramount to all data entry and analysis is knowing what type of data a given variable holds, because different types of data are coded, and ultimately analysed, in different ways. SPSS offers three options for the type of data: nominal, ordinal and scale.

Nominal data

These are categorical data that have no order. The categories within each variable are mutually exclusive: respondents can only fall into one category. For example, respondents can only be one ethnicity from a given list. Where a nominal variable has two categories, it is often referred to as dichotomous or binary.

Examples of nominal variables

Gender – respondents can be male or female

Disease/health status – respondents can either have a disease or not

Marital status – respondents can only have one marital status at a given time: single, married, separated, divorced, widowed

Ordinal data

These are also categorical variables in which the categories are ordered.

Examples of ordinal variables

Age group – for example, 30–39, 40–49, 50–59, 60 and over

Likert scales – strongly agree, agree, neither agree nor disagree, disagree, strongly disagree

Scale data

In SPSS this covers both discrete and continuous data. Discrete data comprise variables that can only take integer (whole-number) values.

Examples of discrete data: 

Number of nights spent in hospital

Number of courses of a given drug prescribed during the study period

Age at last birthday

Number of cigarettes smoked in a week

Continuous data can take any value. However, this is usually restricted by the accuracy of the equipment used for measuring. For example, scales for weighing adults rarely measure more accurately than whole kilograms, or occasionally to one decimal place. This is also for practical reasons: there is little need to weigh adults to greater precision than the nearest kilogram or 100 grams (one decimal place).

Examples of continuous data

Blood pressure

Body mass index (BMI)

Lung function, for example peak expiratory flow rate (PEFR)


Once data collection is complete, the next task is to decide how to code each question so that it is easy to see which values should be entered into SPSS. This is necessary because SPSS needs numerical values representing the answers on a questionnaire or other data collection sheet in order to analyse the data. The initial task of deciding on the coding is best done on an unused questionnaire, so that all possible codes can be written on it without confusion. It is also a good idea to write the variable names on this questionnaire. These steps help the coding process, because individual codes do not have to be remembered, and they provide a permanent record of how the dataset was coded.

Below are some excerpts from a questionnaire aimed at hotel employees, designed to discover their knowledge of diabetes and to find out whether they consider hotels (as a workplace) to be appropriate places to conduct health promotion aimed specifically at type 2 diabetes. These excerpts have not been annotated with possible variable names, but have been annotated with coding. Variable names should not be long (ideally eight characters), but should be as descriptive as possible (when setting up an SPSS datasheet it is possible to give each variable a longer label). Variable names must be unique within a dataset.
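As a rough illustration of what such a coding record might look like, here is a minimal Python sketch of a codebook. The variable names, labels and codes are hypothetical (they are not taken from the hotel questionnaire), and the eight-character limit is simply the ideal length mentioned above.

```python
# Hypothetical codebook: every name, label and code here is illustrative only.
CODEBOOK = {
    "gender": {
        "label": "Gender of respondent",
        "measure": "nominal",
        "codes": {"male": 1, "female": 2},
    },
    "agegroup": {
        "label": "Age group",
        "measure": "ordinal",
        "codes": {"30-39": 1, "40-49": 2, "50-59": 3},
    },
    "nights": {
        "label": "Number of nights spent in hospital",
        "measure": "scale",
        "codes": None,  # scale data are entered as measured, no coding needed
    },
}

MAX_NAME_LENGTH = 8  # the ideal variable-name length suggested in the text


def check_names(codebook):
    """Return the variable names that are longer than MAX_NAME_LENGTH."""
    return [name for name in codebook if len(name) > MAX_NAME_LENGTH]


def encode(codebook, variable, answer):
    """Translate one questionnaire answer into the number to enter."""
    codes = codebook[variable]["codes"]
    if codes is None:
        return answer  # already numeric
    return codes[answer.strip().lower()]
```

Because the codebook is a dictionary keyed by variable name, names are unique by construction. Calling `encode(CODEBOOK, "gender", "Female")` looks up the code written on the annotated questionnaire instead of relying on memory.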

Some questions produce nominal data, where there is no ordering. Question 2 lists three options and participants were asked to select one, so a different code is needed for each option. As the data are nominal, the numbers given are simply labels and imply no order.





Enter the data into SPSS with each question represented by a column. Translate your response scale into numbers (e.g. a 5-point Likert scale might be 1 = completely disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = completely agree). Reverse-phrased items should be scored in reverse too!
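Reverse scoring is just a matter of flipping the scale, so that 1 becomes 5, 2 becomes 4 and so on. A minimal Python sketch, assuming the 1 to 5 scale above; the item names `q1` and `q2` and the helper names are made up for illustration:

```python
SCALE_MIN, SCALE_MAX = 1, 5  # the 5-point Likert scale from the example above


def reverse_score(score, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a score on the scale: the lowest value becomes the highest."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


def score_respondent(answers, reversed_items):
    """Return one respondent's answers with reverse-phrased items flipped.

    answers maps an item name to its raw score; reversed_items is the set
    of item names that were phrased in reverse.
    """
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in answers.items()
    }
```

For instance, `score_respondent({"q1": 4, "q2": 2}, {"q2"})` leaves `q1` alone and turns the 2 on `q2` into a 4.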

What we’re trying to do with this analysis is to first eliminate any items on the questionnaire that aren’t useful. So, we’re trying to reduce our 30 items down further before we run our factor analysis. We can do this by looking at descriptive statistics, and also correlations between questions.

Descriptive Statistics

The first thing to look at is the statistical distribution of item scores. This alone will enable you to throw out many redundant items. Therefore, the first thing to do when piloting a questionnaire is descriptive statistics on the questionnaire items. This is easily done in SPSS. We’re on the lookout for:

  1. Range: Any item that has a limited range (not all the points of the scale have been used).
  1. Skew: Ideally each question should elicit a normally distributed set of responses across subjects (each item’s mean should be at the centre of the scale and there should be no skew).
  1. Standard Deviation: Related to the range and skew of the distribution, items with high or low standard deviations may cause problems, so be wary of high and low values for the SD.

These are your first steps. Basically if any of these rules are violated then your items become non-comparable (in terms of the factor analysis) which makes the questionnaire pretty meaningless!!
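To make the three checks concrete, here is a small Python sketch that computes them for one item. It is only an illustration: the function name and the 1 to 5 scale are assumptions, and the skew is the simple moment-based coefficient, which will differ slightly from the sample-adjusted skewness that SPSS prints.

```python
import statistics


def describe_item(scores, low=1, high=5):
    """Summarise one item's scores across respondents (needs at least two).

    low and high are the ends of the response scale (assumed 1 to 5 here).
    """
    n = len(scores)
    mean = statistics.mean(scores)
    # Second and third central moments, used for the skewness coefficient.
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "min": min(scores),
        "max": max(scores),
        # True only if both ends of the scale were actually used.
        "full_range": min(scores) == low and max(scores) == high,
        "mean": mean,
        "sd": statistics.stdev(scores),  # sample standard deviation
        "skew": skew,
    }
```

An item where nearly everybody answers 5 would show up here with `full_range` false, a mean well above the centre of the scale, a negative skew and a small SD.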


All of your items should inter-correlate at a significant level if they are measuring aspects of the same thing. If any items do not correlate at a 5% or 1% level of significance then exclude them.
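SPSS prints the correlations and their significance for you, but the calculation behind them can be sketched in a few lines of Python. This assumes Pearson correlations; the function names are made up for illustration. The t statistic would be compared against a critical value of Student's t with n − 2 degrees of freedom, a lookup this sketch leaves out.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("an item with no variance cannot be correlated")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if 1 - r * r <= 0:  # perfect correlation: t is unbounded
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


def correlation_table(items):
    """Correlate every pair of items.

    items maps an item name to its list of scores (one per respondent).
    Returns {(name_a, name_b): (r, t)}.
    """
    n = len(next(iter(items.values())))
    table = {}
    for a, b in combinations(items, 2):
        r = pearson_r(items[a], items[b])
        table[(a, b)] = (r, t_statistic(r, n))
    return table
```

An item whose correlations with most of the other items are small (and so have small t values) is a candidate for exclusion.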

Factor Analysis 

When you’ve eliminated any items that have distributional problems or do not correlate with each other, then run your factor analysis on the remaining items and try to interpret the resulting factor structure.

What you should do is examine the factor structure and decide:

  1. Which factors to retain?
  1. Which items load onto those factors?
  1. What your factors represent?
  1. If there are any items that don’t load highly onto any factors, they should be eliminated from future versions of the questionnaire (for our purposes you need only state that they are not useful items as you won’t have time to revise and re-test your questionnaires!).


Having looked at the factor structure, you need to check the reliability of your items and of the questionnaire as a whole, so run a reliability analysis on the questionnaire. There are two things to look at. (1) The Item Reliability Index (IRI), which is the correlation between the score on the item and the score on the test as a whole, multiplied by the standard deviation of that item. SPSS reports a closely related statistic, the corrected item-total correlation (the correlation between an item and the total of the remaining items), and we’d hope that these values would be significant for all items. Although we don’t get significance values as such, we can look for correlations greater than about 0.3 (the exact value depends on the sample size, but this is a good cut-off for the size of sample you’ll probably have). Any item with a correlation below 0.3 should be excluded from the questionnaire. (2) Cronbach’s alpha, which should be 0.8 or more; deleting an item should not affect this value too much.
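For anyone curious about what the reliability analysis is doing, both quantities can be sketched in plain Python. This uses the standard formula for Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), and takes the corrected item-total correlation to be the correlation between an item and the sum of the other items. The function names and data layout (one list of scores per item) are assumptions made for this example.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # one total per respondent
    item_variance = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


def corrected_item_total(items, index):
    """Correlation of one item with the total of all the other items."""
    rest = [sum(row) - row[index] for row in zip(*items)]
    return pearson_r(items[index], rest)


def alpha_if_deleted(items, index):
    """Alpha recomputed without one item (needs at least three items)."""
    return cronbach_alpha(items[:index] + items[index + 1:])
```

Looping over the items and flagging any with `corrected_item_total` below 0.3, or whose removal changes alpha noticeably, mirrors the two checks described above.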



Posted by: Abhishek Kumar_13PGP003

Other Members of Group 2_Section A:  Sarvesh Singh, Naureen Fatima, Pavan Kumar Tatineni, Pittala Priyanka, Poulomi Paul, Sao Ruchi and Charan Kumar Karra.

