Session 7: Data Analytics using SPSS
When I started using SPSS for the very first time, I found it very complicated to run. However, once our college professor started teaching the course using SPSS, I found it rather logical.
While using SPSS or any other analytics tool, the guiding principle is to focus on the objective of your study. If that focus is kept intact, the flow of steps to follow in SPSS becomes clear and logical.
Another rule is to always try to keep all the data in numeric form. For example, it’s better to code gender data with ‘Female’ marked as ‘1’ and ‘Male’ as ‘2’, or vice versa.
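As a rough illustration of this coding step outside SPSS, here is a minimal Python sketch. The codebook, the function name `encode` and the sample responses are all made up for the example; they are not part of SPSS.

```python
# Hypothetical codebook: the numbers are arbitrary labels, so swapping
# them (Male = 1, Female = 2) would work just as well.
GENDER_CODES = {"Female": 1, "Male": 2}


def encode(values, codebook):
    """Replace each text label with its numeric code.

    Labels that are not in the codebook become None, i.e. a missing value.
    """
    return [codebook.get(value) for value in values]


# Made-up survey responses, including one answer the codebook does not know.
responses = ["Female", "Male", "Male", "Female", "Unknown"]
coded = encode(responses, GENDER_CODES)
print(coded)  # [1, 2, 2, 1, None]
```

Keeping the codebook in one place makes it easy to look up later what each number stands for.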
In SPSS, there are three data types: (a) Nominal, (b) Ordinal and (c) Scale.
Nominal data is the most basic of the data types. The number is there only as a name: it identifies a category but carries no order or magnitude. Examples are students identified by a number, or the numbers on the back of players’ t-shirts. Another example is again ‘Female’ marked as ‘1’ and ‘Male’ as ‘2’. Here, even if we mark ‘Male’ as 1 and ‘Female’ as 2, it makes no difference.
Ordinal data is data that follows an order. It’s a sequence of what comes first and what comes later. But an ordinal number does not tell the exact difference between two values. For example, age can be kept in ranges and marked as:
18 years-20 years –> 1,
21 years-25 years –> 2 and
26 years-30 years –> 3
Then the numbers 1, 2, 3 are ordinal, as the age ranges they stand for are in order. But these numbers cannot tell the exact difference between two respondents. For someone in the 18-20 range and someone in the 21-25 range, the age difference can be anything from 1 to 7 years.
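The banding above can be sketched in Python as follows. This is only an illustration under the age ranges listed above; the names `AGE_BANDS` and `age_to_band` are hypothetical.

```python
# Age bands from the example above: (lowest age, highest age, ordinal code).
AGE_BANDS = [(18, 20, 1), (21, 25, 2), (26, 30, 3)]


def age_to_band(age):
    """Return the ordinal code for an age, or None if it falls outside all bands."""
    for low, high, code in AGE_BANDS:
        if low <= age <= high:
            return code
    return None


ages = [18, 20, 21, 25, 29, 40]
bands = [age_to_band(age) for age in ages]
print(bands)  # [1, 1, 2, 2, 3, None]

# The codes keep the order but lose the distance: both pairs below map to
# codes 1 and 2, yet the real age gaps are 1 year and 7 years.
print(21 - 20, 25 - 18)  # 1 7
```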
Scale data consists of numbers that can be marked on a scale and that tell the exact difference between two values. For example, taking age again, when exact ages are recorded, say 26 years for one respondent and 29 years for another, one can easily tell that the difference between the two is 3 years. Other examples are weight, height etc.
As can be seen from the above examples, Scale data is kept continuous, whereas Nominal and Ordinal data are categorical.
The steps followed during an analytics study using SPSS are:
- Analysis/Processes
- Analysis: Doing the first-level analysis is always important; if it is skipped, there is a good chance that the final strategy will be flawed. At the first level, we simply count, by looking at the frequency output of SPSS. The most important figure to check here is the ‘Valid Percent’, since it gives each category’s share of the cases that actually have a value, with missing cases excluded.
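To make the difference between the plain percentage and the valid percentage concrete, here is a small Python sketch of a frequency count. The data is invented, missing answers are represented by `None`, and the function name `frequency_table` is hypothetical; this is not how SPSS produces the table internally.

```python
from collections import Counter


def frequency_table(values):
    """Build frequency rows with percent (of all cases) and valid percent
    (of the cases that are not missing). Missing values are None."""
    total = len(values)
    counts = Counter(value for value in values if value is not None)
    n_valid = sum(counts.values())
    if total == 0 or n_valid == 0:
        return []
    rows = []
    for category, count in sorted(counts.items()):
        rows.append({
            "category": category,
            "frequency": count,
            "percent": round(100 * count / total, 1),
            "valid_percent": round(100 * count / n_valid, 1),
        })
    return rows


# Made-up answers: 5 working full time, 3 retired, 2 missing.
answers = ["Working full time"] * 5 + ["Retired"] * 3 + [None] * 2
table = frequency_table(answers)
for row in table:
    print(row)
```

With 2 of the 10 cases missing, ‘Working full time’ is 50.0 percent of all cases but 62.5 percent of the valid ones.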
Labor Force Status
Many observations can be made from the preliminary frequency table output of SPSS. For example, if the output is available for ‘age when people get married for the first time’ and ‘education’, one may observe that the lower the education, the lower the age of marriage. This observation can be recorded as a hypothesis, which then needs to be checked for correctness through further detailed study.
To check the correctness of the hypothesis, one needs to check the relation between the two variables under study. This is done using the Pearson chi-square test: if the ‘Asymp. Sig.’ value is less than 0.05, reject the null hypothesis, i.e. the assumption that the two variables are unrelated.
Once the relation between the two variables is known, one should be able to interpret it from the given results. As seen from the following table, one can say that 52.8% of the people who visited a museum got married early.
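For readers who want to see what sits behind the chi-square number, here is a minimal Python sketch. The counts are invented, the function names are hypothetical, and the p-value shortcut only holds for a 2 x 2 table (one degree of freedom); SPSS handles the general case for you.

```python
import math


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Made-up counts: rows = lower / higher education,
# columns = married early / married later.
observed = [[30, 10],
            [20, 40]]
stat, df = pearson_chi_square(observed)
p = p_value_df1(stat)
print(round(stat, 2), df)  # 16.67 1
print("reject null hypothesis" if p < 0.05 else "keep null hypothesis")
```

The rule is the same as in the SPSS output: a significance value below 0.05 means the null hypothesis is rejected.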
Visited Art Museum or Gallery in Last Yr * categorise age when first married Cross tabulation
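A statement such as ‘X% of the people who visited a museum got married early’ is a row percentage of the cross tabulation. The sketch below shows the calculation in Python with invented counts, so its numbers are not the ones from the SPSS output; the name `row_percentages` is hypothetical.

```python
def row_percentages(crosstab):
    """Convert the counts in each row of a cross tabulation into
    percentages of that row's total."""
    result = {}
    for row_label, cells in crosstab.items():
        row_total = sum(cells.values())
        result[row_label] = {
            col_label: round(100 * count / row_total, 1) if row_total else 0.0
            for col_label, count in cells.items()
        }
    return result


# Made-up counts, for illustration only.
counts = {
    "Visited": {"Married early": 40, "Married later": 60},
    "Did not visit": {"Married early": 75, "Married later": 25},
}
percentages = row_percentages(counts)
print(percentages["Visited"]["Married early"])  # 40.0
```

Each row adds up to 100, so the percentages are read within a row, not down a column.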
Based on the interpretation, we finally build the strategy that best serves the particular objective at hand.
Further detailed explanation on this will be covered in the next blog.