In the previous session we learned about different types of variables/data. In this session we will learn about the “Analysis” of given variables and how to find the relationship between two variables.
One can perform descriptive analysis on a given data set; this kind of analysis gives the frequency distribution of a particular variable along with its mean, standard deviation, standard error, etc.
After a preliminary analysis of the distribution of each variable, our next task is to look for relationships among two or more of the variables. The tools that may be used include correlation, regression, the t-test, etc. The type of analysis chosen depends on the research design, the characteristics of the variables, the shape of the distributions, the level of measurement, and whether the assumptions required for a particular statistical test are met.
In this session we are going to discuss cross tabulation, which is a joint frequency distribution of cases based on two or more categorical variables.
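As a minimal sketch of what a joint frequency distribution means in practice, the snippet below counts cases for every combination of two categorical variables. The records and the helper name crosstab are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical raw data: one (row category, column category) pair per case.
records = [
    ("Elementary", "Approve"), ("High School", "Disapprove"),
    ("Elementary", "Approve"), ("High School", "Approve"),
    ("Elementary", "Disapprove"), ("High School", "Disapprove"),
]

def crosstab(pairs):
    """Return (row_labels, col_labels, table) for a two-way frequency table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # A combination that never occurs gets a count of 0 from the Counter.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = crosstab(records)
for label, row in zip(rows, table):
    print(label, dict(zip(cols, row)))
```

Each cell of the resulting table holds the number of cases that fall into that row category and that column category at the same time.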
When we do cross tabulation we do not use a continuous variable directly, because the output would be very long and difficult to interpret. To get a more meaningful output, we transform the continuous variable into a categorical variable by dividing its range into mutually exclusive and collectively exhaustive intervals.
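The transformation can be sketched as follows. The cut points, labels and ages are hypothetical; the point is only that every value falls into exactly one interval (mutually exclusive) and no value is left out (collectively exhaustive).

```python
from bisect import bisect_right

# Hypothetical cut points giving the intervals
# [.., 18), [18, 40), [40, 60), [60, ..) for whole-number ages.
CUTS = [18, 40, 60]
LABELS = ["Under 18", "18-39", "40-59", "60 and over"]

def to_category(value, cuts=CUTS, labels=LABELS):
    """Map a continuous value to the label of the single interval containing it."""
    # bisect_right counts how many cut points are <= value,
    # which is exactly the index of the interval the value belongs to.
    return labels[bisect_right(cuts, value)]

ages = [12, 18, 39, 40, 59, 60, 83]
categories = [to_category(a) for a in ages]
print(list(zip(ages, categories)))
```

Because a value equal to a cut point is always assigned to the interval that starts there, the intervals never overlap, and the open-ended first and last intervals make sure no value is dropped.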
The redefined categorical variables can then be cross tabulated, as rows or columns interchangeably, to get more substantive observations.
The observations from a cross tabulation can be validated with the chi-square statistic, which is used to determine whether the variables are statistically independent or associated.
The chi-square test of statistical significance, first developed by Karl Pearson, assumes that both variables are measured at the nominal level. To be sure, chi-square may also be used with tables containing variables measured at a higher level; however, the statistic is calculated as if the variables were measured only at the nominal level. This means that any information regarding the order of, or distances between, categories is ignored.
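For reference, the Pearson statistic for a table with r rows and c columns is usually written as below. The symbols are our own notation, not taken from the session material: O_ij is the observed count in a cell, E_ij the count expected under independence, R_i and C_j the row and column totals, and N the total number of cases.

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i\,C_j}{N},
\qquad df = (r-1)(c-1)
```

Since the sum runs over all cells and uses only their counts, reordering the categories leaves the value unchanged, which is why information about order or distance between categories plays no role.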
The null hypothesis is that there is no relationship between the two variables, i.e. they are independent; the alternative hypothesis is that they are associated. In business statistics we usually look only at the significance value (Sig.) in the output table. If Sig. > 0.05 we fail to reject the null hypothesis, i.e. the data give no evidence of a relationship between the two variables; otherwise we reject the null hypothesis. The value of alpha is generally taken as 0.05; however, in critical cases such as medical emergencies a stricter value of 0.01 is used.
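The decision rule can be written down in a few lines. This is only a sketch: the function name decide and the example significance values are hypothetical.

```python
def decide(sig, alpha=0.05):
    """Return the test decision for a significance value at level alpha."""
    if sig > alpha:
        return "fail to reject H0 (no evidence of a relationship)"
    return "reject H0 (the variables appear to be associated)"

print(decide(0.20))              # larger than 0.05
print(decide(0.03))              # smaller than 0.05
print(decide(0.03, alpha=0.01))  # same value judged at the stricter level
```

The last call shows why the choice of alpha matters: a result that is significant at 0.05 need not be significant at 0.01.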
Example: Generations of students at Washington School have taken field trips at both the elementary and secondary levels. The principal wonders if parents still support field trips for children at either level. Five hundred letters were mailed to parents, asking them to indicate either approval or disapproval; 100 parents returned the response postcard. Each postcard indicated whether the parents’ children were currently enrolled in elementary or high school, and the parents’ approval or disapproval of field trips.
The table below contains the collected data:
               Approve  Disapprove  No Opinion  Row Totals
Elementary          28          14           5          47
High School         19          28           6          53
Column Totals       47          42          11         100
The analysis of the above data by cross tabulation and the chi-square test is shown below:
Case Processing Summary

                            Cases
                   Valid          Missing        Total
                   N    Percent   N    Percent   N    Percent
Parents * Opinion  100  100.0%    0    0.0%      100  100.0%
Parents * Opinion Crosstabulation

                                                      Opinion
                                                       Approve  Disapprove  No Opinion   Total
Parents  Elementary school parents  Count                   28          14           5      47
                                    % within Parents     59.6%       29.8%       10.6%  100.0%
         High school parents        Count                   19          28           6      53
                                    % within Parents     35.8%       52.8%       11.3%  100.0%
Total                               Count                   47          42          11     100
                                    % within Parents     47.0%       42.0%       11.0%  100.0%
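The "% within Parents" rows are simply each count divided by its row total. A short sketch that recomputes them from the counts in the example (the variable and function names are our own):

```python
opinions = ["Approve", "Disapprove", "No Opinion"]
counts = {
    "Elementary school parents": [28, 14, 5],
    "High school parents": [19, 28, 6],
}

def row_percentages(row):
    """Express each count as a percentage of its row total, to one decimal."""
    total = sum(row)
    return [round(100 * n / total, 1) for n in row]

within_parents = {group: row_percentages(row) for group, row in counts.items()}

# The Total row uses the column totals over both groups of parents.
column_totals = [sum(col) for col in zip(*counts.values())]
within_parents["Total"] = row_percentages(column_totals)

for group, pcts in within_parents.items():
    print(group, dict(zip(opinions, pcts)))
```

Comparing the two groups row by row is what makes the difference visible: the share of approving parents is noticeably higher among elementary school parents than among high school parents.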
Chi-Square Tests

                              Value      df  Asymp. Sig. (2-sided)
Pearson Chi-Square            6.143 (a)   2  .046
Likelihood Ratio              6.222       2  .045
Linear-by-Linear Association  3.262       1  .071
N of Valid Cases              100

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.17.
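The Pearson and Likelihood Ratio rows can be checked by hand from the observed counts. The sketch below uses only the Python standard library; the shortcut used for the significance value, exp(-x/2), holds only for 2 degrees of freedom, so it is an assumption tied to this particular table and not a general routine.

```python
import math

observed = [
    [28, 14, 5],   # Elementary school parents
    [19, 28, 6],   # High school parents
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count of each cell if the two variables were independent.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
cells = [(o, e) for orow, erow in zip(observed, expected)
         for o, e in zip(orow, erow)]

pearson = sum((o - e) ** 2 / e for o, e in cells)
# The likelihood ratio form needs every observed count to be positive.
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
min_expected = min(e for _, e in cells)

def sig_df2(stat):
    """Upper-tail chi-square probability; valid ONLY for df == 2."""
    return math.exp(-stat / 2)

assert df == 2  # guard for the shortcut above
print(f"Pearson          {pearson:.3f}  df={df}  sig={sig_df2(pearson):.3f}")
print(f"Likelihood ratio {likelihood_ratio:.3f}  df={df}  sig={sig_df2(likelihood_ratio):.3f}")
print(f"Minimum expected count {min_expected:.2f}")
```

For tables with other degrees of freedom a statistics package or a chi-square table should be used to obtain the significance value.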
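From the chi-square test, the Pearson significance value is 0.046 < 0.05, so we reject the null hypothesis and conclude that parents’ opinion on field trips is associated with the school level of their children. The crosstabulation shows the direction of the difference: 59.6% of elementary school parents approve of field trips, against 35.8% of high school parents.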