Section A, Group 1, Vishal Dandwate, 13PGP013 (Session 8)

In the previous session we learned about different types of variables/data. In this session we will learn about the “analysis” of given variables and how to find the relationship between two variables.

One can perform descriptive analysis on given data; such an analysis gives the frequency distribution of a particular variable, its mean, standard deviation, standard error, etc.
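As a minimal sketch of these descriptive measures, the Python snippet below uses the standard-library `statistics` module on a small hypothetical sample (the values are invented for illustration and are not data from this session). The standard error of the mean is taken here as the sample standard deviation divided by the square root of n.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample of a continuous variable (illustrative values only)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = statistics.mean(sample)        # arithmetic mean
std_dev = statistics.stdev(sample)    # sample standard deviation (n - 1 in the denominator)
std_error = std_dev / math.sqrt(n)    # standard error of the mean

# Frequency distribution: how many times each value occurs
freq = Counter(sample)

print(f"n = {n}, mean = {mean}, std. dev. = {std_dev:.3f}, std. error = {std_error:.3f}")
for value in sorted(freq):
    print(value, freq[value])
```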

After a preliminary analysis of the distribution of each variable, our next task is to look for relationships among two or more of the variables. Tools that may be used include correlation, regression and the t-test. The type of analysis chosen depends on the research design, the characteristics of the variables, the shape of the distributions, the level of measurement, and whether the assumptions required for a particular statistical test are met.

In this session we are going to discuss cross-tabulation, which is a joint frequency distribution of cases based on two or more categorical variables.
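To make the idea of a joint frequency distribution concrete, the sketch below (Python, standard library only) counts every (row category, column category) pair in a list of individual cases. The cases are rebuilt from the field-trip counts reported later in this session, so the printed table should reproduce those counts; the helper name `crosstab` is a hypothetical choice for illustration.

```python
from collections import Counter

def crosstab(cases):
    """Count how often each (row, column) pair occurs in a list of cases."""
    return Counter(cases)

# Rebuild the 100 individual responses from the field-trip example counts
counts = {
    ("Elementary", "Approve"): 28, ("Elementary", "Disapprove"): 14, ("Elementary", "No Opinion"): 5,
    ("High School", "Approve"): 19, ("High School", "Disapprove"): 28, ("High School", "No Opinion"): 6,
}
cases = [pair for pair, k in counts.items() for _ in range(k)]

table = crosstab(cases)
rows = ["Elementary", "High School"]
cols = ["Approve", "Disapprove", "No Opinion"]

# Print cell counts with row totals, then the column totals
print(f"{'':12}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>8}")
for r in rows:
    cells = [table[(r, c)] for c in cols]
    print(f"{r:12}" + "".join(f"{v:>12}" for v in cells) + f"{sum(cells):>8}")
col_totals = [sum(table[(r, c)] for r in rows) for c in cols]
print(f"{'Total':12}" + "".join(f"{v:>12}" for v in col_totals) + f"{sum(col_totals):>8}")
```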

When we do cross-tabulation, we do not use a continuous variable directly, because the resulting output would be very long and difficult to interpret. To get a more meaningful output, we transform the continuous variable into a categorical variable by dividing its range into mutually exclusive and collectively exhaustive intervals.
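A minimal sketch of this recoding step, assuming a hypothetical continuous variable (monthly household income, with invented cut points, labels and values). `bisect_right` places each value in exactly one interval (mutually exclusive), and because the first and last intervals are open-ended, every possible value gets a category (collectively exhaustive).

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points giving four intervals:
# below 20000 | 20000 to under 50000 | 50000 to under 100000 | 100000 and above
cut_points = [20000, 50000, 100000]
labels = ["Low", "Lower-middle", "Upper-middle", "High"]

def to_category(value):
    """Map a continuous value to exactly one labelled interval."""
    return labels[bisect_right(cut_points, value)]

incomes = [15000, 20000, 42000, 49999, 50000, 87000, 120000]  # illustrative values
categories = [to_category(x) for x in incomes]
print(Counter(categories))
```

A boundary value such as 20000 falls into the higher interval, so no value is ever counted twice.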

The recoded categorical variables can then be cross-tabulated, with either variable placed in the rows or the columns, so that the relationship can be examined from both directions.
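Swapping rows and columns mainly changes which percentages are read. As an illustrative Python sketch, the snippet below takes the field-trip counts from this session and computes percentages within each row (the "% within Parents" figures in the output) and within each column (the view obtained when the two variables are interchanged).

```python
rows = ["Elementary", "High School"]
cols = ["Approve", "Disapprove", "No Opinion"]
observed = [
    [28, 14, 5],   # Elementary
    [19, 28, 6],   # High School
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(col) for col in zip(*observed)]

# % within each row: each cell divided by its row total
row_pct = [[100 * v / t for v in r] for r, t in zip(observed, row_totals)]
# % within each column: each cell divided by its column total
col_pct = [[100 * v / col_totals[j] for j, v in enumerate(r)] for r in observed]

for name, rp, cp in zip(rows, row_pct, col_pct):
    print(name, "row %:", [f"{p:.1f}" for p in rp], "column %:", [f"{p:.1f}" for p in cp])
```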

The pattern observed in a cross-tabulation can be tested with the chi-square statistic, which determines whether the variables are statistically independent or associated.
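For reference, the Pearson chi-square statistic for a table with r rows and c columns compares each observed count O_ij with the count E_ij expected if the two variables were independent, where R_i and C_j are the row and column totals and N is the total number of cases:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```

For the 2 × 3 field-trip table in this session, df = (2 − 1)(3 − 1) = 2, and the smallest expected count is 47 × 11 / 100 = 5.17 (elementary parents with no opinion), which is the minimum expected count reported in the output.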

The chi-square test of statistical significance, first developed by Karl Pearson, assumes that both variables are measured at the nominal level. To be sure, chi-square may also be used with tables containing variables measured at a higher level; however, the statistic is calculated as if the variables were measured only at the nominal level. This means that any information regarding the order of, or distances between, categories is ignored.

The null hypothesis (H0) is that there is no relationship between the two variables, i.e. they are independent; the alternative hypothesis (H1) is that they are associated. In business statistics we usually look only at the significance (sig., or p) value in the output table. If sig. > 0.05, we fail to reject the null hypothesis, i.e. the data give no evidence of a relationship between the two variables; otherwise, we reject the null hypothesis. The significance level alpha is generally set at 0.05; however, in critical settings such as medical applications it is set at 0.01.
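The decision rule can be sketched as a small Python function; `decide` is a hypothetical helper name. Applied to the Pearson chi-square significance of .046 reported for the example in this session, it shows that the conclusion depends on the chosen alpha.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a significance (p) value at level alpha."""
    if p_value <= alpha:
        return "reject H0: the variables appear to be associated"
    return "fail to reject H0: no evidence of association"

print(decide(0.046))              # alpha = 0.05 -> reject H0
print(decide(0.046, alpha=0.01))  # alpha = 0.01 -> fail to reject H0
```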

Example: Generations of students at Washington School have taken field trips at both the elementary and secondary levels. The principal wonders whether parents still support field trips for children at either level. Five hundred letters were mailed to parents asking them to indicate either approval or disapproval; 100 parents returned the response postcard. Each postcard indicated whether the parents’ children were currently enrolled in elementary or high school, and the parents’ approval or disapproval of field trips.

The table below contains the collected data:

                 Approve  Disapprove  No Opinion  Row Totals
Elementary            28          14           5          47
High School           19          28           6          53
Column Totals         47          42          11         100

The analysis of the above data by cross-tabulation and the chi-square test is shown below:

Case Processing Summary
                   Cases
                   Valid            Missing          Total
                   N     Percent    N     Percent    N     Percent
Parents * Opinion  100   100.0%     0     0.0%       100   100.0%

Parents * Opinion Crosstabulation
                                             Opinion
Parents                                      Approve  Disapprove  No Opinion    Total
Elementary school parents  Count                  28          14           5       47
                           % within Parents    59.6%       29.8%       10.6%   100.0%
High school parents        Count                  19          28           6       53
                           % within Parents    35.8%       52.8%       11.3%   100.0%
Total                      Count                  47          42          11      100
                           % within Parents    47.0%       42.0%       11.0%   100.0%

Chi-Square Tests
                              Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square            6.143a   2    .046
Likelihood Ratio              6.222    2    .045
Linear-by-Linear Association  3.262    1    .071
N of Valid Cases              100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.17.
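The Pearson chi-square and likelihood-ratio values in this output can be reproduced by hand. The Python sketch below (standard library only) computes the expected counts, both statistics and the degrees of freedom from the observed table. For the p-value it uses the fact that, with exactly 2 degrees of freedom, the chi-square upper-tail probability equals exp(−x/2); this shortcut holds only for df = 2, and other tables would need a general chi-square distribution function (for example `scipy.stats.chi2.sf`). It also reorders the opinion columns to show that the statistic ignores category order, as expected for nominal-level variables.

```python
import math

# Observed counts from the field-trip example
observed = [
    [28, 14, 5],   # Elementary school parents: Approve, Disapprove, No Opinion
    [19, 28, 6],   # High school parents
]

def chi_square_tests(table):
    """Return (Pearson chi-square, likelihood ratio G^2, df, minimum expected count)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = g2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            min_expected = min(min_expected, expected)
            chi2 += (obs - expected) ** 2 / expected
            if obs > 0:  # a zero cell contributes nothing to G^2
                g2 += 2 * obs * math.log(obs / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, g2, df, min_expected

chi2, g2, df, min_expected = chi_square_tests(observed)

# Shortcut for the chi-square upper-tail probability, valid ONLY when df == 2
assert df == 2
p_value = math.exp(-chi2 / 2)

print(f"Pearson chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"Likelihood ratio   = {g2:.3f}")
print(f"Minimum expected count = {min_expected:.2f}")

# Reordering the opinion categories does not change the statistic (nominal level)
reordered = [[r[2], r[0], r[1]] for r in observed]
print(f"Chi-square with reordered columns = {chi_square_tests(reordered)[0]:.3f}")
```

Running it should print the Pearson chi-square (6.143), likelihood ratio (6.222), df (2), significance (0.046) and minimum expected count (5.17) that appear in the output above.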

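Since the Pearson chi-square significance value is 0.046 < 0.05, we reject the null hypothesis of independence. We can therefore conclude that parents’ opinion of field trips is associated with their children’s school level: 59.6% of elementary school parents approve, compared with 35.8% of high school parents, while 52.8% of high school parents disapprove. The test does not show that parents in general approve of field trips; overall, only 47.0% of respondents approve.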