‘Humanizing’ the data using Crosstab
“Data gives you the what, but humans know the why.”
Data on its own may not be enough. People need to make sense of it, and stories can have an impact. Especially when these stories point to the discrimination happening around us, they can be used to make a difference. Discrimination based on gender, age, caste, ethnicity or sexual orientation affects all of us. Its impact on every party involved is serious: it hurts the morale, confidence and productivity of the individual. From a manager’s point of view, discrimination may lead to higher organizational costs in the form of increased absenteeism, employee turnover, litigation and a tarnished public image. Thus, as future managers and thought leaders, we need to understand what kinds of discrimination are prevalent so that we know how to mitigate them.
Crosstabs can be used to “humanize” the data by helping us understand the nature of the discrimination prevalent around us. The data has a lot to tell you, but you have to listen to it patiently. Do not jump to conclusions; it is better to look at the data in the following manner:
Step 1. Look at the data in aggregate.
Step 2. Look at the data by subgroup.
Step 3. Get the insight and return to the data with new questions.
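The first two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the region labels "X" and "Y" and the helper layoff_rate are all invented for the example and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical records of (region, outcome); the counts are invented.
records = (
    [("X", "continued")] * 18 + [("X", "laid off")] * 2
    + [("Y", "continued")] * 11 + [("Y", "laid off")] * 9
)

# Step 1: look at the data in aggregate.
overall = Counter(outcome for _, outcome in records)
overall_rate = overall["laid off"] / len(records)

# Step 2: look at the data by subgroup (a simple crosstab of region x outcome).
crosstab = Counter(records)

def layoff_rate(region):
    """Share of employees from the given region who were laid off."""
    laid_off = crosstab[(region, "laid off")]
    continued = crosstab[(region, "continued")]
    return laid_off / (laid_off + continued)

print("overall layoff rate:", overall_rate)
for region in ("X", "Y"):
    print("region", region, "layoff rate:", layoff_rate(region))
```

In this made-up data the aggregate rate hides a gap between the two regions, which is exactly the kind of finding that should send you back to the data with new questions (Step 3).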
The variations by subgroup can be very important. A big difference by region, for example, might imply that favoritism toward one region is crowding out what is fairly due to another. You should also ask whether the gap reflects favoritism toward one region or dislike of the other, so that any discrimination that exists is identified and rooted out.
Discrimination analysis can be carried out on large as well as small data sets, but the approach differs between the two. Small data sets are common: workplace discrimination cases about layoffs, for example, are disparate impact cases that may involve only a few dozen to a few hundred employees. Contingency table analysis for small samples is helpful here. In a typical disparate impact contingency table analysis, cross-tabulation tables are constructed. The columns of these tables represent the ‘treatment’, and they are analyzed to determine whether they had any impact on the rows (the result). For example, one might consider a 2×2 table in which the columns are “Belonging to Region X” and “Belonging to Region Y” and the rows are “continued employment” and “laid off”. Here, ‘X’ and ‘Y’ can be replaced by the region names. In a typical large sample discrimination analysis, the table would be analyzed with a Chi-squared test with 1 degree of freedom to test the null hypothesis of independence between rows and columns. If the test showed no significant difference, that is, a p value greater than the chosen significance level (usually 5%), then one would fail to reject (i.e., tentatively accept) the null hypothesis of no discrimination. When sample sizes are smaller, the Chi-squared statistic can sometimes lead to incorrect conclusions. In that case, alternative methods may be chosen for disparate impact and similar small sample problems. Some sophisticated methods may employ Lorenz curves, the Gini coefficient and Tog coefficient.
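As a rough sketch of the 2×2 analysis described above, the following Python computes the Chi-squared test with 1 degree of freedom using only the standard library. The counts are hypothetical. For the small sample case it also shows Fisher's exact test, which is one commonly used alternative chosen here purely for illustration; the text above does not name it, and the function fisher_exact_two_sided is an illustrative helper, not a library call.

```python
from math import comb, erfc, sqrt

# Hypothetical 2x2 table (invented counts, not from any real case):
#                 Region X   Region Y
# continued          18         11
# laid off            2          9
a, b = 18, 11
c, d = 2, 9

n = a + b + c + d
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d

# Pearson Chi-squared statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
# With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_chi2 = erfc(sqrt(chi2 / 2))

# Smallest expected cell count; a common rule of thumb distrusts the
# Chi-squared approximation when this falls below about 5.
min_expected = min(row1, row2) * min(col1, col2) / n

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(k):
        # Number of ways to get k in the top-left cell given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, col1)

p_fisher = fisher_exact_two_sided(a, b, c, d)

print("chi2:", chi2, "p (Chi-squared):", p_chi2)
print("min expected count:", min_expected)
print("p (Fisher exact, two-sided):", p_fisher)
```

Comparing each p value with the chosen significance level (usually 5%) gives the reject or fail-to-reject decision. The exact test works with integer counts throughout, so it does not rely on the large sample approximation behind the Chi-squared statistic.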