Session 7, GroupA1, Akanksha Maheshwari (13PGP007)
Data is the actual source of information, collected through a study, an experiment or a survey.
When working with statistics, it is very important to recognize the different types of data:
- Numerical (discrete and continuous)
- Categorical
For example, if I ask five students in my class “How many pets do you own?”, they might give me the following data: 0, 2, 1, 4 and 18. (The fifth one might count each of her aquarium fish as a separate pet.) Not all data are numbers; let’s say I also record the gender of each of them, getting the following data: female, male, female, male, female.
So most data fall into one of two groups: numerical or categorical.
Numerical data: These are values that represent quantities. They have meaning as a
- Measurement: a student’s height or weight
- Count: the number of stock shares a student owns, the number of pens in a box or the number of chocolates that can be eaten at one go.
Numerical data can be further broken into two types: discrete and continuous.
- 1. Discrete data: represent values arising from a counting process. They take on possible values that can be listed out. For example, the answer to “How many magazines do you subscribe to?” is a discrete numerical variable because the response is always a whole number. One can subscribe to zero, one, ten and so on magazines.
- 2. Continuous data: represent measurements. Their possible values cannot be counted but can be described using intervals on the real number line. For example, the time you wait in the queue at our mess could be 5 minutes, 5.1 minutes, 5.13 minutes or 5.113 minutes, depending on the precision of the measuring device. Another example is the lifetime of your mobile phone’s battery, which can be anywhere from 0 hours to, in principle, an infinite number of hours if it lasts forever.
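As a rough illustration of the distinction, the sketch below sorts the example data from these notes into the three groups. The helper `classify` is hypothetical and deliberately naive: it only looks at whether the raw values are text, whole numbers or decimals, so it would wrongly call numeric codes (such as 1 for male and 2 for female) discrete numerical data.

```python
def classify(values):
    """Guess the data type of a list of raw values (hypothetical helper)."""
    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, int) for v in values):
        return "numerical (discrete)"
    return "numerical (continuous)"


# Example data taken from the text above.
pets = [0, 2, 1, 4, 18]
genders = ["female", "male", "female", "male", "female"]
waiting_minutes = [5.0, 5.1, 5.13, 5.113]

for name, data in [("pets", pets), ("genders", genders),
                   ("waiting_minutes", waiting_minutes)]:
    print(name, "->", classify(data))
```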
Categorical data (also known as qualitative data): represent values that can be placed into categories, such as the “yes” or “no” answers to “Do you currently own a bike?”. They also represent characteristics such as a student’s gender, marital status, hometown, or the types of movies they like. Categorical data can be coded with numerical values, such as “1” indicating male and “2” indicating female, but these numbers do not have any mathematical meaning. You cannot add them together.
Measurement of Data
There are four measurement scales used in statistical analysis of data.
Nominal data (also called categorical data): “Nominal” comes from the word “name”. Numbers are used only to label an item or characteristic. For example, I may designate subject specializations in our institution by numbers such as Finance = 101, Marketing = 102, HR = 103. These numbers cannot be manipulated in a numerical fashion. Only counting the frequencies is possible.
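A minimal sketch of what "only counting the frequencies" looks like in practice, using the specialization codes from the example. The list of six students is made up for illustration.

```python
from collections import Counter

# Codes from the example: the numbers are labels only.
labels = {101: "Finance", 102: "Marketing", 103: "HR"}

# Hypothetical specialization codes for six students (invented data).
students = [101, 102, 101, 103, 101, 102]

# Counting frequencies is the one meaningful operation on nominal data.
counts = Counter(labels[code] for code in students)
most_common = counts.most_common(1)[0]

print(counts)
print("Most frequent specialization:", most_common)

# The average of the codes can be computed, but it describes nothing:
# there is no specialization "between" Finance and Marketing.
meaningless_mean = sum(students) / len(students)
```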
Ordinal or rank data: It mixes properties of numerical and categorical data. Numbers are used to rank objects or attributes. The data fall into categories, but the numbers placed on the categories have meaning: their order matters. However, the gaps between ranks are not necessarily equal, so arithmetic operations such as addition or averaging are not meaningful. For example, ranking us on the basis of CGPA from 1 to 4, considering there are only four of us in our class, gives ordinal data: Student 1 ranked 3, Student 2 ranked 1, Student 3 ranked 4 and Student 4 ranked 2.
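The sketch below shows why ranks carry order but not distance. The CGPA values are invented so that they reproduce the ranks in the example, and it assumes rank 1 goes to the highest CGPA.

```python
# Hypothetical CGPAs, chosen only so that the ranks match the example.
cgpa = {"Student 1": 3.0, "Student 2": 3.9, "Student 3": 2.9, "Student 4": 3.8}

# Assumption: rank 1 is the highest CGPA.
by_cgpa = sorted(cgpa, key=cgpa.get, reverse=True)
ranks = {name: position + 1 for position, name in enumerate(by_cgpa)}
print(ranks)

# CGPA gap between each pair of neighbouring ranks.
gaps = [round(cgpa[a] - cgpa[b], 2) for a, b in zip(by_cgpa, by_cgpa[1:])]
print(gaps)
```

Each step down in rank is "one rank", yet the underlying CGPA gaps differ, which is why adding or averaging ranks says little.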
Interval data: Data with ordinal properties plus a meaningful distance between values form an interval measurement. It is a higher level of measurement than ordinal data. For example, all of us drinking a cold drink in the cafeteria are concerned with its temperature (in degrees Celsius or Fahrenheit), which is an interval measurement: the difference between two values is meaningful. Interval measurements have an arbitrary zero point, so addition and subtraction are meaningful, but ratios are not (20 °C is not twice as hot as 10 °C).
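The effect of an arbitrary zero point can be checked with a small sketch. The two temperatures are made-up readings, and `to_fahrenheit` is an illustrative helper using the standard Celsius to Fahrenheit conversion.

```python
def to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Two made-up drink temperatures in degrees Celsius.
warm_c, cold_c = 20, 10
warm_f, cold_f = to_fahrenheit(warm_c), to_fahrenheit(cold_c)

# Differences are meaningful: they convert consistently between scales.
diff_c = warm_c - cold_c   # 10 Celsius degrees
diff_f = warm_f - cold_f   # 18 Fahrenheit degrees, i.e. 10 * 9/5

# Ratios are not: the "twice as hot" reading depends on the scale chosen.
ratio_c = warm_c / cold_c  # 2.0
ratio_f = warm_f / cold_f  # 68 / 50 = 1.36

print(diff_c, diff_f, ratio_c, ratio_f)
```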
Ratio data: This is the highest level of measurement and allows you to perform all basic arithmetic operations, including multiplication and division. Data measured on a ratio scale have a fixed (true) zero point: when the variable equals 0.0, there is none of that variable. Variables like our height or our weight are ratio variables.
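Because a ratio scale has a fixed zero, a statement such as "twice as heavy" stays true whatever unit is used. A minimal sketch with two made-up weights:

```python
# Hypothetical weights in kilograms (invented data).
weights_kg = {"Student A": 40, "Student B": 80}

# The same weights expressed in grams.
weights_g = {name: kg * 1000 for name, kg in weights_kg.items()}

# The ratio is the same in both units because zero means "no weight" in each.
ratio_kg = weights_kg["Student B"] / weights_kg["Student A"]
ratio_g = weights_g["Student B"] / weights_g["Student A"]

print(ratio_kg, ratio_g)
```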