DataTypes
Types of data
Broadly speaking, by ‘data’ we mean numerical values associated with some variable of interest. However, we must not be complacent about such a broad definition: different types of data may need special treatment when it comes to statistical analysis, so it is important to be able to distinguish a few key features. A (random) variable can produce data that are either continuous or discrete in nature (see below for examples). Variables also differ in whether they are observed over time (time-series data) or across different units at a single point in time (cross-section data).
Discrete data
The variable [math]X[/math] is said to be discrete if it can only ever yield isolated values, some (if not all) of which are often repeated in the sample. It is, however, important to note that there are different types of discrete data:
- ORDINAL. Here the categories have a natural ordering.
  Examples: Football leagues: Premier League, Championship, etc.
- NOMINAL. Here there is no natural ordering to the categories.
  Examples: Gender: Male, Female.
- COUNT. A variable that represents the counts of certain events.
  Examples: Number of children in household: 0, 1, 2, 3, etc.
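As a rough illustration, the three kinds of discrete data might be represented in Python as below. This is a minimal sketch under our own assumptions: the class names `League` and `Gender`, the numeric codes attached to the leagues, and the sample of household sizes are all hypothetical choices made for illustration. An `IntEnum` keeps the natural ordering of an ordinal variable, a plain `Enum` deliberately has none, and a count is simply a non-negative integer.

```python
from collections import Counter
from enum import Enum, IntEnum


class League(IntEnum):
    """Ordinal: hypothetical codes chosen so that a higher value means a higher league."""
    LEAGUE_TWO = 1
    LEAGUE_ONE = 2
    CHAMPIONSHIP = 3
    PREMIER_LEAGUE = 4


class Gender(Enum):
    """Nominal: the categories carry labels but no ordering."""
    MALE = "male"
    FEMALE = "female"


# Ordinal categories can be compared meaningfully.
print(League.PREMIER_LEAGUE > League.CHAMPIONSHIP)  # True

# Nominal categories cannot: comparing them with < raises TypeError.
try:
    Gender.MALE < Gender.FEMALE
except TypeError:
    print("Gender has no natural ordering")

# Count: number of children per household in a small hypothetical sample.
children = [0, 2, 1, 2, 3, 0, 2, 1]
assert all(isinstance(c, int) and c >= 0 for c in children)

# Discrete values repeat, so tabulating frequencies is natural.
print(sorted(Counter(children).items()))  # [(0, 2), (1, 2), (2, 3), (3, 1)]
```

The frequency table at the end reflects the point made above: values of a discrete variable are often repeated in the sample.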
Continuous data
The variable [math]Y[/math] is said to be continuous if it can assume any value (more or less) from a continuum, that is, an interval or range of numbers. A useful way to distinguish between a discrete and a continuous variable is to consider whether its possible values can be listed. It is theoretically impossible even to begin listing all the values that a continuous variable [math]Y[/math] could assume. This is not so with a discrete variable: you may not always be able to finish the list, but at least you can make a start.
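The listing test can be made concrete with a short Python sketch. It assumes, purely for illustration, a count variable whose possible values are the non-negative integers, and a hypothetical starting weight of 2500 grams for the continuous case. Exact fractions are used so that the halving argument is not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import count, islice

# Discrete (count) variable: we can at least start listing the possible values.
first_values = list(islice(count(0), 5))
print(first_values)  # [0, 1, 2, 3, 4]

# Continuous variable: try to name the "next" weight after 2500 g.
start = Fraction(2500)
candidate = Fraction(2501)
for _ in range(5):
    # Whatever candidate we propose, the midpoint lies strictly between it and
    # the start, so the candidate was never the "next" value after all.
    midpoint = (start + candidate) / 2
    assert start < midpoint < candidate
    candidate = midpoint

print(candidate)  # 80001/32, i.e. 2500.03125 g
```

However many times the loop runs, a further halving is always possible, so no second entry can ever be fixed in a list of continuous values.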
The birth weight of babies, for example, is a continuous variable. There is no reason why a baby should not have a birth weight of [math]2500.0234[/math] grams, even though it would not be measured to that precision! Try to list all possible weights (in theory), bearing in mind that for any two weights you write down, there will always be another possibility halfway between them. We see, then, that for a continuous variable an observation is recorded as the result of some measurement, and that this inevitably involves rounding (up or down) of the actual value. (No such rounding occurs when recording observations on a discrete variable.)
A variable can be continuous even though it is defined on a restricted range. For instance, weight cannot be negative, yet it can still take any value within the range of non-negative numbers.
Finally, note that for a continuous variable, it is unlikely that values will be repeated frequently in the sample, unless rounding occurs.
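To see how rounding produces repeated values, the sketch below takes a handful of hypothetical "true" birth weights (invented for illustration, not real data) and records each one to the nearest 100 grams, a deliberately coarse measurement precision that is our own assumption.

```python
# Hypothetical true birth weights in grams (invented values, not real data).
true_weights = [2500.0234, 2531.7, 2478.2, 3120.45, 3089.9, 3402.11]


def record(weight, precision=100):
    """Round a measured weight to the nearest multiple of `precision` grams."""
    return round(weight / precision) * precision


recorded = [record(w) for w in true_weights]
print(recorded)  # [2500, 2500, 2500, 3100, 3100, 3400]

# No true value repeats, but the recorded values do.
print(len(set(true_weights)), len(set(recorded)))  # 6 3
```

With a finer precision (say, the nearest gram) far fewer repeats would appear, which is why repeated values in continuous data are usually a sign of rounding.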
Other examples of continuous data include: heights of people; volume of water in a reservoir; and, to a workable approximation, Government Expenditure. One could argue that the last of these is discrete (due to the finite divisibility of monetary units). However, when the amounts involved are of the order of millions of pounds, changes at the level of individual pence are hardly discernible and so it is sensible to treat the variable as continuous.
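The scale argument can be checked with simple arithmetic. The figure of £2 million below is a hypothetical amount, chosen only to be "of the order of millions of pounds"; the sketch measures how small a one-penny change is relative to it.

```python
from fractions import Fraction

# Hypothetical expenditure of £2 million, expressed in pence (the smallest unit).
expenditure_pence = 2_000_000 * 100

# Relative size of a one-penny change.
relative_change = Fraction(1, expenditure_pence)
print(relative_change)         # 1/200000000
print(float(relative_change))  # 5e-09
```

A change of five parts in a billion is far below anything an analysis of such data would register, which is the sense in which the variable can be treated as continuous.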
Additional resources
- Video: Discrete and Continuous Variables (variablesprobdist/v/discrete-and-continuous-random-variables)
Cross-section data
Cross-section data comprise observations on a particular variable taken at a single point in time, across different units such as regions or individuals. For example: annual crime figures recorded by police region for the year 1999; the birth weights of babies born in a particular maternity unit during April 1998; initial salaries of graduates from the University of Manchester in 2000. Note that a key feature of such data is that there is no natural ordering to the observations.
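Because cross-section observations have no natural ordering, rearranging them should leave any summary unchanged. The sketch below checks this for the mean and median of five invented graduate salaries (hypothetical values in pounds, not real figures).

```python
from statistics import mean, median

# Hypothetical initial salaries (in pounds) of five graduates, observed in one year.
salaries = [21000, 18500, 24000, 19750, 22500]

# Rearranging the observations loses nothing: there is no natural order.
reordered = sorted(salaries, reverse=True)
assert mean(salaries) == mean(reordered)
assert median(salaries) == median(reordered)

print(mean(salaries), median(salaries))  # 21150 21000
```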
Time-series data
Time-series data, on the other hand, are observations on a particular variable recorded over a period of time at regular intervals. For example: personal crime figures for Greater Manchester recorded annually over 1980-99; monthly household expenditure on food; the daily closing price of a certain stock. In this case the data do have a natural ordering, since they are measured from one time period to the next.
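In contrast with the cross-section case, reordering a time series destroys information. The sketch below uses invented annual figures (hypothetical values, not real crime statistics) and computes period-on-period changes, which are only meaningful when the observations are kept in time order.

```python
# Hypothetical annual figures, listed in time order (oldest first).
annual = [120, 135, 128, 150]


def changes(series):
    """Period-on-period changes: each value minus the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


print(changes(annual))                  # [15, -7, 22]
print(changes(list(reversed(annual))))  # [-22, 7, -15]
```

Reversing the series flips every change, so the ordering carries part of the information in time-series data.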