How To Find Standard Deviation Of X


The median is known as a measure of location; that is, it tells us where the data are. As stated earlier, we do not need to know all the exact values to calculate the median; if we made the smallest value even smaller or the largest value even larger, it would not change the value of the median. Thus the median does not use all the information in the data and so it can be shown to be less efficient than the mean or average, which does use all values of the data. To calculate the mean we add up the observed values and divide by the number of them. The total of the values obtained in Table 1.1 was 22.5, which was divided by their number, 15, to give a mean of 1.5. This familiar procedure is conveniently expressed by the following symbols:

$$\bar{x} = \frac{\sum x}{n}$$

where $\bar{x}$ (pronounced "x bar") signifies the mean; $x$ is each of the values of urinary lead; $n$ is the number of these values; and $\Sigma$, the Greek capital sigma (our "S"), denotes "sum of". A major disadvantage of the mean is that it is sensitive to outlying points. For example, replacing 2.2 by 22 in Table 1.1 increases the mean to 2.82, whereas the median will be unchanged.
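The effect of an outlier on the mean and the median can be checked with a short Python sketch. The 15 readings below are an assumption: Table 1.1 is not reproduced in this text, so values were used that are consistent with the quoted total of 22.5 and mean of 1.5.

```python
from statistics import mean, median

# Assumed readings: Table 1.1 is not reproduced here. These 15 values are
# consistent with the quoted total (22.5) and mean (1.5).
readings = [0.1, 0.4, 0.6, 0.8, 1.1, 1.2, 1.3, 1.5,
            1.7, 1.9, 1.9, 2.0, 2.2, 2.6, 3.2]

# Replace the reading 2.2 by an outlying value of 22.
with_outlier = [22.0 if x == 2.2 else x for x in readings]

# The mean is the sum of the values divided by their number.
print("original:    ", round(mean(readings), 2), median(readings))
print("with outlier:", round(mean(with_outlier), 2), median(with_outlier))
```

Under these assumed values the mean moves from 1.5 to 2.82 while the median stays where it was, as described above.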

As well as measures of location we need measures of how variable the data are. We met two of these measures, the range and interquartile range, in Chapter 1.

The range is an important measurement, for figures at the top and bottom of it denote the findings furthest removed from the generality. However, they do not give much indication of the spread of observations about the mean. This is where the standard deviation (SD) comes in.

The theoretical basis of the standard deviation is complex and need not trouble the ordinary user. We will discuss sampling and populations in Chapter 3. A practical point to note here is that, when the population from which the data arise has a distribution that is approximately "Normal" (or Gaussian), then the standard deviation provides a useful basis for interpreting the data in terms of probability.

The Normal distribution is represented by a family of curves defined uniquely by two parameters, which are the mean and the standard deviation of the population. The curves are always symmetrically bell shaped, but the extent to which the bell is compressed or flattened out depends on the standard deviation of the population. However, the mere fact that a curve is bell shaped does not mean that it represents a Normal distribution, because other distributions may have a similar sort of shape.

Many biological characteristics conform to a Normal distribution closely enough for it to be commonly used; for example, heights of adult men and women, blood pressures in a healthy population, random errors in many types of laboratory measurements and biochemical data. Figure 2.1 shows a Normal curve calculated from the diastolic blood pressures of 500 men, mean 82 mmHg, standard deviation 10 mmHg. The ranges representing ±1 SD, ±2 SD, and ±3 SD about the mean are marked. A more extensive set of values is given in Table A of the print edition.

Figure 2.1


The reason why the standard deviation is such a useful measure of the scatter of the observations is this: if the observations follow a Normal distribution, a range covered by one standard deviation above the mean and one standard deviation below it ($\bar{x} \pm 1\,\text{SD}$) includes about 68% of the observations; a range of two standard deviations above and two below ($\bar{x} \pm 2\,\text{SD}$) about 95% of the observations; and of three standard deviations above and three below ($\bar{x} \pm 3\,\text{SD}$) about 99.7% of the observations. Consequently, if we know the mean and standard deviation of a set of observations, we can obtain some useful information by simple arithmetic. By putting 1, 2, or 3 standard deviations above and below the mean we can estimate the ranges that would be expected to include about 68%, 95%, and 99.7% of the observations.
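The arithmetic can be sketched in Python for the blood pressure example quoted above (mean 82 mmHg, SD 10 mmHg). The function names are illustrative only; the coverage is computed from the standard Normal distribution using the error function in the standard library.

```python
import math

def normal_coverage(k):
    """Proportion of a Normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

def sd_range(mean, sd, k):
    """Lower and upper limits of the range mean - k*SD to mean + k*SD."""
    return mean - k * sd, mean + k * sd

# Diastolic blood pressure example from the text: mean 82 mmHg, SD 10 mmHg.
for k in (1, 2, 3):
    low, high = sd_range(82, 10, k)
    print(f"{k} SD: {low} to {high} mmHg, "
          f"about {normal_coverage(k):.1%} of observations")
```

These limits apply only in so far as the observations really do follow a Normal distribution.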

Standard deviation from ungrouped data

The standard deviation is a summary measure of the differences of each observation from the mean. If the differences themselves were added up, the positive would exactly balance the negative and so their sum would be zero. Consequently the squares of the differences are added. The sum of the squares is then divided by the number of observations minus one to give the mean of the squares, and the square root is taken to bring the measurements back to the units we started with. (The division by the number of observations minus one instead of the number of observations itself to obtain the mean square is because "degrees of freedom" must be used. In these circumstances they are one less than the total. The theoretical justification for this need not trouble the user in practice.)

To gain an intuitive feel for degrees of freedom, consider choosing a chocolate from a box of n chocolates. Every time we come to choose a chocolate we have a choice, until we come to the last one (normally one with a nut in it!), and then we have no choice. Thus we have n - 1 choices, or "degrees of freedom".

The calculation of the variance is illustrated in Table 2.1 with the 15 readings in the preliminary study of urinary lead concentrations (Table 1.2). The readings are set out in column (1). In column (2) the difference between each reading and the mean is recorded. The sum of the differences is 0. In column (3) the differences are squared, and the sum of those squares is given at the bottom of the column.

Table 2.1


The sum of the squares of the differences (or deviations) from the mean, 9.96, is now divided by the total number of observations minus one, to give the variance. Thus,

$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

In this example we find:

$$\text{variance} = \frac{9.96}{14} = 0.7114$$

Finally, the square root of the variance provides the standard deviation:

$$\text{SD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

from which we get

$$\text{SD} = \sqrt{0.7114} = 0.843$$

This procedure illustrates the structure of the standard deviation, in particular that the two extreme values 0.1 and 3.2 contribute most to the sum of the differences squared.

Calculator procedure
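The column-by-column calculation can be followed in a short Python sketch. The readings are an assumption, since Table 2.1 is not reproduced in this text; the values used are consistent with the totals quoted (15 readings summing to 22.5, sum of squared deviations 9.96, extremes 0.1 and 3.2).

```python
import math
from statistics import stdev

# Assumed readings: Table 2.1 is not reproduced here. These 15 values are
# consistent with the quoted totals (sum 22.5, sum of squared deviations 9.96).
readings = [0.1, 0.4, 0.6, 0.8, 1.1, 1.2, 1.3, 1.5,
            1.7, 1.9, 1.9, 2.0, 2.2, 2.6, 3.2]

n = len(readings)
mean = sum(readings) / n                  # column (1) total divided by n
deviations = [x - mean for x in readings] # column (2)
squared = [d * d for d in deviations]     # column (3)

sum_sq = sum(squared)
variance = sum_sq / (n - 1)               # divisor is n - 1, not n
sd = math.sqrt(variance)

print("sum of deviations:", round(sum(deviations), 10))
print("sum of squares:", round(sum_sq, 2))
print("variance:", round(variance, 4), "SD:", round(sd, 3))
print("largest squared deviations:", sorted(round(s, 2) for s in squared)[-2:])

# The standard library's stdev also uses the n - 1 divisor.
print("statistics.stdev:", round(stdev(readings), 3))
```

Sorting the squared deviations makes it easy to see which readings contribute most to the total.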

Most inexpensive calculators have procedures that enable one to calculate the mean and standard deviations directly, using the "SD" mode. For example, on modern Casio calculators one presses SHIFT and '.' and a little "SD" symbol should appear on the display. On earlier Casios one presses INV and MODE, whereas on a Sharp 2nd F and Stat should be used. The data are stored via the M+ button. Thus, having set the calculator into the "SD" or "Stat" mode, from Table 2.1 we enter 0.1 M+, 0.4 M+, etc. When all the data are entered, we can check that the correct number of observations have been included by Shift and n, and "15" should be displayed. The mean is displayed by Shift and $\bar{x}$ and the standard deviation by Shift and $\sigma_{n-1}$. Avoid pressing Shift and AC between these operations as this clears the statistical memory.

There is another button, $\sigma_n$, on many calculators. This uses the divisor $n$ rather than $n - 1$ in the calculation of the standard deviation. On a Sharp calculator $\sigma_n$ is denoted $\sigma$, whereas $\sigma_{n-1}$ is denoted $s$. These are the "population" values, and are derived assuming that an entire population is available or that interest focuses solely on the data in hand, and the results are not going to be generalised (see Chapter 3 for details of samples and populations). As this situation very rarely arises, $\sigma_{n-1}$ should be used and $\sigma_n$ ignored, although even for moderate sample sizes the difference is going to be small.

Remember to return to normal mode before resuming calculations because many of the usual functions are not available in "Stat" mode. On a modern Casio this is Shift 0. On earlier Casios and on Sharps one repeats the sequence that calls up the "Stat" mode. Some calculators stay in "Stat" mode even when switched off. Mullee (1) provides advice on choosing and using a calculator.

The calculator formulas use the relationship

$$\sigma_n^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2$$

The right hand expression can be easily memorised by the expression "mean of the squares minus the mean square". The sample variance $\sigma_{n-1}^2$ (divisor $n - 1$) is obtained from

$$\sigma_{n-1}^2 = \frac{n\,\sigma_n^2}{n - 1}$$

The above equation can be seen to be true in Table 2.1, where the sum of the square of the observations, $\sum x^2$, is given as 43.71.

We thus obtain

$$\sum x^2 - n\bar{x}^2 = 43.71 - 15 \times 1.5^2 = 9.96$$

the same value given for the total in column (3). Care should be taken because this formula involves subtracting two large numbers to get a small one, and can lead to incorrect results if the numbers are very large. For example, try finding the standard deviation of 100001, 100002, 100003 on a calculator. The correct answer is 1, but many calculators will give 0 because of rounding error. The solution is to subtract a large number from each of the observations (say 100000) and calculate the standard deviation on the remainders, namely 1, 2 and 3.
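The rounding problem can be imitated in Python. This is only a sketch: it assumes a calculator that rounds every intermediate result to 8 significant digits, which is mimicked here with the standard `decimal` module; real calculators differ in their internal precision. The function name `shortcut_sd` is illustrative.

```python
from decimal import Decimal, localcontext

def shortcut_sd(values, digits=8):
    """SD from the shortcut formula sum(x^2) - (sum x)^2 / n, with every
    intermediate result rounded to `digits` significant figures.
    The 8-digit default is an assumption, not any particular calculator."""
    with localcontext() as ctx:
        ctx.prec = digits
        xs = [Decimal(v) for v in values]
        n = len(xs)
        sum_x = sum(xs, Decimal(0))
        sum_x2 = sum((x * x for x in xs), Decimal(0))
        sum_sq_dev = sum_x2 - sum_x * sum_x / n   # two large, nearly equal numbers
        return float((sum_sq_dev / (n - 1)).sqrt())

raw = [100001, 100002, 100003]
shifted = [x - 100000 for x in raw]   # subtract a large constant first

print("shortcut on raw values:    ", shortcut_sd(raw))
print("shortcut on shifted values:", shortcut_sd(shifted))
```

Subtracting a constant from every observation leaves the deviations from the mean unchanged, which is why the shifted values give the correct standard deviation.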

Standard deviation from grouped data

We can also calculate a standard deviation for discrete quantitative variables. For example, in addition to studying the lead concentration in the urine of 140 children, the paediatrician asked how often each of them had been examined by a doctor during the year. After collecting the information he tabulated the data shown in Table 2.2 columns (1) and (2). The mean is calculated by multiplying column (1) by column (2), adding the products, and dividing by the total number of observations.

Table 2.2

As we did for continuous data, to calculate the standard deviation we square each of the observations in turn. In this case the observation is the number of visits, but because we have several children in each class, shown in column (2), each squared number (column (4)) must be multiplied by the number of children. The sum of squares is given at the foot of column (5), namely 1697. We then use the calculator formula to find the variance:

$$\text{variance} = \frac{\sum f x^2 - \left(\sum f x\right)^2 / n}{n - 1}$$

and

$$\text{SD} = \sqrt{\text{variance}}$$

where $x$ is the number of visits, $f$ is the number of children in each class, $\sum f x^2 = 1697$ and $n = 140$.

Note that although the number of visits is not Normally distributed, the distribution is reasonably symmetrical about the mean. The approximate 95% range is given by $\bar{x} \pm 2\,\text{SD}$. This excludes two children with no visits and six children with six or more visits. Thus there are eight of 140 = 5.7% outside the theoretical 95% range.

Note that it is common for discrete quantitative variables to have what is known as skewed distributions, that is they are not symmetrical. One clue to lack of symmetry from derived statistics is when the mean and the median differ considerably. Another is when the standard deviation is of the same order of magnitude as the mean, but the observations must be non-negative. Sometimes a transformation will convert a skewed distribution into a symmetrical one. When the data are counts, such as number of visits to a doctor, often the square root transformation will help, and if there are no zero or negative values a logarithmic transformation will render the distribution more symmetrical.
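The grouped-data calculation can be sketched as a small Python function. Because Table 2.2 is not reproduced in this text, the sketch is applied instead to the frequency counts given in Exercise 2.1; the function name `grouped_mean_sd` is illustrative.

```python
import math

def grouped_mean_sd(values, counts):
    """Mean and SD (divisor n - 1) from a frequency table.

    values: the distinct observed values (column 1)
    counts: how many subjects had each value (column 2)
    """
    n = sum(counts)
    sum_fx = sum(f * x for x, f in zip(values, counts))       # column (3) total
    sum_fx2 = sum(f * x * x for x, f in zip(values, counts))  # column (5) total
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)

# Frequency counts from Exercise 2.1: times vaccinated and number of people.
times = [0, 1, 2, 3, 4, 5]
people = [12, 24, 42, 38, 30, 4]

mean, sd = grouped_mean_sd(times, people)
low, high = mean - 2 * sd, mean + 2 * sd   # approximate 95% range
print(f"mean {mean:.2f}, SD {sd:.2f}, range {low:.2f} to {high:.2f}")
```

Counting how many subjects fall outside `low` to `high` gives the same kind of check against the theoretical 95% as the one described above.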

Data transformation

An anaesthetist measures the pain of a procedure using a 100 mm visual analogue scale on seven patients. The results are given in Table 2.3, together with the $\log_e$ transformation (the ln button on a calculator).

Table 2.3

The data are plotted in Figure 2.2, which shows that the outlier does not appear so extreme in the logged data. The mean and median are 10.29 and 3, respectively, for the original data, with a standard deviation of 20.22. Where the mean is bigger than the median, the distribution is positively skewed. For the logged data the mean and median are 1.24 and 1.10 respectively, indicating that the logged data have a more symmetrical distribution. Thus it would be better to analyse the log-transformed data in statistical tests than to use the original scale.

Figure 2.2

In reporting these results, the median of the raw data would be given, but it should be explained that the statistical test was carried out on the transformed data. Note that the median of the logged data is the same as the log of the median of the raw data; however, this is not true for the mean. The mean of the logged data is not necessarily equal to the log of the mean of the raw data. The antilog (exp or $e^x$ on a calculator) of the mean of the logged data is known as the geometric mean, and is often a better summary statistic than the mean for data from positively skewed distributions. For these data the geometric mean is 3.45 mm.
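The relationship between the log transform, the median and the geometric mean can be sketched in Python. The seven pain scores are an assumption, since Table 2.3 is not reproduced in this text; the values used are consistent with the quoted raw mean (10.29), raw standard deviation (about 20.2) and mean of the logged data (1.24).

```python
import math
from statistics import mean, median

# Assumed pain scores (mm): Table 2.3 is not reproduced here. These seven
# values are consistent with the summary figures quoted in the text.
scores = [1, 1, 2, 3, 3, 6, 56]

logged = [math.log(x) for x in scores]       # natural log, the "ln" button
geometric_mean = math.exp(mean(logged))      # antilog of the mean of the logs

print("raw mean:", round(mean(scores), 2))
print("mean of logged data:", round(mean(logged), 2))
print("geometric mean:", round(geometric_mean, 2))

# The median commutes with the log transform; the mean does not.
print(math.isclose(median(logged), math.log(median(scores))))
print(math.isclose(mean(logged), math.log(mean(scores))))
```

With these assumed values the geometric mean comes out close to the 3.45 mm quoted; taking the antilog of the rounded figure 1.24 rather than the unrounded mean accounts for the small difference.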

Between subjects and within subjects standard deviation

If repeated measurements are made of, say, blood pressure on an individual, these measurements are likely to vary. This is within subject, or intrasubject, variability and we can calculate a standard deviation of these observations. If the observations are close together in time, this standard deviation is often described as the measurement error.

Measurements made on different subjects vary according to between subject, or intersubject, variability. If many observations were made on each individual, and the average taken, then we can assume that the intrasubject variability has been averaged out and the variation in the average values is due solely to the intersubject variability. Single observations on individuals clearly contain a mixture of intersubject and intrasubject variation.

The coefficient of variation (CV%) is the intrasubject standard deviation divided by the mean, expressed as a percentage. It is often quoted as a measure of repeatability for biochemical assays, when an assay is carried out on several occasions on the same sample. It has the advantage of being independent of the units of measurement, but also numerous theoretical disadvantages. It is usually nonsensical to use the coefficient of variation as a measure of between subject variability.
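As a minimal sketch of the coefficient of variation, the following Python function divides the standard deviation of repeated measurements by their mean. The assay values are hypothetical and serve only to show that the result does not depend on the units used.

```python
from statistics import mean, stdev

def coefficient_of_variation(measurements):
    """Within-subject SD as a percentage of the mean (CV%)."""
    return 100 * stdev(measurements) / mean(measurements)

# Hypothetical repeat assays of a single sample, in mmol/l ...
assay_mmol = [4.8, 5.0, 5.1, 5.3, 4.9]
# ... and the same results expressed in micromol/l.
assay_umol = [x * 1000 for x in assay_mmol]

print(round(coefficient_of_variation(assay_mmol), 1))
print(round(coefficient_of_variation(assay_umol), 1))
```

Both calls give the same percentage because rescaling multiplies the standard deviation and the mean by the same factor.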

Common questions

When should I use the mean and when should I use the median to describe my data?

It is a commonly held misapprehension that for Normally distributed data one uses the mean, and for non-Normally distributed data one uses the median. Alas this is not so: if the data are Normally distributed the mean and the median will be close; if the data are not Normally distributed then both the mean and the median may give useful information. Consider a variable that takes the value 1 for males and 0 for females. This is clearly not Normally distributed. However, the mean gives the proportion of males in the group, whereas the median merely tells us which group contained more than 50% of the people. Similarly, the mean from ordered categorical variables can be more useful than the median, if the ordered categories can be given meaningful scores. For example, a lecture might be rated as 1 (poor) to 5 (excellent). The usual statistic for summarising the result would be the mean. In the situation where there is a small group at one extreme of a distribution (for example, annual income) then the median will be more "representative" of the distribution.

My data must have values greater than zero and yet the mean and standard deviation are about the same size. How does this happen?

If data have a very skewed distribution, then the standard deviation will be grossly inflated, and is not a good measure of variability to use. As we have shown, occasionally a transformation of the data, such as a log transform, will render the distribution more symmetrical. Alternatively, quote the interquartile range.
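The two situations described in the first answer, a 0/1 variable and a distribution with a small group at one extreme, can be illustrated with a short Python sketch. All values below are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical group of ten people: 1 = male, 0 = female.
sex = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
proportion_male = mean(sex)   # the mean of a 0/1 variable is a proportion
majority_code = median(sex)   # the median only identifies the larger group
print(proportion_male, majority_code)

# Hypothetical annual incomes (in thousands) with a small group at one extreme.
incomes = [18, 20, 22, 23, 25, 26, 28, 30, 250, 400]
print(mean(incomes), median(incomes))
```

In the income example the two extreme values pull the mean well above what most of the group earn, while the median stays among the typical values.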

References

1. Mullee M A. How to choose and use a calculator. In: How to do it 2. BMJ Publishing Group, 1995:58-62.

Exercises

Exercise 2.1

In the campaign against smallpox a doctor inquired into the number of times 150 people aged 16 and over in an Ethiopian village had been vaccinated. He obtained the following figures: never, 12 people; once, 24; twice, 42; three times, 38; four times, 30; five times, 4. What is the mean number of times those people had been vaccinated and what is the standard deviation?