Variance of properties. Absolute variations

Variance in statistics is the mean of the squared deviations of the individual values of a characteristic from their arithmetic mean. Depending on the initial data, it is calculated using either the simple or the weighted variance formula:

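1. Simple variance (for ungrouped data) is calculated using the formula:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

where $n$ is the number of observations.

2. Weighted variance (for variation series):

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i}$$

where $n_i$ is the frequency (the number of times the value $x_i$ occurs).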
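The simple and weighted variance described above can be sketched in Python as follows. The function names and the data are hypothetical and chosen only for illustration; both functions divide by the total number of observations, as in the definition given here.

```python
def simple_variance(values):
    """Variance of ungrouped data: mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def weighted_variance(values, freqs):
    """Variance of a variation series: squared deviations weighted by frequency."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total


# Hypothetical data: the same observations, first ungrouped, then as a series.
raw = [2, 2, 3, 3, 3, 5]
print(simple_variance(raw))
print(weighted_variance([2, 3, 5], [2, 3, 1]))
```

Both calls describe the same six observations, so the two results should coincide.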

An example of finding variance

Example 1. The following data are available for a group of 20 correspondence students. Construct an interval distribution series for the characteristic, calculate its mean value, and study its variance.

Let's build an interval grouping. The interval width is determined using the formula:

$$h = \frac{X_{\max} - X_{\min}}{n}$$

where X max is the maximum value of the grouping characteristic;
X min is the minimum value of the grouping characteristic;
n is the number of intervals.

We take n = 5. The interval width is: h = (192 - 159) / 5 = 6.6
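The interval width and the resulting interval boundaries can be reproduced with a short sketch. Only the minimum (159), the maximum (192) and the number of intervals (5) come from the example; the variable names are illustrative.

```python
x_min, x_max, n_intervals = 159, 192, 5  # values taken from the example

h = (x_max - x_min) / n_intervals  # interval width
edges = [round(x_min + k * h, 1) for k in range(n_intervals + 1)]
intervals = list(zip(edges[:-1], edges[1:]))
midpoints = [round((lo + hi) / 2, 1) for lo, hi in intervals]

for (lo, hi), mid in zip(intervals, midpoints):
    print(f"{lo} - {hi}: midpoint {mid}")
```

The first interval is 159 to 165.6 with midpoint 162.3, which matches the midpoint quoted in the example.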

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

X'i is the midpoint of the interval (for example, the midpoint of the interval 159 - 165.6 is 162.3).

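We determine the average height of the students using the weighted arithmetic mean formula:

$$\bar{x} = \frac{\sum X'_i n_i}{\sum n_i}$$

where $n_i$ is the frequency of the i-th interval.

Let's determine the variance using the formula:

$$\sigma^2 = \frac{\sum (X'_i - \bar{x})^2 n_i}{\sum n_i}$$

The variance formula can be transformed as follows:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2$$

It follows from this formula that the variance equals the difference between the mean of the squared values and the square of the mean.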
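A minimal sketch comparing the definition of variance with the transformed formula (mean of squares minus square of the mean). The interval midpoints follow from the example; the frequencies are hypothetical, because the source table is not reproduced here.

```python
# Interval midpoints follow from the example (159 to 192, width 6.6).
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
# Hypothetical frequencies for 20 students (assumption, not the source data).
freqs = [2, 5, 7, 4, 2]

total = sum(freqs)
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total
mean_of_squares = sum(x * x * f for x, f in zip(midpoints, freqs)) / total

# Definition: weighted mean of squared deviations from the mean.
var_definition = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / total
# Transformed formula: mean of squares minus square of the mean.
var_shortcut = mean_of_squares - mean ** 2

print(round(mean, 2), round(var_definition, 4), round(var_shortcut, 4))
```

The two variance values agree up to floating-point rounding.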

In variation series with equal intervals, the variance can be calculated by the method of moments, which relies on the second property of variance (dividing all values by the interval width). This calculation is less laborious and uses the following formula:

$$\sigma^2 = i^2 \left(m_2 - m_1^2\right)$$

where i is the interval width;
A is a conventional zero, for which it is convenient to use the midpoint of the interval with the highest frequency;
$m_1^2$ is the square of the first-order moment, $m_1 = \frac{\sum \frac{x - A}{i} n}{\sum n}$;
$m_2$ is the second-order moment, $m_2 = \frac{\sum \left(\frac{x - A}{i}\right)^2 n}{\sum n}$
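The method of moments can be sketched as follows. The midpoints and the interval width come from the example; the frequencies are hypothetical, and the moments are computed on the coded values (x - A) / i, which is the usual form of this method.

```python
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
freqs = [2, 5, 7, 4, 2]  # hypothetical frequencies (assumption)
i = 6.6                  # interval width from the example
A = midpoints[freqs.index(max(freqs))]  # conventional zero: modal interval midpoint

total = sum(freqs)
coded = [(x - A) / i for x in midpoints]  # approximately -2, -1, 0, 1, 2

m1 = sum(u * f for u, f in zip(coded, freqs)) / total      # first-order moment
m2 = sum(u * u * f for u, f in zip(coded, freqs)) / total  # second-order moment
var_moments = i ** 2 * (m2 - m1 ** 2)

# Direct calculation for comparison.
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total
var_direct = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / total

print(round(var_moments, 4), round(var_direct, 4))
```

The coded values are small integers, which is what makes the hand calculation less laborious.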

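The variance of an alternative characteristic (if a characteristic in a statistical population varies so that there are only two mutually exclusive outcomes, such variability is called alternative) can be calculated using the formula:

$$\sigma^2 = p \cdot q$$

where p is the proportion of units that have the characteristic and q is the proportion that do not.

Substituting q = 1 - p into this formula, we obtain:

$$\sigma^2 = p(1 - p)$$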
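As an illustration, the variance of a two-outcome characteristic can be checked numerically by coding the outcomes as 1 and 0. The data are hypothetical: 8 of 20 units are assumed to have the characteristic.

```python
# Hypothetical data: 1 = unit has the characteristic, 0 = it does not.
flags = [1] * 8 + [0] * 12

n = len(flags)
p = sum(flags) / n  # proportion of units with the characteristic
q = 1 - p           # proportion of units without it

var_formula = p * q
var_direct = sum((x - p) ** 2 for x in flags) / n  # mean squared deviation from p

print(var_formula, var_direct)
```

The mean of the 0/1 values equals p, so the direct calculation reduces to p(1 - p).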

Types of variance

Total variance measures the variation of a characteristic across the entire population under the influence of all the factors that cause this variation. It equals the mean square of the deviations of the individual values of the characteristic x from the overall mean of x and can be calculated as a simple or a weighted variance.

Within-group variance characterizes random variation, i.e. the part of the variation that is due to unaccounted factors and does not depend on the factor on which the grouping is based. It equals the mean square of the deviations of the individual values of the characteristic within a group from the arithmetic mean of that group and can be calculated as a simple or a weighted variance.

Thus, within-group variance measures the variation of a characteristic within a group and is determined by the formula:

$$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i}$$

where $\bar{x}_i$ is the group mean;
$n_i$ is the number of units in the group.

For example, in a study of how workers' qualifications affect labor productivity in a workshop, the within-group variances show the variation in output in each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except for differences in qualification category (within a group all workers have the same qualifications).

The average of the within-group variances reflects random variation, i.e. the part of the variation that arose under the influence of all factors other than the grouping factor. It is calculated using the formula:

$$\overline{\sigma_i^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$$

Between-group variance characterizes the systematic variation of the resulting characteristic that is due to the influence of the factor on which the grouping is based. It equals the mean square of the deviations of the group means from the overall mean. Between-group variance is calculated using the formula:

$$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}$$

The rule for adding variances in statistics

According to the rule of adding variances, the total variance equals the sum of the average within-group variance and the between-group variance:

$$\sigma^2 = \overline{\sigma_i^2} + \delta^2$$

The meaning of this rule is that the total variance that arises under the influence of all factors is equal to the sum of the variances that arise under the influence of all other factors and the variance that arises due to the grouping factor.

Using the formula for adding variances, you can determine the third unknown variance from two known variances, and also judge the strength of the influence of the grouping characteristic.
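The rule of adding variances can be checked numerically. The sketch below uses hypothetical output figures for three qualification categories; the group names and values are assumptions made for illustration only.

```python
from statistics import mean, pvariance

# Hypothetical output per worker, grouped by qualification category.
groups = {
    "category 3": [10, 12, 14],
    "category 4": [15, 17, 19, 21],
    "category 5": [22, 24, 26],
}

all_values = [x for g in groups.values() for x in g]
n_total = len(all_values)
overall_mean = mean(all_values)

total_var = pvariance(all_values)
# Average of the within-group variances, weighted by group size.
avg_within = sum(pvariance(g) * len(g) for g in groups.values()) / n_total
# Between-group variance: squared deviations of group means from the overall mean.
between = sum((mean(g) - overall_mean) ** 2 * len(g) for g in groups.values()) / n_total

print(total_var, avg_within + between)
print(between / total_var)  # share of variation associated with the grouping factor
```

The ratio of between-group to total variance is one way to judge how strongly the grouping characteristic influences the result: the closer it is to 1, the stronger the influence.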

Properties of variance

1. If all values of a characteristic are decreased (increased) by the same constant, the variance does not change.
2. If all values of a characteristic are divided (multiplied) by the same factor n, the variance decreases (increases) by a factor of n^2.
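Both properties are easy to confirm on a small data set. The values, the constant `c` and the factor `k` below are hypothetical.

```python
from math import isclose
from statistics import pvariance

data = [3.0, 5.0, 8.0, 12.0]  # hypothetical values
c, k = 10.0, 4.0              # hypothetical shift and scale factor

shifted = [x - c for x in data]  # property 1: subtract a constant
scaled = [x / k for x in data]   # property 2: divide by a factor

same_after_shift = isclose(pvariance(shifted), pvariance(data))
scaled_by_k_squared = isclose(pvariance(scaled), pvariance(data) / k ** 2)

print(same_after_shift, scaled_by_k_squared)
```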

The variance of a random variable is a measure of the spread of that random variable, that is, of its deviations from its mathematical expectation. In statistics, the notation σ² (sigma squared) is often used for variance. The square root of the variance, σ, is called the standard deviation or standard spread. The standard deviation is measured in the same units as the random variable itself, and the variance is measured in the squares of those units.

Although it is very convenient to use a single value (such as the mean, median or mode) to summarize an entire sample, this approach can easily lead to incorrect conclusions. The reason lies not in the value itself but in the fact that a single value says nothing about the spread of the data.

For example, in the sample:

the average value is 5.

However, the sample itself does not contain a single element with a value of 5. You may need to know how close each element of the sample is to the mean, or in other words, the variance of the values. Knowing how much the data vary, you can better interpret the mean, median and mode. The degree to which sample values vary is determined by calculating their variance and standard deviation.
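A small sketch of why a mean alone can mislead. Both samples below are illustrative assumptions: they share a mean of 5, but only the variance and standard deviation reveal how differently the values are spread.

```python
from statistics import mean, pstdev, pvariance

# Illustrative samples (assumed data): same mean, very different spread.
wide = [1, 1, 1, 1, 9, 9, 9, 9]
narrow = [4, 4, 5, 5, 5, 5, 6, 6]

for name, sample in (("wide", wide), ("narrow", narrow)):
    print(name, mean(sample), pvariance(sample), pstdev(sample))
```

The first sample contains no value equal to its mean, and its variance is far larger than that of the second sample.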



The variance and its square root, the standard deviation, characterize the average deviation from the sample mean. Of these two quantities, the standard deviation is the more important. It can be thought of as the average distance of the elements from the sample mean.

Variance is difficult to interpret meaningfully. However, the square root of this value is the standard deviation and can be easily interpreted.

Standard deviation is calculated by first determining the variance and then taking the square root of the variance.

For example, for the data array shown in Figure 1, the following values are obtained:

Figure 1

Here the average value of the squared differences is 717.43. To get the standard deviation, all that remains is to take the square root of this number.

The result will be approximately 26.78.
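The last step, taking the square root of the variance, can be reproduced directly. The value 717.43 is the one quoted above; the underlying data array is not reproduced here.

```python
import math

variance = 717.43  # mean of the squared differences quoted in the text
std_dev = math.sqrt(variance)

print(round(std_dev, 2))  # 26.78
```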

Remember that standard deviation is interpreted as the average distance that items are from the sample mean.

The standard deviation measures how well the mean describes the entire sample.

Suppose you are the head of a PC assembly department. The quarterly report states that output for the last quarter was 2,500 PCs. Is this good or bad? You ask for the standard deviation of this figure to be shown in the report (or the report already has such a column). Suppose the standard deviation is 2,000. As head of the department, you can see that the production line needs better management: the deviations in the number of PCs assembled are too large.

Recall that when the standard deviation is large, the data are widely scattered around the mean, and when the standard deviation is small, they cluster close to the mean.

The four statistical functions VAR(), VARP(), STDEV() and STDEVP() calculate the variance and standard deviation of the numbers in a range of cells. Before calculating the variance and standard deviation of a data set, determine whether the data represent a population or a sample from a population. For a sample, use VAR() and STDEV(); for a population, use VARP() and STDEVP():

Data          Variance    Standard deviation
Population    VARP()      STDEVP()
Sample        VAR()       STDEV()
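Outside a spreadsheet, the same population/sample distinction appears in Python's standard `statistics` module. The sketch below uses hypothetical data and is only an analogy to the spreadsheet functions, not a description of how they are implemented.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

# Treating the data as a whole population (divide by n).
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Treating the data as a sample from a population (divide by n - 1).
smp_var = statistics.variance(data)
smp_std = statistics.stdev(data)

print(pop_var, pop_std)
print(round(smp_var, 4), round(smp_std, 4))
```

For the same numbers the sample figures are always slightly larger than the population figures, because the divisor is smaller.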

Variance (like standard deviation), as we noted, indicates how far the values in a data set are scattered around the arithmetic mean.

A small variance or standard deviation indicates that the data are concentrated around the arithmetic mean, and a large one indicates that the data are scattered over a wide range of values.

Variance is quite difficult to interpret meaningfully (what counts as a small value or a large value?). Completing Task 3 will let you show visually, on a graph, what the variance means for a data set.

Tasks

Task 1.

2.1. Define the concepts of variance and standard deviation and give their symbolic notation in statistical data processing.

2.2. Complete the worksheet in accordance with Figure 1 and make the necessary calculations.

2.3. Give the basic formulas used in the calculations.

2.4. Explain all the notation ( , , ).

2.5. Explain the practical meaning of the concepts of variance and standard deviation.

Task 2.

1.1. Define the concepts of population and sample, mathematical expectation and arithmetic mean, and give their symbolic notation in statistical data processing.

1.2. In accordance with Figure 2, prepare a worksheet and make calculations.

1.3. Provide the basic formulas used in the calculations (for the general population and sample).

Figure 2

1.4. Explain why arithmetic mean values such as 46.43 and 48.78 can be obtained in samples (see the Appendix file). Draw conclusions.

Task 3.

There are two samples with different sets of data, but the average for them will be the same:

Figure 3

3.1. Complete the worksheet in accordance with Figure 3 and make the necessary calculations.

3.2. Give the basic calculation formulas.

3.3. Construct graphs in accordance with Figures 4, 5.

3.4. Explain the obtained dependencies.

3.5. Carry out similar calculations for the data of two samples.

Original sample: 1, 1, 1, 1, 9, 9, 9, 9

Select the values of the second sample so that its arithmetic mean is the same, for example:

Choose the values for the second sample yourself. Present the calculations and graphs as in Figures 3, 4 and 5. Show the basic formulas used in the calculations.

Draw appropriate conclusions.

Prepare all tasks in the form of a report with all the necessary pictures, graphs, formulas and brief explanations.

Note: the construction of graphs must be explained with drawings and brief explanations.

Variance is a measure of spread that describes how far data values deviate from the mean. It is the most widely used measure of spread in statistics and is calculated by squaring the deviation of each data value from the mean, summing the results, and dividing by n - 1. The formula for calculating the sample variance is given below:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance;

$\bar{x}$ is the sample mean;

$n$ is the sample size (number of data values);

$(x_i - \bar{x})$ is the deviation of each value in the data set from the mean.
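The sample variance described above can be sketched step by step. The function name and the data are hypothetical; the original data set is not reproduced here.

```python
def sample_variance(data):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_devs = [(x - mean) ** 2 for x in data]
    return sum(squared_devs) / (n - 1)


counts = [4, 6, 9, 10, 11]  # hypothetical monthly counts (assumption)
mean = sum(counts) / len(counts)

# Print the intermediate table: value, deviation, squared deviation.
for x in counts:
    print(x, x - mean, (x - mean) ** 2)

print(sample_variance(counts))
```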

To better understand the formula, let's look at an example. I don’t really like cooking, so I rarely do it. However, in order not to starve, from time to time I have to go to the stove to implement the plan of saturating my body with proteins, fats and carbohydrates. The data set below shows how many times Renat cooks every month:

The first step in calculating variance is to determine the sample mean, which in our example is 7.8 times per month. The rest of the calculations can be made easier using the following table.

The final phase of calculating variance looks like this:

For those who like to do all the calculations in one go, the equation would look like this:

Using the raw count method (cooking example)

There is a more efficient way to calculate variance, known as the raw count method. Although the equation may look cumbersome at first glance, it is not that scary. You can see for yourself and then decide which method you like best.

$$s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$

where $\sum x_i^2$ is the sum of the squared data values,

$\left(\sum x_i\right)^2$ is the square of the sum of all data values.

Don't lose your mind right now. Let's put this all into a table and you will see that there are fewer calculations here than in the previous example.

As you can see, the result was the same as when using the previous method. The advantages of this method become apparent as the sample size (n) increases.
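A sketch of this second method, which works from the sum of the squared values and the square of the sum, compared with the library's sample variance. The function name and the data are hypothetical.

```python
import statistics


def variance_raw(data):
    """Sample variance from the sum of squares and the square of the sum."""
    n = len(data)
    sum_of_squares = sum(x * x for x in data)
    square_of_sum = sum(data) ** 2
    return (sum_of_squares - square_of_sum / n) / (n - 1)


counts = [4, 6, 9, 10, 11]  # hypothetical monthly counts (assumption)

print(variance_raw(counts))
print(statistics.variance(counts))
```

Only two running totals are needed, so no table of deviations has to be built. With floating-point data whose mean is large relative to its spread, this form can lose precision, so the deviation-based formula is often the safer choice in code.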

Variance calculation in Excel

As you have probably guessed, Excel has functions for calculating variance. Starting with Excel 2010, there are four of them:

1) VAR.S - returns the variance of a sample. Logical values and text are ignored.

2) VAR.P - returns the variance of a population. Logical values and text are ignored.

3) VARA - returns the variance of a sample, taking logical and text values into account.

4) VARPA - returns the variance of a population, taking logical and text values into account.

First, let's understand the difference between a sample and a population. The purpose of descriptive statistics is to summarize or display data so that you quickly get the big picture, an overview so to speak. Statistical inference allows you to make inferences about a population based on a sample of data from that population. The population represents all possible outcomes or measurements that are of interest to us. A sample is a subset of a population.

For example, we are interested in a group of students from one of the Russian universities and we need to determine the average score of the group. We can calculate the average performance of students, and then the resulting figure will be a parameter, since the whole population will be involved in our calculations. However, if we want to calculate the GPA of all students in our country, then this group will be our sample.

The formulas for the variance of a sample and of a population differ in the denominator: for a sample it is (n - 1), and for a population it is n.

Now let's look at the variance functions whose names end in A (VARA and VARPA), whose descriptions state that text and logical values are taken into account. When such a function calculates the variance of a data set that contains non-numeric values, Excel interprets text and FALSE as 0, and TRUE as 1.
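The effect of counting text and logical values can be imitated in Python. This is only a sketch of the behavior described above, not Excel's implementation; the helper names and the cell values are hypothetical.

```python
import statistics


def coerce_text_and_logicals(values):
    """Map values as described in the text: text and False -> 0, True -> 1."""
    out = []
    for v in values:
        if isinstance(v, bool):  # check bool first: bool is a subclass of int
            out.append(1 if v else 0)
        elif isinstance(v, str):
            out.append(0)
        else:
            out.append(v)
    return out


def numbers_only(values):
    """Keep numeric values only, ignoring text and logical values."""
    return [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


cells = [2, 4, "n/a", True, False]  # hypothetical cell contents

var_counting_all = statistics.variance(coerce_text_and_logicals(cells))
var_numbers_only = statistics.variance(numbers_only(cells))

print(var_counting_all, var_numbers_only)
```

The two results differ because the first calculation uses five values (2, 4, 0, 1, 0) and the second only two (2, 4).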

So, if you have a data array, calculating its variance will not be difficult using one of the Excel functions listed above.

Among the many indicators that are used in statistics, it is necessary to highlight the calculation of variance. It should be noted that performing this calculation manually is a rather tedious task. Fortunately, Excel has functions that allow you to automate the calculation procedure. Let's find out the algorithm for working with these tools.

Variance is a measure of variation equal to the mean squared deviation from the mathematical expectation. It thus expresses the spread of the numbers around the mean. Variance can be calculated either for a population or for a sample.

Method 1: calculation based on the population

To calculate this measure in Excel for a population, use the VAR.P function. Its syntax is as follows:

=VAR.P(Number1;Number2;…)

From 1 to 255 arguments can be used. The arguments can be either numeric values or references to the cells that contain them.

Let's see how to calculate this value for a range with numeric data.


Method 2: calculation by sample

Unlike the population calculation, the sample calculation divides not by the total number of values but by one less. This corrects for the bias of the estimate. Excel handles this with a dedicated function for this type of calculation, VAR.S. Its syntax is as follows:

=VAR.S(Number1;Number2;…)

The number of arguments, as in the previous function, can also range from 1 to 255.


As you can see, the Excel program can greatly facilitate the calculation of variance. This statistic can be calculated by the application, either from the population or from the sample. In this case, all user actions actually come down to specifying the range of numbers to be processed, and Excel does the main work itself. Of course, this will save a significant amount of user time.