
What is the variance of a random variable? An example of finding variance

Types of dispersions:

Total variance characterizes the variation of a characteristic across the entire population under the influence of all the factors that caused this variation. It is determined by the formula

σ² = Σ(xᵢ − x̄)²·fᵢ / Σfᵢ,

where x̄ is the overall arithmetic mean of the entire population under study.

Average within-group variance indicates random variation that may arise under the influence of unaccounted factors and that does not depend on the factor attribute forming the basis of the grouping. It is calculated in two steps: first the variance of each individual group is found (σᵢ²), then the average within-group variance:

σ̄² = Σσᵢ²·nᵢ / Σnᵢ,

where nᵢ is the number of units in the i-th group.

Intergroup variance (the variance of group means) characterizes systematic variation, i.e. differences in the value of the studied characteristic that arise under the influence of the factor attribute forming the basis of the grouping:

δ² = Σ(x̄ᵢ − x̄)²·nᵢ / Σnᵢ,

where x̄ᵢ is the average value of a separate group.

All three types of variance are related to each other: the total variance is equal to the sum of the average within-group variance and the intergroup variance:

σ² = σ̄² + δ².
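The rule of adding variances can be verified numerically. A minimal sketch in Python, using small hypothetical groups chosen only to illustrate the identity:

```python
# Hypothetical grouped data, chosen only to illustrate the identity.
groups = [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]]

all_values = [x for g in groups for x in g]
n = len(all_values)
grand_mean = sum(all_values) / n

# Total variance
total_var = sum((x - grand_mean) ** 2 for x in all_values) / n

# Average within-group variance (group sums of squares pooled over n)
within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups) / n

# Intergroup variance (variance of group means, weighted by group sizes)
between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups) / n

# The addition rule: total = average within-group + intergroup
assert abs(total_var - (within + between)) < 1e-12
```

The identity holds for any grouping of the data, which is what makes the decomposition useful in practice.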


25 Relative measures of variation

Oscillation coefficient: V_R = R / x̄ · 100 %, where R is the range of variation.

Relative linear deviation: V_d = d̄ / x̄ · 100 %, where d̄ is the average linear deviation.

Coefficient of variation: V_σ = σ / x̄ · 100 %.

The oscillation coefficient reflects the relative fluctuation of the extreme values of a characteristic around the average. The relative linear deviation characterizes the share of the average absolute deviation in the average value of the characteristic. The coefficient of variation is the most common measure of variability and is used to assess how typical the average is.

In statistics, populations with a coefficient of variation greater than 30–35% are considered heterogeneous.
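As a sketch, the coefficient of variation and the homogeneity check can be computed like this; the data are illustrative, and the 33% cut-off is one common convention (the text above cites 30–35%):

```python
import statistics

# Illustrative data; the 33% threshold is an assumption for this sketch.
data = [12.0, 14.0, 15.0, 13.0, 16.0]

mean = statistics.fmean(data)
sigma = statistics.pstdev(data)   # population standard deviation
v = sigma / mean * 100            # coefficient of variation, in percent

homogeneous = v <= 33             # True: the population is homogeneous
```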

    Regularity of distribution series. Moments of distribution. Distribution shape indicators

In variation series there is a connection between the frequencies and the values ​​of the varying characteristic: with an increase in the characteristic, the frequency value first increases to a certain limit and then decreases. Such changes are called distribution patterns.

The shape of the distribution is studied using skewness and kurtosis indicators. When calculating these indicators, distribution moments are used.

The moment of order k is the average of the k-th powers of the deviations of the variant values of a characteristic from some constant value. The order of the moment is determined by the value of k. When analyzing variation series, one usually restricts oneself to moments of the first four orders. When calculating moments, either frequencies or relative frequencies can be used as weights. Depending on the choice of the constant value, initial, conditional and central moments are distinguished.

Distribution form indicators:

Asymmetry (As) is an indicator characterizing the degree of asymmetry of a distribution.

With negative asymmetry (As < 0) the distribution is left-sided; with positive asymmetry (As > 0) it is right-sided.

Central moments can be used to calculate asymmetry. Then:

As = μ₃ / σ³,

where μ₃ is the central moment of the third order.

Kurtosis (Eₖ) characterizes the steepness of the distribution curve in comparison with the normal distribution at the same strength of variation:

Eₖ = μ₄ / σ⁴ − 3,

where μ₄ is the central moment of the fourth order.
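A short sketch of computing skewness and excess kurtosis from central moments, on made-up data with a long right tail:

```python
# Made-up data with a long right tail (illustrative only).
data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 10.0]
n = len(data)
mean = sum(data) / n

def central_moment(k):
    """k-th order central moment of the data."""
    return sum((x - mean) ** k for x in data) / n

sigma = central_moment(2) ** 0.5
skewness = central_moment(3) / sigma ** 3             # As > 0: right-sided asymmetry
excess_kurtosis = central_moment(4) / sigma ** 4 - 3  # 0 for a normal law
```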

    Normal distribution law

For a normal (Gaussian) distribution, the density function has the following form:

f(x) = 1 / (σ√(2π)) · e^(−(x − a)² / (2σ²)),

where a is the mathematical expectation and σ is the standard deviation.

The normal distribution is symmetrical and is characterized by the relationship X̄ = Me = Mo (the mean, median and mode coincide).

For the normal distribution the ratio μ₄/σ⁴ equals 3 (so the excess kurtosis is 0), and the skewness coefficient is 0.

The normal distribution curve is a symmetrical bell-shaped curve.

    Types of dispersions. The rule for adding variances. The essence of the empirical coefficient of determination.

If the original population is divided into groups according to some significant characteristic, then the following types of variances are calculated:

    Total variance of the original population:

σ² = Σ(xᵢ − x̄)²·fᵢ / Σfᵢ, where x̄ is the overall average value of the original population and fᵢ is the frequency. Total dispersion characterizes the deviation of individual values of the characteristic from the overall average value of the original population.

    Within-group variances:

σⱼ² = Σ(xᵢ − x̄ⱼ)²·fᵢ / Σfᵢ (computed within group j), where j is the number of the group, x̄ⱼ is the average value in the j-th group, and fᵢ are the frequencies of the j-th group. Within-group variances characterize the deviation of the individual values of the trait in each group from the group average. From all the within-group variances, the average is calculated using the formula σ̄² = Σσⱼ²·nⱼ / Σnⱼ, where nⱼ is the number of units in the j-th group.

    Intergroup variance:

δ² = Σ(x̄ⱼ − x̄)²·nⱼ / Σnⱼ. Intergroup dispersion characterizes the deviation of the group averages from the overall average of the original population.

The variance addition rule states that the total variance of the original population is equal to the sum of the intergroup variance and the average of the within-group variances:

σ² = δ² + σ̄².

The empirical coefficient of determination shows the proportion of the variation of the studied characteristic that is due to the variation of the grouping characteristic, and is calculated by the formula:

η² = δ² / σ².
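A sketch of the empirical coefficient of determination and correlation ratio on hypothetical grouped data (group names and numbers are invented for illustration):

```python
# Hypothetical volumes of work for two groups; names and values are invented.
groups = {"state": [30.0, 34.0, 38.0], "private": [50.0, 54.0, 58.0]}

values = [x for g in groups.values() for x in g]
n = len(values)
grand_mean = sum(values) / n

# Total variance of the whole population
total_var = sum((x - grand_mean) ** 2 for x in values) / n

# Intergroup variance (group means around the grand mean)
between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
) / n

eta_sq = between / total_var   # share of variation explained by the grouping
eta = eta_sq ** 0.5            # empirical correlation ratio, between 0 and 1
```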

    Method of counting from a conditional zero (method of moments) for calculating the average value and variance

The calculation of dispersion by the method of moments is based on the formula σ² = mean(x²) − (x̄)² and on properties 3 and 4 of the dispersion.

(3. If all values of the characteristic (variants) are increased or decreased by some constant number A, the variance of the new population will not change.

4. If all values of the characteristic (variants) are multiplied by a constant number k, the variance of the new population will increase by a factor of k².)

We obtain the formula for calculating dispersion in a variation series with equal intervals by the method of moments:

σ² = h²·(m₂ − m₁²),

where m₁ = Σ((xᵢ − A)/h)·fᵢ / Σfᵢ and m₂ = Σ((xᵢ − A)/h)²·fᵢ / Σfᵢ are the conditional moments of the first and second orders, h is the interval width, and A is the conditional zero, equal to the variant with the maximum frequency (the middle of the interval with the maximum frequency).

The calculation of the average value by the method of moments is also based on the use of the properties of the average.
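Under the assumptions above (equal intervals, conditional zero A at the modal midpoint), the method of moments can be sketched as follows; the midpoints and frequencies are invented for illustration:

```python
# Illustrative interval series: midpoints and frequencies (invented data).
midpoints = [10.0, 20.0, 30.0, 40.0, 50.0]
freqs = [4, 10, 20, 10, 6]

h = 10.0          # interval width
A = 30.0          # conditional zero: midpoint of the modal interval
N = sum(freqs)

# Conditional moments of the first and second orders
m1 = sum((x - A) / h * f for x, f in zip(midpoints, freqs)) / N
m2 = sum(((x - A) / h) ** 2 * f for x, f in zip(midpoints, freqs)) / N

variance = h ** 2 * (m2 - m1 ** 2)

# Direct check against the ordinary weighted formula
mean = sum(x * f for x, f in zip(midpoints, freqs)) / N
direct = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / N
assert abs(variance - direct) < 1e-9
```

The point of the method is that the conditional variants (xᵢ − A)/h are small integers, which kept hand calculation manageable before computers.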

    The concept of selective observation. Stages of studying economic phenomena using a sampling method

A sample observation is one in which not all units of the original population are examined and studied, but only a part of them, and the results of examining that part are extended to the entire original population. The population from which units are selected for further examination and study is called the general population, and all indicators characterizing this population are called general indicators.

The possible limits of the deviation of the sample average from the general average are called the sampling error.

The set of selected units is called the sample population, and all indicators characterizing this population are called sample indicators.

Sample research includes the following stages:

Characteristics of the object of study (mass economic phenomena). If the population is small, then sampling is not recommended; a comprehensive study is necessary;

Sample size calculation. It is important to determine the optimal volume that will allow the sampling error to be within the acceptable range at the lowest cost;

Selection of observation units taking into account the requirements of randomness and proportionality.

Evidence of representativeness based on an estimate of sampling error. For a random sample, the error is calculated using formulas. For the target sample, representativeness is assessed using qualitative methods (comparison, experiment);

Analysis of the sample population. If the generated sample meets the requirements of representativeness, then it is analyzed using analytical indicators (average, relative, etc.)

If the population is divided into groups according to the characteristic being studied, then the following types of variance can be calculated for this population: total, group (within-group), average of group (average of within-group), intergroup.

First, the coefficient of determination is calculated. It shows what part of the total variation of the studied trait is intergroup variation, i.e. variation due to the grouping characteristic:

η² = δ² / σ².

The empirical correlation relationship characterizes the closeness of the connection between grouping (factorial) and performance characteristics.

The empirical correlation ratio can take values ​​from 0 to 1.

To assess the closeness of the connection based on the empirical correlation ratio, the Chaddock scale can be used: η of 0.1–0.3 indicates a weak connection; 0.3–0.5 moderate; 0.5–0.7 noticeable; 0.7–0.9 high; 0.9–0.99 very high.

Example 4. The following data is available on the performance of work by design and survey organizations of various forms of ownership:

Define:

1) total variance;

2) group variances;

3) the average of the group variances;

4) intergroup variance;

5) total variance based on the rule for adding variances;


6) coefficient of determination and empirical correlation ratio.

Draw conclusions.

Solution:

1. Let us determine the average volume of work performed by enterprises of two forms of ownership:

Let's calculate the total variance:

2. Determine group averages:

million rubles;

million rubles

Group variances:

;

3. Calculate the average of the group variances:

4. Let's determine the intergroup variance:

5. Calculate the total variance based on the rule for adding variances:

6. Let's determine the coefficient of determination:

.

Thus, 22% of the variation in the volume of work performed by design and survey organizations is due to the form of ownership of the enterprises (η² = 0.22).

The empirical correlation ratio is calculated using the formula

.

The value of the calculated indicator indicates that the dependence of the volume of work on the form of ownership of the enterprise is small.

Example 5. As a result of a survey of the technological discipline of production areas, the following data were obtained:

Determine the coefficient of determination

Along with studying the variation of a characteristic throughout the entire population as a whole, it is often necessary to trace quantitative changes in the characteristic across the groups into which the population is divided, as well as between groups. This study of variation is achieved by calculating and analyzing different types of variance.
There are total, intergroup and intragroup variances.
Total variance σ² measures the variation of a trait throughout the entire population under the influence of all the factors that caused this variation.

Intergroup variance (δ²) characterizes systematic variation, i.e. differences in the value of the studied trait that arise under the influence of the factor trait that forms the basis of the grouping. It is calculated by the formula:

δ² = Σ(x̄ⱼ − x̄)²·nⱼ / Σnⱼ.

Within-group variance (σⱼ²) reflects random variation, i.e. the part of the variation that occurs under the influence of unaccounted factors and does not depend on the factor attribute that forms the basis of the grouping. It is calculated by the formula:

σⱼ² = Σ(xᵢ − x̄ⱼ)² / nⱼ.

Average of the within-group variances: σ̄² = Σσⱼ²·nⱼ / Σnⱼ.

There is a law connecting the three types of dispersion: the total variance is equal to the sum of the average of the within-group variances and the intergroup variance:

σ² = σ̄² + δ².

This relationship is called the rule for adding variances.

A widely used indicator in analysis is the share of the intergroup variance in the total variance. It is called the empirical coefficient of determination (η²):

η² = δ² / σ².

The square root of the empirical coefficient of determination is called the empirical correlation ratio (η):

η = √(δ² / σ²).

It characterizes the influence of the characteristic that forms the basis of the grouping on the variation of the resulting characteristic. The empirical correlation ratio ranges from 0 to 1.
Let us demonstrate its practical use using the following example (Table 1).

Example No. 1. Table 1 - Labor productivity of two groups of workers in one of the workshops of NPO "Cyclone"

Let's calculate the overall and group means and variances:




The initial data for calculating the average of the within-group variances and the intergroup variance are presented in Table 2.

Table 2 – Calculation of σ̄² and δ² for two groups of workers

Worker groups                          Number of workers, people   Average output, parts/shift   Variance
Completed technical training                       5                          95                   42.0
Did not complete technical training                5                          81                  231.2
All workers                                       10                          88                  185.6
Let's calculate the indicators. Average of the within-group variances:

σ̄² = (42.0·5 + 231.2·5) / 10 = 136.6.

Intergroup variance:

δ² = ((95 − 88)²·5 + (81 − 88)²·5) / 10 = 49.

Total variance:

σ² = 136.6 + 49 = 185.6.

Thus, the empirical correlation ratio: η = √(49 / 185.6) ≈ 0.51.
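The calculations of this example can be reproduced with a short script based on the figures in Table 2:

```python
# Figures taken from Table 2 above: two groups of 5 workers each.
n = [5, 5]
group_mean = [95.0, 81.0]
group_var = [42.0, 231.2]

N = sum(n)
grand_mean = sum(m * k for m, k in zip(group_mean, n)) / N   # overall average

within = sum(v * k for v, k in zip(group_var, n)) / N        # average within-group
between = sum(k * (m - grand_mean) ** 2
              for m, k in zip(group_mean, n)) / N            # intergroup

total = within + between                                     # addition rule

eta_sq = between / total        # empirical coefficient of determination
eta = eta_sq ** 0.5             # empirical correlation ratio
```

The total obtained via the addition rule matches the 185.6 reported for all workers in Table 2, which confirms the decomposition.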

Along with variation in quantitative characteristics, variation in qualitative characteristics can also be observed. This study of variation is achieved by calculating the following types of variances:

The within-group dispersion of the share is determined by the formula

σᵢ² = pᵢ(1 − pᵢ),

and the average of the within-group dispersions of the share is σ̄²ₚ = Σpᵢ(1 − pᵢ)·nᵢ / Σnᵢ, where nᵢ is the number of units in the separate groups.

The intergroup dispersion of the share is δ²ₚ = Σ(pᵢ − p)²·nᵢ / Σnᵢ, where p is the share of the studied characteristic in the entire population, determined by the formula p = Σpᵢ·nᵢ / Σnᵢ.

The total dispersion of the share is σ²ₚ = p(1 − p). The three types of dispersion are related to each other as follows:

p(1 − p) = σ̄²ₚ + δ²ₚ.

This relation of variances is called the theorem of addition of variances of the trait share.
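A sketch of the addition theorem for the variance of a share, on hypothetical groups (each pair below is an invented group size and the number of units possessing the trait):

```python
# Hypothetical groups: (units in the group, units possessing the trait).
groups = [(50, 10), (30, 18), (20, 12)]

N = sum(n for n, _ in groups)
p = sum(m for _, m in groups) / N                 # overall share of the trait

# Average of the within-group share variances p_i * (1 - p_i)
within = sum(n * (m / n) * (1 - m / n) for n, m in groups) / N

# Intergroup variance of the share
between = sum(n * (m / n - p) ** 2 for n, m in groups) / N

# Addition theorem for the variance of a share: p(1-p) = within + between
assert abs(p * (1 - p) - (within + between)) < 1e-12
```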

However, this characteristic alone is not enough to study a random variable. Imagine two shooters firing at a target. One shoots accurately and hits close to the center, while the other is simply having fun and does not even aim. But here is the funny thing: his average result will be exactly the same as the first shooter's! This situation is conventionally illustrated by the following random variables:

The "sniper's" mathematical expectation is zero; however, for the "fun-lover" it is also zero!

Thus, there is a need to quantify how far the bullets (the values of the random variable) are scattered relative to the center of the target (the mathematical expectation). And "scattering", translated from Latin, is nothing other than dispersion, i.e. variance.

Let's see how this numerical characteristic is determined using one of the examples from the 1st part of the lesson:

There we found the disappointing mathematical expectation of this game, and now we have to calculate its variance, which is denoted by D(X).

Let's find out how far the wins/losses are "scattered" relative to the average value. Obviously, for this we need to calculate the differences between the values of the random variable and its mathematical expectation:

−5 − (−0.5) = −4.5
2.5 − (−0.5) = 3
10 − (−0.5) = 10.5

Now it would seem that we should sum up the results, but this will not do: fluctuations to the left will cancel out fluctuations to the right. For example, for the "amateur" shooter above the differences sum to zero, so we would get no estimate at all of the scatter of his shooting.

To get around this problem, one could consider the absolute values of the differences, but for technical reasons the approach that has taken root is to square them. It is more convenient to arrange the solution in a table:

Now we calculate the weighted average of the squared deviations. What is it? It is their expected value, which serves as a measure of scattering:

D(X) = M[(X − M(X))²]

This is the definition of variance. From the definition it is immediately clear that variance cannot be negative – take note for practice!

Let's remember how to find an expected value: we multiply the squared differences by the corresponding probabilities (a continuation of the table)
– figuratively speaking, each probability acts as a "weight" on its squared deviation –
and sum up the results:

Doesn't the result seem too big compared with the winnings? That's right: we squared the deviations, and to return to the dimension of our game we need to take the square root. This quantity is called the standard deviation and is denoted by the Greek letter "sigma":

σ(X) = √D(X)

This value is also called the root-mean-square deviation.

What is its meaning? If we deviate from the mathematical expectation to the left and to the right by one standard deviation, i.e. take the interval from M(X) − σ(X) to M(X) + σ(X), then the most probable values of the random variable will be "concentrated" in this interval. Which is what we actually observe:

However, it so happens that when analyzing scattering one almost always operates with the concept of dispersion. Let's figure out what it means in relation to games. If in the case of arrows we are talking about the “accuracy” of hits relative to the center of the target, then here dispersion characterizes two things:

Firstly, it is obvious that as the bets increase, so does the variance. If, for example, we increase the bets by a factor of 10, the mathematical expectation increases 10 times and the variance 100 times (since it is a quadratic quantity). Note that the rules of the game themselves have not changed! Only the stakes have: roughly speaking, where we used to bet 10 rubles, we now bet 100.

The second, more interesting point is that variance characterizes the style of play. Mentally fix the bets at some definite level, and let's see what is what:

A low variance game is a cautious game. The player tends to choose the most reliable schemes, where he does not lose/win too much at one time. For example, the red/black system in roulette (see example 4 of the article Random variables) .

A high-variance game, often simply called a dispersive game, is an adventurous or aggressive style of play in which the player chooses "adrenaline" schemes. Recall at least the "Martingale", in which the amounts at stake are orders of magnitude greater than in the "quiet" game of the previous point.

The situation in poker is indicative: there are so-called tight players who tend to be cautious and "tremble" over their playing funds (bankroll). Not surprisingly, their bankroll does not fluctuate much (low variance). On the contrary, a player with high variance is an aggressor: he often takes risks, makes large bets, and can either break a huge bank or lose everything.

The same thing happens in Forex, and so on - there are plenty of examples.

Moreover, in all cases it does not matter whether the game is played for pennies or for thousands of dollars: every level has its low- and high-variance players. And, as we remember, the average winnings are the "responsibility" of the expected value.

You have probably noticed that finding the variance is a long and painstaking process. But mathematics is generous and offers a shortcut:

Formula for finding variance:

D(X) = M(X²) − M²(X)

that is, the variance equals the expectation of the square of the random variable minus the square of its expectation. This formula is derived directly from the definition of variance, and we will immediately put it to use. Let me copy the table of our game from above:

and the found mathematical expectation.

Let's calculate the variance in the second way. First, let's find the mathematical expectation of the square of the random variable. By the definition of mathematical expectation:

In this case:

Thus, according to the formula:

As they say, feel the difference. And in practice, of course, it is better to use the formula (unless the condition requires otherwise).
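Both ways of computing the variance can be compared in a few lines. The values below are the game's wins/losses from the text above; the probabilities are assumed for illustration (chosen to sum to 1 and give M(X) = −0.5):

```python
# Wins/losses from the game above; probabilities are an assumption
# for this sketch (they sum to 1 and give M(X) = -0.5).
xs = [-5.0, 2.5, 10.0]
ps = [0.5, 0.4, 0.1]

m = sum(x * p for x, p in zip(xs, ps))                  # M(X)

# By definition: D(X) = M[(X - M(X))^2]
d_def = sum((x - m) ** 2 * p for x, p in zip(xs, ps))

# By the shortcut: D(X) = M(X^2) - M^2(X)
d_fast = sum(x * x * p for x, p in zip(xs, ps)) - m ** 2

sigma = d_def ** 0.5                                    # standard deviation
assert abs(d_def - d_fast) < 1e-12
```

The shortcut needs only M(X) and M(X²), which is why it is the preferred way in hand calculations.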

We master the technique of solving and designing:

Example 6

Find its mathematical expectation, variance and standard deviation.

This task is found everywhere and, as a rule, comes without a meaningful real-world interpretation.
You can imagine several light bulbs with numbers on them that light up in a madhouse with certain probabilities :)

Solution: It is convenient to summarize the basic calculations in a table. First, we write the initial data in the top two lines. Then we calculate the products, then and finally the sums in the right column:

Actually, almost everything is ready. The third line shows a ready-made mathematical expectation: .

We calculate the variance using the formula:

And finally, the standard deviation:
– Personally, I usually round to 2 decimal places.

All calculations can be carried out on a calculator, or even better - in Excel:

It's hard to go wrong here :)

Answer:

Those who wish can simplify their life even more and take advantage of my calculator (demo), which will not only instantly solve this problem but also build thematic graphics (we'll get there soon). The program can be downloaded from the library if you have downloaded at least one educational material, or obtained in another way. Thanks for supporting the project!

A couple of tasks to solve on your own:

Example 7

Calculate the variance of the random variable in the previous example by definition.

And a similar example:

Example 8

A discrete random variable is specified by its distribution law:

Yes, the values of a random variable can be quite large (an example from real work), and here, where possible, use Excel. The same goes, by the way, for Example 7: it's faster, more reliable and more enjoyable.

Solutions and answers at the bottom of the page.

To conclude the 2nd part of the lesson, we will look at another typical problem, one might even say a small puzzle:

Example 9

A discrete random variable can take only two values x₁ and x₂, with x₁ < x₂. The probability p₁, the mathematical expectation and the variance are known. Find the distribution law of the random variable.

Solution: Let's start with the unknown probability. Since the random variable can take only two values, the probabilities of the corresponding events sum to one:

p₁ + p₂ = 1,

and since p₁ is known, it follows that p₂ = 1 − p₁.

All that remains is to find x₁ and x₂ – easier said than done :) Well, here we go. By the definition of mathematical expectation:

M(X) = x₁p₁ + x₂p₂ – substitute the known quantities:

– and nothing more can be squeezed out of this equation, except that you can rewrite it in the usual direction:

or:

I think you can guess the next steps. Let's compose and solve the system:

Decimals are, of course, a complete disgrace; multiply both equations by 10:

and divide by 2:

That's better. From the 1st equation we express:
(this is the easier way)– substitute into the 2nd equation:


We square it and simplify:

Multiply by:

The result is a quadratic equation; we find its discriminant:
- Great!

and we get two solutions:

1) if x₁ equals the first root, then we get one value of x₂;

2) if x₁ equals the second root, then we get another.

The condition is satisfied by the first pair of values. With a high probability everything is correct, but, nevertheless, let’s write down the distribution law:

and perform a check, namely, find the expectation:

The main generalizing indicators of variation in statistics are dispersions and standard deviations.

Dispersion this arithmetic mean squared deviations of each characteristic value from the overall average. The variance is usually called the mean square of deviations and is denoted by  2. Depending on the source data, the variance can be calculated using the simple or weighted arithmetic mean:

σ² = Σ(xᵢ − x̄)² / n – unweighted (simple) variance;

σ² = Σ(xᵢ − x̄)²·fᵢ / Σfᵢ – weighted variance.

Standard deviation this is a generalizing characteristic of absolute sizes variations signs in the aggregate. It is expressed in the same units of measurement as the attribute (in meters, tons, percentage, hectares, etc.).

The standard deviation is the square root of the variance and is denoted by σ:

σ = √( Σ(xᵢ − x̄)² / n ) – unweighted standard deviation;

σ = √( Σ(xᵢ − x̄)²·fᵢ / Σfᵢ ) – weighted standard deviation.

The standard deviation is a measure of the reliability of the mean. The smaller the standard deviation, the better the arithmetic mean reflects the entire represented population.

The calculation of the standard deviation is preceded by the calculation of the variance.

The procedure for calculating the weighted variance is as follows:

1) determine the weighted arithmetic mean:

2) calculate the deviations of the options from the average:

3) square the deviation of each option from the average:

4) multiply the squares of deviations by weights (frequencies):

5) summarize the resulting products:

6) the resulting amount is divided by the sum of the weights:
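The six steps above can be sketched directly; the variants and frequencies are illustrative:

```python
# Illustrative variation series: variants and their weights (frequencies).
xs = [2.0, 4.0, 6.0, 8.0]   # variants
fs = [3, 5, 7, 5]           # weights (frequencies)

# 1) weighted arithmetic mean
mean = sum(x * f for x, f in zip(xs, fs)) / sum(fs)
# 2)-3) deviations of the variants from the mean and their squares
sq_dev = [(x - mean) ** 2 for x in xs]
# 4)-5) multiply the squared deviations by the weights and sum the products
weighted_sum = sum(d * f for d, f in zip(sq_dev, fs))
# 6) divide the resulting sum by the sum of the weights
variance = weighted_sum / sum(fs)

std_dev = variance ** 0.5   # the standard deviation follows from the variance
```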

Example 2.1

Let's calculate the weighted arithmetic mean:

The values ​​of deviations from the mean and their squares are presented in the table. Let's define the variance:

The standard deviation will be equal to:

If the source data are presented as an interval distribution series, you first need to determine a discrete value of the characteristic (usually the midpoint of each interval), and then apply the method described above.

Example 2.2

Let us show the calculation of variance for an interval series using data on the distribution of the sown area of ​​a collective farm according to wheat yield.

The arithmetic mean is:

Let's calculate the variance:

6.3. Calculation of variance using a formula based on individual data

The technique for calculating variance is involved, and with large values of variants and frequencies it can be cumbersome. The calculations can be simplified using the properties of dispersion.

The dispersion has the following properties.

1. Reducing or increasing the weights (frequencies) of a varying characteristic by a certain number of times does not change the dispersion.

2. Decrease or increase each value of a characteristic by the same constant amount A does not change the dispersion.

3. Decreasing or increasing each value of a characteristic by a factor of k respectively decreases or increases the variance by a factor of k², and the standard deviation σ by a factor of k.

4. The dispersion of a characteristic about an arbitrary value A is always greater than the dispersion about the arithmetic mean by the square of the difference between the average and that arbitrary value:

σ²_A = σ² + (x̄ − A)².

If A = 0, then we arrive at the following equality: the variance of the characteristic is equal to the mean square of the characteristic values minus the square of the mean (σ² = mean(x²) − (x̄)²).

Each property can be used independently or in combination with others when calculating variance.

The procedure for calculating variance is simple:

1) determine arithmetic mean :

2) square the arithmetic mean:

3) square each variant of the series: xᵢ²;

4) find the sum of squares of the options:

5) divide the sum of the squares of the options by their number, i.e. determine the average square:

6) determine the difference between the mean square of the characteristic and the square of the mean:
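A minimal sketch of this procedure on illustrative individual data:

```python
# Illustrative individual data for the shortcut:
# sigma^2 = (mean of the squares) - (square of the mean).
xs = [3.0, 5.0, 7.0, 9.0, 11.0]
n = len(xs)

mean = sum(xs) / n                       # 1)-2) the mean (then squared below)
mean_sq = sum(x * x for x in xs) / n     # 3)-5) mean of the squared variants
variance = mean_sq - mean ** 2           # 6) difference of the two

# Agrees with the direct definition via squared deviations
direct = sum((x - mean) ** 2 for x in xs) / n
assert abs(variance - direct) < 1e-12
```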

Example 3.1 The following data is available on worker productivity:

Let's make the following calculations: