Selective observation in statistics. Marginal sampling error: how the relative marginal sampling error is determined

The main advantage of sampling, among others, is the ability to calculate random sampling error.

Sampling errors are either systematic or random.

Systematic errors arise when the basic principle of sampling, randomness, is violated. Random errors arise because the structure of the sample always differs from the structure of the general population, no matter how correctly the selection is made: even when units are selected strictly at random, the characteristics of the sample and of the general population still diverge. Studying and measuring these random errors of representativeness is the main task of the sampling method.

As a rule, the error of the mean and the error of the share are the ones most often calculated. The following notation is used in the calculations:

$\bar{x}$ - the mean calculated for the general population;

$\tilde{x}$ - the mean calculated for the sample;

p - the share of a given group in the general population;

w - the share of a given group in the sample.

Using this notation, the sampling errors for the mean and for the share can be written as the differences $\tilde{x} - \bar{x}$ and $w - p$.

The sample mean and the sample share are random variables that can take different values depending on which units of the population fall into the sample. Sampling errors are therefore also random variables and can take different values. For this reason the average of the possible errors, μ, is determined.

Unlike systematic, random error can be determined in advance, before sampling, according to the limit theorems considered in mathematical statistics.

The average error is determined with a probability of 0.683. In the case of a different probability, one speaks of a marginal error.

The mean sampling error for the mean and for the share is defined as follows:

$\mu_{\tilde{x}} = \sqrt{\frac{\sigma^2}{n}}$, $\mu_w = \sqrt{\frac{p(1-p)}{n}}$,

where n is the sample size.

In these formulas, the variance $\sigma^2$ and the share p are characteristics of the general population, which are unknown during selective observation. In practice they are replaced by the corresponding characteristics of the sample, on the basis of the law of large numbers, according to which a sufficiently large sample reproduces the characteristics of the general population accurately.

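Formulas for determining the average error for various selection methods:

Simple random and mechanical selection
repeated: mean error $\mu = \sqrt{\frac{\sigma^2}{n}}$; share error $\mu = \sqrt{\frac{w(1-w)}{n}}$
non-repeated: mean error $\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$; share error $\mu = \sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}$

Typical selection
repeated: mean error $\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}}$; share error $\mu = \sqrt{\frac{\overline{w_i(1-w_i)}}{n}}$
non-repeated: mean error $\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}\left(1 - \frac{n}{N}\right)}$; share error $\mu = \sqrt{\frac{\overline{w_i(1-w_i)}}{n}\left(1 - \frac{n}{N}\right)}$

Serial selection
repeated: mean error $\mu = \sqrt{\frac{\delta^2}{r}}$; share error $\mu = \sqrt{\frac{\delta_w^2}{r}}$
non-repeated: mean error $\mu = \sqrt{\frac{\delta^2}{r}\left(1 - \frac{r}{R}\right)}$; share error $\mu = \sqrt{\frac{\delta_w^2}{r}\left(1 - \frac{r}{R}\right)}$

Here:

μ - average error;

∆ - marginal error;

n - sample size;

N - size of the general population;

$\sigma^2$ - total variance;

w - share of the given category in the total sample size;

$\overline{\sigma_i^2}$ - average of the within-group variances (for a share, $\overline{w_i(1-w_i)}$);

$\delta^2$ - between-series (intergroup) variance (for a share, $\delta_w^2$);

r - number of series in the sample;

R - total number of series.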


The marginal error for all selection methods is related to the average sampling error as follows:

$\Delta = t\mu$,

where t is the confidence coefficient, functionally related to the probability with which the value of the marginal error is guaranteed. Depending on the probability P, the confidence coefficient t takes the following values:

t     P
1.0   0.683
1.5   0.866
2.0   0.954
2.5   0.988
3.0   0.997
4.0   0.9999
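
The pairs in this table can be reproduced numerically. The following is a minimal sketch, assuming that P is the two-sided normal probability P(|Z| ≤ t), which equals erf(t/√2); the function name `confidence_probability` is illustrative and does not come from the source.

```python
import math


def confidence_probability(t: float) -> float:
    """Two-sided normal probability P(|Z| <= t) for a confidence coefficient t."""
    return math.erf(t / math.sqrt(2))


# Print the probability for each tabulated confidence coefficient.
for t in (1.0, 1.5, 2.0, 2.5, 3.0, 4.0):
    print(f"t = {t:.1f}  P = {confidence_probability(t):.4f}")
```

Rounded to three or four decimal places, the printed probabilities should agree with the tabulated ones.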

For example, let the confidence probability be 0.683. This means that the general mean differs from the sample mean in absolute value by no more than μ with a probability of 0.683: if $\tilde{x}$ is the sample mean and $\bar{x}$ is the general mean, then $|\tilde{x} - \bar{x}| \le \mu$ with probability 0.683.

If we want to provide a higher probability of inference, we thereby increase the bounds of random error.

Thus, the value of the marginal error depends on the following quantities:

The variability of the characteristic, measured by its variance (direct relationship);

The sample size (inverse relationship);

The confidence probability (direct relationship);

The selection method.

An example of calculating the error of the mean and the error of the share.

To determine the average number of children in a family, 100 families were selected from 1000 families by random non-repetitive sampling. The results are shown in the table:

Determine:

- with a probability of 0.997, the marginal sampling error and the boundaries within which the average number of children in a family is located;

- with a probability of 0.954, the boundaries in which the proportion of families with two children is located.

1. Determine the marginal error of the mean with a probability of 0.997. To simplify the calculations, we use the method of moments:

P = 0.997, t = 3

μ is the average error of the mean; Δ = tμ = 0.116 is the marginal error.

2.12 - 0.116 ≤ $\bar{x}$ ≤ 2.12 + 0.116

2.004 ≤ $\bar{x}$ ≤ 2.236

Consequently, with a probability of 0.997, the average number of children in a family in the general population, that is, among the 1000 families, lies between 2.004 and 2.236.

Average sampling error

A sample can be formed on the basis of a quantitative characteristic of the statistical units, or on the basis of an alternative (attributive) characteristic. In the first case the summary characteristic of the sample is the sample mean, denoted $\tilde{x}$, and in the second it is the sample share, denoted w. In the general population the corresponding quantities are the general mean $\bar{x}$ and the general share p.

The differences $\tilde{x} - \bar{x}$ and $w - p$ are called the sampling error, which is divided into registration error and representativeness error. The first part arises from incorrect or inaccurate information caused by misunderstanding of the question or by carelessness of the registrar when filling out questionnaires, forms, etc. It is fairly easy to detect and fix. The second part arises from constant or spontaneous non-compliance with the principle of random selection. It is difficult to detect and eliminate and is much larger than the first, so the main attention is paid to it.

The value of the sampling error depends on the structure of the sample. For example, if, when determining the average grade of the students of a faculty, one sample includes more excellent students and another includes more failing students, then the sample average grades and the sampling errors will be different.

Therefore, in statistics, the average error of repeated and non-repeated sampling is determined as the standard deviation of the sample characteristic, according to the formulas

$\mu = \sqrt{\frac{D_v}{n}}$ - repeated; (1.35)

$\mu = \sqrt{\frac{D_v}{n}\left(1 - \frac{n}{N}\right)}$ - non-repeated; (1.36)

where Dv is the sample variance, determined for a quantitative characteristic by the usual formulas from Chapter 2, n is the sample size and N is the size of the general population.

For an alternative (attributive) characteristic, the sample variance is determined by the formula

$D_v = w(1 - w)$. (1.37)

It can be seen from formulas (1.35) and (1.36) that the average error is smaller for a non-repetitive sample, which determines its wider application.
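
As a rough illustration of formulas (1.35)-(1.37), the sketch below computes the average error for repeated and non-repeated selection. It assumes the usual forms √(Dv/n) and √(Dv/n · (1 − n/N)); the function names and the numbers in the demo are hypothetical and are not taken from the source.

```python
import math
from typing import Optional


def share_variance(w: float) -> float:
    """Sample variance of an alternative (yes/no) characteristic, Dv = w(1 - w)."""
    return w * (1.0 - w)


def mean_error(variance: float, n: int, N: Optional[int] = None) -> float:
    """Average sampling error.

    With N omitted the selection is treated as repeated: sqrt(Dv / n).
    With N given it is treated as non-repeated: sqrt(Dv / n * (1 - n / N)).
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    error_sq = variance / n
    if N is not None:
        if n > N:
            raise ValueError("sample size n cannot exceed population size N")
        error_sq *= 1.0 - n / N
    return math.sqrt(error_sq)


# Hypothetical numbers: share w = 0.2 in a sample of 100 units out of 1000.
dv = share_variance(0.2)
print(mean_error(dv, 100))        # repeated selection
print(mean_error(dv, 100, 1000))  # non-repeated selection, slightly smaller
```

The factor (1 − n/N) is below one whenever N is finite, which is why the non-repeated error is never larger than the repeated one.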

Marginal sampling error

Since a sample survey cannot give an exact estimate of the parameter under study (for example, the mean) in the general population, it is necessary to find the limits within which it lies. In a particular sample, the difference $\tilde{x} - \bar{x}$ can be greater than, less than, or equal to the average error μ, and each such deviation has a certain probability. In a sample survey, the real value in the general population is unknown. Knowing the average sampling error, it is possible to estimate, with a certain probability, the deviation of the sample mean from the general mean and to establish the limits within which the parameter under study (in this case, the mean) lies in the general population. The deviation of the sample characteristic from the general one, guaranteed with a given probability, is called the marginal sampling error. It is defined as a multiple of the average error, i.e.

$\Delta = t\mu$, (1.38)

where t is the confidence factor, which depends on the probability with which the marginal sampling error is determined.

The probability of a given sampling error is found using theorems of probability theory. According to the theorem of P. L. Chebyshev, with a sufficiently large sample size and a bounded population variance, the probability that the difference between the sample mean and the general mean is arbitrarily small is close to one:

$\lim_{n \to \infty} P(|\tilde{x} - \bar{x}| < \varepsilon) = 1$, where ε is an arbitrarily small positive number.

A. M. Lyapunov proved that, regardless of the distribution of the general population, as the sample size increases the probability distribution of the sample mean approaches the normal distribution. This is the so-called central limit theorem. Therefore, the probability of a deviation of the sample mean from the general mean, i.e. the probability of a given marginal error, also obeys this law and can be found as a function of t using the Laplace probability integral:

$P = \frac{1}{\sqrt{2\pi}} \int_{-t}^{t} e^{-z^2/2}\,dz$,

where $t = \frac{\tilde{x} - \bar{x}}{\mu}$ is the normalized deviation of the sample mean from the general mean.

The values of the Laplace integral for different t have been calculated and are available in special tables, from which the following combinations are widely used in statistics:

Probability

Given a specific level of probability, choose the value of the normalized deviation t and determine the marginal sampling error by the formula (1.38)

Most often P = 0.95 and t = 1.96 are used, i.e. it is assumed that with a probability of 95% the marginal sampling error is approximately twice the average error. For this reason t is sometimes called the multiplicity factor of the marginal error relative to the average error.
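
Going in the opposite direction, from a chosen probability to the confidence factor t, can be sketched as follows. This assumes the two-sided normal probability P(|Z| ≤ t), so that t is the normal quantile at (1 + P)/2; `confidence_factor` is an illustrative name, not one used in the source.

```python
from statistics import NormalDist


def confidence_factor(probability: float) -> float:
    """Value of t such that P(|Z| <= t) equals the given probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + probability) / 2.0)


# Confidence factors for several commonly quoted probabilities.
for p in (0.683, 0.95, 0.954, 0.997):
    print(f"P = {p}  t = {confidence_factor(p):.2f}")
```

The tabulated probabilities are themselves rounded, so the computed t for 0.954 or 0.997 comes out slightly below the round values 2 and 3.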

After the marginal error has been calculated, the confidence interval of the summary characteristic of the general population is found. For the general mean this interval has the form

$\tilde{x} - \Delta \le \bar{x} \le \tilde{x} + \Delta$, (1.39)

and similarly for the general share

$w - \Delta \le p \le w + \Delta$. (1.40)

Consequently, during selective observation, not one exact value of the generalizing characteristic of the general population is determined, but only its confidence interval with a given level of probability. And this is a serious shortcoming of the sampling method of statistics.

Determining the sample size

When a program of selective observation is developed, a specific value of the marginal error is sometimes set in advance together with a probability level. What remains unknown is the minimum sample size that provides the given accuracy. It can be obtained from the formulas for the average and marginal errors, depending on the type of sample. Substituting first formula (1.35) and then formula (1.36) into formula (1.38) and solving for the sample size, we obtain the following formulas:

for repeated selection: $n = \frac{t^2 D_v}{\Delta^2}$;

for non-repeated selection: $n = \frac{t^2 D_v N}{\Delta^2 N + t^2 D_v}$.

In addition, for quantitative characteristics one must also know the sample variance, but at the start of the calculations it is not known either. Therefore, it is approximated in one of the following ways:

it is taken from previous sample observations;

according to the rule that the range of variation R spans about six standard deviations ($R/\sigma = 6$ or $R/\sqrt{D} = 6$; hence $D = R^2/36$);

according to the "three sigma" rule, by which approximately three standard deviations fit into the average value ($\tilde{x}/\sigma = 3$; hence $\sigma = \tilde{x}/3$ or $D = \tilde{x}^2/9$).

When non-numerical characteristics are studied and there is not even approximate information about the sample share, w = 0.5 is assumed, which, according to formula (1.37), corresponds to the largest possible sample variance, Dv = 0.5(1 - 0.5) = 0.25.
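
The sample-size calculation described in this section can be sketched as below. The formulas assumed are n = t²·Dv/Δ² for repeated selection and n = t²·Dv·N/(Δ²·N + t²·Dv) for non-repeated selection, which follow from solving Δ = t·μ for n; the function name and the demo numbers are hypothetical.

```python
import math
from typing import Optional


def required_sample_size(t: float, variance: float, delta: float,
                         N: Optional[int] = None) -> int:
    """Minimum sample size for a marginal error delta at confidence factor t.

    Repeated selection (N omitted):  n = t^2 * Dv / delta^2
    Non-repeated selection:          n = t^2 * Dv * N / (delta^2 * N + t^2 * Dv)
    The result is rounded up, since rounding down would miss the target error.
    """
    if delta <= 0:
        raise ValueError("marginal error delta must be positive")
    if N is None:
        n = t ** 2 * variance / delta ** 2
    else:
        n = t ** 2 * variance * N / (delta ** 2 * N + t ** 2 * variance)
    # Round to 9 decimals first so floating-point noise does not add a unit.
    return math.ceil(round(n, 9))


# Hypothetical survey of a share with no prior information: w = 0.5, Dv = 0.25,
# t = 2 (P = 0.954), target marginal error of 5 percentage points.
print(required_sample_size(2, 0.25, 0.05))        # repeated selection
print(required_sample_size(2, 0.25, 0.05, 4000))  # non-repeated, N = 4000
```

Using Dv = 0.25 gives the most cautious (largest) sample size, because w(1 − w) cannot exceed 0.25.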

The marginal sampling error is equal to t times the mean sampling error, Δ = t·μ, where:

μ is the mean sampling error, calculated with the correction factor that is applied in the case of non-repeated selection;

t is the confidence factor, found for a given probability level. For example, for P = 0.997 the table of values of the Laplace integral function gives t = 3.

The value of the marginal sampling error can be established only with a certain probability. The probability of an error equal to or greater than three times the mean sampling error is extremely small: 0.003 (1 - 0.997). Such unlikely events are considered practically impossible, so the probability that the difference will exceed three times the mean error determines the error level and is no more than 0.3%.

Determining the marginal sampling error for shares

Condition:

From the finished products, 200 q were taken by simple random non-repeated selection, of which 8 q turned out to be spoiled. Can we assume with a probability of 0.954 that the loss of production will not exceed 5% if the sample is 1/20 of the total output?

Given:

  • n = 200 q - sample size (sample population)
  • m = 8 q - the quantity of spoiled products
  • n:N = 1:20 - the sampling proportion, where N is the size of the general population
  • P = 0.954 - probability

Determine: ∆ω < 5% (whether it is consistent that production losses will not exceed 5%)

Solution:

1. Determine the sample share, i.e. the share of spoiled products in the sample: ω = m/n = 8/200 = 0.04.

2. Determine the size of the general population:

N = n·20 = 200·20 = 4000 (q) - the quantity of all products.

3. Determine the marginal sampling error for the share of products with the given attribute, i.e. for the share of spoiled products: Δ = t·μ, where μ is the average error of the share for an alternative attribute, including the correction applied for non-repeated selection, and t is the confidence coefficient found from the table of values of the Laplace integral function for the given probability P = 0.954: t = 2. Here $\mu = \sqrt{\frac{\omega(1-\omega)}{n}\left(1 - \frac{n}{N}\right)} = \sqrt{\frac{0.04 \cdot 0.96}{200}\left(1 - \frac{200}{4000}\right)} \approx 0.0135$, so $\Delta_\omega = 2 \cdot 0.0135 = 0.027$.

4. Determine the boundaries of the confidence interval for the share of the alternative attribute in the general population, i.e. the share of spoiled products in the total volume. Since the share of spoiled products in the sample is ω = 0.04, then, taking into account the marginal error ∆ω = 0.027, the general share of the alternative attribute (p) will take the values:

ω - ∆ω < p < ω + ∆ω

0.04 - 0.027 < p < 0.04 + 0.027

0.013 < p < 0.067

Conclusion: with a probability P = 0.954 it can be stated that the share of spoiled products, when a larger volume is sampled, will not fall outside the found interval (not less than 1.3% and not more than 6.7%). However, the share of spoiled products may exceed 5%, reaching up to 6.7%, which is not consistent with the requirement that losses not exceed 5%.
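
The arithmetic of this example can be checked with a short script. It is only a sketch: it assumes the non-repeated share formula μ = √(ω(1 − ω)/n · (1 − n/N)) and uses variable names chosen here for illustration.

```python
import math

n = 200          # sample size, q
m = 8            # spoiled products in the sample, q
N = n * 20       # general population: the sample is 1/20 of it
t = 2            # confidence factor for P = 0.954

w = m / n                                       # sample share
mu = math.sqrt(w * (1 - w) / n * (1 - n / N))   # average error, non-repeated
delta = t * mu                                  # marginal error
lower, upper = w - delta, w + delta             # confidence interval bounds

print(f"w = {w:.3f}, mu = {mu:.4f}, delta = {delta:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
print("upper bound within 5%:", upper <= 0.05)
```

The last line makes the conclusion explicit: the 5% limit is met only if the upper bound of the interval does not exceed 0.05.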

*******

Condition:

The store manager knows from experience that 25% of the customers who enter the store make a purchase. Suppose there are 200 customers in the store.

Define:

  1. share of buyers who made purchases
  2. sample fraction variance
  3. sample share standard deviation
  4. the probability that the sample fraction will be between 0.25 and 0.30

Solution:

The sample share (ω) is taken as the general share (p), and the upper bound of the confidence interval is determined.
Knowing the critical point (by the condition, the sample share lies between 0.25 and 0.30), we construct a one-sided (right-sided) critical region.
From the table of values of the Laplace integral function we find Z.
This case can also be treated as repeated selection, provided that the same customer who bought nothing the first time may return and make a purchase.

If the sample is treated as non-repeated, the average error must be adjusted by the correction factor. Substituting the corrected marginal error for the sample share then changes Z and P when the critical region is determined.
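
One way to carry out the four requested calculations is sketched below. It assumes repeated selection, treats p = 0.25 as the general share, and uses the normal approximation for the sample share; these assumptions and the variable names are illustrative and are not stated in this form in the source.

```python
import math
from statistics import NormalDist

p = 0.25   # share of customers who buy (from the manager's experience)
n = 200    # number of customers treated as the sample

variance = p * (1 - p) / n        # variance of the sample share
std_dev = math.sqrt(variance)     # standard deviation (average error) of the share

# Normal approximation: the sample share is roughly N(p, std_dev^2).
share_dist = NormalDist(mu=p, sigma=std_dev)
probability = share_dist.cdf(0.30) - share_dist.cdf(0.25)

print(f"variance = {variance:.7f}, std dev = {std_dev:.4f}")
print(f"P(0.25 <= w <= 0.30) = {probability:.3f}")
```

For non-repeated selection the variance would additionally be multiplied by (1 − n/N), which requires knowing the size N of the general population.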

Determination of the marginal sampling error for the mean

According to the data of 17 employees of the firm employing 260 people, the average monthly salary was 360 USD, with s=76 USD. What is the minimum amount that must be deposited in the firm's account to guarantee the payment of wages to all employees with a probability of 0.98?

Given:

  • n = 17 - sample size (sample)
  • N = 260 - population size (general population)
  • $\tilde{x}$ = 360 - sample mean
  • s = 76 - sample standard deviation
  • P = 0.98 - confidence probability

Determine: the minimum allowable value of the general mean (the lower bound of the confidence interval).

The discrepancy between the values of indicators obtained from the sample and the corresponding parameters of the general population is called the representativeness error. A distinction is made between systematic and random sampling errors.

Random errors are explained by the insufficiently uniform representation of the various categories of units of the general population in the sample.

Systematic errors may be associated with a violation of the selection rules or the conditions for the implementation of the sample.

Thus, when surveying household budgets, the sampling frame was built for more than 40 years on the basis of the territorial-sectoral selection principle, which was due to the main goal of the budget survey - to characterize the standard of living of workers, employees and collective farmers. The sample was distributed among the regions and sectors of the economy of the RSFSR in proportion to the total number of employees; to create an industry sample, a typical sample was used with a mechanical selection of units within groups.

The main selection criterion was the average monthly salary. The principle of selection ensured proportional representation in the sample set of workers with different levels of wages.

With the advent of new social groups (entrepreneurs, farmers, unemployed), the representativeness of the sample was violated not only due to differences with the structure of the general population, but also due to a systematic error that arose due to a mismatch between the sampling unit (worker) and the observation unit (household) . A household with more than one working family member was also more likely to be selected than a household with one worker. Families not employed in the surveyed sectors fell out of the range of selected units (pensioner households, self-employed households, etc.). It was difficult to assess the accuracy of the results obtained (boundaries of confidence intervals, sampling errors), since probabilistic models were not used in the construction of the sample.

In 1996-1997 a fundamentally new approach to forming the sample of households was introduced. It was based on the data of the 1994 population microcensus. The general population for selection consisted of all types of households except collective households, and the sample was formed so as to be representative of the composition and types of households within each subject of the Russian Federation.

The measurement of errors in the representativeness of sample indicators is based on the assumption of a random nature of their distribution with an infinitely large number of samples.

Quantifying the reliability of a sample indicator is used to get an idea of the general characteristic. This is done either on the basis of the sample indicator, taking into account its random error, or on the basis of a certain hypothesis (about the value of the mean, the variance, the nature of the distribution, or a relationship) regarding the properties of the general population.

To test the hypothesis, the consistency of empirical data with hypothetical data is evaluated.

The magnitude of the random representativeness error depends on:

  • 1) the sample size;
  • 2) the degree of variation of the studied trait in the general population;
  • 3) the accepted method of forming a sample population.

There are mean (standard) and marginal sampling errors.

The average error characterizes the measure of deviation of the sample indicators from the corresponding indicators of the general population.

The marginal error is taken to be the maximum possible discrepancy between the sample and general characteristics, i.e. the maximum error for a given probability of its occurrence.

From the sample it is possible to estimate various indicators (parameters) of the general population. The most commonly used estimates are:

  • $\bar{x}$ - the general mean of the studied characteristic (for a multi-valued quantitative characteristic);
  • p - the general share (for an alternative characteristic).

The basic principle of the sampling method is to give every unit of the general population an equal chance of being selected into the sample. With this approach the requirement of random, objective selection is met and, therefore, the sampling error is determined primarily by the sample size (n). As the sample size increases, the average error decreases and the characteristics of the sample approach those of the general population.

With the same sample size and other conditions being equal, the sampling error will be smaller for the sample drawn from a general population with smaller variation of the studied characteristic. A decrease in the variation of a characteristic means a decrease in its variance ($\sigma^2$ for a quantitative characteristic or $w(1-w)$ for an alternative one).

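The dependence of the size of the sampling error on the method of forming the sample is determined by the formulas for the average sampling error (Table 5.2).

Table 5.2

Formulas for calculating the mean sampling error for various sampling methods

Simple random sample
repeated: for the mean $\mu = \sqrt{\frac{\sigma^2}{n}}$; for the share $\mu = \sqrt{\frac{w(1-w)}{n}}$
non-repeated: for the mean $\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$; for the share $\mu = \sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}$

Serial sample (with equal-sized series)
repeated: for the mean $\mu = \sqrt{\frac{\delta^2}{r}}$; for the share $\mu = \sqrt{\frac{\delta_w^2}{r}}$
non-repeated: for the mean $\mu = \sqrt{\frac{\delta^2}{r}\left(1 - \frac{r}{R}\right)}$; for the share $\mu = \sqrt{\frac{\delta_w^2}{r}\left(1 - \frac{r}{R}\right)}$

Typical sample (in proportion to the size of the groups)
repeated: for the mean $\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}}$; for the share $\mu = \sqrt{\frac{\overline{w_i(1-w_i)}}{n}}$
non-repeated: for the mean $\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}\left(1 - \frac{n}{N}\right)}$; for the share $\mu = \sqrt{\frac{\overline{w_i(1-w_i)}}{n}\left(1 - \frac{n}{N}\right)}$

Here n is the sample size, N is the size of the general population, r is the number of selected series, R is the total number of series, $\sigma^2$ is the variance, $\delta^2$ and $\delta_w^2$ are the between-series variances, and $\overline{\sigma_i^2}$ and $\overline{w_i(1-w_i)}$ are the averages of the within-group variances.

Let's supplement the indicators of Table 5.2 with the following explanations.

The sample variance is slightly less than the general one; it has been proved in mathematical statistics that

$\sigma^2_{gen} = \sigma^2_{sample} \cdot \frac{n}{n-1}$.

If the sample is large (i.e. n is large enough), the ratio $\frac{n}{n-1}$ approaches unity and the sample variance practically coincides with the general one.

A sample is considered unconditionally large when n > 100 and unconditionally small when n < 30. When the results of a small sample are evaluated, this relationship between the sample and general variances should be taken into account.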

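For a serial sample, the between-series variances can be calculated using the following formulas:

$\delta^2 = \frac{\sum (\tilde{x}_i - \tilde{x})^2}{r}$, where $\tilde{x}_i$ is the mean of the i-th series and $\tilde{x}$ is the overall mean for the entire sample;

$\delta_w^2 = \frac{\sum (w_i - w)^2}{r}$, where $w_i$ is the share of units of a certain category in the i-th series, w is the share of units of this category in the entire sample, and r is the number of selected series.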
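
The between-series variance and the resulting average error of a serial sample can be sketched as follows. The code assumes series of equal size and the forms δ² = Σ(x̃ᵢ − x̃)²/r and μ = √(δ²/r · (1 − r/R)); the function names and the demo values are hypothetical.

```python
import math
from typing import Optional, Sequence


def between_series_variance(series_values: Sequence[float]) -> float:
    """Mean squared deviation of the series means (or series shares)
    from their overall average, assuming series of equal size."""
    r = len(series_values)
    if r == 0:
        raise ValueError("at least one series is required")
    overall = sum(series_values) / r
    return sum((v - overall) ** 2 for v in series_values) / r


def serial_mean_error(series_values: Sequence[float],
                      R: Optional[int] = None) -> float:
    """Average error of a serial sample; pass R for non-repeated selection."""
    r = len(series_values)
    error_sq = between_series_variance(series_values) / r
    if R is not None:
        error_sq *= 1.0 - r / R
    return math.sqrt(error_sq)


# Hypothetical means of three selected series out of 30.
series_means = [10.0, 12.0, 14.0]
print(between_series_variance(series_means))
print(serial_mean_error(series_means, R=30))
```

The same two functions work for shares if the list holds the share of the category in each series instead of the series means.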

4. To determine the average error of a typical sample when units are selected in proportion to the size of each group, the average of the within-group variances ($\overline{\sigma_i^2}$ for a quantitative characteristic, $\overline{w_i(1-w_i)}$ for an alternative characteristic) serves as the indicator of variation. By the rule of addition of variances, the average of the within-group variances is less than the total variance. The average possible error of a typical sample is therefore less than the error of a simple random sample.

Combined selection is often used: individual selection of units is combined with group selection, and typical selection is combined with selection in series. With any selection method it can be stated, with a certain probability, that the deviation of the sample mean (or share) from the general mean (or share) will not exceed a certain value, which is called the marginal sampling error.

The relationship between the marginal sampling error (∆), guaranteed with some probability F(t), and the mean sampling error has the form $\Delta_{\tilde{x}} = t\mu_{\tilde{x}}$ or $\Delta_w = t\mu_w$, where t is the confidence coefficient, determined by the probability level F(t).

The values of the function F(t) and of t are determined from specially compiled mathematical tables. Some of the most commonly used values are:

t      F(t)
1.00   0.683
1.96   0.950
2.00   0.954
3.00   0.997

Thus, the marginal sampling error answers the question of sampling accuracy with a certain probability, whose value depends on the confidence coefficient t. At t = 1, the probability F(t) that the sample characteristics deviate from the general ones by no more than one mean error is 0.683. Consequently, on average, out of every 1000 samples, 683 will give summary indicators (mean, share) that differ from the general ones by no more than one mean error. At t = 2 the probability F(t) is 0.954, which means that out of every 1000 samples, 954 will give summary indicators that differ from the general ones by no more than twice the mean sampling error, and so on.
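
The "683 out of 1000" interpretation can be illustrated with a small simulation. This is only a sketch: the population is an artificial, deliberately skewed set of values generated here, the selection is repeated (with replacement), and the function name is hypothetical.

```python
import math
import random
import statistics


def share_within_one_mean_error(trials: int = 1000, n: int = 100,
                                seed: int = 0) -> float:
    """Fraction of repeated random samples whose mean lies within one
    mean error of the general mean."""
    rng = random.Random(seed)
    # Hypothetical skewed general population of 10 000 values.
    population = [rng.expovariate(1.0) for _ in range(10_000)]
    general_mean = statistics.fmean(population)
    mu = math.sqrt(statistics.pvariance(population) / n)  # repeated selection

    hits = 0
    for _ in range(trials):
        sample = rng.choices(population, k=n)  # sampling with replacement
        if abs(statistics.fmean(sample) - general_mean) <= mu:
            hits += 1
    return hits / trials


print(share_within_one_mean_error())
```

The printed fraction should be close to 0.683, although it varies with the seed and the number of trials.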

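Along with the absolute value of the marginal sampling error, the relative error is also calculated. It is defined as the percentage ratio of the marginal sampling error to the corresponding characteristic of the sample: $\Delta_{\%} = \frac{\Delta_{\tilde{x}}}{\tilde{x}} \cdot 100\%$ for the mean and $\Delta_{\%} = \frac{\Delta_w}{w} \cdot 100\%$ for the share.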

In practice, it is customary to set the value of ∆, as a rule, within 10% of the expected average level of the attribute.

Calculating the average and marginal sampling errors makes it possible to determine the limits within which the characteristics of the general population lie: $\tilde{x} - \Delta_{\tilde{x}} \le \bar{x} \le \tilde{x} + \Delta_{\tilde{x}}$ for the mean and $w - \Delta_w \le p \le w + \Delta_w$ for the share.

The limits within which the unknown value of the indicator in the general population lies with a given probability are called the confidence interval, and the probability F(t) is called the confidence probability. The higher the value of ∆, the wider the confidence interval and, consequently, the lower the accuracy of the estimate.

Consider the following example. To determine the average size of a deposit in a bank, 200 foreign currency accounts of depositors were selected using the method of repeated random sampling. As a result, it was found that the average deposit amount was 60 thousand rubles, the dispersion was 32. At the same time, 40 accounts turned out to be on demand. It is necessary, with a probability of 0.954, to determine the limits within which the average deposit amount on foreign currency accounts in the bank and the share of demand accounts are located.

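Calculate the mean error of the sample mean using the formula for repeated selection:

$\mu_{\tilde{x}} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{32}{200}} = 0.4$ (thousand rubles).

The marginal error of the sample mean with a probability of 0.954 (t = 2) will be

$\Delta_{\tilde{x}} = t\mu_{\tilde{x}} = 2 \cdot 0.4 = 0.8$ (thousand rubles).

Consequently, the average deposit in foreign currency bank accounts lies within the following limits (thousand rubles):

$60 - 0.8 \le \bar{x} \le 60 + 0.8$.

With a probability of 0.954, it can be stated that the average deposit in foreign currency bank accounts ranges from 59,200 to 60,800 rubles.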

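Let us determine the share of demand deposits in the sample:

$w = \frac{40}{200} = 0.2$.

The mean error of the sample share is

$\mu_w = \sqrt{\frac{w(1-w)}{n}} = \sqrt{\frac{0.2 \cdot 0.8}{200}} \approx 0.028$.

The marginal error of the share with a probability of 0.954 will be

$\Delta_w = t\mu_w = 2 \cdot 0.028 = 0.056$.

Thus, the share of demand accounts in the general population lies within the limits:

$0.2 - 0.056 \le p \le 0.2 + 0.056$.

With a probability of 0.954, it can be stated that the share of demand accounts in the total number of foreign currency accounts in the bank ranges from 14.4 to 25.6%.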
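
Both intervals of this example can be recomputed with a few lines of code. The sketch assumes repeated selection and t = 2 for P = 0.954; the variable names are chosen here for illustration.

```python
import math

n = 200            # accounts in the sample (repeated selection)
mean_deposit = 60  # sample mean, thousand rubles
variance = 32      # sample variance
on_demand = 40     # demand accounts in the sample
t = 2              # confidence factor for P = 0.954

# Mean deposit: average error, marginal error, interval bounds.
mu_mean = math.sqrt(variance / n)
delta_mean = t * mu_mean
mean_bounds = (mean_deposit - delta_mean, mean_deposit + delta_mean)

# Share of demand accounts: same steps with the share variance w(1 - w).
w = on_demand / n
mu_share = math.sqrt(w * (1 - w) / n)
delta_share = t * mu_share
share_bounds = (w - delta_share, w + delta_share)

print(f"mean: {mean_bounds[0]:.1f} to {mean_bounds[1]:.1f} thousand rubles")
print(f"share: {share_bounds[0]:.1%} to {share_bounds[1]:.1%}")
```

Without intermediate rounding the share bounds come out at about 14.3% and 25.7%; the 14.4% and 25.6% quoted in the text follow from rounding the mean error of the share to 0.028 before multiplying by t.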

In specific studies, it is important to establish the optimal ratio between the measure of the reliability of the results obtained and the size of the acceptable sampling error. In this regard, when organizing a sample observation, the question arises related to determining the sample size necessary to obtain the required accuracy of the results with a given probability. The calculation of the required sample size is carried out on the basis of the formulas for the marginal sampling error in accordance with the type and method of selection (Table 5.3).

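Table 5.3

Formulas for calculating the sample size with simple random selection

repeated selection: for the mean $n = \frac{t^2\sigma^2}{\Delta^2}$; for the share $n = \frac{t^2 w(1-w)}{\Delta^2}$
non-repeated selection: for the mean $n = \frac{t^2\sigma^2 N}{\Delta^2 N + t^2\sigma^2}$; for the share $n = \frac{t^2 w(1-w) N}{\Delta^2 N + t^2 w(1-w)}$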

Let's continue the example, which presents the results of a sample survey of personal accounts of bank depositors.

It is required to determine how many accounts need to be examined so that with a probability of 0.977 the error in determining the average deposit amount does not exceed 1.5 thousand rubles. From the formula for the marginal sampling error for repeated selection, we express the sample size: $n = \frac{t^2\sigma^2}{\Delta^2}$.

When the required sample size is determined using the above formulas, it is difficult to find the values of $\sigma^2$ and w, since these values can be obtained only after the sample survey. Therefore, instead of the actual values of these indicators, approximate ones are substituted, which can be determined from trial sample observations or from analogous previous surveys.

In cases where the statistician knows the average value of the characteristic being studied (for example, from instructions, legislative acts, etc.) or the limits within which this characteristic varies, the following approximate formulas can be applied:

$\sigma \approx \frac{1}{3}\bar{x}$; $\sigma \approx \frac{x_{max} - x_{min}}{6}$,

and the product w(1 - w) should be replaced by the value 0.25 (w = 0.5).

To be on the safe side, the maximum possible values of these indicators are taken. If the distribution of the characteristic in the general population follows the normal law, the range of variation R is approximately equal to 6σ (the extreme values lie 3σ from the mean on either side). Hence $\sigma \approx R/6$, but if the distribution is obviously asymmetric, then $\sigma \approx R/5$.

With any type of sample, the calculation of its size starts from the formula for repeated selection, $n = \frac{t^2\sigma^2}{\Delta^2}$.

If, as a result of the calculation, the sampling fraction (n as a percentage of N) exceeds 5%, the calculation is repeated using the formula for non-repeated selection.

For a typical sample, it is necessary to divide the total volume of the sample population between the selected types of units. The calculation of the number of observations from each group depends on the previously mentioned organizational forms of a typical sample.

When units are selected in a typical sample without regard to the size of the groups, the total number of selected units is divided by the number of groups, which gives the number of units to select from each typical group:

$n_i = \frac{n}{k}$,

where k is the number of typical groups distinguished.

When units are selected in proportion to the size of the typical groups, the number of observations for each group is determined by the formula

$n_i = n \cdot \frac{N_i}{N}$,

where $n_i$ is the sample size from the i-th group and $N_i$ is the size of the i-th group.

When the variation of the characteristic is taken into account, the percentage of the sample from each group should be proportional to the standard deviation in that group ($\sigma_i$). The number of units ($n_i$) is calculated by the formulas $n_i = n \cdot \frac{N_i\sigma_i}{\sum N_i\sigma_i}$ for the mean and $n_i = n \cdot \frac{N_i\sqrt{w_i(1-w_i)}}{\sum N_i\sqrt{w_i(1-w_i)}}$ for the share.
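
The three ways of dividing a typical sample among groups can be sketched as below. The assumed formulas are nᵢ = n/k (equal allocation), nᵢ = n·Nᵢ/N (proportional) and nᵢ = n·Nᵢσᵢ/ΣNᵢσᵢ (taking variation into account); the function names and the demo numbers are hypothetical, and the results are left unrounded so that the caller decides how to round them.

```python
from typing import List, Sequence


def equal_allocation(n: int, k: int) -> List[float]:
    """Same number of units from each of k groups: n_i = n / k."""
    return [n / k] * k


def proportional_allocation(n: int, group_sizes: Sequence[int]) -> List[float]:
    """Units in proportion to group size: n_i = n * N_i / N."""
    total = sum(group_sizes)
    return [n * size / total for size in group_sizes]


def variation_allocation(n: int, group_sizes: Sequence[int],
                         group_sds: Sequence[float]) -> List[float]:
    """Units in proportion to N_i * sigma_i: n_i = n * N_i*s_i / sum(N_i*s_i)."""
    if len(group_sizes) != len(group_sds):
        raise ValueError("group_sizes and group_sds must have the same length")
    weights = [size * sd for size, sd in zip(group_sizes, group_sds)]
    total = sum(weights)
    return [n * weight / total for weight in weights]


# Hypothetical population of three groups and a total sample of 100 units.
sizes = [500, 300, 200]
print(equal_allocation(90, 3))
print(proportional_allocation(100, sizes))
print(variation_allocation(100, sizes, [2.0, 4.0, 6.0]))
```

Groups with larger internal variation receive more observations under the third rule, even if they are not the largest groups.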

In serial selection, the required number of selected series is determined in the same way as in simple random selection:

Repeated selection: $r = \frac{t^2\delta^2}{\Delta^2}$

Non-repeated selection: $r = \frac{t^2\delta^2 R}{\Delta^2 R + t^2\delta^2}$

In this case, the variances and sampling errors can be calculated for the mean value or proportion of the trait.

When selective observation is used, its results can be assessed by comparing the obtained error limits of the sample indicators with the permissible error.

In this connection, the problem arises of determining the probability that the sampling error will not exceed the permissible error. Solving it comes down to calculating the quantity t from the formula for the marginal sampling error.

Continuing the example of the sample survey of personal accounts of bank customers, we find the probability with which it can be stated that the error in determining the average deposit size will not exceed 785 rubles. With the mean error $\mu = \sqrt{32/200} = 0.4$ thousand rubles,

$t = \frac{\Delta}{\mu} = \frac{0.785}{0.4} \approx 1.96$;

the corresponding confidence level is 0.95.

Currently, the practice of selective observation includes statistical observations carried out by:

  • - bodies of Rosstat;
  • – other ministries and departments (for example, monitoring of enterprises in the system of the Bank of Russia).

A well-known generalization of experience in organizing sample surveys of small enterprises, population and households is presented in the Methodological Provisions on Statistics. They give a broader concept of selective observation than discussed above (Table 5.4).

In statistical practice, all four types of samples are used, presented in Table. 5.4. However, preference is usually given to the probabilistic (random) samples described above, which are the most objective, since they can be used to assess the accuracy of the results obtained from the data of the sample itself.

Table 5.4

Sample types

In samples quasi-random type probabilistic selection is assumed on the basis that the expert considering the sample considers it acceptable. An example of the use of quasi-random sampling in statistical practice is the "Sampling Survey of Small Enterprises to Study Social Processes in Small Business", conducted in 1996 in some regions of Russia. Observation units (small enterprises) were selected expertly, taking into account the representation of economic sectors from the already formed sample of the survey of the financial and economic activities of small enterprises (the form "Information on the main indicators of the financial and economic activities of a small enterprise"). When summarizing the sample data, it was assumed that the sample set was formed by the method of simple random selection.

Direct use of expert judgment is the most common method of intentionally including units in a sample. An example of such a selection method is the monographic method, which involves obtaining information from only one observation unit that is typical in the opinion of the survey organizer (the expert).

Samples based on directional selection are implemented using an objective procedure, but without a probabilistic mechanism. The main array method is widely known: the sample includes the largest (most significant) units of observation, which make the main contribution to the indicator, for example to the total value of the characteristic that is the main purpose of the survey.
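The main array method can be illustrated with a short sketch. The text does not say how much of the total the selected units must cover, so the `coverage` threshold, the function name `main_array`, and the enterprise data are all hypothetical assumptions made for illustration.

```python
def main_array(values: dict, coverage: float) -> list:
    """Select the largest units until they cover the given share of the total.

    values maps a unit name to its value of the characteristic.
    coverage is an assumed threshold (for example 0.75); it is not
    prescribed by the text.
    """
    total = sum(values.values())
    selected, covered = [], 0.0
    # Walk through the units from largest to smallest.
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        selected.append(name)
        covered += value
    return selected


# Hypothetical turnover figures for five enterprises.
turnover = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}
print(main_array(turnover, coverage=0.75))
```

No probabilistic mechanism is involved: the same data always produce the same selection, which is why the accuracy of such a sample cannot be assessed with the random sampling error formulas.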

In statistical practice, a combined method of statistical observation is often used. The combination of continuous and selective observation methods has two aspects:

  • alternation in time;
  • their simultaneous use (part of the population is observed on a continuous basis, and part - selectively).

Alternating periodic sample surveys with relatively rare continuous surveys or censuses is necessary to clarify the composition of the population under study. This information is then used as the statistical basis for sample observation. Examples are population censuses and the household sample surveys conducted between them.

In this case, the following tasks need to be solved:

  • determining the set of characteristics recorded in continuous observation that make it possible to organize the sample;
  • substantiating the alternation periods, i.e. determining when the continuous data are no longer relevant and resources must be spent on updating them.

Simultaneous use within the framework of one survey of continuous and sample observations is due to the heterogeneity of the populations encountered in statistical practice. This is especially true for surveys of the economic activity of a set of enterprises, which are characterized by skewed distributions of the characteristics under study, when a certain number of units have characteristics that differ greatly from the bulk of the values. In this case, such units are observed on a continuous basis, and the other part of the population is observed selectively.
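One simple way to combine the two parts into a single estimate is sketched below. It assumes that the total of the characteristic is the target, that the large units are all observed, and that the remainder is represented by a simple random sample; the function name `combined_total` and all numbers are hypothetical.

```python
from statistics import fmean


def combined_total(take_all_values, sampled_values, remainder_size):
    """Estimate a population total from a combined observation.

    take_all_values: values of the units observed on a continuous basis.
    sampled_values:  values of the units sampled from the remainder.
    remainder_size:  number of units in the remainder (sampled part).
    """
    # Continuously observed units enter with their actual values; the
    # remainder is estimated as (number of units) * (sample mean).
    return sum(take_all_values) + remainder_size * fmean(sampled_values)


# Hypothetical data: 2 large enterprises observed completely,
# 3 small ones sampled out of 50.
estimate = combined_total([500, 300], [10, 20, 30], remainder_size=50)
print(estimate)
```

Only the second term is subject to sampling error, so the extreme units that would otherwise dominate the variance no longer affect the accuracy of the estimate.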

With this organization of observations, the main tasks are:

  • establishment of their optimal proportion;
  • development of methods for assessing the accuracy of the results.

A typical example of this aspect of the combined method is the general principle for surveying enterprises: large and medium-sized enterprises are surveyed mainly by complete observation, and small enterprises by sample observation.

Further development of the sampling methodology is carried out both in combination with the organization of continuous observation, and through the organization of special surveys, the conduct of which is dictated by the need to obtain additional information to solve specific problems. Thus, the organization of surveys in the field of conditions and living standards of the population is provided for in two aspects:

  • mandatory components;
  • additional modules within the integrated system of indicators.

Mandatory components may be annual surveys of income, expenditure and consumption (similar to household budget surveys), which also include basic indicators of the living conditions of the population. Every year, according to a special plan, the mandatory components should be supplemented with one-time surveys (modules) of the living conditions of the population, aimed at an in-depth study of any selected social topic from their total number (for example, household assets, health, nutrition, education, working conditions, housing conditions, leisure, social mobility, security, etc.) with varying frequency, determined by the need for indicators and resource capabilities.

Based on the values of the characteristics of the sample units, registered in accordance with the program of statistical observation, generalizing sample characteristics are calculated: the sample mean ($\tilde{x}$) and the sample share ($w$), i.e. the share of units that have some attribute of interest to the researchers in the total number of sample units.

The difference between the indicators of the sample and the general population is called sampling error.

Sampling errors, like errors of any other type of statistical observation, are divided into registration errors and representativeness errors. The main task of the sampling method is to study and measure random errors of representativeness.

The sample mean and sample share are random variables that can take on different values depending on which units of the population are in the sample. Sampling errors are therefore also random variables and can take on different values. For this reason, the average of the possible errors is determined.

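The average sampling error (μ) is equal to:

$\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma_x^2}{n}}$ for the mean; $\mu_w = \sqrt{\dfrac{R(1-R)}{n}}$ for the share,

where $\sigma_x^2$ is the variance of the characteristic in the general population, R is the share of units with a certain attribute in the general population, and n is the sample size.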

In these formulas, $\sigma_x^2$ and $R(1-R)$ are characteristics of the general population, which are unknown during sample observation. In practice, they are replaced by the corresponding characteristics of the sample on the basis of the law of large numbers, according to which a sufficiently large sample accurately reproduces the characteristics of the general population. Methods for calculating the average sampling errors for the mean and for the share in repeated and non-repeated selection are given in Table 6.1.

Table 6.1.

Formulas for calculating the mean sampling error for the mean and for the share

The value $(1 - n/N)$, where n is the sample size and N is the size of the general population, is always less than one, so the average sampling error with non-repeated selection is smaller than with repeated selection. When the sampling fraction n/N is small and this factor is close to one, the correction can be neglected.
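The effect of the correction can be checked numerically. The sketch below assumes the standard forms of the formulas: the mean error is the square root of the variance divided by n for repeated selection, and the same expression multiplied by (1 − n/N) under the root for non-repeated selection. The function names and the numbers are hypothetical.

```python
import math


def mean_error_repeated(variance: float, n: int) -> float:
    """Average sampling error for repeated selection."""
    return math.sqrt(variance / n)


def mean_error_nonrepeated(variance: float, n: int, N: int) -> float:
    """Average sampling error for non-repeated selection.

    Applies the correction factor (1 - n / N) under the square root.
    """
    return math.sqrt(variance / n * (1 - n / N))


def share_variance(w: float) -> float:
    """Variance of an alternative (yes/no) attribute with share w."""
    return w * (1 - w)


# Hypothetical survey: variance 2500, sample of 100 out of 2000 units.
rep = mean_error_repeated(2500, 100)
nonrep = mean_error_nonrepeated(2500, 100, 2000)
print(f"repeated: {rep:.3f}, non-repeated: {nonrep:.3f}")

# The same functions serve the share once its variance w(1 - w) is used.
share_rep = mean_error_repeated(share_variance(0.2), 100)
print(f"share, repeated: {share_rep:.3f}")
```

With a 5% sampling fraction the two errors differ by only a few percent, which illustrates why the correction is often dropped for small fractions.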

One can assert that the general mean or the general share will not go beyond the boundaries of the average sampling error only with a certain degree of probability. Therefore, to characterize the sampling error, the marginal sampling error (Δ) is calculated in addition to the average error; it is tied to the probability level that guarantees it.

The probability level (P) determines the value of the normalized deviation (t), and vice versa. Values of t are given in tables of the normal probability distribution. The most commonly used combinations of t and P are given in Table 6.2.

Table 6.2

Values of the normalized deviation t and the corresponding probability levels P

t   1.0    1.5    2.0    2.5    3.0    3.5
P   0.683  0.866  0.954  0.988  0.997  0.999

t is the confidence factor. It depends on the probability with which it can be guaranteed that the marginal error will not exceed t times the mean error, and it shows how many mean errors are contained in the marginal error. For example, if t = 1, then with a probability of 0.683 it can be stated that the difference between the sample and general indicators will not exceed one mean error.
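The pairs in Table 6.2 can be reproduced without a printed table. The sketch below assumes the normal approximation and uses `NormalDist` from the Python standard library (Python 3.8 or later); the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probability_from_t(t: float) -> float:
    """Probability that a normal deviation lies within +/- t mean errors."""
    return 2 * STD_NORMAL.cdf(t) - 1


def t_from_probability(p: float) -> float:
    """Confidence factor t for a given two-sided probability level."""
    return STD_NORMAL.inv_cdf((1 + p) / 2)


for t in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5):
    print(f"t = {t:.1f}  probability = {probability_from_t(t):.4f}")

print(f"probability 0.95 -> t = {t_from_probability(0.95):.2f}")
```

The inverse function is useful when the probability level is fixed in advance and t has to be found, as with the commonly used level of 0.95.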

Formulas for calculating the marginal sampling errors are given in Table 6.3.

Table 6.3.

Formulas for calculating the marginal sampling error for the mean and for the share

After the marginal sampling errors have been calculated, confidence intervals for the general indicators are found. The probability adopted when calculating the error of a sample characteristic is called the confidence level. A confidence level of 0.95 means that the error can go beyond the established limits in only 5 cases out of 100; a level of 0.954, in 46 cases out of 1000; and a level of 0.999, in 1 case out of 1000.

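For the general mean $\bar{x}$, the most probable boundaries within which it lies, taking into account the marginal error of representativeness, are:

$\tilde{x} - \Delta_{\tilde{x}} \le \bar{x} \le \tilde{x} + \Delta_{\tilde{x}}$

The most probable boundaries within which the general share R lies are:

$w - \Delta_w \le R \le w + \Delta_w$

where $\tilde{x}$ and $w$ are the sample mean and the sample share. Hence, the general mean is $\bar{x} = \tilde{x} \pm \Delta_{\tilde{x}}$ and the general share is $R = w \pm \Delta_w$.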
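A short worked sketch of these intervals follows. It assumes repeated selection, sample characteristics substituted for the unknown general ones, and a marginal error equal to t times the mean error; the function name `confidence_interval` and all survey figures are hypothetical.

```python
import math


def confidence_interval(estimate: float, t: float, mu: float):
    """Return (lower, upper) bounds: estimate -/+ marginal error t * mu."""
    delta = t * mu  # marginal sampling error
    return estimate - delta, estimate + delta


# Hypothetical sample: n = 400, sample mean 1200, sample variance 10000,
# sample share 0.2; t = 2 corresponds to a probability of 0.954.
n, t = 400, 2.0

mu_mean = math.sqrt(10000 / n)            # mean error of the sample mean
mu_share = math.sqrt(0.2 * (1 - 0.2) / n)  # mean error of the sample share

mean_low, mean_high = confidence_interval(1200, t, mu_mean)
share_low, share_high = confidence_interval(0.2, t, mu_share)

print(f"general mean:  {mean_low:.1f} .. {mean_high:.1f}")
print(f"general share: {share_low:.3f} .. {share_high:.3f}")
```

Raising t widens both intervals, which is the price of a higher confidence level for the same sample.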

The formulas given in Table 6.3 are used to determine sampling errors when selection is carried out by the proper random and mechanical methods.

With stratified selection, representatives of all groups necessarily fall into the sample, and usually in the same proportions as in the general population. Therefore, the sampling error in this case depends mainly on the average of the intragroup variances. Based on the rule for adding variances, we can conclude that the sampling error for stratified selection will always be less than for proper random selection.
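The rule for adding variances can be verified on a small example. The sketch below uses a hypothetical population split into three groups, treats the groups as weighted by their sizes, and, as a simplifying assumption, compares the two errors with the repeated-selection formula for the same sample size.

```python
import math
from statistics import fmean, pvariance

# Hypothetical population already divided into three groups (strata).
groups = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0, 26.0],
    [40.0, 41.0, 42.0],
]
values = [x for g in groups for x in g]
size = len(values)

total_var = pvariance(values)
# Mean of the within-group variances, weighted by group size.
within_var = sum(len(g) * pvariance(g) for g in groups) / size
# Between-group variance of the group means around the overall mean.
grand_mean = fmean(values)
between_var = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / size

n = 5  # hypothetical sample size
error_random = math.sqrt(total_var / n)       # proper random selection
error_stratified = math.sqrt(within_var / n)  # stratified selection

print(f"total = {total_var:.2f}, within = {within_var:.2f}, between = {between_var:.2f}")
print(f"random: {error_random:.3f}, stratified: {error_stratified:.3f}")
```

Because the total variance is the sum of the within-group and between-group parts, removing the between-group part can only reduce the error, and the gain is largest when the groups differ strongly from one another.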

With serial (nested) selection, the measure of variation is the intergroup (between-series) variance.
