? asked in Science & Mathematics > Mathematics · 1 decade ago

Why does the sample variance, for sample size n, differ from the population variance by factor n/(n-1)?

I can work out the algebra, but an intuitive ("physical") explanation would be a great help. Thnx.

2 Answers

  • 1 decade ago
    Favorite Answer

    The variance (σ²) should be calculated as the average squared difference from the population mean: Σ(Xi-μ)²/n and, on average, this gives the correct variance:

         E[Σ(Xi-μ)²/n] = σ².

    However, if you have only a sample, you do not know μ, and so you estimate it with the average from the sample: X=ΣXi/n, which is correct on average: E[X]=μ. However, X will normally differ slightly from μ. But the function F(a)=Σ(Xi-a)²/n has a minimum when a=X, not μ:

         0 = F'(a) = -Σ2(Xi-a)/n ⇒ a = ΣXi/n = X.

    That means F(X) ≤ F(μ), with equality only when the sample average, X, happens to equal the population average, μ. So the estimated variance, F(X), will be too small, on average. How much too small? How far from μ is X? You should know (and be able to calculate easily) that Var(X-μ) = Var(X) = σ²/n. This is the standard result for the mean of n independent observations; it is often quoted alongside the Central Limit Theorem but does not depend on it. The mean-square difference between X and μ, E[(X-μ)²], is σ²/n, so you might expect F(X) to be too small by σ²/n, on average: E[F(X)] ≈ σ²-σ²/n = [(n-1)/n]σ², or σ² ≈ [n/(n-1)]E[F(X)]. I leave it to you to check that F(μ)-F(X) actually does exactly equal (μ-X)². Taking expectations of both sides gives E[F(X)] = σ²-σ²/n exactly, so:

         σ² = E[(n/(n-1))F(X)] = E[Σ(Xi-X)²/(n-1)].
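
    If you would rather see this numerically than take the algebra on trust, here is a rough simulation sketch in Python. The names (f, simulate) and the parameter choices (normal data, n = 5, μ = 10, σ = 2, so σ² = 4) are my own assumptions for illustration and are not part of the question. It averages F(X) over many samples, applies the n/(n-1) correction, and also checks the identity F(μ) - F(X) = (μ - X)² on every sample.

```python
import random


def f(xs, a):
    """Mean squared deviation of xs about a: F(a) = sum((x - a)^2) / n."""
    return sum((x - a) ** 2 for x in xs) / len(xs)


def simulate(n=5, mu=10.0, sigma=2.0, trials=50_000, seed=0):
    """Average F(xbar) and (n/(n-1)) * F(xbar) over many normal samples.

    Also returns the largest absolute deviation seen from the identity
    F(mu) - F(xbar) = (mu - xbar)^2.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sum_biased = 0.0
    sum_corrected = 0.0
    max_identity_err = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        fx = f(xs, xbar)                      # divides by n
        sum_biased += fx
        sum_corrected += fx * n / (n - 1)     # divides by n - 1 instead
        err = abs((f(xs, mu) - fx) - (mu - xbar) ** 2)
        max_identity_err = max(max_identity_err, err)
    return sum_biased / trials, sum_corrected / trials, max_identity_err


biased, corrected, identity_err = simulate()
# With sigma^2 = 4 and n = 5, theory says the first average should be
# close to (n-1)/n * 4 = 3.2 and the second close to 4.
print("average of F(xbar):           ", round(biased, 3))
print("average of n/(n-1) * F(xbar): ", round(corrected, 3))
print("largest identity error:       ", identity_err)
```

    The identity error should be nothing more than floating-point rounding, which is the "exactly equal" claim above.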

  • 4 years ago

    The "formula" you cite isn't quite right; I think you mean the sum of (x - xbar)^2/n. But this "formula" is only true under the assumption of sampling from an infinite population, or at least when the sample size is very small relative to the population size. When you sample from a finite population, there is one more factor that is part of the variance calculation: (1 - n/N). So when n = N (you sample the entire population), the factor becomes 0 and therefore the variance is 0. The reason this should be intuitive is that when you sample the entire population, there is no sampling error: you have measured the entire population.
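
    A small check of the (1 - n/N) factor, as a sketch only. It assumes the answer is talking about the variance of the sample mean when sampling without replacement, for which the usual textbook form is (S²/n)(1 - n/N), with S² the population variance computed with an N-1 divisor. The toy population and the function names are made up for illustration. Because the population is tiny, every possible sample can be enumerated and the comparison done in exact fractions.

```python
from fractions import Fraction
from itertools import combinations


def variance_of_sample_mean(population, n):
    """Exact variance of the mean over all size-n samples without replacement."""
    N = len(population)
    pop_mean = Fraction(sum(population), N)
    # Every size-n subset is equally likely under simple random sampling.
    means = [Fraction(sum(c), n) for c in combinations(population, n)]
    return sum((m - pop_mean) ** 2 for m in means) / len(means)


def fpc_formula(population, n):
    """(S^2 / n) * (1 - n/N), where S^2 uses the N - 1 divisor."""
    N = len(population)
    pop_mean = Fraction(sum(population), N)
    s2 = sum((x - pop_mean) ** 2 for x in population) / (N - 1)
    return s2 / n * (1 - Fraction(n, N))


population = [2, 3, 5, 7, 11, 13]  # hypothetical toy population, N = 6
for n in range(1, len(population) + 1):
    exact = variance_of_sample_mean(population, n)
    formula = fpc_formula(population, n)
    print(n, exact, formula, exact == formula)
```

    At n = N there is only one possible sample, its mean is the population mean, and both columns come out as 0, which is the "no sampling error" point made above.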
