Statistical Error and Confidence Intervals

Dec 22, 2022 · 3 min read

Statistical Errors

The difference between an estimated value and the true value is known as statistical error. Such errors may arise for a variety of reasons.

a. Errors of origin

These errors may occur due to inappropriate definitions, bias of investigators, or instability of collected data.

b. Errors of manipulation

These errors occur due to manipulation in measurement, counting and description.

c. Errors of inadequacy

These errors occur due to use of incomplete data and inadequate sample size.

Errors in statistics can further be divided into two groups: (a) experimental errors and (b) sampling errors. Experimental errors can in turn be divided into (i) measurement (also called systematic or observational) errors and (ii) sampling bias. Measurement errors arise from faulty measurement scales, mistakes made by the person measuring, or differences between inspectors. Sampling bias may arise from non-random selection of samples, location bias, or inspection bias.

Sampling errors include alpha and beta errors, margin of error, regression residuals, and sum of squared errors. Alpha error (Type I error) is rejecting a true null hypothesis; the significance level α is the probability of committing it, and the null hypothesis is rejected when the p-value falls below α. Beta error (Type II error) is the opposite: failing to reject a false null hypothesis. Alpha and beta errors influence each other: for a fixed sample size, reducing the probability of an alpha error increases the probability of a beta error. Margin of error (MOE) is half the width of the confidence interval. A larger value of alpha results in a smaller MOE, and vice versa. The residual error is the difference between the actual value of the dependent variable y and its regressed or predicted value. Standard error is defined as the standard deviation of the sampling distribution. An increase in sample size reduces the standard error and hence the size of sampling errors.
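The relationship between alpha, beta, and margin of error can be checked numerically. The following is a minimal sketch, assuming a z-based setting with a known standard deviation; the function names and the input values (a standard deviation of 12, a sample of 100, and a hypothetical true mean of 150 against a null mean of 147) are illustrative assumptions, not part of the original discussion.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def margin_of_error(sigma: float, n: int, alpha: float) -> float:
    """Half-width of a two-sided z-interval at significance level alpha."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


def beta_error(mu0: float, mu1: float, sigma: float, n: int, alpha: float) -> float:
    """Type II error probability of a one-sided z-test.

    H0: mean = mu0 against H1: mean > mu0, evaluated at a true mean mu1 > mu0.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    # H0 is rejected when the sample mean exceeds this critical value.
    critical = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= critical value | true mean is mu1)
    return NormalDist(mu1, se).cdf(critical)


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from a real study).
    for alpha in (0.10, 0.05, 0.01):
        moe = margin_of_error(sigma=12, n=100, alpha=alpha)
        beta = beta_error(mu0=147, mu1=150, sigma=12, n=100, alpha=alpha)
        print(f"alpha={alpha:.2f}  MOE={moe:.3f}  beta={beta:.3f}")
```

Under these assumptions, lowering alpha should give a larger margin of error and a larger beta, while a bigger sample shrinks the margin of error at any fixed alpha.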

Estimation and Hypothesis Testing

The two major objectives of statistical inference about a population from a sample are estimation and hypothesis testing. A good estimator of a population parameter should be unbiased, consistent, efficient, and sufficient.

Estimation can be point estimation or interval estimation. For instance, suppose an investigator wishes to estimate the average weight of apples in an orchard to inform a purchase decision. The investigator may conclude that the average weight is 147 gm, or that it lies somewhere between 147 gm and 157 gm. The first statement is a point estimate and the second is an interval estimate. Interval estimates are generally expressed in terms of probability. The interval above may be stated as: there is 95% confidence that the average weight of apples in the orchard is between 147 gm and 157 gm, or, more loosely, the probability that the average weight of apples in this orchard lies between 147 gm and 157 gm is 0.95.

The width of the confidence interval is affected by the heterogeneity of the population and by the sample size. As heterogeneity increases, the confidence interval becomes wider, and as the population becomes more homogeneous it becomes narrower. Likewise, smaller samples result in wider confidence intervals. Confidence intervals can be calculated using (i) the informal method, (ii) the traditional method, and (iii) bootstrapping. Let us walk through the process of creating an interval estimate using the z-distribution for large samples (30 or more) and the t-distribution for small samples (fewer than 30).
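To see how spread and sample size drive interval width, here is a rough sketch of the bootstrapping approach (the percentile variant, which is one of several). The data are synthetic: the helper `synthetic_weights`, its centre of 147, and all seeds and sizes are assumptions made purely for illustration. A more heterogeneous population or a smaller sample should give a wider interval.

```python
import random
from statistics import fmean


def bootstrap_ci(sample, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    tail = (1 - confidence) / 2
    lo = means[int(tail * n_resamples)]
    hi = means[int((1 - tail) * n_resamples) - 1]
    return lo, hi


def synthetic_weights(n, sigma, seed):
    """Hypothetical apple weights: normal draws centred on 147 (assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(147, sigma) for _ in range(n)]


def width(interval):
    lo, hi = interval
    return hi - lo


if __name__ == "__main__":
    # Same sample size, different spread (heterogeneity).
    for sigma in (6, 24):
        ci = bootstrap_ci(synthetic_weights(100, sigma, seed=1))
        print(f"sigma={sigma:>2}, n=100 -> width={width(ci):.2f}")
    # Same spread, different sample size.
    for n in (25, 400):
        ci = bootstrap_ci(synthetic_weights(n, 12, seed=2))
        print(f"sigma=12, n={n:>3} -> width={width(ci):.2f}")
```

The bootstrap needs no formula for the standard error, which is why it is often used when the sampling distribution is hard to derive; the trade-off is extra computation.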

Let us assume that a sample of 100 apples (a large sample) is collected from different trees in an orchard, with a sample mean weight of 147 gm and an assumed population standard deviation of 12 gm. The investigator wishes to create an interval estimate of the average weight of the apples in the orchard.

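The information given here is: n = 100, x̄ = 147 gm, σ = 12 gm, and a 95% confidence level, for which z = 1.96.

Confidence interval = x̄ ± z × σ/√n = 147 ± 1.96 × 12/√100 = 147 ± 2.35

It can be concluded that there is 95% confidence that the average weight of apples lies between 144.65 gm and 149.35 gm. Equivalently, about 95% of the intervals built this way from repeated samples of the orchard will contain the population mean, i.e., the average weight of apples in the orchard.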
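The arithmetic for the 100-apple example can be reproduced with a short sketch. It assumes the standard z-interval for a mean with known standard deviation; the function names are illustrative. The `coverage` helper additionally simulates repeated sampling from a hypothetical orchard whose true mean is set to 147 gm (an assumption made only to show what "95% confidence" refers to).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(mean, sigma, n, confidence=0.95):
    """Two-sided z-interval for a population mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    moe = z * sigma / sqrt(n)                           # margin of error
    return mean - moe, mean + moe


def coverage(true_mean, sigma, n, trials=2000, confidence=0.95, seed=1):
    """Fraction of simulated samples whose z-interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_interval(fmean(sample), sigma, n, confidence)
        hits += lo <= true_mean <= hi
    return hits / trials


if __name__ == "__main__":
    lo, hi = z_interval(mean=147, sigma=12, n=100)
    print(f"95% interval: {lo:.2f} gm to {hi:.2f} gm")
    # Hypothetical true mean of 147 gm, used only for the simulation.
    print(f"simulated coverage: {coverage(147, 12, 100):.3f}")
```

The interval works out to roughly 144.65 gm to 149.35 gm, and the simulated coverage should land close to 0.95, which is the sense in which about 95% of such intervals contain the population mean.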

Now let us consider a sample of 20 apples for which the population standard deviation is unknown. The sample mean (the sample statistic) of the 20 apples is 146 gm, and the sample standard deviation (s) is calculated as 8.19 gm. We then use the t-distribution to construct a confidence interval for the population parameter, i.e., the population mean or average weight of the apples in the orchard. In this case the information given is.