Sampling Theory

Oct 29, 2024 · 17 min read

Introduction

A researcher investigating a problem can either conduct desk research, which depends mainly on secondary data, or conduct field survey research, which depends mainly on primary data. Once the researcher decides on a field survey, there are two options to choose from: (a) a census survey or (b) a sample survey. In other words, the data from the field can be collected in one of two ways:

(i) The researcher can decide to collect data from all the units, respondents or people under investigation. This is known as a census survey.

(ii) The researcher can decide to collect data from a few units, respondents or people out of all the units, respondents or people under investigation. This is known as a sample survey.

The above discussion makes it clear that a sample is a part of the population. The size of the population is generally denoted by the capital letter “N” and the size of the sample by the small letter “n”. The concept of a sample can be understood from the following diagram:

figure - 1

For instance, the HR manager of an organization employing N = 2000 employees has the following research question under investigation.

RQ.1. What is the level of satisfaction amongst the employees about the leave policies adopted by the organization?

Here the population under study contains 2000 employees, also termed data points. Further, the research question itself hints that a field survey, i.e. taking the opinion of the employees, is the best way to answer it. After deciding on a field survey, the HR manager can follow one of two approaches: either collect data from all 2000 employees (data points) and reach a conclusion after analysing the data, or collect data from a few employees, say n = 200, also known as sample points, and answer the research question after analysing the collected data. The first approach, in which all the data points (employees) are investigated, is known as the census survey method, and the second approach, in which only the sample points (a few employees) are investigated to reach the conclusion, is known as the sampling method.

The population or Data Points

The population under investigation can be finite or infinite. A population that can be counted is known as a finite population. Examples are the number of car users in India, the number of management graduates and the number of national cricket teams in the world. Sometimes the population under investigation cannot be counted, or counting it becomes extremely complex owing to its nature or huge size. Such a population is treated as an infinite population. Examples are the population of the world (which cannot be counted precisely and can only be approximated), the grains in a harvest, the number of atoms per gram of a substance and the number of germs on the body.

The population under investigation can further be real or hypothetical. A population that consists of units that actually exist is known as a real population, for example the households of a city, car users and many more. A population that does not consist of actually existing units is known as a hypothetical population, for example the results of infinite tosses of a coin. The hypothetical population is also known as a theoretical population. As the classification suggests, the hypothetical population is useful in theoretical studies, while the real population is useful in real-life studies such as those in the social sciences.

When to Adopt Census over Sample?

A census investigation is preferred when the population is small, both in number and in geographical spread. The availability of time should also be checked, since a census investigation consumes more time than a sample investigation. The availability of funds is another constraint, as a census is comparatively costly. For instance, if RQ.1. is to be studied in an office that has only 25 employees, a census investigation makes more sense than a sample investigation.

Merits and Demerits of Census Study

A census study has several advantages, such as:

Intensive study

Many aspects can be covered in a single exercise of data collection whenever the census method is adopted. For example, the census department of India collects data on multiple items such as education, gender, income etc.

High Degree of accuracy and Reliability

Since the entire population is studied in this method and all data points are contacted, the reliability and degree of accuracy of the study remain very high.

Only method applicable

Sometimes it becomes impossible to draw a representative sample from the population due to its heterogeneous nature. In such cases the census method is the only method available.

Despite its higher reliability, intensity and applicability, the census method has certain disadvantages as well:

Time, money and resource consuming

The census method is expensive in terms of time, money and resources in comparison to the sample method. Since a far larger number of units has to be contacted than in a sample, the method becomes highly time consuming, monetarily expensive and resource intensive.

Non-sampling Errors

Statistical errors due to non-response, poor measurement, poor methods of data collection, personal bias, incomplete coverage of the population, editing errors etc. can still occur in the census method.

Objectives, Merits and Demerits of Sample Study

Sampling is a scientific method. The conclusion is drawn on the basis of a properly drawn sample that fairly represents the characteristics of the population. At times a fairly representative sample provides a more accurate result than a census survey.

A sampling survey has the following objectives:

To estimate the population parameter

A population parameter is a statistical measure of the population under investigation, such as the mean of the population, the mode of the population, the nature of the population etc. To estimate such a parameter, a sample is drawn and the parameter is estimated from it. For example, to know the modal weight of the oranges in an orchard, a researcher may draw a proper sample that is fairly representative of the population of oranges and estimate the desired mode. These studies come under the “Theory of Estimation”.
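The idea of estimating a parameter from a sample can be sketched in a few lines of Python. This is only an illustration: the population of 5,000 orange weights is generated artificially, the function name `estimate_mean` is hypothetical, and the mean is estimated instead of the mode because it is simpler to compute for continuous weights.

```python
import random
import statistics

def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)   # drawn without replacement
    return statistics.mean(sample)

# Hypothetical population: weights (in grams) of 5,000 oranges.
generator = random.Random(42)
orange_weights = [generator.gauss(150, 20) for _ in range(5000)]

population_mean = statistics.mean(orange_weights)   # the parameter (normally unknown)
sample_mean = estimate_mean(orange_weights, 200)    # the estimate from n = 200

print(f"population mean: {population_mean:.1f} g")
print(f"sample estimate: {sample_mean:.1f} g")
```

With a sample of 200 the estimate should land close to the population mean, though it will rarely match it exactly.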

Testing of hypothesis

Testing of hypothesis is another objective of a sample study. The truthfulness of an assumption or hypothesis may be checked, with some probability, using a sample study. For example, a researcher may have an assumption or hypothesis about the modal weight of the oranges in an orchard. The researcher may draw a fairly representative sample from the population of oranges to check the degree of truthfulness of that hypothesis. These studies come under “Testing of Hypothesis”.

Statistical quality control

During any production process it is very important to keep the process under check so that the desired quality is produced. To ensure this, samples are drawn frequently from the production line and the quality of the output is measured. These studies come under “Statistical Quality Control”.

It is evident that the importance of sampling is immense. Sampling surveys have many advantages, some of which are listed below:

Reduced cost and greater speed

In a sampling survey the number of units under study is smaller, hence the collection of data requires less time and less cost in comparison to a census survey.

Detailed Enquiry and administrative convenience

Since the number of units under study is smaller than in a census survey, more time can be allotted to each unit and supplementary information can be obtained. The reduced number of units also ensures administrative convenience.

Greater accuracy

Since the method of enquiry is scientific, it can lead to greater accuracy. The results of a sample survey are generally stated in terms of probability, hence they account for the fluctuation of the output as well.

Applicable in most of the cases

The sampling survey method is applicable in most cases. Sometimes the complete population under investigation is not available at all, but a sample can usually be obtained for studying the population.

Despite its merits, the sample survey has several disadvantages. Some of them are:

Illusory conclusion

Since not all the units are contacted in a sample survey, the results are always arguable, and it can be said that the sample may provide an illusory conclusion.

Difficult or impossible to draw a representative sample

Drawing a sample that is truly representative of the population is a difficult and challenging task. The higher the heterogeneity in the population, the more difficult it becomes to draw a representative sample. Personal bias of the investigator may also make the sample non-representative. Sometimes, due to very high heterogeneity in the population, it becomes almost impossible to draw a representative sample.

Specialized knowledge is required

The sampling process is not as straightforward as a census. In a census survey the question of how the units should be selected for study does not arise, while a sample survey requires expertise in drawing a proper and fairly representative sample.

Sampling errors

There is always a possibility of sampling errors due to non-representativeness, bias, chance variation and so on.

Census vs sample investigation

Table -1 (Census investigation vs sample investigation)

	Census	Sample
Nature of enquiry	Extensive	Limited
Economy	Expensive	Economical
Speed	Less	More
Accuracy	May be 100%	Cannot be 100%
Nature of Errors	Non-sampling	Both
Suitability	Small population	Large population
Administration	Difficult	Easy

Defining a Good Sample

As discussed earlier, a sample investigation has multiple benefits and is more scientific in nature. The first step towards conducting a sample investigation is to draw a correct sample. The following are the requisites of a good sample.

1. Representativeness

A sample should be representative of the population, i.e. the heterogeneous nature of the population should be reflected correctly in the sample. A poorly representative sample leads to poor conclusions. It is easy to draw a representative sample from a homogeneous population, but as the degree of heterogeneity increases it becomes difficult to draw a representative sample.

2. Independence of selected sampling units

The units selected for the sample should be independent of each other, i.e. the data produced by one sample unit should not be influenced by other sample units. The independence of sample units reduces the chances of group bias.

3. Sample should be Adequate

Sufficiency of the sample is very important. A sample smaller than the required size produces defective results and a sample larger than the required size produces generalization errors. An unnecessarily large sample also increases cost and causes administrative inconvenience.

4. Sample should be in accordance with the objective

Sample units should always be in accordance with the objective of the study. A sample that is not aligned with the objectives may lead to meaningless results.

Methods of Sampling

After defining the requisites of a good sample, the methods of sampling are described next. The following figure presents the methods of sampling:

Figure -2

Random sampling

If every unit in the population has an equal chance of selection, the sample is called a random sample. For instance, if a sample of 10 units is to be drawn from a population of 60, the chance of each unit being selected is 10/60, i.e. 1/6 or 16.67%.
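The arithmetic above can be checked with a small Python sketch. The function name `inclusion_probability` is only illustrative; it assumes a simple random sample drawn without replacement, where every unit has the chance n/N of being selected.

```python
from fractions import Fraction

def inclusion_probability(population_size, sample_size):
    """Chance that any one unit is selected in a simple random sample."""
    return Fraction(sample_size, population_size)

p = inclusion_probability(60, 10)
print(p, f"{float(p):.2%}")   # 1/6 16.67%
```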

Unrestricted Random sample

The method of drawing an unrestricted random sample is largely practised when the population is highly homogeneous. In contemporary times the job of drawing one or many random samples from a population can easily be executed by computers.

There are also various traditional methods in practice for drawing an unrestricted random sample. Some of them are as follows:

1. Lottery Method

The lottery method means that any unit can get a chance to be selected for the sample. It can be used in one of the following ways:

a. Roulette Wheel Method

In the roulette wheel method a wheel with a pointer (figure -3) is used to collect the desired sample. Let us understand it through an example.

Figure -3

Let us consider a researcher who wishes to find out the popularity of IPL cricket teams amongst the students of a certain educational institute. The institute has 500 students (the population under study) and it is not possible to collect data from all of them. However, the researcher has a list of the students and decides to collect data from 50 students (the sample). The challenge is how to select a random sample from the population. The researcher decides to use the roulette wheel method and allocates serial numbers from 1 to 500 to the students on the list. The researcher then spins the wheel three times to get the first sample unit, as the population size is a three-digit number. The pointer stops at 1, 8 and 0 in three successive spins, so student number 180 is selected for the sample. In the second set of spins the first number to appear is 6. This set is discarded, as a serial number cannot start with a digit greater than 5. In the third set the researcher gets 0, 9 and 7 in successive spins, so student number 097 becomes part of the sample. The process continues till the sample of 50 units is collected. Repeated numbers, and any other number that falls outside 1 to 500, are also discarded.
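The spinning procedure can be imitated on a computer. The sketch below is an assumption-laden illustration, not the only way to do it: the wheel is taken to show the digits 0 to 9 with equal chance, the function name `roulette_wheel_sample` is hypothetical, and for simplicity all three spins are completed before a number is checked and discarded.

```python
import random

def roulette_wheel_sample(population_size, sample_size, seed=None):
    """Select serial numbers by spinning a 0-9 wheel once per digit."""
    rng = random.Random(seed)
    digits = len(str(population_size))        # 500 -> three spins per unit
    selected = []
    while len(selected) < sample_size:
        spins = [rng.randint(0, 9) for _ in range(digits)]
        serial = int("".join(map(str, spins)))
        # Discard numbers outside 1..population_size and repeated numbers.
        if 1 <= serial <= population_size and serial not in selected:
            selected.append(serial)
    return selected

sample = roulette_wheel_sample(500, 50, seed=1)
print(sorted(sample))
```

Because every valid serial number is equally likely on each set of spins, discarding the invalid sets does not favour any student.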

b. Chit System

The chit system is one of the most common methods of unrestricted random sampling. In this method serial numbers are allotted to all population units and written individually on identical chits, and the chits are dropped into a box. A number of chits equal to the desired sample size is then drawn blindly from the box. The chit system can be observed in lucky draws at many places.
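A computer version of the chit system might look like the following sketch. The function name `chit_draw` is hypothetical, and shuffling a list stands in for mixing the chits in the box.

```python
import random

def chit_draw(population_size, sample_size, seed=None):
    """Write one chit per serial number, mix the box, draw blindly."""
    rng = random.Random(seed)
    chits = list(range(1, population_size + 1))   # one chit per unit
    rng.shuffle(chits)                            # mix the box
    return chits[:sample_size]                    # drawn without replacement

drawn = chit_draw(500, 50, seed=2)
print(drawn)
```

Since a chit leaves the box once it is drawn, no unit can appear twice in the sample.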

2. Tippett’s Method

Leonard Henry Caleb Tippett published a table of random numbers in 1927, containing more than 10,400 random numbers (also known as Tippett’s table of random numbers). This table can be used effectively to pick a random sample. The process can be understood with the following example. Suppose that in the example mentioned above the researcher wishes to select the sample units by using Tippett’s table. First the entire population is numbered from 001 to 500. Then a row number and a column number are chosen arbitrarily to locate a random number in the table, called the starting point. Let us pick the number located at the 9th row and 4th column of the table (Figure -4).

Figure -4

From the starting point all the numbers falling in the row are taken. In this case the starting point is 05028 and the subsequent numbers in the row are 30033, 53381, 23656, 75787, 59223, 92345, 31890, 95712 and 08279. These numbers can be rewritten without breaks as:

05028300335338123656757875922392345318909571208279

Since the largest serial number in the population is three digits long, the digits are arranged in groups of three from the beginning in the following way:

050, 283, 003, 353, 381, 236, 567, 578, 759, 223, 923, 453, 189, 095, 712, 082, 79

The first sample unit is the 50th person, the second sample unit is the 283rd person and so on. Numbers exceeding 500, i.e. 567, 578, 759, 923 and 712, are discarded, as is the leftover two-digit number 79. Hence 11 sample units are collected by following the process once. The process is then repeated with a new random starting point and continues until all 50 sample units are identified. While repeating the process, repeated sample units are also discarded.
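The grouping and discarding steps can be sketched in Python using the row of numbers quoted above. The function name `three_digit_units` is hypothetical; the logic simply joins the digits, cuts them into groups of three, and keeps the groups that are valid serial numbers.

```python
def three_digit_units(random_numbers, population_size, width=3):
    """Cut a run of random digits into fixed-width serial numbers."""
    digits = "".join(random_numbers)
    groups = [digits[i:i + width] for i in range(0, len(digits), width)]
    units = []
    for group in groups:
        if len(group) < width:          # leftover digits at the end
            continue
        serial = int(group)
        # Keep only valid serial numbers that have not appeared before.
        if 1 <= serial <= population_size and serial not in units:
            units.append(serial)
    return units

row = ["05028", "30033", "53381", "23656", "75787",
       "59223", "92345", "31890", "95712", "08279"]
units = three_digit_units(row, 500)
print(units)        # [50, 283, 3, 353, 381, 236, 223, 453, 189, 95, 82]
print(len(units))   # 11
```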

Merits and Demerits of Random sampling

One of the great merits of random sampling methods is that they are easy to use and free from personal bias. At times the universe or population gets fairly represented by the sample collected this way. These methods embody the definition of random selection at its best, as in all three methods discussed above every unit of the population has an equal chance of being selected for the sample.

The method comes with certain demerits as well. One of its biggest limitations is that it is most effective only when the population is highly homogeneous. As the degree of homogeneity reduces, the accuracy of the method decreases and it may fail to pick a representative sample from the population. The method is also impractical for collecting data from a population that is widely spread geographically. Further, if the size of the sample is very small, the sample may not represent the population well.

Restricted Random Sampling

The sampling procedure here also remains random, but with some restrictions. The method is more useful in cases where homogeneous groups exist within a heterogeneous population. The following are the methods of restricted random sampling:

Stratified Random Sampling

When the population is highly heterogeneous and a random sample has to be selected from it, the challenge before the researcher is to select a representative sample. In such cases stratified random sampling is applied. The strata are the homogeneous groups within the heterogeneous population. Each stratum carries a homogeneous property of its sample units. It can be understood with the following example.

Let us consider that Bajaj Auto wishes to find out the level of satisfaction with its motorbikes in a particular township of 1100 bike users. Depending on engine capacity, Bajaj produces bikes mainly in three categories: 100cc, 150cc and 300cc. The parameters of satisfaction for a user are likely to depend on the engine capacity of the bike, i.e. the satisfaction of a 100cc bike user will hover around mileage and low maintenance, while the satisfaction of a 300cc bike user will revolve around power and speed. From the sales records Bajaj Auto knows that there are 600 users of 100cc bikes, 400 users of 150cc bikes and 100 users of 300cc bikes. Time and resources do not permit a census study, hence Bajaj Auto decides to conclude the project with a sample of about 100 bike users. If the 100 units are selected by unrestricted random sampling, the 100cc bike users are likely to dominate the sample because of their large share of the population, and the 300cc bike users may be represented by very few units or, in an extreme case, by none at all. Picking a representative sample therefore becomes a challenging task. To counter the situation the researcher uses the stratified sampling technique and divides the population into three strata: the first stratum consists of the 600 users of 100cc bikes, the second of the 400 users of 150cc bikes and the third of the 100 users of 300cc bikes. Further, 33 units can be selected randomly from each stratum to obtain a sample of 99, i.e. nearly 100, units.

Here a highly heterogeneous population is divided into homogeneous strata with a common property amongst the units of each stratum, i.e. engine capacity in this case, and each unit within a stratum has an equal chance of selection. That is how the property of random selection is maintained.
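The selection of 33 units from each stratum can be sketched as follows. The user labels are invented placeholders standing in for a real source list, and the function name `stratified_sample` is hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw the same number of units at random from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, per_stratum)
            for name, units in strata.items()}

# Placeholder source lists for the three strata of bike users.
strata = {
    "100cc": [f"100cc-user-{i}" for i in range(1, 601)],
    "150cc": [f"150cc-user-{i}" for i in range(1, 401)],
    "300cc": [f"300cc-user-{i}" for i in range(1, 101)],
}

sample = stratified_sample(strata, per_stratum=33, seed=7)
total = sum(len(units) for units in sample.values())
print(total)   # 99
```

Within each stratum every unit has the same chance of selection, so the draw stays random while every engine category is guaranteed a place in the sample.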

Merits and Demerits of Stratified Sampling

One of the major advantages of this method is that it is highly suitable where the respondents are highly skewed or heterogeneous in nature. The method also allows the different strata to be compared straight away. The data are highly representative of the heterogeneous population in this case.

This method also has certain limitations. Dividing the population into strata requires complete knowledge of its heterogeneous characteristics. Wrong formulation of the strata can mislead the results at the end. If the population is very small, it is very difficult to design the strata.

Cluster sampling

This method of sampling is highly suitable and effective when the sample is to be collected from a large geographical area. The area is first divided into clusters and then a few clusters are selected for data collection by using a random sampling method. The clusters should be formed in such a way that each cluster is representative of the population, i.e. each cluster covers the heterogeneous characteristics of the population. For instance, suppose a food chain operating in Faridabad city of Haryana wants to test the taste of a newly launched product in the city. The company can divide the entire city into a certain number of clusters. Two or three clusters can then be selected at random, and the food chain stores of those areas are covered for data collection. A pictorial representation of the same can be seen in figure -5.

Figure -5

The biggest challenge of this method is to divide the geographical area into clusters that are similar to one another, with each cluster containing the heterogeneous characteristics of the population. The advantage of the method lies in capturing the heterogeneity of the population within each cluster, while the difficulty lies in producing such representative clusters out of a geographical distribution. The mechanism of cluster sampling looks similar to stratified sampling, but in stratified sampling an individual unit is the sampling unit, while in cluster sampling a whole cluster is the sampling unit. Apart from that, in stratified sampling a heterogeneous population is divided into homogeneous strata, while in cluster sampling it is divided into heterogeneous clusters.
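A minimal sketch of cluster sampling follows, assuming the city has already been divided into zones. The zone names, outlet labels and the function name `cluster_sample` are all invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and survey every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # clusters are the sampling units
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical zones of the city, each holding a few outlets of the food chain.
city = {
    "Zone A": ["A1", "A2", "A3"],
    "Zone B": ["B1", "B2"],
    "Zone C": ["C1", "C2", "C3", "C4"],
    "Zone D": ["D1", "D2"],
    "Zone E": ["E1", "E2", "E3"],
}

chosen_zones, outlets = cluster_sample(city, n_clusters=2, seed=3)
print(chosen_zones)
print(outlets)
```

Note that the random draw is made over zones, not over outlets: once a zone is chosen, all of its outlets are covered.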

Systematic sampling

The systematic sampling method is applied where the population is homogeneous in nature and the units are listed in an order that does not repeat some property at a regular interval. The procedure can be understood with the following example.

Let us consider that a sample of 200 units needs to be picked from a population of 2000 units. The researcher uses the following simple formula to find the interval at which units are selected:

Sampling interval = (Size of Population) / (Desired Sample Size)

In this case,

Sampling interval = 2000/200 = 10, i.e. every 10th unit

Hence every 10th unit is selected to be a part of the sample. The researcher may pick the starting point randomly or may start from the beginning of the list. The method is random but restricted: no unit knows in advance which units will meet the criterion of selection, yet once the starting point is fixed the rest of the sample is determined.
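The procedure can be sketched in Python as below. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as in the example of 2000 and 200.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th serial number after a random start."""
    interval = population_size // sample_size   # 2000 // 200 = 10
    rng = random.Random(seed)
    start = rng.randint(1, interval)            # random start within the first interval
    return list(range(start, population_size + 1, interval))[:sample_size]

sample = systematic_sample(2000, 200, seed=5)
print(sample[:5], len(sample))
```

Only the starting point is random here; every later unit follows from it at a fixed gap of 10.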

Multi-stage Sampling

Multi-stage sampling can be considered a special case of cluster sampling: the clusters are selected randomly in the first stage, and in the second stage the sampling units are selected randomly from each selected cluster. The following diagram elaborates the process:

Figure -6
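The two stages can be sketched as follows. The clusters and their units are invented placeholders, and the function name `two_stage_sample` is hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, units_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], units_per_cluster)
            for name in stage_one}

# Five hypothetical clusters of 20 units each.
clusters = {f"Cluster {c}": [f"{c}{i}" for i in range(1, 21)] for c in "ABCDE"}

sample = two_stage_sample(clusters, n_clusters=2, units_per_cluster=5, seed=11)
for name, units in sample.items():
    print(name, units)
```

Unlike plain cluster sampling, only some of the units inside each chosen cluster are surveyed.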

Non-probabilistic methods of sampling

A sample collected non-probabilistically simply means that not all the units have an equal chance of selection. The probability that a unit will be selected for the sample varies from unit to unit. For instance, suppose a student is asked to select five students to represent the entire class. The student who has to choose the sample may pick friends or acquaintances more readily, and hence the friends of the student collecting the sample have a higher chance of being selected than the students who are not known to that person.

Judgement Sampling

This method gives the investigator complete freedom to select the units for the sample: individual items are selected consciously by the investigator using his own judgment. The method is highly suitable for situations where the investigator is familiar with the population. For instance, if five students have to be selected as a sample from a class of 100 students, the teacher who teaches the class regularly is the best person to select five students who can best represent the entire class, i.e. the population.

The biggest advantage of this method is that a highly representative sample can be selected from the population. The method is also comparatively less time consuming, as the investigator knows the target population very well.

The disadvantage of this method is the investigator’s bias, which can affect the results of the study significantly.

Quota Sampling

In quota sampling the proportion of the sample to be taken from a particular category is decided on the basis of the representation of that category in the population.

Let us reconsider the Bajaj Auto example mentioned under the heading of stratified sampling. If quota sampling is adopted, the sample will be collected on the basis of the representation of each category in the population. The following table provides the details of quota sample selection.

Strata	Population Size	Sample Units selected
Strata 1 (100cc bike users)	600	60
Strata 2 (150cc bike users)	400	40
Strata 3 (300cc bike users)	100	10
Total	1100	110

Each category is allotted a quota in the sample depending upon its weight in the population.
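The quota arithmetic can be sketched in Python. The 10 percent rate is an assumption read off the table (60 units out of 600), and the function name `proportional_quota` is hypothetical.

```python
def proportional_quota(category_sizes, sample_percent):
    """Allot each category a quota equal to the same percentage of its size."""
    return {name: round(size * sample_percent / 100)
            for name, size in category_sizes.items()}

category_sizes = {"100cc": 600, "150cc": 400, "300cc": 100}
quotas = proportional_quota(category_sizes, sample_percent=10)
print(quotas)   # {'100cc': 60, '150cc': 40, '300cc': 10}
```

The sketch only fixes how many units each category contributes. Which units fill each quota is left to the investigator, which is what makes the method non-random.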

This method of sample selection is less time consuming. Its other advantage is that the sample mirrors the composition of the population in terms of the chosen categories.

The disadvantages of the method are that the selection of units is non-random, that it is exposed to the bias of the investigator and that the results are biased towards the category with the largest quota. At times the research requires equal detail from each group regardless of its share in the population, and the method fails completely in such cases.

Convenience Sampling

This sampling method rests completely on the convenience of the investigator. The investigator can use this method when the universe is not clearly defined, the sampling units are not clearly defined or a complete source list is not available. For instance, a marketing officer of Cipla has to collect data on the sales of a particular drug, and there are 200 medical stores in the city through which the drug is sold. The investigator decides to collect the data from 20 medical stores. The investigator may pick 20 stores randomly from the 200 medical stores, or can simply select 20 medical stores from the area nearest to his residence. The second approach is convenience sampling.

The biggest advantage of the method is that it is less time consuming and the sample under study is highly convenient, as it is selected by the investigator himself.

The disadvantages of the method are the researcher’s bias and the non-representativeness of the sample.

Snowball Sampling

A small ball of snow rolled from the top of a snow-covered mountain gets bigger and bigger as it rolls down. Snowball sampling, another technique for picking a non-random sample, works likewise. The investigator has a target sample size in mind. The collection starts by picking one unit from the population. The selected unit is asked to provide references, the referred units are added to the sample and asked for references in turn, and the process continues until the desired sample size is reached. The following diagram describes snowball sampling.
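The referral chain can also be sketched in code. The names and the referral lists below are invented, and the function name `snowball_sample` is hypothetical. The sketch assumes each selected unit's references are followed in the order they are given.

```python
from collections import deque

def snowball_sample(referrals, first_unit, target_size):
    """Grow a sample by following references until the target size is reached."""
    sample = [first_unit]
    queue = deque([first_unit])            # units still to be asked for references
    while queue and len(sample) < target_size:
        current = queue.popleft()
        for referred in referrals.get(current, []):
            if referred not in sample:
                sample.append(referred)
                queue.append(referred)
                if len(sample) == target_size:
                    break
    return sample

# Hypothetical references given by each respondent.
referrals = {
    "Asha": ["Bilal", "Chitra"],
    "Bilal": ["Deepak"],
    "Chitra": ["Esha", "Farid"],
    "Deepak": [],
}

print(snowball_sample(referrals, "Asha", 5))
# ['Asha', 'Bilal', 'Chitra', 'Deepak', 'Esha']
```

If the references run out before the target is reached, the function simply returns the smaller sample it managed to collect.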