Transcript Slide 1

Evolution of the art of keeping Records and
Development of Total Survey Design with
application to some projects
By:
Pulakesh Maiti
Indian Statistical Institute
• Summary. While statistics have been collected and used in this
subcontinent from antiquity, many changes in their collection and use
took place during the British period (1757 – 1947) of Indian
history.
• Some of the changes were due to imperial needs, but many of them
took place indirectly as a result of western education and a spirit
of scientific curiosity and experimentation.
• Interest in rapid social, economic and technical development
added a new dimension after India’s independence in 1947.
• We trace the evolution of data collection and discuss
the acceptance of random sampling,
its basic principles and the necessary activities in its
domain.
• Finally, a Total Survey Design is developed and deployed in
some projects undertaken in India and abroad.
Two highlights from the pre-British period.
In the Arthasastra by Kautilya (321 – 296 B.C.), which
literally means a treatise on economics, one gets an
account of data collection.
“It is the duty of Gopa, village accountant, to attend
to the accounts of five or ten villages, as ordered by the
collector general……..Also, having numbered the
houses as tax paying or non-tax paying, he shall not
only register the total number of inhabitants of all the
four castes in each village, but also keep an account of
the exact number of cultivators, cowherds, merchants,
artisans, labourers, slaves and biped and quadruped
animals, fixing at the same time the amount of gold,
free labour, toll and fines that can be collected from it
(each household)”.
(Shamsastry, 1929, p 158).
• “The basic unit for recording information
pertaining to agriculturists and the produce
was the village. Ascertainment of the extent
of the soil in cultivation and weighing
several portion of personal observation was
made through the superintendent to the
survey, the Bitkchi, the Patwaris who were
being appointed at the village level”.
The above period pertains to that of Akbar, the
Great Moghul Emperor (1556 A.D. to 1605 A.D.).
Mechanisms of data collection also evolved in many other
countries, especially in Europe. As
early as 1662 A.D., Graunt published his work on
social statistics, based on data collected in an
arbitrary or haphazard manner.
However, such practice was not well organized. It was
only after the industrial renaissance in Europe that the
necessity for such enquiries in depth and breadth
increased, and the collection of data in the form of
complete enumeration on various social, economic,
demographic and biological characteristics came into
practice in the 19th century through organized
bodies.
Such a practice came into existence in India too,
when the British Government started census
operation around 1881.
The following is a summary picture of the current
statistical system in India developed through the early
and later British period and the period after
independence.
• Office of the Registrar General and Census Commissioner,
Home Ministry, is responsible for conducting the decennial
population census and for birth and death statistics, including
the calculation of birth, death and other demographic rates;
• Department of commercial intelligence and statistics,
Ministry of Finance looks after statistics on foreign trade
and business;
• Reserve Bank of India, Ministry of Finance, looks after
foreign trade, monetary flow, interest rates etc.;
• Directorate of Economics and Statistics, Ministry of Food
and Agriculture is responsible for compiling and publishing
agricultural statistics such as crop production, crop
forecasts, fisheries and livestock on an all-India basis;
• Labour Bureau, Ministry of Labour, prepares the consumer
price index numbers;
• Office of Economic Adviser (OEA), Ministry of Industry, compiles
the wholesale price index on a weekly basis, based on price
quotations compiled by official as well as some non-official
agencies in respect of 435 selected items and commodities
identified in the basket of the index;
• Central Bureau of Health Intelligence (CBHI), State
Welfare Bureau, ICMR, Ministry of Health and Family
Welfare record different aspects of public health and
family welfare.
The system producing health statistics is totally
decentralized and still relatively weak, even by Indian
standards, on the incidence or prevalence of major diseases at
the national level;
• The newly created Ministry of Environment and CSO have
been bringing out a handbook on environmental
statistics.
• The rapid growth of interest in “sampling
methods” and in the conclusions drawn from them possibly
started after Kiaer (1895), who introduced the
concept of random sampling to replace the usual
approach of complete enumeration and
emphasized the value of a representative sample.
• A representative sample is defined as a
photograph, which reproduces the details of the
original in their true relative proportions. Bowley
A.L. (1906) discussed the use of a random
sample. The works of Bowley A.L. (1926) and
Neyman J. (1934) may be said to have laid the
foundation of modern sampling theory.
India witnessed the advent of large scale sample surveys
under the guidance of the late Professor P.C. Mahalanobis.
The National Sample Survey (NSS) was created in 1950 as
a multifaceted fact-finding body.
The Department of Statistics (DOS) was set up in the
Cabinet Secretariat in April 1961, and during the same
period CSO and NSS were placed under the full-fledged
Department of Statistics (DOS).
In February 1999, the Department of Statistics
and the Department of Programme Implementation were merged and
named the ‘Department of Statistics and Programme
Implementation’ in the Ministry of Planning and
Implementation.
Finally, by October 1999, the Department of Statistics and
Programme Implementation was declared the ‘Ministry of
Statistics and Programme Implementation’.
Responsibility for the collection or coordination of data fell on NSSO
and CSO. Since then NSSO has continued to contribute to
the national database, whereas CSO
mainly deals with the conceptualization and
standardization of different
concepts and definitions.
Classification of Available Data Sources
Data at present are obtained mainly through
•The government organization set up;
•Different line departments of the government;
•Academic research institutes / universities.
The first two may be defined as official data, whereas the
third one may be termed academic statistics.
Academic statistics are mainly generated from
different research projects undertaken by
different research institutions/universities, while
making investigations on
•methodological issues;
•development of probability/non probability samples
•understanding the nature and extent of errors
and their effects on survey results.
It may be noted that much attention has been
paid by survey theoreticians to measuring
the extent of sampling error (a part of the total
error) and to controlling it through properly adopted
sampling designs and the choice of appropriate
estimators,
but so far survey design has received little
importance in the theory and practice of survey
sampling.
Schematic Diagram 1
Schematic Diagram 1 (Contd.)
Each tool involves estimation and estimators that differ in
mathematical complexity. One needs to examine at this
stage also, if relatively simple descriptive estimators such
as totals, means and ratios may be used or more complex
relationship measures such as regression or correlation
coefficient may be used in exploratory analysis, whose
primary concern is to make the characteristics of the
population being studied more understandable.
It is also necessary to plan, if some of those tools may
also be used in confirmatory analysis, when the objective
is to test statistical models or assumptions indicated by
exploratory work;
It is also necessary to decide on the type of activities to be
conducted in the face of non-sampling errors and on the
statistical tests to be used for measuring total error.
Schematic diagram 2
Schematic diagram 2 (Contd.)
The normal practice adopted so far in survey
sampling is to take the decisions on the choice
of
• a sampling frame;
• a sampling design;
• a questionnaire design;
• sample size;
• sample weights;
without much consideration of the survey design.
In the next few slides, we discuss some issues
relating to the above topics.
A Sampling Frame:
In some situations one may have a number of
frames. For example, for a study of the health status
of workers, the workers could be reached through (i) a list of
work places, (ii) visits to their homes, or (iii)
telephone calls to them at home. For any specific illness,
physicians’ offices may also be visited.
Thus, candidates for a sampling frame could include
lists of areas, telephone numbers, business
establishments, physicians or hospitals.
The frame chosen will affect the quality of survey
results.
When deciding to adopt a particular frame, one
would need to consider the errors that would be
introduced as a result of this choice.
A Sampling Design: The type of frame chosen
will influence the type of sampling design that can
be used and the efficiency of the
potential design.
When no frame is available, the sampling
design adopted will differ from the one used
when more than one frame, jointly covering
the entire population, is available.
Normally, a stratified multistage sampling design
is adopted in practice. When intermediate reference
units serve as sampling units,
the sampling design would be different.
A Questionnaire Design: After defining the concepts, definitions to be
used and choosing the sampling design, a detailed list and description
of the survey variables with the units of measurement is prepared in
consultation with the subject specialists, before they are presented in
a most efficient way as a data gathering instrument.
Sometimes, the variables to be measured have to be translated into
operational/workable definitions and expressed in the form of a
logical series of questions, which the interviewer can ask and the
interviewee comprehend and answer.
They should be designed in such a way that they (i) enable the
collection of actual information, (ii) facilitate the work of data
collection, collation, processing and tabulation, (iii) ensure economy
in data collection (iv) permit comprehensive and meaningful analysis
and purposeful utilization of captured data.
The refinement of the general data requirements of any survey into
precise questions is a step-by-step process, just as the development of a
complex design is. There should be spaces indicating “confidentiality”,
the identity of the agency and the hierarchical identity of the
respondent.
• Sample size: Basic approaches for a single item of enquiry
based on SRS designs consider (i) precision measured in terms
of the margin of error, (ii) precision measured in terms of the
coefficient of variation, (iii) cost alone, or (iv) both
precision and cost together.
• These approaches are applied to such commonly used
designs as unstratified sampling, stratified sampling,
cluster sampling and multistage sampling.
The statistical tool used for determination of sample size is
$P(|p - P| \le d) = 1 - \alpha$, for Qualitative Characteristic
and
$P(|\bar{y} - \bar{Y}| \le d) = 1 - \alpha$, for Quantitative Characteristic
where $(p, \bar{y})$ and $(P, \bar{Y})$ are sample statistics and population parameters;
$d$ is the margin of error and $1 - \alpha$ is the confidence coefficient
attached to the statement that the sample statistic would
be within $\pm d$ of the population parameter.
Considering precision of estimate only
• Situation I, Single Item: Qualitative Characteristic,
under SRS, using the above probability statement
(with 95% confidence, i.e. a normal deviate of about 2),
$n_I = 4P(1-P)/d^2$
• As a rule of thumb, under SRSWR, $n_I$ would be taken as
$n_I = 1/d^2$ (the largest value, attained at $P = 1/2$)
• Under SRSWOR, $n_F$ would be taken as
$n_F = n_I [N/(N + n_I)]$
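The Situation I formulas can be tried out numerically. The following is a minimal sketch, assuming a normal deviate of 2 for 95% confidence as in the slide; the function names and the example values (P = 0.5, d = 0.05, N = 2000) are illustrative assumptions, not part of the original material.

```python
import math


def initial_sample_size(p: float, d: float, z: float = 2.0) -> int:
    """n_I = z^2 * P * (1 - P) / d^2 under SRSWR, rounded up.

    z = 2 reproduces the slide's factor 4 for 95% confidence.
    """
    value = z * z * p * (1.0 - p) / (d * d)
    # Small tolerance so floating-point noise does not push an exact
    # integer result up to the next integer.
    return math.ceil(value - 1e-9)


def finite_population_size(n_i: int, population: int) -> int:
    """n_F = n_I * N / (N + n_I) under SRSWOR, rounded up."""
    value = n_i * population / (population + n_i)
    return math.ceil(value - 1e-9)


# Hypothetical inputs: P unknown, so the conservative P = 0.5 is used.
n_i = initial_sample_size(p=0.5, d=0.05)
n_f = finite_population_size(n_i, population=2000)
print(n_i, n_f)
```

With P = 0.5 the first function coincides with the rule of thumb 1/d^2, and the second shows how much the requirement drops once the finite population size is taken into account.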
Situation II, Single Item
Situation III, Multiple Items
Situation IV: sample size for
subdivision
• Considering cost aspects only:
There is no denying the fact that in most
surveys the cost aspect is of primary concern.
An overall budget is contemplated and various
cost components are envisaged. This again
depends on the set-up and the survey
problems at hand.
Situation V:One stage sampling
Situation VI
Cluster Sampling
Two-Stage Sampling
Optimality Criteria
Computation of Sample Weights:
Two qualities of each respondent are identified under the fixed
population view.
One is the structural identity, indicating which part of the sampling structure
(stratum, primary or secondary sampling unit) the person came from, and
the other is the sampling weight, reflecting the relative likelihood of being
selected for and responding to the survey.
A sampling weight is calculated as the reciprocal of each respondent’s
original probability of selection, provided there is no non-response.
In the case of non-response, the above weight has to be revised by
multiplying it by the inverse of the response probability.
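The weight computation described above can be sketched as follows. This is only an illustration: the selection and response probabilities are made-up numbers, the function names are assumptions, and the weighted sum is one simple way of using such weights to estimate a population total.

```python
from typing import Sequence


def sampling_weight(selection_prob: float, response_prob: float = 1.0) -> float:
    """Reciprocal of the selection probability, revised for non-response.

    With response_prob = 1.0 (no non-response) this is the plain
    design weight 1 / selection_prob.
    """
    return (1.0 / selection_prob) * (1.0 / response_prob)


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the observed values, an estimate of the population total."""
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical respondents: (y value, selection probability, response probability)
respondents = [(12.0, 0.10, 0.8), (7.0, 0.10, 0.8), (20.0, 0.05, 1.0)]
weights = [sampling_weight(pi, rho) for _, pi, rho in respondents]
total = weighted_total([y for y, _, _ in respondents], weights)
print(weights, total)
```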
Type of estimator(s) used for estimating the total of a character y
Survey Design:
Survey design is the design for allocation of
jobs to the investigators and supervisors
engaged as members of the survey personnel
group.
It helps one estimate measurement variance
and has to be determined at
the planning stage.
Survey design is essential for separating the
sources of variation.
• In standard practice, analysis is carried out
under the assumptions that
(1) there is no problem with the frame,
(2) there is no problem of non-response, and
(3) there is no problem of measurement error.
The only error is then the sampling error,
and for that the s.e. of the estimator is
calculated.
WHAT HAPPENS WHEN THE
ABOVE ASSUMPTIONS ARE
VIOLATED?
(i.e. there are frame errors, true
values are not reported and
data sets are incomplete).
• Survey Errors:
Survey errors can be classified as sampling
error and non-sampling errors by type and
within each category, errors are classified as
variable errors and biases by nature.
Variable errors and biases can arise from
sampling and/or non-sampling operations.
This double dichotomy gives rise to a four fold
classification of errors.
Many potential sources of errors can be found in each of
these classes, since every operation is a potential source of
variable errors and biases. Different biases can be considered
as a set of constants determined by the essential survey
conditions, although their values remain largely unknown.
Biases represent the difference between the expected sample
value and the true value, whereas variable errors measure the
difference between the estimate and its expected
value. Variable errors would fluctuate if we were to select
different samples with the same design.
Most biases cannot be reduced by increasing the size of the
sample, but only by improving the quality of operations.
In contrast, the reduction in variable errors depends on the
number of units of some kind.
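The distinction between variable errors and biases can be written as the usual decomposition of total error. The notation below is an assumption introduced for illustration: $\hat{\theta}$ denotes a survey estimate, $\theta$ the true value and $E$ the expectation over repeated samples under the same design and essential survey conditions.

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\,(\hat{\theta} - \theta)^2 \\
  &= E\,\{\hat{\theta} - E(\hat{\theta})\}^2
     + \{E(\hat{\theta}) - \theta\}^2 \\
  &= \underbrace{V(\hat{\theta})}_{\text{variable error}}
     + \underbrace{B(\hat{\theta})^2}_{\text{squared bias}}
\end{align*}
```

The cross term vanishes because $E\{\hat{\theta} - E(\hat{\theta})\} = 0$. The first component typically shrinks as the number of units grows, while the second stays fixed unless the quality of operations improves.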
Variable errors can be measured by noting
internal replications of the units within the
sample.
Measurement requires the replication of units,
whether sampling units or observations by
proper survey design to separate sources of
variation.
Measurement of biases essentially depends on
a different method external to the survey
proper.
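One way to see how internal replication measures variable error is the replicated (interpenetrating) subsample approach. The sketch below assumes that the sample has been split into k independent replicates, each yielding its own estimate of the same quantity; the replicate values are made up for illustration.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def replicated_estimate(replicate_estimates: Sequence[float]) -> Tuple[float, float]:
    """Combine k independent replicate estimates.

    Returns the combined estimate (their mean) and an estimate of its
    variance, computed from the spread between replicates:
    sum((t_r - t_bar)^2) / (k * (k - 1)).
    """
    k = len(replicate_estimates)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    combined = mean(replicate_estimates)
    var_of_combined = variance(replicate_estimates) / k
    return combined, var_of_combined


# Hypothetical estimates from four replicated subsamples
estimate, var_hat = replicated_estimate([10.0, 12.0, 14.0, 16.0])
print(estimate, var_hat)
```

If different investigators are assigned to different replicates, the between-replicate spread also picks up investigator variability, which is why the allocation of jobs matters for separating the sources of variation. Bias, being common to all replicates, is not revealed by this calculation.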
Non-sampling errors are often thought of as being due
entirely to mistakes and deficiencies entered during
planning, execution and processing stages of the
survey operation. Non-sampling errors are defined as a
residual category.
Thus, one can have non-sampling errors arising from
(1) deficiencies in the problem formulation leading to
wrongly conceived concepts and definitions and an inability
to arrive at a workable definition;
(2) imperfections in the frame leading to an inappropriate
sampling design and the wrong population being studied;
poor construction and/or inadequacies of the frame;
(3) imperfections in the questionnaire design;
(4) inappropriate choice of reference period;
imperfections in the tabulation plan;
(5) inability to collect information on all
items and from all respondents;
(6) inaccurate survey design;
(7) mistakes in recording information;
(8) variability in responses;
(9) illogical/unrepresented data;
(10) errors in interpretation.
Many such factors can cause a disagreement
between survey results and the true population value.
As one might expect, even the notion of a true
population value sometimes appears to be
controversial.
In some situations, the notion of an absolute
standard for comparison is a fundamental element
of the conceptual framework; in other situations
one may be satisfied with a purely operational
view of reality, where measurements are simply
defined as the product of a specified data collection
procedure. An absolute standard of truth plays no
role in the purely operational view.
• Some illustrative examples of sources of non-sampling errors encountered in real life
problems:
Every activity outlined at the different stages of
Schematic Diagram 1 may be subject to errors if
proper measures are not taken at that stage. Errors may
occur from the definition stage right up to
the completion of the study.
For illustrative purposes, the next few slides mention a few
examples of errors that were likely to occur at
each stage of some of the projects undertaken,
had no measures been taken.
Workable definition: The project “Domestic Tourists in
Orissa (1988-89)” needed a redefinition of a “tourist” with
respect to the objective of the study. Among others,
enquiries were also directed to finding the availability of
existing infrastructure facilities in terms of
accommodation, transport (road, rail, air), medicine and
other aspects.
Normally, a tourist by definition is a person who visits
places such as historical monuments, pilgrimage sites etc.
According to the objective of the study, any person, for any
reason whatsoever, requiring accommodation to spend
at least one night should be considered a tourist, and
hence became a member of the target population.
Therefore, the usual definition of a tourist became
unusable and a tourist was defined according to the objective of
the study. Otherwise, the target population considered would
have suffered from under-coverage.
• Frame Problems: That an imperfect frame may lead to
coverage errors was observed both in the study of the health
status of workers and in the study through the IPP-VIII Project
(1998).
In the former study, errors due to coverage
problems were likely to occur
• if an area frame were used, which would cover all workers,
but would also include a large number of non-workers;
• if a telephone frame were used, which would not cover
workers without telephones, and would also include a large number
of non-workers;
• if business establishments were used, which would contain
a large concentration of workers, although it might be
extremely difficult to construct a complete list of elements.
• However, if medical records were used, it became easy to
identify persons who had the disease.
The Indian Population Project (IPP-VIII, 1998), undertaken
at the Indian Statistical Institute, was meant for studying
different facets of IPP-VIII. One important component was to
assess the impact of the project on the beneficiaries.
The lower income group formed the beneficiary group.
While listing the beneficiaries in an area, many non-beneficiaries were included, causing over-coverage.
It was also observed in the project “Cost Benefit Analysis
of Rural Electrification (1975-76)”
that investigators employed as piece-rate workers
appeared to list additional households that were not within
the village boundaries.
This was detected later through maps and other available
relevant materials.
In the absence of a frame in the
project entitled “Identification of Other
Backward Classes (1994-95)”,
a frame for the urban population was generated
through a sample drawn from a rural population.
The method developed and deployed generated
a frame with coverage error.
The project entitled “Evaluation of Total Literacy
Programme” (23.06.92-07.07.92) in the district of
North 24 Parganas, undertaken at the Indian
Statistical Institute, aimed at evaluating the
programme of the Total Literacy Campaign (TLC) in terms
of literacy rates and some other parameters. The
learning centers with identity parameters formed
the frame under study.
When the target population appeared to be either
too mobile in nature or too unidentifiable to be
listed, even an imperfect frame was
used, knowing consciously that it would lead to
errors of over-coverage or under-coverage.
These mistakes were unavoidable, but the necessary
adjustments to the survey results were made
[Stanza-Bopape Project (1995-96); Calcutta
Urban Poverty Survey (1977-78)].
• For some kinds of population, traditional finite population sampling is
not feasible, for reasons such as the following.
• The population size may not be known, and an
exhaustive list of the target population may not be readily available.
Instead, one can locate and identify a set of distinct reference units
forming what we may call a reference population. These reference
units are used as the list of sampling units;
• By sampling from these reference units and scanning the sample, a
sample of population units is obtained following a specified linking
rule. Use of intermediate reference units as frame units creates the
problem of multiplicity of the population units;
• Concept of Generalized Horvitz-Thompson Estimator;
• This situation creates unavoidable mistakes due to the frame, which is used knowing
consciously that it will create a multiplicity problem; but adjustments to
the survey estimates can be made [NSS Slum Survey, June 2012 to
December 2012].
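The slides do not spell out the estimator, so the following is only a sketch of one common multiplicity-adjusted estimator of Horvitz-Thompson type, not necessarily the one used in the cited survey. It assumes that each sampled reference unit j has a known inclusion probability pi_j and that, for every population unit i linked to it, the value y_i and the multiplicity m_i (the number of reference units in the reference population linked to unit i) are recorded. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each sampled reference unit: (inclusion probability, [(y_i, m_i), ...])
ReferenceUnit = Tuple[float, List[Tuple[float, int]]]


def multiplicity_adjusted_total(sampled_refs: Sequence[ReferenceUnit]) -> float:
    """Estimate the population total of y from sampled reference units.

    Each linked population unit contributes y_i / m_i, so a unit that
    can be reached through m_i reference units is not over-counted;
    the share is then weighted by 1 / pi_j as in Horvitz-Thompson.
    """
    total = 0.0
    for inclusion_prob, linked_units in sampled_refs:
        share = sum(y / m for y, m in linked_units)
        total += share / inclusion_prob
    return total


# Hypothetical sample of two reference units; the unit with y = 10 is
# linked to two reference units in the reference population (m = 2).
sample = [
    (0.5, [(10.0, 2), (4.0, 1)]),
    (0.25, [(10.0, 2)]),
]
print(multiplicity_adjusted_total(sample))
```

Dividing by the multiplicity is what keeps the estimator unbiased under the linking rule: summed over the whole reference population, the shares y_i / m_i add up to the population total of y.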
Questionnaire Design: Errors are likely to occur from
Inappropriate ordering/spacing: Inappropriate order
of placement not only generates biased information,
but partial non-response may also occur for the
questions following sensitive and/or illogically
placed questions.
For instance, questions on which area hospital the
respondent prefers should not be asked after
questions mentioning the names of specific area
hospitals.
A logical sequence should be maintained; otherwise
responses to the preference questions would be biased
in favour of those hospitals whose names have been
mentioned [CMDA Survey (1975-76)].
•Inability to understand the question: In a
personal interview, a question on “satisfaction with
respect to health care” might fail to make clear to
the respondent which aspect of health care is being
addressed: accessibility, cost or quality [CMDA (1975-76); IPP VIII (1988)].
Respondents may also not understand questions
that use technical jargon or unfamiliar words.
•Presenting more than one question at a time: Some questions
ask about more than one thing at once, e.g. “Do you plan to
quit your job and find another next year?”
•The respondent may feel compelled to answer rather
than admit ignorance; such questions should be
avoided.
•Inappropriate choice of words: Certain words or phrases in
the questions can influence the answer. For example, “would
you agree that” could influence some respondents to answer
in the affirmative [identification of other backward classes
(1994-95)].
Other characteristics of questions may taint the respondent’s
answer. Use of inflammatory words, links to the status quo (for
example, “most guardians think that”) and suggestion of
hypothetical circumstances should be avoided [(1994-95)].
•Length of schedule/questionnaire: If the questionnaire is
too long, the respondent may lose interest and end
participation prematurely. Even if he continues till the
end, the quality of the data may be diluted. It is true that one would
at the same time need additional information for consistency
checks; but there should be a balance between the two.
•Choice of the reference period: There should be
varying reference periods instead of fixed reference
period for all information [NSS].
•Sequence of the questions: Starting with
complicated or sensitive questions that are difficult
to answer may cause the respondent to feel
inadequate or uncomfortable. Starting with simple and
innocuous, but interesting questions, on the other
hand, tends to put the respondent at ease and
create harmonious feelings towards the survey topic.
• Interviewers’ inabilities: interviewers’ deficiencies (poor interviewing techniques,
misunderstanding of concepts, misinterpretation of responses, wrong arithmetic
etc.) and characteristics (gender, employment status, ability to create rapport etc.) may create
problems in data collection.
• Field condition: difficulty in implementing a random sample may arise due to a peculiar field
condition [REC (1975-76)].
• Respondents’ inability: respondents’ failures in interpreting the question,
inability to provide answers, deliberate or inadvertent supply of wrong
information etc., and also their preferences for some members may create some
problems [1976 fertility survey in Indonesia, Dasgupta and Mitra (1958)];
• Choice of the respondent: an inappropriate respondent rule does not help choose a
respondent [Tuigan and Cavdar (1975)];
• Social stigma: social stigma attached to matters such as female participation in the
workforce [Shah (1981)], consumption of alcohol etc.;
• Purposeful reporting: purposeful misreporting of certain information; for example,
women may not like to disclose their ages;
Sampling Design: for stratified random sampling, the
stratification variable chosen must be correlated with the study variable; the
administrative zones should not always form the different strata.
As illustrative examples, the following stratification variables are cited:
• Time of start of functioning of the projects under evaluation;
• Age of the respondents [ISI Project (1997-98)];
• Degree of affluence [Community life in Selected Communities in
South Africa (1995)];
• Population size [The socio-economic demographic and cultural
pattern of the female labour force participation (1995-96)];
• Different types of races [Community attitudes and preferences
pertaining to country and cremation related issues in the East Rand in
South Africa (1996-97)];
• Degree of concentration [Domestic Tourists Survey (1988-89)];
• Intensity of electrification [Rural Electric Corporation (1976-1977)];
• Administrative zones [ISI Project (23.06.92-07.07.92); ISI Project
(1994-95); ISI Project (1997-1998a); ISI Project (1997-1998)].
The above deficiencies in survey data lead to incomplete data:
(1) Non-response arises due to deficiencies at all stages of the integrated
system, contrary to the general belief that it occurs only in the interactive
process between a respondent and an investigator. An extended definition
of non-response, particularly item non-response, includes cases in which
missing data arise
(2) from the processing of information provided by units rather than from refusal
of units. For example, editing procedures may eliminate some responses
which are judged to be impossible or inconsistent with other
findings;
(3) out of the problem of non-contact, when information in the frame is
inadequate for reaching a sampling unit;
(4) because of (temporary) non-availability of a respondent at home;
(5) because of non-coverage due to frame problems;
(6) because of a badly designed schedule which places a burden on the
respondent;
(7) for lack of solicitation to make respondents participate in the survey
process;
(8) due to difficulties in making contact during natural calamities like
floods/earthquakes and/or political disturbances [see Schematic
Diagram 2].
Causes of item non-response:
(1) Non-response rates are higher for sensitive items such as income [Donald
(1960)];
(2) The mode of interview is responsible for producing item non-response [1975-76];
(3) Higher item non-response rates arise on questions requiring substantial thought
or effort on the part of the respondent [Frances and Bush (1975), Craig and McCann (1978)];
(4) Item non-response is independent of questionnaire length [Craig and McCann
(1978)];
(5) Age and occupation have a significant effect on item non-response [Messmer and
Seymour (1982)];
(6) Questions appearing after a branching question have notably higher item non-response [Messmer and Seymour (1982)];
(7) Interviewers who are more impersonal have lower item non-response rates than
interviewers who have a more personable interviewing style [Rogers
(1976)];
(8) Non-response on some items will be higher for some subgroups (the elderly, females,
the less educated);
(9) Interviewers who think it inappropriate to ask a sensitive question will have
higher item non-response on that question [Bailer et al. (1977)].
The previous discussions generate awareness of the
existence of non-sampling errors.
With non-sampling errors of different types experienced globally,
the design that controls the total error of survey
estimates, considering all sources of error, has come to be
known as the Total Survey Design. The practice of total survey
design should operate in a comprehensive and integrated
fashion.
Surveys need to be carefully planned with due consideration
given to all known sources of error. Resources available for
conducting the survey should be directed towards minimizing
the total error and not any single error component. During
analysis, an estimate of the total error should be made.
Finally, in anticipation of future surveys of similar
types, estimates of the components of total error should be
made so that they may be used in the planning phase of
such surveys in future.
During the design phase of the survey,
the practice of total survey design involves assessing the
level of error associated with alternative procedures for
(i) sampling design, (ii) questionnaire design and (iii) survey
design, and choosing that combination of sampling design,
measurement procedure and analysis method which will
minimize the total error of the estimate within available
resources.
• The success of total survey design methods at the planning
stages depends on good information on costs and errors of
alternative procedures, and on the availability of total
error and total cost models that can be used for choosing
an optimum design, optimality in the sense of minimizing
total error.
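The idea of choosing an optimum design from cost and error information can be sketched as follows. The error model used here (total error = sampling variance + non-sampling variance + squared bias) and all figures are assumptions made purely for illustration; in practice such inputs would come from pilot studies or earlier surveys.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Design:
    """One candidate combination of sampling, questionnaire and survey design."""
    name: str
    cost: float
    sampling_variance: float
    nonsampling_variance: float
    bias: float

    def total_error(self) -> float:
        # Assumed total error model: variances plus squared bias.
        return self.sampling_variance + self.nonsampling_variance + self.bias ** 2


def choose_design(candidates: Sequence[Design], budget: float) -> Optional[Design]:
    """Return the affordable design with the smallest total error, if any."""
    feasible = [c for c in candidates if c.cost <= budget]
    return min(feasible, key=Design.total_error) if feasible else None


# Hypothetical alternatives: A spends the budget on a larger sample,
# B spends part of it on better measurement and follow-up.
candidates = [
    Design("A", cost=100, sampling_variance=4.0, nonsampling_variance=1.0, bias=3.0),
    Design("B", cost=120, sampling_variance=6.0, nonsampling_variance=1.0, bias=1.0),
    Design("C", cost=200, sampling_variance=2.0, nonsampling_variance=0.5, bias=0.5),
]
best = choose_design(candidates, budget=150)
print(best.name if best else None)
```

In this made-up example the design with the smallest sampling variance among the affordable ones is not the one selected, which mirrors the point that resources should be directed at total error rather than at any single component.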
• While the survey is being carried out, the practice of total
survey design involves the use of quality control
procedures that monitor the progress of data collection.
The goal of quality control procedures is to detect
errors when they occur or soon after, so that the
survey work can be redone or replaced, if necessary.
Total survey design at the analysis and reporting
stage entails attempting to calculate and report
the total error of the survey estimate by
(1) use of suitably designed probability sampling
procedures which allow one to calculate accurate
measures of sampling error;
(2) introduction of experimental design into the survey
process, which allows determination of the magnitude of
the effect of a particular error source on the total error
of the estimate.
However, attempts to make use of total survey
design may be hampered by several
problems, such as
(1) introduction of additional complexities into
the survey;
(2) collection of extra data or inclusion of
experimental methods to permit
estimation of the impact of certain sources of
non-sampling errors;
(3) need for effort and money
that could otherwise be devoted to the estimates once the
data are available;
(4) use of quality control procedures that consume
time and money that could be diverted to the
primary activities of sampling and data collection.
Empirical evidence in support of
the previous statements, obtained
through a number of real life
projects.
Development of the Total Survey Design:
Since the success of the total survey design depends on good
information on costs and on types of errors with some quantitative
measures,
a dress rehearsal through pilot studies was held to examine
(1) if question wording may be confusing;
(2) if the forms may be difficult for an interviewer to
administer;
(3) if procedures appear to be too complicated and
extensive for interviewers to complete on schedule;
(4) if selection of interviewers should be guided by gender or
not;
(5) if subject specialists or experienced household
interviewers should be selected as interviewers.
The previous Tables were critically examined to see
(1) if the proposed field experiment design
would finally be appropriate, considering the
magnitude of errors due to
non-response as well as measurement error
[Ref. Tables 8, 9, 10, 11, 12, 13, 14, 15, 16];
(2) if random sub-sampling from the non-respondents or a
call-back procedure should be adopted and, in the case of
call-back procedures, what the number of
attempts/call-backs should be [Ref. Table 17];
(3) if non-response varies by age [Ref. Table 18];
(4) if non-response varies by household size [Ref. Table
19];
(5) if non-response varies by type of dwelling unit [Ref.
Table 20];
(6) if non-response varies by region [Ref. Table 21].
On the basis of the information gathered through pilot
studies, the total survey design was developed and deployed
in executing the projects at the final stage, and the
following information was incorporated in designing the
total survey design.
• It was observed that three attempts are good enough for
completion of any project. This has also been reported in the
literature [Ref. Table 22];
• Since non-response rates appeared to be low, no effort
was made to estimate the response probabilities of
responding units. Had the non-response rates been
comparatively higher, a weighting adjustment procedure
could have been adopted, revising the initial design-based
weights by multiplying them by the reciprocal of the
estimated response probabilities to compensate for the
error due to non-response.
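The adjustment described in the last bullet can be written compactly as follows. The notation is introduced here for illustration and does not appear in the slides: w_i is the initial design-based weight of responding unit i, and \hat{\phi}_i is its estimated response probability.

```latex
w_i^{*} = w_i \cdot \frac{1}{\hat{\phi}_i}, \qquad 0 < \hat{\phi}_i \le 1
```

A unit with an estimated response probability of 0.5 would thus have its weight doubled, standing in for one similar non-responding unit.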
On the Choice of Weighting Adjustment Cells:
It has been empirically observed through a number of real-life
surveys [Bennet and Hill (1964), Cobb, King and Chen (1957), Dunn
and Hawkes (1966), Lubin, Levitt and Zuckerman (1962), Lundberg
and Larsen (1949), Newman (1962), Ognibene (1970), Pan (1951),
Reuss (1943), Skelton (1963), Warwick and Lininger (1975), Kendall
and Buckland (1960), Sudman (1976), Suchman (1962), Birnbaum
and Sirken (1950), Deighton et al. (1980), Politz and Simmons (1949),
Madow et al. (1983), Gower (1979), DeMaio (1980), Kalsbeek and
Lessler (1978), Lessler (1974, 1980), Roy (1976-77, 1977-78, 1988-89),
Maiti (1994-1995, 1995-1996), Lyberg and Rapaport
(1979), Turner et al. (1970), Bergman et al. (1978), etc.] that
non-respondents differ from respondents with respect to the following characteristics.
1. Income class; 2. Household size; 3. Status of labour force;
4. Ownership status of dwelling units; 5. Age; 6. Socio-economic
groups;
7. Extent of coverage; 8. Different recall periods;
9. Varying multiplicity size, etc.
Any of the above variables may be used
for defining weighting adjustment cells in
estimating response probabilities. In case the
post-stratification method is used,
different cells or the same weighting
adjustment cells may be used.
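A minimal sketch of a cell-based weighting adjustment is given below, assuming that the response probability in each cell is estimated by the weighted response rate of that cell. The function name, the record layout ('cell', 'weight', 'responded') and the example cells are hypothetical and are not taken from the projects described here.

```python
from collections import defaultdict


def adjust_weights(units):
    """Weighting-cell adjustment for unit non-response.

    `units` is a list of dicts with the hypothetical keys
    'cell' (adjustment cell label), 'weight' (design-based weight)
    and 'responded' (True/False).
    Returns the responding units with an added 'adj_weight'.
    """
    sampled = defaultdict(float)    # sum of weights of all sampled units per cell
    responded = defaultdict(float)  # sum of weights of respondents per cell
    for u in units:
        sampled[u["cell"]] += u["weight"]
        if u["responded"]:
            responded[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if not u["responded"]:
            continue
        # Estimated response probability of the cell (assumed estimator).
        phi_hat = responded[u["cell"]] / sampled[u["cell"]]
        adjusted.append({**u, "adj_weight": u["weight"] / phi_hat})
    return adjusted


# Hypothetical sample: cells defined by household size.
sample = [
    {"cell": "small", "weight": 10.0, "responded": True},
    {"cell": "small", "weight": 10.0, "responded": True},
    {"cell": "small", "weight": 10.0, "responded": False},
    {"cell": "large", "weight": 20.0, "responded": True},
    {"cell": "large", "weight": 20.0, "responded": False},
]
result = adjust_weights(sample)
```

Within each cell the adjusted weights of the respondents add up to the total design weight of all sampled units in that cell, which is the sense in which the respondents compensate for the non-respondents. A cell with no respondents at all would need to be collapsed with another cell before this calculation.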
(1) A non-linear cost model, as an alternative to the
existing linear model, has been developed
[Ref. Maiti, P. (2008)];
(2) A survey design model for measuring the
measurement variance has been developed
[Ref. Maiti, P. (2009)].
References:
(1)Babbie, Earl R. (1973): Survey Research Methods. Belmont, CA, Wadsworth.
(2) Backstrom, Charles H. and Gerald Hursh-Cesar (1981): Survey Research, 2nd edition,
New York, Wiley.
(3) Bailar, Barbara A. (1979): Rotation Sampling Biases and their effects on estimates of changes,
43rd session of the International Statistical Institute, Manila.
(4) Bergman, L.R., Honve, R. and Rappa, J. (1978): Why do some people refuse to participate
in interview surveys? Statistisk Tidskrift.
(5) Bowley, A.L. (1906): Address to the Economic and Statistics Section of the British
Association for the Advancement of Science, York, 1906, J. Roy. Statist. Soc. 69,
540-558.
(6) ---------- (1926): Measurement of the Precision attained in Sampling. Bulletin of the
International Statistical Institute, 22, 6-62.
(7) Bennet, C.M. and Hill, R.E. (1964): A comparison of selected personality
characteristics of respondents and non-respondents to a mailed Questionnaire. Journal
of Educational Research, 58, No. 4, 178-180.
(8) Birnbaum, Z.W. and Monroe G. Sirken (1950): Bias due to non-availability in
Sampling Surveys. JASA, 45, 98-111.
(9) Bailar, Barbara A. (1979): Rotation Sampling Biases and their effects on estimates of
changes, 43rd session of the International Statistical Institute, Manila.
(10) Bergman, L.R., Honve, R. and Rappa, J. (1978): Why do some people refuse to
participate in interview surveys? Statistisk Tidskrift.
(11) Brooks, Camilla and Barbara Bailar (1978): An Error Profile: Employment as
Measured by the Current Population Survey, Statistical Policy Working Paper 3, Office
of Federal Statistical Policy and Standards, U.S. Department of Commerce.
(12) Bandyopadhyay, S., Chaudhury, A., Ghosh, J.K. and Maiti, P. (1999): A Draft
Proposal for an Enterprise Survey Scheme as a substitute for Economic Census.
Indian Statistical Institute, Calcutta.
(13)Cochran, W.G. (1977): Sampling Techniques, Wiley Eastern Limited, New Delhi, III
edition.
(14) Cole, D. (1956): Field Work in Sample Surveys of Household Income and
Expenditure, Applied Statistics, Volume 5, 49-61.
(15) Cobb, J.M., King, S. and Chen, E. (1957): Differences between respondents and
nonrespondents in a morbidity survey involving clinical examination, Journal of Chronic
Diseases, 6.
(16) Chevry, Gabriel (1949): Control of General Census by means of an area sampling
method, JASA, 44, 373-379.
(17) Chapman, David D. and Rogers, Charles E. (1978): Census of Agriculture - Area
Sample design and methodology. Proceedings of the American Statistical Association
Section on Survey Research Methods, 141-147.
(18) Craig, C. Samuel and John M. McCann (1978): Item non-response in Mail Surveys: Extent and Correlates, Journal of Marketing
Research, 15, 285-289.
(19) Deming, W. (1960): Sampling Design and Business Research, New York, Wiley.
(20) --------------- (1944): On Errors in Surveys, American Sociological Review, 9, 359-369.
(21) -------------- (1950): Some Theory of Sampling, John Wiley and Sons, New York.
(22) -------------- (1953): On a Probability mechanism to attain an Economic Balance
between the resultant error and the bias of non-response. JASA, 48, 743-772.
(23) Dalenius, Tore (1974): The Ends and Means of Total Survey Design; Stockholm, The
University of Stockholm.
(24) ---------------- (1957): Sampling in Sweden: Contributions to the Methods and Theories
of Sample Survey Practice, Stockholm, Almqvist and Wiksell.
(25) ----------------- (1962): Recent Advances in Sample Survey Theory and Methods,
AMS, 33, 325-349.
(26) ----------------- (1977a): Bibliography of non-sampling errors in Surveys, I (A-G),
International Statistical Review, 45, 71-89.
(27) ----------------- (1977b): Bibliography of non-sampling errors in Surveys, II (H-Q),
International Statistical Review, 45, 181-197.
(28) ------------------ (1977c): Bibliography of non-sampling errors in Surveys, III (R-Z),
International Statistical Review, 45, 313-317.
(29) Dasgupta, A and Mitra, S.N. (1958): A Technical Note on Age Grouping. The
National Sample Survey No.12, New Delhi.
(30) Dunn, J.P. and Hawkes, R (1966): Comparison of non-respondents and respondents
in a Periodic Health Examination Program to a mailed questionnaire, American
Journal of Public Health, 56, 230-236.
(31) DeMaio, T.Y. (1980): Refusals: who, where and why? Public Opinion Quarterly, 44.
(32) Deighton, Richard E., James R. Poland, Joel R. Stubs and Robert D. Tortora (1978): Glossary of Nonsampling Error Terms: An illustration of a semantic problem in statistics, Statistical Policy Working Paper 4,
Washington DC: U.S. Department of Commerce.
(33) Donald, Marjorie N. (1960): Implication of non-response for the interpretation of mail questionnaire
data, Public Opinion Quarterly, 24, 99-114.
(34) Erickson, W.A. (1967): "Optimal Sample Design with non-response", JASA, 62, 63-78.
(35) Emrich, Lawrence (1983): "Randomised Response Technique", In William G. Madow
and Ingram Olkin eds., Incomplete Data in Sample Surveys, Volume 2, Theory and
Bibliographies, New York, Academic, 73-80.
(36) Fellegi, Ivan P. (1963): The Evaluation of the Accuracy of Survey Results: Some
Canadian Experiences. International Statistical Review, 41, 1-14.
(37) ---------------- (1964): Response Variance and its Estimation, JASA, 59, 1016-1041.
(38) Fellegi, Ivan and Sunter, A.B. (1974): Balance between Different Sources of Survey
Errors, Some Canadian Experiences, Sankhya, 36 (Series C), 119-142.
(39) Ferber, Robert (1966): Item non-response in a consumer survey, Public Opinion
Quarterly, 12, 669-676.
(40) Ford, Barry L. (1976): Missing Data Procedures, A Comparative Study, American
Statistical Association, Proceedings of the Social Statistics Section 1976, Pt. 1, 326-329.
(41) Frances, Joe D. and Lawrence Busch (1975): What we know about – I don’t know, Public Opinion
Quarterly, 34, 207 – 218.
(42) Ghosh, J.K. and Maiti, Pulakesh (2003): The Indian Statistical System at cross roads:
an appraisal of Past, Present and Future, presented at the IMS meet during 2-3
January 2004.
(43) Ghosh, A. (1953): Accuracy of Family Budget Data with reference to period of recall, Calcutta Statistical Association Bulletin, 5, 16-23.
(44) Gower, A.R. (1979): Characteristics of non-respondents in the Labour Force Survey,
Statistics Canada.
(45)Groves, Robert, M. and Kahn Robert Louis (1979): Surveys by Telephone, A national
comparison with personal interview, New York; Academic.
(46) Gray, P. and Gee, F.E.N. (1972): A Quality check on the 1966 ten percent sample
census of England and Wales, Office of Population Censuses and Surveys, London.
(47) Ghosh, J.K., Maiti, P., Mukhopadhyay, A.C. and Pal, M.P. (1977): Stochastic
Modeling and Forecasting of Discovery, Reserve and Production of Hydrocarbon - with an
application, Sankhya, Series B, 59, pt. 3, 288-312.
(48)Godambe, V.P. (1976) A historical perspective of the recent development in the theory of sampling from actual
populations, Dr. Panse memorial lecture organized by Ind.Soc.Agri. stat., New Delhi, 29th March, 1976.
(49) Hansen, M.H., Madow, William G. and Tepping, B.J. (1983): An Evaluation of
Model-dependent and Probability Sampling inference in Sample Surveys, JASA, 78, 776-807.
(50) Hansen, M.H. and Hurwitz, William N. (1946): The Problem of non-response in Sample
Surveys, JASA, 41, 516-529.
(51) --------- and Nisselson, H., Steinberg, J. (1955): The redesign of the current population
survey, JASA, 50, 701-719.
(52) ---------- Jabine, Thomas B. (1963): The use of imperfect lists for Probability Sampling
at the U.S. Bureau of the Census, Bulletin of the International Statistical Institute, 40(1), 497-517.
(53) -------- and Pritzker, Leon (1964): The Estimation and interpretation of Gross
differences and the simple response variance. In C.R. Rao with D.B. Lahiri, K.R. Nair,
P. Pant and S.S. Shrikhande eds., Contributions to Statistics Presented to Professor
P.C. Mahalanobis on the occasion of his 70th birthday, Oxford, England,
Pergamon; Calcutta, Statistical Publishing Society, 111-136.
(54) Messmer, Donald J. and Daniel T. Seymour (1982): The effects of branching on item non-response, Public Opinion
Quarterly, 46, 270 – 277.
(56) ----------- and Bershad, Max A. (1961): Measurement errors in censuses and surveys,
Bulletin of the International Statistical Institute, 38, 359-374.
(57) ----------- Marks, Eli S. and Mauldin, W. Parker (1951): Response Errors in Surveys, JASA,
46, 147-190.
(58) ------------ (1976): Some Important Events in the Historical Development of Sample
Surveys, in Donald Bruce Owen ed., On the History of Statistics and Probability,
Statistics Text Books and Monographs, Volume 17, New York, Dekker, 73-102.
(59) Hacking, I. (1965): Logic of Statistical Inference, Cambridge University Press.
(60) Hirschberg, David, Frederick J. Scheuren and Robert Yuskavage (1977): The impact
on Personal and Family income of adjusting the current population survey for
undercoverage, Proceedings of the Social Statistics Section, American Statistical
Association, 70-80.
(61) Hubback, J.A. (1927): Sampling for rice yields in Bihar and Orissa, Imp. Agr. Res.
Inst. Bulletin, Pusa (reprinted in Sankhya (1946), 7, 282-294).
(62) Haldane, J.B.S. (1957): The Syadvada System of Predication, Sankhya, 18, 195-200.
(63) Hacking, I. (1965): Logic of Statistical Inference, Cambridge University Press.
(64) Hoinville, Gerald and Roger Jowell (1978): Survey Research Practice, London, Heinemann.
(65) Jessen, Raymond J. (1978): Statistical Survey Techniques, New York, Wiley.
(66) Jeganathan, P. (1997a): Structural Reading and Evolution of Indus Script Viewed as
a Complex System, Part I: Meteorological Reading, Prague Bulletin of Mathematical
Linguistics, 67, pp. 75 – 137.
(67) Jeganathan, P. (1997b): Also appeared with agreement of editors in RASK, International Tidsskrift for Sprog og
Kommunikation, 8, December 1998, pp. 47 – 78.
(68) Kendall, Maurice George and William R. Buckland (1960): A Dictionary of Statistical Terms, 2nd edition, London,
Oliver and Boyd.
(69) Kiaer, A. (1895): Observations et experiences concernant des denombrements
representatifs, Bull. Int. Statist. Inst. 9, 176-183.
(70) Kruskal, William and Frederick Mosteller (1980): Representative Sampling, IV: the
History of the concept in Statistics, 1895-1939, International Statistical Review, 48,
169-195.
(71) Kish, L. (1965): Survey Sampling, Wiley and Sons, New York.
(72) ------ and Hess, I. (1958): On non-coverage of sample dwellings, JASA, 54, 509-524.
(73) Kalton, Graham and Daniel Kasprzyk (1982): Imputing of missing survey Response, American
Statistical Association 1982, Proceedings of the Section on Survey Research Methods, 22-31.
(74) Koop, J.C. (1974): Notes for a unified theory of estimation for sample surveys taking into account
response errors, Metrika, 21, 19-39.
(75) Kalton, Graham (1983): Compensating for missing survey data, Research Report
Series, Ann Arbor, MI, Institute for Social Research, University of Michigan.
(76) Kendall, Maurice George and William R. Buckland (1960): A Dictionary of Statistical Terms, 2nd
edition, London, Oliver and Boyd.
(77) Kiaer, A.N. (1895): Observations et experiences concernant les denombrements representatifs,
Bull. Inst. Int. Stat. I. Div. I pp 1976.
(78) Lessler, J.T. and Kalsbeek, W.D. (1992): Non-sampling Error in Surveys, John Wiley and Sons Inc.
(79) Lessler, J.T. (1974): A double sampling scheme model for eliminating measurement process bias and estimating
measurement errors in surveys, Institute of Statistics Mimeo Series No. 949, University of North
Carolina, New ……
(80) Lessler, J.T. (1980): Errors associated with the frames, Proceedings of the American Statistical Association Section
on Survey Research Methods, 125 – 130.
(81) Lubin, B., Levitt, E. and Zuckerman, M. (1962): Some personality differences
between respondents and non-respondents in a survey questionnaire, Journal of
Consulting Psychology, 26, 192.
(82) Lundberg, G.A. and Larsen, O.A. (1949): Characteristics of Hard-to-reach individuals in field
surveys, Public Opinion Quarterly, 13, 487-494.
(83) Lyberg, L. and Rapaport, E. (1979): Unpublished non-response problems at the
national central Bureau of Statistics, Sweden.
(84) Little, Roderick J.A. (1982): Models for non-response in Sample Surveys, JASA, 77, 237-250.
(85) ------------------------- (1983): Super Population models for non-response, Part IV. In William G. Madow and
Ingram Olkin eds., Incomplete Data in Sample Surveys,
Volume 2, Theory and Bibliographies, New York, Academic, 337-413.
(86) Lessler, J.T. (1974): A double sampling scheme model for eliminating measurement process bias and
estimating measurement errors in surveys, Institute of Statistics Mimeo Series No. 949, University of North
Carolina, Chapel Hill.
(87) -------------------- (1980): Errors associated with the frames, Proceedings of the
American Statistical Association Section on Survey Research Methods, 125-130.
(88) Madow, W.G., Nisselson, Harold and Olkin, Ingram (1983): Incomplete Data in
Sample Surveys, Volume 1, Report and Case Studies; New York, Academic.
(89) Mc. Neil, John M. (1981): Factors affecting the 1980 census content and the effort to develop a post
census disability survey. Presented at the annual meeting of the American Public Health Association.
(90) Mahalanobis, P.C. and Lahiri, D.B. (1961): Analysis of errors in censuses and
surveys, Bulletin of the International Statistical Institute, 38(2), 359-374.
(91)----------------and Sen, S.B. (1954): On some aspects of the Indian National Sample Survey, Bulletin of the
International Statistical Institute, 34, pt. 2.
(92) Mahalanobis, P.C. (1944): On Large Scale Sample Surveys, Philosophical
Transactions of Royal Society, 231-(B), 329-451.
(93) ------------------- (1946): Recent Experiments in Statistical Sampling in the Indian
Statistical Institute.
(94) -------------------- (1941): A Sample Survey of the Acreage under jute in Bengal, 4, 511-30.
(95) --------------------- (1954): The Foundations of Statistics, Dialectica, 8, 95-111
(reprinted in Sankhya 18, 183-194).
(96) Maiti, P. (1983): unpublished Ph.D. Thesis entitled "Some Contributions to the
Sampling Theory using auxiliary information", submitted to the Indian Statistical
Institute, Calcutta.
(97) --------------------- (2008): Existence of the BLUE for finite population mean under multiple
imputation, Statistics in Transition new series, p. 223 – 258; Volume 9, Number 2.
(98) ------------------------- (2009): Estimation of non-sampling variance components under the linear
model, Statistics in Transition new series, p. 193 – 233.
(99) Maiti, P. et al. (1999): Strengthening Local Government in Madhya Pradesh, Indian Statistical
Institute, Kolkata.
(100) Maiti, P. (2009): Intra-and-Inter-block variation between fourteen blocks of the rural sector of the
district of Howrah, Report on decentralized planning, Indian Statistical Institute, Kolkata.
(101) Maiti, P. (2003): Development of Statistical Information System for Decentralised Planning,
Occasional paper no. 10 under Development Research Support Scheme, Department of Economics and
Rural Development, Vidyasagar University, East Midnapore, West Bengal.
(102)---------------, Pal, M. and Sinha, B.K. (1992): Estimating unknown Dimensions of a Binary matrix
with application to the estimation of the size of a mobile population, Statistics and probability, 220 –
233.
(103) Moser, Claus Adolf and Graham Kalton (1972): Survey Methods in Social
Investigation, 2nd edition, New York, Basic Books.
(104) Mooney, H. (1962): On Mahalanobis' contributions to the development of sample survey theory
and method, in C.R. Rao et al. (eds.), Contributions to Statistics, Pergamon Press.
(105) ------------------ (1967): Sampling Theory and Methods, Statistical Publishing
Society, Calcutta.
(106) Neyman, Jerzy (1934): On the two different aspects of the representative method, the method of
stratified sampling and the method of purposive selection, J. Roy. Statist. Soc. 97, 558-625.
(107) Neter, J. and Waksberg, J. (1965): Response errors in collection of Expenditures data by household
interview. An Experimental Study, Technical Report No. 11, U.S. Bureau of the Census.
(108) Newman, S. (1962): Difference between early and late respondents in a mailed survey, Journal of
Advertising Research, Volume 2, 37-39.
(109) Ognibene, P. (1970): Traits affecting questionnaire response, Journal of Advertising Research, Volume 10, 18-20.
(110) Pan, J.S. (1951): Social Characteristics of respondents and non-respondents in a questionnaire study
of later maturity, Journal of Applied Psychology, 35, 780-781.
(111) Politz, A.N. and Simmons, W.R. (1949): An attempt to get Not-at-Homes into the sample without callbacks, JASA, 44, 9-31.
(112) Platek, R. (1977): Some factors affecting non-response, Survey Methodology, 3.
(113) Plan, V.T. (1978): A Critical appraisal of household surveys in Malaysia,
Multipurpose household survey in developing Countries, Development Centre,
OECD, Paris.
(114) Reuss, C.F. (1943): Differences between persons responding and not responding to mail
questionnaires, American Sociological Review, 8,433-438.
(115) Rao, V.R. and Sastry, N.S. (1975): Evolution of a total survey design, The Indian Experience, Invited
paper presented to the International Association of Survey Statisticians Warsaw.
(116) Rogers Theresa F. (1960): Interviews by Telephone and in person: Quality of responses and Field
Performance, Public opinion quarterly, 40, 51-56.
•Report of Research Projects
(117) (1975-76): Cost Benefit Analysis of Rural Electrification, Project Leader
Professor J. Roy, Computer Science Unit, ISI, Calcutta.
(118) (1977-78): Calcutta Urban Poverty Survey, Project Leader Professor J. Roy,
Computer Science Unit, ISI, Calcutta.
(119) (1975-76): CMDA, Health Survey, Computer Science Unit, ISI, Kolkata.
(120) (1988-89): A Survey on Domestic Tourists in Orissa, Project Co-ordinator, P.
Maiti, ISI, Calcutta.
(121) (1994-95): An Enquiry into the Quality of Life in five communities in selected
districts of Rural West Bengal, Project Co-ordinator, P. Maiti, ISI, Calcutta.
(122) (1995): Community attitudes and Preferences pertaining to cemetery and cremation
related issues in the East Rand in the Republic of South Africa,
CENSTAT, HSRC, Pretoria, South Africa, Principal Statistician - P. Maiti.
(123) (1995): Survey of family and Community life in the Selected Communities of the
Cape Peninsula of the Republic of South Africa, CENSTAT, HSRC, Principal
Statistician - P. Maiti.
(124) (1995): The Socio-economic, demographic and cultural pattern of the female labour force participation in the
North West and the Cape; CENSTAT, HSRC, South Africa, Principal Statistician - P. Maiti.
(125) (1996): Stanza-Bopape Project; CENSTAT, HSRC, South Africa, Principal
Statistician - P. Maiti.
(126) (1998): Mid Term Review of IPPVIII in Calcutta Metropolitan Area, ISI,
Calcutta, Survey Statistician - P. Maiti.
(127) (1998): ISI-PWI Project on Strengthening Local Government in Madhya Pradesh,India, ISI, Kolkata – P. Maiti.
(128) (1999): ISI-HLL Collaborative Research Project on Business Research, ISI, Kolkata – P. Maiti.
(129) (2001, August): National Statistics Commission, Government of India
(130) (2009): Statistical Information System for decentralized planning with an application to District of Howrah – P.
Maiti, Preserved in the Prasanta Chandra Mahalanobis Memorial Archive and Museum, Kolkata.
(131)Roshwalb, Alan (1982): Respondent Selection Procedures within Households, American Statistical
Association 1982 Proceedings of the section on Survey Research Methods, 93-98.
(132) Rubin Donald B. (1983): Conceptual issues in the presence of non-responses, In William G. Madow
and Ingram Olkin eds. Incomplete data in Sample Surveys 2, Theory and Bibliographies, New York,
Academic, 123-142.
(133) -------------------(1977): formalizing Subjective notions about the effect of non- respondents in
Sample Surveys", JASA, 72, 538-543.
(134) ---------------(1978): Multiple imputations in Sample Surveys - A Phenomenological Bayesian
Approach to non-response, American Statistical Association 1978 proceedings of the Section on Survey
Research Methods, 20-28.
(135) --------------(1987): Multi imputation for non-response in Surveys, New York, Wiley.
(136) Rizvi, M. Haseeb (1983): Hot-Deck Procedures Imputation, in William G. Madow and Ingram Olkin
eds., Incomplete Data in Sample Surveys, 3, Proceedings of the Symposium, New York, Academic, 351-352.
(137) Shamasastry, R. (1929): Translation of Kautilya’s Arthasastra, 3rd Edition, Mysore, Wesleyan
Mission Press.
(138) Sinha, Bikas (2006): Sample size determination in survey sampling, A lecture notes prepared for
the participants under UNDP programme.
(139)Stephen, Frederick F. (1948): History of the uses of Modern Sampling Procedures, JASA, 43, 12-39.
(140) Smith, T.M.F. (1976): The Foundations of Survey Sampling, A Review. , JRSS, 139A, 183-195
(141) Särndal, C.E., Swensson, B. and Wretman, J. (1992): Model Assisted Survey Sampling, Springer-Verlag,
New York, Inc.
(142) Scott, Christopher (1973): Experiments on recall error in African Budget Surveys,
paper presented to the International Association of Survey Statisticians, Vienna.
(143) Shah, Nasra M. (1981): Data from Tables used in the paper presented at Weekly Seminar of East
West Population Institute, October 28, Honolulu.
(144) Skelton, V.C. (1963): Patterns behind income refusals, Journal of Marketing, Volume 27.
(145) Sudman, Seymour (1976): Applied Sampling, New York, Academic.
(146) Suchman, Edward A. (1962): An analysis of bias in Survey Research, Public Opinion Quarterly, 26,
102-111.
(147) Scheaffer, Richard L., Mendenhall, William and Ott, Lyman (1979): Elementary Survey Sampling, 2nd
edition, North Scituate, MA: Duxbury Press.
(148) Szameitat, Klaus and Schaffer, Karl August (1963): Imperfect Frames in Statistics and the
consequences for their use in sampling, Bulletin of the International Statistical Institute, 40, 517-544.
(149) Singh, Bahadur, Sedransk, Joseph (1978): A two phase sampling design for estimating the finite
population mean when there is non-response, In N. Krishnan Namboodiri ed. Survey Sampling and
measurement, New York, Academic, 143-155.
(150) Thompson, Ib and Siring, E. (unpublished): On the causes and effect of non- response, Norwegian
Experiences, Central Bureau of Statistics, Norway.
(151) Tuygan, Kuthan and Cavador, Tevfik (1975): Comparison of self and presale responses related to
children of ever married women. In laboratories for population Statistics Scientific Report Series No.
17,22-28.
(152) Turner, Anthony G., Woltman, Henry F., Fay, Robert and Carlson, Beverly (1977): Sample Survey
Design in developing Countries - three illustrations of methodology, Bulletin of the International
Statistical Institute.
(153) U. S. Bureau of the Census (1974): Standards for the discussion and presentation of errors in data,
Technical Report No. 32.
(154) …………… (1976): An overview of population and housing census
evaluation programmes conducted at the U.S. Bureau of Census, Census Advisory Committee of the
American Statistical Association.
(155) Verma, Vijay (1980): Sampling for national fertility
surveys, World fertility survey conference, London.
(156) Warwick, Donald P. and Charles A. Lininger (1975): The
Sample Survey, Theory and Practice, New York, McGraw-Hill.
(157) Woltman, Henry and Bushery, John (1975): A panel bias
study to the national crime survey. Proceedings of the Social
Statistics Section, American Statistical Association.
(158) Williams, W.H. and Mallows, C.L. (1970): Systematic biases
in panel surveys, JASA, 65, 1338-1349.
(159) Warner, Stanley L. (1965): Randomised Response, A
Survey Technique for eliminating Evasive answer bias, JASA, 60,
63-69.
(160) Zarkovich. S.S. (1966): Quality of Statistical data, Rome;
FAO of the United Nations.
THANK YOU!!
PART II: Bayesian Mode of Analysis:
2. When one is not interested in how the data were
collected but, given the data, in
how the analysis can be made: Bayesian Mode of
Analysis.
The projects where stochastic models were developed
and used for explaining the process and estimating the
parameters involved, using the Bayesian mode of analysis, fall
under this category. Transformation of the real-life
problems into statistical ones required a series of
discussions with technical experts at different levels of
the organizations. In fact, the project formulations were
not routine work; instead, definitions and other
related concepts were defined, developed and redefined
within the framework of the present problems.
• 2.1. Indian Statistical Institute - Hindustan Lever Limited
(ISI-HLL) collaborative
project on Business Research (October 1999 - August 2000):
• Objective: A corporate sector company engaged in
producing some products, particularly consumer products,
would like to know, in the face of uncertainty, the buying
behaviour during a given period in terms of (a) the
proportion of buyers purchasing a given brand and no other
brand; (b) the proportion of buyers purchasing a given brand or
a 'combination of brands' at least once, out of those who
did purchase the same brand at least once prior to the
period under consideration; (c) the average purchase frequency
of a particular brand or a 'combination of brands'; (d) the
market penetration of a brand or a particular 'combination
of brands'. The purpose of gathering this information is to
make predictions, if possible, about the demand for its
products in the market at some future date.
• There are some models of buying behaviour
available in the existing literature on
Market Science. Among these, the Ehrenberg
Bayesian model appeared to have given
good results in western countries. One question
of importance to HLL
was whether this Ehrenberg model could be
applied to consumer panel data in the Indian
context.
• At the request of the General Manager,
Business Research and Corporate Planning,
Hindustan Lever Limited (HLL), this project was
undertaken at the Indian Statistical Institute,
Kolkata to examine the validity of the Ehrenberg
model.
The appropriateness of the distributional
assumptions of the model, namely,
• the assumption of the negative binomial
distribution for the number of purchases and
• the assumption of Beta priors on pj, the
choice probability of the jth brand in a product
group, was examined.
Some technical novelties of the work were as follows.
• The data used for the analysis consisted of, among
others, the information on household identification number, brand
code, month number and quantity of the product purchased. The
number of purchases was the basic input of the model. Thus, for the
model verification exercise, the data on the quantity (in grammes)
purchased of each brand by the households had to be converted
into the number of units. This required knowledge of the
standard size of each brand. In the absence of precise information
on this, and also for the sake of flexibility in our
analysis, we used alternative sizes, viz. 125 grammes, 250
grammes, 500 grammes and 1000 grammes, with proper rounding
off, to obtain the number of units. By this procedure, we generated
four sets of data on the number of purchases of each brand made by
each household in each of the months. The data on the number of
purchases based on the 1000 gramme size were in agreement with the
experience of the practitioners at HLL.
• "Empirical Bayes" as well as "Hierarchical Bayes"
estimates for the model parameters were obtained;
• The data failed to support the assumption of a negative
binomial distribution for the number of purchases. This
prompted us to make some further studies, for each
income group separately, treating the data as if they resulted from a
mixed distribution with some mass p at n = 0 (n being the
number of purchases) and a binomial of (3,1-p);
• It was interesting to observe that there was a good
fit for the distribution of the number of brands. The
distribution appeared to follow a truncated geometric
distribution (truncated at zero);
• The assumption of a Beta prior on pj, the choice probability for
the jth brand, also appeared not to be tenable in the
Indian data context;
• The reasons why the model did not fit the Indian data
were investigated. One of the reasons was that,
unlike the European family, the number of decision
makers in an extended family in a country like India
may be many. The model was revised by introducing
into it a new stochastic variable, namely the number of
decision makers at the household level, to examine if
the revised model was supported by the Indian data.
• After classifying the households according to family
size and income group, buying behaviour across such classes was
also examined to detect any pattern of similarity or
otherwise. However, buying behaviour did not appear
to have been affected by such classifications.
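Two of the steps in the list above lend themselves to a short sketch: converting purchased quantities into numbers of units under the four alternative pack sizes, and fitting a zero-truncated geometric distribution to the number of brands. The code below is an illustration only. The rounding rule (round half up), the record layout and the parameterisation P(K = k) = theta * (1 - theta)^(k - 1), k = 1, 2, ..., are assumptions, since the slides do not state them, and all function names are hypothetical.

```python
from collections import Counter

# Alternative pack sizes (in grammes) mentioned in the text.
PACK_SIZES_G = (125, 250, 500, 1000)


def to_units(quantity_g: int, pack_size_g: int) -> int:
    """Convert a quantity in grammes to a number of units (round half up).

    Integer arithmetic is used so that exact halves always round up.
    Note: small positive quantities can round to zero units; how such
    cases were treated in the project is not stated in the source.
    """
    return (2 * quantity_g + pack_size_g) // (2 * pack_size_g)


def purchase_datasets(records):
    """Build one purchase-count data set per assumed pack size.

    `records` holds (household_id, brand_code, month, quantity_g) tuples.
    """
    return {
        size: [(hh, brand, month, to_units(qty, size))
               for hh, brand, month, qty in records]
        for size in PACK_SIZES_G
    }


def fit_truncated_geometric(brand_counts):
    """Maximum-likelihood estimate of theta from counts k >= 1.

    For the assumed pmf the estimate is 1 / (sample mean).
    """
    if not brand_counts or min(brand_counts) < 1:
        raise ValueError("counts must be non-empty and at least 1")
    return len(brand_counts) / sum(brand_counts)


def truncated_geometric_pmf(k: int, theta: float) -> float:
    """P(K = k) = theta * (1 - theta) ** (k - 1) for k = 1, 2, ..."""
    return theta * (1.0 - theta) ** (k - 1)


def observed_vs_expected(brand_counts):
    """Pair observed frequencies with fitted expected frequencies."""
    theta = fit_truncated_geometric(brand_counts)
    n = len(brand_counts)
    observed = Counter(brand_counts)
    return {k: (observed[k], n * truncated_geometric_pmf(k, theta))
            for k in sorted(observed)}
```

The observed and expected frequencies returned by `observed_vs_expected` could then be compared with a goodness-of-fit statistic of the analyst's choice.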
2.2. ISI-ONGC Collaborative Research Project (1985-1992)
Objective: At the request of Oil and Natural Gas Commission (ONGC) of India,
the Indian Statistical Institute evaluated the economic and physical consequences
of various strategies for action in different basins in India. Both economic and
stochastic models were used for estimating reserve and the per unit cost of
hydrocarbon. (Ref Estimation of Discovery and Production Costs of Hydrocarbon
with some application to Indian Data. Indian Statistical Institute, Calcutta).
In fact, the project formulation was not routine work; instead, definitions and
other related concepts were defined, developed and redefined within the framework
of the present problem. For example, the 'reserve in place' was distinguished from
the 'recoverable reserve'. Transformation of the real life problem into the
statistical one required a series of discussions with the technical experts
working at the different levels of the organisation.
The following were the technical novelties among others:
• The available primary cost data were in the form of well-wise cost.
The well-wise cost figures were measured at current prices, and
therefore, the cost data for different years were non-comparable. In
order to make them comparable, it was necessary to deflate them
using a suitable index number of well-drilling cost. No such index
number was available, so a new index was constructed and applied
to the given data;
• The question of how to aggregate and what economic models to
choose had to be resolved;
• To test the constancy of the success ratio in hydrocarbon exploration,
data were examined through a number of statistical devices, some of
which were based on graphical representations, while the others
used standard statistical techniques.
•A fully Bayesian Hierarchical method, which provides
better estimates of errors in estimation and prediction, was
sought. But because of analytical complications, an
empirical Bayesian view was taken in predicting the
(n+1)th discovery, given the data on past discoveries. Two
types of simulation estimates were provided - one based on
the assumption of an approximate Gamma distribution of the
field sizes, whereas the second alternative used a Gamma
population and employed the classical method of
'importance sampling' for adjustment. Both methods
involved novel methods of simulation developed by us.
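The classical idea of importance sampling with a Gamma population can be sketched as follows. This is only an illustration of the general technique, not the simulation method developed in the project: the target and proposal parameters are hypothetical, and the quantity estimated (the mean of the target Gamma law) is chosen only because its true value is known.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def importance_sampling_mean(target, proposal, n, rng):
    """Estimate E[X] under `target` using draws from `proposal`.

    Both arguments are (shape, scale) pairs of Gamma distributions.
    Each draw is reweighted by the density ratio target / proposal.
    """
    total = 0.0
    for _ in range(n):
        x = rng.gammavariate(*proposal)  # draw from the proposal law
        log_w = gamma_logpdf(x, *target) - gamma_logpdf(x, *proposal)
        total += math.exp(log_w) * x     # weight the draw toward the target
    return total / n


rng = random.Random(0)
target = (2.0, 3.0)    # hypothetical field-size law; true mean = 2 * 3 = 6
proposal = (2.0, 4.0)  # hypothetical sampling law with a heavier tail
estimate = importance_sampling_mean(target, proposal, 50_000, rng)
print(estimate)
```

A proposal with a heavier tail than the target keeps the weights bounded, which is why the scale of the proposal is taken larger than that of the target in this sketch.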
Where the use of sample survey technique is itself an error
(complete enumeration):
One may not always need an estimate of an aggregate
population characteristic; instead, one may require information
on specified items for every population unit. A sample survey
is not appropriate in such a situation, and its use would itself
be an error. The following project was of this kind.
Objective: Planning for development involves four different
types of activities: formulation, implementation, monitoring
during implementation and evaluation on completion. To
carry out each of these activities, relevant, reliable and timely
information is needed at every stage.
The 73rd and 74th amendments of the constitution made by the
Government of India have squarely laid the responsibility of local
planning on local bodies. It stands to reason that information needed for
such planning should also be collected and managed by local bodies, and
hence the Government of India had requested the Asian Development
Bank (ADB) for technical assistance to strengthen the local government of
Madhya Pradesh as part of the reform agenda of the state. At the request
of Price Water House India (PWI), which was one of the prime contractors
to the Bank for this purpose, an interfirm agreement between PWI and ISI
was made to develop the Statistical Information System (SIS) as a part of
the project assignment. The SIS was envisaged to be a statistical database
for rational decision making. It was expected to address the information
needed for planning at Panchayat, Janpad, district and higher levels. This
was developed in accordance with the 73rd/74th amendments of the
constitution of India, 1992.
The work involved in developing the SIS consisted of
(a) identification of required data items,
(b) designing of formats of data collation, collection and
compilation,
(c) specification of output formats, amenable to a
computerised database, and
(d) organisation of workshops in collaboration with the
state government to finalise the methodology of data
collection and data formats.
The SIS developed for decentralised planning has the
following major components:
• Computer hardware forming the container of
Statistical Information;
• Computer software to process the information;
• Statistical data, the actual content of the system;
Highlights of some of the Technical Features:
•Extensive surveys were made of the areas of activities listed in
the 11th schedule of the 73rd amendment and in the 12th schedule
of the 74th amendment, of the lists of items prepared by the
expert committee on small area statistics, and of the report by
the Hanumanth Rao Committee.
•A Survey on identification of the availability of required
information with an analysis of the existing data gap was
made.
•Twelve (12) rural and thirty one (31) urban schedules, with
indications of the respective sources of data, were developed to
serve as the SIS input manual.
•The appropriate method of data collection was suggested.
•Output formats were designed for (i) general information on
variables considered for decentralized planning, (ii) query
based information, (iii) report based information and (iv)
summarized information.
Schematic diagram of the work involved in development of the
Statistical Information System (SIS)