1. Random Variation in Demographic Behavior
Demography is one of the many fields of science where statistical theory can be put to good use. Typical events of interest to demographers are births, deaths, the formation and disruption of marital and nonmarital unions, and geographical migration. The quintessential determinants of demographic behavior are sex, age, and cohort or calendar period, and other typical individual-level factors are race, social and family background, ethnicity, religious orientation, labor-force participation, and educational attainment. There may also be contextual determinants, such as institutional settings, laws and regulations (including public policies), and other collective features that individuals face. Aspects of aggregate behavior in the same population may also be important. (When many people already cohabit in nonmarital unions, individuals may find it easier to form such a union themselves. When divorce is common, dissatisfied spouses may find it easier to dissolve their own marriage.) To handle this multitude of features, individual demographic behavior is described in terms of event-history models where individuals move between predefined statuses under the influence of personal or group characteristics, of their own personal history, of contextual factors, and of period influences in the population they belong to.
Since any human population has finitely many members, there is an amount of random error in analytic procedures based on individual-level data. If life histories are selected in some random manner from a larger population, as in a survey sample, then some identifiable part of the error may be due to the sampling procedure, but another part is caused by the fundamental randomness of the individual process paths. Stochastic variation does not arise only from sampling variation in survey data, or from deliberate randomization of experiments. Whether the data were obtained for a sample or for the whole population segment in question is a separate issue. The order of magnitude of the random variation involved usually depends on the number of contributing individuals, not on whether the data come from a sample survey or not. As was expounded by Westergaard (1880) more than a century ago and repeated many times over by people like Udry et al. (1979) and by Brillinger (1986) and his discussants, random variation is intrinsic to data on demographic behavior, just as it is for data from other social and behavioral sciences.
Demography may be special in that some of its data come from populations large enough to permit the investigator essentially to disregard randomness in the corresponding vital rates. In such cases, the challenge is to explain any irregularities in plotted curves of vital rates as local variations of substantive interest, or as caused by systematic registration errors. Significance tests and similar statistical tools may then become less important. This does not preclude random variation from being prominent in data from small populations and from smallish segments of national populations.
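The dependence of random variation on population size can be illustrated with a rough calculation. The sketch below assumes that the number of events in a group behaves approximately like a Poisson count, so that the relative standard error of the observed rate is about one over the square root of the expected number of events. The rate of 0.01 events per person-year and the three exposure totals are hypothetical values chosen for illustration only.

```python
import math


def relative_standard_error(expected_events: float) -> float:
    """Approximate relative standard error of an observed rate.

    Assumes the event count is roughly Poisson, so that
    SE(rate) / rate is about 1 / sqrt(expected number of events).
    """
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    return 1.0 / math.sqrt(expected_events)


# Hypothetical underlying rate (events per person-year) and exposure totals.
ASSUMED_RATE = 0.01
EXPOSURES = (1_000, 100_000, 10_000_000)

# Relative standard error for each exposure total.
results = {
    person_years: relative_standard_error(ASSUMED_RATE * person_years)
    for person_years in EXPOSURES
}

for person_years, rse in results.items():
    print(f"{person_years:>12,} person-years: relative standard error ~ {rse:.1%}")
```

Under these assumptions the relative standard error falls from roughly 32 percent to about 3 percent and then to about 0.3 percent as exposure grows by factors of one hundred, which is why randomness can often be disregarded in national data but not in small population segments.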
2. The Event-history Approach
In event-history analysis, demographic events are seen as transitions between states in a suitable state space. The factors that influence the transitions may be fixed or may change over time; they may be exogenous to the individual history or endogenous to it; and they may be recorded in an available set of individual-level data or be unobserved causes of observable selectivity. Early behavior in a process may influence later behavior in the same process, as when age at first birth is shown to influence an individual’s later childbearing behavior; this is one of the most persistent findings in family demography. Individual behavior in one arena may interact with that in another arena, as when a woman’s educational attainment and labor-force participation are seen as influencing her childbearing while, conversely, having children determines her chances of improving her educational level or maintaining her ties with the labor market.
The states between which transitions are made are demographic (and other) statuses like ‘never married,’ ‘in a consensual union,’ ‘married,’ ‘childless,’ ‘of parity n’ (having had precisely n live births), ‘under education,’ ‘in the labor force,’ and so on. When the state space is not too complex, it can sometimes be represented graphically as in Fig. 1, where the boxes represent the various statuses in question and the arrows indicate possible transitions for an analysis of first births before and in first marital or nonmarital unions. There are moves in three dimensions in Fig. 1, namely (a) changes in civil status (never married, married, in consensual union), (b) school enrollment (under education or not in school), and (c) parity (in this case childless or of parity 1). Arrows are formatted to indicate transitions of prime interest, competing risks, and other transitions. As is typical for such diagrams, some states and transitions have been left out to simplify the graphic, namely (a) the moves in and out of education for individuals living in partnerships, and (b) the disruption of existing partnerships. In empirical analyses one would account for (though possibly not analyze) such transitions as well.
A focus of attention is the intensity or hazard $\lambda_{ij}(t; x(t))$ of transition from some state $i$ to some other state $j$ at time $t$, for a suitable set of pairs $(i, j)$ and a suitable vector $x(t)$ of determinants. The basic time variable $t$ typically is the individual’s own age or else the duration since a given event (duration of partnership, age of youngest child). Theories about individual demographic behavior are reflected in the specification of these elements, including the choice of $t$, the selection of covariates in the vector $x(t)$, and the form of the intensities $\lambda_{ij}$. For instance, an investigation of stepfamily fertility may address the hypothesis that for comparable partnerships, the rate of arrival of their first common child should be independent of the number of children that each partner had before the union was formed (the union-confirmation hypothesis). This would be reflected in the specification of a first-birth (or first-conception) intensity.
In general, the intensity $\lambda_{ij}$ is the probabilistic counterpart of the empirical rate of transition from $i$ to $j$. In the simplest case the latter is an occurrence/exposure rate $\hat{\lambda}_{ij}(g) = D_{ij}(g)/R_i(g)$ (also called a vital rate of the first kind) of some population group $g$ in a given range for $t$ during a particular period. The occurrences $D_{ij}(g)$ and exposures $R_i(g)$ are recorded for the group $g$ in question. Under such circumstances, $\hat{\lambda}_{ij}(g)$ is the maximum-likelihood estimator for the intensity; it is asymptotically normally distributed with the intensity as its mean and an asymptotic variance that can be estimated by $\hat{\lambda}_{ij}(g)/R_i(g)$, and different such rates are asymptotically independent of each other. This means that there exists a statistical theory for conventional demographic rates, a theory which can be used for the purpose of estimation, testing of hypotheses, the computation of confidence intervals, and so on, in the normal manner of statistical analyses.
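As a minimal sketch of how this theory can be put to work, the following code computes an occurrence/exposure rate $D/R$, its estimated standard error $\sqrt{\hat{\lambda}/R}$, a normal-approximation confidence interval, and a z statistic for comparing the rates of two groups on the basis of their asymptotic independence. The function names and the counts for the two groups are hypothetical and serve only as an illustration; they are not taken from any data set discussed here.

```python
import math


def occurrence_exposure_rate(occurrences: float, exposure: float):
    """Return (rate, standard error) for an occurrence/exposure rate.

    rate = D / R, with estimated asymptotic variance rate / R = D / R**2.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    rate = occurrences / exposure
    standard_error = math.sqrt(rate / exposure)
    return rate, standard_error


def confidence_interval(rate: float, standard_error: float, z: float = 1.96):
    """Normal-approximation interval (about 95 percent for z = 1.96)."""
    return rate - z * standard_error, rate + z * standard_error


def compare_rates(d1: float, r1: float, d2: float, r2: float) -> float:
    """z statistic for the difference between two independent rates."""
    rate1, se1 = occurrence_exposure_rate(d1, r1)
    rate2, se2 = occurrence_exposure_rate(d2, r2)
    return (rate1 - rate2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical counts: first births and person-years at risk in two groups.
rate_a, se_a = occurrence_exposure_rate(120, 1500.0)
rate_b, se_b = occurrence_exposure_rate(90, 1600.0)
low_a, high_a = confidence_interval(rate_a, se_a)
z_statistic = compare_rates(120, 1500.0, 90, 1600.0)

print(f"group A: rate = {rate_a:.4f}, interval = ({low_a:.4f}, {high_a:.4f})")
print(f"group B: rate = {rate_b:.4f}, standard error = {se_b:.4f}")
print(f"z statistic for A versus B = {z_statistic:.2f}")
```

The normal approximation is reasonable only when the number of occurrences is not too small; with few events, an interval on the logarithmic scale or an exact Poisson interval may be preferable.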
Various ways of computing demographic occurrence/exposure rates are discussed in Demographic Techniques: LEXIS Diagram. Keiding (1990) has given a probabilistic interpretation of the Lexis diagram in epidemiological research. Many other demographic practices can similarly be given a useful underpinning in statistical theory, as we show in what follows.
3. Individual-level Demographic Models
This three-part expression, which was developed by Heligman and Pollard (1980), is based on considerations similar to those of Thiele's mortality formula.
In an unorthodox approach, Ansley Coale (1971) fitted a model for first-marriage formation to first-marriage rates 'of the second kind,' i.e., to rates where the number of marriages formed at a certain age is divided by the total person-years at that age, including person-years lived after first-marriage formation, and not only divided by the person-years exposed to risk as in a rate of the first kind. (See Demographic Models and Demographic Techniques: Rates of the First and Second Kind.) The statistical theory for rates of the second kind is useful for the legitimacy it gives them in empirical analyses and for the further developments it has led to (Finnas 1980; Borgan and Ramlau-Hansen 1985).
Another class of procedures that can benefit from being brought into the realm of statistical theory is what demographers know as Brass's relational models (Brass 1974, etc.). They can be described in elementary probabilistic terms as follows. Given a 'standard' probability distribution function $H(x)$, a whole family of further distribution functions $\{H_{ab}(x);\ -\infty < a < \infty,\ b > 0\}$ can be generated by the ‘relational transformation’ $l\{H_{ab}(x)\} = a + b\,l\{H(x)\}$, where for $0 < y < 1$, $l(y)$ is some function with a positive derivative everywhere. Brass chose $l(y) = \ln\{y/(1 - y)\}$ and $l(y) = -\ln(-\ln y)$.
He used the former for a relational transformation of $H(x) = 1 - l_x$ and the latter for a corresponding transformation of $H(x) = F(x)/F(50)$, where $F(x)$ is a set of age-specific fertility rates cumulated up to age $x$. If we let $\Psi(x) = l^{-1}(x)$, then $\Psi(x)$ is another probability distribution function and $l(y)$ is its percentile function. Brass's two choices of $l(y)$ give $\Psi(x) = 1/(1 + e^{-x})$ and $\Psi(x) = \exp(-e^{-x})$, a logistic distribution and an extreme-value distribution, respectively. We can rewrite the generating relation as $H_{ab}(x) = \Psi\{a + b\,\Psi^{-1}[H(x)]\}$. Once it has been established that we are dealing with parameters in a class of statistical models, then the usual considerations from statistical theory can be brought to bear on the issues involved. If $H$ and $\Psi$ are given, then $a$ and $b$ can be estimated in the usual way for statistical parameters, and their statistical properties may be known. If one does not want to use an outside 'standard' distribution for $H$, this function can perhaps be estimated from the data by a method whose statistical properties one knows. In either case, statistical principles are useful. One of these may be that $\mu_{ab}(x) = h_{ab}(x)/\{1 - H_{ab}(x)\}$ is more sensitive to subgroup differences than the distribution function $H_{ab}(x)$ itself is. (With $h_{ab}(x) = dH_{ab}(x)/dx$, $\mu_{ab}(x)$ is the hazard function of $H_{ab}$.) Optimal fitting may not be what one strives for, but when it is, one may be well advised to use the hazard as a basis for model fitting, model testing, and so on, as usual. In this connection it may pay to heed the warning that Feller gave against fitting models to sigmoid curves as far back as 1940 (Feller 1940). Demographic models for schedules of age-specific rates (and other similar schedules) are valuable in the analysis of inaccurate or incomplete data and as tools for population forecasting and other kinds of 'technical' computations.
They have a place in data-quality assessment, though such usage should be tempered by the fact that the demographic models themselves may be inaccurate representations of reality. (This is demonstrated by the perennial need to revise model life tables and by amusing report titles asking why we get such lousy fits; see Lockridge and Brostrom 1983.) Demographic models are also useful for smoothing age-specific demographic schedules, and some such efforts may be for the purpose of removing random variation.
In most cases, the probabilistic approach provides useful guidelines and helpful operational procedures. In one case, it even makes certain demographic procedures redundant. The whole theory of increment-decrement life tables may be replaced by parts of the theory of time-continuous Markov chains and some elements of numerical analysis (Hoem and Funck Jensen 1982). For more information about these issues, see Demographic Models, Multistate Transition Models in Demography, and Demographic Techniques: Rates of the First and Second Kind.
4. Standards and Standardization
Brass's use of a 'standard' probability distribution $H(x)$ follows an old tradition in demography. Finding and using a 'standard' schedule plays a prominent role in much demographic analysis of rate schedules $\{\lambda_x\}$. Such a standard is often sought outside of the current data set and is used to highlight special features of the data in hand. Once the relation to statistical theory has been established for an empirical procedure, the need for an outside standard may be removed and all parameters may be estimated from the current data, as was demonstrated by Xie and Pimentel (1992) for the celebrated Coale-Trussell formula (1996) for marital fertility. (See Demographic Models.) This approach can sometimes provide solutions to problems that have pestered the discipline for a long time.
For instance, the competition between direct and indirect standardization has been settled in favor of the latter (Hoem 1987), in the sense that when standardization is fully legitimate there exists an improved form of indirect standardization based on a maximum likelihood estimator which has minimal variance. This result relieves the user of the onerous job of demonstrating the suitability of a selected outside standard population for use in direct standardization, which is suboptimal in any case. Here is a sketch of the basic ideas.
5. Population-level Models
Besides the focus on individual-level behavior, a large part of demographic analysis is concerned with the dynamics of entire populations, collected in the theory of stable populations and its ramifications. The purpose is to investigate how the consequences of individual-level processes work their way up to the population level. In this connection as in others, a probabilistic approach gives a more satisfactory basis for the mathematical derivations and opens the way for results that cannot be formulated, let alone proved, in deterministic theory. Extinction theorems and stochastic ergodicity theorems in stable population theory fall in this category (Cohen 1979, 1982; Tuljapurkar 1989; and others). The probabilistic approach has inspired a new treatment of uncertainty in population forecasts (Cohen 1986). Keiding and Hoem (1976) have directly addressed long-standing concerns of demographers in a probabilistic vein. For more information about population dynamics, see Population Dynamics: Theory of Stable Populations, Population Dynamics: Classical Applications of Stable Population Theory, Population Dynamics: Momentum of Population Growth, Population Dynamics: Theory of Nonstable Populations, Population Dynamics: Mathematic Models of Population, Development, and Natural Resources, and Population Dynamics: Probabilistic Extinction, Stability, and Explosion Theorems.
See also Population Cycles, Formal Theory of. For material about population forecasts, see Population Forecasts and also Demographic Techniques: Small-area Estimates and Projections and Families and Households, Formal Demography of.
6. Conclusions
The present Encyclopedia article has been devoted to the merits of the probabilistic approach to demographic analysis. Such an approach is generally accepted in other fields of application (biostatistics, epidemiology, insurance mathematics), but in demography there is still considerable tension between those who feel that probabilistic notions are essential to the field's methodology and others who largely regard them at best as the icing on a cake. It is our contention that it does not pay to develop demographic procedures of analysis in isolation from statistical principles any more than it pays in other disciplines. The probabilistic approach is the way to a deeper understanding of inferential techniques as well as to further methodological development beyond intuitive reasoning.



