Introduction

Compiling longitudinal population information poses unique data-management challenges. Projects must maintain changing individual-level information on the composition and household structure of a large, geographically defined population. Events that arise (births, deaths, migrations, and so on) must be linked to individuals and other entities at risk of these events. These events affect not only demographic rates but also relationships within and between households. As event histories grow, records of new events must be logically consistent with those of past events. Even seemingly obvious checks that data meet minimal standards of integrity can require hundreds of lines of code. Relating critically needed auxiliary data to dynamic population registers poses further challenges. Morbidity and cause-of-death data must be entered, linked, and stored. Most DSS projects also maintain socioeconomic data, for example on marriage, family relationships, and economic conditions, because health is strongly correlated with socioeconomic status. These data must be logically consistent with other longitudinal data on the population at risk and with relationships among individuals under surveillance. Moreover, projects are often launched to assess the impacts of health technologies, service strategies, or policies. This requires procedures for entering, managing, and checking service information for internal consistency, as well as procedures to link this information to demographic histories. Variance in exposure to interventions must be monitored at the individual level, in conjunction with precise registration of demographic events and individual risk.
Maintaining a detailed record of demographic events, relationships, and exposure to risks or interventions requires complex data-management operations. These operations depend on a carefully controlled field-operation infrastructure to oversee and support data collection and entry, and on a comprehensive computer system for data management. Data-management systems required for this operation typically encompass thousands of lines of computer code. A key contribution of the INDEPTH network has been to share technology, offsetting the complexity of developing a data system, and to create a reference data model for storing DSS data. This generic model for data storage facilitates cross-site comparative analyses of the type described in this volume, as it standardizes data rules and concepts across sites. Future work of the network will address the need for generic analytical and data-management software compatible with the reference data model. This chapter outlines the features of this reference data model that pertain to the INDEPTH DSSs. In the not-too-distant past, developing DSS software was difficult, time-consuming, and prone to conceptual and programming errors. Software generators and object-oriented development tools greatly simplify the task of developing a complex system, once common principles of software structure are instantiated in a common applications framework. The mechanisms of INDEPTH have marshalled these software innovations to meet the collective needs of member stations. The reference data model will facilitate exchange of information, rapid development of site-specific data-management software and of common software for data analysis, and simpler technical-assistance and capacity-building operations.

Background

The work of the INDEPTH Technical Working Group (TWG) has been informed by the achievements, limitations, and future needs of projects in Bangladesh, Burkina Faso, Ghana, Indonesia, Mali, Senegal, South Africa, Tanzania, and Uganda.
One of the earlier systems, the Bangladesh DSS in Matlab District, was developed in the 1960s and has since been used for a wide range of studies of demographic dynamics, family planning, epidemiology, health-services research, and other issues (Rahman and D’Souza 1981; D’Souza 1984). Although the Bangladesh DSS has redeveloped its computer operations several times, its field operations have provided a model for a wide range of DSS applications in developing countries. The Bangladesh DSS precisely defined eligibility rules for members of a population under study; this, combined with a data system with rigorous logical-consistency checks, has provided high-quality data for many research papers. A number of software systems have been written, based on experiences with the Bangladesh DSS, including the Sample Registration System (Leon 1986a, b, 1987; Phillips et al. 1988; Mozumdar et al. 1990) and the Indramayu Child Survival Project of the University of Indonesia (Utomo et al. 1990). The DSS in Niakhar, Senegal, most recently described in Garenne (1997), has also influenced the technical design of a number of systems, including those of PRAPASS in Nouna, Burkina Faso (Sauerborn et al. 1996), and Agincourt, South Africa (Tollman et al. 1995). Garenne (1997) described the concept of entry–exit files (similar to the concept of “episodes” described here) as a means of modeling both intervals of residence at a location and intervals of relationships. Garenne also provided useful observations regarding the implementation of field and software systems for longitudinal population studies. To develop its data model, TWG synthesized the experience of these disparate applications. The model specifies a demographic “core” common to field stations doing longitudinal research on populations (MacLeod et al. 1991; Phillips et al. 1991). 
Sites have developed software systems to manage this demographic core, maintain a consistent record of significant demographic events in the population of a fixed geographic region, generate the registration books that fieldworkers use, and compute basic demographic measures, such as birth and mortality rates and the total fertility rate. These core capabilities establish a computational framework to which projects add their site-specific data and consistency specifications. The concept of a core also entails some generic principles of data collection and management that apply to all INDEPTH sites.

The INDEPTH concept of a data core

All participating sites in INDEPTH collect and maintain a common core of data. Attempts to standardize data processing have led to the concept of a “core system” that provides many of the common software requirements of field research laboratories and can be extended and modified to tailor software to various specifications. This concept is based on the principle that certain characteristics of households, household members, relationships, and demographic events are common to all longitudinal studies of human populations. The software required to collect, enter, and manage data can therefore be generic to a family of applications. TWG has identified these features of a core system common to all DSS operations. In this framework, the core system maintains a consistent record of baseline and longitudinal data on all households, household members, and their relationships in a geographically defined population, including births, deaths, migrations, and marriages. The core system maintains information on events and observation dates so that each entity in the study has a corresponding count of “person-days” at risk of demographic events. Core computer operations structure data and maintain logical integrity on the following basic elements of a household unit:
Although these are seemingly trivial items, mundane relationships tend to become complex and unwieldy when arrayed as a logical system of longitudinal population data, and portraying even simple relationships requires rigorous standards to avoid error. For example, to be counted as a death in a resident population, the household member concerned must be resident in the study area at the time of death; and a live birth to a woman 5 months after she gave birth to another child would be an inconsistent event. A central contribution of TWG has been to clarify such minimal system logic, so that the system prevents errors that would violate business rules and render the data useless. All INDEPTH computer systems maintain standard DSS-processing operations:
Most INDEPTH sites have also developed software for reporting outcomes and managing data:
Tailoring the core system

Given the basic core model for data structure, each site has developed site-specific applications using building blocks of the core framework, which allow software developers to construct additional modules for project-specific data. At nine INDEPTH sites, the core specification has been implemented with the standard tools of database-management packages, in an INDEPTH product known as the household-registration system (HRS).1 Other INDEPTH sites have developed project-specific core capabilities to maintain the logical integrity of birth, death, migration, and marriage data over time and in a format consistent with the reference data model. Each site modifies the core to accommodate new cross-sectional data, special longitudinal modules, or variable classes or labels that investigators want to add to field registers, along with logic to maintain the integrity of the new variables. The tools of commercially available database packages greatly facilitate the process of core modification, including standard features for easily adding data to the core system. For example, the HRS is built with the form menu (data-entry screen) and database builders of the Microsoft FoxPro system. These builders encourage and facilitate an object-oriented approach to software development through easily understood mouse and menu procedures. To make changes to the core, a programmer locates the database table, menu, or form object to be changed, then works with the small pieces of code, called code snippets, that are “attached” to the object. Some code snippets control the timing of data entry for a variable; others enforce rules of consistency. Some INDEPTH sites, such as Hlabisa, are developing similar capabilities using Microsoft SQL Server and Microsoft Access.

1 The HRS formed the basis for INDEPTH software systems in The Gambia, Ghana (Binka et al. 1995), Indonesia, Mali, Mozambique, Tanzania (three sites), and Uganda. Applications involve a wide range of INDEPTH studies, including family-planning research, malaria interventions, child and maternal health, and correlates of HIV transmission. The current INDEPTH data model improves on the original HRS and other INDEPTH systems by allowing investigators to track nonresident individuals, to include more general relationships rather than only marital ones, and to separate membership in social groups (such as the household or family) from location.

The reference data model

As explained in Chapter 3, a DSS tracks the presence of individuals in a defined study area. These individuals can enter and leave the study area in a small set of well-defined ways (for example, entering through birth or in-migration and leaving through death or out-migration). The INDEPTH reference model uses events to record the ways individuals enter (or return to) and leave the study area over time. Thus, events bracket the residency of any individual in the study area. In general, they occur in pairs, with one event (such as birth or in-migration) initiating a state and another event (such as out-migration or death) terminating that state. Use of episodes in the reference model makes this pairing of initiating and terminating events explicit. The concept of episodes is diagrammed in the centre section of Figure 4.1.

When a DSS tracks episodes, the “time resolution” of that tracking is very important. Below a certain time threshold, movements into or out of a particular place are not recorded. If a person leaves the physical location in the morning to go to the market and returns in the afternoon, this is not reflected in the DSS.
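The pairing of initiating and terminating events into episodes, together with the person-day counts mentioned earlier, can be illustrated with a short Python sketch. It is not INDEPTH or HRS code. The event codes (BTH, IMG, DTH, OMG), the class and function names, and the convention of counting person-days as elapsed days up to an analysis cut-off date are all assumptions made for the example. The sketch also enforces the consistency rule noted earlier: a death (or an out-migration) can only close an open residency episode.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical event codes for this sketch (not an INDEPTH standard):
# individuals enter the study area by birth or in-migration and leave it
# by death or out-migration.
START_EVENTS = {"BTH", "IMG"}  # birth, in-migration
END_EVENTS = {"DTH", "OMG"}    # death, out-migration


@dataclass
class Event:
    individual_id: str
    code: str
    event_date: date


@dataclass
class ResidencyEpisode:
    individual_id: str
    start: Event
    end: Optional[Event] = None  # None means still resident (open episode)

    def person_days(self, cutoff: date) -> int:
        """Elapsed days from the start event to the end event, or to the
        cut-off date if the episode is still open."""
        stop = self.end.event_date if self.end is not None else cutoff
        return (stop - self.start.event_date).days


def build_episodes(events: List[Event]) -> List[ResidencyEpisode]:
    """Pair one individual's initiating and terminating events into
    residency episodes, rejecting sequences that break the pairing rules."""
    episodes: List[ResidencyEpisode] = []
    current: Optional[ResidencyEpisode] = None
    dead = False
    # sorted() is stable, so same-day events keep their recorded order.
    for ev in sorted(events, key=lambda e: e.event_date):
        if dead:
            raise ValueError(f"{ev.code} on {ev.event_date}: event after death")
        if ev.code in START_EVENTS:
            if current is not None:
                raise ValueError(f"{ev.code} on {ev.event_date}: already resident")
            current = ResidencyEpisode(ev.individual_id, ev)
        elif ev.code in END_EVENTS:
            # A death or out-migration must terminate an open residency.
            if current is None:
                raise ValueError(f"{ev.code} on {ev.event_date}: not resident")
            current.end = ev
            episodes.append(current)
            current = None
            dead = ev.code == "DTH"
        else:
            raise ValueError(f"unknown event code {ev.code!r}")
    if current is not None:
        episodes.append(current)  # right-censored: still resident
    return episodes


if __name__ == "__main__":
    history = [
        Event("P001", "BTH", date(1998, 1, 1)),
        Event("P001", "OMG", date(1998, 3, 1)),
        Event("P001", "IMG", date(1999, 1, 1)),
    ]
    episodes = build_episodes(history)
    print(len(episodes))  # 2
    print(sum(ep.person_days(date(1999, 12, 31)) for ep in episodes))  # 423
```

An episode stays open (`end` is `None`) until a terminating event arrives, so a current resident contributes person-days up to the cut-off date. A fuller system would also check that all events belong to one individual, and would apply the site's time-resolution threshold before recording an absence as an out-migration.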
If a period of absence extends beyond a certain threshold (6 weeks, 3 months, or some other period), it becomes an episode to be recorded in the DSS. This threshold varies from project to project, but each project always makes it explicit. The time resolution for “in” episodes (the length of stay after which a visit becomes residency) should be consistent with the time resolution for “out” episodes (the length of absence after which a resident becomes an out-migrant). DSSs are concerned not only with the physical location or residence of individuals but also with their membership in social groups (such as households) and their relationships with other individuals (such as marital unions or parenthood). Many DSSs also need to reconstruct genealogies and to record isolated events, such as pregnancy outcomes or births and deaths external to the study area. To support field operations and routine data cleaning, a DSS must also keep track of where, when, and by whom a particular event was recorded. The reference model provides a number of fields for this purpose, which facilitate construction of a good-quality data set. Another challenge for demographic field operations is to correctly identify migrating individuals. To address this problem, the reference data model includes fields to designate the place a migrant is moving to or coming from. The INDEPTH reference model meets these requirements through its use of the following entities and the relationships between them (see Figure 4.1):
Figure 4.1. Reference Demographic Surveillance Data Model.
In summary, Figure 4.1 illustrates the entities and relationships of the INDEPTH reference data model. Mandatory fields and entities are displayed in bold type, whereas optional fields and entities are displayed in normal (nonbold) type.

The role of the reference data model in maintaining data integrity

As explained in Chapter 3, any DSS must maintain a large volume of data over an extended period. Unless specific measures are taken, the integrity of the data will suffer, along with the accuracy and reliability of the information in the system. INDEPTH has taken steps to foster common standards for data integrity, based on a well-defined relational model. Although not all systems have the same measures to protect data quality, the following have been proposed or used at one or more INDEPTH sites:
(The actual codes used to represent these standard values depend on the data type of the field and the natural value range of the data item. Care should be taken to exclude these values from quantitative analysis of the data.)
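One way to act on this caution is to filter out reserved codes before computing any statistic. The Python sketch below assumes that a site reserves codes outside each item's natural range, for example 98 and 99 for a mother's age in years. The field names and codes are hypothetical and serve only to illustrate the idea.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set

# Hypothetical reserved codes per field, chosen outside each item's natural
# value range (a mother's age at a birth is never 98; a birth weight is
# never 9999 g).
RESERVED_CODES: Dict[str, Set[int]] = {
    "mother_age_years": {98, 99},   # e.g. 98 = not applicable, 99 = unknown
    "birth_weight_g": {9998, 9999},
}


def analysable(field: str, values: Iterable[int]) -> List[int]:
    """Drop reserved codes so that only real measurements remain."""
    reserved = RESERVED_CODES.get(field, set())
    return [v for v in values if v not in reserved]


if __name__ == "__main__":
    ages = [24, 31, 99, 19, 98, 27]
    print(mean(ages))                                  # distorted by the codes
    print(mean(analysable("mother_age_years", ages)))  # 25.25
```

The reserved codes are kept in a single per-field table rather than hard-coded in each analysis. This reflects the point above that the appropriate codes depend on each field's data type and value range.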
Extending the core

Although the INDEPTH reference data model covers aspects common to all INDEPTH DSSs, it makes no attempt to specify all site-specific needs. However, it is designed to accommodate new components that meet the needs of a wide spectrum of longitudinal studies, without losing its clear overall structure. Several ways of extending it are presented in this section:
Social groups can be related to other social groups, or “first-level” social groups such as households can be members of “second-level” social groups such as clans or other types of networks. DSSs designed to track the interaction of households might define relationship and membership episodes for social groups to store this information. Households are normally associated with only one homestead, even if the members of the household reside in more than one physical location. When social groups are used to record households, this association can be depicted by an episode that records the start and termination of occupation at a physical location. Households also normally have a head. The head may change over time while the household retains its identity, so the head of household can be recorded either as an updatable attribute (“Current head of household”) or as a member of the social group. If the temporal dimension is important, the extension can be specified as an episode linking the household to an acting head of household. In summary, the reference data model provides a structure that accommodates great flexibility in the design of longitudinal studies. For this reason, INDEPTH includes sites engaged in various study designs, with a wide range of data-management needs. Despite this diversity, the model has a core of logic and structure that lends integrity to operations and provides a crucial foundation for technical collaboration among sites.

Conclusion

This chapter has described the data model that INDEPTH has developed as the guiding framework for processing data at member sites. It makes explicit the attributes common to most health and family-planning studies, and it serves as a structural framework for adding project-specific data. Much work still needs to be done to develop this model and a common data-processing system for INDEPTH core operations.
However, the common framework for data management has already facilitated data sharing within the network, and nearly one-half of all INDEPTH sites use a common software system for core operations. If this use of generic software is more broadly accepted, the INDEPTH data model could serve as the basis for sharing system development, capacity-building, and collaborative research.