1.0 Purpose of this Document
This document describes the components of the National Contaminants Information System (NCIS). It begins by explaining the goals of the project and some of the history of its development. It then documents the main components of the system and the definitions used as the basis for organizing data and information in it. It also describes the roles of the regions in updating and maintaining the system.
Some of the material in this document is excerpted from more comprehensive documents. No attempt has been made to remove introductory material that is repeated between this document and others.
This document was prepared with the assistance of all of the regions. The electronic form is maintained at ISDM in Ottawa. Inconsistencies, errors or omissions should be reported to ISDM for correction.
2.0 Overview and Objectives of the NCIS
The NCIS started as a project funded by the Green Plan initiative of the Department of Fisheries and Oceans (DFO). It was recognized that other Green Plan projects would produce a variety of data but that these data, like similar data collected before the Green Plan, would not necessarily find their way into well-organized archives. The NCIS provides structures that describe the data collections from all of these projects and that hold the data themselves.
The NCIS contains information about data collected in fresh and salt waters in and around Canada. The data come from projects initiated under the Green Plan, but also from earlier studies and from international data exchange. The earliest records date from the early 1900s and the most recent are current. The contaminants referenced include chemicals considered hazardous, such as dioxins, but also substances such as nitrates and silicates that occur naturally in water. These latter are included so that background information is available through the same system as contaminant information.
The interpretation of data depends strongly on the collection techniques and circumstances. Without information about these techniques and procedures, the observations are of more limited use. The NCIS project therefore developed an organized archive and information system to store both the data and this information.
The system is built on a relational database that stores both data and information. It is a distributed system, with ISDM in Ottawa and the DFO regions taking part in its design, construction and operation. Access is through a client-server interface: each participant operates a server that holds its database, and client software running on one or more computers in each region allows queries to be made across the DFO communications network to examine holdings in the local region or in other regions. Once data of interest are identified, a request can be made for them. Depending on access privileges, the data, together with information pertaining to them, are extracted from the appropriate archives. The provision of data is not handled inside the NCIS itself; rather, data managers must contact the requester to transfer the data.
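The query path described above can be sketched as follows. This is a minimal illustration, not the NCIS client: the names `InventoryRecord`, `RegionalServer` and `query_all_regions` are hypothetical, and each regional Oracle server reached over the DFO network is replaced by an in-memory list of Inventory records. What it shows is that one query fans out to every region and returns descriptions of holdings only; transferring the data remains a separate step handled by data managers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InventoryRecord:
    """One Inventory entry describing a regional holding (hypothetical fields)."""
    region: str
    event_id: str
    constituent: str


@dataclass
class RegionalServer:
    """Hypothetical stand-in for one region's database server."""
    region: str
    inventory: list = field(default_factory=list)

    def query(self, constituent):
        # Only Inventory descriptions are returned; no archive data leaves the server.
        return [r for r in self.inventory if r.constituent == constituent]


def query_all_regions(servers, constituent):
    """Send the same query to every regional server and merge the hits."""
    hits = []
    for server in servers:
        hits.extend(server.query(constituent))
    return hits


pacific = RegionalServer("Pacific", [InventoryRecord("Pacific", "P-001", "Mercury")])
gulf = RegionalServer("Gulf", [InventoryRecord("Gulf", "G-014", "Mercury"),
                               InventoryRecord("Gulf", "G-020", "Nitrate")])
hits = query_all_regions([pacific, gulf], "Mercury")
print([h.event_id for h in hits])  # ['P-001', 'G-014']
```

A real client would also record which server each hit came from, so that the follow-up data request goes to the right regional data manager; here the `region` field plays that role.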
Software is also available to assist in the maintenance of the contents of the NCIS and to load new information into the database.
3.0 History of the Project
The development of the NCIS proceeded through contract work and work carried out by DFO personnel. Guidance for the project was provided by the Toxic Contaminants Data Management Working Group (TCDMWG), consisting of representatives from all regions and ISDM. This group met or held conference calls at roughly six-month intervals to plan the work to be carried out, allocate resources from the available funds and monitor progress.
Design of the NCIS began with a number of contracts let to AXYS Environmental Consulting Ltd., which conducted workshops in various regions to determine the requirements of the system and to gather ideas on how it should work. The following reports were produced:
- DFO National Contaminants Inventory Model Workshop (September, 1992)
- Inventory Model Design for a National Department of Fisheries and Oceans Contaminants Data Management System (January, 1993)
- Protocol Development Workshop for the Contaminants Information System (June, 1993)
- System Specification and Design for the Fisheries and Oceans Canada Contaminants Information System Inventory (November, 1993)
The contract reports were extensive and produced the initial design of the system. Many of the protocol tables and the overall design were set at this stage.
The overall design established a three-component system: general information would be held in a Directory, information about the data collected in an Inventory, and the observations themselves in Archives.
It was agreed that the information in the Directory would be common to all regions; for speed of access, identical structures and contents would be held in each region. The Inventory in each region would describe the contents of that region's Archives, so the structure of the Inventory is the same in all regions but its content varies. The Archives differ from region to region in both structure and content. Some regions already had archives, some SQL-compliant and some not; other regions had none.
A review of the design by the members of the TCDMWG identified some shortcomings in the initial design. At this stage it was decided that the system would be implemented as a distributed relational database, and that the only existing system meeting the requirements was the Oracle RDBMS. To take the next step and develop the Directory and Inventory in greater detail, a rapid development workshop was held at Oracle offices in Ottawa in March, 1994. Regional representatives discussed the system components, modified some of the concepts from the AXYS contract work and developed tables and attributes for the relational database. The workshop results, developed in the CASE 6 environment, were taken back to ISDM, and the tables and attributes were modified slightly as the design was implemented.
Subsequently, representatives of regions with no existing archives met to design a generic archive structure for both survey and experimental data. This structure was implemented in a number of regions.
Each region bears some responsibility for the overall NCIS. Generally, ISDM is responsible for loading and maintaining Directory information, and each region is responsible for loading and maintaining its own Inventory and Archives. In addition, responsibility for maintaining the protocol tables of the NCIS has been distributed.
Software is needed to load, maintain, query and extract information and data from the NCIS. Each region is responsible for access to its own archives. Some regions pooled resources to create software that supports loading data into the generic archive model and moving the required information into the regional Inventory. Likewise, resources were pooled to develop software to maintain the protocol tables and to query the Directory and regional Inventories over the DFO communications network (DFONet).
4.0 Major concepts
The NCIS rests on some underlying concepts and rules that need to be understood. There is the three-level structure touched upon briefly earlier. Information in the Directory is organized around projects, while information in the Inventory is organized around the concept of an event. Data are made available to users according to their sensitivity and their potential for misuse. Some of the data in the archives have been rated to help users evaluate their general applicability. Finally, provision of data also implies provision of information about the data. The sections below treat each of these items in greater detail.
4.1 Overall Structure
The NCIS has three major components, which form a logical structure for the data and information. The Directory holds information about projects (goals, deliverables, resources, people involved, organizations, documents, etc.); it is the repository of general information about the projects under which data were collected. The Inventory holds information about regional data holdings: the contaminants involved, the collection, storage and analysis procedures, where and when the observations were made, and any organisms studied. It allows a user to identify data of interest. Finally, the Archives are the repositories of the measurements themselves. The structure of the Archives varies from region to region. Some regions had archives that existed before the NCIS was started; others had none, and some of these collaborated to develop a generic archive structure.
The loading and maintenance of the contents of these three components is a shared responsibility, as described later. Further details of the structure of the components are given in a later section.
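The relationship among the three components can be sketched as a small data model. The record types and fields below are hypothetical simplifications (the real Directory and Inventory have many more tables and attributes); the sketch only shows how an archived measurement links to an Inventory event, which in turn links to a Directory project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Directory entry: general project information, identical in every region."""
    project_id: str
    title: str


@dataclass(frozen=True)
class Event:
    """Inventory entry: describes one regional holding and names its project."""
    event_id: str
    project_id: str
    region: str


@dataclass(frozen=True)
class Measurement:
    """Archive entry: an individual observation belonging to an event."""
    event_id: str
    constituent: str
    value: float
    units: str


def project_for(measurement, events, directory):
    """Trace a measurement through its Inventory event to its Directory project."""
    event = events[measurement.event_id]
    return directory[event.project_id]


directory = {"PRJ-1": Project("PRJ-1", "Example Green Plan survey")}
events = {"EV-7": Event("EV-7", "PRJ-1", "Pacific")}
m = Measurement("EV-7", "Mercury", 0.12, "ug/g")
print(project_for(m, events, directory).title)  # Example Green Plan survey
```

Because archive structures differ by region, the only link this sketch assumes an archive must provide is the event identifier.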
4.2 Events
The concept of an event is extremely important in the NCIS. For Inventory records to summarize the contents of the archives, some grouping of the data is necessary. In its simplest terms, an event is a collection of observations that share certain characteristics.
For the NCIS, there are two types of event. An Experiment Event is composed of the measurements collected under an experiment in which all of the conditions remained the same; that is, the hypothesis being tested, the analytical methods, the organism involved and so on did not change. Generally, Experiment Events describe data collected in a laboratory setting.
The second type is the Survey Event. Generally, data collected in the natural environment are treated as Survey Events, since such collections typically cannot control all of the environmental factors. Survey Events group observations that share, among other factors, the same sampling and analytical procedures and the same contaminant observed.
In both types of events, the overall rating of the data must be the same.
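One way to picture how Survey Events are formed is to group observations on the characteristics they must share. This sketch assumes each observation carries a sampling method, an analytical method, a constituent and a data rating; the text notes that other factors are also involved, so a real grouping key would be longer. The field names and the rating value "A" are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    station: str
    sampling_method: str
    analytical_method: str
    constituent: str
    rating: str
    value: float


def group_into_events(observations):
    """Group observations that share the key characteristics into candidate events.

    Observations with the same sampling method, analytical method, constituent
    and rating end up together; any difference starts a new group.
    """
    groups = defaultdict(list)
    for obs in observations:
        key = (obs.sampling_method, obs.analytical_method, obs.constituent, obs.rating)
        groups[key].append(obs)
    return dict(groups)


obs = [
    Observation("S1", "grab", "ICP-MS", "Cadmium", "A", 0.4),
    Observation("S2", "grab", "ICP-MS", "Cadmium", "A", 0.6),
    Observation("S3", "core", "ICP-MS", "Cadmium", "A", 0.5),
]
events = group_into_events(obs)
print(len(events))  # 2
```

Including the rating in the key enforces the rule that all observations in one event share the same overall rating.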
4.3 Access to information and data
All information about data collections that is found in the Directory and Inventory is freely available to any user. The point of the Directory and Inventories is to make known the existence of data.
Access to the data is less straightforward. Most data are collected using public funds and so are considered to be in the public domain. These data are classified as 'Open' and are available on request.
Most data collected for scientific purposes are in the public domain. However, provision is needed to ensure that scientists have enough time to publish their findings based on the data, and it is desirable to make the existence of these data known even before they are generally available. Likewise, it may be important to allow a scientist to consult with a requester to help avoid incorrect interpretations. In these cases the events are tagged with a classification of 'Consult'.
Finally, some data collected for monitoring or for health and safety purposes cannot be made freely available for legal reasons. Again, it was considered better to reference these data, so that it is known where they reside, than to wait until free access is granted. These data are classified as 'Restricted'. Access to them can be granted by senior DFO staff. A user who requests such data is notified that they are restricted and informed how to seek permission to obtain them.
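The three access classifications and the step each one triggers can be summarized in a few lines. This is an illustrative sketch only: the `Access` enum and the `next_step` and `releasable_without_review` functions are hypothetical, and the messages paraphrase the rules above rather than reproduce actual NCIS output. The classification governs the data only; Directory and Inventory descriptions stay visible in every case.

```python
from enum import Enum


class Access(Enum):
    OPEN = "Open"
    CONSULT = "Consult"
    RESTRICTED = "Restricted"


def next_step(access):
    """Describe what happens when a user requests data with this classification."""
    steps = {
        Access.OPEN: "release on request",
        Access.CONSULT: "consult the responsible scientist before release",
        Access.RESTRICTED: "notify the requester and explain how to seek "
                           "permission from senior DFO staff",
    }
    return steps[access]


def releasable_without_review(access):
    """Only 'Open' data can be released without a further check."""
    return access is Access.OPEN


for label in ("Open", "Consult", "Restricted"):
    print(label, "->", next_step(Access(label)))
```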
4.4 Data Rating
The data stored in the NCIS archives are rated by the regional data managers in collaboration with the data producer, who knows the data best. The rating follows a series of protocols developed by the University of Victoria.
4.5 Provision of Data and Information
Because of the nature of the data contained in the NCIS, substantial effort was made to store information about the data collection process as well. Using the query capabilities of the NCIS, a user can request that the qualifying data be extracted from the appropriate archives. This is done with due respect for the access privileges described above. However, proper interpretation of the data also requires the information about them, so a user also receives the complete contents of the Directory and Inventory tables relevant to the events selected. While this cannot prevent misinterpretation of the data, it puts the onus on the requester to take proper account of the conditions under which the data were collected and of any characteristics of the collection that may call for greater care in interpretation.
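Shipping data together with their descriptive records can be sketched as a function that assembles a package for the selected events. The dictionaries standing in for the archive, Inventory and Directory, and the `build_package` name, are hypothetical, and access checks are assumed to have been applied before this step.

```python
def build_package(event_ids, archive, inventory, directory):
    """Bundle measurements with the Inventory and Directory records that describe them."""
    package = {"data": [], "inventory": [], "directory": []}
    project_ids = set()
    for event_id in event_ids:
        event = inventory[event_id]
        package["inventory"].append(event)
        package["data"].extend(archive.get(event_id, []))
        project_ids.add(event["project_id"])
    # Each project is included once, even if several selected events belong to it.
    package["directory"] = [directory[pid] for pid in sorted(project_ids)]
    return package


archive = {"EV-1": [0.12, 0.15], "EV-2": [0.30]}
inventory = {
    "EV-1": {"event_id": "EV-1", "project_id": "PRJ-A", "method": "hypothetical method X"},
    "EV-2": {"event_id": "EV-2", "project_id": "PRJ-A", "method": "hypothetical method X"},
}
directory = {"PRJ-A": {"project_id": "PRJ-A", "title": "Example project"}}
pkg = build_package(["EV-1", "EV-2"], archive, inventory, directory)
print(len(pkg["data"]), len(pkg["inventory"]), len(pkg["directory"]))  # 3 2 1
```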
4.6 Constituents
Measurements of harmful substances in the environment are often accompanied by measurements of other physical variables such as temperature. Rather than restricting the suite of constituents referenced by the NCIS, all of the measured parameters are included.
A hierarchical structure for organizing the constituents was developed. At the most general level are Parameters, which provide a broad classification into categories such as hydrocarbons, metals and so on. At the next level of detail are the Groups, each a subcategory within a Parameter. Finally, there are the Constituents themselves, which are mostly the individual substances measured.
Chemical measurements can isolate the various isomers of a substance. In the design process, it was decided that the Inventory should not distinguish all of these minor variations. This has been extended so that the Constituents table contains entries that are themselves groupings of substances as well as, in some cases, the individual substances in those groupings. For example, Aliphatic Hydrocarbons is one of the listed constituents and represents a group of compounds. One compound in the group, Methane, is also an entry in the Constituents list. When an event in which Methane is measured is created, it can be tagged as measuring either Methane or Aliphatic Hydrocarbons. Consequently, if a user searches on the more specific entry (Methane), events classified as measuring Aliphatic Hydrocarbons will not be selected, and the query may miss some of the events in which Methane was measured. To ensure that all events of interest are found, one must either use a more general term from the hierarchy (such as one from the Groups or Parameters tables), or select both the specific compound and every grouping in the Constituents list that contains it.
The NCIS uses a list of chemical contaminants identified by their Chemical Abstracts Service (CAS) Registry Numbers. Each CAS number uniquely identifies a specific chemical substance throughout the chemical literature.
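The Parameter, Group and Constituent hierarchy maps naturally onto three linked relational tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the NCIS Oracle database; the table and column names and the example group names are hypothetical. It shows how a query at the Parameter level reaches every Constituent beneath it through the Groups.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parameters (
    parameter_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE const_groups (
    group_id INTEGER PRIMARY KEY,
    parameter_id INTEGER NOT NULL REFERENCES parameters(parameter_id),
    name TEXT NOT NULL
);
CREATE TABLE constituents (
    constituent_id INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES const_groups(group_id),
    name TEXT NOT NULL
);
"""


def constituents_under(con, parameter_name):
    """List every Constituent that falls under a Parameter, via its Groups."""
    rows = con.execute(
        """
        SELECT c.name
        FROM constituents AS c
        JOIN const_groups AS g ON g.group_id = c.group_id
        JOIN parameters AS p ON p.parameter_id = g.parameter_id
        WHERE p.name = ?
        ORDER BY c.name
        """,
        (parameter_name,),
    )
    return [name for (name,) in rows]


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO parameters VALUES (?, ?)",
                [(1, "Hydrocarbons"), (2, "Metals")])
con.executemany("INSERT INTO const_groups VALUES (?, ?, ?)",
                [(10, 1, "Example hydrocarbon group"), (20, 2, "Example metal group")])
con.executemany("INSERT INTO constituents VALUES (?, ?, ?)",
                [(100, 10, "Methane"), (101, 10, "Aliphatic Hydrocarbons"),
                 (200, 20, "Mercury")])
print(constituents_under(con, "Hydrocarbons"))  # ['Aliphatic Hydrocarbons', 'Methane']
```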
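The second workaround, selecting a specific compound together with every grouping that contains it, can be sketched as a simple query expansion. The `GROUPINGS` mapping is hypothetical illustrative data; in the NCIS the relationship would come from the Constituents list itself.

```python
# Hypothetical: which grouping entries in the Constituents list contain which compounds.
GROUPINGS = {
    "Aliphatic Hydrocarbons": {"Methane", "Ethane"},
}


def expand_search_terms(constituent, groupings):
    """Return the constituent plus every grouping entry that contains it."""
    terms = {constituent}
    for grouping, members in groupings.items():
        if constituent in members:
            terms.add(grouping)
    return terms


def find_events(events, constituent, groupings):
    """Select events tagged with the constituent or with any grouping containing it."""
    terms = expand_search_terms(constituent, groupings)
    return [e["event_id"] for e in events if e["constituent"] in terms]


events = [
    {"event_id": "E1", "constituent": "Methane"},
    {"event_id": "E2", "constituent": "Aliphatic Hydrocarbons"},
    {"event_id": "E3", "constituent": "Mercury"},
]
print(find_events(events, "Methane", {}))         # ['E1']  (no expansion)
print(find_events(events, "Methane", GROUPINGS))  # ['E1', 'E2']
```

Without expansion, event E2 is missed even though it may contain Methane measurements, which is exactly the pitfall described above.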
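CAS Registry Numbers have a fixed layout (two to seven digits, a hyphen, two digits, a hyphen, and a single check digit), so a loading program could reject mistyped numbers before they reach the contaminants list. The sketch applies the standard CAS check-digit rule; whether the NCIS loaders actually perform such a check is not stated here, and `is_valid_cas` is a hypothetical helper.

```python
import re

# Two to seven digits, two digits, then one check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas):
    """Check the layout and check digit of a CAS Registry Number.

    The check digit equals the sum of the other digits, each multiplied by its
    position counted from the right (1, 2, 3, ...), taken modulo 10.
    """
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(position * int(d) for position, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas("7732-18-5"))  # True  (water)
print(is_valid_cas("74-82-8"))    # True  (methane)
print(is_valid_cas("74-82-7"))    # False (wrong check digit)
```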
4.7 Taxonomic codes
The NCIS needs a taxonomic code scheme to identify the organisms referenced in events. Rather than build a new one, it was decided to use an existing table; the one developed by the U.S. National Oceanographic Data Center was the most extensive available. Work was carried out to trim this table to the subset of entries relevant to the NCIS. Generally, only organisms living in aquatic environments were retained, which reduced the list to roughly 1500 entries. More information about this taxonomy can be obtained from the region that maintains the codes for the NCIS.
The development of the NCIS started with the Directory and Inventory models. The design by AXYS laid out the first draft structure of the system, and the Rapid Development Workshop in Ottawa largely completed its final design. Since then, tables have been added to assist in queries and code maintenance, and there have been minor changes to table attributes.
After the Directory and Inventory models were built, attention turned to archiving the data. While some regions had existing archives, others had none, so the design of a Generic Archive Model was undertaken. It initially handled observations resulting from Survey Events. The model was expected to be extended to include measurements from Experiment Events, but this proved more difficult than time permitted. Much of the development of this model was carried out by the Pacific and Maritime (Gulf) regions.