Overview of The Database Design
The design of the NCIS Archive tries to achieve three things.
- Allow full documentation of all characteristics of samples and measurements. Many purpose fields exist as well as description fields to record additional supporting information for some of these fields. Also a flexible and expandable ‘meristics’ component exists to allow biological sample characteristics, biological responses, as well as other unforeseen information to be documented.
- Permit complex relationships between samples to be explicitly documented. This includes complex subsampling and pooling, and relating replicate samples.
- Minimize repetition of information in the database by allowing related samples to be ‘nested’. A sample lower in the nesting structure will ‘inherit’ the characteristics of any samples above it. In this way more general information need only be stored once, against one sample as high in the nesting structure as possible, such that this information applies to all samples below it.
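The nesting idea can be pictured with a small sketch. This is only an illustration, assuming a hypothetical `Sample` class with a parent link and a dictionary of attributes; it is not the actual NCIS schema or software.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for a nested sample record (not the NCIS schema)."""
    name: str
    parent: Optional["Sample"] = None
    attributes: dict = field(default_factory=dict)

    def effective(self, key):
        """Return the value stored on this sample or on its nearest ancestor."""
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        return None


# Sampling details are stored once, on the sample highest in the nesting.
es = Sample("ES#1", attributes={"sampling_method": "box core"})
sc = Sample("SC#1", parent=es, attributes={"sample_class": "Geophysical"})
core_slice = Sample("AS#1", parent=sc, attributes={"description": "top slice"})

# The AS stores no sampling method itself; the lookup walks up to ES#1.
print(core_slice.effective("sampling_method"))
```

Information stored against the ES is found for every sample below it, so it never has to be repeated on the lower samples.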
More on How to Report Sample Relationships
The following discussions are meant to provide a bit of overview on how entries in the different database tables go together to record all the details available about the samples you work with in your spreadsheet datasets.
They assume you have read the table and field descriptions. A lot of information presented there is not repeated here. If you haven’t read this material, you should refer to it as you work your way through the following.
An Environmental Sample (ES) can consist of one or more classes of samples (ie Sample Classes).
The idea is to include everything sampled at the same place and time, using the same sampling method, under one ES in order to minimize repeating the sampling information many times in the database.
Environmental Samples and Meristics
Create Meristics for any supporting information that is directly applicable to the particular ES you are dealing with.
- If the ES actually represents a whole organism (eg. the ES is a single fish that was caught), then you can create true biological meristics for sex, age, length etc, just as you would for an AS which represented a whole fish.
- Suppose the ES is a sediment core, and suppose also that a measurement of the sediment deposition rate was obtained from the sampling site. This rate represents a non-biological ‘meristic’ that would be appropriate to link to the ES.
There are four (4) sample classes (SC): Geophysical (sediments), Water, Air, and Biota.
You would need one SC record for each different class of sample which the sampling device collected.
For Biota, the organism(s) can be identified to any taxonomic level, not just to species level.
Suppose you took a box core of sediment. From this sample you took a subsample of the sediment for chemical analysis, and you then sieved another larger subsample to collect all the macro benthic infauna (clams, worms, etc) that were living in the sediment. The box core represents a single ES, but it also constitutes one SC for the sediment, and additional SCs for the biota: a separate SC for each species of organism identified.
Alternatively, consider the case where each species of clam was examined independently for some contaminants, but an additional sample was created by pooling material from all species in order to get enough material to look for a particularly rare and low-concentration contaminant. In this case you would also need a SC where the organism is specified as the lowest common taxon (eg. the Family, if all the clams belonged to the same taxonomic family).
- A trawl net is hauled up on deck. In the net are 3 species of fish. This represents one ES, and three SCs: one for each species of fish.
- A sediment core is collected. It is sectioned into 10 slices for the purposes of examining the vertical stratigraphy of contaminants with depth in the sea floor. This represents one ES, and one SC.
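The counting rule in these examples can be sketched as follows. The function name `sample_classes_needed` and the (class, taxon) pair layout are assumptions made for illustration only; they do not correspond to NCIS import tags.

```python
VALID_CLASSES = {"Geophysical", "Water", "Air", "Biota"}


def sample_classes_needed(collected):
    """Return the distinct SC records needed for one ES.

    `collected` is a list of (sample_class, taxon) pairs, one per item the
    sampling device collected; taxon is None for non-biota classes.
    """
    needed = []
    for sample_class, taxon in collected:
        if sample_class not in VALID_CLASSES:
            raise ValueError(f"unknown sample class: {sample_class}")
        key = (sample_class, taxon)
        if key not in needed:
            needed.append(key)
    return needed


# Trawl: four fish from three species gives one ES and three SCs.
trawl = [("Biota", "species A"), ("Biota", "species B"),
         ("Biota", "species A"), ("Biota", "species C")]

# Core: ten sediment slices gives one ES and a single SC.
core_slices = [("Geophysical", None)] * 10

print(len(sample_classes_needed(trawl)))
print(len(sample_classes_needed(core_slices)))
```

The number of SCs depends on how many distinct classes (and, for Biota, distinct taxa) were collected, not on how many individual pieces were later analyzed.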
A SC can yield many ASs, and any particular AS can only belong to one SC.
Groups of ASs can be related to each other in complex ways. For example it is common for samples (or portions thereof) to be pooled to form an additional ‘composite’ sample. These common relationships are illustrated with the following examples.
Some Examples of Common Sample relationships:
- Single fish, multiple tissues.
If meristics (eg. sex, age) were measured on the fish, create AS#1 for the fish as a whole organism, and record the meristics against this sample. Create AS#2 for the liver, and AS#3 for the muscle. To preserve the relationships create Produced-By (PB) and Contributed-To (CT) records as follows:
AS#1 CT AS#2, AS#3
AS#2 PB AS#1
AS#3 PB AS#1
- Livers from multiple juvenile fish forming a single composite sample.
Three fish are involved. The eviscerated carcass of each fish is ground up and analyzed separately. Create AS#1, AS#2, AS#3 for the individual fish. Create AS#4 for the composite liver sample. Create sample heritage records as follows:
AS#4 PB AS#1, AS#2, AS#3
AS#1 CT AS#4
AS#2 CT AS#4
AS#3 CT AS#4
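Both examples follow the same pattern: every CT record has a matching PB record in the opposite direction. A minimal sketch of generating the reciprocal pairs, assuming a hypothetical `heritage_records` helper and a simple tuple layout (neither is part of NCIS):

```python
def heritage_records(parents, children):
    """Build reciprocal Contributed-To (CT) and Produced-By (PB) records.

    Every parent contributed to every child, and every child was produced by
    every parent. Records are (sample, relation, related_sample) tuples.
    """
    records = []
    for parent in parents:
        for child in children:
            records.append((parent, "CT", child))
            records.append((child, "PB", parent))
    return records


# Single fish, multiple tissues: AS#1 is the whole fish.
tissues = heritage_records(["AS#1"], ["AS#2", "AS#3"])

# Livers from three fish pooled into the composite AS#4.
composite = heritage_records(["AS#1", "AS#2", "AS#3"], ["AS#4"])

for record in composite:
    print(*record)
```

Generating both directions from one list of parents and children avoids recording a CT without its matching PB.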
Analytical Samples and Measurements
Not all ASs need to have Measurements associated with them, since an AS may not have been measured for contaminants. For example, suppose two fish are caught and a separate AS is created for each. Fish #1 has its length, weight, and age measured and these data are recorded as associated Meristics, but no contaminants are measured. Fish #2 also has biological meristics recorded, but it is also ground up and analyzed for contaminants, the results of which are recorded as Measurements.
More complicated ES/SC/AS scenarios:
As a Data Analyst you will be creating the SC records to which all ASs will later be linked. You need to be quite clear on what is required in different circumstances. Above it was stated that "any particular AS can only belong to one SC". Here is a not very likely but certainly possible situation you might encounter in preparing a dataset.
- AS #1 is the liver from a fish belonging to ES#1, for which you would need a SC#1. AS #2 is the liver from a fish belonging to ES#2, for which you would need a SC#2. A ‘composite’ sample AS#3 is created by pooling material from AS #1 and #2.
- Question: What ES/SC record do you link AS #3 to?
- Solutions: ?? Think about this and come and see me.
If data you are working on involves this situation, discuss it with your Data Manager.
Analytical Samples and Meristics
Create Meristics for any supporting information that is directly applicable to the particular AS you are dealing with.
- Ex. 1 The AS is a whole fish. Create meristics for the whole body measurements such as age, sex, length, weight, gonado-somatic index, etc.
- Ex. 2 The AS is for the liver from a fish. Create meristics for organ-relevant measurements such as: weight of the entire liver (ie whole organ weight), % Moisture or % Lipid in the portion of the liver used to create the actual sample.
The most important thing to remember about measurements is that there can only be one measurement of a specific contaminant (eg. Copper, or PCB111) in a given sample. Suppose you had a sample where Copper was in fact measured by two different techniques. You can NOT record both these measurements against the one sample. You would have to get the researcher to pick one of the measurements as being the representative value. The alternative is to create two samples. This most likely reflects what happened physically: the sample was probably split, with portion A being put in a machine which measured Copper by method A, while portion B went to machine B using method B.
There is actually nothing in the NCIS database structure which prevents recording two measurements of the same contaminant in the same sample. However the applications software does expect there to be only one.
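Since the database structure itself does not enforce this rule, it can help to check a dataset before loading. The following is a sketch only; the dictionary keys `sample_id` and `constituent` are hypothetical names, not NCIS field names.

```python
from collections import Counter


def duplicate_measurements(measurements):
    """Return (sample_id, constituent) pairs that occur more than once."""
    counts = Counter((m["sample_id"], m["constituent"]) for m in measurements)
    return sorted(pair for pair, n in counts.items() if n > 1)


rows = [
    {"sample_id": "95", "constituent": "Copper", "method": "A", "value": 1.2},
    {"sample_id": "95", "constituent": "Copper", "method": "B", "value": 1.4},
    {"sample_id": "95", "constituent": "Zinc", "method": "A", "value": 3.0},
]

# Copper appears twice for sample 95, so it is reported as a problem.
print(duplicate_measurements(rows))
```

Any pair reported would need to be resolved, either by choosing a representative value or by creating two samples.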
A common limitation of many database systems is to predefine, and therefore limit, the parameters that can be recorded. This is done by creating a table(s) with a purpose field for each parameter to be recorded. This is an inflexible approach, and is fundamentally at odds with the very nature of science, which is the discovery of new things to be measured.
To avoid this problem we developed the ‘Meristics’ approach. Initially, as the name implies, the Meristics table was developed as a means of recording supporting biological information about the samples. This was to include standard meristic items such as the length, weight, age, sex, etc of a fish which are so important to the interpretation of the contaminant burden measured in the animal. The table was also to be used for recording those ‘biological response’ measurements made on a sample which are not characterized by entries in the Constituents table. This would include a wide range of measurements from %Moisture and %Ash, through to behavioral observations.
The Meristics construct is robust enough to be extended to cover almost any aspect about a sample you might come across.
So to some extent the use of the word meristics as a name for this table is misleading in terms of the scope of data that can be recorded within it.
Rules for Using Meristics
All meristics must be ‘registered’ in the Meristics_Code table. This has to happen before the data can be loaded into the NCIS archive database. There are two important concepts you need to know about in order to properly organize meristic data so that it will load successfully.
First, there are two types of meristics. Some are measured empirically, such as %Moisture and Age of an organism. Others are characterized by means of a code of some sort, such as Sex where M = male and F = female. In the latter case, it is important to try to create and use coding schemes that can be shared by most if not all the researchers’ data. For example we obviously don’t want three or more coding schemes for Sex if we can avoid it.
Second, every Meristic_Type (eg Sex) must have a Method_Name associated with it. Thus meristics exist as a type/method combination.
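These two rules can be sketched as a pre-load check. The registry below is a made-up stand-in for the Meristics_Code table, and the method names in it are invented for illustration; real entries have to come from your Data Manager.

```python
# Hypothetical registry: (Meristic_Type, Method_Name) pairs. Coded meristics
# list their allowed codes; empirically measured ones use None.
REGISTERED = {
    ("Sex", "Visual inspection"): {"M", "F"},
    ("%Moisture", "Oven drying"): None,
}


def check_meristic(meristic_type, method_name, value):
    """Return a list of problems that would stop this meristic from loading."""
    key = (meristic_type, method_name)
    if key not in REGISTERED:
        return [f"{meristic_type}/{method_name} is not registered"]
    codes = REGISTERED[key]
    if codes is not None:
        if value not in codes:
            return [f"{value!r} is not a valid code for {meristic_type}"]
        return []
    try:
        float(value)  # empirical meristics must be numeric
    except (TypeError, ValueError):
        return [f"{value!r} is not numeric"]
    return []


print(check_meristic("Sex", "Visual inspection", "M"))
print(check_meristic("Age", "Otolith count", 4))
```

An empty list means the type/method combination is registered and the value fits the kind of meristic it is.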
Consult your Data Manager for assistance in deciding how to implement meristics for datasets that you work on.
QA Samples differ from AS samples in that the former do have purpose fields for recording %Lipid and %Moisture measurements made on the sample. In AS samples you would have to use Meristics to do this. QA samples are not linked to meristics however. This has an impact in preparing import sheets, since you would have to use purpose tags to flag these data for QA samples, but use a Meristic to do the same for AS samples.
Sample Management Strategies
When deciding how to treat very closely related AS samples, here are some pointers to keep in mind.
Situation #1. Recombining "part" Samples.
It is not uncommon for the ‘same’ sample to be analyzed at more than one laboratory. For example the researcher may prepare a sample right through to final purification. He then may split the sample in three, with each sample bearing the same ID except that each has a unique suffix, such as adding a different letter (eg. 95-A, 95-B, 95-C). Sample A is retained by the researcher who analyses it for a suite of contaminants we’ll call suite A. Sample B goes to contract lab B who analyses for suite B, and Sample C goes to lab C for suite C analyses. This then is how you get the data. How are you going to prepare it? You have two choices: (1) leave it as is, or (2) recombine A, B, and C and have just a single sample with a Sample ID of ‘95’.
The recombination option is preferred. First of all it simplifies the representation of these data in the database. Secondly, no information need be lost by this recombination. You merely have to note as a comment, using the AS.General Description field, that lab A did Suite A analyses, lab B did suite B, and lab C did suite C.
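The recombination can be sketched as below. The `recombine` function and the record layout are hypothetical, and the sketch assumes that part IDs are formed as base ID, hyphen, suffix (as in 95-A). It refuses to merge if two parts report the same contaminant, since only one measurement of a contaminant is allowed per sample.

```python
def recombine(parts):
    """Merge part samples such as 95-A, 95-B, 95-C into one sample record.

    `parts` maps a part ID to a dict with "lab", "suite" and "measurements".
    """
    base_ids = {part_id.rsplit("-", 1)[0] for part_id in parts}
    if len(base_ids) != 1:
        raise ValueError("parts do not share one base sample ID")
    measurements = {}
    notes = []
    for part_id in sorted(parts):
        part = parts[part_id]
        for constituent, value in part["measurements"].items():
            if constituent in measurements:
                raise ValueError(f"{constituent} measured in more than one part")
            measurements[constituent] = value
        # Keep track of which lab did which suite, for the description field.
        notes.append(f"{part['lab']} did {part['suite']} analyses")
    return {
        "sample_id": base_ids.pop(),
        "measurements": measurements,
        "general_description": "; ".join(notes),
    }


parts = {
    "95-A": {"lab": "lab A", "suite": "suite A", "measurements": {"Copper": 1.2}},
    "95-B": {"lab": "lab B", "suite": "suite B", "measurements": {"PCB111": 0.03}},
    "95-C": {"lab": "lab C", "suite": "suite C", "measurements": {"Zinc": 3.1}},
}
merged = recombine(parts)
print(merged["sample_id"], "-", merged["general_description"])
```

The measurement values shown are invented. The point is that the lab and suite information survives as a comment while the three parts become one sample.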
Situation #2. Avoiding Repetitious Descriptive or Meristic Information.
Often the same descriptive comments apply to all ASs generated from an ES. Rather than repeat this textual information for every AS, strive to report it at the highest level in the Sample nesting as is appropriate.
Ex. 2-1 The analytical method description may apply to all ASs from a given ES. Therefore place this in the General Description field of the ES if possible and save repeating it within each AS sample. When data are retrieved by NCIS, all ES information, including descriptive textual information, will be part of what is sent to the user. So why not store it just once against the ES and reduce the storage space of the dataset in the database?
Ex. 2-2 An AS exists for a fish as a whole organism. Three other related ASs exist: one each for the liver, muscle, and blood from this fish. Record the Length, Weight and Age information for the fish against the whole organism AS to avoid repeating it for each of the organ ASs.
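As a rough illustration of Ex. 2-1, the sketch below moves a description up to the ES when every AS carries exactly the same text. The function name and the `general_description` key are assumptions for this example, not NCIS field names.

```python
def hoist_shared_description(es, analytical_samples):
    """Move a description shared by every AS up to the ES, storing it once."""
    descriptions = {a.get("general_description", "") for a in analytical_samples}
    if len(descriptions) != 1:
        return  # descriptions differ (or there are no ASs): leave as is
    shared = descriptions.pop()
    if not shared:
        return
    existing = es.get("general_description", "")
    es["general_description"] = f"{existing} {shared}".strip()
    for a in analytical_samples:
        a["general_description"] = ""


es_record = {"id": "ES#1", "general_description": "Box core, Station 12."}
as_records = [
    {"id": "AS#1", "general_description": "Analyzed by method X."},
    {"id": "AS#2", "general_description": "Analyzed by method X."},
]
hoist_shared_description(es_record, as_records)
print(es_record["general_description"])
```

If even one AS has a different description, nothing is moved, because the text would then no longer apply to all samples below the ES.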
Situation #3. Simplifying the Reporting of Personnel Responsibility Information
NCIS does allow you to record explicitly against each sample, the various people and their various responsibilities for a sample, and explicitly record the person’s role as an analyst responsible for a measurement. This was a design requirement for some but not all DFO regions to use the Archive Database. The intent was that an NCIS user should be able to search for data based in whole or in part on who did the work, or had various responsibilities for the data.
At IOS in Pacific Region, we believe that this ability is overkill and not a feature we wish to support to such a degree in our data. We do, however, do the following.
- We DO explicitly record the PI (only one) for all ES samples, and any Scientific Collaborators (SCI). We do think it likely that users will want to search for data in terms of who is ultimately responsible for it. For example I may want to pose the query, "Give me all Joe Blow’s data." What this translates to is all samples for which Joe Blow is listed as the PI.
- By reporting the PI against the ES, this person is automatically the PI for all ASs belonging to that ES.
- We try to report SCI people only against the ES if possible and applicable. As such they are automatically SCI for all ASs in the same manner as the PI. If there is an AS where the SCI(s) is different than those linked to the ES then SCI(s) can also be explicitly linked to the AS. This may be the case for a fish sample where DFO researcher Joe was the PI, and DFO researcher Fred was a SCI in all matters except some blood analyses. A colleague of theirs, Tom at UBC, did the blood work for them but had no other involvement with the project. So for the blood samples Tom could be explicitly linked as a SCI. But you would only do this if you decided it would be warranted in terms of supporting a desire by users to be able to search for Tom’s data specifically. In this case I doubt this would be true.
- We can report people with SCI and other responsibilities descriptively in a ‘description’ field if the PI of the data desires us to do so. Again we do this against the ES if possible.
By not fully implementing the explicit recording of all people the only functionality we are forgoing is the ability to do a search to find those samples where Joe Blow had Technical or Contractor responsibilities, or finding those Measurements where he was an Analyst. We consider such a need as a highly unlikely search scenario. What we gain is easier data preparation and simplifying the storage of the dataset in the Archive.
More on Understanding Chemistry Sample Relationships
The following discussions are meant to provide a bit of overview on the different types of chemistry samples you work with in your spreadsheet datasets, and how they relate to each other.
They assume you have read the table and field descriptions. A lot of information presented there is not repeated here. If you haven’t read this material, you should refer to it as you work your way through the following.
Related Database Tables
An Illustration using a Dioxin Analysis Example
Let’s consider a "day-in-the-life" of the Regional Dioxin Lab, by examining how a tray of samples is assembled for a run through the Mass Spectrometer. This is among the more involved analytical procedures, and by discussing this more complex case it should then be comparatively easy for you to work with data from simpler analytical procedures.
The initial steps to prepare a hunk of fish liver into a purified sample suitable for instrument analysis can be ignored. The end result is that a vial of the purified material is placed into the Mass Spectrometer tray ready for contaminant detection. The livers from other fish are similarly prepared to produce other vials of ‘real’ samples. Typically several such vials (6-10?) form a ‘batch’ and are placed sequentially into slots in a tray that can hold many samples.
QAQC Sample Groups
Each of these ‘real’ sample batches is preceded (followed?) by QA samples, typically a Procedural Blank sample, and sometimes also a Certified or Standard Reference Material sample. This collection of a batch of ‘real’ samples and its preceding QA samples we call a QAQC Group. There will normally be several such groups of samples prepared to be ‘run’ through the Mass Spec together. The idea is that as the samples in the batch are sequentially processed, the data from the associated QA samples for that batch are used to confirm that the analytical procedure was in control, and that no contamination of the samples occurred during workup and analysis. Thus we need a way to record what QA samples belong to what ‘real’ samples.
For the purposes of preserving this QAQC group association for NCIS, we assign each such group a code which we call the QAQCNO. There is also an import tag by the same name. The value of QAQCNO can be anything at all and only has to be unique within the data you intend to import into a CSAdatabase. This code does not get exported from this CSAdatabase, but is used to preserve the associations into the NCIS Archive database.
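A sketch of how the QAQCNO code ties QA samples to their ‘real’ samples is shown below. The record layout (`sample_id`, `qaqcno`, `is_qa`) is assumed for illustration; only the QAQCNO name comes from the text above.

```python
from collections import defaultdict


def group_by_qaqcno(samples):
    """Group sample IDs by QAQCNO, keeping real and QA samples apart."""
    groups = defaultdict(lambda: {"real": [], "qa": []})
    for sample in samples:
        kind = "qa" if sample["is_qa"] else "real"
        groups[sample["qaqcno"]][kind].append(sample["sample_id"])
    return dict(groups)


def groups_missing_qa(groups):
    """Return the QAQCNO codes of groups that have real samples but no QA sample."""
    return sorted(code for code, g in groups.items() if g["real"] and not g["qa"])


tray = [
    {"sample_id": "Blank-1", "qaqcno": "G1", "is_qa": True},
    {"sample_id": "Liver-1", "qaqcno": "G1", "is_qa": False},
    {"sample_id": "Liver-2", "qaqcno": "G1", "is_qa": False},
    {"sample_id": "Liver-3", "qaqcno": "G2", "is_qa": False},
]
groups = group_by_qaqcno(tray)
print(groups["G1"])
print(groups_missing_qa(groups))
```

The code values themselves (G1, G2) are arbitrary, as the text says; they only need to be unique within the data being imported.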
Another aspect of documenting analytical control is to add surrogates, or internal standards, to both real and QA (blanks) samples. Surrogates are chemicals very like the ‘real’ chemicals which the analysis is attempting to detect. In fact a common way to make surrogates is simply to isotopically label (eg with Carbon 13) a selection of the same compounds that will be the target of the analysis. Surrogates are used to check on how well the analytical procedure and instrumentation can detect the ‘real’ chemicals of interest. A given amount of a surrogate is added and from the amount detected the percentage recovery is calculated. This % recovery value can then be used to ‘correct’ the detection values the machine reported for the ‘real’ chemicals.
Such surrogates can be added at the beginning of the sample preparation process as a means of determining any loss of ‘real’ chemicals occurring as a result of sample workup. Alternatively they may also be added to a sample just before it goes into the Mass Spec for detection. In this latter case the term ‘internal standard’ is more common, and these ‘standards’ then serve to assess the accuracy of the instrument measurements.
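The recovery arithmetic can be written out as a short sketch. The correction shown (dividing the reported value by the fractional recovery) is one common convention and is an assumption here; the lab's actual correction procedure may differ.

```python
def percent_recovery(amount_added, amount_detected):
    """Percentage of the added surrogate that was actually detected."""
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return 100.0 * amount_detected / amount_added


def recovery_corrected(measured_value, recovery_pct):
    """Scale a reported value up by the fraction of surrogate recovered.

    Assumed convention: corrected = measured / (recovery / 100).
    """
    if recovery_pct <= 0:
        raise ValueError("recovery_pct must be positive")
    return measured_value / (recovery_pct / 100.0)


# 4.0 units of surrogate added, 3.0 detected: 75 % recovery.
recovery = percent_recovery(amount_added=4.0, amount_detected=3.0)

# A reported value of 6.0 for the 'real' chemical corrects to 8.0.
print(recovery, recovery_corrected(6.0, recovery))
```

The numbers are invented for the example and carry no units.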
Surrogate chemicals must be registered in the NCIS Constituents table.
Certified Reference Materials
These are materials which mimic as closely as possible the matrices found in real samples. For example there are CRMs for sediment, others for fish muscle, and so on. The idea is to make the CRMs as similar to the real materials as possible so that the extraction and analysis of chemicals from them will be as comparable as possible with that happening from real samples.
CRMs contain known and certified concentrations of various ‘real’ chemicals. Our lab’s procedures and instrumentation must be sufficient to determine these concentrations to a specified level of accuracy, to prove the lab’s processes are in control, reliable and accurate.
Another aspect of CRMs is that they are certified by the manufacturer (eg NRC) to be stable over time under the prescribed storage regime.
CRMs must be registered in the NCIS Arc_Mediums table.
Standard Reference Materials
These are materials which are very like CRMs except that they lack the certification. They are generally created by a lab for its own use. They are usually created because there are no suitable CRMs available.
SRMs must be registered in the NCIS Arc_Mediums table.
Spiked Matrix or Spiked Blanks
These are technically different from SRMs but are treated as SRMs in NCIS (for now). These are samples of a material (usually whatever the true ‘blanks’ are composed of) that are created with a known concentration of a selection of the ‘real’ chemicals which are to be looked for in the ‘real’ samples.
Most measurements will have some sort of detection limit (DL) associated with them. The idea is that the relative size of this DL value compared with the actual measured (ie detected) value for the chemical gives you some insight into how believable the actual measured value is. Measured values that are within certain limits of the detection limit value may have to be viewed with some reservations as to the true value present in the sample.
Detection Limits can be determined in different ways. In the simplest case, tests of the procedures and instrumentation can establish a "Method Detection Limit". In our dioxin example "Standard Detection Limits" are actually calculated (by some complicated black magic) separately for each congener.
Regardless of the method used, the Detection Limit is stored in only one field, the SDL_Value field of the MEASUREMENT (and QA_Measurement) table. Which method applies should be noted descriptively in the Analysis Descriptor field.
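One way to picture how the DL helps in judging a measurement is sketched below. The key `sdl_value` mirrors the SDL_Value field named above, but the factor of 3 is an arbitrary illustrative threshold chosen for this example, not an NCIS rule.

```python
def near_detection_limit(value, sdl_value, factor=3.0):
    """True when a detected value is less than `factor` times its detection limit.

    `factor` is an arbitrary illustrative threshold, not an NCIS rule.
    """
    if sdl_value <= 0:
        raise ValueError("sdl_value must be positive")
    return value < factor * sdl_value


measurements = [
    {"constituent": "Congener A", "value": 0.9, "sdl_value": 0.5},
    {"constituent": "Congener B", "value": 12.0, "sdl_value": 0.5},
]

# Only values close to their detection limit are flagged for a cautious read.
flagged = [m["constituent"] for m in measurements
           if near_detection_limit(m["value"], m["sdl_value"])]
print(flagged)
```

What counts as "close" to the detection limit is a judgment for the researcher; the sketch only shows the comparison.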
We have to be careful in talking about sample ‘replicates’. This word carries two different connotations.
Field or Environmental replicates are taken to assess the natural variability of contaminants in the environment. As such, two such samples would be treated as separate ESs, with a descriptive comment under General Description to note their replicate nature. Suppose these are two sediment grabs and each generates a single AS. You would NOT declare the sample type of these two ASs as being ‘replicates’.
True analytical replicates are used to assess the precision of the analytical procedure and detection instrumentation. Replicate samples are treated internally by NCIS as ‘real’ AS samples, and not as a QA sample. This is the case since a replicate, while it does serve as a check on the analytical procedure, may also be legitimate environmental data. Suppose from a single fish two samples of its liver are excised and analyzed as two ASs. The results from both samples, be these results similar or distinct, represent the variability (or lack of it) of contaminant burden within the liver as an organ. You WOULD declare the sample type of these two ASs as being ‘replicates’. And you would use the REP import tag to explicitly link these two samples to each other as replicates. Of course part of any observed variability in the above case could be due to the analytical process and not necessarily dependent upon where in the liver the sample came from.
A complication arises if you are dealing with two sets of measurements done on the actual same sample. For example this would be the case if a purified sample is split in two just before the vials are placed into the Mass Spec tray. This would also be the case if the Mass Spec probe was physically lowered into the same vial twice during the run. In both cases the intent is strictly to assess the precision of the instrumentation. In NCIS we have no way of dealing with this situation explicitly. We either don’t report data like this, or each ‘portion’ of the same purified sample becomes a sample in its own right, treated as replicates of each other as above, and with a textual description of the nature of their being replicates.