
Data Management Workshop
Brest (France)
3-5 October

                                      

Contribution: QC Discussion paper from Bob Keeley

QC Discussion Points

QC of Real-time Profiles

Is it necessary that everyone apply the same suite of tests?

I suggest yes. If this does not happen, users of the data will get a data set of mixed quality. This is already the case with other data, whether on the GTS now or received in
delayed mode from one provider or another, and we should strive to improve on this. One consequence is that certain procedures may require access to common reference
files, such as the same climatology or the same weekly NCEP files. All real-time QC centres will need access to these files.

Do we need to write some documentation that explains the tests? If this varies, does each centre need to do this?

I believe we need documentation, and if there are variations, all must be well documented. The documentation should be detailed enough that a new centre can
pick up the document and know exactly how to do the tests. The documentation needs to state the consequences of a test failure. For example, if the
float seems to have failing buoyancy, the observed values of the profile may be okay; the float just doesn't get as deep or stays at the surface too long. In this
case, failing the test does not cause observed values to be removed (in the case of TESAC coding) from the real-time data. For other tests, such as values outside
acceptable ranges, the documentation should state clearly when the point is removed from real-time transmission (if it is).

How do we keep documentation up to date?

If all use the same documentation, a single web site can be designated the master site for the documentation. This could be at the site of the Argo
coordinator, but it could be elsewhere. It may be better at one of the QC sites since they will be closer to the workings of the software. Maintaining the web
site also means maintaining the documentation.

How careful should we be about what goes out on the GTS? Do we err on the conservative side allowing only those points and profiles that we are sure
are correct, or do we allow suspect data out?

Practically, I suspect we will allow suspect data out simply because the automated tests will not catch everything. We could use the monthly review that
MEDS conducts to help trap those data that do get out to the GTS and therefore retrospectively catch problems. However, for some users this will be too late.

How do we communicate the results to PIs? Is this necessary?

Since the data that pass through automated QC will go both to the GTS and in full resolution to PIs, the results of the automated tests should
accompany the data. This has implications for the exchange format. These results include both those that identify problems in the data and those that identify
failing floats.

Do insertion centres need to have a facility to block suspect data from going out on the GTS?

I can see this may be very useful but we would need to be careful. This is a facility that Service Argos offers for drifter data. We may want to provide a similar
function. If so, we must be in constant touch with PIs to ensure good data are not blocked.

Can we specify now the QC tests to perform or how do we develop them?

AOML and MEDS have a precedent with the GTSPP test procedures. However, experience shows that automated tests can let obviously bad data through, and
we have not yet developed a foolproof suite of tests. One strategy would be to compile the tests that each of us recommends now, implement these, then
carry out a review in a year's time to see what improvements should be made. This gets us up and running while committing us to a review. We should also make
use of feedback from PIs doing the delayed QC on what they found and how they found it, so that the detection process can be automated.

What monitoring of data quality is required?

Since real-time QC is automated, and assuming all centres employ the same tests, an intercomparison should be done routinely where every centre
processes the same data through its automatic systems and results are compared. This will verify that test procedures are identical, or find variances due to
changes made at one centre but not at others.
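
The intercomparison described above could be sketched as follows. This is only an illustration under stated assumptions: the function name find_disagreements, the centre names, and the layout of the results (centre, then profile identifier, then one flag per level) are hypothetical and not part of any existing system.

```python
def find_disagreements(results):
    """Compare QC flags produced by several centres for the same profiles.

    results maps centre name -> {profile id -> list of flags, one per level}.
    Returns a list of (profile id, level index, {centre: flag}) for every
    level where the centres did not assign the same flag.
    """
    if not results:
        return []
    centres = sorted(results)
    # Only profiles processed by every centre can be compared.
    shared = set.intersection(*(set(results[c]) for c in centres))
    disagreements = []
    for profile_id in sorted(shared):
        flag_lists = [results[c][profile_id] for c in centres]
        for level, flags in enumerate(zip(*flag_lists)):
            if len(set(flags)) > 1:
                disagreements.append(
                    (profile_id, level, dict(zip(centres, flags)))
                )
    return disagreements


# Hypothetical example: two centres process the same two profiles.
example = {
    "centre_a": {"P1": [1, 1, 4], "P2": [1, 1]},
    "centre_b": {"P1": [1, 1, 3], "P2": [1, 1]},
}
report = find_disagreements(example)
```

A non-empty report would point to a test that has drifted at one centre, which is the kind of variance the routine intercomparison is meant to find.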

What feedback from users should be set up and how will it be used?

The real-time data will be employed in assimilation models and other endeavours. Through their use, problems in the data will be identified. QC centres need
to get this information since it will impact QC procedures. This will be a positive and ongoing indicator of what improvements are needed. Being able to track
which centre provided the data will be valuable since the "responsible" centre can take the lead in seeking solutions. I suggest each QC centre make a
connection to a centre carrying out data assimilation. This pairing will work out how to exchange information. Argo meetings will provide a forum to discuss
feedback from the modelers.

QC of Delayed Mode Profile Data

Is it necessary that everyone apply the same suite of tests?

Again, I suggest yes, but this may be less practical. The important thing is that the QC process be well documented. I would suggest a tracking system such
as GTSPP which records who processed data and did what to the files. This will be useful later in sorting out versions of data and levels of QC applied.

Do we need to write some documentation that explains the tests? If this varies, does each insertion centre need to do this?

Again, documentation is crucial. Without clear and readily available documentation a user of the data will have to redo everything. Our experience is that while
some users will do this anyway, others will accept some or all results, and add other tests specific to their needs.

Do we preserve quality flags with the data and if yes, how?

I advocate preserving the flags. Then, this becomes a format issue, but there are examples around of data sets with strategies for this. GTSPP is one I am
familiar with and one I would recommend be considered.

What other information that affects the reliability and future use of the data do we need to keep?

This touches on the larger question of future use of the data and so goes beyond what might be considered quality control issues.

a. We need to keep a unique tag with each station to permit matching real-time to delayed mode data and the unequivocal substitution of delayed mode
for real-time in the final archives.
b. We need to store information about tests performed and failed. As data are collected and archives built, QC systems will evolve and users (and
archives) will want to know what tests were performed against data.
c. We need to store information about who processed the stations and what they did. This is helpful in knowing who did what to data, including QC.
d. We want to store sufficient information about the instrumentation so that, should problems arise later with the performance of particular floats, we have
a chance to make corrections. We want to avoid repeating the problems encountered when manufacturers' fall-rate equations for XBTs were found to
be wrong. (At present, there is insufficient information about some XBTs that were used, so making corrections to them is difficult.)
e. We want to record changes made in the data as a result of QC. However, we also want the original values to be kept, in case a user wishes to
backtrack from changes.
f. We may wish to record not only an assessment of data quality, but also precise information about a datum (such as which tests were failed) and
information about the suspected cause of the failure or the reason why a suspicious datum (such as a temperature inversion) is considered okay.
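
As a rough illustration of points a to f, a station record might be organised along the following lines. This is a sketch only, not a proposed exchange format: every class and field name here (StationRecord, station_tag, tests_failed and so on) is hypothetical, and the use of flag value 5 for a changed value follows the flag convention recommended later in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    value: float
    flag: int = 0                           # 0 = no QC done
    original_value: Optional[float] = None  # point e: kept when QC changes a value
    note: str = ""                          # point f: suspected cause or justification


@dataclass
class HistoryEntry:
    centre: str   # point c: who processed the station
    action: str   # point c: what they did


@dataclass
class StationRecord:
    station_tag: str                 # point a: unique tag, real-time to delayed mode
    instrument: dict                 # point d: float and sensor details
    observations: List[Observation]
    tests_performed: List[str] = field(default_factory=list)  # point b
    tests_failed: List[str] = field(default_factory=list)     # point b
    history: List[HistoryEntry] = field(default_factory=list)

    def change_value(self, index, new_value, centre, reason):
        """Replace a value but keep the original so a user can backtrack."""
        obs = self.observations[index]
        if obs.original_value is None:
            obs.original_value = obs.value
        obs.value = new_value
        obs.flag = 5  # value changed by QC procedures
        obs.note = reason
        self.history.append(
            HistoryEntry(centre, f"changed observation {index}: {reason}")
        )


# Hypothetical usage with made-up values.
record = StationRecord(
    station_tag="EXAMPLE-0001",
    instrument={"float_type": "unknown"},
    observations=[Observation(12.3), Observation(99.9)],
)
record.change_value(1, 12.1, centre="centre_a", reason="example correction")
```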

What monitoring of data quality is required?

Since PIs will be involved in QC of delayed mode data, we can expect variations in results. Also, it may well be sensible for different centres/PIs to carry out
different test procedures. I recommend that an intercomparison be carried out on the same delayed mode data set. The analysis of results should be carried
out by one centre, and results distributed. A discussion of the results should take place at an appropriate Argo meeting. I would recommend this happens very
soon after operations become "routine".

What feedback from users should be set up and how will it be used?

In delayed mode, this will be more difficult since the users will be more dispersed. However, if a regular user can be found, a QC centre should try to set up a
liaison with them to provide feedback of data quality problems.

QC of Trajectory Data

The same questions as for profile data apply here. However, these data are not transmitted in real-time.

Flags

Do we need to send flags out with the real-time data?

Historically, this has not been done. The present GTS code forms (BATHY, TESAC) have no provision for this and are unlikely to change. There is the
possibility of using BUFR, a binary format that is self describing. However, the real question is whether the flags would be useful to users who need the real-time rather
than delayed mode data. I have no feel for this and so need help from the users.

What level of flagging do we need? Do we attach a flag to each value indicating the assessment of the quality of the value, do we attach a flag that
indicates what tests were failed by that value, do we attach a flag that says what tests were failed and why to each value?

My own view is that a flag on a datum or grouping of data, such as a profile, should be used to indicate an assessment of the quality only. This will serve the
majority of users. Of second importance is to identify which tests were failed at the detail of a grouping of data such as a profile. To say that a profile failed a
freezing point test, for example, states that one or more points of the profile failed. The failed points will be marked with a flag indicating poor quality. Of
course, a datum failing one test may also fail others, so a flag of poor quality does not uniquely indicate which test was failed. My view is that the number of
users interested in this level of detail is low. In addition, those interested in this detail will likely want to retest data that fail a particular test and are unlikely to
use the fact that a single datum failed a test. Finally, recording a possible reason for a failure or a reason why a suspicious point is deemed good is again of
lesser interest to people. It would be useful to record such information, to document systematic failures, provide clues to problems in test procedures, and to
identify regions of the oceans where unusual features are common. However, this should be stored in such a way that it does not "clutter" access to the data
and the simple quality assessment flags. In summary, I argue that all are desirable, but of decreasing utility to users. I would advocate a format that can
preserve it all, but one that keeps the information beyond simple data quality in a part of the format that does not impede access to the data and quality
assessment flags.

Do we preserve flags with the delayed mode data?

Yes. Whatever we do to the data, the quality assessment should be kept with the data whether real-time or delayed. The format structures should be identical
for real-time and delayed mode so all information can be recorded at whatever is the appropriate point in the processing.

What flag convention should we use?

I recommend the one used by GTSPP, which is simple and commonly used:

0 = no QC done

1 = data judged good

2 = data judged probably good although there is some aspect that is troubling.

3 = probably bad with insufficient information to be sure it is bad

4 = bad data

5 = data value has been changed by QC procedures. The original value is found elsewhere in the data record.
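
In software, the convention above could be represented as a small enumeration. The sketch below is illustrative only: the names QCFlag and keep_good are hypothetical, and the choice to accept flags 1 and 2 by default is an assumption a user would adjust to their own needs.

```python
from enum import IntEnum


class QCFlag(IntEnum):
    NO_QC = 0          # no QC done
    GOOD = 1           # data judged good
    PROBABLY_GOOD = 2  # probably good, some aspect is troubling
    PROBABLY_BAD = 3   # probably bad, insufficient information to be sure
    BAD = 4            # bad data
    CHANGED = 5        # value changed by QC; original kept elsewhere


def keep_good(values, flags, accepted=(QCFlag.GOOD, QCFlag.PROBABLY_GOOD)):
    """Return only the values whose flag is in the accepted set."""
    return [v for v, f in zip(values, flags) if QCFlag(f) in accepted]


# Hypothetical temperatures with one value flagged bad.
temperatures = [10.1, 9.8, 35.0, 9.5]
flags = [1, 2, 4, 1]
usable = keep_good(temperatures, flags)
```

Keeping the flag as a single integer per value means a user who only wants the quality assessment never has to read the more detailed test information.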

GTSPP Tests

In GTSPP we view the data arriving from the GTS using the same software as we use for other profile data arriving in delayed mode. So, we use the routines
as described in M&G#22. This provides a suite of tests looking at ranges, position checks, etc. Science centres of GTSPP employ other tests to further
identify failures missed by these tests. You can see some documentation about these test procedures at
http://www.meds-sdmm.dfo-mpo.gc.ca/meds/Prog_Int/GTSPP/QC_e.htm

The GTSPP tests are not "tuned" to find failure modes particular to profiling floats. For example, I have seen profiles from floats with spikes at the surface.
These are usually less than 1.0 degrees C and are not automatically found by the Spike Test. For profiling floats, we may want to tighten up such a test or
devise instrument-specific tests.
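
To illustrate what tightening a spike test means, here is a minimal sketch of a three-point spike check. The formula is a commonly used three-point form and is assumed here, not confirmed, to match the GTSPP Spike Test; the function name and both thresholds are hypothetical values chosen only for the example.

```python
def spike_test(values, threshold):
    """Flag interior points that stand out from both neighbours.

    For each point v2 with neighbours v1 and v3 the test value is
        |v2 - (v3 + v1) / 2| - |(v3 - v1) / 2|
    and the point is flagged when this exceeds the threshold.
    The first and last points have only one neighbour and are not tested.
    """
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        test_value = abs(v2 - (v3 + v1) / 2.0) - abs((v3 - v1) / 2.0)
        flags[i] = test_value > threshold
    return flags


# Hypothetical near-surface temperatures with a 0.9 degree bump at index 1.
temps = [18.0, 18.9, 18.1, 18.0, 17.8]
loose = spike_test(temps, threshold=2.0)   # bump passes unnoticed
tight = spike_test(temps, threshold=0.5)   # bump is flagged
```

Because the end points are never tested, a spike on the very first level would need a separate, instrument-specific check.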

Other Tests

I did a quick poll to see what other things are tested and how. I include ones here that are not already a part of GTSPP procedures.

Buoyancy Failure

If the buoyancy adjustment begins to fail, the float will spend increasing time at the sea surface, and this will show up as larger drift segments. My own
estimation is that such a failure is an indication of the coming demise of a float, but that the profile data returned from the float are still okay. This is information
the PI may want, but not something we need to include in archived data.

Failure to reach Parking Depth

I view this as the same type of failure as a buoyancy failure and would recommend the same handling.

T-S Failure

This uses a T/S plot procedure to look for a conductivity failure. I am not sure how this would be automated except through some sort of T/S climatology.
Assuming there is such a test, it should result in quality flags assigned to data.

Anomaly Failure

This looks for unusual changes in anomalies of T and S. The float history is used to generate mean profiles, and anomalies are then computed for each profile
based on the mean. Any abrupt or large changes would be viewed with suspicion. Again, I am not sure how we could automate such a test, but this might be
appropriate for the high resolution data processing.
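
One possible way to automate such a check is sketched below, purely as an illustration. It assumes all profiles from the float have already been interpolated to the same levels, and the function names and both limits (max_anomaly for "large", max_jump for "abrupt") are hypothetical; suitable values would have to be worked out from real float data.

```python
from statistics import mean


def mean_profile(history):
    """Mean value at each level over the float's earlier profiles."""
    return [mean(level_values) for level_values in zip(*history)]


def anomaly_check(history, new_profile, max_anomaly, max_jump):
    """Return True at each level where the new profile looks suspicious.

    A level is suspicious when its anomaly from the float's mean profile is
    large, or when the anomaly changed abruptly since the previous profile.
    """
    reference = mean_profile(history)
    previous = [v - r for v, r in zip(history[-1], reference)]
    current = [v - r for v, r in zip(new_profile, reference)]
    return [
        abs(c) > max_anomaly or abs(c - p) > max_jump
        for c, p in zip(current, previous)
    ]


# Hypothetical temperature history on three common levels.
history = [
    [20.0, 10.0, 5.0],
    [20.4, 10.2, 5.0],
    [19.6, 9.8, 5.0],
]
suspect = anomaly_check(history, [20.1, 12.5, 5.05],
                        max_anomaly=1.5, max_jump=1.0)
```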

NCEP Comparison Failure

This compares the latest data to the previous week's temperature analysis from NCEP. It assumes NCEP is correct and if data are more than 3 standard deviations
from NCEP, they are considered bad.
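
The comparison could be sketched as follows. This assumes the NCEP analysis has already been interpolated to the positions and depths of the observations and that a standard deviation is available for each point; where that standard deviation comes from is not specified here, and the function name is hypothetical.

```python
def ncep_comparison(observed, analysis, std_dev, n_sigma=3.0):
    """Return True for each observation that fails the comparison.

    An observation fails when it differs from the analysis value by more
    than n_sigma standard deviations.
    """
    if not (len(observed) == len(analysis) == len(std_dev)):
        raise ValueError("all inputs must have the same length")
    return [
        abs(o - a) > n_sigma * s
        for o, a, s in zip(observed, analysis, std_dev)
    ]


# Hypothetical temperatures, analysis values and standard deviations.
failed = ncep_comparison(
    observed=[15.0, 14.2, 20.0],
    analysis=[15.2, 14.0, 14.5],
    std_dev=[0.5, 0.5, 0.5],
)
```

Since the test assumes the analysis is correct, a real feature missing from the previous week's analysis would also be marked bad, which is worth keeping in mind when deciding what the failure should do to the data.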

Other Issues

I invite everyone to offer suggestions of other points that need discussion.