Data Management Workshop
Brest (France)
3-4-5 October
Contribution: QC Discussion Paper from Bob Keeley
QC Discussion Points
QC of Real-time Profiles
Is it necessary that everyone apply the same suite of tests?
I suggest yes. If this does not happen, users of the data will get a mixed data set. This
is already the case with other data, whether those on the GTS now or those received in
delayed mode from one provider or another, and we should strive to improve on this. One
consequence is that certain procedures may require access to common reference
files, such as the same climatology or the same weekly NCEP files. All real-time
QC centres will need access to these files.
Do we need to write some documentation that explains the tests? If this varies, does each
centre need to do this?
I believe we need documentation and, if there are variations, all must be well documented.
The documentation should be detailed enough that a new centre can
pick up the document and know exactly how to do the tests. The documentation should also state
the consequences of a test failure. For example, if the
float seems to have failing buoyancy, the observed values of the profile may be okay; the float
just doesn't get as deep or stays at the surface too long. In this
case, failing the test does not cause observed values to be removed (in the case of TESAC
coding) from the real-time data. For other tests, such as outside
acceptable ranges, the documentation should state clearly when the point is removed from
real-time transmission (if it is).
How do we keep documentation up to date?
If all use the same documentation, a single web site can be designated the master site for
the documentation. This could be at the site of the Argo
coordinator, but it could be elsewhere. It may be better at one of the QC sites since they
will be closer to the workings of the software. Maintaining the web
site also means maintaining the documentation.
How careful should we be about what goes out on the GTS? Do we err on the conservative
side allowing only those points and profiles that we are sure
are correct, or do we allow suspect data out?
Practically, I suspect we will allow suspect data out simply because the automated tests
will not catch everything. We could use the monthly review that
MEDS conducts to help trap those data that do get out to the GTS and therefore
retrospectively catch problems. However, for some users this will be too late.
How do we communicate the results to PIs? Is this necessary?
Since the data that go through the automated QC will go both to the GTS and,
in full resolution, to PIs, the results of automated tests should
accompany the data. This has implications for the exchange format. The results to be
communicated include both those that identify problems in the data and those that identify
problems with failing floats.
Do insertion centres need to have a facility to block suspect data from going out on the
GTS?
I can see this may be very useful but we would need to be careful. This is a facility that
Service Argos offers for drifter data. We may want to provide a similar
function. If so, we must be in constant touch with PIs to ensure good data are not
blocked.
Can we specify now the QC tests to perform or how do we develop them?
AOML and MEDS have a precedent with the GTSPP test procedures. However, the experience is
that automated tests can let obviously bad data through, and
we have not yet developed a foolproof suite of tests. One strategy would be to compile
the tests that each of us recommends now, implement these, then
carry out a review in a year's time to see what improvements should be made. This will get
us up and running, with the promise of a review. We should also make
use of feedback from PIs doing the delayed QC as to what they found and how they found it,
and look at automating the detection process.
What monitoring of data quality is required?
Since real-time QC is automated, and assuming all centres employ the same tests, an
intercomparison should be done routinely where every centre
processes the same data through its automatic systems and results are compared. This will
verify that test procedures are identical, or find variances due to
changes made at one centre but not at others.
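As an illustration only, the intercomparison described above could be reduced to a comparison of the flags each centre assigns to the same input data. The sketch below assumes each centre reports its results as a mapping from (profile identifier, level index) to a quality flag; the function name, the data layout and the centre names are hypothetical and not part of any existing system.

```python
def compare_centre_flags(results):
    """Find data points on which the centres disagree.

    results: dict mapping centre name -> {(profile_id, level_index): flag}
    Returns a dict mapping each disputed key -> {centre name: flag or None}.
    A value of None means the centre reported no flag for that point.
    """
    # Collect every data point reported by any centre.
    all_keys = set()
    for flags in results.values():
        all_keys.update(flags)

    disagreements = {}
    for key in sorted(all_keys):
        per_centre = {centre: flags.get(key) for centre, flags in results.items()}
        # More than one distinct flag means the procedures differ somewhere.
        if len(set(per_centre.values())) > 1:
            disagreements[key] = per_centre
    return disagreements


# Hypothetical example: two centres process the same two-level profile.
example = {
    "centre_A": {("profile_1", 0): 1, ("profile_1", 1): 4},
    "centre_B": {("profile_1", 0): 1, ("profile_1", 1): 1},
}
disputed = compare_centre_flags(example)
```

An empty result would indicate that the centres' automatic systems behave identically on the test data; any entry points to a variance worth investigating.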
What feedback from users should be set up and how will it be used?
The real-time data will be employed in assimilation models and other endeavours. Through
their use, problems in the data will be identified. QC centres need
to get this information since it will impact QC procedures. This will be a positive and
ongoing indicator of what improvements are needed. Being able to track
which centre provided the data will be valuable since the "responsible" centre
can take the lead in seeking solutions. I suggest each QC centre make a
connection to a centre carrying out data assimilation. This pairing will work out how to
exchange information. Argo meetings will provide a forum to discuss
feedback from the modelers.
QC of Delayed Mode Profile Data
Is it necessary that everyone apply the same suite of tests?
Again, I suggest yes, but this may be less practical. The important thing is that the QC
process be well documented. I would suggest a tracking system such
as GTSPP which records who processed data and did what to the files. This will be useful
later in sorting out versions of data and levels of QC applied.
Do we need to write some documentation that explains the tests? If this varies, does each
insertion centre need to do this?
Again, documentation is crucial. Without clear and readily available documentation a user
of the data will have to redo everything. Our experience is that while
some users will do this anyway, others will accept some or all results, and add other
tests specific to their needs.
Do we preserve quality flags with the data and if yes, how?
I advocate preserving the flags. Then, this becomes a format issue, but there are examples
around of data sets with strategies for this. GTSPP is one I am
familiar with and one I would recommend be considered.
What other information that affects the reliability and future use of the data do we need
to keep?
This touches on the larger question of future use of the data and so goes beyond what
might be considered quality control issues.
a. We need to keep a unique tag with each station to permit matching real-time to delayed
mode data and the unequivocal substitution of delayed mode
for real-time in the final archives.
b. We need to store information about tests performed and failed. As data are collected and
archives built, QC systems will evolve and users (and
archives) will want to know what tests were performed against data.
c. We need to store information about who processed the stations and what they did. This is
helpful in knowing who did what to data, including QC.
d. We want to store sufficient information about the instrumentation so that, should
problems arise later about the performance of particular floats, we have
a chance to make corrections. We want to strive not to repeat the problems encountered
when manufacturers' fall rate equations for XBTs were found to
be wrong. (At present, there is insufficient information about some XBTs that were used, so
that making corrections to them is difficult.)
e. We want to record changes made in the data as a result of QC. However, we also want the
original values to be kept, in case a user wishes to
backtrack from changes.
f. We may wish to record not only assessment of data quality, but also precise information
about a datum (such as which tests were failed) and
information about the suspected cause of the failure or the reason why a suspicious datum
(such as a temperature inversion) is considered okay.
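To make items a through f more concrete, here is a minimal sketch of a station record that carries this information. It is not the GTSPP format or any agreed Argo format; every class, field and function name below is a hypothetical choice made for illustration. The flag value 5 follows the GTSPP convention for a value changed by QC.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoryEntry:
    centre: str                             # who processed the station (item c)
    action: str                             # what they did (item c)
    test: Optional[str] = None              # test involved, if any (items b, f)
    level: Optional[int] = None             # index of the datum affected
    previous_value: Optional[float] = None  # original value kept (item e)
    comment: str = ""                       # suspected cause or justification (item f)


@dataclass
class Station:
    station_tag: str                        # unique tag (item a)
    instrument: str                         # instrument description (item d)
    temperature: List[float]
    flags: List[int]                        # one quality flag per value
    tests_performed: List[str] = field(default_factory=list)   # item b
    history: List[HistoryEntry] = field(default_factory=list)


def change_value(station, level, new_value, centre, comment):
    """Change a value as a result of QC while keeping the original."""
    station.history.append(HistoryEntry(
        centre=centre,
        action="value changed",
        level=level,
        previous_value=station.temperature[level],
        comment=comment,
    ))
    station.temperature[level] = new_value
    station.flags[level] = 5                # value changed by QC procedures


# Hypothetical usage.
stn = Station("TAG-0001", "float type A", [10.0, 99.0], [1, 1])
change_value(stn, 1, 9.5, "centre_A", "suspected transcription error")
```

Because the original value is held in the history entry, a user who disagrees with the change can backtrack to it.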
What monitoring of data quality is required?
Since PIs will be involved in QC of delayed mode data, we can expect variations in
results. Also, it may well be sensible for different centres/PIs to carry out
different test procedures. I recommend that an intercomparison be carried out on the same
delayed mode data set. The analysis of results should be carried
out by one centre, and results distributed. A discussion of the results should take place
at an appropriate Argo meeting. I would recommend this happens very
soon after operations become "routine".
What feedback from users should be set up and how will it be used?
In delayed mode, this will be more difficult since the users will be more dispersed.
However, if a regular user can be found, a QC centre should try to set up a
liaison with them to provide feedback of data quality problems.
QC of Trajectory Data
The same questions as for profile data apply here; however, there is no transmission of these
data in real-time.
Flags
Do we need to send flags out with the real-time data?
Historically, this has not been done. The present GTS code forms (BATHY, TESAC) have no
provision for this and are unlikely to change. There is the
possibility of using BUFR, a binary format that is self-describing. However, the real
question is whether the flags would be useful to users who need real-time rather
than delayed mode data. I have no feel for this and so need help from the users.
What level of flagging do we need? Do we attach a flag to each value indicating the
assessment of the quality of the value, do we attach a flag that
indicates what tests were failed by that value, do we attach a flag that says what tests
were failed and why to each value?
My own view is that a flag on a datum or grouping of data, such as a profile, should be
used to indicate an assessment of the quality only. This will serve the
majority of users. Of second importance is to identify which tests were failed at the
detail of a grouping of data such as a profile. To say that a profile failed a
freezing point test, for example, states that one or more points of the profile failed.
The failed points will be marked with a flag indicating poor quality. Of
course, a datum failing one test may also fail others, so a flag of poor quality does not
uniquely indicate which test was failed. My view is that the number of
users interested in this level of detail is low. In addition, those interested in this
detail will likely want to retest data that fail a particular test and are unlikely to
use the fact that a single datum failed a test. Finally, recording a possible reason for a
failure, or a reason why a suspicious point is deemed good, is again of
lesser interest to most users. It would be useful to record such information to document
systematic failures, provide clues to problems in test procedures, and to
identify regions of the oceans where unusual features are common. However, this should be
stored in such a way that it does not "clutter" access to the data
and the simple quality assessment flags. In summary, I argue that all are desirable, but
of decreasing utility to users. I would advocate a format that can
preserve it all, but one that keeps the information beyond simple data quality in a
part of the format that does not impede access to the data and quality
assessment flags.
Do we preserve flags with the delayed mode data?
Yes. Whatever we do to the data, the quality assessment should be kept with the data
whether real-time or delayed. The format structures should be identical
for real-time and delayed mode so all information can be recorded at whatever is the
appropriate point in the processing.
What flag convention should we use?
I recommend the one used by GTSPP. It is widely used and simple:
0 = no QC done
1 = data judged good
2 = data judged probably good, although there is some aspect that is troubling
3 = probably bad, with insufficient information to be sure it is bad
4 = bad data
5 = data value has been changed by QC procedures; the original value is found elsewhere in
the data record
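For software that handles these flags, the convention above could be written down once and shared. The sketch below is one possible encoding; the class name, member names and the helper function are hypothetical, and only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class QCFlag(IntEnum):
    NO_QC = 0          # no QC done
    GOOD = 1           # data judged good
    PROBABLY_GOOD = 2  # probably good, some aspect is troubling
    PROBABLY_BAD = 3   # probably bad, not enough information to be sure
    BAD = 4            # bad data
    CHANGED = 5        # value changed by QC; original kept elsewhere


def select_good(values, flags, accept_probably_good=True):
    """Return the values whose flag is GOOD (and optionally PROBABLY_GOOD)."""
    accepted = {QCFlag.GOOD}
    if accept_probably_good:
        accepted.add(QCFlag.PROBABLY_GOOD)
    return [v for v, f in zip(values, flags) if QCFlag(f) in accepted]


# Hypothetical usage: three temperatures with their flags.
kept = select_good([10.1, 10.3, 25.0], [1, 2, 4])
strict = select_good([10.1, 10.3, 25.0], [1, 2, 4], accept_probably_good=False)
```

Whether a user accepts "probably good" data is their own decision; the helper simply makes that choice explicit.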
GTSPP Tests
In GTSPP we view the data arriving from the GTS using the same software as we use for
other profile data arriving in delayed mode. So, we use the routines
as described in M&G#22. This provides a suite of tests looking at ranges, position
checks, etc. Science centres of GTSPP employ other tests to further
identify failures missed by these tests. You can see some documentation about these test
procedures at
http://www.meds-sdmm.dfo-mpo.gc.ca/meds/Prog_Int/GTSPP/QC_e.htm
The GTSPP tests are not "tuned" to find failure modes particular to profiling
floats. For example, I have seen profiles from floats with spikes at the surface.
These are usually less than 1.0 degrees C and are not automatically found by the Spike
Test. For profiling floats, we may want to tighten up such a test or
devise instrument-specific tests.
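To illustrate what tightening such a test might mean, the sketch below uses a commonly quoted three-point spike statistic, |V2 - (V1 + V3)/2| - |(V3 - V1)/2|, compared against a threshold. The formula is given here as an assumption about the general form of a spike test rather than as a statement of the documented GTSPP procedure, and both threshold values are purely illustrative.

```python
def spike_indices(values, threshold):
    """Return indices of interior points whose spike statistic exceeds threshold.

    For three consecutive values V1, V2, V3 the statistic is
        |V2 - (V1 + V3) / 2| - |(V3 - V1) / 2|
    so a point that departs from both neighbours is penalised, while a
    steady gradient is not.
    """
    spikes = []
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        statistic = abs(v2 - (v1 + v3) / 2.0) - abs((v3 - v1) / 2.0)
        if statistic > threshold:
            spikes.append(i)
    return spikes


# Hypothetical near-surface temperatures (degrees C) with a 0.8 degree spike.
temps = [20.0, 20.8, 20.0, 19.9, 19.8]
loose = spike_indices(temps, threshold=2.0)   # illustrative loose threshold
tight = spike_indices(temps, threshold=0.5)   # illustrative tighter threshold
```

With the looser threshold the 0.8 degree spike passes unnoticed, while the tighter one catches it. Note that a three-point test of this form cannot examine the first or last level of a profile, so a spike at the very top level would need a separate check.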
Other Tests
I did a quick poll to see what other things are tested and how. I include ones here that
are not already a part of GTSPP procedures.
Buoyancy Failure
If the buoyancy adjustment begins to fail then the float will spend increasing time at the
sea surface and this will show up as larger drift segments. My own
estimation is that such a failure is an indication of the coming demise of a float, but
that the profile data returned from the float are still okay. This is information
the PI may want, but not something we need to include in archived data.
Failure to reach Parking Depth
I view this as the same type as a buoyancy failure and would recommend the same handling.
T-S Failure
This uses a T/S plot procedure to look for a conductivity failure. I am not sure how this
would be automated except through some sort of T/S climatology.
Assuming there is such a test, it should result in quality flags assigned to data.
Anomaly Failure
This looks for unusual changes in anomalies of T and S. The float history is used to
generate mean profiles, and anomalies are then computed for each profile
based on the mean. Any abrupt or large changes would be viewed with suspicion. Again, I am
not sure how we could automate such a test, but this might be
appropriate for the high resolution data processing.
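One way such a test might be automated is sketched below, under strong simplifying assumptions: all profiles from the float are taken to be on the same set of levels, the mean is a plain average over the float's earlier profiles, and "large" is defined by a single fixed limit. The function names and the limit are hypothetical.

```python
from statistics import mean


def profile_anomalies(history, latest):
    """Anomaly of the latest profile relative to the float's mean profile.

    history: list of earlier profiles, each a list of values on the same levels
    latest:  the new profile, on those same levels
    """
    # zip(*history) groups the values level by level across all profiles.
    mean_profile = [mean(level_values) for level_values in zip(*history)]
    return [obs - avg for obs, avg in zip(latest, mean_profile)]


def suspicious_levels(history, latest, max_anomaly):
    """Indices of levels whose anomaly magnitude exceeds max_anomaly."""
    anomalies = profile_anomalies(history, latest)
    return [i for i, a in enumerate(anomalies) if abs(a) > max_anomaly]


# Hypothetical float history: three earlier two-level temperature profiles.
earlier = [[10.0, 5.0], [10.2, 5.0], [9.8, 5.0]]
new_profile = [10.1, 7.0]
flagged = suspicious_levels(earlier, new_profile, max_anomaly=1.0)
```

In practice a float drifts through different water masses, so a fixed limit and a simple average are unlikely to be adequate; the sketch only shows the mechanics of the comparison.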
NCEP Comparison Failure
This compares the latest data to the previous week's temperature analysis from NCEP. It
assumes NCEP is correct, and if data are more than 3 standard deviations
from NCEP, they are considered bad.
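The comparison can be sketched as follows. The sketch assumes that the NCEP analysis value and a standard deviation are already available at each observation; how the analysis is interpolated to the float position and where the standard deviations come from are not specified here and are left as assumptions. The function and parameter names are hypothetical, and the flag values 1 and 4 follow the GTSPP convention for good and bad data.

```python
def ncep_comparison(observed, analysis, std_dev, n_std=3.0):
    """Flag each observation against the analysis value at the same point.

    observed: observed temperatures
    analysis: analysis temperatures matched to the observations
    std_dev:  standard deviation to use at each point
    Returns a list of flags: 4 (bad) if the observation is more than
    n_std standard deviations from the analysis, otherwise 1 (good).
    """
    flags = []
    for obs, ref, sigma in zip(observed, analysis, std_dev):
        flags.append(4 if abs(obs - ref) > n_std * sigma else 1)
    return flags


# Hypothetical values: the second observation is 4 degrees from the analysis.
flags = ncep_comparison([15.0, 12.0], [14.5, 8.0], [0.5, 1.0])
```

Since the test assumes the analysis is correct, a real feature missing from the analysis would be flagged as bad, which is one reason the consequences of a failure need to be documented.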
Other Issues
I invite everyone to offer suggestions of other points that need discussion.