Use of results of interlaboratory comparisons

Abstract

The ISO/IEC 17043 and ISO 13528 standards, which deal with inter laboratory comparisons for proficiency testing of laboratories, explicitly mention that abnormal test results may be obtained even by laboratories with good practice and competent personnel. That is why the results of an inter laboratory comparison must not be used to condemn a participant. They shall be used as alerts that must trigger a search for the possible causes of deviations and, as far as necessary, an appropriate corrective action.

Introduction

In most cases, the motivation for a laboratory to participate in an inter laboratory comparison is to fulfil the requirements for accreditation (especially § 5.9 of the ISO/IEC 17025 standard).

After having participated, the laboratory shall analyse the results of the evaluation included in the report of the inter laboratory comparison that it receives. These results can be misunderstood in several ways, which may limit the benefit that the participant gets from its participation. These misunderstandings often come from a lack of knowledge of how the individual results are computed and of what the alerts mean.

The scope of this document is:

To explain how the evaluation results are obtained and how they may be misunderstood;
To propose actions to undertake when an alert is triggered.

Choice of the parameters used to evaluate the proficiency of a laboratory

The organisation of an inter laboratory comparison implies a prior decision on what will be the parameters used to measure the proficiency of the laboratory.

For this, the organisers of inter laboratory comparisons generally use ideas coming from the ISO 5725-1, which defines the accuracy of a test method as a combination of its trueness and of its precision. In this standard:

The “trueness” is defined as the closeness of the agreement between the mean value of a large series of tests and an accepted reference value. Translated into usual words, test results are true when their mean value is equal to the “true” value;
The “precision” is defined as the closeness of the agreement between the test results in fixed conditions. Translated into usual words, test results are precise when several different tests provide “similar” results.

The standard mentions that the concept of precision is related to the conditions of tests which are fixed and those which are not fixed. The limit conditions of precision are:

The maximal precision of the method, named “repeatability”, for which all parameters that may be fixed are fixed: the tests are performed on identical test specimens, in the same laboratory, by the same operator, with the same equipment and the same consumables, within a short time interval;
And the minimal precision, named “reproducibility”, for which only the samples and the test method are identical: tests are performed in different laboratories, by different operators, with different equipment and consumables, within a long time interval.

As a consequence of the above, the organisers of inter laboratory comparisons generally compute:

The bias of the participant laboratories to assess the trueness of their results;
The repeatability of the participant laboratories to assess the part of the precision of their results which is attributable to them.
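
The two quantities listed above can be sketched as follows. This is a minimal illustration, assuming that the bias is estimated as the difference between the mean of the laboratory's replicates and the assigned value, and the repeatability as the standard deviation of those replicates. The function name and the numerical values are hypothetical, and an actual organiser may use other (for instance robust) estimators.

```python
from statistics import mean, stdev

def bias_and_repeatability(results, assigned_value):
    """Return (bias, repeatability standard deviation) for one laboratory."""
    # Bias: difference between the laboratory mean and the assigned value.
    bias = mean(results) - assigned_value
    # Repeatability: scatter of replicates obtained under repeatability conditions.
    s_r = stdev(results)
    return bias, s_r

# Hypothetical replicate results and assigned value, for illustration only.
lab_results = [502.0, 498.0, 503.0, 501.0]
bias, s_r = bias_and_repeatability(lab_results, assigned_value=500.0)
print(f"bias = {bias:.2f}, repeatability s_r = {s_r:.2f}")
```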

Note: In addition to these parameters, which are parameters describing the proficiency of the participants, the inter laboratory comparisons provide general information on the test method that is useful for the laboratories: the repeatability and the reproducibility of the method.

The standard ISO 13528 provides the guidance necessary to compute these parameters and to check with appropriate statistical tests whether the results of the participant are significantly different from the values regarded as true (see Abstract of the ISO 13528).

The organiser of the inter laboratory comparison may also ask the participants to provide their uncertainty on their results and use this information to assess the proficiency of a laboratory. However, the standard ISO 13528 does not cover the assessment of the uncertainties provided by the participants.

In most cases, the proficiency of a laboratory is assessed by the determination of the trueness and the precision of its results.

Conventional aspect of the assessment

All assessments contain a more or less important part of convention for the determination of:

The value of reference regarded as “true”;
The value of scatter taken as reference;
The limits chosen to decide alerts.

The existence of a value that can be regarded as true is not always obvious. In some cases (especially the determination of the composition of a mix), it clearly exists. In other cases (especially when the material is intrinsically heterogeneous), the standard ISO 5725-1 acknowledges that it might not exist, and the assigned value is then conventional. Among the inter laboratory comparisons organised by CompaLab, a typical example is the measurement of A_gt in accordance with the standard ISO 6892-1: the results obtained for this characteristic change according to where the determination is performed on the test specimen, according to the length of the test specimen, according to the length of the basis of measurement, etc. A “true” value should then take into account all possible places of measurement on the test specimen, all possible lengths of test specimens, all possible lengths of basis of measurement, etc. This is technically impossible because the test is destructive and because an infinite number of lengths of test specimen and of measurement basis are possible for a finite length of sample. The assigned value is then an accepted reference value determined in accordance with a defined process, but which includes a part of convention.

The existence of a value of scatter which can be regarded as “true” is also not always obvious, especially in the same cases and for the same reasons as above. This problem should not be confused with the problem of estimating the value (always somewhat difficult in the case of a scatter). The first is a technical problem of existence of a “true” value, while the second is a classical statistical problem: estimating the value of a parameter from a limited number of determinations.

Clauses 5 and 6 of the standard ISO 13528 include the possibility that the values of reference for the assessment of laboratories come from a standard or a regulation. In this case, the definition of these values makes them conventional.

Most of the time, the “true” values used to assess the laboratories therefore include a more or less important part of convention.

Concerning the limits, the practices promoted by the standard ISO 13528 encourage the organisers of inter laboratory comparisons to assign scores to the results of participants. These scores are then compared with limits that a result belonging to the parent population would exceed with a probability of at most 5% or at most 0,3%.

These maximum probabilities come from the “u” values of the standard normal distribution, equal to ±2 (about 5%) and ±3 (about 0,3%) respectively, and are totally arbitrary. Limit values of probability of 1% or 10% would make just as much sense, with alert limits obviously different!
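
The link between the “u” values and these probabilities can be checked with a short computation. The sketch below assumes a standard normal distribution and two-sided limits. The function names are illustrative and do not come from any standard.

```python
from math import erfc, sqrt
from statistics import NormalDist

def two_sided_tail(u):
    """Probability that a standard normal variable falls outside ±u."""
    return erfc(u / sqrt(2))

def limit_for(alpha):
    """Symmetric limit u such that P(|Z| > u) equals alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Probabilities attached to the usual limits ±2 and ±3.
for u in (2, 3):
    print(f"u = ±{u}: P(|Z| > u) = {two_sided_tail(u):.4f}")

# Limits that other, equally defensible, probabilities would give.
for alpha in (0.10, 0.05, 0.01, 0.003):
    print(f"alpha = {alpha}: limit = ±{limit_for(alpha):.3f}")
```

With these assumptions, ±2 corresponds to about 4,6% and ±3 to about 0,27%, which are usually rounded to 5% and 0,3%. Choosing 1% or 10% instead would simply move the limits to about ±2,58 or ±1,64.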

Reliability of the assessment

The reliability of any assessment of the performance of a laboratory with an inter laboratory comparison is conditioned by:

The stability and the homogeneity of the samples which were used;
The adequacy and the robustness of the statistics which were used, with regard to technical issues related to the test methods and the number of participants;
The influence of possible artefacts, particularly inappropriate rounding of results;
When applicable, the way in which measurement uncertainty was taken into account.

The main scope of the standards ISO/IEC 17043 and ISO 13528 is to assure the reliability of the assessments carried out with inter laboratory comparisons. In particular, the issues of stability and homogeneity of samples and of adequacy of the statistical methods are widely developed. An appropriate assessment of the performances is obviously impossible if the samples used for it are not similar enough or if inappropriate statistical methods are used. For more details concerning this issue, please see the abstract of the ISO 13528.

Fulfilling the requirements of the standards ISO/IEC 17043 and ISO 13528 assures the avoidance of risks of unreliability related to the operations carried out under the control of the organiser of the inter laboratory comparisons.

Independently from the wording of these texts, we should keep in mind that a statistical computation cannot provide more information than what the data contains. Consequently, the conditions of elaboration of the data count as much as the adequacy of the used statistical methods. Among the most important issues to control, there are:

The control of the scatter which is taken into account;
The artefacts.

A significant example of the importance of controlling the scatter which is taken into account is the case where the preparation procedure of the test specimens significantly influences the test results. The competence of the laboratory in preparing the test specimens correctly may then become more critical than the competence needed to carry out the tests themselves properly. In these cases the organiser of the inter laboratory comparison shall decide, with regard to technical issues related to the product and the test method, which part of the preparation of the test specimens is entrusted to the participants, which parameters of preparation are fixed by the organiser and which parameters are left to the discretion of the participant. The participants shall be aware of what the published standard deviations represent, according to how the parameters of preparation of the test specimens were fixed.

Note: Among the inter laboratory comparisons organised by CompaLab, the Charpy test is particularly concerned by this issue. The corresponding test standard requires narrow dimensional tolerances for the test specimens. The meaning of the published standard deviations is obviously not identical if the test specimens were machined by the organiser or if they were machined by the participants.

Another significant artefact is inappropriate rounding of the test results provided by the participants. Obviously, in the extreme case where results are rounded to ±1 while the reproducibility of the test is ±0,1, all results of the participants will be rounded to the same value and the standard deviations computed by the organiser will all be equal to 0. In this case, the inappropriate rounding has removed from the data all relevant information concerning the scatter, and any mathematically correct computation provides a wrong result for the assessment. Practically speaking, such extreme rounding never occurs because it obviously causes a big problem for the organiser. However, rounding the test results to the order of magnitude of their uncertainty is a good practice of laboratories. But this is harmful when they participate in an inter laboratory comparison because, through the same mechanism, it insidiously induces a bias in the estimation of the standard deviations.
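
The extreme case described above can be reproduced with a few lines. This is only a sketch: the results are hypothetical values chosen for illustration, and `round_to` is an illustrative helper.

```python
from statistics import stdev

def round_to(value, step):
    """Round a value to the nearest multiple of step."""
    return round(value / step) * step

# Hypothetical results from six participants, for illustration only.
results = [10.12, 10.25, 10.31, 10.18, 10.22, 10.27]

# Scatter of the results as measured.
sd_raw = stdev(results)
# Scatter after every result has been rounded to the nearest unit.
sd_coarse = stdev([round_to(x, 1.0) for x in results])

print(f"standard deviation of raw results:     {sd_raw:.4f}")
print(f"standard deviation of rounded results: {sd_coarse:.4f}")
```

All rounded results collapse onto the same value, so the computed standard deviation is zero although the underlying results do differ.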

Another significant example of artefact is the case where the assigned value is located at the limit of detection of the measuring devices (for example determination of traces in chemical analysis), so that some participants express their results in the form “<0,001”. The computations of bias and of scatter are then obviously distorted. The standard ISO 13528 recommends that organisers of inter laboratory comparisons do not accept results expressed in this form.

Beyond the fulfilling of the requirements of the standards ISO/IEC 17043 and ISO 13528 by the organiser of the inter laboratory comparisons, participants shall carefully study and interpret the reports of inter laboratory comparisons, and particularly the meaning of the provided standard deviations.

Finally, as in all cases of assessment results, the alert signals given when participating in inter laboratory comparisons are subject to two risks, related to the so-called α and β risks in statistical tests:

The α (alpha) risk of triggering an alert although the results of the laboratory are part of the population of the results obtained by those who fulfilled the requirements of the test methods;
The β (beta) risk of not triggering an alert although the results are not part of the population of the results obtained by those who fulfilled the requirements of the test methods.

The α risk is related to the case of a laboratory that implements the test method in accordance with the requirements, with competent personnel, appropriate equipment and consumables, and proper environmental conditions. However, an unfortunate combination of acceptable deviations has produced a result whose bias is significant. This combination has a probability α of occurring, which is low but not zero. This α value is the one used in the statistical tests implemented by the organiser during the inter laboratory comparison. As mentioned above, the values implicitly recommended for α by the standard ISO 13528 are 5% for the warning signal and 0,3% for the action signal.

The β risk is related to the case of a laboratory that shows one or several significant deviations in its implementation of its test method and/or in the competence of its personnel and/or in the state of its equipment and/or in the adequacy of the consumables it uses and/or in the environmental conditions in which it carries out the tests. However, on the day on which the tests for the inter laboratory comparison took place, a fortunate combination of these deviations happened, so that no alert signal was triggered for this participant. The corresponding probability β is not easy to compute, but it is obviously not zero and is even large in some cases. It depends mainly:

On the number of participants, when the assigned values are estimated from the results of the participants (see § 5 and § 6 of the abstract of the ISO 13528). Obviously, the greater the number of participants, the more accurate the estimation of the assigned values and the more discriminating the statistical tests. In the limiting case where the number of participants is reduced to 2 and the usual values are selected for the α risk, the computation can never lead to an alert, whatever the results of participants are: the β risk is then equal to 1;
On the importance of the deviations of the laboratory: a deviation having a very large influence on the test results has a higher probability of triggering an alert signal than a significant but smaller deviation.
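
The limiting case with 2 participants can be illustrated numerically. The sketch below assumes that the assigned value and the standard deviation are simply the mean and the sample standard deviation of the participants' own results, which is a simplification (organisers often use robust estimators instead). Under that assumption, a known bound (Samuelson's inequality) states that no result among n can have |z| larger than (n − 1)/√n. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def z_scores(results):
    """z-scores using the participants' own mean and sample standard deviation."""
    m, s = mean(results), stdev(results)
    return [(x - m) / s for x in results]

def max_possible_z(n):
    """Largest |z| that any one of n results can reach with these statistics."""
    return (n - 1) / sqrt(n)

def smallest_n_exceeding(limit):
    """Smallest number of participants for which |z| can exceed the limit."""
    n = 2
    while max_possible_z(n) <= limit:
        n += 1
    return n

# Two participants, however far apart, always obtain the same |z|.
print(z_scores([0.0, 1000.0]))
print("participants needed for |z| > 2:", smallest_n_exceeding(2))
print("participants needed for |z| > 3:", smallest_n_exceeding(3))
```

With two participants, |z| is always about 0,71, far below both limits. Under the same assumption, a warning signal (|z| > 2) only becomes possible from 6 participants and an action signal (|z| > 3) from 11 participants.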

In conclusion, the absence of alert does not necessarily imply the absence of deviation and the presence of an alert is a presumption but not a certainty of presence of a deviation to be corrected by the laboratory.
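
How the β risk falls as the deviation grows can be sketched under simplifying assumptions: a single normally distributed result, an assigned value and a standard deviation that are known exactly, and an alert raised when |z| exceeds a limit of 3. The function name and the `shift` parameter (the laboratory's systematic deviation, expressed in standard deviations) are illustrative.

```python
from statistics import NormalDist

def beta_risk(shift, limit=3.0):
    """Probability that no alert is raised for a laboratory whose results
    are shifted by `shift` standard deviations, with alerts at |z| > limit."""
    z = NormalDist()
    return z.cdf(limit - shift) - z.cdf(-limit - shift)

for shift in (0, 1, 2, 3, 4, 5):
    print(f"shift = {shift} sd: beta = {beta_risk(shift):.3f}")
```

Under these assumptions, a laboratory whose results are shifted by exactly 3 standard deviations still escapes the action signal about once in two participations, and a shift of 1 standard deviation is almost never detected.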

In addition to the parameters of bias and of scatter, the organiser may ask the participants to provide the uncertainties attached to their results. This information may be used to compute scores of the type z’, in which the participant's own uncertainty is taken into account alongside the reproducibility. The corresponding scores are then less severe for the laboratories. The ISO 13528 standard describes several types of scores with, for each of them, a version that takes into account the uncertainty.

It must be stressed that for some test methods, the estimation of uncertainties is complex and unreliable. Laboratories may then be led to underestimate their uncertainties. In those cases, taking the uncertainties into account in the inter laboratory comparison reduces the α risk but increases the β risk.

Scores for assessment of the individual results

The assessment of the individual results is generally carried out using scores (see above). The scores and the corresponding limits are more or less efficient according to the method used to compute them (cf. particularly the choice of a distribution law). The choice of this method mainly aims to assure the reliability of the assessment results (see above). This criterion may lead to the elimination of some computation methods that are more efficient but less suited to the case (cf. for example the cases where the normal distribution law is not used because it does not fit the population of results).

It follows that, for the same set of data, an alert signal may or may not be triggered for the laboratory, according to the statistical method used.
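
A small example makes this concrete. The sketch below scores the same hypothetical data set in two ways: with the mean and sample standard deviation of the results, and with the median and the scaled median absolute deviation (a common robust choice, used here purely for illustration and not presented as the method of any particular organiser). The data and function names are assumptions.

```python
from statistics import mean, median, stdev

def classical_z(results, x):
    """z-score of x based on the mean and sample standard deviation."""
    return (x - mean(results)) / stdev(results)

def robust_z(results, x):
    """z-score of x based on the median and the scaled median absolute deviation."""
    med = median(results)
    mad = median([abs(r - med) for r in results])
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return (x - med) / (1.4826 * mad)

# Hypothetical results of nine participants; the last one is suspect.
results = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 12.0]
suspect = results[-1]

print(f"classical z = {classical_z(results, suspect):.2f}")
print(f"robust z    = {robust_z(results, suspect):.2f}")
```

With the classical statistics the suspect result inflates the standard deviation and stays below the action limit of 3, whereas with the robust statistics the same result lies far beyond it.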

Meaning of alerts

The results of a proficiency test are valid only for the product and the test method involved in the inter laboratory comparison.

In accordance with the standards ISO/IEC 17043 and ISO 13528, it shall be remembered that abnormal results may be obtained even in laboratories showing good practice, with skilled personnel. For this reason, the criteria given in these standards and adopted in this ILC shall not be used to condemn a participant.

The triggering of an alert means that the results are so exceptional (cf. the α risk above) that they call for investigation. Usual causes of deviation are:

Product submitted to ILC different from those usually handled by the laboratory;
Mistake in input of data necessary to perform the tests, or in computation or in reporting of the results;
Unsatisfactory equipment;
Unsatisfactory consumables;
Unsatisfactorily skilled personnel;
Deviation from the prescribed method;
Combination of minor causes.

These causes may be the origin of a bias and/or a lack of repeatability.

A corrective action should be considered with regard to the results of this investigation. With regard to the causes listed above, this could be:

Improvement of the accuracy of the definition of the field of competence of the laboratory;
Improvement of the procedure of input of data, of computation or of reporting of the results, of the check of these operations;
Repair or replacement of unsatisfactory equipment, improvement of the maintenance procedure of equipment;
Improvement of the requirements and/or check procedure for procurement of consumables;
Improvement of the requirements for personnel and corresponding training;
Identification and elimination of deviations from the method;
Implementation by the laboratory of requirements more severe than those of the standard describing the test method (applicable when the detected bias comes from a combination of minor deviations).

The investigation may demonstrate that no substantial cause of deviation of the results is present. The laboratory may then decide not to implement any corrective action immediately and to put its testing process under surveillance for a certain period of time. The standard ISO/IEC 17043 § 9.5 proposes several corresponding possibilities, in addition to participation in an inter laboratory comparison:

Checks with reference materials;
Repetition of tests;
Correlation of results.

This surveillance of the test process shall make it possible to decide whether the cause of the deviation was simply not found or whether the absence of a substantial cause of deviation is confirmed.

Conclusion: the results of an inter laboratory comparison shall be used as alerts that shall trigger a search of possible causes of deviation and, as far as necessary, appropriate corresponding corrective actions.
