REVIEW |
1Department of Oral Radiology of the Academic Centre for Dentistry in Amsterdam (ACTA), Amsterdam, The Netherlands; 2Department of Medical Decision Making, University of Leiden Medical Center (LUMC), Leiden, The Netherlands
*Correspondence to: Dr PA Mileman, Academic Centre for Dentistry in Amsterdam (ACTA), Louwesweg 1, 1066 EA Amsterdam, The Netherlands. E-mail: phil.mileman{at}acta.nl
Received 24 July 2007; accepted 20 December 2007
| Abstract |
|---|
|
|
|---|
Keywords: diagnosis, radiography; evidence-based medicine; decision making; state of the art review
| Introduction |
|---|
|
|
|---|
There are numerous introductions available for evidence-based diagnosis2, 3 stemming from an inspirational publication in 1972.4 To complement them, international guidelines for reporting diagnostic research have been published,5 whilst in Europe6 and the USA evidence-based guidelines for prescribing radiographs have recently been issued. In this article we will illustrate important aspects of evidence-based dentistry as it pertains to approximal caries diagnosis. How should articles in the literature be evaluated and used in the clinic? Our approach here will be that of clinical decision analysis.7 In it, treatment decisions are determined not only by obtaining published evidence but also by being able to combine this diagnostic evidence in a rational, transparent and systematic way to optimize interventions for the benefit of the patient (Figure 1).
| Diagnostic accuracy |
|---|
|
|
|---|
Diagnostic imaging procedures usually demand the interpretation of the image by an observer who is therefore part of the diagnostic system and as such contributes to its level of accuracy. There is considerable variation in diagnostic accuracy for dentin caries between dentists themselves when viewing bitewing radiographs.8 Feedback about their own diagnostic accuracy and how to improve it would seem part and parcel of an evidence-based approach to the use of imaging tests by dentists. Tools should be developed to enable dentists to assess their own accuracy – at least for common diagnostic problems such as caries,9 periodontal disease and periapical lesions.
Innovations in research on improving diagnostic accuracy for imaging tests have included identifying features on radiographs linked to an increased probability of a correct diagnosis and therefore to an improved prognosis after treatment for the patient. This approach may improve accuracy in diagnosing the likelihood of treatment complications occurring during, for example, wisdom tooth extraction.10 Practitioners may, however, need training in this method of feature recognition11 to be able to use diagnostic aids or "expert systems" such as Oral Radiographic Differential Diagnosis (ORAD) which have appeared on the internet.12, 13
The gold standard
A prerequisite for being able to assess the accuracy of a test is the availability of a valid "gold" or reference standard diagnosis. The appropriate gold standard should be carefully chosen14 and based on a technique other than that of the test being evaluated. It is therefore better to evaluate, for example, an imaging test by comparing it with a non-imaging gold standard test such as histology or biopsy. Otherwise, there is a danger that the estimation of the performance of the test under consideration will be inflated since both the gold standard and the index test may have the same systematic errors.
Measures of diagnostic accuracy
Articles comparing diagnostic tests often use and calculate a broad spectrum of measures of diagnostic performance.15–17 Table 1 provides an overview, with illustrative calculations of a simple dichotomous test.
In publications evaluating tests, because of the relationship between sensitivity and specificity, these two measures viewed in isolation provide an incomplete picture of the performance of a diagnostic test. This has resulted in a plethora of accuracy measures being used in diagnostic publications. Overall measures which combine sensitivity and specificity are, for example, likelihood ratios and the diagnostic odds ratio (Table 1).19, 20 Another important reason to consider measures other than sensitivity and specificity is that they answer the wrong question, that is, "What is the probability of a certain test result, given the absence or presence of disease?" A clinically more important question would be, "What is the probability of disease, given a positive or negative test result?" Superficially the answer might seem to be provided precisely by two other measures: the positive and negative predictive values of a test (Table 1).17 However, a publication concerning a diagnostic test using predictive values is only comparable with another publication about a patient population with exactly the same prevalences of disease as that reported in the original article. To compare articles about a diagnostic test using patients from populations with different prevalences of disease additional calculations are needed. Predictive values are therefore poor for comparing research publications about diagnostic tests because differences in the measures can arise due to differences in both diagnostic accuracy and the underlying prevalence of disease.
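The relationships between these measures can be made concrete with a short sketch. The counts below are hypothetical and chosen only to give a sensitivity of 40% and a specificity of 98%; they are not the values of Table 1, so the positive likelihood ratio they yield (20) differs from the figure of 24 used later in the text. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a dichotomous index test judged against a gold standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        return (self.tp * self.tn) / (self.fp * self.fn)


def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical counts (assumption): 100 diseased and 100 sound surfaces.
table = TwoByTwo(tp=40, fp=2, fn=60, tn=98)

# Same test, two different prevalences: only the predictive values change.
ppv_high, npv_high = predictive_values(table.sensitivity, table.specificity, 0.43)
ppv_low, npv_low = predictive_values(table.sensitivity, table.specificity, 0.02)
```

Sensitivity, specificity, the likelihood ratios and the diagnostic odds ratio are properties of the table alone, whereas `predictive_values` needs the prevalence as an extra argument, which is why predictive values from populations with different prevalences cannot be compared directly.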
| Obtaining diagnostic "evidence" from the literature |
|---|
|
|
|---|
Systematically summarising the evidence of the accuracy of diagnostic tests has had a number of problems: literature on diagnostic test evaluation has been difficult to identify, sensitivity and specificity need to be analysed together, and published studies have often used heterogeneous populations where the prevalence of the target disease was not reported.19 Publications without explicit inclusion of frequency data meant that summarising the accuracy of different tests in a meta-analysis was impossible.19 Because of the inadequacies of reports in the past, there are now guidelines for readers for judging the quality of articles about diagnostic tests15 and for writers of articles about diagnostic accuracy: Standards for Reporting of Diagnostic Accuracy, the so-called STARD initiative.5 Using these guidelines, diagnostic articles can be searched for in databases such as Medline by using PubMed and meta-analysis summaries can be made.19
Diagnostic tests in dentistry for which there are published measures of accuracy include oral examination, patient features, dental history and complaints, electric and other forms of vitality testing for pulp necrosis, digital and film radiographic imaging techniques for approximal and occlusal caries, periodontal and periapical disease, and pocket probing for periodontal defects.21
Guidelines for evaluating publications on diagnostic tests
Identifying diagnostic publications has been made easier recently as a result of guidelines advising authors to use the keywords "sensitivity and specificity" or "accuracy" in their publications. There is also now a specialized search engine, SUMSEARCH,22 which applies a filter to articles found in PubMed and can be used for initial searches of the diagnostic test literature; the articles found are ranked according to criteria for the strength of evidence. For example, in a search using the key words "radiography" and "dental caries", 695 articles were identified. When the diagnostic filter of the search engine (using the keywords "sensitivity and specificity") was used, this was reduced to 176, 7 of which were reported as "probably systematic reviews".
Quality criteria should be used when reviewing the literature17, 19 to judge the validity of the study, the manner of presenting results and whether the conclusions can be applied to help dental practitioners in caring for their patients. According to the criteria cited, a diagnostic study should report results of a double-blind prospective independent comparison of the "index" test with a valid gold standard reference test for actual pathology. The study should report the cut-off point used for the diagnosis of pathology for the index and reference test, the prevalence and spectrum of disease, previous tests and referrals, and patient demographics. The results should be reported in the form of a frequency table so that likelihood ratios for the index test can be calculated (Table 1). The reproducibility and accuracy of interpretation of the test by the practitioner in general practice should be comparable with that in the article reporting the test. Furthermore, the results of the test should be applicable to patients seen in general practice, change the management of the patients and improve their overall health status. However, in a recent systematic review of the use of bitewing radiography compared with panoramic radiography as a test for caries diagnosis,23 only five publications were found of a high enough standard to answer the question that the study had posed. Insufficient evidence to support the use of panoramic radiographs for the diagnostic task was found. The authors concluded it was not possible to combine the results in a meta-analysis because there were too many differences between the study populations and the reference tests used. This conclusion is echoed in other systematic reviews of diagnosis in dentistry.24
Summarising and comparing accuracy of diagnostic performance
Meta-analyses of diagnostic literature are carried out in a similar way to those for therapeutic studies.25 A previously developed protocol is used for retrieving the pertinent documents and extracting and combining the accuracy data, which are aggregated using a weighted quality scoring system from the published literature.5 A problem specific to meta-analyses of diagnostic tests is that sensitivity and specificity are correlated, and therefore pooling of estimates across different studies may yield biased estimates of a diagnostic test's performance. For this reason, the log diagnostic odds ratio (DOR) has been advocated for summarising results in meta-analyses (Table 1).19, 20 A more explicit way to take into account the relationship between sensitivity and specificity is the summary ROC (SROC) method.19 In diagnostic imaging, the cut-off points or thresholds along the ROC curve usually represent different degrees of certainty with which a diagnosis of pathology was made. Different outcomes in studies of diagnostic accuracy between publications originate, in part, from using these different thresholds. The SROC method for summarising data, however, considers the data from each scientific publication as originating from the same ROC curve whilst taking into account the possibility that the data points represent different thresholds.
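As an illustration of how an SROC summary can be obtained, the sketch below uses one widely described formulation, the linear regression usually attributed to Moses and Littenberg; it is an assumption that this is the variant intended in the cited reference. Each study contributes D = logit(TPR) − logit(FPR), which equals its log DOR, and S = logit(TPR) + logit(FPR), a proxy for the threshold used. D is then regressed on S, here with unweighted least squares for simplicity. The three studies are hypothetical and constructed so that all share a DOR of 9 at different thresholds.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def sroc_fit(studies):
    """Fit D = intercept + slope * S by unweighted least squares.

    studies: iterable of (sensitivity, specificity) pairs, one per study.
    D is the log diagnostic odds ratio, S reflects the diagnostic threshold.
    """
    d = [logit(se) - logit(1 - sp) for se, sp in studies]
    s = [logit(se) + logit(1 - sp) for se, sp in studies]
    n = len(d)
    mean_s = sum(s) / n
    mean_d = sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    sxy = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_s
    return intercept, slope


# Hypothetical studies (assumption): same accuracy, different thresholds.
hypothetical_studies = [(0.50, 0.90), (0.75, 0.75), (0.90, 0.50)]
intercept, slope = sroc_fit(hypothetical_studies)
summary_dor = math.exp(intercept)  # DOR at S = 0, where sensitivity = specificity
```

A slope close to zero indicates that the log DOR does not vary with threshold, in which case a single summary DOR describes all the studies; a clearly non-zero slope means that accuracy changes along the curve.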
| From prevalence of disease to clinical decision making |
|---|
|
|
|---|
Estimating prevalence of disease
A prerequisite for decision analysis is to have an estimate of the initial probability of pathology, taking into account the patient's clinical characteristics, signs and symptoms. Initial sources of information about chances of pathology for the dentist will be memories based on previous experience of other patients, but these memories may be selective. Another primary source of evidence is the scientific epidemiological literature about the average prevalence in the population. However, in the dentist's surgery, patients presenting and suspected of having, for example, periapical pathology may have complaints, other features of disease such as discolouration of the tooth in question or the suspected tooth may have been crowned. These factors together with the clinical examination will all modify the chances of periapical pathology actually being present. Vitality testing will further modify this probability of pathology even before radiographs are considered as an additional diagnostic test.
Recalculating the chance of disease following a diagnostic test
A critical determinant of the value of a diagnostic test is how the test result changes the probability that the patient has the disease. In other words, is there a sufficient reduction in uncertainty as a result of the test about whether or not the patient has a pathological condition to make a decision about therapy? The pre-test probability (prevalence or a priori probability) of disease for patients in the waiting room can be used together with the likelihood ratios of the test to calculate the post-test (a posteriori) probability of disease given the test result. These calculations can best be illustrated by expressing chances in the form of odds. Odds are related to probabilities (and therefore prevalence) by the following formulae: odds = probability/(1–probability) or prevalence/(1–prevalence) and probability = odds/(1+odds). The odds is the ratio of the chance of an event occurring to the chance of it not occurring, and is famously used in the Anglo-Saxon world for betting. The odds of disease, before and after the test result is known, are related by the following formula: post-test odds = likelihood ratio × pre-test odds.
This formula is the odds form of Bayes' theorem, which is attributed to an English clergyman, the Reverend Thomas Bayes (1702–1761). The likelihood ratio used should be either the positive or the negative likelihood ratio, depending on whether the test result is positive or negative. The respective likelihood ratios are larger or smaller than one, so that a positive test result increases the odds (and therefore the chance) of disease and a negative test result reduces the odds of disease presence.
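Here we will use, as an example, a prevalence of 0.43 proximal dentin caries and a positive likelihood ratio of 24 (Table 1). After a positive test result, the post-test probability of dentin caries can be calculated as follows:
pre-test odds = 0.43/(1–0.43) = 0.754; post-test odds = 24 × 0.754 = 18.1; post-test probability = 18.1/(1+18.1) = 0.948.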
This means that prior to the bitewing radiograph used as a test for dentin caries the chance of pathology was 0.43, but after a positive test the probability had increased to 0.948. This test has therefore substantially increased the certainty with which the diagnosis of pathology can be made. By definition the post-test probability in this example is the same as the positive predictive value. However, if the prevalence had been 0.02, the post-test probability in our example would be:
pre-test odds = 0.02/(1–0.02) = 0.0204; post-test odds = 24 × 0.0204 = 0.490; post-test probability = 0.490/(1+0.490) = 0.329.
With the same accuracy of diagnostic test, a lower prior probability of disease means that the post-test probability is much lower, and we are now much less sure that pathology is present after a positive test result. The probability of pathology after a positive test given the measures of accuracy in Table 1 can also be arrived at with the help of calculators on the internet (for example, the EBP calculator on http://sumsearch.uthscsa.edu/ or http://araw.mede.uic.edu/cgi-alansz/testcalc.pl/).
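The same recalculation can be written as a few lines of code. This is a minimal sketch of the odds form of Bayes' theorem described above; the function names are illustrative, and the inputs are the prevalences (0.43 and 0.02) and the positive likelihood ratio (24) used in the text.

```python
def probability_to_odds(probability: float) -> float:
    """odds = probability / (1 - probability)"""
    return probability / (1 - probability)


def odds_to_probability(odds: float) -> float:
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply post-test odds = likelihood ratio x pre-test odds."""
    pre_test_odds = probability_to_odds(pre_test_probability)
    post_test_odds = likelihood_ratio * pre_test_odds
    return odds_to_probability(post_test_odds)


# Positive bitewing result at the two prevalences discussed in the text.
high_prevalence = post_test_probability(0.43, 24)
low_prevalence = post_test_probability(0.02, 24)
print(round(high_prevalence, 3), round(low_prevalence, 3))
```

For a negative test result the negative likelihood ratio would be passed in place of the positive one; the rest of the calculation is unchanged.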
The threshold approach to diagnostic testing
Bayes' theorem allows the post-test probability of disease to be calculated once the result of the test is known. If the pre-test probability is sufficiently low, the post-test probability may still not be sufficiently high to justify instigating treatment despite a positive test result (see the second calculation in the previous paragraph). Similarly, the pre-test probability can be so high that even after a negative test result it would be irresponsible not to instigate therapy. In short, at very high and low prevalences, initiating treatment on the basis of the test result may lead to deterioration in the health of the patient. This means it is in the interests of the patient not to undergo a test.
Consider the test described in Table 1, with a sensitivity of 40% and a specificity of 98%. Furthermore, assume that a realistic prevalence of approximal dentin caries in the patient is 2%. Then, treatment based on the outcome of the test will result in 3.2% of decisions being incorrect (60% of 2% results in 1.2% FN errors and 2% of 98% ≈ 2% FP errors), whereas refraining from testing and treatment would only render 2% of decisions incorrect (all FN). In fact, up to a prevalence of 5%, testing leads to more incorrect treatment decisions than refraining from testing and therefore treatment. Conversely for prevalences above 62%, due to the far from perfect 40% sensitivity, deciding on treatment based on the results of testing would lead to more incorrect decisions than instigating treatment without testing. Only in the intermediate range of prevalence, from 5–62%, is a better chance of a correct decision obtained by testing and initiating treatment after a positive test result.
Unless a diagnostic test has perfect sensitivity or perfect specificity, watchful waiting is the best option if the prevalence is sufficiently low and directly proceeding to treatment is the best strategy if the prevalence is sufficiently high. The prevalence above which it is better to test instead of wait (5% in the example above) is called the "test threshold". The prevalence above which it is better to treat without testing first (62% in the example above) is called the "test–treatment threshold".2, 16 Even if the exact values of the test and treatment thresholds are unknown, it is still important to be conscious of their existence. Testing is often considered a safe option, without the realization that an imperfect test will result in loss of health for the patient due to the FP and FN results of diagnosis and treatment. For example, one article reported diagnostic accuracy using radiographs for periapical lesions as sensitivity 70% and specificity 77%.21 Using this test at a prevalence of lesions up to 25% would result in more lesions being incorrectly diagnosed than correctly diagnosed and treated.
The pre-test probability of disease will be modified by the patient work-up, including referral of patients and use of patient selection factors recommended in international guidelines for prescribing radiographs.6 For example, the selection factor "presence of anterior caries or restorations" could put children in such a high caries-risk group that the test threshold for screening bitewing radiographs might be exceeded. Similarly, it may mean that a clinical work-up including periodontal probing will push the chance of moderate periodontal disease over the test–treatment threshold so that additional radiographs – irrespective of the findings – will not provide information which would change the anticipated management of the patient. The answer to the question, "Will my patient benefit from bitewing radiographs?" will therefore not only depend on the prevalence of disease, the use of selection factors, the accuracy of the dentist in diagnosing radiographs and the radiographic (film or digital) diagnostic technique used, but also on the patient's valuation of the desirable and undesirable outcomes of diagnosis and therapy.
Measuring patient values
When using an imperfect test, errors occur in diagnosis, which means dentists will be missing lesions when they are present (FN errors) and/or "finding" lesions when they are not (FP errors). The threshold approach described above, in which test and test–treatment thresholds were defined, gave equal weight to both types of error. However, the health states resulting from these two types of error may be valued differently by the patient. Figure 2 illustrates one method by which the numerical values for possible health outcome states (utilities) can be elicited.26 Respondents are asked to indicate the value of a possible health outcome of diagnosis and therapy between best and worst possible outcomes on a visual analogue scale (valued at 100 and 0 respectively). The best outcome is given by a true negative (TN) decision. Realistically from the patient's point of view after the treatment decision the value of the true positive (TP) and FP treatment outcomes might be considered equivalent. However, fourth-year dental students consider the value of the outcome of a FP decision significantly lower than that of a TP decision (utility of FP 36 and TP 78). In the limited number of studies in dentistry on measuring utilities, dentists vary in their values26 and generally seem to value treatment outcomes more highly than patients do.27 This is important because the value of diagnostic testing depends in part on the value attached to the health states of the possible outcomes involved.
We demonstrate in Figure 3 how a typical diagnostic therapeutic problem can be analysed using a decision tree. Three possible strategies are compared: use of a bitewing radiograph to diagnose if dentin caries is present in an approximal tooth surface which therefore requires restorative treatment, and the other hypothetical options of "watchful waiting" (without testing or treating) and treating all cases (without testing). Each pathway in the decision tree has its own probability. For example, the probability of a TP decision after a radiograph is the product of the prevalence and the sensitivity of the test. When, without testing, treatment is the strategy, the probability of a TP decision is equal to the prevalence of disease.
Figure 4 shows a so-called sensitivity analysis, where we have recalculated the expected utility of the three strategies for a prior probability varying between 0% and 100%. This sensitivity analysis takes into account the utilities of the outcomes so that the two thresholds (Figure 4), the test threshold and the test–treatment threshold, can be seen. Watchful waiting is the optimal strategy up to a prevalence of 4% (test threshold); treatment following positive diagnosis of dentin caries on a bitewing radiograph is the best strategy between 4% and 57% (the test–treatment threshold) and instigating treatment for all cases without testing is optimal above 57%.
| Implications for dental practice and student education |
|---|
|
|
|---|
Scientific publications concerning diagnostic research have a number of additional complications when compared with those on therapeutic research. Randomized controlled clinical trials of diagnosis may be difficult to carry out because omitting the use of a valid test or treatment option in the control arm could be seen as unethical. The relationship between the sensitivity and specificity of a test means that summarising the results of tests in meta-analysis has led to problems, although these may be resolved by improved data summarising techniques20 and the adoption of new guidelines for reporting diagnostic studies.
The benefits of diagnosis are dependent on the trade-off between FP and FN decisions. The optimal balance between these two types of error depends not only on the accuracy of the diagnostic test used and the prevalence of disease but also on how serious the patient considers these errors in outcome to be. When the chance of pathology falls below a certain threshold, the use of a diagnostic test will, because of unnecessary treatment, lead to deterioration in the health of the patient. The use of validated guidelines in dental radiography should, however, lead to a selection of patients who will actually benefit from a diagnostic examination.
In conclusion, dentists need to know the factors that play a role in determining the chance of a correct diagnosis and subsequent treatment decision making (Figure 1). More than ever, they need to gain insight into their own diagnostic accuracy for common pathological conditions in order – where necessary – to be able to improve in diagnosis and to be able to interpret the relevance of the literature on diagnostic testing for their dental practice. In dental education there are already various computer programmes available to help the coming generation of dentists in this task.9 The further development of such programmes and their availability on the internet for dentists is likely. Finally, more research into the values that patients place on the results of interventions is essential for improving the evidence base of diagnosis in dentistry. These developments in evidence-based diagnosis, aimed at improving patient health, should not pass by the student of dentistry or the dental practitioner unnoticed.
| Acknowledgments |
|---|
PA Mileman, Hout WB van der. Evidence-based diagnostiek en klinische besluitvorming. Ned Tijdschr Tandheelkd 2007; 114: 187–194.
| References |
|---|
|
|
|---|