INTRODUCTION
One of the main limitations of stress echocardiography is its variability. Hoffmann's first study, although carried out with fundamental imaging and without uniform reading criteria, found only low agreement in the interpretation of dobutamine stress echocardiography.1 This improved in a subsequent study by the same author when using harmonic imaging and uniform reading criteria.2
However, and surprisingly, although exercise echocardiography (EE) is the oldest,3 most sensitive and safest4,5 method of administering stress, as well as being the most widely used,6 no study has been done to investigate intercenter agreement using this technique. Thus, the purpose of this study was to evaluate: a) intercenter agreement on EE, and b) the sensitivity, specificity, and diagnostic accuracy of the technique under blinded conditions.
PATIENTS AND METHODS
Six centers participated in the study, each having broad experience with stress echocardiography and, in particular, with EE (having carried out between 1000 and 7000 EE). Each of the 6 centers sent 25 study results. Of these, 15 were positive or negative EE studies on consecutive patients undergoing coronary angiography within 3 months of EE; and the other 10 studies were on non-diabetic patients, also consecutive, asymptomatic or with non-coronary chest pain and with a <10% pretest probability of coronary artery disease (CAD) according to sex, age, and risk factors.7 Thus, each center evaluated 150 cases: 125 under blinded conditions (data from other centers) and 25 from their own center with knowledge of the clinical data.
State-of-the-art equipment was used with second harmonic imaging and stress digitalization packs (Sonos-5500, Philips, used by 4 centers and Vivid-5, GE, used by 2 centers). Each study was sent to the coordinating center on optical disk, which then re-distributed them to the other centers either in the same format or on video tape, depending on each center's capabilities. Apical 4- and 2-chamber and parasternal long-axis and short-axis views were compared, at rest and under stress in quad-screen format.
Reading Criteria
Uniform reading criteria8 were used. An EE was considered positive when at least 1 segment was abnormal at rest or under stress, or when there was tardokinesia in the absence of conduction abnormalities. An EE was considered negative when no segment was abnormal at rest or under stress, or when there was isolated hypokinesia of the posterobasal and/or septobasal segment, unless accompanied by dyssynergy in one adjacent segment.
Each center categorized every positive result as necrosis (regional alteration in wall motion that persisted or improved with stress), ischemia (alteration in wall motion with stress), ischemia plus necrosis in the same territory (alteration in baseline wall motion that worsened in the same territory with stress), or ischemia at a distance (alteration in wall motion in 1 or more territories at baseline, with the appearance of new alterations in wall motion in a different territory with stress). Wall motion score index at rest and under stress was calculated in each reading by dividing the left ventricle into 16 segments.9 The territories affected in each study were determined according to whether they were dependent on the left anterior descending coronary artery (LAD), circumflex artery (Cx), right coronary artery (RC), or a combination of them.
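As an illustration of the wall motion score index, the following minimal sketch averages per-segment scores over the 16-segment model. The scoring scale (1 = normal, 2 = hypokinetic, 3 = akinetic, 4 = dyskinetic) is the conventional one and is an assumption here, since the text does not state it; the function name and the example scores are hypothetical.

```python
from typing import Optional, Sequence


def wall_motion_score_index(scores: Sequence[Optional[int]]) -> float:
    """Mean wall motion score over the segments that could be scored.

    Assumed scale (not stated in the text): 1 = normal, 2 = hypokinetic,
    3 = akinetic, 4 = dyskinetic. None marks a segment that was not visualized.
    """
    if len(scores) != 16:
        raise ValueError("expected one score per segment of the 16-segment model")
    scored = [s for s in scores if s is not None]
    if not scored:
        raise ValueError("no interpretable segments")
    return sum(scored) / len(scored)


# Hypothetical reading: normal at rest, four segments hypokinetic with stress.
rest = [1] * 16
stress = [1] * 12 + [2] * 4
print(wall_motion_score_index(rest))    # 1.0
print(wall_motion_score_index(stress))  # 1.25
```

An index of 1.0 corresponds to normal motion in every scored segment; higher values indicate more extensive or more severe dyssynergy.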
In addition, each center objectively and subjectively assessed the quality of each study. A segment quality score was used for the objective assessment where a score of 3 was assigned to each segment with good visibility (thickness and displacement), 2 to those with fair visibility, 1 to those with poor visibility, and 0 to the non-visible. For the subjective assessment, each study was qualified as good, fair, poor, or non-interpretable.
Statistical Analysis
The SPSS 12.0 statistical package was used. Continuous variables are presented as mean±SD. Discrete variables are presented as percentages. Comparisons between patients with and without CAD were done via χ² test for discrete variables and Student t test for continuous variables. Agreement between 2 centers was estimated by the percentage agreement (negative or positive EE) found after analyzing studies from other centers without including the cases of the 2 centers themselves (150 − 50 = 100 cases). Agreement was expressed as the percentage agreement and the kappa coefficient (κ) (proportion of agreement higher than that due to chance). A κ coefficient between 0 and 0.20 was considered very low; between 0.21 and 0.40, low; between 0.41 and 0.60, moderate; between 0.61 and 0.80, good; and between 0.81 and 1.0, excellent.10 The sensitivity, specificity, and diagnostic accuracy for each center were calculated by the centers assessing their own cases, as well as by blinded assessment of the other centers' cases. Sensitivity was defined as the percentage of cases with positive EE among patients with significant coronary stenosis in at least 1 vessel. Specificity was defined as the percentage of cases with negative EE among patients without angiographically demonstrated coronary lesions or with a low pretest probability. Diagnostic accuracy was defined as the percentage of successes (cases with positive EE and CAD, plus cases with negative EE and absence of CAD) from total patients.
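The definitions above can be sketched in a few lines of Python. This is not the SPSS procedure used in the study; it is a minimal illustration that assumes Cohen's κ for two centers with binary (positive or negative) readings. The function names and the example readings are hypothetical, and treating κ values below 0 as "very low" is an assumption, since the text only defines bands from 0 upward.

```python
from typing import Sequence, Tuple


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> Tuple[float, float]:
    """Return (observed agreement as a fraction, kappa) for two binary readings."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pos_a, pos_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance, from each center's rate of positive readings
    p_exp = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


def kappa_band(kappa: float) -> str:
    """Map kappa to the bands used in the text (below 0 is treated as very low)."""
    for upper, label in ((0.20, "very low"), (0.40, "low"),
                         (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "excellent"


def diagnostic_metrics(positive_ee: Sequence[bool],
                       has_cad: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, diagnostic accuracy) as fractions."""
    pairs = list(zip(positive_ee, has_cad))
    tp = sum(1 for p, d in pairs if p and d)
    tn = sum(1 for p, d in pairs if not p and not d)
    fp = sum(1 for p, d in pairs if p and not d)
    fn = sum(1 for p, d in pairs if not p and d)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without CAD")
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


# Hypothetical readings of the same 10 cases by two centers (True = positive EE)
center_a = [True, True, True, False, False, False, False, True, False, False]
center_b = [True, True, False, False, False, False, True, True, False, False]
agreement, kappa = cohen_kappa(center_a, center_b)
print(f"agreement={agreement:.0%}, kappa={kappa:.2f} ({kappa_band(kappa)})")

# Hypothetical reference standard for the same cases (True = significant CAD)
cad = [True, True, True, True, False, False, False, False, False, False]
sens, spec, acc = diagnostic_metrics(center_a, cad)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}, accuracy={acc:.0%}")
```

In the example, the two centers agree on 8 of 10 cases, yet κ is noticeably lower than the raw percentage because part of that agreement is expected by chance alone.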
RESULTS
One hundred and forty-nine studies were available for analysis (1 study was excluded due to poor images). Contrast agents were used for left ventricular opacification in 9 studies (6%) and the stress study was done with peak stress imaging in 124 cases (83%).
Baseline Clinical Characteristics and Response to Stress
Significant CAD was found in 58 patients (39%) as defined by stenosis ≥50% in ≥1 coronary artery, main branch, or coronary artery bypass graft, whereas 91 patients (61%) had angiographically demonstrated non-significant CAD (n=37), or low pretest probability according to the previous definition (n=54). There was 1-vessel disease in 24 patients with CAD, 2-vessel disease in 18, and 3-vessel disease in 16. The LAD was stenosed in 40 patients, the RC in 39 and the Cx in 29. Table 1 shows baseline clinical characteristics, medication, and baseline electrocardiogram (ECG) data in patients with and without CAD. Table 2 shows data on response to stress in patients with and without CAD.
Image Quality
The subjective assessment of the quality of the studies differed significantly between the different centers. Some centers described a high percentage of studies as good (≥80% of studies), whereas others only considered less than half the cases as good and between 0 and 8% as non-interpretable (Figure 1). The same differences were found when the different centers calculated the quality of the segment wall motion score (Figure 2). In general, the centers that qualified the others as worse tended to have better quality images according to the other centers.
Figure 1. Percentage of studies qualified as good, fair, poor, and non-interpretable according to the different centers.
Figure 2. Scoring of quality of studies from other centers according to the referring center (light columns) and scoring of quality of studies from each center according to the other centers (dark columns).
Agreement
Four or more of the 5 centers that assessed each case under blinded conditions agreed on a positive diagnosis of CAD in 51 patients and on a negative diagnosis in 65 patients, which means that there was agreement on a total of 116 of the 149 patients (78%). There was agreement regarding a positive or negative diagnosis of CAD in 4.1±0.9 centers out of the 5 centers. There was a mean κ coefficient of 0.48 between the different centers, with mean intercenter κ coefficients ranging from 0.45 to 0.52. The percentage agreement and the κ coefficients in different scenarios are shown in Table 3. The percentage agreement and the κ coefficient differed according to the diagnosis of regional contractility anomalies by the referring center, and the percentage agreement was greater when the referring center had detected baseline anomalies in regional contractility in a given territory, contractility anomalies at rest and/or with stress in the LAD territory, or when a worse wall motion score index with stress was reported (Table 4).
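The consensus rule used here (at least 4 of the 5 blinded centers giving the same reading) can be sketched as follows. The function name and its `threshold` parameter are hypothetical; the counts in the arithmetic check are those reported above.

```python
from typing import Optional, Sequence


def consensus(readings: Sequence[bool], threshold: int = 4) -> Optional[bool]:
    """Return True or False when at least `threshold` centers agree on a
    positive or negative reading, and None when no such consensus exists."""
    positives = sum(readings)
    negatives = len(readings) - positives
    if positives >= threshold:
        return True
    if negatives >= threshold:
        return False
    return None


# Hypothetical blinded readings of one case by 5 centers (True = positive EE)
print(consensus([True, True, True, True, False]))   # True
print(consensus([True, True, True, False, False]))  # None

# Counts reported in the text
agreed_positive, agreed_negative, total = 51, 65, 149
agreed = agreed_positive + agreed_negative
print(f"{agreed}/{total} = {agreed / total:.0%}")    # 116/149 = 78%
```

The threshold must exceed half the number of readers so that a positive and a negative consensus cannot both be satisfied for the same case.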
Sensitivity, Specificity, and Diagnostic Accuracy
The percentage of positive and negative readings, as well as the sensitivity, specificity, and diagnostic accuracy differed between the different centers when assessed under blinded conditions (Figure 3). There were 2 centers with high sensitivity but low specificity and 1 where the opposite occurred.
Figure 3. Sensitivity, specificity and diagnostic accuracy of each center that assessed, under blinded conditions, the cases referred by the other centers.
The mean sensitivity, specificity, and diagnostic accuracy of the 6 centers regarding stenosis ≥50% in at least 1 vessel (according to visual estimation) were 68%, 66%, and 67%, respectively. The mean sensitivity and specificity of the different centers were similar in tests which were higher or lower than submaximal (68% vs 64% and 66% vs 65%, respectively). These data contrast with the mean sensitivity, specificity and diagnostic accuracy of each center when they assessed their own cases (Figure 4). If we consider that the positive cases of coronary artery disease were those with stenosis ≥50% in at least 1 vessel or with a history of acute myocardial infarction (AMI) and baseline dyssynergy according to the diagnosis of the referring center, then the sensitivity, specificity, and diagnostic accuracy in the blinded reading were similar to those obtained with stenosis ≥50% as the only criterion: 69% (intercenter range, 53%-82%), 70% (range, 49%-89%) and 69% (range, 64%-78%). The sensitivity, specificity, and diagnostic accuracy according to the decisions of the majority (4 or more centers when 5 centers were assessing data; 3 or more when 4 centers were assessing data) were similar regarding ≥50% stenosis in one vessel and for a criterion of ≥50% stenosis or a history of AMI and baseline dyssynergy depending on the referring center (72% vs 73%; 74% vs 80%; and 73% vs 77%, respectively; P=NS).
Figure 4. Sensitivity, specificity and diagnostic accuracy of exercise echocardiography in detecting coronary lesions with ≥50% stenosis assessed under blinded conditions or with knowledge of clinical data and response to stress. The means of the centers and intercenter ranges are shown.
False Positive Readings
Out of 403 readings corresponding to 83 patients without coronary stenosis, previous AMI, or baseline dyssynergy according to the referring center, there were 124 false (+) readings corresponding to 31% of the assessments without CAD, with a wide intercenter range (11%-51%). These false (+) readings were mainly due to ascribing contractile alterations to the RC territory (36% of the readings) or to the LAD (35%), and less often to the Cx territory (10%) or to several territories (19%). The segment wall motion score index (WMSI) measured by the assessing centers in these cases was 1.1±0.2 at rest and 1.3±0.2 with stress.
False Negative Readings
There were 319 readings corresponding to 66 patients. Some of the patients with angiographically demonstrated coronary stenosis also had a history of AMI. All of those who did not undergo coronary angiography, or in whom it was negative, had a history of AMI and dyssynergy according to the referring center. There were 102 false (-) readings which corresponded to 32% of the assessments with CAD (intercenter range, 18%-47%). In most of these cases there was only 1-vessel disease (45%; LAD disease in 23 of them) or 2-vessel disease (34%), and on fewer occasions 3-vessel disease (9%) or disease in no vessels (13%). The referring center reported dyssynergy in 32 of these 66 patients (48%), which was severe (WMI, 1.50) in 12 of them (18%).
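The false positive and false negative percentages quoted in the two preceding subsections follow directly from the reading counts. The short check below recomputes them and the pooled sensitivity and specificity they imply; the variable names are hypothetical. Rates pooled over all readings need not equal the mean of the per-center values reported elsewhere, so small differences are expected.

```python
# Counts reported in the text (blinded readings, extended CAD criterion)
false_pos, readings_without_cad = 124, 403   # 83 patients without CAD
false_neg, readings_with_cad = 102, 319      # 66 patients with CAD

fp_rate = false_pos / readings_without_cad
fn_rate = false_neg / readings_with_cad

# Pooled over all readings, not averaged per center
pooled_specificity = 1 - fp_rate
pooled_sensitivity = 1 - fn_rate

print(f"false positive readings: {fp_rate:.0%}")       # 31%
print(f"false negative readings: {fn_rate:.0%}")       # 32%
print(f"pooled specificity: {pooled_specificity:.0%}")  # 69%
print(f"pooled sensitivity: {pooled_sensitivity:.0%}")  # 68%
```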
DISCUSSION
The main interest of this study lies in it being the first in which intercenter agreement on exercise echocardiography has been assessed. The main findings were as follows: a) the intercenter agreement on exercise echocardiography was moderate, and b) the sensitivity, specificity and diagnostic accuracy of the technique when carried out under blinded conditions were lower than those commonly reported when baseline characteristics and patient response to stress are known.
Intercenter Agreement on Exercise Echocardiography
Although Hoffmann et al studied intercenter agreement on dobutamine stress echocardiography,1,2 there are no similar studies on exercise echocardiography, even though it is more widely used, more sensitive, and safer.4,6 Low agreement was observed (κ=0.37) in Hoffmann's first study,1 which was carried out with fundamental imaging and without uniform reading criteria, whereas in the second study, carried out with harmonic imaging and uniform reading criteria, agreement was moderate (κ=0.55).2 The improvement in agreement seemed to be due both to using harmonic imaging and to the standardization of the reading criteria, since the degree of agreement on the same patients studied with fundamental imaging was greater than in Hoffmann's first study. We used the same reading criteria as in Hoffmann's second study,2 which, in general, did not involve any change in the normal clinical practice followed in each center. It could be expected that the degree of agreement on exercise echocardiography would be lower than that on dobutamine echocardiography, since there should be better quality images with the latter technique. However, by means of uniform reading criteria and harmonic imaging the agreement was moderate, with a mean κ coefficient of 0.48, which is better than that of Hoffmann's first study and similar to that of the same author's second study. The percentage agreement was greater in 3-vessel disease, in left anterior descending coronary artery disease, when there were baseline alterations in regional contractility, and when the referring center reported dyssynergy in the LAD territory or serious dyssynergy. The higher percentage agreement in these circumstances has clinical diagnostic and prognostic relevance, since the patients with these characteristics have a worse prognosis.11-13
Intercenter Agreement on Other Diagnostic Techniques
Concern over the degree of agreement is not exclusive to stress echocardiography. Different degrees of variability in interpretation have been observed with other techniques. Thus, very low levels of agreement have been reported regarding the interpretation of ST-segment elevation (κ=0.05) or ST-segment depression (κ=0.38) between 2 centers in patients with acute coronary syndrome.14 Studies on myocardial perfusion with nuclear medicine procedures also present difficulties in interpretation, since these techniques are subjective and, as in exercise echocardiography, the experience of the observer and image quality can influence the interpretation. A moderate-high agreement has been reported with thallium imaging, with κ coefficients ranging between 0.56 and 0.74 in 2 studies.15,16 However, in a multicenter study with 25 participating hospitals, agreement between different centers without uniform reading criteria was low (κ=0.27).17 In a study by Candell-Riera et al,18 good agreement was found with exercise technetium-99m tetrofosmin myocardial perfusion single-photon emission computed tomography, with κ coefficients between 0.62 and 0.70 depending on whether topographical images or polar mapping were evaluated.18 This study also found that the sensitivity of the report under blinded conditions was significantly lower than that reported when the clinical data of the patient were known.
Sensitivity, Specificity, and Diagnostic Accuracy
Although we present mean sensitivity and specificity scores for the different centers, the variability among them regarding interpretation under blinded conditions is of more interest. However, the sensitivity, specificity, and diagnostic accuracy of the technique carried out under blinded conditions were lower for each center than when they assessed their own cases and the baseline characteristics of the patients and their response to stress were known. This finding is not surprising, but it gives us an idea of the limitations of the technique when the pretest probability, hemodynamic, clinical, and ECG response to stress are not known. It is clear that EE can and should be used in clinical practice, but not under blinded conditions.
Limitations
The reading format of the studies was the same for all the centers (apical 4- and 2-chamber and parasternal long-axis and short-axis views at baseline and with stress), although the quality differed since, depending on whether the centers could read optical disks, the study was recorded on video or was sent via optical disk. However, the percentage agreement and the κ coefficients were similar for studies of optimal and suboptimal quality. The complete study recorded on video was not sent out as was done in the study by Hoffmann et al.1 This fact could lead to overestimating agreement, since the operator tends to acquire and store the images he/she considers more representative taking into account other test characteristics different from those of the image. Twenty-five percent of the patients were receiving treatment with beta-blockers and up to 40% of the tests were lower than submaximal. This fact could have led to underestimating sensitivity, although we have not found higher sensitivity in the tests that were maximal in comparison to those that were submaximal.
Study financed by the RECAVA Cardiovascular Network.
Correspondence: Dr. J.C. Peteiro.
P.º Ronda, 5, 4.º izqda. 15011 A Coruña. España.
E-mail: pete@canalejo.org
Received June 9, 2005.
Accepted for publication October 18, 2005.