ISSN: 1885-5857 Impact factor 2023 7.2
Vol. 59. Num. 1.
Pages 33-40 (January 2006)

Agreement Between Centers on the Interpretation of Exercise Echocardiography

Concordancia intercentros en la interpretación de la ecocardiografía de ejercicio

Jesús Peteiro (a), Ángel M Alonso (b), Rafael Florenciano (c), Carlos González Juanatey (d), Gonzalo de la Morena (c), Ignacio Iglesias (e), Mar Moreno (f), Miguel A Rodríguez (e)

Introduction and objectives. A low-to-moderate level of agreement on the interpretation of dobutamine echocardiography has been reported, but there are no similar findings on exercise echocardiography. The objectives of this study were to assess the level of agreement between centers on the use of exercise echocardiography and to evaluate the accuracy of the technique when used in a blinded manner.

Patients and method. Six institutions with experience in exercise echocardiography each sent 25 study results to the other centers. Of these, 15 were positive or negative studies on consecutive patients undergoing coronary angiography, and 10 were on non-diabetic patients who had non-coronary chest pain or were asymptomatic and whose pretest probability of coronary artery disease was <10%. Each institution evaluated 150 studies: 125 blinded and 25 of their own with knowledge of clinical data.

Results. For 116 patients (78%), four or more of the five centers blindly evaluating each study agreed with the positive or negative result. The average kappa coefficient was 0.48 (intercenter range 0.45-0.52). The percentage agreement was higher with three-vessel disease (93%, range 85%-95%), with left anterior descending coronary artery disease (83%, range 80%-86%), and when the referring institution reported baseline dyssynergy (86%, range 82%-90%), dyssynergy in left anterior descending coronary artery territory (81%, range 76%-84%), or a peak wall motion score index >1.50 (88%, range 85%-90%). When the technique was used blinded to detect ≥50% coronary narrowing in ≥1 vessel, its sensitivity, specificity and accuracy were 68%, 66% and 67%, respectively, with wide variability between centers.

Conclusions. There was moderate agreement between centers on the interpretation of exercise echocardiography. When used blinded, the technique's accuracy was lower than that reported when clinical data are known.

Keywords

Exercise echocardiography
Intercenter agreement
Accuracy

INTRODUCTION

One of the main limitations of stress echocardiography is its variability. Hoffmann's first study, although carried out with fundamental imaging and without uniform reading criteria, found only low agreement in the interpretation of dobutamine stress echocardiography.1 This improved in a subsequent study by the same author when using harmonic imaging and uniform reading criteria.2

However, and surprisingly, although exercise echocardiography (EE) is the oldest,3 most sensitive and safest4,5 method of administering stress, as well as being the most widely used,6 no study has been done to investigate intercenter agreement using this technique. Thus, the purpose of this study was to evaluate: a) intercenter agreement on EE, and b) the sensitivity, specificity, and diagnostic accuracy of the technique under blinded conditions.

PATIENTS AND METHODS

Six centers participated in the study, each having broad experience with stress echocardiography and, in particular, with EE (having carried out between 1000 and 7000 EE). Each of the 6 centers sent 25 study results. Of these, 15 were positive or negative EE studies on consecutive patients undergoing coronary angiography within 3 months of EE; and the other 10 studies were on non-diabetic patients, also consecutive, asymptomatic or with non-coronary chest pain and with a <10% pretest probability of coronary artery disease (CAD) according to sex, age, and risk factors.7 Thus, each center evaluated 150 cases: 125 under blinded conditions (data from other centers) and 25 from their own center with knowledge of the clinical data.

State-of-the-art equipment was used with second harmonic imaging and stress digitalization packs (Sonos-5500, Philips, used by 4 centers and Vivid-5, GE, used by 2 centers). Each study was sent to the coordinating center on optical disk, which then re-distributed them to the other centers either in the same format or on video tape, depending on each center's capabilities. Apical 4- and 2-chamber and parasternal long-axis and short-axis views were compared, at rest and under stress in quad-screen format.

Reading Criteria

Uniform reading criteria8 were used. An EE was considered positive when at least 1 segment was abnormal at rest or under stress, or when there was tardokinesis in the absence of conduction abnormalities. An EE was considered negative when no segment was abnormal at rest or under stress, or when hypokinesia was confined to the posterobasal and/or septobasal segment and was not accompanied by dyssynergy in an adjacent segment.

Each center categorized every positive result as necrosis (regional alteration in wall motion that persisted or improved with stress), ischemia (alteration in wall motion with stress), ischemia plus necrosis in the same territory (alteration in baseline wall motion that worsened in the same territory with stress), or ischemia at a distance (alteration in wall motion in 1 or more territories at baseline, with the appearance of new alterations in wall motion in a different territory with stress). Wall motion score index at rest and under stress was calculated in each reading by dividing the left ventricle into 16 segments.9 The territories affected in each study were determined according to whether they were dependent on the left anterior descending coronary artery (LAD), circumflex artery (Cx), right coronary artery (RC), or a combination of them.
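The wall motion score index described above can be sketched as follows. This is a minimal illustration, not the authors' software: the scoring convention (1 = normal, 2 = hypokinetic, 3 = akinetic, 4 = dyskinetic) and the exclusion of non-visualized segments from the average are assumptions based on common practice, and the function name and example values are hypothetical.

```python
# Assumed scoring convention (not stated in the text):
# 1 = normal, 2 = hypokinetic, 3 = akinetic, 4 = dyskinetic.
# None marks a segment that could not be visualized (assumed to be excluded).

def wall_motion_score_index(segment_scores):
    """Mean wall motion score over the scored segments of a 16-segment model."""
    if len(segment_scores) != 16:
        raise ValueError("expected scores for 16 segments")
    scored = [s for s in segment_scores if s is not None]
    if not scored:
        raise ValueError("no interpretable segments")
    return sum(scored) / len(scored)


# Hypothetical reading: normal at rest, four segments hypokinetic with stress.
rest = [1] * 16
stress = [1] * 12 + [2] * 4

print(wall_motion_score_index(rest))    # 1.0
print(wall_motion_score_index(stress))  # 1.25
```

An index of 1.0 corresponds to normal motion in every scored segment; higher values indicate more extensive or more severe dyssynergy.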

In addition, each center objectively and subjectively assessed the quality of each study. A segment quality score was used for the objective assessment where a score of 3 was assigned to each segment with good visibility (thickness and displacement), 2 to those with fair visibility, 1 to those with poor visibility, and 0 to the non-visible. For the subjective assessment, each study was qualified as good, fair, poor, or non-interpretable.

Statistical Analysis

The SPSS 12.0 statistical package was used. Continuous variables are presented as mean±SD. Discrete variables are presented as percentages. Comparisons between patients with and without CAD were done via χ² test for discrete variables and Student t test for continuous variables. Agreement between 2 centers was estimated by the percentage agreement (negative or positive EE) found after analyzing studies from other centers without including the cases of the 2 centers themselves (150-50 cases=100 cases). Kappa coefficients (κ), which measure the proportion of agreement beyond that expected by chance, were also calculated and interpreted as follows: a κ coefficient between 0 and 0.20 was considered very low; between 0.21 and 0.40, low; between 0.41 and 0.60, moderate; between 0.61 and 0.80, good; and between 0.81 and 1.0, excellent.10 The sensitivity, specificity, and diagnostic accuracy for each center were calculated by the centers assessing their own cases, as well as by blinded assessment of the other centers' cases. Sensitivity was defined as the percentage of cases with positive EE among patients with significant coronary stenosis in at least 1 vessel. Specificity was defined as the percentage of cases with negative EE among patients without angiographically demonstrated coronary lesions or with a low pretest probability. Diagnostic accuracy was defined as the percentage of successes (cases with positive EE and CAD, plus cases with negative EE and absence of CAD) from total patients.
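The κ statistic for a pair of centers can be illustrated with a short sketch. The study used SPSS; the code below is only an assumed equivalent of Cohen's κ for two readers and a binary result, with the interpretation bands taken from the text. The function names and the 10 example readings are hypothetical, not study data.

```python
import math


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers giving binary calls (True = positive EE)."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must rate the same, non-empty set of cases")
    n = len(reader_a)
    # Observed proportion of cases on which both readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance from each reader's rate of positive calls.
    p_a = sum(reader_a) / n
    p_b = sum(reader_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return math.nan  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Map kappa to the bands used in the text (values <= 0.20: very low)."""
    if kappa <= 0.20:
        return "very low"
    if kappa <= 0.40:
        return "low"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical readings of 10 studies by two centers.
center_1 = [True, True, True, True, False, False, False, False, False, False]
center_2 = [True, True, True, False, True, False, False, False, False, False]

kappa = cohen_kappa(center_1, center_2)
print(round(kappa, 2), agreement_band(kappa))  # 0.58 moderate
```

In this example the two centers agree on 8 of 10 studies (80%), but because 52% agreement would be expected by chance alone, κ is only 0.58.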

RESULTS

One hundred and forty-nine studies were available for analysis (1 study was excluded due to poor images). Contrast agents were used for left ventricular opacification in 9 studies (6%) and the stress study was done with peak stress imaging in 124 cases (83%).

Baseline Clinical Characteristics and Response to Stress

Significant CAD was found in 58 patients (39%) as defined by stenosis ≥50% in ≥1 coronary artery, main branch, or coronary artery bypass graft, whereas 91 patients (61%) had angiographically demonstrated non-significant CAD (n=37), or low pretest probability according to the previous definition (n=54). There was 1-vessel disease in 24 patients with CAD, 2-vessel disease in 18, and 3-vessel disease in 16. The LAD was stenosed in 40 patients, the RC in 39 and the Cx in 29. Table 1 shows baseline clinical characteristics, medication, and baseline electrocardiogram (ECG) data in patients with and without CAD. Table 2 shows data on response to stress in patients with and without CAD.

Image Quality

The subjective assessment of study quality differed significantly between centers. Some centers rated a high percentage of studies as good (≥80% of studies), whereas others rated fewer than half as good; between 0% and 8% of studies were rated non-interpretable (Figure 1). The same differences were found when the centers calculated the segment quality score (Figure 2). In general, the centers that rated the others' studies as worse tended to have better-quality images according to the other centers.

Figure 1. Percentage of studies qualified as good, fair, poor, and non-interpretable according to the different centers.

Figure 2. Scoring of quality of studies from other centers according to the referring center (light columns) and scoring of quality of studies from each center according to the other centers (dark columns).

Agreement

Four or more of the 5 centers that assessed each case under blinded conditions agreed on a positive diagnosis of CAD in 51 patients and on a negative diagnosis in 65 patients, which means that there was agreement on a total of 116 of the 149 patients (78%). On average, 4.1±0.9 of the 5 centers agreed on a positive or negative diagnosis of CAD. The mean κ coefficient between the different centers was 0.48, with mean intercenter κ coefficients ranging from 0.45 to 0.52. The percentage agreement and the κ coefficients in different scenarios are shown in Table 3. The percentage agreement and the κ coefficient differed according to the diagnosis of regional contractility anomalies by the referring center: the percentage agreement was greater when the referring center had detected baseline anomalies in regional contractility in a given territory or contractility anomalies at rest and/or with stress in the LAD territory, or when a worse wall motion score index with stress was reported (Table 4).
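The "4 or more of 5" rule and the 78% figure can be reproduced with a brief sketch. The function name, its threshold parameter, and the example readings are hypothetical; only the counts 51, 65, and 149 come from the text.

```python
def consensus(readings, threshold=4):
    """Return 'positive' or 'negative' when at least `threshold` blinded
    readers agree, and None when neither call reaches the threshold."""
    positives = sum(readings)
    negatives = len(readings) - positives
    if positives >= threshold:
        return "positive"
    if negatives >= threshold:
        return "negative"
    return None


# Hypothetical blinded readings of single studies by 5 centers (True = positive EE).
print(consensus([True, True, True, True, False]))     # positive
print(consensus([False, False, False, False, True]))  # negative
print(consensus([True, True, True, False, False]))    # None

# Counts reported in the text: 51 positive and 65 negative consensus results.
agreed = 51 + 65
print(agreed, round(100 * agreed / 149))  # 116 78
```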

Sensitivity, Specificity, and Diagnostic Accuracy

The percentage of positive and negative readings, as well as the sensitivity, specificity, and diagnostic accuracy differed between the different centers when assessed under blinded conditions (Figure 3). There were 2 centers with high sensitivity but low specificity and 1 where the opposite occurred.

Figure 3. Sensitivity, specificity and diagnostic accuracy of each center that assessed, under blinded conditions, the cases referred by the other centers.

The mean sensitivity, specificity, and diagnostic accuracy of the 6 centers regarding stenosis ≥50% in at least 1 vessel (according to visual estimation) were 68%, 66%, and 67%, respectively. The mean sensitivity and specificity of the different centers were similar in tests that were higher or lower than submaximal (68% vs 64% and 66% vs 65%, respectively). These data contrast with the mean sensitivity, specificity and diagnostic accuracy of each center when they assessed their own cases (Figure 4). If we consider that the positive cases of coronary artery disease were those with stenosis ≥50% in at least 1 vessel or with a history of acute myocardial infarction (AMI) and baseline dyssynergy according to the diagnosis of the referring center, then the sensitivity, specificity, and diagnostic accuracy in the blinded reading were similar to those obtained with stenosis ≥50% as the only criterion: 69% (intercenter range, 53%-82%), 70% (range, 49%-89%) and 69% (range, 64%-78%). The sensitivity, specificity, and diagnostic accuracy according to the decisions of the majority (4 or more centers when 5 centers were assessing data; 3 or more when 4 centers were assessing data) were similar for the criterion of ≥50% stenosis in one vessel and for the criterion of ≥50% stenosis or a history of AMI and baseline dyssynergy according to the referring center (72% vs 73%; 74% vs 80%; and 73% vs 77%, respectively; P=NS).

Figure 4. Sensitivity, specificity and diagnostic accuracy of exercise echocardiography in detecting coronary lesions with ≥50% stenosis assessed under blinded conditions or with knowledge of clinical data and response to stress. The means of the centers and intercenter ranges are shown.

False Positive Readings

Out of 403 readings corresponding to 83 patients without coronary stenosis, previous AMI, or baseline dyssynergy according to the referring center, there were 124 false (+) readings corresponding to 31% of the assessments without CAD, with a wide intercenter range (11%-51%). These false (+) readings were mainly due to ascribing contractile alterations to the RC territory (36% of the readings) or to the LAD (35%), and less often to the Cx territory (10%) or to several territories (19%). The segment wall motion score index (WMSI) measured by the assessing centers in these cases was 1.1±0.2 at rest and 1.3±0.2 with stress.

False Negative Readings

There were 319 readings corresponding to 66 patients with CAD under the broader definition. Some of those with angiographically demonstrated coronary stenosis also had a history of AMI; all of those who did not undergo coronary angiography, or in whom it was negative, had a history of AMI and dyssynergy according to the referring center. There were 102 false (-) readings, which corresponded to 32% of the assessments with CAD (intercenter range, 18%-47%). In most of these cases there was only 1-vessel disease (45%; LAD disease in 23 of them) or 2-vessel disease (34%), and on fewer occasions 3-vessel disease (9%) or disease in no vessels (13%). The referring center reported dyssynergy in 32 of these 66 patients (48%), which was severe (WMSI >1.50) in 12 of them (18%).
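As an illustrative check, the reading counts in the two preceding sections can be pooled to recover sensitivity, specificity, and diagnostic accuracy as defined in the Statistical Analysis section. This is a sketch under one assumption: that pooling all blinded readings is an acceptable approximation. The article reports means across centers (69%, 70%, and 69%), so small differences from the pooled values are expected. The variable names are hypothetical.

```python
def pct(numerator, denominator):
    """Percentage of numerator over denominator."""
    return 100.0 * numerator / denominator


# Counts taken from the text (blinded readings, broader definition of CAD).
readings_without_cad = 403
false_positives = 124
readings_with_cad = 319
false_negatives = 102

true_negatives = readings_without_cad - false_positives  # 279
true_positives = readings_with_cad - false_negatives     # 217

sensitivity = pct(true_positives, readings_with_cad)
specificity = pct(true_negatives, readings_without_cad)
accuracy = pct(true_positives + true_negatives,
               readings_with_cad + readings_without_cad)

print(round(pct(false_positives, readings_without_cad)))  # 31
print(round(pct(false_negatives, readings_with_cad)))     # 32
print(round(sensitivity), round(specificity), round(accuracy))  # 68 69 69
```

The false positive and false negative rates match the 31% and 32% given in the text, and the pooled estimates fall within 1 percentage point of the reported center means.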

DISCUSSION

The main interest of this study lies in it being the first in which intercenter agreement on exercise echocardiography has been assessed. The main findings were as follows: a) the intercenter agreement on exercise echocardiography was moderate, and b) the sensitivity, specificity and diagnostic accuracy of the technique when carried out under blinded conditions were lower than those commonly reported when baseline characteristics and patient response to stress are known.

Intercenter Agreement on Exercise Echocardiography

Although Hoffmann et al studied intercenter agreement on dobutamine stress echocardiography,1,2 there are no similar studies on exercise echocardiography despite it being more frequently used, more sensitive, and safer.4,6 Low agreement was observed (κ=0.37) in Hoffmann's first study,1 which was carried out with fundamental imaging and without uniform reading criteria, whereas in the second study, carried out with harmonic imaging and uniform reading criteria, agreement was moderate (κ=0.55).2 The improvement in agreement seemed to be due both to the use of harmonic imaging and to the standardization of the reading criteria, since the degree of agreement on the same patients studied with fundamental imaging was greater than in Hoffmann's first study. We used the same reading criteria as in Hoffmann's second study,2 which, in general, did not involve any change in the normal clinical practice followed in each center. It could be expected that the degree of agreement on exercise echocardiography would be less than that on dobutamine echocardiography, since image quality should be better with the latter technique. However, with uniform reading criteria and harmonic imaging, agreement was moderate, with a mean κ coefficient of 0.48, which is better than that of Hoffmann's first study and similar to that of the same author's second study. The percentage agreement was greater in 3-vessel disease, in left anterior descending coronary artery disease, when there were baseline alterations in regional contractility, and when the referring center reported dyssynergy in the LAD territory or severe dyssynergy. The higher percentage agreement in these circumstances has clinical diagnostic and prognostic relevance, since patients with these characteristics have a worse prognosis.11-13

Intercenter Agreement on Other Diagnostic Techniques

Concern over the degree of agreement is not exclusive to stress echocardiography. Different degrees of variability in interpretation have been observed with other techniques. Thus, very low levels of agreement have been reported regarding the interpretation of ST-segment elevation (κ=0.05) or ST-segment depression (κ=0.38) between 2 centers in patients with acute coronary syndrome.14 Studies on myocardial perfusion with nuclear medicine procedures also present difficulties in interpretation, since these techniques are subjective and, as in exercise echocardiography, the experience of the observer and image quality can influence the interpretation. A moderate-high agreement has been reported with thallium imaging, with κ coefficients ranging between 0.56 and 0.74 in 2 studies.15,16 However, in a multicenter study with 25 participating hospitals, agreement between different centers without uniform reading criteria was low (κ=0.27).17 In a study by Candell-Riera et al,18 good agreement was found with exercise technetium-99m tetrofosmin myocardial perfusion single-photon emission computed tomography, with κ coefficients between 0.62 and 0.70 depending on whether topographical images or polar mapping were evaluated.18 This study also found that the sensitivity of the report under blinded conditions was significantly lower than that reported when the clinical data of the patient were known.

Sensitivity, Specificity, and Diagnostic Accuracy

Although we present mean sensitivity and specificity scores for the different centers, the variability among them regarding interpretation under blinded conditions is of more interest. Nevertheless, the sensitivity, specificity, and diagnostic accuracy of the technique carried out under blinded conditions were lower for each center than when they assessed their own cases and the baseline characteristics of the patients and their response to stress were known. This finding is not surprising, but it gives us an idea of the limitations of the technique when the pretest probability and the hemodynamic, clinical, and ECG responses to stress are not known. It is clear that EE can and should be used in clinical practice, but not under blinded conditions.

Limitations

The reading format of the studies was the same for all the centers (apical 4- and 2-chamber and parasternal long-axis and short-axis views at baseline and with stress), although the quality differed because the study was sent either on optical disk or recorded on video, depending on whether the center could read optical disks. However, the percentage agreement and the κ coefficients were similar for studies of optimal and suboptimal quality. The complete study recorded on video was not sent out as was done in the study by Hoffmann et al.1 This fact could lead to overestimating agreement, since the operator tends to acquire and store the images he/she considers most representative, taking into account test characteristics other than the images themselves. Twenty-five percent of the patients were receiving treatment with beta-blockers and up to 40% of the tests were lower than submaximal. This fact could have led to underestimating sensitivity, although we did not find higher sensitivity in the tests that were maximal in comparison to those that were submaximal.

See editorial on pages 9-11

Study financed by the RECAVA Cardiovascular Network.


Correspondence: Dr. J.C. Peteiro.
P.º Ronda, 5, 4.º izqda. 15011 A Coruña. España.
E-mail: pete@canalejo.org

Received June 9, 2005.
Accepted for publication October 18, 2005.

Bibliography
[1]
Hoffmann R, Lethen H, Marwick T, Arnese M, Fioretti P, Pingitore A, et al.
Analysis of interinstitutional observer agreement in interpretation of dobutamine stress echocardiograms.
J Am Coll Cardiol, (1996), 27 pp. 330-6
[2]
Hoffmann R, Marwick TH, Poldermans D, Lethen H, Ciani R, van der Meer P, et al.
Refinements in stress echocardiographic techniques improve interinstitutional agreement in interpretation of dobutamine stress echocardiograms.
Eur Heart J, (2002), 23 pp. 821-9
[3]
Wann LS, Faris JV, Childress RH, Weyman AE, Feigenbaum H.
Exercise cross-sectional echocardiography in ischemic heart disease.
Circulation, (1979), 60 pp. 1300-8
[4]
Beleslin BD, Ostojic M, Stepanovic J, Djordjevic-Dikic A, Stojkovic S, Nedeljkovic M, et al.
Stress echocardiography in the detection of myocardial ischemia. Head-to-head comparison of exercise, dobutamine, and dipyridamole tests.
Circulation, (1994), 90 pp. 1168-76
[5]
Acquatella H.
Ecocardiografía de estrés en Latinoamérica. Revisión de 5 años (1997-2002).
Rev Esp Cardiol, (2003), 56 pp. 21-8
[6]
Rodríguez García MA, Iglesias-Garriz I, Corral Fernández F, Garrote Coloma C, Alonso-Orcajo N, Branco L, et al.
Evaluación de la seguridad de la ecocardiografía de estrés en España y Portugal.
Rev Esp Cardiol, (2001), 54 pp. 941-8
[7]
Prevention of coronary heart disease in clinical practice.
Recommendations of the Second Joint Task Force of European and other Societies on Coronary Prevention.
Eur Heart J, (1998), 19 pp. 1434-503
[8]
Hoffmann R, Lethen H, Marwick T, Rambaldi R, Fioretti P, Pingitore A, et al.
Standardized guidelines for the interpretation of dobutamine echocardiography reduce inter-institutional variance in interpretation.
Am J Cardiol, (1998), 82 pp. 1520-4
[9]
Bourdillon PD, Broderick TM, Sawada SG, Armstrong WF, Ryan T, Dillon JC, et al.
Regional wall motion index for infarct and noninfarct regions after reperfusion in acute myocardial infarction: comparison with global wall motion index.
J Am Soc Echocardiogr, (1989), 9 pp. 398-407
[10]
The measurement of interrater agreement. In: Fleiss JL, editor. Statistical methods for rates and proportions. New York: John Wiley & Sons; 1981. p. 212-36.
[11]
Arruda AM, Das MK, Roger VL, Klarich KW, Mahoney DW, Pellikka PA.
Prognostic value of exercise echocardiography in 2,632 patients ≥65 years of age.
J Am Coll Cardiol, (2001), 37 pp. 1036-41
[12]
Elhendy A, Mahoney DW, Khandheria BK, Paterick TE, Burger KN, Pellikka PA.
Prognostic significance of the location of wall motion abnormalities during exercise echocardiography.
J Am Coll Cardiol, (2002), 40 pp. 1623-9
[13]
Peteiro J, Monserrat L, Mariñas J, Garrido I, Bouzas M, Muñiz J, et al.
Valor pronóstico de la ecocardiografía de ejercicio en cinta rodante.
Rev Esp Cardiol, (2005), 58 pp. 924-33
[14]
Holmvang L, Hasbak P, Clemmensen P, Wagner G, Grande P.
Differences between local investigator and core laboratory interpretation of the admission electrocardiogram in patients with unstable angina pectoris or non-Q-wave myocardial infarction (a Thrombin Inhibition in Myocardial Ischemia [TRIM] substudy).
Am J Cardiol, (1998), 82 pp. 54-60
[15]
Okada RD, Boucher CA, Kirshenbaum HK, Kushner FG, Strauss HW, Block PC, et al.
Improved diagnostic accuracy of thallium-201 stress test using multiple observers and criteria derived from interobserver analysis of variance.
Am J Cardiol, (1980), 46 pp. 619-24
[16]
Atwood JE, Jensen D, Froelicher V, Witztum K, Gerber K, Gilpin E, et al.
Agreement in human interpretation of analog thallium myocardial perfusion images.
Circulation, (1981), 64 pp. 601-9
[17]
Wackers FJ, Bodenheimer M, Fleiss JL, Brown M.
Factors affecting uniformity in interpretation of planar thallium-201 imaging in a multicenter trial. The Multicenter Study on Silent Myocardial Ischemia (MSSMI) Thallium-201 Investigators.
J Am Coll Cardiol, (1993), 21 pp. 1064-74
[18]
Candell-Riera J, Santana-Boado C, Bermejo B, Armadans L, Castell J, Casáns I, et al.
Impacto de los datos clínicos y concordancia interhospitalaria en la interpretación de la tomogammagrafía miocárdica de perfusión.
Rev Esp Cardiol, (1999), 52 pp. 892-7