INTRODUCTION
The prevalence of heart failure has increased in recent decades.1 Data from the Framingham study indicate an incidence of heart failure in the population aged over 45 years of 7.2 and 4.7/1000 person/years for males and females, respectively.2 In developed countries, heart failure is the most frequent cause of hospitalization in patients aged 65 years or over and is the cause of at least 5% of all hospitalizations and 4% of all deaths.3
Heart failure has a considerable impact on patients' daily activities, an impact which is comparable to or even greater than that of other chronic diseases such as diabetes or arthritis.4 The impact of the disease has traditionally been measured using clinical tools, such as the New York Heart Association (NYHA) functional classification5 or the 6-minute walk test (6MWT).6 Health-related quality of life (HRQL) instruments provide a means of exploring patient perceptions of the ways in which heart failure affects their daily lives and well-being. Given that clinical indices of severity correlate only weakly or moderately with patient perceptions, HRQL assessments provide additional information which cannot be directly extrapolated from clinical measures.7
As heart failure treatments are primarily symptomatic, disease-specific questionnaires for use in these patients have become increasingly important in recent decades.8 To date, information has been published on the development and validation of 5 questionnaires specifically for use in heart failure, ie, the Minnesota Living with Heart Failure Questionnaire (MLHFQ),9 the Quality of Life Questionnaire for Severe Heart Failure,10 the Chronic Heart Failure Questionnaire,11 the Kansas City Cardiomyopathy Questionnaire,12 and the Left Ventricular Dysfunction Questionnaire.13 The most widely known and used of these is the MLHFQ, which has been adapted for use in over 32 languages and has demonstrated good psychometric properties in numerous studies.14-16
The MLHFQ was linguistically adapted for use in Spain in 1997 and has been widely used in several settings.17-20 Nevertheless, we are not aware of any published studies which have examined the metric properties of the adapted version. The objective of the present study was to assess the feasibility, reliability, validity, and sensitivity to change of the Spanish version of the MLHFQ in daily clinical practice in cardiology outpatient clinics.
METHODS
Study Design
This was a prospective study in which patients admitted for heart failure were recruited consecutively in 50 Spanish hospitals. Patients were followed up for a period of 3 months after discharge in cardiology outpatient clinics. Patients were considered eligible to participate in the study if they were admitted to hospital for suspected heart failure in a coronary, cardiology, internal medicine, or intensive care unit, and if heart failure was confirmed at discharge as the primary or secondary diagnosis. Inclusion criteria were those of the European Society of Cardiology (symptoms of heart failure and evidence of cardiac dysfunction based on findings from complementary explorations).21 Exclusion criteria were: a) heart failure secondary to a reversible acute cause (supraventricular tachyarrhythmia which reverts to sinus rhythm, hyperthyroidism); b) heart failure or acute pulmonary oedema secondary to serious valvulopathy requiring surgery; c) presence of serious concomitant illness (chronic kidney disease requiring renal replacement therapy, in treatment for a neoplasm) or a diagnosis of cor pulmonale; and d) inability to participate because of clinical status.
The study was approved by the Ethics Committee of the Vall d'Hebron Hospital in Barcelona.
On hospital admission, demographic and clinical data (illness history and co-morbidity, severity, and etiology of heart failure, functional capacity) were collected, explorations were performed, and treatment prescribed. Functional capacity was described using the NYHA classification together with 3 questions using a dichotomous yes/no response (Do you regularly walk outside? Do you perform any recreational activity requiring physical exertion? Do you refrain from exerting yourself?).
HRQL data were collected and a clinical evaluation (rehospitalizations, visits, diagnostic tests, functional status, and changes in treatment) was performed at the baseline visit, which took place 1 month after discharge. The same data were collected at a second visit 2 months after the first.
Quality of Life Questionnaires
The MLHFQ was developed in the USA by T. Rector.12 It is a self-administered questionnaire consisting of 21 items. It provides an overall score as well as a score for the 2 dimensions of physical (8 items) and emotional (5 items) health. Response options are from 0 (no impact on HRQL) to 5 (maximum impact on HRQL). Overall (0-105) and dimension (physical, 0-40; emotional, 0-25) scores are obtained by summing responses to each of the items. Scores can be imputed as long as there are fewer than 4 missing values on the physical dimension, fewer than 3 on the emotional dimension, and fewer than 11 for the overall score.
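The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text gives the number of items per dimension but not which items they are, so the item numbers below follow the assignment commonly described for the MLHFQ and should be checked against the official scoring instructions. The imputation rule (replacing a missing item with the mean of the answered items) is likewise an assumption, since the text states only when imputation is allowed, not how it is done. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence

# 1-based item numbers per dimension. ASSUMPTION: the article only gives the
# item counts (8 physical, 5 emotional); these numbers follow the assignment
# commonly described for the MLHFQ.
PHYSICAL_ITEMS = (2, 3, 4, 5, 6, 7, 12, 13)
EMOTIONAL_ITEMS = (17, 18, 19, 20, 21)
ALL_ITEMS = tuple(range(1, 22))


def score(responses: Sequence[Optional[int]],
          items: Sequence[int],
          max_missing: int) -> Optional[float]:
    """Sum the 0-5 responses for `items`; None marks a missing response.

    Returns None when more than `max_missing` items are missing. Otherwise
    each missing item is replaced by the mean of the answered items
    (ASSUMED imputation rule, not stated in the article).
    """
    values = [responses[i - 1] for i in items]
    answered = [v for v in values if v is not None]
    n_missing = len(values) - len(answered)
    if n_missing > max_missing or not answered:
        return None
    mean_answered = sum(answered) / len(answered)
    return sum(answered) + n_missing * mean_answered


def mlhfq_scores(responses: Sequence[Optional[int]]) -> Dict[str, Optional[float]]:
    """Return the physical (0-40), emotional (0-25) and overall (0-105) scores."""
    if len(responses) != 21:
        raise ValueError("the MLHFQ has 21 items")
    return {
        "physical": score(responses, PHYSICAL_ITEMS, max_missing=3),    # fewer than 4 missing
        "emotional": score(responses, EMOTIONAL_ITEMS, max_missing=2),  # fewer than 3 missing
        "overall": score(responses, ALL_ITEMS, max_missing=10),         # fewer than 11 missing
    }
```

A patient who answers 5 on every item obtains the maximum scores (40, 25, and 105), and a patient with 4 missing physical items receives no physical score but can still receive an overall score.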
The generic SF-36 questionnaire was administered alongside the disease-specific MLHFQ. The SF-36 health questionnaire can be administered in the general population and different patient groups,22 and has been used to evaluate several interventions in heart failure.17,19 It includes 36 questions which measure the following 8 dimensions of health: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health.23 A score is obtained for each dimension ranging from 0 (worst) to 100 (best health). The instrument also generates 2 summary scores representing mental and physical health. These are standardized to a mean of 50 and a standard deviation [SD] of 10 using Spanish general population reference scores.24 Summary scores above 50 indicate better HRQL than the general population; those below 50 indicate poorer HRQL.
Sub-Groups
Patients were divided into 2 sub-groups based on the degree of change on the 4 functional capacity variables (NYHA and the 3 additional questions) between the 2 visits. Patients who did not show any change on any of the 4 variables were considered stable and were used to evaluate test-retest reliability. Patients who improved or deteriorated on at least 2 of the 4 variables were included in the sub-group used to analyze sensitivity to change.
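As a minimal sketch of this classification rule, the Python function below takes the change on each of the 4 functional capacity variables, already coded as +1 (improved), 0 (no change), or -1 (deteriorated). The coding, the function name, and the handling of mixed cases are assumptions made for illustration: the text does not say how a patient who improved on some variables and deteriorated on others was handled, so such patients are left unclassified here unless one direction reaches 2 variables and the other does not.

```python
from typing import Sequence


def classify_patient(changes: Sequence[int]) -> str:
    """Classify a patient from the change on the 4 functional capacity variables.

    `changes` holds one value per variable (NYHA class and the 3 yes/no
    questions), coded +1 = improved, 0 = no change, -1 = deteriorated.
    The coding and the treatment of mixed cases are ASSUMPTIONS.
    """
    if len(changes) != 4 or any(c not in (-1, 0, 1) for c in changes):
        raise ValueError("expected four values coded -1, 0 or +1")
    improved = sum(c == 1 for c in changes)
    deteriorated = sum(c == -1 for c in changes)
    if improved == 0 and deteriorated == 0:
        return "stable"          # test-retest reliability sub-group
    if improved >= 2 and deteriorated < 2:
        return "improved"        # sensitivity-to-change sub-group
    if deteriorated >= 2 and improved < 2:
        return "deteriorated"    # sensitivity-to-change sub-group
    return "unclassified"        # change on 1 variable only, or mixed change
```

Under this rule, a patient who changes on only 1 of the 4 variables belongs to neither sub-group, which is consistent with the sub-groups not adding up to the whole sample.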
Statistical Analysis
Sub-group socio-demographic and clinical characteristics, and HRQL scores, were compared using parametric or non-parametric tests depending on the distribution of the continuous variables. The chi-square (χ2) test was used to compare sub-groups on categorical variables.
The observed range of scores on the 2 HRQL questionnaires was calculated for the baseline visit. Feasibility was assessed by calculating the percentage of patients per dimension with at least 1 missing value. Floor and ceiling effects (the proportion of patients with the minimum and maximum score, respectively) were obtained for each score. Reliability was assessed by calculating: a) internal consistency (estimated using Cronbach's alpha coefficient25); and b) test-retest reliability (estimated using the intraclass correlation coefficient [ICC]26). Cronbach's alpha measures the degree of homogeneity among items in a dimension at a single administration. In the present study, it was calculated using data for the whole sample from the baseline evaluation. The ICC is a measure of agreement and was calculated using data from the 2 assessments in the reproducibility sub-sample. Both Cronbach's alpha and the ICC take values between 0 and 1. A value of .7 has been suggested as the threshold for comparisons at group level, while for individual level comparisons a value of .9 is considered appropriate.27
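Two of these quantities are simple enough to illustrate directly. The Python sketch below computes Cronbach's alpha with the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score), and the floor and ceiling effects as percentages, taking the floor as the lowest possible score and the ceiling as the highest. It assumes complete cases (no missing items) and is not the software used in the study. The function names are hypothetical, and the ICC is omitted because the text does not specify which form was used.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one dimension.

    `rows` holds one list of item scores per patient (complete cases only,
    an ASSUMPTION of this sketch).
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 patients and 2 items")
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def floor_ceiling(scores: Sequence[float],
                  minimum: float,
                  maximum: float) -> Tuple[float, float]:
    """Percentage of patients at the lowest (floor) and highest (ceiling) possible score."""
    if not scores:
        raise ValueError("no scores given")
    n = len(scores)
    floor = 100 * sum(s == minimum for s in scores) / n
    ceiling = 100 * sum(s == maximum for s in scores) / n
    return floor, ceiling
```

For the MLHFQ physical dimension, for example, `floor_ceiling(physical_scores, 0, 40)` would give the two percentages, where `physical_scores` is a hypothetical list of patient scores.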
Construct validity refers to the extent to which scores correlate in expected ways with other clinical or HRQL measures.27 In order to examine the pattern of HRQL scores across known groups defined by different levels of clinical severity, figures were constructed showing the means and 95% confidence intervals (CI) for each NYHA functional class. The construct validity of the MLHFQ was assessed using a Spearman correlation matrix of the SF-36 and MLHFQ dimensions (multi-trait multi-method matrix).28 Expected correlations between the 2 HRQL instruments were categorized as convergent or discriminant. Convergent validity refers to the idea that correlations between different instruments measuring similar concepts should be moderate to high, ie, >0.4 and >0.6, respectively. We hypothesized that the highest correlations would be observed between: a) the physical dimension of the MLHFQ and the SF-36 physical functioning, role physical, and physical summary scores; and b) the emotional dimension of the MLHFQ and the role emotional, mental health, and mental component summary score of the SF-36. To show discriminant validity, there should be low correlations between instruments which aim to measure different traits. We therefore hypothesized a priori that there would be low correlations between: the SF-36 physical functioning, role physical, and physical summary scores, and the emotional dimension of the MLHFQ as well as between the physical dimension of the MLHFQ and the SF-36 role emotional, mental health, and mental summary scores.
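The correlation step can be sketched as follows, assuming paired lists of scores with no missing values. The Python code computes the Spearman coefficient as the Pearson correlation of average ranks (so ties are handled) and labels its strength with the thresholds given above (>0.6 high, >0.4 moderate). Because higher MLHFQ scores indicate greater impact while higher SF-36 scores indicate better health, the raw coefficient between the two instruments is expected to be negative, so the label is based on the absolute value. The function names are hypothetical and this is not the software used in the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables is constant")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho: float) -> str:
    """Label a correlation using the thresholds stated in the text."""
    size = abs(rho)
    if size > 0.6:
        return "high"
    if size > 0.4:
        return "moderate"
    return "low"
```

Applying `spearman` to every pair of MLHFQ and SF-36 scores and tabulating the results gives the multi-trait multi-method matrix described above.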
Analysis of the MLHFQ's sensitivity to change was performed using data from the sub-groups who reported improvement and deterioration. First, mean scores from the visits at 1 and 3 months after discharge were compared using the Wilcoxon test. The effect size (ES) was then calculated for both the MLHFQ and the SF-36 based on the change in score between the 2 visits.29 The ES is the mean change divided by the baseline SD. An ES of >0.8 is considered high, one of about 0.5 moderate, and one close to 0.2 low.
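The effect size defined above (mean change divided by baseline SD) can be illustrated with a short Python function. The sketch assumes paired scores from the same patients at both visits and computes change as follow-up minus baseline. That sign convention is an assumption, since the text does not state which direction was used, and on the MLHFQ an improvement corresponds to a lower score. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Effect size = mean change / SD of the baseline scores.

    Change is computed as follow-up minus baseline (ASSUMED sign convention);
    both sequences must list the same patients in the same order.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least 2 patients")
    baseline_sd = stdev(baseline)
    if baseline_sd == 0:
        raise ValueError("baseline scores have zero variance")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / baseline_sd
```

For instance, baseline scores of 10, 20, and 30 (SD 10) that each rise by 5 points give an effect size of 0.5.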
RESULTS
A total of 677 patients with heart failure were included in the study. The final sample had a mean age (SD) of 69.6 (11.9) years; 61% were male. Patients were generally classified in NYHA groups I, II, and III (19.6%, 53.1%, and 25.2%, respectively). All HRQL instruments were completed by patients at the study visit and were only administered by health care personnel when required.
Table 1 shows socio-demographic and clinical characteristics and HRQL scores for the overall sample, for the reproducibility sub-group (n=245), and for the sub-groups of patients who improved (n=60) and deteriorated (n=43). There were statistically significant differences between sub-groups on the following variables: NYHA class distribution, age, the SF-36 mental summary score, 3 dimensions of the SF-36, and the MLHFQ scores.
Observed scores on the MLHFQ and SF-36 covered the full theoretical range (Table 2). The rate of missing responses was practically zero in the 2 MLHFQ dimensions, and was only high for the overall score (22.5%). MLHFQ ceiling and floor effects were very low, though there were substantial ceiling effects on 4 of the SF-36 dimensions. Cronbach's alpha was high for all dimensions, ranging from .817 to .915 on the MLHFQ and from .70 to .93 on the SF-36. The ICC was >0.7 for the 3 MLHFQ scores and close to 0.6 in the majority of SF-36 dimensions.
The difference in score between the different NYHA classes was statistically significant (Kruskal-Wallis, P<.001) in all cases (Figure 1), with mean scores for the MLHFQ physical dimension ranging from 7.9 (8.4) for class I to 27.8 (8.3) for class IV. Mean SF-36 physical summary scores were 43.7 (8.6) and 28.1 (8.4) in functional classes I and IV, respectively. When discriminant validity was assessed using scores from the second evaluation, the results were very similar (P<.001) (Figure 2).
Figure 1. Scoring gradient for the MLHFQ physical and emotional domains and SF-36 summary scores by NYHA classification. Baseline assessment.
Figure 2. Scoring gradient for the MLHFQ physical and emotional domains and SF-36 summary scores by NYHA classification. Second assessment.
The correlation matrix between the dimensions of the SF-36 and the MLHFQ shows that all of the correlations which had previously been hypothesized to be moderate or high (Table 3) were >0.52, with the exception of the correlation between mental health and the emotional domain of the MLHFQ, which was 0.39. Correlations which were expected a priori to be low (discriminant validity) were <0.5 (Table 3).
Between the 2 study visits, approximately 20% of patients remained stable and change in NYHA functional class was similar in the improvement and deterioration sub-groups: the majority changed by only 1 class (70% of those who improved and 69.8% of those who deteriorated) and none of the patients changed by more than 2 classes. In the sub-group of patients who improved, ES were quite low (Table 4), the largest being seen on the physical domain and the MLHFQ overall score (0.42 and 0.41, respectively). In the sub-group of patients who worsened, ES were even smaller, from -0.09 to -0.26 on the MLHFQ.
DISCUSSION
The Spanish version of the MLHFQ has demonstrated adequate measurement properties which are similar to those of the original version. The excellent results in terms of reliability and validity were particularly noteworthy. The study results support the use of the MLHFQ in Spain as well as its use in international comparisons.
The analysis of the MLHFQ's feasibility showed that there were virtually no missing responses except on 2 items referring to the respondent's profession and sexual activity. These 2 items are not included in any of the questionnaire dimensions, but they do contribute to the overall score. Despite the results on these 2 items, the study indicates that the Spanish version of the MLHFQ is feasible for use in heart failure patients. The distribution of the scores clearly illustrates some of the advantages of disease-specific over generic instruments. The small ceiling and floor effects and the use of the full range of scores in a sample which covers the full range of severity, such as that included in the present study, indicate that the questionnaire addresses problems of relevance to these patients and suggest that the instrument is likely to detect improvement or deterioration. The high percentage of patients with the maximum possible score (ceiling effect) on several dimensions of the SF-36 is, in part at least, a reflection of the instrument's lack of relevance for patients with this condition. On the other hand, the 2 SF-36 role dimensions have shown high ceiling effects in several populations, which is one of the reasons for the recent modifications to the response scale used in these dimensions in version 2 of the SF-36.30-32
The MLHFQ also showed excellent reliability, both in terms of internal consistency and reproducibility, with reliability coefficients over the minimum recommended standard27 on all dimensions. On the physical dimension and the overall score, Cronbach's alpha was over .90, which has been proposed as the standard for individual level comparisons. The ICC was also >0.7 for all scores. On the other hand, the 95% CI for the SF-36 mean scores for the 4 NYHA classes overlapped, which contrasts with the generally independent means observed for the MLHFQ. This indicates the MLHFQ's greater ability to discriminate between patients according to the degree of functional impairment, whilst the correlations observed between the 2 instruments provide evidence of the construct validity of the physical and emotional domains of the MLHFQ.
The effect sizes observed on the MLHFQ physical dimension and overall score (both were close to 0.4) can be considered moderate, according to Cohen's criteria,29,33 though they were larger than those observed on the SF-36 dimensions (0.03-0.33). The greater capacity of the MLHFQ to detect change supports the hypothesis, which has also been confirmed in other studies, that disease-specific instruments are more sensitive to change than generic instruments.
Study Limitations
One of the study limitations was that the analysis of test-retest reliability was based on data collected after a 2 month interval, which is a longer time than is generally recommended in this type of design. Over that length of time, there may have been changes in the treatment or in the patient's situation. Furthermore, assessment of the stability of the patient's condition was based on physician ratings of functioning (among other variables), and not only on patient self-report using the health status transition item. The latter may have been more appropriate given that physician and patient ratings do not correlate strongly.34,35 These 2 methodological characteristics may have led us to underestimate the reproducibility of the Spanish version of the MLHFQ, which may actually be greater than that observed here. Patient improvement or deterioration was likewise measured indirectly, as it was assumed that a change in functional status would be accompanied by a change in self-perceived HRQL, despite the fact that the metric characteristics of the NYHA classification are not well-known,36 there is considerable variability in its use, and little evidence regarding its capacity to detect a minimum clinically important difference. When evaluating sensitivity to change, the inclusion of a higher proportion of symptomatic patients could provide a more homogeneous distribution across the four NYHA functional classes. It would also be useful to employ a pre-post design with an intervention which would produce a clear improvement in health status. Despite these limitations, the coefficients obtained showed that the Spanish version of the MLHFQ is sensitive to change, and more so than the SF-36.
The study results cannot be considered to provide a representative description of the HRQL of patients hospitalized for heart failure in Spain. Nevertheless, the variety of patients included provides some support for the external validity of the Spanish version's psychometric characteristics. It can therefore be affirmed that the MLHFQ is appropriate for measuring HRQL in patients with heart failure with a range of characteristics.
CONCLUSIONS
This study has shown that the MLHFQ has excellent reliability and validity and moderate sensitivity to change when used to evaluate HRQL in patients with heart failure. Given its cross-cultural characteristics, it will allow for comparisons between countries and will provide a particularly useful measure of HRQL in multi-center international studies.
ACKNOWLEDGEMENTS
The authors are grateful to Ana María Rodríguez for her insights and contributions to the discussion of the results. We would also like to thank the patients who took part.
HF-QOL GROUP
J. Ariza, J. Fernández, V. López, R. Calvo, P. Bureo, J. Carretero, A. Bayes, D. Gil, C. Ligero, J. Comín, P. Cabero, J. Roure, G. Peñarrojas, M.A. Paz, S. Castro, J. Roca, L. Perdigón, J.A. Ruiz, D. Jiménez, V. Bertomeu, A. Mateu, A. Carrión, S. Martí, A.M. Rubio, J. García, J. Blanquer, J.C. Vargas, C. Pérez, M.A. García, L. Pérez, C. Borasteros, F. Taboada, A. Grande, A.I. Huelmos, J. Bilbao, A. Melero, A. Díaz, J.L. Diago, A. Navarro, J.F. Sotillo, J. Rovira, J.A. Velasco, A. Chaume, D. Atienza, A. Salvador, P. Baello, J. Muñoz, V. Ruiz, M.J. Fombella, J.M. Cerqueiro, E. Freire, J. Jiménez, C. Hidalgo, F. Santolaria, M. Rodríguez, O. Afonso, I. Lekuona, J.A. Alarcón, A. Pérez, J. Marasa, A. del Río, T. Soriano, E. Roig, I. Vallejo, A. Álvarez, J. Julià, R. Bagà, J. Mesquida, A. Tobaruela, J.M. Lomas, A. Martínez, A. Aguilera, and A.M. Campos.
ABBREVIATIONS
ES: effect size
HF: heart failure
HRQL: health-related quality of life
ICC: intraclass correlation coefficient
MLHFQ: Minnesota Living with Heart Failure Questionnaire
NYHA: New York Heart Association
SD: standard deviation
SF-36: short form, 36 items
SEE EDITORIAL ON PAGES 233-5
Correspondence: Dra. M. Ferrer.
Doctor Aiguader, 88. 08003 Barcelona. España.
E-mail: mferrer@imim.es
Received July 6, 2007.
Accepted for publication September 28, 2007.