Objectives To examine the validity of the clinician-assessed version of a military occupational outcome measure (the functional activity assessment; FAA) and to compare the validity with the self-assessed version.
Methods The relationship between the clinician-assessed FAA and the SF-36 and Physical Workload Questionnaire was examined in 192 service personnel with musculoskeletal injuries. Concurrent validity was checked by comparing actual medical category with the FAA.
Results Clinicians preferentially chose an FAA grade of 2 (56% of all grades). The clinician-assessed FAA was significantly correlated with all measured variables in the expected direction. The performance of the regression models did not fully support construct validity. The discriminative ability of the models was poor.
Conclusions The clinician-assessed FAA is a less valid measure than its self-assessed counterpart. Use of the patient-reported FAA outcome measure is recommended.
- REHABILITATION MEDICINE
- OCCUPATIONAL & INDUSTRIAL MEDICINE
- Quality in health care < HEALTH SERVICES ADMINISTRATION & MANAGEMENT
Assessment of functional outcome is important.
Validity is dependent on the mode of application.
The patient-reported functional activity assessment (FAA) rather than the clinician-reported FAA is recommended for use.
Outcome measurement is a critical requirement of all health interventions and activities. Healthcare professionals are under pressure from stakeholders, funders of health policies and patients to provide evidence of good clinical outcomes. Outcome measures alone however cannot measure the effects of an intervention1 because they are influenced by factors other than the intervention such as the natural course of a condition or injury.
The main aim of military rehabilitation is to return personnel to a state in which they are fit to carry out their trade and able to perform basic military duties. Rehabilitation may also be used to try to increase an individual's effectiveness at work. In a study of over 10 000 personnel, 13% of those deployed and 6% of those who were not deployed to Iraq were medically downgraded, that is, had a reduced capability to complete their normal role.2 If extrapolated to the whole of the British Army, this suggests that between 7000 (6%) and 14 500 (13%) personnel are medically downgraded at any one time. Outcome measures that provide information on the level of occupational ability within a military setting are therefore of great interest.
Within the NHS, there has been a focus on patient choice and care. In line with this policy, there has also been a rise in the use of patient-reported outcome measures that indicate states of health and illness from the patient's perspective.3 A list of systematic reviews of patient-reported outcome measures for specific diseases and demographic groups has been compiled as part of the National Centre for Health Outcomes Development (available from phi.uhce.ox.ac.uk/selectpubs.php). These reviews include measures for cancer; long-term conditions such as asthma and stroke; and elective procedures. Within musculoskeletal medicine, well-validated questionnaires for quality of life (eg, Centers for Disease Control and Prevention Health-related Quality Of Life-14;4 Short Form-36 (SF-36)5) and condition-specific health (eg, disabilities of arm, shoulder and hand;6 knee injury and osteoarthritis score7) are available. Questionnaires that focus on work limitations, productivity and instability are also validated within the civilian population.8
There are several types of validity that can be examined for an outcome measure. Construct validity is established when an instrument is demonstrated to measure what it purports to measure. Construct validity can be quantified by identifying the extent to which the variance in the score may be explained by one or more attributes, such as physical health and physical workload.9 Concurrent validity is the degree to which a measure correlates with other measures intended to quantify the same concept; discriminative validity is the ability of a measure to differentiate between different groups.
We recently demonstrated the validity of the self-assessed functional activity assessment (FAA) as a measure of occupational outcome in military rehabilitation.10 The FAA is a simple, single item, five-point ordinal measure and was originally based on a clinical assessment of the patient by either the doctor or physiotherapist (Table 1).
The nature of a self-assessed FAA necessitates that usage is carefully monitored. For example, if the FAA was used in a way that it had a direct effect on the patient's employment (eg, leading to restriction of duties), it could be open to bias or abuse by patients with a poor motivation to work. A clinician-assessed FAA would be advantageous in this respect. Reporting of symptoms and limitations to clinicians may also be subject to similar biases. However, astute clinicians may have the ability to recognise this and correct for this bias in their assessment. Clinician- and patient-reported outcomes measuring the same construct of knee health have been shown to have a moderately strong correlation,11 although there have also been reports of differences between clinician- and patient-reported ratings using the same measurement tool.12
The aim of this study was to examine the construct and concurrent validity of the clinician-assessed version of the FAA with respect to self-reported questionnaire outcomes.
Patients with musculoskeletal injuries were invited to participate on arrival at the Defence Medical Rehabilitation Centre (DMRC) or the Regional Rehabilitation Units at Aldershot or Bulford for a multidisciplinary assessment clinic. Those consenting completed a set of questionnaires and then attended their clinic as usual. Clinicians were blinded to the questionnaire responses. Part of this study has been previously reported by Roberts et al.10 The study was approved by the Ministry of Defence Research Ethics Committee (Ref 0735/120).
Demographic data, patient- and clinician-assessed FAA, and P grade were collected, and all participants completed the UK SF-36 (V.2.0) questionnaire (36 questions)5 and the Physical Workload Questionnaire (PWQ). The SF-36 is a well-validated questionnaire producing summary measures of physical health and mental health. The PWQ has previously been validated in over 1000 patients from three distinct populations with musculoskeletal disorders,13 although it has not previously been used or validated in a military setting.
Construct validity was tested by evaluating the degree to which validated measures of physical health, mental health (SF-36 scores) and physical workload (PWQ scores) account for the FAA grading. We considered there to be good evidence of construct validity if both health and physical workload variables from the validated questionnaires were included in the final regression models. The predictive ability of the final models was used as an indicator of discriminative validity. Concurrent validity was considered to be good if there was a strong relationship between the FAA and the medical category.
Data processing and analysis
All the data were pooled and anonymised before analysis. The Response Consistency Index was used to confirm data quality for the SF-36.5 The raw SF-36 scores were transformed to norm-based scores, where a score of 50 represents the mean score in a study of 8889 respondents from the UK general population (respondents limited to four counties) in 1999;14 one SD is equal to 10 points on all scales. This is the most recent large scale study in the UK that provides sufficient information to allow calculation of the norm-based scores. The component (summary) scores for the SF-36 for physical (PCS) and mental health were calculated; subscale scores for physical functioning (PF), role-physical (RP), bodily pain (BP), general health, vitality, social functioning (SF), role-emotional and mental health were also calculated. Two sum scores were calculated from the responses on the PWQ. These represented heavy physical workload (HPW) and long-lasting postures and repetitive movements. The medical categories were transformed into an equivalent tri-service P grade (Table 2) and a single ‘medically unfit’ category consisting of those with either a P0 or P8 grading was created to transform the scores to a truly ordinal scale.
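The norm-based transformation described above can be illustrated with a minimal sketch. It shows only the linear standardisation step (a score of 50 at the reference mean, 10 points per SD); the published SF-36 scoring procedure contains further steps, particularly for the component scores, that are not reproduced here. The function name and the reference mean and SD used in the example are hypothetical and are not the UK norms used in this study.

```python
def norm_based_score(raw, pop_mean, pop_sd):
    """Rescale a raw scale score so that the reference population
    has a mean of 50 and a standard deviation of 10."""
    if pop_sd <= 0:
        raise ValueError("pop_sd must be positive")
    z = (raw - pop_mean) / pop_sd  # distance from the reference mean in SD units
    return 50.0 + 10.0 * z


# Hypothetical reference values, for illustration only (not the UK norms).
EXAMPLE_MEAN = 80.0
EXAMPLE_SD = 20.0

print(norm_based_score(80.0, EXAMPLE_MEAN, EXAMPLE_SD))  # 50.0 (at the reference mean)
print(norm_based_score(60.0, EXAMPLE_MEAN, EXAMPLE_SD))  # 40.0 (one SD below the mean)
```

On this scale, a norm-based score of 40 can be read directly as one SD below the reference population mean, whichever subscale it comes from.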
The relationship between the FAA score and all interval variables was determined by Kendall's τ-c rank correlation coefficient. Two ordinal regressions were conducted using the complementary log-log link function, with the clinician-assessed FAA score entered as the dependent variable. FAA grades 4 and 5 were combined due to the low count of patients graded 5. The analysis was run once with the eight subscales and once with the two component scores of the SF-36, in each case together with the remaining independent variables. The proportional odds assumption was tested using the test of parallel lines, and the first iteration of each model entered all variables. Subsequent iterations removed variables not contributing to the model or demonstrating collinearity. Correlations between independent variables entered into the regression were assessed using Pearson's correlation coefficient. The significance level for all tests was set at 0.05. All statistical analyses were carried out using SPSS Statistics 17.0 (SPSS Inc, Chicago, Illinois, USA).
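For readers unfamiliar with Kendall's τ-c (Stuart's τ-c), the following is a minimal sketch of the statistic, assuming the usual definition τ-c = 2m(C − D) / (n²(m − 1)), where C and D are the numbers of concordant and discordant pairs, n is the sample size and m is the smaller of the numbers of distinct values in the two variables. The analyses in this study were run in SPSS; the function name and the example data below are hypothetical and serve only to illustrate the calculation.

```python
def kendall_tau_c(x, y):
    """Stuart's tau-c for two equal-length sequences of ordinal values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    m = min(len(set(x)), len(set(y)))  # smaller number of distinct categories
    if n < 2 or m < 2:
        raise ValueError("need at least two observations and two categories")
    concordant = discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # pairs tied on either variable count towards neither total
    return 2.0 * m * (concordant - discordant) / (n * n * (m - 1))


# Hypothetical data: a perfectly monotone relationship gives a value of 1.
print(kendall_tau_c([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
```

Unlike τ-b, τ-c corrects for the table dimensions, which suits a comparison between a coarse ordinal scale such as the FAA and a variable with many more distinct values.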
In all, 192 consecutive patients completed questionnaires and 155 patients (22 women; mean (SD) age 32 (7.8)) fully completed the SF-36 and PWQ and were assigned an FAA score by the clinician; an administrative error meant that some patients were discharged prior to the FAA being assigned by the clinician. Physical workload scores ranged from 0 to 32 where the maximum is 36. Norm-based PCS ranged from 7 to 58. Overall the frequency distribution (%) of the FAA was 10 (5%), 107 (56%), 36 (19%), 32 (17%) and 7 (4%) for grades 1–5, respectively.
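The reported percentages can be checked against the grade counts with a short calculation. The counts are taken from the text above; note that they sum to 192, so the percentages appear to be expressed relative to all 192 patients. The variable names are illustrative only.

```python
# Clinician-assessed FAA grade counts as reported in the text.
faa_counts = {1: 10, 2: 107, 3: 36, 4: 32, 5: 7}

total = sum(faa_counts.values())
percentages = {grade: round(100 * count / total) for grade, count in faa_counts.items()}

# Grades 4 and 5 were combined for the regression analysis.
combined_4_or_5 = faa_counts[4] + faa_counts[5]

# Number of patients given any grade other than 2.
other_than_2 = total - faa_counts[2]

print(total)            # 192
print(percentages)      # {1: 5, 2: 56, 3: 19, 4: 17, 5: 4}
print(combined_4_or_5)  # 39
print(other_than_2)     # 85
```

Because each percentage is rounded separately, the rounded values sum to 101 rather than 100.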
Kendall's τ-c correlation coefficients for each variable are shown in Table 3. Significant (p<0.001) negative correlations were found between the FAA and all SF-36 scores (both component and subscale), and a positive correlation with the HPW sum score. The HPW score was significantly negatively correlated with age (r=−0.37, p<0.001), years in service (r=−0.34, p<0.001), the RP subscale (r=−0.40, p<0.001) and the PCS (r=−0.27, p<0.001). Higher scores on the SF-36 scales signify better function, whereas higher FAA grades signify poorer work ability. Higher levels of occupational ability were therefore associated with better health, less physical workload and an older age.
A significant regression model (χ2=50.9, df=2, p<0.001) emerged retaining two variables. In order of predictive power (according to Wald statistic), these were SF (OR 0.94, Wald=13.9) and PF (OR 0.91, Wald=8.9). Pseudo R-square values, demonstrating reasonable prediction of the FAA using the regression model, were 0.24, 0.27 and 0.12 corresponding to Cox and Snell, Nagelkerke and McFadden statistics, respectively. The analysis using the component scores was similar in fit to the first model. This second model retained both the PCS and mental component score. The models predicted 59%–60% of patients' FAA grades correctly based on the scores of the included variables, although this was achieved by predicting membership in only two of the four categories entered into the initial model. No patient was predicted to have a grade of 1 or 3 by either model; 84%–86% of the predictions from both models were within one grade of the actual grade.
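The three pseudo R-square statistics named above can be sketched from their standard definitions, given the log-likelihoods of the intercept-only and fitted models. The log-likelihoods for this study are not reported, so the values in the example are hypothetical and the sketch does not attempt to reproduce the published figures; the function name is likewise an assumption.

```python
import math


def pseudo_r_squared(ll_null, ll_model, n):
    """Cox and Snell, Nagelkerke and McFadden pseudo R-square values.

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the fitted model
    n        -- number of observations
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox and Snell by its maximum attainable value.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    mcfadden = 1.0 - ll_model / ll_null
    return {"cox_snell": cox_snell, "nagelkerke": nagelkerke, "mcfadden": mcfadden}


# Hypothetical log-likelihoods, for illustration only.
result = pseudo_r_squared(ll_null=-100.0, ll_model=-80.0, n=100)
for name, value in result.items():
    print(name, round(value, 3))
```

The three statistics are on different scales, so, as in the results above, the McFadden value is typically the smallest of the three for the same model.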
The FAA was significantly correlated with self-reported (τ-c=0.35; p<0.001) and clinician-recommended P grade (τ-c=0.36; p<0.001).
This study provides some evidence of construct and concurrent validity of the clinician-assessed FAA. There was a good range of patients with varying physical health and physical workload. All simple correlations were significant in the expected direction with better FAA grades associated with better health and less physical workload. The strength of the correlations indicated that the physical health variables were most strongly related, while the mental health variables had a weaker yet still significant correlation. Workload variables however did not correlate to the extent expected. In addition, performance of the regression models did not fully support the construct validity of this version of the FAA, in line with the strength of the individual correlations. The discriminative validity of each FAA grade was also not supported.
The ability to differentiate all FAA grades is an essential element of confirming discriminative validity. However, the regression models only predicted that patients had FAA grades of 2 or ‘4 or 5’. This was probably in part due to the sampling distribution used to create the model with low numbers of FAA 1. This bias may have been created as the perception among medical staff may be to grade all who present to their clinics as FAA 2 or higher, because if patients were FAA 1, they should not be seen at regional rehabilitation unit (RRU) or DMRC level clinics. The limitations of low numbers of patients with FAA 1 or 5 could be improved if future studies sampled from primary care facilities and complex trauma clinics. There were similar numbers of patients with either an FAA of 3 (n=36) or an FAA of 4 or 5 (n=39). Notwithstanding the limitations described, these results still suggest that differentiation may be a real problem for clinicians. The fit of the models is also lower than that demonstrated for the self-assessed FAA.10 Pseudo R-squared values representing predictive ability in the models for the self-assessed FAA were approximately double those reported in this paper.
The FAA is intended to be a measure of occupational activity that reflects physical health, and as such it should be related to a patient's physical health in relation to physical workload and should not correlate with other variables. Construct validity was supported to some extent as age, gender and years in service did not correlate with the FAA significantly (and were not included in the model).
Both models included measures of physical health which had the strongest correlations to the FAA; however, neither model included a physical workload variable. This was likely due to the fact that the only physical workload variable significantly correlated with the FAA had one of the weakest correlations. The workload variable representing long-lasting postures and repetitive movements was not significantly correlated. This may reflect the relatively poor knowledge of a clinician of the working environment and task demands of the individual. Conversely, a clinician may understand the physical health of an individual better than the patient themselves. However, it is the understanding of the effects of ill health on the individual's ability to carry out trade and military duties that is required to make a valid FAA grading. One reason for the exclusion of the HPW score, from the regression models, could be because it shared variance with four other variables. However, of these variables, only the PCS was included in a final model. This suggests that the clinician-assessed FAA is more a surrogate measure of physical health than occupational ability. In comparison with the clinician-assessed FAA, both regression models for the self-assessed FAA included a measure of physical workload.10
The inclusion of the SF subscale in the final model is also not in line with predictions. This subscale is calculated from the responses to two items determining the extent that an individual's physical health or emotional problems have interfered with social activities. A more valid measure would have included the RP subscale (as reported for the self-assessed FAA) which directly reflects limitations with work and other activities. It could be that clinicians have greater knowledge of the social limitations due to injury (perhaps patients more readily report this information). If this were the case, clinicians may be using this information to infer the occupational limitations that are required to accurately score the FAA.
It is important to note that although RP and BP were not included in the final model, they did demonstrate similar moderate correlations (with the FAA) to the SF and PF subscales that were included. Their exclusion does not however mean the construct is not valid. The selection of variables to be included in the model is dependent on the strength of correlation, any variance shared with variables already included in the model, along with the model specification. As such, with these results, we can conclude that SF and PF in combination explained most of the variability in the FAA scores. However, RP and BP would likely provide only a slightly worse explanation of the FAA scores.
The clinician-assessed FAA correlated better with the clinician's own recommended P grade than with the patient-reported P grade. Patient-reported P grades should represent actual medical categories unless the patient was uncertain of their actual grade. It is not surprising that two assessments of occupational ability by the same clinician are reasonably well correlated. Similarly, self-assessed FAA scores are better correlated with self-reported than with clinician-recommended P grades. The degree of correlation between these measures, however, is much lower for the clinician-assessed FAA (τ=0.39) than for the self-assessed FAA (τ=0.58).10 This suggests that the clinician FAA has relatively poor concurrent validity, although it must be noted that the P grade has not been formally validated.
As discussed in the paper by Roberts et al,10 this study is similarly limited by not measuring work capability directly and relying on self-reported measures to confirm validity. However, there are no well-validated work capability evaluations for the wide range of military occupations. The use of self-reported measures is a valid way of testing the validity of new measures as long as these measures are validated themselves. This study is also limited by the frequency distribution of the FAA. Despite a diverse set of patients, as confirmed by the wide range of the SF-36 and physical workload scores, clinicians were more likely to confer an FAA of 2 than all of the other FAA grades combined.
A few studies have directly compared clinician and patient-reported outcome measures. Bastien et al15 demonstrated good correlation ‘between patient's insomnia severity ratings and collateral ratings from their significant others and from independent clinicians’. Chassany et al12, however, reported discrepancies between patient-reported and clinician-reported outcome measures in three chronic diseases. The authors demonstrated that both raters' outcomes were correlated; however, general practitioners (GPs) estimated that patients had lower pain intensity and a greater pain-free walking distance than reported by the patients themselves. Depending on the disease, GPs also either overestimated or underestimated quality of life scores. Our study suggests that similar discrepancies are also apparent in the FAA. Although correlated with all the same constructs, a clinician-assessed FAA does not appear to be as valid a measure as the self-assessed FAA.
For its original intended use as a measure of occupational ability, the clinician-assessed FAA was not found to be a valid measure. However, to some extent, it is a valid surrogate measure of physical health. The small numbers of FAA 1 and 5 suggest that the measure may not be particularly useful for tracking changes within a single rehabilitation unit, but its validity may improve when a larger range of facilities are assessed. The ability to distinguish between the different grades was a problem for our regression model, which may also be a problem for clinicians. Use of the better validated self-assessed FAA as reported by Roberts et al10 rather than the clinician-assessed FAA reported here is recommended.
Contributors AJR and JE planned, drafted and revised the manuscript. AJR conducted the data collection and statistical analysis phases.
Competing interests None.
Ethics approval MODREC.
Provenance and peer review Not commissioned; externally peer reviewed.