Measuring meaningful outcomes for adolescent idiopathic scoliosis: a narrative review and critical appraisal of the Scoliosis Research Society-22 revised (SRS-22r) instrument

Armaan K. Malhotra; Husain Shakil; Christopher S. Lozano; Vishwathsen Karthikeyan; Jennifer A. Dermott; Jefferson R. Wilson; Unni G. Narayanan; David E. Lebel

doi:10.21037/jss-25-54

Review Article

Armaan K. Malhotra^1,2,3, Husain Shakil^1,2,3, Christopher S. Lozano^1,2,3, Vishwathsen Karthikeyan^1,2,3, Jennifer A. Dermott^4, Jefferson R. Wilson^1,2,3, Unni G. Narayanan^3,4*, David E. Lebel^4*

¹Division of Neurosurgery, Unity Health Toronto, Toronto, Ontario, Canada; ²Li Ka Shing Knowledge Institute, Unity Health, Toronto, Ontario, Canada; ³Institute for Health Policy, Management and Evaluation, University of Toronto, Ontario, Canada; ⁴Division of Orthopedic Surgery, Hospital for Sick Children, Toronto, Ontario, Canada

Contributions: (I) Conception and design: AK Malhotra, UG Narayanan, DE Lebel; (II) Administrative support: JR Wilson, DE Lebel; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: AK Malhotra; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^*These authors contributed equally to this work as co-senior authors.

Correspondence to: Unni G. Narayanan, MBBS, MSc. Institute for Health Policy, Management and Evaluation, University of Toronto, Ontario, Canada; Division of Orthopedic Surgery, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada. Email: unni.narayanan@sickkids.ca.

Background and Objective: The Scoliosis Research Society-22 revised questionnaire (SRS-22r) is the most widely used patient-reported outcome measure (PROM) to evaluate health-related quality of life (HRQL) for patients with adolescent idiopathic scoliosis (AIS). Here, we seek to critically appraise the development process and psychometric properties of the SRS-22r.

Methods: We evaluated the item generation, item reduction, sensibility and measurement properties including reliability, validity and responsiveness of the SRS-22r. To accomplish this, we examined available literature describing psychometric properties of the SRS-22r and summarized the findings in a narrative review format.

Key Content and Findings: The SRS-22r represents a multi-dimensional outcome measure that demonstrates generally appropriate responsiveness to outcomes after surgical AIS deformity correction. Despite its strengths, several limitations were identified, including (I) absence of a conceptual framework for HRQL in AIS; (II) lack of direct patient involvement during development of the instrument; (III) minimal evidence that adolescents evaluated the interpretation, appropriateness, importance and comprehensiveness of the items; and (IV) the inclusion of a satisfaction with treatment (surgery) domain within a HRQL instrument. Though the SRS-22r is responsive to change after surgical intervention, its ability to discriminate between mild and moderate scoliosis remains limited.

Conclusions: Our findings characterize the strengths and limitations of the SRS-22r. An ideal AIS HRQL measure should be guided by a conceptual framework informed by, and aligned with, the priorities and goals of adolescents with idiopathic scoliosis based on their lived experience, complemented by parental perspectives and input from clinician experts with an understanding of AIS management.

Keywords: Adolescent idiopathic scoliosis (AIS); outcomes; Scoliosis Research Society questionnaire (SRS questionnaire); validity; reliability

Submitted Apr 03, 2025. Accepted for publication Jul 03, 2025. Published online Sep 24, 2025.

Introduction

Adolescent idiopathic scoliosis (AIS) affects 1–3% of patients under 18 years of age (1). Patients may experience erosion of body image and self-esteem with resultant impaired social functioning (2,3). Management of AIS is influenced by the magnitude of the deformity, age, other risk factors of progression and the expected impact of untreated scoliosis on patient health and health-related quality of life (HRQL) (4). Treatment of AIS includes non-operative management such as bracing as well as surgical intervention, which is usually reserved for patients with curves >45–50 degrees, significant risk for progression and/or those experiencing impairments in HRQL (5-7).

HRQL is a concept influenced by both an individual’s intrinsic values and their environment, which in turn affect the lived experience of symptoms, functioning and perceptions, and ultimately contribute to overall quality of life (8). Specific HRQL considerations in AIS may include an adolescent’s body-image, self-esteem, pain, and psychological distress associated with spinal deformity (3,9,10). The outcomes of interventions that address AIS must therefore be judged by whether an individual’s priorities and goals have been achieved. Such outcomes are best evaluated using patient-reported outcome measures (PROMs) of HRQL which integrate expectations, patient values and general health perceptions relevant to AIS (10,11).

The Scoliosis Research Society-22 revised (SRS-22r) questionnaire is the most widely used PROM to evaluate HRQL outcomes in AIS. Despite the widespread use of the SRS-22r, emerging literature has raised concerns regarding its developmental underpinnings and relevance to adolescent patients’ lived experience (12). Given the increased emphasis on validity of PROMs and desire to generate informative evidence to guide practice, there is a need to critically examine whether the SRS-22r aligns with current standards of PROM development. This narrative review aims to synthesize available psychometric evidence and assess the instrument’s development, sensibility, and measurement properties, with attention to content relevance for AIS populations (13). We present this article in accordance with the Narrative Review reporting checklist (available at https://jss.amegroups.com/article/view/10.21037/jss-25-54/rc).

Methods

In this narrative review, we searched available literature using a combination of Medical Subject Headings (MeSH) terms summarized in Table 1. The literature review was performed between November 2024 and January 2025. Additional references were identified from bibliometric review of included studies. For the purposes of this study, we focused on identification of articles reporting psychometric property testing or SRS-22r development or evaluation in AIS patients. We included studies reported in the English language and therefore focused on the SRS-22r English version. We included additional studies if they involved item generation or reduction of earlier SRS questionnaire versions that preceded the SRS-22r. Searching was conducted by AKM, and additional studies were discussed with senior authors DEL and UGN. Given the heterogeneity of anticipated measurement property testing across psychometric domains, we intended from the outset to provide a narrative summary rather than any quantitative evidence synthesis.

Table 1

Search strategy summary

Items	Specification
Date of search	Search was initially conducted in November, 2024; additional studies were reviewed until January, 2025 from identified bibliographies and databases
Databases and other sources searched	MEDLINE and Embase; supplemented with citation review from included articles
Search terms used	Search terms included three concepts that were combined:
	• Concept 1: terms such as “scoliosis”, “deformity”, “curve”, “adolescent idiopathic scoliosis”
	• Concept 2: “scoliosis research society”, “SRS-22”, “SRS”, “outcome measure”, “SRS-22r”
	• Concept 3: “measurement”, “psychometric”, “reliability”, “validity”, “sensibility”, “item generation”, “item reduction”, “responsiveness”
Timeframe	Articles identified from database inception to January 2025
Inclusion criteria	English language studies pertaining to psychometric evaluation of the SRS-22r for adolescent idiopathic scoliosis patients. We also included studies of the item generation and reduction that ultimately led to the SRS-22r instrument
Selection process	A.K.M. conducted the search and senior authors U.G.N. and D.E.L. supplemented and reviewed search results for additional content where applicable

SRS, Scoliosis Research Society; SRS-22r, Scoliosis Research Society-22 revised.

Key content and findings

Overview of the SRS-22r

The SRS-22r comprises 22 items spanning 5 domains: function/activity, pain, self-perceived image, mental health, and satisfaction. Higher scores reflect better HRQL (13). This measure has been revised several times since its introduction in 1999 (originally 24 items/7 domains) (11,14).

Development of the SRS-22r: item generation, item reduction

The original motivation to develop an instrument for AIS was to measure patient-reported HRQL outcomes for disease monitoring and evaluation of the effectiveness of surgical deformity correction beyond just radiographic outcomes such as percent curve correction (11). There was also a lack of standardized methods to collect patient satisfaction following treatment of AIS, leading to heterogeneous studies with limited interpretability (15). In 1999, the item generation and reduction phases culminated in the development of the original SRS instrument (SRS-I), which was intended to measure the overall construct of HRQL in patients undergoing deformity surgery for AIS.

The methodology for item generation for the original SRS-I has not been clearly described (11). Item generation for a new condition-specific PROM should ideally include a systematic review of the literature and the development of a conceptual framework for the target construct in order to ascertain patient priorities, typically using qualitative methods, such as interviews or focus groups with patients in various stages of their condition as well as with other stakeholders (10,16). The authors of the original SRS stated that candidate items were identified from two previously published studies. One study focused on retrospectively collected surgical outcomes from adults with scoliosis, of whom 80% had idiopathic scoliosis (mean age of scoliosis diagnosis was 15 years but mean time from diagnosis to surgery was 27 years). The second study focused on outcomes from patients with low back pain and incorporated qualitative interviews with patients and providers (17,18). Neither study directly involved AIS patients to inform item generation. The authors of the original SRS explicitly added items associated with self-image based on their own clinical expertise, recognizing this was an important contributor to HRQL for AIS. The item generation phase resulted in 55 candidate items spanning the following key areas with a combination of 5-point Likert scales and binary response options:

The amount of pain endured by the patient;
Functional status of the patient;
Self-image perception and sense of attractiveness to others;
Degree of patient satisfaction with treatment.

After prospective administration of the complete set of candidate items to 108 AIS patients managed across 3 independent centers over a 2-year period, authors removed 30 items due to redundancy and high non-response rates. The final 25 items underwent factor analysis to organize these items into meaningful subscales. Factor analysis revealed a latent structure that explained 71% of the observed variance. One item related to sexual activity was an outlier, which the authors felt did not map to the concept of HRQL for AIS patients undergoing surgery and was therefore dropped yielding 24 items with 7 domains: pain, general self-image, satisfaction, post-operative self-image, general function, overall activity level and post-operative function. There was no clear description about the methodology used for individual item weighting, but each 5-point item was scored on a scale from 1 (worst) to 5 (best) points, or 1 (worst) or 5 (best) for binary items. Notably, the authors did not describe involvement of patients with AIS during the generation of the content, removal of items, phrasing, scoring and structure of the instrument.

Modifications leading to the contemporary SRS-22r

Initial feedback following pilot testing raised concerns regarding internal consistency of items within sub-domains as well as unknown test-retest reliability, criterion validity and discriminant validity (19). Item 23 was deleted given its contribution to low internal consistency (self-image item comparing current perception to pre-treatment perception) and items 16–21 were removed given dependency on recall of pre-treatment condition rather than assessment of current condition. The original 7 domains were combined into 4 broader categories and an additional mental health domain was added [items adapted from SF-36 mental health component], totaling 5 domains, resulting in the SRS-23 (modified SRS) (20). Concurrent validity was demonstrated through comparisons with the SF-36. Shortly thereafter, the measure was reduced by an additional item, which was removed again for low internal consistency (self-image domain) (21). A final refinement of the SRS-22 occurred in 2006, which was prompted by discovery of low internal consistency among patients aged <18 years for item 18 during validation studies in Spanish and Turkish languages (14,22-25). Revision of item 18 question stem language and response options yielded the SRS-22r (Figure 1).

Figure 1 Schematic overviewing the SRS-22r questionnaire development (11,14,20,21). SRS, Scoliosis Research Society; SRS-22r, Scoliosis Research Society-22 revised.

The 22 items of the SRS-22r (each scored from 1 to 5) are distributed across four domains consisting of 5 items each (score range: 5 to 25) and a fifth domain with only 2 items (score range: 2 to 10). These domains are function/activity, pain, self-perceived image, mental health, and satisfaction. The global score is the sum of the scores of all items of the SRS-22r (range: 22 to 110).
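The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes that responses have already been grouped by domain (the item-to-domain key is not reproduced in this review), and the function and variable names are hypothetical. Handling of missing items is omitted. The global mean is included because the global rating is also reported as a mean of all items.

```python
from statistics import mean

# Number of items per domain, as described in the text.
DOMAIN_SIZES = {
    "function": 5,
    "pain": 5,
    "self_image": 5,
    "mental_health": 5,
    "satisfaction": 2,
}


def score_srs22r(responses):
    """Return domain sums, global sum and global mean for one respondent.

    `responses` maps each domain name to its list of item scores (1 to 5).
    This is an illustrative helper, not an official scoring routine.
    """
    domain_sums = {}
    all_items = []
    for domain, n_items in DOMAIN_SIZES.items():
        items = responses[domain]
        if len(items) != n_items:
            raise ValueError(f"{domain}: expected {n_items} items, got {len(items)}")
        if any(not 1 <= x <= 5 for x in items):
            raise ValueError(f"{domain}: item scores must lie between 1 and 5")
        domain_sums[domain] = sum(items)
        all_items.extend(items)
    return {
        "domain_sums": domain_sums,
        "global_sum": sum(all_items),    # possible range: 22 to 110
        "global_mean": mean(all_items),  # possible range: 1 to 5
    }


# Hypothetical respondent, for illustration only.
example = {
    "function": [4, 5, 4, 5, 4],
    "pain": [3, 4, 4, 3, 4],
    "self_image": [3, 3, 4, 3, 3],
    "mental_health": [4, 4, 5, 4, 4],
    "satisfaction": [5, 4],
}
result = score_srs22r(example)
print(result["domain_sums"], result["global_sum"], round(result["global_mean"], 2))
```

Keeping the domain sums alongside the global score makes it straightforward to report a subtotal that leaves out the satisfaction domain when that is preferred.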

Sensibility assessment of the SRS-22r

Sensibility refers to the suitability of the outcome measure for the target population. Sensibility of the SRS-22r was evaluated using Feinstein’s framework, which includes assessment of (I) purpose and framework; (II) overt format; (III) face validity; (IV) content validity; and (V) ease of usage (26-28).

Purpose and framework

The SRS developers clearly intended to construct an outcome measure of HRQL for patients with AIS. The concept of HRQL was based on an implicit understanding of HRQL derived from clinical experience, rather than an a priori conceptual framework of HRQL of AIS. This limitation could lead to potentially incomplete or inaccurate representation of HRQL for this population.

Format and face validity

Face validity refers to whether an instrument appears to be relevant and suitable to measure what it is meant to measure. In the absence of any documented input from patients with AIS, it is difficult to assess how well the domains of the SRS-22r capture the priorities of this population. While the domains make sense, it is unclear how well the items themselves might be aligned with concerns or goals of respondents. We could not find documentation in the literature describing whether the phrasing of the instructions and items was ever evaluated by adolescents with AIS for their comprehension, recognizing the differences between early and late adolescence with respect to reading comprehension and interpretation. The language utilized throughout the SRS-22r has undergone modifications based on psychometric measurement property assessments and has been translated into 12 languages with variable measurement property assessment quality in each translated language (14,21,29). Despite this, some of the language used is medical/technical (e.g., “trunk”, “extremities” and “narcotics”) or uses idioms such as “down in the dumps”, “down hearted and blue”. The time horizon considered for each item is also variable, ranging from “current” to “last month” or “past 6 months”.

The response categories for specific items utilize Likert-like scale response options from 1 to 5. The global rating is reported either as a sum of all items (minimum 22 points, maximum 110 points) or as a mean of all items (from 1 to 5). All items are equally weighted, so four domains (each with 5 items) contribute equally to the overall score, with the fifth domain consisting of only two items focusing on satisfaction following surgery. Since each item contributes equally to the global score, the contribution of each domain to the global score is a function of the number of items in that domain, which may not correspond to the relative importance of each domain.
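As a worked illustration of this point, the implicit weight of each domain in the global score can be written as its share of the 22 equally weighted items. The symbols $w_d$ and $n_d$ are introduced here for illustration and do not appear in the SRS-22r documentation.

```latex
w_d = \frac{n_d}{\sum_{d'} n_{d'}} = \frac{n_d}{22},
\qquad
w_{\text{function}} = w_{\text{pain}} = w_{\text{self-image}} = w_{\text{mental health}} = \frac{5}{22} \approx 0.227,
\qquad
w_{\text{satisfaction}} = \frac{2}{22} \approx 0.091
```

Under equal item weighting, each 5-item domain therefore carries 2.5 times the influence of the satisfaction domain on the global score, regardless of how respondents themselves would rank the domains.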

Content validity

Content validity refers to whether the domains and items are relevant and important to the target respondents, and whether the instrument omits important content or contains redundant items. The content must be derived from (or at least be validated by) the population of interest and may include the perspectives of the experts who care for them.

The SRS-22r domains include function, pain, self-image, mental health, and satisfaction with management. Although these were derived from the literature (not directly related to AIS) and from the perspectives of clinician experts, the chosen domains seem appropriate for AIS, and the use of multiple mutually exclusive domains reflects a recognition of the inherent complexity of measuring HRQL. The inclusion of self-image and perception of one’s appearance highlights an essential component of HRQL for adolescents with spinal deformity (30). However, it is essential to confirm that the individual items within each domain adequately capture the perspective of patients with AIS. There is a significant emphasis on pain, which might reflect the original source of the content from adults and populations with low back pain (18). A potential shortcoming of the SRS-22r is that the items were not derived from patients with AIS. Further, in the literature on the development of the original SRS-I and its subsequent revisions, we could not find any documentation that patients with AIS were asked whether the items and domains in the instrument were relevant or important to them, or what was missing. This is corroborated by a recent qualitative study involving in-depth semi-structured interviews of 11 adolescents with AIS, which identified many areas of concern within the themes of physical, activity-related, psychological and social effects, and found only a weak association between items of the SRS-22r and these issues (27). The lack of items capturing some of these domains adequately, or at all, may undermine the content validity of the SRS-22r (12,31).

The inclusion of a domain on satisfaction with treatment within a HRQL instrument for AIS has been questioned, as patient satisfaction may be influenced by various factors not directly tied to HRQL. It is well-recognized that patients can be satisfied even when their outcomes are poor, since patient satisfaction is influenced by many factors besides the actual outcome (32,33). Moreover, these items do not apply at baseline (prior to treatment). It is unclear how change from pre- to post-treatment scores is reported. If patient satisfaction with treatment (surgery) is important to measure, this should be done separately from the measurement of HRQL, which should be limited to the perceived impact of AIS on the respondent’s HRQL at any given time. The actual change in the HRQL following treatment is more important than the subjective and often biased attribution of that change to the treatment, which is what is measured by questions on satisfaction with treatment. It is important to note that subtotal scores excluding this domain have been proposed as a means to better isolate specific HRQL constructs (22,34,35). Further, use of pre-treatment phrasing may be appropriate in both surgical and non-surgical contexts provided that validity is demonstrated.

Feasibility: ease of use

The SRS-22r is intended to be self-administered but has also been shown to be reliable when completed over the phone (36). The SRS-22r questionnaire includes standardized instructions and procedures, which are clearly displayed. The ease of use of the SRS-22r, including the clarity of the instructions, clarity of the questions, appropriateness of the items, suitability of the response options or the burden of response, has not been formally evaluated by target respondents. These are necessary to ascertain as it is possible that adolescents of different ages and developmental stages may interpret questions differently, which may affect reliability.

Measurement properties

Reliability & internal consistency

Reliability is a fundamental property for any outcome measure that ensures that a PROM would generate similar responses with repeated administrations between which there has been no change. Reliability is a prerequisite for validity and responsiveness. If an instrument is not reliable, any change detected over repeated measures could be wrongly attributed to a real change rather than the “change” being a product of random variation or measurement error. In this context, reliability is established by comparing responses of respondents at two time points (test-retest reliability) during which time no change is expected, and to determine whether the responses at both time points are similar. Test-retest reliability is quantified using the intra-class correlation coefficient (ICC) when response options are continuous or by weighted Kappa if the response options are ordinal.
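To make the test-retest calculation concrete, the following is a minimal sketch of one common ICC form, the two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1). The choice of this form is an assumption made for illustration; the studies discussed in this review may have used a different ICC model, and the function name and data are hypothetical.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is a list of rows, one per respondent; each row holds that
    respondent's score at every administration, e.g. [test, retest].
    """
    n = len(scores)      # number of respondents
    k = len(scores[0])   # number of administrations
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest scores for three respondents.
identical = [[1, 1], [2, 2], [3, 3]]   # retest reproduces the test exactly
shifted = [[1, 2], [2, 3], [3, 4]]     # every retest score is one point higher
print(icc_absolute_agreement(identical), icc_absolute_agreement(shifted))
```

In the second data set the ranking of respondents is perfectly preserved, yet the coefficient drops below 1 because an absolute-agreement ICC penalizes the systematic shift between administrations.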

Internal consistency is a form of reliability and represents the degree to which items within a multi-item scale relate to each other. This is typically quantified using Cronbach’s alpha.
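As an illustration, Cronbach’s alpha for a scale of k items can be computed as k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below uses only the Python standard library; the function name and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a multi-item scale.

    `item_scores` is a list of rows, one per respondent, with one column
    per item in the scale (for example, the 5 items of one domain).
    """
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that move together perfectly, and
# two items that are unrelated to one another.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
unrelated = [[1, 2], [2, 1], [1, 1], [2, 2]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

The two toy data sets mark the ends of the usual range: items that rise and fall together give an alpha of 1, and unrelated items give an alpha of 0.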

In a cross-sectional study involving adolescents (age 8–18 years) with scoliosis (mean largest Cobb angle of 29.8°±12.3°; range, 10° to 66°), 70 patients completed the initial assessment of the SRS-22r in the out-patient clinic, 54 of whom completed the retest assessment, mailed to them 1 week after the initial clinic visit (13). Forty-eight of these patients had unoperated AIS, 3 with AIS had completed surgery, and the remainder had a combination of congenital, syndromic and neuromuscular deformities. The ICC was determined to be 0.73 for the global score. Domain-specific ICC scores ranged from 0.56 (satisfaction domain) to 0.80 (pain). Based on three domains of the SRS-22r with ICC ≥0.75, the authors concluded there was excellent test-retest reliability. Domain-specific Cronbach’s alpha measures ranged from 0.71 (self-image) to 0.93 (satisfaction). A Cronbach’s alpha greater than 0.9 may suggest redundancy of items. The average Cronbach’s alpha was 0.81.

Although this SRS-22r reliability study included a mixed population of AIS patients as well as those with other etiologies, the authors did report results separately for the subgroup of 48 unoperated AIS patients. The test-retest ICC was 0.71 (ranging from 0.48 to 0.80) and Cronbach’s alpha was 0.79 (ranging from 0.75 to 0.83). This provides evidence of acceptable test-retest reliability and internal consistency of the SRS-22r. These results also demonstrated an improvement of the internal consistency over the original SRS (24-item) scale, providing empirical evidence in support of the decision to modify the original SRS-24 to SRS-22r (10). A recent study highlighted low reliability of the function domain relative to other sub-domains of the SRS-22r, emphasizing caution when utilizing the full SRS-22r score, rather than assessing each sub-component individually (35).

Validity

Validity of a PROM is the degree to which the instrument actually measures the construct it is intended to measure. There are many types of validity.

Construct validity

Construct validity is evaluated by testing hypotheses of logical relationships that should exist between a measure and characteristics of patients or patient groups; these hypotheses are often informed by domain knowledge and known associations (37-39).

Factor analysis was used in the derivation of the original 24-item outcome measure which yielded 7 distinct original domains (11). This original study used known-group comparisons to assess construct validity, hypothesizing higher SRS scores among healthy controls without scoliosis compared with patients with scoliosis. They found significantly higher SRS scores across all domains in healthy controls compared with the scoliosis group, confirming their hypothesis that the SRS (24-item) would be able to discriminate between known groups (with and without scoliosis).

A study assessed structural construct validity in a cohort of AIS patients operated previously by a single surgeon at a single institution through mailed assessments (40). Of the 235 surveyed patients, 121 (51%) completed and returned the questionnaire. An iterated principal component factor analysis was performed to assess the structure of the item pool. The factor analysis revealed that three domains contributed to the majority of observed variance, namely pain, mental health, and self-image. There were no pre-specified hypotheses, though the implicit hypothesis was domain convergence with the previously identified domains outlined in the original SRS-24, which was what they found.

Asher et al. assessed the discriminant validity of the SRS-22r based on patient curve magnitude and treatment modality. Their study involved three groups of patients aged 10–16 years: a control group without scoliosis (Cobb angle <10 degrees), patients with scoliosis undergoing non-surgical treatment (bracing or observation alone), and presurgical scoliosis patients. They demonstrated that the SRS-22r did not discriminate between non-surgical patients with average scoliosis Cobb angles of 27 degrees and controls without scoliosis (23). However, the SRS-22r did discriminate non-surgical and control patients from pre-surgical patients, with lower (worse) pain and self-image domain scores in the latter. There was a negative correlation between the global SRS scores and magnitude of spinal deformity as measured by the Cobb angle and angle of trunk inclination [Pearson correlation coefficient (r): −0.48 and −0.39, respectively].

Concurrent construct validity of the SRS-22r was assessed in a cross-sectional study of a cohort of AIS patients aged 10–17 years treated at a single academic center (41). In this study, 200 participants (across observation, bracing, pre-surgical and post-surgical groups) completed the SRS-22r concurrently along with 8 modules of the Patient-Reported Outcomes Measurement Information System (PROMIS) computerized adaptive testing (CAT) measure (PROMIS-CAT: physical activity, mobility, anxiety, depressive symptoms, peer relationships, physical stress experiences, pain behaviour and pain interference) (42). Known-group comparisons were used to assess construct validity and convergent validity between PROMIS domains and SRS-22r domains. All tested domains were moderately to strongly correlated with the exception of the physical domain (PROMIS) to function domain (SRS-22r) and peer relationships domain (PROMIS) to the self-image domain (SRS-22r), which were weakly correlated.

Responsiveness

Responsiveness is the ability of an outcome measure to detect change in a construct when change has occurred. Floor and ceiling effects describe the ability to distinguish respondents at extremes of a measurement scale, which is an important consideration for responsiveness (43). Floor and ceiling effects of SRS-22r were measured in early research, in which adolescents more frequently reported ceiling responses (20–44% frequency across domains) compared to older patients. Floor responses were not as commonly encountered, with only a 5.8% occurrence of floor responses in the satisfaction domain among adolescents (14). A secondary study identified only floor effects for self-image (<7%) and satisfaction (<12%), while pain and satisfaction exhibited moderate (>20%) ceiling effects (34). These floor and ceiling effects may stem from the lack of adolescent involvement in the item generation phase as well as limited input regarding item phrasing.
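Floor and ceiling frequencies of the kind reported above are simply the proportions of respondents who score at the lowest or highest possible value of a scale. A minimal sketch follows; the function name and the example data are hypothetical and are not drawn from the cited studies.

```python
def floor_ceiling_rates(scores, minimum, maximum):
    """Return (floor_rate, ceiling_rate) as proportions of respondents.

    `scores` is a list of domain (or global) scores, and `minimum` and
    `maximum` are the lowest and highest scores the scale allows.
    """
    n = len(scores)
    floor_rate = sum(1 for s in scores if s == minimum) / n
    ceiling_rate = sum(1 for s in scores if s == maximum) / n
    return floor_rate, ceiling_rate


# Hypothetical satisfaction-domain sums (2 items, possible range 2 to 10).
satisfaction_sums = [10, 10, 8, 2, 9, 10, 7, 10, 6, 10]
floor_rate, ceiling_rate = floor_ceiling_rates(satisfaction_sums, 2, 10)
print(f"floor: {floor_rate:.0%}, ceiling: {ceiling_rate:.0%}")
```

A respondent already at the ceiling cannot register improvement on that domain, which is why high ceiling frequencies bear directly on responsiveness.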

Several studies have examined longitudinal changes in HRQL using the SRS-22r in the context of known or presumed efficacy, where radiographic deformity correction is explicitly measured and HRQL is compared across these change groups. In 2003, Asher et al. utilized the SRS-22 in a consecutive prospective observational case series of 58 AIS patients undergoing instrumented spinal fusion (mean Cobb angle 63°) (22). Patients were assessed at pre-operative baseline and then at 3, 6, 12 and 24 months post-operatively. Paired t-tests with adjustment for multiple comparisons revealed significant improvements of self-image throughout all follow-up intervals. Function decreased significantly at 3 months but normalized from 6 months onwards. Similarly, pain scores were worse than pre-operative baseline at 3 months but improved compared to baseline by later follow-up periods. Taken together, they concluded the SRS-22 questionnaire was responsive to change, but most useful beyond 3 months.

In a prospective multi-center study of AIS patients, the pain-specific SRS-22r subscale was used to assess change in pain from the pre-operative to the post-operative period (n=505) (44). Patients with painful scoliosis improved from a mean SRS-22r pain score of 3.29 to 4.03 after surgery; there was greater improvement in overall SRS-22r composite score at 2 years for patients with painful scoliosis compared to those without pain. Another prospective evaluation of HRQL for AIS patients following treatment (surgery only) demonstrated significant improvement in post-surgical HRQL scores at 12 months following spine fusion (45). A retrospective multicenter database study corroborated these findings, suggesting that among 99 patients undergoing posterior spinal fusion for AIS, there was significant improvement in total SRS-22r scores at 2 and 5 years (46). This notably correlated with a mean percentage of curve correction of 74%, from 51.7°±14.2° to 13.7°±6.3°. The data do suggest measurable differences in patient-reported HRQL following AIS deformity correction. However, the lack of a pre-specified expected effect size makes it unclear whether the significant change measured exceeded the minimal clinically important difference (MCID). Study designs could be improved through use of a general change anchor, such as an added item asking patients about the overall change they experienced in their scoliosis-associated quality of life.

Interpreting change is an important aspect of measurement property assessment because it can inform clinicians about what magnitude of response matters after intervention. The minimum detectable measurement difference (MDMD) is the smallest difference that exceeds measurement error, whereas the MCID represents the smallest difference that is clinically relevant. The MDMD must be smaller than the MCID for a usable MCID threshold. A multicenter retrospective registry study of AIS patients (n=1,281) treated with surgery was used to estimate the MDMD (47). Standardized response means were used to determine responsiveness, and the MDMD was estimated for the overall score and domain sub-scores. The resulting MDMD values ranged from 0.23 to 0.31; the pain-specific MDMD was 0.3, which exceeded a previously estimated MCID threshold of 0.2 for this sub-domain; the activity domain MDMD was 0.24, which also exceeded the previously estimated value of 0.08 (48). This work suggests that interpretability remains a challenge using the SRS-22r, whereby previously proposed MCID thresholds are exceeded by the MDMD, suggesting the latter may be more relevant threshold values.
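The relationship between these quantities can be sketched in code. The standardized response mean is the mean change divided by the standard deviation of change. For the detectable-difference threshold, the sketch assumes a common distribution-based formulation built on the standard error of measurement (SD × √(1 − ICC), scaled by 1.96 × √2); this is an assumption for illustration and not necessarily the method used in the cited registry study. All function names and the patient-level data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the paired changes divided by their standard deviation."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def minimal_detectable_change(baseline_sd, icc, z=1.96):
    """Distribution-based threshold (assumed formulation, see lead-in).

    SEM = SD * sqrt(1 - ICC); threshold = z * sqrt(2) * SEM.
    """
    sem = baseline_sd * sqrt(1 - icc)
    return z * sqrt(2) * sem


def mcid_is_usable(detectable_difference, mcid):
    """An MCID is only usable if it exceeds the detectable difference."""
    return mcid > detectable_difference


# Hypothetical mean-scale (1 to 5) domain scores for four patients.
pre_op = [3.0, 3.5, 4.0, 3.0]
post_op = [3.5, 4.5, 4.5, 4.0]
srm = standardized_response_mean(pre_op, post_op)
mdc = minimal_detectable_change(baseline_sd=0.5, icc=0.75)

# Thresholds quoted in the text for the pain and activity domains.
pain_usable = mcid_is_usable(detectable_difference=0.3, mcid=0.2)
activity_usable = mcid_is_usable(detectable_difference=0.24, mcid=0.08)
print(round(srm, 2), round(mdc, 2), pain_usable, activity_usable)
```

With the values quoted in the text, both comparisons return False, which restates the interpretability problem: the previously proposed MCID thresholds sit below what the instrument can distinguish from measurement error.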

Table 2 summarizes key psychometric terminology used throughout and the corresponding evidence for each concept.

Table 2

Psychometric property descriptions with list of citations used to write each corresponding review section

Psychometric property	Definition	References used
Sensibility	Sensibility is the “common sense” of an instrument. This represents an aggregate of properties such as face and content validity, format and ease of usage that collectively assess the suitability of an outcome measure for the target population	(23-25)
Face validity	Face validity is the extent to which the instrument appears to measure the target construct. This judgement is usually made by users and experts	(13,22,26)
Content validity	Refers to whether the instrument captures all relevant domains and items important to the target construct. This is often based on review by respondents and experts	(27-29)
Reliability & internal consistency	A reliable instrument produces stable and consistent results over repeated measures, such that observed differences are attributable to changes in the underlying construct rather than measurement error. Internal consistency is a form of reliability and is the degree to which items in a multi-item scale relate to one another	(10,12)
Construct validity	Construct validity is assessed through hypotheses using known relationships or comparison to existing measures. This is the psychometric property assessing whether the instrument measures the intended theoretical construct	(11,33-39)
Responsiveness	The ability for an instrument to detect meaningful change over time when true clinical changes occur. This concept often includes floor and ceiling effects	(13,40-47)

Conclusions

We sought to critically appraise the widely used SRS-22r, focusing on the English version. The SRS questionnaires have facilitated decades of AIS research focused on PROMs and led to contributions that have informed modern operative and non-operative practice. A comprehensive list of strengths and weaknesses is provided in Table 3. We could not find any evidence that the perspectives and input of AIS patients were directly incorporated in the development of the SRS-I, including the item generation or reduction phases. This affects the face and content validity, comprehensibility and interpretation of the instrument. Absence of a conceptual framework also reduced the transparency of original item generation, which could have led to incomplete or inaccurate representation of HRQL for this population (49). Other limitations of the SRS-22r include floor and ceiling effects, which appear to be more pronounced in adolescent cohorts than in adults. This may limit the ability to detect change, specifically in the self-image domain.

Table 3

Summary of key strengths and weaknesses stemming from this critical appraisal of the SRS-22r

Strengths	Weaknesses
HRQL should be a multidimensional concept to measure. The SRS-22r includes multiple domains including function/activity, pain, self-perceived image, mental health and satisfaction; together this reflects the complexity of assessing HRQL and enhances the measure’s face validity	General lack of target patient population involvement throughout initial development phase and subsequent modifications. Despite face validity, there is potential that critical components of AIS-related HRQL may have been omitted. Construct and content validity may be compromised by this limitation
Acceptable internal consistency and test-retest reliability in most domains	Presence of floor and ceiling effects may limit the ability to detect change in some domains
Validation and psychometric testing have been performed across multiple languages using patients recruited from multiple cultural backgrounds	The inclusion of a satisfaction (with treatment) domain within a measure of HRQL is not appropriate. It has the potential to introduce biased responses. These items are not applicable at baseline prior to intervention, or for patients being followed over time (without treatment) for their natural history. Satisfaction with treatment is not a good measure of treatment effectiveness as it is influenced by many factors beyond treatment outcomes. Treatment effectiveness based on HRQL should be judged by a change in HRQL scores that exceeds an MCID threshold, which still needs to be defined. If patient satisfaction is an important domain, it should be measured separately from HRQL. Levels of satisfaction can then be used as an external anchor to evaluate the MCID of the HRQL measure
Responsive to change particularly in AIS patients that undergo surgical intervention	Difficulty distinguishing changes in patients with mild or moderate AIS undergoing conservative treatments. This may be a consequence of limited representation of this patient group involved in development of the instrument

AIS, adolescent idiopathic scoliosis; HRQL, health related quality of life; MCID, minimal clinically important difference; SRS-22r, Scoliosis Research Society-22 revised.

Conversely, there are notable strengths including extensive psychometric examination, multi-dimensional utility when sub-domains are reported independently, evidence of construct validity demonstrated by the correlation with different PROMIS domain scores, and the ability to discriminate between known groups of AIS patients. This was particularly true of patients undergoing surgical intervention, but less so in patients undergoing non-surgical treatment where the ability to detect differences compared to controls was limited (possibly related to satisfaction item susceptibility to affirmation bias for operated patients). A remaining challenge is the inability to distinguish patients with mild from moderate scoliosis, though distinguishing severe scoliosis is more feasible. Further work is also required to clarify clinically important differences to aid the interpretability of change scores.

Although the SRS-22r has undergone extensive psychometric evaluation and remains the current choice for assessing patient-reported outcomes of treatment for AIS patients, its limitations are important to outline for practitioners and researchers alike. These include the choice of items, response options, scoring, weak discriminant validity and problems with interpretability of change scores. There may be elements of AIS-related HRQL that are missing or inaccurately represented according to the target patient population. The SRS-22r remains the gold standard instrument to evaluate HRQL for AIS patients; however, the field could also benefit from the development of an instrument that is informed by the lived experiences of a representative spectrum of patients with AIS and their parents, with room for additional input from clinical experts.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the Narrative Review reporting checklist. Available at https://jss.amegroups.com/article/view/10.21037/jss-25-54/rc

Peer Review File: Available at https://jss.amegroups.com/article/view/10.21037/jss-25-54/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jss.amegroups.com/article/view/10.21037/jss-25-54/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Weinstein SL, Dolan LA, Cheng JC, et al. Adolescent idiopathic scoliosis. Lancet 2008;371:1527-37. [Crossref] [PubMed]
Stone LE, Upasani VV, Pahys JM, et al. SRS-22r Self-Image After Surgery for Adolescent Idiopathic Scoliosis at 10-year Follow-up. Spine (Phila Pa 1976) 2023;48:683-7. [Crossref] [PubMed]
Gallant JN, Morgan CD, Stoklosa JB, et al. Psychosocial Difficulties in Adolescent Idiopathic Scoliosis: Body Image, Eating Behaviors, and Mood Disorders. World Neurosurg 2018;116:421-432.e1. [Crossref] [PubMed]
Cheng JC, Castelein RM, Chu WC, et al. Adolescent idiopathic scoliosis. Nat Rev Dis Primers 2015;1:15030. [Crossref] [PubMed]
Agabegi SS, Kazemi N, Sturm PF, et al. Natural History of Adolescent Idiopathic Scoliosis in Skeletally Mature Patients: A Critical Review. J Am Acad Orthop Surg 2015;23:714-23. [Crossref] [PubMed]
Addai D, Zarkos J, Bowey AJ. Current concepts in the diagnosis and management of adolescent idiopathic scoliosis. Childs Nerv Syst 2020;36:1111-9. [Crossref] [PubMed]
Seleviciene V, Cesnaviciute A, Strukcinskiene B, et al. Physiotherapeutic Scoliosis-Specific Exercise Methodologies Used for Conservative Treatment of Adolescent Idiopathic Scoliosis, and Their Effectiveness: An Extended Literature Review of Current Research and Practice. Int J Environ Res Public Health 2022;19:9240. [Crossref] [PubMed]
Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985;38:27-36. [Crossref] [PubMed]
Théroux J, Le May S, Fortin C, et al. Prevalence and management of back pain in adolescent idiopathic scoliosis patients: A retrospective study. Pain Res Manag 2015;20:153-7. [Crossref] [PubMed]
Narayanan U, Wright J, Hedden D, et al. Concerns, desires & expectations of surgery for adolescent idiopathic scoliosis: a comparison of patients’, parents’ and surgeons’ perspectives. Orthop Procs 2008;90-B:81.
Haher TR, Gorup JM, Shin TM, et al. Results of the Scoliosis Research Society instrument for evaluation of surgical outcome in adolescent idiopathic scoliosis. A multicenter study of 244 patients. Spine (Phila Pa 1976) 1999;24:1435-40. [Crossref] [PubMed]
Alamrani S, Gardner A, Falla D, et al. Content validity of the Scoliosis Research Society questionnaire (SRS-22r): A qualitative concept elicitation study. PLoS One 2023;18:e0285538. [Crossref] [PubMed]
Glattes RC, Burton DC, Lai SM, et al. The reliability and concurrent validity of the Scoliosis Research Society-22r patient questionnaire compared with the Child Health Questionnaire-CF87 patient questionnaire for adolescent spinal deformity. Spine (Phila Pa 1976) 2007;32:1778-84. [Crossref] [PubMed]
Asher MA, Lai SM, Glattes RC, et al. Refinement of the SRS-22 Health-Related Quality of Life questionnaire Function domain. Spine (Phila Pa 1976) 2006;31:593-7. [Crossref] [PubMed]
Haher TR, Merola A, Zipnick RI, et al. Meta-analysis of surgical outcome in adolescent idiopathic scoliosis. A 35-year English literature review of 11,000 patients. Spine (Phila Pa 1976) 1995;20:1575-84. [Crossref] [PubMed]
Malhotra AK, He Y, Harrington EM, et al. Development of the cervical myelopathy severity index: a new patient reported outcome measure to quantify impairments and functional limitations. Spine J 2024;24:424-34. [Crossref] [PubMed]
Simmons ED Jr, Kowalski JM, Simmons EH. The results of surgical treatment for adult scoliosis. Spine (Phila Pa 1976) 1993;18:718-24. [Crossref] [PubMed]
Waddell G, Reilly S, Torsney B, et al. Assessment of the outcome of low back surgery. J Bone Joint Surg Br 1988;70:723-7. [Crossref] [PubMed]
White SF, Asher MA, Lai SM, et al. Patients' perceptions of overall function, pain, and appearance after primary posterior instrumentation and fusion for idiopathic scoliosis. Spine (Phila Pa 1976) 1999;24:1693-9; discussion 1699-700. [Crossref] [PubMed]
Asher MA, Min Lai S, Burton DC. Further development and validation of the Scoliosis Research Society (SRS) outcomes instrument. Spine (Phila Pa 1976) 2000;25:2381-6. [Crossref] [PubMed]
Asher M, Min Lai S, Burton D, et al. The reliability and concurrent validity of the scoliosis research society-22 patient questionnaire for idiopathic scoliosis. Spine (Phila Pa 1976) 2003;28:63-9. [Crossref] [PubMed]
Asher M, Min Lai S, Burton D, et al. Scoliosis research society-22 patient questionnaire: responsiveness to change associated with surgical treatment. Spine (Phila Pa 1976) 2003;28:70-3. [Crossref] [PubMed]
Asher M, Min Lai S, Burton D, et al. Discrimination validity of the scoliosis research society-22 patient questionnaire: relationship to idiopathic scoliosis curve pattern and curve size. Spine (Phila Pa 1976) 2003;28:74-8. [Crossref] [PubMed]
Bago J, Climent JM, Ey A, et al. The Spanish version of the SRS-22 patient questionnaire for idiopathic scoliosis: transcultural adaptation and reliability analysis. Spine (Phila Pa 1976) 2004;29:1676-80. [Crossref] [PubMed]
Alanay A, Cil A, Berk H, et al. Reliability and validity of adapted Turkish Version of Scoliosis Research Society-22 (SRS-22) questionnaire. Spine (Phila Pa 1976) 2005;30:2464-8. [Crossref] [PubMed]
Feinstein AR. Clinimetrics. New Haven: Yale University Press; 1987.
Bombardier C, Tugwell P. A methodological framework to develop and select indices for clinical trials: statistical and judgmental approaches. J Rheumatol 1982;9:753-7.
Buchbinder R, Goel V, Bombardier C, et al. Classification systems of soft tissue disorders of the neck and upper limb: do they satisfy methodological guidelines? J Clin Epidemiol 1996;49:141-9. [Crossref] [PubMed]
Scoliosis Research Society. SRS22r | Patient Outcome Questionnaires 2023 [cited 2024]. Available online: https://www.srs.org/Research/Patient-Outcome-Questionnaires
Belli G, Toselli S, Latessa PM, et al. Evaluation of Self-Perceived Body Image in Adolescents with Mild Idiopathic Scoliosis. Eur J Investig Health Psychol Educ 2022;12:319-33. [Crossref] [PubMed]
Tetreault TA, Garg S. Return to play following spine surgery. Front Pediatr 2023;11:1176563. [Crossref] [PubMed]
Tevis SE, Kennedy GD, Kent KC. Is There a Relationship Between Patient Satisfaction and Favorable Surgical Outcomes? Adv Surg 2015;49:221-33. [Crossref] [PubMed]
Lyu H, Wick EC, Housman M, et al. Patient satisfaction as a possible indicator of quality surgical care. JAMA Surg 2013;148:362-7. [Crossref] [PubMed]
Parent EC, Dang R, Hill D, et al. Score distribution of the scoliosis research society-22 questionnaire in subgroups of patients of all ages with idiopathic scoliosis. Spine (Phila Pa 1976) 2010;35:568-77. [Crossref] [PubMed]
Oeffinger DJ, Iwinski H, Talwalkar V, et al. Psychometric analysis and the implications for the use of the scoliosis research society questionnaire (SRS-22r English) for individuals with adolescent idiopathic scoliosis. N Am Spine Soc J 2024;19:100545. [Crossref] [PubMed]
Bokshan SL, Godzik J, Dalton J, et al. Reliability of the revised Scoliosis Research Society-22 and Oswestry Disability Index (ODI) questionnaires in adult spinal deformity when administered by telephone. Spine J 2016;16:1042-6. [Crossref] [PubMed]
Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010;19:539-49. [Crossref] [PubMed]
Streiner DL, Norman GR, Cairney J. Health Measurement Scales: A practical guide to their development and use. 5th edition. Oxford University Press; 2015.
Malhotra AK, Nathens AB, Shakil H, et al. Days at Home After Traumatic Brain Injury: Moving Beyond Mortality to Evaluate Patient-Centered Outcomes Using Population Health Data. Neurology 2024;103:e209904. [Crossref] [PubMed]
Lai SM, Asher MA, Burton DC, et al. Identification of Scoliosis Research Society-22r Health-Related Quality of Life questionnaire domains using factor analysis methodology. Spine (Phila Pa 1976) 2010;35:1236-40. [Crossref] [PubMed]
Mitchell SL, McLaughlin KH, Bachmann KR, et al. Construct Validity of Pediatric PROMIS Computerized Adaptive Testing Measures in Children With Adolescent Idiopathic Scoliosis. J Pediatr Orthop 2022;42:e720-6. [Crossref] [PubMed]
Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol 2010;63:1179-94. [Crossref] [PubMed]
Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34-42. [Crossref] [PubMed]
Djurasovic M, Glassman SD, Sucato DJ, et al. Improvement in Scoliosis Research Society-22R Pain Scores After Surgery for Adolescent Idiopathic Scoliosis. Spine (Phila Pa 1976) 2018;43:127-32. [Crossref] [PubMed]
Pellegrino LN, Avanzi O. Prospective evaluation of quality of life in adolescent idiopathic scoliosis before and after surgery. J Spinal Disord Tech 2014;27:409-14. [Crossref] [PubMed]
Hwang SW, Samdani AF, Marks M, et al. Five-year clinical and radiographic outcomes using pedicle screw only constructs in the treatment of adolescent idiopathic scoliosis. Eur Spine J 2013;22:1292-9. [Crossref] [PubMed]
Kelly MP, Lenke LG, Sponseller PD, et al. The minimum detectable measurement difference for the Scoliosis Research Society-22r in adolescent idiopathic scoliosis: a comparison with the minimum clinically important difference. Spine J 2019;19:1319-23. [Crossref] [PubMed]
Carreon LY, Sanders JO, Diab M, et al. The minimum clinically important difference in Scoliosis Research Society-22 Appearance, Activity, And Pain domains after surgical correction of adolescent idiopathic scoliosis. Spine (Phila Pa 1976) 2010;35:2079-83. [Crossref] [PubMed]
Staffini A, Fujita K, Svensson AK, et al. Statistical Methods for Item Reduction in a Representative Lifestyle Questionnaire: Pilot Questionnaire Study. Interact J Med Res 2022;11:e28692. [Crossref] [PubMed]

Cite this article as: Malhotra AK, Shakil H, Lozano CS, Karthikeyan V, Dermott JA, Wilson JR, Narayanan UG, Lebel DE. Measuring meaningful outcomes for adolescent idiopathic scoliosis: a narrative review and critical appraisal of the Scoliosis Research Society-22 revised (SRS-22r) instrument. J Spine Surg 2025;11(3):698-708. doi: 10.21037/jss-25-54
