Twenty-four month follow-up for reporting results of spinal implant studies: Is this guideline supported by the literature?
Abstract
Background
Traditionally, spine societies and journals have set guidelines requiring a minimum 24-month follow-up for reporting results of surgical implant studies. However, the basis for this particular time period is not clear. The purpose of this study was to analyze prospective spinal implant studies reporting data at multiple specific follow-up periods to determine if there were significant changes in the clinical outcome throughout the 24-month follow-up period.
Methods
A comprehensive literature search was conducted using PubMed and the FDA web site. Studies were evaluated to identify those meeting the inclusion criteria: at least 100 patients receiving a spinal implant, with data reported at multiple predefined postoperative time points for at least 24 months. Data recorded from each study included the number of patients, diagnoses, implant used, outcome measures, and results reported. The primary outcome data were analyzed to determine the amount of change in scores, with particular focus on the 6- and 24-month follow-up periods.
Results
Only 7 studies met the inclusion criteria. All 7 were FDA-regulated trials published since 1997. Six addressed the treatment of symptomatic disc degeneration and 1 involved patients with neurogenic claudication due to stenosis. The outcome measures varied, but pain and function were frequently assessed. In none of the studies was there a significant deterioration in results between the 6- and 24-month follow-up periods. In fact, the only changes during this interval were slight, not statistically significant, improvements, with the exception of 1 scale in 1 study, where a slight, not statistically significant, decrease in the extent of improvement on a physical function assessment was noted between 6 and 24 months. These results suggest a great deal of stability in mean scores on various outcome measures between 6 and 24 months in patients receiving spinal implants.
Conclusions
Although long-term follow-up is certainly desirable for any clinical outcome study, there appears to be no significant change in outcome measures between the 6-month and 24-month follow-ups. These results suggest that earlier dissemination of results may be appropriate without producing overly optimistic reports.
Keywords: Clinical outcome, Spinal implants, Prospective studies
Traditionally, spine societies and journals have set guidelines requiring a minimum 24-month follow-up for evaluating results of surgical implants. However, the basis for this guideline is not clear, although it is sometimes said to mirror the guidelines frequently employed by the Food and Drug Administration (FDA) for clinical trials evaluating implants. No documentation could be found providing the rationale for the 24-month follow-up requirement. This requirement typically dictates that a device must have been introduced approximately 4.5 years before data on its performance can be disseminated: 1 to 2 years to enroll an adequate number of patients, 2 years to follow them, and additional time to analyze the results and complete the abstract or manuscript submission and review process. With the rapidly increasing number of spinal treatments being developed and evaluated, there is a very high level of interest among clinicians and patients in receiving information on the outcomes of these emerging treatments as soon as possible.
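The approximately 4.5-year figure is a sum of phases, and the arithmetic can be made explicit with a short sketch. The 1.5-year enrollment value (midpoint of the 1- to 2-year range) and the 1-year allowance for analysis, submission, and review are assumptions chosen so that the 24-month total matches the 4.5 years cited; they are not values reported in the text. The same model shows what a 6-month reporting threshold would imply under those assumptions.

```python
# Hypothetical timeline model: the enrollment and analysis/review durations are
# illustrative assumptions, not figures reported in the studies reviewed.

def years_to_dissemination(enrollment_years: float,
                           follow_up_months: int,
                           analysis_review_years: float) -> float:
    """Return the approximate years from device introduction to publication."""
    return enrollment_years + follow_up_months / 12 + analysis_review_years


ENROLLMENT_YEARS = 1.5        # assumed midpoint of the 1- to 2-year range
ANALYSIS_REVIEW_YEARS = 1.0   # assumed; chosen so the 24-month total is 4.5 years

for months in (24, 6):
    total = years_to_dissemination(ENROLLMENT_YEARS, months, ANALYSIS_REVIEW_YEARS)
    print(f"{months}-month follow-up requirement: ~{total:.1f} years")
```

Under these assumptions, shortening the required follow-up from 24 to 6 months moves dissemination about 18 months earlier, since only the follow-up term changes.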
The purpose of this study was to analyze prospective studies in the literature that reported data at multiple specific time points following implantation of spinal devices, to determine whether there were significant changes in results during the 24-month follow-up.
Methods
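A comprehensive literature search was conducted using PubMed and submissions of Investigational Device Exemption (IDE) clinical trial data to the U.S. Food and Drug Administration (FDA). Only studies meeting the following criteria were included in the analysis:
- at least 100 patients receiving a spinal implant;
- data reported at multiple predefined postoperative time points; and
- follow-up of at least 24 months.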
Data recorded from each study included the number of patients, diagnoses, implant(s) used, outcome measures, and results. The data were reviewed with the primary focus being changes in the outcome measures over time, particularly the 6- and 24-month follow-up periods. In addition to clinical outcome, the other important factor in evaluating new technologies is safety. In this study, we reviewed the articles to determine if any change in the incidence of complications could be identified during the 24-month follow-up.
Results
Studies included in the review
Only 7 articles met the inclusion criteria for this study.1, 2, 3, 4, 5, 6, 7 The implants investigated were the BAK cage (Sulzer Spine-Tech, Minneapolis, Minnesota), Ray Threaded Fusion Cage (Surgical Dynamics, Norwalk, Connecticut), Brantigan I/F Cage (DePuy–Acromed Corp., Raynham, Massachusetts), InFUSE Bone Graft (Medtronic Sofamor Danek, Memphis, Tennessee), CHARITÉ Artificial Disc (DePuy Spine, Raynham, Massachusetts), ProDisc-L Total Disc Replacement (Synthes Spine, West Chester, Pennsylvania), and X-STOP (St. Francis Medical Technologies, Concord, California). All 7 studies were FDA-regulated multicenter clinical trials published since 1997.
Table 1 provides an overview of the studies included in this review. The number of patients enrolled ranged from 191 to 947. Symptomatic disc degeneration was the primary diagnosis in six of the studies.1, 2, 3, 4, 5, 7 In one study, patients were treated for neurogenic claudication due to stenosis.6 Four studies were randomized.1, 3, 6, 7 The treatments used in the control groups included anterior fusion using cages packed with autograft,1, 3 circumferential fusion,7 and nonoperative management.6
Table 1. Description of the studies included in the analysis
| Author | Implant | Diagnosis | Control | N (patients) | Primary outcome measures used |
|---|---|---|---|---|---|
| Kuslich4 | BAK cages | Symptomatic disc degeneration | None | 947 | Modified Prolo Scale |
| Ray5 | Ray threaded fusion cages | Symptomatic disc degeneration | None | 236 | Prolo Scale |
| Brantigan2 | Brantigan I/F cages | Symptomatic disc degeneration with failed discectomy | None | 221 | 5-point Likert scale for pain |
| Burkus3 | InFUSE bone graft in tapered cages | Symptomatic disc degeneration | Autogenous iliac crest bone graft in tapered cages | 143 InFUSE; 136 autograft | ODI, back pain, leg pain, patient satisfaction, work status |
| Blumenthal1 | CHARITÉ Artificial Disc | Symptomatic disc degeneration | ALIF (BAK cages with iliac crest bone graft) | 205 CHARITÉ; 99 fusion | VAS, ODI |
| Zigler7 and Synthes Spine11 (FDA website) | ProDisc-L Total Disc Replacement | Symptomatic disc degeneration | Combined anterior/posterior instrumented fusion | 162 ProDisc-L; 80 fusion | ODI, VAS, satisfaction |
| Zucherman6 | X-STOP | Neurogenic intermittent claudication due to stenosis | Nonoperative care | 100 X-STOP; 91 control | Zurich Claudication Questionnaire |
Clinical outcome
The outcome measures varied among the seven studies, but pain and/or function were generally the parameters used to assess results. The outcome of interest for this review was the change in scores between the 6- and 24-month follow-up periods. In none of the seven studies was there significant deterioration in results between these two periods. In fact, the only change between 6 and 24 months was a slight, not statistically significant, improvement, with the exception of a physical function assessment in the interspinous device study, where the percentage improvement decreased slightly between the two follow-up periods. Each study is reviewed in greater detail below.
In the Kuslich et al. study, pain was measured on a 1- to 6-point Modified Prolo Scale.4 There was a 0.4-point improvement in the mean scores between the 6- and 24-month follow-up visits (Fig. 1). In a later study of the same device, 4-year results were presented for a subgroup of patients.8 These longer-term data indicated no significant change between the 24- and 48-month mean scores.

Fig. 1.
Pain was measured on a 1–6 point Modified Prolo Scale. A slight improvement in pain scores was noted between 6- and 24-month follow-up periods.
(Adapted with permission from Kuslich et al.4)
In the Ray study, results were reported in terms of the percentage of patients with excellent, good, or fair results at various follow-up periods.5 The percentage of patients classified as having a favorable outcome increased 14% on the pain assessment and 8% on the functional assessment between 6 and 24 months (Fig. 2).

Fig. 2.
Pain and function were assessed using Prolo scales and results reported as the percentage of patients classified as having excellent, good, or fair outcome. The percentage of patients increased 14% on the pain assessment and 8% on the functional assessment between 6 and 24 months.
(Graph generated from data published by Ray.5)
In the Brantigan et al. study, pain was assessed using a 1- to 5-point Likert scale, with greater scores indicating less pain.2 The mean score improved by only 2.7% between the 6- and 24-month follow-up visits (Fig. 3).

Fig. 3.
Pain scores were assessed on 1–5 point Likert Scale. Greater scores indicate less pain. There was a slight improvement (2.7%) in mean pain scores between the 6- and 24-month follow-up visits.
(Graph generated from data published by Brantigan et al.2)
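Because the instruments differ in direction (on the Brantigan Likert scale greater scores indicate less pain, whereas on the ODI and VAS lower scores are better), a change between visits has to be signed by scale direction before it can be read as improvement or deterioration. The text does not state how the percentage changes in this section were computed; the sketch below assumes the change is expressed relative to the 6-month mean, and the example scores are hypothetical rather than taken from any of the studies.

```python
from dataclasses import dataclass


@dataclass
class ScaleChange:
    six_month_mean: float
    twenty_four_month_mean: float
    higher_is_better: bool  # True for the Brantigan pain Likert scale, False for ODI/VAS

    def improvement(self) -> float:
        """Absolute change, signed so that a positive value means improvement."""
        delta = self.twenty_four_month_mean - self.six_month_mean
        return delta if self.higher_is_better else -delta

    def percent_improvement(self) -> float:
        """Improvement relative to the 6-month mean (an assumed convention)."""
        return 100 * self.improvement() / self.six_month_mean


# Hypothetical means, for illustration only.
odi = ScaleChange(six_month_mean=30.0, twenty_four_month_mean=28.5, higher_is_better=False)
likert = ScaleChange(six_month_mean=4.0, twenty_four_month_mean=4.1, higher_is_better=True)

for name, scale in (("ODI", odi), ("Likert pain", likert)):
    print(f"{name}: {scale.improvement():+.2f} points, {scale.percent_improvement():+.1f}%")
```

A falling ODI and a rising Likert score both come out as positive improvements here; a different baseline convention (for example, change relative to the preoperative score) would give different percentages from the same means.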
In the study investigating the use of rhBMP-2 in tapered cages, back and leg pain were assessed separately, each on a 20-point scale evaluating pain intensity and duration.3 Pain and Oswestry scores stabilized after 6 months (Fig. 4). The mean back pain scores improved slightly (1.3 in the treatment group and 1.0 in the control group on a scale of 0 to 20) between the 6- and 24-month follow-up visits. The leg pain scores did not change between these 2 follow-up periods. Oswestry scores improved by only 5.4 points in the BMP group and 5.6 points in the control group (on a scale of 0 to 100). This study also reported the percentage of patients working at the various follow-up periods. At 6 months, 50.7% of the BMP group and 45.5% of the control group were working. At 24 months, these figures improved to 66.1% and 56.1%, respectively. A recent study reporting 6-year outcomes in this group found no significant changes in outcome scores between the 2-year and 6-year follow-up.9 The authors stated that improvements in mean scores noted at 6 weeks were maintained at the 6-year follow-up. Data specifically comparing the 6-month and 6-year results were not available from the paper.

Fig. 4.
(A) The only changes in the mean back pain scores between the 6- and 24-month follow-up were improvements of 1.3 (on a 0 to 20 scale) in the BMP group and 1.0 in the control group. (B) There were no changes in the mean leg pain scores between the 6- and 24-month follow-up in either the investigational or control group. (C) Mean Oswestry scores stabilized in both groups 6 months after surgery; scores improved 5.4 points in the BMP group and 5.6 points in the control group between 6 and 24 months.
(Graphs generated from data published by Burkus et al.3)
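The work-status figures reported in the rhBMP-2 trial illustrate why it matters whether a change in a proportion is expressed in percentage points or as a relative change. The short sketch below computes both from the 6- and 24-month percentages reported for the BMP and control groups; the function name is illustrative.

```python
def proportion_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = after_pct - before_pct
    relative = 100 * points / before_pct
    return points, relative


# Percentage of patients working at 6 and 24 months, as reported in the rhBMP-2 trial.
working = {"BMP": (50.7, 66.1), "Control": (45.5, 56.1)}

for group, (six_month, twenty_four_month) in working.items():
    points, relative = proportion_change(six_month, twenty_four_month)
    print(f"{group}: +{points:.1f} percentage points ({relative:.1f}% relative increase)")
```

By this arithmetic, the BMP group gained about 15.4 percentage points (roughly a 30% relative increase) and the control group about 10.6 points (roughly 23%). The same distinction applies to the 14% and 8% gains reported for the Ray study, which the text does not label as either.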
In the Blumenthal et al. study, visual analogue scale (VAS) and Oswestry (ODI) scores were stable between 6 and 24 months in both the total disc replacement (TDR) and fusion groups (Fig. 5).1 In the TDR group, the mean VAS score improved only 1.9% and the Oswestry scores improved 1.2% between the 6- and 24-month follow-up periods. The fusion group had slightly greater improvements during this time frame with the VAS scores improving 6.4% and the mean Oswestry scores improving 5.3%. In a recent report on the 5-year follow-up of patients enrolled in this study, no changes in outcome were found in the treatment or control group between the 24-month and 60-month follow-up scores on the Oswestry or VAS.10

Fig. 5.
(A) In the TDR group, there was a 1.2% change between the 6-month and 24-month Oswestry scores. In the fusion group, the scores improved 5.3% between these two time periods. (B) In the TDR group, there was a 1.9% change between the 6-month and 24-month VAS pain scores. In the fusion group, the scores showed an improvement of 6.4% between these follow-up periods.
(Graphs generated from data published by Blumenthal et al.1)
In the FDA submission data for the other TDR trial, the mean Oswestry scores improved by approximately 4% in both the TDR and fusion groups between 6 and 24 months postoperatively (Fig. 6A).11 The VAS and satisfaction scores appeared to remain stable (Fig. 6B), although numerical data were not available to calculate the actual change in these two outcome measures between the 6- and 24-month follow-up visits.7

Fig. 6.
(A) The mean Oswestry scores improved by approximately 4% between 6 and 24 months in both the TDR and fusion groups. (B) The mean VAS pain scores changed only slightly from 6 to 24 months in both the ProDisc and fusion groups.
(A) Created from data in reference 11. (B) Adapted from Zigler et al.7
In the interspinous device study, the symptom severity scores remained relatively unchanged between 6 and 24 months in both the investigational and control groups (Fig. 7A).6 A slight decrease in the percentage improvement in physical function scores was observed at 24 months in both treatment groups (Fig. 7B). These changes were not statistically significant.

Fig. 7.
(A) Outcome was reported as the percent change from preoperative to postoperative scores at the various time periods on the symptom severity outcome assessment. There were no significant changes in scores between 6 and 24 months in either the interspinous device or control group. (B) Based on the physical function scores, there were no significant differences between the 6- and 24-month time points for either the interspinous device or control group, although some decrease in the percentage improvement was seen in both groups.
(Adapted from Zucherman et al.6)
Complications
Device safety is paramount in evaluating new technologies. Several postoperative device-related problems were reported in the studies reviewed, but their time frames were not reported.
A survival analysis of a TDR device, based on data from almost 2,000 patients at 8 international sites, has been reported.12 The analysis covered a span of 60 months. The overall survival rate was 93%, with a rate of at least 90% at each individual center. The authors also noted that most reoperations occurred during the first 24 months. Whether reoperation rates differed between 6 and 24 months could not be determined from the data in the abstract.
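Survivorship figures such as the 93% above are commonly produced with the Kaplan–Meier product-limit estimator, which accounts for patients whose follow-up ends without a reoperation; the cited abstract does not state its method, so this is an assumption. The sketch below implements the estimator over hypothetical (months, reoperated) records and also counts reoperations falling between 6 and 24 months, the window this review could not evaluate. The records and helper names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical records: (months of follow-up, True if follow-up ended in a
# reoperation, False if the patient was censored without reoperation).
Record = Tuple[float, bool]


def kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time a reoperation occurs."""
    event_times = sorted({t for t, event in records if event})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for time, _ in records if time >= t)   # still followed at t
        events = sum(1 for time, event in records if event and time == t)
        survival *= 1 - events / at_risk                       # product-limit step
        curve.append((t, survival))
    return curve


def events_in_window(records: List[Record], start: float, end: float) -> int:
    """Count reoperations occurring after `start` and up to `end` months."""
    return sum(1 for t, event in records if event and start < t <= end)


records: List[Record] = [
    (3, True), (12, True), (20, False), (30, True), (60, False),
    (60, False), (45, False), (60, False), (60, False), (60, False),
]
for t, s in kaplan_meier(records):
    print(f"{t:>4} months: survival {s:.3f}")
print("Reoperations between 6 and 24 months:", events_in_window(records, 6, 24))
```

With timed events available per patient, the same records would support the 6- versus 24-month comparison that the reviewed studies did not report.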
Discussion
This study found no significant differences in outcome measures between the 6- and 24-month follow-up evaluations in studies of lumbar spinal implants. The outcome measures used in the reviewed studies varied. While this is typically a weakness in reviews and meta-analyses, it was actually a strength in the current study: regardless of the outcome measure used, the scores were stable between the 6- and 24-month follow-up visits, which supports the generalizability of the finding. In most studies there was a slight, not statistically significant, improvement between the 6- and 24-month follow-up visits. Only in the interspinous device study was there a diminution in the percentage improvement in an outcome measure, and this change was not significant. These findings indicate that scores did not worsen during the longer follow-up, suggesting that the 24-month results were at least as good as the 6-month values.
The reason for the stability in the scores could not be determined from the data presented in the studies reviewed. There are two possibilities. First, the data may be stable for each patient. That is, care providers could feel relatively comfortable that the patient's condition at 6 months after surgery will remain stable during future follow-up visits. The other possibility is that when analyzing a group of patients there are compensatory changes among patients. That is, some patients improve while others worsen. These compensatory changes could produce mean values similar to those that would be produced by individual patients stabilizing early in the study. Investigating which of these scenarios occurs in the studies would require analyzing changes in scores for each patient across time. Such data are not available from the literature. However, such work is currently underway at our center to determine if the stability over time is due to each patient's scores remaining relatively stable or if there tends to be compensatory improvement and worsening between patients that produces stable mean values.
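The two scenarios can produce identical group means. The hypothetical example below uses invented ODI scores for six patients and contrasts a cohort in which every patient is unchanged with one in which half improve and half worsen by the same amount; the `summarize` helper is illustrative and computes the kind of per-patient summary that would require individual-level data.

```python
from statistics import mean

# Hypothetical ODI scores (lower is better) for the same six patients.
six_month = [20, 25, 30, 35, 40, 45]
stable = [20, 25, 30, 35, 40, 45]          # scenario 1: every patient unchanged
compensatory = [10, 35, 20, 45, 30, 55]    # scenario 2: half improve, half worsen


def summarize(before, after):
    """Summarize per-patient change from 6 to 24 months."""
    changes = [a - b for b, a in zip(before, after)]
    return {
        "mean_change": mean(changes),
        "mean_abs_change": mean(abs(c) for c in changes),
        "improved": sum(c < 0 for c in changes),   # ODI fell
        "worsened": sum(c > 0 for c in changes),   # ODI rose
    }


for label, after in (("stable", stable), ("compensatory", compensatory)):
    print(label, summarize(six_month, after))
```

Both scenarios yield a mean change of zero; only the per-patient spread (a mean absolute change of 10 points versus 0) distinguishes them.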
The data analyzed for this study came from studies evaluating patients undergoing implantation of a spinal device for the treatment of symptomatic degenerative spinal conditions. This study found that in prospective clinical trials evaluating lumbar spinal implants, there is little change in mean outcome scores after the 6-month follow-up. Dissemination of early results, positive or negative, could help guide clinical decision making. It may also provide earlier information to those designing the next generation of implants, allowing them to address any problems with devices currently under evaluation.
One important factor that could not be addressed in this study was the occurrence of device-related complications or reoperations between the 6- and 24-month follow-up periods. Although such events were reported in the reviewed studies, their timing was not, so the temporal comparison that is the focus of this paper could not be made. Of note, follow-up of more than 5 years, available for some of the devices included in this study, has not identified a significant increase in device failure beyond 2 years.8, 9, 10, 12, 13, 14, 15
Of course, long-term follow-up is desirable and important for the ongoing assessment of implants. This study, which reviewed outcomes for a variety of devices assessed with a variety of instruments, found no significant changes in outcome scores between the 6- and 24-month follow-up periods. Longer follow-up for some of these devices has not identified significant problems with device failure at 5 to more than 10 years.8, 9, 10, 12, 13, 14, 15 The results of this review suggest that earlier dissemination of results for new implants may be acceptable without producing overly optimistic reports.
References
- A prospective, randomized, multicenter Food and Drug Administration Investigational Device Exemption study of lumbar total disc replacement with the CHARITE Artificial Disc versus lumbar fusion: Part I: Evaluation of clinical outcomes. Spine (Phila Pa 1976). 2005;30(14):1565–1575.
- Lumbar interbody fusion using the Brantigan I/F cage for posterior lumbar interbody fusion and the variable pedicle screw placement system: two-year results from a Food and Drug Administration Investigational Device Exemption clinical trial. Spine (Phila Pa 1976). 2000;25(11):1437–1446.
- Anterior lumbar interbody fusion using rhBMP-2 with tapered interbody cages. J Spinal Disord Tech. 2002;15(5):337–349.
- The Bagby and Kuslich method of lumbar interbody fusion (History, techniques, and 2-year follow-up results of a United States prospective, multicenter trial). Spine (Phila Pa 1976). 1998;23(11):1267–1279.
- Threaded titanium cages for lumbar interbody fusions. Spine (Phila Pa 1976). 1997;22(6):667–679.
- A multicenter, prospective, randomized trial evaluating the X STOP interspinous process decompression system for the treatment of neurogenic intermittent claudication: two-year follow-up results. Spine (Phila Pa 1976). 2005;30(12):1351–1358.
- Results of the prospective, randomized, multicenter Food and Drug Administration Investigational Device Exemption study of the ProDisc-L Total Disc Replacement versus circumferential fusion for the treatment of 1-level degenerative disc disease. Spine (Phila Pa 1976). 2007;32(11):1155–1163.
- Four-year follow-up results of lumbar spine arthrodesis using the Bagby and Kuslich lumbar fusion cage. Spine (Phila Pa 1976). 2000;25(20):2656–2662.
- Six-year outcomes of anterior lumbar interbody arthrodesis with use of interbody fusion cages and recombinant human bone morphogenetic protein-2. J Bone Joint Surg Am. 2009;91(5):1181–1189.
- Prospective, randomized, multicenter Food and Drug Administration investigational device exemption study of lumbar total disc replacement with the CHARITE artificial disc versus lumbar fusion: five-year follow-up. Spine J. 2009;9(5):374–386.
- Summary of Safety and Effectiveness Data: ProDisc-L Total Disc Replacement. http://www.accessdata.fda.gov/cdrh_docs/pdf5/P050010b.pdf. Accessed August 25, 2009.
- Survivorship analysis of the Charite artificial disc: review of 1,938 patients from eight leading international spine centers. Berlin, Germany: Spine Arthroplasty Society; 2007.
- Long-term results of one-level lumbar arthroplasty: minimum 10-year follow-up of the CHARITE artificial disc in 106 patients. Spine (Phila Pa 1976). 2007;32(6):661–666.
- Lumbar total disc replacement (Seven to eleven-year follow-up). J Bone Joint Surg Am. 2005;87(3):490–496.
- Clinical and radiological outcomes with the Charite artificial disc: a 10-year minimum follow-up. J Spinal Disord Tech. 2005;18(4):353–359.
PII: S1935-9810(09)00004-8
doi:10.1016/j.esas.2009.09.003
© 2009 Elsevier Inc. All rights reserved.
