Quantitative Measurement of Melanoma Spread in Sentinel Lymph Nodes and Survival


Background:
Sentinel lymph node spread is a crucial factor in melanoma outcome. We aimed to define the impact of minimal cancer spread and of increasing numbers of disseminated cancer cells on melanoma-specific survival.

Methods and Findings:
We analyzed 1,834 sentinel nodes from 1,027 patients with ultrasound node-negative melanoma who underwent sentinel node biopsy between February 8, 2000, and June 19, 2008, by histopathology including immunohistochemistry and quantitative immunocytology. For immunocytology we recorded the number of disseminated cancer cells (DCCs) per million lymph node cells (DCC density [DCCD]) after disaggregation and immunostaining for the melanocytic marker gp100. None of the control lymph nodes from non-melanoma patients (n = 52) harbored gp100-positive cells. We analyzed gp100-positive cells from melanoma patients by comparative genomic hybridization and found, in 45 of 46 patients tested, gp100-positive cells displaying genomic alterations. At a median follow-up of 49 mo (range 3–123 mo), 138 patients (13.4%) had died from melanoma. Increased DCCD was associated with increased risk for death due to melanoma (univariable analysis; p<0.001; hazard ratio 1.81, 95% CI 1.61–2.01, for a 10-fold increase in DCCD + 1). Even patients with a positive DCCD ≤3 had an increased risk of dying from melanoma compared to patients with DCCD = 0 (p = 0.04; hazard ratio 1.63, 95% CI 1.02–2.58). Upon multivariable testing DCCD was a stronger predictor of death than histopathology. The final model included thickness, DCCD, and ulceration (all p<0.001) as the most relevant prognostic factors, was internally validated by bootstrapping, and provided superior survival prediction compared to the current American Joint Committee on Cancer staging categories.

Conclusions:
Cancer cell dissemination to the sentinel node is a quantitative risk factor for melanoma death. A model based on the combined quantitative effects of DCCD, tumor thickness, and ulceration predicted outcome best, particularly at longer follow-up. If these results are validated in an independent study, establishing quantitative immunocytology in histopathological laboratories may be useful clinically.

Please see later in the article for the Editors' Summary

Published in the journal: PLoS Med 11(2): e32767. doi:10.1371/journal.pmed.1001604
Category: Research Article
doi: https://doi.org/10.1371/journal.pmed.1001604

Introduction

For melanoma staging, sentinel node biopsy has been established to assess melanoma cell dissemination and has become the most widely used procedure to determine the regional lymph node status in patients with cutaneous melanoma [1],[2]. Because an evidence-based lower threshold for clinically relevant melanoma spread could not be defined [3], the detection of isolated melanoma cells in sentinel nodes was included in the latest American Joint Committee on Cancer (AJCC) staging recommendations [4]. However, the prognostic value of small tumor deposits in sentinel nodes is not unanimously accepted [5]–[7]. One potential reason for this lack of confidence may be that, unlike for the measurement of primary tumor thickness, highly sensitive methods for precise and direct quantification of sentinel node involvement by histopathology, as well as optimal sample preparation and screening protocols, are still lacking. The chance of detecting rare tumor cells by histopathology depends on the number of sections screened [8], and extensive histopathological protocols [7],[9]–[11] can achieve a detection rate of 30% but require the analysis of 24 to 36 slides per node [7]. Since this translates into 42 h of examination time for about ten melanoma patients a week [12], it is impracticable for many institutions. Despite several attempts at standardization [7],[10],[11],[13]–[16], the accuracy of sentinel node analysis has remained limited for these practical reasons.

We previously developed a quantitative immunocytological assay to identify early cancer spread in sentinel nodes [17]. In this assay, the sentinel node is disaggregated, and disseminated cancer cells (DCCs) are detected by immunostaining for gp100, an antigen involved in melanin synthesis, using the HMB45 antibody [17]. The number of DCCs per 10⁶ isolated cells defines the DCC density (DCCD). In the current study we applied this assay to a prospective cohort of 1,027 patients. The aim was to evaluate its predictive value as a quantitative variable in comparison to qualitative routine histopathology and to address the role of minimal tumor seeding in the survival of melanoma patients.

Methods

Ethics Statement

Our study complied with the guidelines of the Declaration of Helsinki. The institutional review boards of the University of Tübingen (ethics vote number 5/99) and the University of Regensburg (ethics vote number 07/79) approved the study. All patients provided written informed consent to examination of their sentinel nodes by disaggregation immunocytology, to the recording of their follow-up data in the Central Malignant Melanoma Registry of the German Dermatological Society, and to the molecular characterization of the isolated cells. Control (non-melanoma) nodes were obtained from patients with chronic venous insufficiency in whom a lymph node was removed during crossectomy, or from non-melanoma skin cancer patients. Sample acquisition was in agreement with the rules of the Ethics Committee of the University of Tübingen for use of waste tissue. No personal data were recorded from the control patients.

Patients

From February 8, 2000, to June 19, 2008, we enrolled 1,154 patients who underwent lymphatic mapping and sentinel node biopsy at the University Hospital Tübingen, Germany, for histopathologically proven first invasive primary cutaneous melanoma. At the Department of Dermatology, University Hospital Tübingen, sentinel node biopsy is generally recommended for patients with melanoma lesions with a Breslow's tumor thickness of ≥1.0 mm, with primary tumors of Clark level IV or V, or with tumors of any Breslow's thickness but showing regression or ulceration. Twelve patients without other risk factors requested sentinel node biopsy although their melanomas were thinner than 1 mm. The preoperative staging to exclude metastatic disease consisted of a physical examination, ultrasound examination of regional lymph nodes and the abdomen, chest X-ray, and computed tomography brain scans. For final analysis 119 patients were excluded as they had a follow-up of less than 3 mo, and eight patients were excluded because of missing information about primary tumor thickness. The remaining 1,027 patients (Table 1) included 322 whose DCCD results have been reported in an interim analysis [17], although without follow-up information. Skin draining control lymph nodes (n = 58) were obtained from 52 non-melanoma patients (52 skin draining nodes from nonmalignant conditions, six sentinel nodes from non-melanoma skin cancer patients) and disaggregated, stained, and evaluated identically to the lymph nodes of melanoma patients.

**Tab. 1. Baseline characteristics of study cohort.**

Lymphatic Mapping, Sentinel Node Biopsy, and Tumor Cell Detection

Cutaneous lymphoscintigraphy, sentinel node biopsy, and sample preparation were performed as previously described [17] with minor modifications. From the beginning of the study until 31 July 2003, the procedure was as follows. The lymph node was cut along its longitudinal axis for histopathological and immunocytological examination. One half of the sentinel node was fixed in 3.5% formaldehyde, paraffin-embedded, and subjected to standard histopathological treatment, which included hematoxylin and eosin staining and immunohistochemistry on three 4-µm paraffin sections from the central level. From 1 August 2003 until the end of the study, the procedure was as follows. The lymph nodes were cut perpendicularly to the long axis [15]. For histopathology, 2-mm slices were cut after formalin fixation of the tissue. Hematoxylin and eosin staining and immunostaining (using antibodies directed against S100, HMB45, and Melan-A) of sections from each level were performed as described above. Thus, the total number of sections examined per node varied according to the size of the node. However, in all cases at least two levels (with four 4-µm sections each) were examined. The complete histopathological workup of the lymph nodes was done at the Department of Pathology, University of Tübingen, without knowledge of the immunocytological gp100 result. A patient was documented as histopathologically positive if at least one node was considered positive by the histopathological examination. Patients with isolated tumor cells were considered histopathologically positive.

Quantitative immunocytology was performed immediately after sentinel node biopsy at the Department of Dermatology, University of Tübingen, using the other unfixed half of the lymph node [17]. The lymphatic tissue was cut into 1-mm pieces and disaggregated mechanically into a single-cell suspension by rotating knives (DAKO Medimachine, DAKO), washed with HBSS (Life Technologies), and centrifuged on a density gradient made of a 60% Percoll solution (Amersham). Cells were counted using a Neubauer counting chamber. Per slide, 10⁶ cells from the interphase were then dispensed onto adhesion slides (Menzel) in a volume of 1 ml of PBS. After sedimentation for 1 h, the slides were air-dried overnight. Immunocytological staining was carried out with the alkaline phosphatase/anti-alkaline phosphatase method using primary antibodies against gp100 (HMB45, DAKO) and Melan-A (A103, DAKO), and 5-bromo-4-chloro-3-indolyl phosphate/NBT (DAKO) as substrate, yielding a blue reaction product. A lymph node was defined as gp100 positive or Melan-A positive if it contained at least one gp100-positive or one Melan-A-positive cell, respectively. The number of positive cells per million isolated cells was recorded after screening of the slides by a technical assistant and final evaluation by a dermatologist, both experienced in evaluation of cytological preparations. The recording was done without knowledge of the histopathological findings or other clinical data. Positive preparations were air-dried or stored for a maximum of 4 d in PBS until cell isolation for genomic analyses.

gp100/Melan-A Double Staining

For double immunofluorescence staining, additional slides were stained with primary antibodies against MART-1/Melan-A (rabbit monoclonal IgG, Epitomics) and gp100 (clone HMB45, mouse monoclonal IgG, DAKO). The cells were visualized after staining with Alexa Fluor 555 (donkey anti-rabbit IgG, Invitrogen) and Alexa Fluor 488 (donkey anti-mouse IgG, Invitrogen) and counterstained with 4′,6-diamidino-2-phenylindole (DAPI).

Single-Cell Comparative Genomic Hybridization

Single-cell comparative genomic hybridization (CGH) was performed as previously described [18],[19]. In brief, proteinase K was used to digest cellular proteins after isolation, the single-cell genome was digested using MseI, adaptors were ligated to the 5′ overhangs, and the DNA fragments were amplified by PCR, resulting in an MseI representation of a single-cell genome. The reagents and protocol are now commercially available as a kit (Ampli1, Silicon Biosystems). These amplicons were labeled and hybridized onto metaphase spreads or an Agilent 180 K microarray for array CGH [35]. Histograms for the CGH data were generated using the online algorithms at http://progenetix.net [20]. Twelve of the 46 patients from whom we isolated gp100-positive cells for CGH had samples with a median DCCD of 2 gp100-positive cells per million isolated cells (range 1 to 7) and were included solely to investigate the genomes of the early DCCs.

Statistical Analyses

Melanoma-specific survival rates were calculated from the date of sentinel node biopsy until death from melanoma or the last follow-up. The 5-y survival percentages are derived from the Kaplan-Meier survival estimates, F(t). The 95% confidence intervals were based on the log(−log F(t)) transformation as described by Kalbfleisch and Prentice [21]. We calculated Pearson's correlation coefficient of log(DCCD + 1) and log(Melan-A + 1) for assessing the association of DCCD and Melan-A. For the comparison of positive DCCD values among subgroups of the other six prognostic variables we used either two-sample t-tests or one-way ANOVA after logarithmic transformation, i.e. we compared geometric means.
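The Kaplan-Meier estimate and its log(−log) confidence interval can be sketched as follows. This is a minimal illustration, not the JMP routine used in the study; it assumes Greenwood's variance formula, and the function name and arguments are hypothetical. The estimate is written `surv` here, where the text writes F(t).

```python
import math

def kaplan_meier_loglog_ci(times, events, t_eval, z=1.96):
    """Kaplan-Meier survival at t_eval with a log(-log) confidence interval.

    times  : follow-up time per patient (any consistent unit, e.g. months)
    events : 1 = death from melanoma, 0 = censored at last follow-up
    Returns (survival, lower, upper).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood = 0.0  # running sum of d / (n * (n - d)), Greenwood's formula
    i = 0
    while i < len(data) and data[i][0] <= t_eval:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood += deaths / (n_at_risk * (n_at_risk - deaths))
        n_at_risk -= removed
    if surv in (0.0, 1.0):
        return surv, surv, surv  # the transformation is undefined at 0 and 1
    # Standard error of log(-log S) by the delta method
    se = math.sqrt(greenwood) / abs(math.log(surv))
    lower = surv ** math.exp(z * se)
    upper = surv ** math.exp(-z * se)
    return surv, lower, upper
```

Because the interval is built on the log(−log) scale and transformed back, its limits always stay between 0 and 1.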

We used dot plots together with quartiles to show the differences in the distribution of DCCD values among the groups defined by the other variables.

We used univariable Cox regression models for the following seven predictors: gender, age, Breslow's thickness, ulceration, localization, nodal status pathology, and DCCD. A log transformation was used for the variable tumor thickness, and the logarithm of DCCD + 1 was used for the variable DCCD; hazard ratios are reported together with their 95% confidence intervals. p-Values are given for the likelihood ratio tests of the Cox models.

In addition, we calculated hazard ratios after grouping of the three continuous variables age, thickness, and DCCD. We plotted the hazard ratios of the models with the continuous and the grouped data for DCCD and tumor thickness in order to verify a linear model on a logarithmic scale.

For multivariable Cox regression analyses we adopted the model selection criterion according to Schwarz [22] and used the minimal value of the Bayes information criterion (BIC) to select the optimal model. This quantity is twice the negative log likelihood of the model plus the number of parameters times the logarithm of the sample size. Instead of evaluating the likelihood for all 128 ( = 2⁷) possible models we started with the model that included all seven variables, and then successively deleted the variable with the highest p-value. This approach finds the model with the lowest BIC value for a given number of variables, which was verified by calculating the BIC value for all 128 models.
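As a minimal sketch of the criterion described above, the BIC can be computed directly from a fitted model's negative log likelihood. The function names and the likelihood values in the usage example are hypothetical and are not taken from the study.

```python
import math

def bic(neg_log_likelihood, n_params, sample_size):
    """BIC = 2 * (negative log likelihood) + (number of parameters) * ln(sample size)."""
    return 2.0 * neg_log_likelihood + n_params * math.log(sample_size)

def best_by_bic(candidates, sample_size):
    """Pick the candidate model with the lowest BIC.

    candidates : dict mapping a model label to (neg_log_likelihood, n_params)
    Returns (label of best model, dict of BIC values per label).
    """
    scored = {name: bic(nll, k, sample_size) for name, (nll, k) in candidates.items()}
    return min(scored, key=scored.get), scored

# Hypothetical likelihood values, for illustration only
models = {
    "thickness + DCCD + ulceration": (850.0, 3),
    "all seven variables": (848.0, 7),
}
best, scores = best_by_bic(models, sample_size=1027)
```

With these made-up numbers the three-variable model wins: the small gain in likelihood from four extra parameters does not offset the penalty of ln(1,027) per parameter.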

For internal validation we used 100 bootstrap samples and calculated Harrell's c-index with and without correction for optimism [23]. We calculated Harrell's c-index instead of Somers' D because Harrell's c-index (c = (D+1)/2) estimates the proportion of concordant pairs among all comparable pairs of patients. We proceeded as follows. Step 1: we determined c_app from our model as selected using the BIC criterion. Step 2: we generated 100 bootstrap samples from the original dataset by sampling with replacement. Step 3: for each of these 100 bootstrap samples the same model selection procedure as for the original dataset was applied. Step 4: for each of the 100 bootstrap samples we calculated the c-index c_boot. Step 5: the 14 different models found in step 3 were applied to the original dataset, and the corresponding c-indices c_orig determined. Step 6: the average optimism of the fit was calculated as c_boot−c_orig. Step 7: the bootstrap-corrected performance of the original stepwise model was calculated as c_app−(c_boot−c_orig).
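The two quantities in this procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: tied risk scores count as half concordant, and optimism is taken as bootstrap-sample performance minus original-data performance, which is the usual convention for this kind of correction. Function names and the numbers in the usage example are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Proportion of concordant pairs among all comparable pairs.

    A pair (i, j) is comparable if patient i died before patient j's last
    follow-up. It is concordant if patient i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumption: ties count as half
    return concordant / comparable

def optimism_corrected_c(c_app, c_boot, c_orig):
    """Subtract the average optimism (c_boot - c_orig) from the apparent c-index.

    c_boot[k] : c-index of the model selected in bootstrap sample k, on that sample
    c_orig[k] : c-index of the same model applied to the original dataset
    """
    optimism = sum(b - o for b, o in zip(c_boot, c_orig)) / len(c_boot)
    return c_app - optimism

# Hypothetical values for two bootstrap samples, for illustration only
corrected = optimism_corrected_c(0.763, c_boot=[0.78, 0.77], c_orig=[0.76, 0.76])
```

Each bootstrap model tends to fit its own sample better than the original data, so the average optimism is normally positive and the corrected c-index is lower than the apparent one.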

In order to verify the proportional hazards assumption of the Cox model we divided the patients into two groups for each of the three final predictors and plotted the ratio of their cumulative hazard functions as a function of time. According to the Cox model this ratio should stay constant. As an alternative to the Cox model we used the lognormal distribution as a model for predicting outcome by DCCD, thickness, and ulceration. The lognormal model allows determination of the maximum time-dependent hazard rate and the time at which it occurs (formulas for the lognormal survival probability are in Text S1). We used the lognormal model to determine the 5-y survival probabilities by a nomogram [24].
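The following sketch illustrates how a lognormal model yields a hazard rate that first rises and then falls. It uses the textbook lognormal survival and hazard functions. The study's exact parameterization is given in Text S1 and is not reproduced here, so `mu` and `sigma` (location and scale of log survival time), the function names, and the grid search are all assumptions.

```python
import math

def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((ln t - mu) / sigma)."""
    return 1.0 - _normal_cdf((math.log(t) - mu) / sigma)

def lognormal_hazard(t, mu, sigma):
    """h(t) = f(t) / S(t), with f the lognormal density."""
    z = (math.log(t) - mu) / sigma
    density = _normal_pdf(z) / (sigma * t)
    return density / (1.0 - _normal_cdf(z))

def peak_hazard(mu, sigma, t_max=240.0, step=0.1):
    """Grid search for the time of maximum hazard in (0, t_max]."""
    grid = [step * k for k in range(1, int(t_max / step) + 1)]
    t_peak = max(grid, key=lambda t: lognormal_hazard(t, mu, sigma))
    return t_peak, lognormal_hazard(t_peak, mu, sigma)
```

In this parameterization a patient's predictors would shift `mu`: a lower `mu` corresponds to higher risk, giving a hazard peak that is both earlier and higher.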

Model Validation and Comparison

To assess the goodness of fit of the models (Table S3), we divided the 1,027 patients into 18 subgroups according to the following criteria: three groups of DCCD values, with DCCD = 0 assigned the value 0, 0< DCCD <100 assigned the value 1, and DCCD ≥100 assigned the value 2; three groups of tumor thickness, with tumor thickness ≤2 mm assigned the value 1, 2 mm < tumor thickness ≤4 mm assigned the value 2, and tumor thickness >4 mm assigned the value 3; and ulceration no/yes. We restricted the analysis to 18 subgroups because otherwise the number of patients per subgroup would be too small. Only a slight difference in the hazard ratio was observed between the categories 0< DCCD ≤3 and 3< DCCD <100, as shown in Table 2. These categories were therefore combined. For each of the 1,027 patients, we calculated the expected failure probability for the individual follow-up times. The observed numbers of deaths in the 18 subgroups were compared to the expected numbers of deaths by the chi-square statistics. The goodness of fit of the models was compared by the sum of the 18 chi-square values, taking into account the number of degrees of freedom, which depends on the number of estimated parameters. Since the models were not nested we did not perform likelihood ratio tests. We provide the chi-square statistics only for descriptive purposes. The grouping of patients according to AJCC criteria was based on the AJCC 2009 recommendations (which include assignment of isolated tumor cells as nodal positive) [4], with the exception that mitotic rate could not be included because it had not been assessed at the beginning of the study. For the AJCC grouping, the nodal status was determined by histopathology and not by immunocytology.
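The grouping and the descriptive chi-square sum described above can be sketched as follows. The cut points are taken from the text; the function names and the dictionary layout for observed and expected deaths are hypothetical.

```python
def dccd_group(dccd):
    """0 for DCCD = 0, 1 for 0 < DCCD < 100, 2 for DCCD >= 100."""
    if dccd == 0:
        return 0
    return 1 if dccd < 100 else 2

def thickness_group(thickness_mm):
    """1 for <= 2 mm, 2 for > 2 mm and <= 4 mm, 3 for > 4 mm."""
    if thickness_mm <= 2.0:
        return 1
    return 2 if thickness_mm <= 4.0 else 3

def subgroup(dccd, thickness_mm, ulcerated):
    """One of 3 x 3 x 2 = 18 subgroups, returned as a tuple key."""
    return (dccd_group(dccd), thickness_group(thickness_mm), bool(ulcerated))

def chi_square_sum(observed, expected):
    """Sum of (O - E)^2 / E over all subgroups.

    observed, expected : dicts keyed by subgroup with numbers of deaths
    """
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)
```

The expected number of deaths per subgroup would be the sum of each patient's model-based failure probability at his or her own follow-up time.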

**Tab. 2. Univariable survival analyses.**

For the parametric model with the variables thickness, DCCD, and ulceration, each individual patient was characterized by his or her risk score. This score was a linear combination of the logarithms of tumor thickness and DCCD and of ulceration. The purpose was to assess the goodness of fit of our model in four well-defined groups of patients. Since the precision of Kaplan-Meier estimates depends essentially on the number of deaths in a sample, we wanted to achieve similar precision in all four groups. To this end the 138 patients who died from melanoma were divided into four groups with increasing risk scores of death, as defined by the survival model. Subsequently, the risk thresholds of the four groups were applied to all the patients. For given values of the three variables included in the predictive models each patient can uniquely be assigned to one of the four groups. All four groups differed from each other significantly (all p-values <0.001 in the log-rank test comparing group 2 to group 1, group 3 to group 2, and group 4 to group 3).
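One way to derive such thresholds is sketched below. It assumes that the deaths are split into four groups of roughly equal size by risk score; the text does not state the exact cut rule, and the function names are hypothetical.

```python
from bisect import bisect_right

def risk_group_thresholds(death_scores):
    """Three cut points that split the patients who died into four groups
    of roughly equal size, ordered by increasing risk score."""
    ordered = sorted(death_scores)
    n = len(ordered)
    return [ordered[(n * q) // 4] for q in (1, 2, 3)]

def assign_group(score, thresholds):
    """Group number 1 to 4 for any patient, given the three thresholds."""
    return bisect_right(thresholds, score) + 1
```

The thresholds are computed once from the patients who died and then applied unchanged to every patient, so each patient falls into exactly one group.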

We next compared the goodness of fit of the parametric model with the variables thickness, DCCD, and ulceration with a Cox model using the same variables. We found that the parametric model performed better (p = 0.13, sum of χ² values = 21.1) than the corresponding Cox proportional hazards model (p = 0.03, sum of χ² values = 27.4) after grouping all 1,027 patients into the 18 risk groups (see above and Table S3).

To assess whether a model that included the information on Melan-A staining in addition to gp100 staining could further improve outcome prediction we compared our model based on DCCD (the maximum number of gp100-positive cells per million isolated cells per patient), thickness, and ulceration with the model DCCD 2 (defined as the maximum number of gp100- or Melan-A-positive cells per million isolated cells), thickness, and ulceration.

We found that the model that included the information on Melan-A staining was not superior to gp100-based DCCD reporting (p = 0.09, sum of χ² values = 22.7; Table S3). Finally, we compared the predictions based on our parametric model and on AJCC staging [4]. The goodness of fit for our model (p = 0.13, sum of χ² values = 21.1) was much better than that of the AJCC staging model (p<0.0002, sum of χ² values = 36.7). For details and data, see Table S3.

For comparison with the AJCC model, the patients were originally divided into three groups according to the differences in survival predictions between the two models. For individuals in Group S1 of Table S2, the survival probability for the new model was at all times greater than the survival probability according to the AJCC model; the maximum absolute percentage difference was greater than 13%. In Group S2 the survival probability for the new model was at all times smaller than the survival probability according to the AJCC model; the maximum absolute percentage difference was greater than 13%. For Groups S1 and S2, the maximum absolute percentage difference of 13% was chosen because it exceeds the maximum absolute percentage differences for those patients for whom the survival in one model was not always greater than the survival in the other model. The remaining patients formed Group S3.
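The assignment rule can be sketched as below. It assumes that both predicted survival curves are evaluated on a common grid of time points and that the 13% threshold is an absolute difference in survival probability. The function name and inputs are hypothetical.

```python
def comparison_group(surv_new, surv_ajcc, threshold=0.13):
    """Classify one patient as S1, S2, or S3.

    surv_new, surv_ajcc : predicted survival probabilities of the two models
                          at the same time points (values between 0 and 1)
    """
    diffs = [a - b for a, b in zip(surv_new, surv_ajcc)]
    max_abs = max(abs(d) for d in diffs)
    if all(d > 0 for d in diffs) and max_abs > threshold:
        return "S1"  # new model always more optimistic than AJCC
    if all(d < 0 for d in diffs) and max_abs > threshold:
        return "S2"  # new model always more pessimistic than AJCC
    return "S3"      # curves cross or differ by no more than the threshold
```

For example, curves of `[0.95, 0.90, 0.85]` for the new model against `[0.90, 0.75, 0.70]` for AJCC give `"S1"`.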

The risk scores were the sums of the products of the individual predictors multiplied by the corresponding regression coefficients. Since some tumor thicknesses were less than 1 mm, we also obtained negative risk scores because tumor thickness was on a log scale.
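A minimal sketch of such a score shows why thin tumors can produce negative values. The coefficients of the lognormal model are not given in this section, so the sketch borrows the natural logarithms of the multivariable Cox unit hazard ratios reported in the Results (6.96, 1.43, 2.04) purely as stand-ins. It also assumes base-10 logarithms for thickness and DCCD + 1. The names are hypothetical.

```python
import math

# Stand-in coefficients (assumption): ln of the Cox unit hazard ratios
COEF_THICKNESS = math.log(6.96)   # per 10-fold increase in thickness (mm)
COEF_DCCD = math.log(1.43)        # per 10-fold increase in DCCD + 1
COEF_ULCERATION = math.log(2.04)  # ulceration present vs. absent

def risk_score(thickness_mm, dccd, ulcerated):
    """Sum of each predictor multiplied by its regression coefficient."""
    return (
        COEF_THICKNESS * math.log10(thickness_mm)
        + COEF_DCCD * math.log10(dccd + 1)
        + COEF_ULCERATION * (1 if ulcerated else 0)
    )
```

A 1-mm, non-ulcerated melanoma with DCCD = 0 scores exactly zero, and any thinner tumor with the same DCCD and ulceration status scores below zero because log10 of a value under 1 is negative.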

We performed statistical analyses with JMP (version 10.0.2).

Results

Patients

The final analysis included a total of 1,834 sentinel nodes from 1,027 patients examined by quantitative immunocytology and by histopathology. The baseline characteristics of all 1,027 patients are summarized in Table 1. The median follow-up was 49 mo (range 3 to 123 mo), with 370 (36%) patients having a follow-up of at least 5 y. During follow-up, 138 of 1,027 patients (13.4%) died from melanoma. The 5-y melanoma-specific survival probability for all patients was 86% (95% CI 83%–88%).

Detection of Disseminated Melanoma Cells by Immunocytology

The underlying rationale of our detection assay (Figure 1A–1E) comprises two aspects. First, the spatially inhomogeneous distribution of melanoma cells may be equalized by lymph node disaggregation and generation of a single-cell suspension, which facilitates melanoma cell detection when only parts of the sample are screened; second, the number of melanoma cells can be counted and the amount of analyzed tissue can be quantified by referencing the number of melanoma cells to a defined number of isolated lymph node cells. This allows determining the DCCD, i.e. the number of DCCs per 1 million isolated cells.
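The DCCD calculation itself is a simple normalization, sketched here with hypothetical function names. The per-patient value follows the definition given in the Methods, which takes the maximum over a patient's nodes.

```python
def dccd(positive_cells, cells_screened):
    """Disseminated cancer cells per million screened lymph node cells."""
    return positive_cells * 1_000_000 / cells_screened

def patient_dccd(node_values):
    """Per-patient DCCD: the maximum DCCD over all of the patient's nodes."""
    return max(node_values)
```

For example, one gp100-positive cell among 5×10⁶ screened cells gives a DCCD of 0.2, and four positive cells on two slides of 10⁶ cells each give a DCCD of 2.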

**Fig. 1. Sample preparation, melanoma cell detection, and distribution of disseminated cancer cell densities.**

We first assessed whether gp100-positive cells could be detected in skin draining nodes from non-melanoma patients. These lymph nodes were prepared identically to the sentinel nodes from melanoma patients (Figure 1B–1D), except that lymph nodes from cancer patients had to be split in half to provide tissue for routine histopathology. We could not detect a single gp100-positive cell among 171×10⁶ cells isolated from 58 non-melanoma skin draining lymph nodes.

In contrast, we detected gp100-positive cells in the lymph nodes of 525 of the 1,027 melanoma patients (51%). Whenever enough cells were isolated from the lymph node half for immunocytology, we aimed to screen 2×10⁶ lymph node cells, i.e., two slides, per patient. The median number of slides screened per node was two, i.e., 2×10⁶ cells (range 10⁴ to 6×10⁶ cells). The median DCCD in patients with DCCD >0 was 4 gp100-positive cells per million isolated cells (range 0.2 to 950,000; Figure 1F). We evaluated the relation of DCCD with the six established prognostic factors (Figure 2). Geometric mean values of DCCD were significantly higher in thicker and ulcerated melanomas (p<0.001), in melanomas located at other sites than the extremities (p = 0.02), and in patients with a pathologically positive sentinel node (p<0.001; Figure 2).

**Fig. 2. DCCD and standard clinical prognostic factors.**

However, severe concerns about the immunocytological assay may be raised by (1) the loss of architectural information, which helps to differentiate between intra-lymphatic nevi and colonies of melanoma cells, (2) the difficulty of identifying melanoma colonies by morphological criteria, and (3) the fact that the gp100 antigen used for melanoma detection may be down-regulated. We addressed these concerns by careful evaluation of lymph node preparations from melanoma and non-melanoma patients using a second melanoma-associated antibody directed against Melan-A and by genetic analysis of the gp100-positive cells.

We found Melan-A-expressing cells in three out of 38 (8%) control lymph nodes, all of which were gp100 negative. We then proceeded to determine the detection rate of the two antibodies for melanoma cells in sentinel nodes from melanoma patients with histopathologically proven lymphatic spread by applying a double staining method. Because Melan-A-positive cells were detected in 8% of control nodes, we restricted the direct comparison of gp100- and Melan-A-positive cells to samples from histopathologically positive nodes, where expression of Melan-A by benign cells would be less likely to confound the analysis. Evaluating 3,055 cells from 43 nodes of 41 patients after immunofluorescence double staining, we found that 39 of 43 (91%) lymph nodes harbored cells positive for gp100 and Melan-A (Figure S1), one of 43 (2%) had only gp100-positive cells, and three of 43 (7%) had only Melan-A-positive cells.

These data suggest that the slightly higher detection rate of the Melan-A antibody does not outweigh the lower specificity as determined by the control samples. However, to completely rule out that gp100-negative DCCs comprise a relevant confounding factor, additional slides were stained using the Melan-A antibody in 710 patients. Comparing the gp100 and Melan-A staining results of these 710 patients, we confirmed the high correlation of gp100 and Melan-A staining (r = 0.83, p<0.001) that we had previously seen by double immunofluorescence. As detailed in Table S3, we found that the inclusion of Melan-A did not improve the prognostic power of the gp100-based immunocytological assay.

Genetic Characteristics of Disseminated Melanoma Cells

Since all these findings provided indirect support that gp100-positive cells represent DCCs, we searched for direct evidence of their malignant origin. We randomly isolated 65 gp100-positive cells from 46 patients for a whole-genome screen of chromosomal aberrations by CGH. The DCCD values of these patients ranged from 0.2 to 800,000 gp100-positive cells per million isolated cells (median = 8), and we analyzed between one and three cells per patient. Metaphase CGH provided direct proof for the malignant origin of 57 gp100-positive cells (Figure 3A), while eight cells displayed normal karyotypes. As metaphase CGH has a resolution of 10–20 Mb, we subsequently applied array CGH [35], which has a resolution of <1 Mb, to these eight cells. While we could not detect any aberration in two cells, the remaining displayed between one and ten changes (median = 4.5) ranging from 0.1 to 19 Mb (median = 2 Mb). In summary, 63 of 65 gp100-positive cells (97%) displayed genomic aberrations, which classified 45 of 46 patients (98%) as harboring cancer cells in their sentinel nodes. There was no difference for cells isolated from lymph nodes classified as negative or positive by routine histopathology, demonstrating that our assay is suited to correctly identifying melanoma cells without morphological assessment of tissue architecture (Figure 3B).

**Fig. 3. Chromosomal aberrations of isolated gp100-positive cells.**

Disseminated Cancer Cell Density and Melanoma-Specific Survival

We evaluated DCCD as a biomarker according to the REMARK criteria [25]. Of the standard prognostic factors, sentinel node histopathology (p<0.001), age (p<0.001), thickness (p<0.001), ulceration (p<0.001), and localization of the primary melanoma (p = 0.04) were associated with poor outcome in the univariable Cox regression analyses (see Table 2 and Figure S2 for Kaplan-Meier estimates). Increasing DCCD values were negatively associated with the time to death from melanoma in the univariable Cox regression analyses (p<0.001). We assessed the prognostic impact of DCCD after categorizing the values into four groups (Table 2). We found that even the detection of low DCCD values (0<DCCD ≤3) conferred a significant risk of death (hazard ratio 1.63, 95% CI 1.02–2.58, p = 0.04; Table 2 and Figure 4A) compared to patients without DCCs. Increasing hazard ratios were obtained for categories with higher DCCD values (Table 2). The relationship of increasing DCCD values and the hazard ratio is plotted in Figure 4B on the logarithmic scale. The unit risk ratio (corresponding to a 10-fold increase of DCCD + 1, e.g., from a DCCD of zero to a DCCD of nine) was 1.81 (95% CI 1.61–2.01), and a linear relation (on log scale) between DCCD and hazard ratio was identified (Figure 4B). A similar log-linear relationship was seen between tumor thickness and hazard ratio (Figure 4C).
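The log-linear relationship can be made concrete with a short worked example. Under the univariable continuous model, and assuming a base-10 logarithm as implied by the 10-fold unit, the hazard ratio relative to a patient with DCCD = 0 is 1.81 raised to the power log10(DCCD + 1). The function name is hypothetical.

```python
import math

def dccd_hazard_ratio(dccd, unit_hr=1.81):
    """Hazard ratio relative to DCCD = 0 in the univariable continuous model.

    unit_hr : hazard ratio per 10-fold increase in DCCD + 1
    """
    return unit_hr ** math.log10(dccd + 1)
```

A DCCD of 9 gives a hazard ratio of 1.81, and a DCCD of 99 gives 1.81² ≈ 3.28: each further 10-fold step in DCCD + 1 multiplies the hazard by the same factor.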

**Fig. 4. The prognostic impact of disseminated cancer cells in sentinel nodes.**

We next performed stepwise multivariable Cox regression analysis starting with all six standard prognostic factors in addition to DCCD. After each step of the multivariable analysis the variable with the highest p-value was deleted (Tables 3 and 4). To identify the optimal model, we determined the BIC, which has a minimal value for the best model [22]. As can be seen from Table 4, the BIC value is lowest for the combined variables tumor thickness, DCCD, and ulceration, for which all p-values were below 0.001. The unit hazard ratios for this model were 6.96 (95% CI 3.61–13.28) for thickness, 1.43 (95% CI 1.27–1.61) for DCCD, and 2.04 (95% CI 1.4–2.97) for ulceration. It should be noted that nodal status determined by routine histopathology had a maximum hazard ratio of 1.75 (95% CI 1.04–2.86) in multivariable analyses and was rejected already in step 3 (Table 3).

**Tab. 3. Multivariable survival analyses: hazard ratios together with their 95% confidence intervals.**

**Tab. 4. Multivariable survival analyses: model selection according to p-values and Bayes Information Criterion.**

Individual Risk Prediction by Tumor Thickness, Disseminated Cancer Cell Density, and Ulceration

To fully exploit the power of our quantitative assay, we combined the three most important risk factors identified by multivariable analysis (tumor thickness, DCCD, and ulceration) for individual risk assessment at diagnosis and during follow-up. While results of Cox models represent a useful summary for the average hazard ratios, we observed that the assumption of proportional hazards was not fulfilled for tumor thickness and DCCD (Figure S3). Therefore, we employed a lognormal survival model based on tumor thickness, DCCD, and ulceration that allows the calculation of changes in individual risk over time and of the predicted 5-y survival for all 1,027 patients (Figure 5A). Figure 5A shows that patients with thin melanomas never harbored high DCC numbers in their lymph nodes and poorest outcome was seen for thick tumors and high DCCD. It should be noted that DCCD and tumor thickness are plotted on a logarithmic scale, and therefore the curves of equal 5-y survival probability appear as straight lines. On a linear scale these curves (isoboles) are convex (Figure S4), which indicates synergism [26]. Using this model, we calculated the time-dependent hazard rates for 14 individual patients with five different hazard rate curves (Figure 5B). This calculation revealed that DCC-negative and DCC-positive patients might display identical hazard rate functions (e.g., compare Patients 2a and 2b in Figure 5B) and also that the hazard rate peaks later in low-risk than in high-risk patients. Furthermore, tumor thickness, DCCD, and ulceration state can be integrated into a preliminary nomogram to determine the 5-y survival of individual patients (Figure 5C).

**Fig. 5. Individualized risk estimation over time.**

Internal Validation of the Model

While the present survival model based on the predictors thickness, DCCD, and ulceration awaits validation by an independent multi-center study, we sought to validate it internally. For this we applied a bootstrapping approach [23]. We generated 100 bootstrap samples from the original dataset by sampling with replacement. For each of these 100 bootstrap samples the same model selection procedure as for the original dataset was applied. We obtained 14 different “best models” (Table 5). The present model was selected most often. The variable DCCD was included in 85 best models, whereas nodal status by routine histopathology was included in only 36 of the 100 models. Harrell's c-index, which estimates the probability of concordance between predicted and observed responses, for the present model was 0.763 in the original dataset. Harrell's c-index based on the current AJCC staging system was 0.737. This is significantly smaller (p<0.0001; McNemar's test). The bootstrap-corrected c-index [23] for the present model turned out to be 0.748, which is well above the value of 0.5 representing only random prediction ability.
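As a rough illustration of the two ingredients used above, Harrell's c-index and sampling with replacement, the sketch below computes the concordance for right-censored data and draws one bootstrap index sample. The toy data and function names are hypothetical, tied follow-up times are simply skipped, and the optimism correction itself (which requires refitting the model in every bootstrap sample) is not shown.

```python
import random

def harrell_c(times, events, risk):
    """Harrell's c-index for right-censored data.

    times  : follow-up time per patient
    events : 1 = death observed, 0 = censored
    risk   : predicted risk score (higher = worse predicted outcome)
    A pair is usable when the patient with the shorter time died.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable

def bootstrap_sample(n, rng):
    """Indices of one bootstrap sample: n draws with replacement."""
    return [rng.randrange(n) for _ in range(n)]

# Hypothetical toy data (not from the study).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.2, 0.4]

print("c-index:", harrell_c(times, events, risk))
idx = bootstrap_sample(len(times), random.Random(0))
print("bootstrap indices:", idx)
```

A value of 0.5 corresponds to random prediction and 1.0 to perfect ranking of patients by risk.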

**Tab. 5. Variables included in the 14 best models found in the 100 bootstrap samples created for internal validation.**

We then analyzed those patients for whom the predictions of the AJCC and the new model differed (survival probability in group S1: new model > AJCC; in group S2: new model < AJCC; group S3, remaining patients). We calculated the expected number of deaths at the observed follow-up time for each patient and compared this number with the observed number of deaths (Table S2). Only the new model provided an acceptable fit for all three groups. In Group S2 of Table S2 the number of deaths predicted in the AJCC model was significantly different from the number of observed deaths (p<0.0001).
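One common way to obtain an expected number of deaths from a survival model is to sum each patient's cumulative hazard, H = −ln S(t), evaluated at that patient's own follow-up time, and to compare the total with the observed count. The sketch below assumes this approach and a simple one-degree-of-freedom chi-square comparison; the exact procedure used for Table S2 is not specified in this section, so both choices, the toy numbers, and the function names are assumptions.

```python
import math

def expected_deaths(surv_at_followup):
    """Sum of cumulative hazards -ln S_i(t_i) over patients.

    surv_at_followup: model-predicted survival probability of each
    patient at that patient's observed follow-up time.
    """
    return sum(-math.log(s) for s in surv_at_followup)

def chi2_1df_p(observed, expected):
    """P-value of (O - E)^2 / E against a chi-square with 1 df."""
    x = (observed - expected) ** 2 / expected
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical predicted survival probabilities for four patients.
predicted_survival = [0.95, 0.90, 0.80, 0.60]
observed_deaths = 1

e = expected_deaths(predicted_survival)
p = chi2_1df_p(observed_deaths, e)
print("expected deaths:", e, "p-value:", p)
```

A large discrepancy between observed and expected deaths in a group yields a small p-value and indicates that the model fits that group poorly.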

Finally, we merged Groups S1 and S3 from Table S2, which left two groups. Group 1 now comprised patients for whom the novel model predicted better survival than the AJCC model and patients for whom the predictions of both models concurred. Patients for whom the novel model predicted a worse survival than the AJCC model formed Group 2.

Nearly 94% of AJCC low-risk patients (<IIB) were in Group 1 (Figure 6A and 6B). Nearly 29% of AJCC high-risk patients (>IIA) were in Group 2. Patients in Group 2 had higher DCCD values than patients in Group 1. The geometric means of DCCD + 1 are 2.23 (95% CI 1.98–2.51) in Group 1 and 80.53 (95% CI 59.54–108.94) in Group 2, respectively (p<0.0001; two-sample t-test for the logarithms).
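The geometric mean of DCCD + 1 and its confidence interval are obtained by averaging on the logarithmic scale and transforming back. The following sketch shows the calculation under the assumption of a normal quantile for the interval (a t quantile would be nearly identical for groups of this size); the input values and the function name are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def geometric_mean_ci(dccd_values, level=0.95):
    """Geometric mean of DCCD + 1 with a confidence interval.

    Works on log10(DCCD + 1), then back-transforms the mean and the
    interval limits. Uses a normal quantile (an assumption).
    """
    logs = [math.log10(v + 1) for v in dccd_values]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 10 ** m, 10 ** (m - z * se), 10 ** (m + z * se)

# Hypothetical DCCD values (cells per million), not study data.
gm, lower, upper = geometric_mean_ci([0, 9, 99])
print("geometric mean:", gm, "95% CI:", lower, upper)
```

Adding 1 before taking logarithms keeps DCC-negative patients (DCCD = 0) in the calculation.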

**Fig. 6. Differences in survival prediction between this study's model and the AJCC-based model.**

For the time points 3 and 6 y after sentinel lymph node biopsy we determined which model provided a better fit for the survival of patients in Group 1 and 2. Kaplan-Meier plots demonstrated that the predicted and observed survival curves diverge particularly for the AJCC prediction of Group 2 patients (Figure 6C). For both time points 3 and 6 y (p<0.001 and p<0.01, respectively) the AJCC model significantly deviated from Kaplan-Meier estimates for Group 2 patients, whereas our model correctly predicted the number of deaths (Table 6).

**Tab. 6. Kaplan-Meier estimates versus predicted deaths for a follow-up of 3 and 6 y by model and group.**

For the low-risk patients of Group 1, both models provided acceptable fits, although we noted a borderline p-value (p = 0.06) for the new model at 3 y. However, the fit for predicted and observed survival becomes excellent for the new model over time (Figure 6C)—in line with the need for longer observation periods in low-risk patients. Thus, at 6 y follow-up there is perfect agreement for the new model, whereas the AJCC model overestimates the number of deaths (Figure 6C and Table 6).

We assessed the goodness of fit of the model after grouping the patients into four groups according to their risk score, which was a linear combination of the logarithms of tumor thickness and DCCD and of ulceration status (see Methods). We compared predicted and observed survival curves and found that Kaplan-Meier curves and predicted curves were superimposable over the complete range of disease courses for all four risk groups and that all four groups differed significantly from each other (Figure 7). Finally, we compared the goodness of fit for several models (a model based on Cox regression analysis, a model that includes data on Melan-A staining, and a model based on the current AJCC criteria). In summary, we found that the parametric model based on thickness, DCCD, and ulceration most accurately predicted melanoma death (Table S3).

**Fig. 7. Goodness of fit of observed and predicted survival for four risk groups.**

Discussion

In this study, we quantified the number of DCCs per one million isolated lymph node cells (DCCD) and assessed its utility in predicting melanoma outcome. Based on a median follow-up of 49 mo, with 370 patients having follow-up times of more than 5 y, we found that at the time of sentinel node biopsy, quantitative assessment of DCCD predicted melanoma outcome by univariable and multivariable analysis in a large cohort of patients. Furthermore, quantitative DCCD showed a stronger association with outcome than qualitative conventional histopathology and, when combined with primary melanoma thickness and ulceration, had a synergistic impact on patient survival. Using these variables we developed a parametric model that proved to be the most accurate for predicting outcome. Although we currently lack an external validation cohort with long follow-up, the accepted prognostic role of sentinel lymph node spread [4] and the successful internal validation (bootstrap and goodness of fit) give credibility to the findings.

We found that even the detection of three or fewer DCCs per million leukocytes in the sentinel node increases the risk of death from melanoma at 5 y by 6 percentage points (8% for DCCD = 0 versus 14% for 0 < DCCD ≤ 3). This finding is in line with reports supporting the clinical relevance of single DCCs [5],[27] and the novel AJCC recommendation [4] to refrain from using a lower threshold for sentinel node spread. However, we also identified three shortcomings of the AJCC categorization approach. First, AJCC staging does not differentiate between isolated cancer cells and small and large microscopic metastases. Our data demonstrate that the number of cells matters over the full range of DCCD. Second, because any measured DCCD value can be translated into a 5-y survival rate, the typical exaggerations of categorizing staging systems, such as upgrading from stage II to stage III because of the detection of a single melanoma cell, are avoided, and individual disease courses can be accommodated better than by categorizing tumors based on the AJCC staging system. Third, we provide clinical evidence for the context dependency of the metastasis-forming potential of DCCs, which emerges from our observation that DCCD, tumor thickness, and ulceration (the leading prognostic factors from the multivariable analysis) can be combined in a parametric survival model where the prognostic value of a single DCC differs for thick and thin tumors. For example, we observed DCCs in 46% of T1 stage melanomas; however, 5-y survival rates are more than 90% in this subgroup of patients, indicating that under most conditions DCCs do not result in clinically relevant metastasis. This may suggest that cellular programs such as senescence or dormancy are activated at initial homing to distant sites [28],[29] but may be released once primary tumors grow large. Such a scenario has gained credibility since secreted factors of primary melanomas, such as exosomes, were shown to evoke substantial systemic effects [30] promoting metastasis.

In addition to DCCD and thickness of the primary melanoma, ulceration status had an impact on survival in melanoma. The biological interdependencies between destructive growth (ulceration), tumor-mass-induced systemic alterations (tumor thickness), and metastatic dissemination (DCCD) for progression of an individual melanoma may explain why histopathologically node-positive patients can have a better outcome than histopathologically node-negative patients if the primary melanoma has more favorable prognostic features. This phenomenon is not reflected within a categorizing staging system. For example, the current AJCC staging predicts a 5-y survival of 53% for stage IIC (T4bN0M0) and 70% for stage IIIa (T1-4N1aM0). In contrast, the estimated survival of a patient with tumor thickness 7.4 mm and DCCD = 0 but without ulceration is identical to the estimated survival of a patient with ulceration and tumor thickness 3.35 mm and DCCD = 2, or with tumor thickness 1.2 mm and DCCD = 772 in our model.

Since summary measures of survival may provide insufficient information about population dispersion, we asked whether the new model reflects the prognostic heterogeneity of patients more accurately. Indeed, we identified a group of patients at high risk for progression in whom the AJCC model underestimates the risk of death. Although this group of patients is relatively small (13% in our cohort), these patients will most likely benefit from adjuvant therapy, and the model may help to improve patient stratification for clinical trials. It also identified a group of very low risk patients who have an excellent long-term outcome and whose risk of dying is overestimated by the AJCC staging model.

We carefully evaluated the performance of our assay. As lymph node disaggregation destroys the tissue architecture, some morphological criteria to identify melanoma cells are lost. However, we deem it unlikely that benign nevus cells in sentinel nodes, described in up to 28% of melanoma patients [9], confound our conclusions. These cells rarely express gp100 [31], and likewise we could not find gp100-expressing cells in non-melanoma lymph nodes. While this does not rule out the possibility that truly DCC-negative sentinel nodes from melanoma patients may contain gp100-expressing benign nevus cells, our finding that even low numbers of gp100-positive cells are prognostically relevant would then suggest that gp100-positive nevus cells may be prognostically informative. Furthermore, in 97% of all analyzed gp100-positive cells we detected chromosomal or subchromosomal alterations. For only one patient out of 46 could we not confirm the malignant descent of the isolated cell. In all other cases genetic alterations in gp100-positive cells proved disseminated melanoma, suggesting that morphological criteria for DCC identification are dispensable.

Then, we directly addressed the question of whether staining for another antigen (Melan-A) increases the detection rate and the prognostic power of the gp100-based immunoassay. However, while Melan-A staining added a few samples (7%) to the gp100-identified positive lymph nodes, it also stained 8% of control nodes. Moreover, assessing the prognostic power of the combined results of gp100 and Melan-A staining for 710 patients, we found that the gp100-only model was more accurate.

Compared to our assay, evaluation of sentinel nodes by pathology has two major limitations. First, sensitivity largely depends on the number of slides examined. Second, quantification of lymphatic melanoma spread—a three-dimensional and often multilocular process—is impossible by histopathology. We resolved these problems by homogenizing the patchy spatial distribution of tumor cells within the node [32]—which greatly impacts detection in tissue sections but less so in our approach—and counting the stained cells. Thus, screening of a median of only two slides (2×10⁶ cells) per node revealed a detection rate of 51%, whereas pathology was positive in only 14% of patients. To achieve a similar sensitivity by histopathology, it has been suggested that more than 36 slides per sample need to be analyzed, indicating that immunocytology might be advantageous also for practical reasons [7],[12]. In the future, both lymph node preparation and screening may even be subjected to partial automation and thereby decrease workload further.

The high detection rate of immunocytology is reminiscent of the sensitivity of RT-PCR methods, which is also around 50% [33]. However, despite 20 y of clinical evaluation, RT-PCR assays have failed to become clinical routine. Since one of our major findings consists in the quantitative impact of lymphatic cancer cell dissemination for patient outcome, we deem the non-quantitative nature of RT-PCR assays and the failure to prove the malignant melanoma origin of the detected nucleic acids to be a likely explanation for its failure. RT-PCR assays do not measure cell numbers but transcript numbers, which may be generated by a few high-expressing cells or many low-expressing cells. Since the unit of selection during malignant progression is a cell and not a transcript, even quantitative transcript information will always represent a qualitative assessment of cancer spread. Therefore, RT-PCR methods are unable to provide the information delivered here that the prognostic weight of a single disseminated melanoma cell is context dependent.

In summary, we provide evidence that quantification of lymphatic cancer cell dissemination is feasible and can be combined with other quantitative and qualitative characteristics of the primary tumor for accurate individual outcome prediction, probably not only for melanoma but also for other types of solid cancer [34]. It will be important to validate the findings in an independent study before the assay and the prediction model are used clinically.

Supporting Information

References

1. Morton DL, Wen DR, Wong JH, Economou JS, Cagle LA, et al. (1992) Technical details of intraoperative lymphatic mapping for early stage melanoma. Arch Surg 127: 392–399.

2. Reintgen D, Cruse CW, Wells K, Berman C, Fenske N, et al. (1994) The orderly progression of melanoma nodal metastases. Ann Surg 220: 759–767.

3. Gershenwald JE, Soong SJ, Balch CM (2010) 2010 TNM staging system for cutaneous melanoma…and beyond. Ann Surg Oncol 17: 1475–1477.

4. Balch CM, Gershenwald JE, Soong SJ, Thompson JF, Atkins MB, et al. (2009) Final version of 2009 AJCC melanoma staging and classification. J Clin Oncol 27: 6199–6206.

5. Murali R, Desilva C, McCarthy SW, Thompson JF, Scolyer RA (2012) Sentinel lymph nodes containing very small (<0.1 mm) deposits of metastatic melanoma cannot be safely regarded as tumor-negative. Ann Surg Oncol 19: 1089–1099.

6. van Akkooi AC, de Wilt JH, Verhoef C, Schmitz PI, van Geel AN, et al. (2006) Clinical relevance of melanoma micrometastases (<0.1 mm) in sentinel nodes: are these nodes to be considered negative? Ann Oncol 17: 1578–1585.

7. van der Ploeg AP, van Akkooi AC, Schmitz PI, Koljenovic S, Verhoef C, et al. (2010) EORTC Melanoma Group sentinel node protocol identifies high rate of submicrometastases according to Rotterdam criteria. Eur J Cancer 46: 2414–2421.

8. van Diest PJ (1999) Histopathological workup of sentinel lymph nodes: how much is enough? J Clin Pathol 52: 871–873.

9. Abrahamsen HN, Hamilton-Dutoit SJ, Larsen J, Steiniche T (2004) Sentinel lymph nodes in malignant melanoma: extended histopathologic evaluation improves diagnostic precision. Cancer 100: 1683–1691.

10. Cochran AJ, Balda BR, Starz H, Bachter D, Krag DN, et al. (2000) The Augsburg Consensus. Techniques of lymphatic mapping, sentinel lymphadenectomy, and completion lymphadenectomy in cutaneous malignancies. Cancer 89: 236–241.

11. Starz H, Balda BR, Kramer KU, Buchels H, Wang H (2001) A micromorphometry-based concept for routine classification of sentinel lymph node metastases and its clinical relevance for patients with melanoma. Cancer 91: 2110–2121.

12. Murali R, Thompson JF, Scolyer RA (2010) Location of melanoma metastases in sentinel lymph nodes: what are the implications for histologic processing of sentinel lymph nodes in routine practice? Am J Surg Pathol 34: 127–129.

13. Bagaria SP, Faries MB, Morton DL (2010) Sentinel node biopsy in melanoma: technical considerations of the procedure as performed at the John Wayne Cancer Institute. J Surg Oncol 101: 669–676.

14. Morton DL, Thompson JF, Cochran AJ, Mozzillo N, Elashoff R, et al. (2006) Sentinel-node biopsy or nodal observation in melanoma. N Engl J Med 355: 1307–1317.

15. Prieto VG, Clark SH (2002) Processing of sentinel lymph nodes for detection of metastatic melanoma. Ann Diagn Pathol 6: 257–264.

16. Scolyer RA, Murali R, McCarthy SW, Thompson JF (2008) Pathologic examination of sentinel lymph nodes from melanoma patients. Semin Diagn Pathol 25: 100–111.

17. Ulmer A, Fischer JR, Schanz S, Sotlar K, Breuninger H, et al. (2005) Detection of melanoma cells displaying multiple genomic changes in histopathologically negative sentinel lymph nodes. Clin Cancer Res 11: 5425–5432.

18. Klein CA, Blankenstein TJ, Schmidt-Kittler O, Petronio M, Polzer B, et al. (2002) Genetic heterogeneity of single disseminated tumour cells in minimal residual cancer. Lancet 360: 683–689.

19. Klein CA, Schmidt-Kittler O, Schardt JA, Pantel K, Speicher MR, et al. (1999) Comparative genomic hybridization, loss of heterozygosity, and DNA sequence analysis of single cells. Proc Natl Acad Sci U S A 96: 4494–4499.

20. Baudis M, Cleary ML (2001) Progenetix.net: an online repository for molecular cytogenetic aberration data. Bioinformatics 17: 1228–1229.

21. Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Hoboken (New Jersey): Wiley. 321 p.

22. Schwarz G (1978) Estimating the dimension of a model. Ann Statist 6: 461–464.

23. Harrell FE Jr, Lee KL, Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15: 361–387.

24. Otto E (1963) Nomography. New York: Macmillan.

25. McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, et al. (2005) Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst 97: 1180–1184.

26. Machado SG, Robinson GA (1994) A direct, general approach based on isobolograms for assessing the joint action of drugs in pre-clinical experiments. Stat Med 13: 2289–2309.

27. Scheri RP, Essner R, Turner RR, Ye X, Morton DL (2007) Isolated tumor cells in the sentinel node affect long-term prognosis of patients with melanoma. Ann Surg Oncol 14: 2861–2866.

28. Braumuller H, Wieder T, Brenner E, Assmann S, Hahn M, et al. (2013) T-helper-1-cell cytokines drive cancer into senescence. Nature 494: 361–365.

29. Klein CA (2011) Framework models of tumor dormancy from patient-derived observations. Curr Opin Genet Dev 21: 42–49.

30. Peinado H, Aleckovic M, Lavotshkin S, Matei I, Costa-Silva B, et al. (2012) Melanoma exosomes educate bone marrow progenitor cells toward a pro-metastatic phenotype through MET. Nat Med 18: 883–891.

31. Prieto VG (2012) Cutaneous melanocytic lesions: do not miss the invisible gorilla. Adv Anat Pathol 19: 263–269.

32. Riber-Hansen R, Nyengaard JR, Hamilton-Dutoit SJ, Steiniche T (2009) The nodal location of metastases in melanoma sentinel lymph nodes. Am J Surg Pathol 33: 1522–1528.

33. Mocellin S, Hoon DS, Pilati P, Rossi CR, Nitti D (2007) Sentinel lymph node molecular ultrastaging in patients with melanoma: a systematic review and meta-analysis of prognosis. J Clin Oncol 25: 1588–1595.

34. Schilling D, Hennenlotter J, Sotlar K, Kuehs U, Senger E, et al. (2011) Quantification of tumor cell burden by analysis of single cell lymph node disaggregates in metastatic prostate cancer. Prostate 70: 1110–1118.

35. Czyz ZT, Hoffmann M, Schlimok G, Polzer B, Klein CA (2014) Reliable single cell array CGH for clinical samples. PLoS ONE 9: e85907.