Standardization of the Czech Version of the Tower of London Test – Administration, Scoring, Validity

Authors: J. Michalec 1;  O. Bezdicek 2;  T. Nikolai 2;  P. Harsa 1;  H. Zaloudkova 3;  E. Ruzicka 2;  T. Shallice 4
Authors’ workplace: 1st Faculty of Medicine, Charles University, and General University Hospital in Prague: Department of Psychiatry 1;  Department of Neurology and Centre of Clinical Neuroscience 2;  Department of Psychology, Arts Faculty, Masaryk University, Brno 3;  Institute of Cognitive Neuroscience, University College London 4
Published in: Cesk Slov Neurol N 2014; 77/110(5): 596-601
Category: Original Paper
doi: 10.14735/amcsnn2014596


The aim of the study was to standardize (test construction, administration and scoring) the Czech version of the Tower of London test (TOL) developed by Tim Shallice in 1982. We sought to determine the potential of the TOL to differentiate between patients with Parkinson’s disease mild cognitive impairment (PD-MCI) and control participants, and to provide preliminary normative data.

TOL is a measure of planning and problem solving ability, subsumed under executive functions. There are several standardized TOL versions available. The original is the version developed by Shallice in 1982. Standardization of its Czech version is still lacking.

A sample of 76 patients with idiopathic Parkinson’s disease (PD) underwent a neuropsychological diagnostic procedure for PD-MCI. Thirty-five PD patients met the criteria for PD-MCI. These were matched by age and education with 70 subjects from a control sample (CS).

There was a statistically significant difference between PD-MCI and CS in planning ability in both scoring systems (S1 and S2) proposed by Shallice: S1 (p = 0.004) and S2 (p < 0.001). The area under the curve was 0.64 for S1 and 0.73 for S2. Only S1 correlated significantly with education (p = 0.02); otherwise, TOL performance was unrelated to age and education.

This study standardizes the Czech version of the TOL. Our findings support the discriminative validity of the Czech version of the TOL in a classical model of executive dysfunction represented by PD-MCI. We provide preliminary normative data for elderly people, thus enabling an estimation of planning deficit.

Key words:
Tower of London – mild cognitive impairment – Parkinson’s disease – planning – validity


Planning is a mental process that is necessary for correct execution of a number of activities of daily living, such as organization of work, development of a plan for locomotion or preparation of meals. It is one of the basic processes involved in effective control of activity and action [1], and one of the executive functions [2] that are activated in goal-directed behaviour. The clinical test most often used to assess planning is the Tower of London (TOL) [3–5]. TOL is also used to diagnose a broader range of executive function components. Among other areas, the test contributes to assessing the ability to initiate activity, working memory, implicit learning, monitoring and self-regulation, and the ability to inhibit interference [3,6].

The literature describes several standardized versions of TOL [7–9] that differ partly in the physical appearance of the apparatus, the administration of the test or its scoring. These include the so-called “Stockings of Cambridge” [10]; even though this is visually rather different, it is conceptually identical to the original TOL. The original TOL version was developed by Tim Shallice in 1982 and was first intended for experimental purposes as a test of planning ability in patients with frontal lobe lesions [5]. This TOL version has remained experimental in the Czech environment, i.e. it has not been standardized (sizes and colour of the apparatus, manner of administration and scoring) [11,12]. Internationally available versions have already been standardized [13]. Shallice’s version is not licensed, and its administration and scoring take about 15 minutes.

Standardization of the Czech version of TOL for clinical use is thus one of the principal steps towards effective and more precise assessment of executive deficit. The objectives of this study are:

  • a) to serve as a standardization study for TOL (Shallice’s version) including standardization of the apparatus, description of the test administration and scoring;
  • b) to provide data on the discriminant validity in healthy subjects as compared to the classic model of executive function damage in Parkinson’s disease;
  • c) to provide basic comparative data for elderly healthy individuals to estimate the degree and profile of planning deficit.

Patient sample and method

Our sample of patients with mild cognitive impairment (MCI) in Parkinson’s disease (PD) (PD-MCI) was composed of patients with PD treated at the Department of Neurology, 1st Faculty of Medicine, Charles University and General University Hospital (GUH) who underwent neuropsychological assessment as part of routine cognitive dysfunction screening. All patients signed informed consent approved by the Ethics Committee of GUH. All included patients satisfied the Clinical Diagnostic Criteria of the UK Parkinson’s Disease Society Brain Bank [14] and were in the “on” state at the time of evaluation. Disease duration and the doses used to quantify antiparkinson medication were taken from patients’ medical records. A total of 16 patients were treated with L-DOPA only, 51 received the combination of L-DOPA and dopamine agonists, six patients received dopamine agonists only, and seven patients did not receive any antiparkinson medication at the time of inclusion in the study. The total dose of dopaminergic treatment was calculated by transforming dopamine agonist doses to L-DOPA equivalents [15]. A trained neurologist administered the UPDRS scale prior to neuropsychological evaluation, or the scale was verified from the patient’s medical records (Tab. 1). The patients were also administered a standardized battery for neuropsychological assessment of PD-MCI [16] that was composed of the following tests: overall cognitive performance (level I): Mattis Dementia Rating Scale (MDRS); (level II): 1. Attention and working memory: Trail Making Test (TMT), Digit Span backwards from WAIS-III; 2. Executive functions: Stroop test (Victoria version; VST), phonemic fluency (letters N, K, P); 3. Speech: WAIS-R Similarities and semantic fluency (animals, clothing, shopping); 4. Memory: Rey Auditory Verbal Learning Test (RAVLT), Family Pictures from WMS-III; 5. Visuospatial abilities: Benton Judgement of Line Orientation Test (BJOL), Clock Drawing Test (CDT, CLOX version; Tab. 2).
Of a total of 76 patients with idiopathic PD, 35 satisfied the criteria of Litvan et al., level II [16] for inclusion in the PD-MCI group. Forty-one patients with PD were not included in the analysis because they either had no significant cognitive performance deficits or, on the contrary, exhibited even more severe deficits in the neuropsychological battery in combination with impairment of activities of daily living (ADL). None of the patients was experiencing delirium at the time of evaluation or suffered from a depressive disorder or any other concurrent abnormality that would call the diagnosis of PD into question.

The control sample (CS) satisfied the following history and testing criteria. Individuals who had a brain injury, a severe neurological or psychiatric disease affecting the CNS, psychoactive substance abuse, repeated anaesthesia or current use of medications or substances affecting the CNS were excluded. In terms of test performance, subjects from the control sample had the following results: MMSE > 26 points, MDRS > 136 points [17], FAB > 15 points [18]. Their performance also did not fall below –1 SD of the given norm in more than one of three tests of executive functions: TMT (condition B) [19], phonemic verbal fluency (letters N, K, P) [20] and the interference condition of VST [21]. The subjects also completed questionnaires to rule out the impact of depression on mental performance (Beck Depression Inventory, second edition ≤ 12 points [22]) and to exclude individuals with impairment of activities of daily living (Functional Activities Questionnaire, FAQ, self-assessment ≤ 4 points [23]). All subjects in the CS signed informed consent; their examination was performed partly as part of a diploma thesis at Masaryk University and partly as part of the GAUK grant and The Alzheimer Foundation. The PD-MCI sample comprised n = 35 patients, who were matched by age and education with 70 individuals from the control sample.
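The inclusion thresholds for the control sample can be read as a simple screening rule. The following is a minimal sketch of that rule, not the authors' procedure: the `Screening` record, its field names and the convention that executive test results are supplied as z-scores in which higher means better performance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Screening:
    """Hypothetical record of one candidate for the control sample."""
    mmse: int
    mdrs: int
    fab: int
    # z-scores against the respective norms for TMT-B, phonemic fluency and
    # VST interference; assumed to be oriented so that higher = better.
    exec_z: Tuple[float, float, float]
    bdi_ii: int
    faq: int
    history_exclusion: bool = False  # brain injury, CNS disease, abuse, etc.


def eligible_control(s: Screening) -> bool:
    """Apply the thresholds stated in the text to one candidate."""
    tests_below_minus_1sd = sum(1 for z in s.exec_z if z < -1.0)
    return (
        not s.history_exclusion
        and s.mmse > 26
        and s.mdrs > 136
        and s.fab > 15
        and tests_below_minus_1sd <= 1  # at most one of three tests below -1 SD
        and s.bdi_ii <= 12
        and s.faq <= 4
    )


candidate = Screening(mmse=29, mdrs=140, fab=17,
                      exec_z=(0.2, -1.3, 0.1), bdi_ii=5, faq=1)
print(eligible_control(candidate))
```

For timed measures such as TMT-B, where a longer time is worse, the z-score would need to be sign-inverted before being passed in under this convention.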

The patients with the diagnosis of PD-MCI and individuals from the control sample were also administered, in addition to the above testing methods, Shallice’s version of TOL [5]. TOL is a three-dimensional, non-verbal test of executive functions, especially of planning. It is composed of three pegs of different heights and three beads of different colours. A limited number of beads can be put on each of the pegs. The tested individual is asked to reproduce target arrangements presented by the administrator on cards, always starting from the same bead arrangement on the pegs and using a limited number of moves. A detailed description of test administration is provided in the Appendix (English and Czech versions of the Appendix are accessible in the online version of the article at:

The study verifies both TOL scoring systems created by Shallice [5] (see the Appendix for more detail). System 1 (S1), the planning effectiveness score, expresses the number of tasks, out of a total of 12, that were solved correctly (without an error) at the first attempt. System 2 (S2), the time score, is obtained as the sum of points assigned based on the time needed to successfully solve a task.
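The two scores can be sketched as simple aggregations over the 12 tasks. This is an illustrative sketch only: the `TaskResult` record and its field names are hypothetical, and the time-to-points conversion is deliberately left out because it is defined in the Appendix, not here. The per-task points in the example are placeholder values.

```python
from dataclasses import dataclass
from typing import List

N_TASKS = 12  # number of TOL tasks stated in the text


@dataclass
class TaskResult:
    """Hypothetical per-task record filled in by the examiner."""
    solved_first_attempt: bool  # solved at the first attempt without error
    time_points: int            # points already assigned per the Appendix


def score_s1(results: List[TaskResult]) -> int:
    """S1: number of tasks solved correctly at the first attempt."""
    if len(results) != N_TASKS:
        raise ValueError(f"expected {N_TASKS} tasks, got {len(results)}")
    return sum(1 for r in results if r.solved_first_attempt)


def score_s2(results: List[TaskResult]) -> int:
    """S2: sum of the time-based points over all tasks."""
    if len(results) != N_TASKS:
        raise ValueError(f"expected {N_TASKS} tasks, got {len(results)}")
    return sum(r.time_points for r in results)


# Placeholder protocol: 9 tasks solved at first attempt, 3 not.
protocol = ([TaskResult(True, 3)] * 9) + ([TaskResult(False, 1)] * 3)
print(score_s1(protocol), score_s2(protocol))
```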

Data collection was conducted from January 2008 to March 2013. The data were analysed statistically using IBM SPSS for Windows.

Differences between the groups were compared using parametric methods of inferential statistics. The discriminatory potential of TOL to detect cognitive deficit in PD (PD-MCI, from –1 SD to –2 SD at level II) versus CS was evaluated with binary logistic regression. A receiver operating characteristic (ROC) curve was constructed, the area under the ROC curve (AUC) was computed including the 95% confidence interval (CI), and sensitivity, specificity and likelihood ratios (LR+ and LR–) were computed for individual values of the raw scores S1 and S2. The relationship between performance in TOL and the demographic variables age and number of years of education was also assessed using Pearson correlations. For statistically significant relationships, a regression equation was computed to determine expected performance.
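For readers unfamiliar with these indices, the sketch below shows how AUC, sensitivity, specificity and the likelihood ratios relate to raw scores. It is not the SPSS procedure used in the study: the AUC is computed directly from its rank interpretation (the proportion of patient-control pairs in which the patient scores lower, ties counted as one half), no confidence interval is computed, and the scores are invented toy values. The function names are hypothetical.

```python
from typing import List, Tuple


def auc_lower_is_impaired(patients: List[float], controls: List[float]) -> float:
    """AUC as the share of (patient, control) pairs ranked correctly.

    A pair counts as 1 when the patient scores lower than the control
    and as 0.5 when the two scores are tied.
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


def cutoff_stats(patients: List[float], controls: List[float],
                 cutoff: float) -> Tuple[float, float, float, float]:
    """Sensitivity, specificity, LR+ and LR- when score <= cutoff is 'positive'."""
    true_pos = sum(1 for p in patients if p <= cutoff)
    true_neg = sum(1 for c in controls if c > cutoff)
    sens = true_pos / len(patients)
    spec = true_neg / len(controls)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return sens, spec, lr_pos, lr_neg


# Invented toy scores, not data from the study.
toy_patients = [22, 24, 25, 27, 29]
toy_controls = [25, 27, 28, 30, 31, 32]
print(auc_lower_is_impaired(toy_patients, toy_controls))
print(cutoff_stats(toy_patients, toy_controls, cutoff=26))
```

Repeating `cutoff_stats` for every observed raw score yields the kind of per-cutoff table that the study reports for S1 and S2.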


Descriptive statistics

Tab. 1 reports the descriptive statistics for the sample of patients diagnosed with PD-MCI and for the control sample. The samples were not found to be significantly different for mean age and mean number of years of education. The PD-MCI sample had lower representation of women than the control sample (Tab. 1). Performance characteristics of patients with PD-MCI and of controls are presented in Tab. 2.

Tab. 1. Descriptive statistics of the PD-MCI and CS samples.

Tab. 2. Descriptive statistics of CS and PD-MCI cognitive performance (from –1 SD to –2 SD, level II) according to the neuropsychological battery (level I and level II).

Inferential statistics

As shown in Tab. 3, mean and median values are close for both TOL scores in the CS. Moreover, the values of skewness and kurtosis approach zero. Based on these characteristics and Q-Q plots, we consider the distribution of TOL scores to be normal. We, therefore, used parametric methods to compare between-group differences. The t-test for independent samples revealed statistically significant differences between CS and PD-MCI samples in terms of performance on TOL, both in S1 score (t = 2.982; p = 0.004) and S2 score (t = 4.272; p < 0.001).

Tab. 3. Descriptive statistics of the TOL raw scores in PD-MCI and CS.

ROC analysis for PD‑MCI

The discriminatory ability of S1 and S2 scores to detect PD-MCI versus cognitive health is illustrated by the graph in Fig. 1, where the AUC for the S1 score is 0.643 (95% CI, 0.531–0.754). The AUC for the S2 score is 0.731 (95% CI, 0.230–0.840). Similar insight into the discriminatory ability of individual values of the raw scores S1 and S2 is provided in Tab. 4.

Fig. 1. ROC curve of TOL S1 and S2 scores (PD-MCI vs. CS).
S1 – planning effectiveness score, the number of tasks, out of the total of 12, solved correctly at the first trial without errors; S2 – time score, the sum of points assigned according to the time needed for a successful solution of the task.

Tab. 4. Discriminatory potential of the TOL scores (PD-MCI vs. CS).

Dependence of performance in TOL on age and education

The CS was used to evaluate the relationship between performance on TOL and the factors of age and education using Pearson’s correlation. The only significant relationship was found between the S1 score and the number of years of education (r = 0.278; p = 0.020). Other relationships were weak and non-significant: S1 vs. age (r = –0.013; p = 0.917), S2 vs. education (r = 0.155; p = 0.200), S2 vs. age (r = 0.071; p = 0.561).
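As a plausibility check, the significance of a Pearson correlation can be recovered from r and the sample size alone. The sketch below assumes n = 70 (the size of the CS stated in the text) and uses two standard devices: the exact t statistic with n − 2 degrees of freedom, and a normal approximation to the two-sided p-value via Fisher's z transformation. The approximation is an assumption of this sketch and will not reproduce the reported p-values to the last digit; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def r_to_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fisher_z_p(r: float, n: int) -> float:
    """Approximate two-sided p-value using Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


n_cs = 70  # control sample size taken from the text
for label, r in [("S1 vs. education", 0.278), ("S1 vs. age", -0.013),
                 ("S2 vs. education", 0.155), ("S2 vs. age", 0.071)]:
    print(f"{label}: t = {r_to_t(r, n_cs):.3f}, "
          f"approx. p = {fisher_z_p(r, n_cs):.3f}")
```

Under these assumptions only the S1 vs. education correlation falls below the 0.05 level, in agreement with the pattern reported above.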

Comparative data

In these analyses, the S2 score was found to be independent of both age and education. The distribution of this score can also be regarded as normal. Therefore, the mean (28.510) and SD (3.035) of the S2 distribution in the CS are considered representative for the population of adults from 36 to 71 years of age and with 9 to 21 years of education. The S1 score has a weak but significant relationship with education. Therefore, we computed a regression equation for the relationship between the expected S1 score and years of education: Expected value of S1 = 6.887 + 0.1438 * education.
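The comparative data can be applied to an individual result as follows. This is a minimal sketch using only the constants reported above; the function names, the example patient and the decision to raise an error outside the 9 to 21 year education range of the control sample are assumptions of the sketch, not part of the study.

```python
S2_MEAN = 28.510      # CS mean of the S2 time score (from the text)
S2_SD = 3.035         # CS standard deviation of the S2 time score
S1_INTERCEPT = 6.887  # regression intercept for S1 (from the text)
S1_SLOPE = 0.1438     # regression slope per year of education
EDU_RANGE = (9, 21)   # years of education covered by the control sample


def s2_z_score(raw_s2: float) -> float:
    """Standardize a raw S2 score against the control sample."""
    return (raw_s2 - S2_MEAN) / S2_SD


def expected_s1(years_of_education: float) -> float:
    """Expected S1 score for a given number of years of education."""
    low, high = EDU_RANGE
    if not low <= years_of_education <= high:
        raise ValueError("education outside the range of the control sample")
    return S1_INTERCEPT + S1_SLOPE * years_of_education


# Hypothetical examinee: 12 years of education, raw S2 of 24.
print(round(expected_s1(12), 2))  # expected S1 at 12 years of education
print(round(s2_z_score(24), 2))   # negative values lie below the CS mean
```

The sketch does not attach any clinical cut-off to the resulting z-score; the study reports the mean and SD but leaves interpretation to the examiner.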


Several standardized versions of TOL are available worldwide that differ with respect to the apparatus used, administration and scoring of the test. So far, none of these versions has been standardized in the Czech Republic, and a standardized TOL has, therefore, not been available. In this study, we validated Shallice’s original version [5] of TOL, on which all other TOL versions are based. So far, only the experimental version of TOL has been available in the Czech Republic [8]. In the Appendix, we provide a detailed description of the apparatus used in the test, test administration and scoring; this description can be used to construct the test relatively simply and make it operational. This version is not subject to a licence and, therefore, purchase costs are low and administration easy.

The objective of this study was to establish whether the planning effectiveness score (S1), expressing the number of tasks solved at first attempt, and the time score of planning (S2) have the capacity to discriminate between the CS and PD-MCI group. The results show that both TOL scores differ significantly between the two groups. A more detailed analysis of the differentiation potential (ROC analysis) of both TOL scores revealed that the planning time score (S2), which represents the sum of score points assigned for individual tasks according to the overall time needed for their successful completion, provides a better tool for discriminating between the groups. The discriminative potential of the planning effectiveness score (S1) is low. It should be mentioned that the experimental group of patients with PD-MCI was created using the gold standard for measurement of cognitive performance, i.e. with the standardized neuropsychological battery for PD-MCI [16]. It is, therefore, possible that the diagnosis of PD-MCI could have been made based on substandard performance in other cognitive domains as well (attention and working memory, visuoconstruction, speech and memory domains). Within this cognitive spectrum, PD-MCI comprises several subtypes, according to which individual patients were classified as having PD-MCI. This could explain the sensitivity and specificity levels found in the study. Another important argument is the nature of PD-MCI as a diagnostic unit (entity). It is generally understood as an incipient stage of the process of cognitive deterioration in patients with PD and as a possible antecedent to dementia [16], though the cognitive deficits are usually not as extensive as in PD with dementia syndrome. The discriminating ability of any test will then be lower as a result of greater similarity and overlap between the PD-MCI group and the group of normally ageing individuals.
These results are in line with the results of a similar study by Owen et al. [24], particularly with their “medicated PD patients severe” group (cf. mean MMSE in Owen’s study and DRS in this study). However, due to different test batteries and PD-MCI diagnostic procedures, more detailed conclusions cannot be drawn [10,24,25].

We did not identify any study that would enable us to compare findings with respect to the relatively low discriminative ability of TOL found in the present study. This is most likely because the diagnostic procedure to determine PD-MCI employed in our study has only recently been standardized [16].

The present study has also shown that the planning ability expressed by both TOL scores is independent of age. Moreover, the time score of planning (S2) did not correlate with education, though a weak relationship was found for the planning effectiveness score (S1). Based on this, we report basic comparative data for healthy elderly individuals for estimating the degree and profile of planning deficit. In addition, we report a regression equation for the S1 score that can be used to calculate the expected number of TOL tasks solved at first attempt in relation to the number of years of education. The equation makes it possible to compare any future performance in TOL with the control sample from our study. To our knowledge, this regression equation represents the first comparative standard for the assessment of TOL performance in elderly individuals available in the Czech Republic.

The study is, of course, subject to limitations. First, a convenience sample, rather than a random sample, was used for the control group. Second, this study employed a cross-sectional design; neither the patients nor the control group were followed for extended time periods that would enable us to assess progression of changes associated with deepening PD-MCI deficits. Third, as there are no Czech normative data for the entire neuropsychological battery, we had to use meta-analytic norms for some of the tests [26].

In conclusion, this study demonstrates that TOL is a sensitive diagnostic instrument to evaluate executive functions, namely the planning ability. The study provides all necessary information on the testing apparatus, administration and scoring and should make it possible, together with the reported comparative data and regression equation, to use the Czech version of the test in clinical practice.

Standardization of the Czech Version of the Tower of London Test - appendix (Czech version)

Standardization of the Czech Version of the Tower of London Test - appendix

The authors declare they have no potential conflicts of interest concerning drugs, products, or services used in the study.

The Editorial Board declares that the manuscript met the ICMJE “uniform requirements” for biomedical papers.

Prof. Evzen Ruzicka, M.D., DrSc., FCMA

Department of Neurology and Centre of Clinical Neuroscience

Charles University in Prague

1st Faculty of Medicine and General University Hospital in Prague

Katerinska 30, Praha 2, CZ-12000


Accepted to review: 4. 12. 2013

Accepted to print: 7. 4. 2014


1. Miller L, Cummings JL. Conceptual and Clinical Aspects of the Frontal Lobes. In: Miller L, Cummings JL (eds). The human frontal lobes: functions and disorders. 2nd ed. New York: Guilford Press 2006: 12–21.

2. Fuster J. The Prefrontal Cortex. 4th ed. London: Academic Press 2008.

3. Krch D. Tower of London. In: Kreutzer J, DeLuca J, Caplan B (eds). Encyclopedia of Clinical Neuropsychology. New York: Springer 2011: 2530–2532.

4. Zillmer EA, Spiers MV, Culbertson W. Principles of Neuropsychology. Belmont, California: Cengage Learning 2007.

5. Shallice T. Specific impairments of planning. Philos Trans R Soc Lond B Biol Sci 1982; 298(1089): 199–209.

6. Hinz AM, Kostov A, Kneißl F, Sürer F, Danek A. A mathematical model and a computer tool for the Tower of Hanoi and Tower of London puzzles. Inform Sci 2009; 179: 2934–2947.

7. Schnirman GM, Welsh MC, Retzlaff PD. Development of the Tower of London-Revised. Assessment 1998; 5(4): 355–360.

8. Kafer KL, Hunter M. On testing the validity of planning/problem-solving tasks in a normal population. J Int Neuropsychol Soc 1997; 3(2): 108–119.

9. Culbertson WC, Zillmer EA. Tower of London-Drexel (TOL-DX), Technical Manual. 2nd ed. Chicago, IL: Multi-Health Systems 2005.

10. Robbins TW, James M, Owen AM, Sahakian BJ, McInnes L, Rabbitt PM. Cambridge Neuropsychological Test Automated Battery (CANTAB): a factor analytic study of large sample of normal elderly volunteers. Dementia 1994; 5(5): 266–281.

11. Kulišťák P. Metodický materiál pro stáže v neuropsychologii. Interní tisk. Praha: Katedra neurologie IPVZ 1997.

12. Kulišťák P. Struktura kognitivního deficitu u amyotrofické laterální sklerózy. Brno: Masarykova univerzita 2007.

13. Anderson P, Anderson V, Lajoie G. The Tower of London Test: validation and standardization for pediatric populations. Clin Neuropsychol 1996; 10: 54–65.

14. Hughes AJ, Daniel SE, Kilford L, Lees AJ. Accuracy of clinical diagnosis of idiopathic Parkinson’s disease: a clinico-pathological study of 100 cases. J Neurol Neurosurg Psychiatry 1992; 55(3): 181–184.

15. Tomlinson CL, Stowe R, Patel S, Rick C, Gray R, Clarke CE. Systematic review of levodopa dose equivalency reporting in Parkinson’s disease. Mov Disord 2010; 25(15): 2649–2653. doi: 10.1002/mds.23429.

16. Litvan I, Goldman JG, Tröster AI, Schmand BA, Weintraub D, Petersen RC et al. Diagnostic criteria for mild cognitive impairment in Parkinson’s disease: Movement Disorder Society Task Force guidelines. Mov Disord 2012; 27(3): 349–356. doi: 10.1002/mds.24893.

17. Dubois B, Burn D, Goetz C, Aarsland D, Brown RG, Broe GA et al. Diagnostic procedures for Parkinson’s disease dementia: recommendations from the movement disorder society task force. Mov Disord 2007; 22(16): 2314–2324.

18. Dubois B, Slachevsky A, Litvan I, Pillon B. The FAB: a Frontal Assessment Battery at bedside. Neurology 2000; 55(11): 1621–1626.

19. Bezdicek O, Motak L, Axelrod BN, Preiss M, Nikolai T, Vyhnalek M et al. Czech Version of the Trail Making Test: normative data and clinical utility. Arch Clin Neuropsych 2012; 27(8): 906–914. doi: 10.1093/arclin/acs084.

20. Tombaugh TN, Kozak J, Rees L. Normative data stratified by age and education for two measures of verbal fluency: FAS and Animal Naming. Arch Clin Neuropsych 1999; 14(2): 167–177.

21. Troyer AK, Leach L, Strauss E. Aging and response inhibition: Normative data for the Victoria Stroop Test. Neuropsychol Dev Cogn B Aging Neuropsychol Cogn 2006; 13(1): 20–35.

22. Beck AT, Steer RA, Brown GK. Beck Depression Inventory-II. San Antonio, TX: Pearson 1996.

23. Bezdíček O, Lukavský J, Preiss M. Functional Activities Questionnaire, Czech Version – a validation study. Cesk Slov Neurol N 2011; 74/107(1): 36–42.

24. Owen AM, James M, Leigh PN, Summers BA, Marsden CD, Quinn NP et al. Frontostriatal cognitive deficits at different stages of Parkinson’s disease. Brain 1992; 115(6): 1727–1751.

25. Owen AM. Cognitive planning in humans: neuropsychological, neuroanatomical and neuropharmacological perspectives. Prog Neurobiol 1997; 53(4): 431–450.

26. Mitrushina M, Boone KB, Razani J, D’Elia LF. Handbook of normative data for neuropsychological assessment. 2nd ed. New York: Oxford University Press 2005.

27. Krikorian R, Bartok J, Gay N. Tower of London: a standard method and developmental data. J Clin Exp Neuropsychol 1994; 16(6): 840–850.

Article was published in Czech and Slovak Neurology and Neurosurgery, 2014, Issue 5
