About CADDementia

Computer-aided diagnosis methods based on structural brain MRI may improve the early diagnosis of dementia. Promising results of MRI-based diagnostic classification have been presented in the literature, but frequently the methods are optimized for specific data sets. It is unclear how the different algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no more opportunity to adapt the algorithm to the data at hand. To address this, we are organizing a “grand challenge” to objectively validate and compare methods for computer-aided diagnosis based on structural brain MRI scans (Bron et al., 2015). A standardized evaluation framework is set up, in which methods can be validated with the same measures on the same multi-center data set. Research groups are invited to run their classification methods on the data and to submit the results. The results are published on this web site.


1         Objective

The objective of this challenge is to compare computer-aided diagnosis methods for dementia objectively. We aim to evaluate algorithms developed by international research teams that perform multi-class classification of patients with Alzheimer’s disease (AD), patients with mild cognitive impairment (MCI) and healthy controls (CN). For this challenge, we developed an evaluation framework that provides a standardized evaluation methodology and a reference database containing multi-center structural MRI data, clinical diagnoses and demographic information.

2         Scope

This project aims to compare algorithms for computer-aided diagnosis of dementia based on structural MRI data. To investigate how the algorithms would perform in clinical practice, participants run their algorithms on a new, clinically representative multi-center data set to classify subjects as AD, MCI or controls. The algorithms can be trained and tuned on any suitable data (e.g. data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database). Additionally, we provide a small set of representative training data with diagnostic labels, which teams can use to get an indication of the performance of their algorithm before they submit the predicted diagnostic labels for the test data.

For evaluation of the algorithms, the current clinical diagnosis criteria for AD (McKhann et al., 2011) and MCI (Petersen, 2004) are used, which is common practice in studies of computer-aided diagnosis methods (Cuingnet et al., 2011; Davatzikos et al., 2008; Duchesne et al., 2008; Fan et al., 2008a, 2008b; Gray et al., 2013; Klöppel et al., 2008; Koikkalainen et al., 2012; Magnin et al., 2009; Vemuri et al., 2008; Wolz et al., 2011). A ground truth diagnosis of dementia can only be established at autopsy and is therefore rarely available. Of the previously mentioned papers, only one included a group of 20 AD patients with an autopsy-confirmed diagnosis (Klöppel et al., 2008). Amyloid imaging (Klunk et al., 2004) has also proven to be a good biomarker for AD, as subjects with a positive amyloid outcome were shown to have more rapid disease progression (Jack et al., 2010). However, availability of these data is also very limited.

For the current challenge, we therefore chose to use the current clinical diagnosis as the reference standard for evaluation. Using the clinical diagnosis has limitations: pathology is not included, and the diagnosis might be incorrect in some cases.

Another limitation of the clinical diagnosis is that all MCI patients are classified as one group. MCI is known to be a heterogeneous group: some cases will not progress to AD, and the stage of disease progression differs between patients. As in current clinical practice, MCI patients in our data set are diagnosed as one group. For inclusion of MCI patients, the criteria of Petersen (2004) were used, which include the presence of memory complaints as a criterion. Therefore, similar to ADNI, only amnestic MCI patients were included in the challenge data.

Treating the differentiation of AD patients, MCI patients and controls as a multi-class classification problem might not be optimal, as the classes are ordered: classifying an AD patient as an MCI patient is a less severe error than classifying that patient as healthy. We found only one paper using ordinal ranking for this problem (Fan, 2011). Because the current clinical diagnosis uses the three classes, we chose to focus on multi-class classification in this challenge.
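The effect of ignoring the class ordering can be illustrated with a small sketch. The ordinal encoding (CN=0, MCI=1, AD=2), the function names and the example labels below are assumptions made for illustration; the ordinal error shown here is not part of the challenge evaluation.

```python
# Hypothetical ordinal encoding of the three diagnostic classes.
ORDER = {"CN": 0, "MCI": 1, "AD": 2}


def accuracy(truth, predicted):
    """Fraction of scans whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def mean_ordinal_error(truth, predicted):
    """Mean absolute distance between predicted and reference class ranks."""
    distances = [abs(ORDER[t] - ORDER[p]) for t, p in zip(truth, predicted)]
    return sum(distances) / len(distances)


# Made-up labels for three scans, for illustration only.
truth = ["AD", "AD", "CN"]
near_misses = ["MCI", "MCI", "CN"]  # AD patients labelled as MCI
far_misses = ["CN", "CN", "CN"]     # AD patients labelled as healthy

for name, pred in [("near misses", near_misses), ("far misses", far_misses)]:
    print(name, accuracy(truth, pred), mean_ordinal_error(truth, pred))
```

Both sets of predictions have the same multi-class accuracy, while the ordinal error is twice as large when AD patients are labelled as healthy. This is the information that a plain multi-class measure discards.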

Additionally, one should note that this challenge does not aim to assess the diagnostic accuracy of structural MRI, as MRI is also included in the criteria for clinical diagnosis. Instead, this project focuses on comparing computer-aided diagnosis algorithms on an unseen, blinded test set with standardized evaluation methods, using the clinical diagnosis as the best available reference standard.

Another choice made for this challenge is to compare the full methodology, from image to diagnosis. As participants have a lot of freedom in their choice of training data and of methods for image processing and classification, the discussion of the methods can be difficult. In some cases we would not be able to fully explain the performance difference between methods. For example, a very good method that uses a small amount of training data may have the same performance as a worse method that uses more training data. Since our main question is how well the current state-of-the-art methods would perform in clinical practice, we specifically chose to impose few constraints on the participating methods.

3         Study design

3.1       Data

As data for the evaluation framework, we composed a multi-center data set consisting of 384 scans. The participating centers are: Erasmus MC (EMC), Rotterdam, the Netherlands; VU University Medical Center (VUmc), Amsterdam, the Netherlands; and University of Porto / Hospital de São João (UP), Porto, Portugal. This data set contains structural MRI (T1w) scans of subjects with a diagnosis of probable Alzheimer’s disease (AD), subjects with mild cognitive impairment (MCI) and participants without a dementia syndrome (controls). In addition to the MR scans, demographic information (age, gender) and information on which data are from the same institute are included. A large set is needed for comparison of the different methods. In addition, a large set increases the scientific value of our framework, as the data better represent a clinical population. For comparison, a previous study comparing ten methods (Cuingnet et al., 2011) used 509 subjects (AD, MCI and controls).

Most of the data is used for evaluation of the methods: the test set. Additionally, a small training data set of 30 scans, distributed over the diagnostic groups, is provided. We decided to limit the training set to 30 subjects. Since we aim to evaluate the performance in a clinical situation, in which little data similar to the test data is available, we expect other data to be used for training as well. Because we want to allow participants to have an indication of the performance of their algorithm on the target data before submission, we decided to make the diagnostic labels of these 30 training scans available.
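A pre-submission check on the labelled training scans could look like the following minimal sketch. The label strings, function names and example predictions are hypothetical, and accuracy is used here only as one plausible summary measure.

```python
from collections import Counter

CLASSES = ("CN", "MCI", "AD")


def confusion_matrix(reference, predicted):
    """Rows: reference diagnosis, columns: predicted diagnosis."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(r, p)] for p in CLASSES] for r in CLASSES]


def accuracy(matrix):
    """Share of scans on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(CLASSES)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for six scans, for illustration only.
reference = ["CN", "CN", "MCI", "MCI", "AD", "AD"]
predicted = ["CN", "MCI", "MCI", "MCI", "AD", "MCI"]

matrix = confusion_matrix(reference, predicted)
for label, row in zip(CLASSES, matrix):
    print(f"{label:>3}", row)
print("accuracy:", accuracy(matrix))
```

With only 30 labelled scans, such an estimate is coarse: a single scan changes the accuracy by about 3.3 percentage points, so it gives an indication rather than a reliable performance figure.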

3.2       Multi-class classification

This challenge evaluates algorithms for multi-class classification of AD, MCI and controls. Methods developed for binary classification can be used for three-way classification with either a one-vs-one (ovo) or a one-vs-all (ova) strategy. In the one-vs-one approach, three classifiers are trained, one for each of the three pairwise binary problems (AD vs. MCI, AD vs. controls, MCI vs. controls). In the one-vs-all approach, three classifiers are trained, each separating one class from the other two. In both approaches, the outputs of the three classifiers can be combined, for example by multiplication and renormalization of the output probabilities.
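One possible reading of "multiplication and renormalization" is sketched below. It assumes that each binary classifier outputs a probability; the function names, the dictionary layout and the example values are hypothetical and are not taken from the challenge framework.

```python
CLASSES = ("CN", "MCI", "AD")


def _normalize(products):
    """Rescale non-negative class scores so that they sum to one."""
    total = sum(products.values())
    return {c: products[c] / total for c in CLASSES}


def combine_one_vs_all(p_class):
    """p_class[c] is P(c vs. rest) from the binary classifier for class c.

    Score of class c: P(c) times P(not k) for the two other classes k.
    """
    products = {}
    for c in CLASSES:
        score = p_class[c]
        for other in CLASSES:
            if other != c:
                score *= 1.0 - p_class[other]
        products[c] = score
    return _normalize(products)


def combine_one_vs_one(p_pair):
    """p_pair[(a, b)] is P(a | a or b); P(b | a or b) is its complement.

    Score of class c: product of the two pairwise outputs involving c.
    """
    products = {c: 1.0 for c in CLASSES}
    for (a, b), p in p_pair.items():
        products[a] *= p
        products[b] *= 1.0 - p
    return _normalize(products)


# Made-up classifier outputs for a single scan.
ova = combine_one_vs_all({"CN": 0.1, "MCI": 0.6, "AD": 0.3})
ovo = combine_one_vs_one({("CN", "MCI"): 0.3, ("CN", "AD"): 0.2, ("MCI", "AD"): 0.6})

print(max(ova, key=ova.get), ova)
print(max(ovo, key=ovo.get), ovo)
```

In both cases the predicted diagnosis is the class with the highest combined probability. Other combination rules, such as majority voting over the pairwise classifiers, would be equally valid choices.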

3.3       Challenge

Research groups can apply via the challenge web site by submitting a brief description of their method. After the description is checked for validity by the challenge organizers and after a data confidentiality agreement is signed, the participants will get access to a password protected part of the challenge web site. From this protected web site a part of the data can be downloaded under strict conditions. The data available for download are, for the training set: 30 T1w scans from the probable AD, MCI and controls groups including diagnostic label, age, gender and scanner information; and for the test set: T1w scans from the probable AD, MCI and control groups including age, gender and scanner information. The participants run their methods on the data and predict the diagnoses of the patients in the test set. The results need to be submitted to the challenge organizers. The participants write a paper describing the method and results, which they submit via the web site. Participants are also asked to upload an executable of their methods.

4         Ground truth diagnosis

The database includes AD patients, MCI patients, and controls. The data have been acquired either as part of clinical routine or as part of a research study. All patients underwent neurological and neuropsychological examination as part of their routine diagnostic work-up. The clinical diagnosis was established by consensus of a multidisciplinary team. Patients with AD met the clinical criteria for probable AD (McKhann et al., 1984, 2011). Additionally, MCI patients fulfilled the criteria specified by Petersen (2004). Center-specific procedures are specified below.

4.1       Erasmus MC (EMC) 

From the Erasmus MC (Rotterdam, the Netherlands), AD patients, MCI patients, and healthy controls are included. The data have been acquired either as part of clinical routine or as part of a research study. All patients visited the Alzheimer Center South West Netherlands. Healthy control subjects were volunteers recruited in research studies, and did not have any memory complaints. All subjects signed informed consent and the study was approved by the local medical ethical committee.

4.2       VU University Medical Center (VUmc)


Patients with AD, patients with MCI and controls with subjective complaints were included from the memory-clinic-based Amsterdam Dementia Cohort (Amsterdam, the Netherlands). The protocol for selection of patients and controls is the same as that used by Binnewijzend et al. (2013). Controls were selected based on subjective complaints and had at least 1 year of follow-up with stable diagnosis. For the controls, the findings from all investigations were normal; they did not meet the criteria for MCI, psychiatric disorder, or other underlying neurologic diseases. The patients' T1w scans showed no observable stroke or other abnormalities. All patients gave permission for the use of the data for research.

4.3       University of Porto (UP)

Patients with AD and MCI were included who were diagnosed according to the current clinical diagnosis criteria (AD: McKhann et al., 2011; MCI: Petersen, 2004). The majority of the included patients with AD and MCI have been followed at the outpatient dementia clinic of Hospital de São João (Porto, Portugal). A few patients with AD were referred from external institutions for a second opinion. In addition, healthy control subjects were volunteers recruited in research studies. All subjects provided consent to be included in this study.

5         Data 

5.1       Data characteristics


| | Erasmus MC (EMC) | VU Medical Center (VUmc) | University of Porto (UP) |
| --- | --- | --- | --- |
| Scanner | 3T, GE Healthcare. Protocol 1: Discovery MR750; Protocol 2: Discovery MR750; Protocol 3: HD Platform | 3T, GE Healthcare, Signa HDxt | 3T, Siemens, Trio A Tim |
| Sequence | 3D inversion recovery (IR) fast spoiled gradient-recalled echo (FSPGR) | 3D inversion recovery (IR) fast spoiled gradient-recalled echo (FSPGR) | 3D magnetization prepared rapid acquisition gradient echo (MPRAGE) |
| Scan parameters | Protocol 1: TI=450 ms, TR=7.9 ms, TE=3.1 ms, Parallel imaging: Yes (ASSET factor=2); Protocol 2: TI=450 ms, TR=6.1 ms, TE=2.1 ms, Parallel imaging: No; Protocol 3: TI=300 ms, TR=10.4 ms, TE=2.1 ms, FA=8°, Parallel imaging: No | TI=450 ms, TR=7.8 ms, TE=3.0 ms, FA=12°, Parallel imaging: Yes (ASSET factor=2) | TI=900 ms, TR=2300 ms, TE=3 ms, FA=9°, Parallel imaging: No |
| Resolution | Protocol 1: 0.94x0.94x1.0 mm (sagittal); Protocol 2: 0.94x0.94x0.8 mm (axial); Protocol 3: 0.49x0.49x0.8 mm (axial) | 0.9x0.9x1 mm | 1x1x1.2 mm |
| Nr of scans | | | |
| Age in years, mean (std) | 68.6 (7.8) | 62.2 (5.9) | 67.8 (9.1) |
| Male (%) | | | |

The class sizes will not be released before the workshop. The prior for each class is ~1/3.

5.2       Data preprocessing

The T1-weighted MRI data were anonymized and faces were removed from the scans. In addition to the original anonymized T1w scans, a non-uniformity-corrected version of the scans is available. The correction was performed with N4ITK, an improved version of the N3 algorithm (Tustison et al., 2010), with the following settings: shrink factor = 4, number of iterations = 150, convergence threshold = 0.00001, initial b-spline mesh resolution = 50.


Binnewijzend, M.A.A., Kuijer, J.P.A., Benedictus, M.R., van der Flier, W.M., Wink, A.M., Wattjes, M.P., van Berckel, B.N.M., Scheltens, P., Barkhof, F., 2013. Cerebral blood flow measured with 3D pseudocontinuous arterial spin labeling MR imaging in Alzheimer disease and mild cognitive impairment: A marker for disease severity. Radiology 267, 221–230.

Bron, E.E., Smits, M., van der Flier, W.M., et al., 2015. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: the CADDementia challenge. Neuroimage.

Cuingnet, R., Gerardin, E., Tessieras, J., Auzias, G., Lehéricy, S., Habert, M.O., Chupin, M., Benali, H., Colliot, O., 2011. Automatic classification of patients with Alzheimer’s disease from structural MRI: A comparison of ten methods using the ADNI database. Neuroimage 56, 766–781.

Davatzikos, C., Fan, Y., Wu, X., Shen, D., Resnick, S.M., 2008. Detection of prodromal Alzheimer’s disease via pattern classification of magnetic resonance imaging. Neurobiol Aging 29, 514–523.

Duchesne, S., Caroli, A., Geroldi, C., Barillot, C., Frisoni, G.B., Collins, D.L., 2008. MRI-based automated computer classification of probable AD versus normal controls. IEEE Trans Med Imag 27, 509–520.

Fan, Y., Batmanghelich, N., Clark, C.M., Davatzikos, C., et al., 2008a. Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline. Neuroimage 39, 1731.

Fan, Y., Resnick, S.M., Wu, X., Davatzikos, C., 2008b. Structural and functional biomarkers of prodromal Alzheimer’s disease: a high-dimensional pattern classification study. Neuroimage 41, 277–285.

Fan, Y., 2011. Ordinal ranking for detecting mild cognitive impairment and Alzheimer’s disease based on multimodal neuroimages and CSF biomarkers. Proc First Int. Conf. Multimodal brain image Anal. 44–51.

Gray, K.R., Aljabar, P., Heckemann, R.A., Hammers, A., Rueckert, D., 2013. Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease. Neuroimage 65, 167–75.

Jack, C.R., Wiste, H.J., Vemuri, P., Weigand, S.D., Senjem, M.L., Zeng, G., Bernstein, M.A., Gunter, J.L., Pankratz, V.S., Aisen, P.S., Weiner, M.W., Petersen, R.C., Shaw, L.M., Trojanowski, J.Q., Knopman, D.S., 2010. Brain beta-amyloid measures and magnetic resonance imaging atrophy both predict time-to-progression from mild cognitive impairment to Alzheimer’s disease. Brain 133, 3336–48.

Klöppel, S., Stonnington, C.M., Chu, C., Draganski, B., Scahill, R.I., Rohrer, J.D., Fox, N.C., Jack Jr, C.R., Ashburner, J., Frackowiak, R.S.J., 2008. Automatic classification of MR scans in Alzheimer’s disease. Brain 131, 681–689.

Klunk, W.E., Engler, H., Nordberg, A., Wang, Y., Blomqvist, G., Holt, D.P., Bergström, M., Savitcheva, I., Debnath, M.L., Barletta, J., Price, J.C., Sandell, J., Lopresti, B.J., Wall, A., Koivisto, P., Antoni, G., Mathis, C.A., Långström, B., 2004. Imaging brain amyloid in Alzheimer’s disease with Pittsburgh Compound-B. Ann Neurol 55, 306–319.

Koikkalainen, J., Pölönen, H., Mattila, J., van Gils, M., Soininen, H., Lötjönen, J., et al., 2012. Improved classification of Alzheimer’s disease data via removal of nuisance variability. PLoS One 7, e31112.

Magnin, B., Mesrob, L., Kinkingnéhun, S., Pélégrini-Issac, M., Colliot, O., Sarazin, M., Dubois, B., Lehéricy, S., Benali, H., 2009. Support vector machine-based classification of Alzheimer’s disease from whole-brain anatomical MRI. Neuroradiology 51, 73–83.

McKhann, G., Drachman, D., Folstein, M., Katzman, R., Price, D., Stadlan, E.M., 1984. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology 34, 939–44.

McKhann, G.M., Knopman, D.S., Chertkow, H., Hyman, B.T., Jack Jr, C.R., Kawas, C.H., Klunk, W.E., Koroshetz, W.J., Manly, J.J., Mayeux, R., Mohs, R.C., Morris, J.C., Rossor, M.N., Scheltens, P., Carrillo, M.C., Thies, B., Weintraub, S., Phelps, C.H., 2011. The diagnosis of dementia due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement 7, 263–269.

Petersen, R.C., 2004. Mild cognitive impairment as a diagnostic entity. J. Intern. Med. 256, 183–94.

Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C., 2010. N4ITK: improved N3 bias correction. IEEE Trans Med Imag 29, 1310–1320.

Vemuri, P., Gunter, J.L., Senjem, M.L., Whitwell, J.L., Kantarci, K., Knopman, D.S., Boeve, B.F., Petersen, R.C., Jack Jr, C.R., 2008. Alzheimer’s disease diagnosis in individual subjects using structural MR images: validation studies. Neuroimage 39, 1186–1197.

Wolz, R., Julkunen, V., Koikkalainen, J., Niskanen, E., Zhang, D.P., Rueckert, D., Soininen, H., Lötjönen, J., 2011. Multi-method analysis of MRI images in early diagnostics of Alzheimer’s disease. PLoS One 6, e25446.