www.visceral.eu
Tutorial material

Deliverable number: D5.5
Dissemination level: Public
Delivery date: due 05.05.2015
Status: Final
Authors: Oscar Alfonso Jiménez del Toro, Roger Schaer, Abdel Aziz Taha, Marianne Winterstein, Markus Krenn, András Jakab, Georg Langs, Henning Müller and Allan Hanbury

This project is supported by the European Commission under the Information and Communication Technologies (ICT) Theme of the 7th Framework Programme for Research and Technological Development.
D5.5 Tutorial material
Executive Summary

This deliverable presents a collection of the tutorial material produced during the project. Tutorial talks at the workshops organized by VISCERAL addressed different target groups, and this document covers the full diversity of the topics of interest within the project. The sections in this document cover:
• ethical and privacy aspects for data access;
• the medical annotation procedure and quality checks;
• the cloud–based evaluation infrastructure;
• metrics and evaluation tools.
The content of this deliverable focuses on the steps taken during the project to set up and manage evaluation benchmarks with a large medical data set in a cloud infrastructure.
Page 1 of 37
D5.5 Tutorial material
Table of Contents

1 Introduction ........................................... 5
2 Ethical and privacy aspects for data access ............ 5
3 Medical annotation procedure and quality checks ........ 7
  3.1 Annotation ticket life cycle ....................... 7
  3.2 The VISCERAL Ticketing Framework ................... 8
  3.3 Annotation Guidelines .............................. 9
4 Cloud–based evaluation infrastructure .................. 10
  4.1 Setting up a cloud environment ..................... 10
    4.1.1 Requirements ................................... 10
    4.1.2 Costs and logistics ............................ 11
  4.2 Setting up a benchmark in the cloud ................ 12
  4.3 Cloud setup for VISCERAL benchmarks ................ 13
    4.3.1 Storing data sets .............................. 13
    4.3.2 Participants' VMs .............................. 13
    4.3.3 Evaluation phase ............................... 13
5 Metrics and evaluation tools ........................... 14
  5.1 Introduction ....................................... 14
  5.2 Evaluation Metrics for volume segmentation ......... 15
  5.3 Metric definitions and Algorithms .................. 17
    5.3.1 Spatial overlap based metrics .................. 17
    5.3.2 Pair–counting based metrics .................... 21
    5.3.3 Information theoretic metrics .................. 24
    5.3.4 Probabilistic metrics .......................... 25
    5.3.5 Spatial Distance Based Metrics ................. 26
  5.4 Multiple definitions of metrics in the literature .. 29
  5.5 Implementation ..................................... 30
    5.5.1 Architecture ................................... 30
    5.5.2 Programming Environment ........................ 30
    5.5.3 Efficiency optimization ........................ 31
    5.5.4 Usage .......................................... 31
  5.6 Availability and requirements ...................... 31
  5.7 Conclusions on evaluation measures ................. 32
6 Conclusions ............................................ 32
7 References ............................................. 33
List of Figures

Fig. 1  Work flow overview of the VISCERAL Ticketing Framework. ............ 8
Fig. 2  Entity Relationship diagram of the VISCERAL Ticketing System DB. ... 10
Notation

Ii         Image or volume with index i; whether it is 2D or 3D data will be clear from the context.
Ii ∈ R²    2D data such as images.
Abbreviations

MRI    Magnetic Resonance Imaging
HTTP   HyperText Transfer Protocol
FTP    File Transfer Protocol
ARI    Adjusted Rand Index
AUC    Area under ROC curve
AVD    Average distance
CT     Computed Tomography
DICE   Dice coefficient
FMS    F-Measure
FN     False negative
FNR    False negative rate
FP     False positive
FPR    False positive rate
GCE    Global Consistency Error
HD     Hausdorff distance
ICC    Interclass Correlation
ITK    Insight Segmentation and Registration Toolkit
JAC    Jaccard index
KAP    Cohen's Kappa measure
MHD    Mahalanobis Distance
MI     Mutual Information
NLM    National Library of Medicine
PBD    Probabilistic Distance
RI     Rand Index
TN     True negative
TNR    True negative rate
TP     True positive
TPR    True positive rate
VOI    Variation of Information
VS     Volumetric Similarity
1 Introduction
The VISual Concept Extraction Challenge in Radiology (VISCERAL) organized a series of benchmarks (3) with large-scale 3D radiology images in an innovative cloud–based evaluation approach. A continuous evaluation infrastructure and various annotation and management tools had to be developed and implemented to run these benchmarks in a rigorous scientific format. Several target groups were considered for the dissemination and exploitation of the project goals, as well as of the new knowledge generated. However, finding a specific audience for classical tutorial talks was not straightforward: people interested in running benchmarks with large data sets, or developers of cloud–based infrastructures, can have heterogeneous technical backgrounds and motivations. The majority of the technical and organizational aspects of the project were presented in the workshops organized for publishing the results of the benchmarks. Guided talks and specification documents also explained the annotation tools and cloud usage to both the medical annotators and the benchmark participants. This document was compiled from the information gathered by the VISCERAL Consortium regarding:
• ethical and privacy aspects for large data access;
• medical annotation procedures and quality checks;
• the cloud–based evaluation infrastructure;
• evaluation metrics and tools.
A brief overview of these topics and of the lessons learned throughout the VISCERAL project is given in the following sections. The individual sections reflect on the tasks and tools needed by future organizers and participants of such benchmarks, as well as the detailed workflow for data preparation and evaluation, all of which can be useful in many environments.
2 Ethical and privacy aspects for data access
The data used in the project consist of human medical imaging data and corresponding meta-information. Their use is therefore subject to specific regulations, at both the EU and the national level, that control the collection, use and distribution of human data and its inclusion in research studies. In VISCERAL, each data provider was individually responsible for handling the ethical, legal and privacy aspects. This typically involved:
1. Review of the data collection plan by the local competent Medical Ethics Committee / Institutional Review Board (MEC).
2. Handling of informed consent procedures.
3. Anonymization of the data prior to distribution.
Free informed consent by participants in a medical study is a central ethical consideration in medical research. In retrospective studies that use data acquired before the study start, collecting informed consent is often neither feasible nor possible, so benefits and risks have to be weighed by the competent ethics committee. The basis for the positive decision of the ethics committee of the University Hospital of Heidelberg (EKUKL-HD) was the positive approval of the EKMUW, including a study protocol providing detailed information on the study, the anonymization, the assurance of privacy and the data handling, as well as a study protocol covering the use of anonymized medical imaging data in the KHRESMOI project (study protocol EK Nr. 804/2010, amendment December 2012). From this experience and the positive decisions and approvals of the ethical boards of the UKL-HD (S-465/2012) and the MUW, we have gathered several recommendations that may help future, similar projects deal with privacy issues and obtain approval from ethics committees:
• Age of the included patients: It may be helpful to only include data sets of patients aged 18 or older.
• Retrospective vs. prospective data sets: Prefer retrospective, older data sets. Informed consent for such data sets is likely not needed, since obtaining it retrospectively is extremely complex and laborious without a certain outcome, as many patients will have died or moved. Use prospective or current data sets only if the patients have signed an informed consent agreeing to the use of their images and anonymized data in the project.
• Anonymization of all data: All selected image data sets were anonymized individually and locally by the three data providers. For anonymization, the following items were removed from the DICOM headers: date of birth (only the age was preserved), institution name, patient name, patient ID, examination number and study date. A key mapping the patient ID to the corresponding pseudonym is held and stored individually by each data provider. Other metadata, such as the clinical question and the radiological reports, were also anonymized by using only MeSH terms (and their negations) extracted from the reports. Additionally, whole-body CT scans were defaced (the image data of the face was destroyed) to ensure that no patient can be identified.
• End User Agreement: To ensure the correct and exclusively scientific usage of the data, benchmark participants have to sign an end user agreement¹. The signed agreements were checked individually by VISCERAL.
• Safe storage in the cloud: The evaluation campaigns are run on a cloud server provided by Microsoft (Azure). Only authorized participants who signed the End User Agreement have access to the stored data, and this access closes when a benchmark is finished. Participants only have access to a small, well-chosen and anonymized data set. Since the cloud server is located in Europe, it is subject to European law; access regulation and local data storage are secure and protected by European law.

¹ http://wiki.visceral.eu/images/3/31/User_Agreement_VISCERAL_Benchmark_1_final.pdf
• Long-term usage of data: A central element of sustainable, high-impact evaluation campaigns for developing new methods is the long-term availability of the data. VISCERAL therefore aims to provide the data as well as the algorithms beyond the project period. A comparable data set is the BRATS data set for computer-based segmentation of brain lesions². Deleting the data after the end of the project would mean that the results of VISCERAL could no longer be reproduced or verified. To maintain the results and the scientific progress achieved through the project, the EKUKL-HD agreed to provide the data for three more years after the end of the project. If further usage of the data is needed, an additional amendment to the corresponding study protocol (S-465/2012) will be provided.
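The DICOM header anonymization described above can be sketched as follows. This is an illustrative model only: the field names, the pseudonym format and the `anonymize` function are assumptions, and in practice real DICOM tags would be handled with a dedicated library rather than a plain dictionary.

```python
# Hedged sketch of the anonymization step: strip identifying header
# fields, keep the age, and record the patient-ID -> pseudonym mapping
# locally (the key stays with the data provider). Field names are
# illustrative, not real DICOM tag names.

REMOVED_FIELDS = {"PatientName", "PatientID", "PatientBirthDate",
                  "InstitutionName", "AccessionNumber", "StudyDate"}

def anonymize(header, pseudonym, key_store):
    """Return a cleaned copy of the header and record the mapping."""
    key_store[header["PatientID"]] = pseudonym  # held only by the provider
    clean = {k: v for k, v in header.items() if k not in REMOVED_FIELDS}
    clean["Pseudonym"] = pseudonym
    return clean

header = {"PatientName": "Doe^Jane", "PatientID": "12345",
          "PatientBirthDate": "19520301", "PatientAge": "61",
          "InstitutionName": "Hospital X", "StudyDate": "20130405"}
key_store = {}
anon = anonymize(header, "VISC-0001", key_store)
```

After this step, `anon` contains only the age and the pseudonym, while `key_store` holds the re-identification key that never leaves the data provider.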
3 Medical annotation procedure and quality checks
One goal of the VISCERAL project is to create large data sets containing high-quality expert annotations (organ segmentations, landmarks and lesion annotations) of medical imaging data. For this purpose, a ticketing framework was developed that allows managing different annotation types, distributing annotation tickets to multiple annotators and performing quality checks of the annotations to ensure consistent annotation quality across annotators. The remainder of this section describes the typical life cycle of an annotation ticket (Section 3.1) and the framework that has been built to monitor and distribute annotation tickets (Section 3.2).
3.1 Annotation ticket life cycle
The typical life cycle of an annotation ticket within the VISCERAL project can be outlined as follows:
1. First, a ticket is created. An annotation ticket is defined by the volume to be annotated, the annotation type (segmentation, lesion or landmark) and the annotator.
2. The annotator receives the ticket through a web interface, performs the annotation and submits the annotated data through the web interface.
3. Depending on the type of annotation, an automated quality check is performed to detect common annotation errors, such as empty label volumes, incorrect file extensions or wrongly named landmarks.
4. If the annotation passes the automated quality check, the ticket is assigned to a Quality Check (QC) annotator; otherwise it is reassigned for annotation.
5. The QC annotator receives the QC ticket through the web interface, performs the QC and submits the QC result (including textual feedback if the QC is negative) through the web interface.

² http://www2.imm.dtu.dk/projects/BRATS2012, https://vsd.unibe.ch/WebSite/BRATS2012/Start
6. The ticket reaches its final state if the QC is positive; otherwise the annotator receives textual feedback and the ticket is reassigned for annotation.
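The life cycle above is essentially a small state machine. The following sketch models it with assumed state and event names (the ticketing DB itself only distinguishes pending, submitted, QC passed and QC failed; the intermediate "awaiting_qc" state and the event names are our own simplification):

```python
# Minimal state-machine sketch of the annotation-ticket life cycle.
# State and event names are illustrative, not the DB's exact values.

TRANSITIONS = {
    ("pending", "submit"): "submitted",
    ("submitted", "auto_qc_pass"): "awaiting_qc",
    ("submitted", "auto_qc_fail"): "pending",   # reassigned for annotation
    ("awaiting_qc", "qc_pass"): "final",
    ("awaiting_qc", "qc_fail"): "pending",      # annotator gets feedback
}

def step(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal event {event!r} in state {state!r}")

# A ticket that fails the automated check once and then passes QC:
state = "pending"
for event in ["submit", "auto_qc_fail", "submit", "auto_qc_pass", "qc_pass"]:
    state = step(state, event)
```

Encoding the transitions as a table makes it easy for the backend to reject illegal ticket updates instead of silently corrupting a ticket's state.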
3.2 The VISCERAL Ticketing Framework
The VISCERAL ticketing framework is designed to monitor and manage the full life cycle of an annotation ticket and to provide an interface through which annotators and QC team members submit tickets. It consists of three main components:
1. Ticketing database: A MySQL database that stores information on the volumes to annotate, the annotators, the annotation types, and the tickets and their states (pending, submitted, QC passed, QC failed).
2. Backend: A backend implemented in Matlab to manage volumes, annotators, ticket types and annotation tickets. The backend is used to distribute tickets, to perform the automated quality check and to distribute QC tickets for submitted annotations.
3. Frontend: A web interface used by annotators and QC team members to receive their assigned tickets and to submit annotations and QC results.
Figure 1 provides an overview of the ticketing system implemented within the VISCERAL project. The framework's source code is publicly available3 and provides create statements
Figure 1: Work flow overview of the VISCERAL Ticketing Framework.
for the ticketing DB, the source code of the web interface, and getter and setter functions of the backend implemented in Matlab, including a tutorial and installation guidelines for setting up the framework.

Ticketing System Database

The database (DB) of the ticketing system is created by the SQL scripts provided in the ticketing repository. All relevant information is stored in five DB tables:
• Annotator: Identified by an AnnotatorID; stores contact information, name and password (for login), a flag indicating whether the annotator is currently available, and an additional flag indicating whether the annotator is a QC team member.
• Volume: A volume is identified by its PatientID and VolumeID. Additionally, the modality, the body region and the filename are stored.
• AnnotationType: The entries in this table define which types of annotations can be managed by the ticketing system. Each entry is identified by its AnnotationTypeID. Additionally, the name, the file extension of the submitted files, the remote upload directory, the category (segmentation, landmarks, ...) and an optional string describing the file prefix are stored. Exemplary entries for this table are created by the provided SQL scripts.
• Status: Defines all states a ticket can have during its life cycle. A status is identified by its StatusID and stores its name and description, as well as to which type of annotator (QC or normal) the status option is available in the ticketing web interface. Default entries for this table are created by the provided SQL scripts.
• Annotation: This table represents an annotation ticket. An annotation entry is identified by its PatientID and VolumeID, the AnnotationTypeID, the StatusID and the AnnotatorID of the annotator the ticket is assigned to. Additionally, the filename, a timestamp, the ID of the annotator who performs the QC of the ticket and a QC comment are stored for each ticket.
Figure 2 illustrates the ER diagram of the resulting database.
A demo version of the web interface, including exemplary test data, is available at4. Login information is given on request for security reasons.
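The five tables described above can be sketched as follows. We use SQLite instead of MySQL so the sketch is self-contained; the column sets follow the description in this section, but the exact column names and types in the real schema (available in the repository's SQL scripts) may differ.

```python
# Sketch of the five ticketing tables, assuming illustrative column names.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Annotator (AnnotatorID INTEGER PRIMARY KEY, Name TEXT,
    Email TEXT, Password TEXT, Available INTEGER, IsQC INTEGER);
CREATE TABLE Volume (PatientID TEXT, VolumeID TEXT, Modality TEXT,
    BodyRegion TEXT, Filename TEXT, PRIMARY KEY (PatientID, VolumeID));
CREATE TABLE AnnotationType (AnnotationTypeID INTEGER PRIMARY KEY,
    Name TEXT, FileExtension TEXT, UploadDir TEXT, Category TEXT,
    Prefix TEXT);
CREATE TABLE Status (StatusID INTEGER PRIMARY KEY, Name TEXT,
    Description TEXT, VisibleToQC INTEGER, VisibleToAnnotator INTEGER);
CREATE TABLE Annotation (PatientID TEXT, VolumeID TEXT,
    AnnotationTypeID INTEGER, StatusID INTEGER, AnnotatorID INTEGER,
    Filename TEXT, Timestamp TEXT, QCAnnotatorID INTEGER, QCComment TEXT);
""")
conn.execute("INSERT INTO Annotator VALUES (1, 'Alice', 'a@x.eu', 'pw', 1, 0)")
rows = conn.execute("SELECT Name FROM Annotator WHERE Available = 1").fetchall()
```

The composite key on Volume (PatientID, VolumeID) mirrors the identification scheme described above, and the Annotation table references both the assigned annotator and the QC annotator.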
3.3 Annotation Guidelines
The annotation guidelines used for the manual annotations of the VISCERAL data set are attached to this document as an annex. These guidelines describe how to perform the anatomical structure segmentations and how to locate the landmarks for the Gold Corpus data set of the project. They were written in Hungarian for the VISCERAL medical annotators in Debrecen, Hungary.

3 https://github.com/mkrenn/visceralTicketing
4 https://www.cir.meduniwien.ac.at/visceral/tickets/demo/tickets/login.php
Figure 2: Entity Relationship diagram of the VISCERAL Ticketing System DB.
4 Cloud–based evaluation infrastructure

4.1 Setting up a cloud environment

4.1.1 Requirements
Cloud–based solution providers offer many and varied features, such as:
• data storage, both structured (databases) and unstructured (files);
• computation with virtual machines (VMs);
• authentication and security mechanisms;
• application–specific features:
  – distributed computing (e.g. Hadoop5);
  – media services (e.g. video transcoding);
  – monitoring tools;
  – ...
The first step in setting up an environment is to determine which features are needed and to compare their availability, pricing and usage modalities across cloud providers. Another important step is to determine whether there are any restrictions concerning the region in which the data and services are hosted. Sample questions include:

5 http://hadoop.apache.org
• Can the data be hosted anywhere in the world, or only inside a specific region (USA, Europe, ...)?
• If there is a region restriction, are all the required services available in that region?
• What are the costs of moving data between different regions?
Once the required features are identified and a suitable provider is selected, the next step is planning the setup of the environment.

4.1.2 Costs and logistics
It is important to evaluate the needs in terms of required resources, both to have a clear idea of the administrative load (managing virtual machines, storage containers, access rights, etc.) and to estimate the costs of maintaining the infrastructure. All major cloud providers have cost calculation tools, making it easier to approximate monthly costs accurately. Depending on the provider, different items can add to the total cost:
• Storage
  – Data stored (usually billed as gigabytes per month)
  – Incoming / outgoing data traffic (usually billed per gigabyte; incoming traffic is usually free)
  – Storage requests (PUT/COPY/POST/LIST/GET HTTP requests)
• Virtual machines
  – Running virtual machines (usually billed by the hour)
  – Storage attached to the virtual machines
  – Data transfer to and from the virtual machines
  – Additional IP addresses
The costs also depend on the usage scenario:
• Are data stored only for short periods and then removed, or do they need to be available for months or years?
• Are virtual machines required to run 24/7, or are they used periodically for heavy computation and then turned off?
• Are Windows virtual machines required? (They are generally more expensive than Linux–based instances because of licensing costs.)
Making cost projections for several months or a year can help in managing the resources more efficiently and in making adjustments before the costs exceed expectations. The other aspect of the planning phase is to think about the resource management tasks involved. Any manual task can quickly become daunting when it needs to be performed on a multitude of virtual machines. Properly configuring the base images used for future virtual machine instances can save a lot of time and help avoid technical problems. Initial configuration tasks can include:
• Setting sensible values for password expiration and complexity requirements
• Disabling unscheduled reboots on automatic update installation
• Configuring the system's firewall if any ports need to be accessible from the outside
• ...
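The cost factors discussed in this subsection can be combined into a simple back-of-the-envelope estimate. All prices in the following sketch are invented placeholders; real rates come from the provider's cost calculator:

```python
# Rough monthly cost model for the billing items listed above.
# Every price here is a made-up placeholder, not a real provider rate.

def monthly_cost(stored_gb, egress_gb, requests, vm_hours,
                 price_gb=0.02, price_egress=0.08,
                 price_per_1k_req=0.005, price_vm_hour=0.10):
    return (stored_gb * price_gb              # data at rest, per GB-month
            + egress_gb * price_egress        # outgoing traffic (incoming free)
            + requests / 1000 * price_per_1k_req
            + vm_hours * price_vm_hour)       # running VMs, billed hourly

# e.g. 2 TB stored, 100 GB egress, 50k requests, one VM running 24/7:
cost = monthly_cost(2048, 100, 50_000, 24 * 30)
```

Running such a model for several usage scenarios (VMs on 24/7 vs. turned off after computation) makes the trade-offs discussed above concrete before any money is spent.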
4.2 Setting up a benchmark in the cloud
Once the cloud provider has been selected and the infrastructure requirements have been defined, a workflow for the evaluation benchmark should be created. This workflow should include at least the following elements:
• A description of the different phases of the benchmark
  – Examples: data set creation, training phase, testing phase, ...
  – Define what should happen in each phase and who is responsible for which task
• The required security measures
  – Geographic location of the data and infrastructure
  – Access control for participants and administrators: time restrictions for accessing the data, user rights, ...
  – Security protocols: firewall software, antivirus, end user agreement, ...
• The creation of the required resources for the various phases
  – Storage containers for the data
    ∗ Separate containers for the different phases are recommended, as they make locating and managing the data easier.
  – Virtual machines for computation
    ∗ Creating preconfigured machine templates (images) is recommended, as it avoids additional manual configuration of each machine after creation.
    ∗ The variety of operating systems provided to the participants impacts the administrative workload involved in setting up the infrastructure. Managing both Linux and Windows instances can make administrative tasks and automation more difficult, requiring at least two variants of all scripts and tools used.
• A definition of the data exchange protocols between the participants and the cloud infrastructure
  – How do participants upload data to and download data from the cloud?
  – Are additional data needed for the benchmark located outside the cloud (registration system, documentation, ...)?
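One of the security measures listed above, time-restricted access for participants, is typically provided natively by the cloud platform (e.g. Azure's Shared Access Signatures). The following toy sketch illustrates the underlying idea of a signed, expiring, read-only token; the token format and field names are invented and do not match any real provider's scheme:

```python
# Illustrative signed-token scheme with an expiry time and a permission
# string. The format is made up for illustration; real cloud platforms
# provide this functionality natively.
import hashlib
import hmac

SECRET = b"account-key"  # placeholder account key

def make_token(container, expiry_ts, permissions="r"):
    msg = f"{container}|{permissions}|{expiry_ts}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{container}|{permissions}|{expiry_ts}|{sig}"

def check_token(token, now):
    container, perms, expiry, sig = token.rsplit("|", 3)
    msg = f"{container}|{perms}|{expiry}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # valid only if the signature matches AND the token has not expired
    return hmac.compare_digest(sig, expected) and now < int(expiry)

token = make_token("training-data", expiry_ts=2_000_000_000)
```

Because the expiry timestamp is covered by the signature, a participant cannot extend their own access window, and the organizer can bound data access to a benchmark phase without rotating account keys.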
4.3 Cloud setup for VISCERAL benchmarks
The VISCERAL project was hosted in the Microsoft Azure cloud environment. Using a public cloud platform such as Microsoft Azure enabled virtually unlimited scalability, both in terms of storage space and of computation power. The Microsoft Azure platform provides a framework for the creation and management of virtual machines and data storage containers. The platform's web management portal was used in the VISCERAL project to simplify the administrative tasks. Extensive documentation on the tools used for the different administrative tasks and technical aspects of the project is available on the Microsoft Azure website. Provisioning and management of VMs, as well as data storage, were the main cloud services used during the project. The following paragraphs give a brief description of these services.

4.3.1 Storing data sets
Initially, the full data set, with both the medical data and the additional annotations created by expert radiologists, was uploaded to a cloud storage container. Further cloud storage containers were then created for each benchmark to store the training and testing data sets, the participant output files and the evaluations. Time–restricted read–only access keys were distributed securely to the participants for accessing the training data sets. Participants had no access to the testing set and the subsequent evaluation results. Over the course of the project, new images and their annotations were added to the storage containers when required.

4.3.2 Participants' VMs
In order to run the VISCERAL benchmarks, the participants needed access to the stored data and to computing instances on which to execute their algorithms. Virtual machines running on the Microsoft Azure cloud infrastructure were pre–configured for these tasks. Templates were configured for five operating systems, including both Windows and Linux. A virtual machine was provided to each participant, allowing them to access the training data set and upload their algorithms. Each VM has a temporary storage space where the participant output files are stored during the testing phase; this temporary data is deleted each time the VM is shut down. All participant VM instances had the same computing specifications and capabilities. Participants could remotely access their VMs during the training phase and install all the tools and libraries needed to run their algorithms. At this stage they could optimize their approaches on the available training set. Specification guidelines on the usage of and the permissions applying to the VMs were written by the administrators for each benchmark. The VISCERAL registration and management system6, containing all the information needed for the benchmarks (user agreement, specifications, data set lists), was created. Through the participant dashboard in the system, participants received the private access credentials for their VM and had the option to start it or shut it down during the training phase.

4.3.3 Evaluation phase
For the evaluation of the participant algorithms, the following procedure was implemented:

6 http://visceral.eu:8080/register/Login.xhtml
1. Participants submit their VMs through the participant dashboard once they have set up their algorithm.
2. Access to the VM is then restricted to a pre–defined pool of machines belonging to the administrators' networks, so participants can no longer access their virtual machine.
3. A batch script is executed in the VM, calling the participant's executable to perform the segmentations on the testing set.
4. The algorithms are first tested on a small subset of the data, and the existence of the expected output files is verified. If this procedure is successful, the script continues the evaluation with the complete testing set. Otherwise, the VM is shut down and returned to the participant.
5. The generated output segmentations are then stored in a previously created cloud storage container.
6. The participant segmentations are evaluated against the manually annotated ground truth available in a different cloud storage container.
7. The final results are saved in the cloud and sent to the participant dashboard. The VM is then shut down and returned to the participant. The segmentations and results are subsequently deleted from the VM's temporary storage drive.
8. The participants check their results and can manually select those they wish to publish on the public leaderboard.
9. With the continuous evaluation system, a new submission can be uploaded by the participants, but no earlier than a week after their previous submission, to prevent reverse engineering.
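The core of steps 3 to 7 can be condensed into a short control-flow sketch. The function names and arguments are stand-ins for the real batch script and cloud storage calls, which are not published here:

```python
# Condensed sketch of the evaluation control flow: smoke-test the
# participant algorithm on a small subset first, then run and evaluate
# the complete testing set. All callables are illustrative stand-ins.

def run_evaluation(participant_algo, subset, full_set, evaluate, store):
    """Return per-volume scores, or None if the smoke test fails."""
    # Step 4: smoke test; a missing output means the VM is returned
    for volume in subset:
        if participant_algo(volume) is None:
            return None
    # Steps 3 and 5: segment the complete testing set, store the output
    outputs = {v: participant_algo(v) for v in full_set}
    store(outputs)
    # Steps 6 and 7: compare against the ground truth, save the results
    return {v: evaluate(out) for v, out in outputs.items()}

stored = {}
scores = run_evaluation(
    participant_algo=lambda v: f"seg_{v}",
    subset=["vol1"], full_set=["vol1", "vol2"],
    evaluate=lambda out: 0.9, store=stored.update)
```

Failing fast on the small subset is what keeps a broken submission from blocking a VM for the hours a full testing run would take.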
5 Metrics and evaluation tools

5.1 Introduction
Medical volume segmentation is an important process in medical imaging. Segmentation methods with high precision and high reproducibility are a main goal in surgical planning because they directly impact the results, e.g. the detection and monitoring of tumor progress [64, 65, 28]. Warfield et al. [58] noted the clinical importance of better characterizing white matter changes in the brain tissue and showed that particular change patterns in the white matter are associated with some brain diseases. Accurately recognizing such change patterns is of great value for early diagnosis and efficient monitoring of diseases. Assessing the accuracy and the quality of segmentation algorithms is therefore of great importance. Medical segmentations are often fuzzy, meaning that voxels have a grade of membership in [0, 1]. This is, e.g., the case when the underlying segmentation is the result of averaging
different segmentations of the same structure annotated by different annotators. Here, segmentations can be thought of as probabilities of voxels belonging to particular classes. One way of evaluating fuzzy segmentations is to threshold the probabilities at a particular value to obtain binary representations that can be evaluated as crisp segmentations. However, thresholding is just a workaround that provides a coarse estimate and is not always satisfactory. Furthermore, there remains the challenge of selecting the threshold, because the evaluation results depend on this selection. This motivates providing metrics that are capable of comparing fuzzy segmentations without loss of information. Note that there is another common interpretation of fuzzy segmentation, namely partial volume, where the voxel value represents the fraction of the voxel that belongs to the class. The fuzzy metric definitions provided in this document can be applied under this interpretation as well. There are different quality aspects in medical volume segmentation, according to which types of segmentation errors can be defined. Metrics are expected to indicate some or all of these errors, depending on the data and the segmentation task. Based on four basic types of errors (added regions, added background, inside holes and border holes), Shi et al. [50] described four types of image segmentation errors, namely the quantity (number of segmented objects), the area of the segmented objects, the contour (degree of boundary match) and the content (the segmented region not being distorted by inside holes and boundary holes). Fenster et al. [16] categorized the requirements of medical segmentation evaluation into accuracy (the degree to which the segmentation results agree with the ground truth segmentation), precision as a measure of repeatability, and efficiency, which is mostly related to time.
Under the first category (accuracy), they mention two quality aspects, namely the delineation of the boundary (contour) and the size (volume of the segmented object). The alignment, which denotes the general position of the segmented object, is another quality aspect, which can be more important than the size and the contour when the segmented objects are very small. There is a need for a standard evaluation tool for medical image segmentation that standardizes not only the metrics to be used but also the definition of each metric. To illustrate this importance, Section 5.4 shows examples of metrics that have more than one definition in the literature, leading to different values, although each of them is used under the same name. In the text retrieval domain, the TREC EVAL tool7 provides a standardization of evaluation that avoids such confusion and misinterpretation and provides a standard reference for comparing text retrieval algorithms. The medical imaging domain lacks such a widely applied instrument. All the content of Section 5 consists of adapted parts of an article [54] that was submitted to BMC Medical Imaging.
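The difference between thresholding a fuzzy segmentation and evaluating it directly can be made concrete with the Dice coefficient. The fuzzy variant below uses the minimum of the membership grades as the intersection; this is one common generalization and not necessarily the exact fuzzy definition used later in this section:

```python
# Crisp (thresholded) vs. fuzzy Dice on membership grades in [0, 1].
# The fuzzy intersection via min() is one common generalization.

def dice_crisp(a, b, thr=0.5):
    A = [x >= thr for x in a]
    B = [x >= thr for x in b]
    inter = sum(x and y for x, y in zip(A, B))
    return 2 * inter / (sum(A) + sum(B))

def dice_fuzzy(a, b):
    inter = sum(min(x, y) for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

seg = [0.9, 0.6, 0.4, 0.1]    # fuzzy segmentation (e.g. annotator average)
gt  = [1.0, 0.5, 0.5, 0.0]    # fuzzy ground truth
crisp = dice_crisp(seg, gt)   # depends on the chosen threshold
fuzzy = dice_fuzzy(seg, gt)   # uses the full membership information
```

Moving the threshold changes `crisp` while `fuzzy` stays fixed, which is exactly the threshold-dependence problem discussed above.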
5.2 Evaluation Metrics for volume segmentation
We present a set of metrics for validating medical volume segmentations that were selected based on a literature review of papers in which medical volume segmentations are evaluated. Only metrics with at least two references of use are considered. An overview of these metrics is given in Table 1. Depending on the relations between the metrics, their nature and their definition, we group them into five categories, namely spatial overlap based, pair–counting based, information theoretic based, probabilistic based, and spatial distance based. The aim

7 More about TREC EVAL under http://trec.nist.gov/trec_eval/
Page 15 of 37
D5.5 Tutorial material
of this grouping is to first ease discussing the metrics in this paper and second to enable a reasonable selection when a subset of metrics is to be used, i.e. selecting metrics from different groups to avoid biased results. Table 1: Overview of the metrics implemented in this tool Metric Dice (=F1-Measure) Jaccard index
Symb. Reference of use in medical images DICE [64] [65] [31] [7] [20] [29] [3] [37] [30] [9] [1] JAC [37] [20] [45] [56] [9] [1] [29] [46] T PR [1] [37] [29] [30] [41] [55]
True positive rate (Sensitivity, Recall) True negative rate (Specificity) T NR False positive rate (=1-Specificity, FPR Fallout) False negative rate (=1-Sensitivity) FNR F-Measure (F1-Measure=Dice) FMS Volumetric Similarity VS Global Consistency Error GCE Rand Index RI Adjusted Rand Index ARI Mutual Information MI Variation of Information VOI Interclass correlation ICC Probabilistic Distance PBD Cohens kappa KAP Area under ROC curve AUC Hausdorff distance HD Average distance Mahalanobis Distance
[1] [37] [29] [55] → Specificity
→ Sensitivity → Dice [20] [45] [56] [3] [9] [19] [46] [45] [56] [60] [61] [46] [45] [56] [60] [61] [59] [38] [65] [48] [31] [45] [60] [56] [61] [18] [15] [18] [20] [37] [64] [65] [41] [38] [37] [18] [39] [20] [3] [30] [9] [42] AV D [37] [30] MHD [40] [9]
cat. Definition 1
(6)
1
(7)
1
(10)
1 1
(11) (12)
1 1 1 1 2 2 3 3 4 4 4 4 5
(13) (15), (16) (17) (18) to (20) (30) (32) (33) to (38) (39), (35) (41) (43) (44) to (46) (47) (48), (49)
5 5
(50), (51) (52) to (54)
The symbols in the second column are used to denote the metrics throughout the paper. The column “reference of use” shows papers where the corresponding metric has been used in the evaluation of medical volume segmentation. The column “category” assigns each metric to one of the following categories: (1) Spatial overlap based, (2) Pair counting based, (3) Information theoretic based, (4) Probabilistic based, and (5) Spatial distance based. The column “definition” shows the equation numbers where the metric is defined
5.3 Metric definitions and Algorithms
We present the definitions of all metrics that have been implemented. Let V be a medical volume represented by the point set X = {x_1, ..., x_n} with |X| = w × h × d = n, where w, h and d are the width, height and depth of the grid on which the volume is defined. Also, let G be a ground truth segmentation represented by the partition S_g = {S_g^1, S_g^2} of X with the assignment function f_g^i(x) that provides the membership of the object x in the subset S_g^i. Furthermore, let T be the segmentation being evaluated, represented by the partition S_t = {S_t^1, S_t^2} of X with the assignment function f_t^j(x) that provides the membership of x in the class S_t^j. Note that in this paper we only deal with partitions with two classes, namely the class of interest (anatomy or feature) and the background. We always assume that the first class (S_g^1, S_t^1) is the class of interest and the second class (S_g^2, S_t^2) is the background. The assignment functions f_g^i and f_t^j can either be crisp, when their range is {0, 1}, or fuzzy, when their range is [0, 1]. Note that a crisp partition is just a special case of a fuzzy partition. We also assume that the memberships of a given point x always sum to one over all classes, which implies that f_g^1(x) + f_g^2(x) = 1 and f_t^1(x) + f_t^2(x) = 1 for all x ∈ X.

In the remainder of this section, we define the foundation of methods and algorithms used to compute all the metrics presented in Table 1. We structure the discussion to follow the metric grouping given in the column "Cat.". This structure is advantageous for the implementation of the evaluation tool: it improves efficiency by exploiting the synergy between the metrics in each group to avoid repeated calculation of the same parameters.

5.3.1 Spatial overlap based metrics
In the following subsections, the spatial overlap based metrics are defined. Because all metrics in this category can be derived from the four basic cardinalities of the so-called confusion matrix, namely the true positives (TP), the false positives (FP), the true negatives (TN), and the false negatives (FN), we define these cardinalities for crisp as well as fuzzy segmentations and then define the metrics based on them.

Basic cardinalities

For two crisp partitions (segmentations) S_g and S_t, the confusion matrix consists of the four common cardinalities that reflect the spatial overlap between the two partitions, namely TP, FP, FN, and TN. These cardinalities provide for each pair of classes i ∈ S_g and j ∈ S_t the sum of agreement m_ij between them. That is

  m_ij = Σ_{p=1}^{|X|} f_g^i(x_p) · f_t^j(x_p)    (1)
Table 2 shows the confusion matrix of the partitions Sg and St . Note that Equation 1 assumes crisp memberships. In the next section the four cardinalities are generalized to fuzzy partitions.
Generalization to fuzzy segmentations

Intuitively, one favorable way to generalize the spatial overlap based metrics presented in Table 1 to fuzzy partitions is to provide methods for calculating the cardinalities of the confusion matrix for fuzzy sets and then use them to define
Table 2: Confusion matrix comparing the ground truth segmentation S_g with the test segmentation S_t

  Subset                        S_g^1        S_g^2 (complement of S_g^1)
  S_t^1                         TP (m_11)    FP (m_12)
  S_t^2 (complement of S_t^1)   FN (m_21)    TN (m_22)
all metrics in this category. Here, the main task is to calculate the agreement between two assignments (probabilities). It is common to use a suitable triangular norm (t-norm) to calculate the agreement [32][8]. Given two probabilities p1 and p2 representing the memberships of a particular point to a particular class according to two different observers, we use min(p1, p2) as the t-norm. That is, we define the agreement function g : [0, 1] × [0, 1] → [0, 1] that models the agreement on a particular voxel being assigned to a particular class as g(p1, p2) = min(p1, p2). This also means that the agreement on the same voxel being assigned to the background is given by g(1 − p1, 1 − p2). Intuitively, the disagreement between the observers is the difference between the probabilities, given by |p1 − p2|. However, since the comparison is asymmetrical (one of the segmentations is the ground truth and the other is the test segmentation), we consider the signed difference rather than the absolute difference, as in Equations 3 and 5. The four cardinalities defined in Section 5.3.1 can thereby be generalized to the fuzzy case as follows:

  TP = Σ_{i=1}^{n} min(f_t^1(x_i), f_g^1(x_i))    (2)

  FP = Σ_{i=1}^{n} max(f_t^1(x_i) − f_g^1(x_i), 0)    (3)

  TN = Σ_{i=1}^{n} min(f_t^2(x_i), f_g^2(x_i))    (4)

  FN = Σ_{i=1}^{n} max(f_t^2(x_i) − f_g^2(x_i), 0)    (5)
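Equations 2 to 5 are straightforward to compute; the following is a minimal Python sketch (not the project's C++ tool), where `ft1` and `fg1` are hypothetical names for the foreground membership maps of the test and ground truth segmentations, flattened to lists:

```python
def fuzzy_cardinalities(ft1, fg1):
    """Fuzzy TP, FP, TN, FN per Equations 2-5.

    ft1, fg1: foreground memberships in [0, 1]; the background
    membership of a voxel is 1 minus its foreground membership.
    """
    TP = sum(min(t, g) for t, g in zip(ft1, fg1))            # Eq. 2
    FP = sum(max(t - g, 0.0) for t, g in zip(ft1, fg1))      # Eq. 3
    TN = sum(min(1 - t, 1 - g) for t, g in zip(ft1, fg1))    # Eq. 4
    FN = sum(max(g - t, 0.0) for t, g in zip(ft1, fg1))      # Eq. 5
    return TP, FP, TN, FN
```

With the min t-norm, the four values always sum to the number of voxels, which is the requirement stated below.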
Other norms have been used to measure the agreement between fuzzy memberships, such as the product t-norm, the L-norms, and the cosine similarity. We justify using the min t-norm by the fact that, in contrast to the other norms, it ensures that the four cardinalities calculated in Equations 2 to 5 sum to the total number of voxels, i.e. TP + FP + TN + FN = |X|, which is an important requirement for the definition of metrics.

Calculation of overlap based metrics

In this section, we define each of the spatial overlap based metrics in Table 1 based on the basic cardinalities in Equation 1 (crisp) or Equations 2 to 5 (fuzzy). The Dice coefficient [14] (DICE), also called the overlap index, is the most used metric in validating medical volume segmentations. In addition to the direct comparison between automatic and ground truth segmentations, it is common to use DICE to measure reproducibility (repeatability). Zou et al. [64] used DICE as a measure of reproducibility in the statistical validation of manual annotation, where segmenters repeatedly annotated the same MRI image and the pair-wise spatial overlap of the repeated segmentations was calculated using DICE. DICE is defined by

  DICE = 2·|S_g^1 ∩ S_t^1| / (|S_g^1| + |S_t^1|) = 2·TP / (2·TP + FP + FN)    (6)
The Jaccard index (JAC) [26] between two sets is defined as the intersection between them divided by their union, that is

  JAC = |S_g^1 ∩ S_t^1| / |S_g^1 ∪ S_t^1| = TP / (TP + FP + FN)    (7)
We note that JAC is always smaller than DICE except at the extrema {0, 1}, where they are equal. Furthermore, the two metrics are related according to

  JAC = |S_g^1 ∩ S_t^1| / |S_g^1 ∪ S_t^1| = 2·|S_g^1 ∩ S_t^1| / (2·(|S_g^1| + |S_t^1| − |S_g^1 ∩ S_t^1|)) = DICE / (2 − DICE)    (8)

Similarly, one can show that

  DICE = 2·JAC / (1 + JAC)    (9)

This means that both metrics measure the same aspects and provide the same system ranking. Therefore, it does not provide additional information to select both of them together as validation metrics, as done in [9, 1, 13].
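The equivalence stated in Equations 8 and 9 can be checked numerically; a minimal sketch, assuming the cardinalities (crisp or fuzzy) are already computed:

```python
def dice(TP, FP, FN):
    # Dice coefficient, Equation 6
    return 2 * TP / (2 * TP + FP + FN)

def jaccard(TP, FP, FN):
    # Jaccard index, Equation 7
    return TP / (TP + FP + FN)

# The relations of Equations 8 and 9 hold for any cardinalities:
d = dice(2, 1, 1)
j = jaccard(2, 1, 1)
assert abs(j - d / (2 - d)) < 1e-12   # Equation 8
assert abs(d - 2 * j / (1 + j)) < 1e-12   # Equation 9
```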
This means that both metrics measure the same aspects and provide the same system ranking. Therefore, it does not provide additional information to select both of them together as validation metrics as done in [9, 1, 13]. True Positive Rate (T PR), also called Sensitivity and Recall, measures the portion of positive voxels in the ground truth that are also identified as positive by the segmentation being evaluated. Analogously, the True Negative Rate (T NR), also called Specificity, measures the portion of negative voxels (background) in the ground truth segmentation that are also identified as negative by the segmentation being evaluated. However these two measures are not common as evaluation measures of medical image segmentation because of their sensibility to segments size, i.e. they penalize errors in small segments more than in large segments [18, 16, 55]. Note that the terms positive and negative are rather for crisp segmentation. However, the generalization in Equations 2 to 5 extends the meaning of the terms to grade agreement. These two measures are defined as follows: Recall = Sensitivity = T PR =
TP T P + FN
(10)
TN (11) T N + FP There are two other measures that are related to these metrics, namely the false positive rate (FPR), also called Fallout, and the false negative rate (FNR). They are defined by Speci f icity = T NR =
Fallout = FPR =
FP = 1 − T NR FP + T N
Page 19 of 37
(12)
  FNR = FN / (FN + TP) = 1 − TPR    (13)

The equivalences in Equations 12 and 13 imply that only one of each pair of equivalent measures should be selected for validation, not both together [55], i.e. either FPR or TNR and, analogously, either FNR or TPR. Another related measure is the precision, also called the positive predictive value (PPV), which is not commonly used in the validation of medical images but is needed to calculate the F-Measure. It is defined by

  Precision = PPV = TP / (TP + FP)    (14)
F_β-Measure (FMS_β) was first introduced in [10] as an evaluation measure for information retrieval. However, it is a special case of van Rijsbergen's effectiveness measure introduced in [47]: FMS_β can be derived by setting α = 1/(β² + 1) in the effectiveness measure E = 1 − 1/(α/PPV + (1 − α)/TPR). F_β-Measure is a trade-off between PPV (precision, defined in Equation 14) and TPR (recall, defined in Equation 10). F_β-Measure is given by

  FMS_β = (β² + 1)·PPV·TPR / (β²·PPV + TPR)    (15)

With β = 1.0 (precision and recall equally important), we get the special case F_1-Measure (FMS_1), which we call FMS for simplicity. It is the harmonic mean of precision and recall, given by

  FMS = 2·PPV·TPR / (PPV + TPR)    (16)
Here, we note that FMS is mathematically equivalent to DICE. This follows from a trivial substitution for TPR and PPV in Equation 16 by their values from Equations 10 and 14; after simplification, it results in the definition of DICE (Equation 6).

As the name implies, volumetric similarity (VS) is a measure that considers the volumes of the segments to indicate similarity. There is more than one definition of the volumetric distance in the literature; we consider the definition in [45, 56, 46, 9], namely the absolute volume difference divided by the sum of the compared volumes. We define the Volumetric Similarity (VS) as 1 − VD, where VD is the volumetric distance. That is

  VS = 1 − ||S_t^1| − |S_g^1|| / (|S_t^1| + |S_g^1|) = 1 − |FN − FP| / (2·TP + FP + FN)    (17)
The global consistency error (GCE) [34] is an error measure between two segmentations. Let R(S, x) be defined as the set of all voxels that reside in the same region of segmentation S as the voxel x. For two segmentations S1 and S2, the error at voxel x is defined as

  E(S1, S2, x) = |R(S1, x) \ R(S2, x)| / |R(S1, x)|    (18)
Note that E is not symmetric. The global consistency error (GCE) is defined as the error averaged over all voxels and is given by

  GCE(S1, S2) = (1/n) · min( Σ_{i=1}^{n} E(S1, S2, x_i), Σ_{i=1}^{n} E(S2, S1, x_i) )    (19)

Equation 19 can be expressed in terms of the four cardinalities defined in Equations 2 to 5 to get the GCE between the (fuzzy) segmentations S_g and S_t as follows:

  GCE = (1/n) · min( FN·(FN + 2·TP)/(TP + FN) + FP·(FP + 2·TN)/(TN + FP),
                     FP·(FP + 2·TP)/(TP + FP) + FN·(FN + 2·TN)/(TN + FN) )    (20)

Overlap measures for multiple labels

All the overlap measures presented previously assume segmentations with only one label. However, it is common to compare segmentations with multiple labels, e.g. two-label tumor segmentation (core and edema). Obviously, one way is to compare each label separately using the overlap measures presented previously, but this leads to the problem of how to average the individual similarities to get a single score. For this evaluation tool, we use the overlap measures proposed by Crum et al. [13], namely DICEml and JACml, which are generalized to segmentations with multiple labels. For the segmentations A and B,

  JACml = Σ_{labels l} α_l Σ_{voxels i} MIN(A_li, B_li) / ( Σ_{labels l} α_l Σ_{voxels i} MAX(A_li, B_li) )    (21)

where A_li is the value of voxel i for label l in segmentation A (analogously for B_li) and α_l is a label-specific weighting factor that affects how much each label contributes to the overlap accumulated over all labels. Here, MIN(.) and MAX(.) are the norms used to represent the intersection and union in the fuzzy case. DICEml can be calculated from JACml according to Equation 9, i.e. DICEml = 2·JACml/(1 + JACml). Note that the equations above assume the general case of multiple-label fuzzy segmentation; however, in multiple-label segmentations, voxel values mostly represent the labels (classes) rather than probabilities, which means that in most available image formats there are either multiple labels or fuzzy segmentations, but not both.
5.3.2 Pair–counting based metrics
In this section, pair–counting based metrics, namely the Rand index and its extensions, are defined. We first define the four basic pair–counting cardinalities a, b, c, and d for crisp and fuzzy segmentations, and then define the metrics based on these cardinalities.
Basic cardinalities

Given two partitions of the point set X being compared, let P be the set of C(n, 2) tuples that represent all possible object pairs in X × X. These tuples can be grouped into four categories depending on where the objects of each pair are placed according to each of the partitions. That is, each tuple (x_i, x_j) ∈ P is assigned to one of four groups, whose cardinalities are a, b, c, and d:

• Group I: x_i and x_j are placed in the same subset in both partitions S_g and S_t. We define a as the cardinality of Group I.
• Group II: x_i and x_j are placed in the same subset in S_g but in different subsets in S_t. We define b as the cardinality of Group II.
• Group III: x_i and x_j are placed in the same subset in S_t but in different subsets in S_g. We define c as the cardinality of Group III.
• Group IV: x_i and x_j are placed in different subsets in both partitions S_g and S_t. We define d as the cardinality of Group IV.

Note that the count of tuples in Groups I and IV represents the agreement (a + d), whereas the count of tuples in Groups II and III represents the disagreement (b + c) between the two partitions. Obviously, because there are C(n, 2) = n(n − 1)/2 tuples, the direct calculation of these cardinalities needs O(n²) runtime. However, Brennan and Light [5] showed that these cardinalities can be calculated from the values of the confusion matrix without trying all pairs, thus avoiding the O(n²) complexity, that is

  a = (1/2) Σ_{i=1}^{r} Σ_{j=1}^{s} m_ij·(m_ij − 1)    (22)

  b = (1/2) ( Σ_{j=1}^{s} m_.j² − Σ_{i=1}^{r} Σ_{j=1}^{s} m_ij² )    (23)

  c = (1/2) ( Σ_{i=1}^{r} m_i.² − Σ_{i=1}^{r} Σ_{j=1}^{s} m_ij² )    (24)

  d = n(n − 1)/2 − (a + b + c)    (25)
where r and s are the class counts in the compared partitions, m_ij is the confusion matrix (Table 2), m_i. denotes the sum over the ith row, and m_.j denotes the sum over the jth column. Note that here, in contrast to the spatial overlap based metrics, there is no restriction on the number of classes in the compared partitions. However, in the proposed evaluation tool we are interested in segmentations with only two classes, namely the anatomy and the background, i.e. r = s = 2. We define the four cardinalities for this special case, more specifically for the segmentations S_g and S_t defined in Section 5.3, based on the four overlap cardinalities defined in Section 5.3.1:

  a = (1/2) [ TP·(TP − 1) + FP·(FP − 1) + TN·(TN − 1) + FN·(FN − 1) ]    (26)

  b = (1/2) [ (TP + FN)² + (TN + FP)² − (TP² + TN² + FP² + FN²) ]    (27)

  c = (1/2) [ (TP + FP)² + (TN + FN)² − (TP² + TN² + FP² + FN²) ]    (28)

  d = n(n − 1)/2 − (a + b + c)    (29)
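Equations 26 to 29 make the pair-counting cardinalities computable in constant time from the overlap cardinalities; a minimal sketch:

```python
def pair_cardinalities(TP, FP, TN, FN):
    # a, b, c, d from the confusion-matrix cardinalities (Equations 26-29)
    n = TP + FP + TN + FN
    sq = TP**2 + TN**2 + FP**2 + FN**2
    a = (TP * (TP - 1) + FP * (FP - 1) + TN * (TN - 1) + FN * (FN - 1)) / 2
    b = ((TP + FN)**2 + (TN + FP)**2 - sq) / 2
    c = ((TP + FP)**2 + (TN + FN)**2 - sq) / 2
    d = n * (n - 1) / 2 - (a + b + c)
    return a, b, c, d
```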
Generalization to fuzzy segmentations

As mentioned above, since the cardinalities a, b, c, and d are by definition based on grouping all the pairwise tuples defined on S_g and S_t, this requires processing n(n − 1)/2 tuples, which means that a direct computation of these cardinalities for fuzzy segmentations takes O(n²) runtime. For medical segmentation, this complexity can be a problem, since the size of medical volumes can reach 8-digit numbers. Methods have been proposed (Huellermeier et al. [21], Brouwer [6], Campello [8]) that calculate the Rand index and its extensions for fuzzy segmentations using different approaches. None of these approaches is efficiently applicable in the 3D medical imaging domain because they all have a runtime complexity of O(n²). However, Anderson et al. [2] proposed a method that calculates the four cardinalities for fuzzy sets in O(n) runtime. This is achieved by combining two already known strategies: (i) calculating the confusion matrix for fuzzy sets using an agreement function, e.g. Equations 2 to 5, and (ii) calculating the four cardinalities by applying Equations 22 to 25 to the values of the fuzzy confusion matrix calculated in (i). This approach is used in this paper, which means that Equations 26 to 29 already provide the fuzzy cardinalities according to [2], given that the parameters TP, FP, TN and FN are calculated for fuzzy sets. In the next subsection, the Rand index and the adjusted Rand index are calculated based on these cardinalities.

Calculation of pair-counting based metrics

The Rand Index (RI), proposed by W. Rand [44], is a measure of similarity between clusterings. One of its important properties is that it is not based on labels and thus can be used to evaluate clusterings as well as classifications. The RI between two segmentations S_g and S_t is defined as the fraction of agreeing pairs, that is

  RI(S_g, S_t) = (a + d) / (a + b + c + d)    (30)

where a, b, c and d are the cardinalities defined in Equations 26 to 29. The Adjusted Rand Index (ARI), proposed by Hubert and Arabie [23], is a modification of the Rand Index that includes a correction for chance. It is given by

  ARI = [ Σ_ij C(m_ij, 2) − Σ_i C(m_i., 2) · Σ_j C(m_.j, 2) / C(n, 2) ] /
        [ (1/2)·( Σ_i C(m_i., 2) + Σ_j C(m_.j, 2) ) − Σ_i C(m_i., 2) · Σ_j C(m_.j, 2) / C(n, 2) ]    (31)

where n is the object count, m_ij is the confusion matrix (Table 2), m_i. denotes the sum over the ith row, and m_.j denotes the sum over the jth column. The ARI can be expressed by the four
cardinalities as

  ARI = 2·(a·d − b·c) / (c² + b² + 2·a·d + (a + d)·(b + c))    (32)
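Given a, b, c and d, the Rand index (with the agreeing pairs a + d in the numerator) and the closed form of the ARI in Equation 32 are one-liners; a minimal sketch:

```python
def rand_index(a, b, c, d):
    # Rand Index: fraction of agreeing pairs (Equation 30)
    return (a + d) / (a + b + c + d)

def adjusted_rand_index(a, b, c, d):
    # chance-corrected Rand Index (Equation 32)
    return 2 * (a * d - b * c) / (c**2 + b**2 + 2 * a * d + (a + d) * (b + c))
```

Identical partitions (b = c = 0) give RI = ARI = 1.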
5.3.3 Information theoretic metrics
The Mutual Information (MI) between two variables is a measure of the amount of information one variable has about the other, in other words, the reduction in uncertainty of one variable given that the other is known [12]. It was first used as a measure of similarity between images by Viola and Wells [57]. Later, Russakoff et al. [48] used the MI as a similarity measure between image segmentations; in particular, they calculate the MI based on regions (segments) instead of individual pixels. The MI is related to the marginal entropy H(S) and the joint entropy H(S1, S2) of the segmentations, defined as

  H(S) = − Σ_i p(S^i) log p(S^i)    (33)

  H(S1, S2) = − Σ_ij p(S1^i, S2^j) log p(S1^i, S2^j)    (34)

where p(x, y) denotes a joint probability, S^i are the regions (segments) in the image segmentations, and p(S^i) are the probabilities of these regions, which can be expressed in terms of the four cardinalities TP, FP, TN and FN, calculated for the fuzzy segmentations S_g and S_t in Equations 2 to 5, as follows:

  p(S_g^1) = (TP + FN)/n
  p(S_g^2) = (TN + FP)/n
  p(S_t^1) = (TP + FP)/n
  p(S_t^2) = (TN + FN)/n    (35)

where n = TP + FP + TN + FN is the total number of voxels. Because TP, TN, FP and FN are by definition cardinalities of disjoint sets that partition the volume, the joint probabilities are given by
  p(S1^i, S2^j) = |S1^i ∩ S2^j| / n    (36)

which implies

  p(S1^1, S2^1) = TP/n
  p(S1^1, S2^2) = FN/n
  p(S1^2, S2^1) = FP/n
  p(S1^2, S2^2) = TN/n    (37)
The MI is then defined as

  MI(S_g, S_t) = H(S_g) + H(S_t) − H(S_g, S_t)    (38)

The Variation of Information (VOI) measures the amount of information lost (or gained) when changing from one variable to the other. Meila [36] first introduced the VOI for comparing clustering partitions. The VOI is defined using the entropy and the mutual information as

  VOI(S_g, S_t) = H(S_g) + H(S_t) − 2·MI(S_g, S_t)    (39)
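MI and VOI follow directly from the probabilities in Equations 35 and 37; a minimal sketch using natural-log entropies:

```python
from math import log

def entropy(probs):
    # Shannon entropy (natural log); zero-probability terms contribute nothing
    return -sum(p * log(p) for p in probs if p > 0)

def mi_voi(TP, FP, TN, FN):
    # Mutual Information (Eq. 38) and Variation of Information (Eq. 39)
    n = TP + FP + TN + FN
    h_g = entropy([(TP + FN) / n, (TN + FP) / n])   # marginal entropy of S_g
    h_t = entropy([(TP + FP) / n, (TN + FN) / n])   # marginal entropy of S_t
    h_gt = entropy([TP / n, FN / n, FP / n, TN / n])  # joint entropy (Eq. 37)
    mi = h_g + h_t - h_gt
    return mi, h_g + h_t - 2 * mi
```

For identical segmentations, MI equals the marginal entropy and VOI is 0.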
5.3.4 Probabilistic metrics
The Interclass Correlation (ICC) [51] is a measure of correlation between pairs of observations that do not necessarily have an order or are not obviously labeled. It is commonly used as a measure of conformity among observers; in our case, it is used as a measure of consistency between two segmentations. ICC is given by

  ICC = σ_S² / (σ_S² + σ_ε²)    (40)

where σ_S² denotes the variance caused by differences between the segmentations and σ_ε² denotes the variance caused by differences between the points within the segmentations [51]. Applied to the segmentations S_g and S_t, ICC is defined as

  ICC = (MS_b − MS_w) / (MS_b + (k − 1)·MS_w)  with
  MS_b = (2/(n − 1)) Σ_x (m(x) − µ)²
  MS_w = (1/n) Σ_x [ (f_g(x) − m(x))² + (f_t(x) − m(x))² ]    (41)

where MS_b denotes the mean squares between the segmentations (between-group MS), MS_w denotes the mean squares within the segmentations (within-group MS), k is the number of observers (2 in the case of comparing two segmentations), µ is the grand mean, i.e. the mean of the means of the two segmentations, and m(x) = (f_g(x) + f_t(x))/2 is the mean at voxel x.

The Probabilistic Distance (PBD) was developed by Gerig et al. [18] as a measure of distance between fuzzy segmentations. Given two fuzzy segmentations A and B, the PBD is defined by

  PBD(A, B) = ∫|P_A − P_B| / (2·∫P_AB)    (42)
where P_A(x) and P_B(x) are the probability distributions representing the segmentations and P_AB is their pooled joint probability distribution. Applied to S_g and S_t, defined in Section 5.3, the
PBD is defined as

  PBD(S_g, S_t) = Σ_x |f_g(x) − f_t(x)| / (2·Σ_x f_g(x)·f_t(x))    (43)
The Cohen Kappa Coefficient (KAP), proposed in [11], is a measure of agreement between two samples. As an advantage over other measures, KAP takes into account the agreement caused by chance, which makes it more robust. KAP is given by

  KAP = (P_a − P_c) / (1 − P_c)    (44)

where P_a is the agreement between the samples and P_c is the hypothetical probability of chance agreement. The same can be expressed in terms of frequencies to facilitate the computation:

  KAP = (f_a − f_c) / (N − f_c)    (45)

where N is the total number of observations, in our case the voxels. The terms in Equation 45 can be expressed in terms of the spatial overlap cardinalities, calculated for fuzzy segmentations (Equations 2 to 5), to get

  f_a = TP + TN
  f_c = [ (TN + FN)·(TN + FP) + (FP + TP)·(FN + TP) ] / N    (46)
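Equations 44 to 46 reduce kappa to simple arithmetic on the cardinalities; a minimal sketch:

```python
def kappa(TP, FP, TN, FN):
    # Cohen's kappa from the overlap cardinalities (Equations 45, 46)
    n = TP + FP + TN + FN
    fa = TP + TN
    fc = ((TN + FN) * (TN + FP) + (FP + TP) * (FN + TP)) / n
    return (fa - fc) / (n - fc)
```

An agreement no better than chance yields kappa = 0 (e.g. all four cardinalities equal).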
The ROC curve (Receiver Operating Characteristic) is the plot of the true positive rate (TPR) against the false positive rate (FPR). The area under the ROC curve (AUC) was first presented by Hanley and McNeil [22] as a measure of accuracy in diagnostic radiology. Later, Bradley [4] investigated its use in validating machine learning algorithms. The ROC curve, as a plot of TPR against FPR, normally assumes more than one measurement. For the case where a test segmentation is compared to a ground truth segmentation (one measurement), we follow the definition of the AUC in [43], namely the area of the trapezoid defined by the measurement point and the lines TPR = 0 and FPR = 1, which is given by

  AUC = 1 − (FPR + FNR)/2 = 1 − (1/2)·( FP/(FP + TN) + FN/(FN + TP) )    (47)
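The single-measurement AUC of Equation 47 is the area of a trapezoid; a minimal sketch:

```python
def auc_trapezoid(TP, FP, TN, FN):
    # AUC for a single (FPR, TPR) measurement (Equation 47)
    fpr = FP / (FP + TN)
    fnr = FN / (FN + TP)
    return 1 - (fpr + fnr) / 2
```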
5.3.5 Spatial Distance Based Metrics
Spatial distance based metrics are widely used in the evaluation of image segmentation as dissimilarity measures. They are recommended when the overall accuracy of the segmentation, e.g. the boundary delineation (contour), is of importance [16]. The metrics in this category are the only ones in this paper that take the spatial position of voxels into consideration.
Distance between crisp volumes

The Hausdorff Distance (HD) between two finite point sets A and B is defined by

  HD(A, B) = max( h(A, B), h(B, A) )    (48)

where h(A, B) is the directed Hausdorff distance, given by

  h(A, B) = max_{a∈A} min_{b∈B} ||a − b||    (49)

where ||a − b|| is a norm, e.g. the Euclidean distance. An algorithm that directly calculates the HD according to Equation 49 has an execution time of O(|A||B|). There are many algorithms that calculate the HD with lower complexity; in this paper, we use the algorithm proposed in [53], which calculates the HD in nearly linear time. The HD is generally sensitive to outliers. Because noise and outliers are common in medical segmentations, it is not recommended to use the HD directly [18, 62]. However, the quantile method proposed by Huttenlocher et al. [24] is one way to handle outliers: the HD is defined to be the qth quantile of the distances instead of the maximum, so that possible outliers are excluded, where q is selected depending on the application and the nature of the measured point sets.

The Average Distance, or Average Hausdorff Distance (AVD), is the HD averaged over all points. The AVD is known to be stable and less sensitive to outliers than the HD. It is defined by

  AVD(A, B) = max( d(A, B), d(B, A) )    (50)

where d(A, B) is the directed average Hausdorff distance, given by

  d(A, B) = (1/N) Σ_{a∈A} min_{b∈B} ||a − b||    (51)
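For illustration, the directed distances of Equations 48 to 51 can be computed by brute force in O(|A||B|); the tool itself uses the more efficient algorithms described in this section. A minimal sketch over small point sets of coordinate tuples:

```python
from math import dist  # Euclidean distance between coordinate tuples (Python 3.8+)

def hausdorff(A, B):
    # Hausdorff distance (Equations 48, 49), brute force
    h = lambda X, Y: max(min(dist(x, y) for y in Y) for x in X)
    return max(h(A, B), h(B, A))

def avg_hausdorff(A, B):
    # Average Hausdorff distance (Equations 50, 51), brute force
    d = lambda X, Y: sum(min(dist(x, y) for y in Y) for x in X) / len(X)
    return max(d(A, B), d(B, A))
```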
To efficiently calculate the AVD and avoid a complexity of O(|A||B|) (scanning all possible point pairs), we use a modified version of the nearest neighbor (NN) algorithm proposed by Zhao et al. [63], in which a 3D cell grid is built on the point cloud and, for each query point, a search subspace (a subset of the grid cells that contains the nearest neighbor) is found to limit the search and reduce the number of distance calculations needed. We added three modifications to this algorithm that exploit the nature of segmentations, namely that they are mostly dense point clouds. These modifications enable efficiently finding the exact NN. In the first modification, when calculating the pairwise distances from segment A to B, we remove the intersection A ∩ B from consideration because here all the distances are zero; that is, we calculate only the distances from A \ B to B. For the second modification, instead of considering all points of B, we consider only the points on the surface of segment B. This is justified by the fact that, when moving in a line from a point in segment A (but not in the intersection) to segment B, the first point crossed in B is on the surface, and this is the shortest distance; hence all points inside the segments are irrelevant. The third modification is to find the radius r that defines a convenient search subspace for a given query point q ∈ A. We find r by moving from q toward the mean of B; if a point p ∈ B is crossed, we define r as the distance between q and p, i.e. the search subspace consists of all
cell grids contained in or crossed by the sphere centered on q with radius r. If no point p is found (which is unlikely to happen with segmentations), an exhaustive search is performed.

The Mahalanobis Distance (MHD) [33] between two points in a point cloud, in contrast to the Euclidean distance, takes into account the correlation of all points in the point cloud containing the two points. The MHD between the points x and y in the point cloud A is given by

  MHD(x, y) = sqrt( (x − y)^T S⁻¹ (x − y) )    (52)

where S⁻¹ is the inverse of the covariance matrix S of the point cloud and the superscript T denotes the matrix transpose. Note that x and y are two points in the same point cloud, whereas in the validation of image segmentation two point clouds are compared. For this task, we use the variant of the MHD according to McLachlan [35], where the MHD is calculated between the means of the compared point clouds, with their common covariance matrix taken as S. Hence the Mahalanobis distance MHD(X, Y) between the point sets X and Y is

  MHD(X, Y) = sqrt( (µ_x − µ_y)^T S⁻¹ (µ_x − µ_y) )    (53)

where µ_x and µ_y are the means of the point sets and the common covariance matrix of the two sets is given by

  S = (n₁·S₁ + n₂·S₂) / (n₁ + n₂)    (54)
where S1 , S2 are the covariance matrices of the voxel sets and n1 , n2 are the numbers of voxels in each set. Extending the distances to fuzzy volumes Different approaches have been proposed to measure the spatial distance between fuzzy images. The approaches described in [52] are based on defuzzification (finding a crisp representation) either by minimizing the feature distance, which leads to the problem of selecting the features, or by finding crisp representations with a higher resolution which leads to multiplication of the grid dimensions and therefore negatively impacts the efficiency of time consuming algorithms, like HD and AV D. For this evaluation tool, we use a discrete form of the approach proposed in [66] i.e. the average of distances at different α-cuttings depending on a given number of cutting levels k. The HD distance between the fuzzy segmentations A and B is thus given by 1 k HDk (A, B) = ∑ HD i (A, B) k k i=1
(55)
HDα (A, B) = HD(Aα , Bα )
(56)
where Aα and Bα are the crisp representations resulting from thresholding the fuzzy volumes A and B at cutting level α, HDα is the HD at cutting level α, and k > 0 is an integer that gives the number of cutting levels considered.
Analogously, the AVD and MHD between the fuzzy volumes A and B are given by

  AVD_k(A, B) = (1/k) Σ_{i=1}^{k} AVD(A_{i/k}, B_{i/k})    (57)

  MHD_k(A, B) = (1/k) Σ_{i=1}^{k} MHD(A_{i/k}, B_{i/k})    (58)
If the parameters k and α are omitted, i.e. HD, AVD and MHD, we assume distances at the cutting level α = 0.5.
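The α-cut averaging of Equations 55 to 58 applies uniformly to any crisp distance; a minimal sketch, where fuzzy volumes are simplified to flat membership lists (a hypothetical 1-D stand-in for the 3D case) and `crisp_dist` is any crisp distance function:

```python
def alpha_cut(fuzzy, alpha):
    # indices of voxels whose membership reaches the cutting level alpha
    return [i for i, p in enumerate(fuzzy) if p >= alpha]

def fuzzy_distance(A, B, crisp_dist, k=5):
    # average of crisp distances over the alpha-cuts i/k, i = 1..k (Eq. 55)
    return sum(crisp_dist(alpha_cut(A, i / k), alpha_cut(B, i / k))
               for i in range(1, k + 1)) / k
```

Passing a crisp HD, AVD or MHD as `crisp_dist` yields HD_k, AVD_k or MHD_k respectively.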
5.4 Multiple definitions of metrics in the literature
We present three examples representing three categories of inconsistency in the literature regarding the definition of the metrics, to underline the need for a standardization of evaluation metrics and to motivate a standard evaluation tool for medical segmentations. The first category is caused by misinterpretation resulting in misleading definitions, for example the confusion of the pair-counting cardinalities (a, b, c and d) with the overlap cardinalities (TP, FP, TN and FN). In some papers [49, 2, 21, 8], the pair-counting cardinalities are used in place of the overlap cardinalities although they are mathematically and semantically different. According to the definition, the pair-counting cardinalities result from grouping the n(n − 1)/2 tuples defined on X × X (Section 5.3.2), whereas the overlap-based cardinalities (Section 5.3.1) result from the class overlap, i.e. the pairwise comparison of the n voxel assignments. In the papers mentioned above, several overlap-based metrics including the Jaccard index are defined using the pair-counting cardinalities in place of the overlap cardinalities. To illustrate how strongly the results differ in the two cases, we show examples in Table 3. In each example, the partitions P1 and P2 are compared using the Jaccard index, which is calculated in two ways: the first (JAC1) using the overlap cardinalities according to [26, 27], the second (JAC2) using the pair-counting cardinalities according to [49, 2, 21, 8]. The values are different except in the first example.

Table 3: Pair-counting cardinalities versus overlap cardinalities in examples

P1        P2        TP  FP  FN  TN  JAC1    a   b   c   d  JAC2
1,0,1,1   1,1,0,0    1   2   1   0  0.25    1   2   1   2  0.25
1,1,1,1   0,0,0,1    1   3   0   0  0.25    3   3   0   0  0.5
0,1,0,1   1,1,0,0    1   1   1   1  0.33    0   2   2   2  0.0
0,0,0,0   0,0,0,1    0   0   1   3  0.0     3   0   3   0  0.5
1,0,0,1   1,1,0,1    2   0   1   1  0.67    1   2   1   2  0.25

The five examples show that the pair-counting cardinalities (a, b, c and d) cannot be used in place of the overlap cardinalities (TP, FP, FN and TN) to calculate the Jaccard index, as is commonly done in the literature. The second category is naming inconsistency, where the
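The two calculations in Table 3 can be reproduced with a few lines of Python (a sketch with hypothetical helper names; note that the split between b and c depends on a convention, but JAC2 only uses their sum):

```python
from itertools import combinations

def jaccard_overlap(p1, p2):
    # JAC1: Jaccard from the overlap cardinalities, TP / (TP + FP + FN),
    # which equals |intersection| / |union| of the two label vectors
    tp = sum(x and y for x, y in zip(p1, p2))
    union = sum(x or y for x, y in zip(p1, p2))
    return tp / union if union else 1.0

def jaccard_pairs(p1, p2):
    # JAC2: Jaccard from the pair-counting cardinalities, a / (a + b + c);
    # a counts pairs co-clustered in both partitions, b + c counts pairs
    # co-clustered in exactly one of them
    a = bc = 0
    for i, j in combinations(range(len(p1)), 2):
        s1, s2 = p1[i] == p1[j], p2[i] == p2[j]
        if s1 and s2:
            a += 1
        elif s1 or s2:
            bc += 1
    return a / (a + bc)
```

For the first row of Table 3 both functions return 0.25, but for the second row they return 0.25 and 0.5 respectively, confirming that the two cardinality sets are not interchangeable.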
same name is used to denote two different metrics. One example is the volumetric similarity (VS). While VS is defined in [45, 56, 46, 9] as the absolute volume difference divided by the sum of the compared volumes (Equation 17), there is another metric definition under the same
name in [25], defined as twice the volume of the intersection divided by the volume sum, in percent, i.e.

VS = \frac{2 |S_t \cap S_g|}{|S_t| + |S_g|} \cdot 100\%    (59)
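The practical consequence of this naming inconsistency is easy to demonstrate: for the same pair of segmentations, the two metrics called VS can give opposite impressions of quality. In the sketch below, writing Equation 17 as 1 minus the volume-difference ratio is an assumption based on the verbal description above, since the equation itself is not restated in this section:

```python
def vs_volume(gt, seg):
    # "VS" as described for Equation 17: based on the absolute volume
    # difference divided by the volume sum (assumed here as 1 - ratio)
    vg, vs = sum(gt), sum(seg)
    return 1 - abs(vs - vg) / (vs + vg)

def vs_overlap(gt, seg):
    # "VS" as defined in [25], Equation 59: twice the intersection volume
    # divided by the volume sum, in percent
    inter = sum(g and s for g, s in zip(gt, seg))
    return 2 * inter / (sum(gt) + sum(seg)) * 100

gt  = (1, 1, 1, 0, 0)
seg = (0, 0, 1, 1, 1)  # same volume as gt, but only one voxel overlaps
```

Here `vs_volume(gt, seg)` is 1.0 (a perfect score, since the volumes agree) while `vs_overlap(gt, seg)` is only about 33%, so results reported under the name VS are not comparable across the two definitions.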
The last category is multiple definition, stemming from different theoretical approaches to estimating the same value. For example, the Intraclass Correlation (ICC) has an early definition proposed by Fisher [17]. Later, several estimators of the ICC were proposed, one of them being the definition in Equation 40 proposed by Shrout and Fleiss [51]. Note that although these definitions are totally different, in contrast to the second category they all aim to estimate the same statistic.
5.5 Implementation
This section is organized as follows: Section 5.5.1 gives an overview of the general architecture of the project. Details about the language, framework, and environment are provided in Section 5.5.2. Implementation details concerning the optimizations in the tool are presented in Section 5.5.3. Finally, Section 5.5.4 presents some cases of usage.

5.5.1 Architecture
EvaluateSegmentation is an efficient command line tool that compares two 2D or 3D medical segmentations using the 20 evaluation metrics presented in Table 1. Being a pure command line tool without a GUI makes it suitable for being called from automation scripts when many segmentations are to be evaluated. The tool uses the ITK Library as its input layer, which makes it fully compatible with all image formats supported by ITK. The implementation was designed to take advantage of the relations between the 20 implemented metrics, apparent in their definitions, so that intermediate results are shared between metrics and repeated operations are avoided, saving execution time and memory. By default the evaluation result is displayed in a readable format on the standard output, but it can optionally be saved as an XML file at a given path, e.g. to be parsed and processed by other tools.

5.5.2 Programming Environment
EvaluateSegmentation is implemented in C++ using the CMake framework, which makes it operating system and compiler independent. CMake (www.cmake.org) is an open source platform that enables programs implemented in native languages like C++ to be built independently of the operating system and compiler; it was originally created and funded by the National Library of Medicine (NLM) to support the distribution of the ITK toolkit. The source of the project as well as builds for some operating systems are available at http://github.com/codalab/EvaluateSegmentation. To build EvaluateSegmentation for any operating system, using any compiler, two components are required: (i) the source code of the project and (ii) the ITK Library, available as open source at http://www.itk.org/.
5.5.3 Efficiency optimization
Efficiency in speed as well as in memory usage is a critical point in medical volume segmentation, for several reasons: (i) very large volumes, such as whole-body volumes, are quite common; such volumes can have more than 100 million voxels. (ii) Common image formats allow large data types for representing fuzzy voxel values, e.g. double, which makes handling such volumes memory-critical when the volumes are large. (iii) Metrics based on pairwise distances between voxels, like HD and AVD, become computationally inefficient with increasing volume size. (iv) State-of-the-art techniques based on the distance transform are sensitive to increasing volume grid size, in terms of speed as well as memory used.

EvaluateSegmentation does not use distance transform techniques because of their memory sensitivity to grid size. Instead, it uses optimization techniques that make it very efficient in terms of speed and memory. To overcome the memory problem of large volumes with large data types, EvaluateSegmentation first uses a streaming technique, supported by ITK, to load images and store them in another representation that encodes 255 fuzzy levels in the char data type. In a next step, EvaluateSegmentation uses indexing techniques to model the images in a way that (i) excludes the background voxels, making the tool less sensitive to increases in grid size, and (ii) provides an efficient access structure for the subsequent steps. To achieve high speed, EvaluateSegmentation applies specific optimizations to the metrics based on pairwise distances, i.e. HD and AVD; these optimizations include randomization, early breaking, and indexing [53, 63].
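The early-breaking idea for the directed Hausdorff distance can be sketched in a few lines of pure Python (illustrative only; the tool itself implements this in C++ together with randomization and indexing [53]):

```python
import math

def directed_hd_early_break(A, B):
    # directed Hausdorff distance max_{a in A} min_{b in B} ||a - b||
    # with early breaking: while scanning B for a given point a, stop as
    # soon as a distance below the running maximum cmax is found, since
    # the minimum for this a can then no longer raise the maximum
    cmax = 0.0
    for a in A:
        cmin = math.inf
        for b in B:
            d = math.dist(a, b)
            if d < cmax:          # early break: a cannot affect the result
                cmin = d
                break
            cmin = min(cmin, d)
        cmax = max(cmax, cmin)
    return cmax
```

The undirected HD is then the maximum of the two directed distances; the early break pays off because in practice most points find a close neighbor quickly.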
5.5.4 Usage
EvaluateSegmentation is a command line tool. The command line has a mandatory part specifying the two images being compared and an optional part with arguments used to control the metric calculation. The command line has the following syntax:

EvaluateSegmentation groundtruthpath segmentationpath [-thd threshold] [-use DICE,JAC,HD,....] [-xml xmlpath]

By default a fuzzy comparison is performed; if a threshold is given with the option -thd, binary representations of the images are compared, obtained by cutting them at the given threshold. All metrics are calculated unless the option -use is given, which specifies the metrics of interest; in this case, the symbols of these metrics, according to Table 1, are listed after the option, separated by commas. Some metrics take parameters, like the quantile value of the Hausdorff distance; these parameters can optionally be written after the metric symbol, following an @ character. For example, -use HD@0.9 instructs the tool to calculate the Hausdorff distance at the 0.9 quantile. More options are described by typing EvaluateSegmentation in the command line.
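When many segmentations are evaluated from a script, the command line above is typically assembled programmatically. A small Python sketch (the option names follow the syntax described above; the binary name is a placeholder that must point at the compiled tool):

```python
import subprocess

def build_command(gt_path, seg_path, threshold=None, metrics=None,
                  xml_path=None, tool="EvaluateSegmentation"):
    # assemble the command line: mandatory image paths first, then the
    # optional -thd, -use and -xml arguments
    cmd = [tool, gt_path, seg_path]
    if threshold is not None:
        cmd += ["-thd", str(threshold)]
    if metrics:
        cmd += ["-use", ",".join(metrics)]  # e.g. ["DICE", "JAC", "HD@0.9"]
    if xml_path:
        cmd += ["-xml", xml_path]
    return cmd

def evaluate(*args, **kwargs):
    # run the tool and return its textual report from the standard output
    return subprocess.run(build_command(*args, **kwargs),
                          capture_output=True, text=True).stdout
```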
5.6 Availability and requirements
• Project name: EvaluateSegmentation
• Project home page: http://github.com/codalab/EvaluateSegmentation
• Operating system(s): Platform independent
• Programming language: C++ / CMake
• Other requirements: ITK Library, available at http://www.itk.org
• License: Apache License Version 2.0, January 2004
• Any restrictions to use by non-academics: none
5.7 Conclusions on evaluation measures
We propose an efficient evaluation tool for medical volume segmentations using 20 evaluation metrics, together with an analysis of the metrics and a guideline for metric selection. These metrics were selected based on a comprehensive literature review on the validation of medical volume segmentations. The aim of the tool is to provide a standard for evaluating medical volume segmentations by providing a consistent set of metrics. The metrics are categorized according to their nature and the equivalences between some of them, to help in finding a reasonable combination when more than one metric is to be considered. The algorithms used to calculate the metrics were selected and optimized for high efficiency in speed and memory, to meet the challenging requirements of evaluating volumes with large grid sizes. The proposed evaluation tool is implemented in the open source project “EvaluateSegmentation”, available for download from http://github.com/codalab/EvaluateSegmentation. The presented analysis focuses on the metric properties, to provide guidelines for selecting the most suitable evaluation metric given a set of segmentations and an evaluation task.
6 Conclusions
This deliverable collects the tutorial material of the VISCERAL project. The material was created for several user groups and covers various aspects of the project. The guide on ethical requirements was written to ease the corresponding steps for new projects in the domain and to avoid work being redone. The quality control framework can likewise be useful in many environments and helped us create annotated content of very good quality. The cloud framework is perhaps the most novel part of the project and enables many things that were impossible beforehand; as it required more work than initially planned, it is important to document it. The last part of the deliverable describes the metrics tool developed in VISCERAL, which has already been adopted in other projects and will hopefully be useful for many other researchers working in this field. All in all, this deliverable should remain useful for several years to come and allow similar challenges to be created with less effort.
7 References
[1] Ali Qusay Al-Faris, Umi Kalthum Ngah, Nor Ashidi Mat Isa, and Ibrahim Lutfi Shuaib. MRI breast skin-line segmentation and removal using integration method of level set active contour and morphological thinning algorithms. Journal of Medical Sciences, May 2013. [2] Derek T. Anderson, James C. Bezdek, Mihail Popescu, and James M. Keller. Comparing fuzzy, probabilistic, and possibilistic partitions. Trans. Fuz Sys., 18(5):906–918, 2010. [3] K. O. Babalola, B. Patenaude, P. Aljabar, J. Schnabel, D. Kennedy, W. Crum, S. Smith, T. F. Cootes, M. Jenkinson, and D. Rueckert. Comparison and evaluation of segmentation techniques for subcortical structures in brain MRI. Medical image computing and computer-assisted intervention, 2008. [4] Andrew P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 1997. [5] Robert L. Brennan and Richard J. Light. Measuring agreement when two observers classify people into categories not defined in advance. British Journal of Mathematical and Statistical Psychology, 27(2), 1974. [6] Roelof K. Brouwer. Extending the Rand, adjusted Rand and Jaccard indices to fuzzy partitions. J. Intell. Inf. Syst., 32(3):213–235, 2009. [7] X. Cai, Y. Hou, C. Li, J. Lee, and W.G. Wee. Evaluation of two segmentation methods on mri brain tissue structures. Conf Proc IEEE Eng Med Biol Soc, 2006. [8] R. J. G. B. Campello. A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Letters, 28(7):833–841, 2007. [9] Ruben Cardenes, Rodrigo de Luis-Garcia, and Meritxell Bach-Cuadra. A multidimensional segmentation evaluation for medical image data. Comput. Methods Prog. Biomed., 96(2):108–124, 2009. [10] Nancy Chinchor. MUC-4 evaluation metrics. In Proceedings of the 4th Conference on Message Understanding, pages 22–29, 1992. [11] Jacob Cohen. A coefficient of agreement for nominal scales. 
Educational and Psychological Measurement, 20:37–46, 1960. [12] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley-Interscience, New York, NY, USA, 1991.
[13] William R. Crum, Oscar Camara, and Derek L. G. Hill. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Transactions on Medical Imaging, 25(11):1451–1461, 2006.
[14] Lee Raymond Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945. [15] Thomas M. Doring, Tadeu T.A. Kubo, L. Celso H. Cruz, Mario F. Juruena, Jiosef Fainberg, Romeu C. Domingues, and Emerson L. Gasparetto. Evaluation of hippocampal volume based on mr imaging in patients with bipolar affective disorder applying manual and automatic segmentation techniques. Journal of Magnetic Resonance Imaging, 33(3):565–572, 2011. [16] Aaron Fenster and Bernard Chiu. Evaluation of segmentation algorithms for medical imaging. In Conf Proc IEEE Eng Med Biol Soc., volume 7, pages 7186–7189, 2005. [17] Ronald Aylmer Fisher. Statistical methods for research workers. Oliver and Boyd, 1954. [18] Guido Gerig, Matthieu Jomier, and Miranda Chakos. Valmet: A new validation tool for assessing and improving 3D object segmentation. In Proceedings of the 4th International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 516–523, 2001. [19] Bram Van Ginneken, Tobias Heimann, and Martin Styner. 3d segmentation in the clinic: A grand challenge. In MICCAI Workshop on 3D Segmentation in the Clinic, pages 7–15, 2007. [20] Sylvain Gouttard, Martin Styner, Marcel Prastawa, Joseph Piven, and Guido Gerig. Assessment of reliability of multi-site neuroimaging via traveling phantom study. In Proceedings of the 11th International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 263–270, 2008. [21] Eyke Hallermeier, Maria Rifqi, Sascha Henzgen, and Robin Senge. Comparing fuzzy partitions: A generalization of the rand index and related measures. IEEE T Fuzzy Systems, 2012. [22] J. A. Hanley and B. J. Mcneil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29–36, 1982. [23] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2:193–218, 1985. [24] D. P. Huttenlocher, G. A. Klanderman, and W. A. Rucklidge. 
Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:850–863, 1993. [25] Laura Igual, Joan Carles Soliva, Antonio Hernndez-Vela, Sergio Escalera, Oscar Vilarroya, and Petia Radeva. Supervised brain segmentation and classification in diagnostic of attention-deficit/hyperactivity disorder. In HPCS, pages 182–187, 2012. [26] Paul Jaccard. The distribution of the flora in the alpine zone. New Phytologist, 11(2):37– 50, 1912.
[27] Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc., 1988. [28] David N. Kennedy, Nikos Makris, S. Caviness Verne, and Andrew J. Worth. Neuroanatomical segmentation in MRI: Technological objectives. IJPRAI, 11(8):1161–1187, 1997. [29] Kasiri Keyvan, Dehghani Mohammad Javad, Kazemi Kamran, Helfroush Mohammad Sadegh, and Shaghayegh Kafshgari. Comparison evaluation of three brain mri segmentation methods in software tools. In Biomedical Engineering (ICBME), pages 1–4, 2010. [30] Hassan Khotanlou, Olivier Colliot, Jamal Atif, and Isabelle Bloch. 3D brain tumor segmentation in MRI using fuzzy classification, symmetry analysis and spatially constrained deformable models. Fuzzy Sets Syst., 160(10):1457–1473, may 2009. [31] Stefan Klein, Uulke A. van der Heide, Bas W. Raaymakers, Alexis N. T. J. Kotte, Marius Staring, and Josien P. W. Pluim. Segmentation of the prostate in mr images by atlas matching. In ISBI, pages 1300–1303, 2007. [32] Erich Peter Klement, Endre Pap, and Radko Mesiar. Triangular norms. Kluwer Academic Publ. cop., 2000. [33] P. C. Mahalanobis. On the generalised distance in statistics. In Proceedings National Institute of Science, India, volume 2, pages 49–55, Apr 1936. [34] David R. Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int ’l Conf. Computer Vision, volume 2, pages 416–423, July 2001. [35] G. J. McLachlan. Mahalanobis distance. Resonance, 4:20–26, June 1999. [36] Marina Meila. Comparing clusterings by the variation of information. In Learning Theory and Kernel Machines, pages 173–187. Springer Berlin Heidelberg, 2003. [37] Bjoern Menze, Andras Jakab, Stefan Bauer, Mauricio Reyes, Marcel Prastawa, and Koen Van Leemput, editors. MICCAI 2012 Challenge on Multimodal Brain Tumor Segmentation BRATS2012. MICCAI, Okt 2012. 
[38] Bart Moberts, Anna Vilanova, and Jarke J. van Wijk. Evaluation of fiber clustering methods for diffusion tensor imaging. In IEEE Visualization, pages 65–72, 2005. [39] Fredric Morain-Nicolier, Stephane Lebonvallet, Etienne Baudrier, and Su Ruan. Hausdorff distance based 3D quantification of brain tumor evolution from MRI images. In 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 5597–600, 2007. [40] W.J. Niessen, K.L. Vincken, and M.A. Viergever. Evaluation of mr segmentation algorithms. In International Society Magnetic Resonance in Medicine, 1999.
[41] Yachun Pang, Li Li, Wenyong Hu, Yanxia Peng, Lizhi Liu, and Yuanzhi Shao. Computerized segmentation and characterization of breast lesions in dynamic contrast-enhanced mr images using fuzzy c-means clustering and snake algorithm. Computational and mathematical methods in medicine, 2012. [42] P. Narendran, V. K. Narendira Kumar, and K. Somasundaram. 3D brain tumors and internal brain structures segmentation in MR images. International Journal of Image, Graphics and Signal Processing, 1, 2012. [43] David M. W. Powers. Evaluation: From precision, recall and F-factor to ROC, informedness, markedness correlation. Journal of Machine Learning Technologies, 2:37–63, 2011. [44] William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 1971. [45] A. Ramaswamy Reddy, E. V. Prasad, and L. S. S. Reddy. Abnormality detection of brain mr image segmentation using iterative conditional mode algorithm. International Journal of Applied Information Systems, 5(2):56–66, 2013. [46] A. Ramaswamy Reddy, E. V. Prasad, and L. S. S. Reddy. Comparative analysis of brain tumor detection using different segmentation techniques. International Journal of Computer Applications, 82(14):14–28, 2013. [47] C. J. Van Rijsbergen. Information Retrieval. Butterworth-Heinemann, Newton, MA, USA, 2nd edition, 1979. [48] Daniel B. Russakoff, Carlo Tomasi, Torsten Rohlfing, and Calvin R. Maurer, Jr. Image similarity using mutual information of regions. In 8th European Conference on Computer Vision, ECCV, pages 596–607, 2004. [49] Gilbert Saporta and Genane Youness. Comparing two partitions: Some proposals and experiments. In Proceedings in Computational Statistics, pages 243–248, 2002. [50] Ran Shi, King Ngi Ngan, and Songnan Li. The objective evaluation of image object segmentation quality. In ACIVS, volume 8192, pages 470–479, 2013. [51] P. Shrout and J. Fleiss. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull, 1979.
[52] Natasa Sladoje, Joakim Lindblad, and Ingela Nystrom. Defuzzification of spatial fuzzy sets by feature distance minimization. Image and Vision Computing, 29:127–141, 2011. [53] Abdel Aziz Taha and Allan Hanbury. An efficient algorithm for calculating the exact Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, Jan 2015. [54] Abdel Aziz Taha and Allan Hanbury. An efficient tool for calculating medical volume segmentation metrics (submitted). BMC Medical Imaging, 2015.
[55] Jayaram K. Udupa, Vicki R. LeBlanc, Ying Zhuge, Celina Imielinska, Hilary Schmidt, Leanne M. Currie, Bruce Elliot Hirsch, and James Woodburn. A framework for evaluating image segmentation algorithms. Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society, 30, 2006. [56] Nagesh Vadaparthi, Srinivas Yarramalle, Suresh Varma Penumatsa, and P.S.R.Murthy. Segmentation of brain mr images based on finite skew gaussian mixture model with fuzzy c-means clustering and em algorithm. International Journal of Computer Applications, 28(10):18–26, 2011. [57] Paul Viola and William M. Wells, III. Alignment by maximization of mutual information. international journal of computer. Int. J. Comput. Vision, 24(2):137–154, 1997. [58] Simon K. Warfield, Carl-Fredrik Westin, Charles R. G. Guttmann, Marilyn S. Albert, Ferenc A. Jolesz, and Ron Kikinis. Fractional segmentation of white matter. In Proceedings of Second International Conference on Medical Imaging Computing and Computer Assisted Interventions, volume 1679, pages 62–71, 1999. [59] Ron Wehrens, Lutgarde M. C. Buydens, Chris Fraley, and Adrian E. Raftery. Model-based clustering for image segmentation and large datasets via sampling. J. Classification, 21(2), 2004. [60] Suchita Yadav and Sachin Meshram. Brain tumor detection using clustering method. International Journal of Computational Engineering Research(IJCER), pages 11–14, 2013. [61] Suchita Yadav and Sachin Meshram. Performance evaluation of basic segmented algorithms for brain tumor detection. Journal of Electronics and Communication Engineering IOSR, 5:08–13, 2013. [62] D. Zhang and G Lu. Review of shape representation and discription techniques. PR, 2004. [63] Jianhui Zhao, Chengjiang Long, Shuping Xiong, Cheng Liu, and Zhiyong Yua. A new k nearest neighbors search algorithm using cell grids for 3d scattered point cloud. Electronics and Electrical Engineering, 20, 2014. [64] Kelly H. Zou, Simon K. 
Warfield, Aditya Baharatha, Clare Tempany, Michael R. Kaus, Steven J. Haker, William M. Wells, Ferenc A. Jolesz, and Ron Kikinis. Statistical validation of image segmentation quality based on a spatial overlap index. Academic Radiology, 2004. [65] Kelly H. Zou, William M. Wells, Ron Kikinis, and Simon K. Warfield. Three validation metrics for automated probabilistic image segmentation of brain tumours. Statistics in Medicine, 23, 2004. [66] Rami Zwick, Edward Karlstein, and David V. Budiscu. Measures of similarity among fuzzy concepts: a comparative analysis. Int. J. Approx. reasoning, 1(2):221–242, 1987.
Annex 1: FP7 VISCERAL SEGMENTATION MANUAL V 1.0
FP7 VISCERAL Segmentation Manual Version: v.1.0 Jakab A / 25.10.2013
Contents
Project information and data access
  Data access
  Ticket system interface
  Project schedule and phases
Placing landmarks on the images
New VISCERAL landmarks
Organ segmentation: general considerations and segmentation strategies
  Segmentation with GEOS
  Segmentation with Slicer
Organ segmentation with GEOS: descriptions
  Kidneys
  Spleen
  Liver
  Lungs
  Urinary bladder
  Rectus abdominis muscle
  Lumbar vertebra #1
  Thyroid gland
  Pancreas
  Psoas major muscle
  Gallbladder
  Sternum
  Aorta
  Trachea
  Adrenal glands
Segmentation of additional organs
  Cerebral hemispheres
  Cerebellum
  Vena cava inferior
  Heart
  Postvertebral muscles
  Lumbar vertebrae 2–5
Project information and data access
Information about the project: http://www.visceral.eu/
Description of the first "Benchmark" phase: http://www.visceral.eu/benchmark-1/
Our goal here is to create annotations and landmarks. Definitions: an annotation is an organ, region, or pathological change delineated by manual or automatic segmentation. The digital form of an annotation is an image whose voxels mark the given region with discrete labels, i.e. a label map. A landmark, in contrast, is a spatial point without extent, characterized by three coordinates and a name. The list of required segmentations can be found in the document below, together with information on which of the three modalities (T1, T2, contrast-enhanced CT) each should preferably be delineated on. Some details of organ boundaries, e.g. for the kidney, are also defined there, and benchmark participants are given examples of the landmarks (12 in total): http://www.visceral.eu/assets/Uploads/Deliverables/VISCERAL-D2.3.1.pdf
Data access
All whole-body images selected so far in the VISCERAL project (approx. 100) can be downloaded from the following FTP server (FTP protocol, not SSH/SCP!, port 21):
Host: medgift.hevs.ch (port 21)
user: **
pass: **
The names of the files to download match the name given in the so-called "ticket". It is important to download the image from the correct directory, preferably the already corrected version.
Ticket system interface
Annotations will be uploaded via a web interface, where the tasks assigned to each person will also be available. When a "ticket" arrives, everyone receives an email notification, and further information is available in the web system. Currently only I have such access, but later everyone will receive their own. The ticket system web page: https://www.cir.meduniwien.ac.at/visceral/tickets/login.php
username: [email protected]
password: tickets
When uploading annotations back, try to name the files consistently, BUT this matters only for our own internal work, since after upload they are automatically renamed to the name of the subject they were uploaded for. Since the annotations are created in .nii format, they can also be viewed afterwards with Slicer, e.g. for consultation. Already submitted tickets can be modified later by clicking on the submitted tickets tab. It is important for everyone to set their current status: if someone is "available" (User settings), the automatic system periodically assigns tickets to them. The default setting is "unavailable".
Saving data
[…] icon: Save Segment Volume to disk: saves the finished segmentation in .nii format.
Compressing annotation files
GEOS by default saves .nii (NIFTI) files and cannot compress them. Files can be converted to Unix GZIP format using the 7-zip program: www.7-zip.org
The image files are provided in compressed form, using the GZIP compression available in Linux. A file with the .nii extension compressed with GZIP thus gets the .nii.gz extension. The .nii.gz format can be read directly by many programs (e.g. Slicer) and contains losslessly compressed information. Note that the current version of GEOS can only read images with the .nii extension, while annotations should preferably be uploaded back to the "ticket" system in .gz or .zip format.
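The compression step can also be scripted instead of using 7-zip by hand. A minimal Python sketch (the file name is a placeholder) producing the .nii.gz expected by the ticket system, in the same GZIP format as the Linux gzip tool:

```python
import gzip
import shutil
from pathlib import Path

def gzip_nifti(nii_path):
    # compress a .nii file to .nii.gz (lossless GZIP, as produced by the
    # Linux gzip tool); the uncompressed original is left in place
    src = Path(nii_path)
    dst = src.with_name(src.name + ".gz")   # liver.nii -> liver.nii.gz
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```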
Compatibility issues
The header of the NIFTI format is complex and contains a lot of meta-information. Unfortunately, programs interpret it differently; for example, the orientation of the patient and of the images may differ when the same file is loaded in different programs. GEOS does not correctly read the patient-position transformation matrix; Slicer does. This need not be dealt with directly, but segmentations created with the two programs therefore do not overlap by default. A further problem is that loading certain images (e.g. whole-body CTs) is problematic in the Slicer 3.X version family. For landmark annotations, therefore, first try loading and re-saving the image with Slicer 4, or use a Linux operating system.
Project schedule and phases
Phase 0: Preparation of the ticket system and of the image processing.
Phase 1, testing: 1–12 June 2013. Annotation of 3 whole bodies (MRI, CT) and placement of landmarks; testing of the ticket system (responsible: JA). Deadline: 12 June 2013.
Phase 2, quality testing and creation of the first data: 17 June – 5 July 2013. Annotation of roughly 20–30 whole-body images and creation of landmarks in production mode, using the ticket system, involving everyone if possible (5–6 people). Comparison with the annotations of the Heidelberg team. The annotations created in this phase will appear in the report to the EU (Deliverables), which must be completed by mid-July.
Phase 3, creation of the main annotations needed for the Benchmark: 8 July – 1 August 2013. The first annotation phase specified in the contract, in which most annotations are created. In total, together with Phase 2, about 100 whole-body CT and MRI annotations are needed. This must be distributed among at least 5–6 annotators, each able to devote 20–30 working hours per month (about 20 bodies per person).
Phase 4, further annotations: pathological changes and additional organs. October 2013 – end of 2014. In this phase the current list of organs will be extended. Marking pathological changes is also conceivable (the patient material comes from multiple myeloma cases). Since the annotators need to be contracted for about 10 months, after the active first two months (Phases 2–3), the fourth
Placing landmarks on the scans. The landmark descriptions can be found on page 8 of http://www.visceral.eu/assets/Uploads/Deliverables/VISCERAL-D2.3.1.pdf. On whole-body MRI, CT and trunk CT scans 12 points must be placed, while on abdominal MRIs 6; the two sets partially overlap. Landmark placement works best in Slicer: in the 2D views a fiducial point can be placed by pressing the "p" key and deleted with Del, and the same tools are available with the mouse in the top toolbar. Fiducial lists can be edited and saved with the Save function. A saved fiducial list is reloaded in .fcsv format. It is important that the point names (which must be renamed from L-1, L-2 ... to the organ names) are completely identical. The names are listed below. Try to place the points approximately as illustrated in the figures below.
(Figure: these two buttons can be used in Slicer to modify and place the fiducial points.)
Tip: within the Fiducials module, clicking a landmark's name brings it up in all three 2D views. Important! Place the landmarks in the version 3 Slicer (3.6.x) and save them as a single file with the .fcsv extension, so that all 12 points are contained in one file. The point names must match the names given here character for character.
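Since the names must match character for character, it is worth checking a saved .fcsv file before upload. A small sketch, assuming the Slicer fiducial CSV layout where lines starting with '#' are comments and the label is one of the comma-separated columns; the column index and the expected-name set below are illustrative and should be adapted to the Slicer version in use:

```python
EXPECTED = {  # illustrative subset of the required fiducial names
    "clavicle_left", "clavicle_right", "crista_iliaca_left",
    "crista_iliaca_right", "symphysis", "trochanter_major_left",
    "trochanter_major_right", "trochanter_minor_left",
    "trochanter_minor_right", "aortic_arch", "trachea_bifurcation",
    "aorta_bifurcation",
}

def check_fcsv_names(text, expected=EXPECTED, label_col=0):
    """Return (missing, unexpected) label sets for an .fcsv file.

    label_col depends on the Slicer version that wrote the file
    (0 in the old fiducial CSV, 11 in Slicer 4 Markups files).
    """
    found = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        cols = line.split(",")
        if len(cols) > label_col:
            found.add(cols[label_col])
    return expected - found, found - expected
```

A non-empty "unexpected" set usually means a typo in a fiducial name, which would otherwise only surface after upload.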
Lateral end of clavicle (left, right) fiducial name: clavicle_left, clavicle_right
Crista iliaca, at the top (left, right) fiducial name: crista_iliaca_left, crista_iliaca_right. Although we received no unambiguous specification for this, the highest (most superior) point of the crista iliaca should be marked on the coronal images.
Symphysis below fiducial name: symphysis. There is no unambiguous specification for this either, but it is advisable to mark the most protruding, anterior part of the symphysis.
Trochanter major, at the tip (left, right) fiducial name: trochanter_major_left / right. The highest point, the tip, of the greater trochanter visible on the coronal image.
Trochanter minor, most medial part (left, right) fiducial name: trochanter_minor_left / right. The most medial part of the lesser trochanter, as judged on the coronal images.
Tip of aortic arch fiducial name: aortic_arch. The highest visible point of the aortic arch. The thoracic portion of MRI scans is usually degraded by motion artefacts, and the contrast of the hilar tissues is blurred.
Tracheal bifurcation fiducial name: trachea_bifurcation
Aortic bifurcation fiducial name: aorta_bifurcation
The bifurcations of the aorta and of the VCI lie close to each other; do not confuse them. The aorta is usually rounder and has a smaller diameter, and its bifurcation usually lies before the venous confluence, but this is not always the case and can be atypical. It is worth scrolling along the vessel and locating the renal arteries.
New VISCERAL landmarks
1. Saving the new landmarks. For each image file only one annotation file (.fcsv) can be saved and uploaded, and the principle is that the new landmarks go into the same file as the old ones. Everyone will therefore receive the landmark .fcsv files they previously created, labelled with the case name, and must extend them in Slicer with the new landmarks. The .fcsv file extended with the new landmarks must then be uploaded again, under the "Submitted Tickets" menu of the Ticket interface. There everyone will find their previously uploaded files, which must be re-uploaded with the Re-upload landmarks button. If a landmark is genuinely not visible on a given scan, it must be left out. The order of the points is not important, but the character-exact naming in Slicer is. 2. Summary of landmark names
Landmark name | Fiducial name in Slicer
Top of the C2 cervical vertebral body | C2
Top of the C3-C7 cervical vertebral bodies | C3, C4, C5, C6, C7
Top of the Th1-Th12 thoracic vertebral bodies | Th1 ... Th12
Top of the L1-L5 lumbar vertebral bodies | L1 ... L5
Tip of the xiphoid process | xyphoideus
Center of the aortic valve | aorticvalve
Left sternoclavicular joint | sternoclavicular_left
Right sternoclavicular joint | sternoclavicular_right
Bifurcation of the vena cava inferior (beside/behind the aortic bifurcation) | vci_bifurcation
Left humerus, tuberculum majus | tuberculum_left
Right humerus, tuberculum majus | tuberculum_right
Left renal pelvis, lowest point | renalpelvis_left
Right renal pelvis, lowest point | renalpelvis_right
First bronchial bifurcation, left lung | bronchus_left
First bronchial bifurcation, right lung | bronchus_right
Center of the left eye | eye_left
Center of the right eye | eye_right
Most anterior point of the lateral ventricle, left | ventricle_left
Most anterior point of the lateral ventricle, right | ventricle_right
Tuber ischiadicum, lowest point, left | ischiadicum_left
Tuber ischiadicum, lowest point, right | ischiadicum_right
First bifurcation of the coronary artery (left main stem) | coronaria
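The merge step described above, extending a previously saved .fcsv with the new landmarks, can also be done outside Slicer before re-upload. A sketch, assuming the old-style layout where the label is the first comma-separated column (adjust for Slicer 4 Markups files):

```python
def merge_fcsv(old_text, new_rows):
    """Append new fiducial rows to an existing .fcsv file's text,
    skipping labels that are already present.

    Assumes the label is the first comma-separated column; comment
    lines starting with '#' are kept as-is.
    """
    lines = old_text.rstrip("\n").splitlines()
    have = {ln.split(",")[0] for ln in lines
            if ln and not ln.startswith("#")}
    for row in new_rows:
        if row.split(",")[0] not in have:
            lines.append(row)
    return "\n".join(lines) + "\n"
```

This keeps the old landmarks untouched, which matches the requirement that each case has exactly one annotation file containing both the old and the new points.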
3. Landmarks in detail. Vertebral landmarks, C2-C7, Th1-Th12, L1-L5: the top of the vertebral body endplate, at its center. Apart from C1 and the sacrum/coccyx, a point must be placed on the top of every vertebra. A practical feature in Slicer's Fiducial menu is "jump to slices", which moves all three views to the current point. In the coronal and sagittal spine views the points must be centered (since the spine is always slightly tilted and its midline is not visible in a single sagittal plane).
Tip of xiphoid process | xyphoideus. The xiphoid process is easy to find on a coronal slice. In many cases it is bifid, forked. In that case place the point not at the tip of the downward-pointing triangle but between the ends of the two small prongs.
Center of aortic valve | aorticvalve. Hard to find on low-dose CT or MRI, but it is advisable to locate the most proximal part of the ascending aorta from the heart (possibly from the ventricle) and place the point at its center.
Left and right sternoclavicular joint (2) | sternoclavicular_left sternoclavicular_right. Aim for the center of the joint, the articular socket.
Bifurcation of V. cava inferior (in correlation to the aortic bifurcation) | vci_bifurcation. Locate it around the aortic bifurcation point, behind it.
Tuberculum majus of the humerus (2) | tuberculum_left tuberculum_right. Found on the outer side of the humerus, slightly posteriorly. It is advisable to locate the most lateral point of the bone on sagittal images, where the bone is just barely visible as a point.
Lowest point of the renal pelvis. End of renal pelvis in the left and the right kidney (2) | renalpelvis_left renalpelvis_right
First bronchial bifurcation / main bronchus. First bifurcation of the primary bronchus in the left lung and the right lung (2) | bronchus_left bronchus_right. It is advisable to place the point at the center of the bronchial bifurcation, along the A-P axis as well.
Center of the eyes (2) | eye_left eye_right
Tip of the anterior part of the lateral ventricle (2) | ventricle_left ventricle_right. Locate the most anterior point, the tip, of the two lateral ventricles either on an axial image or, if the quality is better, on the coronal scan.
Tuber ischiadicum, at the bottom (2) | ischiadicum_left ischiadicum_right. Part of the os ischii, the "sitting bone". It is advisable to locate its most posterior part on a coronal image, behind the plane of the pelvis.
First bifurcation after the main stem of the left coronary artery | coronaria. Only recognizable on contrast-enhanced ThAb CT.
Organ segmentation: general considerations and segmentation strategies

Segmentation with GeOS. The GeOS program reads files with the .nii extension, which are uncompressed. For easier data transfer the files are provided with the .nii.gz extension, so they must be decompressed before use. Similarly, once the segmentations are ready they must be compressed again (ZIP; Windows can do this with one click) and uploaded in that form. For example, the label map of a CT is about 120 MB uncompressed but only 10-20 kilobytes compressed. In GeOS the finished segmentation can be saved, and the segmentation settings and the "strokes" can be saved as well; in the latter case the segmentation can be modified afterwards. Tip: make sure the segmentation settings are always appropriate; if a new volume is loaded, the previously saved settings remain in effect.

Optimal settings for contrast-enhanced CT (CTce): Iterations: 2; Margin: 40 (can be more for larger organs); Gamma: 1.5 (at most 2-5, since the contrast is usually good); Post smooth: 1-2 (can be more for liver or lungs).

Optimal settings for T1 MRI: Iterations: 2; Margin: 40; Gamma: 10-15; Post smooth: 2-3.
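The decompress/recompress round trip can be scripted instead of done by hand. A sketch in Python, standard library only; file names are illustrative. Note the asymmetry described above: GeOS needs the plain .nii, while the finished label map is uploaded as a ZIP:

```python
import gzip
import shutil
import zipfile
from pathlib import Path

def unpack_nii_gz(src):
    """Decompress case.nii.gz -> case.nii so GeOS can read it."""
    src = Path(src)
    dst = src.with_suffix("")  # strips the trailing .gz
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

def zip_labelmap(src):
    """Compress a finished label map for upload, as a ZIP archive."""
    src = Path(src)
    dst = src.with_suffix(src.suffix + ".zip")
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)
    return dst
```

The size difference quoted in the text (roughly 120 MB down to tens of kilobytes) comes from label maps being mostly constant-valued, which DEFLATE compresses extremely well.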
Segmentation with Slicer. Currently, fully manual (contour-drawing) segmentation is only possible in the 3D Slicer program. Manual segmentation is advisable when the volume of the target organ is small or it is visible on only a few slices. In practice this only comes up for whole-body T1 and T2 MRI, where the slice thickness is 5 mm, so organs that extend mainly in the coronal plane need to be segmented on only a few slices; in such cases manual segmentation is faster. It can also be used for complex geometry, where automatic segmentation cannot properly capture concave parts or parts that are poorly connected between slices. On MRI images manual segmentation is recommended for the following organs:
Kidneys
Urinary bladder
Rectus abdominis muscle
Psoas major muscle
L1 vertebra
Sternum
Manual segmentation workflow in Slicer 3.6: 1. EDITOR / Draw: outline the organ slice by slice. 2. SURFACE MODELS / Label map smoothing: Gaussian smoothing with a sigma value of 0.5. The volume created in the previous step is new and must be saved.
Organ segmentation with GEOS: descriptions

Kidneys
Spleen. Removal of the structures of the splenic hilum.
Liver. On the costal surface of the liver it is important to remove the ribs and the intercostal structures.
The gallbladder area and the structures of the porta hepatis must also be removed!
Lungs. Removal of the great vessels of the heart, filtering of the pulmonary hilum. Note! GEOS currently does not use the radiological view: the left side appears on the left of the screen! We have reported this to the developers.
OR, repeating this roughly every 20 coronal slices:
Urinary bladder. Make sure that the prostate below it (in males) is not included! It can be separated with a horizontal line on a coronal image.
Rectus abdominis muscle. This can be very thin; on a coronal section it is only a horizontal strip. It is worth starting to follow it at the symphysis and drawing brush strokes every 10-15 slices.
Lumbar vertebra #1. The anatomy of the first lumbar vertebra is complex; the processes must be segmented as well, both the transverse and the spinous ones.
On MRI, because of the large slice thickness, this is most practical with manual segmentation in Slicer, while on CT the automatic segmentation also works, but make sure to capture the processes belonging to the given vertebra, not those above or below it. Recommended technique on CT, roughly every 10 slices:
Pajzsmirigy (Thyroid gland)
Recognizable on coronal MRI images: annotation with Slicer is recommended, because the tissue contrast is not optimal and it is visible on only a few coronal slices.
Marked THY in the figure (the arrow is not it!)
Pancreas. It is worth locating the duodenum and the spleen: the tail of the pancreas points towards the spleen, while its head points towards the duodenum. It is recognizable on axial MRI images as well; on coronal images possibly only O-shaped cross-sections are visible:
Psoas major muscle. It is advisable to segment it with one stroke each in the axial and coronal planes, and to mark the background as well.
Gallbladder
Sternum

Aorta. On a CT scan, segmentation is best done in the sagittal plane. It must be marked down to the bifurcation.
Trachea. According to the description, the trachea extends from the larynx to the bifurcation. It is worth locating the thyroid cartilage.
Adrenal glands. On coronal CT images, a V-shaped structure above the kidney. It is separated from the kidney by the fatty tissue within and around the capsule, which can be substantial and is hypodense on CT.
Segmentation of additional organs. Cerebral hemispheres. INCLUDED: convexity, telencephalon, diencephalon. NOT INCLUDED: brainstem, cerebellum, mesencephalon, dura, cisterns; if the contrast allows, the lateral ventricles must also be excluded.
Cerebellum
NOT part of the cerebellum: the pons (e.g. on the following axial image the region of the pons must not be included, only the cerebellar hemispheres!)
Vena cava inferior: segment down to the first bifurcation (v. iliaca), and up to the heart.
Heart: as far as possible:
Do not include the great vessels:
Postvertebral muscles. Erector spinae, the muscles lying on either side of the spinous processes. Lower boundary: the iliac wing and the sacrum; upper boundary: the occiput, together with the posterior neck muscles. It is bulky in the lumbar trunk region. Separating the deltoid is difficult; it may be included as well, see the images below.
Lumbar vertebrae 2-5. It is important that the processes belong to the correct vertebra. With the multi-label option enabled, these can be viewed together. In that mode the program always computes and displays the brush strokes only of the label currently selected on the left, but the segmented volumes of the others remain visible:
Annex 2: FP7 VISCERAL LANDMARK MANUAL V1.0
FP7 VISCERAL Landmark Manual. Version: v.1.0. Jakab A / 14 March 2014
For the VISCERAL Silver Corpus (750 cases), only landmark annotation is needed; the organ annotations will be generated automatically. Landmark annotation must be done with the latest downloadable version of Slicer 4. The downloaded CT/MRI images must not be re-saved, because their header may change in the process and the landmark positions may then be wrong. If a file-loading error is encountered under Slicer 4 (4.2/3), contact Markus (Markus Holzer) or András. Landmark list
Landmark name | Fiducial name in Slicer
Anterior part of the C2 dens | C2
Midpoint of the upper endplate, C3-C7 cervical vertebral bodies | C3, C4, C5, C6, C7
Midpoint of the upper endplate, Th1-Th12 thoracic vertebral bodies | Th1 ... Th12
Midpoint of the upper endplate, L1-L5 lumbar vertebral bodies | L1 ... L5
Lateral end of the left clavicle | clavicle_left
Lateral end of the right clavicle | clavicle_right
Tip of the xiphoid process | xyphoideus
Left sternoclavicular joint | sternoclavicular_left
Right sternoclavicular joint | sternoclavicular_right
Lateral point of the tuberculum majus, left humerus | tuberculum_left
Lateral point of the tuberculum majus, right humerus | tuberculum_right
Upper point of the trochanter major, left | trochanter_major_left
Upper point of the trochanter major, right | trochanter_major_right
Medial point of the trochanter minor, left | trochanter_minor_left
Medial point of the trochanter minor, right | trochanter_minor_right
Tuber ischiadicum, lowest point, left | ischiadicum_left
Tuber ischiadicum, lowest point, right | ischiadicum_right
Symphysis pubis | symphysis
Left renal pelvis, lowest point | renalpelvis_left
Right renal pelvis, lowest point | renalpelvis_right
Midline of the tracheal bifurcation | trachea_bifurcation
First bronchial bifurcation, left lung | bronchus_left
First bronchial bifurcation, right lung | bronchus_right
Highest point of the aortic arch | aortic_arch
Midpoint of the aortic valve plane | aorticvalve
Bifurcation of the coronary artery into the left main branches | coronaria
Aortic bifurcation in the pelvis (not the truncus!) | aorta_bifurcation
Bifurcation of the VCI in the pelvis | vci_bifurcation
Spatial center of the left eyeball | eye_left
Spatial center of the right eyeball | eye_right
Most anterior point of the lateral ventricle, left | ventricle_left
Most anterior point of the lateral ventricle, right | ventricle_right
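One way to guard against accidentally re-saving a downloaded image (which would alter its header) is to record a checksum right after download and compare it before upload. A small sketch, standard library only; the file names are illustrative:

```python
import hashlib

def file_sha256(path, chunk=1 << 20):
    """Hash a possibly large image file in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Record the digest after download, then verify before annotating:
# assert file_sha256("case.nii.gz") == recorded_digest
```

If the digests differ, the image was modified somewhere along the way and the landmark positions placed on it may no longer be trustworthy.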
trochanter_major_left / _right: the posterior, upper point of the tip of the trochanter major, located from the sagittal plane (on CT) or the coronal plane (on MRI). It does not lie in the same plane as the trochanter minor. trochanter_minor_left / right: the most medial, midline point of the trochanter minor, located from the sagittal plane (on CT) or the coronal plane (on MRI). The major and the minor generally do not lie in the same plane (CT).
symphysis: the midline point of the symphysis pubis, at the center of the cartilaginous surface.
ischiadicum_left ischiadicum_right: on a CT image, located e.g. from the axial plane, the lowest point of the ramus of the os ischii where the bone is just still visible. On MRI images likewise the lowest point along the coronal projection. So not the most posterior point, but the lowest.
crista_iliaca_left / right: the uppermost tangential point of the rounded edge of the crista iliaca, located on CT. In the coronal MRI projection this is not always visible in a single plane.
xyphoideus: depends on the anatomy of the xiphoid process. Bifid, fork-shaped xiphoid: at the branching of the two prongs. Pointed xiphoid: the end of the tip.
sternoclavicular_left / right: at the middle of the axial mid-section of the articular cartilage, at the center of the plane of the disc.
clavicle_left / right: the outermost tip of the lateral end of the clavicle.
tuberculum_left / right: the outermost part of the tuberculum (lateral tubercle) of the humerus, on sagittal CT where the bone is just still visible. On CT the upper arm points upwards, on MRI it lies beside the body.
Vertebrae (L5-C2): it is very important that the vertebral annotations are accurate. Every spine is tilted, in several planes, so the vertebrae cannot be marked from a single plane. The point must always be placed at the spatial midpoint of the vertebra's upper endplate. An easy way to do this: holding down the SHIFT key while moving the mouse automatically synchronizes the other views to the cursor position, so their planes can also be centered on the vertebra with the mouse wheel. For C2 the upper endplate is not visible, so the anterior part of the dens must be marked. In 1-2 cases out of 100 an extra vertebra is visible, an anatomical variant, usually in the C or Th region; one may be left out.
trachea_bifurcation: the bifurcation of the trachea in the midline; it should also be central in the axial plane (left image). bronchus_left / right: the first bronchial bifurcations on the left and right are never in the same plane. The right bronchus first divides in the coronal plane, while the left bronchus is best located along the axial plane. For these too, the spatial midpoint of the bifurcation must be located, not its front or back.
aorta_bifurcation: this is generally visible and locatable on every image, in the pelvis at the L4-L5 level. It is not the truncus coeliacus or the renal bifurcation! The aorta divides at an acute angle into the two common iliac arteries. vci_bifurcation: same as above. The vein is thicker and its wall is generally not sclerotic on the images. For the bifurcations, mark the branching of the Y-shaped fork, but take its central point in the axial and sagittal planes.
aorticvalve: the central point of the plane of the aortic valve, directly at the base of the bulbus aortae (see figure). Even if the valve itself is not visible, the most proximal part of the aorta can be located and its central point found.
aortic_arch: the highest point of the aortic arch; on CT, in the axial plane, on the slice where the aorta is just still visible.
coronaria: the first bifurcation of the left main stem, i.e. into the left CX and LAD. This can be located on the images: it lies to the left of the aortic bulb, surrounded by a hypodense region, and is visible even on non-contrast CT. The characteristic Y shape can be found in the axial plane, starting from the LAD running along the middle of the left ventricle.
renalpelvis_left / right: the lowest point of the renal collecting system, located from the axial plane on CT.
eye_left / right: the spatial center of the eyeball.
ventricle_left / right: the most anterior point of the lateral ventricle; on CT e.g. on a coronal section where the ventricular cavity is just still visible.