Remote analysis of sputum smears for Mycobacterium tuberculosis quantification using digital crowdsourcing

ABSTRACT

Introduction

Tuberculosis (TB) is a leading cause of morbidity and mortality worldwide. Although the development of Xpert MTB/RIF has been a major breakthrough, smear microscopy remains the most widely used method for TB diagnosis, especially in low- and middle-income countries [1]. Given its low sensitivity, the World Health Organization (WHO) recommends that three sputum specimens be examined for each presumptive TB case. Furthermore, in clinical practice, 100 high-power fields need to be examined in order to classify a smear as negative. Acid-fast bacilli (AFB) smear reading requires a skilled microscopist and, given the associated laboratory workload, a microscopist can examine only 20–25 smears per day on average [2]. In addition, smear reading is subject to human error and prone to considerable interobserver variability [3]. Novel approaches, such as automated image analysis through convolutional neural networks, have recently shown promising results in microscopy tasks such as the diagnosis of malaria in thick blood smears and of tuberculosis in sputum samples, and the detection of intestinal parasite eggs in stool samples [4].

Detecting acid-fast bacilli in sputum smear samples is a challenge that has been addressed before. In 2008, M.G. Costa et al. [5] published a method based on a global adaptive threshold applied to the red and green color channels of conventional microscopy images, obtaining a sensitivity of 76.7%. In 2018, Kant et al. [6] developed a system based on convolutional neural networks that achieved a recall of 83.8% and a precision of 67.6%. The same year, R.O. Panicker et al. [7] proposed a method that detects tuberculosis bacilli by image binarization and subsequent classification of the detected regions using a convolutional neural network, obtaining a precision of 78.4%, a recall of 97.1% and an F1 score of 86.8%. To the best of our knowledge, the present study describes the first crowdsourced approach to detecting acid-fast bacilli in sputum smear samples.

Crowdsourcing methodologies that leverage the contributions of citizen scientists connected via the Internet have proven useful for solving biomedical challenges involving “big data” analysis that cannot be entirely automated [8]. The “gamification” of crowdsourced tasks taps an otherwise unused resource for scientific research, such as biomedical image analysis [9, 10]. In this context, we aimed to evaluate the feasibility of a crowdsourced approach to sputum smear microscopy analysis for the diagnosis of tuberculosis.

Materials and methods


The gaming platform
TuberSpot (www.tuberspot.org) is an online game for mobile devices and PCs launched on 24 March 2015. TuberSpot players score points by correctly identifying M. tuberculosis bacilli in digitized fields of view (FOVs) of sputum slides stained with Ziehl-Neelsen (Fig 1). Players see several FOV images during each game. A backend server randomly distributes the FOVs to the players in real time.

Once the game starts, the player sees a FOV on the screen and, within a limited time, has to click on the places where bacilli are believed to be present. Once all bacilli are found, the player passes to the next level. We digitally introduced one synthetic (fake) bacillus, indistinguishable from a real one, into each of the negative FOVs. This ensures that enough time is spent on the FOV even if it originally contained no bacilli, and makes it possible to include negative FOVs in the game. At the beginning of the game, a short tutorial shows what a bacillus looks like.

Dataset
The game database consists of 60 digitized FOVs from anonymous samples: 20 images of fields without any bacilli, 20 images with 1–10 bacilli and 20 images with 10–40 bacilli. Digitized smears were provided by the Centro de Investigação em Saúde de Manhiça (Mozambique) and Hospital Clínico San Carlos (Spain). The 60 images cover all types of sputum smear examination reports (negative, scanty, +1, +2, +3). The samples were digitized with a smartphone (Sony Xperia Z2) attached to the microscope eyepiece by an adapter (Celestron Universal Digiscoping Adapter). A gold standard for each FOV was determined by three different expert microscopists, who reported the position and number of bacilli.

Crowdsourcing scheme
Collective detection is defined as the number of bacilli found in a single FOV by combining the gameplays of different players over that FOV. To exploit the redundant information produced by multiple independent players over the same FOV, we implemented an algorithm that considers a bacillus to be present in a certain area of the FOV if enough individual players within a larger group have clicked (“voted”) in that area [10]. Because players do not click on exactly the same pixel of the image, we applied a clustering strategy: each point was clustered with its closest neighboring point if the distance between the two points was shorter than the typical size of a bacillus.
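The clustering step can be illustrated with a minimal Python sketch. It is not the implementation used in the game: the function name `cluster_clicks`, the `(player_id, x, y)` click format, the use of the cluster centroid as its position, and the counting of distinct players as votes are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cluster_clicks(clicks, bacillus_size):
    """Group clicks by linking each one to its nearest neighbor.

    clicks: list of (player_id, x, y) tuples (hypothetical format).
    bacillus_size: typical bacillus size, in the same units as x and y.
    Returns a list of (centroid_x, centroid_y, votes) tuples, where votes
    is the number of distinct players in the cluster (an assumption).
    """
    parent = list(range(len(clicks)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (_, xi, yi) in enumerate(clicks):
        nearest, nearest_dist = None, math.inf
        for j, (_, xj, yj) in enumerate(clicks):
            if i == j:
                continue
            dist = math.hypot(xi - xj, yi - yj)
            if dist < nearest_dist:
                nearest, nearest_dist = j, dist
        # Link to the closest neighbor only if it is closer than one bacillus.
        if nearest is not None and nearest_dist < bacillus_size:
            parent[find(i)] = find(nearest)

    groups = defaultdict(list)
    for i, click in enumerate(clicks):
        groups[find(i)].append(click)

    clusters = []
    for members in groups.values():
        cx = sum(x for _, x, _ in members) / len(members)
        cy = sum(y for _, _, y in members) / len(members)
        votes = len({player for player, _, _ in members})
        clusters.append((cx, cy, votes))
    return clusters
```

Because links are transitive, a chain of nearby clicks can merge into a single cluster that is wider than one bacillus. Whether the original algorithm behaves this way is not stated in the text.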

To classify a point in the FOV as a bacillus, a given number of players must agree; this number is termed the quorum (Q). Group sizes (GS) from 1 to 30 gameplays and quorums from 1 to the maximum number of gameplays were tested to maximize performance on the whole test dataset.
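As an illustration of the quorum rule and of the parameter sweep, the sketch below filters clusters by their vote count and enumerates the (GS, Q) pairs to be tested. The function names, the `(x, y, votes)` cluster format, and the reading of "maximum number of gameplays" as the group size itself (Q ≤ GS) are assumptions, not details taken from the original implementation.

```python
def collective_detections(clusters, quorum):
    """Keep the clusters whose vote count reaches the quorum Q.

    clusters: list of (x, y, votes) tuples (hypothetical format).
    Returns the (x, y) positions accepted as bacilli.
    """
    return [(x, y) for x, y, votes in clusters if votes >= quorum]


def parameter_grid(max_group_size=30):
    """Enumerate (GS, Q) pairs, assuming Q ranges from 1 to GS."""
    return [
        (group_size, quorum)
        for group_size in range(1, max_group_size + 1)
        for quorum in range(1, group_size + 1)
    ]


# Example: with Q = 2, only the cluster supported by three players survives.
example_clusters = [(11.0, 10.0, 3), (100.0, 100.0, 1)]
accepted = collective_detections(example_clusters, quorum=2)
```

Under these assumptions the sweep covers 465 (GS, Q) combinations, each of which would be scored against the gold standard.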

The performance of the collective detection algorithm was evaluated for each quorum (Q) and each group size (GS) against the gold standard by measuring the positive predictive value (precision) (1), the sensitivity (recall or true positive rate, TPR) (2), the F1 score (3) and the specificity (true negative rate, TNR) (4). A collective detection obtained with a given quorum and group size was considered a true positive (tp) if the distance from the positive cluster to a gold standard detection was shorter than the typical size of a bacillus. Accordingly, collective detections lying farther than the typical bacillus size from any reference bacillus were considered false positives (fp). All bacilli that were not collectively detected were considered false negatives (fn). To calculate the true negatives (tn), we measured the number of equivalent bacilli in the area of the FOV where no bacilli were identified either by the experts or by the collective detections. To this end, the area of the FOV free of bacilli and collective detections was divided by the area of a bacillus and its immediate surroundings (bacillus area).
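The evaluation described above can be sketched in Python as follows. Several details are assumptions rather than facts from the study: the greedy one-to-one matching of detections to gold standard bacilli, the approximation of the occupied area as one bacillus area per tp, fp and fn, the convention of returning 0 when a denominator is zero, and all function names. The metrics use the standard definitions: precision = tp/(tp+fp), recall = tp/(tp+fn), F1 = 2·precision·recall/(precision+recall), specificity = tn/(tn+fp).

```python
import math


def match_detections(detections, gold, bacillus_size):
    """Count tp, fp and fn with greedy one-to-one matching (an assumption).

    detections, gold: lists of (x, y) positions.
    """
    unmatched_gold = list(gold)
    tp = fp = 0
    for dx, dy in detections:
        nearest, nearest_dist = None, math.inf
        for g in unmatched_gold:
            dist = math.hypot(dx - g[0], dy - g[1])
            if dist < nearest_dist:
                nearest, nearest_dist = g, dist
        if nearest is not None and nearest_dist < bacillus_size:
            unmatched_gold.remove(nearest)  # each gold bacillus matches once
            tp += 1
        else:
            fp += 1
    fn = len(unmatched_gold)
    return tp, fp, fn


def estimate_true_negatives(fov_area, bacillus_area, tp, fp, fn):
    """Number of 'equivalent bacilli' fitting in the free area of the FOV.

    Assumes each tp, fp and fn occupies exactly one bacillus area.
    """
    free_area = max(fov_area - (tp + fp + fn) * bacillus_area, 0.0)
    return free_area / bacillus_area


def scores(tp, fp, fn, tn):
    """Return (precision, recall, F1, specificity); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, f1, specificity
```

Because tn is derived from an area ratio, it is generally much larger than tp, fp and fn, so specificity computed this way tends to be close to 1.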

Additionally, Cohen’s kappa was computed to assess the agreement between the collective assessment and the reference gold standard.
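For reference, Cohen's kappa for two raters and a binary outcome can be computed from a 2×2 table of agreement counts, as in the sketch below. The text does not state over which units the kappa was computed, so the use of tp, fp, fn and tn counts as the table entries is an assumption, as is the function name.

```python
import math


def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa from a 2x2 agreement table.

    tp: both raters positive; tn: both negative;
    fp, fn: the two kinds of disagreement.
    Returns NaN when kappa is undefined (no items, or expected agreement = 1).
    """
    n = tp + fp + fn + tn
    if n == 0:
        return math.nan
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each rater.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return math.nan
    return (observed - expected) / (1.0 - expected)
```

For example, counts of tp = 20, fp = 5, fn = 10 and tn = 15 give an observed agreement of 0.7 and a chance agreement of 0.5, hence a kappa of 0.4.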