A multi-parameter diagnostic clinical decision tree for the rapid diagnosis of tuberculosis in HIV-positive patients presenting to an emergency centre

Background: Early diagnosis is essential to reduce the morbidity and mortality of HIV-associated tuberculosis. We developed a multi-parameter clinical decision tree to facilitate rapid diagnosis of tuberculosis using point-of-care diagnostic tests in HIV-positive patients presenting to an emergency centre.
Methods: A cross-sectional study was performed in a district hospital emergency centre in a high-HIV-prevalence community in South Africa. Consecutive HIV-positive adults with ≥1 WHO tuberculosis symptom were enrolled over a 16-month period. Point-of-care ultrasound (PoCUS) and urine lateral flow lipoarabinomannan (LF-LAM) assay were done according to standardized protocols. Participants also received a chest X-ray. The reference standard was the detection of Mycobacterium tuberculosis using Xpert MTB/RIF or culture. Logistic regression models were used to investigate the independent association between prevalent microbiologically confirmed tuberculosis and clinical and biological variables of interest. A decision tree model to predict tuberculosis was developed using the classification and regression tree algorithm.
Results: There were 414 participants enrolled: 171 male, median age 36 years, median CD4 cell count 86 cells/mm³. Tuberculosis prevalence was 42% (n=172). Significant variables used to build the classification tree included ≥2 WHO symptoms, antiretroviral therapy use, LF-LAM, PoCUS independent features (pericardial effusion, ascites, intra-abdominal lymphadenopathy) and chest X-ray. LF-LAM was positioned after WHO symptoms (75% true positive rate, representing 17% of study population). Chest X-ray should be performed next if LF-LAM is negative. The presence of ≤1 PoCUS independent feature in those with 'possible or unlikely tuberculosis' on chest X-ray represented 47% of non-tuberculosis participants (true negative rate 83%). In a prediction tree which only included true point-of-care tests, a negative LF-LAM and the presence of ≤2 independent PoCUS features had a 71% true negative rate (representing 53% of sample).
Conclusions: LF-LAM should be performed in all adults with suspected HIV-associated tuberculosis (regardless of CD4 cell count) presenting to the emergency centre.


Introduction
Tuberculosis remains an important cause of morbidity and mortality globally, despite ongoing control efforts 1 . The early diagnosis and successful treatment of people with tuberculosis should reduce the risk of mortality and morbidity, and decrease the transmission of tuberculosis 2 . Factors associated with delays in the diagnosis of tuberculosis include the limitations of tuberculosis diagnostic tests, limited availability of these tests in high burden settings, and the reduced diagnostic performance of tuberculosis tests in people living with HIV (PLWH) [3][4][5] . In PLWH with advanced immunosuppression, the diagnosis of active tuberculosis is challenging due to more atypical clinical presentations; other opportunistic infections with similar presentations; high proportion with inability to produce sputum or negative sputum smears; and high rates of extra-pulmonary and disseminated tuberculosis [6][7][8][9][10][11] . Autopsy studies in HIV-positive adults report a very high proportion with tuberculosis (32% to 47%), almost half (46%) of which was undiagnosed pre-mortem 12 .
The WHO recommends that HIV-positive patients should be systematically screened for active tuberculosis when visiting a healthcare facility 2 . Many patients access the healthcare system through hospital emergency centres. The prevalence of HIV-related admissions to emergency centres varies, with up to 43% documented in Uganda 13 . These patients are often severely ill and would benefit from prompt diagnosis and treatment of tuberculosis to decrease mortality 14 .
The use of point-of-care diagnostic tests would facilitate rapid diagnosis of tuberculosis. Lateral flow lipoarabinomannan (LF-LAM) is currently the only true point-of-care test, with other tests (e.g. smear microscopy, Xpert MTB/RIF, Xpert MTB/RIF Ultra, GeneXpert OMNI, and portable digital chest X-ray) being near point-of-care tests 15 . Point-of-care ultrasound (PoCUS) is also a potentially useful test for extra-pulmonary or disseminated tuberculosis 16 . No evidence-based algorithm incorporating clinical information, individual PoCUS features, and urine LF-LAM for diagnosing tuberculosis in HIV-positive patients currently exists. We performed a cross-sectional diagnostic study and developed a multi-parameter clinical decision tree to facilitate rapid diagnosis of tuberculosis in HIV-positive patients presenting to an emergency centre.

Study setting and participants
Khayelitsha is a township with a mix of formal and informal housing in Cape Town, South Africa. The Khayelitsha Health sub-district has an antenatal HIV prevalence of 34% 17 , and an annual tuberculosis notification rate of 917 per 100,000 persons 18 . The emergency centre of Khayelitsha Hospital (a district-level hospital) manages approximately 35,000 patients per annum, with an admission rate of around 30%. The HIV prevalence of patients managed in the resuscitation unit is 23% 19 .
Inclusion criteria were: adult (≥18 years); HIV-positive (HIV status was determined by laboratory confirmation or from the clinical records); and presence of at least one symptom of the WHO's recommended four-symptom screening rule for tuberculosis in PLWH (cough of any duration, fever, drenching night sweats, or weight loss) 20 . Exclusion criteria were: presenting to the emergency centre more than 24 hours before screening; having received anti-tuberculosis treatment within 3 months of screening; pregnancy; a main clinical presentation of meningitis syndrome or new focal neurology; and a trauma, gynaecological or psychiatric presentation. Data from this cohort relating to LF-LAM and PoCUS were previously published 21,22 . These manuscripts described the use of LF-LAM in an acute care setting and identified PoCUS features independently associated with HIV-associated tuberculosis 21,22 .
All participants provided written informed consent using a two-phase consent process. Severely ill participants were provided with a short one-page consent form indicating what extra tests would be done and that these would be used to

Amendments from Version 1
We have explained our choice of the LF-LAM test used in the methods section: 'The Alere Determine™ TB LAM Ag test was used since it was the only commercially available test at the time.'
We have clarified the difference between 'Individual PoCUS features' and 'Independent PoCUS features' in the statistical analysis section: 'Individual PoCUS features were determined by univariable analysis using a 10% significance level 22 . In PoCUS features where different thresholds for positivity exist (e.g., size of intra-abdominal lymph nodes), the lowest threshold was included. Individual PoCUS features included any sized pericardial effusion, pleural effusion, ascites, any focal splenic lesion, and any sized intra-abdominal lymphadenopathy 22 . Independent PoCUS features were determined by multivariable logistic regression 22 . The PoCUS features independently associated with tuberculosis were pericardial effusion of any size, ascites, and intra-abdominal lymphadenopathy of any size 22 .'
We have added a footnote to Table 3 linking the 63 patients with a clinical diagnosis of tuberculosis to Table 4, which provides reasons for the diagnosis of tuberculosis without microbiological confirmation.
We have added the estimated sensitivity and specificity of the FujiLAM test in the discussion: 'Another urine-based LAM assay, Fujifilm SILVAMP TB LAM (FujiLAM; Fujifilm, Tokyo, Japan), has higher sensitivity (70.4% versus 28.1%) but somewhat lower specificity (90.8% versus 95.0%) than the LF-LAM assay we used 33 .'
We included in the discussion that the number needed to scan is likely to increase when used in areas with a lower tuberculosis prevalence (and vice versa).
We have added to the limitations that: 'The individual and independent PoCUS features were based on a single study and need further evaluation.'
We have added to the conclusion that the role of PoCUS as a rule-in test to diagnose HIV-associated tuberculosis in the emergency centre needs to be further investigated.

Procedures and samples
Consecutive patients evaluated at the emergency centre were screened for eligibility from June 2016 through October 2017. A standardized data collection form was used to record demographic and clinical information. Urine, sputum and blood samples were obtained from all patients whenever possible (see Extended data) 23 . Fresh urine samples were tested using the Xpert MTB/RIF assay (GX4) (Cepheid Inc., Sunnyvale, CA, USA) and for the presence of LAM (Alere Determine™ TB LAM Ag test, Alere Inc., Waltham, MA, USA); LF-LAM was performed in the emergency centre 21 . The Alere Determine™ TB LAM Ag test was used since it was the only commercially available test at the time. Sputum specimens were tested using the Xpert MTB/RIF assay (GX4) and cultured in mycobacterial growth indicator tubes (MGIT; Becton Dickinson, Sparks, MD, USA). Mycobacterial blood cultures were performed using the BACTEC MYCO/F Lytic blood culture bottle (Becton Dickinson, Sparks, MD, USA). The MTBDRplus assay (Hain Lifescience, Nehren, Germany) was used to identify culture isolates as M. tuberculosis complex. Complete blood count and CD4 cell count were done as part of routine clinical care. CD4 cell count results were accepted if performed within 3 months of enrolment. The National Health Laboratory Service performed all the tests.
Ultrasound examination was performed in the emergency centre and the findings were documented on a standardized assessment form. A single emergency physician (with adequate training and credentials as specified by the International Federation of Emergency Medicine's Emergency Ultrasound Special Interest Group 24 ) performed the ultrasound examination using either a Mindray M5™ ultrasound system with a 3C5s (2.5-6.5 MHz) convex probe and a 7L4s (5.0-10 MHz) linear probe (Mindray DS USA, Inc., Mahwah, NJ, USA) or a NanoMaxx™ ultrasound system with a L38n (10-5 MHz) linear array probe and a C60n (5-2 MHz) curved array probe (SonoSite Inc., Bothell, WA, USA). Ultrasound examinations were performed before any specimens were collected. At the time of the ultrasound, the point-of-care sonographer had access to the clinical information but not to results from the reference standard (detection of M. tuberculosis from Xpert MTB/RIF and/or culture on any specimen obtained from any anatomical site). Chest x-rays were reviewed by a single radiologist using a standardized assessment form (see Extended data) 23 . Chest x-rays were classified as unlikely tuberculosis, probable tuberculosis, and likely tuberculosis. The radiologist had no access to clinical information or the reference standard.

Statistical analyses
The sample size was determined with the aim of including more than the recommended 10 candidate predictors (including interaction terms) from multivariable logistic regression analyses 25 . The tuberculosis prevalence in HIV-positive patients in the emergency centre is around 25% 19 , and a sample size of 400 HIV-positive participants was deemed adequate to include 100 tuberculosis cases. Data were analysed with the use of SAS/STAT® software (
Individual PoCUS features were determined by univariable analysis using a 10% significance level 22 . In PoCUS features where different thresholds for positivity exist (e.g., size of intra-abdominal lymph nodes), the lowest threshold was included. Individual PoCUS features included any sized pericardial effusion, pleural effusion, ascites, any focal splenic lesion, and any sized intra-abdominal lymphadenopathy 22 . Independent PoCUS features were determined by multivariable logistic regression 22 . The PoCUS features independently associated with tuberculosis were pericardial effusion of any size, ascites, and intra-abdominal lymphadenopathy of any size 22 .
For correlated variables, when more than one index was significant in a univariable model, the one with the more significant effect on the −2 log L statistic was entered into the multivariable model first. However, in the final model, the effect of substituting variables was also assessed. When more than one correlated variable was significant in multivariable models, the final model selected was the one associated with the smallest Akaike's information criterion (AIC), a statistic derived from the −2 log L statistic. Multivariable model building was based on the combination of variables that were significant in univariable models (based on a threshold of p<0.10). A model comprising WHO screening symptoms and history of current antiretroviral therapy use was used as the starting model 20 . The ability of logistic regression models to discriminate between participants who had and those who did not have microbiologically confirmed tuberculosis was assessed using the area under the receiver operating characteristic curve (AUC) and the relative integrated discrimination improvement (RIDI), which measures the percentage increase in discrimination when an extra variable is added to a prediction model 27,28 . AUC comparisons used nonparametric methods 29 . Bootstrap techniques were used to derive the 95% confidence interval (CI) for the RIDI estimates, which were based on 1000 replications.
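As a minimal sketch of the two model-comparison statistics named above, the Python below computes Akaike's information criterion from a fitted model's log-likelihood (AIC = −2 log L + 2k, with k the number of estimated parameters) and a nonparametric AUC as the proportion of case/non-case pairs in which the case receives the higher predicted score. The function names and the toy scores are hypothetical illustrations; this is not the study's SAS code or data.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: -2 log L + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def auc(scores, labels):
    """Nonparametric AUC: probability that a randomly chosen case scores
    higher than a randomly chosen non-case (ties count as one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and outcomes (1 = confirmed tuberculosis).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)   # 5 of 6 case/non-case pairs are concordant
toy_aic = aic(-250.0, 4)                # hypothetical log-likelihood and parameter count
```

When two candidate models are fitted to the same participants, the one with the smaller AIC would be preferred under the rule described above.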
We developed a decision tree model to predict microbiologically confirmed tuberculosis, including variables from the best performing multivariable logistic regression model, using the classification and regression tree (CART) algorithm and the rpart package (version 4.1-11) of the R statistical software. The CART algorithm builds a tree model through recursive partitioning, a process in which the data are successively split into increasingly homogeneous subgroups. At each stage (also known as a node), the algorithm selects a predictor and a cut-point associated with the best ability of the predictor to discriminate participants with tuberculosis from those without. Cut-point selection was less of an issue in the current analyses, which had no continuous predictors. However, for class variables with more than two levels, the algorithm could collapse levels in order to achieve the best discrimination. CART starts with one predictor, then adds other predictors (and nodes) until it reaches homogeneous groups, subgroups with few participants (<5), or exhaustion of the predictors which can contribute further to subgroup refinement. Due to the small size of the achieved tree, no pre- or post-pruning was applied. CART uses a generalization of the binomial variance (Gini index) for its impurity function, and employs 10-fold cross-validation to estimate error rates. The algorithm code is available as Extended data 30 .
Demographic and clinical characteristics of participants with and without confirmed tuberculosis are presented in Table 2. The median CD4 cell count was 86 cells/mm³ (25th-75th percentile, 30-218). The alternative diagnoses and the reasons for a clinical tuberculosis diagnosis in participants without microbiologically confirmed tuberculosis are presented in Table 3 and Table 4. The all-cause in-hospital mortality was 7.2% (n=30); 15 of these participants had confirmed tuberculosis (an in-hospital mortality of 8.7% among those with confirmed tuberculosis). These individual-level data are available at Zenodo 31 .
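The splitting step described above can be sketched in a few lines of Python. This is an illustrative, simplified version for binary predictors only, not the rpart implementation used in the study; the feature names and the eight toy records are hypothetical. For a binary outcome with proportion p of cases, the Gini impurity is 2p(1 − p), and the predictor chosen at a node is the one giving the largest decrease in weighted impurity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes: 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def split_impurity(feature, labels):
    """Weighted Gini impurity of the two child nodes of a binary (0/1) split."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def best_split(features, labels):
    """Return the feature with the largest impurity decrease, plus all gains."""
    parent = gini(labels)
    gains = {name: parent - split_impurity(col, labels)
             for name, col in features.items()}
    return max(gains, key=gains.get), gains


# Hypothetical toy records (1 = positive test / confirmed tuberculosis).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_features = {
    "lf_lam_positive": [1, 1, 1, 0, 0, 0, 0, 0],
    "cxr_likely_tb":   [1, 0, 1, 0, 1, 0, 0, 1],
}
chosen, gains = best_split(toy_features, toy_labels)
```

In this toy example the first predictor yields one pure child node and is therefore selected; a full CART would then repeat the same search within each child node.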

Multivariable model
Measures of model performance are summarized in Table 6. The initial model (WHO screening symptoms ≥2, antiretroviral therapy use) had poor discriminatory power in predicting confirmed tuberculosis, with an AUC of 0.615. Adding either PoCUS independent features or PoCUS individual features to the initial model improved both goodness of fit and discriminatory power; however, the model with PoCUS independent features had a greater AUC and a smaller AIC. The further addition of urinary LF-LAM and chest x-ray improved the model. Adding CD4 cell count did not improve the performance of the model (Table 6).
Based on RIDI% estimates, adding urinary LF-LAM, PoCUS independent features, and chest x-ray to the initial and subsequent models conferred similar levels of improvement for tuberculosis prediction (Table 7). The change in RIDI% was negligible when CD4 cell count was added to the model comprising WHO symptoms screen, antiretroviral therapy use, PoCUS independent features, urinary LF-LAM and chest X-ray (RIDI% 2.6 (95% CI 2.4-2.7)).
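For readers unfamiliar with the RIDI, the sketch below shows one common way to compute it, assuming the usual definition: the discrimination slope of a model is the mean predicted probability in cases minus that in non-cases, and the relative IDI is the increase in slope expressed as a percentage of the baseline model's slope. The percentile bootstrap here resamples predicted probabilities only, which is a simplification and not necessarily the procedure used in the study; all names and the toy predictions are hypothetical.

```python
import random


def discrimination_slope(probs, labels):
    """Mean predicted probability in cases minus mean in non-cases."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def ridi_percent(old_probs, new_probs, labels):
    """Relative IDI (%): change in discrimination slope over the baseline slope."""
    old = discrimination_slope(old_probs, labels)
    new = discrimination_slope(new_probs, labels)
    return 100.0 * (new - old) / old


def bootstrap_ci(old_probs, new_probs, labels, n_boot=1000, seed=0):
    """Percentile 95% CI for the RIDI%, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if sum(ys) in (0, n):
            continue  # resample has no cases or no non-cases
        old = [old_probs[i] for i in idx]
        new = [new_probs[i] for i in idx]
        if discrimination_slope(old, ys) <= 0:
            continue  # simplification: skip resamples without a positive baseline slope
        estimates.append(ridi_percent(old, new, ys))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Hypothetical predictions from a baseline and an extended model.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_old = [0.6, 0.5, 0.5, 0.4, 0.5, 0.4, 0.4, 0.3]
toy_new = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2]
toy_ridi = ridi_percent(toy_old, toy_new, toy_labels)
toy_ci = bootstrap_ci(toy_old, toy_new, toy_labels, n_boot=200, seed=1)
```

In the toy data the baseline slope is 0.10 and the extended model's slope is 0.35, so the relative improvement is about 250%.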

Prediction tree
Significant variables (Model F in Table 7) were included in the splitting process to build the classification tree for microbiologically confirmed tuberculosis. The CART created for confirmed tuberculosis is shown in Figure 2, and the CART as applied to a theoretical cohort of 1000 patients is presented in Figure 3. The CART analysis suggests that, once a patient has been screened via WHO symptoms as eligible for further diagnostic investigations, the number of WHO symptoms present does not add further to the discrimination of people with tuberculosis from those without. Furthermore, CART positions urinary LF-LAM as the next screening test after WHO symptoms, with 75% of people with a positive urinary LF-LAM test (17% of all those with positive WHO symptoms) having a definitive diagnosis of microbiologically confirmed tuberculosis (Figure 2 and Figure 3). For those with negative urinary LF-LAM, CART positions chest x-ray as the next screening test. Chest x-ray appears twice, but with complementary and not overlapping contributions. The first appearance of chest x-ray (in those with negative urinary LF-LAM) serves to separate participants with 'likely tuberculosis' on chest x-ray from those with 'possible or unlikely tuberculosis' on chest x-ray. The presence of one or no PoCUS independent features in those with 'possible or unlikely tuberculosis' on chest x-ray (47% of the starting sample) isolates 83% of this subgroup (representing 39% of the starting sample) where tuberculosis was not microbiologically confirmed (Figure 2 and Figure 3). The second appearance of chest x-ray occurs in participants with ≥2 PoCUS independent features and serves to separate those with 'possible tuberculosis' on chest x-ray from those with 'unlikely tuberculosis' on chest x-ray. The validation for the decision tree is presented in Figure 4.
We created a second decision tree to make it more clinically applicable by removing antiretroviral therapy (ART) history, because ART interruption is often not disclosed and ART status may be unavailable in confused patients (Figure 5 and Figure 6). The branch on the original tree relating to antiretroviral therapy no longer expands, narrowing down what to decide for the 24% of the sample with negative urinary LF-LAM and 'likely tuberculosis' on chest x-ray. Just over half (56%) of these participants will have confirmed tuberculosis.
We created a third prediction tree by excluding chest x-ray, which is not a true point-of-care test (Figure 7 and Figure 8). CART positions PoCUS as the next screening test for those with a negative urinary LF-LAM. The presence of two or fewer independent PoCUS features (75% of the starting sample) had a true negative rate of 71% (representing 53% of the starting sample) in the subgroup where tuberculosis was not microbiologically confirmed.
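The percentages quoted for the terminal nodes can be checked with simple arithmetic: the share of the whole sample that a result represents is the share of the sample reaching the node multiplied by the rate within that node. The short Python sketch below uses only figures reported in this article (the 218/750 count is the one quoted in the Discussion for the theoretical cohort of 1000 patients); the function and variable names are illustrative.

```python
def share_of_sample(node_share, rate_in_node):
    """Fraction of the whole sample = share reaching the node x rate within it."""
    return node_share * rate_in_node


# Tree including chest x-ray: 47% of the sample reach the node with
# <=1 PoCUS independent feature, and 83% of them have no confirmed tuberculosis.
full_tree = share_of_sample(0.47, 0.83)

# Point-of-care-only tree: 75% of the sample have <=2 PoCUS independent
# features after a negative LF-LAM, and 71% of them have no confirmed tuberculosis.
poc_tree = share_of_sample(0.75, 0.71)

# Same node expressed as counts in the theoretical cohort of 1000 patients.
node_n, node_tb = 750, 218
true_negative_rate = (node_n - node_tb) / node_n
false_negative_rate = node_tb / node_n

print(f"{full_tree:.0%} and {poc_tree:.0%} of the starting sample")
print(f"true negative rate {true_negative_rate:.0%}, "
      f"false negative rate {false_negative_rate:.0%}")
```

The rates here follow the article's usage, in which the true and false negative rates are proportions of the participants within a test-negative node.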

Discussion
We developed a prediction tree to diagnose HIV-associated tuberculosis in an emergency centre in a high burden setting. The variables selected on multivariable analysis for inclusion in the final model were the presence of ≥2 WHO screening symptoms, current antiretroviral therapy use, urinary LF-LAM, independent PoCUS features, and chest x-ray. The CART analysis positioned urinary LF-LAM as the first test to perform in participants with positive WHO screening symptoms, followed by chest x-ray. We also developed a simplified prediction tree by excluding chest x-ray, which is not a true point-of-care test: CART positioned PoCUS as the next screening test for those with a negative urinary LF-LAM.
Urinary LF-LAM was the predictor with the best ability to create pure groups (either with or without tuberculosis), classifying almost 25% of the study sample (75% of which were true positives) regardless of CD4 cell count. The false positive rate of 25% is lower than that in a recent Cochrane review, in which 33% of participants with tuberculosis symptoms had a false positive urinary LF-LAM result for microbiologically confirmed tuberculosis 32 . However, inappropriate exclusions (e.g. participants unable to produce sputum), different enrolment criteria and different CD4 cell counts could potentially explain the high false positive rate seen in the Cochrane review 32 . Another urine-based LAM assay, Fujifilm SILVAMP TB LAM (FujiLAM; Fujifilm, Tokyo, Japan), has higher sensitivity (70.4% versus 28.1%) but somewhat lower specificity (90.8% versus 95.0%) than the LF-LAM assay we used 33 .
The performance of PoCUS when chest x-ray is available is limited (Figure 2 and Figure 3). One of every 11 PoCUS examinations will be 'positive' (i.e. two or more PoCUS independent features), but an evaluation of the chest x-ray would then still be needed to refine the classification of patients with and without tuberculosis. A 'negative' PoCUS examination (i.e. the presence of ≤1 PoCUS independent feature) will only rule out 39% of all patients with a clinical suspicion of tuberculosis. This supports other studies and the current WHO guidelines that ultrasound is an additional diagnostic tool and should not replace chest x-ray as the initial imaging step to diagnose tuberculosis in HIV-positive patients 20,37 . However, chest x-ray is not a true point-of-care test, unlike PoCUS. In acute care settings where chest x-ray is not readily available, PoCUS has a 100% true positive rate when all 3 of the independent features are detected, indicating its potential value as a rule-in test; however, 39 PoCUS examinations will need to be performed to confidently diagnose one additional patient among those who had a negative LAM. This number needed to scan is likely to increase when used in areas with a lower tuberculosis prevalence (and vice versa). The presence of ≤2 PoCUS independent features will rule out 53% of patients with a clinical suspicion of tuberculosis in situations where chest x-ray is not available; however, the high false negative rate (29%, 218/750) indicates that PoCUS cannot be used as a rule-out test and these patients will need to undergo further testing.

Table 4. Reasons for the diagnosis of tuberculosis without microbiological confirmation.
Diagnostic test                                                            n
Suggestive formal abdominal ultrasound done in radiology department       19
Suggestive chest X-ray                                                      9
Positive urine lateral flow lipoarabinomannan (LF-LAM)                      7
Suggestive formal abdominal ultrasound and suggestive chest X-ray           6
Not improving on empiric antibiotics                                        4
Raised adenosine deaminase (ADA) in effusion fluid (pleural or ascitic)     4
Cerebrospinal fluid suggestive of tuberculous meningitis (TBM)              4
Suggestive chest X-ray and positive urine LF-LAM                            3
Suggestive formal abdominal ultrasound and positive urine LF-LAM            2
Psoas abscess on formal ultrasound                                          2
Caseous necrosis on biopsy (histology)                                      1
Suggestive computer tomography (CT) scan of abdomen                         1
Suggestive chest X-ray and raised ADA in effusion fluid                     1
Total                                                                      63

The use of urinary LF-LAM should be prioritised in all HIV-positive patients (regardless of CD4 cell count and clinical condition) who present to the emergency centre with WHO tuberculosis symptoms. Although a result can be obtained after 25 minutes, obtaining a urine sample could add considerably to this time. The history of current use of ART should be obtained if the patient's condition allows, as it further refines the diagnostic ability of the algorithm by increasing both the true positive and the true negative rate. Chest x-ray should still be performed if available. In these settings, the value of PoCUS becomes doubtful due to the low positive yield (5%) and the need for further interpretation of a chest x-ray to better classify cases and non-cases. Although 47% of patients will have negative results for urinary LF-LAM, chest x-ray and PoCUS, the true negative rate is only 83%, too low to confidently rule tuberculosis out. In emergency centres without chest x-ray availability (e.g. limited resources, restricted radiology consulting times), physicians can confidently diagnose tuberculosis in patients where all three independent PoCUS features are present (true positive rate 100%). However, only 2% of the PoCUS examinations are expected to be positive, and one can argue whether the time spent performing the PoCUS is worthwhile. The 71% true negative rate again indicates the need for further diagnostic testing.
Our study has some limitations. Our findings may not be generalizable as the study was conducted in a single emergency centre in a high TB/HIV-prevalence setting; a single, experienced operator performed all the PoCUS examinations; and the chest x-rays were interpreted by a single experienced radiologist. The individual and independent PoCUS features were based on a single study and need further evaluation.
The main strength of our study is the robust microbiologic reference standard composed of TB culture and Xpert MTB/RIF performed on multiple samples from different anatomic sites. However, it is still possible that some TB cases were missed by the reference standard. The study was also performed under routine conditions experienced in the emergency centre. Lastly, robust analytic strategies were used to develop and validate the diagnostic decision tree.

Conclusion
We developed a near-patient and point-of-care decision tree for the diagnosis of HIV-associated tuberculosis in acute care settings. Implementing this decision tree following screening via WHO symptoms can allow immediate initiation of TB treatment within the emergency centre in about a quarter of suspected patients among whom 75% would have microbiologically confirmed tuberculosis, or withhold such treatment in nearly half of suspected patients, among whom less than 18% will have microbiologically confirmed tuberculosis. Urinary LF-LAM had a 75% true positive rate, representing 17% of participants with positive WHO screening symptoms regardless of CD4 cell count and its use should be prioritised. The contribution of PoCUS in the context of urinary LF-LAM and chest X-ray availability was limited, due to the low positive yield, the need for further chest x-ray interpretation and the high false negative rate. In acute care settings without chest x-ray availability, PoCUS has a 100% true positive rate, but will only affect 2% of eligible patients. The role of PoCUS as a rule-in test to diagnose HIV-associated tuberculosis in the emergency centre needs to be further investigated.

Peter MacPherson
Liverpool School of Tropical Medicine, Liverpool, UK
Thanks for asking me to review this manuscript. Very nice study, and clearly reported. I have only a few comments the authors should address.
Major comments
1. Prevalence of microbiologically-confirmed TB is very high in this population, and health facilities seem to be reasonably well resourced. The authors should add text to the Discussion to discuss generalisability to settings in Africa outside of the Khayelitsha area.
2. The CD4 cell count profile is low, and antiretroviral therapy coverage considerably lower than we have seen in other settings. It would be good to add a sentence in the discussion to reflect on how the accuracy, performance, and applicability of the final decision tree may be affected in other settings in the African continent where we see different patterns of ART coverage.
3. Presumably HIV viral load measurements were not available for inclusion in the model? We have seen in several studies now a moderately high prevalence of detectable viraemia in HIV-positive adults who reported taking ART, and this is strongly associated with adverse clinical outcomes. I could imagine that HIV viral load measurement (e.g. using the GeneXpert platform) might be a useful diagnostic predictor of prevalent TB. It would be worth adding a sentence to discuss ART coverage and viral failure in the limitations section of the Discussion.
4. In Figure 1, it is not clear what the reasons were for "Research related problems" in the 55 participants who were excluded. The authors should add a sentence to describe these participants in more detail in the results, and provide reassurance that these exclusions have not resulted in selection/spectrum bias.
5. Discussion. The sentence about coverage of urine LAM is a little outdated, and several more countries have rolled out this test, including Malawi.
6. POCUS is critically dependent on trained operators and high levels of quality assurance, which are not usually available in most settings in Africa. In most settings, operators trained in POCUS will not be available, whereas digital chest X-ray with CAD is often available, meaning that pragmatically, the decision tree placement of POCUS prior to CXR doesn't necessarily make sense. I would perhaps temper the paragraph about the utility of POCUS, especially given the suboptimal performance found here, and in other studies.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: HIV, TB, Public Health, epidemiology, randomised controlled trials, diagnostic accuracy evaluations, global health

Infectious and Tropical Diseases Department, San Bortolo Hospital, Vicenza, Italy
This manuscript is aimed to describe the use of LF-LAM in an acute care setting and identified PoCUS features independently associated with HIV-associated tuberculosis. Then, the study proposes a decisional algorithm for the diagnosis of TB in HIV patients, including point-of-care tests, as a result of a regression tree algorithm (CART).
This algorithm combines the only two tests really available at the patient bedside and having the true characteristics of point of care test: LF-LAM and PoCUS, and the "classic" tests, CXR and GeneXpert PCR plus the culture of the sputum or pus: the two latter tests are now considered as the gold standard.
The scenario is peculiar: the patients accessing this hospital have a high prevalence of HIV and are very deeply immunosuppressed, with very low CD4 count (never treated before for both conditions, or only treated for HIV).
General comments: Portable CXR and GeneXpert, reported in the text as "near point-of-care tests", really improved the opportunity of TB diagnosis in the last decades and are now the standard diagnostic tools in many hospitals in many different resource settings. However, the clinical impact of these technologies on the TB epidemic and mortality was not so impressive, particularly if we consider patients living with HIV.
The point-of-care tests like LF-LAM and PoCUS are available also in resource-limited settings (RLS) with affordable costs, can be performed in small and rural hospitals with basic labs without the need for referral centers. This will be a great advantage and an addition in the COVID era.
Limitations of the study (some of the limitations are already discussed in the study).
1. The authors do not report the TB treatment nor considerations of the patients' clinical outcome, so these tests are evaluated only in their performances in the diagnosis of TB (compared with the gold standard of culture), not considering the efficacy of treatment.
2. In the file "clinical data of the patients", data on HIV treatment are very limited (see questions).
3. As at least two different types of LF-LAM tests are commercially available and have different performances, the authors should explain the choice (LAM Alere Ag test) in the methods section and discuss in depth the performances of the different tests commercially available.
4. Intrinsic limitations of the sensitivity of LF-LAM were described in a recent paper of Tlali et al. (2020) 1 - the authors should discuss this paper.
5. POCUS examinations were performed by a single, experienced operator. A second blinded reviewer of the clips and/or sonographer would have added value to the results, considering that the US is a repeatable test.
6. The difference between the "individual" and "independent" aspect of the US is based on weak evidence and considering that the result of this differentiation is that the focal splenic lesions and pleural effusions, I'd delete it. Or, the authors should just propose it separately and say that it needs further evaluation.
Questions and suggestions:
HAART is a relevant issue in these patients. In the database, a considerable proportion of patients appear to be on HAART, so it is unclear why these patients were so deeply immunosuppressed; failure of the therapy? No compliance? The authors report that the patients are often confused at admission. This is true. However, the failure of HAART could either have played a role in, or be a consequence of, TB infection: a brief comment on this aspect in the discussion would be appreciated. Also, I've noticed that no data on the VL are available; probably the test was not done. Please add it.
Surprisingly, in the cohort, the different CD4 levels considered when stratifying the patients into groups (<100 cells/mm3, 100-200 cells/mm3, >200 cells/mm3) do not have any impact. Probably, you should have considered another group with a higher level of CD4 and a working active therapy in the stratification (if that kind of patient was included in the cohort). If the CD4 count doesn't matter, the HIV status might also not be as relevant as presumed to the performance of LF-LAM. You should add some considerations on these aspects.
The neurological presentation was considered an exclusion criterion (Figure 1), but, in Table 1, 13 patients appear to have a CSF sample. Please explain why these patients underwent a lumbar puncture; it would be interesting.
In Table 3, I'd link with * the 63 patients with clinical diagnoses of tuberculosis of Table 3 with Table 4.
Surprisingly, not a single patient was diagnosed with ARL (AIDS-related lymphoma), which is the principal differential diagnosis (both clinically and on US) in TB-HIV patients: I would speculate about some selection bias (is it possible that patients with suspected lymphoma are preselected for referral centers? Sometimes it happens in African settings).
Only 1 case of NTM was diagnosed in this cohort: it is interesting because in the work of Nel et al. (2017) 2 a considerable rate of false-positive LF-LAM test was found, and the cohort includes a high number of patients with a low CD4 count.
In Table 4, some of the "Reason for diagnosis of tuberculosis without microbiological confirmation", such as "Suggestive formal abdominal ultrasound is done in radiology department", are really unclear; data on the true outcome of patients, if available, would be of particular interest.
I partially disagree with the final consideration "The contribution of PoCUS in the context of urinary LF-LAM and chest X-ray availability was limited, due to the low positive yield, the need for further chest x-ray interpretation and the high false-negative rate...." because if you consider that LAM has good sensitivity but lower specificity, PoCUS, whose specificity is very high, could be the perfect tool. Please add the consideration to the conclusions.
Final comment: it's common thinking that "we have CXR, GeneXpert, and QGT, and we don't need further tests". Theoretically, it seems difficult to support the need for further tests for TB. But, in "real life", there is a considerable "grey area" of challenging cases, particularly in patients with extrapulmonary TB, needing expensive second-level tests, like CT scan, PET-CT, laparoscopy with biopsy, staining and pathologists, all not available in LRS where most patients with TB live. The alternative is to start with an "empirical treatment" without any scientific evidence of TB except the WHO symptoms. But this is a rough approach in the MDR-TB era.
The authors present a scenario with a very high prevalence of TB (47%) and tremendously bad treatment of HIV. This peculiar situation might have influenced the results: other studies should be carried out in different settings, including a limited resource setting with a lower prevalence of HIV (and TB); a database including treatment and outcomes of the patients would add a lot. I suggest adding these considerations to the conclusions. To my knowledge, urinary LAM is increasingly used in LRS hospitals, but the test has some limitations and probably should be improved. However, there probably isn't a perfect test in TB, and the combination of different tests is the best choice at the present moment.
I apologize to the editor and authors but my knowledge in statistics is limited, so my review is incomplete and only focused on the clinical aspect of the work. I think a mathematics or a statistics expert should be included as a reviewer of this manuscript.
Thank you for asking me to review this manuscript. I read it with high interest, and I found it accurate, written with conviction, and powerfully argued.

Reviewer Expertise: I'm a full-time clinician in Infectious Diseases and Tropical Medicine in San Bortolo Hospital, Vicenza, Italy. I started using ultrasound, POCUS and the interventional US in the setting of infective patients, including HIV/TB patients, in 1995 and exported my experience to remote settings with very constrained resources.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Surprisingly, in the cohort, the different CD4 levels considered when stratifying the patients into groups (<100 cells/mm3, 100-200 cells/mm3, >200 cells/mm3) do not have any impact. Probably, you should have considered another group with a higher level of CD4 and a working active therapy in the stratification (if that kind of patient was included in the cohort). If the CD4 count doesn't matter, the HIV status might also not be as relevant as presumed to the performance of LF-LAM. You should add some considerations on these aspects.

We unfortunately did not include a 4th group representing higher CD4 cell counts. We did comment on the use of urinary LF-LAM in participants with positive WHO screening symptoms regardless of CD4 cell count. This is an area that needs further exploration.
The neurological presentation was considered an exclusion criterion (Figure 1), but, in Table 1, 13 patients appear to have a CSF sample. Please explain why these patients underwent a lumbar puncture; it would be interesting.

Only patients with a main neurological presentation were excluded. All laboratory tests performed by the hospital clinicians (non-research group) were evaluated and included if the tests related to tuberculosis. There is a very low threshold at the hospital to do a lumbar puncture as part of the evaluation of immunodeficient patients and hence the reason for including 13 CSF samples.
In Table 3, I'd link with * the 63 patients with clinical diagnoses of tuberculosis of Table 3 with Table 4.

We have added a footnote to Table 3 linking the 63 patients to Table 4.
Surprisingly, not a single patient was diagnosed with ARL (AIDS-related lymphoma), which is the principal differential diagnosis (both clinically and on US) in TB-HIV patients: I would speculate about some selection bias (is it possible that patients with suspected lymphoma are preselected for referral centers? Sometimes it happens in African settings).

We are not aware that patients with suspected lymphoma are preselected for referral centers at the study hospital. We can only speculate on why AIDS-related lymphoma was not diagnosed.
Only 1 case of NTM was diagnosed in this cohort: it is interesting because in the work of Nel et al. (2017) 2 a considerable rate of false-positive LF-LAM test was found, and the cohort includes a high number of patients with a low CD4 count.

We can only speculate on why non-tuberculous mycobacterial infection was only cultured in one patient.
In Table 4, some of the "Reason for diagnosis of tuberculosis without microbiological confirmation", such as "Suggestive formal abdominal ultrasound is done in radiology department", are really unclear; data on the true outcome of patients, if available, would be of particular interest.

We unfortunately did not follow patients up to see whether their clinical condition actually improved after the empiric initiation of anti-tuberculous treatment. We purposefully used the term 'suggestive' as there is no robust evidence on this (hence a motivation for the current study). The clinical diagnosis of tuberculosis also differs between attending physicians, as some will only use certain ultrasound features (e.g. hypoechoic splenic lesions), whereas others want to see more than one ultrasound feature (e.g. pericardial effusion and splenic lesions).
I partially disagree with the final consideration "The contribution of PoCUS in the context of urinary LF-LAM and chest X-ray availability was limited, due to the low positive yield, the need for further chest x-ray interpretation and the high false-negative rate...." because if you consider that LAM has good sensitivity but lower specificity, PoCUS whose specificity is very high could be the perfect tool. Please add the consideration to the conclusions.

We have added that the role of POCUS as a rule-in test to diagnose HIV-associated tuberculosis needs to be further investigated.
Table 4 details the cases diagnosed with TB without microbiological confirmation -a total of 63 compared with a total of 172 microbiologically proven cases. This group would equate to 25% of patients finally treated for TB. I assume reading the paper these 25% were not included in the final statistics as the abstract clearly states the gold standard that the variables were looked at with respect to microbiologically confirmed TB.
These data raise a few points of interest. I note in Table 4 that almost 30% of the group diagnosed without microbiological proof were considered to have TB based on formal US findings. Given my PoCUS interest, it would be of great interest if the authors were able to review any reasons for differences between the expert US findings and the PoCUS findings; this may be of interest for future improvement of the accuracy of any PoCUS algorithm.
These 25% are also of potential interest when considering the results (see 4.2) .
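The proportion discussed here can be checked with a line of arithmetic. The sketch below only uses the two counts quoted in this report (63 clinically diagnosed and 172 microbiologically confirmed cases); the function name is hypothetical, and the denominator assumes that "patients finally treated for TB" means the sum of both groups.

```python
def share_without_confirmation(clinical_only: int, confirmed: int) -> float:
    """Fraction of all treated TB cases that lacked microbiological confirmation.

    Assumes the treated population is exactly clinical_only + confirmed.
    """
    total_treated = clinical_only + confirmed
    if total_treated <= 0:
        raise ValueError("need at least one treated case")
    return clinical_only / total_treated


# Counts quoted from Table 4 and the abstract of the reviewed manuscript.
share = share_without_confirmation(clinical_only=63, confirmed=172)
print(f"{share:.1%}")  # 26.8%
```

Under that assumption the share is slightly above one quarter, which agrees with the "over 25%" wording used under point 4.2.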

4.1
This covers all the valid points/shortcomings of the study and gives clear evidenced recommendations. Although the authors comment that a weakness of the study was that it was performed under routine conditions in the emergency department, this in my view makes the research more applicable.
4.2 Table 4 shows a number of patients in whom TB was diagnosed in the absence of microbiology which is over 25% of the final cases.
The main scientific thrust of the paper has to be on confirmed cases as a scientific gold standard.
However I do wonder in the final discussion if it is worth subjecting a few of the main decision trees and some of the univariable factors in Table 6 to data analysis which includes these cases to see how they might perform in a "real world" setting. This would obviously not be the main thrust of the scientific process but would be of great interest. Therein specifically, in light of the 30% being US findings I would be especially interested to see how PoCUS performed including that cohort of those eventually treated and whether it affects the number needed to scan?
In regard to the performance of PoCUS, specifically 39 being needed to opt in one case -is it worth highlighting that this is however likely to increase in low prevalence and decrease in higher prevalence areas?
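The dependence on prevalence raised in the last point can be illustrated with a rough sketch. It is not the authors' analysis: it assumes that the figure of 39 means scans per additional tuberculosis case identified, and that the fraction of true cases picked up only by PoCUS (called `incremental_yield` here, a hypothetical parameter) stays constant across settings. The study prevalence is taken as 172/414 from the abstract; the other prevalence values are arbitrary.

```python
def number_needed_to_scan(prevalence: float, incremental_yield: float) -> float:
    """Scans needed per additional TB case found.

    incremental_yield is the assumed fraction of true TB cases that
    only the scan identifies; it is held fixed across settings here.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 < incremental_yield <= 1:
        raise ValueError("incremental_yield must be in (0, 1]")
    return 1.0 / (prevalence * incremental_yield)


STUDY_PREVALENCE = 172 / 414  # from the abstract of the reviewed manuscript
STUDY_NNS = 39                # figure quoted in the comment above

# Back-calculate the yield implied by the study figures (an assumption).
implied_yield = 1.0 / (STUDY_NNS * STUDY_PREVALENCE)

# Arbitrary illustrative prevalences around the study value.
for prevalence in (0.10, 0.20, STUDY_PREVALENCE, 0.60):
    nns = number_needed_to_scan(prevalence, implied_yield)
    print(f"prevalence {prevalence:.0%}: about {nns:.0f} scans per extra case")
```

With the yield held fixed, the number needed to scan is inversely proportional to prevalence, so it rises in low-prevalence settings and falls in high-prevalence ones, which is the direction the comment suggests highlighting.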

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound?
2 Study Design
2.1 The study design is appropriate for investigation of the subject and formulation of a decision tree.
2.2 The study population, as indicated, is one of high prevalence and so appropriate to study, though, as the authors clearly indicate, the applicability to areas of lower prevalence is more difficult to predict.
2.3 Recruitment was good. With respect to the exclusion criteria, some are obvious, others less so, with no reason given for the exclusion of gynae/obstetric cases and CNS cases. A reasonably large number were excluded (132 in total) and some exclusions would favour male over female inclusion (i.e. gynae/obstetric, though these appear small in number).

Not all gynae/obstetric cases and CNS cases were excluded; only patients whose main clinical presentation was a meningitis syndrome or new focal neurology, or a gynaecological or psychiatric presentation, were excluded.
I note however the male study population was only 40%. With respect to the exclusion criteria a clearer statement of any differences with the study group would be of value.

3.1
The use of phraseology "PoCUS independent and PoCUS individual features" is slightly unclear and could be clarified.
Syndr. 2020;83(4):415-423. 31904699 10.1097/QAI.0000000000002279), however we've reworded the paragraph under statistical analysis to better clarify the difference between the two types of POCUS features.
Table 4 details the cases diagnosed with TB without microbiological confirmation -a total of 63 compared with a total of 172 microbiologically proven cases. This group would equate to 25% of patients finally treated for TB. I assume reading the paper these 25% were not included in the final statistics as the abstract clearly states the gold standard that the variables were looked at with respect to microbiologically confirmed TB.

This is correct. Only participants with microbiological confirmation were included in the 172 cases.
These data raise a few points of interest. I note in Table 4 that almost 30% of the group diagnosed without microbiological proof were considered to have TB based on formal US findings. Given my PoCUS interest, it would be of great interest if the authors were able to review any reasons for differences between the expert US findings and the PoCUS findings; this may be of interest for future improvement of the accuracy of any PoCUS algorithm.

We did not compare PoCUS with radiology-performed ultrasounds. The 30% referred to relates to participants without microbiological proof and does not reflect any difference with POCUS findings. We will consider determining the correlation between radiology-performed ultrasounds and POCUS findings, as it would be of interest.
These 25% are also of potential interest when considering the results (see 4.2) .

4.1
This covers all the valid points/shortcomings of the study and gives clear evidenced recommendations. Although the authors comment that a weakness of the study was that it was performed under routine conditions in the emergency department, this in my view makes the research more applicable.
4.2 Table 4 shows a number of patients in whom TB was diagnosed in the absence of microbiology which is over 25% of the final cases.
The main scientific thrust of the paper has to be on confirmed cases as a scientific gold standard.
However I do wonder in the final discussion if it is worth subjecting a few of the main decision trees and some of the univariable factors in Table 6 to data analysis which includes these cases to see how they might perform in a "real world" setting. This would obviously not be the main thrust of the scientific process but would be of great interest. Therein specifically, in light of the 30% being US findings I would be especially interested to see how PoCUS performed including that cohort of those eventually treated and whether it affects the number needed to scan?
Thanks for the comment. It would be interesting to see how POCUS performed when clinical cases were included. However, we did not follow up patients to determine whether they actually improved on anti-tuberculous treatment and this will severely limit the interpretation of such results.
In regard to the performance of PoCUS, specifically 39 being needed to opt in one case -is it worth highlighting that this is however likely to increase in low prevalence and decrease in higher prevalence areas?
Thanks for the suggestion. We've incorporated it in the discussion section.
Competing Interests: No competing interests were disclosed.