The interobserver variability of thyroid fine needle aspirates using The Bethesda System for Reporting Thyroid Cytopathology

Introduction: The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) is a six-category scheme for reporting thyroid Fine Needle Aspirates (FNAs). It represents a major step towards a uniform reporting system for thyroid FNAs, because it helps referring clinicians understand thyroid cytopathology reports and provides rational management guidelines. The present study was carried out to analyse the cytological features of thyroid FNAs, to categorise them by The Bethesda System for Reporting Thyroid Cytopathology, and to assess interobserver variability between two independent reporting pathologists using the Bethesda system. Materials and Methods: The study was conducted over three years, from Nov. 2015 to Oct. 2018, in the Department of Pathology, Government Medical College, Jammu. Two independent pathologists categorized the thyroid FNAs according to the Bethesda system in a double-blinded fashion. Interobserver variability was assessed by calculating the percentages of agreement and disagreement and, statistically, by Cohen's kappa. Results: A total of 610 cases were categorized by the two pathologists, respectively, as follows: Nondiagnostic, 51 (8.36%) and 50 (8.20%); Benign, 524 (85.90%) and 510 (83.61%); Atypia of Undetermined Significance, 2 (0.33%) and 12 (1.97%); Follicular Neoplasm, 15 (2.46%) and 16 (2.62%); Suspicious for Malignancy, 1 (0.16%) and 6 (0.98%); Malignant, 17 (2.79%) and 16 (2.62%). The two pathologists agreed in 548 cases (89.84%) and disagreed in 62 cases (10.16%), with Cohen's kappa = 0.628 (62.8%). Agreement was highest in the Benign category (95.68%), followed by Malignant (87.5%), Follicular Neoplasm (75%) and Nondiagnostic (68%). No agreement was seen in the Atypia of Undetermined Significance and Suspicious for Malignancy categories.
Conclusion: The present study thus encourages the use of The Bethesda System for Reporting Thyroid Cytopathology as a standardized, relatively reproducible reporting system for effective communication among pathologists and clinicians in thyroid Fine Needle Aspiration Cytology reporting and management.
© 2020 Published by Innovative Publication. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by/4.0/)


Introduction
Thyroid diseases are among the commonest endocrine disorders worldwide. About 42 million people in India suffer from thyroid diseases. 1 Fine needle aspiration cytology (FNAC) is the first-line diagnostic procedure for evaluating thyroid lesions. It is a simple, rapid, cost-effective test that provides valuable information about the […] to six or more category schemes, with some relying on descriptive phrases instead of categories. 3,4 This lack of uniformity confuses referring clinicians when they interpret thyroid cytopathology reports, and it thereby affects definitive clinical management. 2 To address terminology and other issues related to thyroid FNA, the National Cancer Institute (NCI) hosted the "NCI Thyroid Fine Needle Aspiration State of the Science Conference", a multidisciplinary conference held in Bethesda, Maryland, United States, in October 2007. The conference led to the formation of The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC), a six-category scheme for reporting thyroid cytopathology which recommends that each report begin with a general diagnostic category. Each category has an implied risk of malignancy that links it to a rational clinical management guideline. 6 Recently published data support the clinical utility of TBSRTC and its wide acceptance among both practicing pathologists and clinicians. In the Indian context, many studies have also concluded that TBSRTC is an effective reporting system for thyroid FNA. It improves communication of diagnostic terminology between cytopathologists and clinicians and provides clear management guidelines to clinicians. 2,4 In our institution, however, reporting of thyroid lesions on FNA is variable and no standardized reporting system is followed.
The present study was thus carried out to analyse the cytological features of thyroid fine needle aspiration smears and categorise them by The Bethesda System for Reporting Thyroid Cytopathology; and to assess interobserver variability between two independent reporting pathologists using The Bethesda System for Reporting Thyroid Cytopathology.

Materials and Methods
The present study was conducted in the Department of Pathology, Government Medical College, Jammu, over a period of three years, from November 2015 to October 2018. It included all patients with thyroid swelling who were referred to this department from various clinical departments of this hospital and from other health care centers. Non-cooperative and morbid patients were excluded from the study.
Smears were prepared from samples obtained by either the aspiration or the non-aspiration technique and were stained with May-Grunwald-Giemsa (MGG) and Papanicolaou (PAP) stains.
Two independent pathologists examined the stained smears under light microscopy in a double-blinded fashion. They evaluated the cytological features and reported them according to The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC).
The two reporting pathologists (Pathologist I and Pathologist II) followed the definitions and cytomorphological criteria described in The Bethesda System for Reporting Thyroid Cytopathology atlas. After adequacy was assessed against the TBSRTC adequacy criteria, the thyroid fine needle aspirates were categorized into the six categories of The Bethesda System for Reporting Thyroid Cytopathology. 7

Nondiagnostic/Unsatisfactory (ND/UNS)
The Nondiagnostic category is used for a sample that does not meet the adequacy criteria given in the TBSRTC monograph. A thyroid FNA sample is considered adequate for evaluation if it contains a minimum of six groups of well-visualised follicular cells, with at least ten cells per group, preferably on a single slide. Three exceptions apply: solid nodules with cytologic atypia, solid nodules with inflammation, and colloid nodules. For these, adequacy does not require a minimum number of follicular cells.
The Nondiagnostic category included the following:
- aspirates with fewer than six well-preserved, well-stained groups of at least ten follicular cells each (excluding the exceptional circumstances above);
- aspirates with poorly prepared, poorly stained or significantly obscured follicular cells;
- cyst fluid, with or without histiocytes, containing fewer than six groups of ten benign follicular cells.
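The adequacy requirement above works as a decision rule. The Python sketch below encodes a simplified version of it for illustration only. The `Aspirate` record, its field names, the exception labels and `is_adequate` are all hypothetical. The sketch ignores the "preferably on a single slide" preference, and it does not replace the TBSRTC criteria or microscopic judgement.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels for the TBSRTC exceptions, for which no minimum
# number of follicular cells is required for adequacy.
EXCEPTIONS = {
    "solid nodule with cytologic atypia",
    "solid nodule with inflammation",
    "colloid nodule",
}


@dataclass
class Aspirate:
    """Hypothetical summary of what a pathologist records from the smears."""
    group_sizes: List[int] = field(default_factory=list)  # cells in each well-visualised follicular group
    well_prepared: bool = True        # False if poorly prepared, poorly stained or obscured
    exception: Optional[str] = None   # one of EXCEPTIONS, or None


def is_adequate(sample: Aspirate, min_groups: int = 6, min_cells: int = 10) -> bool:
    """Simplified adequacy rule: at least six groups of at least ten follicular cells each."""
    if not sample.well_prepared:
        return False  # poorly prepared/stained or obscured smears are treated as nondiagnostic
    if sample.exception in EXCEPTIONS:
        return True   # cell-count minimum waived for the listed exceptions
    qualifying = sum(1 for size in sample.group_sizes if size >= min_cells)
    return qualifying >= min_groups


# Usage: five qualifying groups plus one group of nine cells falls short.
print(is_adequate(Aspirate(group_sizes=[12, 15, 10, 20, 11, 9])))  # False
print(is_adequate(Aspirate(exception="colloid nodule")))           # True
```

The sketch checks smear quality first, so it treats an exception case with obscured smears as nondiagnostic. That ordering is an assumption of the sketch and does not come from the source.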

Benign
The aspiration smears were categorized as benign if they showed cytomorphological features of a benign follicular nodule, lymphocytic thyroiditis, granulomatous thyroiditis or other entities like acute thyroiditis as per TBSRTC.

Atypia of Undetermined Significance/Follicular Lesion of Undetermined Significance (AUS/FLUS)
This diagnostic category was reserved for aspirates that contained cells with architectural and/or nuclear atypia that was not sufficient to be classified as suspicious for a follicular neoplasm, suspicious for malignancy, or malignant. On the other hand, the atypia was more marked than could be ascribed confidently to benign changes.

Follicular Neoplasm/ Suspicious for Follicular Neoplasm (FN/SFN)
This category included smears that were moderately or markedly cellular with significant alteration in follicular cell architecture, characterized by cell crowding, microfollicles, and dispersed isolated cells. Colloid was scant to absent.
Aspirates were classified as Follicular Neoplasm, Hurthle cell (Oncocytic) type if they consisted exclusively (or almost exclusively) of Hurthle cells. These cells have abundant, finely granular cytoplasm, an enlarged, centrally or eccentrically located round nucleus and a prominent nucleolus. The specimens were moderately to markedly cellular, usually with little or no colloid and virtually no lymphocytes (excluding blood elements) or plasma cells.

Suspicious for Malignancy (SFM)
Cases showing cytomorphological features that raised a strong suspicion of malignancy (papillary thyroid carcinoma, medullary thyroid carcinoma, lymphoma or metastatic carcinoma) but the findings were not sufficient for a conclusive diagnosis were included in this category.

Malignant
This Bethesda category was applied whenever the cytomorphologic features were conclusive for malignancy. The malignancies included in this category were Papillary thyroid carcinoma, Medullary thyroid carcinoma, Anaplastic carcinoma, Poorly differentiated carcinoma, and lymphoma.
Interobserver variability between the two pathologists' results was evaluated by calculating the percentages of agreement and disagreement. It was also assessed statistically with Cohen's kappa, which measures the agreement between two observers beyond that expected by chance.
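For reference, Cohen's kappa for two observers is defined below. N is the number of aspirates (610 in this study) and n_kk is the number of aspirates that both pathologists placed in Bethesda category k. n_k+ and n_+k are the totals that Pathologist I and Pathologist II, respectively, assigned to category k. A kappa of 1 indicates perfect agreement, and a kappa of 0 indicates agreement no better than chance. The standard-error line uses Cohen's large-sample approximation. The study does not state which variance formula produced its reported confidence interval, so that line is an assumption.

```latex
p_o = \frac{1}{N}\sum_{k=1}^{6} n_{kk}, \qquad
p_e = \frac{1}{N^{2}}\sum_{k=1}^{6} n_{k+}\,n_{+k}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}

\mathrm{SE}(\kappa) \approx \sqrt{\frac{p_o\,(1 - p_o)}{N\,(1 - p_e)^{2}}}, \qquad
95\%\ \mathrm{CI} = \kappa \pm 1.96\,\mathrm{SE}(\kappa)
```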

Results
A total of 610 patients were included in the study. The thyroid fine needle aspirates were categorized into the six categories of The Bethesda System for Reporting Thyroid Cytopathology by two independent pathologists.
For both pathologists, the Benign category was the largest, followed by the Nondiagnostic category. Within the Benign category, both pathologists reported benign follicular nodule as the predominant subcategory, followed by chronic lymphocytic thyroiditis. Papillary thyroid carcinoma was the most common malignancy reported by both pathologists in our study (Table 1).
Assessed statistically, interobserver variability gave a Cohen's kappa coefficient of 0.628 (62.8%; 95% confidence interval 54.0-71.6%). In other words, the two pathologists achieved 62.8% of the agreement possible beyond chance, which represents moderate agreement in our study.
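Kappa depends only on the number of concordant cases and on each pathologist's category totals, so the reported value can be checked against figures already given in the abstract. The following minimal Python sketch does this. It assumes Cohen's approximate standard error for the interval, because the original software's formula is not stated. The function name `cohens_kappa` is illustrative.

```python
import math

# Category totals reported in the abstract, in the order
# Nondiagnostic, Benign, AUS, FN, SFM, Malignant.
PATHOLOGIST_I = [51, 524, 2, 15, 1, 17]
PATHOLOGIST_II = [50, 510, 12, 16, 6, 16]
N = 610        # total aspirates
AGREED = 548   # aspirates placed in the same category by both pathologists


def cohens_kappa(agreed, totals_a, totals_b, n):
    """Return observed agreement, chance agreement, kappa and an approximate 95% CI."""
    p_o = agreed / n
    # Chance agreement: sum over categories of the product of the two marginal proportions.
    p_e = sum(a * b for a, b in zip(totals_a, totals_b)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    # Cohen's large-sample approximation to the standard error (an assumption here).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return p_o, p_e, kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


p_o, p_e, kappa, (lo, hi) = cohens_kappa(AGREED, PATHOLOGIST_I, PATHOLOGIST_II, N)
print(f"observed agreement p_o = {p_o:.4f}")    # 0.8984
print(f"chance agreement   p_e = {p_e:.4f}")    # 0.7265
print(f"Cohen's kappa          = {kappa:.3f}")  # 0.628
print(f"approx. 95% CI         = {lo:.3f} to {hi:.3f}")
```

Run as written, the sketch reproduces kappa = 0.628 and gives an approximate interval of about 0.541 to 0.716. This is close to the reported 54.0-71.6%; the small difference at the lower limit presumably reflects a different standard-error formula. Kappa itself does not need the full 6 × 6 cross-tabulation, but category-specific agreement does.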

Discussion
TBSRTC has been widely adopted in the United States and in many places worldwide, and it has been endorsed by the American Thyroid Association. 7,8 Table 3 compares the distribution of cases across the TBSRTC categories by the two pathologists in our study with other studies.
The frequency of Nondiagnostic interpretations varies notably from laboratory to laboratory (range, 3-34%). 7 Our findings (8.36% and 8.20%) fall within this range and are comparable with other studies. 4,9 As most thyroid nodules are benign, a benign result is the most common FNA interpretation (approximately 60-70% of all cases). 7 The proportion of benign cases in our study (85.90% and 83.61%) is higher than this range and than that reported by Jo et al., 10 but it is comparable with other studies. 2,4,9,11 Our institution is the only tertiary care center in our province and caters to a large number of patients, both directly and on referral. It therefore sees a large population that is representative of the general population, which could explain the higher number of cases in the benign category.

The numbers of AUS and Follicular Neoplasm cases reported in our study are comparable with the studies in Table 3.
Suspicious for Malignancy (SFM) diagnoses account for approximately 3% (range, 1.0-6.3%) of all thyroid FNAs. As with any indeterminate diagnosis, this category should be used judiciously so that patients are managed as appropriately as possible. 7 In our study, Pathologist I reported 0.16% of cases as SFM, which is below the lower limit of this range but comparable with the study by Laishram et al. 11 Pathologist II reported 0.98% (~1%). Table 3 compares our category distribution with those of Mehra et al., 4 Bhat et al., 9 Jo et al. 10 and Laishram et al. 11
The two pathologists agreed in 548 cases (89.84%) and disagreed in 62 cases (10.16%), which is consistent with other studies. 12-15 Our Cohen's kappa score of 0.628 is comparable to that in a study by Awasthi et al. 12 In a study by Salillas et al., 14 the strength of agreement was very good, with a kappa statistic of 0.90. In a study by Pathak et al., 15 the unweighted Cohen's kappa score was 0.7517 (strong) between the consultant and the SR, and 0.5907 (moderate) between the consultant and the JR.
In our study, diagnostic agreement was highest in the Benign category (95.68%), followed by the Malignant category (87.5%). In a study by Bhasin et al., 13 agreement was highest in the nondiagnostic (100%), malignant (100%) and benign (93.87%) categories. In a study by Awasthi et al., 12 there was complete agreement on the cases in the ND/UNS and AUS/FLUS categories, followed by 94.8% concordance in the benign category. Both of these studies reported fewer ND/UNS cases, which could explain their complete agreement in that category. In a study by Salillas et al., 14 agreement was highest in the malignant category (100%, κ = 0.61), followed by the benign category (κ = 0.60).
In our study, the two reporting pathologists showed no agreement (0%) in the AUS and SFM categories. In a study by Bhasin et al., 13 the maximum disagreement was in the AUS/FLUS category, which is consistent with our study. Of the 7 discordant cases in the study by Salillas et al., 14 3 were SFM and 2 were AUS, which is also consistent with our study. In a study by Pathak et al., 15 interobserver agreement in the AUS/FLUS category was poor (κ = 0.1301), again consistent with our study. In a study by Kocjan et al., 16 agreement was poor for the Thy3a (κ = 0.11) and Thy4 (κ = 0.17) categories. These correspond to the AUS and SFM categories of the Bethesda system, respectively, so this finding is comparable to our study.
Low reproducibility of AUS/FLUS has been reported, and variability in the criteria used for AUS/FLUS causes significant interobserver and inter-institutional variation in diagnosis. 15 The reproducibility of the AUS/FLUS category is at best only fair. 7 Significant interobserver variability in thyroid FNA is well established, although it has been reported to be smaller (though still significant) for the "benign" and "malignant" categories. The AUS/FLUS category shows the most interobserver variability among cytopathologists. Differing thresholds for applying the diagnostic criteria, as noted in one study, 17 could also explain this variability in our study.

Conclusion
The findings of our study are consistent with other published studies in the literature. Systematic reporting according to TBSRTC has led to clear interpretation of the thyroid FNAC report. When two pathologists used this uniform terminology, interobserver agreement was moderate, which favours its use because of its relative ease of reproducibility. It also provides an implied risk of malignancy and a management guideline for each category. Thus, the present study encourages the use of TBSRTC as a standardized reporting system, in our institution and elsewhere, for effective communication among pathologists and clinicians in thyroid FNAC reporting and management.