Echinococcus granulosus (sensu stricto) (G1, G3) and E. ortleppi (G5) in Pakistan: phylogeny, genetic diversity and population structural analysis based on mitochondrial DNA

Background Cystic echinococcosis (CE) is a serious tapeworm infection caused by Echinococcus granulosus (sensu lato) which infects a wide range of animals and humans worldwide. Despite the millions of livestock heads reared in Pakistan, only a few reports on CE prevalence and even fewer on the genetic diversity are available for the country. Meanwhile, the available reports on the genetic diversity are predominantly based on short sequences of the cox1 gene. Methods To close this knowledge gap, this study was designed to investigate the genetic diversity and population structure of Echinococcus spp. in Pakistan using the complete mitochondrial cytochrome c oxidase subunit 1 (cox1) and NADH dehydrogenase subunit 1 (nad1) genes. Results Based on BLAST searches of the generated cox1 and nad1 gene sequences from a total of 60 hydatid cysts collected from cattle (n = 40) and buffalo (n = 20), 52 isolates were identified as E. granulosus (s.s.) (G1, G3) and 8 as E. ortleppi (G5). The detection of the G5 genotype represents the first in Pakistan. The phylogeny inferred by the Bayesian method using nucleotide sequences of cox1-nad1 further confirmed their identity. The diversity indices indicated a high haplotype diversity and a low nucleotide diversity. The negative values of Tajima’s D and Fu’s Fs test demonstrated deviation from neutrality suggesting a recent population expansion. Conclusions To the best of our knowledge, this report described the genetic variation of E. granulosus population for the first time in Pakistan using the complete cox1 and nad1 mitochondrial genes and confirms E. ortleppi as one of the causative agents of CE among livestock in Pakistan. While this report will contribute to baseline information for CE control, more studies considering species diversity and distribution in different hosts across unstudied regions of Pakistan are highly needed.


Background
Cystic echinococcosis (CE) is listed by the World Health Organization (WHO) as a neglected tropical disease having zoonotic connotations [1]. Echinococcus granulosus (sensu lato) is the etiological agent with a cosmopolitan distribution leading to heavy economic losses amounting for up to 3 billion USD per year [2] and human suffering [3]. Based on nucleotide sequence variation of the mitochondrial DNA, eight different genotypes and E. felidis within the E. granulosus complex have been recognized to infect different hosts [4]. Echinococcus granulosus (sensu stricto) (G1/G3), E. ortleppi (G5) and E. canadensis complex have the highest significance concerning Echinococcus infection in livestock population across the globe [1,5]. These species require two mammalian hosts to complete their life-cycle via a definitive canine host and an intermediate host (domestic or wild ungulate), while humans are considered to be accidental or dead-end hosts [6].
CE is endemic throughout the world including countries in Africa, Europe, South America, the Middle East and Central Asia [7,8]. Pakistan is an agriculturedependent country and the livestock sector is an important pillar of the national economy contributing 11.2% to the national GDP [9]. According to the latest livestock census, there are 47.8 million heads of cattle and 40 million heads of buffalo in the country. In the last four decades, there has been no comprehensive data on genetic characterization and molecular diversity of CE in Pakistan. From 1980 to 2015, only three molecular investigations on Echinococcus spp. have been reported in animals and were based on partial cox1 gene sequences in addition to their limited geographical areas [10].
Given the high population of livestock, scarcity of genetic data and endemic status of CE in the adjoining countries, this study was designed to provide a better insight into the circulating genotypes and to describe the genetic population structure and diversity of Echinococcus spp. in animals in Punjab and Khyber Pakhtunkhwa provinces of Pakistan using the full-length nad1 and cox1 genes.

Study areas
Pakistan is located in South Asia bordered by Afghanistan, China, India, and Iran to the west, northeast, east, and southwest, respectively. Hydatid cyst sampling was conducted in Lahore, Faisalabad, and Peshawar (Fig. 1). Lahore is the provincial capital of Punjab and one of the most populous cities in Pakistan. Faisalabad is the third-most-populous city in Pakistan and the secondlargest in the eastern province of Punjab. Historically, it is one of the first planned cities within British India and often regarded as the Manchester of Pakistan because of massive industrialization. Peshawar is the capital and largest city of the Khyber Pakhtunkhwa Province.

Parasite material
Hydatid cysts were collected during September 2019 from abattoirs under the jurisdiction of municipal corporations of Lahore, Faisalabad and Peshawar districts. Before sampling, the stationed veterinary officers were informed about the nature of the study. Inspection of the carcasses was performed, and a total of 60 hydatid cysts were collected. All cysts were of cattle (n = 40) and buffalo (n = 20) origin. Unfortunately, no cysts were found in sheep and goats slaughtered during this period. The cysts were transported to the Department of Clinical Medicine and Surgery, University of Agriculture, Faisalabad, Pakistan, under refrigerated conditions for further processing.

DNA extraction, amplification and sequencing of isolates
Hydatid cysts were thoroughly cleaned with 75% ethanol. Germinal layers were washed with phosphate-buffered saline (PBS). Moreover, protoscoleces were repeatedly washed in PBS as described previously [11] and stored until use. DNA extraction was carried out on cut pieces of germinal layers and protoscoleces using the Qiagen Blood and Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Amplification of the complete mitochondrial complete cox1 gene (1608 bp) using forward primer (5′-ATT ATA GAA AAT TTT CGT TTT ACA CGC-3′) and reverse primer (5′-AAG CAT GAT GCA AAA GGC AAA TAA ACC-3′) and complete nad1 gene (894 bp) using forward primer (5′-ATT ATA GAA AAT TTT CGT TTT ACA CGC-3′) and reverse primer (5′-ATT CAC AAT TTA CTA TAT CAA AGT AAC C-3′) [12] was carried out for all isolates. Positive and negative controls were also used. The PCR conditions were carried out as previously described [13] in a final volume of 25 μl consisting of 5 μl of 5× Taq buffer, 10 pmol of each primer, 0.2 mM dNTPs, 0.5 μl of Ex Taq DNA polymerase (5 U/μl; TaKaRa, Kusatsu, Japan), 2 mM MgCl 2 , 0.5 μl of genomic DNA extract (≥ 20 ng) and RNAse free water to make up the final volume. Amplicons were visualized in a 1.5% (w/v) agarose gel stained with GelRed TM (Biotium, Fremont, USA). Five microliters of the amplicon were used for this purpose while the rest was used for sequencing in an ABI3730Xl DNA Analyser (Beijing Tsingke Biotechnology Co., Ltd., Beijing, China).

Molecular analysis
DNA sequence alignment and manual examination for misread nucleotide bases were conducted in Unipro UGENE v1.32.0 software [14] while the identity of each isolate was confirmed with their nucleotide sequences using the NCBI BLAST program (https ://blast .ncbi.nlm. nih.gov/Blast .cgi).

Phylogenetic analyses
The phylogenetic relationships between haplotypes were inferred by Bayesian method based on the nad1-cox1 dataset using MrBayes v.3.1.1 [19]. Markov Chain Monte Carlo (MCMC) sampling was used to assess the posterior distribution of parameters with a chain length of 5,000,000 states with parameters logged at every 1000 states and 25% discarded as 'burn-in' . TreeView v.1.6.6 (http://taxon omy.zoolo gy.gla.ac.uk/rod/treev iew.html) was used to display the tree.

Results
Amplification of the nad1 and cox1 gene yielded PCR products of approximately 1400 bp and 2000 bp, respectively. Nucleotide sequences of all 60 isolates analyzed in this study were aligned with reference sequences of each genotype within E. granulosus (s.l.) retrieved from Gen-Bank. A total of 3 genotypes of Echinococcus were found: G1 (n = 14); G3 (n = 38); and G5 (n = 8). Comparative data on organ location, cysts fertility and genotypes of E. granulosus and E. ortleppi isolates are shown in Table 1.
In the median-joining network for nad1 gene sequences, haplotype PAK-H4 had a central position and comprised 32.69% (17/52) of the total population with not more than 4 mutational differences from the other haplotypes (Fig. 2). For the cox1 gene, the haplotype PAK-H3 was at the centre of the network, constituting 21.15% (11/52) of the total population with up to 14 mutational differences from the other haplotypes (Fig. 3). Of the 34 haplotypes based on the concatenated sequences of both genes, PAK-H18 was at the centre of the network and constituted 9.61% (5/52) of the total population with a maximum mutational difference of 16 from other haplotypes (Fig. 4). The central haplotype from all three networks comprised isolates from both hosts but did not constitute the majority of the population. The observed nucleotide polymorphism between haplotypes resulted in an amino acid change (see Additional file 1: Tables S1, S2). Concerning E. ortleppi, sequence analysis revealed a total of 17 mutation sites (nad1 = 5 and cox1 = 12) with 11 parsimony informative sites. Out of the 8 samples, 2 and 5 haplotypes were found for nad1 and cox1 genes, respectively, while 6 distinct haplotypes were observed for the nad1-cox1 (2502 bp) concatenated sequences dataset.
Representative nad1 and cox1 haplotype sequences from this study were deposited in the GenBank database under the accession numbers MN886252-MN886293.

Population indices
The nucleotide diversity and neutrality indices for the entire E. granulosus (s.s.) and E. ortleppi populations were calculated based on the sequences of nad1, cox1, and nad1-cox1 genes. For E. granulosus (s.s.), a low nucleotide (π) and high haplotype diversity (Hd) was observed for nad1 and cox1 genes in both cattle and buffalo. Overall, Hd and π were as follows: nad1 (Hd = 0.834, π = 0.00293) and cox1 (Hd = 0.925, π = 0.00223) while nad1-cox1 π and Hd were 0.00248 and 0.972, respectively. Fu's Fs was negative and insignificant (P > 0.05) for the nad1 gene. In contrast, it was significant (P < 0.05) for cox1 and the concatenated nad1-cox1 sequences. Tajima's D were negative and insignificant for the entire population (Table 2).

Phylogenetic analysis
The resulting sequences of the different genotypes/ species with those retrieved from GenBank were used to construct a phylogenic tree. The Bayesian phylogeny based on a dataset of the concatenated nad1-cox1 sequences placed all the Pakistani E. granulosus (s.s.) and E. ortleppi (G5) isolates in the same clusters with the respective reference genotype sequences from Gen-Bank (Fig. 5). The Bayesian phylogenetic inference confirmed their genotype/species status as distant from other Echinococcus species.

Discussion
Mitochondrial DNA has played an extensive role in investigating intraspecific variations and other population genetic studies because of its conserved structure, maternal inheritance, high mutation rate, absence of recombination, and high evolutionary rate [20][21][22]. In this study, we report the genetic diversity of E. granulosus (G1/G3) and E. ortleppi (G5) isolates from buffalo and cattle collected from different slaughterhouses in two provinces of Pakistan based on the complete sequences of two mitochondrial genes, cox1 and nad1. Previously, in Pakistan, the genetic variation and diversity of Echinococcus spp. have mostly been assessed based on partial gene sequences [10,[23][24][25]. Based on the complete gene sequences, we found the G3 genotype as the most prevalent genotype (38/60) infecting the Pakistani bovine population followed by G1 (14/60) and G5 (8/60). The results are concordant with a previous study conducted in a neighboring country (India) where the G3 (71.8%) was also found to be the most common genotype [26]. Conversely, in China [27], Iran [28], Turkey [29] and Brazil [30] the G1 genotype is reportedly the predominant genotype with 95.74%, 87.5%, 66% and 77.4% prevalence, respectively.
As far as we know, available data on the genetic diversity of Echinococcus in Pakistan has been predominantly based on partial sequences of the cox1 gene which limits a thorough comparison of the genotypic diversity scenario in Pakistan with its adjoining countries. The length of the gene under study has been reported to potentially affect the outcomes of genetic diversity investigations [31,32]. For example, using a fragment of the cox1 gene in a study conducted on 223 East European and 89 Italian isolates, 24 and 7 haplotypes, respectively were observed [33], while in another study on 69 Argentine isolates 7  [34]. Similarly, an investigation conducted on 112 isolates from the Sindh Province of Pakistan based on partial cox1 gene sequence resulted in 5 haplotypes [24]. In the present study, we found 34 haplotypes among the investigated E. granulosus (s.s.) population, based on the concatenated genes. The analysis of both mitochondrial genes (nad1 and cox1) revealed a considerable high level of genetic variation. Higher haplotype diversity and nucleotide diversity were recorded for cox1 and nad1 gene, respectively of cattle isolates as compared to buffalo isolates. Overall, the diversity and neutrality indices based on the concatenation of both genes were higher in cattle. The radial population structure from the median-joining network observed in the present study differs from the star-like population structure reported in Argentina [34], China [35,36] and Tunisia [37] where the G1 genotype was found to be preponderant, whereas it was similar to the population structure of E. granulosus (s.s.) recently reported in Pakistan where the G3 genotype was found to be preponderant [38].
The G5 genotype has been reported in humans in Argentina, Brazil, China, France, India, Mexico and the Netherlands [31,[39][40][41] indicating that this is an important genotype in terms of public health significance. In this study, to the best of our knowledge, we confirm for the first time, the presence of the 'cattle' strain (G5) of the E. granulosus (s.l.) complex in central Pakistan. This report is also in line with previous observations regarding the prevalence of E. ortleppi (G5) in a few Asian countries. For example, the G5 genotype has been reported in cattle from India, Iran, Bhutan, and Nepal [26,[42][43][44] and recently in a human from China [41]. Host distribution and prevalence of the G5 genotype are highly variable across the world. Echinococcus ortleppi (G5) infecting cattle has been reported from Brazil [45], Uruguay [46], France [39], Italy [5], Sudan [47], Ethiopia [48] and Kenya [49]. Out of eight isolates of G5 collected in this study, five were of lung origin which is in line with a molecular survey conducted in Brazil, which had a prevalence of 43.4% for the G5 genotype in cattle, with most of the isolates found in the lungs [45]. Overall, the genetic diversity and neutrality indices based on the mitochondrial genes (cox1 and nad1) were relatively similar to the genetic diversity of E. ortleppi in eastern and southern Africa [50].
The neutrality test based on both genes showed negative insignificant values of Tajima's D, indicating an In the present study, high haplotype diversity suggests a considerable amount of genetic differentiation between the haplotypes as seen in the median-joining network. These results, i.e. the combination of the low nucleotide and high haplotype diversity, suggest a rapid population spread out from a small population size, as described in a previous study [40].
Phylogenetic analysis of the concatenated gene sequences also confirmed the identity of the isolates from buffalo and cattle as E. granulosus (G1/G3) and E. ortleppi (G5), respectively as they clustered closely with respective Echinococcus species (G1-G10) sequences retrieved from GenBank. The basal site of the tree was occupied by Neotropical species, E. vogeli and E. oligarthra. The tree also confirmed the sister relationships between E. canadensis and E. ortleppi, and between E. shiquicus and E. multilocularis as previously described [51].

Conclusions
The study demonstrates the preponderance of the G3 genotype among all encountered genotypes/species of E. granulosus (s.s.) (G1, G3) and E. ortleppi (G5) and emphasizes the important role of this genotype in the distribution of CE among livestock and possibly in human populations. The detection of the G5 genotype also raises some public health concerns due to the increasing reports of human infections with this genotype. Furthermore, analysis of the complete nad1 and cox1 gene sequences of E. granulosus (s.s.) population in Pakistan demonstrated a higher genetic variation than what was previously reported based on partial gene sequences.
Additional file 1: Table S1. Echinococcus granulosus (s.s.) mitochondrial cox1 gene nucleotide sequence polymorphism and corresponding amino acid changes among haplotypes from cattle and buffalo. Table S2. Echinococcus granulosus (s.s.) mitochondrial nad1 gene nucleotide sequence polymorphism and corresponding amino acid changes among haplotypes from cattle and buffalo.