Metagenomics of the midgut microbiome of Rhipicephalus microplus from China

Ticks, which are ectoparasites of animals, may carry multiple pathogens. The cattle tick Rhipicephalus microplus is an important bovine parasite in China. However, the midgut microbiome of R. microplus from China has not been characterized via metagenomic methods. Rhipicephalus microplus were collected from cattle in the city of Changsha in Hunan province, China. The DNA of the midgut contents was extracted from fully engorged adult female R. microplus. A DNA library was constructed and sequenced using an Illumina HiSeq sequencing platform. SOAPdenovo software was used to assemble and analyze the clean data. The latent class analysis algorithm applied to system classification by MEGAN software was used to annotate the information on the species’ sequences. DIAMOND software was used to compare unigenes with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and functional annotation was carried out based on the results of the comparison. The dominant phyla in the five samples were Firmicutes, Proteobacteria, and Actinobacteria. Streptococcus, Mycobacterium, Anaplasma, Enterococcus, Shigella, Lactobacillus, Brachyspira, Pseudomonas, Enterobacter, Bacillus, and Lactococcus were the dominant genera in the five samples. The endosymbiotic bacterium Wolbachia was also detected in all of the samples. Mycobacterium malmesburyense, Streptococcus pneumoniae, Anaplasma phagocytophilum, Enterococcus faecium, Shigella sonnei, Enterococcus faecalis, Lactobacillus casei, Brachyspira hampsonii, Pseudomonas syringae, Enterobacter cloacae, and Lactococcus garvieae were the dominant species in the five samples. In addition to these bacterial species, we also detected some eukaryotes, such as Rhizophagus irregularis, Enterospora canceri, Smittium culicis, Zancudomyces culisetae, Trachipleistophora hominis, and viruses such as orf virus, human endogenous retrovirus type W, enzootic nasal tumor virus of goats, bovine retrovirus CH15, and galidia endogenous retrovirus in all of the samples at the species level. The results of the annotated KEGG pathway predictions for the gene functions of the midgut microflora of R. microplus indicated genes involved in lipid and amino acid metabolism, infectious diseases (e.g., Streptococcuspneumonia infection, human granulocytic anaplasmosis, Shigellasonnei infection, Salmonella enterica infection, and pathogenic Escherichia coli infection), and cancer. Our study revealed that the midgut microbiome of R. microplus is not only composed of a large number of bacteria, but that a portion also comprises eukaryotes and viruses. The data presented here enhance our understanding of this tick’s midgut microbiome and provide fundamental information for the control of ticks and tick-borne diseases.

diseases for which it is a vector, such as anaplasmosis and babesiosis, are often reported in Brazil, India, tropical and subtropical Asia, the Caribbean, Central and South America, China, and Mexico [1][2][3][4][5]. Rhipicephalus microplus is a single-host tick that infects cattle and buffaloes. Although R. microplus only infests cattle once at each stage of development (larva, nymph, and adult), it feeds on blood for several days at a time. This tick harms its hosts by biting and sucking blood, which lead to pruritus, emaciation, loss of fur quality, anemia, reduced milk production, and other clinical characteristics in infested animals [6].
Rhipicephalus microplus is not only a blood-sucking parasite but also the vector for a variety of pathogens [7,8]. It is considered to be the most important vector of bovine tick-borne diseases in global agroecosystems [9,10]. Among the pathogens that R. microplus can carry, we can cite Anaplasma phagocytophilum, Anaplasma marginale, Babesia bigemina, Babesia bovis, Ehrlichia chaffeensis, Ehrlichia canis, Ehrlichia sp. Tibet, Rickettsia spp., Borrelia spp., and severe fever with thrombocytopenia syndrome virus [11][12][13][14]. Furthermore, some researchers have detected a novel human pathogen, Anaplasma capra, in R. microplus [15]. Among the pathogens carried by this tick, B. bigemina, B. bovis, and A. marginale threaten the health of cattle and cause economic losses to the cattle industry [16,17]; A. phagocytophilum can cause human granulocytic anaplasmosis, which can be a serious threat to human health [18][19][20].
Previously, the identification of microorganisms generally depended on culturing. With the rapid development of sequencing technology, bacterial communities associated with entire ticks, the midgut, and the ovary of R. microplus have been studied using non-culture methods [9]. The dominant bacteria encountered in these studies of R. microplus were Wolbachia, Coxiella, and Borrelia burgdorferi [9]. Polymerase chain reaction (PCR) combined with denaturing gradient gel electrophoresis, established by Fisher and Lerman in 1983 [21], was first applied to the analysis of microbial populations in 1993 [22], and has been extensively used for the direct identification of the microflora of ticks [23][24][25]. The bacterial community of the midgut of R. microplus collected from cattle in Jiangxi and Hunan provinces of China was analyzed by PCR combined with denaturing gradient gel electrophoresis, and the dominant bacteria were found to be Rickettsia peacockii and Coxiella [25]. In recent years, new ideas and approaches for the study of the gut microbiome of ticks have been developed due to the rise of metagenomics [26,27].
Metagenomics can be used to identify new and emerging human pathogens circulating in tick vectors [20,28,29]. Adegoke et al. [30] analyzed the microbial composition of two tick species (Hyalomma anatolicum and R. microplus) in Pakistan using metagenomic sequencing. In the present study, metagenomics was used to analyze and determine the species of the midgut microbiome of five fully engorged adult female R. microplus collected from cattle in the city of Changsha in Hunan province, China. Furthermore, the Kyoto Encyclopedia of Genes and Genomes (KEGG) was used to predict the gene functions of the midgut microflora of R. microplus.

Sample collection and DNA extraction
Twenty fully engorged adult female R. microplus were collected from the body surfaces of cattle located in the city of Changsha in Hunan province (28º12′N, 112º59′E), China. All of the R. microplus samples were immediately transferred to Hunan Agricultural University. Five fully engorged adult female R. microplus were utilized for the analysis. Before dissection, the five ticks were surface disinfected with 70% (volume/volume) ethanol for 60 s followed by immersion in three reagents, 100% ethanol, 10% sodium hypochlorite solution, and distilled water, to remove the disinfectant. All of the dissecting apparatuses, plasticware, glassware, buffers (including phosphatebuffered saline), and solutions were sterilized by autoclaving and UV treatment. All of the procedures were conducted in a biosafety cabinet after UV sterilization to protect the samples from environmental contamination.
The five ticks were stabilized with fine-tipped forceps by holding the rear portion. The rear of each tick was cut using sterile ophthalmic scissors, and the midgut contents from each tick (100 μl) were pooled into a single tube. Then 1000 μl of hexadecyltrimethylammonium bromide lysate and 20 μl lysozyme were added to each tube, and the five tubes labeled as follows: R.M.1, R.M.2, R.M.3, R.M.4, and R.M.5. The five tubes were placed in a water bath at 65 °C for 2 h, during which time the tubes were inverted several times to ensure that the samples were fully lysed. After centrifugation (5022 g for 10 min), 950 μl supernatant of each sample was absorbed, and the same volume of phenol (pH 8.0):chloroform:isoamyl alcohol (25:24:1) mixture was added and mixed. After centrifugation (8609 g for 10 min), the supernatants were absorbed, and the same volume of chloroform:isoamyl alcohol (24:1) was added and mixed. After centrifugation (8609 g for 10 min), the supernatant of each sample was transferred to a 1.5-ml centrifuge tube. Then, a 3/4 volume of isopropanol was added to the supernatant of each sample; the tube was shaken and then the sample precipitated at − 20 °C. The sample tubes were centrifuged at 8609 g for 10 min, the liquid removed, and the sample washed twice with 1 ml of 75% (volume/volume) ethanol, after which the ethanol was removed. After that, the precipitate of each sample was allowed to dry naturally at room temperature, and then 50 μl double distilled H 2 O was added to dissolve the DNA. The sample was incubated at 60 °C for 10 min, and 1 μl RNase A was added to digest RNA. The sample was then incubated at 37 °C for 15 min.

Library construction and sequencing
A total of 1 μg DNA per sample was used as input material for the DNA sample preparation. Sequencing libraries were generated using a NEBNext Ultra DNA Library Prep Kit (New England Biolabs, Ipswich, MA) following the manufacturer's protocol. Briefly, the DNA samples were fragmented by sonication to a size of 350 base pairs (bp), after which the DNA fragments were end-polished, A-tailed, and ligated with the full-length adaptor for Illumina sequencing with further PCR amplification. Finally, the PCR products were purified by the AMPure XP system (Beckman Coulter, Brea, CA). The libraries were analyzed for size distribution using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA) and quantified using real-time PCR (Thermo Fisher Scientific, Waltham, MA). The clustering of the index-coded samples was performed on a cBot Cluster Generation System (Illumina, San Diego, CA) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq platform (Illumina), and paired-end reads were generated.

Pretreatment of sequencing results
Preprocessing of the raw data obtained from the Illumina HiSeq platform (Illumina) using Readfq (v8; https:// github. com/ cjfie lds/ readfq) was conducted to acquire the clean data for subsequent analysis. The specific processing steps were as follows: removal of the reads that contained low-quality bases (default quality threshold value ≤ 38) above a certain portion (default length of 40 bp); removal of the reads in which the N base had reached a certain percentage (default length of 10 bp); removal of the reads that shared an overlap above a certain portion with the adapter (default length of 15 bp); the reads that were of host origin were also filtered by Bowtie 2.2.4 software (http:// bowti ebio. sourc eforge. net/ bowti e2/ index. shtml).

Metagenome assembly
The clean data were assembled and analyzed using SOAPdenovo software (v2.04; http:// soap. genom ics. org. cn/ soapd enovo. html) [31]. For a single sample, k-mer = 55 was selected for assembly to obtain scaffolds of the sample, with the following parameters: -d 1, -M 3, -R, -u, -F, -K 55 [32][33][34][35]. Then, the assembled scaffolds were broken from the N connections to obtain scaftigs without N [32,36,37]. All the clean data for the samples were compared to the respective scaftigs using Bowtie 2.2.4 software to acquire the paired-end reads not used. The parameters were as follows: -end-to-end, -sensitive, -I 200, -X 400 [32]. All reads not used in the forward step for all the samples were combined. Then, SOAPdenovo software (v2.04; http:// soap. genom ics. org. cn/ soapd enovo. html) was used for mixed assembly with the same parameters as for the single assembly. The mixed assembly scaffolds were broken from the N connection to obtain the scaftigs. Any fragments shorter than 500 bp in the scaftigs were filtered. The filtered scaftigs from the single or mixed assembly were statistically analyzed.

Gene prediction and abundance analysis
MetaGeneMark software (v2.10; http:// topaz. gatech. edu/ GeneM ark/) was used to predict the open reading frame of the scaftigs (≥ 500 bp) from the single and mixed assembly. Sequence information from the predicted results with lengths less than 100 nucleotides [32,[37][38][39][40] was filtered. CD-HIT software (v4.5.8; http:// www. bioin forma tics. org/ cd-hit) [41,42] was used to remove redundancy and obtain the unique initial gene catalogue (the parameter options were -c 0.95, -G 0, -aS 0.9, -g 1, -d 0 [39,43]). The longest sequences were selected as the representative sequences, and those sequences with 95% identity and 90% coverage were clustered. The clean data of each sample were mapped to an initial gene catalogue using Bowtie 2.2.4 (the parameter settings were -endto-end, -sensitive, -I 200, -X 400 [32,40]) to obtain the number of reads to which genes mapped in each sample. The genes for which the number of reads was ≤ 2 [40,44] were filtered in each sample to obtain the gene catalogue (unigenes). The abundance of information for each gene in each sample was calculated based on the number of mapped reads and the length of the gene [38,39,[45][46][47]. The basic informative statistics, core-pan gene analysis, correlation analysis of samples, and Venn diagram analysis of the numbers of genes were carried out based on the abundance of each gene in each sample of unigenes.

Species annotation
DIAMOND software (v0.9.9; https:// github. com/ bbuch fink/ diamo nd/) [48] was used to compare the unigenes with the sequences of bacteria, fungi, archaea, and viruses that were extracted from the nr database (v2018-01-02; https:// www. ncbi. nlm. nih. gov/) of the National Center for Biotechnology Information (NCBI) [the parameter settings were Basic Local Alignment Search Tool Program (BLASTP) E value ≤ 1e−5]. The previous alignment results were employed for the latent class analysis algorithm [49] that was applied to the classification from MEGAN software [50] to verify the species annotation information of the sequences. A table containing the number of genes and the abundance information for each sample in a taxonomic hierarchy (kingdom, phylum, class, order, family, genus, and species) was obtained based on the latent class analysis annotation results and the gene abundance table. The abundance of a species in one sample was equal to the sum of the gene abundances annotated for the species [38,45,51]; the number of genes of a species in a sample was equal to the number of genes with nonzero abundance. Krona analysis, the generation of relative abundance, and the construction of an abundance cluster heat map were carried out based on the abundance table of each taxonomic hierarchy.

Common functional database annotations
DIAMOND software (v0.9.9) was used to compare unigenes with the KEGG [52,53] database (v2018-01-01; http:// www. kegg. jp/ kegg/) with the parameter setting of BLASTP, E value ≤ 1e−5 [33,40]. For each sequence's alignment result, the best BLAST hit (one high-scoring segment pair > 60 bits) was used for subsequent analysis [33,40,54]. The relative abundances of different functional hierarchies were calculated (the relative abundance of each functional level was equal to the sum of the relative abundances of genes annotated at the functional level) [32,39]. The table of the number of genes of each sample in each taxonomic hierarchy was based on the results of the functional annotation and the table of gene abundances. The number of genes of a function in a sample was equal to the number of genes that was annotated for this function given that the abundance was nonzero. Based on the abundance table of each taxonomic hierarchy, the number of annotated genes was determined; the general relative abundance and the abundance cluster heat map were constructed, and the metabolic pathways were analyzed.

Test results of DNA quality
A total of 26,678.61 Mbp of clean data were generated by sequencing with the Illumina HiSeq platform. The   Table 1.

General statistics
Following quality control, 10  identical, and that relatively small differences existed between individuals.

Relative abundance of microorganisms
Sixteen phyla were common to the five samples. The microbial population characteristics of the 10 most abundant phyla in the five samples are shown in Fig. 2. Of these, Firmicutes, Proteobacteria, Actinobacteria, and Spirochaetes were the main phyla in all the samples. Firmicutes was predominant (≥ 25% of genes) in all samples. Ninety-five genera were common to the five samples. The relative abundances of these 95 genera are shown in Additional file 1: Table S1. Of these, Streptococcus, Mycobacterium, Anaplasma, Enterococcus, Shigella, Lactobacillus, Brachyspira, Pseudomonas, Enterobacter, Bacillus, and Lactococcus were the dominant genera in the five samples (Fig. 3). Five genera (Mycobacterium, Brachyspira, Campylobacter, Occidentia, and Neisseria) that were not reported in previous studies of R. microplus were thus, to the best of our knowledge, found for the first time in this species in the present study, despite their abundances being relatively low. Coxiella was not found in the midgut of R. microplus in the present study.

Cluster analysis of the number and relative abundance of annotated genes
The 35 most common genera and their abundance information for each sample were selected from the relative abundance tables at the different taxonomic levels to draw a heat map. The clustering was conducted at the species level to facilitate the result display and information discovery in order to identify the species clustering in the sample. The unigenes of Anaplasma were the most concentrated among all 35 genera in all of the samples, clustering into a single branch (Fig. 4a). The relative abundances of Anaplasma and Enterobacter were lowest in R.M.1, while the relative abundance of Enterobacter was high in R.M.3 (Fig. 4b).

Gene function prediction of microflora
Unigenes were compared with the KEGG database using DIAMOND software. Based on the alignment results, the relative abundance of different functional levels was counted. At level 1, the number of KEGG pathways annotated to metabolism-related functional genes in the microbiome of R. microplus was 72; the number of functional genes associated with human diseases was 145 (Fig. 5). At level 2, genes of the microbiome of R. microplus were associated with 11 metabolic processes, among which the functional genes involved in lipid metabolism were the most abundant, followed by those involved in amino acid metabolism (Fig. 6). Genes associated with 11 human diseases, including cancer and infectious diseases (such as Streptococcus pneumonia infection, human  (Fig. 7).

Discussion
In this study, the microbial population in the midgut of R. microplus was investigated using a metagenomic method. Several of the main phyla, including Firmicutes, Proteobacteria, and Actinobacteria, have been reported in previous studies [55][56][57][58][59]. The results obtained here were somewhat consistent with those of a previous study [27] in which Proteobacteria were present at a high relative abundance. To the best of our knowledge, Mycobacterium, Occidentia, Brachyspira, Campylobacter, and Neisseria are reported here for the first time in R. microplus. Coxiella was detected in the midgut of R. microplus in a previous study [59], but it was not detected in this study. Another relevant finding of our study was the presence of Ehrlichia, which has previously been confirmed to exist in the midgut of R. microplus [27].
In addition, we discovered that the gene functions of the midgut microflora of R. microplus are related to lipid and amino acid metabolism; this may be due to the fact that ticks, which mostly live in agricultural areas and forests, mainly feed on the blood of their hosts [60]. The functions of some of the genes of the midgut microbiota found in the present study are associated with human diseases, and some of the most abundant microbial species are associated with infectious diseases and cancer, Fig. 4 Number of genes and abundance clustering heat map at the genus level. a Heat map of the annotated unigene statistics. Sample names are given on the horizontal axis, and species' information on the vertical axis; different colors represent the number of unigenes. b Relative abundance clustering heat map at the genus level. Sample information is given on the horizontal axis; the vertical axis gives the species' information. The cluster tree on the left of the figure is the species cluster tree; the value corresponding to the middle heat map is the Z-value obtained after the relative abundance of each species row is standardized Zhang et al. Parasites & Vectors (2022) 15:48 which suggests that ticks may be infected with various pathogens. For example, A. phagocytophilum, a tickborne fever pathogen in ruminants [61], was detected in all of the samples. Anaplasma, which belongs to the family Anaplasmataceae and is transmitted by arthropod vectors, can cause severe anemia [62]. Anaplasma phagocytophilum is a zoonotic pathogen found in the granulocytes of animals and humans which can infect human peripheral blood neutrophils and lead to tick-borne diseases and symptoms of human granulocytic anaplasmosis, such as fever accompanied by leukopenia, thrombocytopenia, and functional impairment [63]. Domestic animals are important hosts of A. phagocytophilum [64]. In this study, A. phagocytophilum was detected in the R. microplus samples, which were collected from cattle in Hunan province. In addition, Ehrlichia minasensis, which was previously detected in the hemolymph of R. microplus from Brazil [65], was also detected in this study, at low abundance. Ehrlichia minasensis has also recently been identified from cattle on the French island of Corsica [66], and was also found in the serum of Brazilian dogs [67]. Occidentia, a newly identified genus of the family Rickettsiaceae, is a Gram-negative obligate intracellular bacillus [68] that is intimately associated with its arthropod hosts [69]. Occidentia was isolated from the rodent-associated soft tick Ornithodoros sonrai collected in Senegal [68]. A recent study showed that Occidentia also exists in the hard tick Africaniella transversal [70]. In the present study, Occidentia was detected in the midgut of the five R. microplus examined. These findings show that Occidentia infects ticks of the families Argasidae and Ixodidae. Among the detected species, the  endosymbionts Rickettsia and Wolbachia were also found, although at low abundances. Through their symbiosis with their host, they affect not only their host's ecology and evolution but also its reproductive development [71]. Wolbachia, which is present in 66% of insect species, is probably the most abundant endosymbiont on the planet [72]. The presence of the endosymbionts Rickettsia in host insects, and their extensive horizontal transmission, may have contributed to their widespread occurrence in natural populations of insects [71,73].
Brachyspira cause porcine intestinal spirochetosis, a condition in which diseased animals display chronic diarrhea, rectal bleeding, and lower abdominal cramps [74]. These pathogens do not cause serious disease in swine, but they do have an impact on humans. Some species of Brachyspira, such as Brachyspira pilosicoli and Brachyspira aalborgi, can cause similar symptoms in humans to those seen in swine [74]. Brachyspira have been isolated from the gastrointestinal tracts of mammals and birds, and from habitats contaminated with feces [75]. However, to date, there have been few reports of Brachyspirainfected cattle developing severe disease. In the present study, Brachyspira was detected in R. microplus at a moderately high relative abundance, which is a another reason why preventative measures should be taken to protect cattle from this tick.
The phylum Microsporidia comprises single-celled eukaryotic obligate intracellular parasites that can infect insects (e.g. Nosema apis and Nosema ceranae), fish, mammals, and even humans with immune deficiency diseases [76]. When microsporidia infect humans [77], they can cause diarrhea, myositis, keratitis, bronchitis, and encephalitis [78]. The microsporidian S. culicis is widely distributed and has been reported to infect Culicidae, Chironomidae, and Simuliidae [79]. In the present study, S. culicis was detected in the midgut of R. microplus, suggesting that this tick can carry this microsporidian, although further research is needed to determine whether R. microplus can transmit it.
Orf virus is a highly epitheliotropic parapoxvirus. It may not only cause great production losses in animal husbandry, but also affect human health [80,81]. The clinical symptoms of orf virus infection in animals are erythema, papules, and blisters on the lips and tongue, followed by severe ulceration and, finally, the formation of scabs [82]. The pathological features of orf virus infection in humans and animals are similar and are confined to the epidermis [83]. The main clinical manifestations in humans are skin lesions on the fingers and hands after contact with infected animals [84,85]. In addition to direct transmission, orf virus can also be transmitted by flies and ticks. The present study showed that orf virus exists in the midgut of R. microplus. Therefore, the eradication of R. microplus should be pursued in Hunan province, and orf virus monitored to ensure successful livestock farming and healthy animals.
HERV, which originate from exogenous retrovirus infections in germ cells that occurred millions of years ago, have the potential to cause human disease [86]. There are reports of increased expression of HERV-W in cases of schizophrenia and bipolar disorder, which is associated with distinct clinical or biological characteristics and symptoms [87][88][89]. There have been no previous reports of HERV-W in ticks. In this study, HERV were detected in the midgut of R. microplus, suggesting that R. microplus can carry these viruses.

Conclusions
In this study, we analyzed the midgut microbiome of fully engorged adult female R. microplus from cattle in the city of Changsha in Hunan province, China, using a metagenomic sequencing method. We found that 16 phyla, 95 genera, and 144 species were common to the five samples. The midgut microbiome of this species of tick was not only composed of a large number of bacteria, but also eukaryotes and viruses. These results add to our understanding of the midgut microbiome of R. microplus. The annotated KEGG pathway predictions of the functions of the genes of the midgut microflora of R. microplus indicated that they play a role in lipid and amino acid metabolism, infectious diseases, and cancer. These findings provide fundamental information on the physiology of ticks and their transmission of disease.