Important Dispersal Capacity of One Main Species Vector of Bluetongue and Schmallenberg Viruses Culicoides Obsoletus (Diptera: Ceratopogonidae) Revealed by Landscape Genetic Analyses 


 Background: In the last two decades, recurrent epizootics of bluetongue virus and Schmallenberg virus have been reported in the western Palearctic region. These viruses affect domestic cattle, sheep, goats and wild ruminants and are transmitted by native hematophagous midges of the genus Culicoides (Diptera: Ceratopogonidae). Culicoides dispersal is known to be stratified, i.e. due to a combination of dispersal processes occurring actively at short distances and passively or semi-actively at long distances, allowing individuals to jump hundreds of kilometers. Methods: Here, we aim to identify the environmental factors that promote or limit gene flows of Culicoides obsoletus, an abundant and widespread vector species in Europe using an innovative framework integrating spatial, population genetics and statistical approaches. A total of 348 individuals were sampled in 46 sites in France and have been genotyped using 11 newly-designed microsatellite markers. Results: We found a low genetic differentiation and population structure for C. obsoletus across the country. Using three complementary inter-individual genetic distances, we did not detect any significant isolation-by-distance, but a significant anisotropic isolation-by-distance on a North-South axis. We have employed a multiple regression approach on distance matrices to investigate the correlation between genetic and environmental distances. Among all the environmental factors that have been tested, only cattle density seems to have an impact on C. obsoletus gene flow. Conclusions: Such a high dispersal capacity of C. obsoletus over land territories calls for a re-evaluation of the impact of Culicoides on virus dispersal and highlights the urgent need to better integrate molecular, spatial and statistical information to guide vector-borne diseases control.

and domestic ruminants in western Palearctic region [5,6,7]. Culicoides obsoletus is a very widespread species able to breed in a wide range of habitats [8,9]. BTV spread is facilitated by favorable conditions for midge activity and viral replication and the environmental and climatic drivers of BTV transmission in Europe have been suggested [10,11,12,13,14]. For example, high ambient temperature reduce incubation period of the virus [15] whereas high precipitation and wind speeds can reduced the Culicoides ight activity [16,17,18].
Previous studies of mark-release-recapture on Culicoides species showed that the post-release dispersal distance travelled, during two nights, ranges from 1 to 2.5 km, and is linked to the gradual search for hosts or oviposition sites [19]. The maximum recapture travelled distance recorded was 6 km for Culicoides mohave in a particular desert landscape [20]. Yet, the potentially high dispersal capacity of Culicoides and high mortality of adults when manipulating represents a practical limitation in the context of a mark-release-recapture procedure. Dispersal phenomena driven by wind currents on a much larger scale have also been detected during outbreaks related to Culicoides-borne diseases [3,18,21,22,23,24,25,26]. Introduction of BTV serotypes by wind-borne infected Culicoides has been demonstrated from Northern Africa to Southern Europe [27], from Kenya to South-Western Indian Ocean Islands [28], from Sardinia to the Balearic Islands [25], from France to Corsica [29], in Ireland [30], or in UK [31]. Most of these studies are supported by modelling analyses in which dispersal trajectories have been evaluated by atmospheric dispersion models over water bodies [30,32,33,34,35,36]. Culicoides dispersal over land territories has been less investigated [37] and little attempt has been made to link it to environmental factors. Given this high dispersal capacity and the strati ed dispersal pattern of Culicoides, it is crucial to determine inland connectivity among populations and to identify the potential environmental factors that promote or limit gene ows between them.
Classic approaches to study the impact of landscape on gene ow generally use population-based sampling and are often based on a limited number of sampled populations [28,38]. However, Culicoides such as other dipterans species, are not spatially structured into separate populations and must be considered as a continuum of individuals heterogeneously distributed across landscape. In order to identify the environmental factors having an impact on the dispersal of Culicoides, an individual approach is more relevant as it avoids inter-population genetic distance misinterpretations [39]. Indeed, the individual approach in landscape genetics aims at maximizing the number of sampling sites, and thus brings a much greater statistical power to detect spatial patterns of genetic differentiation and the environmental factors that cause them [40]. In addition to the analysis of inter-individual genetic distances, population genetic structure can also be investigated, e.g. using gold-standard Bayesian clustering methods. However, if isolation-by-distance occurs, these clustering analyses could overestimate the actual number of genetic clusters [41]. It is therefore important to also incorporate a visualization approach into the analytical work ow [42]. MAPI allow to visually compare genetic dissimilarity with some environmental factors but also to develop working hypotheses. On the other side, it is also necessary to use as a complement statistical analyses of genetic and environmental distances.
The present study here aims at determining population inland connectivity at large geographical scales of a main vector species and to identify the environmental factors that promote or limit gene ows between Culicoides populations in France. For this purpose, we characterized 13 microsatellite markers dedicated to C. obsoletus. We are proposing a complementary framework integrating multiple approaches which can be applied more generally to the study of gene ow and links with environmental factors.

Methods
Culicoides obsoletus sampling and morphological identi cation The data set analysed in the present study is composed of 368 individual females biting midges collected in 46 sites in France in April 2011, using the national surveillance network for Culicoides populations or local complementary collections (Table S1). Collections were made overnight using Onderstepoort Veterinary Institute (OVI) light traps set up at farms near cattle or sheep. All insects were stored in 70% ethanol. Morphological identi cation to species level has been carried out in the same way as described in a previous study [43] under a binocular microscope using the available identi cation keys [44].
DNA extraction, ampli cation, sequencing and sequence analyses A total of 368 individuals belonging to the C. obsoletus species complex were DNA-extracted, COI ampli ed and sequenced following the procedure detailed in Mignotte et al. (2020). To ensure that only C. obsoletus individuals were included in the nal data set [45], sequences assignation was run according to Mignotte et al. (2020). The reference COI sequences used to make a speci c assignment of all individuals are available in Table S2. All COI sequences were then aligned using the MUSCLE algorithm [46] implemented in the software package GENEIOUS v.6.0.5 (Biomatters, http://www.geneious.com). In order to estimate unbiased gene ows of C. obsoletus, only individuals identi ed as such were used for further analyses.

Development of microsatellite markers
Microsatellite markers were de ned by next-generation sequencing following a similar protocol to the one  (Table S4) and elongation for 1 min at 72 °C, followed by a nal elongation of 5 min at 72 °C. Fragments were separated on an Applied Biosystems 3500xL Genetic Analyzer. Allelic size allocations for all individuals and microsatellite markers were performed using the program GeneMapper® version 5 (Life Technologies, Applied Biosystem) with double blind reading to limit the potential interpretation bias from the reader.
A rst validation of the polymorphism of these 30 markers was carried out by amplifying and genotyping 48 individuals from various locations within the species geographic range. Then 13 loci showing good genetic pro les and clear allelic size variability were characterized as polymorphic markers and were selected for further analyses (Table S4).
The 368 C. obsoletus previously extracted were then ampli ed and genotyped at these 13 microsatellite loci. Genetic variability parameters, such as the number of alleles per locus, the allele size range, the observed and expected heterozygosity were estimated per locus, and per population over the entire dataset using the R packages ggplot [47], poppr [48] and mmod [49]. Short allele dominance could be a source of heterozygote de ciency in a microsatellites data set [50,51]. In order to avoid such bias, Spearman's correlations rank between the size of the alleles and their frequencies were calculated in R [50]. Null allele frequencies were estimated using the R package PopGenReport [52], and a Bonferroni's correction was applied to all matched tests to take into account potential biases associated with multiple comparisons [53]. Correlation between F IS and null allele was calculated in R, and locus with signi cative correlation were removed. The presence of an imbalance in genetic binding between each pair of loci has been tested by accurate Fisher tests performed with the R package poppr [48]. The same R package was used along with the pegas R package to identify a potential gap in the panmictic matching regime. For this purpose, we performed a chi-square tests to compare the observed heterozygosity de cits (Ho) to the expected heterozygosity (He) under the Hardy-Weinberg (HW) equilibrium with a signi cance threshold of 5%. We also calculated the xation index (F IS ) [54].

Genetic clustering analyses
In order to determine the most likely number of genetic clusters (K) and the probability of assignment of each individual to these clusters, we used the Bayesian clustering methods implemented in the program STRUCTURE (version 2.3.4) [55] and GENELAND (version 4.5.0) [56]. The algorithm implemented in STRUCTURE infers genetic clusters that minimize deviations from Hardy-Weinberg equilibrium. In STRUCTURE, we speci ed the following settings: correlated allele frequency, an admixture model, and a "locprior" model. These options respectively allow to: assume allele frequencies are similar among populations, estimate for each individual the belonging proportion to each K clusters, and use sampling locations as prior information to assist the clustering, which is particularly adapted when there is a weak genetic structuring [57]. We performed 10 independent runs for each value of K varying from 1 to 10. Each run consisted in a burn-in of 100,000 iterations in the Markov Monte Carlo chain (MCMC) and a stationary phase of 1,000,000 iterations in the MCMC. The most probable number of clusters was inferred with the ∆K method (Evanno et al. 2005), via the STRUCTURE Harvester web platform [58]. This ∆K value is a way to determine the in ection in the [Ln P(D)] curve. The clustering resulting from the Bayesian inference was transposed into percentages of assignment of each individual to the K inferred clusters and plotted on a map.
GENELAND also uses a Bayesian algorithm to infer population genetic clusters while taking into account the spatial position of individuals, making it a spatially-explicit clustering method. The most probable number of clusters was also determined by running the algorithm with K ranging from 1 to 10. The analysis was based on 1,000,000 MCMC iterations with a thinning of 1000, maximum rate of the Poisson process xed to 100, maximum number of nuclei in the Poisson-Voronoi tessellation xed to 300 and a burn-in of 100. We used the R package graphics to produce a distribution map of genetic structure resulting of STRUCTURE and GENELAND analyses.

Computation of genetic distances and mapping genetic dissimilarity
We computed three complementary inter-individual genetic distances. The rst inter-individual genetic distance is based on a Factorial Correspondence Analysis (FCA) performed with the program GENETIX v. 4.05.2 [59]. The rst ten FCA axes were used to calculate an Euclidean distance between all individuals using the R package Ecodist [60], hereafter called the "FCA" distance [61]. The second inter-individual genetic distance is the Rousset's distance (a R ), [62], corresponding to the F ST /(1 − F ST ) but for pairs of individuals. We used the SPAGeDi 1.5 program [63] to compute a R genetic distance. The third interindividual genetic metric is the Loiselle's kinship coe cient ("LKC", [64]), which, unlike FCA and a R that are measures of genetic dissimilarity, is an index of similarity [65]. We used the R package elds [66] to investigate isolation-by-distance by performing linear regressions and Mantel tests between interindividual genetic distances and log-transformed geographic distances with R package elds [66]. Mantel tests [67] were performed with the R package vegan [68]. Inter-individual genetic distances were mapped using the program MAPI [42]. The method implemented in MAPI allows to visualize spatial variation in genetic dissimilarity. It can also de ne areas where genetic dissimilarity is signi cantly lower or higher than expected by chance [42]. MAPI analyses were based on 1,000 permutations and an α value of 0.05.

Impact of environmental factors on genetic differentiation
We investigated the potential impact of several environmental factors on the genetic differentiation of C. obsoletus. In practice, we tested the association between matrices of pairwise genetic distances (see above) and matrices of environmental distances. These environmental distances were computed using the program CIRCUITSCAPE 4.0.5 [69,70] and were based on distinct environmental rasters (i.e. geo-referenced grids of environmental values) either treated as potential resistance (R) or conductance (C) factors: i. host densities: European index of distribution of roe deer (C) [71], European index of distribution of red deer (C) [71], global cattle, sheep and goat distributions in 2010 (C) [72]. ii. terrestrial habitat: mean global elevation in 2010 (R) [73] and type of landscape cover (urban area, grassland, forest areas and croplands) (R). The four distinct land cover rasters (resolution: ∼1000 m) were generated from the Corine Land Cover 2012 raster (CLC; www.eea.europa.eu; resolution: ∼100 m) (Fig. S1).
iii. meteorological and climate data: mean surface temperature (R), mean precipitation (R), and mean wind speed and direction (C). We used two alternative rasters for each variable: April 2011 mean and 2000-2010 mean. Surface temperatures were obtained from monthly day and night land surface temperature mean [74]. Precipitation and wind rasters were obtained from monthly means of precipitation, wind speed and the U and V components of wind (components of the horizontal wind towards the east and North) from the European Centre for Medium-range Weather Forecasts (Era5 Interim, http://www.ecmwf.int) over 10 years (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010). Wind U and V components were used to compute prevailing meteorological wind direction (in degrees). Map and arrowheads indicating the mean wind direction over the 10 years at each raster pixel were generated with the R package Rwind (Fig. S2) [75].
A so-called "null raster" was also created and used as a negative control, i.e. a raster on which environmental distances computations correspond to a proxy of geographical distances. The null raster is a uniform raster with all cell values equal to 1. As in Dellicour et al. (2019), three values of k (10, 100 and 1000) were used to modify the potential impact of the environment on the resistance/conductance value. We used multiple regressions on distance matrices (MRDM) coupled with commonality analyses (CA; [76]) to identify unique and common contributions of predictors to the variance in the environmental distance (response variable). All the analyses were carried out with the R packages ecodist [60] and yhat [77]. In addition, we also performed univariate analyses by comparing (i) the determination coe cient R² obtained from the linear regression of genetic distances on distances computed on the environmental raster and (ii) the determination coe cient R² obtained from the linear regression of genetic distances on distances computed on the null raster. Only the environmental factors associated with a R² higher than that the one obtained from the linear regression based on environmental distances computed on the null raster were selected for the multivariate analyses (MRDM-CA) [78]. The same criterions to identify a suppressors were used for the multivariate analysis as in the commonality analyses descriptions [79,80]. These suppressors allow to select environmental variables that are more explanatory than the geographical distance alone. An environmental variable is considered to be a suppressor if regression coe cient and correlation, or when the unique contribution and the common contribution are of opposite sign [79].

Anisotropic isolation-by-distance
In order to identify directional gene ows, we must calculate the projected distance between two sampling sites. The angle between sites was calculated using the R package geosphere [80] and transformed from degrees to radians. The set of angles between 0 and 360 were then tested as angles that maximize isolation-by-distance. The great-circle geographic distance between sampling sites was computed with the R package elds and then log-transformed [66]. The projected distance matrix was calculated for each angle between all sampling sites, using the formula below. In calculation of the projected geographical distance as a function of the angle between populations, d AB is the geographical distance between population A and B and a AB is the angle between populations A and B.
A linear regression of the genetic distances on the projected geographical distances obtained for each angle was then performed. The angle that will maximize the R² of this regression with a positive regression coe cient was considered as the angle maximizing the isolation-by-distance signal [81]. In order to con rm this analysis with a similar but complementary approach, we evaluated the direction that maximizes the correlation between geographic and genetic distance using the PASSAGE program and a "Bearing" analysis. The work ow summarizing the genetic analyses carried-out is described in Figure S3.

Characterization of microsatellite markers and genetic diversity
A total of 368 COI sequences of C. obsoletus were obtained after the DNA barcoding step (Table S3) and 368 C. obsoletus from 46 populations were genotyped with 13 microsatellite loci (Table S4). A lter has also been applied to exclude missing data. Individuals with more than 10% missing data were excluded from the analysis, i.e. 20 individuals. Loci OBSms25 and OBSms26 had the highest proportion of null alleles with 56% and 22% respectively, but also the highest F IS values of 0.691 and 0.357 respectively (Table S5). The expected heterozygosity, which varies from 0.918 (OBSms5) to 0.404 (OBSms29) is always higher than that observed, varying from 0.872 (OBSms13) to 0.197 (OBSms25), re ecting a heterozygous de cit (Table S5). To ensure that all markers provide independent information, linkage disequilibrium between each pair of loci was tested. After Bonferroni correction, no signi cant linkage disequilibrium was observed between all loci. The OBSms25 locus was also the only locus with more than 5% missing data. Thus, only markers with a null percentage inferior to 10% were kept for the nal analysis. In view of the high proportion of null alleles and the strong correlation between F IS and the rate of null alleles (r = 0.947, p-value > 0.001), the OBSms25 and OBSms26 loci were excluded from the analysis. No signi cant correlation between allele size and allele frequency was found for any of the 11 remaining loci (Table S5). The genetic resolution of microsatellite markers depends on the number of markers and their polymorphism. All selected 11 loci were therefore polymorphic (Table S5) and the number of alleles present at a given locus varied from 40 for OBSms5 to 9 for OBSms29. No signi cant isolation-by-distance was detected with the different inter-individual genetic distances, a R (R²<0.001, p = 0.439, Fig. S4), LKC (R²<0.001, p = 0.992, Fig. S4), and FCA (R 2 < 0.001, p = 0.361, Fig. S4).

Clustering analyses and mapping of genetic distances
The Bayesian analysis performed with STRUCTURE revealed a low genetic structuring of C. obsoletus. Meanwhile, the exact number of genetic clusters was not clearly de ned, the optimal value of K was 2 ( Fig. S5); i.e. the co-existence of two genetic clusters was the most likely statistically (Fig. 1). The assignment of individuals to the different clusters revealed a weak genetic structuring of the populations.
Although the assignment of a total of 284 out of the 348 Culicoides individuals to a genetic cluster is well supported (Q > 0. 90), no clear association between spatial clustering and distribution can be observed, i.e. there is no association between Culicoides sampling location and the genetic clusters distributions. However, a West-East transect seems to emerge as samples collected on this transect were more frequently assigned to cluster 1, contrary to the North-South transect more represented by cluster 2. Genetic clusters inferred by GENELAND and STRUCTURE (K = 2) are consistent with these results, both clusters being found at all the sampling sites ( Fig. 1 ab). Results for the second optimal K value (K = 4) of the STRUCTURE program gave a similar view, reinforcing the conclusion that populations are genetically weakly structured (Fig. S6). Additional STRUCTURE analysis has been carried out with only loci OBSms 4, 11 and 28 which are at Hardy Weinberg's equilibrium and no additional structure is detected using only this loci (data not shown). Genetic dissimilarity result from MAPI analyses with the a R genetic distance showed a non-homogeneous pattern of dissimilarity. The maximum inter-individual genetic dissimilarity was of 0.182 and the lowest was of 0.002. The South of France was the most genetically dissimilar region. The center of France was also a zone of strong inter-individual dissimilarity, while eastern France showed low genetic dissimilarity. Another area spanning from the Paris region to Alsace (North east of France) showed low genetic dissimilarity (Fig. 1 cd). MAPI analyses with FCA inter-individual genetic distance, showed a homogeneous genetic dissimilarity. However, almost none of the areas was signi cantly genetically dissimilar in MAPI analysis with the three genetic distances used.  (Table S6). Similarly, a R inter-individual genetic distances were signi cantly associated with distances computed from elevation for k = 100 (Q = 0.00018, p-value = 0.006) and for k = 1000 (Q = 0.00038, p-value = 0.007), with grassland for k = 100 (Q = 0.001, p-value = 0.001) and for k = 1000 (Q = 0.002, p-value = 0.001) and with urban areas for k = 100 (Q = 0.004, p-value = 0.001) and for k = 1000 (Q = 0.007, p-value = 0.001) rasters (Table S6). However, no signi cant association was found between "LKC" genetic distance and the environmental distances (Q values always < 0).
Multivariate regressions revealed a signi cant positive association of FCA inter-individual genetic distance with cattle distributions for k = 1000 (r = 0.1733, p-value < 0.001) and with grassland for k = 1000 (r = 0.1080, p-value < 0.001) (Table S6). Also, multivariate regressions did not reveal any signi cant positive association of a R inter-individual genetic distances with environmental distances. All environmental factors were considered as suppressors when using a R to measure inter-individual genetic distances. Cattle distribution (k = 1000) with FCA inter-individual genetic distance was the only environmental factor that showed a unique contribution to the variance in the dependent variable higher than 1% (U = 0.02) (Table S6).

Anisotropic isolation-by-distance
Globally, correlation between the genetic distance and the distance along the bearing θ, d θ , changed as a function of bearing θ. Anisotropic isolation-by-distance analyses revealed a signi cant positive correlation between a R and FCA inter-individual genetic distances and orientational distances computed along North-South orientation and a signi cant negative correlation between genetic distances and orientational distances computed along a West-East orientation (Fig. 2a,b). For the analysis with LKC, which is a variable of genetic similarity, the North-South was the orientation that maximized the isolation-by-distance (Fig. 2c). Bearing analyses with PASSAGE software revealed a signi cant distance isolation on the North/South axis with the genetic distances a R and FCA (Fig. 3a,b). A negative correlation between LKC genetic similarity and geographical distance was observed on the North/South axis (Fig. 3c).

Discussion
This study presented for the rst time an extensive landscape genetic study carried out on a main vector species of interest for livestock diseases, C. obsoletus. Bayesian clustering analysis revealed a very weak genetic structure of C. obsoletus in France and a low level of inter-individual genetic differentiation was observed. We assessed the impact of a wide range of environmental factors on this pattern. Univariate and multivariate analyses highlighted an absence, or a weak impact of most of the tested environmental factors on inter-individual measures of pairwise genetic differentiation. No isolation-by-distance pattern was detected at the French scale with the set of inter-individual genetic distances used. However, an anisotropic isolation-by-distance analysis revealed a signi cant distance isolation on a North-South axis.

Important gene ows between C. obsoletus populations
The low overall inter-individual genetic dissimilarity highlighted by our study re ected the important level of gene ow for C. obsoletus and comes to reinforce what has been described previously for other Culicoides species. In particular, important level of gene ows have already been observed in France for C. imicola [29] and in Australia for C. brevitarsis [82]. Genetic studies on C. imicola in Europe revealed an important gene ow as re ected by the inference of two large genetic clusters: a "Central Mediterranean cluster" including Algeria, Sardinia, Corsica, Pyrénées-Orientales and Var French departments of France, and a "Western Mediterranean cluster" including Morocco, Spain, Portugal and Majorca [26]. In North America, similar results were also found in C. stellifer. No barriers to gene ow could be identi ed in this species in the southeast United States [83]. In addition, no isolation-by-distance has been observed at the French scale. Identifying a genetic structure pattern is a real challenge when inter-individual genetic dissimilarity is low. Although STRUCTURE performs well at low levels of population differentiation (0.02 < F ST < 0.10) by using prior population and correlated allele frequency models [84], when the differentiation is weaker, such as in the case of highly dispersed organisms, the latter may encounter di culties [85]. This underlines the importance of using complementary tools of visualization approaches like MAPI. In addition, the use of an individual approach avoids to arti cially consider individuals grouped into populations solely due to the effect of the trapping sites [2,86]. Although our method of analysis seems complementary and coherent to detect a genetic structure, C. obsoletus is not genetically or geographically structured at the French scale.
Livestock densities as potential drivers of C. obsoletus gene ow It is undeniable that large gene ows homogenize the genetic diversity of Culicoides. This can be explained by active dispersal and host-seeking movements, as suggested by our results. While the unique contribution of all environmental factors tested was very small, the one of cattle density is the strongest we have detected. Host density as a conductance factor for Culicoides is obviously a factor consistent with the biology of the species. These results are in line with those obtained in landscape genetics studies of the BTV, which had identi ed distributions of cattle and sheep as a key factors in BTV dispersal [13]. In addition, previous studies showed that dairy cattle density is negatively correlated with BTV spread. Although paradoxical with the previous conclusions, this could be explained by the fact that dairy cattle are clustered around the milking parlour and move little, forming a xed feed source, limiting the spread of Culicoides. On the contrary, beef cattle disperse much more in the pastures and by this phenomenon could encourage the diffusion of Culicoides in search of a source of blood [87]. In view of these converging results, of the marked preference of certain species of Culicoides vector for cattle [88], and of BTV emergence or reemergence events in cattle in the Netherlands in 2006 and in France in 2015 [89,90], it seems crucial to closely monitor the surveillance of Culicoides in the vicinity of beef cattle farms [88].
In addition, it has been shown that BTV spread is facilitated at low elevation, up to 300 m [38,87].
Altitude, an environmental factor tested in this study, failed to explain the inter-individual genetic differentiation of C. obsoletus. The altitude therefore does not act as a barrier between sampling sites located at relatively low altitude. It would thus be interesting to integrate in future studies sampling sites with a high altitude above 1,000 m, which is possible for C. obsoletus highlighting a large altitudinal range [91]. However, it is possible that the altitude does not directly impact the dispersal activity of Culicoides, but only the replication or viral infection of BTV due to low temperature. Culicoides obsoletus is extremely generalist and can take a blood meal on a wide range of hosts [92]. The study of phylogenetically close but ecologically very different species of Culicoides, such as C. chiopterus [93], might then allow the identi cation of very different dispersal patterns and an importance of bovine density, the latter being exclusively dependent on it for egg-laying [94].

Long-dispersal of C. obsoletus
The most genetically dissimilar individuals are mainly located in the southernmost populations of the sampling area. Multiple non-exclusive lines of argument might explain the signi cant anisotropic isolation-by-distance observed on the North/South axis in France.
First, the signi cant anisotropic isolation-by-distance could be explained by the wind dispersal.
Dispersion phenomena caused by wind currents are already established for Culicoides, mainly over the seas [3,18,21,22,23,24,25,29]. The re ecting dispersal events can be described as passive and active because recapture beam has been carried out downwind and upwind of the prevailing wind direction [19].
The map of the average wind direction in France over the last 10 years shows potential answers. It can be seen that the southernmost sampling sites (with the most dissimilar individuals) are in an area where the wind direction is different from the rest of France. It should be noted that the diffusion of the BTV, and thus the dispersion of the Culicoides, has already been associated with the wind direction. For example, 2% of BTV infections occurred at distances greater than 31 km [95,96] during the 2006 epizootic. The South of France was initially sampled, however a poor conservation of Culicoides DNA did not allow us to perform barcoding and microsatellite genotyping of these populations. The study of this geographical area could thus complete this study and potentially identify a stronger genetic structure. This will make it possible to decide whether this premise of differentiation observed in this study is due to an older phylogeographic structure, a different wind dispersion in this geographical area, or a random pattern. It is also questionable whether the very low indications of genetic structure could be due to differences in genetic diversity between the south and the north. A lower genetic diversity in the north could then be a sign of an expansion process that would contrast with the interpretation of a large-scale dispersion [97,98]. Indeed the spatial expansion of populations is generally accompanied by a gradients of reduction in the genetic diversity of the population during the expansion process, caused by serial founder events, creating a genetic bottleneck through a founding effect [99,100]. Population sampling at the same scale could allow comparison and estimation of genetic diversity.
Second, anisotropic isolation-by-distance may be due to an artifact due to sampling methods. Indeed, if the extent of the sampling varies depending on the directions, the distances projected from the angles may represent different distance distributions and lead to the over-representation of strong genetic metrics value in the direction of the scatterplot and the hit of positive correlation signals. Moreover, the absence of correlation does not necessarily mean more gene ow but an absence of isolation-by-distance which can also result from a strong drift and thus less gene ow, a drift which depends on both dispersion and population sizes. It is therefore essential to use as a complement, as we have done here and come up with a similar result, an approach that weights the geographical distances between populations according to their orientation with respect to a given angle axis passing through the barycenter, as implemented in PASSAGE.

Consideration for future work
Our results underline the importance of methodological development of wind dispersal models of Culicoides on land and not only over water. However, studies of landscape genetics remain indispensable and complementary in order to improve the accuracy of predictive models for Culicoides dispersal over land through integration of meteorological, landscape and activity-based parameters previously tested and validated in the landscape genetics work ow. In addition, population sampling of the same species at European level, i.e. a very large proportion of the known range of the species, would be necessary to observe a more marked structuring at European level and to estimate more precisely the gene ow. For example, C. imicola in Europe is structured in two large genetic clusters, "Europe central cluster" and "Europe western cluster" [26]. Moreover, the question of the phylogeographic history of C. obsoletus is still very little explored in view of its geographical distribution throughout Europe and North America. On the contrary, the study of the phylogeography of C. imicola, the main Afrotropical vector, has shown its distribution range out from the northern part of sub-Saharan Africa to the Mediterranean basin [38]. This type of study would make it possible to estimate the effective size of the populations, a key factor in the dispersal of C. obsoletus. In future study, the use of High-throughput sequencing approaches using markers such as ddRadseq (Double-digest restriction-site associated DNA sequencing) can provide greater resolution in view of the large number of SNPs revealed (single nucleotide polymorphism) at a local scale and improve our understanding of the active and passive dispersal of Culicoides. It could also be relevant to look at on scale with ner genetic information. This could include more microsatellite markers or SNP to improve genetic resolution and observe the matching and assignment of each individual.

Conclusions
For the rst time, this study provided the rst complete landscape genetic analysis of C. obsoletus, a major vector species of animal viruses in Europe. This study showed that the large scale dispersion is able to homogenize the genetic structure of populations at the scale of a country. Our results demonstrate, for C. obsoletus, a very important inland dispersal and vectorization capacity, which has to be considered for further works on vector competence and epidemiological modeling of disease transmission. Wind direction could be a key factor in the dispersal of many vector insect species. Futures studies should increase their geographical extent to cover the entire species distribution area and better apprehend the limits of Culicoides gene ow. In addition to this biological knowledge, the study highlights several important areas for methodological improvement that may currently limit the inclusion of wind direction in landscape genetic analyses.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its additional les. The newly generated sequences were submitted in the GenBank database under the accession numbers MT828832-MT828844. All cox1 sequences are available on request.

Competing interests
The authors declare that they have no competing interests.    Bearing analysis: correlation between genetic (aR, LKC, and FCA) and geographical distances as a function of the angle between sampling sites. Circles and crosses indicate signi cant and non-signi cant values, respectively. Abbreviations "E", "W", "S" and "N" refer to "East", "West", "South" and "North", respectively.