A novel optical sensor system for the automatic classification of mosquitoes by genus and sex with high levels of accuracy

Abstract

Background

Every year, more than 700,000 people die from vector-borne diseases, mainly transmitted by mosquitoes. Vector surveillance plays a major role in the control of these diseases and requires accurate and rapid taxonomical identification. New approaches to mosquito surveillance include the use of acoustic and optical sensors in combination with machine learning techniques to provide an automatic classification of mosquitoes based on their flight characteristics, including wingbeat frequency. The development and application of these methods could enable the remote monitoring of mosquito populations in the field, which could lead to significant improvements in vector surveillance.

Methods

A novel optical sensor prototype coupled to a commercial mosquito trap was tested in laboratory conditions for the automatic classification of mosquitoes by genus and sex. Recordings of > 4300 laboratory-reared mosquitoes of Aedes and Culex genera were made using the sensor. The chosen genera include mosquito species that have a major impact on public health in many parts of the world. Five features were extracted from each recording to form balanced datasets and used for the training and evaluation of five different machine learning algorithms to achieve the best model for mosquito classification.

Results

The best accuracy results achieved using machine learning were: 94.2% for genus classification, 99.4% for sex classification of Aedes and 100% for sex classification of Culex. The best-performing combinations were a deep neural network with the spectrogram feature for genus classification and, among others, gradient boosting with Mel Frequency Cepstral Coefficients for sex classification of either genus.

Conclusions

To our knowledge, this is the first time that a sensor coupled to a standard mosquito suction trap has provided automatic classification of mosquito genus and sex with high accuracy using a large number of unique samples with class balance. This system represents an improvement of the state of the art in mosquito surveillance and encourages future use of the sensor for remote, real-time characterization of mosquito populations.

Background

Approximately 80% of the world’s human population lives with the risk of one or more vector-borne diseases (VBDs), and every year > 700,000 people die as a result [1]. In an increasingly connected world, travel and trade contribute to the spread of VBDs. Furthermore, a global warming scenario may lead to more favourable conditions for the survival and life cycle completion of the vectors [2] and may affect their abundance and distribution [3]. Mosquitoes (Diptera: Culicidae), particularly those belonging to Aedes, Anopheles and Culex genera, are one of the deadliest vectors worldwide. Mosquito species can transmit diseases such as malaria, dengue, yellow fever, West Nile fever, Zika, chikungunya and others [4]. According to World Health Organization directives [5] and European Centre for Disease Prevention and Control guidelines [6, 7], appropriate surveillance methods and indicators are needed to: determine the composition and monitor changes in mosquito populations, identify the presence of new invasive species, monitor mosquito-borne diseases, quantify the transmission potential of vectors and enable the design of accurate control programmes.

A range of insect trap types and methods are used in regular monitoring and surveillance of immature and/or adult mosquito populations [8]. Although immature stage monitoring can be easier to set up, it is not useful for estimating adult abundance due to the lack of correlation between egg, larval and pupal density indices and adult indices [9]. Studies show that the seasonal variation in mosquito abundance is better represented by adult trap monitoring than by other indices (e.g. House Index) based on immature stages [10]. Therefore, adult mosquito surveillance is generally the most widely applicable and accurate solution, especially for VBD risk assessment [11]. Many adult mosquito monitoring systems rely on traps using light, chemical attractants or CO2 as a bait. Most traps include a suction fan to draw approaching insects into a catch bag within the trap, and such types have been successfully used in many studies [12,13,14]. However, they require the catch bag to be periodically collected in the field, followed by a time-consuming process of identification of the collected specimens by entomologists. The time delay between insect trapping and analysis may limit the correct characterization of the temporal dynamics of mosquito populations. Such delays may also result in degradation of the insects in the catch bag because of desiccation or predation. New approaches to entomological surveillance include novel optical sensors to sense the characteristics of flying mosquitoes and analysis methods including machine learning methods to enable classification of mosquitoes in near real-time [15,16,17,18,19], which is crucial for surveillance programs.

Since the 1940s, microphones have been used to sense the audible flight tones emitted by flying mosquitoes, which may be associated with a particular mosquito genus, species or sex [20]. Acoustic methods are still employed today in applications such as sound traps, which emit species- and sex-specific sound frequencies to attract mosquitoes [21], and in classification systems such as those in which citizen scientists use their mobile phones to record mosquitoes [22, 23]. However, it is hard to obtain acceptable quality audio recordings of free-flying insects in the field because of the presence of background noise [18]. To address this, optical methods have been employed in which a light source is used to illuminate the flying insect and a light sensor is used to detect the light reflected and scattered, or attenuated, by the insect in flight [24,25,26,27,28,29,30]. The use of optical methods in this field began in 1955 when a photoelectric cell was used to detect the light modulation produced by a flying insect crossing its field of view [31]. In recent years, several optoelectronic sensors have been developed and used in conjunction with machine learning techniques to classify flying mosquitoes, with promising levels of accuracy [16, 17, 32,33,34].

Variables known to condition mosquito wingbeat fundamental frequency or its detection include taxonomy, sex, parity status, size, age, environmental temperature or wind speed [35,36,37,38,39]. Historically, wingbeat frequency has been used as the only predictor variable for mosquito classification, but it appears insufficient on its own to differentiate between mosquito species, especially those of the same genus [18]. This could limit field applications, where different mosquito species can coexist, with the possibility of overlap in wingbeat frequency distributions [40]. Efforts have been made in recent years to improve classification methods to distinguish among mosquito species, sex and even parity status [16, 17, 35]. In some cases, more advanced optical approaches have been used, for example to determine insect body and wing depolarization ratio, to improve the accuracy of classification [17].

In addition to the selection of the proper predictor variables and machine learning algorithms, the use of metadata such as the climatic conditions, the spatiotemporal localization and other ecological features accompanying mosquito captures may also be relevant for remote mosquito classification in the field [18, 33], since different mosquito species have different behaviour and ecological needs (e.g. geographical distribution, climatic range, circadian rhythm, and peaks of activity). According to new paradigms of remote mosquito surveillance, wingbeat sensor information and metadata could be sent wirelessly in real time to a server using Internet of Things (IoT) technology [41,42,43] with the potential to improve entomological surveillance.

Currently, there is only one commercial optical sensor product available for the remote monitoring of mosquito populations [41]: the BG-Counter (Biogents, Germany), which, according to the company, can distinguish mosquitoes from other insects and count them. However, the sensor does not provide information about mosquito genus, species, sex or other attributes.

In this study, we present the results of a prototype optical sensor, which is coupled to the entrance of a commercial mosquito trap. The trap is of a type widely used for mosquito surveillance in the field and contains a suction fan. The fan causes the mosquitoes to pass through the sensor more quickly and with a more perturbed wingbeat compared to free flight conditions as described in another work [39]. For the present work, 4335 flights from mosquitoes of Aedes and Culex genera were recorded using the sensor. The three species for the study, Aedes albopictus, Aedes aegypti and Culex pipiens, were chosen because they are major vectors of arboviruses, have a significant impact on public health and are a focus of vector surveillance and control programs in many parts of the world. A set of features were extracted from each recording and used to train a series of machine learning algorithms to determine which combination of feature and algorithm gave the best performance in classifying mosquitoes by genus and sex. Whilst the scope of this work is limited to the classification of genus (Aedes/Culex) and sex (female/male), the inclusion of the two Aedes species in this study improves the genetic variability and permits future work on species classification using the data set from the present work.

Methods

Mosquito rearing conditions

As stated, three species of mosquitoes, from two genera, were used to generate the dataset:

  1. Aedes albopictus, population of Sant Cugat del Vallès (2005), Barcelona, Spain (41.4667°, 2.0833°).

  2. Aedes aegypti, population of Paea (1994), Tahiti, French Polynesia (−17.6889°, −149.5869°).

  3. Culex pipiens, population of Gavà (2012), Barcelona, Spain (41.3000°, 2.0167°).

The mosquito populations were all reared under controlled environmental conditions in a climatic chamber at a temperature of 28 °C and a relative humidity of 80%, with a light:dark photoperiod of 12:12 h, except for Cx. pipiens (with a light:dark photoperiod of 11:11 h plus 1 h of dusk and 1 h of dawn). Culex pipiens and Ae. albopictus were reared in a biosafety level 2 (BSL2) laboratory and Ae. aegypti in a biosafety level 3 (BSL3) laboratory at IRTA-CReSA facilities. Larvae were maintained in plastic trays with 750 ml of dechlorinated tap water (renewed three times per week) and were fed with fish pellets (Goldfish Sticks-TETRA, Melle, Germany) ad libitum. Pupae, upon appearance, were immediately placed in insect cages (BugDorm-1 Insect Rearing Cage W30 × D30 × H30 cm, MegaView Science, Taichung, Taiwan). After metamorphosis, adults were fed with sucrose solution (10%) ad libitum. Females were not fed with blood to avoid any body size or flight variation. For Aedes females, the sucrose solution was removed 24 h before the sensor tests. For Cx. pipiens females, this was done 48 h before to improve their affinity for the attractant used in the trap.

Sensor and trap description

The prototype sensor was designed and produced by Irideon SL (Barcelona, Spain) and was coupled to the entrance of a commercial BG-Mosquitaire suction trap from Biogents AG (Regensburg, Germany), as shown in Fig. 1a.

Fig. 1
a Prototype sensor (top) fitted to a BG-Mosquitaire trap (bottom). b Side view diagram of sensor and trap to illustrate operation. The exterior of the sensor unit (1) is formed by an inlet tube with a diameter of approximately 100 mm (2), sensor housing (3) and outlet tube (4). The housing contains an optical emitter (5), which projects collimated beams of light through the transparent flight tube (6) and onto an optical receiver (7) to create a sensing zone (8) within the flight tube. The trap (9) contains a suction fan (10), a removable catch bag (11) made of textile mesh and a perforated lid (12). The fan produces a flow of air downward through the inlet tube, flight tube and catch bag and upward through the perforated lid, as indicated by the blue arrows. An insect (13) which flies close to the entrance of the inlet tube may then be sucked downwards through the sensing zone, where it is recorded and then trapped in the catch bag. As the mosquito passes through the sensing zone, it casts a shadow upon the optical receiver according to the so-called optical extinction mode of operation. As the insect flaps its wings within the sensing zone, the light falling on the optical receiver is modulated, giving rise to changes in the amplitude of the recorded waveform.

The trap coupled to the sensor was placed in an insect-rearing cage (BugDorm-4S4590 W47.5 × D47.5 × H93.0 cm, MegaView Science, Taichung, Taiwan). The trap was fitted with a sachet of BG-Sweetscent chemical attractant from Biogents AG. The air flow generated by the fan was approximately 3 m/s in the downward direction. When a mosquito flies close to the entrance funnel of the sensor, it may be sucked in by the fan, detected by the sensor and then trapped in the catch bag inside the body of the trap, as shown in Fig. 1b.

The sensor contains an optical emitter panel and an optical receiver panel, which face each other through a transparent flight tube with a diameter of 105 mm. The optical emitter comprises a two-dimensional (2D) array of 940-nm wavelength infrared light-emitting diodes (LEDs), and the optical receiver comprises a 2D array of 940-nm photodiodes. The optical sensor has an active length of 70 mm in the downward direction. These elements are also shown in Fig. 1b.

The output of the optical receiver is amplified and acquired by an analog-to-digital converter (ADC) with a sampling frequency of 9603 samples per second. When a mosquito enters the sensing volume, it automatically triggers a recording of up to 1024 samples, i.e. of up to 107 ms duration. The duration of a typical mosquito flight is around 50 ms. The sensor automatically adds a timestamp to each recording, along with the measured ambient temperature.
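The paper does not describe the recording file format or the ADC resolution; the following minimal sketch only illustrates how a single raw recording maps onto these parameters (the file name and the 16-bit converter are assumptions), using the amplitude scaling and baseline correction later described for Fig. 2a.

```python
import numpy as np

FS = 9603            # ADC sampling frequency (samples per second)

# Hypothetical example: load one raw recording (integer ADC codes) from disk.
raw = np.load("recording_0001.npy")              # up to 1024 samples

# Scale to the ADC full-scale range [-1, 1] (a 16-bit converter is assumed) and
# subtract the mean of the recording (the baseline correction shown in Fig. 2a).
x = raw.astype(np.float64) / np.iinfo(np.int16).max
x -= x.mean()

duration_ms = 1000 * len(x) / FS                 # 1024 samples -> ~107 ms
print(f"{len(x)} samples, {duration_ms:.1f} ms")
```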

Data acquisition process

Mosquitoes of the Aedes and Culex genera were anesthetized with carbon dioxide 48 h and 72 h, respectively, before each experiment. They were separated into groups by species (Cx. pipiens, Ae. albopictus and Ae. aegypti) and sex (male, female).

Culex pipiens and Ae. albopictus were introduced into the insect rearing cage in batches of 20 individuals to reduce the chance of multiple mosquitoes passing through the sensor simultaneously. Batches of ten individuals were used for Ae. aegypti because of their greater affinity to the attractant. All mosquitoes were introduced at a distance of 20 to 30 cm from the entrance of the sensor so that they could fly freely until they approached it and were sucked in, thereby approximating field conditions.

Each recording corresponds to a different mosquito, i.e. trapped mosquitoes were not re-used to generate more recordings. Wingbeat files were tagged with species and sex class by the operator. After each experiment, the wingbeat recordings were downloaded from the sensor and processed using a Python script to produce playable and viewable audio files, as depicted in Fig. 2a. Wingbeat recordings were examined manually, and those deemed to be invalid, such as recordings containing more than one mosquito or where a mosquito may have hit the wall of the flight tube, were excluded from the dataset. The excluded recordings represented 2.3% of the data.

Fig. 2
a Example of a recorded mosquito flight with ADC sample number (0 to 1023) on the x-axis and amplitude on the y-axis, scaled to a range of [−1, 1], which equates to the full-scale range of the ADC. A high-pass filter in the optical receiver attenuates frequencies < 300 Hz to remove electronic offsets and low-frequency noise; this also attenuates the signal due to the body of the insect. Baseline correction has been applied by subtracting the average value of the recording from each data point. b Power spectral density (PSD) plot of a typical mosquito flight. The wingbeat fundamental peak is labelled f1. The fundamental frequency is indicated by the vertical arrow and the fundamental peak power by the horizontal arrow. The peaks to the right of f1 are its harmonics, i.e. at frequencies of 2·f1, 3·f1, etc. The power density has units of units²/Hz on a logarithmic (dB) scale. A level of 0 dB/Hz corresponds to a white-noise time-domain signal with a power density of 1.0 unit²/Hz. The fundamental peak power density levels in this study are typically < −40 dB/Hz, i.e. < 1 × 10⁻⁴ units²/Hz. The noise floor of the system, i.e. with the sensor active but no insect in the sensing zone, is < −85 dB/Hz from 0 to 300 Hz and < −90 dB/Hz above 300 Hz.

The resulting dataset contained 4335 wingbeat recordings, comprising 2472 of Aedes genus (882 Ae. aegypti and 1590 Ae. albopictus) and 1863 of Culex genus (all Cx. pipiens). There were 1211 Aedes females, 1261 Aedes males, 964 Culex females and 899 Culex males. Females were in an age range of 2 to 16 days old and males were in an age range of 2 to 9 days old. These age ranges provide a representative variety in the dataset.

All recordings took place with the sensor and trap located in the laboratory facilities of IRTA-CReSA during daylight hours. The average ambient temperature measured was 25.8 °C (standard deviation = 1.2 °C).

Feature extraction

The following five features were extracted from each wingbeat recording via the application of digital signal processing methods:

  • The power spectral density (PSD) shows the power of the signal at different frequencies. It is calculated using Welch’s method [44], in which the wingbeat recording is divided into several overlapping segments. A windowing function is applied to each of the segments and a series of periodograms is obtained by calculating the power spectrum of each windowed segment. Finally, the periodograms are averaged to give the PSD [45]. A PSD plot of a typical mosquito recording is shown in Fig. 2b.

  • Wingbeat fundamental frequency in Hertz (Hz) is determined from the PSD as shown in Fig. 2b using a peak search method. The wingbeat fundamental frequency is the frequency at which a mosquito flaps its wings. It is characteristic of mosquito taxonomy and sex and varies depending on intrinsic variables of mosquito biology (size, age, parity status, mating behaviour) [16, 35, 36, 38] and environmental variables such as temperature [37]. The typical range of mosquito wingbeat fundamental frequencies is 300 to 900 Hz [40].

  • The fundamental peak power density (dB/Hz) (hereafter referred to as fundamental peak power) is also determined from the PSD as shown in Fig. 2b and represents the peak power density of the sensor output at the wingbeat fundamental frequency. It is equivalent to the intensity of the sound produced by a flying mosquito, typically ranging from 40 to 80 dB [46, 47].

  • The spectrogram is a series of spectra calculated from multiple overlapping segments of the wingbeat recording. Each spectrum is generated by applying a Fourier transform to the segment to provide information about the amplitude of the various frequency components in the segment. The spectrogram represents the variations of the frequency content of the signal over time rather than an average for the whole signal as given by the PSD [48].

  • Mel Frequency Cepstral Coefficients (MFCCs) are calculated by converting the frequencies of a spectrogram to the Mel scale and applying overlapping triangular filter banks before calculating the cepstrum by transforming the spectra to a logarithmic scale and then applying an inverse Fourier transform [49]. Please refer to Additional file 1: Text S1 and Fig. S1 for further details.

The PSDs have 257 values, generated using a window length of 512 samples. The spectrograms and MFCCs are obtained using nine segments of 512 samples; then, 16 Mel filter banks are applied to each spectrum to give a total of 144 values. All the MFCC coefficients are used.
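The signal-processing implementation itself is not given in the paper; the sketch below shows how the five features could be computed with SciPy and librosa from a recording prepared as described above. The Hann window, the default Welch overlap, the 300–900 Hz peak-search band, the reading of the spectrogram feature as a 16-band Mel spectrogram (following Additional file 1: Text S1) and the use of librosa are all assumptions, chosen so that the output matches the dimensions quoted above (257 PSD values; 9 segments × 16 Mel bands = 144 values).

```python
import numpy as np
from scipy import signal
import librosa

FS = 9603  # sensor sampling frequency (Hz)

def extract_features(x):
    """x: baseline-corrected recording scaled to [-1, 1], up to 1024 samples."""
    # Zero-pad shorter recordings to the maximum length (assumption) so that
    # every feature vector has the same size.
    x = np.pad(x.astype(np.float32), (0, max(0, 1024 - len(x))))

    # Power spectral density via Welch's method: 257 values for nperseg=512.
    # The Hann window and default 50% overlap are assumptions; the paper only
    # states that overlapping, windowed segments are averaged.
    freqs, psd = signal.welch(x, fs=FS, window="hann", nperseg=512)

    # Wingbeat fundamental frequency by peak search in the PSD. Restricting the
    # search to the typical 300-900 Hz mosquito range is an assumption.
    band = (freqs >= 300) & (freqs <= 900)
    i = np.argmax(psd[band])
    f1 = float(freqs[band][i])                  # fundamental frequency (Hz)
    p1 = float(10 * np.log10(psd[band][i]))     # fundamental peak power (dB/Hz)

    # Spectrogram feature, read here as a Mel spectrogram: nine 512-sample
    # segments (64-sample hop) x 16 Mel bands = 144 values.
    mel = librosa.feature.melspectrogram(y=x, sr=FS, n_fft=512, hop_length=64,
                                         n_mels=16, center=False)

    # MFCCs from the log-power Mel spectrogram; all 16 coefficients per segment
    # are kept (again 9 x 16 = 144 values).
    mfcc = librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=16)

    return psd, f1, p1, mel.flatten(), mfcc.flatten()
```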

Each individual feature and one combined feature (fundamental frequency and fundamental peak power) were used for the machine learning models.

A scatter plot of the wingbeat fundamental frequency and peak power features is shown in Fig. 3a for the entire dataset, in Fig. 3b for all Aedes samples and in Fig. 3c for all Culex samples. In Fig. 3a, which is coloured by genus, a high degree of overlap between the genera is observed. In Fig. 3b and c, which are coloured by sex, two clearly separated clusters are observed. The distributions of the two single-value features, fundamental frequency and fundamental peak power, for the three classifications are shown in Additional file 1: Fig. S2.

Fig. 3
a Scatterplot of wingbeat fundamental frequency and peak power for the full dataset showing Aedes genus in red and Culex in blue. b Scatter plot of wingbeat fundamental frequency and peak power for Aedes genus showing females in red and males in blue. c Scatter plot of wingbeat fundamental frequency and peak power for Culex genus showing females in red and males in blue

Machine learning

The goal of the machine learning process was to compare the performance of five selected machine learning algorithms using the features described above, in classifying mosquito genus and sex. A labelled dataset consisting of the feature set was used to train, evaluate and compare the classification models. The following five machine learning algorithms were used: logistic regression (LR), gradient boosting (GB), random forests (RF), support vector machines (SVM) and a fully connected deep neural network (DNN). These algorithms were chosen because of their widespread usage and good performance [50]. A brief overview of each algorithm is given in Additional file 1: Text S2. Of these algorithms, the more complex ones, such as DNN or RF, were also used with the single-value features (fundamental frequency and fundamental peak power) because they can model non-linearities, unlike LR.
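The model architectures and hyperparameters are not given in the main text (the tuned values appear in Additional file 1: Table S1). A rough sketch of the five candidate classifiers, with placeholder settings, might look as follows; using XGBoost for gradient boosting and Keras for the fully connected DNN is an assumption based on the libraries named later in this section.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from tensorflow import keras

def build_dnn(n_inputs):
    # Fully connected network; the layer sizes are placeholders, as the paper
    # does not describe the architecture in the main text.
    model = keras.Sequential([
        keras.layers.Input(shape=(n_inputs,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),   # binary output, e.g. Aedes/Culex
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# The four remaining candidates with default hyperparameters; the tuned values
# are listed in Additional file 1: Table S1.
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "GB": XGBClassifier(eval_metric="logloss"),
    "RF": RandomForestClassifier(),
    "SVM": SVC(),
}
```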

Three classification tasks were performed: one genus classification (Aedes/Culex) and two sex classifications (male/female), one for each genus (sex of Aedes, sex of Culex). The logic of the classification process is shown in Additional file 1: Fig. S3.

Balanced datasets, i.e. datasets that contained an equal number of samples in each class, were used to make an unbiased assessment. They were obtained by randomly under-sampling the classes which had a higher number of available samples.
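A minimal sketch of this random under-sampling step follows; the actual implementation used in the study is not stated, so the function below is only illustrative.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Randomly under-sample the larger class so that both classes end up with
    the same number of samples. X: feature matrix, y: binary class labels."""
    rng = np.random.default_rng(seed)
    idx_per_class = [np.flatnonzero(y == c) for c in np.unique(y)]
    n_min = min(len(idx) for idx in idx_per_class)
    keep = np.concatenate([rng.choice(idx, size=n_min, replace=False)
                           for idx in idx_per_class])
    rng.shuffle(keep)
    return X[keep], y[keep]
```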

Model performance was assessed using the accuracy metric, which is calculated by dividing the number of correct predictions by the total number of predictions. The accuracy metric is a simple evaluation metric, which makes it easy to interpret, and is appropriate when using balanced datasets.

The typical machine learning process consists of training, validation and testing. In the training phase, the model is fitted to the data with different configurations of the algorithm determined by hyperparameters, which can have a significant impact on performance. In the validation phase, the performances of the models trained with the different configurations are compared and the best one is selected. The testing phase assesses how well the model generalizes on previously unused data. A schematic overview of the training, validation and testing approach employed in this work is shown in Additional file 1: Fig. S4.

Seventy-five percent of the recordings in each dataset were chosen randomly to create a training set for use in the training and validation phase. Training and validation were done using fourfold cross-validation, in which the training set is split into four parts of equal size and the model being optimized is trained on three of the four parts and validated on the fourth part. This process is done four times using a different part of the training set for the cross-validation in each iteration. The final cross-validation score was obtained by averaging the four cross-validation results. The model with the best cross-validation score was then selected for testing.

The remaining 25% of each dataset, i.e. that part which was not allocated to training and validation, was used to test the performance of the trained model. Since the data in the test set are completely new to the model, accuracy results for the test set are an indication of how well the model generalizes on new data, and good results cannot be attributed to overfitting of the model.
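Putting these steps together, a minimal scikit-learn sketch of the 75%/25% split, fourfold cross-validation for hyperparameter selection and final test evaluation might look as follows. The SVM and its parameter grid are placeholders, and X and y denote the balanced feature matrix and labels from the previous steps.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 75%/25% split into a training set (used for fourfold cross-validation) and a
# held-out test set; stratification keeps the balanced classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Fourfold cross-validation over a hyperparameter grid; the grid shown here is
# only an illustration (the tuned values are in Additional file 1: Table S1).
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10],
                                         "gamma": ["scale", 0.01]},
                      cv=4, scoring="accuracy")
search.fit(X_train, y_train)

# Final evaluation on previously unused data.
test_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print(f"cross-validation accuracy: {search.best_score_:.3f}, "
      f"test accuracy: {test_acc:.3f}")
```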

Error analysis consists of analysing the training and validation accuracies obtained during the training and validation phase. If the training accuracy is considerably higher than the validation accuracy, it indicates overfitting, so more samples could help to improve the model. If, on the other hand, training and validation accuracies have a similar low score, it indicates that the model is too simple and that more training data would probably not help. In this case, the model could possibly be improved by using a different algorithm which is able to learn more complex relationships or to use more features.

Programming was done in Python [51]. For model generation, scikit-learn [52], TensorFlow [53] and XGBoost [54] were used. Regarding execution times, training of the models took days to weeks, but once done, each new sample was classified in under 1 s.

Results

Genus classification

In the genus classification, mosquitoes were classified into the Aedes and Culex genera. A total of 2688 samples were used, comprising 1344 Aedes (672 Ae. albopictus and 672 Ae. aegypti) and 1344 Culex (all Cx. pipiens), with an equal number of males and females for each species. The dataset was split 75%/25% into the training set (2016 samples) and the test set (672 samples). The accuracy results for genus classification on the test set are shown in Table 1, with the best-performing algorithm for each feature indicated. The best result for genus classification was obtained for the DNN algorithm trained on the spectrogram feature, with an accuracy of 94.2%.

Table 1 Accuracy results for genus classification with best results per feature indicated by a superscript letter

Sex classification of Aedes

In this classification, mosquitoes of the Aedes genus were classified into males and females. A total of 1344 samples were used, comprising 672 females and 672 males, with each sex group comprising 336 Ae. aegypti and 336 Ae. albopictus. The dataset was split 75%/25% into the training set (1008 samples) and the test set (336 samples). The results for this classification on the test set are shown in Table 2. The best-performing algorithms for sex classification of Aedes were logistic regression trained on spectrogram and MFCC, and gradient boosting trained on MFCC, with an accuracy of 99.4% in each case.

Table 2 Accuracy results for sex classification of Aedes with best results per feature indicated by a superscript letter

Sex classification of Culex

In this classification, mosquitoes of the Culex genus (all Cx. pipiens) were separated into males and females. A total of 1560 samples were used, comprising 780 females and 780 males. The dataset was split 75%/25% into the training set (1170 samples) and the test set (390 samples). The results for this classification on the test set are shown in Table 3. For Culex sex classification, an accuracy of 100% was achieved by all five algorithms trained on MFCC; by logistic regression, SVM and DNN trained on spectrogram; and by SVM trained on PSD.

Table 3 Accuracy results for sex classification of Culex with best results per feature indicated by a superscript letter

Summary of the best classification results

A summary of the classification results, which includes the best performing algorithms and features for each classification, is given in Table 4 in which training and validation accuracies are also listed, with an indication of how the results might be improved. The corresponding hyperparameters are listed in Additional file 1: Table S1.

Table 4 Summary of machine learning classification results

The best accuracy results were 94.2% for genus classification, 99.4% for sex classification of Aedes and 100% for sex classification of Culex.

For genus, the training accuracy was 100% and the cross-validation accuracy was significantly lower (95%), which indicates that the model overfits slightly and its performance could possibly be improved with more training samples.

For Aedes sex classification, although the best models gave near-perfect accuracy, the training accuracy and cross-validation accuracy are similar (99.5%), which indicates that the model could possibly be improved with a more complex algorithm and/or additional features rather than with more training samples. In the case of Culex sex classification, the accuracy was 100%, so no error analysis was necessary.

Discussion

In the present study, 4335 mosquito flights were recorded using a novel optical sensor. The sensor was attached to the entrance of a commercial mosquito suction trap inside an insect rearing cage, with mosquitoes flying freely within the cage until they were sucked in by the trap, through the sensor and into the catch bag within the trap. Each flight recording made by the sensor corresponded to a different mosquito. Five features were extracted from each recording and used with five different machine learning algorithms for classification of mosquito genus and sex.

One of the features used was wingbeat fundamental frequency, which has been used in many studies for insect characterization and classification [15, 16, 23, 25, 35, 38, 55]. Differences in reported values of wingbeat frequency between studies can be due to intrinsic and/or extrinsic variables such as size, parity status, age and ambient conditions [16, 34, 36, 42, 46]. In this study, the wingbeat fundamental frequency feature gave a high accuracy in sex classification in both Aedes (95.5%) and Culex (98%), but it scored lower (67.3%) in genus classification. These results are consistent with the fundamental frequency histograms in Additional file 1: Fig. S2, which show very little overlap between the distributions of males and females, especially for Culex (Additional file 1: Fig. S2c) and considerable overlap between genera (Additional file 1: Fig. S2a). In the fundamental peak power histograms of Additional file 1: Fig. S2b, c, a higher degree of overlap is observed between the distributions of males and females, especially for Culex, which helps explain why the accuracy for sex using this feature alone (89.5% for Aedes and 83.6% for Culex) was lower than that of fundamental frequency alone.

As other studies have indicated [16, 18, 39, 40], the use of the wingbeat frequency alone as a feature to differentiate between taxonomical classes or other attributes of mosquito biology can be challenging because of overlap in wingbeat frequency distributions. To address this, other authors have used additional features (i.e. depolarization ratio) [16] or metadata (i.e. localization, environmental variables and circadian rhythm) [18] in combination with fundamental frequency to improve their classification methods. In the present work, we have tested several features, both on their own and in combination with the fundamental frequency, to better classify mosquito genus and sex.

The use of both fundamental frequency and fundamental peak power yielded better performance in sex and genus classification than fundamental frequency alone. Although the effect of signal intensity or power has been investigated in mosquito mating and courtship behavioural experiments [46, 47], to the best of our knowledge, fundamental peak power has not been used as a feature in mosquito classification studies. In other sensor systems, the reported signal intensity or power may depend on the position and orientation of the flying mosquito with respect to the sensor [56], whilst our optical setup was designed to measure wingbeat power relatively independently of the position and orientation of the mosquito within the sensing volume.

Despite the better results obtained in this work using the fundamental frequency and power features compared with fundamental frequency alone, the more complex spectrogram and MFCC features provided the best performance for genus and sex classification. MFCCs are normally used in applications such as speech recognition [57] or music information retrieval [58], and although MFCCs are based on human perception of pitch, they have given good results in sound recognition studies with mosquitoes and other insects [34, 49, 59, 60].

In this study, the best performing machine learning algorithm depended on the classification task. For genus classification, DNN showed the best performance, with an accuracy of 94.2%, trained on the spectrogram feature. In another work [33], DNN also gave the best performance for genus classification between Aedes and Culex. For sex classification, the best performing algorithms and features were LR with spectrogram or MFCC and GB with MFCC. Different machine learning algorithms were also compared for mosquito classification in a previous study [17], and it was concluded that the best algorithm for complex classification tasks was SVM. In our study, SVM had an accuracy of 93.4% for genus, although DNN, which was not studied in [17], performed slightly better (94.2%). The classification of mosquito genus achieved a high accuracy of 94.2% while the classification of sex achieved 99.4% and 100% for Aedes and Culex respectively. The training and validation accuracies indicate that genus classification could possibly be improved with more training samples.

Other studies have successfully achieved automatic classification of genus [25, 33] and sex [16, 24] using machine learning with relatively large datasets [34] and placing emphasis on class balance [17]. However, only a small number of sensor studies have been performed using a mosquito suction trap, either without an automatic classification system [39] or with only mosquito and non-mosquito counting and without differentiating mosquito genus and sex [61]. To our knowledge, we present the first sensor system for use with a commercial mosquito suction trap, which provides automatic classification of genus and sex with high performance, based on a large number of training samples, with class balance. Planned further work includes the study of species classification, study of age groups, training of models with more features and feature combinations, and testing of the system in the field.

Conclusions

In this work, we have presented the results of a novel sensor system for genus and sex classification of Aedes and Culex mosquitoes captured by a commercial suction trap in laboratory conditions. The obtained results are encouraging for the use of the sensor with standard suction traps in the field, for the remote surveillance and classification of genus and sex of Aedes and Culex mosquitoes.

Availability of data and materials

The datasets generated during and/or analysed during the current study are not publicly available due to the protection of intellectual property defined under the H2020 agreement no. 853758, but are available from the corresponding author on reasonable request.

Abbreviations

2D: Two-dimensional

ADC: Analog-to-digital converter

DNN: Deep neural network

GB: Gradient boosting

IoT: Internet of Things

LED: Light-emitting diode

LR: Logistic regression

MFCC: Mel Frequency Cepstral Coefficients

PSD: Power spectral density

RF: Random forest

SVM: Support vector machines

VBD: Vector-borne disease

References

  1. World Health Organization (WHO). Vector-borne diseases fact sheet. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases. Accessed 20 May 2021.

  2. Rossati A. Global warming and its health impact. Int J Occup Environ Med. 2017;8:7–20.

  3. Khasnis AA, Nettleman MD. Global warming and infectious disease. Arch Med Res. 2005;36:689–96.

  4. World Health Organization (WHO). A global brief on vector-borne diseases. https://www.who.int. Accessed 20 May 2021.

  5. Schaffner F, Versteirt V, Medlock J. Guidelines for the surveillance of native mosquitoes in Europe. https://www.ecdc.europa.eu. Accessed 20 May 2021.

  6. Schaffner F, Bellini R, Petrić D, Scholte EJ, Zeller H, Marrama RL. Development of guidelines for the surveillance of invasive mosquitoes in Europe. Parasit Vectors. 2013;18:209.

  7. European Centre for Disease Prevention and Control. Field sampling methods for mosquitoes, sandflies, biting midges and ticks. https://www.ecdc.europa.eu. Accessed 20 May 2021.

  8. Romero-Vivas CME, Falconar AKI. Investigation of relationships between Aedes aegypti egg, larvae, pupae and adult density indices where their main breeding sites were located indoors. J Am Mosq Control Assoc. 2005;21:15–21.

  9. Focks DA. A review of entomological sampling methods and indicators for dengue vectors. Special programme for research and training in tropical diseases. Florida: WHO; 2003.

  10. Codeço CT, Lima AWS, Araújo SC, Lima JBP, Maciel-de-Freitas R, Honório NA, et al. Surveillance of Aedes aegypti: comparison of house index with four alternative traps. PLoS Negl Trop Dis. 2015;10:e0003475.

  11. Krökel U, Rose A, Eiras A, Geier M. New tools for surveillance of adult yellow fever mosquitoes: comparison of trap catches with human landing rates in an urban environment. J Am Mosq Control Assoc. 2006;22:229–38.

  12. Farajollahi A, Kesavaraju B, Price DC, Williams GM, Healy SP, Gaugler R, et al. Field efficacy of BG-Sentinel and industry-standard traps for Aedes albopictus (Diptera: Culicidae) and West Nile virus surveillance. J Med Entomol. 2009;46:919–25.

  13. Lühken R, Pfitzner WP, Börstler J, Garms R, Huber K, Schork N, et al. Field evaluation of four widely used mosquito traps in Central Europe. Parasit Vectors. 2014;7:268.

  14. Potamitis I. Classifying insects on the fly. Ecology. 2014;21:40–9.

  15. Santos DAA, Rodrigues JJPC, Furtado V, Saleem K, Korotaev V. Automated electronic approaches for detecting disease vectors mosquitoes through the wing-beat frequency. J Clean Prod. 2019;217:767–75.

  16. Genoud AP, Basistyy R, Williams GM, Thomas BP. Optical remote sensing for monitoring flying mosquitoes, gender identification and discussion on species identification. Appl Phys B. 2018;124:46.

  17. Genoud AP, Gao Y, Williams GM, Thomas BP. A comparison of supervised machine learning algorithms for mosquito identification from backscattered optical signals. Ecology. 2020;58:e101090.

  18. Chen Y, Why A, Batista G, Mafra-Neto A, Keogh E. Flying insect classification with inexpensive sensors. J Insect Behav. 2014;27:657–77.

  19. Potamitis I, Rigakis I, Fysarakis K. Insect biometrics: optoacoustic signal processing and its applications to remote monitoring of McPhail type traps. PLoS ONE. 2015;10:e0140474.

  20. Kahn MC, Celestin W, Offenhauser WH. Recording of sounds produced by certain disease-carrying mosquitoes. Science. 1945;101:335–6.

  21. Johnson BJ, Rohde BB, Zeak N, Staunton KM, Prachar T, Ritchie SA. A low-cost, battery-powered acoustic trap for surveilling male Aedes aegypti during rear-and-release operations. PLoS ONE. 2018;13:e0201709.

  22. Li Y, Zilli D, Chan H, Kiskin I, Sinka M, Roberts S, et al. Mosquito detection with low-cost smartphones: data acquisition for malaria research. NIPS Workshop on Machine Learning for the Developing World. 2017. arXiv:1711.06346v3.

  23. Mukundarajan H, Hol FJ, Castillo EA, Newby C, Prakash M. Using mobile phones as acoustic sensors for high-throughput surveillance of mosquito ecology. Elife. 2017;6:e27854.

  24. Ouyang TH, Yang EC, Jiang JA, Lin T-T. Mosquito vector monitoring system based on optical wingbeat classification. Comput Electron Agric. 2015;118:47–55.

  25. Potamitis I, Rigakis I. Measuring the fundamental frequency and the harmonic properties of the wingbeat of a large number of mosquitoes in flight using 2D optoacoustic sensors. Appl Acoust. 2016;109:54–60.

  26. Brydegaard M. Towards quantitative optical cross sections in entomological laser radar—potential of temporal and spherical parameterizations for identifying atmospheric fauna. PLoS ONE. 2015;10:e0135231.

  27. Kirkeby C, Wellenreuther M, Brydegaard M. Observations of movement dynamics of flying insects using high resolution lidar. Sci Rep. 2016;6:29083.

  28. Mullen ER, Rutschman P, Pegram N, Patt JM, Adamczyk JJ, Johanson E. Laser system for identification, tracking, and control of flying insects. Opt Express. 2016;24:11828–38.

  29. Potamitis I, Rigakis I. Large aperture optoelectronic devices to record and time-stamp insects’ wingbeats. IEEE Sens J. 2016;16:6053–61.

  30. Song Z, Zhang B, Feng H, Zhu S, Hu L, Brydegaard M, et al. Application of lidar remote sensing of insects in agricultural entomology on the Chinese scene. J Appl Entomol. 2020;144:161–9.

  31. Richards IR. Photoelectric cell observations of insects in flight. Nature. 1955;175:128–9.

  32. Batista GEAPA, Hao Y, Keogh E, Mafra-Neto A. Towards automatic classification on flying insects using inexpensive sensors. In: 10th International conference on machine learning and applications. Honolulu, HI, USA. 2011 Dec 18–21. IEEE. 2011. https://doi.org/10.1109/ICMLA.2011.145.

  33. Fanioudakis E, Geismar M, Potamitis I. Mosquito wingbeat analysis and classification using deep learning. In: 26th European signal processing conference (EUSIPCO). Rome, Italy. 2018 Sept 3–7. IEEE. 2018. https://doi.org/10.23919/EUSIPCO.2018.8553542.

  34. Silva DF, Vinícius MAS, Ellis DPW, Keogh EJ, Batista GEAPA. Exploring low cost laser sensors to identify flying insect species evaluation of machine learning and signal processing methods. J Intell Robot Syst. 2015;80:313–30.

  35. Genoud AP, Gao Y, Williams GM, Thomas BP. Identification of gravid mosquitoes from changes in spectral and polarimetric backscatter cross sections. J Biophotonics. 2019;12:e201900123.

  36. Gibson G, Warren B, Russell IJ. Humming in tune: sex and species recognition by mosquitoes on the wing. J Assoc Res Otolaryngol. 2010;11:527–40.

  37. Villarreal SM, Winokur O, Harrington L. The impact of temperature and body size on fundamental flight tone variation in the mosquito vector Aedes aegypti (Diptera: Culicidae): implications for acoustic lures. J Med Entomol. 2017;54:1116–21.

  38. Staunton KM, Usher L, Prachar T, Ritchie SA, Snoad N, Johnson BJ. A novel methodology for recording wing beat frequencies of untethered male and female Aedes aegypti. J Am Mosq Control Assoc. 2019;35:169–77.

  39. Wang J, Zhu S, Lin Y, Svanberg S, Zhao G. Mosquito counting system based on optical sensing. Appl Phys B. 2020. https://doi.org/10.1007/s00340-019-7361-2.

  40. Kim D, Debriere TJ, Cherukumalli S, White GS, Burkett-Cadena ND. Infrared light sensors permit rapid recording of wingbeat frequency and bioacoustic species identification of mosquitoes. Sci Rep. 2021;11:10042.

  41. Geier M, Weber M, Rose A, Obermayr U, Abadam C. A smart Internet of Things (IoT) device for monitoring mosquito trap counts in the field while drinking coffee at your desk. In: AMCA 82nd annual meeting. Savannah, Georgia, USA. 2016 Feb 7–11.

  42. Potamitis I, Eliopoulos P, Rigakis I. Automated remote insect surveillance at a global scale and the internet of things. Robotics. 2017. https://doi.org/10.3390/robotics6030019.

  43. Eliopoulos P, Tatlas NA, Rigakis I, Potamitis I. A “smart” trap device for detection of crawling insects and other arthropods in urban environments. Electronics. 2018. https://doi.org/10.3390/electronics7090161.

  44. Villwock S, Pacas M. Application of the welch-method for the identification of two- and three-mass-systems. IEEE Ind Electron Mag. 2008;55:457–66.

  45. Bisina KV, Azeez MA. Optimized estimation of power spectral density. In: Proceedings of the 2017 international conference on intelligent computing and control systems. Madurai, India. 2017 Jun 15–16. IEEE. 2017. https://doi.org/10.1109/ICCONS.2017.8250588.

  46. Dou Z, Madan A, Carlson JS, Chung J, Spoleti T, Dimopoulos G, et al. Acoustotactic response of mosquitoes in untethered flight to incidental sound. Sci Rep. 2021;11:1884.

  47. Menda G, Nitzany EI, Shamble PS, Wells A, Harrington LC, Miles RN, et al. The long and short of hearing in the mosquito Aedes aegypti. Curr Biol. 2019;29:709–14.

  48. Oppenheim AV. Speech spectrograms using the fast Fourier transform. IEEE Spect. 1970;7:57–62.

  49. Zhu LQ. Insect sound recognition based on MFCC and PNN. In: International conference on multimedia and signal processing. Uttar Pradesh, India. 2011 Dec 17–19. IEEE. 2011. https://doi.org/10.1109/CMSP.2011.100.

  50. Schmidhuber J. Deep Learning in Neural Networks: an overview. Neural Netw. 2015;61:85–117.

  51. Python. Python documentation. https://www.python.org. Accessed 15 Sept 2021.

  52. Scikit-learn. Scikit-learn user guide. https://scikit-learn.org. Accessed 15 Sept 2021.

  53. Tensor Flow. Tensor flow documentation. https://www.tensorflow.org. Accessed 15 Sept 2021.

  54. XGBoost. XGBoost documentation. https://xgboost.readthedocs.io. Accessed 15 Sept 2021.

  55. Cator LJ, Arthur BJ, Ponlawat A, Harrington LC. Behavioral observations and sound recordings of free-flight mating swarms of Ae. aegypti (Diptera: Culicidae) in Thailand. J Med Entomol. 2011;48:941–6.

  56. Arthur BJ, Emr KS, Wyttenbach RA, Hoy RR. Mosquito (Aedes aegypti) flight tones: frequency, harmonicity, spherical spreading, and phase relationships. J Acoust Soc Am. 2014;135:933–41.

  57. Ganchev T, Fakotakis N, Kokkinakis G. Comparative evaluation of various MFCC implementations on the speaker verification task. In: Proceedings of the SPECOM. Patras, Greece. 2005 Oct 17–19. 2005;1:191–194.

  58. Müller M. Information retrieval for music and motion. https://www.mathworks.com. Accessed 20 Sept 2021.

  59. Lukman A, Harjoko A, Yang C-K. Classification MFCC feature from Culex and Aedes aegypti mosquitoes noise using Support Vector Machine. In: International conference on soft computing, intelligent system and information technology (ICSIIT). Denpasar, Indonesia. 2017 Sept 26–29. IEEE. 2017. https://doi.org/10.1109/ICSIIT.2017.28.

  60. Noda JJ, Travieso-González CM, Sánchez-Rodríguez D, Alonso-Hernández JB. Acoustic classification of singing insects based on MFCC/LFCC fusion. Appl Sci. 2019;9:4097.

  61. Day CA, Richards SL, Reiskind MH, Doyle MS, Byrd BD. Context-dependent accuracy of the BG-Counter remote mosquito surveillance device in North Carolina. J Am Mosq Control Assoc. 2020;36:74–80.

Acknowledgements

The authors are grateful to BSL2 facilities staff of IRTA-CReSA for their technical contribution.

Funding

This research was supported by the project VECTRACK. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 853758. The contents of this publication are solely the responsibility of the authors, and the funding agency is not responsible for any use that may be made of it.

Author information

Contributions

JE, ST and NB conceived and designed the study. JE, MW and PV designed and built the sensor. ST, NB and CA designed the entomological part of the study. JB and MG conducted the flight assays with the sensor. MG, NP and MV conducted mosquito rearing and maintenance and gave support on flight assays. BF conducted the feature extraction of flight assays and implemented the machine learning process to build the classification models. MG and BF analysed the results and drafted the manuscript. NB, MW, CA, ST and JE revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sandra Talavera.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Irideon S.L. is currently applying for a patent relating to the content of the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Text S1.

Mel spectrogram and MFCC generation process. Figure S1. Diagram to illustrate MFCC generation. Figure S2. Histogram plots showing the distributions of fundamental frequency (top) and fundamental peak power (bottom) for a Genus, b Aedes sex and c Culex sex. Text S2. Description of the machine learning algorithms used in this work. Figure S3. Representation of the machine learning classifications (in bold text), with their respective classes immediately below and indicated by the arrow heads. Figure S4. Schematic overview of the training, validation and testing approach. 1 Dataset is randomly separated into training and test sets, accounting for 75% and 25% of the whole dataset respectively. 2 Training set is separated using fourfold cross-validation into four folds with an equal number of samples in each fold. 3 Four iterations of training and validation take place using a different fold for validation in each iteration. 4 Model with best average validation score, obtained by averaging the four cross-validation results, is selected. 5 Model is evaluated using test set (containing data which was previously unused) to obtain test score. Table S1. Hyperparameters of the trained models which achieved the highest accuracies.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

González-Pérez, M.I., Faulhaber, B., Williams, M. et al. A novel optical sensor system for the automatic classification of mosquitoes by genus and sex with high levels of accuracy. Parasites Vectors 15, 190 (2022). https://doi.org/10.1186/s13071-022-05324-5
