Identified gaps for Soybean (Glycine max)
A total of 17,674 records of soybean landraces were obtained from the Germplasm Resources Information Network (GRIN), the European Genetic Resources Search Catalogue (EURISCO) and the CGIAR System-wide Information Network for Genetic Resources (SINGER). The records are distributed by source database as follows:
Of all the soybean accessions selected for the analysis, 99.3% did not have any geographic information. Significant knowledge gaps (i.e. lack of location information in passport data) can be observed in North, Central and South America, China, Southeast Asia, Australia, Japan, South Korea, some countries in Africa, Russia, India and Europe (see Figure 1). All of these areas have both collections and cropped lands, but the limited data availability makes it difficult to assess the extent to which the collection is complete.
Figure 1 Total number of accessions (including non-georeferenced accessions) per country and georeferenced accessions (blue dots), using GRIN, EURISCO and SINGER databases.
As described below, our approach consists of three main steps:
1. Simple geographic distances and point densities
To identify undersampled areas, and thus focus further collecting missions on them, we examined both the distribution and the geographic frequency of collections of the crop. First, we calculated the number of accessions within a 300 km radius circular neighborhood, restricted to a limited geographic space (see Figure 2). This space is defined by the known distribution of the crop (Leff et al. 2004; Monfreda et al. 2008); crop distribution surfaces are available at the Land Use and Global Environmental Change website of the Department of Geography at McGill University.
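The neighborhood count described above can be sketched in Python as follows. This is a minimal, brute-force illustration under assumptions not stated in the source: cropland pixel centres and accession sites are given as (latitude, longitude) pairs in decimal degrees, distances are great-circle (haversine) distances on a spherical Earth, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accession_density(
    pixels: Sequence[Tuple[float, float]],
    accessions: Sequence[Tuple[float, float]],
    radius_km: float = 300.0,
) -> List[int]:
    """For each cropland pixel centre, count the accessions lying within radius_km."""
    return [
        sum(1 for (alat, alon) in accessions
            if haversine_km(plat, plon, alat, alon) <= radius_km)
        for (plat, plon) in pixels
    ]


if __name__ == "__main__":
    pixels = [(0.0, 0.0), (0.0, 10.0)]                 # hypothetical cropland pixel centres
    accessions = [(0.0, 1.0), (0.0, 2.0), (0.0, 9.0)]  # hypothetical collection sites
    print(accession_density(pixels, accessions))       # → [2, 1]
```

Comparing every pixel with every accession is simple but slow for global grids; a spatial index (for example, a ball tree using a haversine metric) would give the same counts much faster.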
High-density areas (shown in green) can be observed in Eastern Europe, Zimbabwe, South Korea and Nepal. The remaining areas of the world appear poorly sampled, but this is probably a data-access issue (particularly for the United States, France, Italy and China). On the other hand, Nigeria, South Africa, Eastern Europe, India, the Malay archipelago and South America, where only a very limited number of accessions have been collected despite the presence of the crop, probably represent significant gaps in the collections and need further collecting.
Figure 2 Density of accessions within soybean cultivated lands within a 300 km radius neighborhood.
Representativeness of genebank accessions, however, cannot be assessed on geographic grounds alone, since an accession collected in a given environment can be representative of several different environments. It is therefore necessary to assess the environmental representativeness of each accession in relation to the entire geographic space where the crop is grown. If the collection is complete, even rare environments should be properly represented; environmental completeness would be achieved when the set of accessions adequately represents all croplands.
2. Environmental distances
Accession collection sites were characterized using the WorldClim dataset as environmental layers (Hijmans et al. 2005, available at: http://www.worldclim.org/), from which 19 bioclimatic indices (Busby 1991) were derived to give a complete characterization of the climate of each site (annual trends, seasonality and extremes). Using these environmental data, a function was designed to calculate the Mahalanobis distance (Mahalanobis 1936) from the set of accession points to each pixel where the crop is known to be grown (defined by a mask layer).
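One common way to compute such a distance, assumed here because the source does not give the exact formulation, is to take the mean vector μ and covariance matrix Σ of the bioclimatic values at the accession sites and evaluate D(x) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)) for the environmental vector x of each cropland pixel. The NumPy sketch below follows that formulation; the function name, array layout and example values are hypothetical.

```python
import numpy as np


def mahalanobis_to_collection(accession_env, pixel_env):
    """Mahalanobis distance from each cropland pixel to the accession distribution.

    accession_env: array (n_accessions, n_vars) of bioclimatic values at collection sites.
    pixel_env:     array (n_pixels, n_vars) of bioclimatic values at cropland pixels.
    """
    acc = np.asarray(accession_env, dtype=float)
    pix = np.asarray(pixel_env, dtype=float)
    mu = acc.mean(axis=0)                               # centroid of accession environments
    cov = np.atleast_2d(np.cov(acc, rowvar=False))      # sample covariance (ddof=1)
    cov_inv = np.linalg.inv(cov)                        # LinAlgError if singular, unstable if nearly so
    diff = pix - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # (x - mu)^T cov_inv (x - mu) per pixel
    return np.sqrt(d2)


if __name__ == "__main__":
    accessions = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # synthetic, 2 variables
    pixels = [[0.0, 0.0], [2.0, 0.0]]
    print(mahalanobis_to_collection(accessions, pixels))  # distances 0 and sqrt(6)
```

Large values mark pixels whose climate lies far from anything represented in the collection, which is how the distance surface can highlight environmentally unrepresented cropland.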
Because of the considerable collinearity among the Bioclim variables, which can become an issue when calculating the Mahalanobis distance (strongly correlated variables make the covariance matrix close to singular, so its inverse becomes unstable), we discarded P5 (maximum temperature of the warmest month).
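A simple way to detect such collinearity before computing the distance is to inspect pairwise Pearson correlations among the bioclimatic variables and drop one variable from each highly correlated pair. The sketch below is illustrative only: the 0.95 cut-off, the function names and the synthetic data are hypothetical, and the source does not say how P5 was selected for removal.

```python
import numpy as np


def collinear_pairs(env, names, threshold=0.95):
    """Return (name_a, name_b, r) for variable pairs whose |Pearson r| exceeds threshold."""
    corr = np.corrcoef(np.asarray(env, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


def drop_variables(env, names, to_drop):
    """Remove the named columns from env; return the reduced matrix and remaining names."""
    keep = [k for k, n in enumerate(names) if n not in to_drop]
    return np.asarray(env, dtype=float)[:, keep], [names[k] for k in keep]


if __name__ == "__main__":
    # Synthetic values only: column "P5" is an exact linear function of "P1".
    names = ["P1", "P5", "P12"]
    env = [[1, 3, 1], [2, 5, 0], [3, 7, 1], [4, 9, 0]]
    print(collinear_pairs(env, names))   # one pair: P1 and P5, r close to 1.0
    reduced, kept = drop_variables(env, names, {"P5"})
    print(kept)                          # → ['P1', 'P12']
```

Pairwise correlations only reveal dependence between two variables at a time; the condition number of the covariance matrix (`np.linalg.cond`) is a complementary check for dependencies involving several variables.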
When analyzing the environmental representativeness of the soybean collection (see Figure 3), several areas that were tentatively identified as ‘gaps’ in the purely geographic analysis turn out to be represented by some parts of the collection, even though there are no collections in those areas. This is the case for areas in the northern United States, Italy, Eastern Europe, China, Australia, southern Zimbabwe, South Africa, parts of Argentina, Brazil and Bolivia, and scattered areas in the Malay archipelago: very few accessions have been collected there, but their environments are already represented in ex-situ holdings.
Figure 3 Environmental representativeness using the Mahalanobis distance
3. Selection of sampling areas and areas with gaps
Thresholds were selected to determine which areas are not sufficiently represented by the set of accessions. Two thresholds were selected on statistical grounds (one for the sampling density layer and the other for the environmental distance layer) and used to cut off both previously calculated surfaces.
Potential collecting areas in the case of soybean (yellow to red areas, see Figure 4) are those considered neither geographically nor environmentally represented in the collection (or in the part of the collection that was assessed).
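The source does not state which statistics were used for the two thresholds. As a purely hypothetical illustration, the sketch below takes the lower quartile of accession density and the upper quartile of Mahalanobis distance as cut-offs, and flags a pixel as a potential collecting area only when it falls at or below the first and at or above the second, i.e. when it is neither geographically nor environmentally represented. It uses only the Python standard library; `gap_mask` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def gap_mask(density: Sequence[float], distance: Sequence[float]) -> List[bool]:
    """Flag pixels that are under-represented both geographically and environmentally.

    Hypothetical thresholds: lower quartile of accession density and upper quartile
    of Mahalanobis distance, both computed over all cropland pixels.
    """
    density_cut = statistics.quantiles(density, n=4, method="inclusive")[0]    # Q1
    distance_cut = statistics.quantiles(distance, n=4, method="inclusive")[2]  # Q3
    return [d <= density_cut and e >= distance_cut for d, e in zip(density, distance)]


if __name__ == "__main__":
    density = [0, 1, 5, 10, 20]           # accessions within 300 km of each pixel (synthetic)
    distance = [9.0, 1.0, 2.0, 3.0, 0.5]  # Mahalanobis distance of each pixel (synthetic)
    print(gap_mask(density, distance))    # → [True, False, False, False, False]
```

The map grades potential collecting areas from yellow to red; reproducing that gradation would require a continuous score (for example, how far each pixel lies beyond both thresholds), which this boolean sketch does not attempt.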
Significant gaps in the collection can be observed in Nigeria, Iran, India and the Malay archipelago. Brazil, Bolivia and Guatemala also appear as significant gaps, as does China; however, China can probably be considered a well-sampled country whose apparent gap reflects very limited data accessibility.
Further collecting efforts for soybean should be focused on areas where environmental gaps were identified, since environment is used as a proxy for traits related to abiotic stresses, such as tolerance to extreme temperatures or drought. In areas such as North America, the focus should instead be on improving data availability and quality.
Figure 4 Potential collecting areas for soybean landraces