Identified gaps for Sorghum (Sorghum bicolor var. bicolor)
A total of 45,724 records of sorghum landraces were obtained from the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the Germplasm Resources Information Network (GRIN) and the European Genetic Resources Search Catalogue (EURISCO). The records are distributed by source database as follows:
Of all the sorghum accessions selected for the analysis, 52.8% had no geographic information. Significant knowledge gaps (i.e. lack of location information in passport data) can be observed in China, Russia, the United States and parts of Europe and South America (see Figure 1). All these areas have both collections and cropped lands, but the limited data availability makes it difficult to assess the extent to which the collection is complete. Because of these constraints, we focus the analyses on areas where at least sparse georeferenced data were found (i.e. Sub-Saharan Africa, India and parts of Europe).
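The step of separating georeferenced from non-georeferenced accessions can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure actually used. The `Accession` record and the helper names are hypothetical, and real GRIN, EURISCO and ICRISAT exports use their own passport field names. The sketch treats a record as georeferenced only when both coordinates are present and within valid latitude/longitude ranges.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Accession:
    # Hypothetical minimal passport record (not the databases' real schema).
    accession_id: str
    country: str
    lat: Optional[float]
    lon: Optional[float]


def has_valid_coordinates(acc: Accession) -> bool:
    """True if both coordinates are present and inside valid degree ranges."""
    if acc.lat is None or acc.lon is None:
        return False
    return -90.0 <= acc.lat <= 90.0 and -180.0 <= acc.lon <= 180.0


def split_by_georeference(
    records: List[Accession],
) -> Tuple[List[Accession], List[Accession], float]:
    """Return (georeferenced, non_georeferenced, percent_without_coordinates)."""
    geo = [r for r in records if has_valid_coordinates(r)]
    non_geo = [r for r in records if not has_valid_coordinates(r)]
    pct_missing = 100.0 * len(non_geo) / len(records) if records else 0.0
    return geo, non_geo, pct_missing
```

Only the georeferenced subset would feed the density and environmental steps. The non-georeferenced subset can still be counted per country, as in Figure 1.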
Figure 1 Total number of accessions (including non-georeferenced accessions) per country and georeferenced accessions (blue dots), using GRIN, EURISCO and ICRISAT databases.
Our approach consists of three main steps, described below:
1. Simple geographic distances and point densities
To identify areas where sampling has been deficient, and thus to target further collecting missions at those areas, we inspected both the distribution and the geographic frequency of collections of the crop. First, we calculated the number of accessions within a 300 km radius circular neighborhood, restricted to a limited geographic space (see Figure 2). This space is defined by the known distribution of the crop (Leff et al. 2004; Monfreda et al. 2008). The crop distribution surfaces are available at the Land Use and Global Environmental Change website of the Department of Geography at McGill University.
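The neighborhood count can be sketched as follows. This assumes the crop mask has already been converted into a list of cell-centre coordinates, and that distances are great-circle (haversine) distances. The text does not state which distance metric or software was used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against tiny floating-point overshoots above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accession_density(crop_cells, accessions, radius_km=300.0):
    """Count accessions within radius_km of each cropped cell.

    crop_cells: iterable of (lat, lon) cell centres where the crop is grown (the mask).
    accessions: list of (lat, lon) georeferenced collection sites.
    Returns a list of counts aligned with crop_cells.
    """
    return [
        sum(
            1
            for (alat, alon) in accessions
            if haversine_km(clat, clon, alat, alon) <= radius_km
        )
        for (clat, clon) in crop_cells
    ]
```

This brute-force version scales with cells × accessions. For tens of thousands of accessions over a global grid, a spatial index such as scikit-learn's `BallTree` with the haversine metric would be a more practical choice.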
High-density areas (in green) can be observed just south of the Sahel belt and in eastern and southern Africa (Uganda, Ethiopia, western Kenya, Burundi, Rwanda, northern Tanzania, Zimbabwe and central Mozambique). South and central India appear to be very well sampled, whereas the northern areas are particularly far from collection locations, which probably means that these areas are underrepresented in ex situ holdings. Areas in China appear poorly sampled, but this is probably a data-access issue. Similarly, in Europe the issue is the quality of location data rather than the availability of germplasm. On the other hand, areas of Sub-Saharan Africa such as the Democratic Republic of Congo, Madagascar and Chad (in red), where only a very limited number of accessions have been collected despite the presence of the crop, are probably significant gaps in the collections and need further collecting.
Figure 2 Density of accessions within sorghum cultivated lands within a 300 km radius neighborhood.
The representativeness of genebank accessions, however, cannot be assessed only geographically, because an accession collected in one environment can be representative of several other environments. It is therefore necessary to assess the environmental representativeness of each accession in relation to the entire geographic space where the crop is grown. If the collection is complete, even rare environments should be properly represented. Environmental completeness would be achieved when the set of accessions adequately represents all croplands.
2. Environmental distances
Accession collection sites were characterized using the WorldClim dataset as environmental layers (Hijmans et al. 2005, available at: http://www.worldclim.org/). From these layers, 19 bioclimatic indices (Busby 1991) were derived, giving a complete characterization of the climate at each site (annual trends, seasonality and extremes). Using these environmental data, a function was designed to calculate the Mahalanobis distance (Mahalanobis 1936) between the set of accession points and each pixel where the crop is known to be grown (defined by a mask layer).
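A minimal sketch of such a function follows. It uses the standard definition D(x) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)) and assumes that μ and Σ are the mean vector and sample covariance of the bioclimatic values at the accession sites. The text does not say how the original function defined the centre and covariance, so both choices, and the function name, are assumptions.

```python
import numpy as np


def mahalanobis_to_collection(acc_env, pixel_env):
    """Mahalanobis distance from each cropped pixel to the accession set.

    acc_env:   (n_accessions, n_vars) bioclimatic values at collection sites.
    pixel_env: (n_pixels, n_vars) bioclimatic values of pixels in the crop mask.
    Returns an array of shape (n_pixels,); large values mean the pixel's
    climate is poorly represented by the accessions.
    """
    acc_env = np.asarray(acc_env, dtype=float)
    pixel_env = np.asarray(pixel_env, dtype=float)
    mu = acc_env.mean(axis=0)                                # centroid of accession climates
    cov = np.atleast_2d(np.cov(acc_env, rowvar=False))       # n_vars x n_vars sample covariance
    cov_inv = np.linalg.inv(cov)                             # unstable if variables are collinear
    diff = pixel_env - mu
    # Row-wise quadratic form (x - mu)^T Sigma^-1 (x - mu).
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(d2)
```

An alternative reading of the text would take, for each pixel, the minimum distance to any individual accession. That would require a pooled covariance and a nearest-accession search rather than a single centroid.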
Because of the considerable collinearity among the BIOCLIM variables, the covariance matrix that must be inverted to calculate the Mahalanobis distance can become ill-conditioned. We therefore discarded P5 (maximum temperature of the warmest month; BIO5 in WorldClim).
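The text does not describe how the collinearity was diagnosed. One common check is the condition number of the covariance matrix inverted when computing the Mahalanobis distance. The sketch below uses a hypothetical helper and synthetic data, not the sorghum layers. It shows how dropping a near-redundant variable lowers that number. With the full BIOCLIM set, the same check could be repeated after each removal.

```python
import numpy as np


def covariance_condition_number(env, drop=None):
    """Condition number of the covariance of env (n_samples, n_vars),
    optionally after removing the column indices listed in `drop`."""
    env = np.asarray(env, dtype=float)
    if drop:
        env = np.delete(env, drop, axis=1)
    cov = np.atleast_2d(np.cov(env, rowvar=False))
    return np.linalg.cond(cov)


# Synthetic example: column 1 is almost a copy of column 0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
env = np.column_stack([x0, x0 + 1e-3 * rng.normal(size=200), rng.normal(size=200)])
print("all variables:", covariance_condition_number(env))
print("column 1 dropped:", covariance_condition_number(env, drop=[1]))
```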
When analyzing the environmental representativeness of the sorghum collection (see Figure 3), several areas that were tentatively identified as ‘gaps’ from the geographic coverage alone turn out to be represented by some parts of the collection, even though no accessions were collected there. Areas in Chad where very few accessions have been collected are represented by accessions from other parts of northern Sub-Saharan Africa, and the same happens with southern Madagascar. The Democratic Republic of Congo, the coasts of Ghana, Ivory Coast and Sierra Leone, and the far east and north of Madagascar appear poorly represented by the current collection, indicating the need for further collecting.
Figure 3 Environmental representativeness using the Mahalanobis distance
3. Selection of sampling areas and areas with gaps
Thresholds were selected to determine which areas are not sufficiently represented by the set of accessions. Two thresholds were selected on statistical grounds, one for the sampling density layer and the other for the environmental distances. These thresholds were used to cut off both previously calculated surfaces.
Potential collecting areas for sorghum (yellow to red areas, see Figure 4) are those considered neither geographically nor environmentally represented in the collection (or in the part of the collection that was assessed).
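The combination rule described above (a pixel is a potential collecting area only if it is neither geographically nor environmentally represented) can be sketched as a logical AND of two thresholded surfaces. The text does not give the threshold values or the exact statistics used. Strict inequalities, the function name and any percentile-based thresholds below are illustrative assumptions.

```python
import numpy as np


def collecting_priority_mask(density, env_distance, density_threshold, distance_threshold):
    """Flag cropped pixels that are neither geographically nor environmentally represented.

    density:      accession counts in the neighborhood radius, per pixel.
    env_distance: Mahalanobis distance to the collection, per pixel.
    Assumption: a pixel is a geographic gap when density < density_threshold and
    an environmental gap when env_distance > distance_threshold.
    """
    density = np.asarray(density, dtype=float)
    env_distance = np.asarray(env_distance, dtype=float)
    geo_gap = density < density_threshold
    env_gap = env_distance > distance_threshold
    return geo_gap & env_gap  # both conditions must hold


# Hypothetical example of statistics-based thresholds (not the values used in this study):
# density_t = np.percentile(density, 10)
# distance_t = np.percentile(env_distance, 90)
```

Graded maps such as the yellow-to-red scale in Figure 4 would need a continuous score rather than this boolean mask. For example, such a score could measure how far each pixel lies beyond both thresholds.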
Significant gaps in the collection can be observed in North Africa (the Mediterranean basin), coastal West Africa (Ivory Coast, Ghana, Sierra Leone), the Democratic Republic of Congo and Madagascar. Some small, isolated areas can also be found in eastern Africa (eastern Somalia, southern Ethiopia, southwestern Kenya). Thailand appears as a significant gap, as do the far north and southeast of China and Taiwan. China, however, can be considered a well-sampled country with very limited data accessibility.
Despite the considerable extent of the crop in both North and Latin America, these areas appear as gaps because they are both geographically and environmentally ‘far’ from the part of the collection under analysis. Further collecting efforts for sorghum should focus on Sub-Saharan Africa, particularly Madagascar, and on the western coast of India. In other areas, such as Europe, China and North America, the focus should be on improving data availability and quality. Since we could not determine the extent to which novel sorghum landraces exist in Latin America, this area should at least be roughly explored in search of new and useful germplasm.
Figure 4 Potential collecting areas for Sorghum landraces