Comparison of four DNA barcoding loci to distinguish between some Apiaceae family species

Background The Apiaceae family is among the most significant plant families because it contains both beneficial and poisonous plants. Because of their morphological similarity, the harmless and lethal species are frequently confounded. Cumin, fennel, and anise are the most prevalent members of the family Apiaceae in Egypt. Members of this family are routinely used as medical surrogates, so it is crucial that they are correctly identified and distinguished. DNA barcoding is a molecular technique used for identifying species and reconstructing phylogenetic trees.

Results Six plants from this family were chosen for this study because of their medicinal importance, and four DNA barcoding loci (rbcL, matK, trnH-psbA, and ITS) were used to identify them. The amplicons were sequenced, and the sequences obtained were compared with the most significant BLAST results. The rbcL, matK, and ITS barcodes exhibited similar amplicons among the six species of Apiaceae, while the trnH-psbA barcode exhibited different amplicons. A maximum likelihood approach was used to estimate the genetic distances among the six species of Apiaceae. The most significant finding was that one of the four DNA barcodes was able to distinguish between distinct species and confirm their evolutionary placement within this family.

Conclusions The current study concludes that the trnH-psbA and ITS DNA barcodes can be used to accurately identify, differentiate, and record Apiaceae species, while the rbcL DNA barcode appears to have fallen short of its intended purpose. The data obtained from DNA barcodes could therefore be used for biodiversity assessment and for examining the similarities between hazardous and commercial plants, helping to resolve some of these deficiencies.


Background
To keep the world's healthcare system running, we need medicinal plants. Herbal remedies have been shown to cure a wide range of illnesses and disorders, sometimes with fewer side effects and at a lower cost than pharmaceutical options [26]. The Apiaceae family is estimated to contain between 3600 and 3751 species [24]. Many important phytochemicals, including phenolic compounds and flavonoids, are found in the Apiaceae family. The antiviral, anticancer, antioxidant, and anti-inflammatory properties of flavonoids are only a few of their many positive health effects. In addition, they protect the heart and the brain from damage. Variations in the effects of flavonoids on certain cellular activities have been reported [29], but more research is needed. Essential oils extracted from various species in this family contain approximately 760 different chemical classes with substantial therapeutic potential. Coriander seed oil has a high concentration of petroselinic acid. The European Commission approved its sale as a novel food additive in 2014 [20] in accordance with Regulation (EC) No 258/97 of the European Parliament and Council.
This family includes many familiar plants, for example parsley (Petroselinum crispum L.), anise (Pimpinella anisum), coriander (Coriandrum sativum), cumin (Cuminum cyminum L.), dill (Anethum graveolens Mill.), fennel (Foeniculum vulgare Mill.), and caraway (Carum carvi L.) [10]. The presence of volatile chemicals is a telltale sign of these plants, which have long been thought to have somewhat negative medicinal effects on the body and mind. However, there are some dangerous members of the Apiaceae family. Hemlock water-dropwort (Oenanthe crocata L.), fool's parsley (Aethusa cynapium L.), poison hemlock (Conium maculatum L.), and water hemlock (Cicuta virosa L.) are some of the best-known examples. Toxic species are sometimes mistaken for fragrant food species because of their similar chemical makeup and structure [21]. Traditional approaches to biodiversity assessment are time-consuming and rely on taxonomic expertise, which is becoming scarcer. Recent advances such as molecular methods are useful tools for identifying clonal variation and establishing genetic stability [1, 2, 11-14, 23]. As reported by [6], DNA barcoding may one day offer a faster and more accurate alternative to traditional methods of estimating species diversity that rely on expert field identification.
DNA barcoding has had a significant favorable effect on biodiversity identification and categorization [17]. DNA barcodes have two main uses: (1) to determine the species of an unidentified material and (2) to help researchers discover new species by screening thousands of copies of a small number of reference genes. The chloroplast genome, which includes all the DNA sequences in a single plastid, carries more information than any single-locus marker for identifying and classifying plant species. DNA barcodes that use chloroplast genomes to distinguish between plant species are an active area of research and development [30]. The Plant Working Group of the Consortium for the Barcode of Life (CBOL) has proposed using ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) and maturase K (matK), both located in the plastid genome, as a standard barcode for plants, with the option of using additional markers to fill in any gaps. The plastid intergenic spacer between trnH and the photosystem II protein D1 gene psbA (trnH-psbA) has been proposed as another marker [19].
The Internal Transcribed Spacer (ITS1-4) is a powerful phylogenetic marker with substantial divergence at the species level. The ITS region has been proposed as a plant barcode because of its superior discriminatory power over plastid regions at low taxonomic levels, especially in parasitic plants, for which plastid barcodes give less precision [17].
In this study, we used DNA barcodes to differentiate between six commercially relevant species of the family Apiaceae. One nuclear DNA barcode (ITS1-4) and three chloroplast DNA barcodes (trnH-psbA, matK, and rbcL) were used to assess the genetic diversity within this family and to construct a phylogenetic tree of these species.

Plant material
The experimental seedlings of six Apiaceae species were planted in pots filled with compost soil. Table 1 lists the six economically relevant Apiaceae species examined in this study.

DNA extraction
Using the EasyPure® Genomic DNA Kit (Beijing TransGen Biotech Co., Ltd.), we isolated DNA from 100 mg of three-week-old leaves from germinated seeds in accordance with the manufacturer's instructions. The purity and concentration of the DNA were evaluated using spectrophotometry and 0.8% agarose gel electrophoresis. The DNA was then stored at -20 °C until needed.
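The spectrophotometric check mentioned above can be illustrated with a minimal sketch. The text does not report absorbance readings or acceptance thresholds, so the function names and the example readings below are hypothetical. The sketch relies on the common conventions that one A260 unit corresponds to about 50 ng/µl of double-stranded DNA at a 1 cm path length, and that an A260/A280 ratio near 1.8 is usually taken to indicate reasonably pure DNA.

```python
# Minimal sketch: DNA concentration and purity from absorbance readings.
# Assumptions (not taken from the study): 1 cm path length, and the standard
# conversion of 1 A260 unit ~ 50 ng/ul for double-stranded DNA.

DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (ng/ul) from absorbance at 260 nm."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be >= 0 and dilution factor > 0")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a rough indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical readings for a 10-fold diluted extract.
example_concentration = dsdna_concentration_ng_per_ul(a260=0.5, dilution_factor=10)
example_ratio = purity_ratio(a260=0.9, a280=0.5)
```

Agarose gel electrophoresis complements these numbers by showing whether the extracted DNA is intact or degraded, which absorbance alone cannot reveal.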

PCR amplification and purification
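MatK, rbcL, and trnH-psbA were amplified from the plastid genome, whereas ITS was amplified from the nuclear genome. The primers, PCR cycle, and amplicon size for each primer are listed in Table 2. Six microliters of double-distilled water were added to the PCR reaction mixture to adjust the volume of the final product. A thermal cycler (Perkin Elmer GeneAmp PCR System 2400) was used to run all PCRs alongside negative controls. The PCR products were examined by 1.5% agarose gel electrophoresis with 3 µl of 100 bp Plus DNA Ladder (TransGen Biotech Co., Ltd., Cat. No. BM301).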

Data analysis and sequencing
PCR products were purified and sequenced bidirectionally by Sanger sequencing (Macrogen, Korea) for the rbcL, matK, trnH-psbA, and ITS barcode markers, as shown in Fig. 1. The obtained sequences were assembled into contigs using BioEdit v.7.2.5 software [27]. The contigs were aligned with ClustalW in MEGA11 to verify species identity, and pairwise distances, transition/transversion ratios, and the substitution matrix were estimated in MEGA11 under the Kimura 2-parameter (K2P) model. Phylogenetic relationships among species were inferred in MEGA11 using the maximum likelihood (ML) approach based on the K2P model [25]. All phylogenetic analyses were run with 500 bootstrap replicates.
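To illustrate what a bootstrap test with 500 replicates involves, the following minimal sketch resamples the columns of an alignment with replacement. It is not MEGA11's implementation: the function names and the toy alignment are hypothetical, and the tree-building step that would be repeated on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with replacement.

    `alignment` maps a taxon name to its aligned sequence; all sequences must
    have the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must be aligned to the same length")
    # Draw as many column indices as there are columns, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=500, seed=0):
    """Return a list of resampled alignments (500 by default, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment with placeholder sequences (not data from this study).
toy_alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTACGAAC",
    "taxon_3": "ACCTACGAAT",
}
replicates = bootstrap_replicates(toy_alignment)
```

In a full analysis a tree would be inferred from each replicate, and the bootstrap support of a clade would be the percentage of replicate trees in which that clade appears.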

PCR amplification
In this study, high-quality genomic DNA from six Apiaceae species was used to amplify four different barcodes. Agarose gel electrophoresis (1.5%) of the PCR products revealed amplicons ranging in size from 158 bp (trnH-psbA) to 780 bp (matK). When sequenced, these amplicons yielded sequences ranging from 127 to 2333 bp. While rbcL, matK, and ITS each exhibited one amplicon, trnH-psbA exhibited two amplicons (Fig. 1 and Table 2). The sizes of the sequences obtained from the four DNA barcodes are shown in Table 3.

Analysis of data sequence
As presented in Tables 2 and 3, the constructed sequences were compared using BLAST to check for species similarity. MatK, trnH-psbA, and ITS were all shown to be efficient DNA barcoding regions for species identification. Although the Plant Working Group of the Consortium for the Barcode of Life (PWG-CBOL) recommends matK and rbcL as core barcoding regions for plants, the species identities of Khella Shaytani and anise could not be determined using rbcL: BLAST returned Pimpinella saxifraga instead of Pimpinella anisum and Ammi trifoliatum instead of Ammi majus (accession numbers of all reference plants are given in Table 4). The pairwise distance values for rbcL, matK, trnH-psbA, and ITS were 0.01337, 0.3805, 0.3732, and 0.8018, respectively.
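The distinction drawn here, between a barcode that recovers the correct species and one that only recovers the correct genus, can be made explicit with a small sketch. The function `match_level` is a hypothetical helper rather than part of BLAST or of the study's workflow; it simply compares the expected binomial with the name of the top BLAST hit.

```python
def match_level(expected, top_hit):
    """Classify how well the top BLAST hit matches the expected species.

    Both arguments are binomial names such as "Pimpinella anisum".
    Returns "species", "genus", or "none".
    """
    expected_parts = expected.lower().split()
    hit_parts = top_hit.lower().split()
    if len(expected_parts) < 2 or len(hit_parts) < 2:
        raise ValueError("names must contain a genus and a species epithet")
    if expected_parts[:2] == hit_parts[:2]:
        return "species"
    if expected_parts[0] == hit_parts[0]:
        return "genus"
    return "none"


# The two rbcL cases reported in the text resolve only to genus level.
rbcl_anise = match_level("Pimpinella anisum", "Pimpinella saxifraga")
rbcl_khella = match_level("Ammi majus", "Ammi trifoliatum")
```

Applied to every barcode and species, such a check would give a simple identification-success table, with rbcL falling short for the two species named above.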

Relationships analysis among species
The constructed sequences of each barcoding locus were aligned using MEGA11. Pairwise distances and transition/transversion ratios were calculated in MEGA11 using the Kimura 2-parameter model.
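For readers unfamiliar with the Kimura 2-parameter model, the sketch below shows how the distance between two aligned sequences is obtained from the proportion of transitions P and transversions Q, using d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). It is an independent illustration, not MEGA11's code; the function name is hypothetical, and treating gaps and ambiguous bases by pairwise deletion is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Return (distance, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Pairwise deletion: skip gaps and ambiguous bases (an assumption here).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences are too divergent for the K2P correction")
    distance = -0.5 * math.log(term1) - 0.25 * math.log(term2)
    return distance, transitions, transversions
```

Calling `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` counts one transition and no transversions over ten sites, so P = 0.1 and Q = 0. The transition/transversion ratio for a pair follows directly from the two counts returned by the function.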
The rbcL dendrogram (Fig. 2C) shows that Carum carvi is more closely related to the toxic plant Oenanthe crocata than to any other species. The matK dendrogram (Fig. 2A) shows that Anethum graveolens is genetically closer to the noxious plant Conium maculatum and that Carum carvi is more closely related to the toxic plant Oenanthe crocata. In addition, the trnH-psbA dendrogram (Fig. 2B) shows that Petroselinum crispum is more closely related to Ammi majus than to Ammi trifoliatum (shown in Fig. 2C). However, the ITS dendrogram (Fig. 2D) shows that Carum carvi is more closely related to Cuminum cyminum and that Ammi majus is more closely related to Anethum foeniculum.

Discussion
The results of this study suggest that matK, trnH-psbA, and ITS are all effective DNA barcoding regions for species identification in the Apiaceae family [4, 28]. The existence of single-nucleotide repeats, which generate frequent shifts in the reading frame, is responsible for the rapid pace of length divergence. Such single-nucleotide repeats appear to be common in the genomes of many angiosperms. This result agrees with the conclusions of [8], who found that, with the exception of the trnH-psbA region, the DNA barcoding loci have a reasonable read length in both directions. The PWG-CBOL recommends matK and rbcL as core barcoding regions for plants, and these markers have been shown to be highly variable at the interspecific level but relatively conserved at the intraspecific level. However, rbcL was not as effective for species identification in Apiaceae: it could not resolve the species identities of Khella Shaytani and anise, with BLAST returning Pimpinella saxifraga instead of Pimpinella anisum and Ammi trifoliatum instead of Ammi majus. This is likely because rbcL is a relatively slow-evolving gene and therefore does not accumulate enough variation to distinguish between closely related species.
The high pairwise distances and transition/transversion ratios between the six Apiaceae species sequenced in this study suggest that they are all distinct species. This is further supported by the fact that each species had a unique sequence for each of the three barcodes matK, trnH-psbA, and ITS. It is also worth noting that BLAST analysis of the rbcL sequences for Khella Shaytani and anise could not determine their species identities. This suggests that there may be some taxonomic confusion surrounding these species, or that the rbcL sequences used in the study were not representative of the species.
Overall, the results of this study suggest that DNA barcoding is a powerful tool for species identification in Apiaceae, and that matK, trnH-psbA, and ITS are all good choices for DNA barcoding in this family. This finding is in line with recent studies showing that matK and rbcL are not always useful as barcodes for specific plant taxa [9, 22]. Although matK was effective in this investigation at identifying plant species and producing data, it was unable to reliably differentiate between the species. The trnH-psbA spacer, although relatively short and simple to amplify, is the plastid region with the most variability in angiosperms. Previous research [3, 5, 7, 15, 16, 18] supports our conclusion that trnH-psbA is the most effective DNA barcode for plant identification.
Furthermore, ITS has made great strides in the identification of species. Sequences of rbcL, matK, trnH-psbA, and ITS for Anethum graveolens, Apium graveolens, Ammi visnaga, Cicuta maculata, Oenanthe crocata, Aethusa cynapium, Conium maculatum, Heracleum maximum, and Pastinaca sativa were retrieved from NCBI GenBank and used to construct phylogenetic trees with the maximum likelihood (ML) method. Thus, the best results were obtained using both ITS and trnH-psbA, and the latter performed best among the DNA barcoding markers used.

Conclusion
This research shows that the six Egyptian Apiaceae species can be distinguished from one another using the DNA barcodes rbcL, matK, trnH-psbA, and ITS. However, rbcL alone was not sufficient for barcoding at the species level; combining it with another barcoding locus may provide a more accurate result for members of this family. In addition, trnH-psbA and ITS identified species well. We therefore recommend using a range of biochemical approaches to further distinguish between harmful and beneficial species, as DNA barcoding has shown a close relationship between them (as evidenced by the phylogenetic trees). We also recommend the combination of trnH-psbA and ITS. Table 5 lists the nucleotide sequences that we obtained and deposited in the NCBI database.

Fig. 2
Phylogenetic relationships among some Apiaceae species, constructed using MEGAx software by the maximum likelihood (ML) method based on four DNA barcodes: A = matK, B = trnH-psbA, C = rbcL, and D = ITS

Table 1
The studied plants, their IDs

Table 2
The primers used for PCR and sequencing
*F = forward; R = reverse

rbcL exhibited an amplicon of 533 bp, which yielded sequences ranging from 508 to 511 bp among the six Apiaceae species; the 508 bp sequence was observed in anise, while the 511 bp sequence was observed in parsley. Similarly, ITS exhibited one amplicon of 750 bp, which yielded sequences ranging between 699 and 709 bp among the Apiaceae species. While fennel exhibited the 699 bp sequence, caraway exhibited the 705 bp sequence.

Table 3
Sequences obtained from the four DNA barcodes using Sanger sequencing (Macrogen, Korea)

Table 4
Accession numbers of the reference plants in the database for each primer

Table 5
GenBank accession numbers for the nucleotide sequences obtained in this study