A chromosome-level reference genome assembly of the Reeve’s moray eel (Gymnothorax reevesii)

Zhang, Kai; Huang, Yu; Zhang, Yuxuan; Liang, Rishen; Li, Qingqing; Li, Ruihan; Zhao, Xiaomeng; Bian, Chao; Chen, Yongnan; Wu, Jinhui; Shi, Qiong; Lin, Li

doi:10.1038/s41597-023-02394-7

Data Descriptor
Open access
Published: 29 July 2023

Kai Zhang^1,2^na1,
Yu Huang ORCID: orcid.org/0000-0002-3875-7581^3,4^na1,
Yuxuan Zhang^1,2^na1,
Rishen Liang^1,2^na1,
Qingqing Li^1,2,
Ruihan Li³,
Xiaomeng Zhao³,
Chao Bian ORCID: orcid.org/0000-0001-9904-721X^3,4,
Yongnan Chen^1,2,
Jinhui Wu⁵,
Qiong Shi ORCID: orcid.org/0000-0002-6358-976X^3,4 &
…
Li Lin^1,2

Scientific Data volume 10, Article number: 501 (2023)

Abstract

Owing to their potentially hostile behavior and elusive habitats, moray eels (Muraenidae), a group of apex predators on coral reefs worldwide, have not been well investigated. Here, we constructed a chromosome-level genome assembly for the representative Reeve’s moray eel (Gymnothorax reevesii). This haplotype genome assembly is 2.17 Gb in length, and 97.87% of the sequences are anchored onto 21 chromosomes. It contains 56.34% repetitive sequences and 23,812 protein-coding genes, of which 96.77% are functionally annotated. As a sequenced marine species in Anguilliformes, it complements the existing genetic resources of eel genomes. It not only provides a genetic resource for in-depth studies of the Reeve’s moray eel, but also enables detailed genomic comparisons among various eels.

Background & Summary

Fishes of the family Muraenidae, commonly referred to as moray eels, form one of the most species-rich groups in the order Anguilliformes. At present, approximately 210 species are recognized^1,2. They are widely distributed in tropical, subtropical, and temperate waters worldwide, particularly in the Indo-Pacific region^3,4,5. Muraenidae species are cryptic and primarily inhabit rock and coral reefs in shallow waters, where they play crucial roles in maintaining coral reef biodiversity¹. They exhibit a wide range of color patterns, from uniform coloration to distinctive patterns of spots, blotches, bars, and reticulations. In a number of coastal countries, Muraenidae species are commercially important to the fishery industry because of their high nutritional value. In addition, their spectacular external coloration makes them popular in marine aquariums. To date, moray eels remain poorly studied because of their cryptic habitats and occasionally aggressive behavior⁶.

Anguilliformes species are scale-less with elongated bodies and usually lack pelvic fins^7,8,9. Unlike other families in this order, some Muraenidae species lack even pectoral fins, and their opercular bones are degenerated, with only small, round branchial pores opening to the outside². Moray eels also vary unexpectedly widely in body size, in terms of both total length and mass. Adults range in body mass from 4 to above 600 g, and in total length from 10 cm to 400 cm. Remarkably, the average number of vertebrae in morays varies 2.4-fold, which is equal to the variation among extant snakes¹⁰. Maximum length and body shape are two key parameters of diversification among organisms, as each affects organism-environment interactions^11,12. Owing to their distinctive morphological features, moray eels can serve as a good model for exploring the molecular mechanisms underlying the evolution of body patterns.

Freshwater and marine eels differ considerably in phenotypic evolution, adaptation, and speciation, and previous studies have shown that such differences are closely related to chromosome evolution, including rearrangements¹³. The majority of freshwater eels have only 19 pairs of chromosomes¹⁴, but some marine eels, such as the leopard moray eel (Enchelycore pardalis) and the brown moray eel (Gymnothorax unicolor), have 21 pairs¹⁵. An in-depth investigation of this chromosomal difference may help reveal the phenotypic, adaptive, and speciation differences between the two groups.

Previous studies have primarily concentrated on the morphology, taxonomy, and germplasm resources of moray eels, while limited genetic and genomic resources have largely constrained the conservation and utilization of these commercial species. Reeve’s moray eel (Gymnothorax reevesii), a representative species of the family Muraenidae, has an elongated body and lacks pectoral and pelvic fins (Fig. 1). G. reevesii is a subtropical and tropical species of the Northwest Pacific, distributed mostly from southern Japan to the South China Sea¹⁶. Its body is yellowish brown to brownish, with numerous large, obscure dark brownish spots in 3–5 rows along each side. Among Muraenidae fishes, G. reevesii is a common catch in coastal areas of eastern and southern China and is commercially important to local fisheries¹⁷. In recent years it has also been domesticated for aquaculture, becoming one of the few Muraenidae species that can be artificially cultured, with considerable developmental and economic potential. Until now, however, very few studies have focused on this species¹⁸, leaving a large research gap that calls for further investigation. In the present study, we constructed a chromosome-level genome assembly of G. reevesii using an integrated strategy of MGI¹⁹ (second-generation), PacBio (third-generation), and Hi-C sequencing technologies. We then characterized the high repeat content of the genome, inferred the phylogeny, and performed a chromosomal synteny analysis. This sequenced marine species in Anguilliformes complements the existing genetic resources of eel genomes.

Methods

Sample collection

A female Reeve’s moray eel (Fig. 1a) was collected from Daya Bay Aquatic Testing Center in Huiyan District, Huizhou City, Guangdong Province, China. We collected muscle (about 5 g) from this individual for whole-genome sequencing (short-read, long-read, and Hi-C sequencing), and muscle, gonad, brain, and liver tissues (50 mg each) for transcriptome sequencing. The samples were cut into small pieces, frozen immediately in liquid nitrogen, and stored at −80 °C until use. The Animal Ethics Committee of Zhongkai University of Agriculture and Engineering (Guangzhou, China) approved our sampling procedure.

DNA extraction and genome sequencing

Extraction and purification of genomic DNA (gDNA) from the muscle (2 g) was carried out using a blood & cell culture DNA kit (Qiagen, USA) in accordance with the manufacturer’s protocol.

The extracted gDNA (1.5 µg subsample) was randomly fragmented and used to create a 350-bp insert-size library with the MGIEasy universal DNA library prep set (MGI, China) for subsequent sequencing on an MGISEQ 2000 platform (MGI, China). A total of 65.39 Gb of paired-end raw reads (150 bp in length) were generated and then filtered with SOAPfilter v2.2²⁰ (default parameters) to remove adaptor sequences and low-quality reads. We finally obtained approximately 60.68 Gb of clean reads for genome-size estimation and subsequent assembly.

Furthermore, we used 2 µg of gDNA to construct long-read libraries with a SMRTbell Express Template Prep Kit 2.0 for HiFi sequencing following PacBio’s standard protocol (Pacific Biosciences, USA); the libraries were sequenced on a PacBio Sequel II System. CCS software (SMRT Link v9.0)²¹ was then applied to generate consensus sequences (--min-passes 1 --min-rq 0.99 --min-length 100). About 5.46 million consensus reads (68.88 Gb) with a mean length of 12.62 kb were obtained.

For Hi-C sequencing, muscle tissue (about 1 g) from the same individual was collected, and DNA libraries were constructed with a GrandOmics Hi-C kit (restriction enzyme DpnII; GrandOmics, China) according to the manufacturer’s protocol. The Hi-C libraries were then sequenced on an Illumina NovaSeq platform (Illumina, USA). In total, 305.18 Gb of Hi-C paired-end raw reads (150 bp in length) were generated. Subsequently, fastp²² was applied to remove adaptors, reads shorter than 30 bp, and low-quality reads (quality scores < 20). Finally, 99.98% of the reads (304.92 Gb) were retained for construction of pseudo-chromosomes.

RNA extraction and transcriptome sequencing

RNA was extracted from muscle, gonad, brain, and liver tissues (50 mg each) using a standard Trizol protocol (Invitrogen, USA) and purified using a Qiagen RNeasy mini kit (Qiagen, USA). Equal amounts of RNA from each tissue were mixed to create an Illumina cDNA library following the manufacturer’s guidelines, which was then sequenced on a HiSeq X Ten platform (Illumina, USA). Around 6.01 Gb of transcriptome data were generated to assist genome and gene annotation.

Genome-size estimation

To estimate the genome size of Reeve’s moray eel, a k-mer analysis²³ was performed on the MGI clean reads. The 17-mer frequency was calculated with the k-mer counting (KMC) program²⁴ and the genome character estimator (GCE) software²⁵. The genome size was then estimated from the 17-mer depth distribution using the equation G = K_num/K_depth, where G is the genome size, K_num is the total number of 17-mers, and K_depth is the k-mer depth. The estimated genome size of Reeve’s moray eel is about 2.05 Gb, and the genomic heterozygosity rate was predicted to be 1.02% (Fig. 2a).
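
The equation G = K_num/K_depth can be illustrated with a minimal Python sketch. It is not the GCE implementation, which additionally models heterozygosity and repeats; the function name, the `min_depth` error cutoff, and the toy histogram are assumptions introduced here for illustration only.

```python
from typing import Dict


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size as G = K_num / K_depth from a k-mer histogram.

    hist maps a k-mer depth (number of occurrences in the reads) to the
    number of distinct k-mers observed at that depth.
    min_depth is a hypothetical cutoff that discards low-depth k-mers,
    which mostly arise from sequencing errors.
    """
    kept = {depth: count for depth, count in hist.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers left after applying min_depth")
    # K_depth: depth at which the filtered histogram peaks.
    k_depth = max(kept, key=kept.get)
    # K_num: total number of k-mer occurrences (sum of depth * count).
    k_num = sum(depth * count for depth, count in kept.items())
    return k_num / k_depth


# Toy histogram with made-up numbers (not the data of this study).
toy_hist = {1: 900, 2: 100, 28: 50, 29: 80, 30: 100, 31: 80, 32: 50}
print(estimate_genome_size(toy_hist))
```

In practice the histogram would come from a k-mer counter such as KMC, and a genome with 1.02% heterozygosity would also show a secondary peak at about half the main depth, which this simplified sketch ignores.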

De novo genome assembly

After obtaining subreads, the PacBio long reads were de novo assembled into contigs through hifiasm (v0.16.0)²⁶ with default parameters. These contigs were then polished with Nextpolish (v1.10)²⁷ using the MGI short reads to fix possible base errors. The primary genome assembly was 2.23 Gb in length, consistent with the estimated genome size.

We employed multiple methods to evaluate the quality of this assembly. First, Merqury v1.3²⁸ estimated the completeness, QV, and error rate of the assembly to be 86.33%, 41.47, and 7.12e-05, respectively. A k-mer spectra plot is provided (Fig. 3a). Second, Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.2.2²⁹ was run against the actinopterygii_odb10 database to assess the completeness of the assembly; it recovered 94.89% complete BUSCO genes, including 88.3% single-copy and 6.59% duplicated genes (Table 1), suggesting that the assembly is of high quality. Third, the Core Eukaryotic Genes Mapping Approach (CEGMA v2.5)³⁰ showed that 223 (89.92%) of 248 core genes were successfully assembled (Fig. 3b). Finally, we mapped the long reads back to the assembly with minimap2 v2.15³¹ and calculated the average GC content and sequencing depth over 10-kb windows, which indicated that the assembled genome is free of contamination (Fig. 3c). Gfastats v1.3.1³² was applied to obtain assembly summary statistics (see Table 1 for details).
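
The reported quality metrics are mutually consistent, as the following Python sketch suggests. It relies on the QV being Phred-scaled, so that the per-base error rate is 10^(−QV/10); the function names are hypothetical and the numbers are those quoted in the text.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: per-base error rate to Phred-scaled QV."""
    return -10 * math.log10(error_rate)


# Values quoted in the text.
reported_qv = 41.47
reported_error = 7.12e-05

# The two numbers agree up to rounding of the reported QV.
print(qv_to_error_rate(reported_qv), reported_error)

# BUSCO: complete = single-copy + duplicated.
busco_complete = 88.3 + 6.59

# CEGMA: share of the 248 core genes that were recovered.
cegma_pct = 223 / 248 * 100
print(round(busco_complete, 2), round(cegma_pct, 2))
```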

Table 1 Statistics of the assembled genome for the Reeve’s moray eel.

Pseudo-chromosome construction

Based on this high-quality genome assembly, the Hi-C technique was employed to construct pseudo-chromosomes for the Reeve’s moray eel. First, Hi-C clean reads were mapped to the assembled contigs using bowtie2 (v2.3.2)³³ (--end-to-end --very-sensitive -L 30). Subsequently, the HiC-Pro (v2.8.1)³⁴ pipeline was applied to detect valid ligation products, and only valid contact read pairs were retained for further analysis. Based on these valid reads, the primary assembly was clustered, ordered, and oriented onto chromosomes with LACHESIS³⁵ using optimized parameters (CLUSTER_MIN_RE_SITES = 100, CLUSTER_NONINFORMATIVE_RATIO = 1.4, CLUSTER_MAX_LINK_DENSITY = 2.5, ORDER_MIN_N_RES_IN_SHREDS = 60, ORDER_MIN_N_RES_IN_TRUNK = 60). JuiceBox v1.11.08³⁶ was employed to manually correct placement and orientation errors that showed clearly discordant chromatin interaction patterns. We thus obtained the final genome assembly with a size of 2.17 Gb, of which 97.87% is anchored onto 21 chromosomes (Fig. 2). The scaffold and contig N50 values of the chromosome-level genome assembly are 112.89 Mb and 53.38 Mb, respectively, a relatively high level among sequenced fish species.
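
For readers unfamiliar with the N50 statistic quoted above, a minimal Python sketch of its usual definition follows: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are illustrative assumptions; the values in this study were obtained with Gfastats.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for positive lengths")


# Toy example: total length 20, half 10; 6 + 5 = 11 >= 10, so N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```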

High repeat content in the Reeve’s moray eel genome

Transposable elements (TEs) in the G. reevesii genome were predicted by a combination of homology-based and ab initio approaches. RepeatModeler³⁷ and MITE-Hunter³⁸ were run with default settings to build an ab initio repeat library, which was then aligned to Repbase³⁹ with the TEclass tool⁴⁰ to classify each repeat family. RepeatMasker³⁷ was then applied with both the Repbase TE library and the de novo repeat library to identify known and novel TEs across the assembled genome. In total, 1.23 Gb of repetitive sequences were annotated in the assembly, including 1.09 Gb of TEs (Table 2).

Table 2 Repetitive sequences in the genome of Reeve’s moray eel.

In summary, repetitive sequences account for 56.34% of the genome of Reeve’s moray eel, including 50.15% TEs (Table 2), a content much higher than that of freshwater eels¹⁴ and of most fishes sequenced so far, which usually have a TE content below 45%⁴¹. We then examined the repeat content of each chromosome of the Reeve’s moray eel and the freshwater eels. A plot of the repeat distribution along the corresponding chromosomes, drawn with the RIdeogram package⁴², indicated that each chromosome of the Reeve’s moray eel also has a higher repeat content than its freshwater counterparts (Fig. 4a).

More specifically, TEs make up 50.15% of the Reeve’s moray eel genome and account for 89.01% of the total repetitive sequences. The combined lengths of DNA, LTR, LINE, and SINE elements are 560.92, 200.04, 165.14, and 20.60 Mb, respectively, revealing remarkable expansions of DNA and LTR elements in the genome of Reeve’s moray eel compared with the freshwater eels (Fig. 4b). Considering that all these eel genomes have Mb-level contig N50 values and were assembled from long reads with sufficient coverage, this difference appears to be real rather than the result of unassembled repeats in the freshwater eel genomes.
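
The share of TEs among all repeats follows directly from the two genome-wide percentages reported above. The short Python sketch below reproduces that arithmetic; the function name is a hypothetical helper, not part of any tool used in this study.

```python
def share_of_repeats(te_pct_of_genome: float, repeat_pct_of_genome: float) -> float:
    """Percentage of all repetitive sequence that is made up of TEs."""
    if repeat_pct_of_genome <= 0:
        raise ValueError("repeat percentage must be positive")
    return 100 * te_pct_of_genome / repeat_pct_of_genome


# Genome-wide percentages quoted in the text.
te_share = share_of_repeats(50.15, 56.34)
print(round(te_share, 2))
```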

Gene annotation and functional assignment

In the repeat-masked genome, gene prediction was carried out using three approaches: homology-based, transcriptome-based, and ab initio annotation. For the homology-based prediction, GeMoMa⁴³ was employed to align homologous proteins from relevant species (Japanese eel Anguilla japonica, spotted gar Lepisosteus oculatus, zebrafish Danio rerio, three-spined stickleback Gasterosteus aculeatus, and Fugu Takifugu rubripes) to our assembly to predict gene structures. For the transcriptome-based prediction, clean transcriptome reads were mapped to the assembled genome with STAR⁴⁴. PASA⁴⁵ was then applied to predict open reading frames (ORFs), and StringTie⁴⁶ was employed to assemble transcripts for gene structure annotation. For the ab initio annotation, transcriptome reads were de novo assembled with Trinity (v2.13.2)⁴⁷, and 300 full-length transcripts were randomly selected as a training set. Based on this training set, Augustus⁴⁸ with default parameters was then employed to predict genes. Finally, EVidenceModeler (EVM)⁴⁵ was applied to create a combined, non-redundant gene set after removal of miscoded genes, and the final protein sequences were deduced using the standard genetic code.

Motifs/domains and gene functions of these annotated genes were predicted by blasting (BLASTp) the deduced protein sequences against various public databases, including SwissProt⁴⁹, NCBI NR, KEGG⁵⁰, KOG⁵¹, and Gene Ontology⁵² (GO), with an E-value cutoff of 1e−05.
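
As an illustration of how such an E-value cutoff might be applied, the Python sketch below filters BLASTp hits in the standard 12-column tabular format (`-outfmt 6`), in which the E-value is the eleventh column. The function name, the choice to keep hits with E-value ≤ 1e−05, the best-hit-per-query rule, and the demo accessions are assumptions for illustration; the source does not state how hits were post-processed.

```python
from typing import Dict, Iterable, Tuple

EVALUE_CUTOFF = 1e-5  # cutoff quoted in the text


def best_hits(lines: Iterable[str],
              cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, the hit with the lowest E-value at or below cutoff.

    Each line is expected in BLAST tabular format (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Made-up demo records (hypothetical gene and subject identifiers).
demo = [
    "gene1\tP12345\t78.2\t310\t60\t2\t1\t308\t5\t314\t2e-150\t520",
    "gene1\tQ99999\t35.0\t120\t70\t4\t10\t129\t40\t160\t3e-04\t45.1",
    "gene2\tP54321\t30.1\t90\t58\t3\t5\t94\t12\t100\t0.02\t33.5",
]
hits = best_hits(demo)
print(hits)
```

Dividing the number of queries with a retained hit by the total number of predicted proteins would then give an annotation rate of the kind reported for each database.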

Finally, a total of 23,812 protein-coding genes were annotated in the assembled genome of Reeve’s moray eel, with an average gene length of 36,705.61 bp, an average coding sequence (CDS) length of 1,738.42 bp, and an average of 8.51 exons per gene. Of these 23,812 genes, 96.77% were annotated in the searched protein databases (Table 3). In addition, the CDSs were translated into protein sequences with the standard genetic code, followed by self-alignment using BLASTp for construction of internal syntenic blocks by MCscan⁵³ with parameters “-a -e 1e-5 -u 1 -s 8”. A Circos⁵⁴ plot (Fig. 2c) was then generated to show (I) the lengths of the 21 pseudo-chromosomes, (II) gene density (percentage of genes per 100-kb window), (III) repeat density (minimum 0%, maximum 100%), (IV) GC content (minimum 30%, maximum 65%), and (V) internal syntenic blocks (in the center).
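
Window-based tracks such as the GC content shown in the Circos plot can be computed along the lines of the following Python sketch. It is an illustrative assumption rather than the procedure used in this study: the function name is hypothetical, windows are non-overlapping, and ambiguous bases such as N are excluded from the denominator.

```python
from typing import List


def gc_content_by_window(seq: str, window: int) -> List[float]:
    """GC percentage in consecutive, non-overlapping windows of a sequence.

    Only A, C, G, and T are counted; a window with no called bases
    yields NaN.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    values: List[float] = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        values.append(100 * gc / called if called else float("nan"))
    return values


# Toy sequence; a real track would use windows of, e.g., 100_000 bp.
print(gc_content_by_window("GGCCAATT", 4))
```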

Table 3 Gene structures and function annotation.

Phylogenetic tree

A species tree was constructed using single-copy orthologs from the whole genomes of Reeve’s moray eel and 12 other representative ray-finned species, with ropefish (Erpetoichthys calabaricus) as the outgroup. The Maximum Likelihood tree (constructed using PhyML⁵⁵) and the Bayesian tree (inferred by MrBayes⁵⁶) present the same topology, with node bootstrap values of 100% and node support values of 1, respectively. The consensus tree was further time-calibrated using the MCMCTREE program in the PAML package⁵⁷. The tree shows that Reeve’s moray eel and the two examined freshwater eels make up the Anguilliformes clade, forming a sister group to the Elopiformes composed of two tarpons, and that Reeve’s moray eel diverged from both freshwater eels 101.84 million years ago (Mya; Fig. 5). These data are consistent with previous reports^58,59,60.

Data Records

The genome assembly and raw reads of the genome and transcriptome sequencing for Reeve’s moray eel were deposited at NCBI under the accession number PRJNA934055⁶¹. Raw reads are available in the Sequence Read Archive (SRA) with the accession number SRP430070. The genome assembly was deposited at GenBank with the accession number GCA_029721435.1⁶². Repeat and gene annotation files are publicly available in the figshare repository at https://doi.org/10.6084/m9.figshare.22357987⁶³ and https://doi.org/10.6084/m9.figshare.22358209⁶⁴, respectively.

Technical Validation

The quality of the extracted gDNA was checked by agarose gel electrophoresis (a main band at 20 kb) and with a Nanodrop spectrophotometer (Thermo Fisher Scientific, USA). The isolated gDNA met the quality standards of an OD260/280 value between 1.8 and 2.0 and an OD260/230 value between 2.0 and 2.2. The quality of the extracted RNA was examined with a 2100 Bioanalyzer (RIN > 7.0, 28S/18S > 1.0; Agilent Technologies, USA).

Reeve’s moray eel has 21 pairs of chromosomes (Chr) whereas most freshwater eels have only 19 pairs, as evidenced by both karyotypic observation and genomic sequencing^14,65,66,67. We therefore performed a chromosomal synteny analysis using the JCVI package⁶⁸. Our plot shows that European eel and American eel have perfect one-to-one correspondence among their chromosomes (Fig. 5), whereas Reeve’s moray eel shows generally conserved chromosomal synteny, albeit with a few large fragment exchanges. For example, Chr2 and Chr3 in American eel and European eel each correspond to two chromosomes (10 & 20, and 14 & 18, respectively) of Reeve’s moray eel. In addition, part of Chr5 fused to the end of Chr13 (American eel) or Chr12 (European eel) of the freshwater eels, forming Chr2 in the Reeve’s moray eel, and the other half of Chr5 was preserved as a single chromosome (Chr17) in the moray eel, presenting a two-to-two synteny pattern (Fig. 6). The chromosome synteny analysis implies that chromosomal synteny is generally conserved between the moray eel and freshwater eels, but that a few fissions and fusions occurred after their divergence, leading to 21 and 19 pairs of chromosomes in the two lineages, respectively¹⁴.

Code availability

The versions and parameters of the bioinformatic tools used in this study are described in the Methods section. Where no parameter is given, the default was used. No custom code was used.

References

Böhlke, E. B. & Smith, D. G. Type catalogue of Indo-Pacific Muraenidae. Proc. Acad. Nat. Sci. Phila. 152, 89–172 (2002).
Smith, D. G., Bogorodsky, S. V., Mal, A. O. & Alpermann, T. J. Review of the moray eels (Anguilliformes: Muraenidae) of the Red Sea, with description of a new species. Zootaxa 4704, 1–87 (2019).
Smith, D. G. A checklist of the moray eels of the world (Teleostei: Anguilliformes: Muraenidae). Zootaxa 3474, 1–64 (2012).
Nelson, J. S., Grande, T. C. & Wilson, M. V. H. Fishes of the World. (John Wiley & Sons, 2016).
Mohapatra, A., Mohanty, S. R., Smith, D. G., Mishra, S. S. & Roy, S. Gymnothorax odishi sp. nov. (Muraenidae: Muraeninae), a short brown unpatterned moray eel from Bay of Bengal, India. Zootaxa 4420, 123–130 (2018).
Reece, J. S., Bowen, B. W., Smith, D. G. & Larson, A. Molecular phylogenetics of moray eels (Muraenidae) demonstrates multiple origins of a shell-crushing jaw (Gymnomuraena, Echidna) and multiple colonizations of the Atlantic Ocean. Mol. Phylogenet. Evol. 57, 829–835 (2010).
Ramos-Castro, M., Loh, K. H. & Chen, H. M. A descriptive and comparative neurocranium morphology of Anguilliformes fishes in Taiwan waters. Zootaxa 5023, 509–536 (2021).
Arai, T. Ecology and evolution of migration in the freshwater eels of the genus Anguilla Schrank, 1798. Heliyon 6, e05176 (2020).
López, J. A., Westneat, M. W. & Hanel, R. The phylogenetic affinities of the mysterious Anguilliform genera Coloconger and Thalassenchelys as supported by mtDNA sequences. Copeia 2007, 959–966 (2007).
Reece, J. S. & Mehta, R. S. Evolutionary history of elongation and maximum body length in moray eels (Anguilliformes: Muraenidae). Biol. J. Linn. Soc. 109, 861–875 (2013).
Brown, J. H. & Lasiewski, R. C. Metabolism of weasels: the cost of being long and thin. Ecology 53, 939–943 (1972).
Schmidt-Nielsen, K. Scaling: Why Is Animal Size So Important? (Cambridge University Press, 1984).
Damas, J., Corbo, M. & Lewin, H. A. Vertebrate chromosome evolution. Annu. Rev. Anim. Biosci. 9, 1–27 (2021).
Wang, H. et al. A Chromosome-level assembly of the Japanese eel genome, insights into gene duplication and chromosomal reorganization. Gigascience 11, giac120 (2022).
Ronchetti, E., Salvadori, S. & Deiana, A. M. Genome size and AT content in Anguilliformes. Eur. J. Histochem. 39, 259–264 (1995).
Prokofiev, A. M. Additions to the species composition of morays of the Bay of Nha Trang (South China Sea, central Vietnam)(Anguilliformes: Muraenidae). J. Ichthyol. 50, 38–43 (2010).
Chen, H. M., Shao, K. T. & Chen, C. T. A review of the muraenid eels (Family Muraenidae) from Taiwan with descriptions of twelve new records. Zool. Stud. 33, 44–64 (1994).
Ramos-Castro, M., Chen, H. M. & Mao, B. S. Descriptions of neurocranium morphology in 34 species of moray eels (Muraenidae) found in Taiwan. Zootaxa 4810, 081–109 (2020).
Blake, J. A., Bult, C. J., Kadin, J. A., Richardson, J. E. & Eppig, J. T. The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 39, D842–D848 (2011).
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genom. Proteom. Bioinform. 13, 278–289 (2015).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Deorowicz, S., Kokot, M., Grabowski, S. & Debudaj-Grabysz, A. KMC 2: fast and resource-frugal k-mer counting. Bioinformatics 31, 1569–1576 (2015).
Sun, H., Ding, J., Piednoël, M. & Schneeberger, K. findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics 34, 550–557 (2018).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Formenti, G. et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics 38, 4214–4216 (2022).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4.10.1–4.10.14 (2004).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Abrusán, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass—a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
Shao, F., Han, M. & Peng, Z. Evolution and diversity of transposable elements in fish genomes. Sci. Rep. 9, 15399 (2019).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Bairoch, A. et al. The universal protein resource (UniProt). Nucleic Acids Res. 33, D154–D159 (2005).
Ogata, H. et al. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 27, 29–34 (1999).
Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43, D261–D269 (2015).
The Gene Ontology Consortium. The Gene Ontology resource: 20 years and still GOing strong. Nucleic Acids Res. 47, D330–D338 (2019).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
Yang, Z. H. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Tang, K. L. & Fielitz, C. Phylogeny of moray eels (Anguilliformes: Muraenidae), with a revised classification of true eels (Teleostei: Elopomorpha: Anguilliformes). Mitochondrial DNA 24, 55–66 (2013).
Barth, J. M. I. et al. Stable species boundaries despite ten million years of hybridization in tropical eels. Nat. Commun. 11, 1433 (2020).
Rabosky, D. L. et al. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nat. Commun. 4, 1958 (2013).
NCBI Bioproject https://www.ncbi.nlm.nih.gov/bioproject/PRJNA934055 (2023).
NCBI Genbank https://identifiers.org/ncbi/insdc.gca:GCA_029721435.1 (2023).
Huang, L. Repeats annotation. figshare https://doi.org/10.6084/m9.figshare.22357987.v1 (2023).
Huang, L. Gymnothorax reevesii gene annotation (gff, cds, pep). figshare https://doi.org/10.6084/m9.figshare.22358209.v1 (2023).
Park, E. H. & Kang, Y. S. Karyotype conservation and difference in DNA amount in Anguilloid fishes. Science 193, 64–66 (1976).
Vasconcelos, A. J. M. & Molina, W. F. Cytogenetical studies in five Atlantic Anguilliformes fishes. Genet. Mol. Biol. 32, 83–90 (2009).
Kai, W. et al. A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication. BMC Genomics 15, 233 (2014).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).

Acknowledgements

This project was supported by the Guangdong Modern Agricultural Industry Technology Innovation Team Special Project (no. 2019KJ140), the National Key Research and Development Program of China (no. 2022YFE0139700), and the Science and Technology Plan Project of Qingyuan City (no. 2022KJJH064).

Author information

These authors contributed equally: Kai Zhang, Yu Huang, Yuxuan Zhang, Rishen Liang.

Authors and Affiliations

College of Animal Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
Kai Zhang, Yuxuan Zhang, Rishen Liang, Qingqing Li, Yongnan Chen & Li Lin
Guangdong Provincial Water Environment and Aquatic Products Security Engineering Technology Research Center, Guangzhou, 510225, China
Kai Zhang, Yuxuan Zhang, Rishen Liang, Qingqing Li, Yongnan Chen & Li Lin
Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, Shenzhen, 518081, China
Yu Huang, Ruihan Li, Xiaomeng Zhao, Chao Bian & Qiong Shi
Laboratory of Aquatic Genomics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, 518060, China
Yu Huang, Chao Bian & Qiong Shi
Agro-Tech Extension Center of Guangdong Province, Guangzhou, 510225, China
Jinhui Wu

Contributions

L.L. and Q.S. conceived this project. K.Z., Y.H., Y.X.Z. and R.S.L. participated in data analysis. Y.X.Z., R.S.L., J.H.W., Q.Q.L. and Y.N.C. collected the samples. K.Z., Y.H. and C.B. assembled the genome. R.H.L. and X.M.Z. generated the annotation set. K.Z., Y.H., R.S.L., Y.X.Z. and L.L. wrote the manuscript. Q.S. and L.L. revised the manuscript. All authors have read and approved the final manuscript for publication.

Corresponding authors

Correspondence to Qiong Shi or Li Lin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, K., Huang, Y., Zhang, Y. et al. A chromosome-level reference genome assembly of the Reeve’s moray eel (Gymnothorax reevesii). Sci Data 10, 501 (2023). https://doi.org/10.1038/s41597-023-02394-7

Received: 20 April 2023
Accepted: 17 July 2023
Published: 29 July 2023
DOI: https://doi.org/10.1038/s41597-023-02394-7
