NCBI's objective is to develop new information technologies that help explain the fundamental molecular and genetic processes controlling health and disease. NCBI is charged with creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of these databases and software by the research and medical community; and performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules. NCBI also develops and promotes standards for databases, data deposition and exchange, and biological nomenclature.
Eurofins offers a service for submitting all types of data to NCBI.
NCBI submission types:
A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium of coordinating organizations. BioProjects aggregate pointers to data to provide users with an entry point into diverse data types. Each genome is part of a BioProject that describes the research effort, and is from a BioSample which presents details of the source of the DNA. Furthermore, each public genome is loaded into the Assembly database, where it is assigned an Assembly accession. When a genome is updated, the Assembly accession is incremented to the next version, but the BioProject and BioSample accessions remain the same.
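The versioning rule described above can be sketched in a few lines of Python. This is only an illustration, assuming that an Assembly accession has the form of a stable prefix followed by `.` and a version number; the helper name `next_assembly_version` and the accession `GCA_000000000.1` are hypothetical placeholders, not real records.

```python
import re

# Assumed shape: a stable part such as "GCA_000000000", then ".<version>".
ACCESSION_RE = re.compile(r"^(?P<base>[A-Z]{3}_\d+)\.(?P<version>\d+)$")


def next_assembly_version(accession: str) -> str:
    """Return the accession with its version suffix incremented by one."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a versioned accession: {accession!r}")
    # Only the version changes; the stable part is carried over unchanged.
    return f"{match['base']}.{int(match['version']) + 1}"


# Placeholder accession, used purely for illustration.
print(next_assembly_version("GCA_000000000.1"))
```

Only the suffix after the dot changes on an update, which mirrors the point that the BioProject and BioSample accessions linked to the genome stay the same.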
The BioSample database contains descriptions of the biological source materials used in experimental assays. It stores descriptive information (metadata) about the biological materials from which data stored in NCBI’s primary data archives are derived. BioSample records are indexed and searchable. This metadata gives the derived data the context needed to understand it more fully, which adds value, promotes re-use, and enables aggregation and integration of disparate data sets, ultimately facilitating novel insights and discoveries across a wide range of biological fields.
The Sequence Read Archive (SRA) stores sequence and quality data in aligned or unaligned formats from NextGen sequencing platforms. SRA accepts reads from high throughput sequencing instruments. Some submissions include sets of SRA reads as part of a comprehensive package.
Whole Genome Shotgun Submissions:
Whole Genome Shotgun (WGS) projects are assemblies of draft or incomplete genomes or chromosomes of prokaryotes or eukaryotes that are being sequenced by a whole genome shotgun strategy. The Whole Genome Shotgun (WGS) database accepts these assemblies as FASTA sequences. There are two formats for WGS submissions:
- Split format (the standard WGS submission format), in which the pieces of a WGS project are the contigs (overlapping reads with no gaps). An optional AGP file can be submitted to indicate how the WGS sequences are assembled into scaffolds or chromosomes.
- Gapped format, a newer format in which the pieces of a WGS project are the scaffolds, which contain runs of Ns that represent gaps. Because the gaps are already represented in the sequences, a separate AGP file is not required.
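The relationship between the two formats can be illustrated with a short Python sketch that breaks a gapped scaffold into gap-free contigs at runs of Ns. The function name `split_scaffold` and the threshold `min_gap=10` are assumptions made for this example, not values prescribed by NCBI, and the sketch assumes the scaffold does not begin or end with a gap.

```python
import re


def split_scaffold(sequence: str, min_gap: int = 10):
    """Split a gapped scaffold into (contigs, gap_lengths) at runs of N.

    min_gap is a hypothetical threshold: shorter runs of N are treated
    as ambiguous bases inside a contig rather than as gaps.
    """
    gap_re = re.compile(rf"[Nn]{{{min_gap},}}")
    contigs, gaps = [], []
    start = 0
    for match in gap_re.finditer(sequence):
        contigs.append(sequence[start:match.start()])  # piece before the gap
        gaps.append(match.end() - match.start())       # length of the N run
        start = match.end()
    contigs.append(sequence[start:])                   # piece after the last gap
    return contigs, gaps


scaffold = "ACGT" + "N" * 10 + "GGCC"
print(split_scaffold(scaffold))
```

In split format the contigs would be submitted as the sequences and the gap information would go into the optional AGP file; in gapped format the scaffold is submitted as it stands.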
Transcriptome Shotgun Assembly (TSA) is an archive of computationally assembled sequences from primary data such as ESTs, traces and Next Generation Sequencing Technologies. The overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cloned cDNAs. TSA sequence records differ from EST and GenBank records because there are no physical counterparts to the assemblies.
- Submission of the completed forms provided by Eurofins Genomics
- Data: raw data files (FASTQ or FASTA format) for SRA; assembly files in FASTA format for WGS and TSA
- Submission of the data to NCBI
- Resolution of all queries raised by NCBI
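Before sending files, the formats listed above can be checked with a quick pre-flight script. The following Python sketch is a minimal illustration only: the names `ACCEPTED_FORMATS`, `guess_sequence_format`, and `is_acceptable` are hypothetical, the check looks at the first character of the first record only, and it assumes uncompressed plain-text input.

```python
# Formats per submission type, as listed in the data requirements above.
ACCEPTED_FORMATS = {
    "SRA": {"fastq", "fasta"},
    "WGS": {"fasta"},
    "TSA": {"fasta"},
}


def guess_sequence_format(first_line: str) -> str:
    """Guess the format from the first non-empty line of a sequence file."""
    if first_line.startswith(">"):
        return "fasta"   # FASTA records start with a '>' header line
    if first_line.startswith("@"):
        return "fastq"   # FASTQ records start with an '@' header line
    return "unknown"


def is_acceptable(submission_type: str, first_line: str) -> bool:
    """Check whether a file's apparent format fits the submission type."""
    allowed = ACCEPTED_FORMATS.get(submission_type.upper(), set())
    return guess_sequence_format(first_line) in allowed


print(is_acceptable("WGS", ">contig_1"))
print(is_acceptable("WGS", "@read_1"))
```

A check like this only catches obvious mix-ups, such as raw FASTQ reads supplied for an assembly submission; it does not validate the file contents.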