NCBI and its role in solving biological problems:
Established in 1988 in Bethesda, Maryland, USA.
Part of the US National Library of Medicine (NLM), which is part of the National Institutes of Health (NIH).
Sponsored by the US Government.
Acts as a warehouse of biological data.
Resources Available Under NCBI
Nucleotide
Repository for nucleotide sequences.
GenBank
Comprehensive database of DNA sequences.
Proteins
Repository for protein sequences.
GenPept
Translations of the coding sequences in GenBank records into proteins.
Gene
Detailed information on gene sequences and functions.
OMIM
Online Mendelian Inheritance in Man.
Database of human genes and genetic disorders.
PubMed
Literature database with about 36 million citations and abstracts.
Full text is not hosted in PubMed itself; records link to PMC or publisher sites, where access may be free or paid.
BLAST
Basic Local Alignment Search Tool.
Finds sequence similarities.
SRA
Sequence Read Archive.
Repository for raw sequence data.
Books
Free access to a variety of books, distinct from research papers.
BioSample
Metadata for biological samples.
Each sample receives a unique accession that links it to related records such as BioProject and SRA.
BioProject
Metadata for research projects.
Links related datasets and publications.
PMC
PubMed Central.
Free full-text archive; holds fewer articles than PubMed indexes.
PubChem
Compound: Database of unique chemical structures (downloadable as SDF files).
BioAssay: Database of chemical bioactivity data.
Substance: Depositor-submitted chemical information.
SNP
Single Nucleotide Polymorphism database.
Information on short genetic variations, some of which are linked to diseases and conditions.
Genome
Repository of complete genome sequences and annotations.
These resources provide comprehensive data and tools for research in molecular biology and related fields.
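The GenPept entry above describes protein sequences obtained by translating nucleotide coding sequences. A minimal sketch of that translation step is shown here, assuming the standard genetic code and a coding sequence that starts in frame. The function name `translate` and the example sequence are illustrative only, not part of any NCBI tool.

```python
# Standard genetic code in the compact ordering used by NCBI (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding DNA sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X = unrecognised codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```

Real GenPept records also depend on the annotated coding region and the genetic code table assigned to the organism, which this sketch does not handle.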
Organizational Structure of NCBI:
Computational Biology Branch (CBB)
Conducts research on computational and mathematical problems in molecular biology and genetics.
Focuses on genome analysis, sequence comparison, and predicting functions or structures.
Collaborates with biologists, chemists, mathematicians, and computer scientists from NIH and other organizations.
Information Engineering Branch (IEB)
Develops and manages NCBI’s software and data systems.
Ensures reliable access to NCBI databases and tools.
Works on improving data retrieval and computational resources.
Information Resources Branch (IRB)
Collects, organizes, and distributes biological data.
Maintains NCBI’s databases and information systems.
Supports data sharing and integration across various research projects.
Literature databases available under NCBI:
PubMed
A large database of citations and abstracts for scientific literature.
Contains around 36 million records.
Full text is reached through links to PubMed Central or publisher sites; some articles are free and others require payment.
Hosts both research and review papers.
PubMed Central
A free database with fewer articles than PubMed.
Provides full text access to research papers at no cost.
Focuses on making scientific knowledge freely available.
Bookshelf
A collection of free books and documents on life science and healthcare.
Different from research papers, these are comprehensive books with in-depth information.
MeSH Database
Medical Subject Headings (MeSH) is a controlled vocabulary used for indexing articles.
Helps in finding articles on specific topics by using standardized terms.
NCBI Handbook
A guide to NCBI resources and tools.
Provides detailed information on how to use various databases and tools effectively.
Useful for researchers who need to find their way around NCBI's resources.
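To illustrate how MeSH terms narrow a PubMed search, the sketch below builds a search URL for the NCBI E-utilities `esearch` endpoint. It only constructs the URL and does not send a request. The helper name `build_mesh_search_url` is hypothetical, and the heading "Sequence Alignment" is used purely as an example term.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_search_url(mesh_term: str, max_results: int = 20) -> str:
    """Build a PubMed search URL restricted to articles indexed with a MeSH term."""
    params = {
        "db": "pubmed",                         # search the PubMed database
        "term": f'"{mesh_term}"[MeSH Terms]',   # field tag limits the match to MeSH headings
        "retmax": max_results,                  # number of record IDs to return
        "retmode": "json",                      # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_mesh_search_url("Sequence Alignment"))
```

Because the term is a standardized heading, the search returns articles indexed under that concept even when their authors used different wording.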
Sequence Alignment
Sequence alignment is the process of comparing sequences of DNA, RNA, or protein to identify regions of similarity. By the number of sequences compared, there are two main types of sequence alignment:
Pairwise Alignment
Description: Aligns exactly two sequences.
Method: The two sequences are written across a page in two rows so that identical or similar characters fall in the same column. Non-identical characters in a column are mismatches, and gaps are inserted where one sequence has no counterpart in the other.
Purpose: To compare two sequences directly and find the regions of similarity and difference.
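The two-row layout described above can be sketched in a few lines. The example assumes the two rows are already aligned (equal length, with `-` marking a gap); the function name `annotate_alignment` and the sequences are made up for illustration.

```python
def annotate_alignment(row_a: str, row_b: str):
    """Return a marker line and column counts for two already-aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    marks = []
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            marks.append(" ")   # gap: one sequence has no character in this column
            counts["gap"] += 1
        elif a == b:
            marks.append("|")   # identical characters
            counts["match"] += 1
        else:
            marks.append(".")   # mismatch
            counts["mismatch"] += 1
    return "".join(marks), counts


row_a = "ACGTTGCA"
row_b = "ACG-TGTA"
marks, counts = annotate_alignment(row_a, row_b)
print(row_a)
print(marks)
print(row_b)
print(counts)  # {'match': 6, 'mismatch': 1, 'gap': 1}
```

Here 6 of the 8 columns are identical, so the two rows share 75% identity over the alignment.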
Multiple Sequence Alignment (MSA)
Description: Aligns more than two sequences simultaneously.
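Alignments are also classified by how much of each sequence they cover. Both types below apply to pairwise as well as multiple alignment: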
Local Alignment:
Focuses on finding the best matching regions between sequences.
Alignment stops at the end of these similar regions.
Example: Comparing two protein sequences to find a common active site region without extending beyond the local similarity.
Global Alignment:
Extends across the entire length of the sequences.
Aims to align as many matching characters as possible from the beginning to the end.
Example: Comparing two full-length gene sequences to identify overall similarity.
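The difference between local and global alignment can be seen by scoring the same pair of sequences both ways. The sketch below uses the classic dynamic-programming approach (Needleman-Wunsch for global, Smith-Waterman for local) with a simple linear gap penalty. The scoring values (+1 match, -1 mismatch, -2 gap) and the function name `alignment_scores` are assumptions chosen for illustration; real tools use substitution matrices and more elaborate gap costs.

```python
def alignment_scores(seq_a: str, seq_b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Return (global_score, local_score) for two sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    global_dp = [[0] * cols for _ in range(rows)]
    local_dp = [[0] * cols for _ in range(rows)]

    # Global alignment must pay for leading gaps; local alignment starts at 0.
    for i in range(1, rows):
        global_dp[i][0] = i * gap
    for j in range(1, cols):
        global_dp[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            global_dp[i][j] = max(
                global_dp[i - 1][j - 1] + s,   # align the two characters
                global_dp[i - 1][j] + gap,     # gap in seq_b
                global_dp[i][j - 1] + gap,     # gap in seq_a
            )
            local_dp[i][j] = max(
                0,                             # start a new local alignment here
                local_dp[i - 1][j - 1] + s,
                local_dp[i - 1][j] + gap,
                local_dp[i][j - 1] + gap,
            )
            best_local = max(best_local, local_dp[i][j])

    # Global score covers both sequences end to end; local score is the best region.
    return global_dp[-1][-1], best_local


print(alignment_scores("AAAACGTAAAA", "TTTTCGTTTTT"))  # (-5, 3)
```

The two sequences share only the short region `CGT`. The local score rewards that region alone, while the global score is dragged down by the mismatched flanks it is forced to align.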
Significance of Sequence Alignment
Functional Insights: Sequences that align well often have similar functions. For example, two proteins with high sequence similarity likely perform similar roles in the cell.
Structural Information: In proteins, similar sequences usually imply similar three-dimensional structures.
Evolutionary Relationships: Sequences that are similar may have originated from a common ancestor. This helps in understanding evolutionary connections and defining homologous sequences.
Optimal Alignment: Achieving the best possible alignment is crucial for accurate functional, structural, and evolutionary insights. Optimal alignments reveal the most meaningful biological relationships between sequences.
Sequence alignment is a fundamental tool in bioinformatics, essential for understanding the biological significance of genetic information.