Protein sequence databases are essential resources in bioinformatics and molecular biology research, providing valuable information about the sequences, functions, and structures of proteins.
Here's a brief overview of three major protein sequence databases:
SWISS-PROT:
- SWISS-PROT, now the reviewed section of the UniProt Knowledgebase (UniProtKB/Swiss-Prot), is a curated protein sequence database maintained by the Swiss Institute of Bioinformatics (SIB) together with its UniProt consortium partners, EMBL-EBI and PIR.
- It contains manually annotated and reviewed protein sequences, with a focus on providing accurate and comprehensive information about protein function, structure, and interactions.
- SWISS-PROT entries are extensively cross-referenced with other databases, such as PDB (Protein Data Bank) for structural information and PubMed for literature references.
- Each entry in SWISS-PROT includes information about protein names, functions, domains, post-translational modifications, subcellular locations, and biological pathways.
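As a rough illustration of how the cross-references and annotations described above are laid out, the sketch below reads a record in the Swiss-Prot style flat-file layout, where each line starts with a two-letter code (ID, AC, DE, RX, DR, SQ) and the record ends with "//". The record itself is invented and heavily abbreviated for this example (the accession, PDB code, and InterPro identifier are placeholders, not real entries), and the helper names parse_flat_record and cross_references are hypothetical. Real entries carry many more line types, so treat this as a minimal sketch rather than a complete parser.

```python
from collections import defaultdict

# Invented, heavily abbreviated record for illustration only; not a real entry.
EXAMPLE_RECORD = """\
ID   EXMP_HUMAN              Reviewed;          12 AA.
AC   X00000;
DE   RecName: Full=Example protein;
RX   PubMed=00000000;
DR   PDB; 0XXX; X-ray; 2.00 A; A=1-12.
DR   InterPro; IPR000000; Example_domain.
SQ   SEQUENCE   12 AA;  1300 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""


def parse_flat_record(text):
    """Group line contents by two-letter line code and extract the sequence."""
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines are indented and split into space-separated blocks.
            sequence_parts.append(line.replace(" ", ""))
            continue
        code, content = line[:2], line[5:].strip()
        fields[code].append(content)
        if code == "SQ":  # every following line holds sequence data
            in_sequence = True
    return dict(fields), "".join(sequence_parts)


def cross_references(fields):
    """Return (database, identifier) pairs taken from the DR lines."""
    refs = []
    for content in fields.get("DR", []):
        parts = [p.strip() for p in content.rstrip(".").split(";")]
        if len(parts) >= 2:
            refs.append((parts[0], parts[1]))
    return refs


fields, sequence = parse_flat_record(EXAMPLE_RECORD)
print(fields["AC"])              # accession line content
print(cross_references(fields))  # database cross-references
print(sequence)                  # sequence with spacing removed
```

For real work, an established parser (for example the one shipped with Biopython) is likely a better choice than a hand-written reader like this one.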
Protein Information Resource (PIR):
- PIR was established by the National Biomedical Research Foundation (NBRF). Its Protein Sequence Database (PIR-PSD) has since been integrated into the UniProt Knowledgebase (UniProtKB), and PIR is now a member of the UniProt consortium.
- It contains a large collection of protein sequences derived from various sources, including translations of nucleotide sequences from GenBank and other databases.
- PIR provides annotations for protein sequences, including information about protein names, functions, domains, and evolutionary relationships.
- The database is manually curated to ensure the accuracy and quality of its annotations, similar to SWISS-PROT.
GenPept:
- GenPept is a component of the NCBI (National Center for Biotechnology Information) protein database, which is part of the larger Entrez system.
- It contains protein sequences generated automatically by translating the annotated coding regions (CDS features) of nucleotide sequences submitted to GenBank, the primary nucleotide sequence database.
- Unlike SWISS-PROT and PIR entries, GenPept entries are not manually curated, but they provide a vast collection of protein sequences from a wide range of organisms.
- The database includes basic information about protein sequences, such as accession numbers, sequence lengths, and source organisms, but may lack the detailed annotations found in curated databases.
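The idea of deriving a protein sequence from a nucleotide record can be sketched in a few lines. The example below translates a coding region with the standard genetic code, given 1-based inclusive start and end coordinates. The nucleotide sequence, the coordinates, and the function name translate_cds are made up for illustration, and the sketch assumes a forward-strand, unspliced coding region. It does not reproduce how NCBI actually builds GenPept records, which also have to handle alternative genetic codes, reverse-strand features, and joined segments.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(nucleotides, start, end):
    """Translate the region start..end (1-based, inclusive) up to the first stop codon."""
    cds = nucleotides[start - 1:end].upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unrecognized codon
        if amino_acid == "*":  # stop codon ends the translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up nucleotide record: two flanking bases on each side of an 18-base coding region.
record = "CCATGAAAACCGCGTATTAAGG"
print(translate_cds(record, 3, 20))  # prints MKTAY
```

Because the translation is purely mechanical, the resulting entry carries only what can be read from the nucleotide record, which is why such entries tend to be sparser than curated ones.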
In summary, SWISS-PROT, PIR, and GenPept are important protein sequence databases that serve different purposes, ranging from manually curated annotations to large-scale collections of automatically generated sequences. Researchers use these databases to access protein sequence information for various applications, including functional annotation, evolutionary analysis, and structure prediction.