Databases and Tools

Bridging computational techniques with biological insights to advance scientific discovery.

SASTRA's In-House Databases & Tools

Name of the Database Description

The SWI/SNF Infobase offers comprehensive resources including Browse, Taxonomy browser, Search, Blast, and Download functionalities to explore BAF and PBAF sub-complexes across various organisms. Users can access detailed subunit information, search by multiple IDs or subunit names, and navigate gene or transcript data through the Taxonomy Browser. The BLAST feature allows protein sequence searches against the database and PDB, identifying related homologues. Additionally, gene and transcript lists for specific organisms are available for download, enhancing research and study of SWI/SNF complexes.
Active Motif Finder (AMF) is an algorithmic tool for locating motifs in DNA sequences while accounting for mutations. Currently available motif-finding tools are based on matching a given motif in the query sequence. AMF instead implements a new algorithm that identifies occurrences of a pattern that carry any kind of mutation: insertion, deletion, or mismatch. The algorithm is based mainly on computing an Alignment Score Matrix (ASM) by comparing the input motif with the full-length sequence. The tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface.
The WBSDB database is a comprehensive resource dedicated to understanding Wuchereria bancrofti, the primary cause of lymphatic filariasis (LF), which leads to severe health conditions. It focuses on the disease's impact, life cycle, transmission vectors, and the global prevalence of W. bancrofti. The database aims to support research and awareness on LF by offering detailed insights into the nematode's structure, infection mechanisms, and the socio-economic implications of the disease, emphasizing the crucial need for effective control and prevention strategies.
The APMP database catalogs medicinal plants along with structural and annotated data on their active constituents and the pharmacological, toxicological, and biochemical activities of those constituents. It features a specialized search engine and a relational database. Each plant entry is compiled from online databases and includes an APMP identifier, plant name, habitat, drug action, constituents, and dosage levels. Active principles are annotated with PubChem identifiers, compound names, molecular weights, and molecular descriptors, with the aim of bridging traditional knowledge and modern pharmacological research.
PMDB serves as an extensive repository of plant metabolites, featuring over 1000 metabolites with structural and functional annotations. Entries give the metabolite name, common names, description, SMILES notation, molecular formula, structure files (SDF, MOL, PDB), and 2D structure. PMDB includes tools for metabolite sketching and manipulation and links to external databases (KEGG, PubChem, ChEBI) for expanded information. It aims to centralize data on small molecules in plants and to make them more accessible to researchers through tools for visualizing and analyzing metabolite structures.
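The AMF entry above describes finding motif occurrences that tolerate insertions, deletions and mismatches by filling a score matrix between the motif and the full-length sequence. The sketch below illustrates that general idea with a standard edit-distance dynamic program for approximate matching. It is not AMF's published algorithm or its ASM scoring scheme; the function name, the unit cost per edit, and the `max_edits` threshold are assumptions made for illustration.

```python
def approximate_motif_hits(motif, sequence, max_edits):
    """Report where `motif` occurs in `sequence` with at most `max_edits` edits.

    Returns a list of (end_position, edits) pairs, where end_position is the
    1-based index of the last sequence base of the matching substring.
    Illustrative only: unit costs for mismatch, insertion and deletion.
    """
    m = len(motif)
    # prev[i] = fewest edits needed to align motif[:i] with some substring
    # ending at the previous sequence position (column 0: nothing consumed).
    prev = list(range(m + 1))
    hits = []
    for j, base in enumerate(sequence, start=1):
        curr = [0] * (m + 1)  # curr[0] stays 0 so a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == base else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or mismatch
                prev[i] + 1,         # extra base in the sequence (insertion)
                curr[i - 1] + 1,     # motif base absent from sequence (deletion)
            )
        if curr[m] <= max_edits:
            hits.append((j, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    print(approximate_motif_hits("ACGT", "TTACGTTT", 0))  # [(6, 0)]
    print(approximate_motif_hits("ACGT", "GGACCTGG", 1))
```

With `max_edits` set to 0 the function reduces to exact matching; raising it admits occurrences with more mutations, at the cost of more overlapping hits around each true site.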

Databases and Access Links

Name of the Database Description
NCBI is a source for public biomedical databases and has software tools for analyzing molecular and genomic data.
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.
The DNA Data Bank of Japan (DDBJ) provides freely available nucleotide sequence data.
ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
TreeFam (Tree families database) is a database of phylogenetic trees of animal genes. It provides information about ortholog and paralog assignments and evolutionary history of gene families.
The Protein Data Bank (PDB) is a database of three-dimensional structural data for large biological molecules, such as proteins and nucleic acids. The data are typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cryo-electron microscopy. The PDB is a key resource in areas of structural biology, such as structural genomics.
The CATH database provides hierarchical classification of protein domains based on their folding patterns. Domains are obtained from protein structures deposited in the Protein Data Bank and both domain identification and subsequent classification use manual as well as automated procedures.
The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationships between proteins. The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections.
Expasy is the bioinformatics resource portal of the SIB Swiss Institute of Bioinformatics. It provides access to databases and tools covering genomes, proteomes, evolution and structural biology.
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances.
PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them.
The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.
Ensembl is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them.
InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterize them.
STRING is a database of known and predicted protein-protein interactions.
TrEMBL is a computer-annotated protein sequence database supplementing the SWISS-PROT Protein Sequence Data Bank.
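The PROSITE entry above mentions amino acid patterns. As a rough illustration of how such a pattern can be applied to a sequence, the sketch below converts PROSITE pattern syntax into a Python regular expression. The function name `prosite_to_regex` is hypothetical, and the conversion covers only the common syntax elements (`x`, `[..]`, `{..}`, repeat counts, and `<`/`>` anchors at the ends of the pattern); it is not an official PROSITE tool.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles: 'x' (any residue), '[ABC]' (any of), '{ABC}' (none of),
    '(n)' and '(n,m)' repeats, and '<' / '>' terminal anchors.
    Illustrative sketch; rarer forms such as '>' inside brackets are not handled.
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:                  # e.g. x(2,4) -> .{2,4}
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element           # single residue or [ABC] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    print(re.search(regex, "AANGSAA") is not None)
```

The example pattern `N-{P}-[ST]-{P}` is the N-glycosylation site motif; the resulting regular expression can be used with `re.search` or `re.finditer` to scan a protein sequence given as a plain string of one-letter codes.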

Tools and Access Links

Name of the Tool Description
BLAST finds regions of similarity between biological sequences.
Clustal is a series of widely used computer programs for multiple sequence alignment (MSA) in bioinformatics. In many cases, the input set of query sequences is assumed to have an evolutionary relationship, sharing a lineage and descending from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins.
SWISS-MODEL is a structural bioinformatics web-server dedicated to homology modeling of protein 3D structures.
SAVES is a collection of protein structure verification tools. The input structure should be in PDB format.
EMBOSS, the European Molecular Biology Open Software Suite, is a collection of tools for sequence analysis.
MUSCLE stands for MUltiple Sequence Comparison by Log Expectation. It is used for multiple sequence alignment of protein and nucleotide sequences.
JPred is a Protein Secondary Structure Prediction server and has been in operation since approximately 1998.
The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery.
PyMOL is a molecular visualization system used to view and investigate atoms and molecules.
Avogadro is a molecular editor used in computational chemistry and molecular modelling to investigate and alter molecules and atoms.
Galaxy is a next-generation scientific workflow, data integration, and data analysis and publishing platform that aims to make computational biology accessible to research scientists.
GROMACS is a molecular dynamics package designed for simulations of proteins, lipids and nucleic acids.
The Crystallographic Object-Oriented Toolkit (Coot) is used to display and manipulate atomic models of macromolecules such as proteins and nucleic acids using 3D computer graphics.