MalaCards is an integrated searchable database of human maladies and their annotations, modeled on the architecture and richness of the popular GeneCards database of human genes.
What's in a MalaCard?
This page provides information about the various MalaCards sections and tables.
MalaCards Disease List
An offline process is responsible for generating the comprehensive integrated list of diseases by mining heterogeneous, partially overlapping sources (see below for list of sources), unifying names and acronyms, and organizing characterizations.
Disease name unification is effected by transforming each name to a canonical form. The canonical form is constructed by lowercasing, lexical sorting of words, removing special characters and common words like disease, syndrome, as well as merging equivalent words (like juvenile and childhood). This canonical form is then hashed and used for comparison against transformed new names.
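The canonicalization steps described above can be sketched in Python as follows. This is an illustration only, not the MalaCards code: the word lists, the choice of SHA-1 as the hash, and the function names `canonical_form` and `canonical_hash` are assumptions made for the example.

```python
import hashlib
import re

# Hypothetical word lists; the actual lists used by MalaCards are not given here.
COMMON_WORDS = {"disease", "syndrome"}
EQUIVALENTS = {"childhood": "juvenile"}  # merge equivalent words into one form


def canonical_form(name: str) -> str:
    """Lowercase, strip special characters, merge equivalent words,
    drop common words, and sort the remaining words lexically."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    words = [EQUIVALENTS.get(w, w) for w in cleaned.split()]
    words = [w for w in words if w not in COMMON_WORDS]
    return " ".join(sorted(words))


def canonical_hash(name: str) -> str:
    """Hash of the canonical form, used here as the comparison key."""
    return hashlib.sha1(canonical_form(name).encode("utf-8")).hexdigest()
```

Under these assumptions, 'Juvenile Arthritis' and 'arthritis, childhood' reduce to the same canonical form and therefore to the same hash.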
For each malady a unique symbol is generated, composed of the first letter of its name, followed by the next two consonants, followed by a serial number. For example, the symbol generated for rett syndrome is RTT001.
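A minimal sketch of the symbol rule, assuming that 'Y' counts as a consonant and that the serial number is zero-padded to three digits (as in RTT001). The function name `make_symbol` is hypothetical.

```python
VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def make_symbol(name: str, serial: int) -> str:
    """First letter of the name, the next two consonants, then a serial number."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("name contains no letters")
    consonants = [ch for ch in letters[1:] if ch not in VOWELS]
    return letters[0] + "".join(consonants[:2]) + f"{serial:03d}"


print(make_symbol("rett syndrome", 1))  # RTT001
```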
This section provides the malady name, symbol, MIFTS score and acronym (where available).
A stats bar provides some statistics related to the information shown in the card.
Between the disease name and the stats bar, the MalaCards category association of the disease is displayed, with a link to the Aliases & Classifications section, where all disease categories are displayed.
MalaCards employs four different annotation schemes, as follows:
- Source mining - Mining data sources for disease-specific information is used to populate relevant sections of a MalaCard. To this end we define two types of sources. Primary sources are those which are used to derive both disease names and annotations. Secondary sources are those from which only annotations are derived; they generally contain non-disease terms intermixed with disease information. Direct source mining provides information for the aliases & descriptions, summaries, clinical features, drugs and therapeutics, genetic tests and anatomical context sections. When appropriate, in-house analysis is performed in order to link annotations to diseases, or to integrate and display disease-specific data. For example, we have developed a process that utilizes UMLS concepts to map diseases to the drugs used for their treatment (see 'MalaCards sections').
- GeneCards search - One central annotation source for MalaCards is an automated use of the GeneCards search engine, including section-specific advanced searches. For example, the gene set affiliated with a disease is obtained by using the disease name as a search string, which allows the generation of the related genes section in MalaCards. Importantly, gene association does not imply causality between the gene and the disease. Associations sometimes include annotations like 'unaffected', and this can be verified using the 'GeneCards section context' link. Similarly, the publications associated with a disease are obtained via a search for its name in all of the publication titles within GeneCards.
- GeneAnalytics set analysis - MalaCards implements a strategy in which gene-disease relationships within GeneCards are used to create disease-specific content. For this, we leverage GeneCards' GeneAnalytics tool. The disease-associated gene set (generated as described above) is forwarded to GeneAnalytics, which distills statistically significant descriptors enriched in this set. For example, in the 'Atherosclerosis' MalaCard, 'cardiovascular system' is thus entered into the phenotypes section, and 'apoptosis' into the pathways section. This process also assigns a relevance score to every hit, and is employed to populate the related diseases, phenotypes, pathways, compounds and GO terms sections. In these sections, the relevant tables display the top affiliating genes, linked to their respective contexts within GeneCards.
- MalaCards search - We use MalaCards searches to populate additional sections, including elucidating new relations amongst diseases in the related diseases section and associating tissues in the anatomical context section.
MalaCards InFormaTion Score (MIFTS). Assigned to each disease by summing the base 10 logarithms of the counts of its populated annotations. MIFTS defines the richness of information in each card. This score currently ranges from 1 to 101, with the MalaCard scored 101 being the most annotated card.
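The summation can be illustrated with a short sketch. The annotation names and counts below are made up, and the function name `mifts_raw` is hypothetical. The text does not say how the sum relates to the displayed 1 to 101 range, so the sketch returns the raw sum only.

```python
import math


def mifts_raw(annotation_counts):
    """Sum of base 10 logarithms of the counts of populated annotations.

    Unpopulated annotations (count 0) are skipped, since log10(0) is undefined.
    """
    return sum(math.log10(n) for n in annotation_counts.values() if n > 0)


# Made-up counts for illustration only.
counts = {"genes": 100, "publications": 1000, "drugs": 0}
print(mifts_raw(counts))
```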
MalaCards search relevance score (MSRS). Obtained by the Solr-based MalaCards search engine, as described in the MalaCards search guide.
MalaCards composite relevance score (MCRS). Assigned to descriptors provided by the GeneAnalytics set analysis mechanism.
The score is defined as:
where: SGD is the rank of the GeneAnalytics score, which orders descriptors first by their GeneAnalytics p-value and then by the size of the group of genes associated with the descriptor; SLR(i) is the rank of the Solr search engine score of a gene shared between the descriptor and the disease; and Ns is the number of data sources supporting the descriptor. Thus, the score takes into account the importance of the hit in GeneCards, the significance of the specific attribute according to GeneAnalytics, and the number of supporting sources.
MalaCards composite related genes score (MCRGS). This score originates from the Solr-based GeneCards search engine score, obtained by querying the disease in GeneCards. This part of the score takes into account the number of hits, and the importance of the fields in which they were found. Next, a composite score is generated by taking into account genes that are associated with a genetic test for the disease, genes that are associated with a known causative variation with respect to the disease, as well as genes that were manually mapped to the disease by other sources.
MalaCards composite related diseases score (MCRDS). Assigned to entries in the related diseases section. It is computed as the sum of 1) MalaCards composite relevance score and 2) MalaCards search relevance score. Prior to this, each of these two score values is normalized by equating the means as well as the standard deviations for the two distributions across all of MalaCards. A bonus amounting to the average of the two scores is added to diseases coming from both GeneAnalytics set analysis and MalaCards search.
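The combination described for MCRDS can be sketched as follows. Several details are assumptions, since the text does not fix them: the search scores are linearly rescaled to the mean and standard deviation of the composite relevance scores (one way of equating the two distributions), a disease missing from one route contributes 0 for that route, and the statistics are taken over the dictionaries passed in rather than across all of MalaCards. The function names are hypothetical.

```python
from statistics import mean, pstdev


def rescale_to(scores, target_mean, target_sd):
    """Linearly rescale scores so their mean and standard deviation
    match the targets (one possible way to equate two distributions)."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {k: float(target_mean) for k in scores}
    return {k: (v - mu) / sd * target_sd + target_mean for k, v in scores.items()}


def related_disease_scores(mcrs, msrs):
    """mcrs, msrs: non-empty dicts mapping disease name -> score from each route."""
    mcrs_values = list(mcrs.values())
    msrs_scaled = rescale_to(msrs, mean(mcrs_values), pstdev(mcrs_values))
    combined = {}
    for disease in set(mcrs) | set(msrs_scaled):
        a = mcrs.get(disease)
        b = msrs_scaled.get(disease)
        total = (a or 0.0) + (b or 0.0)  # assumption: a missing score counts as 0
        if a is not None and b is not None:
            total += (a + b) / 2  # bonus: average of the two scores
        combined[disease] = total
    return combined
```

With made-up inputs `{"A": 10.0, "B": 20.0}` and `{"A": 1.0, "C": 3.0}`, disease A is found by both routes and receives the bonus, so it ranks above B and C.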
In order to start adding structure and hierarchy into MalaCards, disease types were clustered into families. ~3000 diseases are currently distributed amongst ~700 families. This grouping was done on a lexical basis, looking for names having the same base but distinguished by their type. For example, a search for 'Alzheimer' retrieves over 1500 MalaCards ranked by a score. By perusing the list, we observe that the top 19 hits actually describe 2 types of Alzheimer disease, the general type and the familial type. The disease in each 'family' having the highest MIFTS score is designated to be the 'parent'. In the case of a tie, priority is given to the disease associated with the highest number of sources, and then to the one with the shortest name. The parent/child attribute is denoted by 'P' or 'c' respectively and appears in search results, as well as at the top of the related diseases section of relevant cards.
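The parent selection rule (highest MIFTS, then most sources, then shortest name) maps naturally onto a composite sort key. The sketch below uses a hypothetical `Disease` record and made-up values.

```python
from dataclasses import dataclass


@dataclass
class Disease:
    name: str
    mifts: float
    n_sources: int


def pick_parent(family):
    """Highest MIFTS wins; ties go to the most sources, then the shortest name."""
    return max(family, key=lambda d: (d.mifts, d.n_sources, -len(d.name)))


# Made-up family for illustration only.
family = [
    Disease("example disease, familial", 80, 10),
    Disease("example disease", 80, 10),
    Disease("example disease 2", 60, 12),
]
print(pick_parent(family).name)  # example disease
```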
A user can download the card data to a parsable Excel sheet using the 'Download this MalaCard' button on the stats bar within the header. This download requires registration to GeneCards Plus. Data for scientific collaborations can also be requested by filling out the Feedback Form.
The "Fully expand this MalaCard" button is found on the right-hand side of the stats bar. This button fully expands all lists in all sections to allow convenient searches within the card.
This section displays descriptions of a disease, as extracted from a subset of the sources listed below, as well as a MalaCards unique summary describing the card content. Other summaries typically include a short definition of the disease, organs involved, etiology and main symptoms.
Aliases & Classifications
This section includes the following subsections:
- Aliases & Descriptions - displays synonyms and aliases for the relevant MalaCards malady, as extracted from a subset of the sources listed below. Strongly similar aliases, even if trivially different, are included, to match common expectations and to facilitate searches. The disease name appears first, with its own associated source-indicating superscripts. The alias list is sorted first by the count of contributing sources, sub-sorted by descending length. The main malady name is shown in bold.
- Characteristics - taken from HPO and Orphanet, including for each mapped corresponding disease the mortality, age of onset, age of death and mode of inheritance, where available.
- Classifications - displays a unified classifications tree for the disease, united from different mapping sources. MalaCards classifications are generated by mapping to accepted classification sources (e.g. DO, Orphanet, ICD10, etc.) as well as by mining specialized keywords in disease names and descriptions. Every disease in the disease family gets all the categories of all other members in its family (i.e. from the categories of the parent).
- External IDs - If available, external IDs are also displayed; these are cross-references to entries in external databases/ontologies. The external IDs are searchable.
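The alias ordering described in the Aliases & Descriptions subsection can be sketched as a two-level sort. The sketch assumes that aliases with more contributing sources come first, which the text implies but does not state. The function name and the sample data are hypothetical.

```python
def sort_aliases(aliases):
    """aliases: dict mapping alias text -> set of contributing source names.

    Sort by number of sources (most first, an assumption), then by
    descending alias length.
    """
    return sorted(aliases, key=lambda a: (-len(aliases[a]), -len(a)))


# Made-up aliases and source labels for illustration only.
example = {
    "rett disorder": {"source1"},
    "RTT": {"source1", "source2"},
    "rett's syndrome": {"source1"},
}
print(sort_aliases(example))  # ['RTT', "rett's syndrome", 'rett disorder']
```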
The top of the section displays the disease family classification, if available.
Related diseases are obtained in two ways: First, by GeneAnalytics set analysis, whereby other diseases computed to have significant shared descriptors for the target disease's affiliated genes are collected. Second, as matched by MalaCards searches. All obtained related diseases are sorted by the MalaCards composite relevance score.
Network images are generated using the Gephi toolkit. Images are generated for the top 20 scored related diseases. Each related disease is a node, while edges represent a connection to the target disease, as well as the top 20 interconnections between the related diseases themselves, where available. Images are not generated for diseases having fewer than 5 connections. Edge thickness is proportional to the connection strength. The links from the current MalaCard's disease are shown with a red filled circle and red edges.
Another form of disease-disease connection shown in the Related Diseases section is comorbidity. A set of disease comorbidity relationships (with P < 0.01) was obtained from the Phenotypic Disease Network (PDN). These diseases are identified with ICD9 codes in the PDN. The ICD9 codes were used to identify the corresponding UMLS concepts, which were checked against the list of UMLS concepts mapped to maladies. A total of 4989 relationships modeling comorbidity among 741 unique MalaCards diseases were mapped.
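The comorbidity mapping chain (ICD9 code to UMLS concept to malady) can be sketched as two dictionary lookups applied to each PDN pair. The data structures, the function name, and the placeholder identifiers below are assumptions for illustration; they are not real ICD9 codes or UMLS concepts.

```python
def map_comorbidities(pdn_pairs, icd9_to_umls, umls_to_malady, p_cutoff=0.01):
    """Return unordered malady pairs for PDN relationships with P < p_cutoff.

    pdn_pairs: iterable of (icd9_a, icd9_b, p_value)
    icd9_to_umls: dict mapping an ICD9 code -> list of UMLS concept IDs
    umls_to_malady: dict mapping a UMLS concept ID -> malady name
    """
    def maladies_for(icd9):
        concepts = icd9_to_umls.get(icd9, ())
        return {umls_to_malady[c] for c in concepts if c in umls_to_malady}

    mapped = set()
    for icd9_a, icd9_b, p_value in pdn_pairs:
        if p_value >= p_cutoff:
            continue
        for a in maladies_for(icd9_a):
            for b in maladies_for(icd9_b):
                if a != b:
                    mapped.add(frozenset((a, b)))
    return mapped


# Placeholder identifiers for illustration only.
pairs = [("ICD9_A", "ICD9_B", 0.001), ("ICD9_A", "ICD9_C", 0.2)]
icd9_to_umls = {"ICD9_A": ["CUI_1"], "ICD9_B": ["CUI_2"], "ICD9_C": ["CUI_3"]}
umls_to_malady = {"CUI_1": "malady one", "CUI_2": "malady two", "CUI_3": "malady three"}
print(map_comorbidities(pairs, icd9_to_umls, umls_to_malady))
```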
Provides information and links about symptoms and other clinical attributes of the disease, extracted from a subset of the sources listed below. Symptoms typically represent changes from normal function, sensation, or appearance, but may also be other MalaCards maladies with their own cards.
The Human Phenotype Ontology (HPO) table shows the phenotype description, the frequency (with the percentage of patients in brackets) and the HPO source accession.
A circled "PS" icon next to a symptom marks a "Pathognomonic sign", used to diagnose the disease.
Drugs & Therapeutics
This section provides information regarding:
- FDA approved drugs - manually curated data from various sources including FDA and NCI. For each drug multiple data elements are displayed where available. In the main table the data includes the drug name, active ingredient(s), pharmaceutical company and approval date. Clicking the "+" sign opens a pop-in which includes more data on the drug, including the FDA label, maladies treated, indication and usage, drug target(s) from DrugBank and mechanism of action.
- Drugs for the disease (from DrugBank, HMDB, DGIdb, PharmGKB, IUPHAR, NovoSeek, BitterDB) - diseases and drugs are mapped to interventional clinical trials in ClinicalTrials.gov, and drugs sharing a clinical trial with the disease are shown. Clicking the "+" sign opens a pop-in which includes the drug synonyms. Drug metadata is drawn from the specified sources using the GeneCards integrated drugs database. The table includes the drug name, the drug approval status, the drug phases from the different trials it is involved in, a link to a search in ClinicalTrials.gov for both the disease and the drug, the CAS number and the PubChem ID.
- Interventional clinical trials - the subsection displays all clinical trial records from ClinicalTrials.gov mapped to the specific disease. The table includes the title of the clinical trial, its status, its ID (linked to ClinicalTrials.gov) and its phase.
- Inferred drug relations via UMLS/NDF-RT - Combined information from the Unified Medical Language System (UMLS) and the National Drug File-Reference Terminology (NDF-RT). Initially, a MalaCards name is mapped to a UMLS concept representing a disease by utilizing the MetaMap system. Subsequently, the NDF-RT terminology within UMLS is used to link such disease concepts to drug(s) via the 'may be treated by' relationship. This work was done in collaboration with C. Paul Morrey.
- Cell-based therapeutics approaches from Lifemap Discovery:
- Stem-cell-based therapeutic approaches
- Embryonic/adult cultured cells which are candidate therapeutic approaches
- Cochrane evidence-based reviews: A link to a MeSH lookup in the Cochrane Library of evidence-based medicine, which displays hits in a collection of six databases that contain different types of high-quality, independent evidence to inform healthcare decision-making, and a seventh database that provides information about Cochrane groups.
This section provides descriptions of genetic testing, specialized cytogenetic testing, and biochemical testing for inherited disorders. Genetic tests are extracted from a subset of the sources listed below.
This section provides descriptions of cells, compartments, and organs relevant to the disease. Anatomical context data is extracted from a subset of the sources listed below. The MalaCards organs/tissues related to the disease are obtained by using the MalaCards search mechanism on a set of predefined tissues. Foundational Model of Anatomy (FMA) ontology data interconnections are extracted via the Disease Ontology.
This section provides mouse ortholog phenotypes which are obtained by being contextually related to the key disease using the GeneAnalytics mechanism described above, applied to the set of affiliated genes. Phenotypes are scored according to their relevance (see above).
This section provides publications associated with the disease, currently obtained by searching all of the publications in the GeneCards database. For each publication, the title and link to the PubMed article is supplied.
The articles are ranked, first according to the number of sources that associate the article with the disease-related genes, then by date of publication, and then according to the individual source scores for article/gene relationships.
This section provides the list of the affiliated genes found to be associated with the key disease. Initially, at least 10 affiliated genes are shown (all of the elite genes are always shown), with an option to see the complete list. The gene list is composed by taking into account: 1. The GeneCards search mechanism. 2. Genetic testing resources supplying specific genetic tests for the disease. 3. Genetic variation resources supplying specific causative variations in genes for the disease. 4. Resources that manually curate the association of the disease with genes. A prioritizing algorithm is applied to generate the gene list. The table shows gene symbols, descriptions, relevance scores, and the context according to which the gene is related to the disease. The GeneCards section context is the section context of the search hit. The relevance score is computed by factoring in the importance of the different resources associating the gene with the disease (see above).
MalaCards "elite" genes (marked with *) are those likely to be associated with causing the disease, since their gene-disease associations are supported by manually curated and trustworthy sources.
The cancer Gene Census list from COSMIC is an ongoing effort to catalogue those genes for which mutations have been causally implicated in cancer. Genes listed in the cancer census gene list are marked with a CC icon.
This section displays known causative variations for the disease, extracted from a subset of the sources listed below.
If available, the table of UniProtKB/Swiss-Prot genetic disease variations shows the gene symbol, AA change, link to the variation details and source, and SNP ID where available.
If available, the table of ClinVar genetic disease variations shows the gene symbol, variation name linked to ClinVar, variation type, significance, SNP ID where available, genome assembly and location.
Catalogue of somatic mutations in cancer (COSMIC)
The COSMIC disease classification is defined by the terms Primary site, Site subtype, Primary histology and Histology subtype.
COSMIC classification terms were matched to MalaCards (MC) diseases by searching for the terms in the following MC sections: Genes, Aliases & Classifications, and Summaries. Non-specific terms such as "mixed" and "NS" were excluded.
The matching confidence was set from 4 to 1, and was calculated in the following manner:
- 4 - all terms of a specific COSMIC classification present in the MC disease title
- 3 - all terms of a specific COSMIC classification present in the MC disease card (but not title)
- 2 - one term out of 4 or 3 specific COSMIC classification terms was omitted and all the remaining terms present in the MC disease title
- 1 - one term out of 4 or 3 specific COSMIC classification terms was omitted and all the remaining terms present in the MC disease card (but not title)
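The four confidence levels above can be sketched as a cascade of checks. The sketch makes several assumptions that the text does not fix: a term counts as present when it occurs as a case-insensitive substring, the card text passed in includes the title, the non-specific list contains only the two examples named above, and a return value of 0 stands for no match. All function names are hypothetical.

```python
NON_SPECIFIC = {"mixed", "ns"}  # examples from the text; the full list is not given


def contains_all(terms, text):
    """True if every term occurs in text (case-insensitive substring match)."""
    text = text.lower()
    return all(t.lower() in text for t in terms)


def all_but_one(terms, text):
    """True if dropping exactly one term leaves terms that all occur in text."""
    return any(contains_all(terms[:i] + terms[i + 1:], text) for i in range(len(terms)))


def cosmic_match_confidence(terms, title, card_text):
    """Return 4, 3, 2 or 1 following the list above, or 0 for no match."""
    specific = [t for t in terms if t.lower() not in NON_SPECIFIC]
    if not specific:
        return 0
    if contains_all(specific, title):
        return 4
    if contains_all(specific, card_text):
        return 3
    if len(specific) in (3, 4):
        if all_but_one(specific, title):
            return 2
        if all_but_one(specific, card_text):
            return 1
    return 0
```

Because the title is checked before the card text at each level, the "(but not title)" condition of levels 3 and 1 is satisfied implicitly.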
The variation score shown is a summation of the number of hits of each of the tags.
If available, the table of CNVD (Copy Number Variation in Disease) lists the variations related to this disease. The table shows the CNVD ID, Chromosome, Start, End, Type, Gene Symbol and mapped CNVD Disease.
LifeMap gene expression information is currently available for over 100 human diseases. The data includes the most differentially expressed genes (p-value threshold of 0.05, corrected for multiple testing) in the diseased tissue vs. its matched normal tissue and is derived from one of the following sources:
- Experiments published in the gene expression omnibus (GEO) and/or scientific literature.
- Large scale datasets: gene expression information from online public large scale datasets, such as MGI or Barcode
The experiments are described in detail in LifeMap Discovery.
The table shows the gene symbol (linked to GeneCards), gene description, tested tissue (linked to LifeMap Discovery), up or down regulation indicator (+ meaning up, - meaning down), relative fold change (log2) and the respective P value. Expression data is extracted from scientific literature, high throughput experiments and public large scale datasets.
Below the expression table, a link for search in the Gene Expression Omnibus (GEO) for disease gene expression data relevant to the disease is displayed.
This section provides pathways related to the disease, obtained by being contextually related to the key disease using the GeneAnalytics mechanism described above, applied to the set of affiliated genes. The pathways are extracted from a subset of the sources listed below. Entries are scored according to their relevance (see above). The table displays the super pathways, with their member pathways indented.
This section provides relationships between MalaCards diseases and chemical compounds, obtained by being contextually related to the key disease using the GeneAnalytics mechanism described above, applied to the set of affiliated genes. Drugs and compounds are extracted from a subset of the sources listed below. Entries are scored according to their relevance (see above).
This section provides cellular component ontologies, biological process ontologies and molecular function ontologies enriched in the set of genes affiliated with the disease. The table displays the name of the relevant ontology, the GO ID (the identifier used by GO, linked to the GO entry), and the genes related to both the disease and the specific ontology according to the GeneAnalytics mechanism described above. The entries are scored according to their relevance (see above).
This section provides links to all of the following MalaCards sources, including those obtained via GeneCards:
- BitterDB - BitterDB currently holds over 570 bitter compounds obtained from the literature and from Merck index and their associated 25 human bitter taste receptors (hT2Rs).
- CDC - The Centers for Disease Control and Prevention contains information about various diseases and health conditions.
- Cell Signaling Technology - CST (Cell Signaling Technology) provides discovery tools for cell signaling research, including information about pathways and phosphorylation sites.
- ClinicalTrials - ClinicalTrials.gov is a registry and results database of federally and privately supported clinical trials conducted in the United States and around the world.
- ClinVar - ClinVar archives and aggregates information about relationships among variation and human health.
- CNVD - CNVD (Copy Number Variation in Disease) is a systematic and comprehensive database for copy number variations and related diseases, in which all the records were manually extracted from experimental data published in CNV-related articles. Hence, the CNVD database is a reliable and comprehensive resource for studying disease-associated copy number variations.
- Cochrane Library - The Cochrane Library (ISSN 1465-1858) is a collection of six databases that contain different types of high-quality, independent evidence to inform healthcare decision-making, and a seventh database that provides information about Cochrane groups. Cochrane's mission is to promote evidence-informed health decision-making by producing high-quality, relevant, accessible systematic reviews and other synthesized research evidence.
- COSMIC - Catalogue of Somatic Mutations in Cancer.
- dbSNP - The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI).
- DGIdb - The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development.
- Disease Ontology - Disease Ontology provides a hierarchical open source ontology for the integration of biomedical data that is associated with human disease.
- diseasecard - Diseasecard is an information retrieval tool for accessing and integrating genetic and medical information for health applications.
- DISEASES - The University of Copenhagen DISEASES database provides disease-gene associations mined from literature.
- DrugBank - The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information.
- ExPASy - ExPASy is the SIB Bioinformatics Resource Portal which provides access to scientific databases and software tools (i.e., resources) in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, population genetics, transcriptomics etc.
- FDA - The Food and Drug Administration (FDA or USFDA) is a federal agency of the United States Department of Health and Human Services. The FDA is responsible for protecting and promoting public health through the regulation and supervision of food safety, tobacco products, dietary supplements, prescription and over-the-counter pharmaceutical drugs (medications), vaccines, biopharmaceuticals, blood transfusions, medical devices, electromagnetic radiation emitting devices (ERED), cosmetics, animal foods & feed and veterinary products.
- FMA - The Foundational Model of Anatomy Ontology (FMA) is an evolving computer-based knowledge source for biomedical informatics; it is concerned with the representation of classes or types and relationships necessary for the symbolic representation of the phenotypic structure of the human body.
- Gene Expression Omnibus DataSets - Gene Expression Omnibus (GEO) is a database repository of high throughput gene expression data and hybridization arrays, chips, microarrays.
- Gene Ontology - The Gene Ontology is a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing.
- GeneAnalytics - GeneAnalytics enables researchers to identify tissues and cell types related to their gene sets, to characterize tissue samples and cultured cells and assess their purity and explore their selective markers.
- GeneCards - GeneCards provides gene-centric information, automatically mined and integrated from a myriad of data sources, resulting in a web-based card for each of the tens of thousands of human gene entries.
- GeneReviews - GeneReviews are expert-authored, peer-reviewed, current disease descriptions that apply genetic testing to the diagnosis, management, and genetic counseling of patients and families with specific inherited conditions.
- GeneTests - GeneTests is a clinical information resource relating genetic testing to the diagnosis, management, and genetic counseling of individuals and families with specific inherited disorders.
- Genetics Home Reference - Genetics Home Reference provides consumer-friendly information about the effects of genetic variations on human health.
- GTR - The Genetic Testing Registry (GTR) provides a central location for voluntary submission of genetic test information by providers. The scope includes the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials.
- HGMD - Database of human gene mutation data. Features publications, newly added genes, and locus specific databases.
- HMDB - HMDB is an electronic database containing detailed information about small molecule metabolites found in the human body.
- ICD10 - A system of categories to which morbid entities are assigned according to established criteria. The purpose of the ICD is to permit the systematic recording, analysis, interpretation and comparison of mortality and morbidity data collected in different countries or areas and at different times.
- ICD10 via Orphanet - mapping between diseases and ICD10, as provided by Orphanet.
- ICD9CM - The International Classification of Diseases, 9th Revision, Clinical Modification. ICD-9-CM is the official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States.
- IUPHAR - The International Union of Basic and Clinical Pharmacology (IUPHAR) database provides detailed, peer-reviewed pharmacological, functional and pathophysiological information on human, mouse and rat G Protein-Coupled Receptors, Voltage- and Ligand-Gated Ion Channels, Nuclear Hormone Receptors and selected Enzymes, including all Protein Kinases.
- KEGG - Kyoto Encyclopedia of Genes and Genomes (KEGG) provides pathway information.
- LifeMap Discovery® - LifeMap Discovery® is a state-of-the-art platform for embryonic development and stem cell biology research.
- MalaCards - MalaCards is an integrated database of human maladies and their annotations, modeled on the architecture and richness of the popular GeneCards database of human genes. MalaCards is searched for annotations to find for example related diseases (that mention each other in their cards), and related organs.
- MedGen - Organizes information related to human medical genetics, such as attributes of conditions with a genetic contribution.
- MedlinePlus - MedlinePlus contains health information from the National Library of Medicine.
- MeSH - MeSH is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.
- MeSH via Orphanet - mapping between diseases and MeSH, as provided by Orphanet.
- MGI - MGI (Mouse Genome Informatics, formerly MGD) provides a comprehensive source of information on the experimental genetics of the laboratory mouse; it includes information on mouse markers, mammalian homologies, probes and clones. GeneCards presents links to mammalian homology pages, the name of the mouse gene, its location (in centiMorgan), phenotypic alleles, and links to the entries for the mouse gene.
- NCBI BioSystems Database - The NCBI BioSystems Database provides integrated access to biological systems and their component genes, proteins, and small molecules, as well as literature describing those biosystems and other related data throughout Entrez.
- NCBI Bookshelf - NCBI Bookshelf contains a collection of biomedical textbooks.
- NCI - The National Cancer Institute (NCI) is the federal government's principal agency for cancer research and training.
- NCIt - NCI Thesaurus (NCIt) provides reference terminology for many NCI and other systems. It covers vocabulary for clinical care, translational and basic research, and public information and administrative activities.
- NDF-RT - NDF-RT organizes the drug list into a formal representation. NDF-RT is used for modeling drug characteristics including ingredients, chemical structure, dose form, physiologic effect, mechanism of action, pharmacokinetics, and related diseases.
- NIH Clinical Center - The center for clinical research of the NIH includes general and patient information and organization resources.
- NIH Rare Diseases - The Office of Rare Diseases Research (ORDR) at the National Institutes of Health (NIH) coordinates research and information on rare diseases.
- NINDS - The National Institute of Neurological Disorders and Stroke (NINDS) conducts and supports research on brain and nervous system disorders.
- Novoseek - Novoseek extracted knowledge from biological databases and text repositories, enabling users to uncover the knowledge hidden within these data sources. The relevance scores of elements related to genes (chemical substances and diseases) are based on the analysis of co-occurrences of two elements in Medline documents. The observed number of documents where both elements appear together and the number of documents where both appear independently are compared to an expected value based on a hypergeometric distribution. The Novoseek project is no longer accessible on the web, and is available upon request. MalaCards Novoseek data is based on GeneCards data from Novoseek from 2011.
- Novus Biologicals - The mission of Novus Biologicals, LLC is to accelerate scientific discovery by developing and marketing unique products for the life sciences.
- OMIM - OMIM (Online Mendelian Inheritance in Man) is a catalog of human genes and genetic disorders with a lot of information about many different aspects (medical and genetic). GeneCards presents a list of diseases listed as allelic variants in the respective entry for the gene, synonyms for the gene, and a link to the OMIM database entry.
- OMIM via Orphanet - mapping between diseases and OMIM, as provided by Orphanet.
- Orphanet - Orphanet is the reference portal for information on rare diseases and orphan drugs, for all audiences. Orphanet’s aim is to help improve the diagnosis, care and treatment of patients with rare diseases.
- PathCards - PathCards is an integrated database of human pathways and their annotations.
- PharmGKB - PharmGKB is an integrated resource about how variation in human genes leads to variation in our response to drugs.
- PubMed - PubMed comprises more than 22 million citations for biomedical literature from MEDLINE, life science journals, and online books.
- QIAGEN - the leading provider of sample and assay technologies.
- R&D Systems - A company providing antibodies, assays, kits and additional products and services.
- Reactome - Reactome provides curated knowledgebase of biological pathways in humans.
- SinoBiological - Sino Biological Inc is quickly becoming the leading global biological solution specialist, offering a comprehensive set of value-added and cost-effective premium quality solutions (services and reagents) to accelerate life science research and biological product development worldwide.
- SNOMED-CT - SNOMED Clinical Terms (SNOMED CT) is the most comprehensive, multilingual clinical healthcare terminology in the world.
- SNOMED-CT via Orphanet - mapping between diseases and SNOMED-CT, as provided by Orphanet.
- The Human Phenotype Ontology - The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease.
- Thomson Reuters - GeneGo provides data mining and analysis solutions for systems biology.
- Tocris Bioscience - Tocris Bioscience is a leading supplier of high performance life science reagents, peptides and antibodies, with customers in virtually all of the world's major pharmaceutical companies, universities and research institutes.
- Tumor Gene Family of Databases - contains information about genes that are targets for cancer-causing mutations.
- UMLS - The UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including electronic health records.
- UMLS via Orphanet - mapping between diseases and UMLS, as provided by Orphanet.
- UniProtKB/Swiss-Prot - UniProtKB/Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high quality annotated and non-redundant protein sequence database, which brings together experimental results, computed features and scientific conclusions.
- Wikipedia - Wikipedia is a free encyclopedia built collaboratively.