Cheminformatics &
Pharmainformatics
In this presentation……
Part 1 – Molecular Conventions
Part 2 – Resources
Part 3 – Drug Design
Part 4 – Drug Development
Part 1 – Molecular Conventions
Cheminformatics
• Cheminformatics, a combination of chemistry and
information technology, is required for the
processing and analysis of chemical data
• Cheminformatics is relevant to biologists
because chemical data are important in
many areas of molecular biology, e.g., in the
study of protein interactions and metabolism
Molecular formulae
• Molecules can be represented by simple
formulae, which give the number and type of
atoms
• However, such formulae do not show how the atoms are
connected
• Structural formulae provide some information
about the arrangement of atoms in a molecule
and thus allow isomers to be distinguished
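The point about isomers can be illustrated with a minimal Python sketch. Ethanol and dimethyl ether are used as the example pair; the dictionary layout and the function names are assumptions made for illustration only. Both molecules give the same molecular formula, but their bond lists differ.

```python
from collections import Counter

# Each molecule is a list of atom symbols plus a list of bonds between atom
# indices. Hydrogens are listed explicitly so the formula can be counted.
ETHANOL = {
    "atoms": ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)],
}
DIMETHYL_ETHER = {
    "atoms": ["C", "O", "C", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)],
}


def molecular_formula(mol):
    """Count atoms by element: carbon first, then hydrogen, then the rest."""
    counts = Counter(mol["atoms"])
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(
        f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e] > 0
    )


def connectivity_signature(mol):
    """Sorted list of (atom, neighbouring elements); a crude structural summary."""
    neighbours = {i: [] for i in range(len(mol["atoms"]))}
    for a, b in mol["bonds"]:
        neighbours[a].append(mol["atoms"][b])
        neighbours[b].append(mol["atoms"][a])
    return sorted((mol["atoms"][i], tuple(sorted(n))) for i, n in neighbours.items())


same_formula = molecular_formula(ETHANOL) == molecular_formula(DIMETHYL_ETHER)
same_structure = connectivity_signature(ETHANOL) == connectivity_signature(DIMETHYL_ETHER)
```

The signature used here is only a quick comparison, not a full graph-isomorphism test; it is enough to separate this pair because the oxygen has different neighbours in each molecule.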
Structural representation of
ethane showing the tetrahedral
distribution of coordinated
groups about saturated carbon
atoms. Panels (a) and (b) show
two extreme conformations. The
energetically favourable
conformation (a), which
predominates in nature, has the H
atoms on opposite sides of the C–C
bond as far as possible from
each other (the staggered
conformation). The less
favourable conformation (b) has
the H atoms in the eclipsed conformation.
Panels (c) and (d) show the
conformations viewed from the
end of the molecule
[Figure: ethane, panels (a) to (d); the carbon atoms are numbered 1 and 2]
Structural formulae and full and simplified structural
diagrams for some common organic compounds
Name: Methane
Formula: CH4
Full structure:
    H
    |
  H–C–H
    |
    H
Name: Ethane
Formula: C2H6
Full structure:
    H H
    | |
  H–C–C–H
    | |
    H H
Name: Ethene (ethylene)
Formula: C2H4
Full structure:
    H   H
    |   |
  H–C = C–H
Simplified structure: [drawings not reproduced]
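Structural formulae and full and simplified structural
diagrams for some common organic compounds
Name: Cyclohexane
Formula: C6H12
Full structure: [ring drawing not reproduced]
Name: Ethanol
Formula: C2H5OH
Full structure:
    H H
    | |
  H–C–C–O–H
    | |
    H H
Simplified structure: carbon skeleton ending in OH
Name: Ethanal (acetaldehyde)
Formula: CH3CHO
Full structure:
    H H
    | |
  H–C–C=O
    |
    H
Simplified structure: carbon skeleton ending in =O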
Structural diagrams
• Molecules can be represented using simple graphs,
which show atoms as nodes and bonds as links
• For organic molecules, further simplification is
achieved by assuming that carbon atoms make up
the molecular backbone and that the valency of
four is satisfied by hydrogen atoms unless
otherwise shown
• Such diagrams present all molecules as planar
shapes and do not indicate the spatial distribution of
atoms in 3D
Chirality
• If four different groups are coordinated around
a central carbon atom, the molecule is
described as chiral
• Chiral molecules exist in two configurations,
enantiomers, which are mirror images of each
other
• Although enantiomers have the same chemical
properties, many enzymes and other proteins
show chiral sensitivity, which is important in
drug development and related fields
Multi-chiral configuration
• Molecules may contain any number of
chiral centers, and a series of forms, called
diastereoisomers, may exist
• These may have different chemical
properties because of the way different
groups interact within the molecule
DL and RS conventions
• The absolute configuration of groups around a chiral
carbon atom can be described using a number of
conventions
• In the DL system, molecules are named D or L
according to whether the coordinated groups are
arranged in a similar fashion to those in D-glyceraldehyde or L-alanine
• In the RS system, molecules are named R (rectus)
or S (sinister) according to the priority, ranked by atomic number, of the chemical
groups surrounding the carbon atom
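As a rough illustration of the RS convention, the sketch below labels a centre R or S from 3D coordinates. It assumes the priorities of the groups (1 highest, 4 lowest) have already been assigned and that the coordinates are supplied by the user; the function name and the sample coordinates are hypothetical. Viewed with the lowest-priority group pointing away, a clockwise path 1 to 2 to 3 is R, and this corresponds to a negative signed volume.

```python
def rs_label(center, p1, p2, p3):
    """Return 'R' or 'S' for a tetrahedral centre.

    p1, p2 and p3 are (x, y, z) positions of the groups ranked 1, 2 and 3.
    The lowest-priority group (rank 4) is implied by the geometry.
    """
    v1, v2, v3 = ([p[i] - center[i] for i in range(3)] for p in (p1, p2, p3))
    # Scalar triple product v1 . (v2 x v3), i.e. the signed volume.
    cross = (
        v2[1] * v3[2] - v2[2] * v3[1],
        v2[2] * v3[0] - v2[0] * v3[2],
        v2[0] * v3[1] - v2[1] * v3[0],
    )
    volume = sum(a * b for a, b in zip(v1, cross))
    if volume == 0:
        raise ValueError("the three groups are coplanar with the centre")
    return "R" if volume < 0 else "S"


# Hypothetical coordinates: groups 1 to 3 point towards the viewer (+z),
# so group 4 points away, and 1 -> 2 -> 3 runs clockwise.
centre = (0.0, 0.0, 0.0)
g1, g2, g3 = (0.0, 1.0, 0.33), (0.87, -0.5, 0.33), (-0.87, -0.5, 0.33)
label = rs_label(centre, g1, g2, g3)
mirror = rs_label(centre, g1, g3, g2)   # swapping two groups inverts the centre
```

Assigning the priorities themselves (the harder part of the convention) is not attempted here.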
Representation of a tetrahedrally coordinated saturated carbon
atom in an organic molecule
(a) the carbon atom is at the centre of a tetrahedron with four
coordinated groups
(b) simplified representation with the central carbon removed
(c) Representation of the tetrahedron as a flat image
[Figure: panels (a), (b) and (c); the four coordinated groups are numbered 1 to 4 around the central carbon C]
Chirality representation
(a) The structural formula of glyceraldehyde, CH2OHCHOHCHO, gives no
indication of its chirality
(b) If the molecule is represented as a
tetrahedron, the D and L enantiomers can
be distinguished
(c) These can be shown as
2D graphs using the Fischer
convention
        D                 L
       CHO               CHO
        |                 |
    H – C – OH       HO – C – H
        |                 |
      CH2OH             CH2OH
Part 2 – Resources
SMILES
• SMILES is a system for representing
chemical formulae as strings, based on a
valence model in which all valencies are
considered to be satisfied by hydrogen atoms
unless otherwise shown
• The system has conventions for representing
different bond types, cyclic molecules,
branches, cis/trans isomers and chirality
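The valence model behind SMILES can be sketched in a few lines of Python. The parser below is an illustrative toy, not a full SMILES implementation: it assumes only the atoms C, N and O, the bond symbols -, = and #, parentheses for branches and single-digit ring closures, and it ignores aromatic, charged, bracketed and stereo notation. Hydrogens are filled in wherever a valency is left unsatisfied.

```python
from collections import Counter

VALENCE = {"C": 4, "N": 3, "O": 2}      # usual valences of the supported atoms
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def smiles_formula(smiles):
    """Molecular formula for a SMILES string in a small, simplified subset."""
    symbols = []         # symbols[i] = element of atom i
    used = []            # used[i] = total bond order already attached to atom i
    branch_stack = []    # atoms to return to when a branch closes
    open_rings = {}      # ring-closure digit -> (atom index, bond order)
    prev, order = None, 1
    for ch in smiles:
        if ch in VALENCE:
            symbols.append(ch)
            used.append(0)
            here = len(symbols) - 1
            if prev is not None:
                used[prev] += order
                used[here] += order
            prev, order = here, 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                other, ring_order = open_rings.pop(ch)
                bond = max(order, ring_order)
                used[prev] += bond
                used[other] += bond
            else:
                open_rings[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    counts = Counter(symbols)
    # Implicit hydrogens satisfy whatever valency is left on each atom.
    counts["H"] = sum(VALENCE[s] - u for s, u in zip(symbols, used))
    elements = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(
        f"{e}{counts[e] if counts[e] > 1 else ''}" for e in elements if counts[e] > 0
    )
```

For example, `smiles_formula("C1CCCCC1")` treats the two `1` digits as a ring closure, and `smiles_formula("CC=O")` leaves only one hydrogen on the carbonyl carbon.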
RasMol and Chime
• There are several specialized data formats
for chemical structures based on the
principle of a molecular formula and
associated table of connections
• Viewing utilities such as RasMol and
Chime can interpret these file formats and
display interactive molecular structures in a
variety of user-defined schemes and colors
Chemical structure and databases
• Structural information about different molecules can
be obtained from a number of comprehensive WWW
resources, including Chemical Abstracts On-Line,
Chemfinder and MedChem
• Each of these resources provides a chemical database
that can be searched using a variety of query formats,
e.g., systematic name, non-systematic name, formula,
molecular weight or CAS registry number
• Search results provide physical, chemical and
biomedical information with links to other databases
and resources
• MedChem also provides the SMILES string
QSAR
• A QSAR (quantitative structure-activity relationship) is a statistical model used to determine how
the structural features of a molecule are related to
biological activity
• The QSAR approach is particularly useful for
categorizing the activities of related molecules with
multiple functional groups
• Each molecule is broken down into a series of
descriptors (molecular properties) and the QSAR
determines which descriptors are most likely to
promote biological activity
• This gives rise to a set of rules that can be used to
evaluate the potential activity of new molecules
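A very small sketch of the QSAR idea follows. It assumes a made-up training set with two descriptors (the names logP and mol_weight and all the numbers are hypothetical), ranks the descriptors by their correlation with activity, and fits a straight line to the best one. Real QSAR work uses many descriptors and more careful statistics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical training set: descriptor values and measured activity.
MOLECULES = [
    {"logP": 1.0, "mol_weight": 310.0, "activity": 3.0},
    {"logP": 2.0, "mol_weight": 250.0, "activity": 5.0},
    {"logP": 3.0, "mol_weight": 330.0, "activity": 7.0},
    {"logP": 4.0, "mol_weight": 270.0, "activity": 9.0},
]


def best_descriptor(molecules, names):
    """Descriptor whose values correlate most strongly with activity."""
    activity = [m["activity"] for m in molecules]
    return max(
        names, key=lambda name: abs(pearson([m[name] for m in molecules], activity))
    )


best = best_descriptor(MOLECULES, ["logP", "mol_weight"])
slope, intercept = fit_line(
    [m[best] for m in MOLECULES], [m["activity"] for m in MOLECULES]
)
predicted = slope * 2.5 + intercept   # rule applied to a new, untested molecule
```

The fitted line plays the role of the "set of rules": once the descriptor of a new molecule is known, its potential activity can be estimated before it is synthesized.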
Part 3 – Drug Design
Pharmainformatics
• Pharmainformatics is the combination of
biology, chemistry, mathematics and
information technology that is essential for
efficient data management, processing and
analysis in the pharmaceutical industry
Drugs
• Drugs interact with targets, usually proteins,
in the body and through these interactions cause
physiological responses
• The pharmaceutical industry aims to
discover drugs with specific beneficial
effects to treat human diseases
Gene – drug – life
• To know a gene’s chemical structure and
composition is one thing, but understanding its
actual function is another thing
• Though sequencing and analysis would help in
answering questions on aging, diseases, disorders,
and much more, a new discipline of designer drugs
is around the corner, waiting to be tapped
• Even a single nucleotide polymorphism (SNP,
pronounced “snip”), a T, for instance, in a
gene sequence where the neighbour has a C, can
spell trouble
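Finding single-nucleotide differences between two copies of a gene can be sketched as below. The sketch assumes the two sequences are already aligned and of equal length; the function name and the example sequences are hypothetical.

```python
def find_snps(reference, sample):
    """List (position, reference base, sample base) for every mismatch.

    Positions are 1-based. The sequences must already be aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# One person carries a C at position 5, the neighbour a T.
snps = find_snps("ATGCCGTA", "ATGCTGTA")
```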
Gene – drug – life
• Many drugs work in only 30 percent of the human
population
• In extreme cases, a drug that saves one person may
poison another. One example is the type II diabetes drug
Rezulin, which has been linked to more than 60
deaths from liver toxicity worldwide
• This is where in silico drug design would help, not
only by reducing design, modeling and
testing time but also by reducing expenditure on
manpower, resources and the various phases of drug
design and development
Areas of drug design
• Drug design must be viewed from
three different dimensions, viz., drug design for
– Diseases such as HIV, cancer, etc. that continue to afflict
people
– Lifestyle drugs
– Drugs for repairing genetic disorders
• There is an urgent need to develop drugs for
diseases such as hepatitis C, leprosy and malaria,
since these diseases are widespread and affect
people at large
• Other infectious diseases such as tuberculosis, HIV,
etc. are also highly troublesome
In silico drug design
• Earlier, the drug design process used to take
decades and was carried out haphazardly, without
any direction, whereas at present there is a systems
approach. Added to this is a tremendous reduction
in research and production costs
• The surge in bioinformatics solutions has already
redefined the way drug trials are done, with a
shift from in vitro to in silico
• In silico drug design could be used to shorten the
time taken by drug design, and shortening it will remain the
biggest challenge for years to come
Drugs are insoluble in water…
• Proteins in the body are surrounded by water (two-thirds
of the human body consists of water) and hence do
not behave like rigid bodies; consequently, their
behaviour differs from protein to protein
• Many drugs do not dissolve readily in water.
Designing drugs in silico (on chips, without
water) should take this into account
Important areas for drug design
• The four most important areas of
consideration for successful drug design are
– binding sites
– molecular shape
– molecular size
– inhibitory properties of the proteins
Important areas for drug design
• The study of membrane protein crystallization
and structure also plays a vital role in drug
design. This area of research is highly
challenging and would prove to be an excellent
foundation for further research
• Since the genome of dengue virus is only
about 11 kb (kilobases) long, it is well suited to
carrying out a lot of work quickly and conveniently
Medical applications
• Bioinformatics and drug design can be highly
useful for the diagnosis and treatment of various
neurological disorders. It has been found that
many neurological disorders are due to unusual
gene structures such as the triple ‘A’ formation
“AAA” (the A of the “ATGC” nucleotides) in the
genes. The problem becomes more complex with
multiple repeats or occurrences of the triple ‘A’.
More than eight such repeats are known, and in
such cases children are permanently bedridden or
have to use wheelchairs
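Counting how many times a triplet is repeated back to back in a sequence can be sketched with a regular expression. The function name and the test sequences are hypothetical, and the repeat unit is a parameter so that triplets other than “AAA” can be checked in the same way.

```python
import re


def longest_tandem_repeat(sequence, unit="AAA"):
    """Largest number of back-to-back copies of `unit` found in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = pattern.findall(sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Nine consecutive A bases contain three copies of the triplet.
repeats = longest_tandem_repeat("GCAAAAAAAAATG")
```

A screening script could then flag any sequence whose repeat count exceeds a chosen threshold; the threshold itself would have to come from clinical data.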
Part 4 – Drug Development
Bioinformatics in drug development
• Genomics, proteomics, combinatorial
chemistry and high-throughput screening
(HTS) have all contributed to a massive
increase in the amount of data generated by the
pharmaceutical industry
• The role of bioinformatics is to store, track and
provide tools for the analysis of these data,
something like an automated environment
Bioinformatics in drug development
• Specific applications include the modeling
of protein interactions with small molecules
allowing rational drug design, the
association of genotype and drug response
patterns (pharmacogenomics), the design
and assessment of chemical diversity in
combinatorial libraries, and the processing
and storage of data from high-throughput
screens of lead compounds
Areas of biology
Genomics/proteomics (human genome project)
• Application: characterization of human genes and proteins
• Role of bioinformatics: target identification/validation in the human genome; cataloging SNPs and association with drug response patterns (pharmacogenomics)
Genomics/proteomics (human pathogen genome project)
• Application: characterization of genes and proteins of organisms that are pathogenic to humans
• Role of bioinformatics: target identification/validation in pathogens
Functional genomics (protein structures)
• Application: analysis of protein structures (humans and their pathogens)
• Role of bioinformatics: prediction of drug/target interactions; rational drug design
Areas of biology
Functional genomics (expression profiling)
• Application: determining gene expression patterns in disease and health
• Role of bioinformatics: gene classification based on drug responses; pathway reconstruction
Functional genomics (genome-wide mutagenesis)
• Application: determining the mutant phenotypes for all genes in the genome
• Role of bioinformatics: databases of animal models; target identification/validation
Functional genomics (protein interactions)
• Application: determining interactions among all proteins
• Role of bioinformatics: characterization of protein interactions; reconstruction of pathways; prediction of binding sites
Areas of chemistry
HTS
• Application: highly parallel assay formats for lead identification
• Role of bioinformatics: storing, tracking and analyzing data
Combinatorial chemistry
• Application: synthesis of large numbers of chemical compounds
• Role of bioinformatics: cataloging chemical libraries; assessing library quality/diversity; predicting drug/target interactions
Principles of drug development
• Drug development begins with the
identification of a suitable target, which must
contribute significantly to a human disease
• Ideally, altering the activity of this target
should have a beneficial effect thus showing
its potential for therapeutic intervention
• The next stage of the process is lead
discovery, where compounds showing some
of the desired activity of an ideal drug are
sought
Principles of drug development
• Optimization of lead compounds results in
drug candidates that may be registered and
submitted for clinical trials, which establish
their safety and metabolic behaviour in
human subjects
Genetic link to drugs
• An early example of the utility of bioinformatics in drug
design is cathepsin K, an enzyme that might turn out to be an
important target for treating osteoporosis, a crippling disease
caused by the breakdown of bone
• While analyzing osteoclasts (cells that break down bone in
the normal course of bone replenishment) taken from people
with bone tumors, researchers found a gene sequence that was
overexpressed in these cells and could be overactive in individuals with
osteoporosis
• The sequence matched a previously identified class of molecules
called cathepsins. Efforts are on to find a potential drug to
block the cathepsin K target
Genetic link to drugs
• Scientists believe that 99.9 percent of your genes
perfectly match those of the person sitting beside
you. But the remaining 0.1 percent of the genes
vary, and it is these variations that interest the drug
companies
• Several years after the debut of tests for BRCA1 and
BRCA2, scientists are still trying to determine
exactly to what degree those genes contribute to a
woman’s cancer risk
Chemical diversity
• Diverse chemical libraries are required for efficient
lead discovery if little is known about the binding
properties of the drug target
• Conversely, focused libraries are required if the
structure of the target is known, since this defines a
particular set of ligands
• Chemical diversity can be defined by comparing
molecules on the basis of descriptors (functional
groups) and how these fill chemical space
• A number of software tools are available for the
design and assessment of diverse or focused chemical
libraries and for virtual screening against drug targets
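One common way to compare molecules on the basis of descriptors is the Tanimoto coefficient, sketched below. Using it here is an assumption, since the slides do not name a specific measure; the molecule names and descriptor sets are hypothetical. Each molecule is reduced to a set of functional groups, and the diversity of a library is taken as the mean pairwise dissimilarity.

```python
from itertools import combinations


def tanimoto(a, b):
    """Shared descriptors divided by all descriptors present in either set."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def library_diversity(library):
    """Mean pairwise dissimilarity (1 - Tanimoto) over all molecule pairs."""
    pairs = list(combinations(library.values(), 2))
    if not pairs:
        return 0.0
    return sum(1 - tanimoto(x, y) for x, y in pairs) / len(pairs)


# Hypothetical library: each molecule is described by its functional groups.
LIBRARY = {
    "mol_1": {"hydroxyl", "aromatic_ring"},
    "mol_2": {"hydroxyl", "aromatic_ring", "amine"},
    "mol_3": {"carboxyl", "amide"},
}

similarity_1_2 = tanimoto(LIBRARY["mol_1"], LIBRARY["mol_2"])
diversity = library_diversity(LIBRARY)
```

A focused library would score close to 0 on this measure and a diverse one close to 1.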
Computational screening
• Software applications like DOCK and AutoDock
match potential ligands to binding sites by
calculating steric constraints and bond energies
• These can be used to search chemical databases and
find potential drug leads
• Some applications consider the ligand and binding
site as inflexible structures, rather like pieces of a
jigsaw, while others can incorporate flexibility into
the molecules by calculating allowable and
compatible bond torsions
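The rigid, jigsaw-like style of matching can be illustrated with a toy scoring function. This is not the scoring scheme of DOCK or AutoDock; it is a sketch that assumes a single Lennard-Jones 12-6 term between every ligand atom and every site atom, with hypothetical parameters and coordinates, and picks the candidate pose with the lowest energy.

```python
from math import dist


def lennard_jones(r, epsilon=1.0, sigma=3.5):
    """12-6 pair energy: strongly repulsive on clashes, weakly attractive beyond."""
    return 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)


def pose_score(ligand_atoms, site_atoms, epsilon=1.0, sigma=3.5):
    """Sum the pair energy over every ligand atom / site atom pair."""
    return sum(
        lennard_jones(dist(l, s), epsilon, sigma)
        for l in ligand_atoms
        for s in site_atoms
    )


def best_pose(poses, site_atoms):
    """Name of the rigid pose with the lowest (most favourable) score."""
    return min(poses, key=lambda name: pose_score(poses[name], site_atoms))


# Hypothetical one-atom site and three rigid placements of a one-atom ligand.
SITE = [(0.0, 0.0, 0.0)]
POSES = {
    "clash": [(1.0, 0.0, 0.0)],
    "contact": [(2 ** (1 / 6) * 3.5, 0.0, 0.0)],
    "far": [(20.0, 0.0, 0.0)],
}
winner = best_pose(POSES, SITE)
```

Flexible docking would add a search over bond torsions on top of this, generating many poses per ligand instead of a fixed few. The sketch does not guard against two atoms at exactly the same position.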
Functional genomics
• The large-scale functional annotation of
genes is known as functional genomics and
incorporates areas such as homology
searching, structural analysis, expression
analysis, large scale mutagenesis and the
analysis of protein interactions
• All of these areas are important in drug
development
Genome-scale mutagenesis
• Genome-scale mutagenesis is a rich source
of animal disease models for target
identification and validation, and large
mutant collections in simple organisms can
be used for the rapid high-throughput
screening of potential lead compounds
Approaches in functional genomics
Approach: Functional annotation method
Homology searching: Comparison to related sequences with known function
Protein structure determination (structural genomics): Comparison to molecules with related structure and known function
Comparative genomics: Functional annotation by domain conservation, conserved phylogeny or conserved genomic organization
Expression analysis: Similar expression profiles indicate conserved function
Mutagenesis: Function based on mutant phenotype, e.g. knockout mice
Protein interaction screening: Function based on presence in multi-subunit complex or on interaction with proteins of known function
Small molecule informatics: Interaction with small molecules
Pharmacogenomics
• Pharmacogenomics is the study of how genetic variation in the human
population correlates with drug response
patterns
• The analysis of genomic data and its
comparison with drug response data allows
patients to be clustered into drug response
groups, so that appropriate drugs and dose
regimens can be administered
• Variation is catalogued by analyzing data on
mutation (particularly SNPs) and gene
expression profiles
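Clustering patients into drug response groups can be sketched very simply when a single SNP genotype is used as the grouping key. The patient records, the genotype labels and the response scale (percent improvement) below are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: genotype at one SNP and response to a drug (percent).
PATIENTS = [
    {"id": "p1", "genotype": "CC", "response": 80},
    {"id": "p2", "genotype": "CC", "response": 90},
    {"id": "p3", "genotype": "CT", "response": 50},
    {"id": "p4", "genotype": "CT", "response": 60},
    {"id": "p5", "genotype": "TT", "response": 10},
]


def response_groups(patients):
    """Mean drug response for each genotype group."""
    groups = defaultdict(list)
    for patient in patients:
        groups[patient["genotype"]].append(patient["response"])
    return {genotype: mean(values) for genotype, values in groups.items()}


groups = response_groups(PATIENTS)
```

In practice many SNPs and expression profiles would be combined, and a statistical clustering method would replace the simple grouping used here.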
In lab vs. out of lab effort
• Companies and individuals plug into the effort of
drug design at various points: collecting and storing
data, searching databases, and interpreting the data
• The race is all about who can mine
the massive information best
• Modeling or computing the drug design or
protein structure alone is not sufficient; a lot of
information on test results and clinical trials from
outside is also very important
• Most of the time should be spent on this aspect to
ensure success in drug design and development
Issues of drug design
• Even though the human genome has been
sequenced, there are a number of problems awaiting
solutions: technical, legal, and social
• It is not at all clear how much one must
know about a gene in order to patent it
• There is also a need to review all failed
drugs, i.e., drugs that failed during clinical trials, since
their molecular composition and experimentation
process could give a lot of valuable information
• Various aspects connected to successful drug
design include supercomputing, modeling of
proteins through software, biotechnology,
computational methods and analysis, biochemistry,
in silico drug design, etc.
• It is notable that a drug that works for protein ‘A’
may not work for protein ‘B’, or may behave
differently, due to various factors. That is why
many drugs fail, and hence an integrated
(teamwork) effort is required, with a tremendous
amount of information and interaction
• At the moment, many patent applications rely on
computerized prediction techniques that are often
referred to as “in silico” biology
• With a full or partial gene sequence, scientists enter the
data into a computer program that predicts the amino
acid sequence of the resulting protein
• By comparing this hypothetical protein with known
proteins, the researchers take a guess at what the
underlying gene sequence does and how it might be
useful in developing a drug, say, or a diagnostic test
• Searches for compounds that bind to and have the
desired effect on drug targets still take place mainly in
a biochemist’s traditional “wet” lab, where evaluations
for activity, toxicity and absorption can take years
• But now with the bioinformatics initiatives, tools and
growing databases of protein structures and
biomolecular pathways, this aspect of drug
development is shifting to computers
• As the saying goes “genomics without bioinformatics
will not have much of a payoff”
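The in silico step mentioned above, predicting the amino acid sequence of the protein from a gene sequence, can be sketched with the standard genetic code. The sketch assumes a coding-strand DNA sequence read in frame from its first base; the function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, listed with the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA into one-letter amino acid codes, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":      # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate("ATGAAATGGTGA")
```

The resulting string is what would then be compared with known proteins to guess at the function of the gene. Bases other than A, C, G and T raise a `KeyError` in this sketch.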
Ayurveda and tribal medicine
• To date, little attention has been paid to
biodiversity, especially the research and
knowledge base on alternative medicine,
Ayurveda, applications of herbs and shrubs from
remote villages, etc.
• This area of medicine, and the study of its effect
on genes and proteins, could be another
challenging and interesting area
Future of pharmainformatics
• Drug companies collect the genetic know-how to make
medicines tailored to specific genes – an effort called
pharmacogenomics
• In the years to come, pharmacists may hand over one version of
a blood pressure drug based on your unique genetic profile, while
the person behind you in line gets a different version of the
same medicine
• There is going to be a day when somebody comes in with
cancer, and diagnosis can be done not on the basis of
morphology of the cancer but by looking at the detailed patterns
of gene expression and protein-binding activities in that cell
Target for the industry
• It is expected that in this decade, the
pharmaceutical industry will be faced with
evaluating up to 10,000 human proteins
against which new therapeutics might be
directed
• That is 25 times the number of drug targets
that have been evaluated by all the companies
since the dawn of the industry
Resources
• For a primer on genetic testing and a directory of
genetic tests, visit GeneTests at www.genetests.org
• For more on the ethical, legal and social implications
of human genome research, visit the National Human
Genome Research Institute’s web site at
www.nhgri.nih.gov/ELSI