Computational chemistry has come a long way in the last two decades. Novel
quantum chemical methods, such as Density Functional Theory (DFT), have given
access to efficient and quite accurate computer
models that can help us understand and predict complex chemical reactions. The
advent of high performance computer hardware has allowed the size of molecules
that can be treated routinely to increase up to 200
atoms. In the next few years, this number will likely double and continue to
increase. While DFT is still far from perfect, its combination of
acceptable accuracy and consistency has made it the method of choice for large-scale
simulations in chemistry.
The improvement in the quality of the simulation comes at a high price, however.
The results of small and simple calculations in the past could be analyzed in
great detail, giving an in-depth understanding of the electronic structure, at
least in a qualitative sense, so that rational predictions could be made. For example, simple ligand field theory
combined with basic group theory can explain the molecular structure of entire classes of organometallic
compounds in addition to providing a rationale for the basic pattern of their
spectroscopic properties. Whereas modern electronic structure calculations are
more sophisticated and often give quantitatively reliable
predictions for the specific model, the underlying electronic structure
becomes so complicated that general trends can rarely be recognized. Even with an
accurate model calculation at hand, rationally predicting the effect of simple alterations of the ligand is
practically impossible.
Unlike the experimental world, the realm of quantum chemistry is infinitely
precise (not to be confused with infinitely accurate!). If the computer model
predicts a certain reaction energy profile with a set of structures and
energies, we can at least in principle analyze the data to a point where every
bit of the energy and structure is accounted for. Such an analysis
quickly becomes very hard as the size of the computer model increases, and
the latest trend is that computational chemistry is often used as a 'black-box'
tool that gives optimized structures and energies. We believe
computational chemistry can do much more, and the key to success lies
in changing the way quantum chemical data is analyzed today. We would like to take full advantage of state-of-the-art data mining and
data management technologies to increase the amount of chemical information we
can extract from quantum calculations. Furthermore, we would like to design an
artificial expert system that can use the information to make rational decisions and
suggest new strategies for the chemistry problem at hand.
The artificial expert has three components:
A. Database as chemical memory: Rather than
conducting each of our studies as independent and separate projects, we treat each
computer simulation as an information and experience source that is processed
through a common interface, which also serves as an information indexing and data
warehousing facility.
Instead of being used only for one specific purpose of the particular study,
the computed wave functions are stored for additional analysis in the future
in a different framework. Thus, we build an expert databank with an
increasing number of chemical systems that have been studied from numerous
different aspects, ensuring transparency of the data to other
researchers on the team. Through data sharing, each calculation increases the level of 'experience' of
the expert system, extending the empirical knowledge base from which new
hypotheses and chemical concepts will be derived. To allow real-time
access to each of the calculations, we make use of currently available
database engines, such as SQL-Server, rather than designing a proprietary
solution. The challenge lies in transforming quantum chemical data into a
data structure that these technologies can work with.
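As a rough illustration of what such a data structure could look like, the sketch below stores each calculation and its molecular orbital coefficients in two relational tables. Everything here is an assumption made for illustration: the table and column names, the helper functions store_calculation and load_orbital, and the use of Python's built-in sqlite3 module as a lightweight stand-in for a full database server. The numbers are placeholders, not computed results.

```python
import sqlite3

# Hypothetical schema: one row per calculation, one row per orbital coefficient.
SCHEMA = """
CREATE TABLE calculation (
    id       INTEGER PRIMARY KEY,
    molecule TEXT NOT NULL,
    method   TEXT NOT NULL,
    energy   REAL NOT NULL           -- total energy in hartree
);
CREATE TABLE mo_coefficient (
    calc_id    INTEGER NOT NULL REFERENCES calculation(id),
    orbital    INTEGER NOT NULL,     -- molecular orbital index
    basis_func INTEGER NOT NULL,     -- basis function index
    value      REAL NOT NULL,
    PRIMARY KEY (calc_id, orbital, basis_func)
);
"""


def store_calculation(conn, molecule, method, energy, coefficients):
    """Store one calculation; coefficients[i][j] belongs to orbital i, basis function j."""
    cur = conn.execute(
        "INSERT INTO calculation (molecule, method, energy) VALUES (?, ?, ?)",
        (molecule, method, energy),
    )
    calc_id = cur.lastrowid
    rows = [
        (calc_id, i, j, c)
        for i, orbital in enumerate(coefficients)
        for j, c in enumerate(orbital)
    ]
    conn.executemany("INSERT INTO mo_coefficient VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return calc_id


def load_orbital(conn, calc_id, orbital):
    """Return the coefficient vector of one stored orbital, ordered by basis function."""
    cur = conn.execute(
        "SELECT value FROM mo_coefficient "
        "WHERE calc_id = ? AND orbital = ? ORDER BY basis_func",
        (calc_id, orbital),
    )
    return [row[0] for row in cur]


# Placeholder data for a two-orbital toy system (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
calc_id = store_calculation(conn, "H2", "DFT", -1.17, [[0.55, 0.55], [1.2, -1.2]])
second_orbital = load_orbital(conn, calc_id, 1)
```

Storing one coefficient per row keeps the wave function queryable with ordinary SQL, at the cost of many rows per calculation; a production design might well choose a different layout.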
B. Data Mining: A crucial component for increasing the yield of chemical
knowledge is extracting chemically meaningful data out of the large-scale
quantum chemical simulations with minimal or no human effort. As quantum
mechanical model systems become realistic in size, millions of coefficients are
necessary to describe the final result in its raw form, i.e. the computed wave
function. The answer to almost all questions one might ask about the simulated system is hidden in these numbers. The challenge lies in distinguishing
data that are irrelevant for the specific question under investigation
from those that are important.
To carry out this task efficiently, we develop automatic data mining tools
that can process the molecular orbitals and trace changes and similarities
between chemically related molecules. We make heavy use of a range of
visualization techniques to enlarge the scope of our analysis while
maintaining conceptual simplicity.
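One simple way to trace similarities between the orbitals of two related molecules is to compare their coefficient vectors directly. The sketch below is a minimal illustration under strong assumptions that are not part of the description above: both molecules are expanded in the same basis functions in the same order, and the basis is treated as orthonormal, so that a plain dot product stands in for the true orbital overlap (a real atomic orbital basis would require the overlap matrix). The function names and the toy coefficients are hypothetical.

```python
import math


def similarity(u, v):
    """Absolute cosine similarity of two coefficient vectors.

    The absolute value is taken because the overall sign of an orbital is arbitrary.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return abs(dot) / norm


def match_orbitals(orbitals_a, orbitals_b):
    """For each orbital of molecule A, find the most similar orbital of molecule B.

    Returns a list of (index_in_b, similarity) pairs, one per orbital of A.
    """
    matches = []
    for u in orbitals_a:
        scores = [similarity(u, v) for v in orbitals_b]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Toy coefficient vectors (illustrative values only, not computed results).
mol_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
mol_b = [[0.0, -1.0, -1.0], [0.9, 0.1, 0.0]]
matches = match_orbitals(mol_a, mol_b)
```

In this toy case the first orbital of A is matched to the second orbital of B and vice versa, which is the kind of reordering that a purely energy-based comparison of orbitals would miss.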
C. Towards Artificial Chemical Intelligence: The final and most demanding
part of our ultimate goal is designing an 'inference engine', which is a module
that can automatically formulate a hypothesis based on the data provided
by the two modules outlined above and determine which exploratory
calculations should be carried out to fill possible experience holes that
might exist in the databank. For example, if the task at hand calls for
increasing the stability of a given molecule towards oxidation, the
databank will first be consulted to determine which combination of
functional group and structural motif gave the desired trend in a
similar study previously. This process requires the system to recognize both
structural similarities and similarities between problems, which
is a challenging question in its own right. Then the expert system will
determine a number of promising targets, build their prototype structures
and launch appropriate calculations with minimal human interaction.
Finally, the expert system will collect the results, analyze them using
the analysis tools described above, evaluate the success and, if
necessary, readjust the working strategy in an iterative manner until the
best answer is found.
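The iterative cycle described above (consult the databank, propose targets, calculate, evaluate, readjust) can be written down as a short control loop. The sketch below is only a toy illustration and not a description of an existing implementation: the databank is a plain dictionary mapping a candidate to a score, candidates are integers standing in for structural modifications, and propose_neighbours and toy_calculation are hypothetical stubs replacing similarity search and a real quantum chemical calculation.

```python
def run_expert_loop(databank, propose, calculate, good_enough, max_rounds=10):
    """Iterate: propose candidates from past experience, compute, store, evaluate.

    databank maps candidate -> score and is updated in place, so every
    calculation adds to the stored 'experience'.
    """
    best = max(databank, key=databank.get) if databank else None
    for _ in range(max_rounds):
        # Only candidates without stored results are worth computing.
        candidates = [c for c in propose(databank) if c not in databank]
        if not candidates:
            break  # nothing new to try
        for candidate in candidates:
            databank[candidate] = calculate(candidate)
        best = max(databank, key=databank.get)
        if good_enough(databank[best]):
            break
    return best


def propose_neighbours(databank):
    """Hypothetical proposal rule: try small variations of the best known candidate."""
    best = max(databank, key=databank.get)
    return [best - 1, best + 1]


def toy_calculation(candidate):
    """Stub for an expensive calculation; the score peaks at candidate 3."""
    return -((candidate - 3) ** 2)


databank = {0: toy_calculation(0)}  # one prior 'study' as the starting experience
result = run_expert_loop(
    databank, propose_neighbours, toy_calculation, good_enough=lambda s: s >= 0
)
```

The loop stops as soon as a candidate meets the success criterion or no untested candidates remain, and the databank keeps every intermediate result for later reuse.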
An important component of our research philosophy is to produce working
solutions to real chemical problems immediately. Thus, our development
project is highly modular and the components function
independently of each other, allowing immediate application to ongoing
chemical research in our laboratory. Our current interest lies in identifying
rational strategies for devising better anticancer drugs, robust
industrial catalysts or new materials. We are also interested in
large-scale simulations of metalloenzymes to understand how they work and
identify potential ways of replicating their reactivity in industrial
settings with biomimetic complexes. For more details on the chemical systems we
are interested in, see
http://www.chem.indiana.edu/personnel/faculty/Baik/baik.htm.