Mu-Hyun Baik
Assistant Professor of Informatics and Assistant Professor of Chemistry,
Chemical Informatics / Computational, Inorganic, Bioinorganic and Physical Chemistry

B.S. (Vordiplom) Heinrich-Heine Universität Düsseldorf (Germany), 1994
Ph.D. University of North Carolina Chapel Hill, 2000
Postdoctoral Associate, Columbia University, 2003

Phone: (212) 854-8471
Email: mbaik@indiana.edu

Development of an Artificial Chemical Expert
based on Quantum Mechanical Electronic Structure Theory

Computational chemistry has come a long way in the last two decades. Novel quantum chemical methods, such as Density Functional Theory (DFT), have given access to efficient and quite accurate computer models that can help us understand and predict complex chemical reactions. The advent of high-performance computer hardware has allowed the size of molecules that can be treated routinely to increase up to 200 atoms. In the next few years, this number will likely double and continue to increase. While DFT is still far from perfect, its combination of acceptable accuracy and consistency has made it the method of choice for large-scale simulations in chemistry.

The improvement in the quality of the simulation comes at a high price, however. In the past, the results of small and simple calculations could be analyzed in great detail, giving an in-depth understanding of the electronic structure, at least in a qualitative sense, and rational predictions could be made. For example, simple ligand field theory combined with basic group theory can explain the molecular structure of entire classes of organometallic compounds, in addition to providing a rationale for the basic pattern of their spectroscopic properties. Modern electronic structure calculations are more sophisticated and often give quantitatively reliable predictions for the specific model, but the underlying electronic structure becomes so complicated that general trends can rarely be recognized. Even with an accurate model calculation at hand, rational predictions of what effect simple alterations of the ligand might have are practically impossible.

Unlike the experimental world, the realm of quantum chemistry is infinitely precise (not to be confused with infinitely accurate!). If the computer model predicts a certain reaction energy profile with a set of structures and energies, we can, at least in principle, analyze the data to a point where every bit of the energy and structure is accounted for. Such an analysis quickly becomes very hard as the size of the computer model increases, and the latest trend is that computational chemistry is often used as a 'black-box' tool that gives optimized structures and energies. We believe computational chemistry can do much more, and the key to success lies in changing the way quantum chemical data is analyzed today. We would like to take full advantage of state-of-the-art data mining and data management technologies to increase the amount of chemical information we can extract from quantum calculations. Furthermore, we would like to design an artificial expert system that can use the information to make rational decisions and suggest new strategies for the chemistry problem at hand.

The artificial expert has three components:

A. Database as chemical memory: Rather than conducting each of our studies as independent and separate projects, we treat each computer simulation as a source of information and experience that is processed through a common interface, which also serves as an information indexing and data warehousing facility. Instead of being used only for the specific purpose of a particular study, the computed wave functions are stored for additional analysis in the future in a different framework. Thus, we build an expert databank with an increasing number of chemical systems that have been studied from numerous different aspects, ensuring transparency of the data to other researchers on the team. Through data sharing, each calculation increases the level of 'experience' of the expert system, extending the empirical knowledge base from which new hypotheses and chemical concepts will be derived. To allow real-time access to each of the calculations, we make use of currently available database engines, such as SQL Server, rather than designing a proprietary solution. The challenge lies in transforming quantum chemical data into a data structure that enables the use of these technologies.
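As a rough illustration of what such a data structure might look like, the sketch below stores one calculation and its molecular orbital coefficients in two relational tables. It is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a full database engine, and the table names, column names, helper functions and numerical values are all hypothetical and do not describe the actual databank.

```python
import sqlite3

# Hypothetical schema: one row per calculation, one row per orbital coefficient.
SCHEMA = """
CREATE TABLE calculation (
    id             INTEGER PRIMARY KEY,
    molecule       TEXT NOT NULL,
    method         TEXT NOT NULL,
    energy_hartree REAL NOT NULL
);
CREATE TABLE mo_coefficient (
    calc_id        INTEGER NOT NULL REFERENCES calculation(id),
    orbital        INTEGER NOT NULL,
    basis_function INTEGER NOT NULL,
    value          REAL NOT NULL,
    PRIMARY KEY (calc_id, orbital, basis_function)
);
"""


def open_memory():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_calculation(conn, molecule, method, energy_hartree, coefficients):
    """Store one calculation; coefficients[i][j] is the weight of basis
    function j in molecular orbital i. Returns the new calculation id."""
    cur = conn.execute(
        "INSERT INTO calculation (molecule, method, energy_hartree) "
        "VALUES (?, ?, ?)",
        (molecule, method, energy_hartree),
    )
    calc_id = cur.lastrowid
    rows = [
        (calc_id, i, j, c)
        for i, orbital in enumerate(coefficients)
        for j, c in enumerate(orbital)
    ]
    conn.executemany("INSERT INTO mo_coefficient VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return calc_id


def load_orbital(conn, calc_id, orbital):
    """Return the coefficients of one orbital, ordered by basis function."""
    cur = conn.execute(
        "SELECT value FROM mo_coefficient "
        "WHERE calc_id = ? AND orbital = ? ORDER BY basis_function",
        (calc_id, orbital),
    )
    return [row[0] for row in cur]


# Toy usage with made-up numbers (not real results).
conn = open_memory()
calc_id = store_calculation(conn, "toy-molecule", "DFT", -1.0,
                            [[0.7, 0.7], [0.7, -0.7]])
second_orbital = load_orbital(conn, calc_id, 1)
```

Storing one coefficient per row keeps the wave function queryable with ordinary SQL, at the cost of many rows for large systems; a production design would likely need a more compact representation.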

B. Data Mining: A crucial component for increasing the yield of chemical knowledge is extracting chemically meaningful data from large-scale quantum chemical simulations with minimal or no human effort. As quantum mechanical model systems become realistic in size, millions of coefficients are necessary to describe the final result in its raw form, i.e. the computed wave function. The answer to almost all questions one might ask about the simulated system is hidden in these numbers. The challenge lies in distinguishing data that are irrelevant to the specific question under investigation from those that are important. To carry out this task efficiently, we develop automatic data mining tools that can process the molecular orbitals and trace changes and similarities between chemically related molecules. We make heavy use of a range of visualization techniques to enlarge the scope of our analysis while maintaining conceptual simplicity.
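One simple way to trace similarities between the orbitals of two related molecules is to pair each orbital of one molecule with its closest counterpart in the other. The sketch below is an illustration only, not the group's actual tooling. It assumes both molecules are described in the same basis set with the same ordering of basis functions, treats that basis as orthonormal (which real atomic-orbital basis sets generally are not), and uses the normalized absolute dot product of the coefficient vectors as the similarity score. The function names are hypothetical.

```python
import math


def overlap(a, b):
    """Normalized absolute dot product of two coefficient vectors.
    The absolute value makes the score insensitive to the arbitrary
    overall sign of an orbital."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return abs(dot) / norm if norm else 0.0


def match_orbitals(orbitals_a, orbitals_b):
    """For each orbital of molecule A, find the most similar orbital of
    molecule B. Returns a list of (index_a, index_b, score) tuples.
    orbitals_b must not be empty."""
    matches = []
    for i, a in enumerate(orbitals_a):
        scores = [overlap(a, b) for b in orbitals_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches


# Toy usage: molecule B has the same two orbitals in swapped order,
# one of them with its sign flipped.
matches = match_orbitals([[1.0, 0.0], [0.0, 1.0]],
                         [[0.0, -1.0], [1.0, 0.0]])
```

A low best score for some orbital would flag it as one that changed substantially between the two molecules, which is the kind of difference an analyst would then inspect visually.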

C. Towards Artificial Chemical Intelligence: The final and most demanding part of our ultimate goal is designing an 'inference engine', a module that can automatically formulate a hypothesis based on the data provided by the two modules outlined above and determine which exploratory calculations should be carried out to fill gaps in experience that might exist in the databank. For example, if the task at hand calls for increasing the stability of a given molecule towards oxidation, the databank will first be consulted to determine which combination of functional group and structural motif gave the desired trend in a similar previous study. This process requires the system to recognize both structural similarities and similarities between problems, which is a challenging question in its own right. Then the expert system will determine a number of promising targets, build their prototype structures and launch appropriate calculations with minimal human interaction. Finally, the expert system will collect the results, analyze them using the analysis tools described above, evaluate the success and, if necessary, readjust the working strategy in an iterative manner until the best answer is found.
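The propose, calculate, evaluate and readjust cycle described above can be pictured as a simple search loop. The sketch below uses plain greedy hill climbing as a stand-in for the inference engine, which is a deliberate simplification and not the actual design. The function optimize and its parameters propose and evaluate are hypothetical names: propose would return candidate structures related to the current best one (for instance, from a databank lookup), and evaluate would run a calculation and return a score where higher means better.

```python
def optimize(start, propose, evaluate, max_rounds=10):
    """Greedy iterative refinement.

    start      -- initial candidate (must be hashable)
    propose    -- function: candidate -> iterable of new candidates
    evaluate   -- function: candidate -> numeric score (higher is better)
    max_rounds -- upper limit on the number of refinement rounds

    Returns (best_candidate, best_score, history), where history maps
    every evaluated candidate to its score so no work is repeated.
    """
    best, best_score = start, evaluate(start)
    history = {start: best_score}
    for _ in range(max_rounds):
        improved = False
        for candidate in propose(best):
            if candidate in history:
                continue  # already calculated, reuse stored experience
            score = evaluate(candidate)
            history[candidate] = score
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break  # no proposal beat the current best: stop iterating
    return best, best_score, history


# Toy usage: candidates are integers and the (made-up) score peaks at 3.
best, score, history = optimize(
    0,
    propose=lambda x: [x - 1, x + 1],
    evaluate=lambda x: -(x - 3) ** 2,
)
```

The history dictionary plays the role of the databank in miniature: each evaluated candidate is remembered, so later rounds never repeat a calculation. A greedy loop like this can stall at a local optimum, which is one reason a real inference engine would need a more careful strategy.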

An important component of our research philosophy is to produce working solutions to real chemical problems immediately. Thus, our development project is highly modular, and the components function independently of each other, allowing immediate application to ongoing chemical research in our laboratory. Our current interests include identifying rational strategies for devising better anticancer drugs, robust industrial catalysts and new materials. We are also interested in large-scale simulations of metalloenzymes to understand how they work and to identify potential ways of replicating their reactivity in industrial settings with biomimetic complexes. For more details of the chemical systems we are interested in, see http://www.chem.indiana.edu/personnel/faculty/Baik/baik.htm.