
Rule-Based Generation of Thermochemical Routes to Biomass Conversion

Srinivas Rangarajan, Aditya Bhan,* and Prodromos Daoutidis*

Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, 421 Washington Avenue SE, Minneapolis, Minnesota 55455

Biomass conversion to fuels and chemicals involves a multitude of oxygen-containing compounds and thermochemical reaction routes. A detailed elucidation of the process chemistry is, thus, a key step in understanding the reaction mechanisms and designing chemical processes in a biorefinery. In this paper, a computational tool, called Rule Input Network Generator (RING), is presented as a platform for modeling diverse homogeneous and heterogeneous chemistries in biomass conversion and automatically generating the underlying complex reaction networks. RING accepts a set of reaction rules and initial reactants as inputs and exhaustively generates the reactions of the system. The reaction center of an elementary step is represented by a SMARTS-like string and identified as a submolecular pattern in a reactant molecular graph using a pattern-matching algorithm. The reaction events are subsequently modeled as a graph transformation system. The generality of this framework was substantiated by the successful application of RING in reproducing the reaction mechanisms of different biomass conversion systems, such as acid-catalyzed dehydration of fructose, base-catalyzed transesterification of triglycerides, and gas phase pyrolysis of fatty esters.

1. Introduction

Biomass, an abundant and renewable resource of organic carbon, will play a key role in supplying the world with “green” transportation fuels and other useful organic chemicals.1-4 A biorefinery, envisaged to function akin to a petroleum refinery, will draw a feedstock from an abundant biomass source and convert it into a host of smaller and more valuable products, such as ethanol, biodiesel, and levulinic acid, in a sequence of unit operations and processes.2,3,5,6 A mainstay in the design and development of petroleum refineries are the strategies, software tools, and semiempirical physicochemical property correlations that have been developed for process modeling, design, and optimization.7-10 These are extensively used in the basic design of petroleum refineries and in subsequent process improvements. Some tools, such as plant-wide simulators and optimizers, are in principle applicable in designing biorefineries, as well. Kinetic and mechanistic models of reaction systems in petroleum refining, on the other hand, are not directly transferable to biorefineries because the process chemistry and reaction mechanisms are significantly different. Such models, therefore, will have to be developed anew for modeling the chemistry relevant in biomass conversion.

Several challenges need to be addressed prior to performing kinetic or mechanistic analysis of biomass conversion systems. First, biomass has a C/O ratio of 1:1, whereas fossil fuels consist predominantly of carbon and hydrogen, implying that the chemical transformations pertaining to oxygenates, which are not well explored yet, become important. Second, biomass conversion to fuels and chemicals can involve diverse thermochemical routes,11-16 such as gas phase pyrolysis, liquid phase solution chemistry (e.g., dehydration and hydrolysis), and heterogeneous catalytic chemistry. Furthermore, the composition of biomass varies, depending upon the source (e.g., cellulose, hemicellulose, and lignin) and, hence, leads to a variable product distribution. Thus, to understand the reaction systems of a biorefinery, it is essential to have a tool for (a) representing generic organic compounds in a compact format, (b) describing and generating diverse chemistries in biomass conversion, and (c) identifying potential reaction routes that exist between reactants and products. Such a tool can then be used as a precursor for kinetic modeling and for pathway and mechanistic analysis because these require a reaction network as the first step.

In this paper, we present Rule Input Network Generator (RING), a computational tool that provides the user with the ability to describe a variety of heterogeneous and homogeneous chemistries in biomass conversion and to generate complex reaction networks. The tool takes in as input reactants and reaction rules of a reaction system and generates a list of all possible products and reactions on the basis of the reaction rules. RING builds on the strategies developed for modeling pyrolysis,17 combustion,18 hydrocarbon processing,19,20 and synthetic organic chemistry;21 it further implements algorithms from the domain of Cheminformatics22 to generalize the description of a reaction using a graph-transformation system, thereby streamlining the method of reaction generation. Four examples of reaction network generation, consisting of different types of chemistries relevant in biomass conversion, are presented to highlight the versatility of RING.

2. Background

Computational tools with a systematic procedure for representing chemical transformations were first developed in the field of chemistry for computer-aided organic synthesis.21 LHASA23 uses heuristics-based strategies for retrosynthesis,24 and IGOR25-27 implements the Dugundji-Ugi algebraic model of BE and R matrices25 to represent molecules and reactions and combinatorially generates all possible synthetic routes for organic chemical systems. These tools were developed for the specific purpose of planning the synthesis of organic compounds to aid chemists in designing experiments and, therefore, are not directly applicable in reaction engineering, in which the emphasis is on building physicochemical models of reaction systems.

Most reactors in a petroleum refinery are complex: the number of compounds and intermediates involved is large enough to preclude complete identification using analytical techniques, and the reaction network is so highly connected that the mathematical analysis of these systems becomes nontrivial. One of the first generic strategies to build practical models of such complex reaction systems was Structure Oriented Lumping (SOL), proposed initially by Mobil8,28 and expanded later by ExxonMobil.29 SOL represents a molecule as a vector of the frequency (the number of occurrences) of structural increments in that compound. These increments were identified on the basis of the analysis of petroleum feedstocks and products. A reaction modifies the reactant molecule vector and results in the product. SOL was designed to provide a pathway-level approach, and the molecular representation scheme prevents the depiction of intermediates and, consequently, elementary steps. To provide further elucidation of reaction mechanisms, reaction generation tools have been developed that utilize a set of elementary steps as formal reaction rules to construct complex networks; hence, the term “rule-based” to characterize such tools.

* To whom correspondence should be addressed. E-mails: (A.B.)[email protected]; (P.D.) [email protected].

Ind. Eng. Chem. Res. 2010, 49, 10459–10470

10.1021/ie100546t. 2010 American Chemical Society. Published on Web 06/03/2010

NETGEN17 is a rule-based tool that adopts the method of BE and R matrices for modeling pyrolysis, polymerization, biochemical reactions, and silicon nanoparticle growth.30-32 It also includes an ordinary differential equations solver for kinetic modeling. Reaction Description Language (RDL)19,33 and RDL++20 are rule-based tools with an English-like language as the user-interface. In RDL, reaction rules of chemical reactions are described using the syntax of the language. RDL++ expands RDL with functionality and syntax to describe more complex rules, such as determining aromatic and allylic atoms, recognizing penta-coordinated carbonium ion intermediates, and representing multiple catalysts and catalytic centers. In many cases, elementary reaction steps have some restrictions on the nature of the reactant species. For example, carbonium ions are formed on solid acid catalysts only upon adsorption of paraffins because olefins, despite having sp3 carbons, rapidly equilibrate with their carbenium ion surface intermediates. In RDL and RDL++, therefore, constraints on the size and structure of the reactants and the products can be imposed in a reaction rule whenever required, on the basis of experimental data or expert knowledge.

The number of reactions generated in rule-based models increases almost exponentially with the addition of every reactant and reaction rule and can, therefore, increase the execution time significantly. For example, Hsu et al.20 showed that, given a set of reaction rules for propane aromatization, increasing the allowed size of the products from C5 to C9 resulted in increasing the execution time from as little as 15 s to almost 48 h. This combinatorial explosion in the number of species and reactions can be mitigated in RDL++ by preventing the generation of reactions that are unlikely on the basis of structure and reactivity arguments. The tool NETGEN, on the other hand, prunes the network on the basis of the number of reaction steps between a given product and the initial reactants (rank-based technique34) and the relative rate of the given step with respect to a reference reaction rate (rate-based technique35).

Automated reaction generation has also been used substantially in the field of combustion of hydrocarbons because detailed kinetic modeling aids the design of engines and other combustion systems. EXGAS,36 an automated tool for generation of kinetic models, was developed to model gas phase oxidation and combustion of alkanes. It adopts the method of Chinnick et al.37 to represent a molecule as a tree initiating from a root atom. The external representation of molecules in EXGAS is in the form of a 1-D string.

The tools described above were designed and applied for specific chemistries: RDL and RDL++ for catalytic systems, COMGEN18 and EXGAS for combustion, and NETGEN predominantly for free radical chemistry. These tools, as a result, are generally not applicable across different chemistries. In biomass conversion, however, the potential of different thermochemical routes is still being explored, and many types of chemistries seem promising, as discussed earlier. A single platform for modeling these diverse chemistries will provide a common medium to analyze the reaction pathways of different chemistries and to compare and contrast one type of process chemistry with another. In this context, a computational tool that allows for the representation of the different atomic configurations and bonding types, nonbonding interactions, different reagents (e.g., electrophiles and nucleophiles), ligands, catalysts, and multiple catalytic sites (chemical concepts pertaining to organic chemistry) will be highly valuable. Such a tool, in conjunction with information of thermodynamic quantities and kinetic parameters, can be used to construct and analyze possible mechanisms38 and to develop microkinetic models.39

RING includes a generic three-step framework for describing reaction rules that allows the user to model all types of chemistries applicable in biomass conversion. Specifically, the tool uses graph theory and graph-transformation systems to provide abstractions for representing molecules and reactions, respectively. Section 3 describes the basic working principles of RING through examples, and section 4 demonstrates the generative ability of the tool by considering four different chemical systems from the domain of biomass conversion. A detailed description of the underlying theory and algorithms will be the subject of a subsequent paper.

3. RING: a Description

RING, as discussed in sections 1 and 2, is a rule-based automated reaction network generator. The set of initial reactants and a set of elementary steps as reaction rules are the essential inputs to RING, which then generates a list of possible reactions. Any rule-based tool requires a scheme for representing molecules and reaction rules and a procedure for generating the reactions. RING, for this reason, is composed of three components (Figure 1): a molecular representation system that is used to depict all molecules (including reactants, products, and intermediates), a scheme for describing reaction rules unambiguously, and a network generator that manages the process of construction of reaction networks based on the reaction rules.

Both the network generator and the depiction scheme for reaction rules are aided by other subcomponents; the reaction rule description contains a pattern representation scheme for describing specific fragments in a molecule; and the network generator contains an internal molecular representation scheme using graph theory, a graph-transformation system for representing the elementary reaction steps, and a control loop for the generation of all possible reactions. Each of these three components is described in detail below.

Figure 1. Overall structure of the automated reaction network generator.

Ind. Eng. Chem. Res., Vol. 49, No. 21, 2010

3.1. Compact Molecular Representation. A molecular representation scheme is essential for input, storage, retrieval, and output of molecules. The number of molecules (reactants, products, and intermediates) in a complex reaction system is large, typically thousands in number. A representation scheme, consequently, should consume as little memory as possible and yet preserve the atom connectivity, electronic, and atomic information of the molecule. A character-string-based representation scheme, based on SMILES,40 has been used previously.18,20,33 SMILES is a string representation scheme for molecules with a prescribed set of rules for depicting molecules. RING employs an adapted form of standard SMILES. Adaptations were made to represent additional electronic configurations (atomtypes), such as free radicals and carbonium ions that were not represented in the original version. Figure 2 shows SMILES-like string equivalents for simple compounds. A given compound can have multiple string representations. For example, ethanol could be represented either as “CCO” or as “OCC”; a unique string representation for every molecule, therefore, requires a “canonical” string, which is generated in RING using the CANGEN algorithm.41 This algorithm ranks the atoms in a molecule on the basis of their invariant atomic properties (such as atomic weight, atomic number, charge, valency, unpaired electrons, and lone pairs) and those of their neighboring atoms. The ranks are then used to construct the canonical SMILES-like string.
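The need for a canonical string can be illustrated with a minimal Python sketch. It is not the CANGEN algorithm: it handles only unbranched chains of single-letter atoms joined by single bonds, and it simply writes the chain from each terminal atom and keeps the lexicographically smallest string. The function names `parse_chain`, `write_from`, and `canonical` are hypothetical and are not part of RING.

```python
def parse_chain(text):
    """Parse an unbranched, single-bonded string such as 'CCO' into a graph."""
    atoms = list(text)
    neighbors = {i: set() for i in range(len(atoms))}
    for i in range(len(atoms) - 1):
        neighbors[i].add(i + 1)
        neighbors[i + 1].add(i)
    return atoms, neighbors


def write_from(atoms, neighbors, start, parent=None):
    """Write the acyclic graph as a string, walking away from `parent`."""
    children = sorted(
        write_from(atoms, neighbors, n, start)
        for n in neighbors[start] if n != parent
    )
    if not children:
        return atoms[start]
    *branches, last = children
    return atoms[start] + "".join(f"({b})" for b in branches) + last


def canonical(text):
    """Brute-force stand-in for canonicalization (assumption, not CANGEN):
    try every terminal atom as the start and keep the smallest string."""
    atoms, neighbors = parse_chain(text)
    ends = [i for i in neighbors if len(neighbors[i]) <= 1]
    return min(write_from(atoms, neighbors, s) for s in ends)


print(canonical("CCO"), canonical("OCC"))  # both inputs give the same string
```

Both spellings of ethanol map to one string, so a lookup keyed on the canonical form can tell whether a generated molecule is already known. A rank-based method such as CANGEN reaches the same goal without enumerating start atoms.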

3.2. Unambiguous Description of Reaction Rules. The reaction rules of a rule-based generation tool convey the elementary steps of the system being modeled. The framework for description of reaction rules should allow an unambiguous specification of the atoms and bonds participating in a reaction. In addition, the framework should also have a provision for preventing a combinatorial explosion. With these goals in mind, a framework with a three-step procedure for reaction description was adopted. Consider the example of adsorption of a ketone on an acid catalyst, as shown in Figure 3.

For simplicity, the catalytic center is represented in the figure as H+ (a proton). First, the reactant pattern, consisting of the set of atoms and bonds participating in a reaction, is determined. In this case, the keto functional group, C=O, and the acid site, H+, constitute the reactant pattern.33,42 The reaction step proceeds with bond formation between the proton and the oxygen. The charge on the proton is transferred to the carbon, and the double bond of the keto group weakens to form the singly bonded C+-O. This description of the transformation operations constitutes the second step. To prevent the generation of reactions that are infeasible, additional constraints, such as those requiring the reactant to be a neutral species, need to be specified. In addition, constraints on size and structure can be imposed by either the chemical process or the nature of the catalyst; consequently, there may be an upper bound on the size of the molecule or a restriction that the molecule should not be highly branched. The description of such constraints constitutes the third and final step.

Ratkiewicz et al., in their tool COMGEN,18 adopt a string representation for reactant patterns based on SMARTS,43 which contains well-defined rules and symbols to represent patterns in a molecule. RING adopts a SMARTS-like representation that is more comprehensive than that used in COMGEN because additional symbols are employed to represent atom environments and classes of atoms and bonds. Table 1 gives examples of different pattern strings and their interpretations. Reaction rules can be either unimolecular or bimolecular; the latter case consists of two reactant patterns, one for each reactant.

Modifications in structure and electronic configuration comprise the description of transformation operations. Structural changes are transformations that increase or decrease the bond order and change connectivity (e.g., the formation and cleavage of a bond), whereas changes in electronic configuration incorporate changes in charge or electron density of the atoms participating in the reaction (such as the neutral carbon acquiring a positive charge and becoming a carbocation). Labels, such as 1 and 2 of the pattern C1=O2 in Table 1, are given to the atoms of the reactant pattern for their identification when describing the transformation operations. For example, in the case of adsorption of a ketone, the pattern C1=O2 describes the carbonyl group as the reactant pattern (with carbon labeled 1 and oxygen labeled 2), whereas the transformation operations include modifying the atomtype of 1 to C+ and decreasing the bond order of the bond between atoms 1 and 2.

The constraints of a rule can be at the molecular level or at the level of the atom. Molecular constraints can be further classified as constraints on charge, size, or structure; for example, constraints such as a molecule being neutral or acyclic. These constraints are specified as strings with a defined set of symbols and are interpreted by the network generator. RING allows multiple molecular constraints to be combined into complex boolean strings, as shown in Table 2, thereby providing additional leverage in describing reaction rules. The constraints on the atom pertain to conditions on the local environment around a specific atom and are specified in the pattern strings. For example, the pattern C1[!=O]C2, which indicates a pattern of two carbons singly bonded to each other, includes an atom-level constraint that forbids atom 1 from being a carbonyl carbon. The framework thus offers flexibility in describing reaction rules to varying levels of detail, from providing just the minimal information comprising only the reactant pattern and the transformations to a highly detailed description having complex constraints governing the rule.

Figure 2. Sample external string representation adapted from SMILES40 used in RING. ‘1’ is a ring identifier. For benzene, this implies that the first and the last carbon atoms given in the string are connected by a bond, thus forming a ring. Similarly, the oxygen and the first carbon in furan have a bond, thus forming a ring. Note also that all aromatic atoms are represented in small letters, and others are represented in capital letters, to provide this additional information of aromaticity of the molecule and atoms. Atoms with charges and unpaired electrons are enclosed within square brackets.

Figure 3. Adsorption of a ketone on an acid site to form a carbocation.

Table 1. Sample SMARTS-Like Patterns and Their Interpretation

pattern        interpretation
CC             two carbon atoms connected by a single bond
CO             carbon and oxygen atoms connected by a single bond
C~C            two carbon atoms connected by any type of bond
C1=O2          a carbon atom doubly bonded to an oxygen; 1 and 2 are labels given to the atoms
C1[!=O]C2      a carbon atom labeled 1 singly bonded to another carbon atom labeled 2; 1 is not connected to oxygen by a double bond
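The boolean combination of molecular constraints can be sketched in Python as composable predicates. This is only an illustration of the idea under stated assumptions: RING interprets constraint strings, whereas the sketch below builds the equivalent logic by hand, and the names `MolInfo`, `size_below`, `all_of`, `any_of`, and `negate` are hypothetical. The two composed constraints mirror the expressions {s < 6}&!{r}&{q0} and (!{r}&{s < 9}) | ({r}&{s < 7}).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolInfo:
    """Molecule-level facts that the constraints inspect (hypothetical type)."""
    size: int      # number of heavy atoms, the 's' of a constraint string
    cyclic: bool   # contains a ring, the 'r' of a constraint string
    charge: int    # net charge; 'q0' means charge == 0


def size_below(limit):
    return lambda mol: mol.size < limit


def is_ring(mol):
    return mol.cyclic


def is_neutral(mol):
    return mol.charge == 0


def all_of(*preds):      # plays the role of '&'
    return lambda mol: all(p(mol) for p in preds)


def any_of(*preds):      # plays the role of '|'
    return lambda mol: any(p(mol) for p in preds)


def negate(pred):        # plays the role of '!'
    return lambda mol: not pred(mol)


# {s < 6}&!{r}&{q0}
small_linear_neutral = all_of(size_below(6), negate(is_ring), is_neutral)

# (!{r}&{s < 9}) | ({r}&{s < 7})
linear_or_small_ring = any_of(
    all_of(negate(is_ring), size_below(9)),
    all_of(is_ring, size_below(7)),
)

acetone = MolInfo(size=4, cyclic=False, charge=0)
cyclohexanone = MolInfo(size=7, cyclic=True, charge=0)
cyclopentanone = MolInfo(size=6, cyclic=True, charge=0)

print(small_linear_neutral(acetone))          # True
print(linear_or_small_ring(cyclohexanone))    # False: ring with 7 heavy atoms
print(linear_or_small_ring(cyclopentanone))   # True: ring with 6 heavy atoms
```

Keeping each constraint a plain function of the molecule means a rule can carry any number of them and the generator only has to evaluate the composed predicate before applying the transformation.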

Table 3 summarizes the reaction rule for the case shown in Figure 3. Two patterns are created: one for the keto group and the second for the proton. Four transformation operations describe the changes in bonding and electronic configuration, whereas the constraint on the first pattern requires the reactant to be of size less than 6, acyclic, and neutral. The atom-level constraints on atom 1, [!H] and [!O], prohibit the atom from being singly bonded to hydrogen or connected to oxygen atoms and consequently prevent aldehyde, acid, and ester groups from participating in such a reaction. Table 3 is representative of the reaction rule and is not the actual form of the input to RING. These rules are currently written as C++ commands directly into the code of RING; a user-interface is currently being developed and is discussed briefly later.

3.3. Network Generator. The user inputs the initial reactants (as SMILES-like strings), defines the reaction rules (also called reactiontypes), and in addition, can also provide a list of global constraints that are to be satisfied by all molecules. The overall process of generation of reactions, based on these rules, is managed by the network generator, which performs three functions. First, it maintains lists of molecules (reactants, intermediates, and products), reaction rules (also referred to as reactiontypes), and reactions. Second, it creates internal representations of molecules and patterns and finds matches of the patterns in the molecule. Third, it finds all possible reactions that each molecule can undergo. The second and third functions are discussed in detail below.

The internal representations of molecules and patterns are based on chemical graph theory.44 In this representation scheme, atoms are nodes and bonds are the edges of a graph. The nodes and the edges of this graph have attributes specifically associated with them: the nodes (atoms) contain atomic information and the electronic configuration, and the edges (bonds) contain bond order information of the bonds connecting the atoms. The nodal attributes allow different electronic configurations (or atomtypes) of a given type of element to be assigned for each atom. For example, carbon can exist as neutral carbon C, carbenium ion C+, radical C•, carbene C:, etc. The atomtype of an atom can, hence, be modified in a reaction to take up any other allowed electronic configuration.
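An attributed molecular graph of this kind can be sketched in a few lines of Python. The classes `Atom` and `MolGraph` and their fields are assumptions made for illustration; they do not reproduce the C++ data structures of RING. Only heavy atoms are stored, and the atomtype string doubles as the electronic configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str    # e.g. "C", "O", "H"
    atomtype: str   # electronic configuration, e.g. "C", "C+", "C.", "C:"


@dataclass
class MolGraph:
    atoms: dict = field(default_factory=dict)   # atom id -> Atom (node attributes)
    bonds: dict = field(default_factory=dict)   # frozenset({i, j}) -> bond order

    def add_atom(self, idx, element, atomtype=None):
        # The neutral atomtype defaults to the element symbol.
        self.atoms[idx] = Atom(element, atomtype or element)

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order

    def neighbors(self, i):
        return sorted(next(iter(b - {i})) for b in self.bonds if i in b)


# Heavy-atom skeleton of acetone, CC(=O)C, with atom ids 0..3.
acetone = MolGraph()
for idx, element in enumerate(["C", "C", "O", "C"]):
    acetone.add_atom(idx, element)
acetone.add_bond(0, 1)
acetone.add_bond(1, 2, order=2)
acetone.add_bond(1, 3)

# A reaction can change the electronic configuration without changing the element.
acetone.atoms[1].atomtype = "C+"
print(acetone.neighbors(1), acetone.atoms[1])
```

Separating `element` from `atomtype` is what lets a transformation turn a neutral carbon into a carbenium ion or a radical while the node itself, and its bonds, stay in place.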

When a molecule object is created, its SMILES-like string is interpreted and a molecular graph is created. Graph-based algorithms are used to determine further characteristics of the molecule. For example, additional functions for generating the unique SMILES, finding all rings45 in the molecule, and detecting aromaticity46 use graph algorithms such as depth-first and breadth-first traversal. When a pattern object is created, its SMARTS-like string is parsed to generate the pattern graph. The instances of a specific pattern in a given molecule are found using the Ullmann algorithm for subgraph isomorphism.47 This algorithm finds all parts of the molecular graph (also called subgraphs) identical to the pattern graph. Figure 4 shows some examples of matches in a molecule wherein atoms are labeled for the sake of easy reference. For example, a pattern C=O creates a pattern graph corresponding to the keto group; the Ullmann algorithm subsequently finds identical subgraphs of this pattern in the molecule and returns the subgraph C2=O3 as a match.
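Pattern matching on a molecular graph can be sketched with plain backtracking. This is a simplified stand-in, not the Ullmann algorithm: it omits Ullmann's refinement step and simply tries every assignment that preserves element symbols and the bond orders the pattern requires. The function `find_matches`, the helper `bonds`, and the atom numbering of 1-hydroxypropan-2-one (chosen here so that the carbonyl is C2=O3) are assumptions for illustration.

```python
def find_matches(pattern, molecule):
    """Return all mappings {pattern atom -> molecule atom} that preserve
    element symbols and the bond orders required by the pattern."""
    p_atoms, p_bonds = pattern
    m_atoms, m_bonds = molecule
    order = sorted(p_atoms)
    matches = []

    def bonds_agree(p, m, mapping):
        # Each pattern bond from p to an already mapped atom q must exist
        # in the molecule with the same bond order.
        for q, mq in mapping.items():
            wanted = p_bonds.get(frozenset((p, q)))
            if wanted is not None and m_bonds.get(frozenset((m, mq))) != wanted:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p = order[len(mapping)]
        for m in sorted(m_atoms):
            if m in mapping.values() or m_atoms[m] != p_atoms[p]:
                continue
            if bonds_agree(p, m, mapping):
                mapping[p] = m
                extend(mapping)
                del mapping[p]

    extend({})
    return matches


def bonds(*triples):
    """Build a bond table from (atom, atom, order) triples."""
    return {frozenset((i, j)): order for i, j, order in triples}


# 1-hydroxypropan-2-one, heavy atoms only: C1-C2(=O3)-C4-O5 (assumed numbering).
hydroxyacetone = (
    {1: "C", 2: "C", 3: "O", 4: "C", 5: "O"},
    bonds((1, 2, 1), (2, 3, 2), (2, 4, 1), (4, 5, 1)),
)
keto = ({1: "C", 2: "O"}, bonds((1, 2, 2)))       # pattern C1=O2
hydroxyl = ({1: "C", 2: "O"}, bonds((1, 2, 1)))   # pattern C1-O2

keto_matches = find_matches(keto, hydroxyacetone)
hydroxyl_matches = find_matches(hydroxyl, hydroxyacetone)
print(keto_matches)      # [{1: 2, 2: 3}]
print(hydroxyl_matches)  # [{1: 4, 2: 5}]
```

Each returned mapping ties the labels of the pattern to concrete atoms of the molecule, which is exactly the information a transformation step needs. Symmetric patterns such as CC are reported once per orientation by this sketch, so a real generator would have to remove such duplicates.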

This internal representation, along with the framework for describing reaction rules discussed in section 3.2, inherently provides a method for reaction generation based on graph transformation.48 The instances of a reactant pattern, also known as reaction centers, are first identified in the molecule. If a molecule has the appropriate reaction center, then all the constraints of the reaction rule are evaluated on the molecule. For bimolecular reaction rules, a second molecular graph corresponding to the coreactant is taken, and instances of the second reactant pattern are identified. If the reactants satisfy all the constraints, the transformation operations of the reaction rule are applied on the molecular graphs. The attributes of the atoms and the bonds of the reaction center in the molecule are modified according to the reaction rule. The connected components of the transformed graphs are the products of the reaction. Each instance of the reactant pattern leads to a different reaction. For example, Figure 5 extends the case shown in Figure 3 and shows two instances of the reactant pattern leading to two possible reactions. The number of reactions for a bimolecular reactiontype is the product of the number of possible reaction centers of the two reactants.

Table 2. Examples of Boolean Constraint Expressions

constraint expression              interpretation
{s < 6}&!{r}                       size less than 6 heavy atoms and not a ring (linear)
{q0}&{C = C}                       neutral molecule and olefinic
(!{r}&{s < 9}) | ({r}&{s < 7})     linear with size less than 9 OR cyclic with size less than 7

Table 3. Reaction Rule for Adsorption of Ketone on an Acid Site

Reactant Pattern
1. C1[!H][!O]=O2
2. H+3

Transformation Operations
1. modify atomtype of 3 to H
2. modify atomtype of 1 to C+
3. decrease bond order of bond (1, 2)
4. connect atoms 2 and 3

Constraints
pattern 1 - {s < 6}&!{r}&{q0}

Figure 4. Sample patterns and their instances in 1-hydroxy propan-2-one.
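The graph-transformation step can be sketched for the ketone adsorption rule of Table 3. The sketch assumes the reaction center has already been located and the constraints already checked; it then applies the four transformation operations and reads off the products as connected components. The function names and the dictionary-based graph layout are hypothetical, not the data structures of RING.

```python
def apply_ketone_adsorption(atoms, bonds, match):
    """Apply the four transformation operations of Table 3.
    `match` maps the rule labels 1 (C), 2 (O), 3 (H+) to atom ids."""
    atoms, bonds = dict(atoms), dict(bonds)   # leave the reactants untouched
    c, o, h = match[1], match[2], match[3]
    atoms[h] = "H"                            # 1. modify atomtype of 3 to H
    atoms[c] = "C+"                           # 2. modify atomtype of 1 to C+
    bonds[frozenset((c, o))] -= 1             # 3. decrease bond order of bond (1, 2)
    bonds[frozenset((o, h))] = 1              # 4. connect atoms 2 and 3
    return atoms, bonds


def connected_components(atoms, bonds):
    """Each connected component of the graph is one molecule."""
    seen, components = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            a = stack.pop()
            if a in component:
                continue
            component.add(a)
            stack.extend(
                next(iter(b - {a})) for b in bonds if a in b and bonds[b] > 0
            )
        seen |= component
        components.append(component)
    return components


# Reactants: acetone (atoms 0-3, heavy atoms only) and a proton (atom 4).
atoms = {0: "C", 1: "C", 2: "O", 3: "C", 4: "H+"}
bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 2, frozenset((1, 3)): 1}

before = connected_components(atoms, bonds)
new_atoms, new_bonds = apply_ketone_adsorption(atoms, bonds, {1: 1, 2: 2, 3: 4})
after = connected_components(new_atoms, new_bonds)
print(len(before), len(after))  # 2 reactant graphs become 1 product graph
```

Because products are read from connectivity alone, the same code path serves association steps (two components merge), dissociation steps (one component splits), and rearrangements (the count is unchanged).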

The network generator also generates all possible reactions allowed by the reactiontypes, which is essential for mechanistic modeling of any reaction system. Figure 6 shows the flowchart for the overall process of reaction generation. The initial reactants are all stored in a list called the unprocessed molecule list. The process of generation of reactions begins with “popping” a molecule from this list. The generator contains a list of constraints, called the global constraints, that are to be satisfied by all molecules. For example, a standard global constraint used in all the examples shown in section 4 is to forbid consecutive double bonds such as C=C=C in all molecules. If the molecule satisfies all the constraints, it is placed in a list called the processed molecule list. For each defined reactiontype, this molecule is tested for possible reactions. This is done using the graph-rewriting technique described above. The new reactions are stored, and all product molecules not present in either of the molecule lists are added into the unprocessed molecule list. For bimolecular reactiontypes, the second molecule is taken from the processed molecule list. Once all reactiontypes are considered, the next molecule in the unprocessed molecule list is popped, and the entire sequence of steps described above is repeated. The overall process of generation ends when the unprocessed list is empty. The final output is a list of all reactions that are generated.
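The control loop can be sketched as a worklist algorithm in Python. The toy chemistry below is an assumption made purely for illustration: molecules are strings of "C" standing for carbon chains, one unimolecular rule splits a chain, one bimolecular rule joins two chains, and the global constraint caps the chain length at four. Discarding reactions whose products violate the global constraints is also an assumption of this sketch; the text only states that such molecules are not processed further.

```python
def generate_network(initial, unimolecular, bimolecular, global_ok):
    """Worklist loop: pop a molecule, apply every rule, queue new products."""
    unprocessed = list(initial)
    processed = []
    reactions = set()

    def record(reactants, products):
        # Assumption: drop reactions that form a forbidden molecule.
        if not all(global_ok(p) for p in products):
            return
        reactions.add((tuple(sorted(reactants)), tuple(sorted(products))))
        for p in products:
            if p not in processed and p not in unprocessed:
                unprocessed.append(p)

    while unprocessed:
        mol = unprocessed.pop()
        if not global_ok(mol):
            continue
        processed.append(mol)
        for rule in unimolecular:
            for products in rule(mol):
                record([mol], products)
        for rule in bimolecular:
            # The coreactant comes from the processed list (including mol itself).
            for partner in list(processed):
                for products in rule(mol, partner):
                    record([mol, partner], products)
    return processed, reactions


def scission(chain):
    """Toy unimolecular rule: split a chain at every internal bond."""
    return [(chain[:i], chain[i:]) for i in range(1, len(chain))]


def coupling(a, b):
    """Toy bimolecular rule: join two chains end to end."""
    return [(a + b,)]


molecules, reactions = generate_network(
    initial=["CC"],
    unimolecular=[scission],
    bimolecular=[coupling],
    global_ok=lambda chain: len(chain) <= 4,
)
print(sorted(molecules, key=len))  # ['C', 'CC', 'CCC', 'CCCC']
print(len(reactions))              # 8
```

The size cap is what makes the loop terminate here: without it, the coupling rule would keep producing longer chains, which is the combinatorial explosion that the global and rule-level constraints are meant to contain.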

4. Results and Discussion

A description of RING was provided in section 3. Four examples of reaction generation from the domain of biomass conversion are presented here to highlight the ability of this network generator to model different kinds of chemistries. These are (a) acid-catalyzed dehydration of fructose to form 5-hydroxymethyl furfural (HMF), (b) base-catalyzed transesterification of triglycerides, (c) gas phase pyrolysis of fatty esters, and (d) acid-catalyzed hydrolysis of HMF to produce levulinic acid. These four examples, which are topics of current research, portray the diverse chemistries characteristic of biomass conversion systems and thus form a good basis for testing the scope of RING. In all these cases, RING reproduced the mechanisms reported in the literature and, in addition, generated other possible reactions. The examples were generated on a Dell Precision T3400 workstation with a Q6600 Intel Core 2 Quad 2.6 GHz processor.

4.1. Acid-Catalyzed Dehydration of Fructose. 5-Hydroxymethyl furfural, or HMF, has been identified as a potential green fuel,11 and acid-catalyzed dehydration of fructose is one of the proposed routes for its production.49 The elementary steps50 involved in the conversion of fructose to HMF are described in Figure 7. RING then generated all possible reactions that can occur in the system within the given set of reaction rules and listed over 1500 unique reactions in addition to reproducing the mechanism reported by Antal et al.50 (Figure 8).

Figure 5. Adsorption of dicarbonyl on an acid site. There are two instances of the keto group pattern C=O, which result in two possible adsorption steps.

Figure 6. Flowchart for the generation of reactions.

Figure 7. Elementary steps for dehydration of fructose to produce HMF.

4.2. Base-Catalyzed Transesterification of Triglycerides. Transesterification of triglycerides by methanol, under basic conditions, is the preferred way of producing biodiesel.11 In this process, the triglycerides are broken down to diglycerides; monoglycerides; and finally, to glycerol. The mechanism11,51 involves four steps, as shown in Figure 9. These elementary steps along with the triglyceride shown in Figure 10 were the inputs. RING generated all possible reactions with these four steps, including the mechanism reported in the literature.51

4.3. Pyrolysis of Fatty Esters. Pyrolysis is the thermal decomposition of large molecules into smaller products via gas phase free radical chemistry. Pyrolysis of fatty acid esters leads to the formation of linear alkanes and aromatics.11 The mechanism provided by Schwab et al.52 consists of 10 elementary steps, as shown in Figure 11. The first step involving the formation of a diene is not an elementary step and proceeds through an allylic intermediate. For the sake of simplicity, the additional steps were ignored. Similarly, the dehydrogenation of cyclohexene shown in step 6 is not an elementary step and has been considered as a single step for simplicity. These two simplifications indicate that reaction rules need not always be elementary, and nonelementary rules can also be described if they are unimolecular or bimolecular. Figure 12 shows the reactions that comprise the pyrolysis mechanism52 for ethyl oleate as the initial reactant. RING again reproduced the mechanism reported by Schwab et al.52

4.4. Levulinic Acid from HMF. Levulinic acid is the precursor for potential oxygenated fuel additives11 and can be synthesized from HMF in an acid-catalyzed reaction,53 producing formic acid as a byproduct. The elementary steps involved in the generation of levulinic acid from HMF are given in Figure 13. RING generated more than 12 000 reactions; those that constitute the mechanism given in the literature53 are shown in Figure 14.

4.5. Discussion. In all these four examples, global constraints prohibited consecutive double bonds and cations. The maximum allowable size was set to 25 heavy atoms in all cases, except for the case of HMF to levulinic acid, for which a tighter restriction of 11 was set on the basis of the mechanism reported in the literature. The number of reactions and molecules (reactants, products, and intermediates) generated in each case is given in Table 4. The data in Table 4 show that RING can be used to construct reaction systems of diverse chemistry and varied size. It can, therefore, be used to construct a variety of systems pertaining to biomass conversion. The execution time for these systems was only a few minutes in all the cases and as little as 2 s for pyrolysis, indicating that large systems can be constructed in a reasonable time.

The catalytic examples considered in this paper are homogeneous, but the same representation can be used to model heterogeneous catalytic systems as well. For example, the acid site of a solid acid catalyst can be represented as H+ for convenience. This representation can also be extended to all surface species because they can be represented by the corresponding ions. For example, surface alkoxides can be represented as carbenium ions.20,33 In RING, the carbonium ion is represented as C* to distinguish it from the positively charged carbenium ion. This enables RING to describe chemistries such as alkane activation on zeolites.

Figure 8. The reaction pathway for dehydration of fructose to form HMF. The SMILES-like representation of the reactants and products is also given.

Figure 9. Elementary steps of base-catalyzed transesterification of triglycerides.

Table 4. Execution Details for the Examples

system                   execution time (s)   no. of molecules   no. of reactions
fructose to HMF          26                   559                1528
transesterification      4                    89                 140
pyrolysis                2                    54                 67
HMF to levulinic acid    366                  4456               12253

10464 Ind. Eng. Chem. Res., Vol. 49, No. 21, 2010

Figure 10. Mechanism of transesterification of triglycerides to form diglycerides adapted from the literature.11,51 The SMILES-like string of the reactants and the products of each reaction, as generated by RING, is also shown.

Figure 11. Elementary steps occurring in pyrolysis of fatty acid ester. Note that steps 1 and 6 are not elementary and can be further resolved into elementary steps, if necessary.

The implementation of SMARTS-like strings, constraint expressions, and a graph transformation system makes the framework for describing reactions in RING generic. The internal graphical representation of molecules provides an abstraction that represents the two constituents of a molecule that participate in a reaction: the atoms and the bonds. The ability of RING to handle diverse forms of chemistry is a result of these two features, which allow for

(1) representation of every observed configuration of an atom through its attributes;

(2) representation of modifications to the electronic configuration of an atom as changes in attributes during the graph transformation step;

(3) reflection of changes in the bonding of a molecule through modification of the edge attributes in a graph transformation step; and

(4) description of constraints, either using constraint expressions for molecular constraints or incorporating atomic constraints within SMARTS-like strings.
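Points (2) and (3) can be illustrated with a small Python sketch of a graph transformation step. This is a simplified stand-in, not RING's C++ data structures: atoms are dictionaries with hypothetical `element` and `charge` attributes, bonds are edge attributes (bond orders), and the function `apply_transformation` is an assumed name. The example applies a heterolytic C-O cleavage to the heavy-atom skeleton of protonated ethanol.

```python
import copy


def apply_transformation(atoms, bonds, charge_changes, bond_changes):
    """Return a transformed copy of a molecular graph.

    atoms: list of dicts with 'element' and 'charge' attributes.
    bonds: dict mapping (i, j) with i < j to an integer bond order.
    charge_changes: dict mapping atom index to the change in charge.
    bond_changes: dict mapping (i, j) to the change in bond order.
    """
    new_atoms = copy.deepcopy(atoms)
    new_bonds = dict(bonds)
    # Atom attribute updates: changes in electronic configuration.
    for index, delta in charge_changes.items():
        new_atoms[index]["charge"] += delta
    # Edge attribute updates: bonds formed, broken, or changed in order.
    for pair, delta in bond_changes.items():
        key = tuple(sorted(pair))
        order = new_bonds.get(key, 0) + delta
        if order < 0:
            raise ValueError(f"negative bond order for atoms {key}")
        if order == 0:
            new_bonds.pop(key, None)  # the bond is broken
        else:
            new_bonds[key] = order
    return new_atoms, new_bonds


# Heavy-atom skeleton of protonated ethanol: C-C-[O+], hydrogens implicit.
atoms = [
    {"element": "C", "charge": 0},
    {"element": "C", "charge": 0},
    {"element": "O", "charge": 1},
]
bonds = {(0, 1): 1, (1, 2): 1}

# Heterolytic C-O cleavage: the bond breaks and the charge moves to carbon.
new_atoms, new_bonds = apply_transformation(
    atoms, bonds,
    charge_changes={1: +1, 2: -1},
    bond_changes={(1, 2): -1},
)

print(new_bonds)                             # {(0, 1): 1}
print([a["charge"] for a in new_atoms])      # [0, 1, 0]
```

The transformation works on a copy, so the reactant graph is left untouched and can be matched against other rules. Total charge is conserved because the charge changes sum to zero.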

Table 4 indicates that a large number of reactions may be generated for certain systems. It is likely that not all of them will be significant under a given reaction condition. It is, therefore, important to be able to prune a reaction network to a manageable size. In RING, expert knowledge or additional information can be used to either remove insignificant reaction rules or include limiting constraints in reaction rules that prevent the generation of insignificant reactions. The additional information could include theoretical calculations or experimental observations.

An important aspect of any automated reaction network generator is the comprehensiveness of the reactions that are generated. In other words, the tool should not overlook a valid reaction or generate an incorrect reaction. Hsu et al.20 indicate that proving comprehensiveness for a large reaction system is an open problem. Instead, the comprehensiveness of the reactions generated using RING can be inferred on the basis of three analyses. First, it is clear that RING does not overlook obvious reactions because it generates all the reactions reported in the literature in each of the four examples discussed above. Second, RING was tested using a small reaction system, for which it is easier to manually enumerate the number of possible reactions of certain reaction steps. The output of RING can then be analyzed to check if the requisite number of reactions of these reaction steps has been generated. Third, for a given reaction step, the number of reactions generated by different reactants of the same homologous series can be easily calculated and used to verify the output of RING. The second and third analyses have been carried out separately and are documented in the Supporting Information.
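The third analysis can be sketched in Python with a hypothetical example that is not taken from the Supporting Information: C-C bond homolysis of linear alkanes. A linear alkane with n carbons has n - 1 C-C bonds, but cleavages that give the same pair of fragments are equivalent by symmetry, so the expected number of distinct reactions is floor(n/2). The enumeration below stands in for the output of a network generator and is compared against that closed form.

```python
def enumerate_homolysis(n_carbons):
    """Enumerate distinct C-C homolysis reactions of a linear alkane.

    Each reaction is identified by the unordered pair of fragment chain
    lengths, so cleavages related by symmetry are counted once.
    """
    reactions = set()
    for k in range(1, n_carbons):  # bond between carbon k and carbon k + 1
        fragments = tuple(sorted((k, n_carbons - k)))
        reactions.add(fragments)
    return reactions


def expected_homolysis_count(n_carbons):
    # Closed-form count for the homologous series of linear alkanes.
    return n_carbons // 2


# Compare the enumerated count with the closed form across the series.
for n in range(2, 11):
    generated = enumerate_homolysis(n)
    assert len(generated) == expected_homolysis_count(n)

print(sorted(enumerate_homolysis(5)))  # [(1, 4), (2, 3)]
```

A mismatch between the enumerated and the expected count for any member of the series would flag either a missed reaction or a duplicate.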

Figure 12. Pyrolysis mechanism of ethyl oleate adapted from the mechanism of Schwab et al.52 The SMILES-like string of the reactants and the products of each reaction, as generated by RING, is also shown.

Different reaction sequences leading from a reactant to a final product can be determined from the list of all reactions using RING. Since all the reaction pathways of a given molecule can be quickly determined if the elementary steps are known, RING can be used to identify novel thermochemical pathways that can serve as potential synthesis routes for biofuels and chemicals. The list of generated reactions and species can be used to build mechanisms and develop microkinetic models to simulate the different conversion processes in a biorefinery. In addition to its applicability in biomass conversion, this tool can also be used to construct reaction networks of other complex systems, such as hydrocracking or combustion, because the underlying theory governing the tool is generic.
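One way to extract reaction sequences from a generated reaction list is a depth-first search over the network, sketched below in Python. This is an assumed approach for illustration, not a description of how RING does it. Each reaction is simplified to a single (reactant, product) edge, which ignores co-reactants and byproducts, and the species labels A to D are placeholders.

```python
def find_pathways(reactions, start, target):
    """List all cycle-free reaction sequences from start to target.

    reactions: iterable of (reactant, product) pairs.
    Returns a list of pathways, each a list of species labels.
    """
    # Build an adjacency list: species -> species reachable in one step.
    graph = {}
    for reactant, product in reactions:
        graph.setdefault(reactant, []).append(product)

    pathways = []

    def dfs(node, path):
        if node == target:
            pathways.append(list(path))
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip species already on this pathway
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return pathways


# Placeholder network with a reverse step (C -> A) that must not loop.
reactions = [("A", "B"), ("B", "C"), ("A", "C"),
             ("C", "D"), ("B", "D"), ("C", "A")]

for pathway in find_pathways(reactions, "A", "D"):
    print(" -> ".join(pathway))
# A -> B -> C -> D
# A -> B -> D
# A -> C -> D
```

For networks with thousands of reactions, such as the HMF to levulinic acid case, the number of pathways can grow quickly, so a limit on pathway length would be a sensible addition.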

RING still has a few limitations in terms of representing molecules and intermediates. For example, no compact representation exists for polymers such as lignin. Furthermore, unlike acid catalysts that can be represented as H+ for convenience, metal catalysts cannot be represented at all. Therefore, other types of catalysts and intermediates involving multiple catalytic sites cannot be represented. In addition, hydrogen bonding of oxygen-containing intermediates cannot yet be represented. These limitations currently restrict the ability of the tool to model solid catalysis. RING is currently implemented as a C++ library of classes and functions, and the reaction rules and the initial reactants of a system are written into a C++ code that can access this library. Furthermore, the extensive use of strings containing alphanumeric or special characters for representing molecules, patterns, and constraints makes it unintuitive for a user to define reaction rules in RING. A user interface in the form of a domain-specific language that mimics graph rewriting is planned. Research is in progress to address these limitations and thereby expand the scope of application as well as the software usability of RING. Nevertheless, this tool currently offers a framework to construct reaction networks of different chemical systems in an exhaustive manner.

5. Conclusions

A rule-based reaction generation tool, RING, was presented as a single platform for constructing complex reaction networks of different chemical systems in biomass conversion. RING adopts established methods and algorithms from cheminformatics to extend the techniques used by other reaction generation tools designed for specific types of chemistry. Reaction rules in RING are described using a framework with a three-step modular procedure. Molecules have an abstract representation in the form of a molecular graph, and a graph-transformation system that applies chemical transformations on the molecular graph is the abstraction of a reaction. A generic scheme for describing constraints, to variable degrees of complexity, is implemented in RING to describe restrictions that can be imposed in a reaction system based on physical and chemical arguments. Four relevant chemical systems (namely, dehydration of fructose to produce HMF, base-catalyzed transesterification of triglycerides, gas phase pyrolysis of fatty esters, and acid-catalyzed hydrolysis of HMF to form levulinic acid) that are representative of the diverse types of chemistry potentially valuable in biomass conversion were considered for exhaustive generation of reactions. RING reproduced the mechanisms reported in the literature for all these systems. RING can thus model heterogeneous and homogeneous reactions in the thermochemical routes to biomass conversion. Some limitations, such as representing hydrogen bonding, however, still exist in the tool and form the subjects of current research undertaken to make RING more versatile.

Figure 13. Elementary steps used for levulinic acid synthesis from HMF.

Figure 14. Mechanism for conversion of HMF to levulinic acid based on Horvat et al.53 The SMILES-like string of the reactants and products of these reactions, as generated by RING, is also shown.

Acknowledgment

The authors thank Dr. Shuo-Huon Hsu in OSIsoft, LLC, San Leandro, CA for technical discussions on automated reaction generation. The authors also acknowledge financial support from the Institute on the Environment (Discovery Grant: DG-0009-08) and Initiative for Renewable Energy (Large Grant: RL-0004-09) at the University of Minnesota and from the National Science Foundation Emerging Frontiers in Research and Innovation program, Grant no. 0937706.

Supporting Information Available: Discussion of two analyses to prove comprehensiveness of RING and a list of reactions generated for the two analyses. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Schubert, C. Can biofuels finally take center stage. Nat. Biotechnol. 2006, 24 (7), 777.

(2) Ragauskas, A. J.; Williams, C. K.; Davison, B. H.; Britovsek, G.; Cairney, J.; Eckert, C. A.; Frederick, W. J.; Hallett, J. P.; Leak, D. J.; Liotta, C. L.; Mielenz, J. R.; Murphy, R.; Templer, R.; Tschaplinski, T. The path forward for biofuels and biomaterials. Science 2006, 311 (5760), 484.

(3) Regalbuto, J. R. Cellulosic Biofuels-Got Gasoline. Science 2009, 325 (5942), 822.

(4) U.S. DOE; Biomass as feedstock for a bioenergy and bioproducts industry: The technical feasibility of a billion-ton annual supply, http://www1.eere.energy.gov/biomass/pdfs/final_billionton_vision_report2.pdf, April 2005 (accessed January, 2010).

(5) Clark, J. H.; Budarin, V.; Deswarte, F. E. I.; Hardy, J. J. E.; Kerton, F. M.; Hunt, A. J.; Luque, R.; Macquarrie, D. J.; Milkowski, K.; Rodriguez, A.; Samuel, O.; Tavener, S. J.; White, R. J.; Wilson, A. J. Green Chemistry and the biorefinery: a partnership for a sustainable future. Green Chem. 2006, 8, 853.

(6) Petrus, L.; Noordermeer, M. A. Biomass to biofuels, a chemical perspective. Green Chem. 2006, 8 (10), 861.

(7) Ho, T. C. Kinetic Modeling of Large-Scale Reaction Systems. Catal. Rev. 2008, 50 (3), 287–378.

(8) Quann, R. J.; Jaffe, S. B. Building useful models of complex reaction systems in petroleum refining. Chem. Eng. Sci. 1996, 51 (10), 1615.

(9) Ghosh, P.; Hickey, K. J.; Jaffe, S. B. Development of a detailed gasoline composition-based octane model. Ind. Eng. Chem. Res. 2006, 45 (1), 337.

(10) Moro, L. F. L. Process technology in the petroleum refining industry: current situation and future trends. Comput. Chem. Eng. 2003, 27, 1303.

(11) Huber, G. W.; Iborra, S.; Corma, A. Synthesis of Transportation Fuels from Biomass: Chemistry, Catalysts, and Engineering. Chem. Rev. 2006, 106, 4044.

(12) Corma, A.; Iborra, S.; Velty, A. Chemical Routes for the Transformation of Biomass into Chemicals. Chem. Rev. 2007, 107, 2411.

(13) Chheda, J. N.; Huber, G. W.; Dumesic, J. A. Liquid-Phase Catalytic Processing of Biomass-Derived Oxygenated Hydrocarbons to Fuels and Chemicals. Angew. Chem. Int. Ed. 2007, 46, 7164–7183.

(14) Carlson, T. R.; Vispute, T. P.; Huber, G. W. Green Gasoline by Catalytic Fast Pyrolysis of Solid Biomass Derived Compounds. ChemSusChem 2008, 1, 397.

(15) Dauenhauer, P. J.; Dreyer, B. J.; Degenstein, N. J.; Schmidt, L. D. Millisecond Reforming of Solid Biomass for Sustainable Fuels. Angew. Chem., Int. Ed. 2007, 46, 5864.

(16) Lin, Y.-C.; Huber, G. W. The critical role of heterogeneous catalysis in lignocellulosic biomass conversion. Energy Environ. Sci. 2009, 2, 68.

(17) Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer-generated pyrolysis modeling: on the fly generation of species, reactions and rates. Ind. Eng. Chem. Res. 1994, 33 (4), 790.

(18) Ratkiewicz, A.; Truong, T. N. Application of chemical graph theory for automated mechanism generation. J. Chem. Inf. Model. 2003, 43, 36.

(19) Prickett, S. E.; Mavrovouniotis, M. L. Construction of complex reaction systems. 2. Molecule manipulation and reaction application algorithms. Comput. Chem. Eng. 1997, 21 (11), 1237.

(20) Hsu, S. H.; Krishnamurthy, B.; Rao, P.; Zhao, C. H.; Jagannathan, S.; Venkatasubramanian, V. A domain-specific compiler theory based framework for automated reaction network generation. Comput. Chem. Eng. 2008, 32 (10), 2455.

(21) Todd, M. H. Computer-aided organic synthesis. Chem. Soc. Rev. 2005, 34 (3), 247.

(22) Engel, T. Basic Overview of Cheminformatics. J. Chem. Inf. Model. 2006, 46 (6), 2267.

(23) Corey, E. J.; Long, A. K.; Rubenstein, S. D. Computer-Assisted Analysis in Organic Synthesis. Science 1985, 228 (4698), 408.

(24) Jones, M., Jr. Organic Chemistry; W. W. Norton & Company: New York, 1997.

(25) Dugundji, J.; Ugi, I. An algebraic model of constitutional chemistry as a basis for chemical computer programs. Top. Curr. Chem. 1973, 39, 19.

(26) Ugi, I.; Bauer, J.; Bley, K.; Alf, D.; Dietz, A.; Fortain, E. Computer assisted solution of chemical problems: The historical development and present state of the art of a new discipline of chemistry. Angew. Chem., Int. Ed. Engl. 1993, 32, 201.

(27) Ugi, I.; Bauer, J.; Blomvberger, C.; Brandt, J.; Dietz, A.; Fontain, E.; et al. Models, concepts, theories, and formal languages in chemistry and their use as a basis for computer assistance in chemistry. J. Chem. Inf. Comput. Sci. 1994, 34, 3.

(28) Quann, R. J.; Jaffe, S. B. Structure oriented lumping: Describing the chemistry of complex hydrocarbon mixtures. Ind. Eng. Chem. Res. 1992, 31 (11), 2483.

(29) Jaffe, S. B.; Freund, H.; Olmstead, W. N. Extension of Structure-Oriented Lumping to Vacuum Residua. Ind. Eng. Chem. Res. 2005, 44 (26), 9840.

(30) Kruse, T. M.; Wong, H.-W.; Broadbelt, L. J. Mechanistic Modeling of Polymer Pyrolysis: Polypropylene. Macromolecules 2003, 36 (25), 9594.

(31) Li, C.; Henry, C. S.; Jankowski, M. D.; Ionita, J. A.; Hatzimanikatis, V.; Broadbelt, L. J. Computational discovery of biochemical routes to specialty chemicals. Chem. Eng. Sci. 2004, 59, 5051.

(32) Wong, H.-W.; Li, X.; Swihart, M. T.; Broadbelt, L. J. Detailed Kinetic Modeling of Silicon Nanoparticle Formation Chemistry via Automated Mechanism Generation. J. Phys. Chem. A 2004, 108 (46), 10122.

(33) Prickett, S. E.; Mavrovouniotis, M. L. Construction of complex reaction systems. 1. Reaction description language. Comput. Chem. Eng. 1997, 21 (11), 1219.

(34) Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Termination of Computer-Generated Reaction Mechanisms: Species Rank-Based Convergence Criterion. Ind. Eng. Chem. Res. 1995, 34 (8), 2566.

(35) Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101 (20), 3731.

(36) Warth, V.; Battin-Leclerc, F.; Fournet, R.; Glaude, P. A.; Come, G. M.; Scacchi, G. Computer based generation of reaction mechanisms for gas-phase oxidation. Comput. Chem. 2000, 25, 541.

(37) Chinnick, S. J.; Baulch, D. L.; Ayscough, P. B. An Expert System for Hydrocarbon Pyrolysis Reactions. Chemom. Intell. Lab. Syst. 1988, 5, 39.

(38) Dumesic, J. A. Analyses of Reaction Schemes Using De Donder Relations. J. Catal. 1999, 185, 496.

(39) Bhan, A.; Hsu, S.-H.; Blau, G.; Caruthers, J. M.; Venkatasubramanian, V.; Delgass, W. N. Microkinetic modeling of propane aromatization over HZSM-5. J. Catal. 2005, 235, 35.

(40) Weininger, D. SMILES, A chemical language and information systems. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28 (1), 31.

(41) Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29 (2), 97.

(42) Blurock, E. S. Reaction: System for Modeling Chemical Reactions. J. Chem. Inf. Comput. Sci. 1994, 35, 607.

(43) Daylight Chemical Information Systems, Inc.; Daylight Theory Manual; 2008; http://www.daylight.com/dayhtml/doc/theory/index.html (accessed Jan 2010).

(44) Trinajstic, N. Chemical Graph Theory, 2nd ed.; CRC Press: Boca Raton, FL, 1992.

(45) Hanser, T.; Jauffret, P.; Kaufmann, G. A New Algorithm for Exhaustive Ring Perception in a Molecular Graph. J. Chem. Inf. Model. 1996, 36, 1146.


(46) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493.

(47) Ullmann, J. R. An Algorithm for Subgraph Isomorphism. J. ACM 1976, 23 (1), 31.

(48) Benko, G.; Flamm, C.; Stadler, P. F. A Graph-Based Toy Model of Chemistry. J. Chem. Inf. Comput. Sci. 2003, 43 (4), 1085.

(49) Torres, A. I.; Tsapatsis, M.; Daoutidis, P. Continuous production of 5-Hydroxymethylfurfural from fructose: a design case study. Energy and Environmental Science, submitted.

(50) Antal, M. J.; Mok, W. S. L.; Richards, G. N. Mechanism of formation of 5(hydroxymethyl)-2-furaldehyde from D-fructose and sucrose. Carbohydr. Res. 1990, 199, 91.

(51) Schuchardt, U.; Sercheli, R.; Vargas, R. M. Transesterification of vegetable oils. J. Braz. Chem. Soc. 1998, 9, 199.

(52) Schwab, A. W.; Dystra, G. J.; Selke, E.; Sorenson, S. C.; Pryde, E. H. Diesel fuel from thermal decomposition of soybean oil. J. Am. Oil Chem. Soc. 1988, 65, 1781.

(53) Horvat, J.; Klaic, B.; Metelko, B.; Sunjie, V. Mechanism of levulinic acid formation. Tetrahedron Lett. 1985, 26, 2111.

Received for review March 8, 2010
Revised manuscript received May 19, 2010

Accepted May 20, 2010

IE100546T
