accelerated discovery of high-refractive-index polyimides

13
doi.org/10.26434/chemrxiv.7670903.v1 Accelerated Discovery of High-Refractive-Index Polyimides via First-Principles Molecular Modeling, Virtual High-Throughput Screening, and Data Mining Mohammad Atif Faiz Afzal, Mojtaba Haghighatlari, Sai Prasad Ganesh, Chong Cheng, Johannes Hachmann Submitted date: 05/02/2019 Posted date: 06/02/2019 Licence: CC BY-NC-ND 4.0 Citation information: Afzal, Mohammad Atif Faiz; Haghighatlari, Mojtaba; Ganesh, Sai Prasad; Cheng, Chong; Hachmann, Johannes (2019): Accelerated Discovery of High-Refractive-Index Polyimides via First-Principles Molecular Modeling, Virtual High-Throughput Screening, and Data Mining. ChemRxiv. Preprint. We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optic or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order to determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising leads compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI yield. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches. File list (2) download file view on ChemRxiv Accelerated Discovery of High-Refractive-Index Polyimide... (1.02 MiB) download file view on ChemRxiv R1_R2_PI_screening_results.xlsx (10.96 MiB)

Upload: others

Post on 28-Dec-2021

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accelerated Discovery of High-Refractive-Index Polyimides

doi.org/10.26434/chemrxiv.7670903.v1

Accelerated Discovery of High-Refractive-Index Polyimides viaFirst-Principles Molecular Modeling, Virtual High-Throughput Screening,and Data MiningMohammad Atif Faiz Afzal, Mojtaba Haghighatlari, Sai Prasad Ganesh, Chong Cheng, Johannes Hachmann

Submitted date: 05/02/2019 • Posted date: 06/02/2019Licence: CC BY-NC-ND 4.0Citation information: Afzal, Mohammad Atif Faiz; Haghighatlari, Mojtaba; Ganesh, Sai Prasad; Cheng, Chong;Hachmann, Johannes (2019): Accelerated Discovery of High-Refractive-Index Polyimides via First-PrinciplesMolecular Modeling, Virtual High-Throughput Screening, and Data Mining. ChemRxiv. Preprint.

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptionalrefractive index (RI) values for use as optic or optoelectronic materials. Our study utilizes an RI predictionprotocol based on a combination of first-principles and data modeling developed in previous work, which weemploy on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtualscreening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order todetermine the performance potential of each candidate. This rapid and efficient approach yields a number ofhighly promising leads compounds. Using the data mining and machine learning program package ChemML,we analyze the top candidates with respect to prevalent structural features and feature combinations thatdistinguish them from less promising ones. In particular, we explore the utility of various strategies thatintroduce highly polarizable moieties into the PI backbone to increase its RI yield. The derived insights providea foundation for rational and targeted design that goes beyond traditional trial-and-error searches.

File list (2)

download fileview on ChemRxivAccelerated Discovery of High-Refractive-Index Polyimide... (1.02 MiB)

download fileview on ChemRxivR1_R2_PI_screening_results.xlsx (10.96 MiB)

Page 2: Accelerated Discovery of High-Refractive-Index Polyimides

Accelerated Discovery of High-Refractive-Index Polyimides via First-Principles

Molecular Modeling, Virtual High-Throughput Screening, and Data Mining

Mohammad Atif Faiz Afzal,1, ∗ Mojtaba Haghighatlari,1 Sai

Prasad Ganesh,1 Chong Cheng,1 and Johannes Hachmann1, 2, 3, †

1Department of Chemical and Biological Engineering, University at Buffalo,

The State University of New York, Buffalo, NY 14260, United States2Computational and Data-Enabled Science and Engineering Graduate Program,

University at Buffalo, The State University of New York, Buffalo, NY 14260, United States3New York State Center of Excellence in Materials Informatics, Buffalo, NY 14203, United States

ABSTRACT

We present a high-throughput computational study to identify novel polyimides (PIs) with ex-ceptional refractive index (RI) values for use as optic or optoelectronic materials. Our study utilizesan RI prediction protocol based on a combination of first-principles and data modeling developed inprevious work, which we employ on a large-scale PI candidate library generated with the ChemLG

code. We deploy the virtual screening software ChemHTPS to automate the assessment of thisextensive pool of PI structures in order to determine the performance potential of each candidate.This rapid and efficient approach yields a number of highly promising leads compounds. Usingthe data mining and machine learning program package ChemML, we analyze the top candidateswith respect to prevalent structural features and feature combinations that distinguish them fromless promising ones. In particular, we explore the utility of various strategies that introduce highlypolarizable moieties into the PI backbone to increase its RI yield. The derived insights provide afoundation for rational and targeted design that goes beyond traditional trial-and-error searches.

I. INTRODUCTION

Polyimides (PIs), shown in Fig. 1, are synthesized bypolycondensation of R1-containing dianhydride with R2-based diamine or diisocyanate [1]. They are an appealingclass of organic materials due to their exceptional ther-mal stability and easy processability [2–4]. These proper-ties are complemented by mechanical stability, flexibility,light weight, low cost, as well as flame and radiation re-sistance, and they thus hold much promise for a range ofapplications [5, 6].

FIG. 1. Polyimide (PI) core structure with residues R1 andR2.

However, their generally low index of refraction (RI)undermines their utility for use in many optic and opto-electronic devices [7–11], such as (image) sensors [12, 13],displays [14], and light sources (including organic light-emitting diodes) [15], in which organic materials can bedeployed in situ as microlenses [16], waveguides [17],microresonators [18], interferometers [19], anti-reflective

[email protected][email protected]

coatings [20], optical adhesives [21], and substrates [22].Most of these applications demand large RI values, oftenupwards of 1.7 or 1.8. Typical carbon-based polymers –PIs included – only exhibit values in the range of 1.3–1.5 [11]. This situation provides a strong incentive topursue novel high-RI PIs that are suitable for the afore-mentioned applications.

As the properties of organic materials can be tailoredand tuned by controlling their molecular structure, theyare a prime target for combinatorial search or rationaldesign attempts [23–25]. The addition of highly polariz-able moieties that do not have extensive π-electron con-jugation can increase the RI of polymers without neg-atively affecting other optical properties. (Significantπ-conjugation can lead to large optical dispersion andbirefringence [26], as well as poor transparency and col-oration.) For example, the incorporation of small aro-matic rings, halogens, metals, and in particular sulfurhave shown promise for this purpose [3, 27–31]. In 2007,Ueda et al. developed PI films with relatively high RIvalues of up to 1.75, but they, unfortunately, exhibitedlarge birefringence [32]. In recent years, these findingshave been improved upon by increasing the sulfur contentof the PIs, yielding RI values of up to 1.76 and smallerbirefringence [33].

In this paper, we present a computational and data-driven approach to study the RI values of PIs and rapidlyidentify promising lead compounds. We investigate dif-ferent ways that introduce highly polarizable moietiesinto the PI framework in order to overcome the techni-cal limits of existing compounds. Given the encouragingresults from the earlier work discussed above, we put a fo-cus on incorporating sulfur into the PIs, and in doing so,

Page 3: Accelerated Discovery of High-Refractive-Index Polyimides

2

create a new class of high-RI polymers. In Sec. II, we de-tail the methods employed in this work, i.e., we providebackground on our data-driven in silico approach (Sec.II A), introduce our RI modeling protocol (Sec. II B), dis-cuss the molecular design space we consider (Sec. II C),and outline our data mining and analysis techniques (Sec.II D). Sec. III presents and discusses the results of ourstudy, and our findings are summarized in Sec. IV.

II. BACKGROUND, METHODS, AND

COMPUTATIONAL DETAILS

A. Data-Driven In Silico Approach

The development of new materials such as PIs has tra-ditionally been an experimentally-driven, trial-and-errorprocess, guided by experience, intuition, and conceptualinsights. This approach is, however, often costly, slow,biased towards certain domains of chemical space, andlimited to relatively small-scale studies, which may eas-ily miss promising leads (both on individual compoundsas well as compound classes).

The study at hand instead embraces a data-driven insilico research paradigm that has gained considerable in-terest in the past few years [34, 35] for its promise toaddress the inherent complexities of structure-propertyrelationships and the vastness of chemical space moreefficiently (see, e.g., Refs. [36–43]). Our work bringstogether molecular modeling, high-throughput computa-tional screening, and machine learning as well as a corre-sponding software ecosystem to support data-driven dis-covery and rational design [44, 45]. It has its greatestutility as part of integrated research pipelines with ex-perimentalist partners, where it provides guidance forexperimental efforts and mitigates some of their before-mentioned shortcomings.

Our research approach and its rationale can be sum-marized as following: We first establish a computationalmodeling protocol that can make sufficiently accurateand fast predictions for the target property in the com-pound class(es) of interest – in the study at hand for theRI values of PIs (cf. Sec. II B). We then create a large-scale virtual library of candidates within that compoundclass (cf. Sec. II C), on which we apply this modelingprotocol to evaluate the performance potential of eachcandidate compound. In addition to obtaining informa-tion about each individual compound, we can mine thescreening results in their entirety to reveal underlyingstructure property-relationships (cf. Sec. II D). Our ap-proach can thus identify both specific lead compoundsas well as high-value molecular patterns and features forthe de novo design of tailored PIs or the creation of ad-ditional screening libraries.

B. Refractive Index Modeling Protocol

According to the Lorentz-Lorenz equation, the RI (nr)is a function of the polarizability (α) and the numberdensity (N), i.e.,

nr =

1 + 2αN/3ǫ01− αN/3ǫ0

.

In previous work [46, 47], we developed a modeling pro-tocol that allows us to accurately and efficiently pre-dict the RI values of polymers within the Lorentz-Lorenzequation. This protocol is based on a combination offirst-principles quantum chemistry calculations and datamodeling. We compute (static) polarizability values αusing the coupled-perturbed self-consistent field equa-tions within the Kohn–Sham density functional theory(DFT) framework with the PBE0 hybrid functional [48]and the double-ζ-quality def2-SVP basis set by the Karl-sruhe group [49]. We employ an all-electron, closed-shellapproach and include Grimme’s D3 correction [50] to ac-count for dispersion interactions. The polarizability cal-culations are performed on geometries optimized usingthe universal force field (UFF) [51] following a 3D con-formational screening as implemented in the OpenBabelsoftware [52]. All DFT calculations are carried out usingthe ORCA 3.0.2 quantum chemistry package [53]. Wecompute the number density N via the van der Waalsvolume VvdW and packing factor Kp of the amorphousbulk polymer as

N =Kp

VvdW

.

We obtain VvdW using Slonimskii’s method detailed inRef. [54] and Kp using the support vector regression(SVR) machine learning model introduced in Ref. [46](except during the prescreening of individual residues R1

and R2 discussed in the following section, for which wesimply employ a constant Kp of 0.75 that is typicallyfound in PIs [55]). We obtain the asymptotic limit for thepolymers via a linear extrapolation scheme as outlinedin Ref. [46]. (In expoloratory calculations, we alreadyobserve near-perfect extensivity based on the monomerunits, which we explain with the large size of the PImonomers (∼ 150 atoms) and the finite correlation lengthexhibited in them. There is thus no need to perform ex-pensive oligomer-sequence calculations.) We previouslydemonstrated on a set of 112 non-conjugated polymerswith experimentally known RI values that this protocolcan make rapid and accurate predictions [46, 47], andthus enable the high-throughput screening study at hand.We execute the latter using our automated virtual screen-ing code ChemHTPS [56], which creates inputs, executesand monitors the calculations, parses and assesses the re-sults, and which extracts and post-processes the informa-tion of interest. Where applicable, we report the meanabsolute error (MAE) and the root mean squared error(RMSE) as well as their percentage errors (MAPE and

Page 4: Accelerated Discovery of High-Refractive-Index Polyimides

3

RMSPE, respectively) compared to available experimen-tal data.

C. Molecular Design Space

In Fig. 2, we show the RI heat map in the α/N param-eter space and mark the positions of the 112 polymerswe previously studied [46, 47]. The figure emphasizesthe relative importance of the counteracting parametersand their feasibility in real-world compounds. In orderto design high-RI systems, we in principle have to pur-sue compounds that simultaneously feature both largepolarizability and number density values. The position-ing of the known compounds with high-RI leans moretowards high polarizability and relatively small numberdensity rather than vice versa. One path to narrowingdown compound space is to maximize the polarizabilityfor systems from the same compound family, for whichthe number density is similar. Given that this is knownto be the case for PIs, we pursue this path and focus oursearch on highly polarizable PI candidates [55].

FIG. 2. Refractive Index (RI) heat map showing the depen-dence on polarizability and number density expressed in theLorentz-Lorenz equation. The dots mark 112 experimentallyknown polymers that were used to benchmark the employedRI model.

For this, we create a PI library based on 29 buildingblocks and bonding rules shown in Fig. 3, which we se-lect following empirical expertise with regard to promiseand synthetic feasibility. The building blocks can be di-vided into linkers (B1-B6) and (hetero-)aromatic moi-eties (B7-B29). Using our ChemLG molecular librarygenerator code [57] with a systematic combinatorial link-ing scheme, we initially generate 38,619 R1 structuresand 171,172 for R2. Combining all R1 and R2 to formPIs would lead to a total of 6.6 billion compounds. Torestrict the search space to a more manageable number ofcandidates, we select only the most promising R1 and R2

groups (which ultimately distinguish the PI candidates)

based on the computed RI values of the individual R1

and R2. After prescreening the residues, we develop PIcandidates by introducing the top 100 R1 and 100 R2

structures into the PI core motif as shown in Fig. 1. Wesubsequently screen the 10,000 resulting PI candidates.ChemLG keeps a record of the list of building blocks andconnections that are used to create a particular com-pound, and we use this information for the analysis ofstructure-property relationships.

FIG. 3. Molecular building blocks used to create the residuesR1 and R2 of the PI screening library studied in this work.The “R” on the building blocks denote allowed sites for link-ing. The building blocks marked in green (B1-B6) are linkersand the blue ones (B7-B29) are (hetero-)aromatic moieties.

D. Data Mining and Pattern Analysis

In addition to identifying the best candidates from ourhigh-throughput screening, i.e., those with the highestpredicted RI values, we further analyze the collected datato develop a better understanding of the structural pat-terns that lead to high-RI PIs. For instance, we conducta hypergeometric distribution analysis, in which we com-pute the Z-scores (Zi) of each building block i used inthe creation of the screening library as

Zi =ki −mKi

M

σi

,

with

σi =

[

mKi

(

M −Ki

M

)

×

(

M −m

M − 1

)]1

2

,

where M is the total number of compounds in the en-tire library, m is the subset of compounds that is con-sidered (e.g., the top compounds), Ki is the number of

Page 5: Accelerated Discovery of High-Refractive-Index Polyimides

4

TABLE I. Comparison of RI results from the employed prediction model with the experimental values of ten known high-RIPIs. We report errors and percentage errors in each case. For the entire set, we obtain a mean absolute error (MAE) and rootmean squared error (RMSE) of 0.021 (1.2%) and 0.025 (1.4%), respectively. (The values in brackets are the correspondingpercentage errors, i.e., MAPE and RMSPE.)

Ara−j Structure Experimentvalue [11]

Calculatedvalue

Error

a 1.746 1.738 -0.008 (-0.5%)

b 1.753 1.739 -0.013 (-0.7%)

c 1.749 1.707 -0.042 (-2.4%)

d 1.748 1.751 0.003 (0.2%)

e 1.733 1.760 0.027 (1.6%)

f 1.758 1.779 0.021 (1.2%)

g 1.760 1.735 -0.025 (-1.4%)

h 1.726 1.741 0.015 (0.9%)

i 1.737 1.743 0.005 (0.3%)

j 1.769 1.724 -0.045 (-2.5%)

occurrences of building block i in M molecules and kiits occurrences in the subset of m molecules. A largeZ-score thus indicates a statistical overexpression of theassociated building block in the high-RI candidates rela-tive to the overall screening library. By applying the Z-score analysis, we can thus identify the most importantbuilding blocks and the degree to which they correlatewith large RI values. We perform a similar analysis ofbuilding block combinations to reveal synergistic effectssimilar to those known from push-pull or donor-acceptorcopolymers. In addition, we present an analysis of Z-score trends for each building block in ranked candidatesubsets as well as of the average RI values of the candi-dates derived from each building block. All data miningwork was conducted using our ChemML code [58].

III. RESULTS AND DISCUSSION

We previously tested the RI modeling protocol de-scribed in Sec. II B for the before-mentioned set of 112

non-conjugated polymers and established its predictiveperformance. However, this benchmark set did not con-tain any PIs. To prove our protocol’s validity for thisparticular compound class, we perform additional calcu-lations on ten PIs with experimentally known RI values.We compare the experimental and computational resultsin Tab. I. The MAE (MAPE) and RMSE (RMSPE)values for this comparison are 0.021 (1.2%) and 0.025(1.4%), respectively, which suggests that our protocol canaccurately predict the RI values of PIs as well.

The results of the R1 and R2 residue prescreening aresummarized in Fig. 4 (a) and (b), respectively, and theresults of the subsequent PI screening are given in Fig.4 (c). The plots show histograms of the computed RIresults. Most of the candidates for R1 and R2 have RIvalues between 1.5 and 1.7 with an average of (a) 1.600,and (b) 1.627, respectively. However, the PI candidatesderived from the top R1 and R2 residues have signifi-cantly larger RI values with an average of 1.843. We notethat either of these averages is well above the range oftypical organic polymers (i.e., 1.3–1.5), which indicates

Page 6: Accelerated Discovery of High-Refractive-Index Polyimides

5

a good choice of building blocks. As per the objectivesof this work, we are able to identify a sizable numberof compounds with RI values greater than 1.7, i.e., (a)2851 (7.4%), (b) 22388 (13.1%), and (c) 9985 (99.8%).At or above the critical threshold of 1.8, we still find (a)131 (0.3%), (b) 1252 (0.7%), and (c) 6961 (69.6%) com-pounds. The shift to considerably larger RI values alsodemonstrates the success of our strategy to prescreen theR1 and R2 individually and build PI candidates based onthe top residues. (Details of the prescreening and screen-ing results are provided in the Supplementary Material.)

Fig. 5 shows the RI distribution of (a) R1, (b) R2,and (c) PI candidates containing each constituent build-ing block (cf. the PI candidate library construction illus-trated in Fig. 3). We find that R1 and R2 candidates con-taining building blocks B28 (anthracene), B25 (diben-zothiophene), and B24 (thianthrene) have the highestRI values, while those containing building blocks B17

(cyclopentadiene) and B16 (1,3,5-triazinane) show thelowest RI. The ranking of the building blocks is very sim-ilar for both R1 and R2 structures, which is unsurprisingconsidering their general similarity. The PI candidatesdo not contain building blocks B10 (1,4-dithiane) andB14 (toluene with linking in 2,4 position), as these weremissing in the top 100 structures of R1 and R2 that wereused in their construction. The average RI values for allthe other building blocks – with the exception of B26

(tetraphenylmethane) – are greater than 1.8. However,it should be noted that the number of PI candidates con-taining a specific building block is very variable. The redplot in Fig. 5 (c) shows the count of each building blockin the 10,000 PI structures. The most common buildingblocks in the PI library are B1 (CH2-linker), B25, B2

(S-linker), B28, and B3 (O-linker), with B28 and B2

occurring in almost all PI candidates. Building blocksB1, B2, and B3 do not exhibit particularly large aver-age RI values, however, given the construction templatefor the residues introduced in Fig. 3, they are statisticallymore likely to occur. The average RI value alone is thusnot a sufficient metric to gauge the potential impact ofeach building block on the performance of PI candidates.

In the following, we analyze the contribution of thebuilding blocks in the high-RI candidates of R1 and R2,using the hypergeometric distribution analysis detailedin Sec. II D. We focus on the top 10% of the R1 andR2 candidates (with RI values greater than 1.687 and1.711, respectively). We do not include PI candidatesin this analysis as the selective construction scheme withits biased building block counts makes it less meaningful.Fig. 6 shows the resulting Z-scores, which identify theover- or underrepresentation of each building block inthe high-RI R1 and R2 candidates compared to a randomsample. The results point to B28 and B25 as the mostprevalent building blocks in the top residues, and thus themost promising ones to consider for the design of high-RI polymers. This finding is in good agreement withthe prior analysis of the RI averages and distribution.The ranking of the building blocks in both R1 and R2

are again largely the same, suggesting that the effect ofthe building blocks is similar for the somewhat differentsequence in R1 and R2.The above Z-score analysis only yields insights into

the pervasiveness of building blocks in the top 10% can-didates. In order to gain a more comprehensive picture,we now evaluate the Z-score of each building block ineach 10% segment of the R1 and R2 libraries. For this,we sort the R1 and R2 libraries by increasing RI value,divide them into 10 subsets, and perform a hypergeomet-ric distribution analysis for each building block in eachsubset. The results are shown in Fig. 7. We observe cer-tain general trends for the building blocks making up R1

and/or R2:

1. The Z-score of some building blocks increases withincreasing RI values, indicating a direct correlation(e.g., for B28, B25, B24, and B23).

2. The Z-score of some building blocks decreases withincreasing RI values, indicating a negative correla-tion (e.g., for B17, B22, B10, and B16).

3. The Z-score of some building blocks goes through amaximum for intermediate RI values, indicating acorresponding correlation with average candidates(e.g., for B11, B26, B19, B18, B20, and B7).

4. The Z-score of some building blocks does not showa clear trend, indicating a lesser impact on and cor-relation with the RI values (e.g., for B1 and B6.)

In the bottom right corner of Fig. 7, we plot the av-erage RI values of the R1 and R2 candidates in each ofthe 10 subsets. We observe that the R2 structures havesomewhat higher RI values in comparison to R1 struc-tures. The principal difference in R1 and R2 candidatesis the number of aromatic building blocks, i.e., two vsthree, and the higher content of these moieties correlateswith higher RI values.In addition to analysing the influence of individual

building blocks on the RI values, we also study the poten-tial impact of building block pairs. For this we calculatethe joint Z-scores of all possible building block combi-nations in the top 10% candidates. The results for R1

and R2 are shown in Fig. 8. This analysis reveals somedependence on particular building block combinations.For instance, B23, B24, and B25 perform significantlyworse when paired with B4 and B5, but exhibit large Z-scores in combination with B2 and B3. However, overallwe find the impact of individual building blocks to be thedominant factor. For instance, B28 has the largest pos-itive Z-score, regardless of its counterpart.Overall, we find anthracene (B28) to be the most

promising moiety, a surprising finding and somewhat con-tradictory to our initial hypothesis considering that itdoes not contain sulfur. However, it is in line with (andindependently confirms) other efforts in the communitythat aim to integrate anthracene into high-RI opticalpolymers (see e.g., Ref. [59]). The other outstanding

Page 7: Accelerated Discovery of High-Refractive-Index Polyimides

6

FIG. 4. RI distribution histograms of (a) the 38,619 individual R1 residues; (b) the 171,172 individual R2 residues; and (c) the10,000 PI structures resulting from the top R1 and R2 residues. We observe a distinct shift towards higher RI values in (c).

FIG. 5. RI value distribution around the respective average RI values (blue points) of the (a) R1 residues; (b) R2 residues; and(c) PI candidates containing each building block. The blue bands refer to one standard deviation. The red points in (c) showthe building block counts.

FIG. 6. Z-score of each building block in the top 10% R1 andR2 candidates compared to the entire R1 and R2 candidatepool, respectively. Green color represents a positive Z-score,and negative ones are shown in red.

moieties are the S-heterocycles dibenzothiophene (B25)and thianthrene (B24). Naphthalene (B23) shows somepromise as well. The O- and S-linkers (B2 and B3, re-spectively) and to lesser degree SO2 (B6) outperform thecarbon-based linkers.

IV. CONCLUSIONS

In this study, we demonstrated that the data-drivenin silico approach that is being advanced by us andothers can rapidly and efficiently assess the propertiesand performance potential of high-RI candidate com-pounds, identify numerous leads for next-generation PIs,and elucidate structure-property relationships that formthe foundation for rational design rules. By combiningour RI prediction model with virtual high-throughputscreening techniques, we characterized candidates on alarge scale at a fraction of the time and cost of tradi-tional studies. We identified high-value building blocks(e.g., anthracene, dibenzothiophene, thianthrene) andstructural patterns (e.g., S- and O-linkers) that corre-late with large RI values. Correspondingly, we identifiedregions in chemical space in which we can hope to max-imize the RI values of PIs. These guidelines allow usto target specific molecular motifs and create polymerswith exceptional optical properties. In future experimen-

Page 8: Accelerated Discovery of High-Refractive-Index Polyimides

7

FIG. 7. Z-score of each building block in all segments of the residual libraries. The top plot in each cell corresponds to R1, thebottom plot to R2. Green color indicates positive Z-scores and red negative values. The last cell shows the average RI valuesin each segment.

tal work, we will utilize these guidelines and pursue thepromising candidates that have emerged.

SUPPLEMENTARY MATERIAL

Electronic supplementary material accompanies thispaper and is available through the journal website. Itprovides details of the computational data displayed inthe figures throughout this paper or that were used inthe statistical analysis. We also give detailed definitionsof all statistical metrics used in this work.

COMPETING FINANCIAL INTERESTS

The authors declare to have no competing financialinterests.

ACKNOWLEDGMENTS

This work was supported by start-up funds providedthrough the University at Buffalo (UB), the National Sci-ence Foundation (NSF) CAREER program (grant No.OAC-1751161), and the New York State Center of Excel-lence in Materials Informatics (grant No. CMI-1148092-8-75163). Computing time on the high-performance com-puting clusters ’Rush’, ’Alpha’, ’Beta’, and ’Gamma’was provided by the UB Center for Computational Re-search (CCR). The work presented in this paper is partof MAFA’s PhD thesis [60]. MH gratefully acknowledgessupport by Phase-I and Phase-II Software Fellowships(grant No. ACI-1547580-479590) of the NSF MolecularSciences Software Institute (grant No. ACI-1547580) atVirginia Tech [61, 62].

Page 9: Accelerated Discovery of High-Refractive-Index Polyimides

8

FIG. 8. Z-scores of building block combinations in the top 10% candidates of R1 and R2. The size of the circles correspond tothe magnitude of the Z-score value. Green circles indicate positive Z-scores and red circles negative values.

[1] George Odian, Principles of Polymerization, 4th ed. (Wi-ley, Hoboken, NJ, 2004).

[2] David Dusselberg, Dominique Verreault, PatrickKoelsch, and Claudia Staudt, “Synthesis and character-ization of novel, soluble sulfur-containing copolyimideswith high refractive indices,” Journal of MaterialsScience 46, 4872–4879 (2011).

[3] Jin-gang Liu, Yasuhiro Nakamura, Tomohito Ogura, YujiShibasaki, Shinji Ando, and Mitsuru Ueda, “Opticallytransparent sulfur-containing polyimide-TiO2 nanocom-posite films with high refractive index and negative pat-tern formation from poly(amic acid)-TiO2 nanocompos-ite film,” Chemistry of Materials 20, 273–281 (2007).

[4] Tomoya Higashihara and Mitsuru Ueda, “Recentprogress in high refractive index polymers,” Macro-molecules 48, 1915–1929 (2015).

[5] Anu Stella Mathews, Il Kim, and Chang-Sik Ha, “Syn-thesis, characterization, and properties of fully aliphaticpolyimides and their derivatives for microelectronics andoptoelectronics applications,” Macromolecular Research15, 114–128 (2007).

[6] Chih-Ming Chang, Cheng-Liang Chang, and Chao-Ching Chang, “Synthesis and optical properties of solu-ble polyimide/titania hybrid thin films,” MacromolecularMaterials and Engineering 291, 1521–1528 (2006).

[7] Ting Lei, Jie-Yu Wang, and Jian Pei, “Roles of flexiblechains in organic semiconducting materials,” Chemistryof Materials 26, 594–603 (2014).

[8] Kashmiri Lal Mittal, Polyimides: synthesis, characteri-

zation, and applications, Vol. 1 (Springer Science & Busi-ness Media, 2013).

[9] Mehdi Barikani and Shahram Mehdipour-Ataei, “Syn-thesis, characterization, and thermal properties of novelarylene sulfone ether polyimides and polyamides,” Jour-nal of Polymer Science Part A: Polymer Chemistry 38,1487–1492 (2000).

[10] Irina Butnaru, Maria Bruma, Thomas Kopnick, andJoachim Stumpe, “Influence of chemical structure on therefractive index of imide-type polymers,” Macromolecu-lar Chemistry and Physics 214, 2454–2464 (2013).

[11] Jin-gang Liu and Mitsuru Ueda, “High refractive indexpolymers: fundamental research and practical applica-tions,” Journal of Materials Chemistry 19, 8907–8919(2009).

[12] Maria D. Angione, Rosa Pilolli, Serafina Cotrone, MariaMagliulo, Antonia Mallardi, Gerardo Palazzo, LuigiaSabbatini, Daniel Fine, Ananth Dodabalapur, NicolaCioffi, and Luisa Torsi, “Carbon based materials for elec-tronic bio-sensing,” Materials Today 14, 424–433 (2011).

[13] Anja Voigt, Ute Ostrzinski, Karl Pfeiffer, Joo Yeon Kim,Vahid Fakhfouri, J. Brugger, and Gabi Gruetzner, “Newinks for the direct drop-on-demand fabrication of poly-mer lenses,” Microelectronic Engineering 88, 2174–2179(2011).

[14] S. Ummartyotin, J. Juntaro, M. Sain, and H. Manus-piya, “Development of transparent bacterial cellulosenanocomposite film as substrate for flexible organic lightemitting diode (OLED) display,” Industrial Crops andProducts 35, 92–97 (2012).

[15] Chaoyu Xiang and Ruiqing Ma, “Devices to increaseOLED output coupling efficiency with a high refractiveindex substrate,” (2017), US Patent 9,640,781.

Page 10: Accelerated Discovery of High-Refractive-Index Polyimides

9

[16] Hiroaki Nishiyama, Junji Nishii, Mizue Mizoshiri, andYoshinori Hirata, “Microlens arrays of high-refractive-index glass fabricated by femtosecond laser lithography,”Applied Surface Science 255, 9750–9753 (2009).

[17] Yasuo Kokubun, Norihide Funato, and MasanoriTakizawa, “Athermal waveguides for temperature-independent lightwave devices,” IEEE Photonics Tech-nology Letters 5, 1297–1300 (1993).

[18] Heming Wei and Sridhar Krishnaswamy, “Direct laserwriting polymer micro-resonators for refractive indexsensors,” IEEE Photonics Technology Letters 28, 2819–2822 (2016).

[19] A. Rodrıguez, G. Vitrant, P.A. Chollet, and F. Kajzar,“Optical control of an integrated interferometer usinga photochromic polymer,” Applied Physics Letters 79,461–463 (2001).

[20] S. Singaravalu, D.C. Mayo, H.K. Park, K.E.Schriver, and R.F. Haglund, “Anti-reflective polymer-nanocomposite coatings fabricated by RIR-MAPLE,” inSPIE LASE, Vol. 8607 (International Society for Opticsand Photonics, 2013) p. 860718.

[21] Jung-Bum Kim, Jeong-Hwan Lee, Chang-Ki Moon, Sei-Yong Kim, and Jang-Joo Kim, “Highly enhanced lightextraction from surface plasmonic loss minimized organiclight-emitting diodes,” Advanced Materials 25, 3571–3577 (2013).

[22] Eunhye Kim, Hyunsu Cho, Kyoohyun Kim, Tae-WookKoh, Jin Chung, Jonghee Lee, YongKeun Park, and Se-unghyup Yoo, “A facile route to efficient, low-cost flexibleorganic light-emitting diodes: Utilizing the high refrac-tive index and built-in scattering properties of industrial-grade PEN substrates,” Advanced Materials 27, 1624–1631 (2015).

[23] Claudio A Terraza, Jin-Gang Liu, Yasuhiro Nakamura,Yuji Shibasaki, Shinji Ando, and Mitsuru Ueda, “Syn-thesis and properties of highly refractive polyimidesderived from fluorene-bridged sulfur-containing dianhy-drides and diamines,” Journal of Polymer Science PartA: Polymer Chemistry 46, 1510–1520 (2008).

[24] Kenneth R. Carter, Richard A. DiPietro, Martha I.Sanchez, and Sally A. Swanson, “Nanoporous polyimidesderived from highly fluorinated polyimide/poly (propy-lene oxide) copolymers,” Chemistry of materials 13, 213–221 (2001).

[25] Dong Yu, Ali Gharavi, and Luping Yu, “Novel aromaticpolyimides for nonlinear optics,” Journal of the AmericanChemical Society 117, 11680–11686 (1995).

[26] Namiko Fukuzaki, Tomoya Higashihara, Shinji Ando,and Mitsuru Ueda, “Synthesis and characterization ofhighly refractive polyimides derived from thiophene-containing aromatic diamines and aromatic dianhy-drides,” Macromolecules 43, 1836–1843 (2010).

[27] Chia-Liang Tsai, Hung-Ju Yen, and Guey-Sheng Liou,“Highly transparent polyimide hybrids for optoelectronicapplications,” Reactive and Functional Polymers 108, 2–30 (2016).

[28] Junya Kobayashi, Tohru Matsuura, Yasuhiro Hida,Shigekuni Sasaki, and Tohru Maruno, “Fluorinatedpolyimide waveguides with low polarization-dependentloss and their applications to thermooptic switches,”Journal of lightwave technology 16, 1024 (1998).

[29] Takashi Sawada and Shinji Ando, “Synthesis, character-ization, and optical properties of metal-containing fluori-nated polyimide films,” Chemistry of materials 10, 3368–

3378 (1998).[30] Stefanie A Sydlik, Zhihua Chen, and Timothy M Swager,

“Triptycene polyimides: soluble polymers with high ther-mal stability and low refractive indices,” Macromolecules44, 976–980 (2011).

[31] Nam-Ho You, Yasuo Suzuki, Daisuke Yorifuji, ShinjiAndo, and Mitsuru Ueda, “Synthesis of high re-fractive index polyimides derived from 1, 6-bis (p-aminophenylsulfanyl)-3, 4, 8, 9-tetrahydro-2, 5, 7, 10-tetrathiaanthracene and aromatic dianhydrides,” Macro-molecules 41, 6361–6366 (2008).

[32] Jin-gang Liu, Yasuhiro Nakamura, Yuji Shibasaki, ShinjiAndo, and Mitsuru Ueda, “Synthesis and characteri-zation of high refractive index polyimides derived from4, 4 [variant prime]-(p-phenylenedisulfanyl) dianiline andvarious aromatic tetracarboxylic dianhydrides,” Polymerjournal 39, 543 (2007).

[33] Hyeonuk Yeo, Jiye Lee, Munju Goh, Bon-CheolKu, Honglae Sohn, Mitsuru Ueda, and Nam-HoYou, “Synthesis and characterization of high refrac-tive index polyimides derived from 2, 5-Bis (4-Aminophenylenesulfanyl)-3, 4-Ethylenedithiothiopheneand aromatic dianhydrides,” Journal of Polymer SciencePart A: Polymer Chemistry 53, 944–950 (2015).

[34] National Science and Technology Council, Materials

Genome Initiative for Global Competitiveness, Tech. Rep.(Washington, DC: National Science and TechnologyCouncil, 2011).

[35] Johannes Hachmann, Theresa L. Windus, John A.McLean, Vanessa Allwardt, Alexandra C. Schrimpe-Rutledge, Mohammad Atif Faiz Afzal, and MojtabaHaghighatlari, Framing the role of big data and modern

data science in chemistry, Tech. Rep. (2018) NSF CHEWorkshop Report.

[36] Roel S. Sanchez-Carrera, Sule Atahan, Joshua Schrier,and Alan Aspuru-Guzik, “Theoretical characterizationof the air-stable, high-mobility dinaphtho[2,3-b:2’3’-f]thieno[3,2-b]-thiophene organic semiconductor,” TheJournal of Physical Chemistry C 114, 2334–2340 (2010).

[37] Anatoliy N. Sokolov, Sule Atahan-Evrenk, Rajib Mon-dal, Hylke B. Akkerman, Roel S. Sanchez-Carrera, Ser-gio Granados-Focil, Joshua Schrier, Stefan C. B. Manns-feld, Arjan P. Zoombelt, Zhenan Bao, and Alan Aspuru-Guzik, “From computational discovery to experimentalcharacterization of a high hole mobility organic crystal,”Nature Communications 2, 437–438 (2011).

[38] Johannes Hachmann, Roberto Olivares-Amaya, SuleAtahan-Evrenk, Carlos Amador-Bedolla, Roel S.Sanchez-Carrera, Aryeh Gold-Parker, Leslie Vogt,Anna M. Brockway, and Alan Aspuru-Guzik, “TheHarvard Clean Energy Project: Large-scale computa-tional screening and design of organic photovoltaics onthe world community grid,” The Journal of PhysicalChemistry Letters 2, 2241–2251 (2011).

[39] Roberto Olivares-Amaya, Carlos Amador-Bedolla, Jo-hannes Hachmann, Sule Atahan-Evrenk, Roel S.Sanchez-Carrera, Leslie Vogt, and Alan Aspuru-Guzik, “Accelerated computational discovery of high-performance materials for organic photovoltaics bymeans of cheminformatics,” Energy & EnvironmentalScience 4, 4849–4861 (2011).

[40] Carlos Amador-Bedolla, Roberto Olivares-Amaya, Jo-hannes Hachmann, and Alan Aspuru-Guzik, “OrganicPhotovoltaics,” in Informatics for Materials Science and

Page 11: Accelerated Discovery of High-Refractive-Index Polyimides

10

Engineering: Data-driven Discovery for Accelerated Ex-

perimentation and Application, edited by Krishna Rajan(Amsterdam: Butterworth-Heinemann, 2013) Chap. 17,pp. 423–442.

[41] Johannes Hachmann, Roberto Olivares-Amaya, AdrianJinich, Anthony L. Appleton, Martin A. Blood-Forsythe,Laszlo R. Seress, Carolina Roman-Salgado, Kai Trepte,Sule Atahan-Evrenk, Suleyman Er, Supriya Shrestha,Rajib Mondal, Anatoliy Sokolov, Zhenan Bao, and AlanAspuru-Guzik, “Lead candidates for high-performanceorganic photovoltaics from high-throughput quantumchemistry - the Harvard Clean Energy Project,” Energy& Environmental Science 7, 698–704 (2014).

[42] Edward O. Pyzer-Knapp, Changwon Suh, Rafael Gomez-Bombarelli, Jorge Aguilera-Iparraguirre, and AlanAspuru-Guzik, “What is high-throughput virtual screen-ing? A perspective from organic materials discov-ery,” Annual Reviews of Materials Research 45, 195–216(2015).

[43] Steven A. Lopez, Edward O. Pyzer-Knapp, Gregor N.Simm, Trevor Lutzow, Kewei Li, Laszlo R Seress, Jo-hannes Hachmann, and Alan Aspuru-Guzik, “The Har-vard organic photovoltaic dataset,” Scientific Data 3,160086 (2016).

[44] Johannes Hachmann, Mohammad Atif Faiz Afzal, Mo-jtaba Haghighatlari, and Yudhajit Pal, “Building anddeploying a cyberinfrastructure for the data-driven de-sign of chemical systems and the exploration of chemicalspace,” Molecular Simulation 44, 921–929 (2018).

[45] Mojtaba Haghighatlari and Johannes Hachmann, “Ad-vances of machine learning in molecular modeling andsimulation,” arXiv [physics.data-an] , 1902.00140 (2019).

[46] Mohammad Atif Faiz Afzal, C. Cheng, and JohannesHachmann, “Combining first-principles and data mod-eling for the accurate prediction of the refractive indexof organic polymers,” The Journal of Chemical Physics148, 241712 (2018).

[47] Mohammad Atif Faiz Afzal and Johannes Hachmann,“Benchmarking DFT approaches for the calculation ofpolarizability inputs for refractive index predictions inorganic polymers,” Physical Chemistry Chemical Physics-, – (2019).

[48] Carlo Adamo and Vincenzo Barone, “Toward reliabledensity functional methods without adjustable param-eters: The PBE0 model,” The Journal of ChemicalPhysics 110, 6158–6170 (1999).

[49] Florian Weigend, Reinhart Ahlrichs, K. A. Peterson,T. H. Dunning, R. M. Pitzer, and A. Bergner, “Bal-anced basis sets of split valence, triple zeta valence andquadruple zeta valence quality for H to Rn: Design andassessment of accuracy,” Physical Chemistry ChemicalPhysics 7, 3297 (2005).

[50] Stefan Grimme, Jens Antony, Stephan Ehrlich, andHelge Krieg, “A consistent and accurate ab initioparametrization of density functional dispersion correc-tion (DFT-D) for the 94 elements H-Pu,” The Journal ofChemical Physics 132, 154104 (2010).

[51] A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. God-dard, and W. M. Skiff, “UFF, a full periodic table forcefield for molecular mechanics and molecular dynamicssimulations,” Journal of the American Chemical Society114, 10024–10035 (1992).

[52] Noel M. O’Boyle, Michael Banck, Craig A. James, ChrisMorley, Tim Vandermeersch, and Geoffrey R. Hutchi-son, “Open babel: An open chemical toolbox,” Journalof Cheminformatics 3, 33–33 (2011).

[53] Frank Neese, “The orca program system,” Wiley Inter-disciplinary Reviews: Computational Molecular Science2, 73–78 (2012).

[54] GL Slonimskii, AA Askadskii, and AI Kitaigorodskii,“The packing of polymer molecules,” Polymer ScienceUSSR 12, 556–577 (1970).

[55] Valery P. Privalko and Alexey V. Pedosenko, “Molecularpacking density in the crystalline state of semi-rigid chainpolymers. i: Polyimides,” Polymer Engineering & Science37, 978–982 (1997).

[56] Johannes Hachmann, William S. Evange-lista, and Mohammad Atif Faiz Afzal,“ChemHTPS 0.7 – An Automated Virtual High-Throughput Screening Program Suite for Chem-ical and Materials Data Generation,” (2017),https://bitbucket.org/hachmanngroup/chemhtps.

[57] Johannes Hachmann and Mohammad Atif Faiz Afzal,“ChemLG 0.5 – A Library Generator Code for the Enu-meration of Chemical and Materials Space,” (2017),https://hachmannlab.github.io/chemlg.

[58] Johannes Hachmann and Mojtaba Haghighatlari,“ChemML 0.10 – A Machine Learning and InformaticsProgram Suite for Chemical and Materials Data Mining,”(2017), https://hachmannlab.github.io/chemml.

[59] Hiroto Kudo, Masaru Yamamoto, Tadatomi Nishikubo,and Osamu Moriya, “Novel materials for large changein refractive index: synthesis and photochemical reactionof the ladderlike poly(silsesquioxane) containing norbor-nadiene, azobenzene, and anthracene groups in the sidechains,” Macromolecules 39, 1759–1765 (2006).

[60] Mohammad Atif Faiz Afzal, From Virtual High-

Throughput Screening and Machine Learning to the Dis-

covery and Rational Design of Polymers for Optical Ap-

plications, Ph.D. thesis, University at Buffalo (2018).[61] Anna Krylov, Theresa L. Windus, Taylor Barnes, Eliseo

Marin-Rimoldi, Jessica A. Nash, Benjamin Pritchard,Daniel G.A. Smith, Doaa Altarawy, Paul Saxe, Ce-cilia Clementi, T. Daniel Crawford, Robert J. Harrison,Shantenu Jha, Vijay S. Pande, and Teresa Head-Gordon,“Perspective: Computational chemistry software and itsadvancement as illustrated through three grand challengecases for molecular science,” J. Chem. Phys. 149, 180901(2018).

[62] Nancy Wilkins-Diehr and T. Daniel Crawford, “NSF’sinaugural software institutes: The science gateways com-munity institute and the molecular sciences software in-stitute,” Comput. Sci. Eng. 20, 26–38 (2018).