Data-driven material discovery for photocatalysis: a short review

    Corresponding author: Qimin Yan,
  • Department of Physics, Temple University, Philadelphia, PA, 19122, USA

Abstract: In this short review, we introduce recent progress in data-driven material discovery and design for solar fuel generation. The construction of material databases under the Materials Genome Initiative provides a powerful platform for material discovery and design through computational screening pipelines based on materials’ descriptors. In the field of solar water splitting, the data-driven computational discovery approach has been effective in making material predictions. When combined with synergistic and complementary experimental efforts, high-throughput computations based on density functional theory have shown great predictive power for the accelerated discovery of inorganic compounds as functional materials for solar fuel generation. As an example, we introduce the theory–experiment joint discovery of a large set of metal oxide photoanode materials that were theoretically predicted to be efficient candidates and soon verified by synergistic experimental fabrication and characterization. In the field of two-dimensional (2D) materials, the data-driven approach has enabled the prediction of many promising candidates with suitable direct band gaps and optimal band edges for the generation of chemical fuels from sunlight, greatly expanding the number of theoretically predicted 2D photoelectrocatalysts that await experimental verification. We discuss the challenges for the continued discovery and design of novel bulk and 2D compounds for photocatalysis via a data-driven approach. At the end of this review, we provide a brief outlook for future material discoveries in the field of solar fuel generation.


1.   Introduction
  • The use of predictive simulation in combination with experiments for the accelerated discovery and rational design of functional materials is a grand challenge of modern materials science[1]. Under the Materials Genome Initiative[1] proposed by the United States government in 2011 and similar projects in Europe[2], several materials databases[3–5] based on high-throughput computations utilizing density functional theory (DFT) have been created and have recently enabled rapid screening of inorganic compounds with multiple desirable properties and functionalities[6–11]. As noted in a recent review article, since the advent of computational material databases and related data-driven discovery approaches five years ago, these databases and the associated analysis tools have been used to identify more than 20 new functional materials for a number of applications that were later confirmed by experiments[10].

    In the field of photocatalysis for the generation of chemical fuels from sunlight, the limited number of known photoelectrocatalytic materials poses a significant challenge. Photoanode and photocathode materials, which are key components in tandem-structured solar fuel devices and drive the oxygen evolution reaction (OER) and hydrogen evolution reaction (HER), respectively, are required to satisfy three criteria: efficiency, reactivity, and stability. Compounds with small electronic band gaps that match the solar spectrum are desirable for efficient optical absorption of sunlight. These compounds must also offer band edge energies that align favorably with the thermodynamic reaction potentials, as well as overpotentials small enough to drive the OER/HER efficiently. Stability against electrochemical and photoelectrochemical corrosion is another crucial material issue that needs to be addressed. So far, stable and efficient photoanode materials as photoelectrocatalysts for OER remain critically missing[12].

    In this short review article, we introduce recent research progress on the search for novel inorganic compounds for photocatalysis. The article covers the discovery of two classes of inorganic materials: metal oxides and two-dimensional layered materials. The review focuses on how the development of photocatalytic materials and devices has benefited from the rapid development of data-driven material discovery and design approaches, especially the combination of high-throughput computational predictions with combinatorial experimental synthesis and measurement efforts. Challenges for the future development of the data-driven approach for photocatalysis, and their possible solutions, are also discussed.

2.   Data-driven discovery of inorganic crystalline photocatalysts

    2.1.   Discoveries before the advent of data-driven approach

  • Before the advent of data-driven material discovery and design, experimental efforts in the field of photocatalysis were largely based on trial and error. Since the discovery of TiO2 as a stable but large-band-gap photocatalyst in 1972 by Fujishima and Honda[13], a significant amount of research effort has been devoted to identifying novel candidates in the field. By 2007, more than 130 nonmetallic compounds had been synthesized and tested for solar water splitting[14]. This number has increased by at least 30 in the last decade[15, 16]. In general, these inorganic compounds can be classified into two categories: traditional non-oxide semiconductors and metal oxides. Although a large group of traditional semiconductors host desirable electronic band gaps and band edge energies that are optimal for OER and/or HER, they are unfortunately not stable against photocorrosion, especially in the extremely oxidizing environment for OER[17, 18]. From this point of view, metal oxides are the only class of inorganic compounds that can offer electrochemical and photoelectrochemical stability in the harsh solution environment for photocatalysis.

    In the past two decades, a large number of d0 and d10 transition metal oxides (TMOs) have been experimentally identified as photoactive[19, 20]. Because the valence band maximum (VBM) is formed mainly by deep O 2p states, the band gaps of these d0 and d10 TMOs are usually larger than 3 eV, which prevents efficient absorption of sunlight, especially in the visible range. Since the discovery of the TiO2 photocatalyst[13] and before the advent of the data-driven material discovery approach, forty years of trial-and-error experimental research yielded only 16 metal-oxide photoanode compounds with band gaps in the desirable 1.2–2.8 eV range[15]. For instance, monoclinic BiVO4[21] has received substantial attention as a photoanode material due to its excellent OER photoactivity. It has a desirable 2.4 eV band gap derived from a conduction band minimum (CBM) consisting of V 3d states and a VBM of mixed O 2p and Bi 6s character[22]. However, the performance of these experimentally discovered metal oxides is still far from satisfactory. The vast majority of reported low-band-gap photoanodes, including Fe2O3, WO3, and BiVO4, exhibit known intrinsic limitations related to rapid recombination rates and/or poor photoelectrochemical stability. A breakthrough in this field relies on the discovery of stable, photoactive, low-band-gap oxide compounds whose absorption strongly overlaps with the solar spectrum.
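
    The band gap values quoted above can be related to the solar spectrum through the photon energy–wavelength relation λ = hc/E. The following minimal Python sketch (the function name is illustrative, not taken from any cited work) converts a band gap into the longest wavelength the material can absorb. It shows why a gap above 3 eV places the absorption edge near the violet end of the visible spectrum, whereas the 1.2–2.8 eV window extends well into the visible and near-infrared.

```python
# Photon energy <-> wavelength conversion: lambda [nm] = h*c / E [eV].
# h*c expressed in eV*nm (rounded CODATA value).
HC_EV_NM = 1239.84193


def absorption_edge_nm(band_gap_ev: float) -> float:
    """Longest wavelength (nm) that a semiconductor with this gap can absorb."""
    if band_gap_ev <= 0:
        raise ValueError("band gap must be positive")
    return HC_EV_NM / band_gap_ev


if __name__ == "__main__":
    # Band gaps quoted in the text: the ~3 eV value typical of d0/d10 oxides,
    # BiVO4 (2.4 eV), and the limits of the 1.2-2.8 eV target window.
    for label, gap in [("3.0 eV oxide", 3.0), ("BiVO4", 2.4),
                       ("window, upper", 2.8), ("window, lower", 1.2)]:
        print(f"{label:>14}: edge at {absorption_edge_nm(gap):7.1f} nm")
```

    The conversion only locates the absorption onset; it says nothing about absorption strength or whether the gap is direct or indirect.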

    2.2.   Computational material prediction using a data-driven approach

  • Over the past five years, pioneering high-throughput computational screening studies in this field have predicted a large number of materials for photocatalysis in two specific chemical subspaces: metal oxide perovskites[23, 24] and oxynitrides[8]. For instance, the work by Castelli et al. demonstrated an efficient screening of oxide materials with perovskite structures based on electronic structure calculations using the so-called GLLB-SC functional[9, 25]. This work successfully reduced a vast space of 5400 different materials to only 15 promising candidates for solar water splitting[25]. As key quantities for electronic structure screening, the CBM and VBM positions were determined by empirically estimating the middle of the gap from the electronegativities of the constituent atoms, as proposed by Butler and Ginley[26]. The screening “re-identified” already known materials in this chemical space (such as AgNbO3, BaSnO3, BaTaO2N, SrTaO2N, CaTaO2N, and LaTiO2N) and predicted several novel compounds, including 9 oxides and oxynitrides, which warrant further experimental investigation.
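
    To illustrate the kind of empirical band-edge estimate mentioned above, the sketch below implements one commonly used form of the electronegativity approach: the gap centre is placed at the geometric mean of the atomic (Mulliken) electronegativities minus the free-electron energy on the hydrogen scale, and the band edges sit half a gap on either side. This is a hedged illustration rather than the exact procedure of the cited screening; the 4.5 eV reference energy, the function names, and the TiO2 input values are assumptions chosen for the example.

```python
from math import prod

# Assumption: energy of free electrons on the hydrogen scale, in eV
# (a widely used convention for this type of estimate).
E_FREE_ELECTRON = 4.5


def compound_electronegativity(chi_by_element, stoichiometry):
    """Geometric mean of atomic electronegativities, weighted by atom count."""
    n_atoms = sum(stoichiometry.values())
    product = prod(chi_by_element[el] ** n for el, n in stoichiometry.items())
    return product ** (1.0 / n_atoms)


def band_edges_vs_nhe(chi_compound, band_gap_ev):
    """Return (E_CB, E_VB) in V vs NHE, with the gap centre at chi - E_e."""
    midgap = chi_compound - E_FREE_ELECTRON
    return midgap - 0.5 * band_gap_ev, midgap + 0.5 * band_gap_ev


if __name__ == "__main__":
    # Illustrative inputs (assumed, in eV): Mulliken electronegativities of
    # Ti and O, and a 3.2 eV band gap for TiO2.
    chi = compound_electronegativity({"Ti": 3.45, "O": 7.54}, {"Ti": 1, "O": 2})
    e_cb, e_vb = band_edges_vs_nhe(chi, 3.2)
    print(f"chi = {chi:.2f} eV, E_CB = {e_cb:+.2f} V, E_VB = {e_vb:+.2f} V vs NHE")
```

    Because the estimate depends only on composition and the band gap, it is cheap enough for screening thousands of compounds, but it ignores crystal structure, surface termination, and the solid–water interface.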

    Wu et al. proposed another screening approach that computes the CBM and VBM directly from first principles in an aqueous environment, with a focus on oxynitrides with diverse crystal structures[8]. Band gap calculations were based on the so-called Δ-sol method, which was previously shown to perform well for a set of traditional semiconductors[27]. In total, 2948 compounds were screened, and most of the known photocatalytic materials were reproduced by this work. Sixteen new materials were suggested by the screening approach as promising photocatalysts, including two ternary and eleven quaternary oxynitrides. These computational and data-driven efforts have greatly increased the number of potential photocatalytic compounds for experimental investigation. Unfortunately, partly due to the lack of synergy between theory and experiment, to the best of our knowledge these theoretical predictions have not yet led to the experimental discovery of novel photocatalysts. It soon became clear to the community that complementarity between theory and experiment is of great importance for the accelerated discovery of novel compounds for solar fuel generation.

    2.3.   Theory–experiment joint materials discovery and design

  • Computational and experimental approaches differ in the material properties they can most efficiently and effectively characterize in the field of solar fuel generation. For instance, it is straightforward to down-select a large set of compounds by data-mining and to obtain band gaps and band edge energies from first-principles computations, while this is an extremely difficult task for experimentalists. On the other hand, it is relatively easy to experimentally construct photoelectrochemical cells and measure photocurrents and stabilities against (photo)electrochemical corrosion, both of which involve thermodynamic and kinetic processes that are rather challenging for high-throughput computational simulations. Over the last several years, a series of collaborative efforts has begun to address this theory–experiment synergy issue by designing high-throughput screening pipelines that integrate computational and experimental materials screening workflows. The central idea of the theory–experiment joint discovery and design approach is to construct a tiered screening pipeline that will (i) selectively mine a materials database with a design principle or hypothesis to obtain a subset of promising materials; (ii) screen this subset for selected properties utilizing high-throughput computation at appropriate levels of theory; and (iii) employ combinatorial experiments on the same subset to both validate the theory and characterize material performance under device-relevant conditions.
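
    The tiered structure described above can be sketched as a chain of filters applied to candidate records. The Python example below is a minimal illustration under stated assumptions: the record fields, the formulas in the sample pool, and the stability threshold are hypothetical placeholders (only the 1.2–2.8 eV gap window is taken from the text), and the experimental tier (iii) is represented simply by the shortlist handed over at the end.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one compound; field names are illustrative."""
    formula: str
    has_motif: bool          # tier (i): design hypothesis, e.g. a structural motif
    e_above_hull_ev: float   # tier (ii): coarse stability estimate, eV/atom
    band_gap_ev: float       # tier (ii): computed band gap, eV


Tier = Tuple[str, Callable[[Candidate], bool]]


def run_pipeline(candidates: Iterable[Candidate], tiers: List[Tier]):
    """Apply tiers in order; return survivors and the count left after each tier."""
    survivors = list(candidates)
    history = [("input", len(survivors))]
    for name, keep in tiers:
        survivors = [c for c in survivors if keep(c)]
        history.append((name, len(survivors)))
    return survivors, history


# The 0.05 eV/atom stability cutoff is a placeholder, not a value from any
# cited study; the band gap window is the 1.2-2.8 eV range quoted in the text.
TIERS: List[Tier] = [
    ("motif hypothesis", lambda c: c.has_motif),
    ("stability", lambda c: c.e_above_hull_ev <= 0.05),
    ("band gap window", lambda c: 1.2 <= c.band_gap_ev <= 2.8),
]

if __name__ == "__main__":
    pool = [  # made-up entries for illustration only
        Candidate("A2BO4", True, 0.00, 2.1),
        Candidate("AB2O6", True, 0.12, 1.9),
        Candidate("A3BO5", False, 0.01, 2.0),
        Candidate("ABO3", True, 0.02, 3.4),
    ]
    shortlist, history = run_pipeline(pool, TIERS)
    for name, count in history:
        print(f"{name:>18}: {count} candidates")
    print("passed to experiment:", [c.formula for c in shortlist])
```

    Ordering the tiers from cheapest to most expensive keeps the costly calculations and syntheses confined to the smallest subset, which is the practical point of a tiered design.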

    Recently, utilizing a computational high-throughput screening workflow, the ternary vanadate β-Mn2V2O7 was identified as exhibiting a 1.8 eV band gap and unprecedented valence band alignment to the OER potential as a result of hybridization of Mn 3d with O 2p states[28], although it is not photoactive for the OER. The combined properties of valence band alignment for OER, a sub-2 eV band gap, and stability under illumination at pH 13 make this light absorber truly unique. More interestingly, β-Mn2V2O7 shares a common VO4 structural motif with BiVO4: both compounds possess 3d0 V cations tetrahedrally coordinated by oxygen. Orbital hybridization resulting from this VO4 motif engenders significant baseline O 2p and V 3d character at the VBM and CBM, creating an electronic structure “scaffold” that enables the formation of a desirable Eg and EVBM upon introduction of an additional metal cation[29]. Considering the twelve previously known OER photoanodes with band gap energies between 1.2 and 2.8 eV, a significant fraction of them (α-Ag3VO4[30], FeVO4[31, 32], and β-Cu3V2O8[33]) are also ternary vanadates with a VO4 structural motif in the 3d0 electronic configuration. Through a combination of combinatorial materials synthesis, high-throughput photoelectrochemistry, high-throughput optical spectroscopy, and detailed electronic structure calculations, four photoelectrocatalyst phases containing the VO4 motif were recently identified (α-Cu2V2O7, β-Cu2V2O7, γ-Cu3V2O8, and Cu11V6O26) with band gap energies at or below 2 eV[34]. For these reasons, VO4-based ternary compounds are fertile ground for both discovering metal oxide photoanodes and seeding the photoanode materials genome.

    Based on the hypothesis that this VO4-scaffold phenomenon applies broadly to ternary vanadates, Yan et al. designed a multiple-tier screening pipeline to search for photoanodes, starting with a query of the Materials Project (MP) database[3] to identify the 174 known VO4-based ternary vanadates[15]. The tier-2 screen used the DFT formation energy above the convex hull in the composition phase diagram and a coarse estimate of the band gap in an effort to avoid non-synthesizable materials[8] and known wide-gap insulators, respectively. These quantities are stored in the MP database and are computed using DFT with the generalized gradient approximation of Perdew, Burke, and Ernzerhof (PBE) and Hubbard U[35] corrections for metal cation d states with the VASP code[36]. As a compromise that balances efficiency with accuracy, the band gaps and band edge energies are evaluated in the tier-3 screen using generalized Kohn-Sham states obtained from the hybrid functional (HSE)[37] with a modified mixing parameter, together with surface slab PBE+U calculations[38, 39], to determine whether the VBM meets the OER thermodynamic requirement. To validate these computational screening criteria and their propensity for identifying photoanode materials, combinatorial sputtering and annealing were carried out, in which thin film synthesis was attempted for each target phase. For samples that exhibited 80% phase purity for the target phases, high-throughput optical spectroscopy (UV–vis) and photoelectrochemistry were performed to characterize the band gaps (direct and indirect transitions) and the photocurrent densities at the Nernstian potential (JO2/H2O), respectively. As shown in Fig. 1, 12 novel metal vanadates were discovered by this theory–experiment joint discovery effort, which established ternary metal vanadates as a prolific class of photoanode materials[15]. Detailed analysis of these vanadate compounds reveals the key role of VO4 structural motifs and electronic band-edge character in efficient photoanodes, initiating a genome for such materials.
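
    The "energy above the convex hull" used in the tier-2 screen can be illustrated with a deliberately simplified binary A–B system. The sketch below builds the lower convex hull of formation energy versus composition and measures how far a target phase lies above it. This is an assumption-laden toy example: real screening of ternary vanadates requires a multi-component phase diagram, the function names are illustrative, and the energies in the usage example are invented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (atomic fraction of B, formation energy in eV/atom)


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: Sequence[Point]) -> List[Point]:
    """Lower convex hull (monotone chain) with elemental end points at 0 eV/atom."""
    best = {0.0: 0.0, 1.0: 0.0}
    for x, e in points:              # keep only the lowest energy per composition
        best[x] = min(e, best.get(x, e))
    hull: List[Point] = []
    for p in sorted(best.items()):
        # Drop the last vertex while it lies on or above the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull: Sequence[Point], x: float) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return e0 + (e1 - e0) * (x - x0) / (x1 - x0)
    raise ValueError("composition must lie in [0, 1]")


def energy_above_hull(target: Point, competing: Sequence[Point]) -> float:
    """Energy of the target phase relative to the hull (0 if it is on the hull)."""
    hull = lower_hull(list(competing) + [target])
    return target[1] - hull_energy(hull, target[0])


if __name__ == "__main__":
    known = [(1 / 3, -0.90), (2 / 3, -0.60)]   # invented competing phases
    print(f"{energy_above_hull((0.5, -0.70), known):.3f} eV/atom above hull")
```

    A phase with zero energy above the hull is thermodynamically stable against decomposition into the competing phases; a small positive value indicates metastability, which a screening tier may still tolerate up to a chosen cutoff.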

    Semiconductor materials used as photocatalysts are often not thermodynamically stable under operating conditions[40]. For solar fuel photoanodes that drive the OER, recent efforts have focused on the development of protective layers that prevent the electrolyte from contacting the functional semiconductor[41]. This approach has proved effective for self-passivating materials but is impractical otherwise, since a single point defect in the protective coating inevitably results in complete corrosion of the semiconductor, motivating the search for new stable photoanode materials[42, 43]. Before the advent of the data-driven discovery approach, Fe2O3[44], ZnFe2O4[45, 46], and Bi2MoO6[47] were the only visible-band-gap metal oxide photoanodes known to exhibit stability in the highly oxidizing OER environment. Utilizing data-driven, computationally guided experiment design, the stability issue was recently addressed through a robust mechanism for identifying photoanode materials that mitigate thermodynamic instability under operating conditions. The integration of theory and experiment was enabled by an extension of the Materials Project Pourbaix diagram analysis tool[48] that theoretically identifies a material-specific electrolyte pH range in which the material is most thermodynamically stable under operating conditions. A Pourbaix diagram can be viewed as a generalized phase diagram in which the thermodynamic stability of solid compounds is determined by comparison with that of ionic species[17, 49]. Through this theory–experiment joint work, five Mn-based oxides were identified as electrochemically stable and photoactive[16].
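
    Each boundary in a Pourbaix diagram follows from the Nernst equation for a reaction that exchanges protons and electrons. As a hedged illustration (not the Materials Project tool itself), the sketch below evaluates the equilibrium potential of a generic reaction Ox + m H+ + n e- → Red as a function of pH and uses it to draw the two water stability lines that bound the operating window of a photoelectrode. Room temperature, unit gas fugacities, and the function names are assumptions of the example.

```python
from math import log

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol
T = 298.15        # K; room temperature is assumed
NERNST_SLOPE = R * T * log(10) / F   # about 0.0592 V per decade


def equilibrium_potential(e0_v, n_protons, n_electrons, ph,
                          log10_activity_ratio=0.0):
    """Nernst potential (V vs SHE) for  Ox + m H+ + n e-  ->  Red.

    log10_activity_ratio is log10(a_Red / a_Ox) for the non-proton species.
    """
    return (e0_v
            - NERNST_SLOPE * (n_protons / n_electrons) * ph
            - NERNST_SLOPE / n_electrons * log10_activity_ratio)


def water_stability_window(ph):
    """(HER line, OER line) in V vs SHE at the given pH."""
    her = equilibrium_potential(0.0, 2, 2, ph)     # 2 H+ + 2 e- -> H2
    oer = equilibrium_potential(1.229, 4, 4, ph)   # O2 + 4 H+ + 4 e- -> 2 H2O
    return her, oer


if __name__ == "__main__":
    for ph in (0, 7, 13):
        her, oer = water_stability_window(ph)
        print(f"pH {ph:>2}: HER at {her:+.3f} V, OER at {oer:+.3f} V vs SHE")
```

    Both water lines shift by the same amount with pH, so their separation stays at 1.229 V; what changes with pH is which solid or dissolved species of a given material is stable at the OER potential, which is the information a materials-specific pH range encodes.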

    To date, the integration of first-principles computations with high-throughput experiments has yielded the most prolific material discovery effort in the field of solar fuel generation, as demonstrated by the identification of 17 water oxidation photoelectrocatalysts in the target band gap range, including 4 recently reported copper vanadates[34], 8 additional metal vanadates[15], and 5 Mn-based oxides[16]. (See Table 1, which lists all the recently discovered low-band-gap oxide photoanodes to the best of our knowledge.) By considerably expanding the number of known photoelectrocatalysts for the generation of chemical fuels from sunlight, this set of data-driven discovery work has established high-throughput material discovery as a productive approach for solar fuel generation, paving the way for a broadly applicable materials-by-design feedback loop.

    2.4.   Challenges for continued discovery of inorganic compounds for solar fuel generation

  • In spite of the success of the HSE hybrid functional and the quasiparticle many-body GW approach[50], the continued development of accurate and efficient band structure methods for a diverse class of compounds, especially transition metal oxides, is still urgently needed to increase the reliability of high-throughput computations as one of the top screening tiers in a data-driven discovery and design process. So far, the determination of the CBM/VBM relative to the OER/HER redox potentials in high-throughput computations has ignored the explicit dipole interaction at the water–solid interface. Surface dipoles that arise at the interface between metal oxides and water are expected to raise the VBM energy[39]. First-principles frameworks have been developed in recent years to address this issue based on quasiparticle corrections and explicit modeling of interface heterostructures, including a series of works by the Hybertsen group[50] and the Galli group[51]. However, these analyses rely on detailed solid–water interface models that require substantial computational resources for a single compound, and general guidance for high-throughput screening and discovery is still not available.

    Stability has been addressed by constructing Pourbaix diagrams from the standpoint of thermodynamic energetics. In reality, the photocorrosion process is largely governed by kinetic processes that cannot yet be studied theoretically in a high-throughput manner. Although this issue can in principle be addressed by a combined theory–experiment discovery approach, the lack of useful theoretical tools or guidance greatly limits the power of the data-driven approach to discover photoelectrochemically stable compounds for solar fuel generation. New and highly efficient computational frameworks for kinetic processes need to be developed to address this crucial issue from a theoretical point of view.

    Experimentally, the photocurrent density observed for these newly discovered compounds in photoelectrochemical cells is at best of the order of several mA/cm2 and often much lower. Material optimization, especially defect control to reduce carrier recombination centers, is urgently needed to enlarge the impact of material discoveries in this field.

3.   Discovery of two-dimensional materials for photocatalysis

    3.1.   Computational discovery of 2D photocatalysts

  • Since the advent of two-dimensional (2D) materials, their application to the renewable synthesis of solar fuels has attracted tremendous research effort[52–55], owing to the sought-after advantages of these single-layer compounds, including efficient charge transfer, abundant reaction sites, and highly adjustable electronic structures. Compared to conventional three-dimensional (3D) bulk photocatalysts, 2D photocatalysts possess a series of unique properties that may enhance photoelectrochemical device efficiency for solar fuel generation[56]. These features include an extremely high specific surface area for water redox reactions, an ultra-small thickness that facilitates the diffusion of electrons and holes to the solid/water interface, and multiple choices of host–guest combinations based on van der Waals heterojunctions.

    Prior experimental research has yielded more than 20 2D compounds for solar fuel generation with band gap energies in the desirable range that strongly overlaps with the solar spectrum. Table 2 lists the photocatalytic performance and material parameters of recently discovered 2D layered materials, including metal oxides, metal chalcogenides, and metal-free nanosheets. These 2D photocatalysts show distinct performance compared to their 3D counterparts. For instance, Xie et al. found that SnS2 single layers yield a photocurrent density 70 times higher than that of bulk SnS2, possibly for multiple reasons including improved carrier density, a fully depleted space charge layer, and fast interfacial charge transfer[57]. Monolayer 1T-MoS2 exhibited an H2 yield of 26 000 μmol/(h·g) under irradiation by a 100 W halogen light, while bulk MoS2 is almost inert in catalyzing water splitting due to the lack of active sites[58]. Given their promise for addressing the long-standing solar energy conversion problem, the continued discovery of novel 2D photocatalysts is of great interest.

    In the field of 2D photocatalysis, computational simulation has been a powerful tool to predict promising candidates and eliminate unlikely materials. For instance, the Hennig group proposed a strategy for screening 2D materials for photocatalytic water splitting and carried out a series of studies in the field. The screening criteria applied include, but are not limited to, a suitable band gap, suitable band edges, low formation energy, and stability in water[59]. As shown in Fig. 2, a number of 2D materials were identified and predicted theoretically to be suitable for photocatalytic water splitting, including a family of group IV monochalcogenides, MX (M = Ge, Sn, Pb; X = O, S, Se, Te), as well as MoS2, WS2, PtS2, and PtSe2[59]. Liu et al. predicted that single-layer metal phosphorus trichalcogenides, MPX3 (M = MII, MI0.5MIII0.5; X = S, Se; MI, MII, and MIII represent Group-I, Group-II, and Group-III metals, respectively), exhibit low formation energies, suitable band gaps and band edges, and outstanding optical absorption efficiency for photocatalytic water splitting[60]. In recent years, several other 2D materials have been predicted for solar fuel generation[61–67].
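
    The band-edge criterion used in such screens is often expressed on the absolute vacuum scale: the CBM must lie above the H+/H2 level and the VBM below the O2/H2O level, both of which shift with pH. The sketch below is a minimal illustration under stated assumptions: the reference levels (-4.44 eV and -5.67 eV at pH 0, shifting by 0.059 eV per pH unit) are commonly used values, the band edges of the material are treated as pH-independent, and the edge energies in the usage example are invented.

```python
# Commonly used reference levels on the vacuum scale (assumptions of this sketch).
H2_LEVEL_PH0 = -4.44   # eV, H+/H2 at pH 0
O2_LEVEL_PH0 = -5.67   # eV, O2/H2O at pH 0 (1.23 eV below H+/H2)
SHIFT_PER_PH = 0.059   # eV per pH unit at room temperature


def redox_levels(ph=0.0):
    """Return (H+/H2, O2/H2O) levels in eV vs vacuum at the given pH."""
    shift = SHIFT_PER_PH * ph
    return H2_LEVEL_PH0 + shift, O2_LEVEL_PH0 + shift


def classify(vbm_ev, cbm_ev, ph=0.0):
    """Classify a semiconductor by which half-reactions its band edges allow."""
    h2, o2 = redox_levels(ph)
    can_reduce = cbm_ev > h2    # electrons at the CBM can reduce H+ to H2
    can_oxidize = vbm_ev < o2   # holes at the VBM can oxidize water to O2
    if can_reduce and can_oxidize:
        return "overall water splitting"
    if can_reduce:
        return "HER only (photocathode candidate)"
    if can_oxidize:
        return "OER only (photoanode candidate)"
    return "neither"


if __name__ == "__main__":
    # Invented band edges (eV vs vacuum) for a hypothetical 2D compound.
    for ph in (0, 7):
        print(f"pH {ph}: {classify(vbm_ev=-5.5, cbm_ev=-4.0, ph=ph)}")
```

    This purely thermodynamic check ignores overpotentials, so a practical screen would typically require the band edges to clear the redox levels by some margin rather than merely straddle them.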

    Prior computational screening in the field of 2D photocatalysis[68] has been limited to a relatively small compound space. A thorough search for promising candidates and, more importantly, a deeper understanding of why these 2D compounds host optimal material properties for photocatalysis are critically missing. In recent years, extensive efforts have initiated the construction of several 2D material databases, mostly based on the generation of single-layer structures by data-mining layered compounds in existing inorganic compound databases, including the MP database and the Inorganic Crystal Structure Database (ICSD)[69–71]. Although only a limited number of material properties are computed and included in these databases for now, these 2D compound repositories have become fertile ground for research efforts that aim to theoretically predict novel 2D compounds for photocatalysis.

    In a recent work by our group (unpublished), 62 promising 2D compounds have been predicted or “re-identified” as photoanode or photocathode materials for solar fuel generation. The discovery process included a data-mining procedure for layered structure identification and a new electronic structure framework for 2D compounds, with automatic identification of the first Brillouin zone and definition of high-symmetry k-points, combined with a multiple-tier discovery pipeline incorporating several material screening criteria: type and size of band gap, band edge energies, and exciton binding energies. The study also establishes the tunability of the band edges of binary 2D compounds and the interplay between electronic structure, anion/cation electronegativity, and orbital hybridization. This work demonstrates the power of the data-driven approach for the accelerated discovery of 2D functional materials. These findings, together with other prior data-driven discovery work in the field, provide a large set of potential candidates for future experimental investigation.

    3.2.   Challenges for material discovery of 2D photocatalysts

  • Though great progress has been made, there are still many challenges for the wide application of 2D photocatalysts, including material degradation, large exciton binding energies, slow kinetics of carrier transfer, and charge traps. In addition, the impact of substrates on the electronic structure and catalytic performance of these 2D compounds is not generally known. Several convenient approaches are available to further improve the performance of existing 2D photocatalysts: (i) doping or edge modification can effectively modify the light absorption properties of 2D materials by tuning band gaps[72–74]; (ii) cocatalysts have been designed that significantly improve the catalytic activity of many 2D compounds[75]; (iii) heterojunction structure design (including 2D–2D and 2D–substrate) can extend the light-response energy range and promote the separation of photogenerated electron–hole pairs[76]. In addition to assisting the discovery of novel 2D compounds, the data-driven material discovery and design approach should find its role in these optimization processes.

    Due to reduced electronic screening in 2D materials, especially along the out-of-plane direction, exciton binding energies are much larger than in most bulk semiconductor compounds[77–79], which may limit the separation and migration of electron and hole carriers. An efficient way to quickly evaluate the exciton binding energies of a large number of 2D compounds for material screening is not yet available and requires continued theoretical development.
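
    A rough feeling for why confinement strengthens excitons can be obtained from the textbook hydrogenic model, in which the ground-state binding energy in an ideal 2D system is four times the 3D Wannier–Mott value for the same reduced mass and dielectric constant. The sketch below encodes only this idealized limit; it is not a screening-ready method, it does not describe the nonlocal dielectric screening of real monolayers, and the input values in the usage example are invented.

```python
RYDBERG_EV = 13.605693  # hydrogen Rydberg energy in eV


def exciton_binding_3d(reduced_mass, eps_r):
    """Wannier-Mott ground-state binding energy (eV) in a 3D dielectric.

    reduced_mass is in units of the free-electron mass; eps_r is the
    relative permittivity of the medium.
    """
    if reduced_mass <= 0 or eps_r <= 0:
        raise ValueError("reduced mass and permittivity must be positive")
    return RYDBERG_EV * reduced_mass / eps_r ** 2


def exciton_binding_2d_ideal(reduced_mass, eps_r):
    """Ideal 2D hydrogenic limit: levels scale as 1/(n - 1/2)^2, so n = 1 binds 4x more."""
    return 4.0 * exciton_binding_3d(reduced_mass, eps_r)


if __name__ == "__main__":
    mu, eps = 0.25, 10.0   # invented illustrative parameters
    print(f"3D: {1e3 * exciton_binding_3d(mu, eps):.0f} meV, "
          f"ideal 2D: {1e3 * exciton_binding_2d_ideal(mu, eps):.0f} meV")
```

    Because the binding energy scales as the inverse square of the permittivity, weaker screening in a monolayer raises it further beyond the factor of four, which is consistent with the large values reported for 2D compounds.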

4.   Challenges and outlook: material discovery and design for solar fuel generation
  • Although a breakthrough has been made in the prediction and verification of novel inorganic photocathode and photoanode materials for solar fuel generation, several challenges still need to be addressed before these newly discovered compounds can be incorporated in real photoelectrochemical devices and tested in harsh environments for commercial applications.

    First of all, several materials’ properties should be addressed computationally to develop a more comprehensive data-driven discovery workflow. These properties include material stability against kinetic corrosion process and water–solid interaction. Also, there is a great need for the evaluation of surface reactivity in the high-throughput screening pipeline. These properties and phenomena are known to be challenging for first-principles simulations. Considering its fast development in the field of material property prediction in recent years, machine learning is expected to become a surprisingly effective and efficient tool to predict these properties in the near future. For photoanode material development, better and more efficient electronic structure methods to address strong correlation effects and self-interaction error corrections in complex oxides should be developed. The success of data-driven computational discovery of photocatalysts will rely on the development of these novel tools and theoretical approaches as well as their effective incorporation into existing discovery and design infrastructure.

    As mentioned in the previous sections, a hypothesis based on a specific structural motif was used to achieve the successful discovery of a large set of metal vanadate photoanodes. In the future, a critical question is whether hypotheses or design principles based on structure–property correlations can themselves be generated using a data-driven approach, so that the search can be initiated from there. Again, machine learning tools adapted to materials science, together with a massive amount of data on the structures and properties of inorganic semiconductor compounds, will be the key to the solution.

    On the other hand, once a material candidate list is passed to experimentalists, it is critical to continue the development of combinatorial growth techniques to control defect density and improve material quality in the synthesis process to maximize the number of novel compounds with acceptable photoelectrochemical performance. In the field of 2D catalysis, a synergy between computation and experiment is still missing and urgently needed. New directions for 2D photocatalyst design include the high-throughput simulations of defects in 2D materials, automatic construction of 2D layered heterojunctions, and the development of simulation infrastructure for 2D surface catalysis including both HER and OER.

  • J. P. and Q. Y. are supported by the Center for the Computational Design of Functional Layered Materials, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences under Award No. DE-SC0012575. Part of the computational work used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.


Journal of Semiconductors © 2017 All Rights Reserved