
Framework for TCAD augmented machine learning on multi-I–V characteristics using convolutional neural network and multiprocessing

Thomas Hirtz1, Steyn Huurman2, He Tian1, Yi Yang1 and Tian-Ling Ren1


 Corresponding author: He Tian, tianhe88@tsinghua.edu.cn


Abstract: In a world where data is increasingly important for making breakthroughs, microelectronics is a field where data is sparse and hard to acquire. Only a few entities have the infrastructure required to automate the fabrication and testing of semiconductor devices, and this infrastructure is crucial for generating the volumes of data needed by modern data-driven methods. This situation creates a divide between most researchers and the industry. To address this issue, this paper introduces a widely applicable approach for creating custom datasets using simulation tools and parallel computing. The multi-I–V curves obtained this way are processed simultaneously by convolutional neural networks, making it possible to predict a full set of device characteristics with a single inference. We demonstrate the potential of this approach through two concrete examples of useful deep learning models trained on the generated data. We believe that this work can act as a bridge between state-of-the-art data-driven methods and more classical semiconductor research, such as device engineering, yield engineering or process monitoring. Moreover, this research allows anyone to start experimenting with deep neural networks and machine learning in the field of microelectronics, without the need for expensive experimentation infrastructure.

Key words: machine learning; neural networks; semiconductor devices; simulation



Fig. 1.  (Color online) (a) Diagram representing the workflow for generating the training samples. The simulations are distributed among workers using multiprocessing. The workers are assigned to the different CPU cores and executed concurrently. (b) Structure of the FinFET used in this research. The tunable device parameters, along with their default values, are: channel doping concentration ($10^{17}$ cm$^{-3}$), gate oxide thickness (1 nm), and source/drain doping concentration ($8 \times 10^{19}$ cm$^{-3}$). (c) Structure of the default NMOS used in this research. The tunable process parameters, along with their default values, are: N-well concentration ($10^{17}$ cm$^{-2}$), gate oxidation time (10 min), LDD dose ($10^{14}$ cm$^{-2}$) and LDD energy (30 keV).
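The sample-generation loop in (a) maps naturally onto Python's multiprocessing module. Below is a minimal sketch of that workflow; `run_tcad_simulation` and the parameter ranges are placeholders for illustration, not the paper's actual simulator wrapper or values:

```python
# Minimal sketch of the parallel sample-generation loop of Fig. 1(a).
# `run_tcad_simulation` is a hypothetical wrapper that would write a deck,
# invoke the TCAD tool for one parameter set and parse the exported curves.
import multiprocessing as mp
import numpy as np

def run_tcad_simulation(params):
    """Hypothetical worker: launch one TCAD run and return its I-V curves."""
    n_well, ox_time, ldd_dose, ldd_energy = params
    # ... write the input deck, call the simulator, parse its output ...
    curves = np.zeros((5, 100))  # placeholder for the five simulated curves
    return params, curves

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Draw one random process-parameter set per training sample
    # (ranges are illustrative, centered on the Fig. 1(c) defaults).
    param_sets = [(rng.uniform(0.5e17, 1.5e17),   # N-well concentration
                   rng.uniform(5.0, 20.0),        # gate oxidation time (min)
                   rng.uniform(0.5e14, 1.5e14),   # LDD dose
                   rng.uniform(15.0, 45.0))       # LDD energy (keV)
                  for _ in range(500)]
    # One worker per CPU core; the simulations run concurrently.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        dataset = pool.map(run_tcad_simulation, param_sets)
```

Because each TCAD run is independent, this workload is embarrassingly parallel and the speed-up is limited mainly by the number of available cores.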

Fig. 2.  (Color online) Samples of a training dataset using the planar NMOS. Each line represents one curve of a training sample. Five distinct NMOS characteristics are simulated and used: (a) $I_{\rm ds}$–$V_{\rm gs}$ curves with $V_{\rm ds}$ fixed at 0.1 and 1 V, (b) $I_{\rm ds}$–$V_{\rm ds}$ curves with $V_{\rm gs}$ fixed at 1 and 2 V, and (c) the off-state breakdown $I_{\rm ds}$–$V_{\rm ds}$ curve. The voltage sweeps of the $I_{\rm ds}$–$V_{\rm gs}$ and $I_{\rm ds}$–$V_{\rm ds}$ curves do not change from simulation to simulation and are therefore omitted from the neural network's input. In total, 500 training samples are displayed on the plots.
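Since a convolutional network expects inputs of fixed length, each simulated curve has to be brought onto a common grid before curves are stacked into a sample. A minimal sketch of such a resampling step, assuming simple linear interpolation over a monotonically increasing sweep:

```python
import numpy as np

def resample(v, i, n_points=100):
    """Interpolate one simulated curve (voltage sweep v, current i) onto a
    fixed grid so every training sample has the same length.
    Assumes v is monotonically increasing, as produced by a voltage sweep."""
    v_grid = np.linspace(v.min(), v.max(), n_points)
    return v_grid, np.interp(v_grid, v, i)
```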

Fig. 3.  (Color online) (a) Neural network architecture used for mapping the characteristics of a device to the process parameters. The 13 input channels are composed of the five current characteristics, the voltage of the off-state breakdown curve (when simulating the breakdown curve the current is forced, so the voltage is the variable quantity, in contrast to the other characteristics), their logarithmic counterparts, and the index values. (b) Scatter plots of the values predicted by the network (y-axis) versus the actual values (x-axis). The network can accurately predict the FinFET's device parameters as long as the parameter in question is strongly enough correlated with the simulated curves; a stronger correlation means higher accuracy. In total, 1000 samples are displayed on each plot. The samples were not previously seen by the network. (c) Training curves for the prediction of the parameters for different numbers of training samples. The darker curves represent exponential moving averages.
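As an illustration, the architecture in (a) could be sketched as follows in PyTorch. Only the 13-channel input layout follows the caption; the layer sizes, dilation and pooling are illustrative choices, not the exact configuration used in the paper:

```python
import torch
import torch.nn as nn

def assemble_input(currents, bd_voltage, n_points=100):
    """Build the 13-channel input of Fig. 3(a): 5 current curves plus the
    breakdown-voltage curve, their logarithms, and an index channel."""
    raw = torch.cat([currents, bd_voltage.unsqueeze(0)])      # 6 channels
    logs = torch.log10(raw.abs() + 1e-30)                     # 6 channels
    index = torch.linspace(0, 1, n_points).unsqueeze(0)       # 1 channel
    return torch.cat([raw, logs, index]).unsqueeze(0)         # (1, 13, N)

class CurveToParams(nn.Module):
    """Illustrative CNN mapping stacked I-V curves to 4 process parameters."""
    def __init__(self, n_params=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(13, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_params),
        )

    def forward(self, x):                 # x: (batch, 13, N)
        return self.head(self.features(x))

# Usage: CurveToParams()(assemble_input(currents, bd_voltage))  ->  (1, 4)
```

Processing all channels in one forward pass is what lets a single inference return the full parameter set rather than one parameter per model.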

Fig. 4.  (Color online) (a) Neural network architecture used for mapping the process or device parameters to the electrical characteristics. (b) Training curves for the prediction of characteristics using different numbers of training samples.
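A minimal sketch of such an inverse model, written here as a plain fully connected network; the channel count, curve length and layer sizes are assumptions rather than the paper's configuration:

```python
import torch.nn as nn

class ParamsToCurves(nn.Module):
    """Illustrative inverse model: map the 4 process parameters to the
    stacked output characteristics (here 6 channels of n_points samples)."""
    def __init__(self, n_params=4, n_channels=6, n_points=100):
        super().__init__()
        self.n_channels, self.n_points = n_channels, n_points
        self.net = nn.Sequential(
            nn.Linear(n_params, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, n_channels * n_points),
        )

    def forward(self, p):                 # p: (batch, n_params)
        out = self.net(p)
        return out.view(-1, self.n_channels, self.n_points)
```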

Fig. 5.  (Color online) (a) Plots of NMOS characteristics predicted by the network (solid lines) versus the actual values (dotted lines) for three samples from the validation dataset. The samples were not previously seen by the network. (b) Prediction of characteristics with three parameters fixed and the N-well concentration spread evenly over its range of values. The gate oxidation time, LDD dose and LDD energy are set at 12.5 min, $1.5\times10^{14}$ cm$^{-2}$ and 30 keV, respectively.
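The sweep in (b) could be produced by feeding evenly spaced N-well values, with the other three parameters held at the stated values, to an inverse model. A sketch reusing the hypothetical ParamsToCurves class above (the N-well range is assumed; in practice the parameters would be normalized to a small range before entering the network):

```python
import torch

model = ParamsToCurves()                              # sketch from Fig. 4
n_well = torch.linspace(0.5e17, 1.5e17, 10).unsqueeze(1)   # assumed range
fixed = torch.tensor([12.5, 1.5e14, 30.0]).expand(10, 3)   # Fig. 5(b) values
batch = torch.cat([n_well, fixed], dim=1)             # (10, 4) parameter sets
with torch.no_grad():
    curves = model(batch)                             # (10, 6, 100) curves
```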

Fig. 6.  Structure of a classical autoencoder. The input ($X$) is fed into an encoder network to obtain the code ($Y$). The input can then be reconstructed ($\hat{X}$) using a decoder network. The goal is to train the encoder and the decoder to minimize the distance between the input and the output while encoding the data.
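An autoencoder of this kind reduces to a few lines in PyTorch; the input and code sizes below are illustrative:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    """Classical autoencoder as in Fig. 6: X -> encoder -> Y -> decoder -> X_hat."""
    def __init__(self, n_in=600, n_code=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 128), nn.ReLU(), nn.Linear(128, n_code))
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(), nn.Linear(128, n_in))

    def forward(self, x):
        code = self.encoder(x)            # Y, the low-dimensional code
        return self.decoder(code), code   # X_hat and Y

# Training minimizes the reconstruction distance, e.g.
# loss = nn.functional.mse_loss(x_hat, x)
```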

Fig. 7.  (Color online) (a) Scatter plots of the predicted values (y-axis) versus the actual values (x-axis). The black dots are the values predicted by the network using the true characteristics as input. The red dots are the values obtained by first predicting the characteristics from the parameters, and then predicting the parameters from those characteristics. The grey lines represent the ground truth. Each coefficient of determination corresponds to the scatter plot of the same color; a stronger correlation means higher accuracy. In total, 1000 samples are displayed on each plot. The samples were not previously seen by the networks. (b) Plots of NMOS characteristics predicted by the network (solid lines) versus the actual values (dotted lines) for three samples from the validation dataset. The characteristics were obtained by first inferring the parameters from the true characteristics and then using those parameters to predict the characteristics.

Fig. 8.  (Color online) (a) Study of the neural network loss when predicting characteristics. Several sample counts and parameter ranges were tested. The curves are averaged over seven training runs of 2000 epochs each. (b) The statistical parameters used for the study. The process parameter ranges are uniform distributions bounded by Mean × (1 ± Sigma).
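The sampling scheme in (b) can be written directly as uniform draws from Mean × (1 ± Sigma). A sketch with illustrative mean values:

```python
import numpy as np

def sample_parameters(means, sigma, n_samples, seed=0):
    """Draw process-parameter sets from uniform ranges Mean * (1 +/- Sigma),
    as described in Fig. 8(b). `means` maps parameter name -> mean value."""
    rng = np.random.default_rng(seed)
    return [{name: rng.uniform(m * (1 - sigma), m * (1 + sigma))
             for name, m in means.items()}
            for _ in range(n_samples)]

# Example with illustrative means (cf. the Fig. 1(c) defaults):
params = sample_parameters(
    {"n_well": 1e17, "ox_time": 10.0, "ldd_dose": 1e14, "ldd_energy": 30.0},
    sigma=0.25, n_samples=500)
```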

Fig. 9.  (Color online) Neural network loss for predicting characteristics versus the number of samples used for training. The different curves represent the number of parameters that are randomized when generating the training set (e.g., for the “1 Parameter” curve, all the parameters except the N-well concentration are fixed). The parameters were added in the following order: N-well concentration, gate oxidation time, LDD dose, and LDD energy. The curves are averaged over seven training runs of 2000 epochs each.
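The curves of Fig. 9 correspond to randomizing only a prefix of the parameter list while fixing the rest at their default values. A sketch reusing the Mean × (1 ± Sigma) ranges, with illustrative values as before:

```python
import numpy as np

# Parameters are added in the order given in the Fig. 9 caption.
ORDER = ["n_well", "ox_time", "ldd_dose", "ldd_energy"]

def sample_with_k_random(means, sigma, n_samples, k, seed=0):
    """Randomize only the first k parameters of ORDER within
    Mean * (1 +/- Sigma); fix the remaining ones at their mean."""
    rng = np.random.default_rng(seed)
    free = set(ORDER[:k])
    return [{name: (rng.uniform(m * (1 - sigma), m * (1 + sigma))
                    if name in free else m)
             for name, m in means.items()}
            for _ in range(n_samples)]
```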


Citation: Thomas Hirtz, Steyn Huurman, He Tian, Yi Yang, Tian-Ling Ren. Framework for TCAD augmented machine learning on multi-I–V characteristics using convolutional neural network and multiprocessing. Journal of Semiconductors, 2021, 42(12): 124101. doi: 10.1088/1674-4926/42/12/124101
More Information
• Author Bio:

  Thomas Hirtz received his M.S. degree from the National Institute of Applied Sciences of Rennes in 2017. He is currently working towards a Ph.D. in electronic science and technology at the Institute of Microelectronics, Tsinghua University. His research interests include reinforcement learning and applications of machine learning techniques in the domains of physics and electronics.

  He Tian received his Ph.D. degree from the Institute of Microelectronics, Tsinghua University, in 2015. He is currently an associate professor at Tsinghua University. He was a recipient of the National Science Foundation award for outstanding young scholars. He has co-authored over 100 papers, which have received over 6000 citations. His research focuses on various novel 2D-material-based nanodevices.

• Corresponding author: tianhe88@tsinghua.edu.cn
• Received Date: 2021-04-06
• Revised Date: 2021-06-22
• Published Date: 2021-12-10
