Statistically modeling I-V characteristics of CNT-FET with LASSO

    Corresponding author: Yan Wang, wangy46@tsinghua.edu.cn
  • Institute of Microelectronics, Tsinghua University, Beijing 100084, China

Key words: statistical learning, compact model, CNT-FET, I-V characteristics, LASSO, machine learning

Abstract: With the advent of the Internet of Things (IoT), the need to study new materials and devices for various applications is increasing. Traditionally, compact models for transistors are built on the basis of physics. But physical models are expensive to develop and take a very long time to adjust for non-ideal effects. As the application prospects of many novel devices are uncertain or their manufacturing processes are immature, deriving generalized, accurate physical models for such devices is very strenuous, whereas statistical modeling is becoming a potential alternative because of its data-oriented nature and fast implementation. In this paper, one classical statistical regression method, LASSO, is used to model the I-V characteristics of a CNT-FET, and a pseudo-PMOS inverter simulation based on the trained model is implemented in Cadence. The normalized relative mean square prediction error of the trained model versus experimental sample data and the simulation results show that the model is acceptable for digital circuit static simulation. This modeling methodology can be extended to general devices.


1.   Introduction
  • Accurate semiconductor device modeling is essential for integrated circuit (IC) design and optimization. Traditionally, we build compact models for transistors on the basis of physics. The typical and most successful one is the Berkeley short-channel IGFET model (BSIM)[1]. Because of its accuracy, it has been widely used in academic research and the IC industry for many years. However, developing one generation of such a physical model is expensive and takes a very long time to adjust for non-ideal effects. As the feature size scales down to its physical limit, Moore's law has nearly come to an end and the need to study new materials and devices is increasing[2-5]. But the application prospects of many novel devices are uncertain or their manufacturing processes are immature, which means that modeling such devices based on physics is very strenuous.

    As an example, the carbon nanotube field-effect transistor (CNT-FET) is a promising candidate building block for the next generation of highly energy-efficient digital switches, which would run as fast as silicon-based technologies but generate much less heat[6-10]. In 2013, a research group at Stanford demonstrated the first computer built entirely using CNT-based transistors, which runs an operating system that is capable of multitasking[11]. This work shows enormous potential for CNT technology. But until now, CNTs made by different processes still show significant differences. Even CNTs made by the same manufacturing process do not have consistent properties. Under such circumstances, it is very hard to build an accurate physical model for general cases[12].

    Statistical learning is a class of methods that map input to output directly from sample data. In statistical theory, it is a principle that the prediction error of the trained model becomes small as long as the sample set is large enough. As a result, if we can find an appropriate model prototype and collect plenty of sample data, it is possible to train a statistical model for a specific semiconductor device with very high accuracy[13]. Above all, such a model can be constructed quickly and cheaply.

    The least absolute shrinkage and selection operator (LASSO) is a widely used shrinkage linear regression and feature selection method in statistical and machine learning. It was first introduced by Tibshirani[14]. Two books[15, 16] discuss various shrinkage regression methods at great length. In this paper, we apply LASSO to model the $I$-$V$ characteristics of a CNT-FET and use this model to implement a pseudo-PMOS inverter simulation in Cadence. The trained model's prediction error and the simulation result show that such a model is acceptable for digital circuit static direct-current (DC) simulation, and that this data-driven modeling methodology can be extended to general devices.

2.   Background
  • Standard linear regression has a general drawback: the trade-off between model complexity and over-fitting. It treats all features equally and cannot determine which ones are more closely related to the output. With too few features in the model, the regression may suffer from large bias. Conversely, with too many useless features it may over-fit, which means a large variance. LASSO solves this problem in an elegant way.

    The model of standard linear regression is

    $$f({x};{w}) = {w}^T{x} = w_0 + w_1 x_1 + w_2 x_2 + \cdots \tag{1}$$

    where ${f({x};{w})}$ denotes the model output, ${x}$ is the feature vector composed of feature variables, ${x}=[1, x_1, x_2, x_3 ...]^T$, and ${w}$ is the coefficient vector, also referred to as the model parameters. The constant "$1$" is included in ${x}$ so that the intercept $w_0$ is included in ${w}$, which lets the linear model be written in a simplified form. The difference between feature variables and input variables should be emphasized. The input variables represent the original inputs of the modeling system, while the feature variables are generated from the input variables. For example, ${(V_{\rm gs}, V_{\rm ds})}$ of an NMOS-FET are the input variables, while variables like $(V_{\rm gs}, V_{\rm gs}^2, V_{\rm gs}^3, V_{\rm gs}V_{\rm ds}, \exp V_{\rm ds}...)$ are the feature variables.

    Over all sample points, $( {x}^{(i)T}, y^{(i)}), i=1, 2, 3, ..., m$, the goal of standard linear regression is to find the optimal $\widehat{{w}}$ that minimizes the loss function

    $$\sum_{i=1}^{m} \left( y^{(i)} - {w}^T {x}^{(i)} \right)^2 \tag{2}$$

    With the help of the vector norm, it can be written as

    $$\left\| y - X^T {w} \right\|_2^2 \tag{3}$$

    where $y=[y^{(1)}, y^{(2)}, ..., y^{(m)}]^T, X=[{x}^{(1)}, {x}^{(2)}, ..., {x}^{(m)}]$.

    The key idea of LASSO is to introduce a penalty on the parameters in the loss function
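
    Written with $y$, $X$ and $m$ as defined above, a commonly quoted form of the penalized loss in Eq. (4) is the following. The absence of any extra scaling on the squared-error term is an assumption here; some software divides that term by $2m$, which only rescales $\lambda$.

    $$\widehat{{w}} = \arg\min_{{w}} \left\{ \left\| y - X^T {w} \right\|_2^2 + \lambda \left\| {w} \right\|_1 \right\}, \qquad \left\| {w} \right\|_1 = \sum_{j} |w_j| \tag{4}$$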

    This penalty, which takes the form of the 1-norm, tends to make $\widehat{{w}}$ sparse, meaning that some components of $\widehat{{w}}$ will be exactly zero. In other words, some of the unrelated features are eliminated automatically when $\widehat{{w}}$ is calculated.

    Note that the column vector of $X$ in Eq. (4) is ${x}^{(i)}=[x_1^{(i)}, x_2^{(i)}, x_3^{(i)}...]^T$, which omits the constant term "$1$". In this way, the model's intercept $w_0$ is separated from the penalized parameter vector ${w}$. This is because penalizing the intercept would make the trained model depend on the origin chosen for ${y}$[15]. The intercept $w_0$ can usually be estimated by
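
    One standard estimate, obtained by setting the derivative of the squared-error term with respect to $w_0$ to zero, is given below. The sample means $\bar{y}$ and $\bar{x}_j$ are notation introduced here for illustration. When every feature is centered to zero mean beforehand, the estimate reduces to $\widehat{w}_0 = \bar{y}$.

    $$\widehat{w}_0 = \bar{y} - \sum_{j} \bar{x}_j \widehat{w}_j, \qquad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y^{(i)}, \quad \bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$$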

    The $\lambda$ in Eq. (4) is the shrinkage parameter: the greater the value of $\lambda$, the more features will be eliminated. It can be determined by cross validation.
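
    The sparsity effect of the 1-norm penalty and the role of $\lambda$ can be seen in a minimal sketch. It uses cyclic coordinate descent with soft thresholding, a standard way to solve LASSO problems, though not necessarily what the MATLAB package used in this work does internally. The function names, the factor $0.5$ on the squared-error term, the omission of an intercept, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(columns, y, lam, sweeps=100):
    """Minimize 0.5 * sum_i (y_i - sum_j w_j * x_ij)**2 + lam * sum_j |w_j|.

    columns[j] holds feature j over all samples (no intercept column).
    The 0.5 scaling is an assumption of this sketch, not taken from the paper.
    """
    n_features = len(columns)
    w = [0.0] * n_features
    for _ in range(sweeps):
        for j, xj in enumerate(columns):
            # Residual with feature j's own contribution left out.
            partial = [
                yi - sum(w[k] * columns[k][i] for k in range(n_features) if k != j)
                for i, yi in enumerate(y)
            ]
            rho = sum(xi * ri for xi, ri in zip(xj, partial))
            w[j] = soft_threshold(rho, lam) / sum(xi * xi for xi in xj)
    return w


# Hypothetical toy data: three mutually orthogonal features, y = 3*x1 + 0.1*x2.
columns = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
y = [3.1, -2.9, 2.9, -3.1]

for lam in (0.0, 1.0, 20.0):
    w = lasso_cd(columns, y, lam)
    nonzero = sum(1 for wj in w if wj != 0.0)
    print(f"lambda={lam}: w={w}, nonzero coefficients={nonzero}")
```

    As $\lambda$ grows, more coefficients are set exactly to zero. Under the scaling assumed in this sketch, every coefficient vanishes once $\lambda$ exceeds the largest $|{x}_j^T y|$, which is 12 for this toy data.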

3.   Modeling CNT-FET
  • This section shows the result of modeling the CNT-FET DC characteristics with LASSO. The process of choosing the optimal shrinkage parameter $\lambda$ and the pseudo-PMOS inverter simulation result are also given.

  • 3.1.   CNT I-V data

  • We choose a p-type CNT-FET with width $=$ 3 $\mu$m and length $=$ 15 $\mu$m. Its structure is similar to that of traditional MOS-FETs, except that stochastically distributed semiconducting carbon nanotubes are employed as the channel material between source and drain. Fig. 1 shows a sketch of a CNT-FET. The drain, source and gate are made of Pd and the gate dielectric is made of HfO$_2$.

    The data set used for training the model of $I_{\rm ds}$ with respect to $V_{\rm gs}$ and $V_{\rm ds}$ is measured experimentally. The source terminal is connected to ground. $V_{\rm gs}$ is swept from $-2$ to $0$ V with a step size of about $0.15 $ V. For every $V_{\rm gs}$, $V_{\rm ds}$ is swept from $-2$ to $0$ V with a step size of $0.02 $ V. All of the $(V_{\rm ds}, V_{\rm gs})$ pairs and their corresponding $ I_{\rm ds}$ values compose the sample set, whose size is 1414 (consistent with 14 $V_{\rm gs}$ values times 101 $V_{\rm ds}$ values). We use 1260 points for training the model and choosing the optimal $\lambda$, and reserve 154 points as an independent test set to visualize the prediction ability of the trained model with the optimal $\lambda$.

    In order to generate the feature space ${x}$, we adopt a polynomial basis of $V_{\rm gs}$ and $V_{\rm ds}$ up to order 10. Under this circumstance, ${x}$ is extended to a vector of dimension 65. In other words, we get 65 feature variables, which matches the number of monomials $V_{\rm gs}^a V_{\rm ds}^b$ with $1 \le a+b \le 10$.
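
    The feature count can be checked with a short sketch. It assumes that the polynomial basis consists of every monomial $V_{\rm gs}^a V_{\rm ds}^b$ of total degree 1 to 10, with the constant term excluded because the intercept is handled separately. The function name and the ordering of the features are illustrative choices, not taken from the original work.

```python
def polynomial_features(vgs, vds, order=10):
    """Return all monomials vgs**a * vds**b with 1 <= a + b <= order."""
    feats = []
    for total in range(1, order + 1):      # total degree of the monomial
        for a in range(total, -1, -1):     # power of vgs, from high to low
            b = total - a                  # power of vds
            feats.append((vgs ** a) * (vds ** b))
    return feats


# One bias point inside the measured range (values chosen arbitrarily).
x = polynomial_features(-1.0, -0.5)
print(len(x))  # 65
```

    Degree $t$ contributes $t+1$ monomials, so orders 1 to 10 give $2 + 3 + \cdots + 11 = 65$ features.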

  • 3.2.   Cross validation

  • $k$-fold cross-validation is a widely used validation method that makes the best use of the data. Instead of splitting the data into two parts, it separates the data into $k$ subsets. For every subset $S_i$ $(i=1, 2, ..., k)$, the model is trained on the remaining $k-1$ subsets and the test error is then calculated on $S_i$, usually in the form of the mean squared error

    $${\rm MSE}_i = \frac{1}{|S_i|} \sum_{j \in S_i} \left( y^{(j)} - f({x}^{(j)}; \widehat{{w}}) \right)^2 \tag{6}$$

    where $|S_i|$ is the number of samples in $S_i$ and $\widehat{{w}}$ is trained without $S_i$. The mean of the $k$ MSE$_{i}$

    $$\frac{1}{k} \sum_{i=1}^{k} {\rm MSE}_i \tag{7}$$

    will be regarded as the prediction error of the trained model. Empirically, $k$ is set to $5$ or $10$.

    Over the 1260 training samples, we use the LASSO package in MATLAB to train the model and adopt 10-fold cross-validation to decide the optimal $\lambda$ of LASSO. By comparison, we choose $\lambda = 1.12 \times {10^{ - 9}}$, in which case 49 features are left in the trained model.
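
    The selection procedure can be sketched in a few lines. This is not the MATLAB routine used above, only an illustration of the same idea: each candidate $\lambda$ is scored by its mean per-fold test error, and the smallest score wins. All function names are hypothetical, the folds are interleaved rather than shuffled, and the "shrunken mean" model is a stand-in for a real LASSO fit.

```python
def k_fold_indices(m, k):
    """Split sample indices 0..m-1 into k interleaved folds of near-equal size."""
    return [list(range(i, m, k)) for i in range(k)]


def cross_validation_error(xs, ys, k, fit):
    """Average the per-fold test MSE; fit(train_x, train_y) must return a predictor."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        mse = sum((ys[i] - predict(xs[i])) ** 2 for i in fold) / len(fold)
        fold_errors.append(mse)
    return sum(fold_errors) / k


def select_lambda(xs, ys, k, lambdas, make_fit):
    """Return the candidate lambda with the smallest cross-validation error."""
    return min(lambdas, key=lambda lam: cross_validation_error(xs, ys, k, make_fit(lam)))


def make_shrunken_mean(lam):
    """Toy stand-in for a LASSO fit: predicts the training mean shrunk by 1 / (1 + lam)."""
    def fit(train_x, train_y):
        level = sum(train_y) / len(train_y) / (1.0 + lam)
        return lambda x: level
    return fit


xs = list(range(20))
ys = [2.0] * 20
best = select_lambda(xs, ys, 5, [0.0, 1.0, 3.0], make_shrunken_mean)
print(best)  # 0.0
```

    With 1260 samples and $k=10$, `k_fold_indices` yields ten folds of 126 points each, so every model is trained on 1134 points and tested on the remaining 126.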

  • 3.3.   Testing result

  • To evaluate the fitting result, we apply the trained model with $\lambda = 1.12 \times {10^{ - 9}}$ to the independent testing set. As the magnitude of the modeling target, $I_{\rm ds}$, is very small, we use the normalized root mean square (RMS) as the prediction error

    $${\rm normalized\ RMS} = \frac{1}{\max(y)} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - f({x}^{(i)}; \widehat{{w}}) \right)^2} \tag{8}$$

    where $n$ is the number of test points and max$(y)$ denotes the maximum $y$ among all $y^{(i)}$'s. For $\lambda = 1.12 \times {10^{ - 9}}$, the normalized RMS is $1.7\%$. Fig. 2 visualizes the test result for this trained model from the perspective of output characteristics and transfer characteristics.
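
    A minimal sketch of this error measure follows. The function name is hypothetical. Normalizing by the largest output magnitude, rather than by the signed maximum, is an assumption made here so that the negative currents of a p-type device do not produce a negative or near-zero denominator.

```python
import math


def normalized_rms(y_true, y_pred):
    """RMS prediction error divided by the largest output magnitude (assumed convention)."""
    n = len(y_true)
    rms = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rms / max(abs(t) for t in y_true)


# Hypothetical drain currents in amperes, not measured data.
measured = [-4.0e-6, -2.0e-6, -1.0e-6, 0.0]
predicted = [-3.9e-6, -2.1e-6, -1.0e-6, 0.1e-6]
print(f"{100 * normalized_rms(measured, predicted):.2f} %")
```

    Because the error is divided by the output scale, the result is dimensionless and does not depend on whether the current is expressed in amperes or microamperes.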

  • 3.4.   Simulation result

  • We transform the previously trained CNT-FET model from a mathematical expression into Verilog-A and use it to run a DC SPICE simulation in Cadence for a pseudo-PMOS inverter. Fig. 3 shows a schematic of the inverter. The size of the input transistor is $w=3$ $\mu$m, $l=15$ $\mu$m and the load resistor is 100 k$\Omega $. The supply voltage is $2$ V.

    Fig. 4 shows the voltage transfer characteristic (VTC). It can be seen that the inverter based on our trained model realizes the inversion function.
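
    What a DC simulation of such an inverter computes can be sketched as a load-line solution. The sketch assumes a topology in which the p-type transistor connects the supply to the output node and the 100 k$\Omega$ resistor connects the output node to ground; this is an assumption, since the schematic is given only in Fig. 3. The device function is a hypothetical toy expression standing in for the trained LASSO model and is not fitted to any data. All names are illustrative.

```python
import math

VDD = 2.0        # supply voltage from the text, in volts
R_LOAD = 100e3   # load resistance from the text, in ohms


def toy_source_drain_current(vgs, vds, k=1e-4, vth=-0.5):
    """Hypothetical p-type device: current (A) flowing from source to drain.

    Stands in for the trained model; k and vth are arbitrary illustrative values.
    """
    overdrive = max(0.0, vth - vgs)        # device is off for vgs >= vth
    return k * overdrive ** 2 * math.tanh(-vds / 0.5)


def inverter_output(vin, device=toy_source_drain_current, steps=60):
    """Solve device current = load current for vout by bisection on [0, VDD]."""
    lo, hi = 0.0, VDD
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        # Source is tied to VDD, so vgs = vin - VDD and vds = vout - VDD.
        surplus = device(vin - VDD, mid - VDD) - mid / R_LOAD
        if surplus > 0.0:
            lo = mid      # transistor supplies more than the load sinks: vout is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sweep the input from 0 V to 2 V in 0.1 V steps to trace the VTC.
vtc = [(vin / 10.0, inverter_output(vin / 10.0)) for vin in range(0, 21)]
for vin, vout in vtc:
    print(f"{vin:.1f} V -> {vout:.3f} V")
```

    With the source at the supply, the device bias stays within $-2$ to $0$ V for both $V_{\rm gs}$ and $V_{\rm ds}$, which is the range covered by the measured data. Bisection is sufficient here because the surplus current decreases monotonically with the output voltage for this toy device.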

4.   Conclusion
  • In this paper, we adopt the idea of using statistical learning to model semiconductor devices. Compared with modeling based on physics, statistical learning works on the experimental data directly, which is highly efficient. As an example, we use LASSO to model one p-type CNT-FET. The normalized RMS of the fitting result and the circuit simulation result show that the model is acceptable.


Journal of Semiconductors © 2017 All Rights Reserved