
Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance

Yang Feng1, Zhaohui Sun1, Yueran Qi1, Xuepeng Zhan1, Junyu Zhang2, Jing Liu3, Masaharu Kobayashi4, Jixuan Wu1, and Jiezhi Chen1


Corresponding authors: Jixuan Wu, jixuanwu@sdu.edu.cn; Jiezhi Chen, chen.jiezhi@sdu.edu.cn


Abstract: With the rapid development of machine learning, the demand for highly efficient computing is becoming increasingly urgent. To break the bottleneck of the traditional von Neumann architecture, computing-in-memory (CIM) has attracted growing attention in recent years. In this work, to provide a feasible CIM solution for large-scale neural networks (NNs) that require continuous weight updating during online training, a flash-based CIM with high endurance (10⁹ cycles) and ultra-fast programming speed is investigated. On the one hand, the proposed programming scheme, which combines channel hot-electron injection (CHEI) and hot-hole injection (HHI), demonstrates high linearity and a symmetric potentiation/depression process, which helps to improve training speed and accuracy. On the other hand, the low-damage programming scheme and memory-window (MW) optimizations effectively suppress cell degradation while improving computing accuracy. Even after 10⁹ cycles, the off-state leakage current (Ioff) of the cells remains below 10 pA, ensuring the large-scale computing ability of the memory. Further characterization of read disturb demonstrates robust reliability. On CIFAR-10 tasks, ~90% accuracy is achieved after 10⁹ cycles with both the ResNet50 and VGG16 networks. Our results suggest that flash-based CIM has great potential to overcome the limitations of the traditional von Neumann architecture and to enable high-performance NN online training, paving the way for further development of artificial intelligence (AI) accelerators.

Key words: NOR flash memory; computing-in-memory; endurance; neural network; online training
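
To make the online-training requirement concrete, the sketch below shows a toy training loop in which every weight change must be realized as an integer number of fast potentiation (CHEI-like) or depression (HHI-like) pulses. It is purely illustrative: the task, the per-pulse step size DELTA, and the perceptron update rule are assumptions for exposition, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable task standing in for online training.
X = rng.normal(size=(200, 8))
y = np.sign(X @ rng.normal(size=8))

DELTA = 0.01          # assumed weight change per programming pulse
w = np.zeros(8)

for epoch in range(20):
    for xi, yi in zip(X, y):
        if np.sign(xi @ w) != yi:            # misclassified sample
            grad = yi * xi                   # perceptron-style update
            # Quantize the update to integer pulse counts: positive
            # steps map to potentiation (CHEI), negative to depression
            # (HHI). Symmetric, linear pulses keep this close to the
            # ideal analog update.
            w += np.round(grad / DELTA) * DELTA

print(f"training accuracy with pulse-quantized updates: "
      f"{(np.sign(X @ w) == y).mean():.1%}")
```

Because every training step consumes program/erase pulses, cell endurance directly bounds how many such updates an online-trained array can absorb, which is why the 10⁹-cycle figure matters.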



[1]
Yao P, Wu H Q, Gao B, et al. Fully hardware-implemented memristor convolutional neural network. Nature, 2020, 577, 641 doi: 10.1038/s41586-020-1942-4
[2]
Khwa W S, Akarvardar K, Chen Y S, et al. MLC PCM techniques to improve neural network inference retention time by 10⁵× and reduce accuracy degradation by 10.8×. Proc IEEE Symp VLSI Technol, 2020, 1
[3]
Zhang W Y, Wang S C, Li Y, et al. Few-shot graph learning with robust and energy-efficient memory-augmented graph neural network (MAGNN) based on homogeneous computing-in-memory. 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, 224
[4]
Kumar S, Wang X X, Strachan J P, et al. Dynamical memristors for higher-complexity neuromorphic computing. Nat Rev Mater, 2022, 7, 575 doi: 10.1038/s41578-022-00434-z
[5]
Lu Y M, Li X, Yan B N, et al. In-memory realization of eligibility traces based on conductance drift of phase change memory for energy-efficient reinforcement learning. Adv Mater, 2022, 34, 2107811 doi: 10.1002/adma.202107811
[6]
Huang P, Zhou Z, Zhang Y, et al. Dual-configuration in-memory computing bitcells using SiOx RRAM for binary neural networks. APL Mater, 2019, 7, 081105 doi: 10.1063/1.5116863
[7]
Chang C C, Chen P C, Chou T, et al. Mitigating asymmetric nonlinear weight update effects in hardware neural network based on analog resistive synapse. IEEE J Emerg Sel Top Circuits Syst, 2018, 8, 116 doi: 10.1109/JETCAS.2017.2771529
[8]
Ravsher T, Garbin D, Fantini A, et al. Enhanced performance and low-power capability of SiGeAsSe-GeSbTe 1S1R phase-change memory operated in bipolar mode. 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, 312
[9]
Ielmini D, Ghetti A, Spinelli A S, et al. A study of hot-hole injection during programming drain disturb in flash memories. IEEE Trans Electron Devices, 2006, 53, 668 doi: 10.1109/TED.2006.870280
[10]
Wu W, Wu H Q, Gao B, et al. A methodology to improve linearity of analog RRAM for neuromorphic computing. 2018 IEEE Symposium on VLSI Technology, 2018, 103 doi: 10.1109/VLSIT.2018.8510690
[11]
Wang Q W, Park Y, Lu W D. Device variation effects on neural network inference accuracy in analog In-memory computing systems. Adv Intell Syst, 2022, 4, 2100199 doi: 10.1002/aisy.202100199
[12]
Ogawa S, Shiono N. Interface-trap generation induced by hot-hole injection at the Si-SiO2 interface. Appl Phys Lett, 1992, 61, 807 doi: 10.1063/1.107751
[13]
Choi W, Kwak M, Heo S, et al. Hardware neural network using hybrid synapses via transfer learning: WOx nano-resistors and TiOx RRAM synapse for energy-efficient edge-AI sensor. 2021 IEEE International Electron Devices Meeting (IEDM), 2021, 23.1.1
[14]
Ali T, Seidel K, Kühnel K, et al. A novel dual ferroelectric layer based MFMFIS FeFET with optimal stack tuning toward low power and high-speed NVM for neuromorphic applications. 2020 IEEE Symposium on VLSI Technology, 2020, 1 doi: 10.1109/VLSITechnology18217.2020.9265111
[15]
Lue H T, Hsu P K, Wei M L, et al. Optimal design methods to transform 3D NAND flash into a high-density, high-bandwidth and low-power nonvolatile computing in memory (nvCIM) accelerator for deep-learning neural networks (DNN). 2019 IEEE International Electron Devices Meeting (IEDM), 2019, 38.1.1 doi: 10.1109/IEDM19573.2019.8993652
[16]
Malavena G, Spinelli A S, Compagnoni C M. Implementing spike-timing-dependent plasticity and unsupervised learning in a mainstream NOR flash memory array. 2018 IEEE International Electron Devices Meeting (IEDM), 2018, 2.3.1 doi: 10.1109/IEDM.2018.8614561
Fig. 1.  (Color online) Schematics of the flash-based CIM architecture. The pulse time of Vg and the threshold voltage are mapped as the input vector and the weight matrix, respectively, so that the integrated charge represents the result of the matrix-vector multiplication (MVM).
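
As a concrete reading of this mapping, the following minimal sketch encodes inputs as gate-pulse widths and weights as per-cell read currents (which a real device sets through the programmed threshold voltage), then integrates charge along each bitline. All device numbers are assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x3 array: weights stored as per-cell read currents (A),
# set in a real device by the programmed threshold voltage.
cell_current = rng.uniform(0.1e-6, 1.0e-6, size=(4, 3))

# Inputs encoded as gate pulse widths (s), per the Fig. 1 mapping.
pulse_width = np.array([10e-9, 20e-9, 0.0, 40e-9])

# Each cell contributes charge Q = I * t; summing the charge along a
# bitline performs one multiply-accumulate per output column.
charge = pulse_width[:, None] * cell_current   # element-wise products
mvm_result = charge.sum(axis=0)                # bitline charge = dot product

assert np.allclose(mvm_result, pulse_width @ cell_current)
print(mvm_result)
```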

Fig. 2.  (Color online) (a) Schematic of the adopted CHEI and HHI programming scheme. (b) Energy band diagram of the CHEI and HHI programming scheme.

Fig. 3.  (Color online) The architectures of the (a) ResNet50 and (b) VGG16 convolutional neural networks.

Fig. 4.  (Color online) (a) The proposed scheme to improve both endurance and speed by optimizing the operation scheme for NN online training. (b) Comparison of the Vth tuning speed of FN tunneling and HHI. (c) The highly linear and symmetric potentiation/depression process achieved by the combined CHEI and HHI method.
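
The value of linearity and symmetry can be seen in a small simulation comparing an ideal linear staircase against a saturating update of the kind conventional schemes exhibit. The exponential saturation model and its parameter are illustrative assumptions, not fits to the paper's data.

```python
import numpy as np

N_PULSES = 50
G_MIN, G_MAX = 0.0, 1.0          # normalized weight window, assumed

def update_trace(nonlinearity):
    """N_PULSES potentiation pulses followed by N_PULSES depression pulses.

    nonlinearity = 0 gives the ideal linear, symmetric staircase;
    larger values give early saturation and asymmetry.
    """
    g, trace = G_MIN, []
    linear_step = (G_MAX - G_MIN) / N_PULSES
    decay = 1 - np.exp(-nonlinearity)   # fraction of remaining range per pulse
    for _ in range(N_PULSES):           # potentiation
        g += (G_MAX - g) * decay if nonlinearity else linear_step
        trace.append(g)
    for _ in range(N_PULSES):           # depression
        g -= (g - G_MIN) * decay if nonlinearity else linear_step
        trace.append(g)
    return np.array(trace)

ideal = update_trace(0.0)        # combined CHEI + HHI: linear, symmetric
saturating = update_trace(0.1)   # conventional scheme: bent staircase
print(ideal[:5], saturating[:5])
```

A bent staircase makes the effective step size depend on the current weight value, which is what degrades training accuracy when updates are nonlinear or asymmetric.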

Fig. 5.  (Color online) (a) The I–V curves of the programmed/erased states before and after 10⁹ cycles. (b) Enhancement of endurance at lower MW, showing the trade-off between MW and endurance. (c) SS values and (d) Ioff for different MWs and cycle counts, compared with the traditional programming scheme; each box contains 15 different memory cells.
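
A back-of-the-envelope sketch shows why keeping Ioff below 10 pA matters at array scale: the off-state leakage of every unselected cell on a shared bitline accumulates against the selected cell's signal charge. The operating-point numbers below (on-current, array size, pulse time) are assumptions for illustration, not values from the paper.

```python
# Estimate the leakage-to-signal ratio on one bitline during a read pulse.
I_ON = 1e-6        # assumed selected-cell read current (A)
N_CELLS = 1024     # assumed number of cells sharing the bitline
T_READ = 10e-9     # assumed integration (pulse) time (s)

def leakage_ratio(i_off):
    signal = I_ON * T_READ                       # charge from the selected cell
    leakage = (N_CELLS - 1) * i_off * T_READ     # accumulated off-state charge
    return leakage / signal

for i_off in (10e-12, 1e-9):   # sub-10 pA (post-cycling) vs. a degraded 1 nA
    print(f"Ioff = {i_off:.0e} A -> leakage/signal = {leakage_ratio(i_off):.1%}")
# With Ioff ~ 10 pA the accumulated leakage stays near 1% of the signal;
# at 1 nA it would swamp the computation entirely.
```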

Fig. 6.  (Color online) (a) Comparison between the proposed scheme and the traditional scheme. (b) Read disturb of different states after 10⁹ cycles. (c) Application to CIFAR-10 using ResNet50 and VGG16: even after 10⁹ cycles, ~90% accuracy is achieved on the CIFAR-10 task.
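
The post-cycling accuracy claim can be probed in simulation by injecting device-level variation into trained weights and re-evaluating. The toy sketch below uses synthetic data and a linear classifier as a stand-in for CIFAR-10/ResNet50, with assumed multiplicative Gaussian noise levels; it shows the shape of the evaluation, not the paper's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 10-class task and a least-squares "trained" weight matrix.
X = rng.normal(size=(1000, 32))
labels = (X @ rng.normal(size=(32, 10))).argmax(axis=1)
W = np.linalg.lstsq(X, np.eye(10)[labels], rcond=None)[0]

def accuracy(weights):
    return ((X @ weights).argmax(axis=1) == labels).mean()

# Emulate post-cycling device variation as multiplicative Gaussian
# noise on the stored weights; the sigma values are assumptions.
for sigma in (0.0, 0.05, 0.2, 0.5):
    noisy = W * (1 + rng.normal(scale=sigma, size=W.shape))
    print(f"sigma = {sigma:.2f} -> accuracy = {accuracy(noisy):.1%}")
```

Accuracy stays near baseline for small variation and collapses as it grows; Fig. 6(c) makes the corresponding measurement on CIFAR-10 after 10⁹ program/erase cycles.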

Table 1.   Benchmark of this work against various non-volatile CIM devices.

Ref | Cell type | On/off ratio | Pgm. speed | Endurance | Data retention (s)
[8] | PCM | 10⁴ | – | 10⁸ | 10⁶
[13] | RRAM | 10³ | 1 μs | 10⁴ | –
[14] | FeFET | 10⁵ | 300 ns | 10⁵ | 10⁴
[15] | 3D flash | 10⁵ | – | 10⁵ | 10⁵
[16] | Flash | 10² | 10 μs | 10⁵ | –
This work (optimized operation) | Flash | 10⁶ (before cycling) | 10 ns | 10⁹ | 10⁵
Citation: Yang Feng, Zhaohui Sun, Yueran Qi, Xuepeng Zhan, Junyu Zhang, Jing Liu, Masaharu Kobayashi, Jixuan Wu, Jiezhi Chen. Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance[J]. Journal of Semiconductors, 2024, 45(1): 012301. doi: 10.1088/1674-4926/45/1/012301

      • Author Bio:

        Yang Feng received a BEng degree from the School of Information Science and Engineering (ISE), Shandong University, in 2021, where she is currently pursuing a PhD degree. Her research focuses on the design and simulation of computing-in-memory circuits and systems.

        Jiezhi Chen received a PhD degree from the Department of Informatics and Electronics, The University of Tokyo, in 2009. In 2010, he joined the Research and Development Center, Toshiba Corporation. He is currently a professor with the School of Information Science and Engineering, Shandong University, China. His research interests include the characterization and process engineering of nano-scale transistors and non-volatile memories, with a main focus on reliability physics.

      • Corresponding authors: jixuanwu@sdu.edu.cn; chen.jiezhi@sdu.edu.cn
      • Received Date: 2023-07-14
      • Revised Date: 2023-09-09
      • Available Online: 2023-11-10
      • Published Date: 2024-01-10
