Citation: |
Xiping Jiang, Fujun Bai, Song Wang, Yixin Guo, Fengguo Zuo, Wenwu Xiao, Yubing Wang, Jianguo Yang, Ming Liu. SSA-over-array (SSoA): A stacked DRAM architecture for near-memory computing[J]. Journal of Semiconductors, 2024, 45(10): 102201. doi: 10.1088/1674-4926/24050004
****
X P Jiang, F J Bai, S Wang, Y X Guo, F G Zuo, W W Xiao, Y B Wang, J G Yang, and M Liu, SSA-over-array (SSoA): A stacked DRAM architecture for near-memory computing[J]. J. Semicond., 2024, 45(10), 102201 doi: 10.1088/1674-4926/24050004
|
SSA-over-array (SSoA): A stacked DRAM architecture for near-memory computing
DOI: 10.1088/1674-4926/24050004
CSTR: 32376.14.1674-4926.24050004
More Information-
Abstract
Aiming to enhance the bandwidth in near-memory computing, this paper proposes a SSA-over-array (SSoA) architecture. By relocating the secondary sense amplifier (SSA) from dynamic random access memory (DRAM) to the logic die and repositioning the DRAM-to-logic stacking interface closer to the DRAM core, the SSoA overcomes the layout and area limitations of SSA and master DQ (MDQ), leading to improvements in DRAM data-width density and frequency, significantly enhancing bandwidth density. The quantitative evaluation results show a 70.18 times improvement in bandwidth per unit area over the baseline, with a maximum bandwidth of 168.296 Tbps/Gb. We believe the SSoA is poised to redefine near-memory computing development strategies.-
Keywords:
- near-memory,
- vertical stacking,
- SSA,
- bandwidth density
-
References
[1] Sebastian A, Le Gallo M, Khaddam-Aljameh R, et al. Memory devices and applications for in-memory computing. Nat Nanotechnol, 2020, 15, 529 doi: 10.1038/s41565-020-0655-z[2] Ielmini D, Wong H S P. In-memory computing with resistive switching devices. Nat Electron, 2018, 1, 333 doi: 10.1038/s41928-018-0092-2[3] Khan K, Pasricha S, Kim R G. A survey of resource management for processing-In-memory and near-memory processing architectures. J Low Power Electron Appl, 2020, 10, 30 doi: 10.3390/jlpea10040030[4] Santoro G, Turvani G, Graziano M. New logic-in-memory paradigms: An architectural and technological perspective. Micromachines, 2019, 10, 368 doi: 10.3390/mi10060368[5] Horowitz M. Computing’s energy problem (and what we can do about it). 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, 10 doi: 10.1109/ISSCC.2014.6757323[6] Ke L, Zhang X, So J, et al. Near-memory processing in action: Accelerating personalized recommendation with AxDIMM. IEEE Micro, 2022, 42, 116 doi: 10.1109/MM.2021.3097700[7] Mutlu O, Ghose S, Gómez-Luna J, et al. Processing data where it makes sense: Enabling in-memory computation. Microprocess Microsyst, 2019, 67, 28 doi: 10.1016/j.micpro.2019.01.009[8] Li Y J, Tang J S, Gao B, et al. Monolithic three-dimensional integration of RRAM-based hybrid memory architecture for one-shot learning. Nat Commun, 2023, 14, 7140 doi: 10.1038/s41467-023-42981-1[9] Sun Z, Kvatinsky S, Si X, et al. A full spectrum of computing-in-memory technologies. Nat Electron, 2023, 6, 823 doi: 10.1038/s41928-023-01053-4[10] Tian W C, Li B, Li Z, et al. Using chiplet encapsulation technology to achieve processing-in-memory functions. Micromachines, 2022, 13, 1790 doi: 10.3390/mi13101790[11] Lee S, Cho H, Son Y H, et al. Leveraging power-performance relationship of energy-efficient modern DRAM devices. IEEE Access, 2018, 6, 31387 doi: 10.1109/ACCESS.2018.2845861[12] Spessot A, Oh H. 1T-1C dynamic random access memory status, challenges, and prospects. IEEE Trans Electron Devices, 2020, 67, 1382 doi: 10.1109/TED.2020.2963911[13] Singh G, Chelini L, Corda S, et al. A review of near-memory computing architectures: Opportunities and challenges. 2018 21st Euromicro Conference on Digital System Design (DSD), 2018, 608 doi: 10.1109/DSD.2018.00106[14] Ravichandiran P P, Franzon P D. A review of 3D-dynamic random-access memory based near-memory computation. 2021 IEEE International 3D Systems Integration Conference (3DIC), 2021, 1 doi: 10.1109/3DIC52383.2021.9687615[15] Yu Y, Jha N K. Energy-efficient monolithic three-dimensional on-chip memory architectures. IEEE Trans Nanotechnol, 2018, 17, 620 doi: 10.1109/TNANO.2017.2731871[16] Asad A, Kaur R, Mohammadi F. A survey on memory subsystems for deep neural network accelerators. Future Internet, 2022, 14, 146 doi: 10.3390/fi14050146[17] Shiba K, Okada M, Kosuge A, et al. A 7-nm FinFET 1.2-TB/s/mm2 3D-stacked SRAM module with 0.7-pJ/b inductive coupling interface using over-SRAM coil and manchester-encoded synchronous transceiver. IEEE J Solid-State Circuits, 2023, 58, 2075 doi: 10.1109/JSSC.2022.3224421[18] Sohn K, Yun W J, Oh R, et al. A 1.2 V20 nm 307 GB/s HBM DRAM with at-speed wafer-level IO test scheme and adaptive refresh considering temperature distribution. IEEE J Solid-State Circuits, 2017, 52, 250 doi: 10.1109/JSSC.2016.2602221[19] OH C S, Chun K C, Byun Y Y, et al. A 1.1V 16GB 640GB/s HBM2E DRAM with a data-bus window-extension technique and a synergetic on-die ECC scheme. 2020 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2020, 1, 1 doi: 10.1109/ISSCC19947.2020.9063110[20] Park M J, Lee J, Cho K, et al. A 192-gb 12-high 896-GB/s HBM3 DRAM with a TSV auto-calibration scheme and machine-learning-based layout optimization. IEEE J Solid-State Circuits, 2023, 58, 256 doi: 10.1109/JSSC.2022.3193354[21] Farmahini-Farahani A, Ahn J H, Morrow K, et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), 2015, 283 doi: 10.1109/HPCA.2015.7056040[22] Asghari-Moghaddam H, Farmahini-Farahani A, Morrow K, et al. Near-DRAM acceleration with single-ISA heterogeneous processing in standard memory modules. IEEE Micro, 2016, 36, 24 doi: 10.1109/MM.2016.8[23] Wang S, Yu B, Xiao W W, et al. A 135 GBps/Gbit 0.66 pJ/bit stacked embedded DRAM with multilayer arrays by fine pitch hybrid bonding and mini-TSV. 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023, 1 doi: 10.23919/VLSITechnologyandCir57934.2023.10185427[24] Niu D M, Li S C, Wang Y H, et al. 184QPS/W 64Mb/mm23D logic-to-DRAM hybrid bonding with process-near-memory engine for recommendation system. 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, 1 doi: 10.1109/ISSCC42614.2022.9731694[25] Bai F J, Jiang X P, Wang S, et al. A stacked embedded DRAM array for LPDDR4/4X using hybrid nonding 3D Integration with 34GB/s/1Gb 0.88pJ/b logic-to-memory interface. 2020 IEEE International Electron Devices Meeting (IEDM), 2020, 1, 6.6.1 doi: 10.1109/IEDM13553.2020.9372039[26] Wang S, Jiang X P, Bai F J, et al. A true process-heterogeneous stacked embedded DRAM structure based on wafer-level hybrid bonding. Electronics, 2023, 12, 1077 doi: 10.3390/electronics12051077[27] Park N, Ryu S, Kung J, et al. High-throughput near-memory processing on CNNs with 3D HBM-like memory. ACM Trans on Design Automation of Electronic Systems (TODAES), 2021, 26, 1 doi: 10.1145/3460971[28] Kwon J, Wen S J, Fung R, et al. Temperature estimation of HBM2 channels with tail distribution of retention errors in FPGA-HBM2 platform. Electronics, 2022, 12, 32 doi: 10.3390/electronics12010032[29] Farmahini-Farahani A, Gurumurthi S, Loh G, et al. Challenges of high-capacity DRAM stacks and potential directions. Proceedings of the Workshop on Memory Centric High Performance Computing, 2018, 142535 doi: 10.1145/3286475.32864[30] Son K, Kim S, Park H, et al. Thermal and signal integrity co-design and verification of embedded cooling structure with thermal transmission line for high bandwidth memory module. IEEE Trans Compon Packag Manuf Technol, 2022, 12, 1542 doi: 10.1109/TCPMT.2022.3201673[31] Jiang X P, Jia X R, Wang S, et al. A cross-process signal integrity analysis (CPSIA) method and design optimization for wafer-on-wafer stacked DRAM. Micromachines, 2024, 15, 557 doi: 10.3390/mi15050557 -
Proportional views