Abstract (translated from Chinese): To address the lack of a matching high-performance server chipset for the Loongson central processing unit (CPU), an architecture for screening chipsets for the Loongson CPU was designed and developed, and a method for adapting the Loongson CPU to chipsets was implemented. The architecture places a field-programmable gate array (FPGA) in series between the Loongson CPU and the multiple chipsets to be adapted. Using this architecture, a method was designed and implemented for connecting the physical signal lines to be processed between the CPU and the chipsets, a debugging method was designed for coordinating the power-up and power-down sequences of the two, and a method was designed and implemented for working around the differences between their signal protocols. With this architecture and these methods, multiple chipsets can be screened at the same time, which avoids the earlier need to design a separate motherboard for each adaptation and saves the cost of repeated motherboard development. A high-performance server chipset that can be adapted to the Loongson CPU was found; its specifications and performance exceed those of the chipset currently used with the Loongson CPU, which opens up applications of the Loongson CPU in the server field.

Abstract: The CPU is the core part of all integrated circuits. Although some domestically developed CPUs with proprietary intellectual property rights are advancing rapidly, few high-performance chipsets are available to match them, especially in the server domain. As a result, complete systems built from these CPUs and low-performance chipsets do not perform adequately. The Loongson CPU faces the same problem. To find better chipsets for it, an architecture and a set of methods are designed and implemented to adapt different types of chipsets. In this architecture, a field-programmable gate array (FPGA) is connected between the CPU and the chipsets. The FPGA is divided into three domains: an HT (HyperTransport) bus domain, a processing domain for important but temporarily indeterminate signals, and a CPLD (complex programmable logic device) function domain. During adaptation, the HT bus signals, the temporarily indeterminate signals, and the power signals of the CPU and chipsets are routed into these three FPGA domains, respectively, and the FPGA is programmed to realize all possible signal combinations. The FPGA coordinates the power sequence between the CPU and the chipsets into the correct order. Signal integrity differences between them are avoided, and the signals are trimmed to the correct state, by amending them in the FPGA. Experimental results show that this architecture and these methods allow more chipsets to be adapted simultaneously on a single motherboard than before. This avoids developing a separate motherboard for each candidate chipset and greatly reduces costs.
High-performance server chipsets can be found that properly match the Loongson CPU and offer better specifications and higher performance than those currently used with it. A prototype system composed of the Loongson CPU and five types of chipsets is designed and implemented. Using the above architecture and methods, an optimal server chipset combination, SR5690 + SP5100, has been found, and the matching principles and correct settings for the signal connections and power sequence have been derived. A Loongson 3B4000 two-way SMP motherboard with the SR5690 + SP5100 chipsets has also been produced. On this motherboard, computing performance is evaluated with SPEC CPU 2006, storage performance with IOzone, and network performance with Netperf. Compared with the current Loongson 3B4000 server with a 7A1000 chipset, the test results show that performance on the three items improves by approximately 10%. The combination of the Loongson CPU and this type of server chipset enables wider application in the server market and promotes the development of the Loongson CPU ecosystem.
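The power-sequence coordination described in the abstract can be pictured with a minimal Python sketch. It is not the authors' FPGA logic: the stage names, the pairing of each enable output with a power-good input, and the `read_input` callback are all hypothetical, chosen only to illustrate the idea of enabling each stage after the previous one reports good power. Asserting PWROK before releasing RESET# follows the signal descriptions in Table 2.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    enable: str      # hypothetical enable output driven by the FPGA
    power_good: str  # hypothetical status input checked before moving on


# Hypothetical ordering for illustration only; the real order is board-specific.
SEQUENCE: List[Stage] = [
    Stage("CHIPSET_STANDBY_EN", "CHIPSET_STANDBY_PG"),
    Stage("CPU_CORE_EN", "CPU_CORE_PG"),
    Stage("CHIPSET_MAIN_EN", "CHIPSET_MAIN_PG"),
]


def run_power_up(
    stages: List[Stage], read_input: Callable[[str], bool]
) -> Tuple[List[str], bool]:
    """Assert outputs in order, stopping at the first stage without power-good.

    Returns the outputs asserted so far and whether the sequence completed.
    """
    asserted: List[str] = []
    for stage in stages:
        asserted.append(stage.enable)
        if not read_input(stage.power_good):
            return asserted, False  # hold here: later stages stay off
    # Only after every stage reports good power: signal PWROK, then release RESET#.
    asserted += ["PWROK", "RESET#_RELEASE"]
    return asserted, True


if __name__ == "__main__":
    # Simulate a board on which the CPU core rail never reports power-good.
    outputs, completed = run_power_up(SEQUENCE, lambda name: name != "CPU_CORE_PG")
    print(outputs, completed)
```

Keeping the order in a data table rather than in control flow mirrors the motivation given in the abstract: trying a different chipset then means editing the sequence, not redesigning the board.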
Key words: Loongson / chipsets / adaption / server / field-programmable gate array
Table 1. HyperTransport bus link signals

| Signal | Width | Description |
|---|---|---|
| CAD | 2, 4, 8, or 16 | Command, addresses, and data (CAD). Carries HyperTransport™ requests, responses, addresses, and data. CAD width can be different in each direction. |
| CTL | 1, 2, or 4 | Differentiates control and data. Each byte of CAD has a control (CTL) signal in the Gen3 protocol. One CTL signal is used for an entire link in the Gen1 protocol. |
| CLK | 1, 2, or 4 | Clocks (CLK) for the CAD and CTL signals. Each byte of CAD and its respective CTL signal has a separate clock signal. |
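The width rules in Table 1 can be turned into a small calculation. The sketch below is an illustration only: the function name `link_signals` and the choice to treat CAD widths below 8 bits as a single byte lane are assumptions, and it counts logical signals per direction without modeling the electrical layer. It accepts only the CAD widths listed in Table 1, so the value 4 in the CTL and CLK rows, which would imply four byte lanes, is not reachable here.

```python
import math


def link_signals(cad_width: int, gen3: bool = True) -> dict:
    """Count CAD, CTL, and CLK signals for one direction of an HT link."""
    if cad_width not in (2, 4, 8, 16):
        raise ValueError("Table 1 lists CAD widths of 2, 4, 8, or 16")
    # Assumption: a CAD width below 8 bits still occupies one byte lane.
    byte_lanes = math.ceil(cad_width / 8)
    ctl = byte_lanes if gen3 else 1  # Gen3: one CTL per byte; Gen1: one per link
    clk = byte_lanes                 # one clock per byte of CAD
    return {"CAD": cad_width, "CTL": ctl, "CLK": clk, "total": cad_width + ctl + clk}


if __name__ == "__main__":
    # The x16 link width listed for both chipsets in Table 4.
    print(link_signals(16))
```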
Table 2. Reset/initialization signals of the HT bus

| Signal | Width | Description |
|---|---|---|
| PWROK | 1 | Power and clocks are stable |
| RESET# | 1 | Reset the HyperTransport™ chain |
Table 3. Power management signals

| Signal | Width | Description |
|---|---|---|
| LDTSTOP# | 1 | Enables and disables links during system state transitions |
| LDTREQ# | 1 | Indicates link is active or requested by a device |
Table 4. Comparison of different chipset specifications

| Item | Features of 7A1000 | Features of SR5690 + SP5100 |
|---|---|---|
| HT bus | HT3.0 × 16 | HT3.0 × 16 |
| PCIE | 32 lanes | 42 lanes |
| SATA | 3 × SATA2.0 | 6 × SATA2.0 |
| USB ports | 6 × USB2.0 | 14 × USB2.0 |
| RAS | No | Yes |
| IOMMU | No | Yes |
Table 5. Analysis of SPEC CPU2006 performance

| Server | int_speed_base | int_rate_base | fp_speed_base | fp_rate_base |
|---|---|---|---|---|
| 7A1000 server | 12.30 | 78.07 | 12.02 | 74.90 |
| SR5690 + SP5100 server | 13.02 | 83.60 | 12.80 | 82.60 |
| Performance improvement/% | 6 | 7 | 6 | 10 |
Table 6. Analysis of IOzone performance (each value is the average of three results)

| Server | 512-byte read speed/(MB·s⁻¹) | 1 MB read speed/(MB·s⁻¹) | 512-byte write speed/(MB·s⁻¹) | 1 MB write speed/(MB·s⁻¹) |
|---|---|---|---|---|
| 7A1000 server | 38.56 | 696.31 | 1.25 | 306.76 |
| SR5690 + SP5100 server | 43.19 | 800.76 | 1.53 | 383.45 |
| Performance improvement/% | 12 | 15 | 22 | 25 |
Table 7. Analysis of Netperf performance (each value is the average of three results)

| Server | TCP throughput/(MB·s⁻¹) | TCP transfer rate/(Times·s⁻¹) | UDP throughput/(MB·s⁻¹) | UDP transfer rate/(Times·s⁻¹) |
|---|---|---|---|---|
| 7A1000 server | 850.51 | 8738.91 | 852.64 | 8999.10 |
| SR5690 + SP5100 server | 935.56 | 9787.58 | 946.43 | 9989.00 |
| Performance improvement/% | 10 | 12 | 11 | 11 |
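The "Performance improvement/%" rows in Tables 5, 6, and 7 can be checked with a short calculation. The sketch below assumes the improvement is the relative gain of the SR5690 + SP5100 server over the 7A1000 server, rounded to the nearest whole percent; the source does not state the formula, and the function and dictionary names are introduced here for illustration only.

```python
def improvement_percent(baseline: float, candidate: float) -> int:
    """Relative gain of candidate over baseline, rounded to a whole percent."""
    return round((candidate - baseline) / baseline * 100)


# (7A1000 server, SR5690 + SP5100 server) pairs copied from Tables 5, 6, and 7.
RESULTS = {
    "SPEC int_speed_base": (12.30, 13.02),
    "SPEC int_rate_base": (78.07, 83.60),
    "SPEC fp_speed_base": (12.02, 12.80),
    "SPEC fp_rate_base": (74.90, 82.60),
    "IOzone 512-byte read": (38.56, 43.19),
    "IOzone 1 MB read": (696.31, 800.76),
    "IOzone 512-byte write": (1.25, 1.53),
    "IOzone 1 MB write": (306.76, 383.45),
    "Netperf TCP throughput": (850.51, 935.56),
    "Netperf TCP transfer rate": (8738.91, 9787.58),
    "Netperf UDP throughput": (852.64, 946.43),
    "Netperf UDP transfer rate": (8999.10, 9989.00),
}

improvements = {
    name: improvement_percent(old, new) for name, (old, new) in RESULTS.items()
}

if __name__ == "__main__":
    for name, gain in improvements.items():
        print(f"{name}: {gain}%")
```

Under that assumption the computed values match the percentages printed in the three tables, ranging from 6% on the SPEC speed items to 25% on the 1 MB IOzone write.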