Xing, Siyang; Li, Youmeng; Deng, Zikun; Zheng, Qijun; Lu, Zeyu; Wang, Qinglin (2025) Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator. Parallel Computing, 124. doi:10.1016/j.parco.2025.103137

Reference Type	Journal (article/letter/editorial)
Title	Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator
Journal	Parallel Computing
Authors	Xing, Siyang		Author
	Li, Youmeng		Author
	Deng, Zikun		Author
	Zheng, Qijun		Author
	Lu, Zeyu		Author
	Wang, Qinglin		Author
Year	2025 (June)	Volume	124
Publisher	Elsevier BV
DOI	doi:10.1016/j.parco.2025.103137
Mindat Ref. ID	18378314	Long-form Identifier	mindat:1:5:18378314:5
GUID	0
Full Reference	Xing, Siyang; Li, Youmeng; Deng, Zikun; Zheng, Qijun; Lu, Zeyu; Wang, Qinglin (2025) Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator. Parallel Computing, 124. doi:10.1016/j.parco.2025.103137
In	(2025) Parallel Computing Vol. 124. Elsevier BV

References Listed

These are the references the publisher has listed as being connected to the article. Please check the article itself for the full list of references which may differ. Not all references are currently linkable within the Digital Library.

	(proceedings-article) doi:10.1109/CVPR.2017.243
	(proceedings-article) doi:10.1109/CVPR.2016.91
	Han (2015) Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst., 28
	(proceedings-article) doi:10.1109/ICCV.2017.298
	(proceedings-article) doi:10.1109/CVPR.2016.435
	San Juan (2020) High performance and portable convolution operators for multicore processors, 91
	Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (proceedings-article) doi:10.1145/3498361.3538940
	ACM Transactions on Architecture and Code Optimization (journal-article) doi:10.1145/3570305
	Li (2016) Performance analysis of GPU-based convolutional neural networks, 67
	Kim (2017) Performance analysis of CNN frameworks for GPUs, 55
	(journal-article) doi:10.1109/TVLSI.2018.2815603
	(journal-article) doi:10.1109/TVLSI.2020.3002779
	IEEE Micro (journal-article) doi:10.1109/MM.2013.129
	Zhang (2018) High performance zero-memory overhead direct convolutions, 5776
	(journal-article) doi:10.1145/3625004
	Sousa, Rafael, Pereira, Marcio, Kwon, Yongin, Kim, Taeho, Jung, Namsoon, Kim, Chang Soo, Frank, Michael, Araujo, Guido (2023) Tensor slicing and optimization for multicore NPUs. Journal of Parallel and Distributed Computing, 175. 66-79 doi:10.1016/j.jpdc.2022.12.008
	Das (2016)
	Georganas (2018) Anatomy of high-performance deep learning convolutions on SIMD architectures, 830
	Simonyan (2014)
	(proceedings-article) doi:10.1109/CVPR.2016.90
	(proceedings-article) doi:10.1109/CVPR.2015.7298594
	Wang (2023) Evaluating matrix multiplication-based convolution algorithm on multi-core digital signal processors. J. Natl. Univ. Déf. Technol., 45, 86
	(journal-article) doi:10.1109/TVLSI.2018.2825145
	Bytyn (2019) An application-specific VLIW processor with vector instruction set for CNN acceleration, 1
	Lee (2017) Parallel deep convolutional neural network training by exploiting the overlapping of computation and communication, 183
	Chen (2020) Hardware acceleration implementation of three-dimensional convolutional neural network on vector digital signal processors, 122
	IEEE Transactions on Parallel and Distributed Systems (journal-article) doi:10.1109/TPDS.2018.2877359
	T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, et al., TVM: An automated End-to-End optimizing compiler for deep learning, in: 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 18, 2018, pp. 578–594.
	Rasch (2019) Generating portable high-performance code via multi-dimensional homomorphisms, 354
	Kim (2019) A code generator for high-performance tensor contractions on GPUs, 85
	(journal-article) doi:10.1145/3355606
	Yin (2022) Optimizing irregular-shaped matrix-matrix multiplication on multi-core DSPs, 451
	(journal-article) doi:10.1145/1356052.1356053
	Zhang (2018) Parallel computing method for two-dimensional matrix convolution. J. Zhejiang Univ. (Eng. Science), 52, 515
	Wang (2020) Optimizing Winograd-based fast convolution algorithm on Phytium multi-core CPUs. J. Comput. Res. Dev., 57, 1140
	Wang (2020) Optimizing FFT-based convolution on ARMv8 multi-core CPUs, 248
	Wang (2019) Parallel convolution algorithm using implicit matrix multiplication on multi-core CPUs, 1
	Liu, Zhong, Xiao, Xin, Li, Chen, Ma, Sheng, Rangyu, Deng (2022) Optimizing convolutional neural networks on multi-core vector accelerator. Parallel Computing, 112. 102945 doi:10.1016/j.parco.2022.102945

See Also

	Liu, Zhong, Xiao, Xin, Li, Chen, Ma, Sheng, Rangyu, Deng (2022) Optimizing convolutional neural networks on multi-core vector accelerator. Parallel Computing, 112. 102945 doi:10.1016/j.parco.2022.102945
	Wang, Hao, Yu, Ce, Xiao, Jian, Tang, Shanjiang, Lu, Yu, Fu, Hao, Kang, Bo, Zheng, Gang, Cui, Chenzhou (2022) A method for efficient radio astronomical data gridding on multi-core vector processor. Parallel Computing, 113. 102972 doi:10.1016/j.parco.2022.102972
	Jacobsen, Dana A., Senocak, Inanc (2013) Multi-level parallelism for incompressible flow computations on GPU clusters. Parallel Computing, 39 (1). 1-20 doi:10.1016/j.parco.2012.10.002
	Blagojevic, Filip, Nikolopoulos, Dimitrios S., Stamatakis, Alexandros, Antonopoulos, Christos D., Curtis-Maury, Matthew (2007) Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems. Parallel Computing, 33 (10). 700-719 doi:10.1016/j.parco.2007.09.004
	Krotkiewski, M., Dabrowski, M. (2010) Parallel symmetric sparse matrix–vector product on scalar multi-core CPUs. Parallel Computing, 36 (4). 181-198 doi:10.1016/j.parco.2010.02.003
	Vu, Lan, Alaghband, Gita (2014) Novel parallel method for association rule mining on multi-core shared memory systems. Parallel Computing, 40 (10). 768-785 doi:10.1016/j.parco.2014.08.003
	Badr, Mario, Enright Jerger, Natalie (2018) A high-level model for exploring multi-core architectures. Parallel Computing, 80. 23-35 doi:10.1016/j.parco.2018.10.006
	Hussain, Md Maruf, Fujimoto, Noriyuki (2020) GPU-based parallel multi-objective particle swarm optimization for large swarms and high dimensional problems. Parallel Computing, 92. 102589 doi:10.1016/j.parco.2019.102589
	Radulović, Milan B., Girbal, Sylvain, Tomašević, Milo V. (2017) Low-level implementation of the SISC protocol for thread-level speculation on a multi-core architecture. Parallel Computing, 67. 1-19 doi:10.1016/j.parco.2017.07.007
	Li, Kuan, He, Kang, Graillat, Stef, Jiang, Hao, Gu, Tongxiang, Liu, Jie (2023) Multi-level parallel multi-layer block reproducible summation algorithm. Parallel Computing, 115. 102996 doi:10.1016/j.parco.2023.102996
	Gerbessiotis, Alexandros V. (2015) Extending the BSP model for multi-core and out-of-core computing: MBSP. Parallel Computing, 41. 90-102 doi:10.1016/j.parco.2014.12.002
