Xing, Siyang; Li, Youmeng; Deng, Zikun; Zheng, Qijun; Lu, Zeyu; Wang, Qinglin (2025) Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator. Parallel Computing, 124. doi:10.1016/j.parco.2025.103137
Reference Type | Journal (article/letter/editorial)
---|---
Title | Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator
Journal | Parallel Computing
Authors | Xing, Siyang; Li, Youmeng; Deng, Zikun; Zheng, Qijun; Lu, Zeyu; Wang, Qinglin
Year | 2025 (June)
Volume | 124
Publisher | Elsevier BV
DOI | doi:10.1016/j.parco.2025.103137
Mindat Ref. ID | 18378314
Long-form Identifier | mindat:1:5:18378314:5
GUID | 0
In | (2025) Parallel Computing Vol. 124. Elsevier BV
References Listed
These are the references the publisher has listed as being connected to the article. Please check the article itself for the full list of references which may differ. Not all references are currently linkable within the Digital Library.
Proceedings article. doi:10.1109/CVPR.2017.243
Proceedings article. doi:10.1109/CVPR.2016.91
Han (2015) Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst., 28
Proceedings article. doi:10.1109/ICCV.2017.298
Proceedings article. doi:10.1109/CVPR.2016.435
San Juan (2020) High performance and portable convolution operators for multicore processors, 91
Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (proceedings article). doi:10.1145/3498361.3538940
ACM Transactions on Architecture and Code Optimization (journal article). doi:10.1145/3570305
Li (2016) Performance analysis of GPU-based convolutional neural networks, 67
Kim (2017) Performance analysis of CNN frameworks for GPUs, 55
Journal article. doi:10.1109/TVLSI.2018.2815603
Journal article. doi:10.1109/TVLSI.2020.3002779
IEEE Micro (journal article). doi:10.1109/MM.2013.129
Zhang (2018) High performance zero-memory overhead direct convolutions, 5776
Journal article. doi:10.1145/3625004
Das (2016)
Georganas (2018) Anatomy of high-performance deep learning convolutions on SIMD architectures, 830
Simonyan (2014)
Proceedings article. doi:10.1109/CVPR.2016.90
Proceedings article. doi:10.1109/CVPR.2015.7298594
Wang (2023) Evaluating matrix multiplication-based convolution algorithm on multi-core digital signal processors. J. Natl. Univ. Def. Technol., 45, 86
Journal article. doi:10.1109/TVLSI.2018.2825145
Bytyn (2019) An application-specific VLIW processor with vector instruction set for CNN acceleration, 1
Lee (2017) Parallel deep convolutional neural network training by exploiting the overlapping of computation and communication, 183
Chen (2020) Hardware acceleration implementation of three-dimensional convolutional neural network on vector digital signal processors, 122
IEEE Transactions on Parallel and Distributed Systems (journal article). doi:10.1109/TPDS.2018.2877359
Chen, T.; Moreau, T.; Jiang, Z.; Zheng, L.; Yan, E.; Shen, H.; Cowan, M.; Wang, L.; Hu, Y.; Ceze, L.; et al. (2018) TVM: An automated end-to-end optimizing compiler for deep learning. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 578–594
Rasch (2019) Generating portable high-performance code via multi-dimensional homomorphisms, 354
Kim (2019) A code generator for high-performance tensor contractions on GPUs, 85
Journal article. doi:10.1145/3355606
Yin (2022) Optimizing irregular-shaped matrix-matrix multiplication on multi-core DSPs, 451
Journal article. doi:10.1145/1356052.1356053
Zhang (2018) Parallel computing method for two-dimensional matrix convolution. J. Zhejiang Univ. (Eng. Science), 52, 515
Wang (2020) Optimizing Winograd-based fast convolution algorithm on Phytium multi-core CPUs. J. Comput. Res. Dev., 57, 1140
Wang (2020) Optimizing FFT-based convolution on ARMv8 multi-core CPUs, 248
Wang (2019) Parallel convolution algorithm using implicit matrix multiplication on multi-core CPUs, 1