Cui, Yang; Zhang, Juan (2025) MFEAM: Multi-View Feature Enhanced Attention Model for Image Captioning. Applied Sciences, 15 (15). doi:10.3390/app15158368
Reference Type | Journal (article/letter/editorial) | ||
---|---|---|---|
Title | MFEAM: Multi-View Feature Enhanced Attention Model for Image Captioning | ||
Journal | Applied Sciences | ||
Authors | Cui, Yang | Author | |
Zhang, Juan | Author | ||
Year | 2025 (July 28) | Volume | 15 |
Issue | 15 | ||
Publisher | MDPI AG | ||
DOI | doi:10.3390/app15158368 | ||
Mindat Ref. ID | 18781277 | Long-form Identifier | mindat:1:5:18781277:3 |
GUID | 0 | ||
Full Reference | Cui, Yang; Zhang, Juan (2025) MFEAM: Multi-View Feature Enhanced Attention Model for Image Captioning. Applied Sciences, 15 (15). doi:10.3390/app15158368 | ||
In | (2025, July) Applied Sciences Vol. 15 (15). MDPI AG |
References Listed
These are the references the publisher has listed as being connected to the article. Please check the article itself for the full list of references, which may differ. Not all references are currently linkable within the Digital Library.
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), proceedings-article, doi:10.1109/CVPR.2015.7298935 | |
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), proceedings-article, doi:10.1109/CVPR.2015.7298932 | |
proceedings-article, doi:10.1109/CVPR.2018.00583 | |
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, proceedings-article, doi:10.1109/CVPR.2018.00636 | |
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), proceedings-article, doi:10.1109/CVPR.2017.131 | |
journal-article, doi:10.1016/j.neucom.2020.03.087 | |
report, doi:10.21236/ADA623249 | |
proceedings-article, doi:10.1145/2964284.2964299 | |
proceedings-article, doi:10.1109/CVPR.2017.345 | |
journal-article, doi:10.1016/j.neucom.2018.08.069 | |
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, July 18–24). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning, PMLR, Virtual. | |
journal-article, doi:10.1016/j.eswa.2022.117174 | |
2022 26th International Conference on Pattern Recognition (ICPR), proceedings-article, doi:10.1109/ICPR56361.2022.9955644 | |
Zhang (2024) Int. J. Comput. Appl. Mobilenet V3-transformer, a lightweight model for image caption 46, 1 | |
Chen, J., Ge, C., Xie, E., Wu, Y., Yao, L., Ren, X., Wang, Z., Luo, P., Lu, H., and Li, Z. PIXART-sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation. Proceedings of the European Conference on Computer Vision. | |
Moratelli, N., Caffagni, D., Cornia, M., Baraldi, L., and Cucchiara, R. (2024). Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization. arXiv. | |
Wang, F., Mei, J., and Yuille, A. Sclip: Rethinking self-attention for dense vision-language inference. Proceedings of the European Conference on Computer Vision. | |
Moratelli, N., Cornia, M., Baraldi, L., and Cucchiara, R. Fluent and Accurate Image Captioning with a Self-Trained Reward Model. Proceedings of the International Conference on Pattern Recognition. | |
Tarvainen, A., and Valpola, H. (2017, December 4–9). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA. | |
Gu, Y., Dong, L., Wei, F., and Huang, M. (2024, May 7–11). MiniLLM: Knowledge distillation of large language models. Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria. | |
Kang (2024) Adv. Neural Inf. Process. Syst. Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks 36, 48573 | |
Li, Z., Li, X., Fu, X., Zhang, X., Wang, W., Chen, S., and Yang, J. (2024, June 16–22). Promptkd: Unsupervised prompt distillation for vision-language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA. | |
Nguyen (2024) Adv. Neural Inf. Process. Syst. Improving multimodal datasets with image captioning 36, 22047 | |
Mahmoud, A., Elhoushi, M., Abbas, A., Yang, Y., Ardalani, N., Leather, H., and Morcos, A.S. (2024, June 16–22). Sieve: Multimodal dataset pruning using image captioning models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA. | |
Awadalla, A., Xue, L., Shu, M., Yan, A., Wang, J., Purushwalkam, S., Shen, S., Lee, H., Lo, O., and Park, J.S. (2024). BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions. arXiv. | |
Yu, Q., Sun, Q., Zhang, X., Cui, Y., Zhang, F., Cao, Y., Wang, X., and Liu, J. (2024, June 16–20). Capsfusion: Rethinking image-text data at scale. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA. | |
Chen, L., Li, J., Dong, X., Zhang, P., He, C., Wang, J., Zhao, F., and Lin, D. Sharegpt4v: Improving large multi-modal models with better captions. Proceedings of the European Conference on Computer Vision. | |
proceedings-article, doi:10.1109/CVPR52688.2022.01949 | |
journal-article, doi:10.1007/s00530-022-01036-z | |
Yang (2024) IEEE Trans. Geosci. Remote Sens. Bootstrapping interactive image-text alignment for remote sensing image captioning 62, 1 | |
journal-article, doi:10.1007/s11042-024-18150-x | |
book-chapter, doi:10.1007/978-3-030-01264-9_42 (Book ID 9783030012632) | |
Vaswani (2017) Adv. Neural Inf. Process. Syst. Attention is all you need 17, 6000 | |
Hinton, G. (2015). Distilling the Knowledge in a Neural Network. arXiv. | |
proceedings-article, doi:10.1109/ICCVW54120.2021.00350 | |
Sameni, S., Kafle, K., Tan, H., and Jenni, S. (2024, June 16–22). Building Vision-Language Models on Solid Foundations with Masked Distillation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA. | |
Ren (2025) Appl. Intell. EDIR: An expert method for describing image regions based on knowledge distillation and triple fusion 55, 62 | |
proceedings-article, doi:10.1109/CVPR42600.2020.00483 | |
Bajpai, D.J., and Hanawal, M.K. (2024). CAPEEN: Image Captioning with Early Exits and Knowledge Distillation. arXiv. | |
Xiao, B., Wu, H., Xu, W., Dai, X., Hu, H., Lu, Y., Zeng, M., Liu, C., and Yuan, L. (2024, June 16–22). Florence-2: Advancing a unified representation for a variety of vision tasks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA. | |
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), proceedings-article, doi:10.1109/CVPR.2015.7299087 | |
book-chapter, doi:10.1007/978-3-319-10602-1_48 (Book ID 9783319106014) | |
Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02), proceedings-article, doi:10.3115/1073083.1073135 | |
Banerjee, S., and Lavie, A. (2005, June 29). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA. | |
Lin, C.Y. (2004, January 22). Rouge: A package for automatic evaluation of summaries. Proceedings of the Text Summarization Branches Out, Barcelona, Spain. | |
book-chapter, doi:10.1007/978-3-319-46454-1_24 (Book ID 9783319464534) | |
proceedings-article, doi:10.18653/v1/P16-1162 | |
Yao, T., Pan, Y., Li, Y., and Mei, T. (2019, October 27–November 2). Hierarchy parsing for image captioning. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea. | |
Wu, M., Zhang, X., Sun, X., Zhou, Y., Chen, C., Gu, J., Sun, X., and Ji, R. (November, January 27). Difnet: Boosting visual information flow for image captioning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seoul, Republic of Korea. | |
Huang, L., Wang, W., Chen, J., and Wei, X.Y. (2019, October 27–November 2). Attention on attention for image captioning. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea. | |
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), proceedings-article, doi:10.1109/CVPR42600.2020.01098 | |
journal-article, doi:10.1609/aaai.v35i3.16328 | |
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), proceedings-article, doi:10.1109/CVPR46437.2021.01521 | |
journal-article, doi:10.1007/s00530-023-01230-7 | |
Applied Intelligence, journal-article, doi:10.1007/s10489-022-03624-y |