Vision-Language Models in Medical Imaging for Cancer Diagnosis: A Bibliometric Review
Abubakar Tafawa Balewa University, Nigeria.
Federal University Dutsin-Ma, Nigeria.
Euro-Mediterranean University of Fez, Morocco.
Victoria University of Wellington, New Zealand.
2026 (English). In: Bioengineering, E-ISSN 2306-5354, Vol. 13, no 4, article id 466. Article, review/survey (Refereed). Published
Abstract [en]

The demand for advanced detection methods and accurate staging remains a global challenge in cancer diagnosis. Even though traditional deep learning models in medical imaging achieve high precision, they suffer from limited explainability and multimodal reasoning due to their black-box nature, thereby limiting their clinical applicability. To address this gap, recent research has increasingly explored multimodal approaches that integrate visual and textual clinical data to enhance diagnostic accuracy and interpretability. This study presents a bibliometric analysis of 408 publications from 2021 to 2025, collected from Web of Science and Scopus, using VOSviewer and R-Bibliometrix to map citation networks, co-authorship, and keyword co-occurrences. The results reveal a rapid growth from 1 publication in 2021 to 269 in 2025, with significant contributions from leading countries and institutions. Thematic analysis indicates a shift from conventional convolutional approaches toward transformer-based and self-supervised methods, alongside increasing attention to multimodal learning in cancer imaging tasks such as breast, lung, and brain cancer analysis. Overall, this study provides a structured overview of the evolving research landscape, highlighting key trends, emerging themes, and research gaps to inform future developments in multimodal artificial intelligence for cancer diagnosis.

Place, publisher, year, edition, pages
MDPI, 2026. Vol. 13, no 4, article id 466
Keywords [en]
vision-language models, multimodal AI, medical imaging, bibliometric analysis, cancer diagnosis
National Category
Cancer and Oncology; Radiology and Medical Imaging
Identifiers
URN: urn:nbn:se:bth-29471
DOI: 10.3390/bioengineering13040466
ISI: 001749893400001
Scopus ID: 2-s2.0-105037048265
OAI: oai:DiVA.org:bth-29471
DiVA, id: diva2:2057177
Available from: 2026-05-04. Created: 2026-05-04. Last updated: 2026-05-08. Bibliographically approved

Open Access in DiVA

fulltext (30745 kB), 8 downloads
File information
File name: FULLTEXT01.pdf
File size: 30745 kB
Checksum (SHA-512): 0722e92010dd30a12949d38331dc68d830d05292f712686e38f15fe2db8a4688440e340bb307ea6550fa39c39dc3f4ddbb110e3fca387ae224a4482f47e57603
Type: fulltext
Mimetype: application/pdf

Authority records

Kusetogullari, Hüseyin
