Exploring similarity patterns in a large scientific corpus
Linnaeus University.
Blekinge Tekniska Högskola, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0001-6745-4398
Linköping University.
Linnaeus University.
2025 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 20, no. 4, article id e0321114. Article in journal (Refereed). Published
Abstract [en]

Similarity-based analysis is a common and intuitive tool for exploring large data sets. For instance, grouping data items by their level of similarity, regarding one or several chosen aspects, can reveal patterns and relations from the intrinsic structure of the data and thus provide important insights in the sense-making process. Existing analytical methods (such as clustering and dimensionality reduction) tend to target questions such as “Which objects are similar?”, but they are not necessarily well suited to answering questions such as “How does the result change if we change the similarity criteria?” or “How are the items linked together by the similarity relations?” As a result, they do not unlock the full potential of similarity-based analysis, and here we see a gap to fill. In this paper, we propose that the concept of similarity could be regarded as both (1) a relation between items and (2) a property in its own right, with a specific distribution over the data set. Based on this approach, we developed an embedding-based computational pipeline together with a prototype visual analytics tool that allows the user to perform similarity-based exploration of a large set of scientific publications. To demonstrate the potential of our method, we present two different use cases, and we also discuss the strengths and limitations of our approach.

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2025. Vol. 20, no. 4, article id e0321114
Keywords [en]
analytic method, article, dimensionality reduction, human
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:bth-27792
DOI: 10.1371/journal.pone.0321114
ISI: 001488705600008
Scopus ID: 2-s2.0-105003254126
OAI: oai:DiVA.org:bth-27792
DiVA, id: diva2:1955873
Funder
KK-stiftelsen, 20210077
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2025-05-02. Created: 2025-05-02. Last updated: 2025-09-30. Bibliographically approved

Open Access in DiVA

Full text (6016 kB), 44 downloads
File information
File name: FULLTEXT01.pdf
File size: 6016 kB
Checksum (SHA-512):
5601d751f4d6960d7414ba8454c598e4b7e3b80323fd46e887c99bec3c75d8302e3d6812052f7a299e1ded85b9ae6d8dd5d5f95c7a0237c7a9dd172af31fbe54
Type: full text
MIME type: application/pdf

Person

Jusufi, Ilir

Total: 47 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 268 hits