Visually guided extraction of prevalent topics
Linnaeus University.
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datavetenskap. ORCID iD: 0000-0001-6745-4398
Linköping University.
Linnaeus University.
2025 (English). In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 24, no. 2, p. 179-198. Article in journal (Refereed). Published
Abstract [en]

Making sense of large sets of text documents is highly challenging for tasks such as obtaining a comprehensive overview or keeping up with the most important trends and topics. Even though several established methods for condensation and summarization of large text corpora exist, many of them cannot account for differences in prevalence between identified topics, which in turn impedes quantitative analysis. In this paper, we therefore propose a novel prevalence-aware method for topic extraction, and show how it can be used to obtain important insights from two text corpora with very different content. We also implemented a prototype visual analytics tool that guides the user in the search for relevant insights and promotes trust in the yielded results. We have verified our application by a user study, as well as by a validation run on a data set with a previously known topic structure. The results clearly show that our approach is suitable for text mining, that it can be used by non-experts, and that it offers features that make it an interesting candidate for use in several different analysis scenarios.

Place, publisher, year, edition, pages
Sage Publications, 2025. Vol. 24, no. 2, p. 179-198
Keywords [en]
similarity calculations, text embedding, text mining, topic modeling, Visual analytics, Embeddings, Sense making, Similarity calculation, Text corpora, Text document, Text-mining, Topic extraction
National subject category
Language processing and computational linguistics
Identifiers
URN: urn:nbn:se:bth-27447
DOI: 10.1177/14738716241312400
ISI: 001408697200001
Scopus ID: 2-s2.0-105001067590
OAI: oai:DiVA.org:bth-27447
DiVA, id: diva2:1936281
Project
Rekrytering 21
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
KK-stiftelsen, 20210077
Available from: 2025-02-10 Created: 2025-02-10 Last updated: 2025-09-30 Bibliographically approved

Open Access in DiVA

fulltext (4278 kB), 95 downloads
File information
File name: FULLTEXT01.pdf
File size: 4278 kB
Checksum (SHA-512):
b51ebc8b6dbb8c873e4bb7a4e30ba00c29ec34c5a7123e55d33341ab686d8be92b334f6b50e5e06db251e9ea2a75abbde1f3934e4ceb8548c95d300f45ecf06e
Type: fulltext
Mimetype: application/pdf

Person

Jusufi, Ilir
