An Instance based Approach to Find the Types of Correspondence between the Attributes of Heterogeneous Datasets
Blekinge Institute of Technology, School of Computing.
Blekinge Institute of Technology, School of Computing.
2012 (English). Independent thesis, Advanced level (degree of Master (Two Years)). Student thesis.
Abstract [en]

Context: Determining attribute correspondence is the most important, time-consuming and knowledge-intensive part of database integration. It is also used in other data manipulation applications such as data warehousing, data design, the semantic web and e-commerce. Objectives: This thesis investigates how to find the types of correspondence between the attributes of heterogeneous datasets when the schema design information of the datasets is unknown. Methods: A literature review was conducted to gather knowledge about the approaches used to find correspondences between the attributes of heterogeneous datasets. The knowledge extracted from the literature review was used to develop an instance-based approach for finding the types of correspondence between the attributes of heterogeneous datasets when schema design information is unknown. To validate the proposed approach, an experiment was conducted in a real environment using data provided by the telecom industry (Ericsson, Karlskrona). The results were evaluated using well-known and widely used measures from the information retrieval field: precision, recall and F-measure. Results: When finding the types of correspondence between the attributes of heterogeneous datasets, good results depend on the ability of the algorithm to avoid unmatched pairs of rows during the Row Similarity Phase. The proposed approach was evaluated experimentally and achieved an F-measure of 96.7% (average of three experiments). Conclusions: The analysis showed that the proposed approach is feasible and provides users with a means to find the corresponding attributes and the types of correspondence between them, based on information extracted from similar pairs of rows in the heterogeneous datasets, where row similarity is based on shared primary key values.

Place, publisher, year, edition, pages
2012. 60 p.
Keyword [en]
Attribute Correspondence, Heterogeneous databases schema matching, Instance based matching.
National Category
Mathematics Computer Science
Identifiers
URN: urn:nbn:se:bth-1938
Local ID: oai:bth.se:arkivexB0A5519C99CF7CE0C1257ABB0059410B
OAI: oai:DiVA.org:bth-1938
DiVA: diva2:829195
Uppsok
Physics, Chemistry, Mathematics
Supervisors
Available from: 2015-04-22
Created: 2012-11-19
Last updated: 2015-06-30
Bibliographically approved

Open Access in DiVA

fulltext (1854 kB), 71 downloads
File information
File name: FULLTEXT01.pdf
File size: 1854 kB
Checksum (SHA-512): 276ccbb6234cad88d1ea95807f1f30a02981687fe56e2bcc239932c62744bfcaefddda79e1f33412bd24d2b7fcbf2d99427608fc17cc3194f2372d1e385dbd97
Type: fulltext
Mimetype: application/pdf

Total: 71 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 35 hits