Abstract: Duplicates are multiple representations of the same real-world entity. Because XML is now widely used for data transmission on the web, the presence of duplicates is a major problem in XML mining: duplicates reduce data quality, so they must be identified and eliminated. A strategy based on a Bayesian network, called XMLDup, is currently used to detect such duplicates. This paper introduces a new genetic-based approach to XML duplicate detection and optimizes it with MBAT, a swarm intelligence algorithm, to improve its efficiency. The resulting approach shows higher performance than XMLDup.