Searching Inter-disciplinary Scientific Big Data Based on Latent Correlation Analysis
Gonzales Eloy, in Proceedings of the 2013 IEEE International Conference on Big Data.
In this paper, a novel cross-database search system (Cross-DB) is proposed. The aim of Cross-DB is to facilitate searching for datasets with interdisciplinary correlations in large-scale, multi-domain, heterogeneous data repositories. With conventional systems or portals for searching scientific datasets, scientists must either know the relations between datasets in advance or find them manually. In Cross-DB, the dataset search process uses evolutionary computation to discover an optimal combination of the multiple latent associations among datasets, such as spatio-temporal, ontological, and citational correlations. We introduce the basic concepts of Cross-DB and its main components. Comparisons with an existing search engine operating on a massive dataset repository demonstrate the feasibility and correctness of the proposed framework. We show that offering the user a complete set of correlated datasets is a useful alternative to classical ranking methods. Experimental results show that our system can outperform conventional portal search in terms of relevance and novelty.
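The abstract does not specify how Cross-DB encodes candidate solutions, which evolutionary operators it uses, or how fitness is defined. The following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each correlation type (spatio-temporal, ontological, citational) is available as a symmetric matrix of scores in [0, 1], that the types are combined with a fixed weight vector, and that a simple genetic algorithm searches for a fixed-size set of datasets around a query dataset, maximizing the mean pairwise combined correlation. All names (`make_layer`, `combined_score`, `set_fitness`, `evolve_set`), weights, and scores are hypothetical.

```python
import random
from itertools import combinations


def make_layer(n, strong, hi, lo):
    """Hypothetical correlation matrix: pairs inside `strong` score `hi`, others `lo`."""
    return [[1.0 if i == j else (hi if i in strong and j in strong else lo)
             for j in range(n)] for i in range(n)]


def combined_score(i, j, layers, weights):
    """Weighted sum of the per-type correlation scores between datasets i and j."""
    return sum(w * layer[i][j] for w, layer in zip(weights, layers))


def set_fitness(subset, layers, weights):
    """Mean pairwise combined correlation within a candidate set of datasets."""
    pairs = list(combinations(sorted(subset), 2))
    if not pairs:
        return 0.0
    return sum(combined_score(i, j, layers, weights) for i, j in pairs) / len(pairs)


def evolve_set(n, k, query, layers, weights, pop_size=30, generations=50, seed=0):
    """Genetic search for a k-dataset set that contains `query` (assumes 2 <= k <= n)."""
    rng = random.Random(seed)
    others = [d for d in range(n) if d != query]

    def random_individual():
        return frozenset([query, *rng.sample(others, k - 1)])

    def mutate(ind):
        # Swap one non-query member for a dataset not yet in the set.
        members = [d for d in ind if d != query]
        outside = [d for d in others if d not in ind]
        if not outside:
            return ind
        return frozenset((ind - {rng.choice(members)}) | {rng.choice(outside)})

    def crossover(a, b):
        # Draw k-1 non-query members from the union of both parents.
        pool = sorted((a | b) - {query})
        return frozenset([query, *rng.sample(pool, k - 1)])

    def fit(ind):
        return set_fitness(ind, layers, weights)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fit, reverse=True)
        elite = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b)
            if rng.random() < 0.3:
                child = mutate(child)
            children.append(child)
        population = elite + children
    return max(population, key=fit)


if __name__ == "__main__":
    # Illustrative data only: datasets 0-3 are strongly related in every layer.
    layers = [
        make_layer(8, {0, 1, 2, 3}, 0.9, 0.1),     # spatio-temporal (hypothetical)
        make_layer(8, {0, 1, 2, 3}, 0.8, 0.2),     # ontological (hypothetical)
        make_layer(8, {0, 1, 2, 3, 4}, 0.7, 0.1),  # citational (hypothetical)
    ]
    best = evolve_set(8, 4, 0, layers, (0.5, 0.3, 0.2))
    print(sorted(best))
```

The sketch returns a whole set of mutually correlated datasets rather than a ranked list, mirroring the abstract's point that a full correlated set can serve as an alternative to classical ranking. A real system would likely also need to learn or tune the weights and scale beyond a toy matrix.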