Abdullahi, A.U. and Ahmad, R. and Zakaria, N.M. (2016) Big data: Performance profiling of Meteorological and Oceanographic data on Hive. In: UNSPECIFIED.
Full text not available from this repository.

Abstract
The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data, with the aim of circumventing the challenges of traditional database systems. However, the available benchmarks and workloads target specific aspects of the Information Technology industry, whose data differ in nature and complexity from data obtained from other sources. Hence there is a need to use data from other domains in order to evaluate the performance and maturity of big data technologies. In this paper, performance profiling of Meteorological and Oceanographic data on Hive is conducted. Hive, being a commonly used data warehouse analytical platform for big data, is chosen with a view to exposing the intricacies involved in formatting and loading the data. The response time for indexed and non-indexed retrievals is measured using three sets of queries frequently used in the area. The query types are Type 1, SELECT with a WHERE clause; Type 2, SELECT with a JOIN clause; and Type 3, SELECT with a GROUP BY clause. The experimental results show that a good response time is achieved for both indexed and non-indexed tables. Indexed retrieval shows a significant decrease in response time for the Type 1 query at all data sizes and for the Type 3 query at data sizes of 100GB and less. It also shows additional overhead for the Type 2 query at all data sizes and for the Type 3 query at data sizes of 500GB and more. When the Meteorological and Oceanographic data are properly formatted, their analysis with Hive proves efficient compared to traditional database systems. The results of this study have the potential to attract oil and gas companies to adopt big data technologies for handling their exploration datasets. © 2016 IEEE.
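
The three query shapes and the indexed versus non-indexed comparison described in the abstract can be illustrated with a small sketch. This is not the authors' Hive setup: it uses Python's built-in `sqlite3` as a stand-in engine, and the table names (`stations`, `readings`), columns, and synthetic values are hypothetical, chosen only to show the shape of each query type. Timings from a toy in-memory database say nothing about Hive performance at 100GB or 500GB.

```python
import sqlite3
import time

# Hypothetical schema and synthetic data; not taken from the paper.
def build_db(indexed):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stations (station_id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE readings (station_id INTEGER, wave_height REAL)")
    conn.executemany(
        "INSERT INTO stations VALUES (?, ?)",
        [(i, "north" if i % 2 else "south") for i in range(10)],
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(i % 10, (i % 50) / 10.0) for i in range(1000)],
    )
    if indexed:
        # Index on the column used by the WHERE, JOIN and GROUP BY clauses.
        conn.execute("CREATE INDEX idx_readings_station ON readings (station_id)")
    return conn

# One representative query per type named in the abstract.
QUERIES = {
    "type1_where": "SELECT COUNT(*) FROM readings WHERE station_id = 3",
    "type2_join": (
        "SELECT COUNT(*) FROM readings r "
        "JOIN stations s ON r.station_id = s.station_id "
        "WHERE s.region = 'north'"
    ),
    "type3_group_by": (
        "SELECT station_id, AVG(wave_height) FROM readings "
        "GROUP BY station_id ORDER BY station_id"
    ),
}

def profile(conn):
    """Run each query once and return {name: (rows, elapsed_seconds)}."""
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        results[name] = (rows, time.perf_counter() - start)
    return results

if __name__ == "__main__":
    for indexed in (False, True):
        label = "indexed" if indexed else "non-indexed"
        for name, (_, elapsed) in profile(build_db(indexed)).items():
            print(f"{label:12s} {name:15s} {elapsed:.6f}s")
```

Both runs return identical rows; only the response time differs, which is the quantity the study compares across query types and data sizes.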
Item Type: | Conference or Workshop Item (UNSPECIFIED) |
---|---|
Additional Information: | cited By 3; Conference of 3rd International Conference on Computer and Information Sciences, ICCOINS 2016 ; Conference Date: 15 August 2016 Through 17 August 2016; Conference Code:125433 |
Uncontrolled Keywords: | Benchmarking; Data warehouses; Gas industry; Information science; Public utilities; Query processing; Response time (computer systems); Data technologies; Data tools; Hadoop; Hive; Information technology industry; Meteorological and oceanographic data; Oil and gas companies; Query types; Big data |
Depositing User: | Mr Ahmad Suhairi UTP |
Date Deposited: | 09 Nov 2023 16:18 |
Last Modified: | 09 Nov 2023 16:18 |
URI: | https://khub.utp.edu.my/scholars/id/eprint/6478 |