eprintid: 9858
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/00/98/58
datestamp: 2023-11-09 16:36:30
lastmod: 2023-11-09 16:36:30
status_changed: 2023-11-09 16:29:59
type: conference_item
metadata_visibility: show
creators_name: Amirthalingam, T.
creators_name: Rais, H.M.
title: Automated Table Partitioner (ATAP) in Apache Hive
ispublished: pub
keywords: Data mining; Data warehouses; Information retrieval; Predictive analytics; Query processing, Data-definition languages; Filesystem; Hadoop frameworks; Lexical analyzers; Market leader; Proof of concept; Scalable solution; User experience, Big data
note: cited By 0; Conference of 4th International Conference on Computer and Information Sciences, ICCOINS 2018; Conference Date: 13 August 2018 Through 14 August 2018; Conference Code:141665
abstract: Big Data and Predictive Analytics have been a game-changing paradigm in academia and industry for the past decade, inspiring numerous efforts in multiple spaces. One of many such technologies is Hadoop, an open-source framework based on MapReduce for highly distributed and scalable solutions. As Hadoop became more popular, other technologies were built around it, making it an ecosystem in itself. Currently, there are hundreds of tools and utilities that add on to the Hadoop framework, and Apache Hive is one of the most prominent. Hive is built as a data warehousing layer that interacts with Hadoop and the underlying filesystem, HDFS. It quickly became the market leader in query processing as it provides a better user experience than MapReduce. Nevertheless, it imposes rigid structures that are unyielding to the ever-changing nature of data. This paper proposes a novel means of automating table partitioning in Hive. It includes a lexical analyzer that reads HiveQL queries and, in return, issues Data Definition Language (DDL) statements to restructure the table if a particular column is read more often than the user-set coefficient factor.
Multiple experiments conducted for this research returned results that further support the feasibility, adaptability and usability of this proof of concept. © 2018 IEEE.
date: 2018
publisher: Institute of Electrical and Electronics Engineers Inc.
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85057106418&doi=10.1109%2fICCOINS.2018.8510580&partnerID=40&md5=351de50627f2c377035a254c9c734807
id_number: 10.1109/ICCOINS.2018.8510580
full_text_status: none
publication: 2018 4th International Conference on Computer and Information Sciences: Revolutionising Digital Landscape for Sustainable Smart Society, ICCOINS 2018 - Proceedings
refereed: TRUE
isbn: 9781538647431
citation: Amirthalingam, T. and Rais, H.M. (2018) Automated Table Partitioner (ATAP) in Apache Hive. In: UNSPECIFIED.