The 10th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (IEEE UEMCON 2019) was held at Columbia University, NY, in October 2019. The event provided an opportunity for researchers, educators, and students to discuss and exchange ideas on issues, trends, and developments in Information Technology, Electronics and Mobile Communication. The conference brought together scholars from different disciplinary backgrounds to emphasize dissemination of ongoing research in these fields. The conference included a peer-reviewed program of technical sessions, special sessions, business application sessions, tutorials, and demonstration sessions.
The paper, "A Hive and SQL Case Study in Cloud Data Analytics," was based on joint work by MS in Computer Science alum Shireesha Chandra ’12 and Drs. Aparna Varde and Jiayin Wang.
The digital universe is expanding at a very fast pace, generating massive datasets. To keep up with the processing and storage needs for this big data, and to discover knowledge, we need scalable infrastructure and technologies that can access data from multiple disks simultaneously. Cloud computing provides paradigms for data analytics over such huge datasets. While SQL continues to be popular among database and data mining professionals, in recent years Hive has established itself as a rapidly advancing technology for big data, which makes it highly suitable for use over the cloud. In this paper, we present investigatory research on Hive and SQL with a detailed case study comparing them, considering cloud data management and mining. Our work here constitutes a thorough scrutiny, focusing on processing Hive queries on cloud infrastructure using three different approaches, and also delving into SQL processing on the cloud with similar approaches. Real datasets are used for conducting various operations using Hive and SQL. This paper conducts performance comparisons of the two technologies and explains the environments in which one is preferred over the other for processing and analyzing data. It provides recommendations for cloud data analytics, based on the case study.