TY - JOUR
T1 - Design and Implementation of HSQL
T2 - A SQL-like language for Data Analysis in Distributed Systems
AU - Bhadauria, Anurag Singh
AU - Bain, Atreya
AU - Shetty, Jyoti
AU - Shobha, G.
AU - Chala, Arjuna
AU - Clements, Jeremy
N1 - Publisher Copyright:
© 2021. All Rights Reserved.
PY - 2021
Y1 - 2021
N2 - In today's modern world, we're experiencing a substantial increase in the use of data in various fields, and this has necessitated the use of distributed systems to consume and process Big Data. Machine learning tends to benefit from the usage of Big Data, and the models generated from such techniques tend to be more effective. However, there is a steep learning curve to getting used to handling Big Data, as traditional data management tools fail to perform well. Distributed systems have become popular, where the task of data processing is split amongst various nodes in clusters. SQL, is a popular database management language popular to data scientists. It is often given second class support, where SQL can be embedded into a primary language of use (e.g. SQL in Scala for Spark), which allows for using SQL but one still needs to know the primary language of the platform (Scala, as per the example, or ECL in HPCC Systems). It may also be present as a supported language. In either case, using useful tooling such as Visualizing data and creating and using machine learning models become difficult, as the user needs to fall back to the primary language of the system. In the proposed work, a new SQL-like language, HSQL, an open source distributed systems solution, was developed for allowing new users to get used to its distributed architecture and the ECL language, with which it primarily operates with (which was chosen as a target). Additionally, a program that could translate HSQL-based programs to ECL for use was made. HSQL was made to be completely inter-compatible with ECL programs, and it was able to provide a compact and easy to comprehend SQL-like syntax for performing general data analysis, creation of Machine learning models and visualizations while allowing a modular structure to such programs.
AB - In today's modern world, we're experiencing a substantial increase in the use of data in various fields, and this has necessitated the use of distributed systems to consume and process Big Data. Machine learning tends to benefit from the usage of Big Data, and the models generated from such techniques tend to be more effective. However, there is a steep learning curve to getting used to handling Big Data, as traditional data management tools fail to perform well. Distributed systems have become popular, where the task of data processing is split amongst various nodes in clusters. SQL, is a popular database management language popular to data scientists. It is often given second class support, where SQL can be embedded into a primary language of use (e.g. SQL in Scala for Spark), which allows for using SQL but one still needs to know the primary language of the platform (Scala, as per the example, or ECL in HPCC Systems). It may also be present as a supported language. In either case, using useful tooling such as Visualizing data and creating and using machine learning models become difficult, as the user needs to fall back to the primary language of the system. In the proposed work, a new SQL-like language, HSQL, an open source distributed systems solution, was developed for allowing new users to get used to its distributed architecture and the ECL language, with which it primarily operates with (which was chosen as a target). Additionally, a program that could translate HSQL-based programs to ECL for use was made. HSQL was made to be completely inter-compatible with ECL programs, and it was able to provide a compact and easy to comprehend SQL-like syntax for performing general data analysis, creation of Machine learning models and visualizations while allowing a modular structure to such programs.
KW - ANTLR4
KW - HPCC
KW - Javascript
KW - Parser
KW - SQL
KW - big data
KW - context free grammar
KW - distributed systems
KW - language
KW - machine learning
KW - transpiler
UR - http://www.scopus.com/inward/record.url?scp=85121242075&partnerID=8YFLogxK
U2 - 10.14569/IJACSA.2021.0121190
DO - 10.14569/IJACSA.2021.0121190
M3 - Artículo
AN - SCOPUS:85121242075
SN - 2158-107X
VL - 12
SP - 796
EP - 803
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 11
ER -