The objective of the project is to analyze how differently skewed data distributions affect the most common database operations, namely NORMALIZE, DENORMALIZE, JOIN, SORT, TABLE, and PROJECT, by running a set of queries and analyzing their runtimes. It also estimates the effective performance skew of a set of queries from the data skew of the dataset on a multi-node computing cluster. The project aims to automate skew prediction by analyzing the execution graphs of a job on the HPCC Systems cluster and predicting the probable performance skew for a given set of queries using a Random Forest Regressor model.
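As a rough illustration of what "data skew" can mean on a cluster, the sketch below computes how far the busiest and the least loaded node deviate from the average record count. The skew definition (percentage deviation from the per-node mean), the function names, and the modulo partitioning of integer keys are assumptions made for illustration; the project description does not state which metric or partitioning scheme was used.

```python
from collections import Counter


def records_per_node(keys, num_nodes):
    """Distribute integer keys across nodes by key modulo num_nodes
    (a hypothetical stand-in for a cluster's hash partitioning)."""
    counts = Counter(k % num_nodes for k in keys)
    return [counts.get(node, 0) for node in range(num_nodes)]


def skew_percentages(counts):
    """Return (max_skew, min_skew) as percentage deviations from the
    average number of records per node. Assumed definition, not taken
    from the project description."""
    avg = sum(counts) / len(counts)
    if avg == 0:
        return (0.0, 0.0)
    max_skew = (max(counts) - avg) / avg * 100
    min_skew = (min(counts) - avg) / avg * 100
    return (max_skew, min_skew)


if __name__ == "__main__":
    # A key distribution dominated by one value overloads a single node.
    keys = [1, 1, 1, 1, 2, 3]
    counts = records_per_node(keys, num_nodes=3)
    print(counts, skew_percentages(counts))
```

Per-node figures of this kind, collected for each operation in a job, are the sort of features a regression model could be trained on to predict runtime skew.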
Mishra, H., Jayant, S., Chala, A., Camper, D., Shobha, G. & Shetty, J., Mar 30 2019, ICBDE 2019 - 2019 International Conference on Big Data and Education. p. 66-69, 4 p. (ACM International Conference Proceeding Series).
Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review