Automated Data Skew Profiler

  • Mishra, Harsh (CoI)
  • Jayant, S (CoI)

Project Details


The objective of the project is to analyze the impact of differently skewed data distributions on the most common database operations, namely NORMALIZE, DENORMALIZE, JOIN, SORT, TABLE, and PROJECT, by running a set of queries and analyzing their runtimes. A second objective is to estimate the effective performance skew of a set of queries from the data skew of the dataset on a multi-node computing cluster. The project aims to automate skew prediction by analyzing the execution graphs of a job on the HPCC Systems cluster and predicting the probable performance skew for a given set of queries with a Random Forest Regressor model.
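
As a rough illustration of what "data skew" means in the description above, the sketch below distributes rows across the nodes of a cluster by key and reports how far the fullest and emptiest nodes deviate from the average. It is not the project's profiler: the function names, the modulo-based distribution, and the percent-deviation-from-mean measure are all assumptions chosen for illustration, and the Random Forest prediction step is not shown.

```python
from collections import Counter


def partition_counts(keys, n_nodes):
    """Count rows per node when each row goes to node (key % n_nodes).

    Modulo distribution is an illustrative assumption, not the
    cluster's actual distribution function.
    """
    counts = Counter(k % n_nodes for k in keys)
    return [counts.get(node, 0) for node in range(n_nodes)]


def skew_percent(counts):
    """Return (max_skew, min_skew) as percent deviation from the mean rows per node."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0, 0.0
    max_skew = (max(counts) - mean) * 100 / mean
    min_skew = (min(counts) - mean) * 100 / mean
    return max_skew, min_skew


# Evenly spread keys: every node receives the same number of rows.
uniform_keys = list(range(400))
# One "hot" key (0) dominates the dataset, so one node is overloaded.
skewed_keys = [0] * 250 + list(range(150))

uniform_counts = partition_counts(uniform_keys, 4)
skewed_counts = partition_counts(skewed_keys, 4)

print(uniform_counts, skew_percent(uniform_counts))
print(skewed_counts, skew_percent(skewed_counts))
```

With four nodes, the uniform keys give 100 rows per node and zero skew, while the hot key pushes one node to 288 rows, well above the 100-row average. Per-operation measurements of this kind are the sort of feature a regression model could take as input when estimating performance skew.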
Effective start/end date: 01/1/18 → 12/31/18


  • Data Skew Profiling using HPCC Systems

    Mishra, H., Jayant, S., Chala, A., Camper, D., Shobha, G. & Shetty, J., Mar 30 2019, ICBDE 2019 - 2019 International Conference on Big Data and Education. p. 66-69 4 p. (ACM International Conference Proceeding Series).

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

  • Data Skewer Profile

    Villanustre, F., 2018

    Research output: Non-textual form › Software