Auto-Detection of Field-Level Dependencies in Data Workflow on a Distributed Platform

Y. Surya, Sumanth Hegde, Jyothi Shetty, G. Shobha, Dan Camper

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the significant rise in the use of data across a variety of industries, distributed systems are now required to process and consume Big Data. The HPCC (High-Performance Computing Clusters) system is an open-source data lake platform built for high-speed, large-volume data engineering. Enterprise Control Language (ECL) is a declarative language designed specifically for large data projects on the HPCC system platform. Large amounts of data are processed on a regular basis using HPCC Systems. In the proposed work, an approach to understanding and interpreting the data flow within an ECL program is investigated. The current system renders an XML graph that shows operations at the dataset level and can be viewed in ECL Watch, an interactive web application developed by HPCC Systems. Because changes to individual fields within a dataset are not represented in that graph, the proposed work tracks field-level data and dependencies within datasets and visualizes their changes and operations as a directed acyclic graph, so that the data workflow of a generic ECL program can be understood. The core of this project relies on parsing the ECL IR (Intermediate Representation) emitted by the ECL compiler. The generated IR is then transformed into a graphical format. The system was tested against sample ECL programs available in ECL Watch and other programs from the platform regression tests, and it provided a simple, easy-to-comprehend data flow visualization.

Original language: English
Title of host publication: Proceedings of the 12th International Conference on Soft Computing for Problem Solving - SocProS 2023
Editors: Millie Pant, Kusum Deep, Atulya Nagar
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 373-385
Number of pages: 13
ISBN (Print): 9789819731794
State: Published - 2024
Event: 12th International Conference on Soft Computing for Problem Solving, SocProS 2023 - Roorkee, India
Duration: Aug 10 2023 to Aug 12 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 994 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 12th International Conference on Soft Computing for Problem Solving, SocProS 2023
Country/Territory: India
City: Roorkee
Period: 08/10/23 to 08/12/23

Keywords

  • Big data
  • Distributed systems
  • Graph Representation
  • HPCC systems
  • Parser
