HPCC Benchmarking

Rushikesh Ghatpande

Research output: Contribution to journal › Article › peer-review


High-Performance Computing Cluster (HPCC) Thor is a big data analytics engine designed to execute big data workflows, including extraction, loading, cleansing, transformation, linking, and indexing. A Thor cluster is similar in function, execution environment, filesystem, and capabilities to the Google and Hadoop MapReduce platforms, and it can be optimized for parallel data processing. ECL, HPCC’s declarative, dataflow-oriented language, defines the desired processing result; the specific steps required to produce that result are left to the language compiler. These features make HPCC a viable alternative to established big data analytics engines such as Spark and Hadoop. In this report, we benchmark HPCC alongside Hadoop. We present a detailed study of scalability, execution time, and system-level metrics such as CPU utilization, memory utilization, and disk I/O patterns. We aim to identify general trends in performance and to determine which engine is better suited to a particular workload.
Original language: American English
State: Published - Feb 1 2019

