Abstract
High-Performance Computing Cluster (HPCC) Thor is a big data analytics engine designed to execute big data workflows, including extraction, loading, cleansing, transformation, linking, and indexing. A Thor cluster is similar in function, execution environment, filesystem, and capabilities to the Google and Hadoop MapReduce platforms, and it can be optimized for parallel data processing. ECL, HPCC's declarative, dataflow-oriented language, defines the desired processing result; the specific steps required to produce that result are left to the language compiler. These features make HPCC a viable alternative to established big data analytics engines such as Spark and Hadoop. In this report, we benchmark HPCC alongside Hadoop. We present a detailed study of scalability, execution time, and system-level metrics such as CPU utilization, memory utilization, and disk I/O patterns. We aim to identify general performance trends and to determine which engine is better suited to a particular workload.
Original language | American English |
---|---|
Journal | Archived |
State | Published - Feb 1 2019 |