Abstract
The proliferation of Big Data processing environments such as Hadoop, Apache Spark, and HPCC Systems is driving the development of performance analysis tools for these distributed systems. The goal is to achieve high performance through the optimization of Big Data applications. However, fine-grained performance tuning is challenging because of the high complexity and massive size of these systems. ECL-Watch is a dataflow-based, fine-grained, comprehensive Big Data performance analysis tool that utilizes ECL, the high-level declarative dataflow programming language of HPCC Systems. As a case study, we implement and optimize the Yinyang K-Means machine learning algorithm in ECL in HPCC Systems. The experimental results show that the performance of the native ECL version of the Yinyang K-Means algorithm improved significantly after tuning: from being about three times slower than the standard K-Means implementation in ECL to being roughly 15% faster than it.
| Original language | American English |
|---|---|
| Title of host publication | IEEE International Conference |
| Editors | Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 2941-2950 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781538627143 |
| State | Published - 2017 |
| Event | 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States. Duration: Dec 11 2017 → Dec 14 2017 |
Publication series
| Name | Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 |
|---|---|
| Volume | 2018-January |
Conference
| Conference | 5th IEEE International Conference on Big Data, Big Data 2017 |
|---|---|
| Country/Territory | United States |
| City | Boston |
| Period | 12/11/17 → 12/14/17 |
Keywords
- Big Data
- Distributed Computing
- HPCC Systems
- Machine Learning
- Performance Analysis
- Tuning and Optimization
Fingerprint
Dive into the research topics of 'ECL-watch: A big data application performance tuning tool in the HPCC systems platform'. Together they form a unique fingerprint.
Projects
- 1 Finished
Teaching and research with LexisNexis HPCC systems with Clemson University
Xu, L. (CoI) & Apon, A. (PI)
01/1/18 → 12/1/18
Project: Research