Project Details
Description
This project implemented the DBSCAN algorithm on a multi-node setup, leveraging several of ECL’s existing paradigms for distributed computing. The high-dimensional data was first sprayed onto Thor. Next, local clustering was performed at each HPCC node and the results were stored in a record structure. Finally, the local clusters across nodes were merged using a tree-based union-find data structure. An ECL interface was created to abstract the implementation and to let users choose from a range of distance metrics. The algorithm was compared against the standard implementations provided by Python machine learning packages such as scikit-learn. The results showed a significant speedup with no loss of accuracy.
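The merge step described above can be illustrated with a minimal Python sketch. It is not the project's ECL implementation: the record layout `(node_id, point_id, local_label, is_core)`, the function name `merge_local_clusters`, and the rule that two local clusters are joined when they share a core point replicated on both nodes are all assumptions made for illustration.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as their own singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root of the tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def merge_local_clusters(local_results):
    """Merge per-node cluster labels into global clusters.

    local_results: list of (node_id, point_id, local_label, is_core) records
    (hypothetical layout); local_label < 0 marks noise on that node.
    Returns a dict mapping point_id to a global cluster key, or None for noise.
    """
    uf = UnionFind()
    first_core_cluster = {}  # point_id -> first (node_id, label) seen as core

    for node_id, point_id, label, is_core in local_results:
        if label < 0:
            continue  # noise carries no cluster to merge
        key = (node_id, label)
        uf.find(key)  # make sure the local cluster is registered
        if is_core:
            # Assumed merge rule: a core point seen in two local clusters
            # joins them. Border points do not bridge clusters.
            if point_id in first_core_cluster:
                uf.union(first_core_cluster[point_id], key)
            else:
                first_core_cluster[point_id] = key

    global_labels = {}
    for node_id, point_id, label, is_core in local_results:
        if label < 0:
            global_labels.setdefault(point_id, None)
        else:
            global_labels[point_id] = uf.find((node_id, label))
    return global_labels
```

In this sketch, each local cluster is identified by the pair `(node_id, local_label)`, so labels from different nodes never collide before merging. Restricting merges to shared core points follows the usual DBSCAN convention that only core points extend a cluster.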
| Status | Finished |
|---|---|
| Effective start/end date | 01/01/19 → 12/31/19 |
- Massively Scalable Image Processing on the HPCC Systems Big Data Platform
  Hukkeri, T., Shobha, G., Shubham, P., Shetty, J., Yatish, H. & Naweed, M., 2020, In: ICPS Proceedings. Research output: Contribution to journal › Article › peer-review
  Open Access, 2 Scopus citations
Press/Media
- Academic Program Spotlight - HSQL, Generative Adversarial Networks and the DBSCAN clustering algorithm
  06/18/20
  1 Media contribution