Extending Current ML Library with LexisNexis HPCC Systems

  • Villanustre, Flavio (CoI)
  • Yatish, HR (CoI)
  • Shubham, Phal (CoI)
  • Hukkeri, Tanmay (CoI)
  • Suryanarayana, A (CoI)

Project Details

Description

This project implemented the DBSCAN clustering algorithm on a multi-node setup, leveraging several of ECL’s existing paradigms for distributed computing. The high-dimensional data was first sprayed onto Thor. Next, local clustering was performed at each HPCC node and the results were stored in a record structure. Finally, the local clusters across nodes were merged using a tree-based union-find data structure. An ECL interface was created to abstract the implementation and to let users choose from a range of distance metrics. The algorithm was compared against the standard implementations provided by Python machine learning packages such as scikit-learn. The results showed a significant speedup with no loss of accuracy.
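
The local-clustering and merge steps described above can be illustrated with a minimal single-process Python sketch. It is not the project's ECL implementation: the function names (distributed_dbscan, find, union), the round-robin partitioning that stands in for spraying onto Thor, and the choice to compute neighbourhoods against the full dataset are all assumptions made for illustration. As in scikit-learn, min_pts is assumed to count the point itself.

```python
import math


def find(parent, i):
    """Return the root of i's tree, shortening the path as we go (path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the trees containing a and b; the smaller index becomes the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def euclidean(p, q):
    return math.dist(p, q)


def distributed_dbscan(points, eps, min_pts, n_nodes=2, metric=euclidean):
    """Hypothetical sketch: per-node clustering followed by a union-find merge."""
    n = len(points)
    # Stand-in for "spraying": round-robin assignment of indices to simulated nodes.
    parts = [list(range(k, n, n_nodes)) for k in range(n_nodes)]
    node_of = {i: k for k, part in enumerate(parts) for i in part}

    # Assumption: neighbourhoods are evaluated against the full dataset,
    # so core-point status does not depend on the partitioning.
    neigh = [[j for j in range(n) if metric(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]
    parent = list(range(n))

    # Local phase: each node links neighbouring core points it owns.
    for part in parts:
        owned = set(part)
        for i in part:
            if core[i]:
                for j in neigh[i]:
                    if j in owned and core[j]:
                        union(parent, i, j)

    # Merge phase: link neighbouring core points that live on different nodes.
    for i in range(n):
        if core[i]:
            for j in neigh[i]:
                if core[j] and node_of[i] != node_of[j]:
                    union(parent, i, j)

    # Core points take their root as label; border points borrow the label
    # of any neighbouring core point; everything else is noise (-1).
    labels = [-1] * n
    for i in range(n):
        if core[i]:
            labels[i] = find(parent, i)
        else:
            for j in neigh[i]:
                if core[j]:
                    labels[i] = find(parent, j)
                    break
    return labels
```

Passing a different callable as metric (for example a Manhattan or cosine distance) swaps the distance function without touching the clustering logic, which mirrors the idea of letting users choose a metric through a single interface. Because the merge phase only unions core points, the resulting core-point clusters do not depend on how many simulated nodes are used.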
Status: Finished
Effective start/end date: 01/1/19 → 12/31/19

Research output
  • Massively Scalable Image Processing on the HPCC Systems Big Data Platform

    Hukkeri, T., Shobha, G., Shubham, P., Shetty, J., Yatish, H. & Naweed, M., 2020, In: ICPS Proceedings.

    Research output: Contribution to journal › Article › peer-review

    Open Access
    2 Scopus citations
  • DBSCAN

    Yatish, H. (Photographer), 2019

    Research output: Non-textual form › Software