The "BigSE" project: Lessons learned from validating industrial text mining

Rahul Krishna, Zhe Yu, Amritanshu Agrawal, Manuel Dominguez, David Wolf

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

8 Scopus citations

Abstract

As businesses become increasingly reliant on big data analytics, it becomes more important to test the choices made within the data miners. This paper reports lessons learned from the BigSE Lab, an industrial/university collaboration that augments industrial activity with low-cost testing of data miners by graduate students.

BigSE is an experiment in academic/industrial collaboration. Funded by a gift from LexisNexis, BigSE has no specific deliverables. Rather, it is fueled by a research question: "What can industry and academia learn from each other?" Based on open source data and tools, the output of this work is (a) more exposure by commercial engineers to state-of-the-art methods and (b) more exposure by students to industrial text mining methods (plus research papers that comment on how to improve those methods).

The results so far are encouraging. Students at BigSE Lab have found numerous "standard" choices for text mining that could be replaced by simpler and less resource-intensive methods. Further, that work found additional text mining choices that could significantly improve the performance of industrial data miners.
Original language: American English
Title of host publication: BIGDSE '16 Proceedings of the 2nd International Workshop on BIG Data Software Engineering
Publisher: Association for Computing Machinery, Inc
Pages: 65-71
Number of pages: 7
ISBN (Electronic): 9781450341523
State: Published - 2016
Event: 2nd International Workshop on BIG Data Software Engineering, BIGDSE 2016 - Austin, United States
Duration: May 16 2016 → …

Publication series

Name: Proceedings - 2nd International Workshop on BIG Data Software Engineering, BIGDSE 2016

Conference

Conference: 2nd International Workshop on BIG Data Software Engineering, BIGDSE 2016
Country/Territory: United States
City: Austin
Period: 05/16/16 → …

Keywords

  • E-Discovery
  • Software engineering
  • Testing
