The "BigSE" project: lessons learned from validating industrial text mining

Rahul Krishna, Zhe Yu, Amritanshu Agrawal, Manuel Dominguez, David Wolf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

As businesses become increasingly reliant on big data analytics, it becomes increasingly important to test the choices made within the data miners. This paper reports lessons learned from the BigSE Lab, an industrial/university collaboration that augments industrial activity with low-cost testing of data miners (by graduate students).

BigSE is an experiment in academic/industrial collaboration. Funded by a gift from LexisNexis, BigSE has no specific deliverables. Rather, it is fueled by the research question "what can industry and academia learn from each other?" Based on open source data and tools, the output of this work is (a) more exposure by commercial engineers to state-of-the-art methods and (b) more exposure by students to industrial text mining methods (plus research papers that comment on how to improve those methods).

The results so far are encouraging. Students at BigSE Lab have found numerous "standard" choices for text mining that could be replaced by simpler and less resource-intensive methods. Further, that work also found additional text mining choices that could significantly improve the performance of industrial data miners.
Original language: American English
Title of host publication: BIGDSE '16 Proceedings of the 2nd International Workshop on BIG Data Software Engineering
DOI: https://doi.org/10.1145/2896825.2896836
State: Published - 2016

Cite this: Krishna, R., Yu, Z., Agrawal, A., Dominguez, M., & Wolf, D. (2016). The "BigSE" project: lessons learned from validating industrial text mining. In BIGDSE '16 Proceedings of the 2nd International Workshop on BIG Data Software Engineering. https://doi.org/10.1145/2896825.2896836