Many tools and methods exist for data management, preparation, and cleaning. This is natural, since data preparation is both necessary and time-consuming. The product range spans from complete AI platforms to simple algorithms and scripts. However, data processing frameworks that cover every step from raw time series measurements to structured, processed datasets are rare, and those that exist are insufficient.
In the academic literature, much of the focus is on outlier detection, missing values, and data imputation. A recent survey by Wang and Wang lists the methods and implementations currently regarded as state of the art. Wang and Wang write that "There are many tools or systems for data cleaning, but they are not effective on time series cleaning problems" and emphasize the need for more research and tools. They highlight Cleanits by Ding et al., which focuses on addressing missing, inconsistent, or abnormal values.
We did not find a tool for data alignment, summarization, and processing similar to Squashy while writing this project proposal, nor while preparing any of the patent applications filed for the Squashy technology.
We will therefore adapt and productize our proprietary production data mining tool (Squashy) for industries beyond Oil & Gas. The new standalone tool will help data scientists at industrial companies leverage and operationalize the potential of their data, while also catering to data scientists at third-party AI and data science companies that provide analytics services to such industrial companies.
Squashy has been in operation within O&G since 2014 and shows great potential for use in other industries. As a first step in 2022, we are looking to expand into adjacent industries with similar sensor data workflows (mapped with our partners at the Boston Consulting Group), starting with questionnaires and meetings with BCG customers and Solution Seeker leads in different countries.
Businesses today are awash with data but are not able to extract real value without proper data preparation. This process unnecessarily consumes a significant share of data scientists’ time (80% according to a recent analysis by The Economist).
Solution Seeker's proprietary data mining tool (Data Janitor) has two main features: first, it automates and quality-controls the data mining; second, it delivers higher-quality models and results through a framework for implementing domain-specific processing techniques such as event categorization, alignment, and summarization.
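To make the three processing techniques named above more concrete, here is a minimal sketch in Python. It is not the Data Janitor implementation, which is proprietary and not described in this post. The function names (`align`, `categorize`, `summarize`), the last-observation alignment rule, the step-change threshold, and the pressure readings are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from itertools import groupby
from statistics import mean

def align(series, grid):
    """Sample an irregular (timestamp, value) series onto a common grid,
    taking the last observation at or before each grid time.
    Assumption: grid times before the first observation yield None."""
    times = [t for t, _ in series]
    out = []
    for g in grid:
        i = bisect_right(times, g)
        out.append(series[i - 1][1] if i else None)
    return out

def categorize(values, threshold):
    """Label each sample 'steady' or 'transient' based on the step change
    from the previous sample. The first sample is labeled 'steady'.
    Assumption: values contains no None entries."""
    labels = ["steady"]
    for prev, cur in zip(values, values[1:]):
        labels.append("transient" if abs(cur - prev) > threshold else "steady")
    return labels

def summarize(grid, values, labels):
    """Compress runs of consecutive samples sharing a label into one row:
    (label, start_time, end_time, mean_value)."""
    rows = []
    for label, group in groupby(zip(grid, values, labels), key=lambda r: r[2]):
        chunk = list(group)
        rows.append((label, chunk[0][0], chunk[-1][0],
                     mean(v for _, v, _ in chunk)))
    return rows

# Hypothetical raw sensor readings: (seconds, value), irregularly sampled.
pressure = [(0, 10.0), (7, 10.1), (12, 10.0), (21, 14.0), (33, 14.1), (41, 14.0)]
grid = list(range(0, 50, 10))  # common 10-second grid: 0, 10, 20, 30, 40

aligned = align(pressure, grid)
labels = categorize(aligned, threshold=1.0)
summary = summarize(grid, aligned, labels)

for row in summary:
    print(row)
```

In this toy example, five aligned samples collapse into three summary rows: a steady period, a single transient sample where the pressure jumps, and a new steady period. A production framework would presumably add domain-specific rules at each step; the sketch only shows how the three steps could chain together.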
The Eurostars funding enables us to further productize our Data Janitor into a standalone product and to generalize it into a framework that can address more types of industrial data beyond our core domain (Oil & Gas). Today we use it as an internal tool for our own data scientists, but we see real potential in extracting it as a standalone product for heavy industries that deal with time series streamed in real time from sensors.
Together with our partners at The Boston Consulting Group (BCG) in France, we discovered that data mining is still a hurdle for many data scientists and we believe further development of our Data Janitor can address some of these issues in a meaningful way. BCG is helping us to identify industries, use cases and customers through their unique network worldwide so that we can jointly pilot the technology.
We are excited to get to work on this project with our partners at BCG France, and want to thank EU Horizon Europe/Eurostars and the Norwegian Research Council for granting us funding.
Evaluated by 3 independent experts, our project application achieved a score of 49 out of 54, and was summarized as follows:
“The project reflects the deep knowledge about data science solutions in terms of the data mining process, data preparation, and deployment. (...) To my best knowledge, this is a state-of-the-art proposal.”