Problem Statement

Preparing AI Datasets

Every machine learning task needs structured data. Training and testing cannot proceed without it, because models cannot be trained on unstructured data. Preparing structured data is therefore an essential step in any machine learning, deep learning, or AI problem. Raw data can take any form, including text and images. Arranging and labelling it demands precision, because inaccurate labels lead to inaccurate results.

PS Number: PSAIML003

Domain Bucket: Artificial Intelligence
Category: Software
Dataset: NA

Build a solution that answers questions of this type: How can we convert, say, 95% of unstructured data that follows no standard into standardized structured data using automated tools? How can we provide an open-source low-code/no-code solution for preparing AI-ready data?

Background of the Problem

Every machine learning task requires structured data. It is the basic prerequisite for training and testing, and its importance follows from the fact that models cannot be trained on unstructured data. Structuring data is therefore a vital task in any machine learning, deep learning, or AI problem. Raw data can be anything: text, images, and so on. Labelling and structuring the data requires precision, because mistakes produce wrong labels and therefore wrong results. It is also the most painful part of the work for any machine learning engineer, so automating it would solve a huge problem. What is needed is an automation tool (web, app, or standalone tool) that takes unstructured data as input and generates structured data as output.
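To make the input/output contract concrete, here is a minimal sketch in Python of the kind of transformation such a tool would perform: free-form text goes in, fixed-schema records come out. Everything in it is an assumption made for illustration, including the field names, the regular expressions, and the helper names `structure_record` and `to_csv`. A real solution would need to handle many more data types (images, for example) and far messier input.

```python
import csv
import io
import re

# Hypothetical patterns chosen only for illustration; real data needs broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:USD|INR|EUR)\b")

# The target schema: every output record has exactly these columns.
FIELDS = ["email", "date", "amount", "raw_text"]


def structure_record(text):
    """Map one free-form text snippet to a fixed-schema record (dict)."""

    def first(pattern):
        # Return the first match of the pattern, or None if the field is absent.
        match = pattern.search(text)
        return match.group(0) if match else None

    return {
        "email": first(EMAIL_RE),
        "date": first(DATE_RE),
        "amount": first(AMOUNT_RE),
        "raw_text": text.strip(),
    }


def to_csv(records):
    """Serialize structured records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)  # missing (None) fields are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    notes = [
        "Invoice sent to a.kumar@example.com on 2023-04-01 for 1500 INR.",
        "Please call back later.",
    ]
    print(to_csv([structure_record(note) for note in notes]))
```

Keeping the original text in a `raw_text` column means records that the patterns fail to parse can still be reviewed and labelled by hand, which matters given how costly wrong labels are.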

Objective

The primary objective is to build a model that answers questions of this type: How can we convert, say, 95% of unstructured data that follows no standard into standardized structured data using automated tools? How can we provide an open-source low-code/no-code solution for preparing AI-ready data?

Summary

The most challenging part of machine learning engineering is labelling and arranging data; automating this will solve a significant issue. There is a need for an automated tool (web, app, standalone application) that accepts unstructured data as input and produces structured data as output.