Problem Statement

Analysis of Hinglish Content

This problem statement’s main goal is to analyse non-standard Hindi and English social media messages in order to ascertain their meaning. The shift from desktops and laptops to smartphones has, in part, changed how people communicate on social media platforms over the past few years.

PS Number: PSAIML001

Domain Bucket: Artificial Intelligence
Category: Software
Dataset: NA

Create a system that can comprehend and correctly interpret text written in “Hinglish”, and that takes into account the other subtleties described in the Summary. The output should be the equivalent English sentence. Use of external translation and transliteration services is not recommended.
Some example Hinglish sentences will be supplied. Participants may use additional data if they need it to solve this problem.

Background of the Problem

The primary objective of this problem statement is to analyse social media posts in non-standard English and Hindi and understand their meaning. The way people communicate on social media platforms has evolved over the last few years, driven, to some extent, by the shift from desktops and laptops to smartphones.

Objective

Build a solution that is able to understand and accurately interpret content written in “Hinglish”, as well as address the other nuances discussed. The output should be the equivalent sentence in English. Use of external translation and transliteration services is not recommended.

Some indicative Hinglish sentences shall be provided. Participants may use additional data, if required, for working on this problem statement.

Summary

People are much more casual and concise when posting on social media, and do not feel the need to write fully formed, grammatically correct sentences in any particular language. Newer forms of communication, such as mixing Hindi and English within the same sentence and using words from either language interchangeably, appear to have become quite normal now. For example, “hum kal ghumne ja rahe hai”, written in the English (Latin) alphabet, or “hum tomorrow Dilli jayenge”, as well as “?? afternoon 3pm ????? ??”, which uses a combination of Hindi and English. This mixture is sometimes referred to as “Hinglish”.
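
One way to picture the code-mixing described above is to tag each word of a sentence with its likely language. The following is only a rough sketch under stated assumptions, not a recommended solution: the word list ROMANIZED_HINDI is a hypothetical seed lexicon built from the sample sentences, the function names are illustrative, and treating every unknown Latin-script word as English is a deliberately crude fallback.

```python
# Hypothetical seed lexicon of romanized Hindi words, taken only from the
# sample sentences in the text. A real system would need a far larger list
# or a trained classifier.
ROMANIZED_HINDI = {"hum", "kal", "ghumne", "ja", "rahe", "hai", "dilli", "jayenge"}


def is_devanagari(token):
    """Return True if any character lies in the Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in token)


def tag_tokens(sentence):
    """Split on whitespace and return a list of (token, tag) pairs.

    Tags: "hi" for Hindi (Devanagari script or a known romanized word),
    "en" for any other purely alphabetic ASCII word (an assumption),
    "other" for everything else, such as "3pm".
    """
    tagged = []
    for token in sentence.split():
        if is_devanagari(token) or token.lower() in ROMANIZED_HINDI:
            tag = "hi"
        elif token.isascii() and token.isalpha():
            tag = "en"  # crude fallback: unknown Latin-script words assumed English
        else:
            tag = "other"
        tagged.append((token, tag))
    return tagged


if __name__ == "__main__":
    print(tag_tokens("hum tomorrow Dilli jayenge"))
```

For “hum tomorrow Dilli jayenge” this tags “tomorrow” as English and the other three words as Hindi. Punctuation attached to a word, spelling variants, and words that exist in both languages would all need more careful handling.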

Further, short notations such as LOL (short for laughing out loud) and ROTFL (short for rolling on the floor laughing), as well as colloquial slang and lingo, are also commonly used nowadays.
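
A simple pre-processing step for such short notations could expand them before any further interpretation. The sketch below is illustrative only: the SHORTHAND table contains just the two abbreviations named above, and the function name expand_shorthand is a hypothetical choice, not part of any library.

```python
import re

# Only the two abbreviations mentioned in the text; a usable table would be
# much larger and would also have to cover slang.
SHORTHAND = {
    "lol": "laughing out loud",
    "rotfl": "rolling on the floor laughing",
}

# Match whole runs of letters so that "lol" inside "lollipop" is not touched.
_WORD = re.compile(r"[A-Za-z]+")


def expand_shorthand(text):
    """Replace known abbreviations (case-insensitive) with their expansions."""
    def _swap(match):
        word = match.group(0)
        return SHORTHAND.get(word.lower(), word)

    return _WORD.sub(_swap, text)


if __name__ == "__main__":
    print(expand_shorthand("LOL hum kal ghumne ja rahe hai"))
```

A lookup table like this cannot tell when an abbreviation is used ironically or has a different meaning in context, which is part of why the problem needs more than string replacement.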

Techniques such as Natural Language Processing (NLP) and Natural Language Understanding (NLU) are not yet able to fully understand such content, and require a lot of training to handle it.