Evaluating Inter-Bilingual Semantic Parsing for Indian Languages

About

Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets for complex structured tasks such as semantic parsing. One reason for this gap is the complexity of the logical form, which makes English-to-multilingual translation difficult: the process involves aligning logical forms, intents, and slots with the translated unstructured utterance. To address this, we propose IE-SemParse, an inter-bilingual seq2seq semantic parsing dataset suite for 11 distinct Indian languages. We highlight the proposed task’s practicality and evaluate existing multilingual seq2seq models across several train-test strategies. Our experiment reveals a high correlation between performance on the original multilingual semantic parsing datasets (such as mTOP, multilingual TOP, and multiATIS++) and performance on our proposed IE-SemParse suite.

People

The following people have worked on the paper, "Evaluating Inter-Bilingual Semantic Parsing for Indian Languages":

From left to right: Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan

Citation

Please cite our paper as below.

@inproceedings{aggarwal-etal-2023-evaluating,
			title = "Evaluating Inter-Bilingual Semantic Parsing for {I}ndian Languages",
			author = "Aggarwal, Divyanshu and
			Gupta, Vivek and
			Kunchukuttan, Anoop",
			booktitle = "Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)",
			month = jul,
			year = "2023",
			address = "Toronto, Canada",
			publisher = "Association for Computational Linguistics",
			url = "https://aclanthology.org/2023.nlp4convai-1.9",
			pages = "102--122",
			abstract = "Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack
			of datasets around complex structured tasks such as semantic parsing. One reason for this imminent gap is the complexity
			of the logical form, which makes English to multilingual translation difficult. The process involves alignment of
			logical forms, intents and slots with translated unstructured utterance. To address this, we propose an Inter-bilingual
			Seq2seq Semantic parsing dataset IE-SemParse Suite for 11 distinct Indian languages. We highlight the proposed task{'}s
			practicality, and evaluate existing multilingual seq2seq models across several train-test strategies. Our experiment
			reveals a high correlation across performance of original multilingual semantic parsing datasets (such as mTOP,
			multilingual TOP and multiATIS++) and our proposed IE-SemParse suite.",
}

Acknowledgement

We express our gratitude to Nitish Gupta from Google Research India for his invaluable and insightful suggestions aimed at enhancing the quality of our paper. Divyanshu Aggarwal acknowledges all the support from Amex, AI Labs. The authors thank members of the Utah NLP group for their valuable insights and suggestions at various stages of the project, and the reviewers for their helpful comments. Additionally, we appreciate the inputs provided by Vivek Srikumar and Ellen Riloff. Vivek Gupta acknowledges support from Bloomberg's Data Science Ph.D. Fellowship.