INDRA (INDian Regional Adapter): Optimized Generative AI Model for Multi-Lingual Platforms in the Context of India's Linguistic Diversity

Shivani Yadao; Harshita Vyas

doi:10.16925/2357-6014.2025.03.10

Vol. 21 No. 3 (2025)

Published: 5 September 2025

Shivani Yadao

Stanley College of Engineering and Technology for Women

Harshita Vyas

Stanley College of Engineering and Technology for Women

Introduction: India’s multilingual diversity poses significant challenges for natural language processing (NLP). INDRA introduces a unified generative AI framework optimized for multiple Indic languages.
Problem: Existing multilingual models underperform on low-resource Indic languages. A more effective and scalable NLP architecture tailored to India’s linguistic landscape is needed.
Objective: The study aims to develop and evaluate INDRA, a novel architecture that enhances multilingual NLP performance, especially for underrepresented Indic languages.
Methodology: INDRA combines a shared encoder-decoder with language-family-specific adapters, typological features, and hierarchical attention. It is benchmarked against mBART, IndicTrans2, MuRIL, and mT5 using standard NLP metrics.
Results: The experimental evaluation shows that INDRA outperforms all baseline models in accuracy, F1-score, BLEU, TER, and chrF++, particularly for low-resource languages.
Conclusion: INDRA proves to be an effective and efficient solution for multilingual NLP in India, offering improved performance and scalability.
Originality: The architecture’s novel use of hierarchical attention and language-specific components tailored for Indic languages marks a significant innovation over existing models.
Limitations: The study focuses on textual datasets and does not yet address speech or multimodal processing within Indic languages.
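
The Methodology line names language-family-specific adapters on top of a shared encoder-decoder but gives no equations. As a minimal sketch only, assuming bottleneck adapters of the kind common in adapter-based transfer (down-projection, ReLU, up-projection, residual connection), the Python below shows how a hidden state from a shared encoder could be routed to one adapter per language family. `LANGUAGE_FAMILY`, `BottleneckAdapter`, and `FamilyAdapterLayer` are hypothetical names, the dimensions are toy values, and this is not INDRA's published implementation.

```python
# Hypothetical sketch: routing shared-encoder states through language-family adapters.
# Assumed design (not from the paper): residual bottleneck adapters with ReLU.
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Illustrative language -> family map (ISO 639-1 codes).
LANGUAGE_FAMILY: Dict[str, str] = {
    "hi": "indo_aryan", "bn": "indo_aryan", "mr": "indo_aryan",
    "ta": "dravidian", "te": "dravidian", "kn": "dravidian", "ml": "dravidian",
}


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Down-project, apply ReLU, up-project, then add the input back (residual)."""

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                     for _ in range(d_bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity map.
        self.up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, x) for x in matvec(self.down, h)]
        return [a + b for a, b in zip(h, matvec(self.up, z))]


class FamilyAdapterLayer:
    """Holds one adapter per language family and picks it by input language."""

    def __init__(self, d_model: int, d_bottleneck: int) -> None:
        families = sorted(set(LANGUAGE_FAMILY.values()))
        self.adapters = {
            fam: BottleneckAdapter(d_model, d_bottleneck, seed=i)
            for i, fam in enumerate(families)
        }

    def __call__(self, h: Vector, lang: str) -> Vector:
        return self.adapters[LANGUAGE_FAMILY[lang]](h)


layer = FamilyAdapterLayer(d_model=4, d_bottleneck=2)
hidden = [0.5, -1.0, 2.0, 0.0]  # stand-in for one shared-encoder hidden state
print(layer(hidden, "ta"))  # identity at initialisation: [0.5, -1.0, 2.0, 0.0]
```

Zero-initialising the up-projection makes each adapter begin as the identity, a common choice that lets training start from the shared model's behaviour; whether INDRA does this is not stated in the abstract.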
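
The Results line reports chrF++ among other metrics. chrF (Popović, 2015, reference 12) scores character n-gram overlap with a recall-weighted F-score, and chrF++ adds word n-grams. The sketch below is a simplified sentence-level version, assuming character orders 1 to 6, word orders 1 to 2, and beta = 2; `chrf_pp` is a hypothetical helper. For reported numbers a reference implementation such as sacreBLEU should be used instead, since its tokenisation and averaging differ.

```python
# Simplified, hypothetical chrF++ sketch: char n-grams (1-6) + word n-grams (1-2).
# Precision and recall are averaged over orders, then combined with F-beta.
from collections import Counter
from typing import List, Tuple


def char_ngrams(text: str, n: int) -> Counter:
    s = "".join(text.split())  # whitespace is ignored for character n-grams
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    w = text.split()
    return Counter(tuple(w[i:i + n]) for i in range(len(w) - n + 1))


def chrf_pp(hyp: str, ref: str, char_order: int = 6, word_order: int = 2,
            beta: float = 2.0) -> float:
    """Return a sentence-level chrF++-style score in [0, 100]."""
    pairs: List[Tuple[Counter, Counter]] = (
        [(char_ngrams(hyp, n), char_ngrams(ref, n)) for n in range(1, char_order + 1)]
        + [(word_ngrams(hyp, n), word_ngrams(ref, n)) for n in range(1, word_order + 1)]
    )
    precisions, recalls = [], []
    for h, r in pairs:
        if not h or not r:
            continue  # skip orders where either side has no n-grams (short sentences)
        match = sum((h & r).values())  # clipped overlap via multiset intersection
        precisions.append(match / sum(h.values()))
        recalls.append(match / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(chrf_pp("the cat sat", "the cat sat"))       # identical strings score 100.0
print(chrf_pp("the cat sat", "the cat sat down"))  # partial match, below 100
```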
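
Hierarchical attention is named as a key component, but the abstract does not say which levels it spans. A common pattern, assumed here, attends over tokens within each segment to form segment vectors and then attends over those segment vectors. The pure-Python sketch uses scaled dot-product scores with fixed query vectors standing in for learned parameters; `attend` and `hierarchical_attention` are hypothetical helpers, not INDRA's code.

```python
# Generic two-level attention pooling (assumed pattern; INDRA's levels are not specified).
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, items: List[Vector]) -> Vector:
    """Pool `items` into one vector using scaled dot-product weights against `query`."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, item)) / math.sqrt(d) for item in items]
    weights = softmax(scores)
    return [sum(w * item[j] for w, item in zip(weights, items)) for j in range(d)]


def hierarchical_attention(segments: List[List[Vector]], token_query: Vector,
                           segment_query: Vector) -> Vector:
    """Level 1 pools tokens inside each segment; level 2 pools the segment vectors."""
    segment_vectors = [attend(token_query, tokens) for tokens in segments]
    return attend(segment_query, segment_vectors)


# Two toy segments (e.g. sub-word pieces grouped into words) with 2-d embeddings.
segments = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
# Zero queries give uniform weights at both levels, so this prints [0.5, 0.5].
print(hierarchical_attention(segments, token_query=[0.0, 0.0], segment_query=[0.0, 0.0]))
```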

Keywords: Indian Languages, Generative AI, INDRA, NLP, Multilingual translation

How to Cite

S. Yadao and H. Vyas, “INDRA (INDian Regional Adapter): Optimized Generative AI Model for Multi-Lingual Platforms in the Context of India’s Linguistic Diversity”, ing. Solidar, vol. 21, no. 3, pp. 1–26, Sep. 2025, doi: 10.16925/2357-6014.2025.03.10.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Cession of rights and ethical commitment

As the author of the article, I declare that it is an original, unpublished work created exclusively by me, that it has not been submitted for simultaneous evaluation by another publication, and that there is no impediment of any kind to the transfer of the rights provided for in this contract.

In this sense, I commit to awaiting the result of the evaluation by the journal Ingeniería Solidaria before considering its submission to another medium; if the journal's response is positive, I additionally commit to answering for any action involving claims, plagiarism, or any other kind of claim that could be made by third parties.

At the same time, as the author or co-author, I declare that I fully agree with the conditions presented in this work and that I cede all patrimonial rights, namely reproduction, public communication, distribution, dissemination, transformation, making the work available, and all other forms of exploitation of the work by any medium or procedure, for the term of the work's legal protection and in every country in the world, to the Universidad Cooperativa de Colombia Press.

References

1. Khanuja, S., Bansal, D., Mehtani, S., Khosla, S., Dey, A., Gopalan, B., ... & Talukdar, P. (2021). MuRIL: Multilingual Representations for Indian Languages.

2. Gala, J., Chitale, P. A., Raghavan, A. K., Gumma, V., Doddapaneni, S., Kumar, A., & Kunchukuttan, A. (2023). IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages.

3. Kakwani, D., Kunchukuttan, A., Golla, S., Gokul, N. C., Bhattacharyya, A., Khapra, M. M., & Kumar, P. (2020). IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.

4. Dabre, R., Shrotriya, H., Kunchukuttan, A., Puduppully, R., Khapra, M. M., & Kumar, P. (2022). IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages.

5. Bhat, S., & Varma, V. (2023). Generative Models For Indic Languages: Evaluating Content Generation Capabilities.

6. Ahuja, K., Diddee, H., Hada, R., Ochieng, M., Ramesh, K., Jain, P., ... & Sitaram, S. (2023). MEGA: Multilingual Evaluation of Generative AI.

7. Das, S., Panda, D., Mishra, T. K., Patra, B. K., & Ekbal, A. (2024). Multilingual Neural Machine Translation for Indic to Indic Languages.

8. Aggarwal, D., Gupta, V., & Kunchukuttan, A. (2022). IndicXNLI: Evaluating Multilingual Inference for Indian Languages.

9. Gautam, D., Kodali, P., Shrivastava, M., Gupta, K., Goel, A., & Kumaraguru, P. (2022). CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences.

10. Deb, B., Zheng, G., Shokouhi, M., & Awadallah, A. H. (2023). A Conditional Generative Matching Model for Multi-lingual Reply Suggestion.

11. KJ, S., Jain, V., Bhaduri, S., Roy, T., & Chadha, A. (2023). Decoding the Diversity: A Review of the Indic AI Research Landscape.

12. Popović, M. (2015). chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation.

13. Ramesh, G., Doddapaneni, S., Bheemaraj, A., Jobanputra, M., AK, R., Sharma, A., ... & Khapra, M. M. (2022). Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages. Transactions of the Association for Computational Linguistics, 10, 145-162.

14. Sai, A. B., Mohankumar, A. K., & Khapra, M. M. (2022). A Survey of Evaluation Metrics Used for NLG Systems. ACM Computing Surveys, 55(2), 1-39.

15. Tang, Y., Tran, C., Li, X., Chen, P. J., Goyal, N., Chaudhary, V., & Johnson, M. (2021). Multilingual translation from denoising pre-training. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

16. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).

17. Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., ... & Raffel, C. (2021). mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics.

18. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.

19. Khanuja, S., Bansal, D., Mehtani, S., Khosla, S., Dey, A., Gopalan, B., ... & Choudhary, S. (2021). MuRIL: Multilingual representations for Indian languages. arXiv preprint arXiv:2103.10730.

20. Kunchukuttan, A., Mehta, P., & Bhattacharyya, P. (2019). The IIT Bombay English-Hindi parallel corpus. arXiv preprint arXiv:1710.02855.

21. Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., ... & Zettlemoyer, L. (2020). Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 726–742.

22. Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318).
