Research Articles

INDRA (INDian Regional Adapter)

Optimized Generative AI Model for Multi-Lingual Platforms in the Context of India's Linguistic Diversity

Vol. 21 No. 3 (2025)
Published: 05-09-2025
Shivani Yadao
Stanley College of Engineering and Technology for Women
Harshita Vyas
Stanley College of Engineering and Technology for Women

Introduction: India’s multilingual diversity poses significant challenges in natural language processing. INDRA introduces a unified generative AI framework optimized for multiple Indic languages.
Problem: Existing multilingual models underperform on low-resource Indic languages. A more effective and scalable NLP architecture tailored to India’s linguistic landscape is needed.
Objective: The study aims to develop and evaluate INDRA, a novel architecture that enhances multilingual NLP performance, especially for underrepresented Indic languages.
Methodology: INDRA integrates a shared encoder-decoder with language-family-specific adapters, typological features, and hierarchical attention. It is benchmarked against mBART, IndicTrans2, MuRIL, and mT5 using standard NLP metrics.
Results: The experimental evaluation shows that INDRA outperforms all baseline models in accuracy, F1-score, BLEU, TER, and chrF++, particularly for low-resource languages.
Conclusion: INDRA proves to be an effective and efficient solution for multilingual NLP in India, offering improved performance and scalability.
Originality: The architecture’s novel use of hierarchical attention and language-specific components tailored for Indic languages marks a significant innovation over existing models.
Limitations: The study focuses on textual datasets and does not yet address speech or multimodal processing within Indic languages.
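
The adapter routing named in the Methodology can be illustrated with a minimal sketch. The abstract does not specify the adapter design, so this assumes a common bottleneck form (down-projection, ReLU, up-projection, residual connection) with one adapter per language family. The names `LANGUAGE_FAMILY`, `BottleneckAdapter`, and `FamilyAdapterLayer`, the language-to-family mapping, and all sizes are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical mapping from language code to language family (illustrative only).
LANGUAGE_FAMILY = {
    "hi": "indo_aryan",
    "bn": "indo_aryan",
    "mr": "indo_aryan",
    "ta": "dravidian",
    "te": "dravidian",
    "kn": "dravidian",
}


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Assumed adapter form: down-project, ReLU, up-project, add residual."""

    def __init__(self, hidden_size, bottleneck_size, rng):
        self.down = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(bottleneck_size)
        ]
        # Zero-initialised up-projection, so the adapter starts as the identity.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def __call__(self, hidden):
        z = [max(0.0, v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, z)
        return [h + d for h, d in zip(hidden, delta)]


class FamilyAdapterLayer:
    """Routes a shared hidden state through the adapter of its language family."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        families = sorted(set(LANGUAGE_FAMILY.values()))
        self.adapters = {
            family: BottleneckAdapter(hidden_size, bottleneck_size, rng)
            for family in families
        }

    def __call__(self, hidden, lang):
        family = LANGUAGE_FAMILY[lang]
        return self.adapters[family](hidden)


layer = FamilyAdapterLayer(hidden_size=4, bottleneck_size=2)
state = [0.5, -1.0, 0.25, 2.0]  # stand-in for one shared-encoder hidden state
out_hi = layer(state, "hi")  # routed through the Indo-Aryan adapter
out_ta = layer(state, "ta")  # routed through the Dravidian adapter
```

In this sketch, languages in the same family share adapter parameters while the encoder-decoder is shared by all languages, which is one plausible way related low-resource languages could benefit from each other's data.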

Keywords: Indian Languages, Generative AI, INDRA, NLP, Multilingual translation

How to Cite

[1]
S. Yadao and H. Vyas, “INDRA (INDian Regional Adapter): Optimized Generative AI Model for Multi-Lingual Platforms in the Context of India’s Linguistic Diversity”, Ing. Solidar, vol. 21, no. 3, pp. 1–26, Sep. 2025, doi: 10.16925/2357-6014.2025.03.10.

References

1. Khanuja, S., Bansal, D., Mehtani, S., Khosla, S., Dey, A., Gopalan, B., ... & Talukdar, P. (2021). MuRIL: Multilingual Representations for Indian Languages.

2. Gala, J., Chitale, P. A., Raghavan, A. K., Gumma, V., Doddapaneni, S., Kumar, A. & Kunchukuttan, A. (2023). IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages.

3. Kakwani, D., Kunchukuttan, A., Golla, S., Gokul, N.C., Bhattacharyya, A., Khapra, M. M., & Kumar, P. (2020). IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages.

4. Dabre, R., Shrotriya, H., Kunchukuttan, A., Puduppully, R., Khapra, M. M., & Kumar, P. (2022). IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages.

5. Bhat, S., & Varma, V. (2023). Generative Models For Indic Languages: Evaluating Content Generation Capabilities.

6. Ahuja, K., Diddee, H., Hada, R., Ochieng, M., Ramesh, K., Jain, P., ... & Sitaram, S. (2023). MEGA: Multilingual Evaluation of Generative AI.

7. Das, S., Panda, D., Mishra, T. K., Patra, B. K., & Ekbal, A. (2024). Multilingual Neural Machine Translation for Indic to Indic Languages.

8. Aggarwal, D., Gupta, V., & Kunchukuttan, A. (2022). INDICXNLI: Evaluating Multilingual Inference for Indian Languages.

9. Gautam, D., Kodali, P., Shrivastava, M., Gupta, K., Goel, A., & Kumaraguru, P. (2022). CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences.

10. Deb, B., Zheng, G., Shokouhi, M., & Awadallah, A. H. (2023). A Conditional Generative Matching Model for Multi-lingual Reply Suggestion.

11. KJ, S., Jain, V., Bhaduri, S., Roy, T., & Chadha, A. (2023). Decoding the Diversity: A Review of the Indic AI Research Landscape.

12. Popović, M. (2015). chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation.

13. Ramesh, G., Doddapaneni, S., Bheemaraj, A., Jobanputra, M., AK, R., Sharma, A., ... & Khapra, M. M. (2022). Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages. Transactions of the Association for Computational Linguistics, 10, 145-162.

14. Sai, A. B., Mohankumar, A. K., & Khapra, M. M. (2022). A Survey of Evaluation Metrics Used for NLG Systems. ACM Computing Surveys, 55(2), 1-39.

15. Tang, Y., Tran, C., Li, X., Chen, P. J., Goyal, N., Chaudhary, V., & Johnson, M. (2021). Multilingual translation from denoising pre-training. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

16. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).

17. Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., ... & Raffel, C. (2021). mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics.

18. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In the International Conference on Learning Representations.

19. Khanuja, S., Bansal, D., Mehtani, S., Khosla, S., Dey, A., Gopalan, B., ... & Choudhary, S. (2021). MuRIL: Multilingual representations for indian languages. arXiv preprint arXiv:2103.10730.

20. Kunchukuttan, A., Mehta, P., & Bhattacharyya, P. (2019). The IIT Bombay English-Hindi parallel corpus. arXiv preprint arXiv:1710.02855.

21. Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., ... & Zettlemoyer, L. (2020). Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 726–742.

22. Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318).
