A corpus-based morpho-syntactic analysis of ‘kare’ in Bangla: Theory and applications

Priyanka Biswas, Kalika Bali, Monojit Choudhury

Proceedings of the International Conference on NLP (ICON)

করে (kare) is one of the most frequent words in Bangla corpora and exhibits a variety of morpho-syntactic functions. Morphologically, the word can be analyzed as a noun (meaning “hand”, “tax”, etc.) with a locative case marker, as a finite verb (meaning “do” or “does”), or as a non-finite form of “do” (meaning “having done”). However, owing to various functional modifications, this particular lexical item can also be used as a postposition and as a particle. Because of this highly variable behavior, ‘kare’ is problematic for several NLP tasks and therefore calls for special treatment. In this paper we investigate the various distributions and functions of ‘kare’ and identify eleven basic morpho-syntactic categories covering these functions. On the basis of diachronic and synchronic evidence, we show how these functions of ‘kare’ can be explained by positing etymological homomorphism and/or functional diversification. Further, we offer suggestions for dealing with ‘kare’ during morphological analysis, part-of-speech tagging, chunking and other advanced NLP tasks.
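To illustrate why ‘kare’ needs special treatment in a pipeline, the following minimal sketch shows one possible design in which a morphological analyzer returns every candidate reading of করে rather than committing to one, so that a later stage (for example a POS tagger or chunker) can disambiguate in context. It is not the authors' method. It covers only the five readings named above, not the paper's eleven categories. The names `Analysis`, `KARE_READINGS`, `AMBIGUITY_LEXICON` and `analyze`, and the category labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    # Hypothetical record for one candidate reading; the category labels
    # are illustrative and are NOT the paper's tagset.
    category: str
    gloss: str


# The five readings named in the abstract (the paper itself identifies eleven).
KARE_READINGS = (
    Analysis("NOUN+LOC", "'hand'/'tax' + locative"),
    Analysis("VERB_FINITE", "'do'/'does'"),
    Analysis("VERB_NONFINITE", "'having done'"),
    Analysis("POSTPOSITION", "postpositional use"),
    Analysis("PARTICLE", "particle use"),
)

# Hypothetical lookup table for tokens that must keep all their analyses.
AMBIGUITY_LEXICON = {"করে": KARE_READINGS}


def analyze(token: str) -> tuple[Analysis, ...]:
    """Return all candidate analyses for an ambiguous token, or an empty
    tuple if the token is not listed; disambiguation is deferred to a
    later, context-aware stage (an assumed pipeline design)."""
    return AMBIGUITY_LEXICON.get(token, ())


if __name__ == "__main__":
    for reading in analyze("করে"):
        print(reading.category, "-", reading.gloss)
```

Keeping all readings at the morphological stage avoids an early, context-free decision; the cost is that every downstream component must be able to consume a set of analyses rather than a single one.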