I am a principal applied scientist in the applied sciences – CISL group of MSR India, Bangalore, and am the technical lead for the members of this group. The group consists of applied scientists, research development software engineers and research fellows. The focus of the group is to develop (a) machine learning and statistics based solutions for various applications and (b) algorithms for large scale machine learning. Applications that we have been working on include (a) service analytics, covering service monitoring and diagnosis, intrusion detection (network security), etc., and (b) enterprise search and enabling mechanisms for collaborative tagging.
Our current focus is on building service monitoring and diagnostic tools powered at the backend by machine learning and statistics based algorithms. Our primary area of research is identifying useful patterns in high dimensional time series data and relating the discovered patterns to aid diagnosis. We are implementing an ML stack that is easy to use for service monitoring and diagnostic team members with limited or no machine learning background.
My research interests are primarily in the areas of ML applications, large scale machine learning, numerical optimization and data mining. Recently, I have started to explore the design of deep neural nets (more specifically, LSTMs and convolutional neural networks) that are interpretable for text applications.
Under this theme, we conduct applied research focused on developing machine learning and data mining tools to address a broad range of problems and applications. Our application focus is currently on service monitoring and security.
Service monitoring is essential for ensuring high quality service in applications such as distributed compute/storage platform services, whether offered in the cloud or on-premises. It involves analysing and deriving insights from different high volume data sources such as high dimensional time series data and service logs. We are developing a generic system that can detect unusual or anomalous patterns, relate them to service level issues, etc. Domain knowledge plays a crucial role in these problems, and our framework can take such knowledge into account.
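As a rough illustration of what detecting an anomalous pattern in a service metric can look like, here is a minimal sketch that flags points deviating strongly from a trailing window. It is a generic rolling z-score check, not the system described above; the function name, the `window` and `threshold` parameters, and the latency values are all hypothetical. The threshold is one simple place where domain knowledge could enter.

```python
from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` points.
    Hypothetical helper for illustration only."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as anomalous.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Made-up request latencies in milliseconds with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
flagged = rolling_zscore_anomalies(latency_ms)
print(flagged)
```

A real monitoring system would work on many correlated series at once and combine such signals with logs; this sketch only shows the single-series idea.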
Intrusion detection is essential for ensuring secure networks. The problems that we study include analysing user sessions in a large network of machines and how they relate to each other when a hacker moves from one machine to another over a period of time. The scale of the problem is huge: a large company typically sees several billion user sessions every day. We are developing scalable machine learning algorithms that handle such large volumes of data and rank unusual or anomalous sessions and graphs of connected sessions.
Distributed Machine Learning
Scalable machine learning over big data is an important problem, as the volume of data collected is ever-growing in many different applications. To analyse such data or build classifier models on it quickly, we often require distributed compute/storage environments. One popular distributed environment is Hadoop running on a cluster of commodity machines. In such environments, communication costs can be prohibitively high. Therefore, there is a need to develop efficient algorithms that trade off communication and computation costs. We have developed several algorithms to address these requirements for training linear and non-linear classifier models.
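The communication versus computation trade-off can be sketched with a simple, generic scheme: each node runs several gradient steps on its own data shard, and the nodes only exchange models when they average them. This is plain parameter averaging for logistic regression, simulated in a single process. It is an assumption chosen for illustration, not one of the algorithms referred to above, and all names and data in it are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_update(weights, shard, local_steps, lr):
    """Run `local_steps` gradient steps on one node's shard (computation only)."""
    w = list(weights)
    for _ in range(local_steps):
        grad = [0.0] * len(w)
        for x, y in shard:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / len(shard) for wi, g in zip(w, grad)]
    return w

def train(shards, dim, rounds, local_steps, lr=0.5):
    """Each round costs one model exchange (communication) per node."""
    w = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_update(w, s, local_steps, lr) for s in shards]
        # Averaging the local models stands in for the network exchange.
        w = [sum(m[j] for m in local_models) / len(local_models) for j in range(dim)]
    return w

def accuracy(w, data):
    hits = 0
    for x, y in data:
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        hits += predicted == y
    return hits / len(data)

# Two made-up shards; the first feature is a constant bias term.
shard_a = [([1.0, 2.0, 1.0], 1), ([1.0, 1.5, 2.0], 1),
           ([1.0, -1.0, -1.5], 0), ([1.0, -2.0, -0.5], 0)]
shard_b = [([1.0, 1.0, 1.0], 1), ([1.0, 2.5, 0.5], 1),
           ([1.0, -1.5, -2.0], 0), ([1.0, -0.5, -1.0], 0)]
shards = [shard_a, shard_b]
all_data = shard_a + shard_b

# Same 20 gradient steps per node, but 20 versus 4 communication rounds.
frequent_sync = train(shards, dim=3, rounds=20, local_steps=1)
rare_sync = train(shards, dim=3, rounds=4, local_steps=5)
print(accuracy(frequent_sync, all_data), accuracy(rare_sync, all_data))
```

Both runs do the same amount of local computation per node; the second communicates five times less often. On real data, fewer exchanges can cost some model quality, which is the trade-off such algorithms have to manage.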