Modern AI systems face a dual challenge: delivering high‑quality outputs while staying cost‑ and latency‑efficient. Every token processed and every millisecond of compute affects scalability, user experience, and sustainability. Efficiency isn’t just an optimisation; it’s a design principle that makes AI applications viable at real‑world scale.
Efficient AI applications start with the right context. Identifying relevant information, reducing redundancy, and maintaining long‑term memory are key to effective performance. Our research combines structured and unstructured context pruning, hybrid retrieval, and intelligent compression to minimise unnecessary tokens without losing utility. These techniques improve quality per dollar for complex workflows and make latency more predictable over long sessions.
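As a flavour of the idea, context pruning can be as simple as scoring candidate chunks for relevance and packing the best ones into a fixed token budget. The sketch below is purely illustrative: it uses word overlap as a stand‑in for a real relevance model and whitespace splitting as a stand‑in for a real tokenizer.

```python
# Hypothetical sketch: greedy context pruning under a token budget.
# Word overlap with the query stands in for a learned relevance model,
# and len(split()) stands in for a real tokenizer.

def prune_context(query: str, chunks: list[str], token_budget: int) -> list[str]:
    query_words = set(query.lower().split())

    def score(chunk: str) -> float:
        words = set(chunk.lower().split())
        return len(words & query_words) / (len(words) or 1)

    kept, used = [], 0
    # Pack highest-relevance chunks first; skip anything over budget.
    for chunk in sorted(chunks, key=score, reverse=True):
        cost = len(chunk.split())  # crude token count
        if used + cost <= token_budget:
            kept.append(chunk)
            used += cost
    return kept

chunks = [
    "Latency budgets matter for interactive agents.",
    "Unrelated trivia about medieval history and castles.",
    "Token pruning reduces cost per request for agents.",
]
kept = prune_context("agent token cost latency", chunks, token_budget=14)
# The irrelevant trivia chunk falls outside the budget and is dropped.
```

Production systems replace both stand‑ins (e.g. embedding similarity for scoring, a model‑specific tokenizer for counting), but the budget‑packing structure stays the same.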

Beyond context, we build efficient agents that make smart decisions about tools, compute, and memory. These agents plan, route, and execute tasks with minimal overhead, leveraging long‑horizon memory and selective model pathways to reduce redundant steps and optimise resource use.
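Selective model pathways can be sketched as a cost‑aware router: cheap requests go to a small model, and only requests that look hard are escalated. The model names and heuristics below are illustrative assumptions, not a real API.

```python
# Hypothetical sketch: a cost-aware router for selective model pathways.
# "small-model" / "large-model" and the heuristics are illustrative only.

def route(prompt: str, length_threshold: int = 20) -> str:
    # Crude proxy for task difficulty: reasoning keywords or long prompts.
    needs_reasoning = any(k in prompt.lower() for k in ("why", "prove", "plan", "compare"))
    if len(prompt.split()) <= length_threshold and not needs_reasoning:
        return "small-model"  # low latency, low cost
    return "large-model"      # reserved for harder requests

route("What is the capital of France?")
# routed to the small model

route("Plan a multi-step migration and compare trade-offs across services")
# routed to the large model
```

Real routers typically learn this decision from data (e.g. a classifier over prompt features), but the control flow, a cheap default path with selective escalation, is the same.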
From context engineering to efficient agentic workflows, our goal is simple: AI applications that do more with fewer resources — fast, reliable, and ready for real‑world scale.