Languages & Frameworks

The frameworks, libraries, and protocols that underpin AI development. These are the software foundations your applications are built with.

Adopt

Mature, well-supported technologies ready for production use.

PyTorch

PyTorch has demonstrated consistent maturity and widespread adoption across both research and production environments, earning its place in our Adopt ring. We’re seeing it emerge as the default choice for many machine learning teams, particularly those working on deep learning projects, thanks to its intuitive Python-first approach and dynamic computational graphs that make debugging and prototyping significantly easier.

The framework’s robust ecosystem, exceptional documentation and strong community support make it a reliable choice for teams at any scale. While TensorFlow remains relevant, particularly in production deployments, PyTorch’s seamless integration with popular machine learning tools, extensive pre-trained model repository and growing deployment options through TorchServe have addressed previous concerns about production readiness. The framework’s adoption by major technology organisations and research institutions, coupled with its regular release cycle and stability, gives us confidence in recommending it as a default choice for new machine learning projects.

dbt

We’ve placed dbt (data build tool) in the Adopt ring because it has proven to be an essential framework for organising and managing the data transformations that feed AI systems. dbt brings software engineering best practices such as version control and testing to data transformation workflows, which is crucial when preparing data for AI model training and inference.

The reliability and maintainability of AI systems heavily depend on the quality of their input data, and dbt helps teams achieve this by making data transformations more transparent and trustworthy. We’ve seen teams successfully use dbt to create clean, well-documented data pipelines that connect data warehouses to AI applications, while maintaining the agility to quickly adapt to changing requirements. Its integration with modern data platforms and strong community support make it a solid choice for organisations building out their AI infrastructure.

MCP

Anthropic’s Model Context Protocol (MCP) has rapidly gained adoption, addressing the need for standardised integration between language models and external tools. MCP solves the persistent problem of connecting AI models to organisational data without requiring custom integration work for each connection. MCP servers are straightforward to create (our teams have built functional servers within hours) and the growing ecosystem of community-created servers reduces development overhead further.

Since our last radar, we’ve seen rapid uptake within organisations. Some are pursuing ambitious goals of making all internal APIs AI-accessible via MCP servers, creating a unified interface through which AI assistants interact with enterprise systems. Teams should be realistic about the implementation investment required.

Security must be a first-class concern. Every MCP server should enforce authentication and authorisation independently of the calling model, and tool grants should follow the principle of least privilege. Audit logging of all tool calls is essential for traceability. MCP servers returning untrusted data can become an indirect prompt injection vector, so outputs from external sources need careful sanitisation before being fed back into model context.
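To make these controls concrete, here is a minimal sketch of a server-side handler for MCP's JSON-RPC `tools/call` request. It applies authorisation that does not depend on the model, logs every attempt and fences tool output as untrusted data. The `Caller` type, the `TOOLS` registry, the tool names and the regex filter are all hypothetical illustrations, not part of any MCP SDK. The phrase filter is a crude example and does not defend against prompt injection on its own.

```python
import json
import logging
import re
from dataclasses import dataclass

audit_log = logging.getLogger("mcp.audit")


class ToolError(Exception):
    """Raised when a tool call is refused."""


@dataclass(frozen=True)
class Caller:
    # Identity comes from the server's own authentication layer, never from the model.
    user_id: str
    granted_tools: frozenset


# Hypothetical tool implementations; a real server would query enterprise systems.
TOOLS = {
    "read_ticket": lambda args: f"Ticket {args['id']}: printer on floor 3 is jammed.",
    "delete_ticket": lambda args: f"Deleted ticket {args['id']}.",
}

# Crude, illustrative filter for instruction-like text; not a complete defence.
SUSPICIOUS = re.compile(r"ignore (all|previous) instructions|system prompt", re.IGNORECASE)


def sanitise(text: str) -> str:
    """Redact instruction-like phrases and fence the output as untrusted data."""
    cleaned = SUSPICIOUS.sub("[removed]", text)
    return f"<untrusted_tool_output>\n{cleaned}\n</untrusted_tool_output>"


def handle_tools_call(caller: Caller, request: dict) -> dict:
    """Handle a JSON-RPC 'tools/call' request with least-privilege checks and audit logging."""
    name = request["params"]["name"]
    args = request["params"].get("arguments", {})
    allowed = name in caller.granted_tools and name in TOOLS
    # Log every attempt, including refusals, for traceability.
    audit_log.info(json.dumps({"user": caller.user_id, "tool": name,
                               "arguments": args, "allowed": allowed}))
    if not allowed:
        raise ToolError(f"{caller.user_id!r} may not call {name!r}")
    output = TOOLS[name](args)
    # Shape follows MCP's tool result: a list of content items plus an error flag.
    return {"content": [{"type": "text", "text": sanitise(output)}], "isError": False}
```

Granting tools per caller, rather than per model session, keeps the authorisation decision in code the model cannot influence.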

A broader architectural concern surfaced in April 2026 when OX Security demonstrated that MCP’s STDIO transport executes arbitrary commands without validation, leading to multiple CVEs across downstream projects. More concerning was their proof of concept showing malicious entries accepted by most MCP registries. Teams sourcing MCP servers from public marketplaces should treat them with the same caution as any third-party dependency: review the code, pin versions and run servers with minimal permissions.
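One way to apply this advice is a pre-flight check over the client configuration before any STDIO server is launched. The sketch below assumes a Claude Desktop-style `mcpServers` JSON layout (a `command` plus `args` per server) and a hypothetical team policy: only `npx` may launch servers, packages must be pinned to an exact version, and the pinned version must appear on an internally reviewed list. The package names are invented for illustration.

```python
import re

# Hypothetical policy values for illustration.
ALLOWED_COMMANDS = {"npx"}
# npm package spec pinned to an exact version, e.g. "@scope/name@1.2.3" or "name@1.2.3".
PINNED_PACKAGE = re.compile(r"^(@[\w.-]+/)?[\w.-]+@\d+\.\d+\.\d+$")


def audit_mcp_config(config: dict, reviewed: set) -> list:
    """Return a list of policy violations; an empty list means every server passes."""
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        command = server.get("command")
        if command not in ALLOWED_COMMANDS:
            problems.append(f"{name}: command {command!r} is not on the allowlist")
            continue
        # Skip flags such as "-y"; the first positional argument is the package spec.
        packages = [arg for arg in server.get("args", []) if not arg.startswith("-")]
        if not packages:
            problems.append(f"{name}: no package specified")
        elif not PINNED_PACKAGE.match(packages[0]):
            problems.append(f"{name}: {packages[0]!r} is not pinned to an exact version")
        elif packages[0] not in reviewed:
            problems.append(f"{name}: {packages[0]!r} has not been reviewed")
    return problems
```

Running a check like this in CI, or in a wrapper around the client, turns "review the code, pin versions" into an enforceable gate rather than a convention.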

For simpler workflows that operate on local files and code rather than external services, Claude Skills offer a lighter-weight alternative worth considering before committing to MCP server development.

See also: Claude Skills, Agentic tool use

Trial

Promising technologies with growing adoption that are worth exploring in production-adjacent settings.

Microsoft Agent Framework

Microsoft Agent Framework (MAF) launched in October 2025, merging AutoGen’s agent abstractions with Semantic Kernel’s enterprise features into a single open-source platform. MAF supports Python and .NET, offering graph-based workflows for multi-agent orchestration alongside session-based state management and telemetry.

We’ve placed it in Trial rather than Adopt because the framework is still stabilising. That said, it consolidates Microsoft’s previously fragmented agent story, and organisations building multi-agent systems on Microsoft’s stack should evaluate it as their primary framework.

As with any agent framework granting tool access, apply least-privilege scoping and build in prompt injection awareness from the start.

See also: Agentic tool use, MCP, A2A

A2A

Google’s Agent2Agent (A2A) protocol addresses the need for standardised communication between AI agents. Launched in April 2025 and now governed by the Linux Foundation, A2A enables agents from different providers to discover capabilities and collaborate without custom integration.

The protocol complements MCP rather than competing with it. MCP connects models to tools and data sources; A2A handles agent-to-agent communication. The design centres on “Agent Cards”, JSON documents that advertise an agent’s capabilities and skills, enabling dynamic task delegation. The protocol is modality-agnostic, supporting text as well as audio and video streaming, with built-in security features for enterprise deployment.
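The discovery-and-delegation idea behind Agent Cards can be sketched with plain dictionaries. The fields used below (`name`, `url`, `version`, `capabilities`, `defaultOutputModes`, `skills` with `id` and `tags`) are an illustrative subset modelled on the A2A Agent Card; check the current specification for the exact schema. The agent names, URLs and skill tags are hypothetical, and `pick_agent` is our own selection rule, not part of the protocol.

```python
import json


def agent_card(name, url, skill_ids, output_modes=("text/plain",)):
    """Build an Agent Card-like dict (illustrative subset of fields)."""
    return {
        "name": name,
        "description": f"{name} agent",
        "url": url,
        "version": "1.0.0",
        "capabilities": {"streaming": True},
        "defaultInputModes": ["text/plain"],
        "defaultOutputModes": list(output_modes),
        "skills": [
            {"id": skill, "name": skill.replace("-", " ").title(), "tags": [skill]}
            for skill in skill_ids
        ],
    }


def discover(published_json):
    """Parse cards as a client would after fetching them over HTTP."""
    return [json.loads(doc) for doc in published_json]


def pick_agent(cards, tag, output_mode="text/plain"):
    """Return the first agent advertising a skill with `tag` and a compatible output mode."""
    for card in cards:
        if output_mode not in card["defaultOutputModes"]:
            continue
        if any(tag in skill["tags"] for skill in card["skills"]):
            return card
    return None


# Hypothetical agents publishing their cards as JSON.
published = [
    json.dumps(agent_card("Travel", "https://travel.example.com/a2a", ["flight-booking"])),
    json.dumps(agent_card("Finance", "https://finance.example.com/a2a",
                          ["invoice-extraction", "expense-audit"],
                          output_modes=("application/json",))),
]
```

A delegating agent then sends the task to the selected card's `url`, so adding a new capability means publishing a new card rather than changing the caller.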

We’ve placed A2A in Trial because it remains relatively new with limited production deployment patterns. Teams should evaluate whether their use cases require agent-to-agent communication versus simpler architectures. For most organisations, starting with MCP for tool integration before exploring A2A represents a sensible progression.

LLM testing frameworks

DeepEval provides a systematic framework for evaluating LLM outputs, with built-in metrics for relevance, factual accuracy, hallucination detection and toxicity. It integrates with pytest, making it accessible to teams familiar with Python testing workflows.

Promptfoo takes a CLI-first approach with strong CI/CD integration. Where DeepEval is a Python library you embed in your test suite, Promptfoo runs as a standalone tool comparing outputs across models and prompt variants. OpenAI acquired Promptfoo in March 2026, though it remains open-source under the MIT licence. Teams running multi-model strategies should consider whether the acquisition affects their comfort with Promptfoo as a neutral evaluation tool.

DeepEval suits teams that want evaluation embedded in their Python test suite. Promptfoo suits teams wanting standalone prompt regression testing, particularly those with CI/CD pipelines that gate deployments on evaluation results.
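For teams weighing the embedded approach, the sketch below shows the shape of an evaluation that lives in a pytest suite and gates CI. It does not use DeepEval's API. The two metrics are deliberately crude stand-ins (keyword relevance and a numeric grounding check as a hallucination proxy) for the model-judged metrics such tools provide, and `generate_answer`, `CONTEXT` and the thresholds are hypothetical.

```python
import re

# Hypothetical retrieved context for the question under test.
CONTEXT = "Refunds are issued within 5 business days. Orders over 50 GBP ship free."


def generate_answer(question: str) -> str:
    # Stand-in for the system under test; a real suite would call the app or model here.
    return "Refunds are issued within 5 business days of approval."


def relevance(output: str, required_terms: list) -> float:
    """Fraction of required terms that appear in the output (case-insensitive)."""
    text = output.lower()
    return sum(term.lower() in text for term in required_terms) / len(required_terms)


def numeric_grounding(output: str, context: str) -> float:
    """Crude hallucination proxy: share of numbers in the output that also appear in the context."""
    pattern = r"\d+(?:\.\d+)?"
    numbers = re.findall(pattern, output)
    if not numbers:
        return 1.0
    known = set(re.findall(pattern, context))
    return sum(n in known for n in numbers) / len(numbers)


def test_refund_answer_is_relevant_and_grounded():
    # pytest collects this function; a failed threshold fails the build.
    answer = generate_answer("How long do refunds take?")
    assert relevance(answer, ["refund", "5 business days"]) >= 0.9
    assert numeric_grounding(answer, CONTEXT) == 1.0
```

Promptfoo-style regression testing expresses the same thresholds declaratively and runs them outside the Python suite, comparing several prompts or models in one pass.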

See also: LLM-as-a-judge, AI red teaming tools

LlamaIndex

LlamaIndex, formerly known as GPT Index, is a framework that supports developers in connecting large language models with external data sources in a structured way. It provides tools to build indices, data structures that help LLMs access relevant information efficiently, thereby improving their ability to handle specific tasks requiring contextual or domain-specific data.
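The core idea of an index is easiest to see in miniature. The sketch below is a toy stand-in, not LlamaIndex's API (in LlamaIndex itself, classes such as `VectorStoreIndex` fill this role, backed by learned embeddings). It scores documents against a question using bag-of-words cosine similarity and places the best matches into the prompt. The documents and class names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Stores one term-frequency vector per document; real indices use embeddings."""

    def __init__(self, documents):
        self.documents = documents
        self.vectors = [Counter(tokenize(d)) for d in documents]

    def query(self, question, top_k=1):
        q = Counter(tokenize(question))
        ranked = sorted(zip(self.documents, self.vectors),
                        key=lambda dv: cosine(q, dv[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


def build_prompt(question, index, top_k=2):
    """Ground the LLM by placing the most relevant documents in its context."""
    context = "\n".join(index.query(question, top_k=top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "The VPN requires multi-factor authentication for all staff.",
    "Expense claims must be submitted within 30 days.",
    "The office is closed on public holidays.",
]
index = ToyVectorIndex(docs)
```

The model only sees what the index retrieves, which is why index design and chunking decisions dominate answer quality in data-centric applications.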

We consider LlamaIndex suitable for teams trialling methods to augment LLM performance, especially in data-centric applications. While its modular design and focus on customisation are appealing, its relative immaturity as a toolkit means that teams may encounter challenges around setup or adapting it to complex datasets. As with many emerging tools, its value depends on careful experimentation and matching it to the right problem space.

LangChain & LangGraph

LangChain and its companion LangGraph move up to Trial this quarter. LangGraph 1.0 reached stable release, addressing earlier concerns about abstraction churn and giving teams a more reliable foundation for building multi-step LLM workflows.

We’ve observed teams successfully using these frameworks for prototypes and smaller production systems. LangChain handles general-purpose LLM interactions while LangGraph extends this to stateful, graph-based agent workflows. The rapid pace of change in the underlying AI platforms means that some of LangChain’s abstractions may still become less relevant as the ecosystem evolves, so we recommend focused experiments that test whether these tools simplify your specific use case. Organisations on Microsoft’s stack should evaluate Microsoft Agent Framework, which offers similar orchestration capabilities with strong .NET support.
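The idea behind a stateful, graph-based agent workflow can be sketched without any framework. The `ToyStateGraph` class below loosely mirrors the node-and-edge shape of LangGraph's `StateGraph` but is not its API. Each node returns a partial state update, a router picks the next node from the merged state, and a step limit guards against runaway cycles. The `draft` and `review` nodes are hypothetical placeholders for LLM calls.

```python
END = "__end__"


class ToyStateGraph:
    """Tiny stateful graph runner: nodes return partial updates, routers choose the next node."""

    def __init__(self):
        self.nodes = {}
        self.routers = {}

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_router(self, source, router):
        # router(state) -> name of the next node, or END
        self.routers[source] = router

    def run(self, start, state, max_steps=20):
        node, steps = start, 0
        while node != END:
            if steps == max_steps:
                raise RuntimeError("step limit reached; possible cycle in the graph")
            state = {**state, **self.nodes[node](state)}  # merge the node's update into shared state
            node = self.routers[node](state)
            steps += 1
        return state


def draft(state):
    # Placeholder for an LLM call that writes or revises a draft.
    attempt = state["attempts"] + 1
    return {"attempts": attempt, "draft": f"summary v{attempt}"}


def review(state):
    # Hypothetical reviewer: accepts from the second attempt onwards.
    return {"approved": state["attempts"] >= 2}


graph = ToyStateGraph()
graph.add_node("draft", draft)
graph.add_node("review", review)
graph.add_router("draft", lambda s: "review")
graph.add_router("review", lambda s: END if s["approved"] else "draft")
```

Keeping state explicit and routing on it is what makes loops such as draft, review and revise inspectable, which is the main thing a framework like LangGraph adds over ad-hoc chaining.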

See also: Microsoft Agent Framework, AutoGen

Formal specification languages

Formal specification languages allow teams to describe system behaviour with enough precision that properties can be verified before code is written. They sit on a spectrum: lightweight languages that structure intent, model checkers that explore state spaces and full theorem provers that deliver mathematical proof. AI assistants have lowered the barrier to entry, making formal specification viable for a broader range of software than the safety-critical systems that historically justified the investment.

TLA+, created by Leslie Lamport, is the most widely adopted formal specification language for distributed systems. Amazon used it to find subtle bugs in AWS infrastructure that testing alone could not surface. Alloy, developed by Daniel Jackson at MIT, takes a lighter approach with automatic analysis, well suited for exploring design spaces and finding counterexamples early. FizzBee offers a more accessible alternative to TLA+ designed for practitioners.
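The model-checking end of the spectrum comes down to exhaustively exploring every reachable state and checking an invariant in each. The sketch below reimplements that idea in miniature in Python. It is not TLA+ or its TLC checker, and the two-process lock is a textbook example of our own choosing. A breadth-first search finds the shortest interleaving in which a check-then-set lock lets both processes into the critical section, and confirms that an atomic test-and-set lock never does.

```python
from collections import deque

INITIAL = (("idle", "idle"), False)  # (program counter per process, lock held?)


def _with(pcs, i, value):
    return tuple(value if j == i else pc for j, pc in enumerate(pcs))


def racy_lock(state):
    """Check-then-set lock: observing the lock free and taking it are separate steps."""
    pcs, lock = state
    for i, pc in enumerate(pcs):
        if pc == "idle" and not lock:
            yield (_with(pcs, i, "ready"), lock)
        elif pc == "ready":
            yield (_with(pcs, i, "critical"), True)
        elif pc == "critical":
            yield (_with(pcs, i, "idle"), False)


def atomic_lock(state):
    """Test-and-set lock: observing and taking the lock happen in one step."""
    pcs, lock = state
    for i, pc in enumerate(pcs):
        if pc == "idle" and not lock:
            yield (_with(pcs, i, "critical"), True)
        elif pc == "critical":
            yield (_with(pcs, i, "idle"), False)


def mutual_exclusion(state):
    return state[0] != ("critical", "critical")


def check(initial, successors, invariant):
    """Breadth-first search of all reachable states; returns the shortest counterexample or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

The counterexample for `racy_lock` is four steps long: both processes observe the lock as free before either takes it. Testing rarely hits this kind of interleaving, but exhaustive exploration always finds it.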

We’ve been developing Allium, our own specification language at the practical end of this spectrum. Allium captures system behaviour in a structured, machine-readable format that AI agents can use to guide implementation and generate tests. It cannot prove properties hold across all possible states the way TLA+ or Alloy can, but it captures intent precisely enough to serve as contracts between humans and AI. The right level of formality depends on what is at stake.

Reviewing AI-generated formal specifications still requires enough understanding of the language to validate what the AI produced. AI lowers the authoring barrier but does not eliminate the need for comprehension.

See also: Spec-driven development, Neurosymbolic AI, Prolog

Assess

Emerging or specialised technologies that merit evaluation for specific use cases.

Prolog

Prolog sits in Assess due to its renewed relevance for neurosymbolic AI architectures. This decades-old logic programming language offers something LLMs fundamentally lack: guaranteed logical inference with explainable reasoning chains.

LLMs excel at understanding natural language but cannot reliably follow complex rules or explain why they reached a conclusion. Prolog does exactly this. By coupling an LLM with a Prolog reasoning engine, teams can build systems where the LLM handles ambiguous input and Prolog enforces business logic, validates conclusions or traverses knowledge graphs. Implementations typically use Prolog to represent domain rules that validate LLM outputs before they reach users. This pattern is particularly valuable in regulated industries where decisions must be auditable.
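The validation pattern can be illustrated with a small Python stand-in for a Prolog engine. The sketch below uses backward chaining over propositional Horn clauses, the same resolution strategy Prolog uses, minus variables and unification. An LLM-proposed decision is accepted only if the domain rules entail it, and the recorded trace doubles as the audit explanation. The loan rules and fact names are hypothetical, and the sketch assumes the rules contain no cycles.

```python
# Domain rules as propositional Horn clauses: head -> list of alternative bodies.
RULES = {
    "approve_loan": [["identity_verified", "affordable", "sanctions_clear"]],
    "affordable": [["income_verified", "debt_ratio_ok"]],
}


def prove(goal, facts, rules, trace):
    """Backward-chain from `goal`, recording each step in `trace` (assumes acyclic rules)."""
    if goal in facts:
        trace.append(f"{goal}: given fact")
        return True
    for body in rules.get(goal, []):
        if all(prove(subgoal, facts, rules, trace) for subgoal in body):
            trace.append(f"{goal} :- {', '.join(body)}")
            return True
    trace.append(f"{goal}: could not be proved")
    return False


def validate_decision(llm_decision, facts):
    """Accept an LLM-proposed decision only if the rules entail it; return (accepted, explanation)."""
    trace = []
    accepted = prove(llm_decision, facts, RULES, trace)
    return accepted, trace
```

Here the facts would come from deterministic checks on the application, not from the model, so the LLM can propose an outcome but cannot assert the premises that justify it.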

We’ve kept Prolog in Assess because the tooling ecosystem for LLM integration remains immature and performance can be challenging at scale. Teams should also consider whether semantic web technologies (RDF, OWL, SPARQL) might serve similar purposes with better tooling support.

See also: Neurosymbolic AI, Ontologies for AI grounding

JAX

JAX sits in our Assess ring as we observe increasing interest in this ML framework that combines NumPy’s familiar API with hardware acceleration and automatic differentiation. While TensorFlow and PyTorch remain dominant in the ML ecosystem, we’re seeing JAX gain traction particularly in research settings and among teams working on custom ML architectures.

JAX’s functional approach to ML computation and its ability to compile to multiple hardware targets through XLA (Accelerated Linear Algebra) set it apart from more established frameworks. It shows promise for projects requiring high-performance numerical computing, though teams should weigh its relative immaturity in deployment tooling and a smaller ecosystem of pre-built components. We recommend teams experimenting with JAX do so on research projects or contained proofs-of-concept before considering broader adoption.
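A short example shows what the functional approach looks like in practice. The sketch below assumes JAX is installed and uses its composable transformations on a least-squares loss: `jax.grad` turns a pure function into its gradient, `jax.jit` compiles it through XLA for whatever backend is available, and `jax.vmap` vectorises a per-example function over a batch. The data is a tiny made-up example chosen so the gradient can be checked by hand.

```python
import jax
import jax.numpy as jnp


def mse_loss(w, x, y):
    """Mean squared error of a linear model, written with the NumPy-style jnp API."""
    return jnp.mean((x @ w - y) ** 2)


# jax.grad returns a function computing d(loss)/dw; jax.jit compiles it with XLA.
grad_loss = jax.jit(jax.grad(mse_loss))


def squared_error(w, xi, yi):
    return (xi @ w - yi) ** 2


# jax.vmap maps the per-example gradient over the batch axis of x and y (w is shared).
per_example_grads = jax.vmap(jax.grad(squared_error), in_axes=(None, 0, 0))

x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
y = jnp.array([1.0, 2.0])
w = jnp.zeros(2)

g = grad_loss(w, x, y)           # analytic value at w = 0: -X^T y = [-7, -10]
pg = per_example_grads(w, x, y)  # one gradient row per example
```

Because functions are pure, a training step is simply `w = w - lr * grad_loss(w, x, y)` with no in-place mutation. That explicitness is powerful, but it is also part of the learning curve for teams coming from PyTorch.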

OpenAI AgentKit

OpenAI launched AgentKit at DevDay in October 2025, comprising Agent Builder for visual workflow design, ChatKit for embeddable interfaces, integrated evals and a Connector Registry for tool integration.

AgentKit sits in Assess because several factors warrant caution. The platform is only months old, with key components still in beta. More significantly, it represents a substantial commitment to the OpenAI ecosystem. Unlike provider-agnostic alternatives such as LangChain or Microsoft Agent Framework, teams adopting AgentKit tie their agent infrastructure to a single provider’s roadmap and pricing. Usage-based costs can become unpredictable as agentic workloads scale.

For organisations already invested in OpenAI’s platform, AgentKit offers a streamlined path from prototype to production. Teams requiring vendor flexibility should evaluate open alternatives first. The agent framework space remains competitive, and committing to a vendor-specific platform this early carries meaningful switching costs.

PydanticAI

PydanticAI brings the developer experience of FastAPI to generative AI application development. Built by the team behind Pydantic (which underpins OpenAI SDK, Anthropic SDK, LangChain and others), it offers model-agnostic support across major LLM providers, structured responses through Pydantic validation and a dependency injection system for testing.
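The two ideas in that description, validated structured output and injectable dependencies, can be sketched with the standard library alone. This is not PydanticAI's API. It uses a dataclass with `__post_init__` checks in place of a Pydantic model, and a hypothetical `Deps` object so tests can swap the real model call for a fake. The categories, the prompt wording and the single-retry policy are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SupportTicket:
    category: str
    priority: int

    def __post_init__(self):
        # Validation plays the role a Pydantic model would.
        if self.category not in {"billing", "technical", "other"}:
            raise ValueError(f"unknown category {self.category!r}")
        if not 1 <= self.priority <= 5:
            raise ValueError("priority must be between 1 and 5")


@dataclass
class Deps:
    # Injected dependency: production passes a real model client, tests pass a fake.
    call_model: Callable[[str], str]


def classify(message: str, deps: Deps, retries: int = 1) -> SupportTicket:
    """Ask the model for JSON, validate it, and re-prompt with the error if it is invalid."""
    prompt = f"Return JSON with 'category' and 'priority' (1-5) for: {message}"
    for _ in range(retries + 1):
        raw = deps.call_model(prompt)
        try:
            data = json.loads(raw)
            return SupportTicket(category=data["category"], priority=int(data["priority"]))
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as err:
            prompt += f"\nPrevious reply was invalid ({err}); return valid JSON only."
    raise ValueError("model did not return a valid SupportTicket")
```

In a test, a fake `call_model` can return a malformed reply first and a valid one second, which exercises the retry path without any network calls.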

PydanticAI uses existing Python patterns rather than introducing new paradigms, lowering the learning curve for teams already familiar with the ecosystem. As a relatively new framework, we’re placing it in Assess while watching for broader production adoption. Organisations with Python-based stacks should consider evaluating it.

Smolagents

smolagents takes a minimalist approach to agent development. With a core codebase under 1,000 lines, it prioritises simplicity over comprehensiveness. Early feedback suggests it works well for prototyping agentic concepts before transitioning to more robust frameworks such as Microsoft Agent Framework or LangGraph for production. The code-based agent approach, where agents execute actions as Python code snippets, reduces LLM calls but carries inherent security considerations.
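The security consideration is easiest to see by looking at what a host must do before running model-written code. The sketch below is a hypothetical pre-execution guard of our own, not smolagents' executor (smolagents provides its own restricted interpreter and sandbox options; check its documentation). It parses the generated code, rejects imports outside an allowlist and any dunder access, then executes with a reduced set of builtins. It reduces the attack surface but is not a security boundary, so production use still calls for a container or VM sandbox.

```python
import ast

ALLOWED_IMPORTS = {"math", "statistics"}  # hypothetical policy
SAFE_BUILTINS = {"len": len, "sum": sum, "min": min, "max": max,
                 "range": range, "round": round, "abs": abs}


class UnsafeCode(Exception):
    """Raised when generated code uses a construct outside the allowlist."""


def check_generated_code(source: str) -> ast.Module:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for module in modules:
            if module not in ALLOWED_IMPORTS:
                raise UnsafeCode(f"import of {module!r} is not allowed")
        # Block dunder access such as ().__class__.__bases__, a common escape route.
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise UnsafeCode(f"access to {node.attr!r} is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise UnsafeCode(f"use of {node.id!r} is not allowed")
    return tree


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    if name.split(".")[0] not in ALLOWED_IMPORTS:
        raise ImportError(f"import of {name!r} is not allowed")
    return __import__(name, globals, locals, fromlist, level)


def run_generated_code(source: str):
    """Check, then execute agent-generated code; returns the value bound to `result`."""
    tree = check_generated_code(source)
    namespace = {"__builtins__": {**SAFE_BUILTINS, "__import__": _guarded_import}}
    exec(compile(tree, "<agent>", "exec"), namespace)
    return namespace.get("result")
```

Static checks like these catch obvious misuse cheaply. However, determined escapes from in-process execution are well documented, which is why we treat code agents as needing careful evaluation.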

We’ve positioned smolagents in Assess because it lacks production validation and the security implications of code execution require careful evaluation. Teams exploring agent architectures should weigh its simplicity against its limitations for production-grade systems.

CrewAI

CrewAI provides a framework for creating teams of specialised AI agents that collaborate through coordinated effort. It offers a structured approach to defining agent roles and task delegation, with human-in-the-loop integration and the ability to combine agents with different capabilities for complex workflows.

While CrewAI has been used in production, it remains in Assess because the multi-agent paradigm itself is still evolving. Organisations need to evaluate whether managing multiple agents offers sufficient benefit over simpler approaches for their use cases. Best practices for agent collaboration are still emerging, and implementations may require considerable tuning.

DSPy

DSPy, developed at Stanford, treats prompts as optimisable programs rather than handcrafted text. Developers define signatures (input-output specifications) and modules (composable building blocks), and DSPy’s optimisers automatically generate effective prompts based on example data. The optimisation process can discover strategies that humans might not have considered.
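The underlying idea, compiling a signature into a prompt and searching over prompt variants against a metric, can be shown with a deliberately tiny toy. This is not DSPy's API, and DSPy's real optimisers (such as `BootstrapFewShot`) are far more sophisticated. The sketch brute-forces subsets of candidate demonstrations and keeps the one that scores best on a training set. `stub_lm` is a hypothetical deterministic model that only answers in the expected terse format when a demonstration shows it.

```python
import re
from itertools import combinations


def render_prompt(signature, demos, question):
    """Compile a signature plus chosen demonstrations into a prompt string."""
    input_field, output_field = [part.strip() for part in signature.split("->")]
    blocks = [f"Given the field `{input_field}`, produce the field `{output_field}`."]
    blocks += [f"{input_field}: {q}\n{output_field}: {a}" for q, a in demos]
    blocks.append(f"{input_field}: {question}\n{output_field}:")
    return "\n\n".join(blocks)


def stub_lm(prompt):
    """Hypothetical model: adds two numbers, answering tersely only if a demo shows that format."""
    question_line = prompt.split("\n\n")[-1].splitlines()[0]
    a, b = map(int, re.findall(r"-?\d+", question_line))
    terse = re.search(r"^answer: -?\d+$", prompt, re.MULTILINE) is not None
    return str(a + b) if terse else f"The answer is {a + b}."


def exact_match(prediction, gold):
    return prediction.strip() == gold


def optimise(signature, candidate_demos, trainset, lm, max_demos=2):
    """Brute-force search over demo subsets, keeping the one that scores best on the trainset."""
    best_demos, best_score = (), -1.0
    for k in range(max_demos + 1):
        for demos in combinations(candidate_demos, k):
            hits = sum(exact_match(lm(render_prompt(signature, demos, q)), gold)
                       for q, gold in trainset)
            score = hits / len(trainset)
            if score > best_score:
                best_demos, best_score = demos, score
    return best_demos, best_score


candidates = [("2 + 2", "The sum is 4"), ("1 + 1", "2")]
trainset = [("3 + 4", "7"), ("10 + 5", "15")]
demos, score = optimise("question -> answer", candidates, trainset, stub_lm)
```

The search settles on the single terse demonstration, a choice made by measuring the metric rather than by intuition. That is the shift DSPy asks teams to make, and it only pays off when there is a trustworthy metric and enough examples to optimise against.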

The framework shows particular promise for complex pipelines involving multiple LLM calls or retrieval steps. DSPy remains in Assess because the learning curve can be steep and teams should evaluate whether their use cases justify the investment. For simpler single-prompt applications, traditional approaches may remain more practical.

LinkML

LinkML allows teams to define data models in YAML and generate multiple outputs from a single schema: documentation, JSON Schema for validation, Python dataclasses for code and RDF/OWL for semantic web compatibility. This makes it valuable for phased ontology development where teams want to start practically but preserve the option for formalisation later.
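The single-source, many-outputs idea can be sketched in a few lines. The schema below is a hypothetical example written as the dict a YAML loader would return, using LinkML-style keys (`classes`, `attributes`, `range`, `required`, `identifier`). The two generators are toy stand-ins for LinkML's own tooling (such as `gen-json-schema`), which handles far more, including inlining rules for class-valued slots.

```python
# Hypothetical LinkML-style schema, as loaded from YAML.
SCHEMA = {
    "name": "people",
    "classes": {
        "Person": {
            "description": "A person in the knowledge graph.",
            "attributes": {
                "id": {"range": "string", "required": True, "identifier": True},
                "name": {"range": "string", "required": True},
                "age": {"range": "integer"},
                "employer": {"range": "Organisation"},
            },
        },
        "Organisation": {
            "attributes": {
                "id": {"range": "string", "required": True, "identifier": True},
                "name": {"range": "string"},
            },
        },
    },
}

TYPE_MAP = {"string": "string", "integer": "integer", "float": "number", "boolean": "boolean"}


def to_json_schema(schema, class_name):
    """Toy generator: one class -> a JSON Schema object usable for LLM structured output."""
    cls = schema["classes"][class_name]
    properties, required = {}, []
    for attr, spec in cls["attributes"].items():
        rng = spec.get("range", "string")
        if rng in TYPE_MAP:
            properties[attr] = {"type": TYPE_MAP[rng]}
        else:
            # Class-valued slot: referenced by identifier in this simplified sketch.
            properties[attr] = {"type": "string", "description": f"identifier of a {rng}"}
        if spec.get("required"):
            required.append(attr)
    return {"$schema": "https://json-schema.org/draft/2020-12/schema", "title": class_name,
            "type": "object", "properties": properties, "required": required,
            "additionalProperties": False}


def to_markdown(schema):
    """Toy generator: human-readable documentation from the same source."""
    lines = [f"# {schema['name']}"]
    for class_name, cls in schema["classes"].items():
        lines.append(f"\n## {class_name}\n")
        if "description" in cls:
            lines.append(cls["description"] + "\n")
        for attr, spec in cls["attributes"].items():
            flag = " (required)" if spec.get("required") else ""
            lines.append(f"- `{attr}`: {spec.get('range', 'string')}{flag}")
    return "\n".join(lines)
```

Because validation schemas and documentation are derived from one model, they cannot drift apart. The same model can later be rendered as OWL when a team is ready to formalise.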

The framework emerged from biomedical informatics but applies broadly. For AI applications, LinkML models can define entities and relationships for knowledge graphs and structured output schemas for LLMs. It remains in Assess because adoption is relatively niche. Organisations already committed to JSON Schema may find less incremental value, but for teams starting fresh on knowledge representation, LinkML offers a middle path between ad-hoc schemas and full OWL modelling.

Hold

Not recommended for new projects; better alternatives exist.

AutoGen

AutoGen is now in maintenance mode, receiving bug fixes only. In October 2025 Microsoft merged AutoGen and Semantic Kernel into the Microsoft Agent Framework (MAF), which consolidates both projects’ capabilities into a single platform. Semantic Kernel is likewise in maintenance mode. Teams currently using either framework should plan their migration to MAF; new projects should start there directly.

TensorFlow

We have placed TensorFlow in the Hold ring for several reasons. While TensorFlow remains a capable deep learning framework that helped popularise machine learning at scale, we’re seeing teams struggle with its steep learning curve and complex deployment story compared to more modern alternatives. The framework’s syntax and intricate architecture could act as headwinds for teams new to machine learning.

PyTorch has emerged as the clear community favourite for both research and production deployments, with arguably a more intuitive programming model and better debugging capabilities. For new projects we recommend exploring higher-level tools or PyTorch unless there are compelling reasons to use TensorFlow, such as maintaining existing deployments or specific requirements around TensorFlow Extended (TFX) for ML pipelines.

Keras

We have placed Keras in the Hold ring primarily due to its long period of tight integration with TensorFlow (Keras 3 has since reintroduced multi-backend support for JAX and PyTorch), along with the emergence of more modern alternatives that offer better developer experiences.

While Keras served as an excellent entry point for many developers into deep learning, providing an intuitive API that made neural networks more accessible, the deep learning ecosystem has evolved significantly. Frameworks such as PyTorch have gained substantial momentum, offering clearer debugging, better documentation and a more Pythonic approach. Additionally, PyTorch-based high-level frameworks such as Lightning and fastai provide similar ease-of-use benefits while maintaining closer alignment with current best practices in deep learning development. For new projects, we recommend exploring these alternatives rather than investing in Keras-specific expertise.

R

Despite R’s historical significance in data science and statistical computing, we’ve placed it in the Hold ring for new projects. While R remains capable for statistical analysis and data visualisation, we’re seeing its adoption declining in favour of Python’s more comprehensive ecosystem for machine learning and AI workflows.

The key factors driving this recommendation are the overwhelming industry preference for Python-based ML frameworks and the stronger integration of Python with modern AI platforms and tools. While R retains some advantages for specific statistical applications and academic research, we believe teams starting new AI initiatives will benefit from standardising on Python to maximise their access to cutting-edge AI libraries and tools.

OpenCL

We’ve placed OpenCL in the Hold ring of our Languages & Frameworks quadrant. While OpenCL (Open Computing Language) was groundbreaking when introduced as a standard for parallel programming across different types of processors, we believe teams should look to alternatives for new projects.

Despite its promise of write-once-run-anywhere code for GPUs, CPUs and other accelerators, OpenCL has seen declining industry support and faces significant challenges. Hardware vendors have concentrated on their own platforms, such as CUDA for NVIDIA hardware, and Apple has deprecated OpenCL in favour of Metal, while newer alternatives such as SYCL and modern GPU compute frameworks offer better developer experiences with similar cross-platform benefits. The complexity of the OpenCL programming model, combined with inconsistent tooling support and a fragmented ecosystem, makes it increasingly difficult to justify for new development compared to more actively maintained alternatives.
