Infrastructure and managed services that host and run AI workloads: cloud AI services, vector databases, model serving platforms and MLOps infrastructure.

Adopt

Established, well-supported services ready for production AI deployments.

Foundation models

Foundation model providers continue to evolve rapidly. Major players such as OpenAI, Anthropic, Google and Meta compete alongside emerging organisations including DeepSeek, Alibaba and IBM.

Providers differentiate across three tiers: smaller, faster models optimised for speed and cost; larger models that trade some speed for greater capability; and specialised reasoning models for complex problem-solving. The distinction between general and reasoning models is blurring, with GPT-5.4 and Claude Opus 4.6 integrating extended reasoning natively rather than through separate variants.

Foundation models warrant adoption for many business applications when paired with appropriate infrastructure (few-shot prompting, guardrails, RAG and evaluation frameworks). There is no universal “best model”. We recommend benchmarking against your specific use cases, considering factors beyond raw performance: pricing, reliability, data privacy requirements and deployment options. High-quality open weight models with permissive licensing provide additional options for organisations with specific security or deployment requirements.

Key considerations

Performance & capabilities (accuracy, speed, and domain-specific strengths)
Total cost of ownership (API costs, compute resources, and integration)
Deployment options & technical requirements (cloud, self-hosted, edge)
Data privacy & compliance (regulatory, legal, and security implications)
Integration & lifecycle management (context limitations, version control, updates)
Vendor stability & support (roadmap alignment, documentation, community)
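One way to make these considerations comparable is a simple weighted scorecard. The sketch below is illustrative only: the weights, the criterion names and the candidate scores are hypothetical placeholders, and the scores should come from your own benchmarks rather than from this example.

```python
# Hypothetical weights for the six considerations above; they sum to 1.0.
WEIGHTS = {
    "performance": 0.30,
    "cost": 0.20,
    "deployment": 0.10,
    "privacy": 0.20,
    "lifecycle": 0.10,
    "vendor": 0.10,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion scores (0 to 5) into one weighted score."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, round(weighted_score(s), 2)) for name, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder scores, not real measurements of any model.
candidates = {
    "model_a": {"performance": 5, "cost": 2, "deployment": 3,
                "privacy": 2, "lifecycle": 4, "vendor": 4},
    "model_b": {"performance": 4, "cost": 4, "deployment": 5,
                "privacy": 5, "lifecycle": 3, "vendor": 3},
}

for name, score in rank(candidates):
    print(name, score)
```

The weights encode organisational priorities, so a regulated business might raise the privacy weight while a prototype team raises cost. The ranking is only as good as the scores fed into it.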

Foundation model providers feature comparison (April 2026)

Provider	Open Weights	Enterprise Focus	Reasoning Models	Edge Deployment	Long Context	Embedding API	Agentic Workflows	Model Selection Link
Alibaba	✓			✓	✓	✓		Models
Anthropic		✓	✓		✓		✓	Models
AWS		✓	✓		✓			Models
Cohere	✓	✓	✓		✓	✓		Models
DeepSeek	✓		✓	✓				Models
Google		✓	✓		✓	✓	✓	Models
IBM	✓	✓	✓	✓	✓			Models
Meta	✓			✓	✓			Models
MiniMax	✓			✓	✓		✓	Models
Mistral AI	✓	✓	✓	✓		✓		Models
OpenAI	✓	✓	✓	✓	✓	✓	✓	Models
Stability AI	✓			✓				Models
X	✓		✓		✓		✓	Models
Zhipu AI	✓	✓	✓		✓			Models

Feature definitions

Open Weights: Models whose weights are publicly available for download and customisation
Enterprise Focus: Strong emphasis on governance, security, and enterprise integration
Reasoning Models: Specialised models for complex reasoning tasks such as mathematics or step-by-step problem solving
Edge Deployment: Optimised for deployment on edge devices or resource-constrained environments
Long Context: Support for context windows of 250K tokens or more
Embedding API: Dedicated text embedding models and APIs for generating vector representations of text for semantic search and similarity tasks
Agentic Workflows: Ability to autonomously plan and execute multi-step tasks using tools and external services. Goes beyond basic function calling to include complex workflow orchestration, error handling, dynamic planning based on intermediate results, and completing entire business processes without human intervention at each step

Weights & Biases

CoreWeave acquired Weights & Biases in May 2025. The platform continues under new ownership and our recommendation stands, though teams should monitor how the product evolves.

Weights & Biases tracks and visualises machine learning experiments. It provides a robust solution for managing ML workflows, particularly with complex models and large datasets. By keeping experiment tracking lightweight, requiring just a few lines of code, it removes the friction that prevents teams from maintaining good measurement practices.

Collaboration features such as shared dashboards and reports make results visible to the whole team. Rather than knowledge being siloed in individual notebooks, experiments become shared assets. This visibility leads to faster knowledge sharing and quicker iteration cycles. However, tool adoption alone isn’t enough; teams need to actively foster a culture that values measurement and experimentation for these benefits to materialise.

Temporal

Temporal is a workflow orchestration platform that provides durable execution for long-running processes. Although not AI-specific, it has become increasingly relevant as organisations build production agentic systems that must survive failures and run reliably over extended periods.

The core value is durability. If a multi-step process fails halfway through, it resumes from exactly where it left off. This differs from Kubernetes, which restarts crashed containers but knows nothing about application state. Temporal remembers that your workflow was on step 5 of 10, waiting for human approval, with specific context variables intact. The two are complementary, each handling failures at its respective layer.
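The idea of resuming from where a process left off can be illustrated with a toy checkpointing runner. This is a conceptual sketch only: it is not Temporal's API and not how Temporal is implemented, and the names `run_durably`, `Step` and the JSON checkpoint file are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# A step takes the current state and returns the updated state.
Step = Callable[[dict], dict]


def run_durably(steps: list[Step], checkpoint: Path) -> dict:
    """Run steps in order, saving progress to disk after each one.

    If a previous run crashed, completed steps are skipped and the
    saved state is restored, so work resumes at the failed step.
    """
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"next_step": 0, "state": {}}

    for index in range(saved["next_step"], len(steps)):
        saved["state"] = steps[index](saved["state"])
        saved["next_step"] = index + 1
        checkpoint.write_text(json.dumps(saved))  # persist before moving on
    return saved["state"]
```

A restarted container alone would begin again at step one. Here the second invocation picks up at the step that failed, with earlier results intact.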

Temporal’s programming model treats workflows as ordinary code in familiar languages (Go, Java, Python, TypeScript, .NET) rather than configuration or visual diagrams. For agentic systems specifically, it addresses failure modes that many agent frameworks ignore: LLM calls timing out, tool invocations needing retry with backoff, human approvals taking days. Built-in retry policies and the ability to pause workflows indefinitely handle these cleanly.
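Retry with backoff, one of the failure-handling patterns mentioned above, can be sketched in plain Python. This is a generic illustration rather than Temporal's retry policy API. The function name `retry_with_backoff` and its parameters are hypothetical, and the injectable `sleep` argument exists only to make the sketch testable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    initial_delay: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, waiting longer after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff  # exponential growth: 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

A production version would usually retry only on specific exception types, cap the maximum delay and add jitter. A durable execution platform would also persist the attempt count so retries survive a process restart.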

We recommend Temporal for organisations moving beyond prototype agents toward production deployments where reliability matters. For simpler use cases, lighter-weight alternatives may suffice.

Data pipeline orchestration tools

Data pipeline orchestration has become essential infrastructure for managing complex data workflows, particularly those supporting AI initiatives. Whilst transformation tools such as dbt handle the “what” of data processing, orchestration platforms manage the scheduling, execution and monitoring of entire pipelines.

Apache Airflow is the established standard, with broad integration support across cloud platforms, though teams often find the learning curve steep. Prefect emphasises developer experience and dynamic workflow adaptation, with faster development cycles but fewer third-party integrations. Dagster takes an asset-centric approach where data assets become first-class citizens, providing built-in lineage tracking and data quality monitoring.

The choice depends on organisational context. Established enterprises with diverse toolchains often gravitate towards Airflow’s ecosystem breadth, teams prioritising developer velocity prefer Prefect, and organisations with complex lineage requirements consider Dagster’s asset-aware approach.
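Whatever the tool, the underlying model is a dependency graph: the orchestrator runs each step only after its inputs exist, and lineage is the set of upstream dependencies. The sketch below shows that idea with Python's standard library. It does not use the Airflow, Prefect or Dagster APIs, and the asset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical data assets: each key depends on the assets in its set.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "customer_features": {"raw_customers", "clean_orders"},
    "training_set": {"customer_features"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every asset follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def upstream(graph: dict[str, set[str]], asset: str) -> set[str]:
    """Lineage: every asset the given asset depends on, directly or not."""
    seen: set[str] = set()
    stack = list(graph.get(asset, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


print(execution_order(dependencies))
print(sorted(upstream(dependencies, "training_set")))
```

Real orchestrators add scheduling, retries, parallel execution and monitoring on top of this ordering, which is where the platforms differ.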

Cloud model hosting platforms

Model hosting has evolved beyond simple API access, with distinct platforms serving different needs from prototyping to enterprise production. Cloud-based hosting has become the default for most AI deployments.

Enterprise production environments typically use AWS Bedrock, Google Vertex AI or Azure OpenAI Service, which provide fine-tuning capabilities with enterprise security and integration with existing cloud infrastructure. For performance-critical applications, specialised providers such as Fireworks AI and Together AI focus on inference optimisation and custom model deployment, though teams must weigh simplified deployment against reduced ecosystem integration.

The inference hardware market is shifting beneath these platforms. Groq’s custom LPU chips and Cerebras’s wafer-scale processors have attracted major investment. For most organisations, these developments improve existing hosting providers’ offerings rather than requiring a direct relationship with the chip makers.

Development teams and startups often prefer Replicate, Modal or Hugging Face Inference Endpoints, which offer direct paths from trained model to production API with flexible pricing. Hugging Face supports deployment of 60,000+ models with minimal configuration. The trade-off is more limited enterprise governance.

The choice reflects organisational priorities: enterprises with compliance requirements gravitate towards major cloud providers, performance-focused teams benefit from specialised inference platforms, and development teams prioritising rapid iteration prefer simplified deployment.

Trial

Platforms with growing adoption that offer innovative approaches worth exploring for specific use cases.

Production AI monitoring platforms

Whilst experiment tracking tools such as Weights & Biases and MLflow excel at managing the development lifecycle, a distinct category has emerged to monitor AI systems in production. These tools detect drift and unexpected behaviour in deployed models that only surface when models encounter real-world data at scale.

Arize AI provides unified observability across traditional ML and LLM applications, continuously tracking feature and embedding drift. Evidently AI offers both an open-source library and cloud platform, with over 100 metrics covering data quality and drift monitoring.
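To make drift concrete, the sketch below computes the Population Stability Index (PSI), one common drift statistic, for a single numeric feature. It is a generic illustration, not the method or API of Arize AI or Evidently AI. The function name, the bin count and the sample data are assumptions chosen for the example.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample.

    Bin edges are taken from the reference sample. Values outside that
    range are counted in the first or last bin.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic data: the "production" sample is the training sample shifted by 0.5.
training = [i / 100 for i in range(100)]
production = [v + 0.5 for v in training]
print(round(psi(training, production), 2))
```

A widely quoted rule of thumb treats a PSI above roughly 0.25 as significant drift, but thresholds should be tuned per feature. Monitoring platforms apply statistics like this continuously across many features and embeddings.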

The key benefit is proactive detection: organisations learn about performance degradation before customer impact rather than discovering issues through support tickets. For teams already practising observability, AI-specific monitoring represents a natural extension of existing practices.

Open weight LLMs

Open weight LLMs (sometimes incorrectly called “open source”) reached maturity in 2025, with some surpassing frontier models on specific tasks. MiniMax M2.7, Moonshot’s Kimi K2.5, Zhipu’s GLM-5 and DeepSeek V3.2 compete directly with closed models on coding and reasoning benchmarks. Smaller models such as Microsoft’s Phi-4 and Google’s Gemma 4 run on consumer hardware, making self-hosted inference practical for local coding assistants, on-device processing and latency-sensitive applications.

This quarter, source code was accidentally exposed through AI provider tooling, and autonomous agents with broad permissions destroyed production data. A self-hosted deployment puts data flow and model versioning under the operator’s control. Prompts, outputs and proprietary code stay within the organisation, and the model only changes when the operator changes it.

Regulation is heading the same way. The EU AI Act, financial services rules and healthcare frameworks all want organisations to show where data flowed and how models behaved, rather than take a vendor’s word for it. GLM-5 was trained entirely on Huawei Ascend chips with no NVIDIA dependency, a reminder that supply chain sovereignty extends down to the silicon.

Self-hosting still needs considerable ML engineering expertise, and total cost of ownership isn’t always lower than API alternatives once compute and engineering time are factored in. For routine work, pay-per-use APIs still win. For regulated or sensitive data, the calculation has changed.
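The cost comparison can be framed as a break-even calculation. The sketch below is deliberately simplified. It assumes a fixed monthly self-hosting cost and a flat per-token API price, and every figure in it is a hypothetical placeholder rather than a real price.

```python
def api_monthly_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Pay-per-use cost: scales linearly with token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def self_hosted_monthly_cost(compute: float, engineering: float) -> float:
    """Self-hosting cost, treated here as fixed regardless of volume."""
    return compute + engineering


def break_even_tokens(compute: float, engineering: float,
                      price_per_million_tokens: float) -> float:
    """Monthly token volume above which self-hosting becomes cheaper."""
    fixed = self_hosted_monthly_cost(compute, engineering)
    return fixed / price_per_million_tokens * 1_000_000


# Hypothetical inputs: 6,000 compute + 4,000 engineering per month,
# against an API priced at 5.00 per million tokens.
print(break_even_tokens(6_000, 4_000, 5.0))
```

Real comparisons are less tidy: self-hosted capacity comes in steps, utilisation is rarely 100 per cent, and API pricing differs for input and output tokens. Below the break-even volume, pay-per-use remains cheaper.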

AI-powered workflow automation platforms

Visual workflow automation platforms allow teams to build AI-powered business processes through drag-and-drop interfaces rather than code. Zapier focuses on connecting SaaS applications with AI capabilities. n8n offers self-hosting, open-source licensing and extensive customisation for technical teams. Microsoft Power Automate provides native Office 365 integration with enterprise governance. Make.com emphasises sophisticated visual workflow design with AI agent functionality.

Common use cases include lead qualification using LLM analysis, automated content generation, customer support routing and data processing pipelines incorporating AI models.

When evaluating, consider technical capability, data sovereignty requirements and scalability. Self-hosted solutions such as n8n offer maximum control but require expertise, while SaaS offerings reduce operational overhead but can become costly at scale. Teams should also assess error recovery and debugging capabilities, as AI components can fail less predictably than traditional integrations.

Digital twin platforms

A digital twin is a virtual representation of a physical system that maintains bidirectional synchronisation with its real-world counterpart. Unlike traditional simulation, digital twins continuously ingest live sensor data, enabling organisations to test changes in simulation before deploying them, diagnose problems without physical inspection and simulate the impact of new equipment on existing workflows.

NVIDIA Omniverse has emerged as the dominant platform, providing a simulation environment built on OpenUSD that enables physically accurate rendering and real-time collaboration. Its Isaac Sim extension targets robotics simulation specifically.

Digital twin platforms remain in Trial because successful deployment requires significant organisational investment beyond the platform itself. The challenge lies in data integration, maintaining synchronisation between physical and virtual systems and building capability to act on simulation insights. These are most relevant for organisations operating complex physical infrastructure: manufacturing plants, logistics networks, energy systems or robotics deployments. Organisations earlier in their data maturity journey should ensure foundational sensor instrumentation and data pipelines are in place before investing.

Assess

Emerging or specialised services that require careful evaluation before adoption.

Galileo

Galileo takes a distinctive approach to AI evaluation: rather than relying on expensive frontier LLMs as judges, it uses purpose-built small language models called Luna-2 for low-latency evaluation of hallucination detection, context adherence and output quality. This matters in production, where evaluation metrics need to run at serving latency to act as quality gates.

For agent debugging, Galileo provides Timeline, Conversation and Graph views for tracing execution paths. Cloud and on-prem deployment options make it viable for regulated industries with data residency requirements.

The Luna-2 approach is a differentiator. Most evaluation platforms use frontier models as judges, which is slow, expensive and creates a circular dependency on the providers you’re evaluating.

The trade-off is vendor dependency. Teams wanting open-source flexibility should consider Langfuse for tracing or Phoenix for observability. The evaluation tooling space remains competitive, and we want to see how it settles before moving Galileo beyond Assess.

See also: LLM observability tools, Production AI monitoring platforms

Kubeflow

Kubeflow is an open-source ML platform built on Kubernetes, combining orchestration capabilities with ML-specific tools: Pipelines for workflow automation and KServe (formerly KFServing) for model deployment. This integrated approach helps bridge the gap between data scientists and operations teams.

Several factors keep Kubeflow in Assess. Implementing it demands expertise in both Kubernetes and ML engineering. Many organisations struggle with complexity during setup and ongoing maintenance, reporting a steep learning curve before seeing tangible benefits.

Organisations with established ML practices and Kubernetes expertise should consider it, particularly for challenges around model deployment, experiment reproducibility or resource utilisation. Smaller teams or those earlier in their ML journey may prefer managed options such as Vertex AI Pipelines.

Process mining platforms

You cannot reliably automate processes that have not yet been optimised. Most of a process flows as expected, but the exceptions and edge cases determine whether automation succeeds or fails. As agentic AI moves toward enterprise deployment, process mining is becoming essential preparation: it uses event logs from enterprise systems to discover how processes actually execute versus the designed workflow. The related discipline of task mining records user activity at the desktop level, capturing the tacit knowledge of how workers handle exceptions and workarounds. Together, these capabilities map the reality that AI agents would need to replicate.
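The core of process discovery can be sketched in a few lines: group events by case, order them by time, then count which activity directly follows which. The example below is a minimal illustration with a hypothetical event log. Commercial platforms work with far richer logs and algorithms.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, timestamp).
event_log = [
    ("c1", "receive", 1), ("c1", "approve", 2), ("c1", "pay", 3),
    ("c2", "receive", 1), ("c2", "approve", 3), ("c2", "pay", 4),
    ("c3", "receive", 2), ("c3", "reject", 3), ("c3", "receive", 5),
    ("c3", "approve", 6), ("c3", "pay", 7),
]


def traces(log):
    """Group events by case and return each case's activities in time order."""
    by_case = defaultdict(list)
    for case_id, activity, _ts in sorted(log, key=lambda e: (e[0], e[2])):
        by_case[case_id].append(activity)
    return dict(by_case)


def directly_follows(log):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for activities in traces(log).values():
        counts.update(zip(activities, activities[1:]))
    return counts


def variants(log):
    """Count distinct end-to-end paths; rare variants are the exceptions."""
    return Counter(tuple(t) for t in traces(log).values())


print(directly_follows(event_log).most_common())
print(variants(event_log).most_common())
```

In this toy log, two cases follow the designed path and one loops through a rejection. That third variant is the kind of exception an automated agent would have to handle.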

The major platforms serve different organisational profiles. Celonis, the market leader, delivers the deepest analytics for complex multi-system processes but requires significant investment. ABBYY Timeline offers integrated process and task mining in a more accessible package, suited to business users who want to identify bottlenecks without coding. QPR ProcessAnalyzer, available in the Snowflake Marketplace, is the natural choice for Snowflake-centric companies. UiPath Process Mining provides seamless integration between discovery and automation execution, though this creates ecosystem lock-in. Microsoft Power Automate Process Mining is limited compared to specialist platforms but lowers the barrier for organisations already on the Microsoft stack. Fluxicon Disco is a standalone desktop application best suited for consultants and rapid proof-of-concept projects.

We recommend starting with process discovery in a contained domain rather than attempting an enterprise-wide rollout.

Taalas

Taalas hard-wires AI models directly into custom silicon (ASICs). Rather than running models on general-purpose GPUs, Taalas “prints” a specific model onto a chip, unifying storage and compute at DRAM-level density. The result is a fixed-function chip that holds one model and cannot be rewritten. Their first product, the HC1, integrates Meta’s Llama 3.1 8B. Taalas claims 16,000 tokens per second per user, 10x the throughput of a GPU, at 20x lower production cost.

The trade-off is radical inflexibility. Each chip runs exactly one model. When that model is superseded, the chip becomes electronic waste. This makes Taalas most compelling for scenarios where a specific model will be deployed at scale for an extended period: edge inference, embedded systems or dedicated infrastructure for a known workload. For regulated industries, a model baked into silicon is inherently versioned and immutable, simplifying reproducibility concerns. Teams subject to Model Risk Management requirements may find this appealing.

We’ve placed Taalas in Assess because the technology is early and the economics only work at significant scale. The assumption that a single model will remain useful long enough to justify dedicated silicon runs counter to how fast the field moves.

AI governance platforms

The EU AI Act reaches full enforcement in August 2026. High-risk AI systems must demonstrate conformity assessments, risk documentation, human oversight and ongoing monitoring. Banks and insurers already operate under model risk management frameworks (SR 11-7 in the US, SS1/23 in the UK), but these were written for statistical models with stable inputs and deterministic outputs. Extending them to LLM-based systems requires tooling that existing GRC platforms were not designed to provide.

Credo AI has emerged as the leading dedicated platform, providing AI risk classification aligned to the EU AI Act, NIST AI RMF and ISO 42001. It powers the compliance accelerators in IBM watsonx.governance through an OEM partnership, and Microsoft is integrating it with Azure AI Foundry. Others compose governance workflows from existing tools: model cards in Hugging Face, risk registers in GRC platforms, audit trails from observability tooling. Dedicated platforms offer structured workflows out of the box, while bespoke approaches require more engineering but can match how your organisation already works.

The space is young and consolidating fast. Financial services organisations should be evaluating now, particularly those with EU AI Act obligations, but should expect the vendor landscape to shift.

Agent memory architectures

Stateless agents hit a wall quickly. An AI coding assistant that forgets what it learned yesterday, or a customer service agent that can’t recall a conversation from last week, forces users to re-establish context on every interaction. Agent memory architectures address this with tiered storage: working context for the current session, long-term memory for persistent knowledge and episodic recall for past interactions.

Mem0 has gained the most traction, serving as the exclusive memory provider for the AWS Agent SDK. It sits alongside your agent framework, automatically extracting and retrieving memories without changes to orchestration logic. Letta (formerly MemGPT) pioneered treating context management like virtual memory, paging information in and out as needed. Zep differentiates on temporal accuracy, maintaining a knowledge graph that tracks how facts change over time.

For regulated industries, agent memory introduces governance questions your existing data policies may not cover. Where does the memory reside? How do you audit what an agent “knows” about a customer? How do you comply with deletion requests? Involve compliance and data protection teams early.
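The three tiers described above, and the deletion question, can be made concrete with a toy in-memory model. This is a conceptual sketch, not the API of Mem0, Letta or Zep. The class and method names are hypothetical, and a real system would use persistent, access-controlled storage.

```python
from collections import deque


class AgentMemory:
    """Toy three-tier memory: working context, long-term facts, episodes."""

    def __init__(self, working_limit: int = 4) -> None:
        # Working context: only the most recent messages of this session.
        self.working: deque[str] = deque(maxlen=working_limit)
        # Long-term memory: persistent facts, keyed by user.
        self.long_term: dict[str, dict[str, str]] = {}
        # Episodic recall: summaries of past sessions as (user_id, summary).
        self.episodes: list[tuple[str, str]] = []

    def observe(self, message: str) -> None:
        self.working.append(message)

    def remember(self, user_id: str, key: str, value: str) -> None:
        self.long_term.setdefault(user_id, {})[key] = value

    def end_session(self, user_id: str, summary: str) -> None:
        self.episodes.append((user_id, summary))
        self.working.clear()

    def recall(self, user_id: str) -> dict:
        """Everything the agent 'knows' about a user, for prompts or audits."""
        return {
            "facts": dict(self.long_term.get(user_id, {})),
            "episodes": [s for u, s in self.episodes if u == user_id],
            "working": list(self.working),
        }

    def forget_user(self, user_id: str) -> None:
        """Handle a deletion request across the persistent tiers."""
        self.long_term.pop(user_id, None)
        self.episodes = [(u, s) for u, s in self.episodes if u != user_id]
```

Keeping memories keyed by user makes both the audit question (`recall`) and the deletion question (`forget_user`) answerable. Memory stored as embeddings or folded into summaries is much harder to remove cleanly, which is why compliance teams need to see the design early.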

Hold

Not recommended for new projects; better alternatives exist.

Building against vendor-specific APIs

Tightly coupling applications to vendor-specific LLM APIs poses significant risk in a market where capabilities and pricing shift monthly. Organisations that build directly against OpenAI, Anthropic or other proprietary APIs often find themselves locked in, facing painful migrations when a better or more cost-effective model emerges.

We recommend abstraction libraries that provide a common interface to multiple providers. Libraries such as AISuite or Simon Willison’s LLM CLI let you switch between models with minimal code changes, handling the nuances of different APIs behind a consistent interface. These abstractions add some complexity and may limit access to vendor-specific features, but the protection against lock-in outweighs these drawbacks in most cases.
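The underlying pattern is small enough to sketch directly: application code depends on one narrow interface, and each vendor sits behind an adapter. The example below is a hand-rolled illustration, not the API of AISuite or the LLM CLI. All names in it are hypothetical, and `FakeProvider` stands in for real vendor adapters.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """The only interface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Stand-in adapter; a real one would wrap a specific vendor's SDK."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRegistry:
    """Maps configuration names to provider adapters."""

    def __init__(self) -> None:
        self._providers: dict[str, ChatProvider] = {}

    def register(self, name: str, provider: ChatProvider) -> None:
        self._providers[name] = provider

    def get(self, name: str) -> ChatProvider:
        if name not in self._providers:
            raise LookupError(f"no provider registered as {name!r}")
        return self._providers[name]


def summarise(text: str, provider: ChatProvider) -> str:
    """Application logic: knows nothing about any particular vendor."""
    return provider.complete(f"Summarise: {text}")


registry = ModelRegistry()
registry.register("vendor_a", FakeProvider("vendor_a"))
registry.register("vendor_b", FakeProvider("vendor_b"))

# Switching vendors is a configuration change, not a code change.
print(summarise("quarterly report", registry.get("vendor_a")))
print(summarise("quarterly report", registry.get("vendor_b")))
```

The cost of this design is that vendor-specific features must either be added to the shared interface or reached through an escape hatch.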
