The LinkedIn Generative AI Application Tech Stack: Extending to Build AI Agents
Last year, we unveiled LinkedIn’s generative AI application tech stack, which enabled us to build GenAI applications at scale. At the time, we touched on developing agents for members and customers and had just launched Hiring Assistant, LinkedIn’s first AI agent for recruiters, to our charter customers. Since then, we’ve been focused on improving the Hiring Assistant experience, and we’re excited to make it globally available in English to customers by the end of September.
To scale to this level and improve the value of our agent, we relied on strategic tech stack updates. This post will cover extensions to the GenAI application platform that have enabled us to build AI agents that think, plan, and act in collaboration with users, marking the next phase in LinkedIn’s GenAI evolution. These foundational changes ultimately help us build products that are proactive, personalized, and adaptive for our members and customers.
We’ll share details on how we’ve thought about and adapted to building agentic experiences with key learnings on:
Reusing existing infrastructure and providing strong developer abstractions are key to scaling complex AI systems efficiently.
Designing for human-in-the-loop control ensures trust and safety while enabling agents to operate autonomously when appropriate.
Observability and context engineering have become essential for debugging, continuous improvement, and delivering adaptive, personalized experiences.
Finally, adopting open protocols is critical to enabling interoperability and avoiding fragmentation as agent ecosystems grow.
We’re all learning as an engineering community as we build tech stacks, platforms and frameworks to help our businesses grow and scale. That’s why it’s important to continue to share lessons and insights to help others as we all build better and more valuable AI products.
Anatomy of a GenAI agent
A GenAI agent is an autonomous or semi-autonomous system powered by large language models (LLMs), designed to perform complex, long-running tasks (often in dynamic environments) by reasoning, planning, and interacting with systems and humans. Unlike traditional software, GenAI agents don't just follow hardcoded logic; they use LLMs to interpret goals, make decisions, adapt to changing conditions, and even break tasks into subtasks before solving them.
Given the cognitive limitations of current LLMs and the need to keep human operators in control, agents operate with a “human in the loop” (HITL) approach: they seek clarification, gather feedback, or request approvals at key decision points, balancing autonomy with control. By combining LLM reasoning with task memory, tools, and human collaboration, GenAI agents mark a shift from one-off prompts to continuous, contextual AI workflows.
Figure 1: Illustration of a typical agent reasoning loop control flow.
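To make the loop in Figure 1 concrete, here is a minimal Python sketch of such a reasoning loop with HITL checkpoints. All names (plan_next_step, needs_approval, ask_human, and so on) are illustrative stand-ins rather than actual platform APIs.

```python
# A minimal sketch of an agent reasoning loop with human-in-the-loop (HITL)
# checkpoints. All names are illustrative stand-ins, not actual platform APIs.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    memory: list = field(default_factory=list)  # task memory across steps
    done: bool = False

def run_agent(state: AgentState, llm, tools, ask_human):
    while not state.done:
        # 1. Reason/plan: the LLM proposes the next step from goal + memory.
        step = llm.plan_next_step(state.goal, state.memory)

        # 2. HITL: pause for clarification or approval at key decision points.
        if step.needs_approval and not ask_human(step.description):
            state.memory.append(("rejected", step.description))
            continue

        # 3. Act: invoke a tool and record the observation in task memory.
        observation = tools[step.tool](**step.args)
        state.memory.append((step.tool, observation))

        # 4. Reflect: let the LLM decide whether the goal has been met.
        state.done = llm.is_goal_met(state.goal, state.memory)
    return state.memory
```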
For most use cases, the agent is not a single monolithic application, but rather a facade over multiple cooperating agentic applications. This approach offers the following key advantages:
Modularity: Each agent handles a specific task, making the system easier to build, test, and extend.
Scalability: Tasks can run in parallel, improving performance for long-running or complex workflows.
Resilience: Failures in one agent don’t crash the whole system—errors can be retried or isolated.
Flexibility: New capabilities can be added by plugging in new agents, enabling rapid iteration.
However, for this to work at scale, we need a way for application developers to “define” agents and an intelligent orchestrator that manages task flow, dependencies, inter-agent communication, and HITL steps.
Defining and registering agents within our systems
As mentioned, one of the key enablers for crafting and scaling agent-based systems is defining agents in a way that is accessible to the system and meaningful to application developers.
Since agents run online in production, we decided to reuse the same mechanism we use for service-to-service communication at LinkedIn, gRPC, and came up with a standard gRPC service schema definition. Developers simply annotate this definition with platform-defined proto3 options that describe their agent’s metadata, and register it via a build plugin into the skill registry (a central service that tracks available agents, their metadata, and how to invoke them), as described in our previous blog post.
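The exact proto3 options and registry schema are internal, but the shape of the idea can be sketched in Python: an agent advertises its skills as metadata plus a gRPC endpoint, and a build- or startup-time hook publishes that metadata to the skill registry. Everything below (AgentSkill, registry_client.upsert_skill) is hypothetical.

```python
# A hypothetical sketch of an agent/skill definition and its registration.
# The real mechanism uses proto3 options on a gRPC service definition plus a
# build plugin; none of the names below are actual APIs.
from dataclasses import dataclass

@dataclass
class AgentSkill:
    name: str           # e.g. "hiring_assistant.source_candidates"
    description: str    # natural-language description an orchestrator can reason over
    input_schema: dict  # JSON-schema-like description of the request payload
    grpc_service: str   # fully qualified gRPC service implementing the skill
    grpc_method: str

def register(skill: AgentSkill, registry_client) -> None:
    """Publish the skill to the central skill registry so orchestrators and
    other agents can discover and invoke it."""
    registry_client.upsert_skill(
        name=skill.name,
        description=skill.description,
        input_schema=skill.input_schema,
        endpoint=f"{skill.grpc_service}/{skill.grpc_method}",
    )
```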
Laying the groundwork for multi-agent orchestration
Having a single monolithic agent application isn’t tenable for many of the complex tasks and workflows that our members and customers engage in across LinkedIn. By defining our agents as described above, we can lay the groundwork to orchestrate multiple agents, increasing their capability and effectiveness in tackling a task.
To run multi-agent systems at scale, we need to fall back on classic distributed systems techniques: horizontal scaling, with compute deployed across multiple regions for better scalability, performance, and availability. This in turn means that all the classic distributed systems problems of consistency, availability, and partition tolerance return with an additional twist, namely a highly non-deterministic workload given the nature of GenAI agents.
Rather than solve this from scratch, we looked to our existing production systems for inspiration and realized that our messaging system best emulated the characteristics we wanted in a multi-agent orchestrator. Long-lived tasks could be broken down into a sequence of messages with guaranteed first-in, first-out (FIFO) delivery and seamless message history lookup. Parallelization at scale could be handled via multiple message threads. We could also piggyback on existing resilience constructs built into the messaging system for persistent retries and eventual delivery in the face of issues like host outages or cross-region traffic shifts. Last but not least, we could build upon the existing end-user delivery and synchronization mechanisms used by LinkedIn messaging.
The only missing piece was an adapter between messaging and the gRPC contract that agents expose, so that the messy details of messaging stay hidden from application developers. We did this by building libraries that abstract everything behind async calls to a central agent lifecycle service, which performs the necessary messaging-to-RPC adaptation by invoking the messaging platform to create, update, and retrieve messages and by calling the destination agent’s RPC endpoints.
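Conceptually, the adapter looks something like the following sketch: application code makes one async call, and a lifecycle client translates it into messaging operations plus a gRPC invocation of the destination agent. All class and method names here are hypothetical.

```python
# Illustrative sketch of the messaging <-> RPC adapter idea. All names are
# hypothetical stand-ins, not LinkedIn APIs.
class AgentLifecycleClient:
    def __init__(self, messaging, grpc_stubs, skill_registry):
        self.messaging = messaging          # messaging platform client
        self.grpc_stubs = grpc_stubs        # per-endpoint gRPC stubs
        self.skill_registry = skill_registry

    async def invoke(self, conversation_id: str, agent: str, payload: dict) -> str:
        # 1. Persist the request as a message: durable, FIFO, retriable.
        msg_id = await self.messaging.create_message(
            conversation_id, sender="user", body=payload)

        # 2. Resolve the destination agent's endpoint from the skill registry.
        endpoint = self.skill_registry.lookup(agent)

        # 3. Invoke the agent's gRPC endpoint; the agent replies asynchronously
        #    by appending messages to the same conversation thread.
        await self.grpc_stubs[endpoint].submit_task(message_id=msg_id, payload=payload)
        return msg_id
```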
The following diagram shows how these components of the platform relate to each other. Of note, the skill registry helps agents identify which gRPC calls can be made through the agent lifecycle service, and experiential memory allows agents to remember facts, preferences, and other things they have learned. The core of all agent interactions, however, happens through the messaging platform, which lets us leverage both the messaging technology and our existing operational expertise.
Figure 2: Diagram showing the relationship between different agent platform components
Designing the inter-agent and agent-user communication protocol in this way provides all the benefits of an existing, proven system. It allows agents to respond in a single chunk, incrementally within a single synchronous response (synchronous streaming), or split across multiple asynchronous messages (asynchronous streaming), letting us model a wide range of execution modalities.
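As a rough illustration (with hypothetical thread and message APIs), the three modalities might look like this from an agent author’s point of view:

```python
# Sketch of the three response modalities; the thread/message APIs are
# hypothetical stand-ins, not LinkedIn APIs.

# 1. Single chunk: one complete message.
async def reply_single(thread, text: str):
    await thread.send_message(text)

# 2. Synchronous streaming: incremental chunks within one response.
async def reply_sync_stream(thread, llm_stream):
    async with thread.start_message() as msg:
        async for token in llm_stream:
            await msg.append(token)  # the client renders tokens as they arrive

# 3. Asynchronous streaming: several independent messages over time,
#    e.g. progress updates for a long-running task.
async def reply_async_stream(thread, steps):
    for step in steps:
        result = await step.run()
        await thread.send_message(f"Finished: {step.name}\n{result}")
```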
Applying thoughtful client integrations to support better user experiences
A member or customer’s interactions with an agent can instantly shape its perceived value and usefulness. This makes engineering client interactions that are seamless and helpful a key consideration in agent systems.
At LinkedIn, our members interact with agents via our web and mobile applications. Since agent interactions can be asynchronous, long-running tasks that span beyond a single user session with an application, it was important to build out a client library to handle the following:
Server-to-client push to enable agents to notify members when they complete long-running tasks.
Cross-device state synchronization to maintain consistent application state across multiple devices.
Incremental streaming to optimize the delivery of large, latency-prone LLM responses.
Error handling and fallbacks to ensure that agents remain operable even when the client environment presents obstacles.
A companion library living in the frontend API server complements the client libraries by offering endpoints for message communication, response retrieval, and authorization, simplifying client integration with agents.
Tackling observability for agents
Observability in traditional distributed systems is already complex, requiring careful instrumentation, correlation of signals across services, and robust alerting mechanisms. For GenAI agents, it's significantly harder. Unlike conventional software that follows deterministic logic, agents combine dynamic, stochastic behaviors with distributed systems patterns. This makes debugging, monitoring, optimization, and continuous improvement especially challenging.
At LinkedIn, we’ve adopted a hybrid observability strategy tailored to the two distinct stages of agent development: pre-production and production.
In pre-production, our focus is on rich introspection and iteration. We’ve integrated with LangSmith for tracing and evaluation. Since many of our agent components are built on LangGraph and LangChain, LangSmith offers a seamless developer experience. With a few lines of code changes, developers can capture detailed execution traces, including LLM calls, tool usage, and control flow across chains and agents. This high-fidelity view allows engineers to quickly identify failure modes, adjust agent behaviors, and iterate with confidence. At this stage, we capture and retain full execution context.
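For LangChain- and LangGraph-based components, enabling LangSmith tracing is largely a matter of configuration; a minimal sketch (with a hypothetical agent step) might look like this:

```python
# Minimal sketch of enabling LangSmith tracing for LangChain/LangGraph code;
# exact configuration depends on your LangSmith setup.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"         # turn on tracing
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"  # LangSmith credentials
os.environ["LANGCHAIN_PROJECT"] = "agent-dev"       # group traces by project

from langsmith import traceable

@traceable(name="candidate_ranking_step")  # hypothetical agent step
def rank_candidates(query: str, candidates: list[str]) -> list[str]:
    # LLM and tool calls made inside a traced function (or inside LangChain /
    # LangGraph runnables with tracing enabled) show up as nested spans in
    # LangSmith, including inputs, outputs, and latencies.
    return sorted(candidates)
```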
In production, the constraints change. Agents operate within our distributed infrastructure, interacting with many downstream systems and handling real user data. Here, we rely on a hardened observability foundation built on OpenTelemetry (OTel), the de facto standard across LinkedIn. We instrument key agent lifecycle events—such as LLM calls, tool invocation, and memory usage—into structured, privacy-safe OTel spans. This allows us to correlate agent behavior with upstream requests, downstream calls, and platform performance at scale. While the traces are leaner than in pre-production, they are optimized for production debugging, reliability monitoring, and compliance.
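A simplified sketch of this kind of instrumentation with the OpenTelemetry Python API is shown below; the span and attribute names, as well as the LLM client, are illustrative rather than our actual conventions:

```python
# Sketch of production-side instrumentation with the OpenTelemetry Python API.
# Span/attribute names are illustrative; real spans carry only privacy-safe
# metadata (no raw prompts or member data).
from opentelemetry import trace

tracer = trace.get_tracer("agent.platform")

def call_llm(llm_client, prompt: str, model: str):
    # llm_client is a hypothetical stand-in for whatever client the agent uses.
    with tracer.start_as_current_span("agent.llm_call") as span:
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.prompt_chars", len(prompt))  # size, not content
        response = llm_client.generate(prompt=prompt, model=model)
        span.set_attribute("llm.completion_chars", len(response))
        return response
```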
But observability is not just about real-time debugging and monitoring; it’s also foundational to learning and continuous improvement. That’s why we tightly integrate our observability stack with our holistic evaluation platform. Execution traces are persisted and aggregated into datasets that power offline evaluations, model regression tests, and prompt tuning experiments. These traces become the raw material for understanding agent behavior over time, detecting regressions, and improving agent quality systematically.
Figure 3: Example diagram of observability workflows
This layered approach—developer-friendly traces in pre-production, scalable structured observability in production, and unified trace-based evaluation across both—allows us to move fast without sacrificing visibility or reliability.
Balancing experimentation and integration with developer tooling
While there is intense focus on bringing agents into products and quickly showing value, experimentation and iteration still have a huge role to play in building a successful agent experience. Developer tooling is one of the best avenues for creating the space to test, learn, innovate, and evaluate when to integrate and scale an agent initiative.
We built a Playground as a testing ground for developers, enabling rapid prototyping and experimentation so they can conceptualize and iterate on ideas without committing to extensive integration efforts. Key features include:
Agent experimentation: Developers can experiment with agents and engage in two-way communication, facilitating testing and validation of agent behaviors.
Skill exploration: The Playground offers tools to search for registered skills, inspect metadata, and directly invoke them using user inputs.
Memory inspection: Developers can examine memory contents and observe how they change over time by viewing historical revisions.
Identity management: The Playground provides tools to manage and inspect identity data, enabling developers to test with varied authorization scenarios and product licenses.
Observability: Experimental invocations provide observability traces, giving quick insight into failures during development.
Taking note of emergent agent design patterns
While we’ve shared many practical ways we’ve extended our tech stack within LinkedIn, we are also constantly evaluating how our learned observations and experiences might reflect or be shaped by the emerging agent design patterns across domains.
UX trends: So, what has advanced in the agentic UX space? Agentic experiences have been evolving to keep pace with rapid advancements in AI: reasoning models explaining their chain-of-thought, deep research agents, browser-use agents, and background agents. Users are now accustomed to chat-based experiences and increasingly find them more comfortable than interacting with a traditional GUI. They are delegating more of their tasks to agents to work semi- or fully autonomously.
Intent alignment, explainability and control: The key aspects of agentic experiences now include intent alignment, explainability, and control. Agents must align with user expectations, even when those expectations are vague, by explaining their thought process and grounding their statements in verifiable facts and citations. They must also seek feedback and give the user sufficient control, even when working autonomously.
Unleashing data: Data has evolved from a passive asset to the driving force behind agentic workflows, where intelligent agents amplify its power by generating insights, making it digestible, and acting upon it. RAG and knowledge graphs surface the latent meaning hidden within data, making it more usable for both agents and users. Central to this transformation is context engineering—a practice that involves feeding LLMs with the right data and memory in alignment with specific goals—unlocking a new tier of responsiveness and intelligence. By infusing the right user memories into multi-agent systems, we enable personalized, responsive experiences that feel truly adaptive. We are also leveraging offline big data jobs to curate and refine long-term agent memories.
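As a rough sketch of what context engineering can look like in practice, the snippet below assembles a prompt from a goal, relevant user memories, and retrieved knowledge under a token budget; the memory store and retriever interfaces are hypothetical stand-ins.

```python
# A rough sketch of context engineering: combine goal, relevant long-term
# memories, and retrieved knowledge into a single prompt within a token
# budget. The memory_store and retriever interfaces are hypothetical.
def build_context(user_id: str, goal: str, memory_store, retriever,
                  budget_tokens: int = 4000) -> str:
    # Pull only memories relevant to this goal (preferences, past decisions).
    memories = memory_store.search(user_id=user_id, query=goal, top_k=5)

    # Retrieve grounding documents (RAG results, knowledge graph facts).
    documents = retriever.search(query=goal, top_k=8)

    # Assemble sections most-important-first and stop at the token budget.
    sections = ["# Goal", goal,
                "# What we know about the user", *memories,
                "# Relevant knowledge", *documents]
    prompt, used = [], 0
    for section in sections:
        cost = max(1, len(section) // 4)  # crude token estimate
        if used + cost > budget_tokens:
            break
        prompt.append(section)
        used += cost
    return "\n\n".join(prompt)
```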
Background agents: Background agents can take on longer tasks, perform them autonomously behind the scenes, and finally present the finished work for review. Users can assign tasks to them through a task-assignment system like GitHub Actions or Jira, and agents can methodically perform the tasks in the background. Coding assistants and deep research agents are well suited for such background work. This is also one way to make use of GPU compute during idle, off-peak times.
Frameworks: Every major company has released its own agentic framework, and there are more than a hundred available today, but no single framework has a dominant market share yet. LangGraph is quite popular due to its vast integrations, production readiness, and observability features. We have embraced LangGraph and adapted it to work with LinkedIn’s messaging and memory infrastructure using custom-built providers. This lets agent developers use a popular framework while building on our agentic platform.
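For example, LangGraph exposes a pluggable checkpointer interface, which is the kind of seam such custom providers can plug into. The sketch below uses the off-the-shelf in-memory MemorySaver as a stand-in; providers that persist state into LinkedIn’s messaging and memory infrastructure are internal and not shown.

```python
# Sketch: a LangGraph graph compiled with a pluggable checkpointer. MemorySaver
# stands in for a custom provider backed by internal messaging/memory stores.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class AgentState(TypedDict):
    goal: str
    result: str

def plan(state: AgentState) -> AgentState:
    return {"goal": state["goal"], "result": f"plan for: {state['goal']}"}

builder = StateGraph(AgentState)
builder.add_node("plan", plan)
builder.add_edge(START, "plan")
builder.add_edge("plan", END)

# A custom checkpointer implementing the same interface could persist state
# into conversation/experiential memory stores instead of local memory.
app = builder.compile(checkpointer=MemorySaver())
print(app.invoke({"goal": "draft outreach", "result": ""},
                 config={"configurable": {"thread_id": "demo-thread"}}))
```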
Final thoughts
Our LinkedIn generative AI application tech stack reflects how far we’ve come on our journey from simple generative AI use cases to complex multi-agent experiences. At this stage of our work with agents, we want to share some final thoughts on areas of importance and shifting influence in this space.
Security and privacy: Within the platform and individual agents, user data is handled with strict boundaries to support privacy, security, and control. Experiential Memory, Conversation Memory, and other data stores are siloed by design, with privacy-preserving methods governing how information flows between components like the Client Data Layer, the Playground/other agents, and the Agent Lifecycle Service. Any sharing between these domains is designed to happen through explicit, policy-driven interfaces that avoid direct access, with strong authentication and authorization checks enforced for every cross-component call, including tool invocations. These safeguards ensure that only permitted agents or services can access specific data and that all access is logged and auditable, keeping member information compartmentalized and secure.
Sync vs async agent invocation: In addition to the async, messaging-based delivery for agent invocation, we have now enabled a new sync delivery mode, which bypasses the async queue and directly invokes the agent, with the corresponding messages created on the side. This significantly speeds up delivery, with predictable latencies for user-facing interactive agentic experiences. Agent developers now have the option of strong consistency with async delivery versus eventual consistency with sync delivery.
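From a developer’s perspective, the choice might look something like the hypothetical sketch below (the actual API surface is internal and may differ):

```python
# Hypothetical sketch of choosing a delivery mode; not an actual API.
async def ask_agent(lifecycle_client, conversation_id: str, agent: str,
                    payload: dict, interactive: bool):
    if interactive:
        # Sync delivery: bypass the queue and invoke the agent directly for
        # predictable, low-latency responses; messages are created on the side
        # and are eventually consistent with the conversation history.
        return await lifecycle_client.invoke_sync(conversation_id, agent, payload)
    # Async delivery: enqueue through messaging for durable, strongly
    # consistent, FIFO processing of long-running tasks.
    return await lifecycle_client.invoke_async(conversation_id, agent, payload)
```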
Open protocols: MCP and A2A have become foundational protocols for enabling dynamic agent discovery and collaboration across diverse frameworks and distributed environments. MCP empowers agents to explore and interact with the world through tool-based interfaces, while A2A facilitates seamless teamwork among agents. With widespread support from leading model providers such as Anthropic, OpenAI, and Azure, MCP makes it easier for companies to surface and activate their data via standardized interfaces. We are incrementally adopting these open protocols, moving away from a proprietary skill registry and paving the way for more intelligent, interoperable, and context-aware agent ecosystems.
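As a small illustration of the MCP side, the official Python SDK lets a team expose a data source as a standardized tool with a few lines of code; the tool itself here is a made-up example, not one of our services.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The tool is a made-up example of surfacing data through a standard interface.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("job-postings")

@mcp.tool()
def search_job_postings(keywords: str, location: str = "") -> list[str]:
    """Return job-posting titles matching the given keywords and location."""
    # In a real server this would query an internal data store.
    return [f"{keywords} role in {location or 'anywhere'} (example result)"]

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP (stdio by default)
```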
There is no single correct path toward building successfully with agents. Our hope is that sharing the approaches, lessons, and insights we’ve collected along the way helps others make progress in their own exciting journey with agents.