Research Pipeline Example¶
This Proof-of-Concept (POC) demonstrates a sophisticated multi-agent pipeline designed to automate the research and generation of a comprehensive report on a given topic.
Overview¶
The pipeline orchestrates 7 specialized agents, each handling a specific stage of the research process, from initial topic exploration to final content editing and visualization. It showcases advanced AgentVault concepts like complex workflows, artifact passing, and leveraging specialized agent skills.
Workflow Diagram¶
The diagram below depicts the flow of control and data artifacts between the orchestrator and the seven agents in the research pipeline.
(Diagram showing the orchestrator sequentially calling Topic Research, Content Crawler, Information Extraction, Fact Verification, Content Synthesis, Editor, and Visualization agents, passing artifacts between relevant steps.)
Workflow Steps¶
- Orchestrator (
langgraph_research_orchestrator
) -> Topic Research Agent (local-poc/topic-research
)- Input: Research
topic
,depth
,focus_areas
. - Output: Structured
research_plan
(subtopics, questions) andsearch_queries
.
- Input: Research
- Orchestrator -> Content Crawler Agent (
local-poc/content-crawler
)- Input:
search_queries
from the previous step. - Action: Searches the web using configured engines, scrapes content from relevant URLs.
- Output:
raw_content
artifact (list of dicts containing URL, title, scraped text, query source).
- Input:
- Orchestrator -> Information Extraction Agent (
local-poc/information-extraction
)- Input:
raw_content
artifact. - Action: Analyzes scraped text using an LLM to extract key facts, statistics, and direct quotes relevant to the research subtopics.
- Output:
extracted_information
artifact (list of all extracted facts/quotes) andinfo_by_subtopic
artifact (facts/quotes grouped by subtopic).
- Input:
- Orchestrator -> Fact Verification Agent (
local-poc/fact-verification
)- Input:
extracted_information
artifact. - Action: Uses an LLM to assess the credibility of each fact based on source URL and content, assigns a confidence score, and flags potential contradictions.
- Output:
verified_facts
artifact (facts annotated with verification status, confidence, notes) andverification_report
artifact (list of issues found).
- Input:
- Orchestrator -> Content Synthesis Agent (
local-poc/content-synthesis
)- Input:
verified_facts
,research_plan
. - Action: Uses an LLM to generate a structured draft article (Markdown) based on the verified facts, following the research plan outline. Creates citations.
- Output:
draft_article
artifact (Markdown text) andbibliography
artifact (structured references).
- Input:
- Orchestrator -> Editor Agent (
local-poc/editor
)- Input:
draft_article
. - Action: Uses an LLM to review the draft for clarity, style, grammar, and consistency according to configured preferences.
- Output:
edited_article
artifact (refined Markdown text) andedit_suggestions
artifact (list of changes made or suggested).
- Input:
- Orchestrator -> Visualization Agent (
local-poc/visualization
)- Input:
verified_facts
. - Action: Identifies data suitable for visualization within the verified facts and generates placeholder visualization metadata (or actual SVGs if configured).
- Output:
viz_metadata
artifact (list describing generated visualizations and related facts), potentially individual visualization artifacts (e.g.,bar_chart_1.svg
).
- Input:
- Orchestrator: Saves final article, visualization metadata, and all intermediate artifacts locally. Logs completion status.
Components¶
poc_agents/research_pipeline/
: Contains the individual agent Python scripts (e.g.,topic_research_agent.py
), the sharedbase_agent.py
, agent card files (agent_cards/
), Dockerfiles (dockerfiles/
), environment configuration (envs/
), and supporting scripts.poc_agents/langgraph_scrapepipe/
: Contains the LangGraph-based orchestrator (src/langgraph_research_orchestrator/
), its configuration (pipeline_config.json
), and runner scripts (run_pipeline.py
).
Setup¶
- Prerequisites: Docker, Docker Compose, Python 3.11+, Poetry. Ensure the
agentvault_network
Docker network exists (docker network create agentvault_network
). The AgentVault Registry should also be running. - LLM Server: An OpenAI-compatible LLM server (like LM Studio or Ollama) must be running and accessible to the Docker containers (often via
http://host.docker.internal:1234/v1
). Ensure the required models (e.g., Llama 3 Instruct, Nomic Embed) are loaded. - Environment Variables: Configure the
.env
files within each agent's corresponding directory underpoc_agents/research_pipeline/envs/
. These files define theLLM_API_URL
,LLM_API_KEY
,LLM_MODEL_NAME
,PORT
, andAGENT_CARD_PATH
for each agent. - Orchestrator Configuration: Configure the orchestrator's
.env
file (poc_agents/langgraph_scrapepipe/.env
) primarily withAGENTVAULT_REGISTRY_URL
. Pipeline behavior (timeouts, artifact paths, etc.) is controlled bypipeline_config.json
within the orchestrator's directory. Useconfig_generator.py
to create/modify this file. - Build & Run Docker Compose:
- Navigate to the
poc_agents/research_pipeline/
directory. - Run:
docker-compose build
(ordocker compose build
) - Run:
docker-compose up -d
(ordocker compose up -d
)
- Navigate to the
Running the POC¶
- Navigate: Change to the
poc_agents/langgraph_scrapepipe/
directory. -
Run Orchestrator: Use the provided runner script: ```bash # Example: Standard depth research on AI in Healthcare python run_pipeline.py --topic "Impact of AI on Healthcare" --depth standard
Example: Comprehensive research with specific focus areas¶
python run_pipeline.py --topic "Quantum Computing Applications" --depth comprehensive --focus-areas "Cryptography" "Drug Discovery"
Example: Using a custom configuration file¶
python run_pipeline.py --topic "Renewable Energy Storage" --config my_custom_config.json
``3. **Monitor Logs:** Check the orchestrator logs and individual agent logs via
docker logs`.
Example Run (GIF)¶
This animation captures the log output from the research pipeline orchestrator and the various agents involved as they process a research topic.
(Animation showing logs from the orchestrator and research agents (topic, crawler, extraction, verification, synthesis, editor, visualization) processing a topic)
Expected Output¶
- Orchestrator Logs: Detailed logs showing each step of the pipeline, agent calls, artifact saving, and the final status.
- Local Artifacts: Generated files stored in the directory specified by
orchestration.artifact_base_path
in the orchestrator's configuration file (defaults topoc_agents/langgraph_scrapepipe/pipeline_artifacts/
). Files are organized byproject_id
and then by node name (e.g.,pipeline_artifacts/<project_id>/content_synthesis/draft_article.md
).
Key Features Demonstrated¶
- Complex Multi-Agent Workflow: Orchestrating a 7-step pipeline.
- LangGraph Orchestration: Defining the workflow as a stateful graph.
- Agent Specialization: Each agent performs a distinct task in the research process.
- LLM Integration: Multiple agents leverage LLMs for different tasks (planning, extraction, verification, synthesis, editing).
- A2A Communication: Agents interact via the AgentVault A2A protocol.
- Local Artifact Management: Passing data between steps by saving/loading local files.
- Configuration System: Centralized configuration for pipeline behavior.
- Error Handling & Retries: Built-in retries for agent communication and error handling within the graph.