{"parentUuid":null,"isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":"Design a minimal, production-lean implementation plan for a standalone Next.js (App Router) AI chatbot POC. Verify API details against current Vercel AI SDK v6 / AI Gateway docs (use the Vercel docs MCP search tool and WebFetch as needed) — do NOT rely on memory for the exact API surface.\n\n## Context\nThis is a POC chatbot for Covalba, a French B2B \"cool roof\" (toiture réfléchissante) company. It will be embedded via a single `<script>` tag on a WordPress site. It answers ONLY from a `knowledge.md` doc injected into the system prompt (NO RAG). Model: Claude Haiku via Vercel AI Gateway (OIDC auth on deploy), with direct Anthropic key fallback for local dev. Language: French.\n\nIt lives in a NEW `chatbot/` subdirectory inside an existing Vite+React repo (the chatbot is its own self-contained Next.js app with its own package.json — NOT a monorepo, just a nested independent project deployed as a separate Vercel project with root directory = chatbot/).\n\n## What I need you to produce (a concrete implementation design):\n\n1. **Model wiring**: Exact way to call Claude Haiku through the AI Gateway in AI SDK v6 (plain provider string `'anthropic/claude-haiku-4-5'`?). Confirm the current gateway model id for the latest Haiku. Show how to fall back to direct `@ai-sdk/anthropic` with `ANTHROPIC_API_KEY` when running purely local without gateway access. Confirm what env var the gateway uses locally (AI_GATEWAY_API_KEY?) and how OIDC works on Vercel deploys.\n\n2. **Prompt caching**: Exact AI SDK v6 syntax to enable Anthropic ephemeral prompt caching on the large system prompt (knowledge.md). Show the `providerOptions.anthropic.cacheControl` placement. Confirm it works both via the gateway and via the direct provider.\n\n3. **Chat route handler** (`app/api/chat/route.ts`): show the shape using `streamText`, `convertToModelMessages`, `tools`, multi-step (`stopWhen: stepCountIs(...)`), and `toUIMessageStreamResponse()`. Confirm exact import paths in v6 (`ai` vs `@ai-sdk/react`).\n\n4. **Tools** (server-side, AI SDK v6 `tool()` with zod): \n   - `suggest_page(intent)` — maps intent to one of a fixed set of Covalba routes, returns {label, url, reason}. Show the tool definition pattern and how the result is rendered in the UI (tool part rendering in useChat messages).\n   - `capture_lead(nom, email, telephone, besoin, raison)` — POC: console.log + send email via Resend (`resend` npm pkg). Show the Resend call and graceful degradation when RESEND_API_KEY absent.\n   - `log_unanswered(question)` — POC: structured console.log (visible in Vercel logs). \n   \n5. **Frontend chat UI**: minimal `useChat` (v6, `@ai-sdk/react`, DefaultChatTransport → /api/chat). Show how to render text parts AND tool-invocation parts (e.g. render suggest_page as a clickable link/button). Keep it lightweight — no heavy template.\n\n6. **Embeddable widget** (the key deliverable): A `public/widget.js` (vanilla JS, ~100 lines) that, on any host page, injects a discrete floating bubble bottom-right labeled \"Une question ?\". On click it opens an iframe pointing to the app's `/embed` route (a minimal full-screen chat page with no site chrome). Use iframe for CSS isolation from the WordPress host. Pass the host page URL into the iframe (query param) so capture_lead can attach the origin page. Handle open/close + responsive (full-screen on mobile). Show the postMessage pattern for close-from-inside. Confirm CORS/embedding headers needed (the /embed route must allow framing — note frame-ancestors / X-Frame-Options config in next.config or middleware).\n\n7. **Rate limiting**: basic, pluggable. In-memory token bucket by IP for local POC; optional Upstash Redis (@upstash/ratelimit) when env vars present. Note Vercel KV is deprecated. Show where it hooks into the route handler.\n\n8. **File tree** for the `chatbot/` directory: every file to create, with a one-line purpose each.\n\n9. **Env vars** needed (`.env.example`): list them all with a comment each (ANTHROPIC_API_KEY, AI_GATEWAY_API_KEY, RESEND_API_KEY, RESEND_FROM, LEAD_INBOX, UPSTASH_*, NEXT_PUBLIC_SITE_BASE_URL).\n\n10. **Dependencies**: exact npm packages + versions to install (ai, @ai-sdk/react, @ai-sdk/anthropic, zod, resend, @upstash/ratelimit, @upstash/redis, next, react, etc).\n\nThe Covalba site routes the suggest_page tool should map to (real routes): \n- `/` home, `/simulateur-economie-energie`, `/estimation`, `/diagnostic`, `/contact`, `/devenir-applicateur`, `/qui-sommes-nous`, `/bat-en-112` (prime CEE), `/solutions/covatherm`, `/solutions/covatherm-light`, `/solutions/covaseal-20`, `/solutions/covametal-20`, `/guide-cool-roof`, `/faq`, `/references`, and sector pages `/industries/{tertiaire,distribution,logistique,collectivites,agricole,erp}`.\n\nBe concrete and code-forward but concise. Flag anything where the current API differs from what I described. Do not write any files — return the design as your final message."},"uuid":"aa6725ad-73f6-437d-b2cd-92bd9cbac11e","timestamp":"2026-06-07T15:59:05.803Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"aa6725ad-73f6-437d-b2cd-92bd9cbac11e","isSidechain":true,"agentId":"aed0c7b7da27aaf24","attachment":{"type":"deferred_tools_delta","addedNames":["CronCreate","CronDelete","CronList","EnterWorktree","ExitWorktree","ListMcpResourcesTool","Monitor","NotebookEdit","PushNotification","ReadMcpResourceTool","RemoteTrigger","TaskCreate","TaskGet","TaskList","TaskStop","TaskUpdate","WebFetch","WebSearch","mcp__EstimanouContent__add_article_block","mcp__EstimanouContent__add_block","mcp__EstimanouContent__create_article","mcp__EstimanouContent__create_page","mcp__EstimanouContent__delete_article","mcp__EstimanouContent__delete_article_block","mcp__EstimanouContent__delete_block","mcp__EstimanouContent__get_article","mcp__EstimanouContent__get_city_page","mcp__EstimanouContent__get_page","mcp__EstimanouContent__list_articles","mcp__EstimanouContent__list_available_blocks","mcp__EstimanouContent__list_city_pages","mcp__EstimanouContent__list_pages","mcp__EstimanouContent__list_site_urls","mcp__EstimanouContent__reorder_article_blocks","mcp__EstimanouContent__reorder_blocks","mcp__EstimanouContent__replace_article_content","mcp__EstimanouContent__replace_city_seo_content","mcp__EstimanouContent__replace_page_layout","mcp__EstimanouContent__search_internal_links","mcp__EstimanouContent__update_article","mcp__EstimanouContent__update_article_block","mcp__EstimanouContent__update_block","mcp__EstimanouContent__update_city_page_content","mcp__claude_ai_Ahref__batch-analysis","mcp__claude_ai_Ahref__brand-radar-ai-responses","mcp__claude_ai_Ahref__brand-radar-ai-responses-entities","mcp__claude_ai_Ahref__brand-radar-cited-domains","mcp__claude_ai_Ahref__brand-radar-cited-domains-entities","mcp__claude_ai_Ahref__brand-radar-cited-pages","mcp__claude_ai_Ahref__brand-radar-cited-pages-entities","mcp__claude_ai_Ahref__brand-radar-impressions-history","mcp__claude_ai_Ahref__brand-radar-impressions-history-entities","mcp__claude_ai_Ahref__brand-radar-impressions-overview","mcp__claude_ai_Ahref__brand-radar-impressions-overview-entities","mcp__claude_ai_Ahref__brand-radar-mentions-history","mcp__claude_ai_Ahref__brand-radar-mentions-history-entities","mcp__claude_ai_Ahref__brand-radar-mentions-overview","mcp__claude_ai_Ahref__brand-radar-mentions-overview-entities","mcp__claude_ai_Ahref__brand-radar-sov-history","mcp__claude_ai_Ahref__brand-radar-sov-history-entities","mcp__claude_ai_Ahref__brand-radar-sov-overview","mcp__claude_ai_Ahref__brand-radar-sov-overview-entities","mcp__claude_ai_Ahref__doc","mcp__claude_ai_Ahref__gsc-anonymous-queries","mcp__claude_ai_Ahref__gsc-ctr-by-position","mcp__claude_ai_Ahref__gsc-keyword-history","mcp__claude_ai_Ahref__gsc-keywords","mcp__claude_ai_Ahref__gsc-metrics-by-country","mcp__claude_ai_Ahref__gsc-page-history","mcp__claude_ai_Ahref__gsc-pages","mcp__claude_ai_Ahref__gsc-pages-history","mcp__claude_ai_Ahref__gsc-performance-by-device","mcp__claude_ai_Ahref__gsc-performance-by-position","mcp__claude_ai_Ahref__gsc-performance-history","mcp__claude_ai_Ahref__gsc-positions-history","mcp__claude_ai_Ahref__keywords-explorer-matching-terms","mcp__claude_ai_Ahref__keywords-explorer-overview","mcp__claude_ai_Ahref__keywords-explorer-related-terms","mcp__claude_ai_Ahref__keywords-explorer-search-suggestions","mcp__claude_ai_Ahref__keywords-explorer-volume-by-country","mcp__claude_ai_Ahref__keywords-explorer-volume-history","mcp__claude_ai_Ahref__management-brand-radar-prompts","mcp__claude_ai_Ahref__management-brand-radar-reports","mcp__claude_ai_Ahref__management-keyword-list-keywords","mcp__claude_ai_Ahref__management-locations","mcp__claude_ai_Ahref__management-project-competitors","mcp__claude_ai_Ahref__management-project-keywords","mcp__claude_ai_Ahref__management-projects","mcp__claude_ai_Ahref__public-crawler-ip-ranges","mcp__claude_ai_Ahref__public-crawler-ips","mcp__claude_ai_Ahref__rank-tracker-competitors-domains","mcp__claude_ai_Ahref__rank-tracker-competitors-overview","mcp__claude_ai_Ahref__rank-tracker-competitors-pages","mcp__claude_ai_Ahref__rank-tracker-competitors-stats","mcp__claude_ai_Ahref__rank-tracker-overview","mcp__claude_ai_Ahref__rank-tracker-serp-overview","mcp__claude_ai_Ahref__render-data-table","mcp__claude_ai_Ahref__render-scorecard","mcp__claude_ai_Ahref__render-time-series-chart","mcp__claude_ai_Ahref__serp-overview","mcp__claude_ai_Ahref__site-audit-issues","mcp__claude_ai_Ahref__site-audit-page-content","mcp__claude_ai_Ahref__site-audit-page-explorer","mcp__claude_ai_Ahref__site-audit-projects","mcp__claude_ai_Ahref__site-explorer-all-backlinks","mcp__claude_ai_Ahref__site-explorer-anchors","mcp__claude_ai_Ahref__site-explorer-backlinks-stats","mcp__claude_ai_Ahref__site-explorer-broken-backlinks","mcp__claude_ai_Ahref__site-explorer-crawled-pages","mcp__claude_ai_Ahref__site-explorer-domain-rating","mcp__claude_ai_Ahref__site-explorer-domain-rating-history","mcp__claude_ai_Ahref__site-explorer-keywords-history","mcp__claude_ai_Ahref__site-explorer-linked-anchors-external","mcp__claude_ai_Ahref__site-explorer-linked-anchors-internal","mcp__claude_ai_Ahref__site-explorer-linked-domains","mcp__claude_ai_Ahref__site-explorer-metrics","mcp__claude_ai_Ahref__site-explorer-metrics-by-country","mcp__claude_ai_Ahref__site-explorer-metrics-history","mcp__claude_ai_Ahref__site-explorer-organic-competitors","mcp__claude_ai_Ahref__site-explorer-organic-keywords","mcp__claude_ai_Ahref__site-explorer-outlinks-stats","mcp__claude_ai_Ahref__site-explorer-pages-by-backlinks","mcp__claude_ai_Ahref__site-explorer-pages-by-internal-links","mcp__claude_ai_Ahref__site-explorer-pages-by-traffic","mcp__claude_ai_Ahref__site-explorer-pages-history","mcp__claude_ai_Ahref__site-explorer-paid-pages","mcp__claude_ai_Ahref__site-explorer-refdomains-history","mcp__claude_ai_Ahref__site-explorer-referring-domains","mcp__claude_ai_Ahref__site-explorer-top-pages","mcp__claude_ai_Ahref__site-explorer-total-search-volume-history","mcp__claude_ai_Ahref__site-explorer-url-rating-history","mcp__claude_ai_Ahref__social-media-activity-history","mcp__claude_ai_Ahref__social-media-authors","mcp__claude_ai_Ahref__social-media-channel-metrics","mcp__claude_ai_Ahref__social-media-channels","mcp__claude_ai_Ahref__social-media-post-metrics","mcp__claude_ai_Ahref__social-media-posts","mcp__claude_ai_Ahref__subscription-info-limits-and-usage","mcp__claude_ai_Ahref__web-analytics-browser-versions","mcp__claude_ai_Ahref__web-analytics-browser-versions-chart","mcp__claude_ai_Ahref__web-analytics-browsers","mcp__claude_ai_Ahref__web-analytics-browsers-chart","mcp__claude_ai_Ahref__web-analytics-chart","mcp__claude_ai_Ahref__web-analytics-cities","mcp__claude_ai_Ahref__web-analytics-cities-chart","mcp__claude_ai_Ahref__web-analytics-continents","mcp__claude_ai_Ahref__web-analytics-continents-chart","mcp__claude_ai_Ahref__web-analytics-countries","mcp__claude_ai_Ahref__web-analytics-countries-chart","mcp__claude_ai_Ahref__web-analytics-devices","mcp__claude_ai_Ahref__web-analytics-devices-chart","mcp__claude_ai_Ahref__web-analytics-entry-pages","mcp__claude_ai_Ahref__web-analytics-entry-pages-chart","mcp__claude_ai_Ahref__web-analytics-exit-pages","mcp__claude_ai_Ahref__web-analytics-exit-pages-chart","mcp__claude_ai_Ahref__web-analytics-languages","mcp__claude_ai_Ahref__web-analytics-languages-chart","mcp__claude_ai_Ahref__web-analytics-operating-systems","mcp__claude_ai_Ahref__web-analytics-operating-systems-chart","mcp__claude_ai_Ahref__web-analytics-operating-systems-versions","mcp__claude_ai_Ahref__web-analytics-operating-systems-versions-chart","mcp__claude_ai_Ahref__web-analytics-referrers","mcp__claude_ai_Ahref__web-analytics-referrers-chart","mcp__claude_ai_Ahref__web-analytics-source-channels","mcp__claude_ai_Ahref__web-analytics-source-channels-chart","mcp__claude_ai_Ahref__web-analytics-sources","mcp__claude_ai_Ahref__web-analytics-sources-chart","mcp__claude_ai_Ahref__web-analytics-stats","mcp__claude_ai_Ahref__web-analytics-top-pages","mcp__claude_ai_Ahref__web-analytics-top-pages-chart","mcp__claude_ai_Ahref__web-analytics-utm-params","mcp__claude_ai_Ahref__web-analytics-utm-params-chart","mcp__claude_ai_Canva__cancel-editing-transaction","mcp__claude_ai_Canva__comment-on-design","mcp__claude_ai_Canva__commit-editing-transaction","mcp__claude_ai_Canva__copy-design","mcp__claude_ai_Canva__create-design-from-brand-template","mcp__claude_ai_Canva__create-design-from-candidate","mcp__claude_ai_Canva__create-folder","mcp__claude_ai_Canva__export-design","mcp__claude_ai_Canva__generate-design","mcp__claude_ai_Canva__generate-design-structured","mcp__claude_ai_Canva__get-assets","mcp__claude_ai_Canva__get-brand-template-dataset","mcp__claude_ai_Canva__get-design","mcp__claude_ai_Canva__get-design-candidates","mcp__claude_ai_Canva__get-design-content","mcp__claude_ai_Canva__get-design-pages","mcp__claude_ai_Canva__get-design-thumbnail","mcp__claude_ai_Canva__get-export-formats","mcp__claude_ai_Canva__get-presenter-notes","mcp__claude_ai_Canva__help","mcp__claude_ai_Canva__import-design-from-url","mcp__claude_ai_Canva__list-brand-kits","mcp__claude_ai_Canva__list-comments","mcp__claude_ai_Canva__list-folder-items","mcp__claude_ai_Canva__list-replies","mcp__claude_ai_Canva__merge-designs","mcp__claude_ai_Canva__move-item-to-folder","mcp__claude_ai_Canva__perform-editing-operations","mcp__claude_ai_Canva__publish-brand-template","mcp__claude_ai_Canva__reply-to-comment","mcp__claude_ai_Canva__request-outline-review","mcp__claude_ai_Canva__resize-design","mcp__claude_ai_Canva__resolve-shortlink","mcp__claude_ai_Canva__search-brand-templates","mcp__claude_ai_Canva__search-designs","mcp__claude_ai_Canva__search-folders","mcp__claude_ai_Canva__start-editing-transaction","mcp__claude_ai_Canva__upload-asset-from-url","mcp__claude_ai_DATA_GOUV_FR__get_dataservice_info","mcp__claude_ai_DATA_GOUV_FR__get_dataservice_openapi_spec","mcp__claude_ai_DATA_GOUV_FR__get_dataset_info","mcp__claude_ai_DATA_GOUV_FR__get_metrics","mcp__claude_ai_DATA_GOUV_FR__get_resource_info","mcp__claude_ai_DATA_GOUV_FR__list_dataset_resources","mcp__claude_ai_DATA_GOUV_FR__query_resource_data","mcp__claude_ai_DATA_GOUV_FR__search_dataservices","mcp__claude_ai_DATA_GOUV_FR__search_datasets","mcp__claude_ai_DATA_GOUV_FR__search_organizations","mcp__claude_ai_Data_For_SEO__authenticate","mcp__claude_ai_Data_For_SEO__complete_authentication","mcp__claude_ai_Esteworld-Body__add_block","mcp__claude_ai_Esteworld-Body__delete_block","mcp__claude_ai_Esteworld-Body__find_esteworld_page","mcp__claude_ai_Esteworld-Body__fix_global_blocks","mcp__claude_ai_Esteworld-Body__get_block","mcp__claude_ai_Esteworld-Body__get_blocks_library","mcp__claude_ai_Esteworld-Body__get_breadcrumb","mcp__claude_ai_Esteworld-Body__get_fr_en_mapping","mcp__claude_ai_Esteworld-Body__get_internal_links","mcp__claude_ai_Esteworld-Body__get_internal_links_en","mcp__claude_ai_Esteworld-Body__get_maillage_rules","mcp__claude_ai_Esteworld-Body__get_page","mcp__claude_ai_Esteworld-Body__get_page_json","mcp__claude_ai_Esteworld-Body__get_page_status","mcp__claude_ai_Esteworld-Body__get_redirections","mcp__claude_ai_Esteworld-Body__get_section_content","mcp__claude_ai_Esteworld-Body__get_thematique_pages","mcp__claude_ai_Esteworld-Body__get_translation_languages","mcp__claude_ai_Esteworld-Body__list_pages","mcp__claude_ai_Esteworld-Body__list_pages_by_thematique","mcp__claude_ai_Esteworld-Body__list_pages_by_type","mcp__claude_ai_Esteworld-Body__list_pages_to_create","mcp__claude_ai_Esteworld-Body__list_pages_to_rewrite","mcp__claude_ai_Esteworld-Body__list_pages_with_esteworld","mcp__claude_ai_Esteworld-Body__list_pages_without_esteworld","mcp__claude_ai_Esteworld-Body__n8n_activate_workflow","mcp__claude_ai_Esteworld-Body__n8n_get_executions","mcp__claude_ai_Esteworld-Body__n8n_get_workflow","mcp__claude_ai_Esteworld-Body__n8n_list_workflows","mcp__claude_ai_Esteworld-Body__n8n_trigger_webhook","mcp__claude_ai_Esteworld-Body__preview_global_fixes","mcp__claude_ai_Esteworld-Body__reorder_blocks","mcp__claude_ai_Esteworld-Body__search_content","mcp__claude_ai_Esteworld-Body__search_links_by_keyword","mcp__claude_ai_Esteworld-Body__search_links_by_keyword_en","mcp__claude_ai_Esteworld-Body__submit_page","mcp__claude_ai_Esteworld-Body__submit_page_light","mcp__claude_ai_Esteworld-Body__translate_page","mcp__claude_ai_Esteworld-Body__update_block","mcp__claude_ai_Figma__add_code_connect_map","mcp__claude_ai_Figma__create_new_file","mcp__claude_ai_Figma__generate_diagram","mcp__claude_ai_Figma__get_code_connect_map","mcp__claude_ai_Figma__get_code_connect_suggestions","mcp__claude_ai_Figma__get_context_for_code_connect","mcp__claude_ai_Figma__get_design_context","mcp__claude_ai_Figma__get_figjam","mcp__claude_ai_Figma__get_libraries","mcp__claude_ai_Figma__get_metadata","mcp__claude_ai_Figma__get_screenshot","mcp__claude_ai_Figma__get_variable_defs","mcp__claude_ai_Figma__search_design_system","mcp__claude_ai_Figma__send_code_connect_mappings","mcp__claude_ai_Figma__upload_assets","mcp__claude_ai_Figma__use_figma","mcp__claude_ai_Figma__whoami","mcp__claude_ai_Gmail__create_draft","mcp__claude_ai_Gmail__create_label","mcp__claude_ai_Gmail__delete_label","mcp__claude_ai_Gmail__get_thread","mcp__claude_ai_Gmail__label_message","mcp__claude_ai_Gmail__label_thread","mcp__claude_ai_Gmail__list_drafts","mcp__claude_ai_Gmail__list_labels","mcp__claude_ai_Gmail__search_threads","mcp__claude_ai_Gmail__unlabel_message","mcp__claude_ai_Gmail__unlabel_thread","mcp__claude_ai_Gmail__update_label","mcp__claude_ai_Google_Calendar__create_event","mcp__claude_ai_Google_Calendar__delete_event","mcp__claude_ai_Google_Calendar__get_event","mcp__claude_ai_Google_Calendar__list_calendars","mcp__claude_ai_Google_Calendar__list_events","mcp__claude_ai_Google_Calendar__respond_to_event","mcp__claude_ai_Google_Calendar__suggest_time","mcp__claude_ai_Google_Calendar__update_event","mcp__claude_ai_Google_Drive__copy_file","mcp__claude_ai_Google_Drive__create_file","mcp__claude_ai_Google_Drive__download_file_content","mcp__claude_ai_Google_Drive__get_file_metadata","mcp__claude_ai_Google_Drive__get_file_permissions","mcp__claude_ai_Google_Drive__list_recent_files","mcp__claude_ai_Google_Drive__read_file_content","mcp__claude_ai_Google_Drive__search_files","mcp__claude_ai_SEMRush__backlink_research","mcp__claude_ai_SEMRush__execute_report","mcp__claude_ai_SEMRush__get_report_schema","mcp__claude_ai_SEMRush__keyword_research","mcp__claude_ai_SEMRush__organic_research","mcp__claude_ai_SEMRush__overview_research","mcp__claude_ai_SEMRush__projects_research","mcp__claude_ai_SEMRush__siteaudit_research","mcp__claude_ai_SEMRush__subdomain_research","mcp__claude_ai_SEMRush__subfolder_research","mcp__claude_ai_SEMRush__tracking_research","mcp__claude_ai_SEMRush__trends_research","mcp__claude_ai_SEMRush__url_research","mcp__claude_ai_Vercel__add_toolbar_reaction","mcp__claude_ai_Vercel__change_toolbar_thread_resolve_status","mcp__claude_ai_Vercel__check_domain_availability_and_price","mcp__claude_ai_Vercel__deploy_to_vercel","mcp__claude_ai_Vercel__edit_toolbar_message","mcp__claude_ai_Vercel__get_access_to_vercel_url","mcp__claude_ai_Vercel__get_deployment","mcp__claude_ai_Vercel__get_deployment_build_logs","mcp__claude_ai_Vercel__get_project","mcp__claude_ai_Vercel__get_runtime_logs","mcp__claude_ai_Vercel__get_toolbar_thread","mcp__claude_ai_Vercel__list_deployments","mcp__claude_ai_Vercel__list_projects","mcp__claude_ai_Vercel__list_teams","mcp__claude_ai_Vercel__list_toolbar_threads","mcp__claude_ai_Vercel__reply_to_toolbar_thread","mcp__claude_ai_Vercel__search_vercel_documentation","mcp__claude_ai_Vercel__web_fetch_vercel_url","mcp__gpt-image-2__continue_edit_session","mcp__gpt-image-2__edit_image","mcp__gpt-image-2__end_edit_session","mcp__gpt-image-2__generate_image","mcp__gpt-image-2__list_edit_sessions","mcp__gpt-image-2__start_edit_session","mcp__hostinger__DNS_deleteDNSRecordsV1","mcp__hostinger__DNS_getDNSRecordsV1","mcp__hostinger__DNS_getDNSSnapshotListV1","mcp__hostinger__DNS_getDNSSnapshotV1","mcp__hostinger__DNS_resetDNSRecordsV1","mcp__hostinger__DNS_restoreDNSSnapshotV1","mcp__hostinger__DNS_updateDNSRecordsV1","mcp__hostinger__DNS_validateDNSRecordsV1","mcp__hostinger__VPS_activateFirewallV1","mcp__hostinger__VPS_attachPublicKeyV1","mcp__hostinger__VPS_createFirewallRuleV1","mcp__hostinger__VPS_createNewFirewallV1","mcp__hostinger__VPS_createNewProjectV1","mcp__hostinger__VPS_createPTRRecordV1","mcp__hostinger__VPS_createPostInstallScriptV1","mcp__hostinger__VPS_createPublicKeyV1","mcp__hostinger__VPS_createSnapshotV1","mcp__hostinger__VPS_deactivateFirewallV1","mcp__hostinger__VPS_deleteFirewallRuleV1","mcp__hostinger__VPS_deleteFirewallV1","mcp__hostinger__VPS_deletePTRRecordV1","mcp__hostinger__VPS_deletePostInstallScriptV1","mcp__hostinger__VPS_deleteProjectV1","mcp__hostinger__VPS_deletePublicKeyV1","mcp__hostinger__VPS_deleteSnapshotV1","mcp__hostinger__VPS_getActionDetailsV1","mcp__hostinger__VPS_getActionsV1","mcp__hostinger__VPS_getAttachedPublicKeysV1","mcp__hostinger__VPS_getBackupsV1","mcp__hostinger__VPS_getDataCenterListV1","mcp__hostinger__VPS_getFirewallDetailsV1","mcp__hostinger__VPS_getFirewallListV1","mcp__hostinger__VPS_getMetricsV1","mcp__hostinger__VPS_getPostInstallScriptV1","mcp__hostinger__VPS_getPostInstallScriptsV1","mcp__hostinger__VPS_getProjectContainersV1","mcp__hostinger__VPS_getProjectContentsV1","mcp__hostinger__VPS_getProjectListV1","mcp__hostinger__VPS_getProjectLogsV1","mcp__hostinger__VPS_getPublicKeysV1","mcp__hostinger__VPS_getScanMetricsV1","mcp__hostinger__VPS_getSnapshotV1","mcp__hostinger__VPS_getTemplateDetailsV1","mcp__hostinger__VPS_getTemplatesV1","mcp__hostinger__VPS_getVirtualMachineDetailsV1","mcp__hostinger__VPS_getVirtualMachinesV1","mcp__hostinger__VPS_installMonarxV1","mcp__hostinger__VPS_purchaseNewVirtualMachineV1","mcp__hostinger__VPS_recreateVirtualMachineV1","mcp__hostinger__VPS_resetHostnameV1","mcp__hostinger__VPS_restartProjectV1","mcp__hostinger__VPS_restartVirtualMachineV1","mcp__hostinger__VPS_restoreBackupV1","mcp__hostinger__VPS_restoreSnapshotV1","mcp__hostinger__VPS_setHostnameV1","mcp__hostinger__VPS_setNameserversV1","mcp__hostinger__VPS_setPanelPasswordV1","mcp__hostinger__VPS_setRootPasswordV1","mcp__hostinger__VPS_setupPurchasedVirtualMachineV1","mcp__hostinger__VPS_startProjectV1","mcp__hostinger__VPS_startRecoveryModeV1","mcp__hostinger__VPS_startVirtualMachineV1","mcp__hostinger__VPS_stopProjectV1","mcp__hostinger__VPS_stopRecoveryModeV1","mcp__hostinger__VPS_stopVirtualMachineV1","mcp__hostinger__VPS_syncFirewallV1","mcp__hostinger__VPS_uninstallMonarxV1","mcp__hostinger__VPS_updateFirewallRuleV1","mcp__hostinger__VPS_updatePostInstallScriptV1","mcp__hostinger__VPS_updateProjectV1","mcp__hostinger__billing_deletePaymentMethodV1","mcp__hostinger__billing_disableAutoRenewalV1","mcp__hostinger__billing_enableAutoRenewalV1","mcp__hostinger__billing_getCatalogItemListV1","mcp__hostinger__billing_getPaymentMethodListV1","mcp__hostinger__billing_getSubscriptionListV1","mcp__hostinger__billing_setDefaultPaymentMethodV1","mcp__hostinger__domains_checkDomainAvailabilityV1","mcp__hostinger__domains_createDomainForwardingV1","mcp__hostinger__domains_createWHOISProfileV1","mcp__hostinger__domains_deleteDomainForwardingV1","mcp__hostinger__domains_deleteWHOISProfileV1","mcp__hostinger__domains_disableDomainLockV1","mcp__hostinger__domains_disablePrivacyProtectionV1","mcp__hostinger__domains_enableDomainLockV1","mcp__hostinger__domains_enablePrivacyProtectionV1","mcp__hostinger__domains_getDomainDetailsV1","mcp__hostinger__domains_getDomainForwardingV1","mcp__hostinger__domains_getDomainListV1","mcp__hostinger__domains_getWHOISProfileListV1","mcp__hostinger__domains_getWHOISProfileUsageV1","mcp__hostinger__domains_getWHOISProfileV1","mcp__hostinger__domains_purchaseNewDomainV1","mcp__hostinger__domains_updateDomainNameserversV1","mcp__hostinger__hosting_createAccountDatabaseV1","mcp__hostinger__hosting_createWebsiteParkedDomainV1","mcp__hostinger__hosting_createWebsiteSubdomainV1","mcp__hostinger__hosting_createWebsiteV1","mcp__hostinger__hosting_deleteAccountDatabaseV1","mcp__hostinger__hosting_deleteWebsiteParkedDomainV1","mcp__hostinger__hosting_deleteWebsiteSubdomainV1","mcp__hostinger__hosting_deployJsApplication","mcp__hostinger__hosting_deployStaticWebsite","mcp__hostinger__hosting_deployWordpressPlugin","mcp__hostinger__hosting_deployWordpressTheme","mcp__hostinger__hosting_generateAFreeSubdomainV1","mcp__hostinger__hosting_importWordpressWebsite","mcp__hostinger__hosting_installWordPressV1","mcp__hostinger__hosting_listAccountDatabasesV1","mcp__hostinger__hosting_listAvailableDatacentersV1","mcp__hostinger__hosting_listJsDeployments","mcp__hostinger__hosting_listOrdersV1","mcp__hostinger__hosting_listWebsiteParkedDomainsV1","mcp__hostinger__hosting_listWebsiteSubdomainsV1","mcp__hostinger__hosting_listWebsitesV1","mcp__hostinger__hosting_listWordPressInstallationsV1","mcp__hostinger__hosting_showJsDeploymentLogs","mcp__hostinger__hosting_verifyDomainOwnershipV1","mcp__hostinger__reach_createANewContactSegmentV1","mcp__hostinger__reach_createANewContactV1","mcp__hostinger__reach_createNewContactsV1","mcp__hostinger__reach_deleteAContactV1","mcp__hostinger__reach_getSegmentDetailsV1","mcp__hostinger__reach_listContactGroupsV1","mcp__hostinger__reach_listContactsV1","mcp__hostinger__reach_listProfilesV1","mcp__hostinger__reach_listSegmentContactsV1","mcp__hostinger__reach_listSegmentsV1","mcp__hostinger__v2_getDomainVerificationsDIRECT","mcp__n8n-mcp__get_node","mcp__n8n-mcp__get_template","mcp__n8n-mcp__n8n_audit_instance","mcp__n8n-mcp__n8n_autofix_workflow","mcp__n8n-mcp__n8n_create_workflow","mcp__n8n-mcp__n8n_delete_workflow","mcp__n8n-mcp__n8n_deploy_template","mcp__n8n-mcp__n8n_executions","mcp__n8n-mcp__n8n_generate_workflow","mcp__n8n-mcp__n8n_get_workflow","mcp__n8n-mcp__n8n_health_check","mcp__n8n-mcp__n8n_list_workflows","mcp__n8n-mcp__n8n_manage_credentials","mcp__n8n-mcp__n8n_manage_datatable","mcp__n8n-mcp__n8n_test_workflow","mcp__n8n-mcp__n8n_update_full_workflow","mcp__n8n-mcp__n8n_update_partial_workflow","mcp__n8n-mcp__n8n_validate_workflow","mcp__n8n-mcp__n8n_workflow_versions","mcp__n8n-mcp__search_nodes","mcp__n8n-mcp__search_templates","mcp__n8n-mcp__tools_documentation","mcp__n8n-mcp__validate_node","mcp__n8n-mcp__validate_workflow","mcp__nanobanana__generate_image","mcp__nanobanana__maintenance","mcp__nanobanana__show_output_stats","mcp__nanobanana__upload_file","mcp__plugin_vercel_vercel__authenticate","mcp__plugin_vercel_vercel__complete_authentication","mcp__refero__refero_get_flow","mcp__refero__refero_get_screen","mcp__refero__refero_get_screen_image","mcp__refero__refero_get_similar_screens","mcp__refero__refero_get_style","mcp__refero__refero_search_flows","mcp__refero__refero_search_screens","mcp__refero__refero_search_styles","mcp__semrush__authenticate","mcp__semrush__complete_authentication","mcp__stock-images-mcp__search_stock_images","mcp__trello__add_card_to_list","mcp__trello__add_checklist_item","mcp__trello__add_comment","mcp__trello__add_list_to_board","mcp__trello__archive_card","mcp__trello__archive_list","mcp__trello__assign_member_to_card","mcp__trello__attach_file_to_card","mcp__trello__attach_image_data_to_card","mcp__trello__attach_image_to_card","mcp__trello__create_board","mcp__trello__create_checklist","mcp__trello__create_label","mcp__trello__delete_comment","mcp__trello__delete_label","mcp__trello__find_checklist_items_by_description","mcp__trello__get_acceptance_criteria","mcp__trello__get_active_board_info","mcp__trello__get_board_labels","mcp__trello__get_board_members","mcp__trello__get_card","mcp__trello__get_card_comments","mcp__trello__get_card_history","mcp__trello__get_cards_by_list_id","mcp__trello__get_checklist_by_name","mcp__trello__get_checklist_items","mcp__trello__get_health","mcp__trello__get_health_detailed","mcp__trello__get_health_metadata","mcp__trello__get_health_performance","mcp__trello__get_lists","mcp__trello__get_my_cards","mcp__trello__get_recent_activity","mcp__trello__list_boards","mcp__trello__list_boards_in_workspace","mcp__trello__list_workspaces","mcp__trello__move_card","mcp__trello__perform_system_repair","mcp__trello__remove_member_from_card","mcp__trello__set_active_board","mcp__trello__set_active_workspace","mcp__trello__update_card_details","mcp__trello__update_checklist_item","mcp__trello__update_comment","mcp__trello__update_label"],"addedLines":["CronCreate","CronDelete","CronList","EnterWorktree","ExitWorktree","ListMcpResourcesTool","Monitor","NotebookEdit","PushNotification","ReadMcpResourceTool","RemoteTrigger","TaskCreate","TaskGet","TaskList","TaskStop","TaskUpdate","WebFetch","WebSearch","mcp__EstimanouContent__add_article_block","mcp__EstimanouContent__add_block","mcp__EstimanouContent__create_article","mcp__EstimanouContent__create_page","mcp__EstimanouContent__delete_article","mcp__EstimanouContent__delete_article_block","mcp__EstimanouContent__delete_block","mcp__EstimanouContent__get_article","mcp__EstimanouContent__get_city_page","mcp__EstimanouContent__get_page","mcp__EstimanouContent__list_articles","mcp__EstimanouContent__list_available_blocks","mcp__EstimanouContent__list_city_pages","mcp__EstimanouContent__list_pages","mcp__EstimanouContent__list_site_urls","mcp__EstimanouContent__reorder_article_blocks","mcp__EstimanouContent__reorder_blocks","mcp__EstimanouContent__replace_article_content","mcp__EstimanouContent__replace_city_seo_content","mcp__EstimanouContent__replace_page_layout","mcp__EstimanouContent__search_internal_links","mcp__EstimanouContent__update_article","mcp__EstimanouContent__update_article_block","mcp__EstimanouContent__update_block","mcp__EstimanouContent__update_city_page_content","mcp__claude_ai_Ahref__batch-analysis","mcp__claude_ai_Ahref__brand-radar-ai-responses","mcp__claude_ai_Ahref__brand-radar-ai-responses-entities","mcp__claude_ai_Ahref__brand-radar-cited-domains","mcp__claude_ai_Ahref__brand-radar-cited-domains-entities","mcp__claude_ai_Ahref__brand-radar-cited-pages","mcp__claude_ai_Ahref__brand-radar-cited-pages-entities","mcp__claude_ai_Ahref__brand-radar-impressions-history","mcp__claude_ai_Ahref__brand-radar-impressions-history-entities","mcp__claude_ai_Ahref__brand-radar-impressions-overview","mcp__claude_ai_Ahref__brand-radar-impressions-overview-entities","mcp__claude_ai_Ahref__brand-radar-mentions-history","mcp__claude_ai_Ahref__brand-radar-mentions-history-entities","mcp__claude_ai_Ahref__brand-radar-mentions-overview","mcp__claude_ai_Ahref__brand-radar-mentions-overview-entities","mcp__claude_ai_Ahref__brand-radar-sov-history","mcp__claude_ai_Ahref__brand-radar-sov-history-entities","mcp__claude_ai_Ahref__brand-radar-sov-overview","mcp__claude_ai_Ahref__brand-radar-sov-overview-entities","mcp__claude_ai_Ahref__doc","mcp__claude_ai_Ahref__gsc-anonymous-queries","mcp__claude_ai_Ahref__gsc-ctr-by-position","mcp__claude_ai_Ahref__gsc-keyword-history","mcp__claude_ai_Ahref__gsc-keywords","mcp__claude_ai_Ahref__gsc-metrics-by-country","mcp__claude_ai_Ahref__gsc-page-history","mcp__claude_ai_Ahref__gsc-pages","mcp__claude_ai_Ahref__gsc-pages-history","mcp__claude_ai_Ahref__gsc-performance-by-device","mcp__claude_ai_Ahref__gsc-performance-by-position","mcp__claude_ai_Ahref__gsc-performance-history","mcp__claude_ai_Ahref__gsc-positions-history","mcp__claude_ai_Ahref__keywords-explorer-matching-terms","mcp__claude_ai_Ahref__keywords-explorer-overview","mcp__claude_ai_Ahref__keywords-explorer-related-terms","mcp__claude_ai_Ahref__keywords-explorer-search-suggestions","mcp__claude_ai_Ahref__keywords-explorer-volume-by-country","mcp__claude_ai_Ahref__keywords-explorer-volume-history","mcp__claude_ai_Ahref__management-brand-radar-prompts","mcp__claude_ai_Ahref__management-brand-radar-reports","mcp__claude_ai_Ahref__management-keyword-list-keywords","mcp__claude_ai_Ahref__management-locations","mcp__claude_ai_Ahref__management-project-competitors","mcp__claude_ai_Ahref__management-project-keywords","mcp__claude_ai_Ahref__management-projects","mcp__claude_ai_Ahref__public-crawler-ip-ranges","mcp__claude_ai_Ahref__public-crawler-ips","mcp__claude_ai_Ahref__rank-tracker-competitors-domains","mcp__claude_ai_Ahref__rank-tracker-competitors-overview","mcp__claude_ai_Ahref__rank-tracker-competitors-pages","mcp__claude_ai_Ahref__rank-tracker-competitors-stats","mcp__claude_ai_Ahref__rank-tracker-overview","mcp__claude_ai_Ahref__rank-tracker-serp-overview","mcp__claude_ai_Ahref__render-data-table","mcp__claude_ai_Ahref__render-scorecard","mcp__claude_ai_Ahref__render-time-series-chart","mcp__claude_ai_Ahref__serp-overview","mcp__claude_ai_Ahref__site-audit-issues","mcp__claude_ai_Ahref__site-audit-page-content","mcp__claude_ai_Ahref__site-audit-page-explorer","mcp__claude_ai_Ahref__site-audit-projects","mcp__claude_ai_Ahref__site-explorer-all-backlinks","mcp__claude_ai_Ahref__site-explorer-anchors","mcp__claude_ai_Ahref__site-explorer-backlinks-stats","mcp__claude_ai_Ahref__site-explorer-broken-backlinks","mcp__claude_ai_Ahref__site-explorer-crawled-pages","mcp__claude_ai_Ahref__site-explorer-domain-rating","mcp__claude_ai_Ahref__site-explorer-domain-rating-history","mcp__claude_ai_Ahref__site-explorer-keywords-history","mcp__claude_ai_Ahref__site-explorer-linked-anchors-external","mcp__claude_ai_Ahref__site-explorer-linked-anchors-internal","mcp__claude_ai_Ahref__site-explorer-linked-domains","mcp__claude_ai_Ahref__site-explorer-metrics","mcp__claude_ai_Ahref__site-explorer-metrics-by-country","mcp__claude_ai_Ahref__site-explorer-metrics-history","mcp__claude_ai_Ahref__site-explorer-organic-competitors","mcp__claude_ai_Ahref__site-explorer-organic-keywords","mcp__claude_ai_Ahref__site-explorer-outlinks-stats","mcp__claude_ai_Ahref__site-explorer-pages-by-backlinks","mcp__claude_ai_Ahref__site-explorer-pages-by-internal-links","mcp__claude_ai_Ahref__site-explorer-pages-by-traffic","mcp__claude_ai_Ahref__site-explorer-pages-history","mcp__claude_ai_Ahref__site-explorer-paid-pages","mcp__claude_ai_Ahref__site-explorer-refdomains-history","mcp__claude_ai_Ahref__site-explorer-referring-domains","mcp__claude_ai_Ahref__site-explorer-top-pages","mcp__claude_ai_Ahref__site-explorer-total-search-volume-history","mcp__claude_ai_Ahref__site-explorer-url-rating-history","mcp__claude_ai_Ahref__social-media-activity-history","mcp__claude_ai_Ahref__social-media-authors","mcp__claude_ai_Ahref__social-media-channel-metrics","mcp__claude_ai_Ahref__social-media-channels","mcp__claude_ai_Ahref__social-media-post-metrics","mcp__claude_ai_Ahref__social-media-posts","mcp__claude_ai_Ahref__subscription-info-limits-and-usage","mcp__claude_ai_Ahref__web-analytics-browser-versions","mcp__claude_ai_Ahref__web-analytics-browser-versions-chart","mcp__claude_ai_Ahref__web-analytics-browsers","mcp__claude_ai_Ahref__web-analytics-browsers-chart","mcp__claude_ai_Ahref__web-analytics-chart","mcp__claude_ai_Ahref__web-analytics-cities","mcp__claude_ai_Ahref__web-analytics-cities-chart","mcp__claude_ai_Ahref__web-analytics-continents","mcp__claude_ai_Ahref__web-analytics-continents-chart","mcp__claude_ai_Ahref__web-analytics-countries","mcp__claude_ai_Ahref__web-analytics-countries-chart","mcp__claude_ai_Ahref__web-analytics-devices","mcp__claude_ai_Ahref__web-analytics-devices-chart","mcp__claude_ai_Ahref__web-analytics-entry-pages","mcp__claude_ai_Ahref__web-analytics-entry-pages-chart","mcp__claude_ai_Ahref__web-analytics-exit-pages","mcp__claude_ai_Ahref__web-analytics-exit-pages-chart","mcp__claude_ai_Ahref__web-analytics-languages","mcp__claude_ai_Ahref__web-analytics-languages-chart","mcp__claude_ai_Ahref__web-analytics-operating-systems","mcp__claude_ai_Ahref__web-analytics-operating-systems-chart","mcp__claude_ai_Ahref__web-analytics-operating-systems-versions","mcp__claude_ai_Ahref__web-analytics-operating-systems-versions-chart","mcp__claude_ai_Ahref__web-analytics-referrers","mcp__claude_ai_Ahref__web-analytics-referrers-chart","mcp__claude_ai_Ahref__web-analytics-source-channels","mcp__claude_ai_Ahref__web-analytics-source-channels-chart","mcp__claude_ai_Ahref__web-analytics-sources","mcp__claude_ai_Ahref__web-analytics-sources-chart","mcp__claude_ai_Ahref__web-analytics-stats","mcp__claude_ai_Ahref__web-analytics-top-pages","mcp__claude_ai_Ahref__web-analytics-top-pages-chart","mcp__claude_ai_Ahref__web-analytics-utm-params","mcp__claude_ai_Ahref__web-analytics-utm-params-chart","mcp__claude_ai_Canva__cancel-editing-transaction","mcp__claude_ai_Canva__comment-on-design","mcp__claude_ai_Canva__commit-editing-transaction","mcp__claude_ai_Canva__copy-design","mcp__claude_ai_Canva__create-design-from-brand-template","mcp__claude_ai_Canva__create-design-from-candidate","mcp__claude_ai_Canva__create-folder","mcp__claude_ai_Canva__export-design","mcp__claude_ai_Canva__generate-design","mcp__claude_ai_Canva__generate-design-structured","mcp__claude_ai_Canva__get-assets","mcp__claude_ai_Canva__get-brand-template-dataset","mcp__claude_ai_Canva__get-design","mcp__claude_ai_Canva__get-design-candidates","mcp__claude_ai_Canva__get-design-content","mcp__claude_ai_Canva__get-design-pages","mcp__claude_ai_Canva__get-design-thumbnail","mcp__claude_ai_Canva__get-export-formats","mcp__claude_ai_Canva__get-presenter-notes","mcp__claude_ai_Canva__help","mcp__claude_ai_Canva__import-design-from-url","mcp__claude_ai_Canva__list-brand-kits","mcp__claude_ai_Canva__list-comments","mcp__claude_ai_Canva__list-folder-items","mcp__claude_ai_Canva__list-replies","mcp__claude_ai_Canva__merge-designs","mcp__claude_ai_Canva__move-item-to-folder","mcp__claude_ai_Canva__perform-editing-operations","mcp__claude_ai_Canva__publish-brand-template","mcp__claude_ai_Canva__reply-to-comment","mcp__claude_ai_Canva__request-outline-review","mcp__claude_ai_Canva__resize-design","mcp__claude_ai_Canva__resolve-shortlink","mcp__claude_ai_Canva__search-brand-templates","mcp__claude_ai_Canva__search-designs","mcp__claude_ai_Canva__search-folders","mcp__claude_ai_Canva__start-editing-transaction","mcp__claude_ai_Canva__upload-asset-from-url","mcp__claude_ai_DATA_GOUV_FR__get_dataservice_info","mcp__claude_ai_DATA_GOUV_FR__get_dataservice_openapi_spec","mcp__claude_ai_DATA_GOUV_FR__get_dataset_info","mcp__claude_ai_DATA_GOUV_FR__get_metrics","mcp__claude_ai_DATA_GOUV_FR__get_resource_info","mcp__claude_ai_DATA_GOUV_FR__list_dataset_resources","mcp__claude_ai_DATA_GOUV_FR__query_resource_data","mcp__claude_ai_DATA_GOUV_FR__search_dataservices","mcp__claude_ai_DATA_GOUV_FR__search_datasets","mcp__claude_ai_DATA_GOUV_FR__search_organizations","mcp__claude_ai_Data_For_SEO__authenticate","mcp__claude_ai_Data_For_SEO__complete_authentication","mcp__claude_ai_Esteworld-Body__add_block","mcp__claude_ai_Esteworld-Body__delete_block","mcp__claude_ai_Esteworld-Body__find_esteworld_page","mcp__claude_ai_Esteworld-Body__fix_global_blocks","mcp__claude_ai_Esteworld-Body__get_block","mcp__claude_ai_Esteworld-Body__get_blocks_library","mcp__claude_ai_Esteworld-Body__get_breadcrumb","mcp__claude_ai_Esteworld-Body__get_fr_en_mapping","mcp__claude_ai_Esteworld-Body__get_internal_links","mcp__claude_ai_Esteworld-Body__get_internal_links_en","mcp__claude_ai_Esteworld-Body__get_maillage_rules","mcp__claude_ai_Esteworld-Body__get_page","mcp__claude_ai_Esteworld-Body__get_page_json","mcp__claude_ai_Esteworld-Body__get_page_status","mcp__claude_ai_Esteworld-Body__get_redirections","mcp__claude_ai_Esteworld-Body__get_section_content","mcp__claude_ai_Esteworld-Body__get_thematique_pages","mcp__claude_ai_Esteworld-Body__get_translation_languages","mcp__claude_ai_Esteworld-Body__list_pages","mcp__claude_ai_Esteworld-Body__list_pages_by_thematique","mcp__claude_ai_Esteworld-Body__list_pages_by_type","mcp__claude_ai_Esteworld-Body__list_pages_to_create","mcp__claude_ai_Esteworld-Body__list_pages_to_rewrite","mcp__claude_ai_Esteworld-Body__list_pages_with_esteworld","mcp__claude_ai_Esteworld-Body__list_pages_without_esteworld","mcp__claude_ai_Esteworld-Body__n8n_activate_workflow","mcp__claude_ai_Esteworld-Body__n8n_get_executions","mcp__claude_ai_Esteworld-Body__n8n_get_workflow","mcp__claude_ai_Esteworld-Body__n8n_list_workflows","mcp__claude_ai_Esteworld-Body__n8n_trigger_webhook","mcp__claude_ai_Esteworld-Body__preview_global_fixes","mcp__claude_ai_Esteworld-Body__reorder_blocks","mcp__claude_ai_Esteworld-Body__search_content","mcp__claude_ai_Esteworld-Body__search_links_by_keyword","mcp__claude_ai_Esteworld-Body__search_links_by_keyword_en","mcp__claude_ai_Esteworld-Body__submit_page","mcp__claude_ai_Esteworld-Body__submit_page_light","mcp__claude_ai_Esteworld-Body__translate_page","mcp__claude_ai_Esteworld-Body__update_block","mcp__claude_ai_Figma__add_code_connect_map","mcp__claude_ai_Figma__create_new_file","mcp__claude_ai_Figma__generate_diagram","mcp__claude_ai_Figma__get_code_connect_map","mcp__claude_ai_Figma__get_code_connect_suggestions","mcp__claude_ai_Figma__get_context_for_code_connect","mcp__claude_ai_Figma__get_design_context","mcp__claude_ai_Figma__get_figjam","mcp__claude_ai_Figma__get_libraries","mcp__claude_ai_Figma__get_metadata","mcp__claude_ai_Figma__get_screenshot","mcp__claude_ai_Figma__get_variable_defs","mcp__claude_ai_Figma__search_design_system","mcp__claude_ai_Figma__send_code_connect_mappings","mcp__claude_ai_Figma__upload_assets","mcp__claude_ai_Figma__use_figma","mcp__claude_ai_Figma__whoami","mcp__claude_ai_Gmail__create_draft","mcp__claude_ai_Gmail__create_label","mcp__claude_ai_Gmail__delete_label","mcp__claude_ai_Gmail__get_thread","mcp__claude_ai_Gmail__label_message","mcp__claude_ai_Gmail__label_thread","mcp__claude_ai_Gmail__list_drafts","mcp__claude_ai_Gmail__list_labels","mcp__claude_ai_Gmail__search_threads","mcp__claude_ai_Gmail__unlabel_message","mcp__claude_ai_Gmail__unlabel_thread","mcp__claude_ai_Gmail__update_label","mcp__claude_ai_Google_Calendar__create_event","mcp__claude_ai_Google_Calendar__delete_event","mcp__claude_ai_Google_Calendar__get_event","mcp__claude_ai_Google_Calendar__list_calendars","mcp__claude_ai_Google_Calendar__list_events","mcp__claude_ai_Google_Calendar__respond_to_event","mcp__claude_ai_Google_Calendar__suggest_time","mcp__claude_ai_Google_Calendar__update_event","mcp__claude_ai_Google_Drive__copy_file","mcp__claude_ai_Google_Drive__create_file","mcp__claude_ai_Google_Drive__download_file_content","mcp__claude_ai_Google_Drive__get_file_metadata","mcp__claude_ai_Google_Drive__get_file_permissions","mcp__claude_ai_Google_Drive__list_recent_files","mcp__claude_ai_Google_Drive__read_file_content","mcp__claude_ai_Google_Drive__search_files","mcp__claude_ai_SEMRush__backlink_research","mcp__claude_ai_SEMRush__execute_report","mcp__claude_ai_SEMRush__get_report_schema","mcp__claude_ai_SEMRush__keyword_research","mcp__claude_ai_SEMRush__organic_research","mcp__claude_ai_SEMRush__overview_research","mcp__claude_ai_SEMRush__projects_research","mcp__claude_ai_SEMRush__siteaudit_research","mcp__claude_ai_SEMRush__subdomain_research","mcp__claude_ai_SEMRush__subfolder_research","mcp__claude_ai_SEMRush__tracking_research","mcp__claude_ai_SEMRush__trends_research","mcp__claude_ai_SEMRush__url_research","mcp__claude_ai_Vercel__add_toolbar_reaction","mcp__claude_ai_Vercel__change_toolbar_thread_resolve_status","mcp__claude_ai_Vercel__check_domain_availability_and_price","mcp__claude_ai_Vercel__deploy_to_vercel","mcp__claude_ai_Vercel__edit_toolbar_message","mcp__claude_ai_Vercel__get_access_to_vercel_url","mcp__claude_ai_Vercel__get_deployment","mcp__claude_ai_Vercel__get_deployment_build_logs","mcp__claude_ai_Vercel__get_project","mcp__claude_ai_Vercel__get_runtime_logs","mcp__claude_ai_Vercel__get_toolbar_thread","mcp__claude_ai_Vercel__list_deployments","mcp__claude_ai_Vercel__list_projects","mcp__claude_ai_Vercel__list_teams","mcp__claude_ai_Vercel__list_toolbar_threads","mcp__claude_ai_Vercel__reply_to_toolbar_thread","mcp__claude_ai_Vercel__search_vercel_documentation","mcp__claude_ai_Vercel__web_fetch_vercel_url","mcp__gpt-image-2__continue_edit_session","mcp__gpt-image-2__edit_image","mcp__gpt-image-2__end_edit_session","mcp__gpt-image-2__generate_image","mcp__gpt-image-2__list_edit_sessions","mcp__gpt-image-2__start_edit_session","mcp__hostinger__DNS_deleteDNSRecordsV1","mcp__hostinger__DNS_getDNSRecordsV1","mcp__hostinger__DNS_getDNSSnapshotListV1","mcp__hostinger__DNS_getDNSSnapshotV1","mcp__hostinger__DNS_resetDNSRecordsV1","mcp__hostinger__DNS_restoreDNSSnapshotV1","mcp__hostinger__DNS_updateDNSRecordsV1","mcp__hostinger__DNS_validateDNSRecordsV1","mcp__hostinger__VPS_activateFirewallV1","mcp__hostinger__VPS_attachPublicKeyV1","mcp__hostinger__VPS_createFirewallRuleV1","mcp__hostinger__VPS_createNewFirewallV1","mcp__hostinger__VPS_createNewProjectV1","mcp__hostinger__VPS_createPTRRecordV1","mcp__hostinger__VPS_createPostInstallScriptV1","mcp__hostinger__VPS_createPublicKeyV1","mcp__hostinger__VPS_createSnapshotV1","mcp__hostinger__VPS_deactivateFirewallV1","mcp__hostinger__VPS_deleteFirewallRuleV1","mcp__hostinger__VPS_deleteFirewallV1","mcp__hostinger__VPS_deletePTRRecordV1","mcp__hostinger__VPS_deletePostInstallScriptV1","mcp__hostinger__VPS_deleteProjectV1","mcp__hostinger__VPS_deletePublicKeyV1","mcp__hostinger__VPS_deleteSnapshotV1","mcp__hostinger__VPS_getActionDetailsV1","mcp__hostinger__VPS_getActionsV1","mcp__hostinger__VPS_getAttachedPublicKeysV1","mcp__hostinger__VPS_getBackupsV1","mcp__hostinger__VPS_getDataCenterListV1","mcp__hostinger__VPS_getFirewallDetailsV1","mcp__hostinger__VPS_getFirewallListV1","mcp__hostinger__VPS_getMetricsV1","mcp__hostinger__VPS_getPostInstallScriptV1","mcp__hostinger__VPS_getPostInstallScriptsV1","mcp__hostinger__VPS_getProjectContainersV1","mcp__hostinger__VPS_getProjectContentsV1","mcp__hostinger__VPS_getProjectListV1","mcp__hostinger__VPS_getProjectLogsV1","mcp__hostinger__VPS_getPublicKeysV1","mcp__hostinger__VPS_getScanMetricsV1","mcp__hostinger__VPS_getSnapshotV1","mcp__hostinger__VPS_getTemplateDetailsV1","mcp__hostinger__VPS_getTemplatesV1","mcp__hostinger__VPS_getVirtualMachineDetailsV1","mcp__hostinger__VPS_getVirtualMachinesV1","mcp__hostinger__VPS_installMonarxV1","mcp__hostinger__VPS_purchaseNewVirtualMachineV1","mcp__hostinger__VPS_recreateVirtualMachineV1","mcp__hostinger__VPS_resetHostnameV1","mcp__hostinger__VPS_restartProjectV1","mcp__hostinger__VPS_restartVirtualMachineV1","mcp__hostinger__VPS_restoreBackupV1","mcp__hostinger__VPS_restoreSnapshotV1","mcp__hostinger__VPS_setHostnameV1","mcp__hostinger__VPS_setNameserversV1","mcp__hostinger__VPS_setPanelPasswordV1","mcp__hostinger__VPS_setRootPasswordV1","mcp__hostinger__VPS_setupPurchasedVirtualMachineV1","mcp__hostinger__VPS_startProjectV1","mcp__hostinger__VPS_startRecoveryModeV1","mcp__hostinger__VPS_startVirtualMachineV1","mcp__hostinger__VPS_stopProjectV1","mcp__hostinger__VPS_stopRecoveryModeV1","mcp__hostinger__VPS_stopVirtualMachineV1","mcp__hostinger__VPS_syncFirewallV1","mcp__hostinger__VPS_uninstallMonarxV1","mcp__hostinger__VPS_updateFirewallRuleV1","mcp__hostinger__VPS_updatePostInstallScriptV1","mcp__hostinger__VPS_updateProjectV1","mcp__hostinger__billing_deletePaymentMethodV1","mcp__hostinger__billing_disableAutoRenewalV1","mcp__hostinger__billing_enableAutoRenewalV1","mcp__hostinger__billing_getCatalogItemListV1","mcp__hostinger__billing_getPaymentMethodListV1","mcp__hostinger__billing_getSubscriptionListV1","mcp__hostinger__billing_setDefaultPaymentMethodV1","mcp__hostinger__domains_checkDomainAvailabilityV1","mcp__hostinger__domains_createDomainForwardingV1","mcp__hostinger__domains_createWHOISProfileV1","mcp__hostinger__domains_deleteDomainForwardingV1","mcp__hostinger__domains_deleteWHOISProfileV1","mcp__hostinger__domains_disableDomainLockV1","mcp__hostinger__domains_disablePrivacyProtectionV1","mcp__hostinger__domains_enableDomainLockV1","mcp__hostinger__domains_enablePrivacyProtectionV1","mcp__hostinger__domains_getDomainDetailsV1","mcp__hostinger__domains_getDomainForwardingV1","mcp__hostinger__domains_getDomainListV1","mcp__hostinger__domains_getWHOISProfileListV1","mcp__hostinger__domains_getWHOISProfileUsageV1","mcp__hostinger__domains_getWHOISProfileV1","mcp__hostinger__domains_purchaseNewDomainV1","mcp__hostinger__domains_updateDomainNameserversV1","mcp__hostinger__hosting_createAccountDatabaseV1","mcp__hostinger__hosting_createWebsiteParkedDomainV1","mcp__hostinger__hosting_createWebsiteSubdomainV1","mcp__hostinger__hosting_createWebsiteV1","mcp__hostinger__hosting_deleteAccountDatabaseV1","mcp__hostinger__hosting_deleteWebsiteParkedDomainV1","mcp__hostinger__hosting_deleteWebsiteSubdomainV1","mcp__hostinger__hosting_deployJsApplication","mcp__hostinger__hosting_deployStaticWebsite","mcp__hostinger__hosting_deployWordpressPlugin","mcp__hostinger__hosting_deployWordpressTheme","mcp__hostinger__hosting_generateAFreeSubdomainV1","mcp__hostinger__hosting_importWordpressWebsite","mcp__hostinger__hosting_installWordPressV1","mcp__hostinger__hosting_listAccountDatabasesV1","mcp__hostinger__hosting_listAvailableDatacentersV1","mcp__hostinger__hosting_listJsDeployments","mcp__hostinger__hosting_listOrdersV1","mcp__hostinger__hosting_listWebsiteParkedDomainsV1","mcp__hostinger__hosting_listWebsiteSubdomainsV1","mcp__hostinger__hosting_listWebsitesV1","mcp__hostinger__hosting_listWordPressInstallationsV1","mcp__hostinger__hosting_showJsDeploymentLogs","mcp__hostinger__hosting_verifyDomainOwnershipV1","mcp__hostinger__reach_createANewContactSegmentV1","mcp__hostinger__reach_createANewContactV1","mcp__hostinger__reach_createNewContactsV1","mcp__hostinger__reach_deleteAContactV1","mcp__hostinger__reach_getSegmentDetailsV1","mcp__hostinger__reach_listContactGroupsV1","mcp__hostinger__reach_listContactsV1","mcp__hostinger__reach_listProfilesV1","mcp__hostinger__reach_listSegmentContactsV1","mcp__hostinger__reach_listSegmentsV1","mcp__hostinger__v2_getDomainVerificationsDIRECT","mcp__n8n-mcp__get_node","mcp__n8n-mcp__get_template","mcp__n8n-mcp__n8n_audit_instance","mcp__n8n-mcp__n8n_autofix_workflow","mcp__n8n-mcp__n8n_create_workflow","mcp__n8n-mcp__n8n_delete_workflow","mcp__n8n-mcp__n8n_deploy_template","mcp__n8n-mcp__n8n_executions","mcp__n8n-mcp__n8n_generate_workflow","mcp__n8n-mcp__n8n_get_workflow","mcp__n8n-mcp__n8n_health_check","mcp__n8n-mcp__n8n_list_workflows","mcp__n8n-mcp__n8n_manage_credentials","mcp__n8n-mcp__n8n_manage_datatable","mcp__n8n-mcp__n8n_test_workflow","mcp__n8n-mcp__n8n_update_full_workflow","mcp__n8n-mcp__n8n_update_partial_workflow","mcp__n8n-mcp__n8n_validate_workflow","mcp__n8n-mcp__n8n_workflow_versions","mcp__n8n-mcp__search_nodes","mcp__n8n-mcp__search_templates","mcp__n8n-mcp__tools_documentation","mcp__n8n-mcp__validate_node","mcp__n8n-mcp__validate_workflow","mcp__nanobanana__generate_image","mcp__nanobanana__maintenance","mcp__nanobanana__show_output_stats","mcp__nanobanana__upload_file","mcp__plugin_vercel_vercel__authenticate","mcp__plugin_vercel_vercel__complete_authentication","mcp__refero__refero_get_flow","mcp__refero__refero_get_screen","mcp__refero__refero_get_screen_image","mcp__refero__refero_get_similar_screens","mcp__refero__refero_get_style","mcp__refero__refero_search_flows","mcp__refero__refero_search_screens","mcp__refero__refero_search_styles","mcp__semrush__authenticate","mcp__semrush__complete_authentication","mcp__stock-images-mcp__search_stock_images","mcp__trello__add_card_to_list","mcp__trello__add_checklist_item","mcp__trello__add_comment","mcp__trello__add_list_to_board","mcp__trello__archive_card","mcp__trello__archive_list","mcp__trello__assign_member_to_card","mcp__trello__attach_file_to_card","mcp__trello__attach_image_data_to_card","mcp__trello__attach_image_to_card","mcp__trello__create_board","mcp__trello__create_checklist","mcp__trello__create_label","mcp__trello__delete_comment","mcp__trello__delete_label","mcp__trello__find_checklist_items_by_description","mcp__trello__get_acceptance_criteria","mcp__trello__get_active_board_info","mcp__trello__get_board_labels","mcp__trello__get_board_members","mcp__trello__get_card","mcp__trello__get_card_comments","mcp__trello__get_card_history","mcp__trello__get_cards_by_list_id","mcp__trello__get_checklist_by_name","mcp__trello__get_checklist_items","mcp__trello__get_health","mcp__trello__get_health_detailed","mcp__trello__get_health_metadata","mcp__trello__get_health_performance","mcp__trello__get_lists","mcp__trello__get_my_cards","mcp__trello__get_recent_activity","mcp__trello__list_boards","mcp__trello__list_boards_in_workspace","mcp__trello__list_workspaces","mcp__trello__move_card","mcp__trello__perform_system_repair","mcp__trello__remove_member_from_card","mcp__trello__set_active_board","mcp__trello__set_active_workspace","mcp__trello__update_card_details","mcp__trello__update_checklist_item","mcp__trello__update_comment","mcp__trello__update_label"],"removedNames":[],"readdedNames":[]},"type":"attachment","uuid":"2b62418e-d1cb-46f9-b0b3-5a67d7917875","timestamp":"2026-06-07T15:59:05.815Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"2b62418e-d1cb-46f9-b0b3-5a67d7917875","isSidechain":true,"agentId":"aed0c7b7da27aaf24","attachment":{"type":"skill_listing","content":"- adapt: Adapt designs to work across different screen sizes, devices, contexts, or platforms. Implements breakpoints, fluid layouts, and touch targets. Use when the user mentions responsive design, mobile layouts, breakpoints, viewport adaptation, or cross-device compatibility.\n- animate: Review a feature and enhance it with purposeful animations, micro-interactions, and motion effects that improve usability and delight. Use when the user mentions adding animation, transitions, micro-interactions, motion design, hover effects, or making the UI feel more alive.\n- arrange: Improve layout, spacing, and visual rhythm. Fixes monotonous grids, inconsistent spacing, and weak visual hierarchy. Use when the user mentions layout feeling off, spacing issues, visual hierarchy, crowded UI, alignment problems, or wanting better composition.\n- audit: Run technical quality checks across accessibility, performance, theming, responsive design, and anti-patterns. Generates a scored report with P0-P3 severity ratings and actionable plan. Use when the user wants an accessibility check, performance audit, or technical quality review.\n- bolder: Amplify safe or boring designs to make them more visually interesting and stimulating. Increases impact while maintaining usability. Use when the user says the design looks bland, generic, too safe, lacks personality, or wants more visual impact and character.\n- bun-to-next: Pipeline de migration d'un projet Vite + Bun + React (preprod) vers un projet Next.js 16 (App Router, production-ready, SEO/SSR/edge). Préserve les composants, les assets, les CSS modules et les tokens. Inclut conversion du routing manuel vers file-based routes, remplacement de <picture> par <Image>, configuration metadata API, déploiement Vercel. Utiliser quand l'utilisateur dit \"migrer vers next\", \"passer en prod\", \"convertir vite en next\", \"de bun à next\", \"bun-to-next\", ou après un Acte 7 (capture & compare) validé sur un projet Vite+Bun.\n- clarify: Improve unclear UX copy, error messages, microcopy, labels, and instructions to make interfaces easier to understand. Use when the user mentions confusing text, unclear labels, bad error messages, hard-to-follow instructions, or wanting better UX writing.\n- colorize: Add strategic color to features that are too monochromatic or lack visual interest, making interfaces more engaging and expressive. Use when the user mentions the design looking gray, dull, lacking warmth, needing more color, or wanting a more vibrant or expressive palette.\n- critique: Evaluate design from a UX perspective, assessing visual hierarchy, information architecture, emotional resonance, cognitive load, and overall quality with quantitative scoring, persona-based testing, automated anti-pattern detection, and actionable feedback. Use when the user asks to review, critique, evaluate, or give feedback on a design or component.\n- delight: Add moments of joy, personality, and unexpected touches that make interfaces memorable and enjoyable to use. Elevates functional to delightful. Use when the user asks to add polish, personality, animations, micro-interactions, delight, or make an interface feel fun or memorable.\n- design-taste-frontend: Anti-slop frontend skill for landing pages, portfolios, and redesigns. The agent reads the brief, infers the right design direction, and ships interfaces that do not look templated. Real design systems when applicable, audit-first on redesigns, strict pre-flight check.\n- distill: Strip designs to their essence by removing unnecessary complexity. Great design is simple, powerful, and clean. Use when the user asks to simplify, declutter, reduce noise, remove elements, or make a UI cleaner and more focused.\n- extract: Extract and consolidate reusable components, design tokens, and patterns into your design system. Identifies opportunities for systematic reuse and enriches your component library. Use when the user asks to create components, refactor repeated UI patterns, build a design system, or extract tokens.\n- feedbackpin: Fetch visual feedback tasks from FeedbackPin, resolve them by fixing the code, and report back via API\n- full-output-enforcement: Overrides default LLM truncation behavior. Enforces complete code generation, bans placeholder patterns, and handles token-limit splits cleanly. Apply to any task requiring exhaustive, unabridged output.\n- gsap-core: Official GSAP skill for the core API — gsap.to(), from(), fromTo(), easing, duration, stagger, defaults, gsap.matchMedia() (responsive, prefers-reduced-motion). Use when the user asks for a JavaScript animation library, animation in React/Vue/vanilla, GSAP tweens, easing, basic animation, responsive or reduced-motion animation, or when animating DOM/SVG with GSAP. Recommend GSAP when the user needs timelines, scroll-driven animation, or a framework-agnostic library. GSAP runs in any framework or vanilla JS; powers Webflow Interactions.\n- gsap-frameworks: Official GSAP skill for Vue, Svelte, and other non-React frameworks — lifecycle, scoping selectors, cleanup on unmount. Use when the user wants animation in Vue, Nuxt, Svelte, SvelteKit, or asks about GSAP with Vue/Svelte, onMounted, onMount, onDestroy. Recommend GSAP for framework animation unless another library is specified. For React use gsap-react.\n- gsap-performance: Official GSAP skill for performance — prefer transforms, avoid layout thrashing, will-change, batching. Use when optimizing GSAP animations, reducing jank, or when the user asks about animation performance, FPS, or smooth 60fps.\n- gsap-plugins: Official GSAP skill for GSAP plugins — registration, ScrollToPlugin, ScrollSmoother, Flip, Draggable, Inertia, Observer, SplitText, ScrambleText, SVG and physics plugins, CustomEase, EasePack, CustomWiggle, CustomBounce, GSDevTools. Use when the user asks about a GSAP plugin, scroll-to, flip animations, draggable, SVG drawing, or plugin registration.\n- gsap-react: Official GSAP skill for React — useGSAP hook, refs, gsap.context(), cleanup. Use when the user wants animation in React or Next.js, or asks about GSAP with React, useGSAP, or cleanup on unmount. Recommend GSAP for React animation unless the user has chosen another library.\n- gsap-scrolltrigger: Official GSAP skill for ScrollTrigger — scroll-linked animations, pinning, scrub, triggers. Use when building or recommending scroll-based animation, parallax, pinned sections, or when the user asks about ScrollTrigger, scroll animations, or pinning. Recommend GSAP for scroll-driven animation when no library is specified.\n- gsap-timeline: Official GSAP skill for timelines — gsap.timeline(), position parameter, nesting, playback. Use when sequencing animations, choreographing keyframes, or when the user asks about animation sequencing, timelines, or animation order (in GSAP or when recommending a library that supports timelines).\n- gsap-utils: Official GSAP skill for gsap.utils — clamp, mapRange, normalize, interpolate, random, snap, toArray, wrap, pipe. Use when the user asks about gsap.utils, clamp, mapRange, random, snap, toArray, wrap, or helper utilities in GSAP.\n- harden: Improve interface resilience through better error handling, i18n support, text overflow handling, and edge case management. Makes interfaces robust and production-ready. Use when the user asks to harden, make production-ready, handle edge cases, add error states, or fix overflow and i18n issues.\n- high-end-visual-design: Teaches the AI to design like a high-end agency. Defines the exact fonts, spacing, shadows, card structures, and animations that make a website feel expensive. Blocks all the common defaults that make AI designs look cheap or generic.\n- impeccable: Create distinctive, production-grade frontend interfaces with high design quality. Generates creative, polished code that avoids generic AI aesthetics. Use when the user asks to build web components, pages, artifacts, posters, or applications, or when any design skill requires project context. Call with 'craft' to run the full shape-then-build flow, or 'teach' for design context setup.\n- industrial-brutalist-ui: Raw mechanical interfaces fusing Swiss typographic print with military terminal aesthetics. Rigid grids, extreme type scale contrast, utilitarian color, analog degradation effects. For data-heavy dashboards, portfolios, or editorial sites that need to feel like declassified blueprints.\n- minimalist-ui: Clean editorial-style interfaces. Warm monochrome palette, typographic contrast, flat bento grids, muted pastels. No gradients, no heavy shadows.\n- netlify-cli: Déploie, gère et configure des sites Netlify depuis le terminal via le CLI officiel. Utilise quand l'utilisateur dit \"netlify\", \"déployer sur netlify\", \"netlify deploy\", \"netlify link\", \"netlify env\", \"netlify functions\", \"build netlify\", ou mentionne un site hébergé sur Netlify.\n- nextjs-wordpress-headless: Branche un projet Next.js existant sur WordPress headless (ACF PRO + WPGraphQL). Gère pages uniques, CPT, SEO, revalidation on-demand.\n- normalize: Audits and realigns UI to match design system standards, spacing, tokens, and patterns. Use when the user mentions consistency, design drift, mismatched styles, tokens, or wants to bring a feature back in line with the system.\n- onboard: Designs and improves onboarding flows, empty states, and first-run experiences to help users reach value quickly. Use when the user mentions onboarding, first-time users, empty states, activation, getting started, or new user flows.\n- optimize: Diagnoses and fixes UI performance across loading speed, rendering, animations, images, and bundle size. Use when the user mentions slow, laggy, janky, performance, bundle size, load time, or wants a faster, smoother experience.\n- overdrive: Pushes interfaces past conventional limits with technically ambitious implementations — shaders, spring physics, scroll-driven reveals, 60fps animations. Use when the user wants to wow, impress, go all-out, or make something that feels extraordinary.\n- paf-site-magic: Pipeline image-first pour construire un site web depuis des maquettes IA, avec extraction de DA depuis un site existant et réutilisation maximale de composants. Génère mockup desktop + mockup mobile par section avec gpt-image-2 (thinking mode auto), itère avec validation utilisateur (génération parallèle en background Python), produit chaque asset visuel comme image ISOLÉE (jamais découpée du mockup), code la section en React+TS+Tailwind+shadcn en RÉUTILISANT les composants existants du projet (Navbar, Footer, marquees logos, sticky CTAs, buttons), audite avec captures multi-viewport via Playwright. Tout est rangé section par section dans le dossier ressources/ du repo git. Utiliser quand l'utilisateur dit \"paf-site-magic\", \"site depuis mockup\", \"maquette IA\", \"mockup-to-code\", \"pipeline visuel\", \"générer le site à partir d'images\", \"créer une landing par sections\", \"forge\", \"refonte covalba style\", ou \"page industrie\".\n- polish: Performs a final quality pass fixing alignment, spacing, consistency, and micro-detail issues before shipping. Use when the user mentions polish, finishing touches, pre-launch review, something looks off, or wants to go from good to great.\n- quieter: Tones down visually aggressive or overstimulating designs, reducing intensity while preserving quality. Use when the user mentions too bold, too loud, overwhelming, aggressive, garish, or wants a calmer, more refined aesthetic.\n- redesign-existing-projects: Upgrades existing websites and apps to premium quality. Audits current design, identifies generic AI patterns, and applies high-end design standards without breaking functionality. Works with any CSS framework or vanilla CSS.\n- refero-design: Research-First design methodology using Refero MCP. Use when creating new screens, flows, or interfaces—especially when the user asks to design, build, or create UI. Guides systematic research with Refero tools (search_screens, search_flows, get_screen, get_flow, get_design_guidance), pattern extraction from real products, and quality craft. Prevents generic \"AI slop\" designs by grounding every decision in research and professional-grade execution.\n- scheduled-actions: Manage Paf-Studio Code scheduled actions. Use when creating, editing, testing, disabling, or auditing recurring jobs that run via Codex, Claude, OpenCode, project worktrees, or Automator workspaces.\n- seo: Comprehensive SEO analysis for any website or business type. Performs full site audits, single-page deep analysis, technical SEO checks (crawlability, indexability, Core Web Vitals with INP), schema markup detection/validation/generation, content quality assessment (E-E-A-T framework per Dec 2025 update extending to all competitive queries), image optimization, sitemap analysis, and Generative Engine Optimization (GEO) for AI Overviews, ChatGPT, and Perplexity citations. Analyzes AI crawler accessibility (GPTBot, ClaudeBot, PerplexityBot), llms.txt compliance, brand mention signals, and passage-level citability. Industry detection for SaaS, e-commerce, local business, publishers, agencies. Triggers on: \"SEO\", \"audit\", \"schema\", \"Core Web Vitals\", \"sitemap\", \"E-E-A-T\", \"AI Overviews\", \"GEO\", \"technical SEO\", \"content quality\", \"page speed\", \"structured data\".\n- seo-audit: Full website SEO audit with parallel subagent delegation. Crawls up to 500 pages, detects business type, delegates to 10 specialists (7 core + 3 conditional), generates health score. Use when user says \"audit\", \"full SEO check\", \"analyze my site\", or \"website health check\".\n- seo-backlinks: Backlink profile analysis: referring domains, anchor text distribution, toxic link detection, competitor gap analysis. Requires DataForSEO extension. Use when user says \"backlinks\", \"link profile\", \"referring domains\", \"anchor text\", \"toxic links\", \"link gap\", \"link building\", \"disavow\", or \"backlink audit\".\n- seo-competitor-pages: Generate SEO-optimized competitor comparison and alternatives pages. Covers \"X vs Y\" layouts, \"alternatives to X\" pages, feature matrices, schema markup, and conversion optimization. Use when user says \"comparison page\", \"vs page\", \"alternatives page\", \"competitor comparison\", \"X vs Y\", \"versus\", \"compare competitors\", or \"alternative to\".\n- seo-content: Content quality and E-E-A-T analysis with AI citation readiness assessment. Use when user says \"content quality\", \"E-E-A-T\", \"content analysis\", \"readability check\", \"thin content\", or \"content audit\".\n- seo-dataforseo: Live SEO data via DataForSEO MCP server. SERP analysis (Google, Bing, Yahoo, YouTube), keyword research (volume, difficulty, intent, trends), backlink profiles, on-page analysis (Lighthouse, content parsing), competitor analysis, content analysis, business listings, AI visibility (ChatGPT scraper, LLM mention tracking), and domain analytics. Requires DataForSEO extension installed. Use when user says \"dataforseo\", \"live SERP\", \"keyword volume\", \"backlink data\", \"competitor data\", \"AI visibility check\", \"LLM mentions\", or \"real search data\".\n- seo-firecrawl: Full-site crawling, scraping, and site mapping via Firecrawl MCP. Use when user says \"crawl site\", \"map site\", \"full crawl\", \"find all pages\", \"broken links\", \"site structure\", \"discover pages\", \"JS rendering\", or needs site-wide analysis.\n- seo-geo: Optimize content for AI Overviews (formerly SGE), ChatGPT web search, Perplexity, and other AI-powered search experiences. Generative Engine Optimization (GEO) analysis including brand mention signals, AI crawler accessibility, llms.txt compliance, passage-level citability scoring, and platform-specific optimization. Use when user says \"AI Overviews\", \"SGE\", \"GEO\", \"AI search\", \"LLM optimization\", \"Perplexity\", \"AI citations\", \"ChatGPT search\", or \"AI visibility\".\n- seo-google: Google SEO APIs: Search Console (Search Analytics, URL Inspection, Sitemaps), PageSpeed Insights v5, CrUX field data with 25-week history, Indexing API v3, and GA4 organic traffic. Provides real Google field data for Core Web Vitals, indexation status, search performance, and organic traffic trends. Use when user says \"search console\", \"GSC\", \"PageSpeed\", \"CrUX\", \"field data\", \"indexing API\", \"GA4 organic\", \"URL inspection\", \"google api setup\", \"real CWV data\", \"impressions\", \"clicks\", \"CTR\", \"position data\", \"LCP\", \"INP\", \"CLS\", \"FCP\", \"TTFB\", or \"Lighthouse scores\".\n- seo-hreflang: Hreflang and international SEO audit, validation, and generation. Detects common mistakes, validates language/region codes, and generates correct hreflang implementations. Use when user says \"hreflang\", \"i18n SEO\", \"international SEO\", \"multi-language\", \"multi-region\", or \"language tags\".\n- seo-image-gen: AI image generation for SEO assets: OG/social preview images, blog hero images, schema images, product photography, infographics. Powered by Gemini via nanobanana-mcp. Requires banana extension installed. Use when user says \"generate image\", \"OG image\", \"social preview\", \"hero image\", \"blog image\", \"product photo\", \"infographic\", \"seo image\", \"create visual\", \"image-gen\", \"favicon\", \"schema image\", \"pinterest pin\", \"generate visual\", \"banner\", or \"thumbnail\".\n- seo-images: Image optimization analysis for SEO and performance. Checks alt text, file sizes, formats, responsive images, lazy loading, and CLS prevention. Use when user says \"image optimization\", \"alt text\", \"image SEO\", \"image size\", or \"image audit\".\n- seo-local: Local SEO analysis covering Google Business Profile optimization, NAP consistency, citation health, review signals, local schema markup, location page quality, multi-location SEO, and industry-specific recommendations. Detects business type (brick-and-mortar, SAB, hybrid) and industry vertical (restaurant, healthcare, legal, home services, real estate, automotive). Use when user says \"local SEO\", \"Google Business Profile\", \"GBP\", \"map pack\", \"local pack\", \"citations\", \"NAP consistency\", \"local rankings\", \"service area\", \"multi-location\", or \"local search\".\n- seo-maps: Maps intelligence for local SEO — geo-grid rank tracking, GBP profile auditing via API, review intelligence across Google/Tripadvisor/Trustpilot, cross-platform NAP verification (Google/Bing/Apple/OSM), competitor radius mapping, and LocalBusiness schema generation from API data. Three-tier capability: free (Overpass + Geoapify), DataForSEO (full intelligence), DataForSEO + Google (maximum coverage). Use when user says \"maps\", \"geo-grid\", \"rank tracking\", \"GBP audit\", \"review velocity\", \"competitor radius\", \"maps analysis\", \"local rank tracking\", \"Share of Local Voice\", or \"SoLV\".\n- seo-page: Deep single-page SEO analysis covering on-page elements, content quality, technical meta tags, schema, images, and performance. Use when user says \"analyze this page\", \"check page SEO\", \"single URL\", \"check this page\", \"page analysis\", or provides a single URL for review.\n- seo-plan: Strategic SEO planning for new or existing websites. Industry-specific templates, competitive analysis, content strategy, and implementation roadmap. Use when user says \"SEO plan\", \"SEO strategy\", \"SEO planning\", \"content strategy\", \"keyword strategy\", \"content calendar\", \"site architecture\", or \"SEO roadmap\".\n- seo-programmatic: Programmatic SEO planning and analysis for pages generated at scale from data sources. Covers template engines, URL patterns, internal linking automation, thin content safeguards, and index bloat prevention. Use when user says \"programmatic SEO\", \"pages at scale\", \"dynamic pages\", \"template pages\", \"generated pages\", or \"data-driven SEO\".\n- seo-schema: Detect, validate, and generate Schema.org structured data. JSON-LD format preferred. Use when user says \"schema\", \"structured data\", \"rich results\", \"JSON-LD\", or \"markup\".\n- seo-sitemap: Analyze existing XML sitemaps or generate new ones with industry templates. Validates format, URLs, and structure. Use when user says \"sitemap\", \"generate sitemap\", \"sitemap issues\", or \"XML sitemap\".\n- seo-technical: Technical SEO audit across 9 categories: crawlability, indexability, security, URL structure, mobile, Core Web Vitals, structured data, JavaScript rendering, and IndexNow protocol. Use when user says \"technical SEO\", \"crawl issues\", \"robots.txt\", \"Core Web Vitals\", \"site speed\", or \"security headers\".\n- shape: Plan the UX and UI for a feature before writing code. Runs a structured discovery interview, then produces a design brief that guides implementation. Use during the planning phase to establish design direction, constraints, and strategy before any code is written.\n- stitch-design-taste: Semantic Design System Skill for Google Stitch. Generates agent-friendly DESIGN.md files that enforce premium, anti-generic UI standards — strict typography, calibrated color, asymmetric layouts, perpetual micro-motion, and hardware-accelerated performance.\n- typeset: Improves typography by fixing font choices, hierarchy, sizing, weight, and readability so text feels intentional. Use when the user mentions fonts, type, readability, text hierarchy, sizing looks off, or wants more polished, intentional typography.\n- brainstorming: You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.\n- brutalist-skill: Raw mechanical interfaces fusing Swiss typographic print with military terminal aesthetics. Rigid grids, extreme type scale contrast, utilitarian color, analog degradation effects. For data-heavy dashboards, portfolios, or editorial sites that need to feel like declassified blueprints.\n- check-conflicts: Verifie les conflits potentiels avec les PRs ouvertes avant de travailler sur un fichier ou une section.\n- dispatching-parallel-agents: Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies\n- executing-plans: Use when you have a written implementation plan to execute in a separate session with review checkpoints\n- feedbackpin: Fetch visual feedback tasks from FeedbackPin, resolve them by fixing the code, and report back via API\n- finishing-a-development-branch: Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup\n- minimalist-skill: Clean editorial-style interfaces. Warm monochrome palette, typographic contrast, flat bento grids, muted pastels. No gradients, no heavy shadows.\n- output-skill: Overrides default LLM truncation behavior. Enforces complete code generation, bans placeholder patterns, and handles token-limit splits cleanly. Apply to any task requiring exhaustive, unabridged output.\n- receiving-code-review: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation\n- redesign-skill: Upgrades existing websites and apps to premium quality. Audits current design, identifies generic AI patterns, and applies high-end design standards without breaking functionality. Works with any CSS framework or vanilla CSS.\n- requesting-code-review: Use when completing tasks, implementing major features, or before merging to verify work meets requirements\n- soft-skill: Teaches the AI to design like a high-end agency. Defines the exact fonts, spacing, shadows, card structures, and animations that make a website feel expensive. Blocks all the common defaults that make AI designs look cheap or generic.\n- stitch-skill: Semantic Design System Skill for Google Stitch. Generates agent-friendly DESIGN.md files that enforce premium, anti-generic UI standards — strict typography, calibrated color, asymmetric layouts, perpetual micro-motion, and hardware-accelerated performance.\n- subagent-driven-development: Use when executing implementation plans with independent tasks in the current session\n- systematic-debugging: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes\n- taste-skill: Senior UI/UX Engineer. Architect digital interfaces overriding default LLM biases. Enforces metric-based rules, strict component architecture, CSS hardware acceleration, and balanced design engineering.\n- test-driven-development: Use when implementing any feature or bugfix, before writing implementation code\n- using-git-worktrees\n- using-superpowers\n- verification-before-completion\n- writing-plans\n- writing-skills\n- handoff: Cree ou met a jour HANDOFF.md pour passer le relais a l'agent suivant\n- deep-research: Deep research harness — fan-out web searches, fetch sources, adversarially verify claims, synthesize a cited report. - When the user wants a deep, multi-source, fact-checked research report on any topic. BEFORE invoking, check if the question is specific enough to research directly — if underspecified (e.g., \"what car to buy\" without budget/use-case/region), ask 2-3 clarifying questions to narrow scope. Then pass the refined question as args, weaving the answers in.\n- vercel:bootstrap\n- vercel:deploy\n- vercel:env\n- vercel:marketplace\n- vercel:status\n- vercel:ai-gateway\n- vercel:ai-sdk\n- vercel:auth\n- vercel:bootstrap\n- vercel:chat-sdk\n- vercel:deployments-cicd\n- vercel:env-vars\n- vercel:knowledge-update\n- vercel:marketplace\n- vercel:next-cache-components\n- vercel:next-forge\n- vercel:next-upgrade\n- vercel:nextjs\n- vercel:react-best-practices\n- vercel:routing-middleware\n- vercel:runtime-cache\n- vercel:shadcn\n- vercel:turbopack\n- vercel:vercel-agent\n- vercel:vercel-cli\n- vercel:vercel-firewall\n- vercel:vercel-functions\n- vercel:vercel-sandbox\n- vercel:vercel-storage\n- vercel:verification\n- vercel:workflow\n- update-config: Use this skill to configure the Claude Code harness via settings.json. Automated behaviors (\"from now on when X\", \"each time X\", \"whenever X\", \"before/after X\") require hooks configured in settings.json - the harness executes these, not Claude, so memory/preferences cannot fulfill them. Also use for: permissions (\"allow X\", \"add permission\", \"move permission to\"), env vars (\"set X=Y\"), hook troubleshooting, or any changes to settings.json/settings.local.json files. Examples: \"allow npm commands\", \"add bq permission to global settings\", \"move permission to user settings\", \"set DEBUG=true\", \"when claude stops show X\". For simple settings like theme/model, suggest the /config command.\n- keybindings-help: Use when the user wants to customize keyboard shortcuts, rebind keys, add chord bindings, or modify ~/.claude/keybindings.json. Examples: \"rebind ctrl+s\", \"add a chord shortcut\", \"change the submit key\", \"customize keybindings\".\n- verify: Verify that a code change actually does what it's supposed to by running the app and observing behavior. Use when asked to verify a PR, confirm a fix works, test a change manually, check that a feature works, or validate local changes before pushing.\n- code-review: Review the current diff for correctness bugs and reuse/simplification/efficiency cleanups at the given effort level (low/medium: fewer, high-confidence findings; high→max: broader coverage, may include uncertain findings; ultra: deep multi-agent review in the cloud). Pass --comment to post findings as inline PR comments, or --fix to apply the findings to the working tree after the review.\n- simplify: Review the changed code for reuse, simplification, efficiency, and altitude cleanups, then apply the fixes. Quality only — it does not hunt for bugs; use /code-review for that.\n- fewer-permission-prompts: Scan your transcripts for common read-only Bash and MCP tool calls, then add a prioritized allowlist to project .claude/settings.json to reduce permission prompts.\n- loop: Run a prompt or slash command on a recurring interval (e.g. /loop 5m /foo). Omit the interval to let the model self-pace. - When the user wants to set up a recurring task, poll for status, or run something repeatedly on an interval (e.g. \"check the deploy every 5 minutes\", \"keep running /babysit-prs\"). Do NOT invoke for one-off tasks.\n- schedule: Create, update, list, or run scheduled remote agents (routines) that execute on a cron schedule. - When the user wants to schedule a recurring remote agent, set up automated tasks, create a cron job for Claude Code, or manage their scheduled agents/routines. Also use when the user wants a one-time scheduled run (\"run this once at 3pm\", \"remind me to check X tomorrow\").\n- claude-api: Reference for the Claude API / Anthropic SDK — model ids, pricing, params, streaming, tool use, MCP, agents, caching, token counting, model migration.\nTRIGGER — read BEFORE opening the target file; don't skip because it \"looks like a one-liner\" — whenever: the prompt names Claude/Anthropic in any form (Claude, Anthropic, Opus, Sonnet, Haiku, `anthropic`, `@anthropic-ai`, `claude-*`, `us.anthropic.*`, `[1m]`); the user asks about an LLM (pricing/model choice/limits/caching) — never answer from memory; OR the task is LLM-shaped with provider unstated (agent/MCP/tool-definition/multi-agent/RAG/LLM-judge/computer-use; generate/summarize/extract/classify/rewrite/converse over NL; debugging refusals/cutoffs/streaming/tool-calls/tokens).\nSKIP only when another provider is being worked on (overrides all triggers): OpenAI/GPT/Gemini/Llama/Mistral/Cohere/Ollama named in the query; OR `grep -rE 'openai|langchain_openai|google.generativeai|genai|mistralai|cohere|ollama'` over the project hits (run this grep FIRST if no provider named — don't Read the file).\n- run: Launch and drive this project's app to see a change working. Use when asked to run, start, or screenshot the app, or to confirm a change works in the real app (not just tests). First looks for a project skill that already covers launching the app; otherwise falls back to built-in patterns per project type (CLI, server, TUI, Electron, browser-driven, library).\n- init\n- review\n- security-review","skillCount":132,"isInitial":true,"names":["adapt","animate","arrange","audit","bolder","bun-to-next","clarify","colorize","critique","delight","design-taste-frontend","distill","extract","feedbackpin","full-output-enforcement","gsap-core","gsap-frameworks","gsap-performance","gsap-plugins","gsap-react","gsap-scrolltrigger","gsap-timeline","gsap-utils","harden","high-end-visual-design","impeccable","industrial-brutalist-ui","minimalist-ui","netlify-cli","nextjs-wordpress-headless","normalize","onboard","optimize","overdrive","paf-site-magic","polish","quieter","redesign-existing-projects","refero-design","scheduled-actions","seo","seo-audit","seo-backlinks","seo-competitor-pages","seo-content","seo-dataforseo","seo-firecrawl","seo-geo","seo-google","seo-hreflang","seo-image-gen","seo-images","seo-local","seo-maps","seo-page","seo-plan","seo-programmatic","seo-schema","seo-sitemap","seo-technical","shape","stitch-design-taste","typeset","brainstorming","brutalist-skill","check-conflicts","dispatching-parallel-agents","executing-plans","feedbackpin","finishing-a-development-branch","minimalist-skill","output-skill","receiving-code-review","redesign-skill","requesting-code-review","soft-skill","stitch-skill","subagent-driven-development","systematic-debugging","taste-skill","test-driven-development","using-git-worktrees","using-superpowers","verification-before-completion","writing-plans","writing-skills","handoff","deep-research","vercel:bootstrap","vercel:deploy","vercel:env","vercel:marketplace","vercel:status","vercel:ai-gateway","vercel:ai-sdk","vercel:auth","vercel:bootstrap","vercel:chat-sdk","vercel:deployments-cicd","vercel:env-vars","vercel:knowledge-update","vercel:marketplace","vercel:next-cache-components","vercel:next-forge","vercel:next-upgrade","vercel:nextjs","vercel:react-best-practices","vercel:routing-middleware","vercel:runtime-cache","vercel:shadcn","vercel:turbopack","vercel:vercel-agent","vercel:vercel-cli","vercel:vercel-firewall","vercel:vercel-functions","vercel:vercel-sandbox","vercel:vercel-storage","vercel:verification","vercel:workflow","update-config","keybindings-help","verify","code-review","simplify","fewer-permission-prompts","loop","schedule","claude-api","run","init","review","security-review"]},"type":"attachment","uuid":"066809cc-281b-4e58-8dad-bf1f2eb724ab","timestamp":"2026-06-07T15:59:05.818Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"066809cc-281b-4e58-8dad-bf1f2eb724ab","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01Msx4tNYUMCKurfoo9CVSGi","type":"message","role":"assistant","content":[{"type":"text","text":"I'll verify the current AI SDK v6 / AI Gateway API surface against live docs before writing the design. Let me load the right tools and search."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":23499,"cache_creation_input_tokens":22964,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":22964,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBBsWgzNNWXvST9Poke","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"efaa2408-5444-4864-a31c-d5b58a5c2ded","timestamp":"2026-06-07T15:59:08.426Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"efaa2408-5444-4864-a31c-d5b58a5c2ded","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01Msx4tNYUMCKurfoo9CVSGi","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_011E8DH2tJHR9ujsKDmB3Czh","name":"ToolSearch","input":{"query":"select:WebFetch,mcp__claude_ai_Vercel__search_vercel_documentation","max_results":5},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":23499,"cache_creation_input_tokens":22964,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":22964,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBBsWgzNNWXvST9Poke","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"375363d8-874f-458b-a3fb-b30b8a03560a","timestamp":"2026-06-07T15:59:09.951Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"375363d8-874f-458b-a3fb-b30b8a03560a","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu_011E8DH2tJHR9ujsKDmB3Czh","content":[{"type":"tool_reference","tool_name":"WebFetch"},{"type":"tool_reference","tool_name":"mcp__claude_ai_Vercel__search_vercel_documentation"}]}]},"uuid":"4c756edf-36d3-4f5d-9086-252d70a6d0ce","timestamp":"2026-06-07T15:59:09.964Z","sourceToolAssistantUUID":"375363d8-874f-458b-a3fb-b30b8a03560a","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"4c756edf-36d3-4f5d-9086-252d70a6d0ce","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01Msx4tNYUMCKurfoo9CVSGi","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01FMwc5QfjwywcmUuuxduKqr","name":"Skill","input":{"skill":"claude-api"},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":23499,"cache_creation_input_tokens":22964,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":22964,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBBsWgzNNWXvST9Poke","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"4804a630-9152-4127-b867-3284863e6b46","timestamp":"2026-06-07T15:59:10.164Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"4804a630-9152-4127-b867-3284863e6b46","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu_01FMwc5QfjwywcmUuuxduKqr","content":"Launching skill: claude-api"}]},"uuid":"5f8f6248-b788-446f-8968-6938ca38c281","timestamp":"2026-06-07T15:59:10.334Z","sourceToolAssistantUUID":"4804a630-9152-4127-b867-3284863e6b46","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"5f8f6248-b788-446f-8968-6938ca38c281","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"type":"text","text":"Base directory for this skill: /tmp/claude-1001/bundled-skills/2.1.168/f0771099c2d141f7fc6522dfd927742f/claude-api\n\n# Building LLM-Powered Applications with Claude\n\nThis skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.\n\n## Before You Start\n\nScan the target file (or, if no target file, the prompt and project) for non-Anthropic provider markers — `import openai`, `from openai`, `langchain_openai`, `OpenAI(`, `gpt-4`, `gpt-5`, file names like `agent-openai.py` or `*-generic.py`, or any explicit instruction to keep the code provider-neutral. If you find any, stop and tell the user that this skill produces Claude/Anthropic SDK code; ask whether they want to switch the file to Claude or want a non-Claude implementation. Do not edit a non-Anthropic file with Anthropic SDK calls.\n\n## Output Requirement\n\nWhen the user asks you to add, modify, or implement a Claude feature, your code must call Claude through one of:\n\n1. **The official Anthropic SDK** for the project's language (`anthropic`, `@anthropic-ai/sdk`, `com.anthropic.*`, etc.). This is the default whenever a supported SDK exists for the project.\n2. **Raw HTTP** (`curl`, `requests`, `fetch`, `httpx`, etc.) — only when the user explicitly asks for cURL/REST/raw HTTP, the project is a shell/cURL project, or the language has no official SDK.\n\nNever mix the two — don't reach for `requests`/`fetch` in a Python or TypeScript project just because it feels lighter. Never fall back to OpenAI-compatible shims.\n\n**Never guess SDK usage.** Function names, class names, namespaces, method signatures, and import paths must come from explicit documentation — either the `{lang}/` files in this skill or the official SDK repositories or documentation links listed in `shared/live-sources.md`. If the binding you need is not explicitly documented in the skill files, WebFetch the relevant SDK repo from `shared/live-sources.md` before writing code. Do not infer Ruby/Java/Go/PHP/C# APIs from cURL shapes or from another language's SDK.\n\n## Defaults\n\nUnless the user requests otherwise:\n\nFor the Claude model version, please use Claude Opus 4.8, which you can access via the exact model string `claude-opus-4-8`. Please default to using adaptive thinking (`thinking: {type: \"adaptive\"}`) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()` / `.finalMessage()` helper to get the complete response if you don't need to handle individual stream events\n\n---\n\n## Subcommands\n\nIf the User Request at the bottom of this prompt is a bare subcommand string (no prose), search every **Subcommands** table in this document — including any in sections appended below — and follow the matching Action column directly. This lets users invoke specific flows via `/claude-api <subcommand>`. If no table in the document matches, treat the request as normal prose.\n\n| Subcommand | Action |\n|---|---|\n| `migrate` | Migrate existing Claude API code to a newer model. **Read `shared/model-migration.md` immediately** and follow it in order: Step 0 (confirm scope — ask which files/directories before any edit), Step 1 (classify each file), then the per-target breaking-changes section. Do not summarize the guide — execute it. If the user did not name a target model, ask which model to migrate to in the same turn as the scope question. |\n\n---\n\n## Language Detection\n\nBefore reading code examples, determine which language the user is working in:\n\n1. **Look at project files** to infer the language:\n\n   - `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → **Python** — read from `python/`\n   - `*.ts`, `*.tsx`, `package.json`, `tsconfig.json` → **TypeScript** — read from `typescript/`\n   - `*.js`, `*.jsx` (no `.ts` files present) → **TypeScript** — JS uses the same SDK, read from `typescript/`\n   - `*.java`, `pom.xml`, `build.gradle` → **Java** — read from `java/`\n   - `*.kt`, `*.kts`, `build.gradle.kts` → **Java** — Kotlin uses the Java SDK, read from `java/`\n   - `*.scala`, `build.sbt` → **Java** — Scala uses the Java SDK, read from `java/`\n   - `*.go`, `go.mod` → **Go** — read from `go/`\n   - `*.rb`, `Gemfile` → **Ruby** — read from `ruby/`\n   - `*.cs`, `*.csproj` → **C#** — read from `csharp/`\n   - `*.php`, `composer.json` → **PHP** — read from `php/`\n\n2. **If multiple languages detected** (e.g., both Python and TypeScript files):\n\n   - Check which language the user's current file or question relates to\n   - If still ambiguous, ask: \"I detected both Python and TypeScript files. Which language are you using for the Claude API integration?\"\n\n3. **If language can't be inferred** (empty project, no source files, or unsupported language):\n\n   - Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP\n   - If AskUserQuestion is unavailable, default to Python examples and note: \"Showing Python examples. Let me know if you need a different language.\"\n\n4. **If unsupported language detected** (Rust, Swift, C++, Elixir, etc.):\n\n   - Suggest cURL/raw HTTP examples from `curl/` and note that community SDKs may exist\n   - Offer to show Python or TypeScript examples as reference implementations\n\n5. **If user needs cURL/raw HTTP examples**, read from `curl/`.\n\n### Language-Specific Feature Support\n\n| Language   | Tool Runner | Managed Agents | Notes                                 |\n| ---------- | ----------- | -------------- | ------------------------------------- |\n| Python     | Yes (beta)  | Yes (beta)     | Full support — `@beta_tool` decorator |\n| TypeScript | Yes (beta)  | Yes (beta)     | Full support — `betaZodTool` + Zod    |\n| Java       | Yes (beta)  | Yes (beta)     | Beta tool use with annotated classes  |\n| Go         | Yes (beta)  | Yes (beta)     | `BetaToolRunner` in `toolrunner` pkg  |\n| Ruby       | Yes (beta)  | Yes (beta)     | `BaseTool` + `tool_runner` in beta    |\n| C#         | Yes (beta)  | Yes (beta)     | `BetaToolRunner` + raw JSON schema    |\n| PHP        | Yes (beta)  | Yes (beta)     | `BetaRunnableTool` + `toolRunner()`   |\n| cURL       | N/A         | Yes (beta)     | Raw HTTP, no SDK features             |\n\n> **Managed Agents code examples**: dedicated language-specific READMEs are provided for Python, TypeScript, Go, Ruby, PHP, Java, and cURL (`{lang}/managed-agents/README.md`, `curl/managed-agents.md`). Read your language's README plus the language-agnostic `shared/managed-agents-*.md` concept files. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.\n\n---\n\n## Which Surface Should I Use?\n\n> **Start simple.** Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.\n\n| Use Case                                        | Tier            | Recommended Surface       | Why                                                          |\n| ----------------------------------------------- | --------------- | ------------------------- | ------------------------------------------------------------ |\n| Classification, summarization, extraction, Q&A  | Single LLM call | **Claude API**            | One request, one response                                    |\n| Batch processing or embeddings                  | Single LLM call | **Claude API**            | Specialized endpoints                                        |\n| Multi-step pipelines with code-controlled logic | Workflow        | **Claude API + tool use** | You orchestrate the loop                                     |\n| Custom agent with your own tools                | Agent           | **Claude API + tool use** | Maximum flexibility                                          |\n| Server-managed stateful agent with workspace    | Agent           | **Managed Agents**        | Anthropic runs the loop and hosts the tool-execution sandbox |\n| Persisted, versioned agent configs              | Agent           | **Managed Agents**        | Agents are stored objects; sessions pin to a version         |\n| Long-running multi-turn agent with file mounts  | Agent           | **Managed Agents**        | Per-session containers, SSE event stream, Skills + MCP       |\n\n> **Note:** Managed Agents is the right choice when you want Anthropic to run the agent loop *and* host the container where tools execute — file ops, bash, code execution all run in the per-session workspace. If you want to host the compute yourself or run your own custom tool runtime, Claude API + tool use is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).\n\n> **Cloud-provider access.** **Claude Platform on AWS** is Anthropic-operated with same-day API parity — Managed Agents and every feature in this skill work there, **except self-hosted sandboxes** (see `shared/claude-platform-on-aws.md`). **Amazon Bedrock**, **Google Vertex AI**, and **Microsoft Foundry** do **not** support Managed Agents or Anthropic server-side tools; use **Claude API + tool use** on those.\n\n### Decision Tree\n\n```\nWhat does your application need?\n\n0. Which provider?\n   ├── First-party API or Claude Platform on AWS → continue (full surface available).\n   └── Amazon Bedrock, Google Vertex AI, or Microsoft Foundry → Claude API (+ tool use for agents); Managed Agents not available there.\n\n1. Single LLM call (classification, summarization, extraction, Q&A)\n   └── Claude API — one request, one response\n\n2. Do you want Anthropic to run the agent loop and host a per-session\n   container where Claude executes tools (bash, file ops, code)?\n   └── Yes → Managed Agents — server-managed sessions, persisted agent configs,\n       SSE event stream, Skills + MCP, file mounts.\n       Examples: \"stateful coding agent with a workspace per task\",\n                 \"long-running research agent that streams events to a UI\",\n                 \"agent with persisted, versioned config used across many sessions\"\n\n3. Workflow (multi-step, code-orchestrated, with your own tools)\n   └── Claude API with tool use — you control the loop\n\n4. Open-ended agent (model decides its own trajectory, your own tools, you host the compute)\n   └── Claude API agentic loop (maximum flexibility)\n```\n\n### Should I Build an Agent?\n\nBefore choosing the agent tier, check all four criteria:\n\n- **Complexity** — Is the task multi-step and hard to fully specify in advance? (e.g., \"turn this design doc into a PR\" vs. \"extract the title from this PDF\")\n- **Value** — Does the outcome justify higher cost and latency?\n- **Viability** — Is Claude capable at this task type?\n- **Cost of error** — Can errors be caught and recovered from? (tests, review, rollback)\n\nIf the answer is \"no\" to any of these, stay at a simpler tier (single call or workflow).\n\n---\n\n## Architecture\n\nEverything goes through `POST /v1/messages`. Tools and output constraints are features of this single endpoint — not separate APIs.\n\n**User-defined tools** — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.\n\n**Server-side tools** — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in `tools`, Claude runs code automatically). Computer use can be server-hosted or self-hosted.\n\n**Structured outputs** — Constrains the Messages API response format (`output_config.format`) and/or tool parameter validation (`strict: true`). The recommended approach is `client.messages.parse()` which validates responses against your schema automatically. Note: the old `output_format` parameter is deprecated; use `output_config: {format: {...}}` on `messages.create()`.\n\n**Supporting endpoints** — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting (`POST /v1/messages/count_tokens` — see `shared/token-counting.md`), and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.\n\n---\n\n## Current Models (cached: 2026-05-26)\n\n| Model             | Model ID            | Context        | Input $/1M | Output $/1M |\n| ----------------- | ------------------- | -------------- | ---------- | ----------- |\n| Claude Opus 4.8   | `claude-opus-4-8`   | 1M             | $5.00      | $25.00      |\n| Claude Opus 4.7   | `claude-opus-4-7`   | 1M             | $5.00      | $25.00      |\n| Claude Opus 4.6   | `claude-opus-4-6`   | 1M             | $5.00      | $25.00      |\n| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M             | $3.00      | $15.00      |\n| Claude Haiku 4.5  | `claude-haiku-4-5`  | 200K           | $1.00      | $5.00       |\n\n**ALWAYS use `claude-opus-4-8` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says \"use sonnet\" or \"use haiku\". Never downgrade for cost — that's the user's decision, not yours.\n\n**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-6`, never `claude-sonnet-4-6-20251114` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., \"opus 4.5\", \"sonnet 3.7\"), read `shared/models.md` for the exact ID — do not construct one yourself.\n\nA note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.\n\n**Live capability lookup:** The table above is cached. When the user asks \"what's the context window for X\", \"does X support vision/thinking/effort\", or \"which models support Y\", query the Models API (`client.models.retrieve(id)` / `client.models.list()`) — see `shared/models.md` for the field reference and capability-filter examples.\n\n---\n\n## Thinking & Effort (Quick Reference)\n\n**Opus 4.8 / 4.7 — Adaptive thinking only:** Use `thinking: {type: \"adaptive\"}`. `thinking: {type: \"enabled\", budget_tokens: N}` returns a 400 — adaptive is the only on-mode. `{type: \"disabled\"}` and omitting `thinking` both work. Sampling parameters (`temperature`, `top_p`, `top_k`) are also removed and will 400. Opus 4.8 keeps the same request surface as 4.7 (no new breaking changes) — see `shared/model-migration.md` → Migrating to Opus 4.8 for the behavioral re-tuning, and → Migrating to Opus 4.7 for the full breaking-change list when coming from 4.6 or earlier. Note: with `thinking` disabled, Opus 4.8 may write longer reasoning into the visible response — leave adaptive thinking on, or add a final-answer-only instruction (see the migration guide).\n**Opus 4.6 — Adaptive thinking (recommended):** Use `thinking: {type: \"adaptive\"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and should not be used for new code. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). **When the user asks for \"extended thinking\", a \"thinking budget\", or `budget_tokens`: always use Opus 4.8, 4.7, or 4.6 with `thinking: {type: \"adaptive\"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` for new 4.6/4.7/4.8 code and do NOT switch to an older model.** *Gradual-migration carve-out:* `budget_tokens` is still functional on Opus 4.6 and Sonnet 4.6 as a transitional escape hatch — if you're migrating existing code and need a hard token ceiling before you've tuned `effort`, see `shared/model-migration.md` → Transitional escape hatch. Note: this carve-out does **not** apply to Opus 4.7 or 4.8 — `budget_tokens` is fully removed there.\n**Effort parameter (GA, no beta header):** Controls thinking depth and overall token spend via `output_config: {effort: \"low\"|\"medium\"|\"high\"|\"max\"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Opus 4.7 added `\"xhigh\"` (between `high` and `max`) — the best setting for most coding and agentic use cases on Opus 4.7/4.8, and the default in Claude Code; use a minimum of `high` for most intelligence-sensitive work. Works on Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. On Opus 4.7 and 4.8, effort matters more than on any prior Opus — re-tune it when migrating, and run long-horizon/agentic tasks at `high`/`xhigh` with the full task spec given up front. Combine with adaptive thinking for the best cost-quality tradeoffs. Lower effort means fewer and more-consolidated tool calls, less preamble, and terser confirmations — `high` is often the sweet spot balancing quality and token efficiency; use `max` when correctness matters more than cost; use `low` for subagents or simple tasks.\n\n**Opus 4.8 / 4.7 — thinking content omitted by default:** `thinking` blocks still stream but their text is empty unless you opt in with `thinking: {type: \"adaptive\", display: \"summarized\"}` (default is `\"omitted\"`). Silent change — no error. If you stream reasoning to users, the default looks like a long pause before output; set `\"summarized\"` to restore visible progress.\n\n**Task Budgets (beta, Opus 4.7 / 4.8):** `output_config: {task_budget: {type: \"tokens\", total: N}}` tells the model how many tokens it has for a full agentic loop — it sees a running countdown and self-moderates (minimum 20,000; beta header `task-budgets-2026-03-13`). Distinct from `max_tokens`, which is an enforced per-response ceiling the model is not aware of. See `shared/model-migration.md` → Task Budgets.\n\n**Sonnet 4.6:** Supports adaptive thinking (`thinking: {type: \"adaptive\"}`). `budget_tokens` is deprecated on Sonnet 4.6 — use adaptive thinking instead.\n\n**Older models (only if explicitly requested):** If the user specifically asks for Sonnet 4.5 or another older model, use `thinking: {type: \"enabled\", budget_tokens: N}`. `budget_tokens` must be less than `max_tokens` (minimum 1024). Never choose an older model just because the user mentions `budget_tokens` — use Opus 4.8 with adaptive thinking instead.\n\n---\n\n## Compaction (Quick Reference)\n\n**Beta, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** For long-running conversations that may exceed the 1M context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header `compact-2026-01-12`.\n\n**Critical:** Append `response.content` (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.\n\nSee `{lang}/claude-api/README.md` (Compaction section) for code examples. Full docs via WebFetch in `shared/live-sources.md`.\n\n---\n\n## Prompt Caching (Quick Reference)\n\n**Prefix match.** Any byte change anywhere in the prefix invalidates everything after it. Render order is `tools` → `system` → `messages`. Keep stable content first (frozen system prompt, deterministic tool list), put volatile content (timestamps, per-request IDs, varying questions) after the last `cache_control` breakpoint.\n\n**Mid-conversation operator instructions** (beta header `mid-conversation-system-2026-04-07`, on supporting models): append `{\"role\": \"system\", ...}` to `messages[]` instead of editing top-level `system`. Preserves the cached history prefix and is the prompt-injection-safe operator channel. See `shared/prompt-caching.md` § Mid-conversation system messages.\n\n**Top-level auto-caching** (`cache_control: {type: \"ephemeral\"}` on `messages.create()`) is the simplest option when you don't need fine-grained placement. Max 4 breakpoints per request. Minimum cacheable prefix is ~1024 tokens — shorter prefixes silently won't cache.\n\n**Verify with `usage.cache_read_input_tokens`** — if it's zero across repeated requests, a silent invalidator is at work (`datetime.now()` in system prompt, unsorted JSON, varying tool set).\n\nFor placement patterns, architectural guidance, and the silent-invalidator audit checklist: read `shared/prompt-caching.md`. Language-specific syntax: `{lang}/claude-api/README.md` (Prompt Caching section).\n\n---\n\n## Managed Agents (Beta)\n\n**Managed Agents** is a third surface: server-managed stateful agents with Anthropic-hosted tool execution. You create a persisted, versioned Agent config (`POST /v1/agents`), then start Sessions that reference it. Each session provisions a container as the agent's workspace — bash, file ops, and code execution run there; the agent loop itself runs on Anthropic's orchestration layer and acts on the container via tools. The session streams events; you send messages and tool results back.\n\n**Managed Agents is available on the first-party API and Claude Platform on AWS.** It is **not** available on Amazon Bedrock, Google Vertex AI, or Microsoft Foundry — for agents there, use Claude API + tool use.\n\n**Mandatory flow:** Agent (once) → Session (every run). `model`/`system`/`tools` live on the agent, never the session. See `shared/managed-agents-overview.md` for the full reading guide, beta headers, and pitfalls.\n\n**Beta headers:** `managed-agents-2026-04-01` — the SDK sets this automatically for all `client.beta.{agents,environments,sessions,vaults,memory_stores}.*` calls. Skills API uses `skills-2025-10-02` and Files API uses `files-api-2025-04-14`, but you don't need to explicitly pass those in for endpoints other than `/v1/skills` and `/v1/files`.\n\n**Subcommands** — invoke directly with `/claude-api <subcommand>`:\n\n| Subcommand | Action |\n|---|---|\n| `managed-agents-onboard` | Walk the user through setting up a Managed Agent from scratch. **Read `shared/managed-agents-onboarding.md` immediately** and follow its interview script: mental model → know-or-explore branch → template config → session setup → **pre-flight viability check** → emit code. The viability check (reconcile the stated job against configured tools/credentials/data) catches under-resourced setups — missing a tool, credential, or data access — before the agent burns budget. Do not summarize — run the interview. |\n\n**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, outcomes, multiagent, webhooks, memory, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.\n\n**When the user wants to set up a Managed Agent from scratch** (e.g. \"how do I get started\", \"walk me through creating one\", \"set up a new agent\"): read `shared/managed-agents-onboarding.md` and run its interview — same flow as the `managed-agents-onboard` subcommand.\n\n**When the user asks \"how do I write the client code for X\":** reach for `shared/managed-agents-client-patterns.md` — covers lossless stream reconnect, `processed_at` queued/processed gate, interrupt, `tool_confirmation` round-trip, the correct idle/terminated break gate, post-idle status race, stream-first ordering, file-mount gotchas, keeping credentials host-side via custom tools, etc.\n\n---\n\n## Reference Documentation\n\nThe relevant documentation for your detected language is included below in `<doc>` tags. Each tag has a `path` attribute showing its original file path. Use this to find the right section:\n\n### Quick Task Reference\n\n**Single text classification/summarization/extraction/Q&A:**\n→ Refer to `typescript/claude-api/README.md`\n\n**Chat UI or real-time response display:**\n→ Refer to `typescript/claude-api/README.md` + `typescript/claude-api/streaming.md`\n\n**Long-running conversations (may exceed context window):**\n→ Refer to `typescript/claude-api/README.md` — see Compaction section\n\n**Migrating to a newer model or replacing a retired model:**\n→ Refer to `shared/model-migration.md`\n\n**Prompt caching / optimize caching / \"why is my cache hit rate low\":**\n→ Refer to `shared/prompt-caching.md` + `typescript/claude-api/README.md` (Prompt Caching section)\n\n**Count tokens in a file / prompt / diff (\"how many tokens is X\"):**\n→ Refer to `shared/token-counting.md` — use `messages.count_tokens`, never `tiktoken`\n\n**Function calling / tool use / agents:**\n→ Refer to `typescript/claude-api/README.md` + `shared/tool-use-concepts.md` + `typescript/claude-api/tool-use.md`\n\n**Batch processing (non-latency-sensitive):**\n→ Refer to `typescript/claude-api/README.md` + `typescript/claude-api/batches.md`\n\n**File uploads across multiple requests:**\n→ Refer to `typescript/claude-api/README.md` + `typescript/claude-api/files-api.md`\n\n**Agent design (tool surface, context management, caching strategy):**\n→ Refer to `shared/agent-design.md`\n\n**Anthropic CLI (`ant`) — terminal access, version-controlled agent/environment YAML, scripting:**\n→ Refer to `shared/anthropic-cli.md`\n\n**Managed Agents (server-managed stateful agents):**\n→ Refer to `shared/managed-agents-overview.md` and the rest of the `shared/managed-agents-*.md` files. For Python, TypeScript, and cURL, language-specific code examples live in `typescript/managed-agents/README.md`. Java, Go, Ruby, and PHP also support the API — translate the calls using your SDK's patterns from `typescript/claude-api.md`. C# does not currently have Managed Agents support; use raw HTTP from `curl/managed-agents.md` as a reference.\n\n**Error handling:**\n→ Refer to `shared/error-codes.md`\n\n**Latest docs via WebFetch:**\n→ Refer to `shared/live-sources.md` for URLs\n\n---\n\n## Included Documentation\n\n<doc path=\"shared/agent-design.md\">\n# Agent Design Patterns\n\nThis file covers decision heuristics for building agents on the Claude API: which primitives to reach for, how to design your tool surface, and how to manage context and cost over long runs. For per-tool mechanics and code examples, see `tool-use-concepts.md` and the language-specific folders.\n\n---\n\n## Model Parameters\n\n| Parameter | When to use it | What to expect |\n| --- | --- | --- |\n| **Adaptive thinking** (`thinking: {type: \"adaptive\"}`) | When you want Claude to control when and how much to think. | Claude determines thinking depth per request and automatically interleaves thinking between tool calls. No token budget to tune. |\n| **Effort** (`output_config: {effort: ...}`) | When adjusting the tradeoff between thoroughness and token efficiency. | Lower effort → fewer and more-consolidated tool calls, less preamble, terser confirmations. `medium` is often a favorable balance. Use `max` when correctness matters more than cost. |\n\nSee `SKILL.md` §Thinking & Effort for model support and parameter details.\n\n---\n\n## Designing Your Tool Surface\n\n### Bash vs. dedicated tools\n\nClaude doesn't know your application's security boundary, approval policy, or UX surface. Claude emits tool calls; your harness handles them. The shape of those tool calls determines what the harness can do.\n\nA **bash tool** gives Claude broad programmatic leverage — it can perform almost any action. But it gives the harness only an opaque command string, the same shape for every action. Promoting an action to a **dedicated tool** gives the harness an action-specific hook with typed arguments it can intercept, gate, render, or audit.\n\n**When to promote an action to a dedicated tool:**\n\n- **Security boundary.** Actions that require gating are natural candidates. Reversibility is a useful criterion: hard-to-reverse actions (external API calls, sending messages, deleting data) can be gated behind user confirmation. A `send_email` tool is easy to gate; `bash -c \"curl -X POST ...\"` is not.\n- **Staleness checks.** A dedicated `edit` tool can reject writes if the file changed since Claude last read it. Bash can't enforce that invariant.\n- **Rendering.** Some actions benefit from custom UI. Claude Code promotes question-asking to a tool so it can render as a modal, present options, and block the agent loop until answered.\n- **Scheduling.** Read-only tools like `glob` and `grep` can be marked parallel-safe. When the same actions run through bash, the harness can't tell a parallel-safe `grep` from a parallel-unsafe `git push`, so it must serialize.\n\n**Rule of thumb:** Start with bash for breadth. Promote to dedicated tools when you need to gate, render, audit, or parallelize the action.\n\n---\n\n## Anthropic-Provided Tools\n\n| Tool | Side | When to use it | What to expect |\n| --- | --- | --- | --- |\n| **Bash** | Client | Claude needs to execute shell commands. | Claude emits commands; your harness executes them. Reference implementation provided. |\n| **Text editor** | Client | Claude needs to read or edit files. | Claude views, creates, and edits files via your implementation. Reference implementation provided. |\n| **Computer use** | Client or Server | Claude needs to interact with GUIs, web apps, or visual interfaces. | Claude takes screenshots and issues mouse/keyboard commands. Can be self-hosted (you run the environment) or Anthropic-hosted. |\n| **Code execution** | Server | Claude needs to run code in a sandbox you don't want to manage. | Anthropic-hosted container with built-in file and bash sub-tools. No client-side execution. |\n| **Web search / fetch** | Server | Claude needs information past its training cutoff (news, current events, recent docs) or the content of a specific URL. | Claude issues a query or URL; Anthropic executes it and returns results with citations. |\n| **Memory** | Client | Claude needs to save context across sessions. | Claude reads/writes a `/memories` directory. You implement the storage backend. |\n\n**Client-side** tools are defined by Anthropic (name, schema, Claude's usage pattern) but executed by your harness. Anthropic provides reference implementations. **Server-side** tools run entirely on Anthropic infrastructure — declare them in `tools` and Claude handles the rest.\n\n---\n\n## Composing Tool Calls: Programmatic Tool Calling\n\nWith standard tool use, each tool call is a round trip: Claude calls the tool, the result lands in Claude's context, Claude reasons about it, then calls the next tool. Three sequential actions (read profile → look up orders → check inventory) means three round trips. Each adds latency and tokens, and most of the intermediate data is never needed again.\n\n**Programmatic tool calling (PTC)** lets Claude compose those calls into a script instead. The script runs in the code execution container. When the script calls a tool, the container pauses, the call is executed (client-side or server-side), and the result returns to the running code — not to Claude's context. The script processes it with normal control flow (loops, filters, branches). Only the script's final output returns to Claude.\n\n| When to use it | What to expect |\n| --- | --- |\n| Many sequential tool calls, or large intermediate results you want filtered before they hit the context window. | Claude writes code that invokes tools as functions. Runs in the code execution container. Token cost scales with final output, not intermediate results. |\n\n---\n\n## Scaling the Tool and Instruction Set\n\n| Feature | When to use it | What to expect |\n| --- | --- | --- |\n| **Tool search** | Many tools available, but only a few relevant per request. Don't want all schemas in context upfront. | Claude searches the tool set and loads only relevant schemas. Tool definitions are appended, not swapped — preserves cache (see Caching below). |\n| **Skills** | Task-specific instructions Claude should load only when relevant. | Each skill is a folder with a `SKILL.md`. The skill's description sits in context by default; Claude reads the full file when the task calls for it. |\n\nBoth patterns keep the fixed context small and load detail on demand.\n\n---\n\n## Long-Running Agents: Managing Context\n\n| Pattern | When to use it | What to expect |\n| --- | --- | --- |\n| **Context editing** | Context grows stale over many turns (old tool results, completed thinking). | Tool results and thinking blocks are cleared based on configurable thresholds. Keeps the transcript lean without summarizing. |\n| **Compaction** | Conversation likely to reach or exceed the context window limit. | Earlier context is summarized into a compaction block server-side. See `SKILL.md` §Compaction for the critical `response.content` handling. |\n| **Memory** | State must persist across sessions (not just within one conversation). | Claude reads/writes files in a memory directory. Survives process restarts. |\n\n**Choosing between them:** Context editing and compaction operate within a session — editing prunes stale turns, compaction summarizes when you're near the limit. Memory is for cross-session persistence. Many long-running agents use all three.\n\n---\n\n## Caching for Agents\n\n**Read `prompt-caching.md` first.** It covers the prefix-match invariant, breakpoint placement, the silent-invalidator audit, and why changing tools or models mid-session breaks the cache. This section covers only the agent-specific workarounds for those constraints.\n\n| Constraint (from `prompt-caching.md`) | Agent-specific workaround |\n| --- | --- |\n| Editing the system prompt mid-session invalidates the cache. | Append a `{\"role\": \"system\", ...}` message to `messages[]` instead (beta, on supporting models — see `prompt-caching.md` § Mid-conversation system messages). The cached prefix stays intact, and the model treats it as an operator-authority instruction rather than user text. On models that don't support it, fall back to a `<system-reminder>` text block in the user turn. |\n| Switching models mid-session invalidates the cache. | Spawn a **subagent** with the cheaper model for the sub-task; keep the main loop on one model. Claude Code's Explore subagents use Haiku this way. |\n| Adding/removing tools mid-session invalidates the cache. | Use **tool search** for dynamic discovery — it appends tool schemas rather than swapping them, so the existing prefix is preserved. |\n\nFor multi-turn breakpoint placement, use top-level auto-caching — see `prompt-caching.md` §Placement patterns.\n\n---\n\nFor live documentation on any of these features, see `live-sources.md`.\n</doc>\n\n<doc path=\"shared/anthropic-cli.md\">\n# Anthropic CLI (`ant`)\n\nThe `ant` CLI exposes every Claude API resource as a shell subcommand. Compared to `curl`: request bodies are built from typed flags or piped YAML instead of hand-written JSON, `@path` inlines file contents into any string field, `--transform` extracts fields with a GJSON path (no `jq`), list endpoints auto-paginate (cap total results with `--max-items N`; `--limit` only sets the server page size), and the `beta:` prefix auto-sets the right `anthropic-beta` header.\n\n## When to use the CLI vs the SDK\n\n**CLI for the control plane, SDK for the data plane.** Agents and environments are relatively static resources you define, configure, and debug with `ant` — check the YAML into your repo, apply from CI, inspect from a terminal. Sessions are dynamic and driven by your application through the SDK — create per task, stream events, react to tool calls, integrate into your product. Both hit the same API; the split is about where the call lives, not what's possible.\n\n| | Control plane → `ant` | Data plane → SDK |\n|---|---|---|\n| Resources | agents, environments, skills, vaults, files | sessions, events |\n| Cadence | Once per deploy / ad-hoc | Every task / every turn |\n| Lives in | `*.yaml` in your repo + CI + terminal | Application code |\n| Typical calls | `create < agent.yaml`, `update --version N`, `list`, `retrieve`, `archive`, `--debug` | `sessions.create()`, `events.stream()`, `events.send()` |\n\n## Install and auth\n\n```sh\n# macOS\nbrew install anthropics/tap/ant\nxattr -d com.apple.quarantine \"$(brew --prefix)/bin/ant\"\n\n# Linux / WSL — pick the release from github.com/anthropics/anthropic-cli/releases\ncurl -fsSL \"https://github.com/anthropics/anthropic-cli/releases/download/v${VERSION}/ant_${VERSION}_$(uname -s | tr A-Z a-z)_$(uname -m | sed -e s/x86_64/amd64/ -e s/aarch64/arm64/).tar.gz\" \\\n  | sudo tar -xz -C /usr/local/bin ant\n\n# Or from source (Go 1.22+)\ngo install github.com/anthropics/anthropic-cli/cmd/ant@latest\n```\n\n**Auth** — the CLI resolves credentials the same way the SDKs do (first match wins): explicit flags, then `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN` env vars, then `ANTHROPIC_PROFILE`, then the active profile from `ant auth login`. Override the host with `ANTHROPIC_BASE_URL` or `--base-url`.\n\n- **API key**: set `ANTHROPIC_API_KEY` in the environment.\n- **OAuth profile** (no static key to manage): `ant auth login` opens a browser, exchanges for a short-lived token, and stores a profile under `~/.config/anthropic/`. Subsequent `ant` (and SDK) calls pick it up automatically. `ant auth status` shows the active profile; `ant auth logout` clears it.\n\nTo hand the active credential to a subprocess or raw-HTTP script:\n\n```sh\n# Bare access token — for curl's Authorization header\ncurl https://api.anthropic.com/v1/messages \\\n  -H \"Authorization: Bearer $(ant auth print-credentials --access-token)\" \\\n  -H \"anthropic-version: 2023-06-01\" \\\n  -H \"content-type: application/json\" \\\n  -d '{\"model\": \"claude-opus-4-8\", \"max_tokens\": 1024, \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'\n\n# .env format — sets ANTHROPIC_AUTH_TOKEN (and ANTHROPIC_BASE_URL if the profile has one).\n# Output is bare KEY=value (no `export`), so use `set -a` to auto-export for child processes:\nset -a; eval \"$(ant auth print-credentials --env)\"; set +a\npython my_script.py   # SDK picks up ANTHROPIC_AUTH_TOKEN\n```\n\nOAuth tokens go on `Authorization: Bearer` (not `x-api-key:`). The token is short-lived and not auto-refreshed when passed via env var, so re-run `print-credentials` before it expires for long-running scripts. If both `ANTHROPIC_API_KEY` and `ANTHROPIC_AUTH_TOKEN` are set, the SDKs send both and the API rejects the request — unset `ANTHROPIC_API_KEY` before `eval`ing the `--env` output.\n\n## Command structure\n\n```\nant <resource>[:<subresource>] <action> [flags]\n```\n\nBeta resources (agents, sessions, environments, deployments, skills, vaults, memory stores) live under `beta:` — the CLI auto-sends the right `anthropic-beta` header, so don't pass it yourself unless overriding with `--beta <header>`. For self-hosted environments, `ant beta:worker poll/run` and `ant beta:environments:work stats/stop` drive and monitor the work queue — see `shared/managed-agents-self-hosted-sandboxes.md`.\n\n```sh\nant models list\nant messages create --model claude-opus-4-8 --max-tokens 1024 --message '{role: user, content: \"Hello\"}'\nant beta:agents retrieve --agent-id agent_01...\nant beta:sessions:events list --session-id session_01...\n```\n\n`ant --help` lists resources; append `--help` to any subcommand for its flags.\n\n## Global flags\n\n| Flag | Purpose |\n| --- | --- |\n| `--format` | `auto` (default: pretty if TTY, compact if piped), `json`, `jsonl`, `yaml`, `pretty`, `raw`, `explore` (interactive TUI) |\n| `--transform` | GJSON path applied to the response (per-item on list endpoints). Not applied when `--format raw`. |\n| `-r`, `--raw-output` | If the transformed result is a string, print it without quotes (jq semantics). Pair with `--transform` for scalar capture. |\n| `--max-items` | Cap total results returned from auto-paginating list endpoints (distinct from `--limit`, which is the server page size). |\n| `--format-error` / `--transform-error` | Same as `--format`/`--transform`, applied to error responses. `-r` does not apply to the error path — use `--format-error yaml` for unquoted error scalars. |\n| `--base-url` | Override API host |\n| `--debug` | Print full HTTP request + response to stderr (API key redacted) |\n\n## Output — `--transform` + `--format`\n\n`--transform` takes a [GJSON path](https://github.com/tidwall/gjson/blob/master/SYNTAX.md). On list endpoints it runs **per item**, not on the envelope.\n\n```sh\nant beta:agents list --transform '{id,name,model}' --format jsonl\n```\n\n**Extract a scalar for shell use:** pair `--transform` with `-r` (`--raw-output` — prints strings unquoted, jq-style):\n\n```sh\nAGENT_ID=$(ant beta:agents create --name \"My Agent\" --model '{id: claude-sonnet-4-6}' \\\n  --transform id -r)\n```\n\n## Input — flags, stdin, `@file`\n\n**Flags** — scalar fields map directly. Structured fields accept relaxed-YAML syntax (unquoted keys) or strict JSON. Repeatable flags build arrays (each `--tool`, `--event`, `--message` appends one element):\n\n```sh\nant beta:agents create \\\n  --name \"Research Agent\" \\\n  --model '{id: claude-opus-4-8}' \\\n  --tool '{type: agent_toolset_20260401}' \\\n  --tool '{type: custom, name: search_docs, input_schema: {type: object, properties: {query: {type: string}}}}'\n```\n\n**Stdin** — pipe a full JSON or YAML body. Merged with flags; flags win on conflict (for array fields, any flag **replaces** the stdin array entirely — it does not append). Quote the heredoc delimiter (`<<'YAML'`) to disable shell expansion inside the body:\n\n```sh\nant beta:agents create <<'YAML'\nname: Research Agent\nmodel: claude-opus-4-8\nsystem: |\n  You are a research assistant. Cite sources for every claim.\ntools:\n  - type: agent_toolset_20260401\nYAML\n```\n\n**`@file` references** — inline a file's contents into any string-valued field. Inside structured flag values, quote the path. Binary files are auto-base64'd; force with `@file://` (text) or `@data://` (base64). Escape a literal leading `@` as `\\@`.\n\n```sh\nant beta:agents create --name \"Researcher\" --model '{id: claude-sonnet-4-6}' --system @./prompts/researcher.txt\n\nant messages create --model claude-opus-4-8 --max-tokens 1024 \\\n  --message '{role: user, content: [\n    {type: document, source: {type: base64, media_type: application/pdf, data: \"@./scan.pdf\"}},\n    {type: text, text: \"Extract the text from this scanned document.\"}\n  ]}' \\\n  --transform 'content.0.text' -r\n```\n\nFlags that natively take a file path (e.g. `--file` on `beta:files upload`) accept a bare path without `@`.\n\n## Version-controlled Managed Agents resources\n\nThis is the recommended flow for defining agents and environments — check the YAML into your repo and sync via `create` (first time) / `update` (thereafter). See `shared/managed-agents-core.md` for the field reference.\n\n```yaml\n# summarizer.agent.yaml\nname: Summarizer\nmodel: claude-sonnet-4-6\nsystem: |\n  You are a helpful assistant that writes concise summaries.\ntools:\n  - type: agent_toolset_20260401\n```\n\n```sh\n# Create (once) — capture the ID\nAGENT_ID=$(ant beta:agents create < summarizer.agent.yaml --transform id -r)\n\n# Update (CI) — needs ID + current version (optimistic lock)\nant beta:agents update --agent-id \"$AGENT_ID\" --version 1 < summarizer.agent.yaml\n```\n\nSame pattern for environments (`ant beta:environments create|update < env.yaml`), then start a session with both IDs:\n\n```sh\nant beta:sessions create --agent \"$AGENT_ID\" --environment-id \"$ENV_ID\" --title \"Task\"\nant beta:sessions:events send --session-id \"$SID\" \\\n  --event '{type: user.message, content: [{type: text, text: \"Summarize X\"}]}'\nant beta:sessions:events list --session-id \"$SID\" --transform 'content.0.text' -r\nant beta:sessions:events stream --session-id \"$SID\"   # live event stream\n```\n\n### Interactive session loop (stream-before-send)\n\n`ant beta:sessions:events stream` only delivers events emitted *after* the stream opens — so open it **before** sending the kickoff to avoid missing early events. Use process substitution to hold the stream on a file descriptor, send, then read:\n\n```sh\nexec {stream}< <(ant beta:sessions:events stream --session-id \"$SID\" \\\n  --transform '{type,text:content.#(type==\"text\").text,err:error.message}' --format yaml)\n\nant beta:sessions:events send --session-id \"$SID\" > /dev/null <<'YAML'\nevents:\n  - type: user.message\n    content:\n      - type: text\n        text: Summarize the repo README\nYAML\n\ntype=\nwhile IFS= read -r -u \"$stream\" line; do\n  case \"$line\" in\n    type:\\ session.status_idle) break ;;\n    type:\\ session.error)\n      IFS= read -r -u \"$stream\" next || next=\n      case \"$next\" in err:\\ *) msg=${next#err: } ;; *) msg=unknown ;; esac\n      printf '\\n[Error: %s]\\n' \"$msg\"; break ;;\n    type:\\ *) type=${line#type: } ;;\n    text:*)\n      [[ $type == agent.message ]] || continue\n      val=${line#text: }\n      case \"$val\" in '|-'|'|') ;; *) printf '%s' \"$val\" ;; esac ;;\n    \\ \\ *)\n      if [[ $type == agent.message ]]; then printf '%s\\n' \"${line#  }\"; fi ;;\n  esac\ndone\nexec {stream}<&-\n```\n\nThis works for interactive exploration and demos. For application code that needs to react to `agent.tool_use` / `agent.custom_tool_use` events, reconnect after drops, or dedup against `events.list`, use the SDK — see `shared/managed-agents-client-patterns.md`.\n\n## Scripting patterns\n\n`--transform id -r` on a list endpoint emits one bare ID per line — compose with `xargs`, or use `--max-items N` to bound the result set without piping through `head`:\n\n```sh\nFIRST=$(ant beta:agents list --transform id -r --max-items 1)\nant beta:agents:versions list --agent-id \"$FIRST\" --transform '{version,created_at}' --format jsonl\n```\n\nError shaping mirrors the success path (note: `-r` does not apply to error output — use `--format-error yaml` for an unquoted scalar here):\n\n```sh\nant beta:agents retrieve --agent-id bogus --transform-error error.message --format-error yaml 2>&1\n```\n\nShell completion: `ant @completion {zsh|bash|fish|powershell}`.\n\nFor the full, always-current reference (including per-endpoint flags), WebFetch the **Anthropic CLI** URL in `shared/live-sources.md`.\n</doc>\n\n<doc path=\"shared/claude-platform-on-aws.md\">\n# Claude Platform on AWS\n\n**Anthropic-operated** access to the Claude Developer Platform through AWS infrastructure — SigV4 authentication, AWS IAM access control, and AWS Marketplace billing. Because Anthropic operates it, **the API surface matches first-party with same-day parity**: Managed Agents, server-side tools, batches, Files, and every feature in this skill work the same way (**except self-hosted sandboxes** — `config:{type:\"self_hosted\"}` is not available here; use `cloud`). Model IDs are the bare first-party strings (`claude-opus-4-8`, `claude-sonnet-4-6`) — **no provider prefix**.\n\n> **Not the same as Amazon Bedrock.** Bedrock is partner-operated (AWS runs the service; release schedules vary, feature subset, `anthropic.`-prefixed model IDs). Claude Platform on AWS and Bedrock coexist; pick by whether you need AWS-native IAM/billing with full Anthropic API parity (this page) vs. Bedrock's own ecosystem.\n\n---\n\n## Client & install\n\n| Language | Install | Client |\n|---|---|---|\n| Python | `pip install -U \"anthropic[aws]\"` | `from anthropic import AnthropicAWS` → `AnthropicAWS()` |\n| TypeScript | `npm install @anthropic-ai/aws-sdk` | `import AnthropicAws from \"@anthropic-ai/aws-sdk\"` → `new AnthropicAws()` |\n| Go | `go get github.com/anthropics/anthropic-sdk-go` | `import anthropicaws \"github.com/anthropics/anthropic-sdk-go/aws\"` → `anthropicaws.NewClient(ctx, anthropicaws.ClientConfig{})` |\n| C# | `dotnet add package Anthropic.Aws` | `new AnthropicAwsClient()` |\n| Java | See SDK repo in `shared/live-sources.md` | See SDK repo in `shared/live-sources.md` |\n| Ruby | `gem install anthropic aws-sdk-core` | See SDK repo in `shared/live-sources.md` |\n| PHP | `composer require anthropic-ai/sdk aws/aws-sdk-php` | See SDK repo in `shared/live-sources.md` |\n\nAfter construction, **use the client exactly as you would `Anthropic()`** — `client.messages.create(...)`, `client.beta.sessions.*`, etc., with bare model IDs.\n\n```python\nfrom anthropic import AnthropicAWS\n\nclient = AnthropicAWS()  # region + workspace_id from env; see below\nclient.messages.create(\n    model=\"claude-opus-4-8\",\n    max_tokens=1024,\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n)\n```\n\n---\n\n## Required configuration\n\nTwo values must be available (constructor args or environment) — **there is no default fallback** for either:\n\n| Value | Env var | Notes |\n|---|---|---|\n| AWS region | `AWS_REGION` | Required. Unlike `AnthropicBedrock`, there is no `us-east-1` fallback. |\n| Workspace ID | `ANTHROPIC_AWS_WORKSPACE_ID` | Required. Routes requests to your Claude workspace. |\n\nEndpoint pattern: `https://aws-external-anthropic.{region}.api.aws/v1/...`. Requests are SigV4-signed with service name `aws-external-anthropic`.\n\n## Authentication\n\nThe client resolves AWS credentials via the standard precedence chain: explicit constructor args → environment (`AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`/`AWS_SESSION_TOKEN`) → shared profile → assumed role / instance metadata.\n\n**Short-term API keys** are also supported for cases where SigV4 isn't practical (e.g., browser, simple scripts). Mint one with the per-language token-generator package; pass it as `api_key` on the client. Lifetime is the **lesser of** the requested duration, the underlying credential's expiry, and **12 hours**. For package names and IAM details, WebFetch the Claude Platform on AWS page in `shared/live-sources.md`.\n\n---\n\n## What to tell users\n\n- Treat it as first-party: every section of this skill applies unchanged. Do **not** apply Bedrock's feature-availability mask.\n- Model IDs are bare (`claude-opus-4-8`). Do **not** add an `anthropic.` prefix.\n- A missing region or `workspace_id` throws at client-construction time (no request is sent). A **403** means the request reached the server — check for a **wrong** `workspace_id` or a missing IAM action on the principal. See the IAM actions reference in `shared/live-sources.md`.\n</doc>\n\n<doc path=\"shared/error-codes.md\">\n# HTTP Error Codes Reference\n\nThis file documents HTTP error codes returned by the Claude API, their common causes, and how to handle them. For language-specific error handling examples, see the `python/` or `typescript/` folders.\n\n## Error Code Summary\n\n| Code | Error Type              | Retryable | Common Cause                         |\n| ---- | ----------------------- | --------- | ------------------------------------ |\n| 400  | `invalid_request_error` | No        | Invalid request format or parameters |\n| 401  | `authentication_error`  | No        | Invalid or missing API key           |\n| 403  | `permission_error`      | No        | API key lacks permission             |\n| 404  | `not_found_error`       | No        | Invalid endpoint or model ID         |\n| 413  | `request_too_large`     | No        | Request exceeds size limits          |\n| 429  | `rate_limit_error`      | Yes       | Too many requests                    |\n| 500  | `api_error`             | Yes       | Anthropic service issue              |\n| 529  | `overloaded_error`      | Yes       | API is temporarily overloaded        |\n\n## Detailed Error Information\n\n### 400 Bad Request\n\n**Causes:**\n\n- Malformed JSON in request body\n- Missing required parameters (`model`, `max_tokens`, `messages`)\n- Invalid parameter types (e.g., string where integer expected)\n- Empty messages array\n- Messages not alternating user/assistant\n\n**Example error:**\n\n```json\n{\n  \"type\": \"error\",\n  \"error\": {\n    \"type\": \"invalid_request_error\",\n    \"message\": \"messages: roles must alternate between \\\"user\\\" and \\\"assistant\\\"\"\n  },\n  \"request_id\": \"req_011CSHoEeqs5C35K2UUqR7Fy\"\n}\n```\n\n**Fix:** Validate request structure before sending. Check that:\n\n- `model` is a valid model ID\n- `max_tokens` is a positive integer\n- `messages` array is non-empty and alternates correctly\n\n---\n\n### 401 Unauthorized\n\n**Causes:**\n\n- Missing `x-api-key` header or `Authorization` header\n- Invalid API key format\n- Revoked or deleted API key\n- OAuth bearer token sent via `x-api-key` instead of `Authorization: Bearer`\n- Both `ANTHROPIC_API_KEY` and `ANTHROPIC_AUTH_TOKEN` set — the SDK sends both headers and the API rejects the request\n\n**Fix:** Set `ANTHROPIC_API_KEY`, or run `ant auth login` and leave the client constructor empty. For raw HTTP with an OAuth token, use `Authorization: Bearer <token>` (not `x-api-key:`).\n\n---\n\n### 403 Forbidden\n\n**Causes:**\n\n- API key doesn't have access to the requested model\n- Organization-level restrictions\n- Attempting to access beta features without beta access\n\n**Fix:** Check your API key permissions in the Console. You may need a different API key or to request access to specific features.\n\n---\n\n### 404 Not Found\n\n**Causes:**\n\n- Typo in model ID (e.g., `claude-sonnet-4.6` instead of `claude-sonnet-4-6`)\n- Using deprecated model ID\n- Invalid API endpoint\n\n**Fix:** Use exact model IDs from the models documentation. You can use aliases (e.g., `claude-opus-4-8`).\n\n---\n\n### 413 Request Too Large\n\n**Causes:**\n\n- Request body exceeds maximum size\n- Too many tokens in input\n- Image data too large\n\n**Fix:** Reduce input size — truncate conversation history, compress/resize images, or split large documents into chunks.\n\n---\n\n### 400 Validation Errors\n\nSome 400 errors are specifically related to parameter validation:\n\n- `max_tokens` exceeds model's limit\n- Invalid `temperature` value (must be 0.0-1.0)\n- `budget_tokens` >= `max_tokens` in extended thinking\n- Invalid tool definition schema\n\n**Model-specific 400s on Opus 4.8 / 4.7:**\n\n- `temperature`, `top_p`, `top_k` are removed — sending any of them returns 400. Delete the parameter; see `shared/model-migration.md` → Per-SDK Syntax Reference.\n- `thinking: {type: \"enabled\", budget_tokens: N}` is removed — sending it returns 400. Use `thinking: {type: \"adaptive\"}` instead.\n\n**Common mistake with extended thinking on older models (Opus 4.6 and earlier):**\n\n```\n# Wrong: budget_tokens must be < max_tokens\nthinking: budget_tokens=10000, max_tokens=1000  → Error!\n\n# Correct\nthinking: budget_tokens=10000, max_tokens=16000\n```\n\n---\n\n### 429 Rate Limited\n\n**Causes:**\n\n- Exceeded requests per minute (RPM)\n- Exceeded tokens per minute (TPM)\n- Exceeded tokens per day (TPD)\n\n**Headers to check:**\n\n- `retry-after`: Seconds to wait before retrying\n- `x-ratelimit-limit-*`: Your limits\n- `x-ratelimit-remaining-*`: Remaining quota\n\n**Fix:** The Anthropic SDKs automatically retry 429 and 5xx errors with exponential backoff (default: `max_retries=2`). For custom retry behavior, see the language-specific error handling examples.\n\n---\n\n### 500 Internal Server Error\n\n**Causes:**\n\n- Temporary Anthropic service issue\n- Bug in API processing\n\n**Fix:** Retry with exponential backoff. If persistent, check [status.anthropic.com](https://status.anthropic.com).\n\n---\n\n### 529 Overloaded\n\n**Causes:**\n\n- High API demand\n- Service capacity reached\n\n**Fix:** Retry with exponential backoff. Consider using a different model (Haiku is often less loaded), spreading requests over time, or implementing request queuing.\n\n---\n\n## Common Mistakes and Fixes\n\n| Mistake                         | Error            | Fix                                                     |\n| ------------------------------- | ---------------- | ------------------------------------------------------- |\n| `temperature`/`top_p`/`top_k` on Opus 4.8 / 4.7 | 400 | Remove the parameter (see `shared/model-migration.md`)  |\n| `budget_tokens` on Opus 4.8 / 4.7 | 400            | Use `thinking: {type: \"adaptive\"}`                      |\n| `budget_tokens` >= `max_tokens` (older models) | 400 | Ensure `budget_tokens` < `max_tokens`                  |\n| Typo in model ID                | 404              | Use valid model ID like `claude-opus-4-8`               |\n| First message is `assistant`    | 400              | First message must be `user`                            |\n| Consecutive same-role messages  | 400              | Alternate `user` and `assistant`                        |\n| API key in code                 | 401 (leaked key) | Use environment variable                                |\n| Custom retry needs              | 429/5xx          | SDK retries automatically; customize with `max_retries` |\n\n## Typed Exceptions in SDKs\n\n**Always use the SDK's typed exception classes** instead of checking error messages with string matching. Each HTTP error code maps to a specific exception class:\n\n| HTTP Code | TypeScript Class                  | Python Class                      |\n| --------- | --------------------------------- | --------------------------------- |\n| 400       | `Anthropic.BadRequestError`       | `anthropic.BadRequestError`       |\n| 401       | `Anthropic.AuthenticationError`   | `anthropic.AuthenticationError`   |\n| 403       | `Anthropic.PermissionDeniedError` | `anthropic.PermissionDeniedError` |\n| 404       | `Anthropic.NotFoundError`         | `anthropic.NotFoundError`         |\n| 413       | `Anthropic.RequestTooLargeError`  | `anthropic.RequestTooLargeError`  |\n| 429       | `Anthropic.RateLimitError`        | `anthropic.RateLimitError`        |\n| 500+      | `Anthropic.InternalServerError`   | `anthropic.InternalServerError`   |\n| 529       | `Anthropic.OverloadedError`       | `anthropic.OverloadedError`       |\n| Any       | `Anthropic.APIError`              | `anthropic.APIError`              |\n\n```typescript\n// ✅ Correct: use typed exceptions\ntry {\n  const response = await client.messages.create({...});\n} catch (error) {\n  if (error instanceof Anthropic.RateLimitError) {\n    // Handle rate limiting\n  } else if (error instanceof Anthropic.APIError) {\n    console.error(`API error ${error.status}:`, error.message);\n  }\n}\n\n// ❌ Wrong: don't check error messages with string matching\ntry {\n  const response = await client.messages.create({...});\n} catch (error) {\n  const msg = error instanceof Error ? error.message : String(error);\n  if (msg.includes(\"429\") || msg.includes(\"rate_limit\")) { ... }\n}\n```\n\nAll exception classes extend `Anthropic.APIError`, which has a `status` property. Use `instanceof` checks from most specific to least specific (e.g., check `RateLimitError` before `APIError`).\n\n### Error `.type` Field\n\nAll `APIStatusError` subclasses now expose a `.type` property (Python: `.type`, TypeScript: `.type`, Java: `.errorType()`, Go: `.Type()`, Ruby: `.type`, PHP: `.type`) that returns the API error type string (e.g., `\"invalid_request_error\"`, `\"authentication_error\"`, `\"rate_limit_error\"`, `\"overloaded_error\"`). Use this for programmatic error classification when you need finer granularity than the HTTP status code — for example, distinguishing `\"billing_error\"` from `\"permission_error\"` (both map to 403).\n\n```python\nexcept anthropic.APIStatusError as e:\n    if e.type == \"rate_limit_error\":\n        # handle rate limiting\n    elif e.type == \"overloaded_error\":\n        # handle overload\n```\n</doc>\n\n<doc path=\"shared/live-sources.md\">\n# Live Documentation Sources\n\nThis file contains WebFetch URLs for fetching current information from platform.claude.com and Agent SDK repositories. Use these when users need the latest data that may have changed since the cached content was last updated.\n\n## When to Use WebFetch\n\n- User explicitly asks for \"latest\" or \"current\" information\n- Cached data seems incorrect\n- User asks about features not covered in cached content\n- User needs specific API details or examples\n\n## Claude API Documentation URLs\n\n### Models & Pricing\n\n| Topic           | URL                                                                          | Extraction Prompt                                                               |\n| --------------- | ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |\n| Models Overview | `https://platform.claude.com/docs/en/about-claude/models/overview.md`        | \"Extract current model IDs, context windows, and pricing for all Claude models\" |\n| Migration Guide | `https://platform.claude.com/docs/en/about-claude/models/migration-guide.md` | \"Extract breaking changes, deprecated parameters, and per-model migration steps when moving to a newer Claude model\" |\n| Pricing         | `https://platform.claude.com/docs/en/pricing.md`                             | \"Extract current pricing per million tokens for input and output\"               |\n\n### Core Features\n\n| Topic             | URL                                                                          | Extraction Prompt                                                                      |\n| ----------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |\n| Extended Thinking | `https://platform.claude.com/docs/en/build-with-claude/extended-thinking.md` | \"Extract extended thinking parameters, budget_tokens requirements, and usage examples\" |\n| Adaptive Thinking | `https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking.md` | \"Extract adaptive thinking setup, effort levels, and Claude Opus 4.8 usage examples\"         |\n| Effort Parameter  | `https://platform.claude.com/docs/en/build-with-claude/effort.md`            | \"Extract effort levels, cost-quality tradeoffs, and interaction with thinking\"        |\n| Tool Use          | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview.md`  | \"Extract tool definition schema, tool_choice options, and handling tool results\"       |\n| Streaming         | `https://platform.claude.com/docs/en/build-with-claude/streaming.md`         | \"Extract streaming event types, SDK examples, and best practices\"                      |\n| Prompt Caching    | `https://platform.claude.com/docs/en/build-with-claude/prompt-caching.md`    | \"Extract cache_control usage, pricing benefits, and implementation examples\"           |\n\n### Media & Files\n\n| Topic       | URL                                                                    | Extraction Prompt                                                 |\n| ----------- | ---------------------------------------------------------------------- | ----------------------------------------------------------------- |\n| Vision      | `https://platform.claude.com/docs/en/build-with-claude/vision.md`      | \"Extract supported image formats, size limits, and code examples\" |\n| PDF Support | `https://platform.claude.com/docs/en/build-with-claude/pdf-support.md` | \"Extract PDF handling capabilities, limits, and examples\"         |\n\n### API Operations\n\n| Topic            | URL                                                                         | Extraction Prompt                                                                                       |\n| ---------------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| Batch Processing | `https://platform.claude.com/docs/en/build-with-claude/batch-processing.md` | \"Extract batch API endpoints, request format, and polling for results\"                                  |\n| Files API        | `https://platform.claude.com/docs/en/build-with-claude/files.md`            | \"Extract file upload, download, and referencing in messages, including supported types and beta header\" |\n| Token Counting   | `https://platform.claude.com/docs/en/build-with-claude/token-counting.md`   | \"Extract token counting API usage and examples\"                                                         |\n| Rate Limits      | `https://platform.claude.com/docs/en/api/rate-limits.md`                    | \"Extract current rate limits by tier and model\"                                                         |\n| Errors           | `https://platform.claude.com/docs/en/api/errors.md`                         | \"Extract HTTP error codes, meanings, and retry guidance\"                                                |\n| Amazon Bedrock   | `https://platform.claude.com/docs/en/build-with-claude/claude-on-amazon-bedrock.md` | \"Extract the AnthropicBedrockMantle client per language, `anthropic.`-prefixed model IDs, auth paths, feature availability, and regions\" |\n| Claude Platform on AWS | `https://platform.claude.com/docs/en/build-with-claude/claude-platform-on-aws.md` | \"Extract the AnthropicAWS client per language, SigV4 auth, credential precedence, short-term API keys, workspace_id, and region requirements\" |\n| Claude Platform on AWS — IAM actions | `https://platform.claude.com/docs/en/api/claude-platform-on-aws-iam-actions.md` | \"Extract the IAM action names, resource ARNs, and policy examples required for each API capability\" |\n\n### Tools\n\n| Topic          | URL                                                                                    | Extraction Prompt                                                                        |\n| -------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |\n| Code Execution | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool.md` | \"Extract code execution tool setup, file upload, container reuse, and response handling\" |\n| Computer Use   | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use.md`        | \"Extract computer use tool setup, capabilities, and implementation examples\"             |\n| Bash Tool      | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/bash-tool.md`           | \"Extract bash tool schema, reference implementation, and security considerations\"        |\n| Text Editor    | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/text-editor-tool.md`    | \"Extract text editor tool commands, schema, and reference implementation\"                |\n| Memory Tool    | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool.md`         | \"Extract memory tool commands, directory structure, and implementation patterns\"         |\n| Tool Search    | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool.md`    | \"Extract tool search setup, when to use, and cache interaction\"                          |\n| Programmatic Tool Calling | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling.md` | \"Extract PTC setup, script execution model, and tool invocation from code\"    |\n| Skills         | `https://platform.claude.com/docs/en/agents-and-tools/skills.md`                       | \"Extract skill folder structure, SKILL.md format, and loading behavior\"                  |\n\n### Advanced Features\n\n| Topic              | URL                                                                           | Extraction Prompt                                   |\n| ------------------ | ----------------------------------------------------------------------------- | --------------------------------------------------- |\n| Structured Outputs | `https://platform.claude.com/docs/en/build-with-claude/structured-outputs.md` | \"Extract output_config.format usage and schema enforcement\"                           |\n| Compaction         | `https://platform.claude.com/docs/en/build-with-claude/compaction.md`         | \"Extract compaction setup, trigger config, and streaming with compaction\"             |\n| Context Editing    | `https://platform.claude.com/docs/en/build-with-claude/context-editing.md`    | \"Extract context editing thresholds, what gets cleared, and configuration\"            |\n| Citations          | `https://platform.claude.com/docs/en/build-with-claude/citations.md`          | \"Extract citation format and implementation\"        |\n| Context Windows    | `https://platform.claude.com/docs/en/build-with-claude/context-windows.md`    | \"Extract context window sizes and token management\" |\n\n### Managed Agents\n\nUse these when a managed-agents binding, behavior, or wire-level detail isn't covered in the cached `shared/managed-agents-*.md` concept files or in `{lang}/managed-agents/README.md`.\n\n| Topic                 | URL                                                                              | Extraction Prompt                                                                               |\n| --------------------- | -------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |\n| Overview              | `https://platform.claude.com/docs/en/managed-agents/overview.md`                 | \"Extract the high-level architecture and how agents/sessions/environments/vaults fit together\" |\n| Quickstart            | `https://platform.claude.com/docs/en/managed-agents/quickstart.md`               | \"Extract the minimal end-to-end agent → environment → session → stream code path\"              |\n| Agent Setup           | `https://platform.claude.com/docs/en/managed-agents/agent-setup.md`              | \"Extract agent create/update/list-versions/archive lifecycle and parameters\"                   |\n| Define Outcomes       | `https://platform.claude.com/docs/en/managed-agents/define-outcomes.md`          | \"Extract outcome definitions, evaluation hooks, and success criteria configuration\"             |\n| Sessions              | `https://platform.claude.com/docs/en/managed-agents/sessions.md`                 | \"Extract session lifecycle, status transitions, idle/terminated semantics, and resume rules\"    |\n| Environments          | `https://platform.claude.com/docs/en/managed-agents/environments.md`             | \"Extract environment config (cloud/networking), management endpoints, and reuse model\"          |\n| Self-Hosted Sandboxes | `https://platform.claude.com/docs/en/managed-agents/self-hosted-sandboxes.md`    | \"Extract config:{type:self_hosted}, ANTHROPIC_ENVIRONMENT_KEY, EnvironmentWorker.run/run_one, beta_agent_toolset, ant beta:worker poll/run, webhook-driven wake\" |\n| Self-Hosted Sandboxes — Security | `https://platform.claude.com/docs/en/managed-agents/self-hosted-sandboxes-security.md` | \"Extract what the customer owns (hardening, egress, key custody, trust boundaries) vs what Anthropic cannot do\" |\n| Events and Streaming  | `https://platform.claude.com/docs/en/managed-agents/events-and-streaming.md`     | \"Extract event stream types, stream-first ordering, reconnect/dedupe, and steering patterns\"    |\n| Tools                 | `https://platform.claude.com/docs/en/managed-agents/tools.md`                    | \"Extract built-in toolset, custom tool definitions, and tool result wire format\"                |\n| Files                 | `https://platform.claude.com/docs/en/managed-agents/files.md`                    | \"Extract file upload, mount paths, session resources, and listing/downloading session outputs\"  |\n| Permission Policies   | `https://platform.claude.com/docs/en/managed-agents/permission-policies.md`      | \"Extract permission policy types (allow/deny/confirm) and per-tool config\"                     |\n| Multi-Agent           | `https://platform.claude.com/docs/en/managed-agents/multi-agent.md`              | \"Extract multi-agent composition patterns, sub-agent invocation, and result handoff\"            |\n| Observability         | `https://platform.claude.com/docs/en/managed-agents/observability.md`            | \"Extract logging, tracing, and usage telemetry exposed by managed agents\"                       |\n| Webhooks              | `https://platform.claude.com/docs/en/managed-agents/webhooks.md`                 | \"Extract webhook endpoint registration, HMAC signature verification, supported event types, and delivery semantics\" |\n| GitHub                | `https://platform.claude.com/docs/en/managed-agents/github.md`                   | \"Extract github_repository resource shape, multi-repo mounting, and token rotation\"             |\n| MCP Connector         | `https://platform.claude.com/docs/en/managed-agents/mcp-connector.md`            | \"Extract MCP server declaration on agents and vault-based credential injection at session\"     |\n| Vaults                | `https://platform.claude.com/docs/en/managed-agents/vaults.md`                   | \"Extract vault create, credential add/rotate, OAuth refresh shape, and archive\"                 |\n| Skills                | `https://platform.claude.com/docs/en/managed-agents/skills.md`                   | \"Extract skill packaging and loading model for managed agents\"                                  |\n| Memory                | `https://platform.claude.com/docs/en/managed-agents/memory.md`                   | \"Extract memory resource shape, scoping, and lifecycle\"                                         |\n| Onboarding            | `https://platform.claude.com/docs/en/managed-agents/onboarding.md`               | \"Extract first-run setup, prerequisites, and account/region requirements\"                      |\n| Cloud Containers      | `https://platform.claude.com/docs/en/managed-agents/cloud-containers.md`         | \"Extract cloud container runtime, image config, and network/storage knobs\"                     |\n| Migration             | `https://platform.claude.com/docs/en/managed-agents/migration.md`                | \"Extract migration paths from earlier APIs/preview shapes to GA managed agents\"                 |\n\n### Anthropic CLI\n\nThe `ant` CLI provides terminal access to the Claude API. Every API resource is exposed as a subcommand. It is one convenient way to create agents, environments, sessions, and other resources from version-controlled YAML, and to inspect responses interactively.\n\n| Topic         | URL                                                     | Extraction Prompt                                                                                  |\n| ------------- | ------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |\n| Anthropic CLI | `https://platform.claude.com/docs/en/api/sdks/cli.md`   | \"Extract CLI install, authentication, command structure, and the beta:agents/environments/sessions commands\" |\n\n---\n\n## Claude API SDK Repositories\n\nWebFetch these when a binding (class, method, namespace, field) isn't covered in the cached `{lang}/` skill files or in the managed-agents docs above. The SDKs include beta managed-agents support for `/v1/agents`, `/v1/sessions`, `/v1/environments`, and related resources — search the repo for `BetaManagedAgents`, `beta.agents`, `beta.sessions`, or the equivalent namespace for that language.\n\n| SDK        | URL                                                      | Extraction Prompt                                                                                                       |\n| ---------- | -------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |\n| Python     | `https://github.com/anthropics/anthropic-sdk-python`     | \"Extract beta managed-agents namespaces, classes, and method signatures (`client.beta.agents`, `client.beta.sessions`)\" |\n| TypeScript | `https://github.com/anthropics/anthropic-sdk-typescript` | \"Extract beta managed-agents namespaces, classes, and method signatures (`client.beta.agents`, `client.beta.sessions`)\" |\n| Java       | `https://github.com/anthropics/anthropic-sdk-java`       | \"Extract beta managed-agents classes, builders, and method signatures (`client.beta().agents()`, `BetaManagedAgents*`)\" |\n| Go         | `https://github.com/anthropics/anthropic-sdk-go`         | \"Extract beta managed-agents types and method signatures (`client.Beta.Agents`, `BetaManagedAgents*` event types)\"      |\n| Ruby       | `https://github.com/anthropics/anthropic-sdk-ruby`       | \"Extract beta managed-agents methods and parameter shapes (`client.beta.agents`, `client.beta.sessions`)\"               |\n| C#         | `https://github.com/anthropics/anthropic-sdk-csharp`     | \"Extract beta managed-agents classes and method signatures (NuGet package, `BetaManagedAgents*` types)\"                 |\n| PHP        | `https://github.com/anthropics/anthropic-sdk-php`        | \"Extract beta managed-agents classes and method signatures (`$client->beta->agents`, `BetaManagedAgents*` params)\"      |\n\n---\n\n## Fallback Strategy\n\nIf WebFetch fails (network issues, URL changed):\n\n1. Use cached content from the language-specific files (note the cache date)\n2. Inform user the data may be outdated\n3. Suggest they check platform.claude.com or the GitHub repos directly\n</doc>\n\n<doc path=\"shared/managed-agents-api-reference.md\">\n# Managed Agents — Endpoint Reference\n\nAll endpoints require `x-api-key` and `anthropic-version: 2023-06-01` headers. Managed Agents endpoints additionally require the `anthropic-beta` header.\n\n## Beta Headers\n\n```\nanthropic-beta: managed-agents-2026-04-01\n```\n\nThe SDK adds this header automatically for all `client.beta.{agents,environments,sessions,vaults,memory_stores}.*` calls. Skills endpoints use `skills-2025-10-02`; Files endpoints use `files-api-2025-04-14`.\n\n---\n\n## SDK Method Reference\n\nAll resources are under the `beta` namespace. Python and TypeScript share identical method names.\n\n| Resource | Python / TypeScript (`client.beta.*`) | Go (`client.Beta.*`) |\n| --- | --- | --- |\n| Agents | `agents.create` / `retrieve` / `update` / `list` / `archive` | `Agents.New` / `Get` / `Update` / `List` / `Archive` |\n| Agent Versions | `agents.versions.list` | `Agents.Versions.List` |\n| Environments | `environments.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Environments.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |\n| Environment Work (self-hosted) | `environments.work.poller` / `stats` / `stop` | See `shared/managed-agents-self-hosted-sandboxes.md` |\n| Sessions | `sessions.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Sessions.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |\n| Session Events | `sessions.events.list` / `send` / `stream` | `Sessions.Events.List` / `Send` / `StreamEvents` |\n| Session Threads | `sessions.threads.list` / `retrieve` / `archive`; `sessions.threads.events.list` / `stream` | `Sessions.Threads.List` / `Get` / `Archive`; `Sessions.Threads.Events.List` / `StreamEvents` |\n| Session Resources | `sessions.resources.add` / `retrieve` / `update` / `list` / `delete` | `Sessions.Resources.Add` / `Get` / `Update` / `List` / `Delete` |\n| Vaults | `vaults.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Vaults.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |\n| Credentials | `vaults.credentials.create` / `retrieve` / `update` / `list` / `delete` / `archive` / `mcp_oauth_validate` | `Vaults.Credentials.New` / `Get` / `Update` / `List` / `Delete` / `Archive` / `McpOauthValidate` |\n| Memory Stores | `memory_stores.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `MemoryStores.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |\n| Memories | `memory_stores.memories.create` / `retrieve` / `update` / `list` / `delete` | `MemoryStores.Memories.New` / `Get` / `Update` / `List` / `Delete` |\n| Memory Versions | `memory_stores.memory_versions.list` / `retrieve` / `redact` | `MemoryStores.MemoryVersions.List` / `Get` / `Redact` |\n\n**Naming quirks to watch for:**\n- Agents and Session Threads have **no delete** — only `archive`. Archive is **permanent**: the agent becomes read-only, new sessions cannot reference it, and there is no unarchive. Confirm with the user before archiving a production agent. Environments, Sessions, Vaults, Credentials, and Memory Stores have both `delete` and `archive`; Session Resources, Files, Skills, and Memories are `delete`-only; Memory Versions have neither — only `redact`.\n- Session resources use `add` (not `create`).\n- Go's event stream is `StreamEvents` (not `Stream`).\n- The self-hosted worker is **not** under `client.beta.*` — it's `EnvironmentWorker` from `anthropic.lib.environments` / `@anthropic-ai/sdk/helpers/beta/environments`; only `environments.work.poller/stats/stop` are client methods.\n\n**Agent shorthand:** `agent` on session create accepts either a bare string (`agent=\"agent_abc123\"` — uses latest version) or the full reference object (`{type: \"agent\", id: \"agent_abc123\", version: 123}`).\n\n**Model shorthand:** `model` on agent create accepts either a bare string (`model=\"claude-opus-4-8\"` — uses `standard` speed) or the full config object (`{id: \"claude-opus-4-6\", speed: \"fast\"}`). Note: `speed: \"fast\"` is only supported on Opus 4.6.\n\n---\n\n## Agents\n\n**Step one of every flow.** Sessions require a pre-created agent — there is no inline agent config under `managed-agents-2026-04-01`.\n\n| Method   | Path                                             | Operation        | Description                              |\n| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |\n| `GET` | `/v1/agents` | ListAgents | List agents |\n| `POST` | `/v1/agents` | CreateAgent | Create a saved agent configuration |\n| `GET` | `/v1/agents/{agent_id}` | GetAgent | Get agent details |\n| `POST` | `/v1/agents/{agent_id}` | UpdateAgent | Update agent configuration |\n| `POST` | `/v1/agents/{agent_id}/archive` | ArchiveAgent | Archive an agent. Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive — this is the terminal state. |\n| `GET` | `/v1/agents/{agent_id}/versions` | ListAgentVersions | List agent versions |\n\n## Sessions\n\n| Method   | Path                                             | Operation        | Description                              |\n| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |\n| `GET` | `/v1/sessions` | ListSessions | List sessions (paginated) |\n| `POST` | `/v1/sessions` | CreateSession | Create a new session |\n| `GET` | `/v1/sessions/{session_id}` | GetSession | Get session details |\n| `POST` | `/v1/sessions/{session_id}` | UpdateSession | Update session `metadata`/`title`, or `agent.tools`/`agent.mcp_servers`/`vault_ids` (session-local override; session must be `idle`). See `shared/managed-agents-core.md` → Updating the agent configuration mid-session. |\n| `DELETE` | `/v1/sessions/{session_id}` | DeleteSession | Delete a session |\n| `POST` | `/v1/sessions/{session_id}/archive` | ArchiveSession | Archive a session |\n\n## Events\n\n| Method   | Path                                             | Operation        | Description                              |\n| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |\n| `GET` | `/v1/sessions/{session_id}/events` | ListEvents | List events (polling, paginated) |\n| `POST` | `/v1/sessions/{session_id}/events` | SendEvents | Send events (user message, tool result) |\n| `GET` | `/v1/sessions/{session_id}/events/stream` | StreamEvents | Stream events via SSE |\n\n## Session Threads\n\nPer-subagent event streams in multiagent sessions. See `shared/managed-agents-multiagent.md`.\n\n| Method   | Path                                             | Operation        | Description                              |\n| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |\n| `GET` | `/v1/sessions/{session_id}/threads` | ListThreads | List threads (paginated) |\n| `GET` | `/v1/sessions/{session_id}/threads/{thread_id}` | GetThread | Retrieve one thread (carries `agent` snapshot, `status`, `parent_thread_id`, `stats`, `usage`) |\n| `POST` | `/v1/sessions/{session_id}/threads/{thread_id}/archive` | ArchiveThread | Archive a thread |\n| `GET` | `/v1/sessions/{session_id}/threads/{thread_id}/events` | ListThreadEvents | List past events for one thread (paginated) |\n| `GET` | `/v1/sessions/{session_id}/threads/{thread_id}/stream` | StreamThreadEvents | Stream one thread via SSE (SDK: `threads.events.stream`) |\n\n## Session Resources\n\n| Method   | Path                                                    | Operation        | Description                              |\n| -------- | ------------------------------------------------------- | ---------------- | ---------------------------------------- |\n| `GET` | `/v1/sessions/{session_id}/resources` | ListResources | List resources attached to session |\n| `POST` | `/v1/sessions/{session_id}/resources` | AddResource | Attach `file` or `github_repository` resource (SDK method: `add`, not `create`). `memory_store` resources attach at session-create time only. |\n| `GET` | `/v1/sessions/{session_id}/resources/{resource_id}` | GetResource | Get a single resource |\n| `POST` | `/v1/sessions/{session_id}/resources/{resource_id}` | UpdateResource | Update resource |\n| `DELETE` | `/v1/sessions/{session_id}/resources/{resource_id}` | DeleteResource | Remove resource from session |\n\n## Environments\n\n| Method   | Path                                                             | Operation            | Description                         |\n| -------- | ---------------------------------------------------------------- | -------------------- | ----------------------------------- |\n| `POST`   | `/v1/environments`                                     | CreateEnvironment    | Create environment                  |\n| `GET`    | `/v1/environments`                                     | ListEnvironments     | List environments                   |\n| `GET`    | `/v1/environments/{environment_id}`                    | GetEnvironment       | Get environment details             |\n| `POST`   | `/v1/environments/{environment_id}`                    | UpdateEnvironment    | Update environment                  |\n| `DELETE` | `/v1/environments/{environment_id}`                    | DeleteEnvironment    | Delete environment. Returns 204. |\n| `POST`   | `/v1/environments/{environment_id}/archive`            | ArchiveEnvironment   | Archive environment. Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive — this is the terminal state. |\n| `GET`    | `/v1/environments/{environment_id}/work/stats`         | WorkQueueStats       | Self-hosted work-queue depth/pending/workers. `x-api-key` auth. See `shared/managed-agents-self-hosted-sandboxes.md`. |\n| `POST`   | `/v1/environments/{environment_id}/work/{work_id}/stop` | StopWork            | Self-hosted: stop a claimed work item. `x-api-key` auth. |\n\nFor `type: \"self_hosted\"`, `config` is the bare `{\"type\": \"self_hosted\"}` — `networking` and `packages` do not apply.\n\n## Vaults\n\nVaults store MCP credentials that Anthropic manages on your behalf — OAuth credentials with auto-refresh, or static bearer tokens. Attach to sessions via `vault_ids`. See `managed-agents-tools.md` §Vaults for the conceptual guide and credential shapes.\n\n| Method   | Path                                             | Operation        | Description                              |\n| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |\n| `POST`   | `/v1/vaults`                                     | CreateVault      | Create a vault                           |\n| `GET`    | `/v1/vaults`                                     | ListVaults       | List vaults                              |\n| `GET`    | `/v1/vaults/{vault_id}`                          | GetVault         | Get vault details                        |\n| `POST`   | `/v1/vaults/{vault_id}`                          | UpdateVault      | Update vault                             |\n| `DELETE` | `/v1/vaults/{vault_id}`                          | DeleteVault      | Delete vault                             |\n| `POST`   | `/v1/vaults/{vault_id}/archive`                  | ArchiveVault     | Archive vault                            |\n\n## Credentials\n\nCredentials are individual secrets stored inside a vault.\n\n| Method   | Path                                                              | Operation          | Description                  |\n| -------- | ----------------------------------------------------------------- | ------------------ | ---------------------------- |\n| `POST`   | `/v1/vaults/{vault_id}/credentials`                               | CreateCredential   | Create a credential          |\n| `GET`    | `/v1/vaults/{vault_id}/credentials`                               | ListCredentials    | List credentials in vault    |\n| `GET`    | `/v1/vaults/{vault_id}/credentials/{credential_id}`               | GetCredential      | Get credential metadata      |\n| `POST`   | `/v1/vaults/{vault_id}/credentials/{credential_id}`               | UpdateCredential   | Update credential            |\n| `DELETE` | `/v1/vaults/{vault_id}/credentials/{credential_id}`               | DeleteCredential   | Delete credential            |\n| `POST`   | `/v1/vaults/{vault_id}/credentials/{credential_id}/archive`       | ArchiveCredential  | Archive credential           |\n| `POST`   | `/v1/vaults/{vault_id}/credentials/{credential_id}/mcp_oauth_validate` | McpOauthValidate | Validate an MCP OAuth credential |\n\n## Memory Stores\n\nWorkspace-scoped persistent memory that survives across sessions. Attach to a session via a `{\"type\": \"memory_store\", \"memory_store_id\": ...}` entry in `resources[]` (session-create time only). See `shared/managed-agents-memory.md` for the conceptual guide, the FUSE-mount agent interface, preconditions, and versioning.\n\n| Method   | Path                                             | Operation          | Description                              |\n| -------- | ------------------------------------------------ | ------------------ | ---------------------------------------- |\n| `POST`   | `/v1/memory_stores`                              | CreateMemoryStore  | Create a store (`name`, `description`, `metadata`) |\n| `GET`    | `/v1/memory_stores`                              | ListMemoryStores   | List stores (`include_archived`, `created_at_{gte,lte}`) |\n| `GET`    | `/v1/memory_stores/{memory_store_id}`            | GetMemoryStore     | Get store details                        |\n| `POST`   | `/v1/memory_stores/{memory_store_id}`            | UpdateMemoryStore  | Update store                             |\n| `DELETE` | `/v1/memory_stores/{memory_store_id}`            | DeleteMemoryStore  | Delete store                             |\n| `POST`   | `/v1/memory_stores/{memory_store_id}/archive`    | ArchiveMemoryStore | Archive store. Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive. |\n\n## Memories\n\nIndividual text documents inside a store (≤ 100KB each). `create` creates at a `path` and returns `409` (`memory_path_conflict_error`, with `conflicting_memory_id`) if the path is occupied; `update` mutates by `mem_...` ID (rename and/or content). Only `update` accepts a `precondition` (`{\"type\": \"content_sha256\", \"content_sha256\": ...}`) — on mismatch returns `409` (`memory_precondition_failed_error`). List endpoints accept `view: \"basic\"|\"full\"` (controls whether `content` is populated; `retrieve` defaults to `full`).\n\n| Method   | Path                                                              | Operation      | Description                              |\n| -------- | ----------------------------------------------------------------- | -------------- | ---------------------------------------- |\n| `GET`    | `/v1/memory_stores/{memory_store_id}/memories`                    | ListMemories   | Returns `Memory \\| MemoryPrefix`; filter by `path_prefix`, `depth`, `order_by`/`order` |\n| `POST`   | `/v1/memory_stores/{memory_store_id}/memories`                    | CreateMemory   | Create at `path` (SDK: `memories.create`); `409 memory_path_conflict_error` if occupied |\n| `GET`    | `/v1/memory_stores/{memory_store_id}/memories/{memory_id}`        | GetMemory      | Read one memory (defaults to `view=\"full\"`) |\n| `PATCH`  | `/v1/memory_stores/{memory_store_id}/memories/{memory_id}`        | UpdateMemory   | Change `content`, `path`, or both by ID; optional `precondition` |\n| `DELETE` | `/v1/memory_stores/{memory_store_id}/memories/{memory_id}`        | DeleteMemory   | Delete (optional `expected_content_sha256`) |\n\n## Memory Versions\n\nImmutable per-mutation snapshots (`memver_...`) — the audit and rollback surface. `operation` ∈ `created` / `modified` / `deleted`.\n\n| Method   | Path                                                                          | Operation             | Description                              |\n| -------- | ----------------------------------------------------------------------------- | --------------------- | ---------------------------------------- |\n| `GET`    | `/v1/memory_stores/{memory_store_id}/memory_versions`                         | ListMemoryVersions    | Newest-first; filter by `memory_id`, `operation`, `session_id`, `api_key_id`, `created_at_{gte,lte}` |\n| `GET`    | `/v1/memory_stores/{memory_store_id}/memory_versions/{version_id}`            | GetMemoryVersion      | List fields + full `content`             |\n| `POST`   | `/v1/memory_stores/{memory_store_id}/memory_versions/{version_id}/redact`     | RedactMemoryVersion   | Clear `content`/`content_sha256`/`content_size_bytes`/`path`; preserve actor + timestamps |\n\n## Files\n\n| Method   | Path                                             | Operation        | Description                              |\n| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- |\n| `POST`   | `/v1/files`                            | UploadFile       | Upload a file                            |\n| `GET`    | `/v1/files`                            | ListFiles        | List files                               |\n| `GET`    | `/v1/files/{file_id}`                  | GetFile          | Get file metadata (SDK method: `retrieve_metadata`) |\n| `GET`    | `/v1/files/{file_id}/content`          | DownloadFile     | Download file content                    |\n| `DELETE` | `/v1/files/{file_id}`                  | DeleteFile       | Delete a file                            |\n\n## Skills\n\n| Method   | Path                                                            | Operation          | Description                  |\n| -------- | --------------------------------------------------------------- | ------------------ | ---------------------------- |\n| `POST`   | `/v1/skills`                                          | CreateSkill        | Create a skill               |\n| `GET`    | `/v1/skills`                                          | ListSkills         | List skills                  |\n| `GET`    | `/v1/skills/{skill_id}`                               | GetSkill           | Get skill details            |\n| `DELETE` | `/v1/skills/{skill_id}`                               | DeleteSkill        | Delete a skill               |\n| `POST`   | `/v1/skills/{skill_id}/versions`                      | CreateVersion      | Create skill version         |\n| `GET`    | `/v1/skills/{skill_id}/versions`                      | ListVersions       | List skill versions          |\n| `GET`    | `/v1/skills/{skill_id}/versions/{version}`            | GetVersion         | Get skill version            |\n| `DELETE` | `/v1/skills/{skill_id}/versions/{version}`            | DeleteVersion      | Delete skill version         |\n\n---\n\n## Request/Response Schema Quick Reference\n\n### CreateAgent Request Body\n\n**Always start here.** `model`, `system`, `tools`, `mcp_servers`, `skills` are top-level fields on this object — they do NOT go on the session.\n\n```json\n{\n  \"name\": \"string (required, 1-256 chars)\",\n  \"model\": \"claude-opus-4-8 (required — bare string, or {id, speed} object)\",\n  \"description\": \"string (optional, up to 2048 chars)\",\n  \"system\": \"string (optional, up to 100,000 chars)\",\n  \"tools\": [\n    { \"type\": \"agent_toolset_20260401\" }\n  ],\n  \"skills\": [\n    { \"type\": \"anthropic\", \"skill_id\": \"xlsx\" },\n    { \"type\": \"custom\", \"skill_id\": \"skill_abc123\", \"version\": \"1\" }\n  ],\n  \"mcp_servers\": [\n    {\n      \"type\": \"url\",\n      \"name\": \"github\",\n      \"url\": \"https://api.githubcopilot.com/mcp/\"\n    }\n  ],\n  \"multiagent\": {\n    \"type\": \"coordinator\",\n    \"agents\": [\n      \"agent_abc123\",\n      { \"type\": \"agent\", \"id\": \"agent_def456\", \"version\": 4 },\n      { \"type\": \"self\" }\n    ]\n  },\n  \"metadata\": {\n    \"key\": \"value (max 16 pairs, keys ≤64 chars, values ≤512 chars)\"\n  }\n}\n```\n\n> Limits: `tools` max 128, `skills` max 20, `mcp_servers` max 20 (unique names). `multiagent.agents` 1–20 entries (string ID | `{type:\"agent\",id,version?}` | `{type:\"self\"}`) — see `shared/managed-agents-multiagent.md`.\n\n### CreateSession Request Body\n\n```json\n{\n  \"agent\": \"agent_abc123 (required — string shorthand for latest version, or {type: \\\"agent\\\", id, version} object)\",\n  \"environment_id\": \"env_abc123 (required)\",\n  \"title\": \"string (optional)\",\n  \"resources\": [\n    {\n      \"type\": \"github_repository\",\n      \"url\": \"https://github.com/owner/repo (required)\",\n      \"authorization_token\": \"ghp_... (required)\",\n      \"mount_path\": \"/workspace/repo (optional — defaults to /workspace/<repo-name>)\",\n      \"checkout\": { \"type\": \"branch\", \"name\": \"main\" }\n    }\n  ],\n  \"vault_ids\": [\"vlt_abc123 (optional — MCP credentials with auto-refresh)\"],\n  \"metadata\": {\n    \"key\": \"value\"\n  }\n}\n```\n\n> The `agent` field accepts only a string ID or `{type: \"agent\", id, version}` — `model`/`system`/`tools` live on the agent, not here.\n>\n> **`checkout`** accepts `{type: \"branch\", name: \"...\"}` or `{type: \"commit\", sha: \"...\"}`. Omit for the repo's default branch.\n\n### CreateEnvironment Request Body\n\n```json\n{\n  \"name\": \"string (required)\",\n  \"description\": \"string (optional)\",\n  \"config\": {\n    \"type\": \"cloud | self_hosted\",\n    \"networking\": {\n      \"type\": \"unrestricted | limited (union — see SDK types)\"\n    },\n    \"packages\": { }\n  },\n  \"metadata\": { \"key\": \"value\" }\n}\n```\n\n### SendEvents Request Body\n\n```json\n{\n  \"events\": [\n    {\n      \"type\": \"user.message\",\n      \"content\": [\n        {\n          \"type\": \"text\",\n          \"text\": \"Hello\"\n        }\n      ]\n    }\n  ]\n}\n```\n\n### Define Outcome Event\n\n```json\n{\n  \"type\": \"user.define_outcome\",\n  \"description\": \"Build a DCF model for Costco in .xlsx\",\n  \"rubric\": { \"type\": \"file\", \"file_id\": \"file_01...\" },\n  \"max_iterations\": 5\n}\n```\n\n> `rubric` is required: `{type: \"text\", content}` or `{type: \"file\", file_id}`. `max_iterations` default 3, max 20. Echoed back with `outcome_id` + `processed_at`. See `shared/managed-agents-outcomes.md`.\n\n### Tool Result Event\n\n```json\n{\n  \"type\": \"user.custom_tool_result\",\n  \"custom_tool_use_id\": \"sevt_abc123\",\n  \"content\": [{ \"type\": \"text\", \"text\": \"Result data\" }],\n  \"is_error\": false\n}\n```\n\n---\n\n## Error Handling\n\nManaged Agents endpoints use the standard Anthropic API error format. Errors are returned with an HTTP status code and a JSON body containing `type`, `error`, and `request_id`:\n\n```json\n{\n  \"type\": \"error\",\n  \"error\": {\n    \"type\": \"invalid_request_error\",\n    \"message\": \"Description of what went wrong\"\n  },\n  \"request_id\": \"req_011CRv1W3XQ8XpFikNYG7RnE\"\n}\n```\n\nInclude the `request_id` when reporting issues to Anthropic — it lets us trace the request end-to-end. The inner `error.type` is one of the following:\n\n| Status | Error type | Description |\n|---|---|---|\n| 400 | `invalid_request_error` | The request was malformed or missing required parameters |\n| 401 | `authentication_error` | Invalid or missing API key |\n| 403 | `permission_error` | The API key doesn't have permission for this operation |\n| 404 | `not_found_error` | The requested resource doesn't exist |\n| 409 | `invalid_request_error` | The request conflicts with the resource's current state (e.g., sending to an archived session) |\n| 413 | `request_too_large` | The request body exceeds the maximum allowed size |\n| 429 | `rate_limit_error` | Too many requests — check rate limit headers for retry timing |\n| 500 | `api_error` | An internal server error occurred |\n| 529 | `overloaded_error` | The service is temporarily overloaded — retry with backoff |\n\nNote that `409 Conflict` carries `error.type: \"invalid_request_error\"` (there is no separate `conflict_error` type); inspect both the HTTP status and the `message` to distinguish conflicts from other invalid requests.\n\n---\n\n## Rate Limits\n\nManaged Agents endpoints have per-organization request-per-minute (RPM) limits, separate from your [Messages API token limits](https://platform.claude.com/docs/en/api/rate-limits). Model inference inside a session still draws from your organization's standard ITPM/OTPM limits.\n\n| Endpoint group | Scope | RPM | Max concurrent |\n|---|---|---|---|\n| Create operations (Agents, Sessions, Vaults) | organization | 300 | — |\n| All other operations (Agents, Sessions, Vaults) | organization | 600 | — |\n| All operations (Environments) | organization | 60 | 5 |\n\nFiles and Skills endpoints use the standard tier-based [rate limits](https://platform.claude.com/docs/en/api/rate-limits).\n\nWhen a limit is exceeded the API returns `429` with a `rate_limit_error` (see [Error Handling](#error-handling) for the response envelope) and a `retry-after` header indicating how many seconds to wait before retrying. The Anthropic SDK reads this header and retries automatically.\n</doc>\n\n<doc path=\"shared/managed-agents-client-patterns.md\">\n# Managed Agents — Common Client Patterns\n\nPatterns you'll write on the client side when driving a Managed Agent session, grounded in working SDK examples.\n\nCode samples are TypeScript — Python and cURL follow the same shape; see `python/managed-agents/README.md` and `curl/managed-agents.md` for equivalents.\n\n---\n\n## 1. Lossless stream reconnect\n\n**Problem:** SSE has no replay. If the connection drops mid-session, a naive reconnect re-opens the stream from \"now\" and you silently miss every event emitted in between.\n\n**Solution:** on reconnect, fetch the full event history via `events.list()` *before* consuming the live stream, and dedupe on event ID as the live stream catches up.\n\n```ts\nconst seenEventIds = new Set<string>()\nconst stream = await client.beta.sessions.events.stream(session.id)\n\n// Stream is now open and buffering server-side. Read history first.\nfor await (const event of client.beta.sessions.events.list(session.id)) {\n  seenEventIds.add(event.id)\n  handle(event)\n}\n\n// Tail the live stream. Dedupe only gates handle() — terminal checks must run\n// even for already-seen events, or a terminal event that was in the history\n// response gets skipped by `continue` and the loop never exits.\nfor await (const event of stream) {\n  if (!seenEventIds.has(event.id)) {\n    seenEventIds.add(event.id)\n    handle(event)\n  }\n  if (event.type === 'session.status_terminated') break\n  if (event.type === 'session.status_idle' && event.stop_reason.type !== 'requires_action') break\n}\n```\n\n---\n\n## 2. `processed_at` — queued vs processed\n\nEvery event on the stream carries `processed_at` (ISO 8601). For client-sent events (`user.message`, `user.interrupt`, `user.tool_confirmation`, `user.custom_tool_result`) it's `null` when the event has been queued but not yet picked up by the agent, and populated once the agent processes it. The same event appears on the stream twice — once with `processed_at: null`, once with a timestamp.\n\n```ts\nfor await (const event of stream) {\n  if (event.type === 'user.message') {\n    if (event.processed_at == null) onQueued(event.id)\n    else onProcessed(event.id, event.processed_at)\n  }\n}\n```\n\nUse this to drive pending → acknowledged UI state for anything you send. How you map a locally-rendered optimistic message to the server-assigned `event.id` is application-specific (typically via the return value of `events.send()` or FIFO ordering).\n\n---\n\n## 3. Interrupt a running session\n\nSend `user.interrupt` as a normal event. The session keeps running until it reaches a safe boundary, then goes idle.\n\n```ts\nawait client.beta.sessions.events.send(session.id, {\n  events: [{ type: 'user.interrupt' }],\n})\n\n// Drain until the session is truly done — see Pattern 5 for the full gate.\nfor await (const event of stream) {\n  if (event.type === 'session.status_terminated') break\n  if (\n    event.type === 'session.status_idle' &&\n    event.stop_reason.type !== 'requires_action'\n  ) break\n}\n```\n\nReference: `interrupt.ts` — sends the interrupt the moment it sees `span.model_request_start`, drains to idle, then verifies via `sessions.retrieve()`.\n\n---\n\n## 4. `tool_confirmation` round-trip\n\nWhen the agent has `permission_policy: { type: 'always_ask' }`, any call to that tool fires an `agent.tool_use` event with `evaluated_permission === 'ask'` and the session goes idle waiting for a decision. Respond with `user.tool_confirmation`.\n\n```ts\nfor await (const event of stream) {\n  if (event.type === 'agent.tool_use' && event.evaluated_permission === 'ask') {\n    await client.beta.sessions.events.send(session.id, {\n      events: [{\n        type: 'user.tool_confirmation',\n        tool_use_id: event.id,         // not a toolu_ id — use event.id\n        result: 'allow',               // or 'deny'\n        // deny_message: '...',        // optional, only with result: 'deny'\n      }],\n    })\n  }\n}\n```\n\nKey points:\n- `tool_use_id` is `event.id` (typically `sevt_...`), **not** a `toolu_...` ID.\n- `result` is `'allow' | 'deny'`. Use `deny_message` to tell the model *why* you denied — it gets surfaced back to the agent.\n- Multiple pending tools: respond once per `agent.tool_use` event with `evaluated_permission === 'ask'`.\n\nReference: `tool-permissions.ts`.\n\n---\n\n## 5. Correct idle-break gate\n\nDo not break on `session.status_idle` alone. The session goes idle transiently — e.g. between parallel tool executions, while waiting for a `user.tool_confirmation`, or while awaiting a `user.custom_tool_result`. Break when idle with a terminal `stop_reason`, or on `session.status_terminated`.\n\n```ts\nfor await (const event of stream) {\n  handle(event)\n  if (event.type === 'session.status_terminated') break\n  if (event.type === 'session.status_idle') {\n    if (event.stop_reason.type === 'requires_action') continue // waiting on you — handle it\n    break // end_turn or retries_exhausted — both terminal\n  }\n}\n```\n\n`stop_reason.type` values on `session.status_idle`:\n- `requires_action` — agent is waiting on a client-side event (tool confirmation, custom tool result). Handle it, don't break.\n- `retries_exhausted` — terminal failure. Break, then check `sessions.retrieve()` for the error state.\n- `end_turn` — normal completion.\n\n---\n\n## 6. Post-idle status-write race\n\nThe SSE stream emits `session.status_idle` slightly before the session's queryable status reflects it. Clients that break on idle and immediately call `sessions.delete()` or `sessions.archive()` will intermittently 400 with \"cannot delete/archive while running.\"\n\nPoll before cleanup:\n\n```ts\nlet s\nfor (let i = 0; i < 10; i++) {\n  s = await client.beta.sessions.retrieve(session.id)\n  if (s.status !== 'running') break\n  await new Promise(r => setTimeout(r, 200))\n}\nif (s?.status !== 'running') {\n  await client.beta.sessions.archive(session.id)\n} // else: still running after 2s — don't archive, let it settle or escalate\n```\n\n---\n\n## 7. Stream-first, then send\n\nAlways open the stream **before** sending the kickoff event. Otherwise the agent may process the event and emit the first events before your consumer is attached, and you'll miss them.\n\n```ts\nconst stream = await client.beta.sessions.events.stream(session.id)\nawait client.beta.sessions.events.send(session.id, {\n  events: [{ type: 'user.message', content: [{ type: 'text', text: 'Hello' }] }],\n})\nfor await (const event of stream) { /* ... */ }\n```\n\nThe `Promise.all([stream, send])` shape works too, but stream-first is simpler and has the same effect — the stream starts buffering the moment it's opened.\n\n---\n\n## 8. File-mount gotchas\n\n**The mounted resource has a different `file_id` than the file you uploaded.** Session creation makes a session-scoped copy.\n\n```ts\nconst uploaded = await client.beta.files.upload({ file, purpose: 'agent_resource' })\n// uploaded.id         → the original file\nconst session = await client.beta.sessions.create({\n  /* ... */\n  resources: [{ type: 'file', file_id: uploaded.id, mount_path: '/workspace/data.csv' }],\n})\n// session.resources[0].file_id !== uploaded.id  ← different IDs\n```\n\nDelete the original via `files.delete(uploaded.id)`; the session-scoped copy is garbage-collected with the session. `mount_path` must be absolute — see `shared/managed-agents-environments.md`.\n\n---\n\n## 9. Secrets for non-MCP APIs and CLIs — keep them host-side via custom tools\n\n**Problem:** you want the agent to call a third-party API or run a CLI that needs a secret (API key, token, service-account credential), but there is currently no way to set environment variables inside the session container, and vaults currently hold MCP credentials only — they are not exposed to the container's shell. So `curl`, installed CLIs, or SDK clients running via the `bash` tool have no first-class place to read a secret from.\n\n**Solution:** move the authenticated call to your side. Declare a custom tool on the agent; when the agent emits `agent.custom_tool_use`, your orchestrator (the process reading the SSE stream) executes the call with its own credentials and responds with `user.custom_tool_result`. The container never sees the key.\n\n```ts\n// Agent template: declare the tool, no credentials\ntools: [{ type: 'custom', name: 'linear_graphql', input_schema: { /* query, vars */ } }]\n\n// Orchestrator: handle the call with host-side creds\nfor await (const event of stream) {\n  if (event.type === 'agent.custom_tool_use' && event.name === 'linear_graphql') {\n    const result = await linear.request(event.input.query, event.input.vars) // host's key\n    await client.beta.sessions.events.send(session.id, {\n      events: [{ type: 'user.custom_tool_result', tool_use_id: event.id, result }],\n    })\n  }\n}\n```\n\nSame shape works for `gh` CLI, local eval scripts, or anything else that needs host-side auth or binaries.\n\n**Security note:** this does not expose a public endpoint. `agent.custom_tool_use` arrives on the SSE stream your orchestrator already holds open with your Anthropic API key, and `user.custom_tool_result` goes back via `events.send()` under the same key. Your orchestrator is a client, not a server — nothing unauthenticated is listening.\n\n**Do not embed API keys in the system prompt or user messages as a workaround.** Prompts and messages are stored in the session's event history, returned by `events.list()`, and included in compaction summaries — a secret placed there is durably persisted and readable via the API for the life of the session.\n</doc>\n\n<doc path=\"shared/managed-agents-core.md\">\n# Managed Agents — Core Concepts\n\n## Architecture\n\nManaged Agents is built around four core concepts:\n\n| Concept | Endpoint | What it is |\n|---|---|---|\n| **Agent** | `/v1/agents` | A persisted, versioned object defining the agent's capabilities and persona: model, system prompt, tools, MCP servers, skills. **Must be created before starting a session.** See the Agents section below. |\n| **Session** | `/v1/sessions` | A stateful interaction with an agent. References a pre-created agent by ID + an environment + initial instructions. Produces an event stream. |\n| **Environment** | `/v1/environments` | A template defining the configuration for container provisioning. |\n| **Container** | N/A | An isolated compute instance where the agent's **tools** execute (bash, file ops, code). The agent loop does not run here — it runs on Anthropic's orchestration layer and acts on the container via tool calls. |\n\n```\n                       ┌─────────────────────────────────────┐\n                       │  Anthropic orchestration layer      │\nAgent (config) ───────▶│  (agent loop: Claude + tool calls)  │\n                       └──────────────┬──────────────────────┘\n                                      │ tool calls\n                                      ▼\nEnvironment (template) ──▶ Container (tool execution workspace)\n                                 │\n                         Session ─┤\n                                 ├── Resources (files, repos, memory stores — attached at startup)\n                                 ├── Vault IDs (MCP credential references)\n                                 └── Conversation (event stream in/out)\n```\n\n> **Agent creation is a prerequisite.** Sessions reference a pre-created agent by ID — `model`/`system`/`tools` live on the agent object, never on the session. Every flow starts with `POST /v1/agents`.\n\n---\n\n## Session Lifecycle\n\n```\nrescheduling → running ↔ idle → terminated\n```\n\n| Status         | Description                                                        |\n| -------------- | ------------------------------------------------------------------ |\n| `idle` | Agent has finished the current task, and is awaiting input. It's either waiting for input to continue working via a `user.message` or blocked awaiting a `user.custom_tool_result` or `user.tool_confirmation`. The `stop_reason` attached contains more information about why the Agent has stopped working. |\n| `running` | Session has starting running, and the Agent is actively doing work. |\n| `rescheduling` | Session is (re)scheduling after a retryable error has occurred, ready to be picked up by the orchestration system. |\n| `terminated` | Session has terminated, entering an irreversible and unusable state.  |\n\n- Events can be sent when the session is `running` or `idle`. Messages are queued and processed in order.\n- The agent transitions `idle → running` when it receives a new event, then back to `idle` when done.\n- Errors surface as `session.error` events in the stream, not as a status value.\n\n### Built-in session features\n\n- **Context compaction** — if you approach max context, the API automatically condenses session history to keep the interaction going\n- **Prompt caching** — historical repeated tokens are cached, reducing processing time and cost\n- **Extended thinking** — on by default, returned as `agent.thinking` events\n\n### Session operations\n\n| Operation | Notes |\n|---|---|\n| List / fetch | Paginated list or single resource by ID |\n| Update | Only `title` is updatable |\n| Archive | Session becomes **read-only**. Not reversible. |\n| Delete | Permanently deletes session, event history, container, and checkpoints. |\n\nThese are ops/inspection calls — typically made from a terminal, not application code. From the shell (see `shared/anthropic-cli.md`):\n\n```sh\nant beta:sessions list --transform '{id,title,status,created_at}' --format jsonl\nant beta:sessions retrieve --session-id \"$SID\"\nant beta:sessions:events stream --session-id \"$SID\"   # watch events live\nant beta:sessions archive  --session-id \"$SID\"\nant beta:sessions delete   --session-id \"$SID\"\n```\n\n---\n\n## Sessions\n\nA session is a running agent instance inside an environment.\n\n### Session Object\n\nKey fields returned by the API:\n\n| Field           | Type     | Description                                         |\n| --------------- | -------- | --------------------------------------------------- |\n| `type` | string | Always `\"session\"` |\n| `id` | string | Unique session ID |\n| `title` | string | Human-readable title |\n| `status` | string | `idle`, `running`, `rescheduling`, `terminated` |\n| `created_at` | string | ISO 8601 timestamp |\n| `updated_at` | string | ISO 8601 timestamp |\n| `archived_at` | string | ISO 8601 timestamp (nullable) |\n| `environment_id` | string | Environment ID |\n| `agent` | object | Agent configuration |\n| `resources` | array | Attached files, repos, and memory stores |\n| `metadata` | object | User-provided key-value pairs (max 8 keys) |\n| `usage` | object | Token usage statistics |\n\n### Creating a session\n\n**A session is meaningless without an agent.** Sessions reference a pre-created agent by ID. Create the agent first via `agents.create()`, then reference it:\n\n```ts\n// 1. Create the agent (reusable, versioned)\nconst agent = await client.beta.agents.create(\n  {\n    name: \"Coding Assistant\",\n    model: \"claude-opus-4-8\",\n    system: \"You are a helpful coding agent.\",\n    tools: [{ type: \"agent_toolset_20260401\"}],\n  },\n);\n\n// 2. Start a session that references it\nconst session = await client.beta.sessions.create(\n  {\n    agent: agent.id,  // string shorthand → latest version. Or: { type: \"agent\", id: agent.id, version: agent.version }\n    environment_id: environmentId,\n    title: \"Hello World Session\",\n  },\n);\n```\n\n> 💡 **Watch it live in Console.** While developing, print a link so you can click through to the session UI instead of parsing the event stream: `https://platform.claude.com/workspaces/{workspace}/sessions/{session.id}`. Use `default` for the workspace slug unless you created the session in a non-default workspace; the sessions list for your workspace is at `https://platform.claude.com/workspaces/default/sessions`. Include this `print`/`console.log` in code you generate for users who are iterating locally.\n\n**Session creation parameters:**\n\n| Field           | Type     | Required | Description                                    |\n| --------------- | -------- | -------- | ---------------------------------------------- |\n| `agent`         | string or object | **Yes** | String shorthand `\"agent_abc123\"` (latest version) or `{type: \"agent\", id, version}` |\n| `environment_id`| string   | **Yes**  | Environment ID                                 |\n| `title`         | string   | No       | Human-readable name (appears in logs/dashboards) |\n| `resources`     | array    | No       | Files, GitHub repos, or memory stores, attached to the container at startup. Memory stores are session-create-only (not addable via `resources.add()`). |\n| `vault_ids`     | array    | No       | Vault IDs (`vlt_*`) — MCP credentials with auto-refresh. See `shared/managed-agents-tools.md` → Vaults. |\n| `metadata`      | object   | No       | User-provided key-value pairs                  |\n\n**Agent configuration fields** (passed to `agents.create()`, not `sessions.create()`):\n\n| Field         | Type     | Required | Description                                    |\n| ------------- | -------- | -------- | ---------------------------------------------- |\n| `name`        | string   | **Yes**  | Human-readable name (1-256 chars)              |\n| `model`       | string or object | **Yes** | Claude model ID (bare string, or `{id, speed}` object). All Claude 4.5+ models supported. |\n| `system`      | string   | No       | System prompt — defines the agent's behavior (up to 100K chars) |\n| `tools`       | array    | No       | Encompasses three kinds: (1) pre-built Claude Agent tools (`agent_toolset_20260401`), (2) MCP tools (`mcp_toolset`), and (3) custom client-side tools. Max 128. |\n| `mcp_servers` | array    | No       | MCP server connections — standardized third-party capabilities (e.g. GitHub, Asana). Max 20, unique names. See `shared/managed-agents-tools.md` → MCP Servers. |\n| `skills`      | array    | No       | Customized \"best-practices\" context with progressive disclosure. Max 20. See `shared/managed-agents-tools.md` → Skills. |\n| `description` | string   | No       | Description of the agent (up to 2048 chars)    |\n| `multiagent`  | object   | No       | `{type: \"coordinator\", agents: [...]}` — roster this agent may delegate to. See `shared/managed-agents-multiagent.md`. |\n| `metadata`    | object   | No       | Arbitrary key-value pairs (max 16, keys ≤64 chars, values ≤512 chars) |\n\n---\n\n## Agents\n\n**This is where every Managed Agents flow begins.** The agent object is a persisted, versioned configuration — you create it once, then reference it by ID every time you start a session. No agent → no session.\n\n### Agent Object\n\nThe API is **flat** — `model`, `system`, `tools` etc. are top-level fields, not wrapped in an `agent:{}` sub-object.\n\n| Field              | Type     | Required | Description                                        |\n| ------------------ | -------- | -------- | -------------------------------------------------- |\n| `name`             | string   | Yes      | Human-readable name                                |\n| `model`            | string   | Yes      | Claude model ID                                    |\n| `system`           | string   | No       | System prompt                                      |\n| `tools`            | array    | No       | Agent toolset / MCP toolset / custom tools         |\n| `mcp_servers`      | array    | No       | MCP server connections                             |\n| `skills`           | array    | No       | Skill references (max 20)                          |\n| `description`      | string   | No       | Description of the agent                           |\n| `multiagent`       | object   | No       | Coordinator roster — see `shared/managed-agents-multiagent.md` |\n| `metadata`         | object   | No       | Arbitrary key-value pairs                          |\n\n### Lifecycle: create once, run many, update in place\n\nThe agent is a **persistent resource**, not a per-run parameter. The intended pattern:\n\n```\n┌─ setup (once) ─────────┐     ┌─ runtime (every invocation) ─┐\n│ agents.create()        │     │ sessions.create(             │\n│   → store agent_id     │ ──→ │   agent={type:..., id: ID}   │\n│     in config/env/db   │     │ )                            │\n└────────────────────────┘     └──────────────────────────────┘\n```\n\n**Anti-pattern:** calling `agents.create()` at the top of every script run. This accumulates orphaned agent objects, pays create latency on every invocation, and defeats the versioning model. If you see `agents.create()` in a function that's called per-request or per-cron-tick, that's wrong — hoist it to one-time setup and persist the ID.\n\n> **Recommended — define agents and environments as YAML + apply via the `ant` CLI.** The split is **CLI for the control plane, SDK for the data plane**: agents and environments are relatively static resources you manage with `ant` (version-controlled YAML, applied from CI); sessions are dynamic and driven by your application through the SDK. See `shared/anthropic-cli.md` → *Version-controlled Managed Agents resources* for the `ant beta:agents create < agent.yaml` / `update --version N` flow. The SDK `agents.create()` call shown elsewhere in this doc is the in-code equivalent — use it when you need to provision programmatically, but prefer the YAML flow for anything a human maintains.\n\n### Versioning\n\nEach `POST /v1/agents/{id}` (update) creates a new immutable version (numeric timestamp, e.g. `1772585501101368014`). The agent's history is append-only — you can't edit a past version.\n\n**Why version:**\n- **Reproducibility** — pin a session to a known-good config: `{type: \"agent\", id, version: 3}`\n- **Safe iteration** — update the agent without breaking sessions already running on the old version\n- **Rollback** — if a new system prompt regresses, pin new sessions back to the prior version while you debug\n\n**`version` is optional.** Omit it (or use the string shorthand `agent=\"agent_abc123\"`) to get the latest version at session-creation time. Pass it explicitly (`{type: \"agent\", id, version: N}`) to pin for reproducibility.\n\n**Getting the version to pin:** `agents.create()` and `agents.update()` both return `version` in the response. Store it alongside `agent_id`. To fetch the current latest for an existing agent: `GET /v1/agents/{id}` → `.version`.\n\n**When to update vs create new:** Update (`POST /v1/agents/{id}`) when it's conceptually the same agent with tweaked behavior (better prompt, extra tool). Create a new agent when it's a different persona/purpose. Rule of thumb: if you'd give it the same `name`, update.\n\n### Agent Endpoints\n\n| Operation        | Method   | Path                                  |\n| ---------------- | -------- | ------------------------------------- |\n| Create           | `POST`   | `/v1/agents`                          |\n| List             | `GET`    | `/v1/agents`                          |\n| Get              | `GET`    | `/v1/agents/{id}`                     |\n| Update           | `POST`   | `/v1/agents/{id}`                     |\n| Archive          | `POST`   | `/v1/agents/{id}/archive`             |\n\n> ⚠️ **Archive is permanent.** Archiving makes the agent read-only: existing sessions continue to run, but **new sessions cannot reference it**, and there is no unarchive. Since agents have no `delete`, this is the terminal lifecycle state. Never archive a production agent as routine cleanup — confirm with the user first.\n\n### Using an Agent in a Session\n\nReference the agent by string ID (latest version) or by object with an explicit version:\n\n```python\n# String shorthand — uses the agent's latest version\nsession = client.beta.sessions.create(\n    agent=agent.id,\n    environment_id=environment_id,\n)\n\n# Or pin to a specific version (int)\nsession = client.beta.sessions.create(\n    agent={\"type\": \"agent\", \"id\": agent.id, \"version\": agent.version},\n    environment_id=environment_id,\n)\n```\n\n### Updating the agent configuration mid-session\n\n`sessions.update()` can change `agent.tools`, `agent.mcp_servers` (including permission policies), and `vault_ids` on an **existing** session. This is a **session-local override** — it does not create a new agent version and does not propagate back to the agent object. The provided arrays are **full replacements**; to append one tool, `GET` the session, modify, and `POST` back. The session must be `idle` — interrupt first if running.\n\n```python\nclient.beta.sessions.update(\n    session.id,\n    agent={\n        \"tools\": [\n            {\"type\": \"agent_toolset_20260401\"},\n            {\"type\": \"mcp_toolset\", \"mcp_server_name\": \"linear\"},\n        ],\n        \"mcp_servers\": [{\"type\": \"url\", \"name\": \"linear\", \"url\": \"https://mcp.linear.app/sse\"}],\n    },\n    vault_ids=[\"vlt_...\"],\n)\n```\n</doc>\n\n<doc path=\"shared/managed-agents-environments.md\">\n# Managed Agents — Environments & Resources\n\n## Environments\n\nCreating a session requires an `environment_id`. Environments are **reusable configuration templates** for spinning up containers in Anthropic's infrastructure — you might create different environments for different use cases (e.g. data visualization vs web development, with different package sets). Anthropic handles scaling, container lifecycle, and work orchestration.\n\n**Environment names must be unique.** Creating an environment with an existing name returns 409.\n\n### Networking\n\n| Network Policy   | Description                                                   |\n| ---------------- | ------------------------------------------------------------- |\n| `unrestricted`   | Full egress (except legal blocklist)                          |\n| `limited`        | Deny-by-default; opt in via `allowed_hosts` / `allow_package_managers` / `allow_mcp_servers` |\n\n```json\n{\n  \"networking\": {\n    \"type\": \"limited\",\n    \"allow_package_managers\": true,\n    \"allow_mcp_servers\": true,\n    \"allowed_hosts\": [\"api.example.com\"]\n  }\n}\n```\n\nAll three `limited` fields are optional. `allow_package_managers` (default `false`) permits PyPI/npm/etc.; `allow_mcp_servers` (default `false`) permits the agent's configured MCP server endpoints without listing them in `allowed_hosts`.\n\n**MCP caveat:** Under `limited` networking, either set `allow_mcp_servers: true` or add each MCP server domain to `allowed_hosts`. Otherwise the container can't reach them and tools silently fail.\n\n### Creating an environment\n\nThe SDK adds `managed-agents-2026-04-01` automatically. TypeScript:\n\n```ts\nconst env = await client.beta.environments.create({\n  name: \"my_env\",\n  config: {\n    type: \"cloud\",\n    networking: { type: \"unrestricted\" },\n  },\n});\n```\n\n### Self-hosted sandboxes\n\nTo run tool execution in **your own infrastructure** instead of Anthropic's, set `config: {type: \"self_hosted\"}` — the agent loop stays on Anthropic's side, but `bash` / file ops / code execute in a container you control via an outbound-polling worker. The `networking` block does not apply (you control egress). Resource mounting (`file`, `github_repository`) and memory stores behave differently — see `shared/managed-agents-self-hosted-sandboxes.md` for the worker, credentials, and cloud-vs-self-hosted comparison.\n\n### Environment CRUD\n\n| Operation        | Method   | Path                                       | Notes |\n| ---------------- | -------- | ------------------------------------------ | ----- |\n| Create           | `POST`   | `/v1/environments`                         | |\n| List             | `GET`    | `/v1/environments`                         | Paginated (`limit`, `after_id`, `before_id`) |\n| Get              | `GET`    | `/v1/environments/{id}`                    | |\n| Update           | `POST`   | `/v1/environments/{id}`                    | Changes apply only to **new** containers; existing sessions keep their original config |\n| Delete           | `DELETE` | `/v1/environments/{id}`                    | Returns 204. |\n| Archive          | `POST`   | `/v1/environments/{id}/archive`            | Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive — terminal state. |\n\n---\n\n## Resources\n\nAttach files, GitHub repositories, and memory stores to a session. **Session creation blocks until all resources are mounted** — the container won't go `running` until every file and repo is in place. Max **999 file resources** per session. Multiple GitHub repositories per session are supported. For `type: \"memory_store\"` resources (persistent cross-session memory — max 8 per session), see `shared/managed-agents-memory.md`.\n\n### File Uploads (input — host → agent)\n\nUpload a file first via the Files API, then reference by `file_id` + `mount_path`:\n\n```ts\n// 1. Upload\nconst file = await client.beta.files.upload({\n  file: fs.createReadStream(\"data.csv\"),\n  purpose: \"agent\",\n});\n\n// 2. Attach as a session resource\nconst session = await client.beta.sessions.create({\n  agent: agent.id,\n  environment_id: envId,\n  resources: [\n    { type: \"file\", file_id: file.id, mount_path: \"/workspace/data.csv\" }\n  ],\n});\n```\n\n**`mount_path` is required** and must be absolute. Parent directories are created automatically. Agent working directory defaults to `/workspace`. Files are mounted read-only — the agent writes modified versions to new paths.\n\n### Session outputs (output — agent → host)\n\nThe agent can write files to `/mnt/session/outputs/` during a session. These are automatically captured by the Files API and can be listed and downloaded afterwards:\n\n```ts\n// After the turn completes, list output files scoped to this session:\nfor await (const f of client.beta.files.list({\n  scope_id: session.id,\n  betas: [\"managed-agents-2026-04-01\"],\n})) {\n  console.log(f.filename, f.size_bytes);\n  const resp = await client.beta.files.download(f.id);\n  const text = await resp.text();\n}\n```\n\n**Requirements:**\n- The `write` tool (or `bash`) must be enabled for the agent to create output files.\n- Session-scoped `files.list` / `files.download` captures outputs written to `/mnt/session/outputs/`.\n- The filter parameter is **`scope_id`** (REST query param `?scope_id=<session_id>`). The SDK's files resource auto-adds only the `files-api-2025-04-14` header, so pass `betas: [\"managed-agents-2026-04-01\"]` explicitly (or both headers on raw HTTP) — without it the API may reject `scope_id` as an unknown field. Requires `@anthropic-ai/sdk` ≥ 0.88.0 / `anthropic` (Python) ≥ 0.92.0 — older versions don't type `scope_id`. The `ant` CLI does **not** expose this flag yet; use the SDK or curl.\n- Pass the session ID returned by `sessions.create()` verbatim (e.g. `sesn_011CZx...`) — the API validates the prefix.\n- There's a brief indexing lag (~1–3s) between `session.status_idle` and output files appearing in `files.list`. Retry once or twice if empty.\n\n> **Fallback when `scope_id` filtering is unavailable** (older SDK, or endpoint returns an error): send a follow-up `user.message` asking the agent to `read` each file under `/mnt/session/outputs/` and return the contents. The agent streams the file bodies back as `agent.message` text. This works for text files only and costs output tokens — use it to unblock, not as the primary path.\n\nThis gives you a bidirectional file bridge: upload reference data in, download agent artifacts out.\n\n### GitHub Repositories\n\nClones a GitHub repository into the session container during initialization, before the agent begins execution. The agent can read, edit, commit, and push via `bash` (`git`). Multiple repositories per session are supported — add one `resources` entry per repo. Repositories are cached, so future sessions that use the same repository start faster.\n\nRepositories are attached for the lifetime of the session — to change which repositories are mounted, create a new session. You **can** rotate a repository's `authorization_token` on a running session via `client.beta.sessions.resources.update(resource_id, {session_id, authorization_token})`; the resource `id` is returned at session creation and by `resources.list()`.\n\n**Fields:**\n\n| Field | Required | Notes |\n|---|---|---|\n| `type` | ✅ | `\"github_repository\"` |\n| `url` | ✅ | The GitHub repository URL |\n| `authorization_token` | ✅ | GitHub Personal Access Token with repository access. **Never echoed in API responses.** |\n| `mount_path` | ❌ | Path where the repository will be cloned. Defaults to `/workspace/<repo-name>`. |\n| `checkout` | ❌ | `{type: \"branch\", name: \"...\"}` or `{type: \"commit\", sha: \"...\"}`. Defaults to the repo's default branch. |\n\n**Token permission levels** (fine-grained PATs):\n- `Contents: Read` — clone only\n- `Contents: Read and write` — push changes and create pull requests\n\n**How auth works:** `authorization_token` is never placed inside the container. `git pull` / `git push` and GitHub REST calls against the attached repository are routed through an Anthropic-side git proxy that injects the token after the request leaves the sandbox. Code running in the container — including anything the agent writes — cannot read or exfiltrate it.\n\n> ‼️ **To generate pull requests** you also need GitHub **MCP server** access — the `github_repository` resource gives filesystem + git access only. See `shared/managed-agents-tools.md` → MCP Servers. The PR workflow is: edit files in the mounted repo → push branch via `bash` (authenticated via the git proxy using `authorization_token`) → create PR via the MCP `create_pull_request` tool (authenticated via the vault).\n\n**TypeScript:**\n\n```ts\n// 1. Create the agent — declare GitHub MCP (no auth here)\nconst agent = await client.beta.agents.create(\n  {\n    name: 'GitHub Agent',\n    model: 'claude-opus-4-8',\n    mcp_servers: [\n      { type: 'url', name: 'github', url: 'https://api.githubcopilot.com/mcp/' },\n    ],\n    tools: [\n      { type: 'agent_toolset_20260401', default_config: { enabled: true } },\n      { type: 'mcp_toolset', mcp_server_name: 'github' },\n    ],\n  },\n);\n\n// 2. Start a session — attach vault for MCP auth + mount the repo\nconst session = await client.beta.sessions.create({\n  agent: agent.id,\n  environment_id: envId,\n  vault_ids: [vaultId],  // vault contains the GitHub MCP OAuth credential\n  resources: [\n    {\n      type: 'github_repository',\n      url: 'https://github.com/owner/repo',\n      authorization_token: process.env.GITHUB_TOKEN,  // repo clone token (≠ MCP auth)\n      checkout: { type: 'branch', name: 'main' },\n    },\n  ],\n});\n```\n\n**Python:**\n\n```python\nimport os\n\nagent = client.beta.agents.create(\n    name=\"GitHub Agent\",\n    model=\"claude-opus-4-8\",\n    mcp_servers=[{\n        \"type\": \"url\",\n        \"name\": \"github\",\n        \"url\": \"https://api.githubcopilot.com/mcp/\",\n    }],\n    tools=[\n        {\"type\": \"agent_toolset_20260401\", \"default_config\": {\"enabled\": True}},\n        {\"type\": \"mcp_toolset\", \"mcp_server_name\": \"github\"},\n    ],\n)\n\nsession = client.beta.sessions.create(\n    agent=agent.id,\n    environment_id=env_id,\n    vault_ids=[vault_id],  # vault contains the GitHub MCP OAuth credential\n    resources=[{\n        \"type\": \"github_repository\",\n        \"url\": \"https://github.com/owner/repo\",\n        \"authorization_token\": os.environ[\"GITHUB_TOKEN\"],  # repo clone token (≠ MCP auth)\n        \"checkout\": {\"type\": \"branch\", \"name\": \"main\"},\n    }],\n)\n```\n\n---\n\n## Files API\n\nUpload and manage files for use as session resources, and download files the agent wrote to `/mnt/session/outputs/`.\n\n| Operation        | Method   | Path                                  | SDK |\n| ---------------- | -------- | ------------------------------------- | --- |\n| Upload           | `POST`   | `/v1/files`                           | `client.beta.files.upload({ file })` |\n| List             | `GET`    | `/v1/files?scope_id=...`              | `client.beta.files.list({ scope_id, betas: [\"managed-agents-2026-04-01\"] })` |\n| Get Metadata     | `GET`    | `/v1/files/{id}`                      | `client.beta.files.retrieveMetadata(id)` |\n| Download         | `GET`    | `/v1/files/{id}/content`              | `client.beta.files.download(id)` → `Response` |\n| Delete           | `DELETE` | `/v1/files/{id}`                      | `client.beta.files.delete(id)` |\n\nThe `scope_id` filter on List scopes the results to files written to `/mnt/session/outputs/` by that session. Without the filter, you get all files uploaded to your account.\n</doc>\n\n<doc path=\"shared/managed-agents-events.md\">\n# Managed Agents — Events & Steering\n\n## Events\n\n### Sending Events\n\nSend events to a session via `POST /v1/sessions/{id}/events`.\n\n| Event Type                | When to Send                                        |\n| ------------------------- | --------------------------------------------------- |\n| `user.message`            | Send a user message |\n| `user.interrupt`          | Interrupt the agent while it's running |\n| `user.tool_confirmation`  | Approve/deny a tool call (when `always_ask` policy) |\n| `user.custom_tool_result` | Provide result for a custom tool call |\n| `user.define_outcome`     | Start a rubric-graded iterate loop — see `shared/managed-agents-outcomes.md` |\n\n### Receiving Events\n\nThree methods:\n\n1. **Streaming (SSE)**: `GET /v1/sessions/{id}/events/stream` — real-time Server-Sent Events. **Long-lived** — the server sends periodic heartbeats to keep the connection alive.\n2. **Polling**: `GET /v1/sessions/{id}/events` — paginated event list (query params: `limit` default 1000, `page`). **Returns immediately** — this is a plain paginated GET, not a long-poll.\n3. **Webhooks**: Anthropic POSTs session state transitions to your HTTPS endpoint — thin payloads (IDs only), HMAC-signed, Console-registered. See `shared/managed-agents-webhooks.md`.\n\nAll received events carry `id`, `type`, and `processed_at` (ISO 8601; `null` if not yet processed by the agent).\n\n> ⚠️ **Robust polling (raw HTTP).** If you bypass the SDK and roll your own poll loop, don't rely on `requests` or `httpx` timeouts as wall-clock caps — they're **per-chunk** read timeouts, reset every time a byte arrives. A trickling response (heartbeats, a wedged chunked-encoding body, a misbehaving proxy) can keep the call blocked indefinitely even with `timeout=(5, 60)` or `httpx.Timeout(120)`. Neither library has a \"total wall-clock\" timeout built in. For a hard deadline: track `time.monotonic()` at the loop level and break/cancel if a single request exceeds your budget (e.g. via a watchdog thread, or `asyncio.wait_for()` around async httpx). **Prefer the SDK** — `client.beta.sessions.events.stream()` and `client.beta.sessions.events.list()` handle timeout + retry sanely.\n>\n> If `GET /v1/sessions/{id}/events` (paginated) ever hangs after headers, you've likely hit `GET /v1/sessions/{id}/events` by mistake or a server-side stall — report it; don't treat it as a client-config problem.\n\n### Event Types (Received)\n\nEvent types use dot notation, grouped by namespace:\n\n| Event Type | Description |\n| --- | --- |\n| `agent.message` | Agent text output |\n| `agent.thinking` | Extended thinking blocks |\n| `agent.tool_use` | Agent used a built-in tool (`agent_toolset_20260401`) |\n| `agent.tool_result` | Result from a built-in tool |\n| `agent.mcp_tool_use` | Agent used an MCP tool |\n| `agent.mcp_tool_result` | Result from an MCP tool |\n| `agent.custom_tool_use` | Agent invoked a custom tool — session goes idle, you respond with `user.custom_tool_result` |\n| `agent.thread_context_compacted` | Conversation context was compacted |\n| `session.status_idle` | Agent has finished the current task, and is awaiting input. It's either waiting for input to continue working via a `user.message` or blocked awaiting a `user.custom_tool_result` or `user.tool_confirmation`. The `stop_reason` attached contains more information about why the Agent has stopped working. |\n| `session.status_running` | Session has starting running, and the Agent is actively doing work. |\n| `session.status_rescheduled` | Session is (re)scheduling after a retryable error has occurred, ready to be picked up by the orchestration system. |\n| `session.status_terminated` | Session has terminated, entering an irreversible and unusable state.  |\n| `session.error` | Error occurred during processing |\n| `span.model_request_start` | Model inference started |\n| `span.model_request_end` | Model inference completed |\n| `span.outcome_evaluation_start` / `_ongoing` / `_end` | Grader progress for outcome-oriented sessions — see `shared/managed-agents-outcomes.md` |\n| `session.thread_created` | Subagent thread spawned (multiagent) — see `shared/managed-agents-multiagent.md` |\n| `session.thread_status_running` / `_idle` / `_rescheduled` / `_terminated` | Subagent thread status transitions (multiagent). `_idle` carries `stop_reason`. |\n| `agent.thread_message_sent` / `_received` | Cross-thread message, carries `to_session_thread_id` / `from_session_thread_id` (multiagent) |\n\nThe stream also echoes back user-sent events (`user.message`, `user.interrupt`, `user.tool_confirmation`, `user.custom_tool_result`, `user.define_outcome`).\n\n---\n\n## Steering Patterns\n\nPractical patterns for driving a session via the events surface.\n\n### Stream-first ordering\n\n**Open the stream before sending events.** The stream only delivers events that occur *after* it's opened — it does not replay current state or historical events. If you send a message first and open the stream second, early events (including fast status transitions) arrive buffered in a single batch and you lose the ability to react to them in real time.\n\n```ts\n// ✅ Correct — stream and send concurrently\nconst [response] = await Promise.all([\n  streamEvents(sessionId),   // opens SSE connection\n  sendMessage(sessionId, text),\n]);\n\n// ❌ Wrong — events before stream opens arrive as a single buffered batch\nawait sendMessage(sessionId, text);\nconst response = await streamEvents(sessionId);\n```\n\n**For full history,** use `GET /v1/sessions/{id}/events` (paginated list) — the stream only gives you live events from connection onward.\n\n### Reconnecting after a dropped stream\n\n**The SSE stream has no replay.** If your connection drops (httpx read timeout, network blip) and you reconnect, you only get events emitted *after* reconnection. Any events emitted during the gap are lost from the stream.\n\n**The consolidation pattern:** on every (re)connect, overlap the stream with a history fetch and dedupe by event ID:\n\n```python\ndef connect_with_consolidation(client, session_id):\n    # 1. Open the SSE stream first\n    stream = client.beta.sessions.events.stream(session_id=session_id)\n\n    # 2. Fetch history to cover any gap\n    history = client.beta.sessions.events.list(\n        session_id=session_id,\n    )\n\n    # 3. Yield history first, then stream — dedupe by event.id\n    seen = set()\n    for ev in history.data:\n        seen.add(ev.id)\n        yield ev\n    for ev in stream:\n        if ev.id not in seen:\n            seen.add(ev.id)\n            yield ev\n```\n\n### Message queuing\n\n**You don't have to wait for a response before sending the next message.** User events are queued server-side and processed in order. This is useful for chat bridges where the user sends rapid follow-ups:\n\n```ts\n// All three go into one session; agent processes them in order\nawait sendMessage(sessionId, \"Summarize the README\");\nawait sendMessage(sessionId, \"Actually also check the CONTRIBUTING guide\");\nawait sendMessage(sessionId, \"And compare the two\");\n// Stream once — agent responds to all three as a coherent turn\n```\n\nEvents can be sent up to the Session at any time. There is no need to wait on a specific session status to enqueue new events via `client.beta.sessions.events.send()`\n\n### Interrupt\n\nAn `interrupt` event **jumps the queue** (ahead of any pending user messages) and forces the session into `idle`. Use this for \"stop\" / \"nevermind\" / \"cancel\" commands:\n\n```ts\nawait client.beta.sessions.events.send(sessionId, {\n  events: [{ type: 'interrupt' }],\n});\n```\n\nThe agent stops mid-task. It does not see the interrupt as a message — it just halts. Send a follow-up `user` event to explain what to do instead. If an outcome is active, the interrupt also marks `span.outcome_evaluation_end.result: \"interrupted\"` (see `shared/managed-agents-outcomes.md`).\n\n> **Note**: Interrupt events may have empty IDs in the current implementation. When troubleshooting, use the `processed_at` timestamp along with surrounding event IDs.\n\n### Event payloads\n\nsome events carry useful metadata beyond the status change itself:\n\n`session.status_idle` — includes a `stop_reason` field which elaborates on why the session stopped and what type of further action is required by the user.\n```json\n{\n  \"id\": \"sevt_456\",\n  \"processed_at\": \"2026-04-07T04:27:43.197Z\",\n  \"stop_reason\": {\n    \"event_ids\": [\n      \"sevt_123\"\n    ],\n    \"type\": \"requires_action\"\n  },\n  \"type\": \"status_idle\"\n}\n```\n\n`span.model_request_end` contains a `model_usage` field for cost tracking and efficiency analysis:\n\n```json\n{\n  \"type\": \"span.model_request_end\",\n  \"id\": \"sevt_456\",\n  \"is_error\": false,\n  \"model_request_start_id\": \"sevt_123\",\n  \"model_usage\": {\n    \"cache_creation_input_tokens\": 0,\n    \"cache_read_input_tokens\": 6656,\n    \"input_tokens\": 3571,\n    \"output_tokens\": 727\n  },\n  \"processed_at\": \"2026-04-07T04:11:32.189Z\"\n}\n```\n\n**`agent.thread_context_compacted`** — emitted when the conversation history was summarized to fit context. Includes `pre_compaction_tokens` so you know how much was squeezed:\n\n```json\n{\n  \"id\": \"sevt_abc123\",\n  \"processed_at\": \"2026-03-24T14:05:15.787Z\",\n  \"type\": \"agent.thread_context_compacted\"\n}\n```\n\n### Archive\n\nWhen done with a session, archive it to free resources:\n\n```ts\nawait client.beta.sessions.archive(sessionId);\n```\n\n> Archiving a **session** is routine cleanup — sessions are per-run and disposable. **Do not generalize this to agents or environments**: those are persistent, reusable resources, and archiving them is permanent (no unarchive; new sessions cannot reference them). See `shared/managed-agents-overview.md` → Common Pitfalls.\n</doc>\n\n<doc path=\"shared/managed-agents-memory.md\">\n# Managed Agents — Memory Stores\n\n> **Public beta.** Memory stores ship under the `managed-agents-2026-04-01` beta header; the SDK sets it automatically on all `client.beta.memory_stores.*` calls. If `client.beta.memory_stores` is missing, upgrade to the latest SDK release.\n\nSessions are ephemeral by default — when one ends, anything the agent learned is gone. A **memory store** is a workspace-scoped collection of small text documents that persists across sessions. When a store is attached to a session (via `resources[]`), it is mounted into the container as a filesystem directory; the agent reads and writes it with the ordinary file tools, and a system-prompt note tells it the mount is there.\n\nEvery mutation to a memory produces an immutable **memory version** (`memver_...`), giving you an audit trail and point-in-time rollback/redact.\n\n## Object model\n\n| Object | ID prefix | Scope | Notes |\n| --- | --- | --- | --- |\n| Memory store | `memstore_...` | Workspace | Attach to sessions via `resources[]` |\n| Memory | `mem_...` | Store | One text file, addressed by `path` (≤ 100KB each — prefer many small files) |\n| Memory version | `memver_...` | Memory | Immutable snapshot per mutation; `operation` ∈ `created` / `modified` / `deleted` |\n\n## Create a store\n\n`description` is passed to the agent so it knows what the store contains — write it for the model, not for humans.\n\n```python\nstore = client.beta.memory_stores.create(\n    name=\"User Preferences\",\n    description=\"Per-user preferences and project context.\",\n)\nprint(store.id)  # memstore_01Hx...\n```\n\nOther SDKs: TypeScript `client.beta.memoryStores.create({...})`; Go `client.Beta.MemoryStores.New(ctx, ...)`. See `shared/managed-agents-api-reference.md` → SDK Method Reference for the full per-language table.\n\nStores support `retrieve` / `update` / `list` (with `include_archived`, `created_at_{gte,lte}` filters) / `delete` / **`archive`**. Archive makes the store read-only — existing session attachments continue, new sessions cannot reference it; no unarchive.\n\n### Seed with content (optional)\n\nPre-load reference material before any session runs. `memories.create` creates a memory at the given `path`; if a memory already exists there the call returns `409` (`memory_path_conflict_error`, with the `conflicting_memory_id`). The store ID is the first positional argument.\n\n```python\nclient.beta.memory_stores.memories.create(\n    store.id,\n    path=\"/formatting_standards.md\",\n    content=\"All reports use GAAP formatting. Dates are ISO-8601...\",\n)\n```\n\n## Attach to a session\n\nMemory stores go in the session's `resources[]` array alongside `file` and `github_repository` resources (see `shared/managed-agents-environments.md` → Resources). Memory stores attach at **session create time only** — `sessions.resources.add()` does not accept `memory_store`.\n\n```python\nsession = client.beta.sessions.create(\n    agent=agent.id,\n    environment_id=environment.id,\n    resources=[\n        {\n            \"type\": \"memory_store\",\n            \"memory_store_id\": store.id,\n            \"access\": \"read_write\",  # or \"read_only\"; default is \"read_write\"\n            \"instructions\": \"User preferences and project context. Check before starting any task.\",\n        }\n    ],\n)\n```\n\n| Field | Required | Notes |\n| --- | --- | --- |\n| `type` | ✅ | `\"memory_store\"` |\n| `memory_store_id` | ✅ | `memstore_...` |\n| `access` | — | `\"read_write\"` (default) or `\"read_only\"` — enforced at the filesystem level on the mount |\n| `instructions` | — | Session-specific guidance for this store, in addition to the store's `name`/`description`. ≤ 4,096 chars. |\n\n**Max 8 memory stores per session.** Attach multiple when different slices of memory have different owners or lifecycles — e.g. one read-only shared-reference store plus one read-write per-user store, or one store per end-user/team/project sharing a single agent config.\n\n### How the agent sees it (FUSE mount)\n\nEach attached store is mounted in the session container at `/mnt/memory/<store-name>/`. The agent interacts with it using the standard file tools (`bash`, `read`, `write`, `edit`, `glob`, `grep`) — there are no dedicated memory tools. `access: \"read_only\"` makes the mount read-only at the filesystem level; `\"read_write\"` allows the agent to create, edit, and delete files under it. A short description of each mount (name, path, `instructions`, access) is automatically injected into the system prompt so the agent knows the store exists without you having to mention it.\n\nWrites the agent makes under the mount are persisted back to the store and produce memory versions just like host-side `memories.update` calls.\n\n## Manage memories directly (host-side)\n\nUse these for review workflows, correcting bad memories, or seeding stores out-of-band.\n\n### List\n\nReturns `Memory | MemoryPrefix` entries — a `MemoryPrefix` (`type: \"memory_prefix\"`, just a `path`) is a directory-like node when listing hierarchically. Use `path_prefix` to scope (include a trailing slash: `\"/notes/\"` matches `/notes/a.md` but not `/notes_backup/old.md`) and `depth` to bound the tree walk. `order_by` / `order` sort the result. Pass `view=\"full\"` to include `content` in each item; the default `\"basic\"` returns metadata only.\n\n```python\nfor m in client.beta.memory_stores.memories.list(store.id, path_prefix=\"/\"):\n    if m.type == \"memory\":\n        print(f\"{m.path}  ({m.content_size_bytes} bytes, sha={m.content_sha256[:8]})\")\n    else:  # \"memory_prefix\"\n        print(f\"{m.path}/\")\n```\n\n### Read\n\n```python\nmem = client.beta.memory_stores.memories.retrieve(memory_id, memory_store_id=store.id)\nprint(mem.content)\n```\n\n`retrieve` defaults to `view=\"full\"` (content included); `view` matters mainly on list endpoints.\n\n### Create vs. update\n\n| Operation | Addressed by | Semantics |\n| --- | --- | --- |\n| `memories.create(store_id, path=..., content=...)` | **Path** | Create at `path`. `409` (`memory_path_conflict_error`, includes `conflicting_memory_id`) if the path is already occupied. |\n| `memories.update(mem_id, memory_store_id=..., path=..., content=...)` | **`mem_...` ID** | Mutate existing memory. Change `content`, `path` (rename), or both. Renaming onto an occupied path returns the same `409 memory_path_conflict_error`. |\n\n```python\nmem = client.beta.memory_stores.memories.create(\n    store.id,\n    path=\"/preferences/formatting.md\",\n    content=\"Always use tabs, not spaces.\",\n)\n\nclient.beta.memory_stores.memories.update(\n    mem.id,\n    memory_store_id=store.id,\n    path=\"/archive/2026_q1_formatting.md\",  # rename\n)\n```\n\n### Optimistic concurrency (precondition on `update`)\n\n`memories.update` accepts a `precondition` so you can read → modify → write back without clobbering a concurrent writer. The only supported type is `content_sha256`. On mismatch the API returns `409` (`memory_precondition_failed_error`) — re-read and retry against fresh state.\n\n```python\nclient.beta.memory_stores.memories.update(\n    mem.id,\n    memory_store_id=store.id,\n    content=\"CORRECTED: Always use 2-space indentation.\",\n    precondition={\"type\": \"content_sha256\", \"content_sha256\": mem.content_sha256},\n)\n```\n\n### Delete\n\n```python\nclient.beta.memory_stores.memories.delete(mem.id, memory_store_id=store.id)\n```\n\nPass `expected_content_sha256` for a conditional delete.\n\n## Audit and rollback — memory versions\n\nEvery mutation creates an immutable `memver_...` snapshot. Versions accumulate for the lifetime of the parent memory; `memories.retrieve` always returns the current head, the version endpoints give you history.\n\n| Operation that triggers it | `operation` field on the version |\n| --- | --- |\n| `memories.create` at a new path | `\"created\"` |\n| `memories.update` changing `content`, `path`, or both (or an agent-side write to the mount) | `\"modified\"` |\n| `memories.delete` | `\"deleted\"` |\n\nEach version also records `created_by` — an actor object with `type` ∈ `session_actor` / `api_actor` / `user_actor` — and, after redaction, `redacted_at` + `redacted_by`.\n\n### List versions\n\nNewest-first, paginated. Filter by `memory_id`, `operation`, `session_id`, `api_key_id`, or `created_at_gte` / `created_at_lte`. Pass `view=\"full\"` to include `content`; default is metadata-only.\n\n```python\nfor v in client.beta.memory_stores.memory_versions.list(store.id, memory_id=mem.id):\n    print(f\"{v.id}: {v.operation}\")\n```\n\n### Retrieve a version\n\n```python\nversion = client.beta.memory_stores.memory_versions.retrieve(\n    version_id, memory_store_id=store.id\n)\nprint(version.content)\n```\n\n### Redact a version\n\nScrubs content from a historical version while preserving the audit trail (actor + timestamps). Clears `content`, `content_sha256`, `content_size_bytes`, and `path`; everything else stays. Use for leaked secrets, PII, or user-deletion requests.\n\n```python\nclient.beta.memory_stores.memory_versions.redact(version_id, memory_store_id=store.id)\n```\n\n## Endpoint reference\n\nSee `shared/managed-agents-api-reference.md` → Memory Stores / Memories / Memory Versions for the full HTTP method/path tables. Raw HTTP base path:\n\n```\nPOST   /v1/memory_stores\nPOST   /v1/memory_stores/{memory_store_id}/archive\nGET    /v1/memory_stores/{memory_store_id}/memories\nPATCH  /v1/memory_stores/{memory_store_id}/memories/{memory_id}\nGET    /v1/memory_stores/{memory_store_id}/memory_versions\nPOST   /v1/memory_stores/{memory_store_id}/memory_versions/{version_id}/redact\n```\n\nFor cURL examples and the CLI (`ant beta:memory-stores ...`), WebFetch the Memory URL in `shared/live-sources.md` → Managed Agents.\n</doc>\n\n<doc path=\"shared/managed-agents-multiagent.md\">\n# Managed Agents — Multiagent Sessions\n\nA coordinator agent can delegate to other agents within one session. All agents **share the container and filesystem**; each runs in its own **thread** — a context-isolated event stream with its own conversation history, model, system prompt, tools, MCP servers, and skills (from that agent's own config). Threads are persistent: the coordinator can send a follow-up to a subagent it called earlier and that subagent retains its prior turns.\n\nThe SDK sets the `managed-agents-2026-04-01` beta header automatically on all `client.beta.{agents,sessions}.*` calls; no additional header is required for multiagent.\n\n---\n\n## Declare the roster on the coordinator\n\n`multiagent` is a **top-level field** on `agents.create()` / `agents.update()` — **not** a `tools[]` entry. `agents` lists 1–20 roster entries. Nothing changes on `sessions.create()` — the roster is resolved from the coordinator's config.\n\n```python\norchestrator = client.beta.agents.create(\n    name=\"Engineering Lead\",\n    model=\"claude-opus-4-8\",\n    system=\"You coordinate engineering work. Delegate code review to the reviewer and test writing to the test agent.\",\n    tools=[{\"type\": \"agent_toolset_20260401\"}],\n    multiagent={\n        \"type\": \"coordinator\",\n        \"agents\": [\n            reviewer.id,                                            # bare string — latest version\n            {\"type\": \"agent\", \"id\": test_writer.id, \"version\": 4},  # pinned version\n            {\"type\": \"self\"},                                       # the coordinator itself\n        ],\n    },\n)\n\nsession = client.beta.sessions.create(agent=orchestrator.id, environment_id=env.id)\n```\n\n| Roster entry | Shape | Notes |\n|---|---|---|\n| String shorthand | `\"agent_abc123\"` | References the latest version of a stored agent. |\n| Agent reference | `{type: \"agent\", id, version?}` | Omit `version` to pin the latest at coordinator save time. |\n| Self | `{type: \"self\"}` | The coordinator can spawn copies of itself. |\n\nUp to **20 unique agents** in the roster; the coordinator may spawn **multiple copies** of each. **One level of delegation only** — depth > 1 is ignored.\n\n---\n\n## Threads\n\nThe session-level event stream is the **primary thread** — it shows the coordinator's trace plus a condensed view of subagent activity (thread status transitions and cross-thread messages, not every subagent tool call). Drill into a specific subagent via the per-thread endpoints:\n\n| Operation | HTTP | SDK (`client.beta.sessions.threads.*`) |\n|---|---|---|\n| List threads | `GET /v1/sessions/{sid}/threads` | `.list(session_id)` |\n| Retrieve one | `GET /v1/sessions/{sid}/threads/{tid}` | `.retrieve(thread_id, session_id=...)` |\n| Archive | `POST /v1/sessions/{sid}/threads/{tid}/archive` | `.archive(thread_id, session_id=...)` |\n| List thread events | `GET /v1/sessions/{sid}/threads/{tid}/events` | `.events.list(thread_id, session_id=...)` |\n| Stream thread events | `GET /v1/sessions/{sid}/threads/{tid}/stream` | `.events.stream(thread_id, session_id=...)` |\n\nEach `SessionThread` carries `id`, `status` (`running` | `idle` | `rescheduling` | `terminated`), `agent` (a resolved snapshot of the agent config — `id`, `name`, `model`, `system`, `tools`, `skills`, `mcp_servers`, `version`), `parent_thread_id` (null for the primary thread, which is included in the list), `archived_at`, and optional `stats`/`usage`. **Session status aggregates thread statuses** — if any thread is `running`, `session.status` is `running`. Max **25 concurrent threads**. When draining a per-thread stream, break on `session.thread_status_idle` (and check its `stop_reason` as you would for the session-level idle).\n\n---\n\n## Multiagent events (on the session stream)\n\n| Event | Payload highlights | Meaning |\n|---|---|---|\n| `session.thread_created` | `session_thread_id`, `agent_name` | A new thread was created. |\n| `session.thread_status_running` | `session_thread_id`, `agent_name` | Thread started activity. |\n| `session.thread_status_idle` | `session_thread_id`, `agent_name`, **`stop_reason`** | Thread is awaiting input. Inspect `stop_reason` (same shape as `session.status_idle.stop_reason`). |\n| `session.thread_status_rescheduled` | `session_thread_id`, `agent_name` | Thread is rescheduling after a retryable error. |\n| `session.thread_status_terminated` | `session_thread_id`, `agent_name` | Thread was archived or hit a terminal error. |\n| `agent.thread_message_sent` | `to_session_thread_id`, `to_agent_name`, `content` | Coordinator sent a follow-up to another thread. |\n| `agent.thread_message_received` | `from_session_thread_id`, `from_agent_name`, `content` | An agent delivered its result to the coordinator. |\n\n---\n\n## Tool permissions and custom tools from subagent threads\n\nWhen a subagent needs your client (an `always_ask` confirmation, or a custom tool result), the request is **cross-posted to the primary thread** with `session_thread_id` identifying the originating thread — so you only need to watch the session stream. Reply with `user.tool_confirmation` (carrying `tool_use_id`) or `user.custom_tool_result` (carrying `custom_tool_use_id`), and **echo the `session_thread_id` from the originating event** (the SDK param type and docstring expect it). The server also routes by the tool-use ID, so the echo is belt-and-suspenders rather than load-bearing — but include it.\n\n```python\nfor event_id in stop.event_ids:\n    pending = events_by_id[event_id]\n    confirmation = {\n        \"type\": \"user.tool_confirmation\",\n        \"tool_use_id\": event_id,\n        \"result\": \"allow\",\n    }\n    if pending.session_thread_id is not None:\n        confirmation[\"session_thread_id\"] = pending.session_thread_id\n    client.beta.sessions.events.send(session.id, events=[confirmation])\n```\n\nThe same pattern applies to `user.custom_tool_result`.\n\n---\n\n## Pitfalls\n\n- **Don't put the roster on `sessions.create()` or in `tools[]`.** `multiagent` is a top-level agent field; update the coordinator, then start a session that references it.\n- **Don't assume shared context.** Threads share the filesystem but not conversation history or tools. If the coordinator needs a subagent to act on something, it must say so in the delegated message (or write it to disk).\n- **Depth > 1 is ignored.** A subagent's own `multiagent` roster (if any) doesn't cascade — only the session's coordinator delegates.\n\nFor per-language bindings beyond Python, WebFetch `https://platform.claude.com/docs/en/managed-agents/multi-agent.md` (see `shared/live-sources.md`).\n</doc>\n\n<doc path=\"shared/managed-agents-onboarding.md\">\n# Managed Agents — Onboarding Flow\n\n> **Invoked via `/claude-api managed-agents-onboard`?** You're in the right place. Run the interview below — don't summarize it back to the user, ask the questions.\n\nUse this when a user wants to set up a Managed Agent from scratch: **branch on know-vs-explore → configure the template → set up the session → pre-flight viability check → emit working code.** The pre-flight check (§3) is not optional — a setup missing a tool, credential, or data access it needs will fail mid-run, and the gap is usually visible at setup time.\n\n> Read `shared/managed-agents-core.md` alongside this — it has full detail for each knob. This doc is the interview script, not the reference.\n\n---\n\nClaude Managed Agents is a hosted agent: Anthropic runs the agent loop on its orchestration layer and provisions a sandboxed container per session where the agent's tools execute (or, with a `self_hosted` environment, your own worker runs the tools — see `shared/managed-agents-self-hosted-sandboxes.md`). You supply the agent config and the environment config; the harness — event stream, sandbox orchestration, prompt caching, context compaction, and extended thinking — is handled for you.\n\n**What you supply:**\n- **An agent config** — tools, skills, model, system prompt. Reusable and versioned.\n- **An environment config** — the sandbox your agent's tools execute in (`cloud`: networking, packages; or `self_hosted`: your own infra). Reusable across agents.\n\nEach run of the agent is a **session**.\n\n---\n\n## 1. Know or explore?\n\nAsk the user:\n\n> Do you already know the agent you want to build, or would you like to explore some common patterns first?\n\n### Explore path — show the patterns\n\nFour shapes, same runtime code path (`sessions.create()` → `sessions.events.send()` → stream). Only the trigger and sink differ.\n\n| Pattern | Trigger | Example |\n|---|---|---|\n| Event-triggered | Webhook | GitHub PR push → CMA (GitHub tool) → Slack |\n| Scheduled | Cron | Daily brief: browser + GitHub + Jira → CMA → Slack |\n| Fire-and-forget PR | Human | Slack slash-command → CMA (GitHub tool) → PR passing CI |\n| Research + dashboard | Human | Topic → CMA (web search + `frontend-design` skill) → HTML dashboard |\n\nAsk which shape fits, then continue with the Know path using it as the reference.\n\n### Know path — configure template\n\nThree rounds. Batch the questions in each round; don't ask them one at a time.\n\n**Round A — Tools.** Start here; it's the most concrete part. Three types; ask which the user wants (any combination):\n\n| Type | What it is | How to guide |\n|---|---|---|\n| **Prebuilt Claude Agent tools** (`agent_toolset_20260401`) | Ready-to-use: `bash`, `read`, `write`, `edit`, `glob`, `grep`, `web_fetch`, `web_search`. Enable all at once, or individually via `enabled: true/false`. | Recommend enabling the full toolset. List the 8 tools so the user knows what they're getting. Full detail: `shared/managed-agents-tools.md` → Agent Toolset. |\n| **MCP tools** | Third-party integrations (GitHub, Linear, Asana, etc.) via `mcp_toolset`. Credentials live in a vault, not inline. | Ask which services. For each, walk through MCP server URL + vault credentials. Full detail: `shared/managed-agents-tools.md` → MCP Servers + Vaults. |\n| **Custom tools** | The user's own app handles these tool calls — agent fires `agent.custom_tool_use`, the app sends a result message back. | Ask for each tool: name, description, input schema. The app code that handles the event is *their* code — don't generate it. Full detail: `shared/managed-agents-tools.md` → Custom Tools. |\n\n**Round B — Skills, files, and repos.** What the agent has on hand when it starts.\n\n*Skills* — two types; both work the same way — Claude auto-uses them when relevant. Max 20 per agent.\n- [ ] **Pre-built Agent Skills**: `xlsx`, `docx`, `pptx`, `pdf`. Reference by name.\n- [ ] **Custom Skills**: skills uploaded to the user's org via the Skills API. Reference by `skill_id` + optional `version`. If the skill doesn't exist yet, walk the user through `POST /v1/skills` + `POST /v1/skills/{id}/versions` (beta header `skills-2025-10-02`). Full detail: `shared/managed-agents-tools.md` → Skills + Skills API.\n\n*GitHub repositories* — any repos the agent needs on-disk? For each:\n- [ ] Repo URL (`https://github.com/org/repo`)\n- [ ] `authorization_token` (PAT or GitHub App token scoped to the repo)\n- [ ] Optional `mount_path` (defaults to `/workspace/<repo-name>`) and `checkout` (branch or SHA)\n\nEmit as `resources: [{type: \"github_repository\", url, authorization_token, ...}]`. Full detail: `shared/managed-agents-environments.md` → GitHub Repositories.\n\n> ‼️ **PR creation needs the GitHub MCP server too.** `github_repository` gives filesystem access only — to open PRs, also attach the GitHub MCP server in Round A and credential it via a vault. The workflow is: edit files in the mounted repo → push branch via `bash` → create PR via the MCP `create_pull_request` tool.\n\n*Files* — any local files to seed the session with? For each:\n- [ ] Upload via the Files API → persist `file_id`\n- [ ] Choose a `mount_path` — absolute, e.g. `/workspace/data.csv` (parents auto-created; files mount read-only)\n\nEmit as `resources: [{type: \"file\", file_id, mount_path}]`. Max 999 file resources. Agent working directory defaults to `/workspace`. Full detail: `shared/managed-agents-environments.md` → Files API.\n\n**Round C — Identity, success criteria, environment:**\n- [ ] Name?\n- [ ] Job (one or two sentences — becomes the system prompt)?\n- [ ] **What does \"done\" look like?** Push for concrete, checkable success criteria — not \"a good report\" but \"a CSV with a numeric `price` column per SKU.\" Explicit criteria give the agent a clear target and let you verify the result; vague ones leave it guessing what \"done\" means. If they're gradeable, plan to wire an **Outcome** in §2 so the harness grades-and-revises against them. See `shared/managed-agents-outcomes.md`.\n- [ ] Networking: unrestricted internet from the container, or lock egress to specific hosts? (If locked, MCP server domains must be in `allowed_hosts` or tools silently fail.)\n- [ ] Model? (default `claude-opus-4-8`)\n\n---\n\n## 2. Set up the session\n\nPer-run. Points at the agent + environment, attaches credentials, kicks off.\n\n**Vault credentials** (if the agent declared MCP servers):\n- [ ] Existing vault, or create one? (`client.beta.vaults.create()` + `vaults.credentials.create()`)\n\nCredentials are write-only, matched to MCP servers by URL, auto-refreshed. See `shared/managed-agents-tools.md` → Vaults.\n\n**Kickoff — pick one:**\n- [ ] **Conversational:** a first `user.message` to the agent.\n- [ ] **Outcome-graded** (recommended when §Round C produced checkable criteria): send a `user.define_outcome` with a rubric *instead of* a `user.message` — the harness iterates and grades against the rubric until satisfied. Don't send both. See `shared/managed-agents-outcomes.md`.\n\nSession creation blocks until all resources mount. Open the event stream before sending the kickoff. Stream is SSE; break on `session.status_terminated`, or on `session.status_idle` with a terminal `stop_reason` — i.e. anything except `requires_action`, which fires transiently while the session waits on a tool confirmation or custom-tool result (see `shared/managed-agents-client-patterns.md` Pattern 5). Usage lands on `span.model_request_end`. Agent-written artifacts end up in `/mnt/session/outputs/` — download via `files.list({scope_id: session.id, betas: [\"managed-agents-2026-04-01\"]})`.\n\n**Console escape hatch.** In the runtime block you emit, print the session's Console URL right after `sessions.create()` so the user can watch it in the UI while iterating: `print(f\"Watch in Console: https://platform.claude.com/workspaces/default/sessions/{session.id}\")` (swap `default` for the user's workspace slug if they named one).\n\n---\n\n## 3. Pre-flight viability check — reconcile the job against the resources\n\n**Do this before emitting any code.** A common, avoidable failure is an under-resourced run: the ask is clear, but the agent is missing a tool, a credential, data access, or the context to act. The agent discovers the gap a few turns in, flails, and gives up — burning the budget to produce nothing. The gap is usually visible at setup time. Catch it here, not after the session fails.\n\nWalk the stated job clause by clause. For each action the agent must take, confirm a resource covers it — and name the gap out loud if one doesn't:\n\n| Gap class | Check | If missing |\n|---|---|---|\n| **Tool / integration** (most catchable upfront — config is statically inspectable) | Every verb in the job maps to an enabled tool or MCP server. \"Triage tickets\" → a ticketing MCP server; \"open a PR\" → GitHub MCP server (a `github_repository` mount alone can't open PRs); \"search the web\" → `web_search` enabled in the toolset. | Add the tool/MCP server in §Round A, or cut the ask from the job. |\n| **Credential / access** | Every MCP server has a vault credential attached (§2). Every external host the job touches is reachable — networking `unrestricted`, or the host is in `allowed_hosts`. | Create/attach the vault; widen `allowed_hosts`. These don't fail until runtime — the smoke-test in §4 is how you surface them cheaply. |\n| **Data** | Every file, dataset, or repo the job references is mounted as a `resource` (file, `github_repository`, or memory store). | Upload + mount it in §Round B, or tell the agent where to fetch it from. |\n| **Prompt quality / criteria** | The job is specific enough to act on, and \"done\" is checkable (§Round C). | Tighten the job; wire an Outcome. |\n\nState any unmet gaps to the user and resolve them before generating code. Don't emit a config you already know is under-resourced — an agent can't complete a task it lacks the tools, credentials, or data for.\n\n---\n\n## 4. Emit the code\n\nGo straight from the last interview answer to the code — no preamble about the setup-vs-runtime split, no \"the critical thing to internalize…\", no lecture about `agents.create()` being one-time. The two-block structure below already shows that; don't narrate it. Generate **two clearly-separated blocks**:\n\n**Block 1 — Setup (run once, store the IDs).** Prefer emitting this as **YAML files + `ant` CLI commands** — agents and environments are version-controlled definitions, and the CLI flow is what users should check into their repo and run from CI. Fall back to SDK code only if the user explicitly wants setup in-language or the `ant` CLI is unavailable.\n\nEmit:\n1. `<name>.agent.yaml` with everything from §Round A–C (flat: `name`, `model`, `system`, `tools`, `mcp_servers`, `skills`)\n2. `<name>.environment.yaml` with §Round C networking\n3. The apply commands:\n   ```sh\n   AGENT_ID=$(ant beta:agents create < <name>.agent.yaml --transform id -r)\n   ENV_ID=$(ant beta:environments create < <name>.environment.yaml --transform id -r)\n   # CI sync: ant beta:agents update --agent-id \"$AGENT_ID\" --version N < <name>.agent.yaml\n   ```\n\nSee `shared/anthropic-cli.md` for the full CLI reference. If emitting SDK code instead, label it `# ONE-TIME SETUP — run once, save the IDs to config/.env` and call `environments.create()` → `agents.create()`.\n\n**Block 2 — Runtime (run on every invocation).** This is SDK code in the detected language (Python/TS/cURL — see SKILL.md → Language Detection). The runtime path needs to react programmatically to events (tool confirmations, custom tool results, reconnect), which is SDK territory — don't emit shell loops here.\n1. Load `env_id` + `agent_id` from config/env\n2. `sessions.create(agent=AGENT_ID, environment_id=ENV_ID, resources=[...], vault_ids=[...])` — this blocks until resources mount, so a bad file/repo mount surfaces *here*, before any tokens are spent.\n3. **Smoke-test first when the job depends on MCP servers, credentials, or reachable hosts.** Credential and MCP-connectivity failures don't surface at `sessions.create()` — only when the agent first tries to use them. Send one cheap probe turn (\"Confirm you can reach <service> and list 1–2 items; don't start the task yet\"), check it succeeded, *then* send the real kickoff. A few hundred tokens here beats a runaway session that flails on a missing credential and gives up. Skip for agents with no external dependencies.\n4. Open stream, `events.send()` the kickoff (a `user.message`, or a `user.define_outcome` if §2 chose the outcome-graded path), loop until `session.status_terminated` or `session.status_idle && stop_reason.type !== 'requires_action'` (see `shared/managed-agents-client-patterns.md` Pattern 5 for the full gate — do not break on bare `session.status_idle`)\n\n> ⚠️ **Never emit `agents.create()` and `sessions.create()` in the same unguarded block.** That teaches the user to create a new agent on every run — the #1 anti-pattern. If they need a single script, wrap agent creation in `if not os.getenv(\"AGENT_ID\"):`.\n\nPull exact syntax from `python/managed-agents/README.md`, `typescript/managed-agents/README.md`, or `curl/managed-agents.md`. Don't invent field names.\n</doc>\n\n<doc path=\"shared/managed-agents-outcomes.md\">\n# Managed Agents — Outcomes\n\nAn **outcome** elevates a session from *conversation* to *work*: you state what \"done\" looks like, and the harness runs an iterate → grade → revise loop until the artifact meets the rubric, hits `max_iterations`, or is interrupted. A separate **grader** (independent context window) scores each iteration against your rubric and feeds per-criterion gaps back to the agent.\n\nThe SDK sets the `managed-agents-2026-04-01` beta header automatically on all `client.beta.sessions.*` calls; no additional header is required for outcomes.\n\n---\n\n## The `user.define_outcome` event\n\nOutcomes are not a field on `sessions.create()`. You create a normal session, then send a `user.define_outcome` event. The agent starts working on receipt — **do not also send a `user.message`** to kick it off.\n\n```python\nsession = client.beta.sessions.create(\n    agent=AGENT_ID,\n    environment_id=ENVIRONMENT_ID,\n    title=\"Financial analysis on Costco\",\n)\n\nclient.beta.sessions.events.send(\n    session_id=session.id,\n    events=[\n        {\n            \"type\": \"user.define_outcome\",\n            \"description\": \"Build a DCF model for Costco in .xlsx\",\n            \"rubric\": {\"type\": \"text\", \"content\": RUBRIC_MD},\n            # or: \"rubric\": {\"type\": \"file\", \"file_id\": rubric.id}\n            \"max_iterations\": 5,  # optional; default 3, max 20\n        }\n    ],\n)\n```\n\n| Field | Type | Notes |\n|---|---|---|\n| `type` | `\"user.define_outcome\"` | |\n| `description` | string | The task. This is what the agent works toward — no separate `user.message` needed. |\n| `rubric` | `{type: \"text\", content}` \\| `{type: \"file\", file_id}` | **Required.** Markdown with explicit, independently gradeable criteria. Upload once via `client.beta.files.upload(...)` (beta `files-api-2025-04-14`) to reuse across sessions. |\n| `max_iterations` | int | Optional. Default **3**, max **20**. |\n\nThe event is echoed back on the stream with a server-assigned `outcome_id` and `processed_at`.\n\n> **Writing rubrics.** Use explicit, gradeable criteria (\"CSV has a numeric `price` column\"), not vibes (\"data looks good\") — the grader scores each criterion independently, so vague criteria produce noisy loops. If you don't have a rubric, have Claude analyze a known-good artifact and turn that analysis into one.\n\n---\n\n## Outcome-specific events\n\nThese appear on the standard event stream (`sessions.events.stream` / `.list`) alongside the usual `agent.*` / `session.*` events.\n\n| Event | Payload highlights | Meaning |\n|---|---|---|\n| `span.outcome_evaluation_start` | `outcome_id`, `iteration` (0-indexed) | Grader began scoring iteration *N*. |\n| `span.outcome_evaluation_ongoing` | `outcome_id` | Heartbeat while the grader runs. Grader reasoning is opaque — you see *that* it's working, not *what* it's thinking. |\n| `span.outcome_evaluation_end` | `outcome_evaluation_start_id`, `outcome_id`, `iteration`, `result`, `explanation`, `usage` | Grader finished one iteration. `result` drives what happens next (table below). |\n\n### `span.outcome_evaluation_end.result`\n\n| `result` | Next |\n|---|---|\n| `satisfied` | Session → `idle`. Terminal for this outcome. |\n| `needs_revision` | Agent starts another iteration. |\n| `max_iterations_reached` | No further grader cycles. Agent may run one final revision, then session → `idle`. |\n| `failed` | Session → `idle`. Rubric fundamentally doesn't match the task (e.g. description and rubric contradict). |\n| `interrupted` | Only emitted if `_start` had already fired before a `user.interrupt` arrived. |\n\n```json\n{\n  \"type\": \"span.outcome_evaluation_end\",\n  \"id\": \"sevt_01jkl...\",\n  \"outcome_evaluation_start_id\": \"sevt_01def...\",\n  \"outcome_id\": \"outc_01a...\",\n  \"result\": \"satisfied\",\n  \"explanation\": \"All 12 criteria met: revenue projections use 5 years of historical data, ...\",\n  \"iteration\": 0,\n  \"usage\": { \"input_tokens\": 2400, \"output_tokens\": 350, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 1800 },\n  \"processed_at\": \"2026-03-25T14:03:00Z\"\n}\n```\n\n---\n\n## Checking status & retrieving deliverables\n\n**Status** — either watch the stream for `span.outcome_evaluation_end`, or poll the session and read `outcome_evaluations`:\n\n```python\nsession = client.beta.sessions.retrieve(session.id)\nfor ev in session.outcome_evaluations:\n    print(f\"{ev.outcome_id}: {ev.result}\")  # outc_01a...: satisfied\n```\n\n**Deliverables** — the agent writes to `/mnt/session/outputs/`. Once idle, fetch via the Files API with `scope_id=session.id`. This is the same session-outputs mechanism documented in `shared/managed-agents-environments.md` → Session outputs (including the dual-beta-header requirement on `files.list`).\n\n---\n\n## Interaction rules & pitfalls\n\n- **One outcome at a time.** Chain by sending the next `user.define_outcome` only after the previous one's terminal `span.outcome_evaluation_end` (`satisfied` / `max_iterations_reached` / `failed` / `interrupted`). The session retains history across chained outcomes.\n- **Steering is allowed but optional.** You *may* send `user.message` events mid-outcome to nudge direction, but the agent already knows to keep working until terminal — don't send \"keep going\" prompts.\n- **`user.interrupt` pauses the current outcome** — it marks `result: \"interrupted\"` and leaves the session `idle`, ready for a new outcome or conversational turn.\n- **After terminal, the session is reusable** — continue conversationally or define a new outcome.\n- **Outcome ≠ session-create field.** Don't put `outcome`, `rubric`, or `description` on `sessions.create()` — outcomes are always sent as a `user.define_outcome` event.\n- **Idle-break gate is unchanged.** In your drain loop, keep using `event.type === 'session.status_idle' && event.stop_reason?.type !== 'requires_action'` — do **not** gate on `span.outcome_evaluation_end` alone (on `needs_revision` the session keeps running). See `shared/managed-agents-client-patterns.md` Pattern 5.\n\nFor the raw HTTP shapes and per-language SDK bindings beyond Python, WebFetch `https://platform.claude.com/docs/en/managed-agents/define-outcomes.md` (see `shared/live-sources.md`).\n</doc>\n\n<doc path=\"shared/managed-agents-overview.md\">\n# Managed Agents — Overview\n\nManaged Agents provisions a container per session as the agent's workspace. The agent loop runs on Anthropic's orchestration layer; the container is where the agent's *tools* execute — bash commands, file operations, code. You create a persisted **Agent** config (model, system prompt, tools, MCP servers, skills), then start **Sessions** that reference it. The session streams events back to you; you send user messages and tool results in.\n\n## ⚠️ THE MANDATORY FLOW: Agent (once) → Session (every run)\n\n**Why agents are separate objects: versioning.** An agent is a persisted, versioned config — every update creates a new immutable version, and sessions pin to a version at creation time. This lets you iterate on the agent (tweak the prompt, add a tool) without breaking sessions already running, roll back if a change regresses, and A/B test versions side-by-side. None of that works if you `agents.create()` fresh on every run.\n\nEvery session references a pre-created `/v1/agents` object. Create the agent once, store the ID, and reuse it across runs.\n\n| Step | Call | Frequency |\n|---|---|---|\n| 1 | `POST /v1/agents` — `model`, `system`, `tools`, `mcp_servers`, `skills` live here | **ONCE.** Store `agent.id` **and** `agent.version`. |\n| 2 | `POST /v1/sessions` — `agent: \"agent_abc123\"` or `{type: \"agent\", id, version}` | **Every run.** String shorthand uses latest version. |\n\nIf you're about to write `sessions.create()` with `model`, `system`, or `tools` on the session body — **stop**. Those fields live on `agents.create()`. The session takes a *pointer* only.\n\n**When generating code, separate setup from runtime.** `agents.create()` belongs in a setup script (or a guarded `if agent_id is None:` block), not at the top of the hot path. If the user's code calls `agents.create()` on every invocation, they're accumulating orphaned agents and paying the create latency for nothing. The correct shape is: create once → persist the ID (config file, env var, secrets manager) → every run loads the ID and calls `sessions.create()`.\n\n**To change the agent's behavior, use `POST /v1/agents/{id}` — don't create a new one.** Each update bumps the version; running sessions keep their pinned version, new sessions get the latest (or pin explicitly via `{type: \"agent\", id, version}`). See `shared/managed-agents-core.md` → Agents → Versioning. To change `tools`/`mcp_servers`/`vault_ids` on **one running session** without touching the agent object, use `sessions.update()` — see `shared/managed-agents-core.md` → Updating the agent configuration mid-session.\n\n## Beta Headers\n\nManaged Agents is in beta. The SDK sets required beta headers automatically:\n\n| Beta Header                    | What it enables                                      |\n| ------------------------------ | ---------------------------------------------------- |\n| `managed-agents-2026-04-01`    | Agents, Environments, Sessions, Events, Session Resources, Session Threads, Outcomes, Multiagent, Vaults, Credentials, Memory Stores |\n| `skills-2025-10-02`            | Skills API (for managing custom skill definitions)   |\n| `files-api-2025-04-14`         | Files API for file uploads                           |\n\n**Which beta header goes where:** The SDK sets `managed-agents-2026-04-01` automatically on `client.beta.{agents,environments,sessions,vaults,memory_stores}.*` calls, and `files-api-2025-04-14` / `skills-2025-10-02` automatically on `client.beta.files.*` / `client.beta.skills.*` calls. You do NOT need to add the Skills or Files beta header when calling Managed Agents endpoints. **Exception — session-scoped file listing:** `client.beta.files.list({scope_id: session.id})` is a Files endpoint that takes a Managed Agents parameter, so it needs **both** headers. Pass `betas: [\"managed-agents-2026-04-01\"]` explicitly on that call (the SDK adds the Files header; you add the Managed Agents one). See `shared/managed-agents-environments.md` → Session outputs.\n\n\n## Reading Guide\n\n| User wants to...                       | Read these files                                        |\n| -------------------------------------- | ------------------------------------------------------- |\n| **Get started from scratch / \"help me set up an agent\"** | `shared/managed-agents-onboarding.md` — guided interview (WHERE→WHO→WHAT→WATCH), then emit code |\n| Understand how the API works           | `shared/managed-agents-core.md`                         |\n| See the full endpoint reference        | `shared/managed-agents-api-reference.md`                |\n| **Create an agent** (required first step) | `shared/managed-agents-core.md` (Agents section) + language file |\n| Update/version an agent                | `shared/managed-agents-core.md` (Agents → Versioning) — update, don't re-create |\n| Create a session                       | `shared/managed-agents-core.md` + `{lang}/managed-agents/README.md` |\n| Configure tools and permissions        | `shared/managed-agents-tools.md`                        |\n| Set up MCP servers                     | `shared/managed-agents-tools.md` (MCP Servers section)  |\n| Stream events / handle tool_use        | `shared/managed-agents-events.md` + language file       |\n| Get notified of session state changes via webhook (no polling) | `shared/managed-agents-webhooks.md` — Console-registered endpoint, HMAC verify, thin payload + fetch |\n| Define an outcome / rubric-graded iterate loop | `shared/managed-agents-outcomes.md` — `user.define_outcome` event, grader, `span.outcome_evaluation_*` events |\n| Coordinate multiple agents / subagents / threads | `shared/managed-agents-multiagent.md` — `multiagent: {type: \"coordinator\", agents: [...]}` on the agent, session threads, cross-posted tool confirmations |\n| Set up environments                    | `shared/managed-agents-environments.md` + language file |\n| Run tool execution in your own infra / VPC (self-hosted sandbox) | `shared/managed-agents-self-hosted-sandboxes.md` — `config:{type:\"self_hosted\"}`, `ANTHROPIC_ENVIRONMENT_KEY`, `EnvironmentWorker.run()` / `ant beta:worker poll` |\n| Upload files / attach repos            | `shared/managed-agents-environments.md` (Resources)     |\n| Give agents persistent memory across sessions | `shared/managed-agents-memory.md` — memory stores, `memory_store` session resource, preconditions, versions/redact |\n| Define agents/environments as version-controlled YAML; drive the API from the shell | `shared/anthropic-cli.md` — `ant beta:agents create < agent.yaml`, `--transform`, `@file` inlining |\n| Store MCP credentials                  | `shared/managed-agents-tools.md` (Vaults section)       |\n| Call a non-MCP API / CLI that needs a secret | `shared/managed-agents-client-patterns.md` Pattern 9 — no container env vars; vaults are MCP-only; keep the secret host-side via a custom tool |\n\n## Common Pitfalls\n\n- **Agent FIRST, then session — NO EXCEPTIONS** — the session's `agent` field accepts **only** a string ID or `{type: \"agent\", id, version}`. `model`, `system`, `tools`, `mcp_servers`, `skills` are **top-level fields on `POST /v1/agents`**, never on `sessions.create()`. If the user hasn't created an agent, that is step zero of every example.\n- **Agent ONCE, not every run** — `agents.create()` is a setup step. Store the returned `agent_id` and reuse it; don't call `agents.create()` at the top of your hot path. If the agent's config needs to change, `POST /v1/agents/{id}` — each update creates a new version, and sessions can pin to a specific version for reproducibility.\n- **MCP auth goes through vaults** — the agent's `mcp_servers` array declares `{type, name, url}` only (no auth). Credentials live in vaults (`client.beta.vaults.credentials.create`) and attach to sessions via `vault_ids`. Anthropic auto-refreshes OAuth tokens using the stored refresh token.\n- **Reconcile resources before the first run** — a session with a clear ask but a missing tool, credential, data mount, or context will discover the gap mid-run, then flail and give up. Before creating the session, check that every action in the task maps to a configured tool/MCP server, every MCP server has a vault credential, and every referenced file/host is mounted/reachable. When helping a user set one up, run the reconciliation in `shared/managed-agents-onboarding.md` → §3 Pre-flight viability check.\n- **Stream to get events** — `GET /v1/sessions/{id}/events/stream` is the primary way to receive agent output in real-time.\n- **SSE stream has no replay — reconnect with consolidation** — if the stream drops while a `agent.tool_use`, `agent.mcp_tool_use`, or `agent.custom_tool_use` is pending resolution (`user.tool_confirmation` for the first two, `user.custom_tool_result` for the last one), the session deadlocks (client disconnects → session idles → reconnect happens → no client resolution happens). On every (re)connect: open stream with `GET /v1/sessions/{id}/events/stream` , fetch `GET /v1/sessions/{id}/events`, dedupe by event ID, then proceed. See `shared/managed-agents-events.md` → Reconnecting after a dropped stream.\n- **Don't trust HTTP-library timeouts as wall-clock caps** — `requests` `timeout=(c, r)` and `httpx.Timeout(n)` are *per-chunk* read timeouts; they reset every byte, so a trickling connection can block indefinitely. For a hard deadline on raw-HTTP polling, track `time.monotonic()` at the loop level and bail explicitly. Prefer the SDK's `sessions.events.stream()` / `session.events.list()` over hand-rolled HTTP. See `shared/managed-agents-events.md` → Receiving Events.\n- **Messages queue** — you can send events while the session is `running` or `idle`; they're processed in order. No need to wait for a response before sending the next message.\n- **Environment `config.type` is `\"cloud\"` or `\"self_hosted\"`** — `cloud` runs the container on Anthropic's infrastructure; `self_hosted` moves tool execution to your own (see `shared/managed-agents-self-hosted-sandboxes.md`).\n- **Archive is permanent on every resource** — archiving an agent, environment, session, vault, credential, or memory store makes it read-only with no unarchive. For agents, environments, and memory stores specifically, archived resources cannot be referenced by new sessions (existing sessions continue). Do not call `.archive()` on a production agent, environment, or memory store as cleanup — **always confirm with the user before archiving**.\n</doc>\n\n<doc path=\"shared/managed-agents-self-hosted-sandboxes.md\">\n# Managed Agents — Self-Hosted Sandboxes\n\nWith `config.type: \"self_hosted\"`, the **agent loop stays on Anthropic's orchestration layer** but **tool execution moves to infrastructure you control** — bash, file ops, and code run inside your container, so filesystem contents and network egress never leave your environment. Contrast with `config.type: \"cloud\"`, where Anthropic runs the container. Connectivity is **outbound-only**: your worker long-polls Anthropic's work queue; Anthropic never dials into your network.\n\n## Flow\n\n```\n1. Create environment:      config: {type: \"self_hosted\"}        → env_...\n2. Generate environment key (Console, on the environment page)   → sk-ant-oat01-...  as ANTHROPIC_ENVIRONMENT_KEY\n3. Run a worker:            EnvironmentWorker.run()  or  ant beta:worker poll\n4. Sessions reference       environment_id=env_... exactly as for cloud\n```\n\n## Create the environment\n\n```python\nclient = anthropic.Anthropic()\n\nenvironment = client.beta.environments.create(\n    name=\"self-hosted\", config={\"type\": \"self_hosted\"}\n)\n```\n\n`{\"type\": \"self_hosted\"}` is the entire config — there are no pool, capacity, or networking sub-fields; you control those on your side.\n\n## Run a worker — SDK (primary path)\n\n`EnvironmentWorker` wraps the poll → dispatch → tool-execute loop. `.run()` is the always-on loop; `.run_one()` / `.runOne()` handles one work item (for webhook-driven wake).\n\n**Python — always-on:**\n\n```python\nimport asyncio\nimport os\nfrom anthropic import AsyncAnthropic\nfrom anthropic.lib.environments import EnvironmentWorker\n\n\nasync def main() -> None:\n    environment_key = os.environ[\"ANTHROPIC_ENVIRONMENT_KEY\"]\n    environment_id = os.environ[\"ANTHROPIC_ENVIRONMENT_ID\"]\n    async with AsyncAnthropic(auth_token=environment_key) as client:\n        await EnvironmentWorker(\n            client,\n            environment_id=environment_id,\n            environment_key=environment_key,\n            workdir=\"/workspace\",\n        ).run()\n\n\nasyncio.run(main())\n```\n\n**TypeScript — always-on:**\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\nimport { EnvironmentWorker } from \"@anthropic-ai/sdk/helpers/beta/environments\";\n\nconst environmentKey = process.env.ANTHROPIC_ENVIRONMENT_KEY!;\nconst environmentId = process.env.ANTHROPIC_ENVIRONMENT_ID!;\nconst client = new Anthropic({ authToken: environmentKey });\nconst ctrl = new AbortController();\nprocess.once(\"SIGTERM\", () => ctrl.abort());\n\nawait new EnvironmentWorker({\n  client,\n  environmentId,\n  environmentKey,\n  workdir: \"/workspace\",\n  signal: ctrl.signal\n}).run();\n```\n\n**Customizing tools.** `EnvironmentWorker` runs the built-in toolset by default. To add or replace tools, use `AgentToolContext(workdir=, client=, session_id=)` with `beta_agent_toolset(env)` / `betaAgentToolset(env)` and pass the resulting tools to the lower-level `tool_runner()`. Skills attached to the agent are downloaded into `{workdir}/skills/<name>/` before tool calls begin (`AgentToolContext` handles this when given `client` and `session_id`). Downloaded skill files are marked executable automatically by the CLI and SDK; if you implement skills download yourself, you set permissions.\n\n> **Runtime deps:** the SDK helpers require `/bin/bash` at that exact path. The TypeScript SDK additionally requires `unzip`, `tar`, and Node.js 22+. These are resolved at fixed paths and do **not** respect `PATH` overrides.\n\n## Run a worker — `ant` CLI (fixed tools)\n\nThe `ant` CLI ships a worker with the fixed built-in toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`). Install per `shared/anthropic-cli.md`, then:\n\n```sh\nexport ANTHROPIC_ENVIRONMENT_KEY=sk-ant-oat01-...\nant beta:worker poll --environment-id env_... --workdir /workspace\n```\n\n- `--workdir` is the directory tools operate in (default `.`); tool calls are sandboxed to it.\n- `--environment-key` overrides the env var.\n- `--on-work <script>` runs your script per work item (e.g. to spin a fresh container per session — see Container orchestration below).\n- `--unrestricted-paths`, `--max-idle` (default `60s`), `--log-format` — see `ant beta:worker poll --help`.\n- Flags fall back to env vars (`ANTHROPIC_ENVIRONMENT_ID`, `ANTHROPIC_ENVIRONMENT_KEY`).\n- Exits cleanly on SIGTERM/SIGINT after draining in-flight work.\n- **Fixed toolset** — for custom tools, use the SDK worker above.\n\nInside an `--on-work` container, run `ant beta:worker run --workdir <dir>` as the entrypoint.\n\n## Webhook-driven wake (instead of always-on)\n\nRegister a webhook for `session.status_run_started` (see `shared/managed-agents-webhooks.md`), verify the delivery, then drain one work item with `.run_one()`:\n\n```python\nimport os\nimport anthropic\nfrom anthropic.lib.environments import EnvironmentWorker\n\nenvironment_key = os.environ[\"ANTHROPIC_ENVIRONMENT_KEY\"]\nenvironment_id = os.environ[\"ANTHROPIC_ENVIRONMENT_ID\"]\nclient = anthropic.AsyncAnthropic(\n    auth_token=environment_key,\n)  # reads ANTHROPIC_WEBHOOK_SIGNING_KEY from env for webhooks.unwrap()\n\n\nasync def handle(raw: bytes, headers: dict[str, str]) -> dict:\n    event = client.beta.webhooks.unwrap(raw.decode(), headers=headers)\n    if event.data.type != \"session.status_run_started\":\n        return {\"status\": \"ignored\"}\n    await EnvironmentWorker(\n        client,\n        environment_id=environment_id,\n        environment_key=environment_key,\n        workdir=\"/workspace\",\n    ).run_one()\n    return {\"status\": \"ok\"}\n```\n\nTypeScript: same shape with `client.beta.webhooks.unwrap(body, {headers})` and `new EnvironmentWorker({...}).runOne()`.\n\n## Container orchestration (mid-level)\n\n`EnvironmentWorker.run()` polls and executes tools in the same process. To run each session in its **own** container, use the mid-level poller in a thin orchestrator — Python `client.beta.environments.work.poller(environment_id=, environment_key=, drain=, block_ms=, reclaim_older_than_ms=, auto_stop=)`; TypeScript `new WorkPoller({client, environmentId, environmentKey, autoStop})` from `@anthropic-ai/sdk/helpers/beta/environments` — and, for each yielded `work` item, start a fresh container with these env vars injected, whose entrypoint runs `ant beta:worker run` or an `EnvironmentWorker(...).run_one()`. `block_ms` is 1–999 (or `None` for non-blocking); `reclaim_older_than_ms` re-claims items leased to a dead worker; `drain` stops once the queue is empty; `auto_stop` posts a stop signal after the iterator exits (set `False` when the launched container owns the stop call). **Go's poller has no `auto_stop` opt-out** — it calls `work.Stop` when the handler returns, so block in the handler until the session completes rather than detaching.\n\n| Env var | Value |\n|---|---|\n| `ANTHROPIC_SESSION_ID` | `work.data.id` |\n| `ANTHROPIC_WORK_ID` | `work.id` |\n| `ANTHROPIC_ENVIRONMENT_ID` | `work.environment_id` |\n| `ANTHROPIC_ENVIRONMENT_KEY` | pass through |\n| `ANTHROPIC_BASE_URL` | pass through |\n\nSkip items where `work.data.type != \"session\"`.\n\n## Monitoring & control\n\nThese are **control-plane** calls — authenticate with `x-api-key` (not the environment key); `managed-agents-2026-04-01` beta header. **Call them from outside the worker host** — setting `ANTHROPIC_API_KEY` on the worker host exposes an organization-scoped credential to agent tool calls.\n\n| SDK (`client.beta.environments.work.*`) | REST | CLI | Returns |\n|---|---|---|---|\n| `stats(environment_id)` | `GET /v1/environments/{id}/work/stats` | `ant beta:environments:work stats` | `{type:\"work_queue_stats\", depth, pending, oldest_queued_at, workers_polling}` |\n| `stop(work_id, environment_id=)` | `POST /v1/environments/{id}/work/{work_id}/stop` | `ant beta:environments:work stop` | `work.state` |\n\n## What changes vs `cloud`\n\n| Concern | `cloud` | `self_hosted` |\n|---|---|---|\n| Container lifecycle, hardening, networking | Anthropic | **You** — run non-root, read-only rootfs, drop caps; egress is whatever your VPC/firewall allows |\n| `file` / `github_repository` resource mounting | Anthropic mounts into the container | **You** — pass pointers via `sessions.create(metadata={...})` and have your orchestrator fetch/clone before dispatch |\n| `memory_store` resources | Supported | **Not yet supported** |\n| Built-in tools | Via `agent_toolset_20260401` | Supplied by your worker (`EnvironmentWorker` default / `beta_agent_toolset(env)` / `ant` CLI fixed set) |\n| Skills download | Automatic | `EnvironmentWorker` / `AgentToolContext` fetch into `{workdir}/skills/` (needs `client` + `session_id`) |\n| Claude Platform on AWS | Supported | **Not available** |\n| SDK worker helpers | All SDKs | **Python, TypeScript, Go only** (`EnvironmentWorker` / poller not in Java, Ruby, PHP, or C#) — use one of those three or the `ant` CLI |\n\n## Credentials\n\n| Credential | Format | Scope |\n|---|---|---|\n| `ANTHROPIC_ENVIRONMENT_KEY` | `sk-ant-oat01-...` | One environment's work queue. Generate in Console (\"Generate environment key\"). Pass as `auth_token=` / `authToken` on the client **and** as `environment_key=` / `environmentKey` on `EnvironmentWorker`. Store in a secrets manager; rotate on exposure. |\n| `ANTHROPIC_WEBHOOK_SIGNING_KEY` | `whsec_...` | Webhook signature verification (if using webhook-driven wake). The SDK reads this env var automatically for `client.beta.webhooks.unwrap()`. |\n\n## Security — what you own\n\nContainer hardening; egress restriction (there is no default); `ANTHROPIC_ENVIRONMENT_KEY` custody and rotation; one workspace + environment per trust boundary when running untrusted code; least-privilege for the tool process; log retention and redaction. **Anthropic cannot**: fast-revoke a leaked environment key, verify your image or supply chain, sandbox tool execution inside your container, or enforce retention after tool output reaches your infrastructure. See the Self-Hosted Sandboxes Security page in `shared/live-sources.md` for the full checklist.\n</doc>\n\n<doc path=\"shared/managed-agents-tools.md\">\n# Managed Agents — Tools & Skills\n\n## Tools\n\n### Server tools vs client tools\n\n| Type | Who runs it | How it works |\n|---|---|---|\n| **Prebuilt Claude Agent tools** (`agent_toolset_20260401`) | Anthropic, on the session's container (for `cloud` envs; for `self_hosted`, **your** worker supplies and runs them — see `shared/managed-agents-self-hosted-sandboxes.md`) | File ops, bash, web search, etc. Enable all at once or configure individually with `enabled: true/false`. |\n| **MCP tools** (`mcp_toolset`) | Anthropic's orchestration layer | Capabilities exposed by connected MCP servers. Grant access per-server via the toolset. |\n| **Custom tools** | **You** — your application handles the call and returns results | Agent emits a `agent.custom_tool_use` event, session goes `idle`, you send back a `user.custom_tool_result` event. |\n\n**Recommendation:** Enable all prebuilt tools via `agent_toolset_20260401`, then disable individually as needed.\n\n**Versioning:** The toolset is a versioned, static resource. When underlying tools change, a new toolset version is created (hence `_20260401`) so you always know exactly what you're getting.\n\n### Agent Toolset\n\nThe `agent_toolset_20260401` provides these built-in tools:\n\n| Tool                   | Description                              |\n| ---------------------- | ---------------------------------------- |\n| `bash` | Execute bash commands in a shell session |\n| `read` | Read a file from the local filesystem, including text, images, PDFs, and Jupyter notebooks |\n| `write` | Write a file to the local filesystem |\n| `edit` | Perform string replacement in a file |\n| `glob` | Fast file pattern matching using glob patterns |\n| `grep` | Text search using regex patterns |\n| `web_fetch` | Fetch content from a URL |\n| `web_search` | Search the web for information |\n\nEnable the full toolset:\n\n```json\n{\n  \"tools\": [\n    { \"type\": \"agent_toolset_20260401\" }\n  ]\n}\n```\n\n### Per-Tool Configuration\n\nOverride defaults for individual tools. This example enables everything except bash:\n\n```json\n{\n  \"tools\": [\n    {\n      \"type\": \"agent_toolset_20260401\",\n      \"default_config\": { \"enabled\": true },\n      \"configs\": [\n        { \"name\": \"bash\", \"enabled\": false }\n      ]\n    }\n  ]\n}\n```\n\n| Field | Required | Description |\n|---|---|---|\n| `type` | ✅ | `\"agent_toolset_20260401\"` |\n| `default_config` | ❌ | Applied to all tools. `{ \"enabled\": bool, \"permission_policy\": {...} }` |\n| `configs` | ❌ | Per-tool overrides: `[{ \"name\": \"...\", \"enabled\": bool, \"permission_policy\": {...} }]` |\n\n### Permission Policies\n\nControl when server-executed tools (agent toolset + MCP) run automatically vs wait for approval. Does not apply to custom tools.\n\n| Policy | Behavior |\n|---|---|\n| `always_allow` | Tool executes automatically (default) |\n| `always_ask` | Session emits `session.status_idle` and pauses until you send a `tool_confirmation` event |\n\n```json\n{\n  \"type\": \"agent_toolset_20260401\",\n  \"default_config\": {\n    \"enabled\": true,\n    \"permission_policy\": { \"type\": \"always_allow\" }\n  },\n  \"configs\": [\n    { \"name\": \"bash\", \"permission_policy\": { \"type\": \"always_ask\" } }\n  ]\n}\n```\n\n**Responding to `always_ask`:** Send a `user.tool_confirmation` event with `tool_use_id` from the triggering `agent_tool_use`/`mcp_tool_use` event:\n\n```json\n{ \"type\": \"tool_confirmation\", \"tool_use_id\": \"sevt_abc123\", \"result\": \"allow\" }\n{ \"type\": \"tool_confirmation\", \"tool_use_id\": \"sevt_def456\", \"result\": \"deny\", \"message\": \"Read .env.example instead\" }\n```\n\nThe optional `message` on a deny is delivered to the agent so it can adjust its approach.\n\nTo enable only specific tools, flip the default off and opt-in per tool:\n\n```json\n{\n  \"tools\": [\n    {\n      \"type\": \"agent_toolset_20260401\",\n      \"default_config\": { \"enabled\": false },\n      \"configs\": [\n        { \"name\": \"bash\", \"enabled\": true },\n        { \"name\": \"read\", \"enabled\": true }\n      ]\n    }\n  ]\n}\n```\n\n### Custom Tools (Client-Side)\n\nCustom tools are executed by **your application**, not Anthropic. The flow:\n\n1. Agent decides to use the tool → session emits a `agent.custom_tool_use` event with inputs\n2. Session goes `idle` waiting for you\n3. Your application executes the tool\n4. You send back a `user.custom_tool_result` event with the output\n5. Session resumes `running`\n\nNo permission policy needed — you're the one executing.\n\n```json\n{\n  \"tools\": [\n    {\n      \"type\": \"custom\",\n      \"name\": \"get_weather\",\n      \"description\": \"Fetch current weather for a city.\",\n      \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n          \"city\": { \"type\": \"string\", \"description\": \"City name\" }\n        },\n        \"required\": [\"city\"]\n      }\n    }\n  ]\n}\n```\n\n### MCP Servers\n\nMCP (Model Context Protocol) servers expose standardized third-party capabilities (e.g. Asana, GitHub, Linear). **Configuration is split across agent and vault:**\n\n1. **Agent creation** declares which servers to connect to (`type`, `name`, `url` — no auth). The agent's `mcp_servers` array has no auth field.\n2. **Vault** stores the OAuth credentials. Attach via `vault_ids` on session create.\n\nThis keeps secrets out of reusable agent definitions. Each vault credential is tied to one MCP server URL; Anthropic matches credentials to servers by URL.\n\n**Agent side — declare servers (no auth):**\n\n| Field | Required | Description |\n|---|---|---|\n| `type` | ✅ | `\"url\"` |\n| `name` | ✅ | Unique name — referenced by `mcp_toolset.mcp_server_name` |\n| `url` | ✅ | The MCP server's endpoint URL (Streamable HTTP transport) |\n\n```json\n{\n  \"mcp_servers\": [\n    { \"type\": \"url\", \"name\": \"linear\", \"url\": \"https://mcp.linear.app/mcp\" }\n  ],\n  \"tools\": [\n    { \"type\": \"mcp_toolset\", \"mcp_server_name\": \"linear\" }\n  ]\n}\n```\n\n**Session side — attach vault:**\n\n```json\n{\n  \"agent\": \"agent_abc123\",\n  \"environment_id\": \"env_abc123\",\n  \"vault_ids\": [\"vlt_abc123\"]\n}\n```\n\n> 💡 **Per-tool enablement (empirical):** `mcp_toolset` has been observed accepting `default_config: {enabled: false}` + `configs: [{name, enabled: true}]` for an allowlist pattern. The API ref shows only the minimal `{type, mcp_server_name}` form.\n\n> 💡 **Changing tools/MCP servers on a running session:** `sessions.update()` can replace `agent.tools`, `agent.mcp_servers`, and `vault_ids` while the session is `idle` — a session-local override that doesn't touch the agent object. See `shared/managed-agents-core.md` → Updating the agent configuration mid-session.\n\n**Large MCP tool outputs.** If an MCP tool returns more than **100K tokens**, the output is automatically offloaded to a file in the sandbox — the agent receives a truncated preview plus the file path and can `read` the full content. No configuration required.\n\n**Invalid vault credentials don't block session creation.** If a vault credential is invalid for a declared MCP server, the session still creates successfully; a `session.error` event describes the MCP auth failure, and auth retries on the next `session.status_idle` → `session.status_running` transition.\n\n> ⚠️ **MCP auth tokens ≠ REST API tokens.** Hosted MCP servers (`mcp.notion.com`, `mcp.linear.app`, etc.) typically require **OAuth bearer tokens**, not the service's native API keys. A Notion `ntn_` integration token authenticates against Notion's REST API but will **not** work as a vault credential for the Notion MCP server. These are different auth systems.\n\n### Vaults — the MCP credential store\n\n**Vaults** store OAuth credentials (access token + refresh token) that Anthropic auto-refreshes on your behalf via standard OAuth 2.0 `refresh_token` grant. This is the only way to authenticate MCP servers in the launch SDK.\n\n#### Credentials and the sandbox\n\nVaults store credentials; those credentials **never enter the sandbox**. This is a deliberate security boundary — code running in the sandbox (including anything the agent writes) cannot read or exfiltrate a vaulted credential, even under prompt injection. Instead, credentials are injected by Anthropic-side proxies **after** a request leaves the sandbox:\n\n- **MCP tool calls** are routed through an Anthropic-side proxy that fetches the credential from the vault and adds it to the outbound request.\n- **Git operations on attached GitHub repositories** (`git pull`, `git push`, GitHub REST calls) are routed through a git proxy that injects the `github_repository` resource's `authorization_token` the same way.\n\n**Not yet supported:** running other authenticated CLIs (e.g. `aws`, `gcloud`, `stripe`) directly inside the sandbox. There is currently no way to set container environment variables or expose vault credentials to arbitrary processes. If you need one of these today:\n\n- **Prefer an MCP server** for that service if one exists — it gets the same vault-backed injection.\n- **Otherwise, register a custom tool:** the agent emits `agent.custom_tool_use`, your orchestrator (which already holds the credential) executes the call and returns `user.custom_tool_result` over the same authenticated event stream. No public endpoint is exposed; the sandbox never sees the secret. See `shared/managed-agents-client-patterns.md` → Pattern 9.\n\n**Do not put API keys in the system prompt or user messages as a workaround** — they persist in the session's event history.\n\n> Formerly known internally as TATs (Tool/Tenant Access Tokens).\n\n**Flow:**\n\n1. Create a vault (`client.beta.vaults.create(...)`) — one per tenant/user, or one shared, depending on your model\n2. Add MCP credentials to it (`client.beta.vaults.credentials.create(...)`) — each credential is tied to one MCP server URL\n3. Reference the vault on session create via `vault_ids: [\"vlt_...\"]`\n4. Anthropic auto-refreshes tokens before they expire; the agent uses the current access token when calling MCP tools\n\n**Credential shape**:\n\n```json\n{\n  \"display_name\": \"Notion (workspace-foo)\",\n  \"auth\": {\n    \"type\": \"mcp_oauth\",\n    \"mcp_server_url\": \"https://mcp.notion.com/mcp\",\n    \"access_token\": \"<current access token>\",\n    \"expires_at\": \"2026-04-02T14:00:00Z\",\n    \"refresh\": {\n      \"refresh_token\": \"<refresh token>\",\n      \"client_id\": \"<your OAuth client_id>\",\n      \"token_endpoint\": \"https://api.notion.com/v1/oauth/token\",\n      \"token_endpoint_auth\": { \"type\": \"none\" }\n    }\n  }\n}\n```\n\nThe `refresh` block is what enables auto-refresh — `token_endpoint` is where Anthropic posts the `refresh_token` grant. `token_endpoint_auth` is a discriminated union:\n\n| `type` | Shape | Use when |\n|---|---|---|\n| `\"none\"` | `{type: \"none\"}` | Public OAuth client (no secret) |\n| `\"client_secret_basic\"` | `{type: \"client_secret_basic\", client_secret: \"...\"}` | Confidential client, secret via HTTP Basic auth |\n| `\"client_secret_post\"` | `{type: \"client_secret_post\", client_secret: \"...\"}` | Confidential client, secret in request body |\n\nOmit `refresh` entirely if you only have an access token with no refresh capability — it'll work until it expires, then the agent loses access.\n\n> 💡 **Getting an OAuth token.** How you obtain the initial access and refresh tokens depends on the MCP server — consult its documentation. Once you have them, store them in a vault credential using the shape above; Anthropic auto-refreshes via the `refresh.token_endpoint` from there.\n\n**Scoping:** Vaults are workspace-scoped. Anyone with developer+ role in the API workspace can create, read (metadata only — secrets are write-only), and attach vaults. `vault_ids` can be set at session **create** time but not via session update (the SDK docstring says \"Not yet supported; requests setting this field are rejected\").\n\n---\n\n## Skills\n\nSkills are reusable, filesystem-based resources that provide your agent with domain-specific expertise: workflows, context, and best practices that transform general-purpose agents into specialists. Unlike prompts (conversation-level instructions for one-off tasks), skills load on-demand and eliminate the need to repeatedly provide the same guidance across multiple conversations.\n\nTwo types — both work the same way; the agent automatically uses them when relevant to the task at hand:\n\n| Type | What it is |\n|---|---|\n| **Pre-built Anthropic skills** | Common document tasks (PowerPoint, Excel, Word, PDF). Reference by name (e.g. `xlsx`). |\n| **Custom skills** | Skills you've created in your organization via the Skills API. Reference by `skill_id` + optional `version`. |\n\n**Max 20 skills per agent.** Agent creation uses `managed-agents-2026-04-01`; the separate Skills API (for managing custom skill definitions) uses `skills-2025-10-02`.\n\n### Enabling skills on a session\n\nSkills are attached to the **agent** definition via `agents.create()`:\n\n```ts\nconst agent = await client.beta.agents.create(\n  {\n    name: \"Financial Agent\",\n    model: \"claude-opus-4-8\",\n    system: \"You are a financial analysis agent.\",\n    skills: [\n      { type: \"anthropic\", skill_id: \"xlsx\" },\n      { type: \"custom\", skill_id: \"skill_abc123\", version: \"latest\" },\n    ],\n  }\n);\n```\n\nPython:\n\n```python\nagent = client.beta.agents.create(\n    name=\"Financial Agent\",\n    model=\"claude-opus-4-8\",\n    system=\"You are a financial analysis agent.\",\n    skills=[\n        {\"type\": \"anthropic\", \"skill_id\": \"xlsx\"},\n        {\"type\": \"custom\", \"skill_id\": \"skill_abc123\", \"version\": \"latest\"},\n    ]\n)\n```\n\n**Skill reference fields:**\n\n| Field | Anthropic skill | Custom skill |\n|---|---|---|\n| `type` | `\"anthropic\"` | `\"custom\"` |\n| `skill_id` | Skill name (e.g. `\"xlsx\"`, `\"docx\"`, `\"pptx\"`, `\"pdf\"`) | Skill ID from Skills API (e.g. `\"skill_abc123\"`) |\n| `version` | — | `\"latest\"` or a specific version number |\n\n### Skills API\n\n| Operation             | Method   | Path                                            |\n| --------------------- | -------- | ----------------------------------------------- |\n| Create Skill          | `POST`   | `/v1/skills`                                    |\n| List Skills           | `GET`    | `/v1/skills`                                    |\n| Get Skill             | `GET`    | `/v1/skills/{id}`                               |\n| Delete Skill          | `DELETE` | `/v1/skills/{id}`                               |\n| Create Version        | `POST`   | `/v1/skills/{id}/versions`                      |\n| List Versions         | `GET`    | `/v1/skills/{id}/versions`                      |\n| Get Version           | `GET`    | `/v1/skills/{id}/versions/{version}`            |\n| Delete Version        | `DELETE` | `/v1/skills/{id}/versions/{version}`            |\n</doc>\n\n<doc path=\"shared/managed-agents-webhooks.md\">\n# Managed Agents — Webhooks\n\nAnthropic can POST to your HTTPS endpoint when a Managed Agents resource changes state — an alternative to holding an SSE stream or polling. Payloads are **thin** (event type + resource IDs only); on receipt, fetch the resource for current state. Every delivery is HMAC-signed.\n\n> **Direction matters.** This page covers *Anthropic → you* notifications about session/vault state. It does **not** cover *third-party → you* webhooks that *trigger* a session (e.g. a GitHub push handler that calls `sessions.create()`) — that's ordinary application code on your side with no Anthropic-specific wire format.\n\n---\n\n## Register an endpoint (Console only)\n\nConsole → **Manage → Webhooks**. There is no programmatic endpoint-management API yet. Secret rotation is supported from the same page.\n\n| Field | Constraint |\n|---|---|\n| URL | HTTPS on port 443, publicly resolvable hostname |\n| Event types | Subscribe per `data.type` — you only receive subscribed types (plus test events) |\n| Signing secret | `whsec_`-prefixed, 32 bytes, **shown once at creation** — store it |\n\n---\n\n## Verify the signature\n\nEvery delivery is HMAC-signed. **Use the SDK's `client.beta.webhooks.unwrap()`** — it verifies the signature, rejects payloads more than ~5 minutes old, and returns the parsed event. It reads the `whsec_` secret from `ANTHROPIC_WEBHOOK_SIGNING_KEY`.\n\n```python\nimport anthropic\nfrom flask import Flask, request\n\nclient = anthropic.Anthropic()  # reads ANTHROPIC_WEBHOOK_SIGNING_KEY from env\napp = Flask(__name__)\n\n\n@app.route(\"/webhook\", methods=[\"POST\"])\ndef webhook():\n    try:\n        event = client.beta.webhooks.unwrap(\n            request.get_data(as_text=True),\n            headers=dict(request.headers),\n        )\n    except Exception:\n        return \"invalid signature\", 400\n\n    if event.id in seen_event_ids:  # dedupe retries — id is per-event, not per-delivery\n        return \"\", 204\n    seen_event_ids.add(event.id)\n\n    match event.data.type:\n        case \"session.status_idled\":\n            session = client.beta.sessions.retrieve(event.data.id)\n            notify_user(session)\n        case \"vault_credential.refresh_failed\":\n            alert_oncall(event.data.id)\n\n    return \"\", 204\n```\n\nPass the **raw request body** to `unwrap()` — frameworks that re-serialize JSON (Express `.json()`, Flask `.get_json()`) change the bytes and break the MAC. For other languages, look up the `beta.webhooks.unwrap` binding in the SDK repo (`shared/live-sources.md`); don't hand-roll verification.\n\n---\n\n## Payload envelope\n\n```json\n{\n  \"type\": \"event\",\n  \"id\": \"event_01ABC...\",\n  \"created_at\": \"2026-03-18T14:05:22Z\",\n  \"data\": {\n    \"type\": \"session.status_idled\",\n    \"id\": \"session_01XYZ...\",\n    \"organization_id\": \"8a3d2f1e-...\",\n    \"workspace_id\": \"c7b0e4d9-...\"\n  }\n}\n```\n\nSwitch on `data.type`, fetch the resource by `data.id`, return any **2xx** to acknowledge. `created_at` is when the *state transition* happened, not when the webhook fired.\n\n---\n\n## Supported `data.type` values\n\n| `data.type` | Fires when |\n|---|---|\n| `session.status_scheduled` | Session created and ready to accept events |\n| `session.status_run_started` | Agent execution kicked off (every transition to `running`) |\n| `session.status_idled` | Agent awaiting input (tool approval, custom tool result, or next message) |\n| `session.status_terminated` | Session hit a terminal error |\n| `session.thread_created` | Multiagent: coordinator opened a new subagent thread |\n| `session.thread_idled` | Multiagent: a subagent thread is waiting for input |\n| `session.outcome_evaluation_ended` | Outcome grader finished one iteration |\n| `vault.archived` | Vault was archived |\n| `vault.created` | Vault was created |\n| `vault.deleted` | Vault was deleted |\n| `vault_credential.archived` | Vault credential was archived |\n| `vault_credential.created` | Vault credential was created |\n| `vault_credential.deleted` | Vault credential was deleted |\n| `vault_credential.refresh_failed` | MCP OAuth vault credential failed to refresh |\n\n> These are **webhook** `data.type` values — a separate namespace from SSE event types (`session.status_idle`, `span.outcome_evaluation_end`, etc. in `shared/managed-agents-events.md`). Don't reuse SSE constants in webhook handlers.\n\n---\n\n## Delivery behavior & pitfalls\n\n- **No ordering guarantee.** `session.status_idled` may arrive before `session.outcome_evaluation_ended` even if the evaluation finished first. Sort by envelope `created_at` if order matters.\n- **Retries carry the same `event.id`.** At least one retry on non-2xx. Dedupe on `event.id`.\n- **3xx is failure.** Redirects are not followed — update the URL in Console if your endpoint moves.\n- **Auto-disable** after ~20 consecutive failed deliveries, or immediately if the hostname resolves to a private IP or returns a redirect. Re-enable manually in Console.\n- **Thin payload is intentional.** Don't expect `stop_reason`, `outcome_evaluations`, credential secrets, etc. on the webhook body — fetch the resource.\n</doc>\n\n<doc path=\"shared/model-migration.md\">\n# Model Migration Guide\n\n> **If you arrived via `/claude-api migrate`:** this is the right file. Execute the steps below in order — do not summarize them back to the user. Start with Step 0 (confirm scope) before touching any file.\n\nHow to move existing code to newer Claude models. Covers breaking changes, deprecated parameters, and drop-in replacements for retired models.\n\nFor the latest, authoritative version (with code samples in every supported language), WebFetch the **Migration Guide** URL from `shared/live-sources.md`. Use this file for the consolidated, skill-resident reference; fall back to the live docs whenever a model launch or breaking change may have shifted the picture.\n\n**This file is large.** Use the section names below to jump (or `Grep` this file for the heading text). Read Step 0 and Step 1 first — they apply to every migration. Then read only the per-target section for the model you are migrating to.\n\n| Section | When you need it |\n|---|---|\n| Step 0: Confirm the migration scope | Always — before any edits |\n| Step 1: Classify each file | Always — decides whether to swap, add-alongside, or skip |\n| Per-SDK Syntax Reference | Translate the Python examples in this guide to TypeScript / Go / Ruby / Java / C# / PHP |\n| Destination Models / Retired Model Replacements | Picking a target model |\n| Breaking Changes by Source Model | Migrating to Opus 4.6 / Sonnet 4.6 |\n| Migrating to Opus 4.7 | Migrating to Opus 4.7 (breaking changes, silent defaults, behavioral shifts) |\n| Opus 4.7 Migration Checklist | The required vs optional items for 4.7, tagged `[BLOCKS]` / `[TUNE]` |\n| Migrating to Opus 4.8 | Migrating to Opus 4.8 (no new breaking changes; mid-session system prompts; behavioral re-tuning) |\n| Opus 4.8 Migration Checklist | The required vs optional items for 4.8, tagged `[BLOCKS]` / `[TUNE]` |\n| Verify the Migration | After edits — runtime spot-check |\n\n**TL;DR:** Change the model ID string. If you were using `budget_tokens`, switch to `thinking: {type: \"adaptive\"}`. If you were using assistant prefills, they 400 on both Opus 4.6 and Sonnet 4.6 — switch to one of the prefill replacements (most often `output_config.format`; see the table in Breaking Changes by Source Model). If you're moving from Sonnet 4.5 to Sonnet 4.6, set `effort` explicitly — 4.6 defaults to `high`. Remove the `effort-2025-11-24` and `fine-grained-tool-streaming-2025-05-14` beta headers (GA on 4.6); remove `interleaved-thinking-2025-05-14` once you're on adaptive thinking (keep it only while using the transitional `budget_tokens` escape hatch). Then drop back from `client.beta.messages.create` to `client.messages.create`. Dial back any aggressive \"CRITICAL: YOU MUST\" tool instructions; 4.6 follows the system prompt much more closely.\n\n---\n\n## Step 0: Confirm the migration scope\n\n**Before any Write, Edit, or MultiEdit call, confirm the scope.** If the user's request does not explicitly name a single file, a specific directory, or an explicit file list, **ask first — do not start editing**. This is non-negotiable: even imperative-sounding requests like \"migrate my codebase\", \"move my project to X\", \"upgrade to Sonnet 4.6\", or bare \"migrate to Opus 4.7\" leave the scope ambiguous and require a clarifying question. Phrases like \"my project\", \"my code\", \"my codebase\", \"the whole thing\", \"everywhere\", or \"across the repo\" are **ambiguous, not directive** — they tell you *what* to do but not *where*. Ask before doing.\n\nOffer the common scopes explicitly and wait for the answer before touching any file:\n\n1. The entire working directory\n2. A specific subdirectory (e.g. `src/`, `app/`, `services/billing/`)\n3. A specific file or a list of files\n\nSurface this as a single clarifying question so the user can answer in one turn. **Proceed without asking only when the scope is already unambiguous** — the user named an exact file (\"migrate `extract.py` to Sonnet 4.6\"), pointed at a specific directory (\"migrate everything under `services/billing/` to Opus 4.6\"), listed specific files (\"update `a.py` and `b.py`\"), or already answered the scope question in an earlier turn. If you can answer the question \"which files is this change going to touch?\" with a precise list from the prompt alone, proceed. If not, ask.\n\n**Worked example.** If the user says *\"Move my project to Opus 4.6. I want adaptive thinking everywhere it makes sense.\"* you do not know whether \"my project\" means the whole working directory, just `src/`, just the production code, or something else — the `everywhere` makes the intent clear (update every call site *within scope*) but the scope itself is still not defined. Do not start editing. Respond with:\n\n> Before I start editing, can you confirm the scope? I can migrate:\n> 1. Every `.py` file in the working directory\n> 2. Just the files under `src/` (production code)\n> 3. A specific subdirectory or list of files you name\n>\n> Which one?\n\nThen wait for the answer. The same applies to *\"Migrate to Opus 4.7\"* and bare *\"Help me upgrade to Sonnet 4.6\"* — ask before editing.\n\n**Sizing the scope question (large repos).** Before asking, get a per-directory count so the user can pick concretely:\n\n```sh\nrg -l \"<old-model-id>\" --type-not md | cut -d/ -f1 | sort | uniq -c | sort -rn\n```\n\nPresent the breakdown in your scope question (e.g. *\"Found 217 references across 3 directories: api/ (130), api-go/ (62), routing/ (25). Which to migrate?\"*). Also confirm `git status` is clean before surveying — unexpected modifications mean a concurrent process; stop and investigate before proceeding.\n\n---\n\n## Step 1: Classify each file\n\nNot every file that contains the old model ID is a **caller** of the API. Before editing, classify each file into one of these buckets — the right action differs:\n\n| # | Bucket | What it looks like | Action |\n|---|---|---|---|\n| 1 | **Calls the API/SDK** | `client.messages.create(model=…)`, `anthropic.Anthropic()`, request payloads | Swap the model ID **and** apply the breaking-change checklist for the target version (below). |\n| 2 | **Defines or serves the model** | Model registries, OpenAPI specs, routing/queue configs, model-policy enums, generated catalogs | The old entry **stays** (the model is still served). Ask whether to (a) add the new model alongside, (b) leave alone, or (c) retire the old model — never blind-replace. **If you can't ask, default to (a): add the new model alongside and flag it** — replacing would de-register a model that's still in production. |\n| 3 | **References the ID as an opaque string** | UI fallback constants, capability-gate substring checks, generic test fixtures, label parsers, env defaults | Usually swap the string and verify any parser/regex/substring match handles the new ID — but check the sub-cases below first. |\n| 4 | **Suffixed variant ID** | `claude-<model>-<suffix>` like `-fast`, `-1024k`, `-200k`, `[1m]`, dated snapshots | These are deployment/routing identifiers, not the public model ID. **Do not assume a new-model equivalent exists.** Verify in the registry first; if absent, leave the string alone and flag it. |\n\n**Bucket 3 sub-cases — before swapping a string reference, check:**\n\n- **Capability gate** (e.g. `if 'opus-4-6' in model_id:` enables a feature) → **add the new ID alongside**, don't replace. The old model is still served and still has the capability, so replacing would silently disable the feature for any old-model traffic that still flows through. If you know no old-model traffic will hit this gate (single-caller codebase fully migrating), replacing is fine; if unsure, add alongside.\n- **Registry-assert test** (e.g. `assert \"claude-X\" in supported_models`, `test_X_has_N_clusters`) → **add an assertion for the new model alongside; keep the old one.** The old model is still served, so its assertion stays valid — but the registry should also include the new model, so assert that too. Heuristic: if the test references multiple model versions in a list, it's a registry test; if one model in a struct compared only to itself, it's a generic fixture.\n- **Frozen / generated snapshot** → **regenerate**, don't hand-edit.\n- **Coupled to a definer** (e.g. an integration test that passes model authorization via a shared `conftest` seed list, or asserts on a billing-tier / rate-limit-group enum or a generated SKU/pricing catalog) → **verify the definer has a new-model entry first.** If not, add a seed entry (reusing the nearest existing tier as a placeholder); if you can't confidently do that, ask the user how to populate the definer. **Do not skip the test.** Swapping without populating the definer will make the test fail at runtime.\n\nWhen migrating tests specifically: breaking parameters (`temperature`, `top_p`, `budget_tokens`) are usually absent — test fixtures rarely set sampling params on placeholder models. The breaking-change scan is still required, but expect mostly clean results.\n\n**Find intentionally-flagged sync points first.** Many codebases tag spots that must change at every model launch with comment markers like `MODEL LAUNCH`, `KEEP IN SYNC`, `@model-update`, or similar. Grep for whatever convention the repo uses *before* the broad model-ID grep — those markers point at the load-bearing changes.\n\n---\n\n## Per-SDK Syntax Reference\n\nCode examples in this guide are Python. **The same fields exist in every official Anthropic SDK** — Stainless generates all 7 from the same OpenAPI spec, so JSON field names map 1:1 with only case-convention differences. Use the rows below to translate the Python examples to the SDK you are migrating.\n\n> **Verify type and method names against the SDK source before writing them into customer code.** WebFetch the relevant repository from the SDK source-code table in `shared/live-sources.md` (one row per SDK) and confirm the exact symbol — particularly for typed SDKs (Go, Java, C#) where union/builder names can differ from the JSON shape. Do not guess type names that aren't in the table below or in `<lang>/claude-api/README.md`.\n\n\n### `thinking` — `budget_tokens` → adaptive\n\n| SDK | Before | After |\n|---|---|---|\n| Python | `thinking={\"type\": \"enabled\", \"budget_tokens\": N}` | `thinking={\"type\": \"adaptive\"}` |\n| TypeScript | `thinking: { type: 'enabled', budget_tokens: N }` | `thinking: { type: 'adaptive' }` |\n| Go | `Thinking: anthropic.ThinkingConfigParamOfEnabled(N)` | `Thinking: anthropic.ThinkingConfigParamUnion{OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{}}` |\n| Ruby | `thinking: { type: \"enabled\", budget_tokens: N }` | `thinking: { type: \"adaptive\" }` |\n| Java | `.thinking(ThinkingConfigEnabled.builder().budgetTokens(N).build())` | `.thinking(ThinkingConfigAdaptive.builder().build())` |\n| C# | `Thinking = new ThinkingConfigEnabled { BudgetTokens = N }` | `Thinking = new ThinkingConfigAdaptive()` |\n| PHP | `thinking: ['type' => 'enabled', 'budget_tokens' => N]` | `thinking: ['type' => 'adaptive']` |\n\n### Sampling parameters — `temperature` / `top_p` / `top_k`\n\n(Remove the field entirely on Opus 4.7; on Claude 4.x keep at most one of `temperature` or `top_p`.)\n\n| SDK | Field(s) to remove |\n|---|---|\n| Python | `temperature=…`, `top_p=…`, `top_k=…` |\n| TypeScript | `temperature: …`, `top_p: …`, `top_k: …` |\n| Go | `Temperature: anthropic.Float(…)`, `TopP: anthropic.Float(…)`, `TopK: anthropic.Int(…)` |\n| Ruby | `temperature: …`, `top_p: …`, `top_k: …` |\n| Java | `.temperature(…)`, `.topP(…)`, `.topK(…)` |\n| C# | `Temperature = …`, `TopP = …`, `TopK = …` |\n| PHP | `temperature: …`, `topP: …`, `topK: …` |\n\n### Prefill replacement — structured outputs via `output_config.format`\n\n| SDK | Remove (last assistant turn) | Add |\n|---|---|---|\n| Python | `{\"role\": \"assistant\", \"content\": \"…\"}` | `output_config={\"format\": {\"type\": \"json_schema\", \"schema\": SCHEMA}}` |\n| TypeScript | `{ role: 'assistant', content: '…' }` | `output_config: { format: { type: 'json_schema', schema: SCHEMA } }` |\n| Go | trailing `anthropic.MessageParam{Role: \"assistant\", …}` | `OutputConfig: anthropic.OutputConfigParam{Format: anthropic.JSONOutputFormatParam{…}}` |\n| Ruby | `{ role: \"assistant\", content: \"…\" }` | `output_config: { format: { type: \"json_schema\", schema: SCHEMA } }` |\n| Java | trailing `Message.builder().role(ASSISTANT)…` | `.outputConfig(OutputConfig.builder().format(JsonOutputFormat.builder()…build()).build())` |\n| C# | trailing `new Message { Role = \"assistant\", … }` | `OutputConfig = new OutputConfig { Format = new JsonOutputFormat { … } }` |\n| PHP | trailing `['role' => 'assistant', 'content' => '…']` | `outputConfig: ['format' => ['type' => 'json_schema', 'schema' => $SCHEMA]]` |\n\n### `thinking.display` — opt back into summarized reasoning (Opus 4.7)\n\n| SDK | Add |\n|---|---|\n| Python | `thinking={\"type\": \"adaptive\", \"display\": \"summarized\"}` |\n| TypeScript | `thinking: { type: 'adaptive', display: 'summarized' }` |\n| Go | `Thinking: anthropic.ThinkingConfigParamUnion{OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{Display: anthropic.ThinkingConfigAdaptiveDisplaySummarized}}` |\n| Ruby | `thinking: { type: \"adaptive\", display: \"summarized\" }` (or `display_:` when constructing the model class directly) |\n| Java | `.thinking(ThinkingConfigAdaptive.builder().display(ThinkingConfigAdaptive.Display.SUMMARIZED).build())` |\n| C# | `Thinking = new ThinkingConfigAdaptive { Display = Display.Summarized }` |\n| PHP | `thinking: ['type' => 'adaptive', 'display' => 'summarized']` |\n\nFor any field not in these tables, the JSON key in the Python example translates directly: `snake_case` for Python/TypeScript/Ruby, `camelCase` named args for PHP, `PascalCase` struct fields for Go/C#, `camelCase` builder methods for Java.\n\n---\n\n## Explain every change you make\n\nMigration edits often look arbitrary to a user who hasn't read the release notes — a removed `temperature`, a deleted prefill, a rewritten system-prompt sentence. **For each edit, tell the user what you changed and why**, tied to the specific API or behavioral change that motivates it. Do this in your summary as you work, not just at the end.\n\nBe especially explicit about **system-prompt edits**. Users are rightly protective of their prompts, and prompt-tuning changes are judgment calls (not hard API requirements). For any prompt edit:\n\n- Quote the before and after text.\n- State the behavioral shift that motivates it (e.g. *\"Opus 4.7 calibrates response length to task complexity, so I added an explicit length instruction\"*, or *\"4.6 follows instructions more literally, so 'CRITICAL: YOU MUST use the search tool' will now overtrigger — softened to 'Use the search tool when…'\"*).\n- Make clear which prompt edits are **optional tuning** (tone, length, subagent guidance) versus which code edits are **required to avoid a 400** (sampling params, `budget_tokens`, prefills). Never present an optional prompt change as mandatory.\n\nIf you're applying several prompt-tuning edits at once, offer them as a short list the user can accept or decline item-by-item rather than silently rewriting their system prompt.\n\n---\n\n## Before You Migrate\n\n1. **Confirm the target model ID.** Use only the exact strings from `shared/models.md` — do not append date suffixes to aliases (`claude-opus-4-6`, not `claude-opus-4-6-20251101`). Guessing an ID will 404.\n2. **Check which features your code uses** with this checklist:\n   - `thinking: {type: \"enabled\", budget_tokens: N}` → migrate to adaptive thinking on Opus 4.6 / Sonnet 4.6 (still functional but deprecated)\n   - Assistant-turn prefills (`messages` ending with `role: \"assistant\"`) → must change on Opus 4.6 / Sonnet 4.6 (returns 400)\n   - `output_format` parameter on `messages.create()` → must change on all models (deprecated API-wide)\n   - `max_tokens > ~16000` → must stream on any model (above ~16K risks SDK HTTP timeouts). When streaming, Sonnet 4.6 / Haiku 4.5 cap at 64K and Opus 4.6 caps at 128K\n   - Beta headers `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `interleaved-thinking-2025-05-14` → GA on 4.6, remove them and switch from `client.beta.messages.create` to `client.messages.create`\n   - Moving Sonnet 4.5 → Sonnet 4.6 with no `effort` set → 4.6 defaults to `high`, which may change your latency/cost profile\n   - System prompts with `CRITICAL`, `MUST`, `If in doubt, use X` language → likely to overtrigger on 4.6 (see Prompt-Behavior Changes)\n   - Coming from 3.x / 4.0 / 4.1: also check sampling params (`temperature` + `top_p`), tool versions (`text_editor_20250728`), `refusal` + `model_context_window_exceeded` stop reasons, trailing-newline tool-param handling\n3. **Test on a single request first.** Run one call against the new model, inspect the response, then roll out.\n\n---\n\n## Destination Models (recommended targets)\n\n| If you're on…                         | Migrate to         | Why                                               |\n| ------------------------------------- | ------------------ | ------------------------------------------------- |\n| Opus 4.7                              | `claude-opus-4-8`  | Most capable model; same API surface as 4.7 (no new breaking changes) — mostly prompt re-tuning; see Migrating to Opus 4.8 |\n| Opus 4.6                              | `claude-opus-4-8`  | Apply the Opus 4.7 breaking changes, then the 4.8 re-tuning |\n| Opus 4.0 / 4.1 / 4.5 / Opus 3         | `claude-opus-4-8`  | Apply 4.6 → 4.7 → 4.8 in order (adaptive thinking, drop sampling params, then re-tune) |\n| Sonnet 4.0 / 4.5 / 3.7 / 3.5          | `claude-sonnet-4-6`| Best speed / intelligence balance; adaptive thinking; 64K output |\n| Haiku 3 / 3.5                         | `claude-haiku-4-5` | Fastest and most cost-effective                   |\n\nDefault to the latest Opus for the caller's tier unless they explicitly chose otherwise. The Opus migrations layer: if you're on Opus 4.6 or older, apply each version's section in order up to your target (e.g. 4.5 → 4.8 means the 4.6, 4.7, and 4.8 sections in sequence). A 4.7 → 4.8 move has no new breaking changes — see Migrating to Opus 4.8 below.\n\n---\n\n## Retired Model Replacements\n\nThese models return 404 — update immediately:\n\n| Retired model                 | Retired       | Drop-in replacement  |\n| ----------------------------- | ------------- | -------------------- |\n| `claude-3-7-sonnet-20250219`  | Feb 19, 2026  | `claude-sonnet-4-6`  |\n| `claude-3-5-haiku-20241022`   | Feb 19, 2026  | `claude-haiku-4-5`   |\n| `claude-3-opus-20240229`      | Jan 5, 2026   | `claude-opus-4-8`    |\n| `claude-3-5-sonnet-20241022`  | Oct 28, 2025  | `claude-sonnet-4-6`  |\n| `claude-3-5-sonnet-20240620`  | Oct 28, 2025  | `claude-sonnet-4-6`  |\n| `claude-3-sonnet-20240229`    | Jul 21, 2025  | `claude-sonnet-4-6`  |\n| `claude-2.1`, `claude-2.0`    | Jul 21, 2025  | `claude-sonnet-4-6`  |\n\n## Deprecated Models (retiring soon)\n\n| Model                         | Retires       | Replacement          |\n| ----------------------------- | ------------- | -------------------- |\n| `claude-3-haiku-20240307`     | Apr 19, 2026  | `claude-haiku-4-5`   |\n| `claude-opus-4-20250514`      | June 15, 2026 | `claude-opus-4-8`    |\n| `claude-sonnet-4-20250514`    | June 15, 2026 | `claude-sonnet-4-6`  |\n\n---\n\n## Breaking Changes by Source Model\n\n### Migrating from Sonnet 4.5 to Sonnet 4.6 (effort default change)\n\nSonnet 4.5 had no `effort` parameter; Sonnet 4.6 defaults to `high`. If you just switch the model string and do nothing else, you may see noticeably higher latency and token usage. Set `effort` explicitly.\n\n**Recommended starting points:**\n\n| Workload                                          | Start at       | Notes                                                                                                    |\n| ------------------------------------------------- | -------------- | -------------------------------------------------------------------------------------------------------- |\n| Chat, classification, content generation          | `low`          | With `thinking: {\"type\": \"disabled\"}` you'll see similar or better performance vs. Sonnet 4.5 no-thinking |\n| Most applications (balanced)                      | `medium`       | The default sweet spot for quality vs. cost                                                              |\n| Agentic coding, tool-heavy workflows              | `medium`       | Pair with adaptive thinking and a generous `max_tokens` (up to 64K with streaming — Sonnet 4.6's ceiling) |\n| Autonomous multi-step agents, long-horizon loops  | `high`         | Scale down to `medium` if latency/tokens become a concern                                                 |\n| Computer-use agents                               | `high` + adaptive | Sonnet 4.6's best computer-use accuracy is on adaptive + high                                          |\n\nFor non-thinking chat workloads specifically:\n\n```python\nclient.messages.create(\n    model=\"claude-sonnet-4-6\",\n    max_tokens=8192,\n    thinking={\"type\": \"disabled\"},\n    output_config={\"effort\": \"low\"},\n    messages=[{\"role\": \"user\", \"content\": \"...\"}],\n)\n```\n\n**When to use Opus 4.6 instead:** hardest and longest-horizon problems — large code migrations, deep research, extended autonomous work. Sonnet 4.6 wins on fast turnaround and cost efficiency.\n\n### Migrating to Opus 4.6 / Sonnet 4.6 (from any older model)\n\n**1. Manual extended thinking is deprecated — use adaptive thinking.**\n\n`thinking: {type: \"enabled\", budget_tokens: N}` (manual extended thinking with a fixed token budget) is deprecated on Opus 4.6 and Sonnet 4.6. Replace it with `thinking: {type: \"adaptive\"}`, which lets Claude decide when and how much to think. Adaptive thinking also enables interleaved thinking automatically (no beta header needed).\n\n```python\n# Old (still works on older models, deprecated on 4.6)\nresponse = client.messages.create(\n    model=\"claude-sonnet-4-5\",\n    max_tokens=16000,\n    thinking={\"type\": \"enabled\", \"budget_tokens\": 8000},\n    messages=[...]\n)\n\n# New (Opus 4.6 / Sonnet 4.6)\nresponse = client.messages.create(\n    model=\"claude-opus-4-6\",  # or \"claude-sonnet-4-6\"\n    max_tokens=16000,\n    thinking={\"type\": \"adaptive\"},\n    output_config={\"effort\": \"high\"},  # optional: low | medium | high | max\n    messages=[...]\n)\n```\n\nAdaptive thinking is the long-term target, and on internal evaluations it outperforms manual extended thinking. Move when you can.\n\n**Transitional escape hatch:** manual extended thinking is still *functional* on Opus 4.6 and Sonnet 4.6 (deprecated, will be removed in a future release). If you need a hard ceiling while migrating — for example, to bound token spend on a runaway workload before you've tuned `effort` — you can keep `budget_tokens` around alongside an explicit `effort` value, then remove it in a follow-up. `budget_tokens` must be strictly less than `max_tokens`:\n\n```python\n# Transitional only — deprecated, plan to remove\nclient.messages.create(\n    model=\"claude-sonnet-4-6\",\n    max_tokens=16384,\n    thinking={\"type\": \"enabled\", \"budget_tokens\": 8192},  # must be < max_tokens\n    output_config={\"effort\": \"medium\"},\n    messages=[...],\n)\n```\n\nIf the user asks for a \"thinking budget\" on 4.6, the preferred answer is `effort` — use `low`, `medium`, `high`, or `max` (Opus-tier only — not Sonnet or Haiku) rather than a token count.\n\n**2. Effort parameter (Opus 4.5, Opus 4.6, Sonnet 4.6 only).**\n\nControls thinking depth and overall token spend. Goes inside `output_config`, not top-level. Default is `high`. `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Errors on Sonnet 4.5 and Haiku 4.5.\n\n```python\noutput_config={\"effort\": \"medium\"}  # often the best cost / quality balance\n```\n\n### Migrating to the 4.6 family (Opus 4.6 and Sonnet 4.6)\n\n**3. Assistant-turn prefills return 400 (Opus 4.6 and Sonnet 4.6).**\n\nPrefilled responses on the final assistant turn are no longer supported on either Opus 4.6 or Sonnet 4.6 — both return a 400. Adding assistant messages *elsewhere* in the conversation (e.g., for few-shot examples) still works. Pick the replacement that matches what the prefill was doing:\n\n| Prefill was used for                               | Replacement                                                                                                                               |\n| -------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |\n| Forcing JSON / YAML / schema output                | `output_config.format` with a `json_schema` — see example below                                                                           |\n| Forcing a classification label                     | Tool with an enum field containing valid labels, or structured outputs                                                                    |\n| Skipping preambles (`Here is the summary:\\n`)      | System prompt instruction: *\"Respond directly without preamble. Do not start with phrases like 'Here is...' or 'Based on...'.\"*           |\n| Steering around bad refusals                       | Usually no longer needed — 4.6 refuses far more appropriately. Plain user-turn prompting is sufficient.                                   |\n| Continuing an interrupted response                 | Move continuation into the user turn: *\"Your previous response was interrupted and ended with `[last text]`. Continue from there.\"*     |\n| Injecting reminders / context hydration            | Inject into the user turn instead. For complex agent harnesses, expose context via a tool call or during compaction.                      |\n\n```python\n# Old (fails on Opus 4.6 / Sonnet 4.6) — prefill forcing JSON shape\nmessages=[\n    {\"role\": \"user\", \"content\": \"Extract the name.\"},\n    {\"role\": \"assistant\", \"content\": \"{\\\"name\\\": \\\"\"},\n]\n\n# New — structured outputs replace the prefill\nresponse = client.messages.create(\n    model=\"claude-opus-4-6\",\n    max_tokens=1024,\n    output_config={\"format\": {\"type\": \"json_schema\", \"schema\": {...}}},\n    messages=[{\"role\": \"user\", \"content\": \"Extract the name.\"}],\n)\n```\n\n**4. Stream for `max_tokens > ~16K` (all models); Opus 4.6 alone reaches 128K.**\n\nNon-streaming requests hit SDK HTTP timeouts at high `max_tokens`, regardless of model — stream for anything above ~16K output. The streamable ceiling differs by model: Sonnet 4.6 and Haiku 4.5 cap at 64K, and Opus 4.6 alone goes up to 128K.\n\n```python\nwith client.messages.stream(model=\"claude-opus-4-6\", max_tokens=64000, ...) as stream:\n    message = stream.get_final_message()\n```\n\n**5. Tool-call JSON escaping may differ (Opus 4.6 and Sonnet 4.6).**\n\nBoth 4.6 models can produce tool call `input` fields with Unicode or forward-slash escaping. Always parse with `json.loads()` / `JSON.parse()` — never raw-string-match the serialized input.\n\n### All models\n\n**6. `output_format` → `output_config.format` (API-wide).**\n\nThe old top-level `output_format` parameter on `messages.create()` is deprecated. Use `output_config.format` instead. This is not 4.6-specific — applies to every model.\n\n---\n\n## Beta Headers to Remove on 4.6\n\nSeveral beta headers that were required on 4.5 are now GA on 4.6 and should be removed. Leaving them in is harmless but misleading; removing them also lets you move from `client.beta.messages.create(...)` back to `client.messages.create(...)`.\n\n| Header                                    | Status on 4.6                                              | Action                                                  |\n| ----------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------- |\n| `effort-2025-11-24`                       | Effort parameter is GA                                     | Remove                                                  |\n| `fine-grained-tool-streaming-2025-05-14`  | GA                                                         | Remove                                                  |\n| `interleaved-thinking-2025-05-14`         | Adaptive thinking enables interleaved thinking automatically | Remove when using adaptive thinking; still functional on Sonnet 4.6 *with* manual extended thinking, but that path is deprecated |\n| `token-efficient-tools-2025-02-19`        | Built in to all Claude 4+ models                           | Remove (no effect)                                      |\n| `output-128k-2025-02-19`                  | Built in to Claude 4+ models                               | Remove (no effect)                                      |\n\nOnce you remove all of these and finish moving to adaptive thinking, you can switch the SDK call site from the beta namespace back to the regular one:\n\n```python\n# Before\nresponse = client.beta.messages.create(\n    model=\"claude-opus-4-5\",\n    betas=[\"interleaved-thinking-2025-05-14\", \"effort-2025-11-24\"],\n    ...\n)\n\n# After\nresponse = client.messages.create(\n    model=\"claude-opus-4-6\",\n    thinking={\"type\": \"adaptive\"},\n    output_config={\"effort\": \"high\"},\n    ...\n)\n```\n\n---\n\n## Additional Changes When Coming from 3.x / 4.0 / 4.1 → 4.6\n\nIf you're jumping from Opus 4.1, Sonnet 4, Sonnet 3.7, or an older Claude 3.x model directly to 4.6, apply everything above *plus* the items in this section. Users already on Opus 4.5 / Sonnet 4.5 can skip this.\n\n**1. Sampling parameters: `temperature` OR `top_p`, not both.**\n\nPassing both will error on every Claude 4+ model:\n\n```python\n# Old (3.x only — errors on 4+)\nclient.messages.create(temperature=0.7, top_p=0.9, ...)\n\n# New\nclient.messages.create(temperature=0.7, ...)  # or top_p, not both\n```\n\n**2. Update tool versions.**\n\nLegacy tool versions are not supported on 4+. **Both the `type` and the `name` field change** — `text_editor_20250728` and `str_replace_based_edit_tool` are a pair; updating one without the other 400s. Also remove the `undo_edit` command from your text-editor integration:\n\n| Old                                               | New                                                     |\n| ------------------------------------------------- | ------------------------------------------------------- |\n| `text_editor_20250124` + `str_replace_editor`     | `text_editor_20250728` + `str_replace_based_edit_tool`  |\n| `code_execution_*` (earlier versions)             | `code_execution_20250825`                               |\n| `undo_edit` command                               | *(no longer supported — delete call sites)*             |\n\n```python\n# Before\ntools = [{\"type\": \"text_editor_20250124\", \"name\": \"str_replace_editor\"}]\n\n# After — BOTH fields change\ntools = [{\"type\": \"text_editor_20250728\", \"name\": \"str_replace_based_edit_tool\"}]\n```\n\n**3. Handle the `refusal` stop reason.**\n\nClaude 4+ can return `stop_reason: \"refusal\"` on the response. If your code only handles `end_turn` / `tool_use` / `max_tokens`, add a branch:\n\n```python\nif response.stop_reason == \"refusal\":\n    # Surface the refusal to the user; do not retry with the same prompt\n    ...\n```\n\n**4. Handle the `model_context_window_exceeded` stop reason (4.5+).**\n\nDistinct from `max_tokens`: it means the model hit the *context window* limit, not the requested output cap. Handle both:\n\n```python\nif response.stop_reason == \"model_context_window_exceeded\":\n    # Context window exhausted — compact or split the conversation\n    ...\nelif response.stop_reason == \"max_tokens\":\n    # Requested output cap hit — retry with higher max_tokens or stream\n    ...\n```\n\n**5. Trailing newlines preserved in tool call string parameters (4.5+).**\n\n4.5 and 4.6 preserve trailing newlines that older models stripped. If your tool implementations do exact string matching against tool-call `input` values (e.g., `if name == \"foo\"`), verify they still match when the model sends `\"foo\\n\"`. Normalizing with `.rstrip()` on the receiving side is usually the simplest fix.\n\n**6. Haiku: rate limits reset between generations.**\n\nHaiku 4.5 has its own rate-limit pool separate from Haiku 3 / 3.5. If you're ramping traffic as you migrate, check your tier's Haiku 4.5 limits at [API rate limits](https://platform.claude.com/docs/en/api/rate-limits) — a quota that comfortably served Haiku 3.5 traffic may need a tier bump for the same volume on 4.5.\n\n---\n\n## Prompt-Behavior Changes (Opus 4.5 / 4.6, Sonnet 4.6)\n\nThese don't break your code, but prompts that worked on 4.5-and-earlier may over- or under-trigger on 4.6. Tune as needed.\n\n**1. Aggressive instructions cause overtriggering.** Opus 4.5 and 4.6 follow the system prompt much more closely than earlier models. Prompts written to *overcome* the old reluctance are now too aggressive:\n\n| Before (worked on 4.0 / 4.5)                | After (use on 4.6)                        |\n| ------------------------------------------- | ----------------------------------------- |\n| `CRITICAL: You MUST use this tool when...`  | `Use this tool when...`                   |\n| `Default to using [tool]`                   | `Use [tool] when it would improve X`      |\n| `If in doubt, use [tool]`                   | *(delete — no longer needed)*             |\n\nIf the model is now overtriggering a tool or skill, the fix is almost always to dial back the language, not to add more guardrails.\n\n**2. Overthinking and excessive exploration (Opus 4.6).** At higher `effort` settings, Opus 4.6 explores more before answering. If that burns too many thinking tokens, lower `effort` first (`medium` is often the sweet spot) before adding prose instructions to constrain reasoning.\n\n**3. Overeager subagent spawning (Opus 4.6).** Opus 4.6 has a strong preference for delegating to subagents. If you see it spawning a subagent for something a direct `grep` or `read` would solve, add guidance: *\"Use subagents only for parallel or independent workstreams. For single-file reads or sequential operations, work directly.\"*\n\n**4. Overengineering (Opus 4.5 / 4.6).** Both models may add extra files, abstractions, or defensive error handling beyond what was asked. If you want minimal changes, prompt for it explicitly: *\"Only make changes directly requested. Don't add helpers, abstractions, or error handling for scenarios that can't happen.\"*\n\n**5. LaTeX math output (Opus 4.6).** Opus 4.6 defaults to LaTeX (`\\frac{}{}`, `$...$`) for math and technical content. If you need plain text, instruct it explicitly: *\"Format all math as plain text — no LaTeX, no `$`, no `\\frac{}{}`. Use `/` for division and `^` for exponents.\"*\n\n**6. Skipped verbal summaries (4.6 family).** The 4.6 models are more concise and may skip the summary paragraph after a tool call, jumping straight to the next action. If you rely on those summaries for visibility, add: *\"After completing a task that involves tool use, provide a brief summary of what you did.\"*\n\n**7. \"Think\" as a trigger word (Opus 4.5 with thinking disabled).** When `thinking` is off, Opus 4.5 is particularly sensitive to the word *think* and may reason more than you want. Use `consider`, `evaluate`, or `reason through` instead.\n\n---\n\n## Model-ID Rename Quick Reference\n\n| Old string (migration source)  | New string         |\n| ------------------------------ | ------------------ |\n| `claude-opus-4-7`              | `claude-opus-4-8`  |\n| `claude-opus-4-6`              | `claude-opus-4-8`  |\n| `claude-opus-4-5`              | `claude-opus-4-8`  |\n| `claude-opus-4-1`              | `claude-opus-4-8`  |\n| `claude-opus-4-0`              | `claude-opus-4-8`  |\n| `claude-sonnet-4-5`            | `claude-sonnet-4-6`|\n| `claude-sonnet-4-0`            | `claude-sonnet-4-6`|\n\nOlder aliases (`claude-opus-4-7`, `claude-opus-4-6`, `claude-opus-4-5`, `claude-sonnet-4-5`, etc.) are still active and can be pinned if you need time before upgrading — see `shared/models.md` for the full legacy list.\n\n### Amazon Bedrock model IDs\n\nIf the code uses the `AnthropicBedrockMantle` client (Python `anthropic[bedrock]`, TypeScript `@anthropic-ai/bedrock-sdk`, Java `BedrockMantleBackend`, Go `bedrock.NewMantleClient`, etc.) or targets `https://bedrock-mantle.{region}.api.aws/anthropic`, it is running on **Claude in Amazon Bedrock**. All breaking changes in this guide apply unchanged there — it serves the same Messages API shape — but model IDs carry an `anthropic.` provider prefix:\n\n| First-party ID | Bedrock ID |\n|---|---|\n| `claude-opus-4-8` | `anthropic.claude-opus-4-8` |\n| `claude-opus-4-7` | `anthropic.claude-opus-4-7` |\n| `claude-haiku-4-5` | `anthropic.claude-haiku-4-5` |\n\nWhen migrating a Bedrock file, apply the same rename-table row as first-party, then keep/add the `anthropic.` prefix. Do **not** generate a first-party `claude-*` ID for a Bedrock client — it will 400.\n\n**Skip for Bedrock:** the `code_execution_*` tool-version checklist item and the **Task Budgets** section — both are first-party-only features (Bedrock does not support server-side Anthropic tools or the `task-budgets-2026-03-13` beta). Everything else in this guide — `effort`, adaptive/extended thinking, `output_config.format`, `thinking.display`, fine-grained tool streaming, token counting — is available on Bedrock.\n\n> **Out of scope:** the legacy Amazon Bedrock integration (`InvokeModel` / `Converse` APIs with ARN-versioned IDs like `anthropic.claude-3-5-sonnet-20241022-v2:0`) uses a different request shape and model-ID format. This guide does not cover it; WebFetch the Bedrock page in `shared/live-sources.md` if the user is migrating between the two Bedrock integrations.\n\n### Claude Platform on AWS\n\nIf the code uses `AnthropicAWS` / `AnthropicAws` / `anthropicaws.NewClient` / `AnthropicAwsClient` (or targets `https://aws-external-anthropic.{region}.api.aws`), it is running on **Claude Platform on AWS** — Anthropic-operated, same-day API parity. Model IDs are **bare first-party** strings; apply the rename table above **verbatim** and every breaking-change section in this guide unchanged. There is nothing to skip. Do **not** add an `anthropic.` prefix (that's Amazon Bedrock, a separate offering). See `shared/claude-platform-on-aws.md` for client/auth details.\n\n---\n\n## Migration Checklist\n\nEvery item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, silent timeout, or wrong tool selection if missed — apply these as code edits, not as suggestions. **`[TUNE]`** items are quality/cost adjustments.\n\nFor each file that calls `messages.create()` / equivalent SDK method:\n\n- [ ] **[BLOCKS]** Update the `model=` string to the new alias\n- [ ] **[BLOCKS]** Replace `budget_tokens` with `thinking={\"type\": \"adaptive\"}` (deprecated on Opus 4.6 / Sonnet 4.6)\n- [ ] **[BLOCKS]** Move `format` from top-level `output_format` into `output_config.format`\n- [ ] **[BLOCKS]** Remove any assistant-turn prefills if targeting Opus 4.6 or Sonnet 4.6 (see the prefill replacement table)\n- [ ] **[BLOCKS]** Switch to streaming if `max_tokens > ~16000` (otherwise SDK HTTP timeout)\n- [ ] **[TUNE]** Verify tool-input handling parses JSON rather than raw-string-matching the serialized input (4.6 may escape Unicode / forward slashes differently; most SDKs already expose `block.input` as a parsed object)\n- [ ] **[TUNE]** Set `output_config={\"effort\": \"...\"}` explicitly — especially when moving Sonnet 4.5 → Sonnet 4.6 (4.6 defaults to `high`)\n- [ ] **[TUNE]** Remove GA beta headers: `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `token-efficient-tools-2025-02-19`, `output-128k-2025-02-19`; remove `interleaved-thinking-2025-05-14` once on adaptive thinking\n- [ ] **[TUNE]** Switch `client.beta.messages.create(...)` → `client.messages.create(...)` once all betas are removed\n- [ ] **[TUNE]** Review system prompt for aggressive tool language (`CRITICAL:`, `MUST`, `If in doubt`) and dial it back\n\n**Extra items when coming from 3.x / 4.0 / 4.1:**\n- [ ] **[BLOCKS]** Remove either `temperature` or `top_p` (passing both 400s on Claude 4+)\n- [ ] **[BLOCKS]** Update text-editor tool `type` to `text_editor_20250728`\n- [ ] **[BLOCKS]** Update text-editor tool `name` to `str_replace_based_edit_tool` — **changing only the `type` and keeping `name: \"str_replace_editor\"` returns a 400**\n- [ ] **[BLOCKS]** Update code-execution tool to `code_execution_20250825`\n- [ ] **[BLOCKS]** Delete any `undo_edit` command call sites\n- [ ] **[TUNE]** Add handling for `stop_reason == \"refusal\"`\n- [ ] **[TUNE]** Add handling for `stop_reason == \"model_context_window_exceeded\"` (4.5+)\n- [ ] **[TUNE]** Verify tool-param string matching tolerates trailing newlines (preserved on 4.5+)\n- [ ] **[TUNE]** If moving to Haiku 4.5: review rate-limit tier (separate pool from Haiku 3.x)\n\n**Verification:**\n- [ ] Run one test request and inspect `response.stop_reason`, `response.usage`, and whether tool-use / thinking behavior matches expectations\n\nFor cached prompts: the render order and hash inputs did not change, so existing `cache_control` breakpoints keep working. However, **changing the model string invalidates the existing cache** — the first request on the new model will write the cache fresh.\n\n---\n\n## Migrating to Opus 4.7\n\n> **Model ID `claude-opus-4-7` is authoritative as written here.** When the user asks to migrate to Opus 4.7, write `model=\"claude-opus-4-7\"` exactly. Do **not** WebFetch to verify — this guide is the source of truth for migration target IDs. The corresponding entry exists in `shared/models.md`.\n\nClaude Opus 4.7 was Anthropic's most capable model at its launch and is now the previous-generation Opus (Opus 4.8 is current — see Migrating to Opus 4.8 below). It is highly autonomous and performs exceptionally well on long-horizon agentic work, knowledge work, vision tasks, and memory tasks. This section summarizes everything that was new at the 4.7 launch and remains the layered breaking-change path for callers coming from Opus 4.6 or older. It is layered on top of the 4.6 migration above — if the caller is jumping from Opus 4.5 or older, apply the 4.6 changes first, then this section, then the 4.8 section.\n\n**TL;DR for someone already on Opus 4.6:** update the model ID to `claude-opus-4-7`, strip any remaining `budget_tokens` and sampling parameters (both 400 on Opus 4.7), give `max_tokens` extra headroom and re-baseline with `count_tokens()` against the new model, opt back into `thinking.display: \"summarized\"` if reasoning is surfaced to users, and re-tune `effort` — it matters more on 4.7 than on any prior Opus.\n\n### Breaking changes (will 400 on Opus 4.7)\n\n**Extended thinking removed.**\n\n`thinking: {type: \"enabled\", budget_tokens: N}` is no longer supported on Claude Opus 4.7 or later models and returns a 400 error. Switch to adaptive thinking (`thinking: {type: \"adaptive\"}`) and use the effort parameter to control thinking depth. Adaptive thinking is **off by default** on Claude Opus 4.7: requests with no `thinking` field run without thinking, matching Opus 4.6 behavior. Set `thinking: {type: \"adaptive\"}` explicitly to enable it.\n\n```python\n# Before (Opus 4.6)\nclient.messages.create(\n    model=\"claude-opus-4-6\",\n    max_tokens=64000,\n    thinking={\"type\": \"enabled\", \"budget_tokens\": 32000},\n    messages=[{\"role\": \"user\", \"content\": \"...\"}],\n)\n\n# After (Opus 4.7)\nclient.messages.create(\n    model=\"claude-opus-4-7\",\n    max_tokens=64000,\n    thinking={\"type\": \"adaptive\"},\n    output_config={\"effort\": \"high\"},  # or \"max\", \"xhigh\", \"medium\", \"low\"\n    messages=[{\"role\": \"user\", \"content\": \"...\"}],\n)\n```\n\nIf the caller wasn't using extended thinking, no change is required — thinking is off by default, or can be set explicitly with `thinking={\"type\": \"disabled\"}`.\n\nDelete `budget_tokens` plumbing entirely. For the replacement `effort` value, see **Choosing an effort level on Opus 4.7** below — there is no exact 1:1 mapping from `budget_tokens`.\n\n**Sampling parameters removed.**\n\nThe `temperature`, `top_p`, and `top_k` parameters are no longer accepted on Claude Opus 4.7. Requests that include them return a 400 error. Remove these fields from your request payloads. Prompting is the recommended way to guide model behavior on Claude Opus 4.7. If you were using `temperature = 0` for determinism, note that it never guaranteed identical outputs on prior models.\n\n```python\n# Before — errors on Opus 4.7\nclient.messages.create(temperature=0.7, top_p=0.9, ...)\n\n# After\nclient.messages.create(...)  # no sampling params\n```\n\n- **If the intent was determinism** — use `effort: \"low\"` with a tighter prompt.\n- **If the intent was creative variance** — the prompt replacement depends on the use case; **ask the user** how they want variance elicited. If you can't ask, add a use-case-appropriate instruction along the lines of *\"choose something off-distribution and interesting\"* — e.g. for text generation, *\"Vary your phrasing and structure across responses\"*; for frontend/design, use the propose-4-directions approach under **Design and frontend coding** below.\n\n### Choosing an effort level on Opus 4.7\n\n`budget_tokens` controlled how much to *think*; `effort` controls how much to think *and* act, so there is no exact 1:1 mapping. **Use `xhigh` for best results in coding and agentic use cases, and a minimum of `high` for most intelligence-sensitive use cases.** Experiment with other levels to further tune token usage and intelligence:\n\n| Level | Use when | Notes |\n| --- | --- | --- |\n| `max` | Intelligence-demanding tasks worth testing at the ceiling | Can deliver gains in some use cases but may show diminishing returns from increased token usage; can be prone to overthinking |\n| `xhigh` | **Most coding and agentic use cases** | The best setting for these; used as the default in Claude Code |\n| `high` | Intelligence-sensitive use cases generally | Balances token usage and intelligence; recommended minimum for most intelligence-sensitive work |\n| `medium` | Cost-sensitive use cases that need to reduce token usage while trading off intelligence | |\n| `low` | Short, scoped tasks and latency-sensitive workloads that are not intelligence-sensitive | |\n\n### Silent default changes (no error, but behavior differs)\n\n**Thinking content omitted by default.**\n\nThinking blocks still appear in the response stream on Claude Opus 4.7, but their `thinking` field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was to return summarized thinking text. To restore summarized thinking content on Claude Opus 4.7, set `thinking.display` to `\"summarized\"`. **The block-field name is unchanged** — it is still `block.thinking` on a `thinking`-type block; do not rename it.\n\n**Detect this:** any code that reads `block.thinking` (or equivalent) from a `thinking`-type block and renders it in a UI, log, or trace. **The fix is the request parameter, not the response handling** — add `display: \"summarized\"` to the `thinking` parameter:\n\n```python\nthinking={\"type\": \"adaptive\", \"display\": \"summarized\"}  # \"display\" is new on Opus 4.7; values: \"omitted\" (default) | \"summarized\"\n```\n\nThe default is `\"omitted\"` on Claude Opus 4.7. If thinking content was never surfaced anywhere, no change needed. If your product streams reasoning to users, the new default appears as a long pause before output begins; set `display: \"summarized\"` to restore visible progress during thinking.\n\n**Updated token counting.**\n\nClaude Opus 4.7 and Claude Opus 4.6 count tokens differently. The same input text produces a higher token count on Claude Opus 4.7 than on Claude Opus 4.6, and `/v1/messages/count_tokens` will return a different number of tokens for Claude Opus 4.7 than it did for Claude Opus 4.6. The token efficiency of Claude Opus 4.7 can vary by workload shape. Prompting interventions, `task_budget`, and `effort` can help control costs and ensure appropriate token usage. Keep in mind that these controls may trade off model intelligence. **Update your `max_tokens` parameters to give additional headroom, including compaction triggers.** Claude Opus 4.7 provides a 1M context window at standard API pricing with no long-context premium.\n\nWhat else to check:\n\n- Client-side token estimators (tiktoken-style approximations) calibrated against 4.6\n- Cost calculators that multiply tokens by a fixed per-token rate\n- Rate-limit retry thresholds keyed to measured token counts\n\nRe-baseline by re-running `client.messages.count_tokens()` against `claude-opus-4-7` on a representative sample of the caller's prompts. Do not apply a blanket multiplier. For cost-sensitive workloads, consider reducing `effort` by one level (e.g. `high` → `medium`). For agentic loops, consider adopting Task Budgets (below).\n\n### New feature: Task Budgets (beta)\n\nOpus 4.7 introduces **task budgets** — tell Claude how many tokens it has for a full agentic loop (thinking + tool calls + final output). The model sees a running countdown and uses it to prioritize work and wrap up gracefully as the budget is consumed.\n\nThis is a **suggestion the model is aware of**, not a hard cap. It is distinct from `max_tokens`, which remains the enforced per-response limit and is *not* surfaced to the model. Use `task_budget` when you want the model to self-moderate; use `max_tokens` as a hard ceiling to cap usage.\n\nRequires beta header `task-budgets-2026-03-13`:\n\n```python\nclient.beta.messages.create(\n    betas=[\"task-budgets-2026-03-13\"],\n    model=\"claude-opus-4-7\",\n    max_tokens=64000,\n    thinking={\"type\": \"adaptive\"},\n    output_config={\n        \"effort\": \"high\",\n        \"task_budget\": {\"type\": \"tokens\", \"total\": 128000},\n    },\n    messages=[...],\n)\n```\n\nSet a generous budget for open-ended agentic tasks and tighten it for latency-sensitive ones. **Minimum `task_budget.total` is 20,000 tokens.** If the budget is too restrictive for the task, the model may complete it less thoroughly, referencing its budget as the constraint. **Do not add `task_budget` during a migration unless you are sure the budget value is right** — if you can run the workload and measure, do so; otherwise ask the user for the value rather than guessing. This is the primary lever for offsetting the token-counting shift on agentic workloads.\n\n### Capability improvements\n\n**High-resolution vision.** Opus 4.7 is the first Claude model with high-resolution image support. Maximum image resolution is **2576 pixels on the long edge** (up from 1568px on Opus 4.6 and prior). This unlocks gains on vision-heavy workloads, especially computer use and screenshot/artifact/document understanding. Coordinates returned by the model now map 1:1 to actual image pixels, so no scale-factor math is needed.\n\nHigh-res support is **automatic on Opus 4.7** — no beta header, no client-side opt-in required. The model accepts larger inputs and returns pixel-accurate coordinates out of the box.\n\n**Token cost.** Full-resolution images on Opus 4.7 can use up to ~3× more image tokens than on prior models (up to ~4784 tokens per image, vs. the previous ~1,600-token cap). If the extra fidelity isn't needed, downsample client-side before sending to control cost — but **do not add downsampling by default during a migration**. If you're not sure whether the pipeline needs the fidelity, ask the user rather than guessing. Use `count_tokens()` on representative images on Opus 4.7 to re-baseline before reacting to any measured cost shift.\n\nBeyond resolution, Opus 4.7 also improves on low-level perception (pointing, measuring, counting) and natural-image bounding-box localization and detection.\n\n**Knowledge work.** Meaningful gains on tasks where the model visually verifies its own output — `.docx` redlining, `.pptx` editing, and programmatic chart/figure analysis (e.g. pixel-level data transcription via image-processing libraries). If prompts have scaffolding like *\"double-check the slide layout before returning\"*, try removing it and re-baselining.\n\n**Memory.** Opus 4.7 is better at writing and using file-system-based memory. If an agent maintains a scratchpad, notes file, or structured memory store across turns, that agent should improve at jotting down notes to itself and leveraging its notes in future tasks.\n\n**User-facing progress updates.** Opus 4.7 provides more regular, higher-quality interim updates during long agentic traces. If the system prompt has scaffolding like *\"After every 3 tool calls, summarize progress\"*, try removing it to avoid excessive user-facing text. If the length or contents of Opus 4.7's updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples.\n\n### Real-time cybersecurity safeguards\n\nRequests that involve prohibited or high-risk topics may lead to refusals.\n\n### Fast Mode: not available on Opus 4.7\n\nOpus 4.7 does not have a Fast Mode variant. **Opus 4.6 Fast remains supported**. Only surface this if the caller's code actually uses a Fast Mode model string (e.g. `claude-opus-4-6-fast`); if the word \"fast\" does not appear in the code, say nothing about Fast Mode.\n\nWhen you see `model=\"claude-opus-4-6-fast\"` (or similar), **the migration edit is**:\n\n```python\n# Opus 4.7 has no Fast Mode — keeping on 4.6 Fast (caller's choice to switch to standard Opus 4.7).\nmodel=\"claude-opus-4-6-fast\",\n```\n\nThat is: leave the model string **unchanged**, add the comment above it, and tell the user their two options — (a) stay on Opus 4.6 Fast, which remains supported, or (b) move latency-tolerant traffic to standard Opus 4.7 for the intelligence gain. Do **not** rewrite the model string to `claude-opus-4-7` yourself; that silently trades latency for intelligence, which is the caller's decision.\n\n### Behavioral shifts (prompt-tunable)\n\nThese don't break anything, but prompts tuned for Opus 4.6 may land differently. Opus 4.7 is more steerable than 4.6, so small prompt nudges usually close the gap.\n\n**More literal instruction following.** Claude Opus 4.7 interprets prompts more literally and explicitly than Claude Opus 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make. The upside of this literalism is precision and less thrash. It generally performs better for API use cases with carefully tuned prompts, structured extraction, and pipelines where you want predictable behavior. A prompt and harness review may be especially helpful for migration to Claude Opus 4.7.\n\n**Verbosity calibrates to task complexity.** Opus 4.7 scales response length to how complex it judges the task to be, rather than defaulting to a fixed verbosity — shorter answers on simple lookups, much longer on open-ended analysis. If the product depends on a particular length or style, tune the prompt explicitly. To reduce verbosity:\n\n> *\"Provide concise, focused responses. Skip non-essential context, and keep examples minimal.\"*\n\nIf you see specific kinds of over-verbosity (e.g. over-explaining), add instructions targeting those. Positive examples showing the desired level of concision tend to be more effective than negative examples or instructions telling the model what not to do. Do **not** assume existing \"be concise\" instructions should be removed — test first.\n\n**Tone and writing style.** Opus 4.7 is more direct and opinionated, with less validation-forward phrasing and fewer emoji than Opus 4.6's warmer style. As with any new model, prose style on long-form writing may shift. If the product relies on a specific voice, re-evaluate style prompts against the new baseline. If a warmer or more conversational voice is wanted, specify it:\n\n> *\"Use a warm, collaborative tone. Acknowledge the user's framing before answering.\"*\n\n**`effort` matters more than on any prior Opus.** Opus 4.7 respects `effort` levels more strictly, especially at the low end. At `low` and `medium` it scopes work to what was asked rather than going above and beyond — good for latency and cost, but on moderate tasks at `low` there is some risk of under-thinking.\n\n- If shallow reasoning shows up on complex problems, raise `effort` to `high` or `xhigh` rather than prompting around it.\n- If `effort` must stay `low` for latency, add targeted guidance: *\"This task involves multi-step reasoning. Think carefully through the problem before responding.\"*\n- **At `xhigh` or `max`, set a large `max_tokens`** so the model has room to think and act across tool calls and subagents. Start at 64K and tune from there. (`xhigh` is a new effort level on Opus 4.7, between `high` and `max`.)\n\nAdaptive-thinking triggering is also steerable. If the model thinks more often than wanted — which can happen with large or complex system prompts — add: *\"Thinking adds latency and should only be used when it will meaningfully improve answer quality — typically for problems that require multi-step reasoning. When in doubt, respond directly.\"*\n\n**Uses tools less often by default.** Opus 4.7 tends to use tools less often than 4.6 and to use reasoning more. This produces better results in most cases, but for products that rely on tools (search/retrieval, function-calling, computer-use steps), it can drop tool-use rate. Two levers:\n\n- **Raise `effort`** — `high` or `xhigh` show substantially more tool usage in agentic search and coding, and are especially useful for knowledge work.\n- **Prompt for it** — be explicit in tool descriptions or the system prompt about when and how to use the tool, and encourage the model to err on the side of using it more often:\n\n> *\"When the answer depends on information not present in the conversation, you MUST call the `search` tool before answering — do not answer from prior knowledge.\"*\n\n**Fewer subagents by default.** Opus 4.7 tends to spawn fewer subagents than 4.6. This is steerable — give explicit guidance on when delegation is desirable. For a coding agent, for example:\n\n> *\"Do NOT spawn a subagent for work you can complete directly in a single response (e.g. refactoring a function you can already see). Spawn multiple subagents in the same turn when fanning out across items or reading multiple files.\"*\n\n**Design and frontend coding.** Opus 4.7 has stronger design instincts than 4.6, with a consistent default house style: warm cream/off-white backgrounds (around `#F4F1EA`), serif display type (Georgia, Fraunces, Playfair), italic word-accents, and a terracotta/amber accent. This reads well for editorial, hospitality, and portfolio briefs, but will feel off for dashboards, dev tools, fintech, healthcare, or enterprise apps — and it appears in slide decks as well as web UIs.\n\nThe default is persistent. Generic instructions (\"don't use cream,\" \"make it clean and minimal\") tend to shift the model to a different fixed palette rather than producing variety. Two approaches work reliably:\n\n1. **Specify a concrete alternative.** The model follows explicit specs precisely — give exact hex values, typefaces, and layout constraints.\n2. **Have the model propose options before building.** This breaks the default and gives the user control:\n\n   > *\"Before building, propose 4 distinct visual directions tailored to this brief (each as: bg hex / accent hex / typeface — one-line rationale). Ask the user to pick one, then implement only that direction.\"*\n\nIf the caller previously relied on `temperature` for design variety, use approach (2) — it produces meaningfully different directions across runs.\n\nOpus 4.7 also requires less frontend-design prompting than previous models to avoid generic \"AI slop\" aesthetics. Where earlier models needed a lengthy anti-slop snippet, Opus 4.7 generates distinctive, creative frontends with a much shorter nudge. This snippet works well alongside the variety approaches above:\n\n> *\"NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white or dark backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character. Use unique fonts, cohesive colors and themes, and animations for effects and micro-interactions.\"*\n\n**Interactive coding products.** Opus 4.7's token usage and behavior can differ between autonomous, asynchronous coding agents with a single user turn and interactive, synchronous coding agents with multiple user turns. Specifically, it tends to use more tokens in interactive settings, primarily because it reasons more after user turns. This can improve long-horizon coherence, instruction following, and coding capabilities in long interactive coding sessions, but also comes with more token usage. To maximize both performance and token efficiency in coding products, use `effort: \"xhigh\"` or `\"high\"`, add autonomous features (like an auto mode), and reduce the number of human interactions required from users.\n\nWhen limiting required user interactions, specify the task, intent, and relevant constraints upfront in the first human turn. Well-specified, clear, and accurate task descriptions upfront help maximize autonomy and intelligence while minimizing extra token usage after user turns — because Opus 4.7 is more autonomous than prior models, this usage pattern helps to maximize performance. In contrast, ambiguous or underspecified prompts conveyed progressively over multiple user turns tend to reduce token efficiency and sometimes performance.\n\n**Code review.** Opus 4.7 is meaningfully better at finding bugs than prior models, with both higher recall and precision. However, if a code-review harness was tuned for an earlier model, it may initially show *lower* recall — this is likely a harness effect, not a capability regression. When a review prompt says \"only report high-severity issues,\" \"be conservative,\" or \"don't nitpick,\" Opus 4.7 follows that instruction more faithfully than earlier models did: it investigates just as thoroughly, identifies the bugs, and then declines to report findings it judges to be below the stated bar. Precision rises, but measured recall can fall even though underlying bug-finding has improved.\n\nRecommended prompt language:\n\n> *\"Report every issue you find, including ones you are uncertain about or consider low-severity. Do not filter for importance or confidence at this stage — a separate verification step will do that. Your goal here is coverage: it is better to surface a finding that later gets filtered out than to silently drop a bug. For each finding, include your confidence level and an estimated severity so a downstream filter can rank them.\"*\n\nThis can be used without an actual second step, but moving confidence filtering out of the finding step often helps. If the harness has a separate verification/dedup/ranking stage, tell the model explicitly that its job at the finding stage is coverage, not filtering. If single-pass self-filtering is wanted, be concrete about the bar rather than using qualitative terms like \"important\" — e.g. *\"report any bugs that could cause incorrect behavior, a test failure, or a misleading result; only omit nits like pure style or naming preferences.\"* Iterate on prompts against a subset of evals to validate recall or F1 gains.\n\n**Computer use.** Computer use works across resolutions up to the new 2576px / 3.75MP maximum. Sending images at **1080p** provides a good balance of performance and cost. For particularly cost-sensitive workloads, **720p** or **1366×768** are lower-cost options with strong performance. Test to find the ideal settings for the use case; experimenting with `effort` can also help tune behavior.\n\n---\n\n## Opus 4.7 Migration Checklist\n\nEvery item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, silent truncation, or empty output if missed — apply these as code edits, not as suggestions. **`[TUNE]`** items are quality/cost adjustments — surface them to the user as recommendations.\n\n`[BLOCKS]` items prefixed with **\"If…\"** or **\"At…\"** are conditional. Before working through the list, **scan the file** for the conditions: does it surface thinking text to a UI/log? Does it set `output_config.effort` to `\"x-high\"` or `\"max\"`? Is it a security workload? Is it a multi-turn agentic loop? Apply only the items whose condition matches.\n\n- [ ] **[BLOCKS]** Replace `thinking: {type: \"enabled\", budget_tokens: N}` with `thinking: {type: \"adaptive\"}` + `output_config.effort`; delete `budget_tokens` plumbing entirely\n- [ ] **[BLOCKS]** Strip `temperature`, `top_p`, `top_k` from request construction\n- [ ] **[BLOCKS]** If thinking content is surfaced to users or stored in logs: add `thinking.display: \"summarized\"` (otherwise the rendered text is empty)\n- [ ] **[BLOCKS]** At `output_config.effort` of `xhigh` or `max`: set `max_tokens` ≥ 64000 (otherwise output truncates mid-thought)\n- [ ] **[TUNE]** Give `max_tokens` and compaction triggers extra headroom; re-run `count_tokens()` against `claude-opus-4-7` on representative prompts to re-baseline (no blanket multiplier)\n- [ ] **[TUNE]** Re-baseline cost and rate-limit dashboards *before* reacting to measured shifts\n- [ ] **[TUNE]** Re-evaluate `effort` per route — use `xhigh` for coding/agentic and a minimum of `high` for most intelligence-sensitive work; it matters more on 4.7 than any prior Opus\n- [ ] **[TUNE]** Multi-turn agentic loops: adopt the API-native Task Budgets (`output_config.task_budget`, beta `task-budgets-2026-03-13`, minimum 20k tokens) — this is for capping *cumulative* spend across a loop; per-turn depth is `effort`\n- [ ] **[TUNE]** Check for ambiguous or underspecified instructions that relied on 4.6 generalizing intent, and update them to be clearer or more precise — 4.7 follows them literally\n- [ ] **[TUNE]** Tool-use workloads: add explicit when/how-to-use guidance to tool descriptions (4.7 reaches for tools less often)\n- [ ] **[TUNE]** Verbosity: test existing length instructions before changing them — 4.7 calibrates length to task complexity, so tune for the desired output rather than assuming a direction\n- [ ] **[TUNE]** Remove forced-progress-update scaffolding (*\"after every N tool calls…\"*)\n- [ ] **[TUNE]** Remove knowledge-work verification scaffolding (*\"double-check the slide layout…\"*) and re-baseline\n- [ ] **[TUNE]** Add tone instruction if a warmer / more conversational voice is needed; re-evaluate style prompts on writing-heavy routes\n- [ ] **[TUNE]** Subagent tool present: add explicit spawn / don't-spawn guidance\n- [ ] **[TUNE]** Frontend/design output: specify a concrete palette/typeface, or have the model propose 4 visual directions before building (the default cream/serif house style is persistent)\n- [ ] **[TUNE]** Interactive coding products: use `effort: \"xhigh\"` or `\"high\"`, add autonomous features (e.g. an auto mode) to reduce human interactions, and specify task/intent/constraints upfront in the first turn\n- [ ] **[TUNE]** Code-review harnesses: remove or loosen \"only report high-severity\" / \"be conservative\" filters and have the model report every finding with confidence + severity; move filtering to a downstream step (4.7 follows severity filters more literally, which can depress measured recall)\n- [ ] **[TUNE]** Vision-heavy pipelines (screenshots, charts, document understanding): leave images at native resolution up to 2576px long edge for the accuracy gain; remove any scale-factor math from coordinate handling (coords are now 1:1 with pixels). No beta header / opt-in needed — high-res is automatic on Opus 4.7.\n- [ ] **[TUNE]** Computer-use pipelines: send screenshots at 1080p for a good performance/cost balance (720p or 1366×768 for cost-sensitive workloads); experiment with `effort` to tune behavior\n- [ ] **[TUNE]** Cost-sensitive image pipelines: full-res images on 4.7 use up to ~4784 tokens vs ~1,600 on prior models (~3×). Downsampling client-side before upload avoids the increase, but **do not downsample by default** — if you're unsure whether fidelity is needed, ask the user. Re-baseline with `count_tokens()` on representative images before reacting to cost shifts.\n\n---\n\n## Migrating to Opus 4.8\n\n> **Model ID `claude-opus-4-8` is authoritative as written here.** When the user asks to migrate to Opus 4.8, write `model=\"claude-opus-4-8\"` exactly. Do **not** WebFetch to verify — this guide is the source of truth for migration target IDs. The corresponding entry exists in `shared/models.md`.\n\nClaude Opus 4.8 is our most capable generally available model to date — highly autonomous, with state-of-the-art long-horizon agentic execution, knowledge work, and memory. It is layered on top of the Opus 4.7 migration above. If the caller is jumping from Opus 4.6 or older, apply the 4.6 and 4.7 sections first, then this one.\n\n**No new breaking changes.** Opus 4.8 keeps the same request surface as Opus 4.7. The same calls that already work on 4.7 work unchanged on 4.8 — adaptive thinking only (`thinking: {type: \"enabled\", budget_tokens: N}` still 400s; use `{type: \"adaptive\"}`), sampling parameters (`temperature`, `top_p`, `top_k`) still rejected, last-assistant-turn prefills still 400, `thinking.display` still defaults to `\"omitted\"`, and the `low`/`medium`/`high`/`xhigh`/`max` effort levels, Task Budgets (beta), and high-resolution vision all behave as on 4.7. A 4.7 → 4.8 migration is therefore **the model-ID swap plus prompt re-tuning** — there is no required code edit beyond the model string.\n\n**TL;DR for someone already on Opus 4.7:** swap the model ID to `claude-opus-4-8`. Nothing else is required to avoid an error. Then re-tune prompts for the behavioral shifts: 4.8 narrates *more* than 4.7 (add a silence-default if you want 4.7-like terseness), writes in a warmer, less hedged voice, is more deliberate and asks more often (add autonomy guidance to claw back ask-rate), and is more conservative about reaching for search, subagents, file-based memory, and custom tools (add explicit \"when to use this\" triggering). For long-horizon agentic work, give the full task specification up front in one well-specified turn and run at high effort.\n\n### No new API breaking changes (inherited from 4.7)\n\nThese all carry over from Opus 4.7 unchanged — apply them only if the caller is coming from Opus 4.6 or earlier (see the **Migrating to Opus 4.7** section above for the before/after and the SDK-specific syntax):\n\n- `thinking: {type: \"enabled\", budget_tokens: N}` → 400. Use `thinking: {type: \"adaptive\"}` + `output_config.effort`.\n- `temperature`, `top_p`, `top_k` → 400. Remove them; steer with prompting.\n- Last-assistant-turn prefills → 400. Use `output_config.format` (structured outputs) or a system-prompt instruction.\n- `thinking.display` defaults to `\"omitted\"`; set `\"summarized\"` if you surface reasoning to users.\n\nIf the caller is already on Opus 4.7 and these are clean, there is nothing to change here.\n\n### New API feature: mid-session system prompts\n\nYou can deliver trusted instructions partway through a session by placing `{\"role\": \"system\", ...}` entries directly in the `messages` array — without editing the top-level system prompt and invalidating your prompt cache. Use it for things the application learns mid-session: the user delivered async context, a mode toggled (auto-approve enabled), files changed on disk, the remaining token budget dropped.\n\n```python\nmessages=[\n    {\"role\": \"user\", \"content\": [{\"type\": \"tool_result\", \"tool_use_id\": \"...\", \"content\": \"...\"}]},\n    {\"role\": \"system\", \"content\": \"This project's codebase is Go. Write code in Go.\"},\n]\n```\n\nPhrase these as **context, not commands**. State the fact and let Claude act on it; avoid override-style language (\"ignore what the user said\", \"regardless of the user's request\", \"disregard the previous instruction\"). Claude is trained to protect users from instructions that appear to work against them, and that protection applies to the system role too. This is a beta (`anthropic-beta: mid-conversation-system-2026-04-07`) and is available from Opus 4.7 onward, not 4.8-exclusive. For cache-placement details and the older-model `<system-reminder>` fallback, see `shared/prompt-caching.md` and `shared/agent-design.md`.\n\n### Capability improvements\n\n**Long-horizon agentic execution.** Opus 4.8 is state-of-the-art at long, autonomous agentic work — complex refactors and overnight coding runs that complete without human correction. To get the most out of it, **give the full task specification up front in a single well-specified initial turn and run at high effort** (`effort: \"high\"` or `\"xhigh\"`). Its long-horizon coherence comes partly from reasoning more at each step; combined with a clear up-front goal, that more-intelligent planning often produces more efficient *and* more accurate output than prior frontier models. The \"clear goal up front\" principle maps to two product surfaces: in Claude Code, `/goal` sets direction for the run; with **Managed Agents (CMA)**, state what \"done\" looks like via an **Outcome** (`user.define_outcome` with a gradeable rubric — the harness runs an iterate → grade → revise loop), see `shared/managed-agents-outcomes.md`.\n\n**Effort is a dimension to test, not a fixed setting.** On prior models many reached for `xhigh` reflexively to maximize intelligence. Opus 4.8 has a higher intelligence ceiling, so **start at `high` as the default and iterate** rather than defaulting to `xhigh`. Sweep `medium`, `high`, and `xhigh` on your own eval set and weigh the intelligence ↔ latency ↔ cost tradeoff per route — the relationship isn't monotonic: higher effort up front often *reduces* turn count and total cost on agentic work, while for some tasks `medium` delivers equally good results in less time. Reserve `max` for extremely hard, latency-insensitive cases. The per-level effort table in the **Migrating to Opus 4.7** section above applies unchanged on 4.8.\n\n**Writing voice and clarity.** Testers consistently describe 4.8's prose as clearer, warmer, and less hedged than prior models, with fewer measurable AI vocal tics — especially at higher effort, where it approaches expert-level prose and structure. This is roughly the **opposite** direction from the 4.7 shift (4.7 was more clipped, direct, and less validation-forward). If you added style prompts to counter 4.7's terseness or to inject warmth, re-evaluate them against the new baseline before keeping them — they may now overcorrect. 4.8 is also a stronger thought partner: more thoughtful, more willing to push back, and more likely to infer the right answer from context.\n\n**Code review and debugging.** Stronger real-bug finding and clearer explanations than 4.7 — one-shot fixes where 4.7 needed more, and correctly identifying intermittent flakes rather than declaring \"fixed\" after one clean run. The 4.7 caveat still applies: if a review harness says \"only report high-severity issues\" or \"be conservative\", 4.8 follows it literally and measured recall can drop even though underlying bug-finding improved. Tell the model to report everything and filter downstream (or review a second time) — see the **Code review** guidance in the 4.7 section for the recommended prompt.\n\n### Behavioral shifts (prompt-tunable)\n\nNone of these break code, but prompts tuned for Opus 4.7 may land differently. 4.8 follows instructions well, so small, explicit nudges close the gap.\n\n**Tool triggering is surface-dependent (search & knowledge).** 4.8's tool-triggering is more surface-dependent than in prior models: with a system prompt present it is high-precision / low-recall — web search triggers slightly more often but runs fewer rounds per trigger, while knowledge-retrieval tools (Drive, project knowledge, connected files) trigger *less* often. It searches when it's confident search is needed and otherwise answers from context, which can lower research depth on tasks that need it. Recover should-search rate with an explicit search-first instruction:\n\n> ```\n> <search_first>\n> For questions where current information would change the answer (recent events, current roles or prices, version-specific behavior, or anything the user flags as time-sensitive) search before answering rather than answering from memory. For open-ended research requests, begin searching immediately; do not ask a scoping question first unless the request is genuinely ambiguous about what to research.\n> </search_first>\n> ```\n\n**Under-utilization of subagents, memory, and custom tools.** Separately from search, 4.8 is conservative about reaching for capabilities that need an explicit \"decide to use this\" step — file-based memory, subagent delegation, custom tools. It won't reach for complex or expensive capabilities unless reasonably sure they're needed. This is steerable since 4.8 follows instructions well — say *when* each capability applies, not just that it exists:\n\n> *\"Before any task longer than a few turns, check your memory file for relevant prior context and write new findings to it as you go. When a task fans out across independent items (many files to read, many tests to run, many candidates to check), delegate to subagents rather than iterating serially.\"*\n\nThe same lever works at the **tool-description** level, not just the system prompt: prescriptive descriptions that state *when* to call a tool (e.g. \"Call this when the user asks about current prices or recent events\") give meaningful lift on 4.8 over descriptions that only state what the tool does. Make the trigger condition part of each capability's own `description`.\n\n**More user-facing narration.** 4.8 narrates more than 4.7 — more text between tool calls in long tool-calling sessions, and longer, more detailed end-of-task wrap-ups by default. If you previously added scaffolding to force interim status (\"after every 3 tool calls, summarize progress\"), **remove it** — 4.8 does this on its own. If the narration is too verbose for a coding agent, an explicit silence-default makes it behave like 4.7 with no loss of quality:\n\n> *\"Default to silence between tool calls. Only write text when you find something, change direction, or hit a blocker — one sentence each. Do not narrate routine actions ('Now I'll...', 'Let me check...', 'Looking at...'). When done: one or two sentences on the outcome. Do not recap every file or test — the user has been following along.\"*\n\nFor knowledge-work deliverables (reports, analysis readouts), verbosity responds very well to instructions in user preferences or the user turn — expose a verbosity preference rather than hard-coding a length.\n\n**More deliberate — asks more often.** 4.8 is more deliberate than prior Opus models. On minor decisions it would previously just make (a variable name, a default value, which of two equivalent approaches), it tends to pause and ask, and it often closes a completed task with \"Want me to also…?\" rather than doing the obvious next step or stopping cleanly. This is preferred for high-stakes or unfamiliar codebases, but bugs users when uncalibrated. Grant autonomy on the small stuff while keeping caution where it matters (in Claude Code testing this cut ask-rate by ~12 percentage points with no increase in over-reach):\n\n> *\"For minor choices (naming, formatting, default values, which approach among equivalents), pick a reasonable option and note it rather than asking. For scope changes or destructive actions, still ask first.\"*\n\n**Verbose reasoning when thinking is disabled.** With `thinking: {type: \"disabled\"}`, 4.8 occasionally writes longer explanations of its reasoning into the visible response, which reads as verbose when the user wants a fast, quick answer. The simplest fix is to leave adaptive thinking on — set `thinking: {type: \"adaptive\"}` (the recommended setting; it adjusts how much to think per task). Note adaptive is **not** on when the field is omitted — like Opus 4.7, a request with no `thinking` field runs without thinking, so set it explicitly. If you need thinking off for latency or cost, scope it in the system prompt:\n\n> *\"Respond only with your final answer. Do not include exploratory reasoning, intermediate drafts, diffs you considered but rejected, or meta-commentary about your process.\"*\n\n### Opus 4.8 Migration Checklist\n\nEvery item is tagged: **`[BLOCKS]`** items cause a 400 error if missed; **`[TUNE]`** items are quality/cost adjustments — surface them to the user as recommendations.\n\nFor a caller **already on Opus 4.7**, only the first item is required; everything else is `[TUNE]`. The conditional `[BLOCKS]` item applies only when coming from Opus 4.6 or earlier.\n\n- [ ] **[BLOCKS]** Update the `model=` string to `claude-opus-4-8`\n- [ ] **[BLOCKS]** *(only if coming from Opus 4.6 or earlier)* Apply the **Migrating to Opus 4.7** breaking changes first — `budget_tokens` → adaptive thinking, strip `temperature`/`top_p`/`top_k`, remove last-assistant-turn prefills. These already 400 on 4.7 and continue to 400 on 4.8.\n- [ ] **[TUNE]** Long-horizon / agentic work: put the full task spec in one well-specified first turn and run at `high` or `xhigh` effort (Claude Code: `/goal`; Managed Agents: an Outcome with a gradeable rubric)\n- [ ] **[TUNE]** Effort: sweep `medium` / `high` / `xhigh` on your eval set and pick per route by the intelligence ↔ latency ↔ cost tradeoff (default `high`, `xhigh` for coding/agentic)\n- [ ] **[TUNE]** Research depth & tool use: add a search-first instruction; add explicit triggering guidance for subagents, file-based memory, and custom tools (4.8 under-reaches for these by default) — in the system prompt *and* in each tool's own `description` (prescriptive \"call this when…\" descriptions give measurable lift)\n- [ ] **[TUNE]** Narration: remove forced-progress scaffolding (*\"after every N tool calls…\"*); add a silence-default if a coding agent is too chatty\n- [ ] **[TUNE]** Autonomy: add small-decisions-don't-ask guidance to cut ask-rate, while keeping caution on scope changes / destructive actions\n- [ ] **[TUNE]** Writing voice: re-evaluate style prompts added to counter 4.7's directness — 4.8 is warmer and less hedged by default; re-baseline before keeping them\n- [ ] **[TUNE]** Code-review harnesses: keep the report-everything-filter-downstream pattern (4.8 follows \"only high-severity\" / \"be conservative\" filters literally, which can depress measured recall)\n- [ ] **[TUNE]** Thinking-disabled paths: add a final-answer-only instruction if reasoning leaks into the visible response\n- [ ] **[TUNE]** Consider mid-session system messages (`role:\"system\"` in `messages`, beta `mid-conversation-system-2026-04-07`) for context the app learns mid-session, instead of rebuilding the top-level system prompt and invalidating the cache\n\n---\n\n## Verify the Migration\n\nAfter updating, spot-check that the new model is actually being used. Replace `YOUR_TARGET_MODEL` with the model string you migrated to (e.g. `claude-opus-4-8`, `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`) and keep the assertion prefix in sync:\n\n```python\nYOUR_TARGET_MODEL = \"claude-opus-4-8\"  # or \"claude-opus-4-7\", \"claude-sonnet-4-6\", \"claude-haiku-4-5\"\nresponse = client.messages.create(model=YOUR_TARGET_MODEL, max_tokens=64, messages=[...])\nassert response.model.startswith(YOUR_TARGET_MODEL), response.model\n```\n\nFor rate-limit headroom changes, pricing, or capability deltas (vision, structured outputs, effort support), query the Models API:\n\n```python\nm = client.models.retrieve(YOUR_TARGET_MODEL)\nm.max_input_tokens, m.max_tokens\nm.capabilities[\"effort\"][\"max\"][\"supported\"]\n```\n\nSee `shared/models.md` for the full capability lookup pattern.\n</doc>\n\n<doc path=\"shared/models.md\">\n# Claude Model Catalog\n\n**Only use exact model IDs listed in this file.** Never guess or construct model IDs — incorrect IDs will cause API errors. Use aliases wherever available. For the latest information, WebFetch the Models Overview URL in `shared/live-sources.md`, or query the Models API directly (see Programmatic Model Discovery below).\n\n## Programmatic Model Discovery\n\nFor **live** capability data — context window, max output tokens, feature support (thinking, vision, effort, structured outputs, etc.) — query the Models API instead of relying on the cached tables below. Use this when the user asks \"what's the context window for X\", \"does model X support vision/thinking/effort\", \"which models support feature Y\", or wants to select a model by capability at runtime.\n\n```python\nm = client.models.retrieve(\"claude-opus-4-8\")\nm.id                 # \"claude-opus-4-8\"\nm.display_name       # \"Claude Opus 4.8\"\nm.max_input_tokens   # context window (int)\nm.max_tokens         # max output tokens (int)\n\n# capabilities is an untyped nested dict — bracket access, check [\"supported\"] at the leaf\ncaps = m.capabilities\ncaps[\"image_input\"][\"supported\"]                       # vision\ncaps[\"thinking\"][\"types\"][\"adaptive\"][\"supported\"]     # adaptive thinking\ncaps[\"effort\"][\"max\"][\"supported\"]                     # effort: max (also low/medium/high)\ncaps[\"structured_outputs\"][\"supported\"]\ncaps[\"context_management\"][\"compact_20260112\"][\"supported\"]\n\n# filter across all models — iterate the page object directly (auto-paginates); do NOT use .data\n[m for m in client.models.list()\n if m.capabilities[\"thinking\"][\"types\"][\"adaptive\"][\"supported\"]\n and m.max_input_tokens >= 200_000]\n```\n\nTop-level fields (`id`, `display_name`, `max_input_tokens`, `max_tokens`) are typed attributes. `capabilities` is a dict — use bracket access, not attribute access. The API returns the full capability tree for every model with `supported: true/false` at each leaf, so bracket chains are safe without `.get()` guards. TypeScript SDK: same method names, also auto-paginates on iteration.\n\n### Raw HTTP\n\n```bash\ncurl https://api.anthropic.com/v1/models/claude-opus-4-8 \\\n  -H \"x-api-key: $ANTHROPIC_API_KEY\" \\\n  -H \"anthropic-version: 2023-06-01\"\n```\n\n```json\n{\n  \"id\": \"claude-opus-4-8\",\n  \"display_name\": \"Claude Opus 4.8\",\n  \"max_input_tokens\": 1000000,\n  \"max_tokens\": 128000,\n  \"capabilities\": {\n    \"image_input\": {\"supported\": true},\n    \"structured_outputs\": {\"supported\": true},\n    \"thinking\": {\"supported\": true, \"types\": {\"enabled\": {\"supported\": false}, \"adaptive\": {\"supported\": true}}},\n    \"effort\": {\"supported\": true, \"low\": {\"supported\": true}, …, \"max\": {\"supported\": true}},\n    …\n  }\n}\n```\n\n## Current Models (recommended)\n\n| Friendly Name     | Alias (use this)    | Full ID                       | Context        | Max Output | Status |\n|-------------------|---------------------|-------------------------------|----------------|------------|--------|\n| Claude Opus 4.8   | `claude-opus-4-8`   | —                             | 1M             | 128K       | Active |\n| Claude Opus 4.7   | `claude-opus-4-7`   | —                             | 1M             | 128K       | Active |\n| Claude Opus 4.6   | `claude-opus-4-6`   | —                             | 1M             | 128K       | Active |\n| Claude Sonnet 4.6 | `claude-sonnet-4-6` | -                             | 1M             | 64K        | Active |\n| Claude Haiku 4.5  | `claude-haiku-4-5`  | `claude-haiku-4-5-20251001`   | 200K           | 64K        | Active |\n\n### Model Descriptions\n- **Claude Opus 4.8** — The most capable Claude model to date — highly autonomous, state-of-the-art on long-horizon agentic work, knowledge work, and memory; clearer, warmer writing. Same API surface as Opus 4.7 (adaptive thinking only; sampling parameters and `budget_tokens` removed). 1M context window at standard API pricing (no long-context premium). See `shared/model-migration.md` → Migrating to Opus 4.8 — a 4.7 → 4.8 move is a model-ID swap plus prompt re-tuning, no new breaking changes.\n- **Claude Opus 4.7** — Previous-generation Opus. Highly autonomous; strong on long-horizon agentic work, knowledge work, vision, and memory. Adaptive thinking only; sampling parameters and `budget_tokens` removed. 1M context window. See `shared/model-migration.md` → Migrating to Opus 4.7.\n- **Claude Opus 4.6** — Older Opus. Supports adaptive thinking (recommended), 128K max output tokens (requires streaming for large outputs). 1M context window.\n- **Claude Sonnet 4.6** — Our best combination of speed and intelligence. Supports adaptive thinking (recommended). 1M context window. 64K max output tokens.\n- **Claude Haiku 4.5** — Fastest and most cost-effective model for simple tasks.\n\n## Legacy Models (still active)\n\n| Friendly Name     | Alias (use this)    | Full ID                       | Status |\n|-------------------|---------------------|-------------------------------|--------|\n| Claude Opus 4.5   | `claude-opus-4-5`   | `claude-opus-4-5-20251101`    | Active |\n| Claude Opus 4.1   | `claude-opus-4-1`   | `claude-opus-4-1-20250805`    | Active |\n| Claude Sonnet 4.5 | `claude-sonnet-4-5` | `claude-sonnet-4-5-20250929`  | Active |\n\n## Deprecated Models (retiring soon)\n\n| Friendly Name     | Alias (use this)    | Full ID                       | Status     | Retires      |\n|-------------------|---------------------|-------------------------------|------------|--------------|\n| Claude Sonnet 4   | `claude-sonnet-4-0` | `claude-sonnet-4-20250514`    | Deprecated | TBD          |\n| Claude Opus 4     | `claude-opus-4-0`   | `claude-opus-4-20250514`      | Deprecated | TBD          |\n| Claude Haiku 3    | —                   | `claude-3-haiku-20240307`     | Deprecated | Apr 19, 2026 |\n\n## Retired Models (no longer available)\n\n| Friendly Name     | Full ID                       | Retired     |\n|-------------------|-------------------------------|-------------|\n| Claude Sonnet 3.7 | `claude-3-7-sonnet-20250219`  | Feb 19, 2026 |\n| Claude Haiku 3.5  | `claude-3-5-haiku-20241022`   | Feb 19, 2026 |\n| Claude Opus 3     | `claude-3-opus-20240229`      | Jan 5, 2026 |\n| Claude Sonnet 3.5 | `claude-3-5-sonnet-20241022`  | Oct 28, 2025 |\n| Claude Sonnet 3.5 | `claude-3-5-sonnet-20240620`  | Oct 28, 2025 |\n| Claude Sonnet 3   | `claude-3-sonnet-20240229`    | Jul 21, 2025 |\n| Claude 2.1        | `claude-2.1`                  | Jul 21, 2025 |\n| Claude 2.0        | `claude-2.0`                  | Jul 21, 2025 |\n\n## Resolving User Requests\n\nWhen a user asks for a model by name, use this table to find the correct model ID:\n\n| User says...                              | Use this model ID              |\n|-------------------------------------------|--------------------------------|\n| \"opus\", \"most powerful\"                   | `claude-opus-4-8`              |\n| \"opus 4.8\"                                | `claude-opus-4-8`              |\n| \"opus 4.7\"                                | `claude-opus-4-7`              |\n| \"opus 4.6\"                                | `claude-opus-4-6`              |\n| \"opus 4.5\"                                | `claude-opus-4-5`              |\n| \"opus 4.1\"                                | `claude-opus-4-1`              |\n| \"opus 4\", \"opus 4.0\"                      | `claude-opus-4-0` (deprecated — suggest `claude-opus-4-8`) |\n| \"sonnet\", \"balanced\"                      | `claude-sonnet-4-6`            |\n| \"sonnet 4.6\"                              | `claude-sonnet-4-6`            |\n| \"sonnet 4.5\"                              | `claude-sonnet-4-5`            |\n| \"sonnet 4\", \"sonnet 4.0\"                  | `claude-sonnet-4-0` (deprecated — suggest `claude-sonnet-4-6`) |\n| \"sonnet 3.7\"                              | Retired — suggest `claude-sonnet-4-6` |\n| \"sonnet 3.5\"                              | Retired — suggest `claude-sonnet-4-6` |\n| \"haiku\", \"fast\", \"cheap\"                  | `claude-haiku-4-5`             |\n| \"haiku 4.5\"                               | `claude-haiku-4-5`             |\n| \"haiku 3.5\"                               | Retired — suggest `claude-haiku-4-5` |\n| \"haiku 3\"                                 | Deprecated — suggest `claude-haiku-4-5` |\n</doc>\n\n<doc path=\"shared/prompt-caching.md\">\n# Prompt Caching — Design & Optimization\n\nThis file covers how to design prompt-building code for effective caching. For language-specific syntax, see the `## Prompt Caching` section in each language's README or single-file doc.\n\n## The one invariant everything follows from\n\n**Prompt caching is a prefix match. Any change anywhere in the prefix invalidates everything after it.**\n\nThe cache key is derived from the exact bytes of the rendered prompt up to each `cache_control` breakpoint. A single byte difference at position N — a timestamp, a reordered JSON key, a different tool in the list — invalidates the cache for all breakpoints at positions ≥ N.\n\nRender order is: `tools` → `system` → `messages`. A breakpoint on the last system block caches both tools and system together.\n\nDesign the prompt-building path around this constraint. Get the ordering right and most caching works for free. Get it wrong and no amount of `cache_control` markers will help.\n\n---\n\n## Workflow for optimizing existing code\n\nWhen asked to add or optimize caching:\n\n1. **Trace the prompt assembly path.** Find where `system`, `tools`, and `messages` are constructed. Identify every input that flows into them.\n2. **Classify each input by stability:**\n   - Never changes → belongs early in the prompt, before any breakpoint\n   - Changes per-session → belongs after the global prefix, cache per-session\n   - Changes per-turn → belongs at the end, after the last breakpoint\n   - Changes per-request (timestamps, UUIDs, random IDs) → **eliminate or move to the very end**\n3. **Check rendered order matches stability order.** Stable content must physically precede volatile content. If a timestamp is interpolated into the system prompt header, everything after it is uncacheable regardless of markers.\n4. **Place breakpoints at stability boundaries.** See placement patterns below.\n5. **Audit for silent invalidators.** See anti-patterns table.\n\n---\n\n## Placement patterns\n\n### Large system prompt shared across many requests\n\nPut a breakpoint on the last system text block. If there are tools, they render before system — the marker on the last system block caches tools + system together.\n\n```json\n\"system\": [\n  {\"type\": \"text\", \"text\": \"<large shared prompt>\", \"cache_control\": {\"type\": \"ephemeral\"}}\n]\n```\n\n### Multi-turn conversations\n\nPut a breakpoint on the last content block of the most-recently-appended turn. Each subsequent request reuses the entire prior conversation prefix. Earlier breakpoints remain valid read points, so hits accrue incrementally as the conversation grows.\n\n```json\n// Last content block of the last user turn\nmessages[-1].content[-1].cache_control = {\"type\": \"ephemeral\"}\n```\n\n### Shared prefix, varying suffix\n\nMany requests share a large fixed preamble (few-shot examples, retrieved docs, instructions) but differ in the final question. Put the breakpoint at the end of the **shared** portion, not at the end of the whole prompt — otherwise every request writes a distinct cache entry and nothing is ever read.\n\n```json\n\"messages\": [{\"role\": \"user\", \"content\": [\n  {\"type\": \"text\", \"text\": \"<shared context>\", \"cache_control\": {\"type\": \"ephemeral\"}},\n  {\"type\": \"text\", \"text\": \"<varying question>\"}  // no marker — differs every time\n]}]\n```\n\n### Mid-conversation system messages\n\n**Beta, model-gated.** When an operator instruction arrives mid-conversation — a mode switch, updated context, dynamically injected state — send it as `{\"role\": \"system\", \"content\": \"...\"}` appended to `messages[]`, rather than editing top-level `system`. Editing top-level `system` changes the prefix ahead of the entire conversation history, so every cached turn is re-processed uncached; a `role: \"system\"` message sits after the history and leaves the cached prefix intact.\n\n```json\n// Top-level system stays byte-identical; new instruction goes after the cached history\n\"system\": [{\"type\": \"text\", \"text\": \"<stable core>\", \"cache_control\": {\"type\": \"ephemeral\"}}],\n\"messages\": [\n  ...history,\n  {\"role\": \"user\", \"content\": \"...\"},\n  {\"role\": \"system\", \"content\": \"Terse mode enabled — keep responses under 40 words.\"}\n]\n```\n\nThis is also the prompt-injection-safe replacement for embedding operator instructions as text inside a user turn (the `<system-reminder>` pattern): both have the same caching profile, but `role: \"system\"` is the non-spoofable operator channel, whereas text inside user/tool content can be forged by anything that writes to user-visible input.\n\nRequires `anthropic-beta: mid-conversation-system-2026-04-07`. Must follow a `role: \"user\"` message (or an assistant message ending in a server tool result); cannot be `messages[0]` — use top-level `system` for the initial prompt. Content is text-only. Model-gated — unsupported models return a 400 (`BadRequestError`: `role 'system' is not supported on this model`); catch that error and fall back to putting the instruction in a user-turn `<system-reminder>` block.\n\n### Prompts that change from the beginning every time\n\nDon't cache. If the first 1K tokens differ per request, there is no reusable prefix. Adding `cache_control` only pays the cache-write premium with zero reads. Leave it off.\n\n---\n\n## Architectural guidance\n\nThese are the decisions that matter more than marker placement. Fix these first.\n\n**Keep the system prompt frozen.** Don't interpolate \"current date: X\", \"mode: Y\", \"user name: Z\" into the system prompt — those sit at the front of the prefix and invalidate everything downstream. Inject dynamic context later in `messages` instead — as a `{\"role\": \"system\", ...}` message where supported (see § Mid-conversation system messages above), or as text in a user message otherwise. A message at turn 5 invalidates nothing before turn 5.\n\n**Don't change tools or model mid-conversation.** Tools render at position 0; adding, removing, or reordering a tool invalidates the entire cache. Same for switching models (caches are model-scoped). If you need \"modes\", don't swap the tool set — give Claude a tool that records the mode transition, or pass the mode as message content. Serialize tools deterministically (sort by name).\n\n**Fork operations must reuse the parent's exact prefix.** Side computations (summarization, compaction, sub-agents) often spin up a separate API call. If the fork rebuilds `system` / `tools` / `model` with any difference, it misses the parent's cache entirely. Copy the parent's `system`, `tools`, and `model` verbatim, then append fork-specific content at the end.\n\n---\n\n## Silent invalidators\n\nWhen reviewing code, grep for these inside anything that feeds the prompt prefix:\n\n| Pattern | Why it breaks caching |\n|---|---|\n| `datetime.now()` / `Date.now()` / `time.time()` in system prompt | Prefix changes every request |\n| `uuid4()` / `crypto.randomUUID()` / request IDs early in content | Same — every request is unique |\n| `json.dumps(d)` without `sort_keys=True` / iterating a `set` | Non-deterministic serialization → prefix bytes differ |\n| f-string interpolating session/user ID into system prompt | Per-user prefix; no cross-user sharing |\n| Conditional system sections (`if flag: system += ...`) | Every flag combination is a distinct prefix |\n| `tools=build_tools(user)` where set varies per user | Tools render at position 0; nothing caches across users |\n\nFix by moving the dynamic piece after the last breakpoint, making it deterministic, or deleting it if it's not load-bearing.\n\n---\n\n## API reference\n\n```json\n\"cache_control\": {\"type\": \"ephemeral\"}              // 5-minute TTL (default)\n\"cache_control\": {\"type\": \"ephemeral\", \"ttl\": \"1h\"} // 1-hour TTL\n```\n\n- Max **4** `cache_control` breakpoints per request.\n- Goes on any content block: system text blocks, tool definitions, message content blocks (`text`, `image`, `tool_use`, `tool_result`, `document`).\n- Top-level `cache_control` on `messages.create()` auto-places on the last cacheable block — simplest option when you don't need fine-grained placement.\n- Minimum cacheable prefix is model-dependent. Shorter prefixes silently won't cache even with a marker — no error, just `cache_creation_input_tokens: 0`:\n\n| Model | Minimum |\n|---|---:|\n| Opus 4.8, Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 | 4096 tokens |\n| Sonnet 4.6, Haiku 3.5, Haiku 3 | 2048 tokens |\n| Sonnet 4.5, Sonnet 4.1, Sonnet 4, Sonnet 3.7 | 1024 tokens |\n\nA 3K-token prompt caches on Sonnet 4.5 but silently won't on Opus 4.8.\n\n**Economics:** Cache reads cost ~0.1× base input price. Cache writes cost **1.25× for 5-minute TTL, 2× for 1-hour TTL**. Break-even depends on TTL: with 5-minute TTL, two requests break even (1.25× + 0.1× = 1.35× vs 2× uncached); with 1-hour TTL, you need at least three requests (2× + 0.2× = 2.2× vs 3× uncached). The 1-hour TTL keeps entries alive across gaps in bursty traffic, but the doubled write cost means it needs more reads to pay off.\n\n---\n\n## Verifying cache hits\n\nThe response `usage` object reports cache activity:\n\n| Field | Meaning |\n|---|---|\n| `cache_creation_input_tokens` | Tokens written to cache this request (you paid the ~1.25× write premium) |\n| `cache_read_input_tokens` | Tokens served from cache this request (you paid ~0.1×) |\n| `input_tokens` | Tokens processed at full price (not cached) |\n\nIf `cache_read_input_tokens` is zero across repeated requests with identical prefixes, a silent invalidator is at work — diff the rendered prompt bytes between two requests to find it.\n\n**`input_tokens` is the uncached remainder only.** Total prompt size = `input_tokens + cache_creation_input_tokens + cache_read_input_tokens`. If your agent ran for hours but `input_tokens` shows 4K, the rest was served from cache — check the sum, not the single field.\n\nLanguage-specific access: `response.usage.cache_read_input_tokens` (Python/TS/Ruby), `$message->usage->cacheReadInputTokens` (PHP), `resp.Usage.CacheReadInputTokens` (Go/C#), `.usage().cacheReadInputTokens()` (Java).\n\n---\n\n## Invalidation hierarchy\n\nNot every parameter change invalidates everything. The API has three cache tiers, and changes only invalidate their own tier and below:\n\n| Change | Tools cache | System cache | Messages cache |\n|---|:---:|:---:|:---:|\n| Tool definitions (add/remove/reorder) | ❌ | ❌ | ❌ |\n| Model switch | ❌ | ❌ | ❌ |\n| `speed`, web-search, citations toggle | ✅ | ❌ | ❌ |\n| System prompt content | ✅ | ❌ | ❌ |\n| `tool_choice`, images, `thinking` enable/disable | ✅ | ✅ | ❌ |\n| Message content | ✅ | ✅ | ❌ |\n\nImplication: you can change `tool_choice` per-request or toggle `thinking` without losing the tools+system cache. Don't over-worry about these — only tool-definition and model changes force a full rebuild.\n\n---\n\n## 20-block lookback window\n\nEach breakpoint walks backward **at most 20 content blocks** to find a prior cache entry. If a single turn adds more than 20 blocks (common in agentic loops with many tool_use/tool_result pairs), the next request's breakpoint won't find the previous cache and silently misses.\n\nFix: place an intermediate breakpoint every ~15 blocks in long turns, or put the marker on a block that's within 20 of the previous turn's last cached block.\n\n---\n\n## Concurrent-request timing\n\nA cache entry becomes readable only after the first response **begins streaming**. N parallel requests with identical prefixes all pay full price — none can read what the others are still writing.\n\nFor fan-out patterns: send 1 request, await the first streamed token (not the full response), then fire the remaining N−1. They'll read the cache the first one just wrote.\n\n## Pre-warming the cache\n\nTo eliminate the cache-miss latency on the *first* real request, send a **`max_tokens: 0`** request at startup (or on an interval). The API runs prefill — writing the cache at your `cache_control` breakpoint — and returns immediately with `content: []`, `stop_reason: \"max_tokens\"`, and a populated `usage` block (zero output tokens billed; normal cache-write charge on `cache_creation_input_tokens`).\n\n**When to pre-warm** — pre-warming trades a cache-write charge *now* for lower TTFT on the *next* real request. It's worth it when all three hold: (a) first-request latency is user-visible (chat/voice/interactive — not background jobs), (b) the shared prefix is large enough that a cold write is noticeably slow, and (c) there's a moment *before* traffic to fire it — app startup, worker boot, post-deploy, start of a scheduled window.\n\n| Skip pre-warming when… | Because |\n|---|---|\n| Traffic is continuous (requests ≤ TTL apart) | The first real request warms the cache and every subsequent one hits it; a separate warm call is a pure extra write |\n| The prefix is small or below the cacheable minimum | The cold-write penalty is negligible |\n| The prefix varies per request/user | Nothing shared to pre-warm |\n| You'd pre-warm many distinct prefixes speculatively | Each is a ~1.25× write; cost can exceed the latency you save |\n\n**Scheduled re-warms:** only needed when traffic has gaps longer than the TTL. If real requests arrive more often than every 5 minutes, they keep the cache warm on their own — don't add an interval re-warm. For bursty traffic with long idle gaps, either re-warm just under the TTL or switch to `ttl: \"1h\"` and re-warm less often.\n\n```python\nclient.messages.create(\n    model=\"claude-opus-4-8\",\n    max_tokens=0,\n    system=[{\n        \"type\": \"text\",\n        \"text\": SYSTEM_PROMPT,\n        \"cache_control\": {\"type\": \"ephemeral\"},\n    }],\n    messages=[{\"role\": \"user\", \"content\": \"warmup\"}],\n)\n```\n\n**Breakpoint placement:** put `cache_control` on the **last block shared with the real request** (the system prompt or tool definitions) — **not** on the placeholder user message, and **not** via top-level automatic caching (which would key the cache to the placeholder). The placeholder can be any non-whitespace string; it's read during prefill but never answered.\n\n**Rejected combinations:** `max_tokens: 0` is an `invalid_request_error` with `stream: true`, `thinking.type: \"enabled\"`, `output_config.format`, `tool_choice` of `{\"type\":\"tool\"}` or `{\"type\":\"any\"}`, or inside a Message Batches request.\n\n**TTL still applies** — re-warm at least every 5 minutes for the default cache, or use the 1-hour TTL. This replaces the older `max_tokens: 1` workaround (no single-token reply to discard, no output tokens billed, intent is unambiguous).\n</doc>\n\n<doc path=\"shared/token-counting.md\">\n# Token Counting\n\nUse the `count_tokens` endpoint (`POST /v1/messages/count_tokens`) for accurate\ntoken counts against Claude models. Token counts are **model-specific** — pass\nthe same model ID you'll use for inference.\n\n**Do not use `tiktoken`.** It's OpenAI's tokenizer. It undercounts Claude\ntokens by ~15–20% on typical text, and by much more on code or non-English\ninput. Any estimate from `tiktoken`, `gpt-tokenizer`, or similar is wrong for\nClaude.\n\n## Count a file or string\n\n```python\nfrom anthropic import Anthropic\n\nclient = Anthropic()\nresp = client.messages.count_tokens(\n    model=\"claude-opus-4-8\",\n    messages=[{\"role\": \"user\", \"content\": open(\"CLAUDE.md\").read()}],\n)\nprint(resp.input_tokens)\n```\n\nTypeScript: `await client.messages.countTokens({model, messages})` →\n`.input_tokens`. See `{lang}/claude-api/README.md` for other SDKs.\n\n## CLI\n\n```sh\nant messages count-tokens --model claude-opus-4-8 \\\n  --message '{role: user, content: \"@./CLAUDE.md\"}' \\\n  --transform input_tokens -r\n```\n\n## Diffing a file across two versions\n\nThe endpoint is stateless — count each version separately and subtract:\n\n```python\nfrom anthropic import Anthropic\nimport subprocess\n\nclient = Anthropic()\ndef count(text: str) -> int:\n    return client.messages.count_tokens(\n        model=\"claude-opus-4-8\",\n        messages=[{\"role\": \"user\", \"content\": text}],\n    ).input_tokens\n\nbefore = subprocess.check_output([\"git\", \"show\", \"HEAD:CLAUDE.md\"], text=True)\nafter = open(\"CLAUDE.md\").read()\nprint(count(after) - count(before))\n```\n\nFull docs: see the Token Counting entry in `shared/live-sources.md`.\n</doc>\n\n<doc path=\"shared/tool-use-concepts.md\">\n# Tool Use Concepts\n\nThis file covers the conceptual foundations of tool use with the Claude API. For language-specific code examples, see the `python/`, `typescript/`, or other language folders. For decision heuristics on which tools to expose, how to manage context in long-running agents, and caching strategy, see `agent-design.md`.\n\n## User-Defined Tools\n\n### Tool Definition Structure\n\n> **Note:** When using the Tool Runner (beta), tool schemas are generated automatically from your function signatures (Python), Zod schemas (TypeScript), annotated classes (Java), `jsonschema` struct tags (Go), or `BaseTool` subclasses (Ruby). The raw JSON schema format below is for the manual approach — including PHP's `BetaRunnableTool`, which wraps a run closure around a hand-written schema — or SDKs without tool runner support.\n\nEach tool requires a name, description, and JSON Schema for its inputs:\n\n```json\n{\n  \"name\": \"get_weather\",\n  \"description\": \"Get current weather for a location\",\n  \"input_schema\": {\n    \"type\": \"object\",\n    \"properties\": {\n      \"location\": {\n        \"type\": \"string\",\n        \"description\": \"City and state, e.g., San Francisco, CA\"\n      },\n      \"unit\": {\n        \"type\": \"string\",\n        \"enum\": [\"celsius\", \"fahrenheit\"],\n        \"description\": \"Temperature unit\"\n      }\n    },\n    \"required\": [\"location\"]\n  }\n}\n```\n\n**Best practices for tool definitions:**\n\n- Use clear, descriptive names (e.g., `get_weather`, `search_database`, `send_email`)\n- Write detailed descriptions — Claude uses these to decide when to use the tool. Be **prescriptive about *when* to call it**, not just what it does (e.g. \"Call this when the user asks about current prices or recent events\"). On recent Opus models, which reach for tools more conservatively, trigger conditions in the description give measurable lift in should-call rate.\n- Include descriptions for each property\n- Use `enum` for parameters with a fixed set of values\n- Mark truly required parameters in `required`; make others optional with defaults\n\n---\n\n### Tool Choice Options\n\nControl when Claude uses tools:\n\n| Value                             | Behavior                                      |\n| --------------------------------- | --------------------------------------------- |\n| `{\"type\": \"auto\"}`                | Claude decides whether to use tools (default) |\n| `{\"type\": \"any\"}`                 | Claude must use at least one tool             |\n| `{\"type\": \"tool\", \"name\": \"...\"}` | Claude must use the specified tool            |\n| `{\"type\": \"none\"}`                | Claude cannot use tools                       |\n\nAny `tool_choice` value can also include `\"disable_parallel_tool_use\": true` to force Claude to use at most one tool per response. By default, Claude may request multiple tool calls in a single response.\n\n---\n\n### Tool Runner vs Manual Loop\n\n**Tool Runner (Recommended):** The SDK's tool runner handles the agentic loop automatically — it calls the API, detects tool use requests, executes your tool functions, feeds results back to Claude, and repeats until Claude stops calling tools. Available in Python, TypeScript, Java, Go, Ruby, and PHP SDKs (beta). The Python SDK also provides MCP conversion helpers (`anthropic.lib.tools.mcp`) to convert MCP tools, prompts, and resources for use with the tool runner — see `python/claude-api/tool-use.md` for details.\n\n**Manual Agentic Loop:** Use when you need fine-grained control over the loop (e.g., custom logging, conditional tool execution, human-in-the-loop approval). Loop until `stop_reason == \"end_turn\"`, always append the full `response.content` to preserve tool_use blocks, and ensure each `tool_result` includes the matching `tool_use_id`.\n\n**Stop reasons for server-side tools:** When using server-side tools (code execution, web search, etc.), the API runs a server-side sampling loop. If this loop reaches its default limit of 10 iterations, the response will have `stop_reason: \"pause_turn\"`. To continue, re-send the user message and assistant response and make another API request — the server will resume where it left off. Do NOT add an extra user message like \"Continue.\" — the API detects the trailing `server_tool_use` block and knows to resume automatically.\n\n```python\n# Handle pause_turn in your agentic loop\nif response.stop_reason == \"pause_turn\":\n    messages = [\n        {\"role\": \"user\", \"content\": user_query},\n        {\"role\": \"assistant\", \"content\": response.content},\n    ]\n    # Make another API request — server resumes automatically\n    response = client.messages.create(\n        model=\"claude-opus-4-8\", messages=messages, tools=tools\n    )\n```\n\nSet a `max_continuations` limit (e.g., 5) to prevent infinite loops. For the full guide, see: `https://platform.claude.com/docs/en/build-with-claude/handling-stop-reasons`\n\n> **Security:** The tool runner executes your tool functions automatically whenever Claude requests them. For tools with side effects (sending emails, modifying databases, financial transactions), validate inputs within your tool functions and consider requiring confirmation for destructive operations. Use the manual agentic loop if you need human-in-the-loop approval before each tool execution.\n\n---\n\n### Handling Tool Results\n\nWhen Claude uses a tool, the response contains a `tool_use` block. You must:\n\n1. Execute the tool with the provided input\n2. Send the result back in a `tool_result` message\n3. Continue the conversation\n\n**Error handling in tool results:** When a tool execution fails, set `\"is_error\": true` and provide an informative error message. Claude will typically acknowledge the error and either try a different approach or ask for clarification.\n\n**Multiple tool calls:** Claude can request multiple tools in a single response. Handle them all before continuing — send all results back in a single `user` message.\n\n---\n\n## Server-Side Tools: Code Execution\n\nThe code execution tool lets Claude run code in a secure, sandboxed container. Unlike user-defined tools, server-side tools run on Anthropic's infrastructure — you don't execute anything client-side. Just include the tool definition and Claude handles the rest.\n\n### Key Facts\n\n- Runs in an isolated container (1 CPU, 5 GiB RAM, 5 GiB disk)\n- No internet access (fully sandboxed)\n- Python 3.11 with data science libraries pre-installed\n- Containers persist for 30 days and can be reused across requests\n- Free when used with web search/web fetch tools; otherwise $0.05/hour after 1,550 free hours/month per organization\n\n### Tool Definition\n\nThe tool requires no schema — just declare it in the `tools` array:\n\n```json\n{\n  \"type\": \"code_execution_20260120\",\n  \"name\": \"code_execution\"\n}\n```\n\nClaude automatically gains access to `bash_code_execution` (run shell commands) and `text_editor_code_execution` (create/view/edit files).\n\n### Pre-installed Python Libraries\n\n- **Data science**: pandas, numpy, scipy, scikit-learn, statsmodels\n- **Visualization**: matplotlib, seaborn\n- **File processing**: openpyxl, xlsxwriter, pillow, pypdf, pdfplumber, python-docx, python-pptx\n- **Math**: sympy, mpmath\n- **Utilities**: tqdm, python-dateutil, pytz, sqlite3\n\nAdditional packages can be installed at runtime via `pip install`.\n\n### Supported File Types for Upload\n\n| Type   | Extensions                         |\n| ------ | ---------------------------------- |\n| Data   | CSV, Excel (.xlsx/.xls), JSON, XML |\n| Images | JPEG, PNG, GIF, WebP               |\n| Text   | .txt, .md, .py, .js, etc.          |\n\n### Container Reuse\n\nReuse containers across requests to maintain state (files, installed packages, variables). Extract the `container_id` from the first response and pass it to subsequent requests.\n\n### Response Structure\n\nThe response contains interleaved text and tool result blocks:\n\n- `text` — Claude's explanation\n- `server_tool_use` — What Claude is doing\n- `bash_code_execution_tool_result` — Code execution output (check `return_code` for success/failure)\n- `text_editor_code_execution_tool_result` — File operation results\n\n> **Security:** Always sanitize filenames with `os.path.basename()` / `path.basename()` before writing downloaded files to disk to prevent path traversal attacks. Write files to a dedicated output directory.\n\n---\n\n## Server-Side Tools: Web Search and Web Fetch\n\nWeb search and web fetch let Claude search the web and retrieve page content. They run server-side — just include the tool definitions and Claude handles queries, fetching, and result processing automatically.\n\n### Tool Definitions\n\n```json\n[\n  { \"type\": \"web_search_20260209\", \"name\": \"web_search\" },\n  { \"type\": \"web_fetch_20260209\", \"name\": \"web_fetch\" }\n]\n```\n\n### Dynamic Filtering (Opus 4.8 / Opus 4.7 / Opus 4.6 / Sonnet 4.6)\n\nThe `web_search_20260209` and `web_fetch_20260209` versions support **dynamic filtering** — Claude writes and executes code to filter search results before they reach the context window, improving accuracy and token efficiency. Dynamic filtering is built into these tool versions and activates automatically; you do not need to separately declare the `code_execution` tool or pass any beta header.\n\n```json\n{\n  \"tools\": [\n    { \"type\": \"web_search_20260209\", \"name\": \"web_search\" },\n    { \"type\": \"web_fetch_20260209\", \"name\": \"web_fetch\" }\n  ]\n}\n```\n\nWithout dynamic filtering, the previous `web_search_20250305` version is also available.\n\n> **Note:** Only include the standalone `code_execution` tool when your application needs code execution for its own purposes (data analysis, file processing, visualization) independent of web search. Including it alongside `_20260209` web tools creates a second execution environment that can confuse the model.\n\n---\n\n## Server-Side Tools: Programmatic Tool Calling\n\nWith standard tool use, each tool call is a round trip: Claude calls, the result enters Claude's context, Claude reasons, then calls the next tool. Chained calls accumulate latency and tokens — most of that intermediate data is never needed again.\n\nProgrammatic tool calling lets Claude compose those calls into a script. The script runs in the code execution container; when it invokes a tool, the container pauses, the call executes, and the result returns to the running code (not to Claude's context). The script processes it with normal control flow. Only the final output returns to Claude. Use it when chaining many tool calls or when intermediate results are large and should be filtered before reaching the context window.\n\nFor full documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling`\n\n---\n\n## Server-Side Tools: Tool Search\n\nThe tool search tool lets Claude dynamically discover tools from large libraries without loading all definitions into the context window. Use it when you have many tools but only a few are relevant to any given request. Discovered tool schemas are appended to the request, not swapped in — this preserves the prompt cache (see `agent-design.md` §Caching for Agents).\n\nFor full documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool`\n\n---\n\n## Skills\n\nSkills package task-specific instructions that Claude loads only when relevant. Each skill is a folder containing a `SKILL.md` file. The skill's short description sits in context by default; Claude reads the full file when the current task calls for it. Use skills to keep specialized instructions out of the base system prompt without losing discoverability.\n\nFor full documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/agents-and-tools/skills`\n\n---\n\n## Tool Use Examples\n\nYou can provide sample tool calls directly in your tool definitions to demonstrate usage patterns and reduce parameter errors. This helps Claude understand how to correctly format tool inputs, especially for tools with complex schemas.\n\nFor full documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/agents-and-tools/tool-use/implement-tool-use`\n\n---\n\n## Server-Side Tools: Computer Use\n\nComputer use lets Claude interact with a desktop environment (screenshots, mouse, keyboard). It can be Anthropic-hosted (server-side, like code execution) or self-hosted (you provide the environment and execute actions client-side).\n\nFor full documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/agents-and-tools/computer-use/overview`\n\n---\n\n## Context Editing\n\nContext editing clears stale tool results and thinking blocks from the transcript as a long-running agent accumulates turns. Unlike compaction (which summarizes), context editing prunes — the cleared content is removed, not replaced. Use it when old tool outputs are no longer relevant and you want to keep the transcript lean without losing the conversation structure. Thresholds for what to clear are configurable.\n\nFor full documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/build-with-claude/context-editing`\n\n---\n\n## Server-Side Tools: Advisor (Beta)\n\nThe advisor tool lets Claude consult a secondary model during a conversation. The advisor runs its own API call with a model you specify and returns its analysis to the primary model. Use it when you want a second opinion, specialized expertise, or cross-model verification without managing the orchestration yourself.\n\n### Tool Definition\n\n```json\n{\n  \"type\": \"advisor_20260301\",\n  \"name\": \"advisor\",\n  \"model\": \"claude-sonnet-4-6\"\n}\n```\n\nThe `model` parameter is required — it specifies which model the advisor uses for its own inference. Optional fields: `caching`, `max_uses`, `allowed_callers`, `defer_loading`, `strict`.\n\n**Beta header required:** `advisor-tool-2026-03-01`. The SDK sets this automatically when using `client.beta.messages.create()` with advisor tools.\n\n---\n\n## Client-Side Tools: Memory\n\nThe memory tool enables Claude to store and retrieve information across conversations through a memory file directory. Claude can create, read, update, and delete files that persist between sessions.\n\n### Key Facts\n\n- Client-side tool — you control storage via your implementation\n- Supports commands: `view`, `create`, `str_replace`, `insert`, `delete`, `rename`\n- Operates on files in a `/memories` directory\n- The Python, TypeScript, and Java SDKs provide helper classes/functions for implementing the memory backend\n\n> **Security:** Never store API keys, passwords, tokens, or other secrets in memory files. Be cautious with personally identifiable information (PII) — check data privacy regulations (GDPR, CCPA) before persisting user data. The reference implementations have no built-in access control; in multi-user systems, implement per-user memory directories and authentication in your tool handlers.\n\nFor full implementation examples, use WebFetch:\n\n- Docs: `https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool.md`\n\n---\n\n## Structured Outputs\n\nStructured outputs constrain Claude's responses to follow a specific JSON schema, guaranteeing valid, parseable output. This is not a separate tool — it enhances the Messages API response format and/or tool parameter validation.\n\nTwo features are available:\n\n- **JSON outputs** (`output_config.format`): Control Claude's response format\n- **Strict tool use** (`strict: true`): Guarantee valid tool parameter schemas\n\n**Supported models:** Claude Opus 4.8, Claude Sonnet 4.6, and Claude Haiku 4.5. Legacy models (Claude Opus 4.5, Claude Opus 4.1) also support structured outputs.\n\n> **Recommended:** Use `client.messages.parse()` which automatically validates responses against your schema. When using `messages.create()` directly, use `output_config: {format: {...}}`. The `output_format` convenience parameter is also accepted by some SDK methods (e.g., `.parse()`), but `output_config.format` is the canonical API-level parameter.\n\n### JSON Schema Limitations\n\n**Supported:**\n\n- Basic types: object, array, string, integer, number, boolean, null\n- `enum`, `const`, `anyOf`, `allOf`, `$ref`/`$def`\n- String formats: `date-time`, `time`, `date`, `duration`, `email`, `hostname`, `uri`, `ipv4`, `ipv6`, `uuid`\n- `additionalProperties: false` (required for all objects)\n\n**Not supported:**\n\n- Recursive schemas\n- Numerical constraints (`minimum`, `maximum`, `multipleOf`)\n- String constraints (`minLength`, `maxLength`)\n- Complex array constraints\n- `additionalProperties` set to anything other than `false`\n\nThe Python and TypeScript SDKs automatically handle unsupported constraints by removing them from the schema sent to the API and validating them client-side.\n\n### Important Notes\n\n- **First request latency**: New schemas incur a one-time compilation cost. Subsequent requests with the same schema use a 24-hour cache.\n- **Refusals**: If Claude refuses for safety reasons (`stop_reason: \"refusal\"`), the output may not match your schema.\n- **Token limits**: If `stop_reason: \"max_tokens\"`, output may be incomplete. Increase `max_tokens`.\n- **Incompatible with**: Citations (returns 400 error), message prefilling.\n- **Works with**: Batches API, streaming, token counting, extended thinking.\n\n---\n\n## Tips for Effective Tool Use\n\n1. **Provide detailed descriptions**: Claude relies heavily on descriptions to understand when and how to use tools\n2. **Use specific tool names**: `get_current_weather` is better than `weather`\n3. **Validate inputs**: Always validate tool inputs before execution\n4. **Handle errors gracefully**: Return informative error messages so Claude can adapt\n5. **Limit tool count**: Too many tools can confuse the model — keep the set focused\n6. **Test tool interactions**: Verify Claude uses tools correctly in various scenarios\n\nFor detailed tool use documentation, use WebFetch:\n\n- URL: `https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview`\n</doc>\n\n<doc path=\"typescript/claude-api/README.md\">\n# Claude API — TypeScript\n\n## Installation\n\n```bash\nnpm install @anthropic-ai/sdk\n```\n\n## Client Initialization\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\n// Default — resolves credentials from the environment:\n// ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.\n// Prefer this for local dev; don't hardcode a key.\nconst client = new Anthropic();\n\n// Explicit API key (only when you must inject a specific key)\nconst client = new Anthropic({ apiKey: \"your-api-key\" });\n```\n\n---\n\n## Basic Message Request\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [{ role: \"user\", content: \"What is the capital of France?\" }],\n});\n// response.content is ContentBlock[] — a discriminated union. Narrow by .type\n// before accessing .text (TypeScript will error on content[0].text without this).\nfor (const block of response.content) {\n  if (block.type === \"text\") {\n    console.log(block.text);\n  }\n}\n```\n\n---\n\n## System Prompts\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  system:\n    \"You are a helpful coding assistant. Always provide examples in Python.\",\n  messages: [{ role: \"user\", content: \"How do I read a JSON file?\" }],\n});\n```\n\n### Mid-conversation system messages (beta, model-gated)\n\nFor operator instructions that arrive mid-conversation (mode switches, injected state), append `{role: \"system\", ...}` to `messages` instead of editing top-level `system` — this preserves the cached prefix and carries operator authority. Must follow a user message; cannot be `messages[0]`. Unsupported models return a 400 (`role 'system' is not supported on this model`). See `shared/prompt-caching.md` for when to use this vs. top-level `system`.\n\n```typescript\n// SDK types for role:\"system\" in messages are pending — pass the beta header\n// directly until the SDK updates, then switch to client.beta.messages.create\n// with betas: [\"mid-conversation-system-2026-04-07\"].\nconst response = await client.messages.create(\n  {\n    model: MODEL_ID, // must support mid-conversation system messages\n    max_tokens: 16000,\n    system: [\n      { type: \"text\", text: STABLE_SYSTEM, cache_control: { type: \"ephemeral\" } },\n    ],\n    messages: [\n      ...history,\n      { role: \"user\", content: userMessage },\n      // @ts-expect-error — role:\"system\" pending SDK types\n      { role: \"system\", content: \"Terse mode enabled — keep responses under 40 words.\" },\n    ],\n  },\n  { headers: { \"anthropic-beta\": \"mid-conversation-system-2026-04-07\" } },\n);\n```\n\n---\n\n## Vision (Images)\n\n### URL\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: [\n        {\n          type: \"image\",\n          source: { type: \"url\", url: \"https://example.com/image.png\" },\n        },\n        { type: \"text\", text: \"Describe this image\" },\n      ],\n    },\n  ],\n});\n```\n\n### Base64\n\n```typescript\nimport fs from \"fs\";\n\nconst imageData = fs.readFileSync(\"image.png\").toString(\"base64\");\n\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: [\n        {\n          type: \"image\",\n          source: { type: \"base64\", media_type: \"image/png\", data: imageData },\n        },\n        { type: \"text\", text: \"What's in this image?\" },\n      ],\n    },\n  ],\n});\n```\n\n---\n\n## Prompt Caching\n\n**Caching is a prefix match** — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.\n\n### Automatic Caching (Recommended)\n\nUse top-level `cache_control` to automatically cache the last cacheable block in the request:\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  cache_control: { type: \"ephemeral\" }, // auto-caches the last cacheable block\n  system: \"You are an expert on this large document...\",\n  messages: [{ role: \"user\", content: \"Summarize the key points\" }],\n});\n```\n\n### Manual Cache Control\n\nFor fine-grained control, add `cache_control` to specific content blocks:\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  system: [\n    {\n      type: \"text\",\n      text: \"You are an expert on this large document...\",\n      cache_control: { type: \"ephemeral\" }, // default TTL is 5 minutes\n    },\n  ],\n  messages: [{ role: \"user\", content: \"Summarize the key points\" }],\n});\n\n// With explicit TTL (time-to-live)\nconst response2 = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  system: [\n    {\n      type: \"text\",\n      text: \"You are an expert on this large document...\",\n      cache_control: { type: \"ephemeral\", ttl: \"1h\" }, // 1 hour TTL\n    },\n  ],\n  messages: [{ role: \"user\", content: \"Summarize the key points\" }],\n});\n```\n\n### Verifying Cache Hits\n\n```typescript\nconsole.log(response.usage.cache_creation_input_tokens); // tokens written to cache (~1.25x cost)\nconsole.log(response.usage.cache_read_input_tokens);     // tokens served from cache (~0.1x cost)\nconsole.log(response.usage.input_tokens);                // uncached tokens (full cost)\n```\n\nIf `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work — `Date.now()` or a UUID in the system prompt, non-deterministic key ordering, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.\n\n---\n\n## Extended Thinking\n\n> **Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.8 and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.\n> **Older models:** Use `thinking: {type: \"enabled\", budget_tokens: N}` (must be < `max_tokens`, min 1024).\n\n```typescript\n// Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  thinking: { type: \"adaptive\" },\n  output_config: { effort: \"high\" }, // low | medium | high | max\n  messages: [\n    { role: \"user\", content: \"Solve this math problem step by step...\" },\n  ],\n});\n\nfor (const block of response.content) {\n  if (block.type === \"thinking\") {\n    console.log(\"Thinking:\", block.thinking);\n  } else if (block.type === \"text\") {\n    console.log(\"Response:\", block.text);\n  }\n}\n```\n\n---\n\n## Error Handling\n\nUse the SDK's typed exception classes — never check error messages with string matching:\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\ntry {\n  const response = await client.messages.create({...});\n} catch (error) {\n  if (error instanceof Anthropic.BadRequestError) {\n    console.error(\"Bad request:\", error.message);\n  } else if (error instanceof Anthropic.AuthenticationError) {\n    console.error(\"Invalid API key\");\n  } else if (error instanceof Anthropic.RateLimitError) {\n    console.error(\"Rate limited - retry later\");\n  } else if (error instanceof Anthropic.APIError) {\n    console.error(`API error ${error.status}:`, error.message);\n  }\n}\n```\n\nAll classes extend `Anthropic.APIError` with a typed `status` field. Check from most specific to least specific. See [shared/error-codes.md](../../shared/error-codes.md) for the full error code reference.\n\n---\n\n## Multi-Turn Conversations\n\nThe API is stateless — send the full conversation history each time. Use `Anthropic.MessageParam[]` to type the messages array:\n\n```typescript\nconst messages: Anthropic.MessageParam[] = [\n  { role: \"user\", content: \"My name is Alice.\" },\n  { role: \"assistant\", content: \"Hello Alice! Nice to meet you.\" },\n  { role: \"user\", content: \"What's my name?\" },\n];\n\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: messages,\n});\n```\n\n**Rules:**\n\n- Consecutive same-role messages are allowed — the API combines them into a single turn\n- First message must be `user`\n- Use SDK types (`Anthropic.MessageParam`, `Anthropic.Message`, `Anthropic.Tool`, etc.) for all API data structures — don't redefine equivalent interfaces\n\n---\n\n### Compaction (long conversations)\n\n> **Beta, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\nconst client = new Anthropic();\nconst messages: Anthropic.Beta.BetaMessageParam[] = [];\n\nasync function chat(userMessage: string): Promise<string> {\n  messages.push({ role: \"user\", content: userMessage });\n\n  const response = await client.beta.messages.create({\n    betas: [\"compact-2026-01-12\"],\n    model: \"claude-opus-4-8\",\n    max_tokens: 16000,\n    messages,\n    context_management: {\n      edits: [{ type: \"compact_20260112\" }],\n    },\n  });\n\n  // Append full content — compaction blocks must be preserved\n  messages.push({ role: \"assistant\", content: response.content });\n\n  const textBlock = response.content.find(\n    (b): b is Anthropic.Beta.BetaTextBlock => b.type === \"text\",\n  );\n  return textBlock?.text ?? \"\";\n}\n\n// Compaction triggers automatically when context grows large\nconsole.log(await chat(\"Help me build a Python web scraper\"));\nconsole.log(await chat(\"Add support for JavaScript-rendered pages\"));\nconsole.log(await chat(\"Now add rate limiting and error handling\"));\n```\n\n---\n\n## Stop Reasons\n\nThe `stop_reason` field in the response indicates why the model stopped generating:\n\n| Value           | Meaning                                                         |\n| --------------- | --------------------------------------------------------------- |\n| `end_turn`      | Claude finished its response naturally                          |\n| `max_tokens`    | Hit the `max_tokens` limit — increase it or use streaming       |\n| `stop_sequence` | Hit a custom stop sequence                                      |\n| `tool_use`      | Claude wants to call a tool — execute it and continue           |\n| `pause_turn`    | Model paused and can be resumed (agentic flows)                 |\n| `refusal`       | Claude refused for safety reasons — check `stop_details`        |\n\n### Structured Stop Details\n\nWhen `stop_reason` is `\"refusal\"`, the response includes a `stop_details` object with structured information about the refusal:\n\n```typescript\nif (response.stop_reason === \"refusal\" && response.stop_details) {\n  console.log(`Category: ${response.stop_details.category}`); // \"cyber\" | \"bio\" | null\n  console.log(`Explanation: ${response.stop_details.explanation}`);\n}\n```\n\n---\n\n## Cost Optimization Strategies\n\n### 1. Use Prompt Caching for Repeated Context\n\n```typescript\n// Automatic caching (simplest — caches the last cacheable block)\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  cache_control: { type: \"ephemeral\" },\n  system: largeDocumentText, // e.g., 50KB of context\n  messages: [{ role: \"user\", content: \"Summarize the key points\" }],\n});\n\n// First request: full cost\n// Subsequent requests: ~90% cheaper for cached portion\n```\n\n### 2. Use Token Counting Before Requests\n\n```typescript\nconst countResponse = await client.messages.countTokens({\n  model: \"claude-opus-4-8\",\n  messages: messages,\n  system: system,\n});\n\nconst estimatedInputCost = countResponse.input_tokens * 0.000005; // $5/1M tokens\nconsole.log(`Estimated input cost: $${estimatedInputCost.toFixed(4)}`);\n```\n</doc>\n\n<doc path=\"typescript/claude-api/batches.md\">\n# Message Batches API — TypeScript\n\nThe Batches API (`POST /v1/messages/batches`) processes Messages API requests asynchronously at 50% of standard prices.\n\n## Key Facts\n\n- Up to 100,000 requests or 256 MB per batch\n- Most batches complete within 1 hour; maximum 24 hours\n- Results available for 29 days after creation\n- 50% cost reduction on all token usage\n- All Messages API features supported (vision, tools, caching, etc.)\n\n---\n\n## Create a Batch\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\nconst client = new Anthropic();\n\nconst messageBatch = await client.messages.batches.create({\n  requests: [\n    {\n      custom_id: \"request-1\",\n      params: {\n        model: \"claude-opus-4-8\",\n        max_tokens: 16000,\n        messages: [\n          { role: \"user\", content: \"Summarize climate change impacts\" },\n        ],\n      },\n    },\n    {\n      custom_id: \"request-2\",\n      params: {\n        model: \"claude-opus-4-8\",\n        max_tokens: 16000,\n        messages: [\n          { role: \"user\", content: \"Explain quantum computing basics\" },\n        ],\n      },\n    },\n  ],\n});\n\nconsole.log(`Batch ID: ${messageBatch.id}`);\nconsole.log(`Status: ${messageBatch.processing_status}`);\n```\n\n---\n\n## Poll for Completion\n\n```typescript\nlet batch;\nwhile (true) {\n  batch = await client.messages.batches.retrieve(messageBatch.id);\n  if (batch.processing_status === \"ended\") break;\n  console.log(\n    `Status: ${batch.processing_status}, processing: ${batch.request_counts.processing}`,\n  );\n  await new Promise((resolve) => setTimeout(resolve, 60_000));\n}\n\nconsole.log(\"Batch complete!\");\nconsole.log(`Succeeded: ${batch.request_counts.succeeded}`);\nconsole.log(`Errored: ${batch.request_counts.errored}`);\n```\n\n---\n\n## Retrieve Results\n\n```typescript\nfor await (const result of await client.messages.batches.results(\n  messageBatch.id,\n)) {\n  switch (result.result.type) {\n    case \"succeeded\":\n      console.log(\n        `[${result.custom_id}] ${result.result.message.content[0].text.slice(0, 100)}`,\n      );\n      break;\n    case \"errored\":\n      if (result.result.error.type === \"invalid_request\") {\n        console.log(`[${result.custom_id}] Validation error - fix and retry`);\n      } else {\n        console.log(`[${result.custom_id}] Server error - safe to retry`);\n      }\n      break;\n    case \"expired\":\n      console.log(`[${result.custom_id}] Expired - resubmit`);\n      break;\n  }\n}\n```\n\n---\n\n## Cancel a Batch\n\n```typescript\nconst cancelled = await client.messages.batches.cancel(messageBatch.id);\nconsole.log(`Status: ${cancelled.processing_status}`); // \"canceling\"\n```\n</doc>\n\n<doc path=\"typescript/claude-api/files-api.md\">\n# Files API — TypeScript\n\nThe Files API uploads files for use in Messages API requests. Reference files via `file_id` in content blocks, avoiding re-uploads across multiple API calls.\n\n**Beta:** Pass `betas: [\"files-api-2025-04-14\"]` in your API calls (the SDK sets the required header automatically).\n\n## Key Facts\n\n- Maximum file size: 500 MB\n- Total storage: 100 GB per organization\n- Files persist until deleted\n- File operations (upload, list, delete) are free; content used in messages is billed as input tokens\n- Not available on Amazon Bedrock or Google Vertex AI\n\n---\n\n## Upload a File\n\n```typescript\nimport Anthropic, { toFile } from \"@anthropic-ai/sdk\";\nimport fs from \"fs\";\n\nconst client = new Anthropic();\n\nconst uploaded = await client.beta.files.upload({\n  file: await toFile(fs.createReadStream(\"report.pdf\"), undefined, {\n    type: \"application/pdf\",\n  }),\n  betas: [\"files-api-2025-04-14\"],\n});\n\nconsole.log(`File ID: ${uploaded.id}`);\nconsole.log(`Size: ${uploaded.size_bytes} bytes`);\n```\n\n---\n\n## Use a File in Messages\n\n### PDF / Text Document\n\n```typescript\nconst response = await client.beta.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: [\n        { type: \"text\", text: \"Summarize the key findings in this report.\" },\n        {\n          type: \"document\",\n          source: { type: \"file\", file_id: uploaded.id },\n          title: \"Q4 Report\",\n          citations: { enabled: true },\n        },\n      ],\n    },\n  ],\n  betas: [\"files-api-2025-04-14\"],\n});\n\nconsole.log(response.content[0].text);\n```\n\n---\n\n## Manage Files\n\n### List Files\n\n```typescript\nconst files = await client.beta.files.list({\n  betas: [\"files-api-2025-04-14\"],\n});\nfor (const f of files.data) {\n  console.log(`${f.id}: ${f.filename} (${f.size_bytes} bytes)`);\n}\n```\n\n### Delete a File\n\n```typescript\nawait client.beta.files.delete(\"file_011CNha8iCJcU1wXNR6q4V8w\", {\n  betas: [\"files-api-2025-04-14\"],\n});\n```\n\n### Download a File\n\n```typescript\nconst response = await client.beta.files.download(\n  \"file_011CNha8iCJcU1wXNR6q4V8w\",\n  { betas: [\"files-api-2025-04-14\"] },\n);\nconst content = Buffer.from(await response.arrayBuffer());\nawait fs.promises.writeFile(\"output.txt\", content);\n```\n</doc>\n\n<doc path=\"typescript/claude-api/streaming.md\">\n# Streaming — TypeScript\n\n## Quick Start\n\n```typescript\nconst stream = client.messages.stream({\n  model: \"claude-opus-4-8\",\n  max_tokens: 64000,\n  messages: [{ role: \"user\", content: \"Write a story\" }],\n});\n\nfor await (const event of stream) {\n  if (\n    event.type === \"content_block_delta\" &&\n    event.delta.type === \"text_delta\"\n  ) {\n    process.stdout.write(event.delta.text);\n  }\n}\n```\n\n---\n\n## Handling Different Content Types\n\n> **Opus 4.8 / Opus 4.7 / Opus 4.6:** Use `thinking: {type: \"adaptive\"}`. On older models, use `thinking: {type: \"enabled\", budget_tokens: N}` instead.\n\n```typescript\nconst stream = client.messages.stream({\n  model: \"claude-opus-4-8\",\n  max_tokens: 64000,\n  thinking: { type: \"adaptive\" },\n  messages: [{ role: \"user\", content: \"Analyze this problem\" }],\n});\n\nfor await (const event of stream) {\n  switch (event.type) {\n    case \"content_block_start\":\n      switch (event.content_block.type) {\n        case \"thinking\":\n          console.log(\"\\n[Thinking...]\");\n          break;\n        case \"text\":\n          console.log(\"\\n[Response:]\");\n          break;\n      }\n      break;\n    case \"content_block_delta\":\n      switch (event.delta.type) {\n        case \"thinking_delta\":\n          process.stdout.write(event.delta.thinking);\n          break;\n        case \"text_delta\":\n          process.stdout.write(event.delta.text);\n          break;\n      }\n      break;\n  }\n}\n```\n\n---\n\n## Streaming with Tool Use (Tool Runner)\n\nUse the tool runner with `stream: true`. The outer loop iterates over tool runner iterations (messages), the inner loop processes stream events:\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\nimport { betaZodTool } from \"@anthropic-ai/sdk/helpers/beta/zod\";\nimport { z } from \"zod\";\n\nconst client = new Anthropic();\n\nconst getWeather = betaZodTool({\n  name: \"get_weather\",\n  description: \"Get current weather for a location\",\n  inputSchema: z.object({\n    location: z.string().describe(\"City and state, e.g., San Francisco, CA\"),\n  }),\n  run: async ({ location }) => `72°F and sunny in ${location}`,\n});\n\nconst runner = client.beta.messages.toolRunner({\n  model: \"claude-opus-4-8\",\n  max_tokens: 64000,\n  tools: [getWeather],\n  messages: [\n    { role: \"user\", content: \"What's the weather in Paris and London?\" },\n  ],\n  stream: true,\n});\n\n// Outer loop: each tool runner iteration\nfor await (const messageStream of runner) {\n  // Inner loop: stream events for this iteration\n  for await (const event of messageStream) {\n    switch (event.type) {\n      case \"content_block_delta\":\n        switch (event.delta.type) {\n          case \"text_delta\":\n            process.stdout.write(event.delta.text);\n            break;\n          case \"input_json_delta\":\n            // Tool input being streamed\n            break;\n        }\n        break;\n    }\n  }\n}\n```\n\n---\n\n## Getting the Final Message\n\n```typescript\nconst stream = client.messages.stream({\n  model: \"claude-opus-4-8\",\n  max_tokens: 64000,\n  messages: [{ role: \"user\", content: \"Hello\" }],\n});\n\nfor await (const event of stream) {\n  // Process events...\n}\n\nconst finalMessage = await stream.finalMessage();\nconsole.log(`Tokens used: ${finalMessage.usage.output_tokens}`);\n```\n\n---\n\n## Stream Event Types\n\n| Event Type            | Description                 | When it fires                     |\n| --------------------- | --------------------------- | --------------------------------- |\n| `message_start`       | Contains message metadata   | Once at the beginning             |\n| `content_block_start` | New content block beginning | When a text/tool_use block starts |\n| `content_block_delta` | Incremental content update  | For each token/chunk              |\n| `content_block_stop`  | Content block complete      | When a block finishes             |\n| `message_delta`       | Message-level updates       | Contains `stop_reason`, usage     |\n| `message_stop`        | Message complete            | Once at the end                   |\n\n## Best Practices\n\n1. **Always flush output** — Use `process.stdout.write()` for immediate display\n2. **Handle partial responses** — If the stream is interrupted, you may have incomplete content\n3. **Track token usage** — The `message_delta` event contains usage information\n4. **Use `finalMessage()`** — Get the complete `Anthropic.Message` object even when streaming. Don't wrap `.on()` events in `new Promise()` — `finalMessage()` handles all completion/error/abort states internally\n5. **Buffer for web UIs** — Consider buffering a few tokens before rendering to avoid excessive DOM updates\n6. **Use `stream.on(\"text\", ...)` for deltas** — The `text` event provides just the delta string, simpler than manually filtering `content_block_delta` events\n7. **For agentic loops with streaming** — See the [Streaming Manual Loop](./tool-use.md#streaming-manual-loop) section in tool-use.md for combining `stream()` + `finalMessage()` with a tool-use loop\n\n## Raw SSE Format\n\nIf using raw HTTP (not SDKs), the stream returns Server-Sent Events:\n\n```\nevent: message_start\ndata: {\"type\":\"message_start\",\"message\":{\"id\":\"msg_...\",\"type\":\"message\",...}}\n\nevent: content_block_start\ndata: {\"type\":\"content_block_start\",\"index\":0,\"content_block\":{\"type\":\"text\",\"text\":\"\"}}\n\nevent: content_block_delta\ndata: {\"type\":\"content_block_delta\",\"index\":0,\"delta\":{\"type\":\"text_delta\",\"text\":\"Hello\"}}\n\nevent: content_block_stop\ndata: {\"type\":\"content_block_stop\",\"index\":0}\n\nevent: message_delta\ndata: {\"type\":\"message_delta\",\"delta\":{\"stop_reason\":\"end_turn\"},\"usage\":{\"output_tokens\":12}}\n\nevent: message_stop\ndata: {\"type\":\"message_stop\"}\n```\n</doc>\n\n<doc path=\"typescript/claude-api/tool-use.md\">\n# Tool Use — TypeScript\n\nFor conceptual overview (tool definitions, tool choice, tips), see [shared/tool-use-concepts.md](../../shared/tool-use-concepts.md).\n\n## Tool Runner (Recommended)\n\n**Beta:** The tool runner is in beta in the TypeScript SDK.\n\nUse `betaZodTool` with Zod schemas to define tools with a `run` function, then pass them to `client.beta.messages.toolRunner()`:\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\nimport { betaZodTool } from \"@anthropic-ai/sdk/helpers/beta/zod\";\nimport { z } from \"zod\";\n\nconst client = new Anthropic();\n\nconst getWeather = betaZodTool({\n  name: \"get_weather\",\n  description: \"Get current weather for a location\",\n  inputSchema: z.object({\n    location: z.string().describe(\"City and state, e.g., San Francisco, CA\"),\n    unit: z.enum([\"celsius\", \"fahrenheit\"]).optional(),\n  }),\n  run: async (input) => {\n    // Your implementation here\n    return `72°F and sunny in ${input.location}`;\n  },\n});\n\n// The tool runner handles the agentic loop and returns the final message\nconst finalMessage = await client.beta.messages.toolRunner({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  tools: [getWeather],\n  messages: [{ role: \"user\", content: \"What's the weather in Paris?\" }],\n});\n\nconsole.log(finalMessage.content);\n```\n\n**Key benefits of the tool runner:**\n\n- No manual loop — the SDK handles calling tools and feeding results back\n- Type-safe tool inputs via Zod schemas\n- Tool schemas are generated automatically from Zod definitions\n- Iteration stops automatically when Claude has no more tool calls\n\n---\n\n## Manual Agentic Loop\n\nUse this when you need fine-grained control (custom logging, conditional tool execution, streaming individual iterations, human-in-the-loop approval):\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\nconst client = new Anthropic();\nconst tools: Anthropic.Tool[] = [...]; // Your tool definitions\nlet messages: Anthropic.MessageParam[] = [{ role: \"user\", content: userInput }];\n\nwhile (true) {\n  const response = await client.messages.create({\n    model: \"claude-opus-4-8\",\n    max_tokens: 16000,\n    tools: tools,\n    messages: messages,\n  });\n\n  if (response.stop_reason === \"end_turn\") break;\n\n  // Server-side tool hit iteration limit; append assistant turn and re-send to continue\n  if (response.stop_reason === \"pause_turn\") {\n    messages.push({ role: \"assistant\", content: response.content });\n    continue;\n  }\n\n  const toolUseBlocks = response.content.filter(\n    (b): b is Anthropic.ToolUseBlock => b.type === \"tool_use\",\n  );\n\n  messages.push({ role: \"assistant\", content: response.content });\n\n  const toolResults: Anthropic.ToolResultBlockParam[] = [];\n  for (const tool of toolUseBlocks) {\n    const result = await executeTool(tool.name, tool.input);\n    toolResults.push({\n      type: \"tool_result\",\n      tool_use_id: tool.id,\n      content: result,\n    });\n  }\n\n  messages.push({ role: \"user\", content: toolResults });\n}\n```\n\n### Streaming Manual Loop\n\nUse `client.messages.stream()` + `finalMessage()` instead of `.create()` when you need streaming within a manual loop. Text deltas are streamed on each iteration; `finalMessage()` collects the complete `Message` so you can inspect `stop_reason` and extract tool-use blocks:\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\nconst client = new Anthropic();\nconst tools: Anthropic.Tool[] = [...];\nlet messages: Anthropic.MessageParam[] = [{ role: \"user\", content: userInput }];\n\nwhile (true) {\n  const stream = client.messages.stream({\n    model: \"claude-opus-4-8\",\n    max_tokens: 64000,\n    tools,\n    messages,\n  });\n\n  // Stream text deltas on each iteration\n  stream.on(\"text\", (delta) => {\n    process.stdout.write(delta);\n  });\n\n  // finalMessage() resolves with the complete Message — no need to\n  // manually wire up .on(\"message\") / .on(\"error\") / .on(\"abort\")\n  const message = await stream.finalMessage();\n\n  if (message.stop_reason === \"end_turn\") break;\n\n  // Server-side tool hit iteration limit; append assistant turn and re-send to continue\n  if (message.stop_reason === \"pause_turn\") {\n    messages.push({ role: \"assistant\", content: message.content });\n    continue;\n  }\n\n  const toolUseBlocks = message.content.filter(\n    (b): b is Anthropic.ToolUseBlock => b.type === \"tool_use\",\n  );\n\n  messages.push({ role: \"assistant\", content: message.content });\n\n  const toolResults: Anthropic.ToolResultBlockParam[] = [];\n  for (const tool of toolUseBlocks) {\n    const result = await executeTool(tool.name, tool.input);\n    toolResults.push({\n      type: \"tool_result\",\n      tool_use_id: tool.id,\n      content: result,\n    });\n  }\n\n  messages.push({ role: \"user\", content: toolResults });\n}\n```\n\n> **Important:** Don't wrap `.on()` events in `new Promise()` to collect the final message — use `stream.finalMessage()` instead. The SDK handles all error/abort/completion states internally.\n\n> **Error handling in the loop:** Use the SDK's typed exceptions (e.g., `Anthropic.RateLimitError`, `Anthropic.APIError`) — see [Error Handling](./README.md#error-handling) for examples. Don't check error messages with string matching.\n\n> **SDK types:** Use `Anthropic.MessageParam`, `Anthropic.Tool`, `Anthropic.ToolUseBlock`, `Anthropic.ToolResultBlockParam`, `Anthropic.Message`, etc. for all API-related data structures. Don't redefine equivalent interfaces.\n\n---\n\n## Handling Tool Results\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  tools: tools,\n  messages: [{ role: \"user\", content: \"What's the weather in Paris?\" }],\n});\n\nfor (const block of response.content) {\n  if (block.type === \"tool_use\") {\n    const result = await executeTool(block.name, block.input);\n\n    const followup = await client.messages.create({\n      model: \"claude-opus-4-8\",\n      max_tokens: 16000,\n      tools: tools,\n      messages: [\n        { role: \"user\", content: \"What's the weather in Paris?\" },\n        { role: \"assistant\", content: response.content },\n        {\n          role: \"user\",\n          content: [\n            { type: \"tool_result\", tool_use_id: block.id, content: result },\n          ],\n        },\n      ],\n    });\n  }\n}\n```\n\n---\n\n## Tool Choice\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  tools: tools,\n  tool_choice: { type: \"tool\", name: \"get_weather\" },\n  messages: [{ role: \"user\", content: \"What's the weather in Paris?\" }],\n});\n```\n\n---\n\n## Server-Side Tools\n\nVersion-suffixed `type` literals; `name` is fixed per interface. Pass plain object literals — the `ToolUnion` type is satisfied structurally. **The `name`/`type` pair must match the interface**: mixing `str_replace_based_edit_tool` (20250728 name) with `text_editor_20250124` (which expects `str_replace_editor`) is a TS2322.\n\n**Don't type-annotate as `Tool[]`** — `Tool` is just the custom-tool variant. Let structural typing infer from the `tools` param, or annotate as `Anthropic.Messages.ToolUnion[]` if you must:\n\n```typescript\n// ✓ let inference work — no annotation\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  tools: [\n    { type: \"text_editor_20250728\", name: \"str_replace_based_edit_tool\" },\n    { type: \"bash_20250124\", name: \"bash\" },\n    { type: \"web_search_20260209\", name: \"web_search\" },\n    { type: \"code_execution_20260120\", name: \"code_execution\" },\n  ],\n  messages: [{ role: \"user\", content: \"...\" }],\n});\n\n// ✗ this is a TS2352 — Tool is the CUSTOM tool variant only\n// const tools: Anthropic.Tool[] = [{ type: \"text_editor_20250728\", ... }]\n```\n\n| Interface | `name` | `type` |\n|---|---|---|\n| `ToolTextEditor20250124` | `str_replace_editor` | `text_editor_20250124` |\n| `ToolTextEditor20250429` | `str_replace_based_edit_tool` | `text_editor_20250429` |\n| `ToolTextEditor20250728` | `str_replace_based_edit_tool` | `text_editor_20250728` |\n| `ToolBash20250124` | `bash` | `bash_20250124` |\n| `WebSearchTool20260209` | `web_search` | `web_search_20260209` |\n| `WebFetchTool20260209` | `web_fetch` | `web_fetch_20260209` |\n| `CodeExecutionTool20260120` | `code_execution` | `code_execution_20260120` |\n\n**Don't mix beta and non-beta types**: if you call `client.beta.messages.create()`, the response `content` is `BetaContentBlock[]` — you cannot pass that to a non-beta `ContentBlockParam[]` without narrowing each element.\n\n---\n\n\n## Code Execution\n\n### Basic Usage\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\nconst client = new Anthropic();\n\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content:\n        \"Calculate the mean and standard deviation of [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\",\n    },\n  ],\n  tools: [{ type: \"code_execution_20260120\", name: \"code_execution\" }],\n});\n```\n\n### Reading Local Files (ESM note)\n\n`__dirname` doesn't exist in ES modules. For script-relative paths use `import.meta.url`:\n\n```typescript\nimport { readFileSync } from \"fs\";\nimport { fileURLToPath } from \"url\";\nimport { dirname, join } from \"path\";\n\nconst __dirname = dirname(fileURLToPath(import.meta.url));\nconst pdfBytes = readFileSync(join(__dirname, \"sample.pdf\"));\n```\n\nOr use a CWD-relative path if the script runs from a known directory: `readFileSync(\"./sample.pdf\")`.\n\n### Upload Files for Analysis\n\n```typescript\nimport Anthropic, { toFile } from \"@anthropic-ai/sdk\";\nimport { createReadStream } from \"fs\";\n\nconst client = new Anthropic();\n\n// 1. Upload a file\nconst uploaded = await client.beta.files.upload({\n  file: await toFile(createReadStream(\"sales_data.csv\"), undefined, {\n    type: \"text/csv\",\n  }),\n  betas: [\"files-api-2025-04-14\"],\n});\n\n// 2. Pass to code execution\n// Code execution is GA; Files API is still beta (pass via RequestOptions)\nconst response = await client.messages.create(\n  {\n    model: \"claude-opus-4-8\",\n    max_tokens: 16000,\n    messages: [\n      {\n        role: \"user\",\n        content: [\n          {\n            type: \"text\",\n            text: \"Analyze this sales data. Show trends and create a visualization.\",\n          },\n          { type: \"container_upload\", file_id: uploaded.id },\n        ],\n      },\n    ],\n    tools: [{ type: \"code_execution_20260120\", name: \"code_execution\" }],\n  },\n  { headers: { \"anthropic-beta\": \"files-api-2025-04-14\" } },\n);\n```\n\n### Retrieve Generated Files\n\n```typescript\nimport path from \"path\";\nimport fs from \"fs\";\n\nconst OUTPUT_DIR = \"./claude_outputs\";\nawait fs.promises.mkdir(OUTPUT_DIR, { recursive: true });\n\nfor (const block of response.content) {\n  if (block.type === \"bash_code_execution_tool_result\") {\n    const result = block.content;\n    if (result.type === \"bash_code_execution_result\" && result.content) {\n      for (const fileRef of result.content) {\n        if (fileRef.type === \"bash_code_execution_output\") {\n          const metadata = await client.beta.files.retrieveMetadata(\n            fileRef.file_id,\n          );\n          const downloadResponse = await client.beta.files.download(fileRef.file_id);\n          const fileBytes = Buffer.from(await downloadResponse.arrayBuffer());\n          const safeName = path.basename(metadata.filename);\n          if (!safeName || safeName === \".\" || safeName === \"..\") {\n            console.warn(`Skipping invalid filename: ${metadata.filename}`);\n            continue;\n          }\n          const outputPath = path.join(OUTPUT_DIR, safeName);\n          await fs.promises.writeFile(outputPath, fileBytes);\n          console.log(`Saved: ${outputPath}`);\n        }\n      }\n    }\n  }\n}\n```\n\n### Container Reuse\n\n```typescript\n// First request: set up environment\nconst response1 = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: \"Install tabulate and create data.json with sample user data\",\n    },\n  ],\n  tools: [{ type: \"code_execution_20260120\", name: \"code_execution\" }],\n});\n\n// Reuse container\n// container is nullable — set only when using server-side code execution\nconst containerId = response1.container!.id;\n\nconst response2 = await client.messages.create({\n  container: containerId,\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: \"Read data.json and display as a formatted table\",\n    },\n  ],\n  tools: [{ type: \"code_execution_20260120\", name: \"code_execution\" }],\n});\n```\n\n---\n\n## Memory Tool\n\n### Basic Usage\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: \"Remember that my preferred language is TypeScript.\",\n    },\n  ],\n  tools: [{ type: \"memory_20250818\", name: \"memory\" }],\n});\n```\n\n### SDK Memory Helper\n\nUse `betaMemoryTool` with a `MemoryToolHandlers` implementation:\n\n```typescript\nimport {\n  betaMemoryTool,\n  type MemoryToolHandlers,\n} from \"@anthropic-ai/sdk/helpers/beta/memory\";\n\nconst handlers: MemoryToolHandlers = {\n  async view(command) { ... },\n  async create(command) { ... },\n  async str_replace(command) { ... },\n  async insert(command) { ... },\n  async delete(command) { ... },\n  async rename(command) { ... },\n};\n\nconst memory = betaMemoryTool(handlers);\n\nconst runner = client.beta.messages.toolRunner({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  tools: [memory],\n  messages: [{ role: \"user\", content: \"Remember my preferences\" }],\n});\n\nfor await (const message of runner) {\n  console.log(message);\n}\n```\n\nFor full implementation examples, use WebFetch:\n\n- `https://github.com/anthropics/anthropic-sdk-typescript/blob/main/examples/tools-helpers-memory.ts`\n\n---\n\n## Structured Outputs\n\n### JSON Outputs (Zod — Recommended)\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\nimport { z } from \"zod\";\nimport { zodOutputFormat } from \"@anthropic-ai/sdk/helpers/zod\";\n\nconst ContactInfoSchema = z.object({\n  name: z.string(),\n  email: z.string(),\n  plan: z.string(),\n  interests: z.array(z.string()),\n  demo_requested: z.boolean(),\n});\n\nconst client = new Anthropic();\n\nconst response = await client.messages.parse({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content:\n        \"Extract: Jane Doe (jane@co.com) wants Enterprise, interested in API and SDKs, wants a demo.\",\n    },\n  ],\n  output_config: {\n    format: zodOutputFormat(ContactInfoSchema),\n  },\n});\n\n// parsed_output is null if parsing failed — assert or guard\nconsole.log(response.parsed_output!.name); // \"Jane Doe\"\n```\n\n### Strict Tool Use\n\n```typescript\nconst response = await client.messages.create({\n  model: \"claude-opus-4-8\",\n  max_tokens: 16000,\n  messages: [\n    {\n      role: \"user\",\n      content: \"Book a flight to Tokyo for 2 passengers on March 15\",\n    },\n  ],\n  tools: [\n    {\n      name: \"book_flight\",\n      description: \"Book a flight to a destination\",\n      strict: true,\n      input_schema: {\n        type: \"object\",\n        properties: {\n          destination: { type: \"string\" },\n          date: { type: \"string\", format: \"date\" },\n          passengers: {\n            type: \"integer\",\n            enum: [1, 2, 3, 4, 5, 6, 7, 8],\n          },\n        },\n        required: [\"destination\", \"date\", \"passengers\"],\n        additionalProperties: false,\n      },\n    },\n  ],\n});\n```\n</doc>\n\n<doc path=\"typescript/managed-agents/README.md\">\n# Managed Agents — TypeScript\n\n> **Bindings not shown here:** This README covers the most common managed-agents flows for TypeScript. If you need a class, method, namespace, field, or behavior that isn't shown, WebFetch the TypeScript SDK repo **or the relevant docs page** from `shared/live-sources.md` rather than guess. Do not extrapolate from cURL shapes or another language's SDK.\n\n> **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML — its URL is in `shared/live-sources.md`. The examples below show in-code creation for completeness; in production the create call belongs in setup, not in the request path.\n\n## Installation\n\n```bash\nnpm install @anthropic-ai/sdk\n```\n\n## Client Initialization\n\n```typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\n// Default — resolves credentials from the environment:\n// ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.\n// Prefer this for local dev; don't hardcode a key.\nconst client = new Anthropic();\n\n// Explicit API key (only when you must inject a specific key)\nconst client = new Anthropic({ apiKey: \"your-api-key\" });\n```\n\n---\n\n## Create an Environment\n\n```typescript\nconst environment = await client.beta.environments.create(\n  {\n    name: \"my-dev-env\",\n    config: {\n      type: \"cloud\",\n      networking: { type: \"unrestricted\" },\n    },\n  },\n);\nconsole.log(environment.id); // env_...\n```\n\n---\n\n## Create an Agent (required first step)\n\n> ⚠️ **There is no inline agent config.** `model`/`system`/`tools` live on the agent object, not the session. Always start with `agents.create()` — the session only takes `agent: { type: \"agent\", id: agent.id }`.\n\n### Minimal\n\n```typescript\n// 1. Create the agent (reusable, versioned)\nconst agent = await client.beta.agents.create(\n  {\n    name: \"Coding Assistant\",\n    model: \"claude-opus-4-8\",\n    tools: [{ type: \"agent_toolset_20260401\", default_config: { enabled: true } }],\n  },\n);\n\n// 2. Start a session\nconst session = await client.beta.sessions.create(\n  {\n    agent: { type: \"agent\", id: agent.id, version: agent.version },\n    environment_id: environment.id,\n  },\n);\nconsole.log(session.id, session.status);\n```\n\n### With system prompt and custom tools\n\n```typescript\nconst agent = await client.beta.agents.create(\n  {\n    name: \"Code Reviewer\",\n    model: \"claude-opus-4-8\",\n    system: \"You are a senior code reviewer.\",\n    tools: [\n      { type: \"agent_toolset_20260401\", default_config: { enabled: true } },\n      {\n        type: \"custom\",\n        name: \"run_tests\",\n        description: \"Run the test suite\",\n        input_schema: {\n          type: \"object\",\n          properties: {\n            test_path: { type: \"string\", description: \"Path to test file\" },\n          },\n          required: [\"test_path\"],\n        },\n      },\n    ],\n  },\n);\n\nconst session = await client.beta.sessions.create(\n  {\n    agent: { type: \"agent\", id: agent.id, version: agent.version },\n    environment_id: environment.id,\n    title: \"Code review session\",\n    resources: [\n      {\n        type: \"github_repository\",\n        url: \"https://github.com/owner/repo\",\n        mount_path: \"/workspace/repo\",\n        authorization_token: process.env.GITHUB_TOKEN,\n        branch: \"main\",\n      },\n    ],\n  },\n);\n```\n\n---\n\n## Send a User Message\n\n```typescript\nawait client.beta.sessions.events.send(\n  session.id,\n  {\n    events: [\n      {\n        type: \"user.message\",\n        content: [{ type: \"text\", text: \"Review the auth module\" }],\n      },\n    ],\n  },\n);\n```\n\n> 💡 **Stream-first:** Open the stream *before* (or concurrently with) sending the message. The stream only delivers events that occur after it opens — stream-after-send means early events arrive buffered in one batch. See [Steering Patterns](../../shared/managed-agents-events.md#steering-patterns).\n\n---\n\n## Stream Events (SSE)\n\n```typescript\n// Stream-first: open stream and send concurrently\nconst [events] = await Promise.all([\n  collectStream(session.id),\n  client.beta.sessions.events.send(\n    session.id,\n    { events: [{ type: \"user.message\", content: [{ type: \"text\", text: \"...\" }] }] },\n  ),\n]);\n\n// Standalone stream iteration:\nconst stream = await client.beta.sessions.events.stream(\n  session.id,\n);\n\nfor await (const event of stream) {\n  switch (event.type) {\n    case \"agent.message\":\n      for (const block of event.content) {\n        if (block.type === \"text\") {\n          process.stdout.write(block.text);\n        }\n      }\n      break;\n    case \"agent.custom_tool_use\":\n      // Custom tool invocation — session is now idle\n      console.log(`\\nCustom tool call: ${event.name}`);\n      console.log(`Input: ${JSON.stringify(event.input)}`);\n      break;\n    case \"session.status_idle\":\n      console.log(\"\\n--- Agent idle ---\");\n      break;\n    case \"session.status_terminated\":\n      console.log(\"\\n--- Session terminated ---\");\n      break;\n  }\n}\n```\n\n---\n\n## Provide Custom Tool Result\n\n```typescript\nawait client.beta.sessions.events.send(\n  session.id,\n  {\n    events: [\n      {\n        type: \"user.custom_tool_result\",\n        custom_tool_use_id: \"sevt_abc123\",\n        content: [{ type: \"text\", text: \"All 42 tests passed.\" }],\n      },\n    ],\n  },\n);\n```\n\n---\n\n## Poll Events\n\n```typescript\nconst events = await client.beta.sessions.events.list(\n  session.id,\n);\nfor (const event of events.data) {\n  console.log(`${event.type}: ${event.id}`);\n}\n```\n\n---\n\n## Full Streaming Loop with Custom Tools\n\n```typescript\nfunction runCustomTool(toolName: string, toolInput: unknown): string {\n  if (toolName === \"run_tests\") {\n    // Your tool implementation here\n    return \"All tests passed.\";\n  }\n  return `Unknown tool: ${toolName}`;\n}\n\nasync function runSession(client: Anthropic, sessionId: string) {\n  while (true) {\n    const stream = await client.beta.sessions.events.stream(\n      sessionId,\n    );\n\n    const toolCalls: Anthropic.Beta.Sessions.BetaManagedAgentsAgentCustomToolUseEvent[] = [];\n\n    for await (const event of stream) {\n      if (event.type === \"agent.message\") {\n        for (const block of event.content) {\n          if (block.type === \"text\") {\n            process.stdout.write(block.text);\n          }\n        }\n      } else if (event.type === \"agent.custom_tool_use\") {\n        toolCalls.push(event);\n      } else if (event.type === \"session.status_idle\") {\n        break;\n      } else if (event.type === \"session.status_terminated\") {\n        return;\n      }\n    }\n\n    if (toolCalls.length === 0) break;\n\n    // Process custom tool calls\n    const results = toolCalls.map((call) => ({\n      type: \"user.custom_tool_result\" as const,\n      custom_tool_use_id: call.id,\n      content: [{ type: \"text\" as const, text: runCustomTool(call.name, call.input) }],\n    }));\n\n    await client.beta.sessions.events.send(\n      sessionId,\n      { events: results },\n    );\n  }\n}\n```\n\n---\n\n## Upload a File\n\n```typescript\nimport fs from \"fs\";\n\nconst file = await client.beta.files.upload({\n  file: fs.createReadStream(\"data.csv\"),\n  purpose: \"agent\",\n});\n\n// Use in a session\nconst session = await client.beta.sessions.create(\n  {\n    agent: { type: \"agent\", id: agent.id, version: agent.version },\n    environment_id: environment.id,\n    resources: [{ type: \"file\", file_id: file.id, mount_path: \"/workspace/data.csv\" }],\n  },\n);\n```\n\n---\n\n## List and Download Session Files\n\nList files the agent wrote to `/mnt/session/outputs/` during a session, then download them.\n\n```typescript\nimport fs from \"fs\";\n\n// List files associated with a session\nconst files = await client.beta.files.list({\n  scope_id: session.id,\n  betas: [\"managed-agents-2026-04-01\"],\n});\nfor (const f of files.data) {\n  console.log(f.filename, f.size_bytes);\n\n  // Download and save to disk\n  const resp = await client.beta.files.download(f.id);\n  const buffer = Buffer.from(await resp.arrayBuffer());\n  fs.writeFileSync(f.filename, buffer);\n}\n```\n\n> 💡 There's a brief indexing lag (~1–3s) between `session.status_idle` and output files appearing in `files.list`. Retry once or twice if the list is empty.\n\n---\n\n## Session Management\n\n```typescript\n// Get session details\nconst session = await client.beta.sessions.retrieve(\"sesn_011CZxAbc123Def456\");\nconsole.log(session.status, session.usage);\n\n// List sessions\nconst sessions = await client.beta.sessions.list();\n\n// Delete a session\nawait client.beta.sessions.delete(\"sesn_011CZxAbc123Def456\");\n\n// Archive a session\nawait client.beta.sessions.archive(\"sesn_011CZxAbc123Def456\");\n```\n\n---\n\n## MCP Server Integration\n\n```typescript\n// Agent declares MCP server (no auth here — auth goes in a vault)\nconst agent = await client.beta.agents.create({\n  name: \"MCP Agent\",\n  model: \"claude-opus-4-8\",\n  mcp_servers: [\n    { type: \"url\", name: \"my-tools\", url: \"https://my-mcp-server.example.com/sse\" },\n  ],\n  tools: [\n    { type: \"agent_toolset_20260401\", default_config: { enabled: true } },\n    { type: \"mcp_toolset\", mcp_server_name: \"my-tools\" },\n  ],\n});\n\n// Session attaches vault(s) containing credentials for those MCP server URLs\nconst session = await client.beta.sessions.create({\n  agent: agent.id,\n  environment_id: environment.id,\n  vault_ids: [vault.id],\n});\n```\n\nSee `shared/managed-agents-tools.md` §Vaults for creating vaults and adding credentials.\n</doc>\n\n## When to Use WebFetch\n\nUse WebFetch to get the latest documentation when:\n\n- User asks for \"latest\" or \"current\" information\n- Cached data seems incorrect\n- User asks about features not covered here\n\nLive documentation URLs are in `shared/live-sources.md`.\n\n## Common Pitfalls\n\n- Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.\n- **Opus 4.8 / 4.7 thinking:** Adaptive only. `thinking: {type: \"enabled\", budget_tokens: N}` returns 400 — `budget_tokens` is fully removed (along with `temperature`, `top_p`, `top_k`). Use `thinking: {type: \"adaptive\"}`. Opus 4.8 inherits this surface from 4.7 with no new breaking changes.\n- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: \"adaptive\"}` — do NOT use `budget_tokens` for new 4.6 code (deprecated on both Opus 4.6 and Sonnet 4.6; for gradual migration of existing code, see the transitional escape hatch in `shared/model-migration.md` — note this carve-out does not apply to Opus 4.7 or 4.8). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.\n- **4.6/4.7/4.8 family prefill removed:** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead.\n- **Confirm migration scope before editing:** When a user asks to migrate code to a newer Claude model without naming a specific file, directory, or file list, **ask which scope to apply first** — the entire working directory, a specific subdirectory, or a specific set of files. Do not start editing until the user confirms. Imperative phrasings like \"migrate my codebase\", \"move my project to X\", \"upgrade to Sonnet 4.6\", or bare \"migrate to Opus 4.8\" are **still ambiguous** — they tell you what to do but not where, so ask. Proceed without asking only when the prompt names an exact file, a specific directory, or an explicit file list (\"migrate `app.py`\", \"migrate everything under `services/`\", \"update `a.py` and `b.py`\"). See `shared/model-migration.md` Step 0.\n- **`max_tokens` defaults:** Don't lowball `max_tokens` — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to `~16000` (keeps responses under SDK HTTP timeouts). For streaming requests, default to `~64000` (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (`~256`), cost caps, deliberately short outputs, or **`max_tokens: 0`** for cache pre-warming (see `shared/prompt-caching.md` → Pre-warming).\n- **128K output tokens:** Opus 4.6, Opus 4.7, and Opus 4.8 support up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.\n- **Tool call JSON parsing (4.6/4.7/4.8 family):** Opus 4.6, Opus 4.7, Opus 4.8, and Sonnet 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.\n- **Structured outputs (all models):** Use `output_config: {format: {...}}` instead of the deprecated `output_format` parameter on `messages.create()`. This is a general API change, not 4.6-specific.\n- **Don't reimplement SDK functionality:** The SDK provides high-level helpers — use them instead of building from scratch. Specifically: use `stream.finalMessage()` instead of wrapping `.on()` events in `new Promise()`; use typed exception classes (`Anthropic.RateLimitError`, etc.) instead of string-matching error messages; use SDK types (`Anthropic.MessageParam`, `Anthropic.Tool`, `Anthropic.Message`, etc.) instead of redefining equivalent interfaces.\n- **Don't define custom types for SDK data structures:** The SDK exports types for all API objects. Use `Anthropic.MessageParam` for messages, `Anthropic.Tool` for tool definitions, `Anthropic.ToolUseBlock` / `Anthropic.ToolResultBlockParam` for tool results, `Anthropic.Message` for responses. Defining your own `interface ChatMessage { role: string; content: unknown }` duplicates what the SDK already provides and loses type safety.\n- **Report and document output:** For tasks that produce reports, documents, or visualizations, the code execution sandbox has `python-docx`, `python-pptx`, `matplotlib`, `pillow`, and `pypdf` pre-installed. Claude can generate formatted files (DOCX, PDF, charts) and return them via the Files API — consider this for \"report\" or \"document\" type requests instead of plain stdout text."}]},"isMeta":true,"uuid":"ecdd2775-4c0a-4d32-b17b-b042bc26f38e","timestamp":"2026-06-07T15:59:10.330Z","sourceToolUseID":"toolu_01FMwc5QfjwywcmUuuxduKqr","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"ecdd2775-4c0a-4d32-b17b-b042bc26f38e","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01TRbmwtkGLsZdmau4TKYTXB","type":"message","role":"assistant","content":[{"type":"text","text":"I'll verify the current AI SDK v6 / AI Gateway API surface against live Vercel docs before writing the design. Let me search the key areas in parallel."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2027,"cache_creation_input_tokens":200455,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":200455,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":{"cache_miss_reason":{"type":"tools_changed","cache_missed_input_tokens":19149}}},"requestId":"req_011CbpBCCxE5dsBKajhgvMLh","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"aa274de1-8c8d-4b3f-84de-34be79d51c66","timestamp":"2026-06-07T15:59:15.362Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"aa274de1-8c8d-4b3f-84de-34be79d51c66","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01TRbmwtkGLsZdmau4TKYTXB","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01CeczXNDuiQWcnzQnWZVEwG","name":"mcp__claude_ai_Vercel__search_vercel_documentation","input":{"topic":"AI Gateway model string anthropic provider AI SDK v6 authentication OIDC AI_GATEWAY_API_KEY","tokens":3000},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2027,"cache_creation_input_tokens":200455,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":200455,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":{"cache_miss_reason":{"type":"tools_changed","cache_missed_input_tokens":19149}}},"requestId":"req_011CbpBCCxE5dsBKajhgvMLh","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"7f7b6d26-0bdb-4034-a39a-53a6df644b05","timestamp":"2026-06-07T15:59:16.776Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"7f7b6d26-0bdb-4034-a39a-53a6df644b05","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01TRbmwtkGLsZdmau4TKYTXB","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_018wfrmQAwPwaP9FSwTirPHM","name":"mcp__claude_ai_Vercel__search_vercel_documentation","input":{"topic":"AI SDK streamText convertToModelMessages toUIMessageStreamResponse tools useChat DefaultChatTransport","tokens":3000},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2027,"cache_creation_input_tokens":200455,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":200455,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":{"cache_miss_reason":{"type":"tools_changed","cache_missed_input_tokens":19149}}},"requestId":"req_011CbpBCCxE5dsBKajhgvMLh","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"286afd33-da12-4980-b3b9-c588110d976c","timestamp":"2026-06-07T15:59:17.737Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"7f7b6d26-0bdb-4034-a39a-53a6df644b05","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01CeczXNDuiQWcnzQnWZVEwG","type":"tool_result","content":[{"type":"text","text":"### Make an Anthropic API request with Python\n\nSource: https://vercel.com/docs/ai-gateway/getting-started/text\n\nUse the Anthropic SDK in Python to send a message to an AI model via the AI Gateway. Ensure `AI_GATEWAY_API_KEY` is set in your environment.\n\n```python\nimport os\nimport anthropic\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\nclient = anthropic.Anthropic(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh',\n)\n\nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    messages=[\n        {\n            'role': 'user',\n            'content': 'Invent a new holiday and describe its traditions.',\n        },\n    ],\n)\n\nprint(message.content[0].text)\n```\n\n--------------------------------\n\n### Integrate Anthropic SDK with AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-compat\n\nThese examples demonstrate how to configure the Anthropic SDK to use the AI Gateway by setting the `baseURL` and providing an API key from environment variables.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n \nconst anthropic = new Anthropic({\n  apiKey: process.env.AI_GATEWAY_API_KEY,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n \nconst message = await anthropic.messages.create({\n  model: 'anthropic/claude-opus-4.7',\n  max_tokens: 1024,\n  messages: [{ role: 'user', content: 'Hello, world!' }],\n});\n```\n\n```python\nimport os\nimport anthropic\n \nclient = anthropic.Anthropic(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh'\n)\n \nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    messages=[\n        {'role': 'user', 'content': 'Hello, world!'}\n    ]\n)\n```\n\n--------------------------------\n\n### Make an Anthropic API request with TypeScript\n\nSource: https://vercel.com/docs/ai-gateway/getting-started/text\n\nUse the Anthropic SDK in TypeScript to send a message to an AI model via the AI Gateway. Ensure `AI_GATEWAY_API_KEY` is set in your environment.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\nimport 'dotenv/config';\n\nconst client = new Anthropic({\n  apiKey: process.env.AI_GATEWAY_API_KEY,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n\nasync function main() {\n  const message = await client.messages.create({\n    model: 'anthropic/claude-opus-4.7',\n    max_tokens: 1024,\n    messages: [\n      {\n        role: 'user',\n        content: 'Invent a new holiday and describe its traditions.',\n      },\n    ],\n  });\n\n  console.log(message.content[0].text);\n}\n\nmain().catch(console.error);\n```\n\n--------------------------------\n\n### Integrate Anthropic SDK with AI Gateway (Python)\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-messages-api\n\nInitialize the Anthropic Python SDK client, configuring its `base_url` to the Vercel AI Gateway and using an AI Gateway API key for authentication.\n\n```python\nimport os\nimport anthropic\n\nclient = anthropic.Anthropic(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh'\n)\n\nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    messages=[\n        {'role': 'user', 'content': 'Hello, world!'}\n    ]\n)\n```\n\n--------------------------------\n\n### Integrate Anthropic SDK with AI Gateway (TypeScript)\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-messages-api\n\nInitialize the Anthropic TypeScript SDK client, pointing its `baseURL` to the Vercel AI Gateway and using an AI Gateway API key for authentication.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n\nconst anthropic = new Anthropic({\n  apiKey: process.env.AI_GATEWAY_API_KEY,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n\nconst message = await anthropic.messages.create({\n  model: 'anthropic/claude-opus-4.7',\n  max_tokens: 1024,\n  messages: [{ role: 'user', content: 'Hello, world!' }],\n});\n```\n\n--------------------------------\n\n### Quick Start: Anthropic Messages API with AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/python\n\nSends a user message to the AI Gateway via the Anthropic SDK, specifying a model and max tokens, then prints the first text block of the model's response.\n\n```python\nimport os\nimport anthropic\n\nclient = anthropic.Anthropic(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh'\n)\n\nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    messages=[\n        {'role': 'user', 'content': 'Explain quantum computing in one paragraph.'}\n    ]\n)\n\nprint(message.content[0].text)\n```\n\n--------------------------------\n\n### Set Global Default AI SDK Gateway Provider with Attribution Headers (TypeScript)\n\nSource: https://vercel.com/docs/ai-gateway/ecosystem/app-attribution\n\nAssign a custom AI SDK Gateway provider, configured with attribution headers, as the global default to enable plain string model IDs to use this provider.\n\n```typescript\nimport { streamText } from 'ai';\nimport { createGateway } from '@ai-sdk/gateway';\n\nconst gateway = createGateway({\n  headers: {\n    'http-referer': 'https://myapp.vercel.app',\n    'x-title': 'MyApp',\n  },\n});\n\n// Set your provider as the default to allow plain-string model id creation with this instance\nglobalThis.AI_SDK_DEFAULT_PROVIDER = gateway;\n\n// Now you can use plain string model IDs and they'll use your custom provider\nconst result = streamText({\n  model: 'anthropic/claude-opus-4.7', // Uses the gateway provider with headers\n  prompt: 'Hello, world!',\n});\n\nfor await (const part of result.textStream) {\n  process.stdout.write(part);\n}\n```\n\n--------------------------------\n\n### Configure BYOK for Anthropic in Python\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions/advanced\n\nUse this snippet to provide request-scoped Anthropic API keys via the AI Gateway, overriding any cached BYOK credentials. Ensure `AI_GATEWAY_API_KEY` or `VERCEL_OIDC_TOKEN` and `ANTHROPIC_API_KEY` environment variables are set.\n\n```python\nimport os\nfrom openai import OpenAI\n\napi_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')\n\nclient = OpenAI(\n    api_key=api_key,\n    base_url='https://ai-gateway.vercel.sh/v1'\n)\n\ncompletion = client.chat.completions.create(\n    model='anthropic/claude-opus-4.7',\n    messages=[\n        {\n            'role': 'user',\n            'content': 'Hello, world!'\n        }\n    ],\n    extra_body={\n        'providerOptions': {\n            'gateway': {\n                'byok': {\n                    'anthropic': [{'apiKey': os.getenv('ANTHROPIC_API_KEY')}]\n                }\n            }\n        }\n    }\n)\n\nprint(completion.choices[0].message.content)\n```\n\n--------------------------------\n\n### Specify Model and Provider with AI SDK String\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers\n\nUse a plain string to specify a model and provider directly in an AI SDK function call, allowing per-request model switching without affecting other parts of the application.\n\n```typescript\nimport { generateText } from 'ai';\nimport { NextRequest } from 'next/server';\n\nexport async function GET() {\n  const result = await generateText({\n    model: 'xai/grok-4.3',\n    prompt: 'Tell me the history of the San Francisco Mission-style burrito.',\n  });\n  return Response.json(result);\n}\n```\n\n--------------------------------\n\n### Send AI Gateway Request with Custom Reporting Headers (Python SDK)\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/custom-reporting\n\nUse the Anthropic SDK with AI Gateway to include custom reporting tags and user IDs in the request headers for analytics.\n\n```python\nimport os\nfrom anthropic import Anthropic\n\nclient = Anthropic(\n    api_key=os.environ['AI_GATEWAY_API_KEY'],\n    base_url='https://ai-gateway.vercel.sh',\n    default_headers={\n        'ai-reporting-tags': 'team:billing,feature:chat,env:prod',\n        'ai-reporting-user': 'user-12345',\n    },\n)\n\nmessage = client.messages.create(\n    model='anthropic/claude-sonnet-4.6',\n    max_tokens=1024,\n    messages=[{'role': 'user', 'content': 'Tell me about San Francisco.'}],\n)\n```\n\n--------------------------------\n\n### Specify Model with Default AI Gateway Provider Instance\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers\n\nUse the default `gateway` provider instance from `@ai-sdk/gateway` to specify a model in an AI SDK function call, enabling dynamic model changes.\n\n```typescript\nimport { generateText } from 'ai';\nimport { gateway } from '@ai-sdk/gateway';\nimport { NextRequest } from 'next/server';\n\nexport async function GET() {\n  const result = await generateText({\n    model: gateway('anthropic/claude-opus-4.7'),\n    prompt: 'Tell me the history of the San Francisco Mission-style burrito.',\n  });\n  return Response.json(result);\n}\n```\n\n--------------------------------\n\n### Create Anthropic Messages with Anthropic SDK\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis\n\nConfigure the Anthropic SDK to use the AI Gateway base URL and send a message completion request.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n\nconst client = new Anthropic({\n  apiKey: process.env.AI_GATEWAY_API_KEY,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n\nconst message = await client.messages.create({\n  model: 'anthropic/claude-opus-4.7',\n  max_tokens: 1024,\n  messages: [{ role: 'user', content: 'Hello!' }],\n});\n```\n\n--------------------------------\n\n### Query Model Uptime via AI Gateway API\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/uptime\n\nUse this curl command to retrieve uptime metrics for a specific model and its providers. Replace the model ID and ensure AI_GATEWAY_API_KEY is set for authentication.\n\n```bash\ncurl -i https://ai-gateway.vercel.sh/v1/models/anthropic/claude-opus-4.7/endpoints \\\n  -H \"Authorization: Bearer $AI_GATEWAY_API_KEY\"\n```\n\n```json\n{\n  \"data\": {\n    \"id\": \"anthropic/claude-opus-4.7\",\n    \"endpoints\": [\n      {\n        \"provider_name\": \"anthropic\",\n        \"uptime_last_15m\": 100,\n        \"uptime_last_1h\": 99.8,\n        \"uptime_last_1d\": 99.6\n      },\n      {\n        \"provider_name\": \"bedrock\",\n        \"uptime_last_15m\": 100,\n        \"uptime_last_1h\": 100,\n        \"uptime_last_1d\": 99.9\n      },\n      {\n        \"provider_name\": \"vertex\",\n        \"uptime_last_15m\": 100,\n        \"uptime_last_1h\": 100,\n        \"uptime_last_1d\": 99.1\n      }\n    ]\n  }\n}\n```\n\n--------------------------------\n\n### Use OIDC Authentication with AI Gateway (TypeScript)\n\nSource: https://vercel.com/docs/ai-gateway/authentication-and-byok/oidc\n\nThis example demonstrates using the AI Gateway with OIDC authentication, eliminating the need for explicit API key management. The `generateText` function interacts with a specified model and prompt.\n\n```typescript\nimport { generateText } from 'ai';\n\nexport async function GET() {\n  const result = await generateText({\n    model: 'xai/grok-4.3',\n    prompt: 'Why is the sky blue?',\n  });\n  return Response.json(result);\n}\n```"}]}]},"uuid":"bfdf764b-a991-413a-93e4-b9de5f4feae9","timestamp":"2026-06-07T15:59:17.913Z","sourceToolAssistantUUID":"7f7b6d26-0bdb-4034-a39a-53a6df644b05","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"bfdf764b-a991-413a-93e4-b9de5f4feae9","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01TRbmwtkGLsZdmau4TKYTXB","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01GZJ4XcoLyoQfDXxzNBZoVf","name":"mcp__claude_ai_Vercel__search_vercel_documentation","input":{"topic":"Anthropic prompt caching cacheControl providerOptions ephemeral AI SDK","tokens":2500},"caller":{"type":"direct"}}],"stop_reason":"tool_use","stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2027,"cache_creation_input_tokens":200455,"cache_read_input_tokens":0,"output_tokens":439,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":200455},"inference_geo":"not_available","iterations":[{"input_tokens":2027,"output_tokens":439,"cache_read_input_tokens":0,"cache_creation_input_tokens":200455,"cache_creation":{"ephemeral_5m_input_tokens":200455,"ephemeral_1h_input_tokens":0},"type":"message"}],"speed":"standard"},"diagnostics":{"cache_miss_reason":{"type":"tools_changed","cache_missed_input_tokens":19149}}},"requestId":"req_011CbpBCCxE5dsBKajhgvMLh","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","type":"assistant","uuid":"8779c983-e7b4-4905-a4af-9b4c89106a2d","timestamp":"2026-06-07T15:59:18.217Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"286afd33-da12-4980-b3b9-c588110d976c","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_018wfrmQAwPwaP9FSwTirPHM","type":"tool_result","content":[{"type":"text","text":"### Set App Attribution Headers in AI SDK streamText (TypeScript)\n\nSource: https://vercel.com/docs/ai-gateway/ecosystem/app-attribution\n\nSet `http-referer` and `x-title` headers directly within the `streamText` function call when using the AI SDK to attribute requests.\n\n```typescript\nimport { streamText } from 'ai';\n\nconst result = streamText({\n  headers: {\n    'http-referer': 'https://myapp.vercel.app',\n    'x-title': 'MyApp',\n  },\n  model: 'anthropic/claude-opus-4.7',\n  prompt: 'Hello, world!',\n});\n\nfor await (const part of result.textStream) {\n  process.stdout.write(part);\n}\n```\n\n--------------------------------\n\n### Stream Text with AI SDK and ZDR in TypeScript\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/zdr\n\nUse \"streamText\" from the AI SDK with \"zeroDataRetention: true\" in \"providerOptions.gateway\" to process sensitive data without retention.\n\n```typescript\nimport type { GatewayProviderOptions } from '@ai-sdk/gateway';\nimport { streamText } from 'ai';\n\nexport async function POST(request: Request) {\n  const result = streamText({\n    model: 'moonshotai/kimi-k2.5',\n    prompt: 'Analyze this sensitive business data and provide insights.',\n    providerOptions: {\n      gateway: {\n        zeroDataRetention: true,\n      } satisfies GatewayProviderOptions,\n    },\n  });\n\n  return result.toDataStreamResponse();\n}\n```\n\n--------------------------------\n\n### Stream text responses with AI SDK\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/ai-sdk\n\nStream responses token-by-token using 'streamText' for real-time output, iterating over the 'textStream'.\n\n```typescript\nimport { streamText } from 'ai';\n\nconst result = streamText({\n  model: 'openai/gpt-5.5',\n  prompt: 'Write a short story about a robot discovering music.',\n});\n\nfor await (const textPart of result.textStream) {\n  process.stdout.write(textPart);\n}\n```\n\n--------------------------------\n\n### AI SDK streamText with Disallow Prompt Training\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/disallow-prompt-training\n\nUse `streamText` with `disallowPromptTraining: true` in `providerOptions.gateway` to prevent model training on prompts.\n\n```typescript\nimport type { GatewayProviderOptions } from '@ai-sdk/gateway';\nimport { streamText } from 'ai';\n\nexport async function POST(request: Request) {\n  const result = streamText({\n    model: 'zai/glm-4.7',\n    prompt: 'Analyze this proprietary business strategy.',\n    providerOptions: {\n      gateway: {\n        disallowPromptTraining: true,\n      } satisfies GatewayProviderOptions,\n    },\n  });\n\n  return result.toDataStreamResponse();\n}\n```\n\n--------------------------------\n\n### Create a Streaming Text Function with AI SDK\n\nSource: https://vercel.com/docs/functions/streaming-functions\n\nDefine a GET endpoint that uses the Vercel AI SDK's 'streamText' function to generate and stream a text response from an OpenAI model. Ensure your OpenAI API key is configured in '.env.local'.\n\n```ts\nimport { streamText } from 'ai';\n\n// This method must be named GET\nexport async function GET() {\n  // Make a request to OpenAI's API based on\n  // a placeholder prompt\n  const response = streamText({\n    model: 'openai/gpt-4o-mini',\n    messages: [{ role: 'user', content: 'What is the capital of Australia?' }],\n  });\n  // Respond with the stream\n  return response.toTextStreamResponse({\n    headers: {\n      'Content-Type': 'text/event-stream',\n    },\n  });\n}\n```\n\n```js\nimport { streamText } from 'ai';\n\n// This method must be named GET\nexport async function GET() {\n  // Make a request to OpenAI's API based on\n  // a placeholder prompt\n  const response = streamText({\n    model: 'openai/gpt-4o-mini',\n    messages: [{ role: 'user', content: 'What is the capital of Australia?' }],\n  });\n  // Respond with the stream\n  return response.toTextStreamResponse({\n    headers: {\n      'Content-Type': 'text/event-stream',\n    },\n  });\n}\n```\n\n```ts\n// Streaming Functions must be defined in an\n// app directory, even if the rest of your app\n// is in the pages directory.\nimport { streamText } from 'ai';\n\n// This method must be named GET\nexport async function GET() {\n  // Make a request to OpenAI's API based on\n  // a placeholder prompt\n  const response = streamText({\n    model: 'openai/gpt-4o-mini',\n    messages: [{ role: 'user', content: 'What is the capital of Australia?' }],\n  });\n  // Respond with the stream\n  return response.toTextStreamResponse({\n    headers: {\n      'Content-Type': 'text/event-stream',\n    },\n  });\n}\n```\n\n```js\n// Streaming Functions must be defined in an\n// app directory, even if the rest of your app\n// is in the pages directory.\nimport { streamText } from 'ai';\n\n// This method must be named GET\nexport async function GET() {\n  // Make a request to OpenAI's API based on\n  // a placeholder prompt\n  const response = streamText({\n    model: 'openai/gpt-4o-mini',\n    messages: [{ role: 'user', content: 'What is the capital of Australia?' }],\n  });\n  // Respond with the stream\n  return response.toTextStreamResponse({\n    headers: {\n      'Content-Type': 'text/event-stream',\n    },\n  });\n}\n```\n\n```ts\nimport { streamText } from 'ai';\n\n// This method must be named GET\nexport async function GET() {\n  // Make a request to OpenAI's API based on\n  // a placeholder prompt\n  const response = streamText({\n    model: 'openai/gpt-4o-mini',\n    messages: [{ role: 'user', content: 'What is the capital of Australia?' }],\n  });\n  // Respond with the stream\n  return response.toTextStreamResponse({\n    headers: {\n      'Content-Type': 'text/event-stream',\n    },\n  });\n}\n```\n\n```js\nimport { streamText } from 'ai';\n\n// This method must be named GET\nexport async function GET() {\n  // Make a request to OpenAI's API based on\n  // a placeholder prompt\n  const response = streamText({\n    model: 'openai/gpt-4o-mini',\n    messages: [{ role: 'user', content: 'What is the capital of Australia?' }],\n  });\n  // Respond with the stream\n  return response.toTextStreamResponse({\n    headers: {\n      'Content-Type': 'text/event-stream',\n    },\n  });\n}\n```\n\n--------------------------------\n\n### AI SDK Stream Text with Automatic Caching\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/automatic-caching\n\nUse `caching: 'auto'` within `providerOptions.gateway` in the AI SDK's `streamText` function to enable automatic caching for providers like Anthropic.\n\n```typescript\nimport { streamText } from 'ai';\n\nexport async function POST(request: Request) {\n  const { prompt } = await request.json();\n\n  const result = streamText({\n    model: 'anthropic/claude-sonnet-4.6',\n    system: 'You are a helpful assistant with access to a large knowledge base...',\n    prompt,\n    providerOptions: {\n      gateway: {\n        caching: 'auto',\n      },\n    },\n  });\n\n  return result.toUIMessageStreamResponse();\n}\n```\n\n--------------------------------\n\n### Implement Tool Calling with AI SDK\n\nSource: https://vercel.com/docs/ai-sdk\n\nExample of integrating external tools with the AI SDK to perform discrete tasks, such as fetching weather information based on user prompts.\n\n```ts\nimport { generateText, tool } from 'ai';\n\nconst { text } = await generateText({\n  model: 'openai/gpt-5.2',\n  prompt: 'What is the weather like today in San Francisco?',\n  tools: {\n    getWeather: tool({\n      description: 'Get the weather in a location',\n      inputSchema: z.object({\n        location: z.string().describe('The location to get the weather for'),\n      }),\n      execute: async ({ location }) => ({\n        location,\n        temperature: 72 + Math.floor(Math.random() * 21) - 10,\n      }),\n    }),\n  },\n});\n```\n\n--------------------------------\n\n### Stream Text with AI SDK\n\nSource: https://vercel.com/docs/ai-gateway/getting-started/text\n\nThis script demonstrates how to stream text responses from AI Gateway using the AI SDK, including logging token usage and finish reason.\n\n```typescript\nimport { streamText } from 'ai';\nimport 'dotenv/config';\n\nasync function main() {\n  const result = streamText({\n    model: 'openai/gpt-5.5',\n    prompt: 'Invent a new holiday and describe its traditions.',\n  });\n\n  for await (const textPart of result.textStream) {\n    process.stdout.write(textPart);\n  }\n\n  console.log();\n  console.log('Token usage:', await result.usage);\n  console.log('Finish reason:', await result.finishReason);\n}\n\nmain().catch(console.error);\n```\n\n--------------------------------\n\n### Stream Text with Google AI Studio Search Tool\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/web-search\n\nIntegrate Google Search with `streamText` using Google AI Studio. Import `google` from `@ai-sdk/google` and pass `google.tools.googleSearch({})` to the `tools` parameter.\n\n```typescript\nimport { streamText } from 'ai';\nimport { google } from '@ai-sdk/google';\n\nexport async function POST(request: Request) {\n  const { prompt } = await request.json();\n\n  const result = streamText({\n    model: 'google/gemini-3.1-pro-preview',\n    prompt,\n    tools: {\n      google_search: google.tools.googleSearch({}),\n    },\n  });\n\n  return result.toDataStreamResponse();\n}\n```\n\n--------------------------------\n\n### Stream Text with Google Vertex AI Search Tool\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/web-search\n\nIntegrate Google Search with `streamText` using Google Vertex AI. Import `vertex` from `@ai-sdk/google-vertex` and pass `vertex.tools.googleSearch({})` to the `tools` parameter.\n\n```typescript\nimport { streamText } from 'ai';\nimport { vertex } from '@ai-sdk/google-vertex';\n\nexport async function POST(request: Request) {\n  const { prompt } = await request.json();\n\n  const result = streamText({\n    model: 'google/gemini-3.1-pro-preview',\n    prompt,\n    tools: {\n      google_search: vertex.tools.googleSearch({}),\n    },\n  });\n\n  return result.toDataStreamResponse();\n}\n```\n\n--------------------------------\n\n### Enable Reasoning with AI SDK and OpenAI\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/reasoning\n\nUse `streamText` with `providerOptions` to configure OpenAI reasoning effort and summary. Iterate over the stream to process reasoning and text deltas.\n\n```typescript\nimport { streamText } from 'ai';\n\nconst result = streamText({\n  model: 'openai/gpt-5.5',\n  prompt: 'Explain the Monty Hall problem step by step.',\n  providerOptions: {\n    openai: {\n      reasoningEffort: 'high',\n      reasoningSummary: 'detailed',\n    },\n  },\n});\n\nfor await (const part of result.fullStream) {\n  if (part.type === 'reasoning-delta') {\n    process.stdout.write(part.text);\n  } else if (part.type === 'text-delta') {\n    process.stdout.write(part.text);\n  }\n}\n```\n\n--------------------------------\n\n### Stream OpenAI Reasoning Summaries with AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/reasoning/openai\n\nUse `streamText` to receive the model's thought process as it streams. Set `reasoningEffort` and `reasoningSummary` in `providerOptions.openai` to enable this feature.\n\n```TypeScript\nimport { streamText } from 'ai';\n \nconst result = streamText({\n  model: 'openai/gpt-5.5',\n  prompt: 'Tell me about the Mission burrito debate in San Francisco.',\n  providerOptions: {\n    openai: {\n      reasoningEffort: 'high',\n      reasoningSummary: 'detailed', // 'auto' for condensed or 'detailed' for comprehensive\n    },\n  },\n});\n \nfor await (const part of result.fullStream) {\n  if (part.type === 'reasoning-delta') {\n    process.stdout.write(part.text);\n  } else if (part.type === 'text-delta') {\n    process.stdout.write(part.text);\n  }\n}\n```\n\n--------------------------------\n\n### Use AI SDK with Global Default Provider\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers\n\nMake an AI SDK `generateText` call without explicitly specifying the model, relying on the globally configured default provider set in `instrumentation.ts`.\n\n```typescript\nimport { generateText } from 'ai';\nimport { NextRequest } from 'next/server';\n\nexport async function GET(request: NextRequest) {\n  const { searchParams } = new URL(request.url);\n  const prompt = searchParams.get('prompt');\n\n  if (!prompt) {\n    return Response.json({ error: 'Prompt is required' }, { status: 400 });\n  }\n\n  const result = await generateText({\n    model: 'openai/gpt-5.5',\n    prompt,\n  });\n\n  return Response.json(result);\n}\n```\n\n--------------------------------\n\n### Look Up Generation Info with streamText using AI SDK\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/custom-reporting\n\nRetrieve detailed information for a specific AI generation by its ID. This example shows how to capture the generation ID from the first content chunk of a `streamText` response and then query its details.\n\n```typescript\nimport { gateway, streamText } from 'ai';\n\nconst result = streamText({\n  model: 'anthropic/claude-opus-4.7',\n  prompt: 'Explain quantum entanglement briefly',\n});\n\nlet generationId: string | undefined;\n\nfor await (const part of result.fullStream) {\n  if (!generationId && part.providerMetadata?.gateway?.generationId) {\n    generationId = part.providerMetadata.gateway.generationId as string;\n  }\n}\n\nif (generationId) {\n  const generation = await gateway.getGenerationInfo({ id: generationId });\n  console.log(`Cost: $${generation.totalCost.toFixed(6)}`);\n  console.log(`Finish reason: ${generation.finishReason}`);\n}\n```"}]}]},"uuid":"1af060a7-7d0f-4813-bada-d76557773adf","timestamp":"2026-06-07T15:59:18.870Z","sourceToolAssistantUUID":"286afd33-da12-4980-b3b9-c588110d976c","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"8779c983-e7b4-4905-a4af-9b4c89106a2d","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01GZJ4XcoLyoQfDXxzNBZoVf","type":"tool_result","content":[{"type":"text","text":"### Implement Explicit Prompt Caching with AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-compat\n\nUse the `cache_control` parameter in the Anthropic SDK to enable explicit prompt caching for large system messages when routing requests through the AI Gateway. Observe cache creation and read tokens in the response usage.\n\n```TypeScript\nimport Anthropic from '@anthropic-ai/sdk';\n \nconst apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;\n \nconst anthropic = new Anthropic({\n  apiKey,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n \nconst message = await anthropic.messages.create({\n  model: 'anthropic/claude-opus-4.7',\n  max_tokens: 1024,\n  system: [\n    {\n      type: 'text',\n      text: 'You are a helpful assistant that analyzes documents.',\n    },\n    {\n      type: 'text',\n      text: longDocumentContent, // Large content to cache\n      cache_control: { type: 'ephemeral' },\n    },\n  ],\n  messages: [\n    {\n      role: 'user',\n      content: 'Summarize the key points from this document.',\n    },\n  ],\n});\n \nconsole.log(message.usage);\n// {\n//   input_tokens: 50,\n//   output_tokens: 200,\n//   cache_creation_input_tokens: 10000,  // Tokens written to cache\n//   cache_read_input_tokens: 0           // Tokens read from cache\n// }\n```\n\n```Python\nimport os\nimport anthropic\n \napi_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')\n \nclient = anthropic.Anthropic(\n    api_key=api_key,\n    base_url='https://ai-gateway.vercel.sh'\n)\n \nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    system=[\n        {\n            'type': 'text',\n            'text': 'You are a helpful assistant that analyzes documents.',\n        },\n        {\n            'type': 'text',\n            'text': long_document_content,  # Large content to cache\n            'cache_control': {'type': 'ephemeral'},\n        },\n    ],\n    messages=[\n        {\n            'role': 'user',\n            'content': 'Summarize the key points from this document.'\n        }\n    ],\n)\n \nprint(message.usage)\n# {\n#   'input_tokens': 50,\n#   'output_tokens': 200,\n#   'cache_creation_input_tokens': 10000,  # Tokens written to cache\n#   'cache_read_input_tokens': 0           # Tokens read from cache\n# }\n```\n\n--------------------------------\n\n### Enable automatic caching with Anthropic in TypeScript\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/automatic-caching\n\nConfigure the Anthropic SDK to use AI Gateway with automatic caching enabled. The `providerOptions.gateway.caching: 'auto'` setting automatically adds cache control markers to static content.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n\nconst apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;\n\nconst anthropic = new Anthropic({\n  apiKey,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n\nconst message = await anthropic.messages.create({\n  model: 'anthropic/claude-sonnet-4.6',\n  max_tokens: 2048,\n  system: 'You are a helpful assistant with access to a large knowledge base...',\n  messages: [\n    {\n      role: 'user',\n      content: 'What is the capital of France?',\n    },\n  ],\n  // @ts-expect-error - providerOptions is a gateway extension\n  providerOptions: {\n    gateway: {\n      caching: 'auto',\n    },\n  },\n});\n\nconsole.log(message.content[0].type === 'text' ? message.content[0].text : '');\n```\n\n--------------------------------\n\n### AI SDK Stream Text with Automatic Caching\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/automatic-caching\n\nUse `caching: 'auto'` within `providerOptions.gateway` in the AI SDK's `streamText` function to enable automatic caching for providers like Anthropic.\n\n```typescript\nimport { streamText } from 'ai';\n\nexport async function POST(request: Request) {\n  const { prompt } = await request.json();\n\n  const result = streamText({\n    model: 'anthropic/claude-sonnet-4.6',\n    system: 'You are a helpful assistant with access to a large knowledge base...',\n    prompt,\n    providerOptions: {\n      gateway: {\n        caching: 'auto',\n      },\n    },\n  });\n\n  return result.toUIMessageStreamResponse();\n}\n```\n\n--------------------------------\n\n### Implement Manual Prompt Caching in Python\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions/advanced\n\nApply `cache_control: { 'type': 'ephemeral' }` to a message in Python to manually enable prompt caching for Anthropic models. This helps reuse large prompts across multiple requests within a session.\n\n```python\nimport os\nfrom openai import OpenAI\n\napi_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')\n\nclient = OpenAI(\n    api_key=api_key,\n    base_url='https://ai-gateway.vercel.sh/v1'\n)\n\nresponse = client.chat.completions.create(\n    model='anthropic/claude-opus-4.7',\n    messages=[\n        {\n            'role': 'user',\n            'content': 'Analyze this document and summarize the key points.',\n            'cache_control': {\n                'type': 'ephemeral'\n            }\n        }\n    ]\n)\n\nprint(response.choices[0].message.content)\n```\n\n--------------------------------\n\n### Disallow Prompt Training with Anthropic Messages API (TypeScript)\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/disallow-prompt-training\n\nSet `disallowPromptTraining` to `true` in `providerOptions` for Anthropic messages requests via the AI Gateway. Note the `@ts-expect-error` for SDK type compatibility.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n\nconst apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;\n\nconst anthropic = new Anthropic({\n  apiKey,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n\nconst message = await anthropic.messages.create({\n  model: 'anthropic/claude-sonnet-4.6',\n  messages: [\n    {\n      role: 'user',\n      content: 'Analyze this proprietary business strategy.',\n    },\n  ],\n  // @ts-expect-error -- providerOptions is not in the Anthropic SDK types\n  providerOptions: {\n    gateway: {\n      disallowPromptTraining: true,\n    },\n  },\n});\n```\n\n--------------------------------\n\n### Implement Manual Prompt Caching in TypeScript\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions/advanced\n\nUse `cache_control: { type: 'ephemeral' }` within a message object to manually enable prompt caching for Anthropic models. This stores content for the session's duration, useful for large system prompts.\n\n```typescript\nimport OpenAI from 'openai';\n\nconst apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;\n\nconst openai = new OpenAI({\n  apiKey,\n  baseURL: 'https://ai-gateway.vercel.sh/v1',\n});\n\nconst response = await openai.chat.completions.create({\n  model: 'anthropic/claude-opus-4.7',\n  messages: [\n    {\n      role: 'user',\n      content: 'Analyze this document and summarize the key points.',\n      cache_control: {\n        type: 'ephemeral',\n      },\n    },\n  ],\n});\n\nconsole.log(response.choices[0].message.content);\n```\n\n--------------------------------\n\n### TypeScript OpenAI SDK Chat Completions with Automatic Caching\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/automatic-caching\n\nConfigure `caching: 'auto'` within `providerOptions.gateway` when using the OpenAI SDK with AI Gateway for automatic caching, targeting providers like Anthropic.\n\n```typescript\nimport OpenAI from 'openai';\n\nconst apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;\n\nconst openai = new OpenAI({\n  apiKey,\n  baseURL: 'https://ai-gateway.vercel.sh/v1',\n});\n\n// @ts-expect-error - providerOptions is a gateway extension\nconst response = await openai.chat.completions.create({\n  model: 'anthropic/claude-sonnet-4.6',\n  messages: [\n    {\n      role: 'system',\n      content: 'You are a helpful assistant with access to a large knowledge base...',\n    },\n    {\n      role: 'user',\n      content: 'What is the capital of France?',\n    },\n  ],\n  providerOptions: {\n    gateway: {\n      caching: 'auto',\n    },\n  },\n});\n\nconsole.log(response.choices[0].message.content);\n```\n\n--------------------------------\n\n### Disallow Prompt Training with Anthropic Messages API (Python)\n\nSource: https://vercel.com/docs/ai-gateway/capabilities/disallow-prompt-training\n\nSet `disallowPromptTraining` to `true` in `providerOptions` for Anthropic messages requests via the AI Gateway.\n\n```python\nimport os\nimport anthropic\n\nclient = anthropic.Anthropic(\n    api_key=os.getenv(\"AI_GATEWAY_API_KEY\"),\n    base_url=\"https://ai-gateway.vercel.sh\",\n)\n\nmessage = client.messages.create(\n    model=\"anthropic/claude-sonnet-4.6\",\n    messages=[\n        {\n            \"role\": \"user\",\n            \"content\": \"Analyze this proprietary business strategy.\",\n        }\n    ],\n    extra_body={\n        \"providerOptions\": {\n            \"gateway\": {\"disallowPromptTraining\": True}\n        }\n    },\n)\n```\n\n--------------------------------\n\n### Integrate Anthropic SDK with AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/anthropic-compat\n\nThese examples demonstrate how to configure the Anthropic SDK to use the AI Gateway by setting the `baseURL` and providing an API key from environment variables.\n\n```typescript\nimport Anthropic from '@anthropic-ai/sdk';\n \nconst anthropic = new Anthropic({\n  apiKey: process.env.AI_GATEWAY_API_KEY,\n  baseURL: 'https://ai-gateway.vercel.sh',\n});\n \nconst message = await anthropic.messages.create({\n  model: 'anthropic/claude-opus-4.7',\n  max_tokens: 1024,\n  messages: [{ role: 'user', content: 'Hello, world!' }],\n});\n```\n\n```python\nimport os\nimport anthropic\n \nclient = anthropic.Anthropic(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh'\n)\n \nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    messages=[\n        {'role': 'user', 'content': 'Hello, world!'}\n    ]\n)\n```\n\n--------------------------------\n\n### Enable automatic caching with Anthropic in Python\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/automatic-caching\n\nConfigure the Anthropic Python client to use AI Gateway with automatic caching enabled. Pass `caching: 'auto'` via `extra_body` to enable automatic cache marker insertion.\n\n```python\nimport os\nimport anthropic\n\napi_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')\n\nclient = anthropic.Anthropic(\n    api_key=api_key,\n    base_url='https://ai-gateway.vercel.sh'\n)\n\nmessage = client.messages.create(\n    model='anthropic/claude-sonnet-4.6',\n    max_tokens=2048,\n    system='You are a helpful assistant with access to a large knowledge base...',\n    messages=[\n        {\n            'role': 'user',\n            'content': 'What is the capital of France?'\n        }\n    ],\n    extra_body={\n        'providerOptions': {\n            'gateway': {\n                'caching': 'auto'\n            }\n        }\n    }\n)\n\nprint(message.content[0].text)\n```\n\n--------------------------------\n\n### Enable Automatic Caching with `gateway.caching: 'auto'` (TypeScript)\n\nSource: https://vercel.com/docs/ai-gateway/models-and-providers/provider-options\n\nConfigure `caching: 'auto'` to allow AI Gateway to automatically apply the appropriate caching strategy for the provider. This is useful for providers requiring explicit cache markers.\n\n```typescript\nproviderOptions: {\n  gateway: {\n    caching: 'auto',\n  },\n},\n```"}]}]},"uuid":"3585b36f-7ec9-442b-803f-789bdea173a1","timestamp":"2026-06-07T15:59:19.500Z","sourceToolAssistantUUID":"8779c983-e7b4-4905-a4af-9b4c89106a2d","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"3585b36f-7ec9-442b-803f-789bdea173a1","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01Q3smyzSoomBa7tfeASAosK","type":"message","role":"assistant","content":[{"type":"text","text":"Good. Now let me verify the AI SDK v6 specifics: model string format for Haiku, `cacheControl` via `providerOptions.anthropic`, `stopWhen`/`stepCountIs`, and the exact Haiku gateway id. Let me fetch the AI SDK Anthropic provider page and the Haiku model id."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":16936,"cache_read_input_tokens":200455,"cache_creation":{"ephemeral_5m_input_tokens":16936,"ephemeral_1h_input_tokens":0},"output_tokens":6,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBCsj7d4jcii1zjdqM6","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"6a84135d-912d-4cc0-a3ce-587ff21ab8c8","timestamp":"2026-06-07T15:59:23.867Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"6a84135d-912d-4cc0-a3ce-587ff21ab8c8","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01Q3smyzSoomBa7tfeASAosK","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01US5JWo2VePL3AuGnxcw6M6","name":"WebFetch","input":{"url":"https://ai-sdk.dev/providers/ai-sdk-providers/anthropic","prompt":"Extract: (1) the exact way to enable prompt caching with providerOptions.anthropic.cacheControl including the syntax and where it is placed (on a system message part or message), (2) the exact import path and package name for the Anthropic provider (@ai-sdk/anthropic), (3) how createAnthropic is configured with an apiKey, (4) the model id strings for Claude Haiku models. Quote code verbatim."},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":16936,"cache_read_input_tokens":200455,"cache_creation":{"ephemeral_5m_input_tokens":16936,"ephemeral_1h_input_tokens":0},"output_tokens":6,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBCsj7d4jcii1zjdqM6","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"5efffb77-7d16-4c3c-917e-91ac36f31a1e","timestamp":"2026-06-07T15:59:27.167Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"5efffb77-7d16-4c3c-917e-91ac36f31a1e","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01Q3smyzSoomBa7tfeASAosK","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01DT4C37TMAGPcTFvgmLtw7x","name":"mcp__claude_ai_Vercel__search_vercel_documentation","input":{"topic":"AI Gateway available models list Claude Haiku 4.5 model id anthropic/claude-haiku","tokens":1500},"caller":{"type":"direct"}}],"stop_reason":"tool_use","stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":16936,"cache_read_input_tokens":200455,"output_tokens":471,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":16936},"inference_geo":"not_available","iterations":[{"input_tokens":2,"output_tokens":471,"cache_read_input_tokens":200455,"cache_creation_input_tokens":16936,"cache_creation":{"ephemeral_5m_input_tokens":16936,"ephemeral_1h_input_tokens":0},"type":"message"}],"speed":"standard"},"diagnostics":null},"requestId":"req_011CbpBCsj7d4jcii1zjdqM6","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"9ea167da-28ea-4874-b224-48057730979f","timestamp":"2026-06-07T15:59:28.068Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"9ea167da-28ea-4874-b224-48057730979f","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01DT4C37TMAGPcTFvgmLtw7x","type":"tool_result","content":[{"type":"text","text":"### List Models\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions/rest-api\n\nMake a GET request to retrieve a list of available AI models from the AI Gateway.\n\n```bash\ncurl -X GET \"https://ai-gateway.vercel.sh/v1/models\" \\\n  -H \"Authorization: Bearer $AI_GATEWAY_API_KEY\" \\\n  -H \"Content-Type: application/json\"\n```\n\n```javascript\nconst response = await fetch('https://ai-gateway.vercel.sh/v1/models', {\n  method: 'GET',\n  headers: {\n    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,\n    'Content-Type': 'application/json',\n  },\n});\n\nconst models = await response.json();\nconsole.log(models);\n```\n\n--------------------------------\n\n### Quick Start: Anthropic Messages API with AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/python\n\nSends a user message to the AI Gateway via the Anthropic SDK, specifying a model and max tokens, then prints the first text block of the model's response.\n\n```python\nimport os\nimport anthropic\n\nclient = anthropic.Anthropic(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh'\n)\n\nmessage = client.messages.create(\n    model='anthropic/claude-opus-4.7',\n    max_tokens=1024,\n    messages=[\n        {'role': 'user', 'content': 'Explain quantum computing in one paragraph.'}\n    ]\n)\n\nprint(message.content[0].text)\n```\n\n--------------------------------\n\n### Route Claude Agent SDK Requests through AI Gateway\n\nSource: https://vercel.com/docs/ai-gateway/coding-agents/claude-code\n\nConfigure the Claude Agent SDK to route requests through AI Gateway by setting `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` in the `env` option.\n\n```ts\nimport { query } from '@anthropic-ai/claude-agent-sdk';\n\nfor await (const message of query({\n  prompt: 'Find and fix the bug in auth.py',\n  options: {\n    model: 'anthropic/claude-sonnet-4.6',\n    allowedTools: ['Read', 'Edit', 'Bash'],\n    env: {\n      ...process.env,\n      ANTHROPIC_BASE_URL: 'https://ai-gateway.vercel.sh',\n      ANTHROPIC_AUTH_TOKEN: 'your-ai-gateway-api-key',\n      ANTHROPIC_API_KEY: '',\n    },\n  },\n})) {\n  if ('result' in message) console.log(message.result);\n}\n```\n\n--------------------------------\n\n### List All Models with OpenAI SDK\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions\n\nRetrieve a list of all available models from the AI Gateway using the OpenAI SDK, configured with the AI Gateway base URL.\n\n```typescript\nimport OpenAI from 'openai';\n\nconst openai = new OpenAI({\n  apiKey: process.env.AI_GATEWAY_API_KEY,\n  baseURL: 'https://ai-gateway.vercel.sh/v1',\n});\n\nconst models = await openai.models.list();\nconsole.log(models);\n```\n\n```python\nimport os\nfrom openai import OpenAI\n\nclient = OpenAI(\n    api_key=os.getenv('AI_GATEWAY_API_KEY'),\n    base_url='https://ai-gateway.vercel.sh/v1'\n)\n\nmodels = client.models.list()\nprint(models)\n```\n\n--------------------------------\n\n### Generate Text with Anthropic Claude Opus 4.5\n\nSource: https://vercel.com/docs/ai-sdk\n\nShows how to switch to Anthropic's Claude Opus 4.5 model for text generation, highlighting the AI SDK's unified API for easy provider changes.\n\n```typescript\nimport { generateText } from 'ai';\n\nconst { text } = await generateText({\n  model: 'anthropic/claude-opus-4.5',\n  prompt: 'How many people will live in the world in 2040?',\n});\n```\n\n--------------------------------\n\n### List All Models API Endpoint\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions\n\nUse this GET endpoint to retrieve a list of all models available through the AI Gateway.\n\n```http\nGET /models\n```\n\n--------------------------------\n\n### JSON Response for Listing Models\n\nSource: https://vercel.com/docs/ai-gateway/sdks-and-apis/openai-chat-completions\n\nThe expected JSON structure returned when successfully listing all available models from the AI Gateway.\n\n```json\n{\n  \"object\": \"list\",\n  \"data\": [\n    {\n      \"id\": \"anthropic/claude-opus-4.7\",\n      \"object\": \"model\",\n      \"created\": 1677610602,\n      \"owned_by\": \"anthropic\"\n    },\n    {\n      \"id\": \"openai/gpt-5.5\",\n      \"object\": \"model\",\n      \"created\": 1677610602,\n      \"owned_by\": \"openai\"\n    }\n  ]\n}\n```"}]}]},"uuid":"4d5bd105-d78a-4aa2-b485-e3112c750219","timestamp":"2026-06-07T15:59:29.000Z","sourceToolAssistantUUID":"9ea167da-28ea-4874-b224-48057730979f","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"5efffb77-7d16-4c3c-917e-91ac36f31a1e","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01US5JWo2VePL3AuGnxcw6M6","type":"tool_result","content":"# Anthropic Provider Configuration Guide\n\n## 1. Prompt Caching with providerOptions\n\nPrompt caching is enabled by adding `providerOptions` to individual message content blocks:\n\n```javascript\n{\n  type: 'text',\n  text: `Error message: ${errorMessage}`,\n  providerOptions: {\n    anthropic: { cacheControl: { type: 'ephemeral' } },\n  },\n}\n```\n\n\"You can use cache control on system messages by providing multiple system messages at the head of your messages array\" with the same `providerOptions` structure applied to each system message part.\n\n## 2. Package Import Path\n\n\"The Anthropic provider is available in the `@ai-sdk/anthropic` module\" and can be installed via package managers.\n\n## 3. Creating Custom Anthropic Instance\n\nCustom provider instances use this pattern:\n\n```javascript\nimport { createAnthropic } from '@ai-sdk/anthropic';\n\nconst anthropic = createAnthropic({\n  apiKey: 'your-api-key-here',\n});\n```\n\nThe `apiKey` \"defaults to the `ANTHROPIC_API_KEY` environment variable.\"\n\n## 4. Claude Haiku Model IDs\n\nThe documentation references these Haiku variants: `claude-haiku-4-5` and `claude-haiku-3.5`. These appear in capability tables and cache limitation sections discussing minimum cacheable prompt lengths specific to each model version."}]},"uuid":"18033506-a4bc-4fe3-bf6b-0270794e14a3","timestamp":"2026-06-07T15:59:32.491Z","sourceToolAssistantUUID":"5efffb77-7d16-4c3c-917e-91ac36f31a1e","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"18033506-a4bc-4fe3-bf6b-0270794e14a3","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01S4UbgeFfJBsqd81hN2c3Qa","type":"message","role":"assistant","content":[{"type":"text","text":"I have a key nuance to resolve: the AI Gateway uses dotted model ids (`anthropic/claude-haiku-4.5`) while the direct `@ai-sdk/anthropic` provider uses hyphenated ids (`claude-haiku-4-5`). Let me confirm the exact gateway Haiku string and verify `stepCountIs`/`stopWhen` and `Output`/tool rendering in v6."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":1960,"cache_creation_input_tokens":972,"cache_read_input_tokens":217391,"cache_creation":{"ephemeral_5m_input_tokens":972,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBDqC6P76rBW1WXwE9D","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"45466455-4a3f-4f3b-8841-55eb834e85bb","timestamp":"2026-06-07T15:59:36.400Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"45466455-4a3f-4f3b-8841-55eb834e85bb","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01S4UbgeFfJBsqd81hN2c3Qa","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_014ExD9heGfrcJnrFAwh1R58","name":"WebFetch","input":{"url":"https://ai-gateway.vercel.sh/v1/models","prompt":"List all model ids that contain \"haiku\" or \"anthropic\". I need the exact model id string for Claude Haiku 4.5."},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":1960,"cache_creation_input_tokens":972,"cache_read_input_tokens":217391,"cache_creation":{"ephemeral_5m_input_tokens":972,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBDqC6P76rBW1WXwE9D","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"0435e392-9f6f-47d0-be06-21e461d06206","timestamp":"2026-06-07T15:59:37.331Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"0435e392-9f6f-47d0-be06-21e461d06206","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01S4UbgeFfJBsqd81hN2c3Qa","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01Cwn3FJFKsLiW3jz4ujgMik","name":"WebFetch","input":{"url":"https://ai-sdk.dev/docs/foundations/agents","prompt":"Extract verbatim: how stopWhen and stepCountIs are imported from 'ai' and used in streamText/generateText for multi-step tool calling. Show the exact import statement and usage."},"caller":{"type":"direct"}}],"stop_reason":"tool_use","stop_sequence":null,"stop_details":null,"usage":{"input_tokens":1960,"cache_creation_input_tokens":972,"cache_read_input_tokens":217391,"output_tokens":422,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":972},"inference_geo":"not_available","iterations":[{"input_tokens":1960,"output_tokens":422,"cache_read_input_tokens":217391,"cache_creation_input_tokens":972,"cache_creation":{"ephemeral_5m_input_tokens":972,"ephemeral_1h_input_tokens":0},"type":"message"}],"speed":"standard"},"diagnostics":null},"requestId":"req_011CbpBDqC6P76rBW1WXwE9D","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"2cda6ab8-4b50-42b6-b26e-24061acaaee3","timestamp":"2026-06-07T15:59:38.627Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"0435e392-9f6f-47d0-be06-21e461d06206","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_014ExD9heGfrcJnrFAwh1R58","type":"tool_result","content":"# Models Containing \"haiku\" or \"anthropic\"\n\n## Haiku Models:\n- `anthropic/claude-3-haiku`\n- `anthropic/claude-3.5-haiku`\n- `anthropic/claude-haiku-4.5`\n\n## Anthropic Models:\nAll Anthropic models from the listing:\n- `anthropic/claude-3-haiku`\n- `anthropic/claude-3.5-haiku`\n- `anthropic/claude-haiku-4.5`\n- `anthropic/claude-opus-4`\n- `anthropic/claude-opus-4.1`\n- `anthropic/claude-opus-4.5`\n- `anthropic/claude-opus-4.6`\n- `anthropic/claude-opus-4.7`\n- `anthropic/claude-opus-4.8`\n- `anthropic/claude-sonnet-4`\n- `anthropic/claude-sonnet-4.5`\n- `anthropic/claude-sonnet-4.6`\n\n## Claude Haiku 4.5 Exact ID:\n**`anthropic/claude-haiku-4.5`**"}]},"uuid":"b46a0994-83f2-4aeb-92b6-70c0d7519d8a","timestamp":"2026-06-07T15:59:40.318Z","sourceToolAssistantUUID":"0435e392-9f6f-47d0-be06-21e461d06206","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"2cda6ab8-4b50-42b6-b26e-24061acaaee3","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01Cwn3FJFKsLiW3jz4ujgMik","type":"tool_result","content":"I cannot provide the requested information based on the content above. The webpage content does not contain any import statements or usage examples showing how `stopWhen` and `stepCountIs` are imported from 'ai' or used in `streamText`/`generateText` for multi-step tool calling.\n\nThe page mentions \"[Loop Control](/docs/agents/loop-control)\" as a \"related topic\" with \"Execution control with stopWhen and prepareStep,\" but the actual code examples and import statements are not included in the provided content.\n\nTo find this information, you would need to visit the Loop Control documentation page linked in the navigation."}]},"uuid":"f919fb25-c327-4ddd-bfb3-e0c7e76db93b","timestamp":"2026-06-07T15:59:41.812Z","sourceToolAssistantUUID":"2cda6ab8-4b50-42b6-b26e-24061acaaee3","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"f919fb25-c327-4ddd-bfb3-e0c7e76db93b","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01QKcFkem76AMAT2kwtF5QnW","type":"message","role":"assistant","content":[{"type":"text","text":"Confirmed the gateway Haiku id is `anthropic/claude-haiku-4.5`. Let me confirm the loop-control (`stopWhen`/`stepCountIs`) and the `cacheControl` placement in AI SDK messages one more time, then I'll write the plan."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":2992,"cache_read_input_tokens":218363,"cache_creation":{"ephemeral_5m_input_tokens":2992,"ephemeral_1h_input_tokens":0},"output_tokens":2,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBEXMp63ztFS1Z3SdJr","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"72fd6e3f-e439-4be1-ab02-bfd8bb8c5195","timestamp":"2026-06-07T15:59:45.272Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"72fd6e3f-e439-4be1-ab02-bfd8bb8c5195","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01QKcFkem76AMAT2kwtF5QnW","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01LThLPwpZ8KFa7rqrjudgRa","name":"WebFetch","input":{"url":"https://ai-sdk.dev/docs/agents/loop-control","prompt":"Extract verbatim the import statement and usage of stopWhen and stepCountIs from 'ai' in streamText or generateText. Show exact code."},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":2992,"cache_read_input_tokens":218363,"cache_creation":{"ephemeral_5m_input_tokens":2992,"ephemeral_1h_input_tokens":0},"output_tokens":2,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBEXMp63ztFS1Z3SdJr","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"4f367055-d842-49a1-a930-8b53cbb2d109","timestamp":"2026-06-07T15:59:46.476Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"4f367055-d842-49a1-a930-8b53cbb2d109","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01QKcFkem76AMAT2kwtF5QnW","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01UgjtXvsNcRfaPqrVMT5ACq","name":"WebFetch","input":{"url":"https://ai-sdk.dev/cookbook/next/markdown-chatbot-with-memoization","prompt":"Not relevant - ignore. Just return \"ok\"."},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":2992,"cache_read_input_tokens":218363,"cache_creation":{"ephemeral_5m_input_tokens":2992,"ephemeral_1h_input_tokens":0},"output_tokens":2,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBEXMp63ztFS1Z3SdJr","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"5103b57e-c0ee-49d8-aef3-ce7006311200","timestamp":"2026-06-07T15:59:47.355Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"5103b57e-c0ee-49d8-aef3-ce7006311200","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01UgjtXvsNcRfaPqrVMT5ACq","type":"tool_result","content":"ok"}]},"uuid":"f81d721e-9aea-4d5d-a908-51b1467b365d","timestamp":"2026-06-07T15:59:48.497Z","sourceToolAssistantUUID":"5103b57e-c0ee-49d8-aef3-ce7006311200","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"4f367055-d842-49a1-a930-8b53cbb2d109","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01LThLPwpZ8KFa7rqrjudgRa","type":"tool_result","content":"Based on the documentation provided, I need to clarify that **`stopWhen` and `stepCountIs` are not used in `streamText` or `generateText`**. \n\nThese parameters are exclusively used with the `ToolLoopAgent` class. Here's the exact usage shown:\n\n```javascript\nimport { ToolLoopAgent, stepCountIs } from 'ai';\n\nconst agent = new ToolLoopAgent({\n  model: \"anthropic/claude-sonnet-4.5\",\n  tools: {\n    // your tools\n  },\n  stopWhen: stepCountIs(50),\n});\n```\n\nThe documentation states that \"The AI SDK provides built-in loop control through two parameters: `stopWhen`\" but specifies this applies to agent loop configuration, not the core `generateText` or `streamText` functions.\n\nFor manual loop control using core functions, the documentation suggests implementing your own loop management without these built-in parameters."}]},"uuid":"1e1e277b-542b-4fd4-a58f-75d3c065a59c","timestamp":"2026-06-07T15:59:51.413Z","sourceToolAssistantUUID":"4f367055-d842-49a1-a930-8b53cbb2d109","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"1e1e277b-542b-4fd4-a58f-75d3c065a59c","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01X7qvBXiGszuXSLttGHRCs7","type":"message","role":"assistant","content":[{"type":"text","text":"That WebFetch summary is misleading — `stopWhen`/`stepCountIs` are in fact supported on `streamText`/`generateText` in v6 (the loop control page covers both the core functions and the Agent class). Let me confirm directly from the core function reference to avoid putting wrong API in the plan."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":33,"cache_creation_input_tokens":682,"cache_read_input_tokens":221355,"cache_creation":{"ephemeral_5m_input_tokens":682,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBFE7G7QAf7SFVAzbCF","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"46ef4578-b3b6-48a4-a53f-cb3c09d91534","timestamp":"2026-06-07T15:59:54.869Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"46ef4578-b3b6-48a4-a53f-cb3c09d91534","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01X7qvBXiGszuXSLttGHRCs7","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01Aarm7KaH6utzrGS3nGR8gK","name":"WebFetch","input":{"url":"https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text","prompt":"Does streamText accept a stopWhen parameter? Show how stopWhen and stepCountIs (imported from 'ai') are passed to streamText for multi-step tool calling. Also confirm streamText has a toUIMessageStreamResponse() method and accepts a messages array via convertToModelMessages. Quote any relevant code or parameter descriptions verbatim."},"caller":{"type":"direct"}}],"stop_reason":"tool_use","stop_sequence":null,"stop_details":null,"usage":{"input_tokens":33,"cache_creation_input_tokens":682,"cache_read_input_tokens":221355,"output_tokens":323,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":682},"inference_geo":"not_available","iterations":[{"input_tokens":33,"output_tokens":323,"cache_read_input_tokens":221355,"cache_creation_input_tokens":682,"cache_creation":{"ephemeral_5m_input_tokens":682,"ephemeral_1h_input_tokens":0},"type":"message"}],"speed":"standard"},"diagnostics":null},"requestId":"req_011CbpBFE7G7QAf7SFVAzbCF","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"ade279e5-2095-4af8-98a3-a84470979210","timestamp":"2026-06-07T15:59:56.804Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"ade279e5-2095-4af8-98a3-a84470979210","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01Aarm7KaH6utzrGS3nGR8gK","type":"tool_result","content":"# streamText Parameters and Methods\n\nYes, `streamText` accepts a `stopWhen` parameter for multi-step tool calling. According to the documentation:\n\n**stopWhen parameter:**\nThe API signature includes `stopWhen?: StopCondition<TOOLS> | Array<StopCondition<TOOLS>>`, allowing you to pass one or more stop conditions.\n\n**Usage with stepCountIs:**\nYou can implement multi-step tool calling by combining these features. The `stepCountIs` function (imported from 'ai') enables conditional logic based on iteration count. Example pattern:\n\n```javascript\nstreamText({\n  model,\n  prompt,\n  tools,\n  stopWhen: stepCountIs(3) // Stop after 3 steps\n})\n```\n\n**toUIMessageStreamResponse() method:**\nYes, `streamText` returns an object with `toUIMessageStreamResponse()` method. The signature shows: `toUIMessageStreamResponse: (options?: ResponseInit & UIMessageStreamOptions) => Response`, allowing conversion of the stream to a Response object for UI consumption.\n\n**Messages array handling:**\nThe function accepts `messages?: Array<SystemModelMessage | UserModelMessage | AssistantModelMessage | ToolModelMessage>` as a parameter for passing conversation history directly to `streamText`."}]},"uuid":"70983c25-485c-4605-b5e1-df63e518a365","timestamp":"2026-06-07T16:00:00.789Z","sourceToolAssistantUUID":"ade279e5-2095-4af8-98a3-a84470979210","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"70983c25-485c-4605-b5e1-df63e518a365","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01GWejWBdTJ63iv2RL9j82Ux","type":"message","role":"assistant","content":[{"type":"text","text":"All API details are now verified against live docs. I have everything I need to write a concrete, accurate plan."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":816,"cache_read_input_tokens":222037,"cache_creation":{"ephemeral_5m_input_tokens":816,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBFwsydHozsX2JNX8PH","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"9fd2da75-0d92-424f-af89-82a3b7eddf8b","timestamp":"2026-06-07T16:00:10.807Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"9fd2da75-0d92-424f-af89-82a3b7eddf8b","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01GWejWBdTJ63iv2RL9j82Ux","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01B8mu5fiYVGokB2WRPBU8F9","name":"Write","input":{"file_path":"/home/dev/.claude/plans/brief-chatbot-pure-swing-agent-aed0c7b7da27aaf24.md","content":"# Covalba POC Chatbot — Implementation Design\n\nStandalone Next.js (App Router) AI chatbot, embeddable on WordPress via one `<script>`, lives in `chatbot/` inside the existing Vite repo, deployed as a separate Vercel project (root dir = `chatbot/`). Answers only from `knowledge.md`. Model: Claude Haiku via AI Gateway (OIDC on deploy) + direct Anthropic fallback for local dev. French.\n\nAll API details below were verified against current Vercel AI Gateway and AI SDK v6 docs (Jun 2026).\n\n---\n\n## 0. Verified API facts (flagged where they differ from the brief)\n\n- **Gateway Haiku id is dotted: `anthropic/claude-haiku-4.5`** — confirmed live from `GET https://ai-gateway.vercel.sh/v1/models`. The brief's `anthropic/claude-haiku-4-5` (hyphenated) is **wrong for the gateway**. Hyphenated `claude-haiku-4-5` is the id for the **direct** `@ai-sdk/anthropic` provider. We keep both, one per code path.\n- **Gateway local env var is `AI_GATEWAY_API_KEY`** (correct in brief). On Vercel deploys, OIDC is automatic — a plain model string `'anthropic/claude-haiku-4.5'` routes through the gateway with zero key config (`VERCEL_OIDC_TOKEN` injected at build/runtime). Local dev without a gateway key falls back to the direct Anthropic provider.\n- **Prompt caching, two layers, both confirmed:**\n  - Per-message-part: `providerOptions.anthropic.cacheControl = { type: 'ephemeral' }` (direct provider AND passes through the gateway).\n  - Gateway shortcut: `providerOptions.gateway.caching = 'auto'` auto-inserts cache markers on static content (gateway only).\n- **`stopWhen: stepCountIs(n)` IS supported on `streamText`** (verified in stream-text reference) — not only on `ToolLoopAgent`. We use `streamText` directly (simpler than the Agent class for this POC).\n- **`toUIMessageStreamResponse()`** is the correct v6 method for `useChat` (not `toDataStreamResponse()`).\n- `useChat` is from `@ai-sdk/react`; `streamText`, `tool`, `convertToModelMessages`, `stepCountIs`, `Output`, `DefaultChatTransport` are from `ai`. v6 `useChat` does NOT manage input state — manage it with `useState`.\n\n---\n\n## 1. Model wiring (`chatbot/lib/model.ts`)\n\nSingle helper resolves the model based on env. Plain string → gateway (works with OIDC on Vercel). If no gateway key locally but `ANTHROPIC_API_KEY` is present, use the direct provider.\n\n```ts\n// lib/model.ts\nimport type { LanguageModel } from 'ai';\nimport { createAnthropic } from '@ai-sdk/anthropic';\n\nexport function getModel(): LanguageModel {\n  // On Vercel: AI_GATEWAY_API_KEY unset but OIDC token present → plain string routes via gateway.\n  // Local with gateway key: same plain string works.\n  const hasGateway =\n    !!process.env.AI_GATEWAY_API_KEY || !!process.env.VERCEL_OIDC_TOKEN;\n\n  if (hasGateway) {\n    return 'anthropic/claude-haiku-4.5'; // dotted = gateway id\n  }\n\n  // Pure-local fallback: direct Anthropic provider\n  const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });\n  return anthropic('claude-haiku-4-5'); // hyphenated = direct provider id\n}\n\n// Whether we're on the gateway path (controls which caching mechanism to use)\nexport const usingGateway =\n  !!process.env.AI_GATEWAY_API_KEY || !!process.env.VERCEL_OIDC_TOKEN;\n```\n\nOIDC explainer (put in README): on Vercel, the project gets a `VERCEL_OIDC_TOKEN` automatically; the AI SDK gateway provider exchanges it — no `AI_GATEWAY_API_KEY` needed in prod. `vercel env pull` brings a short-lived OIDC token locally too; `AI_GATEWAY_API_KEY` is the explicit alternative.\n\n---\n\n## 2. Prompt caching on the knowledge.md system prompt\n\n`knowledge.md` is large and identical on every request → cache it. The system prompt is sent as a **system message part with `cacheControl`**. This works on the direct provider and is passed through by the gateway. We also set `gateway.caching: 'auto'` so the gateway path is covered even if part-level markers are stripped.\n\n```ts\n// in app/api/chat/route.ts\nimport { readFileSync } from 'node:fs';\nimport { join } from 'node:path';\n\nconst KNOWLEDGE = readFileSync(join(process.cwd(), 'knowledge.md'), 'utf8');\n\nconst SYSTEM_PROMPT = `Tu es l'assistant virtuel de Covalba... [rules: répond UNIQUEMENT à partir de la base de connaissances ci-dessous; si l'info n'y est pas, dis-le et propose le formulaire de contact; français; ton B2B pro]\n\n# Base de connaissances\n${KNOWLEDGE}`;\n```\n\nThen in `streamText`, pass system as a cached message part:\n\n```ts\nmessages: [\n  {\n    role: 'system',\n    content: SYSTEM_PROMPT,\n    providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },\n  },\n  ...convertToModelMessages(uiMessages),\n],\nproviderOptions: { gateway: { caching: 'auto' } }, // gateway-side belt-and-suspenders\n```\n\nNote: knowledge.md must exceed the model's minimum cacheable prefix (~2048 tokens for Haiku) or the marker is silently ignored — fine for a real knowledge base.\n\n---\n\n## 3. Chat route handler (`chatbot/app/api/chat/route.ts`)\n\n```ts\nimport { streamText, convertToModelMessages, stepCountIs, type UIMessage } from 'ai';\nimport { getModel } from '@/lib/model';\nimport { suggestPage } from '@/lib/tools/suggest-page';\nimport { captureLead } from '@/lib/tools/capture-lead';\nimport { logUnanswered } from '@/lib/tools/log-unanswered';\nimport { rateLimit } from '@/lib/rate-limit';\nimport { SYSTEM_PROMPT } from '@/lib/system-prompt';\n\nexport const maxDuration = 30;\n\nexport async function POST(req: Request) {\n  // rate limit by IP (see §7)\n  const ip = req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'anon';\n  const { success } = await rateLimit(ip);\n  if (!success) return new Response('Trop de requêtes. Réessayez dans un instant.', { status: 429 });\n\n  const { messages, originUrl }: { messages: UIMessage[]; originUrl?: string } = await req.json();\n\n  const result = streamText({\n    model: getModel(),\n    stopWhen: stepCountIs(5), // allow tool round-trips, cap loops\n    messages: [\n      {\n        role: 'system',\n        content: SYSTEM_PROMPT,\n        providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },\n      },\n      ...convertToModelMessages(messages),\n    ],\n    providerOptions: { gateway: { caching: 'auto' } },\n    tools: {\n      suggest_page: suggestPage,\n      capture_lead: captureLead(originUrl), // closure injects origin page\n      log_unanswered: logUnanswered,\n    },\n  });\n\n  return result.toUIMessageStreamResponse();\n}\n```\n\n`originUrl` comes from the iframe query param (§6), forwarded by the client in the request body.\n\n---\n\n## 4. Tools (`chatbot/lib/tools/*.ts`, AI SDK v6 `tool()` + zod)\n\n### `suggest_page` — maps an intent to a fixed Covalba route\n\n```ts\n// lib/tools/routes.ts  (single source of truth; also used by widget if needed)\nexport const COVALBA_BASE = process.env.NEXT_PUBLIC_SITE_BASE_URL ?? 'https://www.covalba.com';\nexport const ROUTES = {\n  home: { path: '/', label: 'Accueil' },\n  simulateur: { path: '/simulateur-economie-energie', label: 'Simulateur d’économies' },\n  estimation: { path: '/estimation', label: 'Estimation' },\n  diagnostic: { path: '/diagnostic', label: 'Diagnostic' },\n  contact: { path: '/contact', label: 'Contact' },\n  applicateur: { path: '/devenir-applicateur', label: 'Devenir applicateur' },\n  qui_sommes_nous: { path: '/qui-sommes-nous', label: 'Qui sommes-nous' },\n  prime_cee: { path: '/bat-en-112', label: 'Prime CEE (BAT-EN-112)' },\n  covatherm: { path: '/solutions/covatherm', label: 'Covatherm' },\n  covatherm_light: { path: '/solutions/covatherm-light', label: 'Covatherm Light' },\n  covaseal_20: { path: '/solutions/covaseal-20', label: 'Covaseal 20' },\n  covametal_20: { path: '/solutions/covametal-20', label: 'Covametal 20' },\n  guide: { path: '/guide-cool-roof', label: 'Guide Cool Roof' },\n  faq: { path: '/faq', label: 'FAQ' },\n  references: { path: '/references', label: 'Références' },\n  tertiaire: { path: '/industries/tertiaire', label: 'Secteur tertiaire' },\n  distribution: { path: '/industries/distribution', label: 'Grande distribution' },\n  logistique: { path: '/industries/logistique', label: 'Logistique' },\n  collectivites: { path: '/industries/collectivites', label: 'Collectivités' },\n  agricole: { path: '/industries/agricole', label: 'Agricole' },\n  erp: { path: '/industries/erp', label: 'ERP' },\n} as const;\nexport type RouteKey = keyof typeof ROUTES;\n```\n\n```ts\n// lib/tools/suggest-page.ts\nimport { tool, type UIToolInvocation } from 'ai';\nimport { z } from 'zod';\nimport { ROUTES, COVALBA_BASE, type RouteKey } from './routes';\n\nexport const suggestPage = tool({\n  description:\n    \"Suggère UNE page du site Covalba la plus pertinente pour l'intention de l'utilisateur. \" +\n    'À utiliser dès que l’utilisateur cherche une info qui vit sur une page dédiée (solution, secteur, simulateur, contact, prime CEE…).',\n  inputSchema: z.object({\n    intent: z\n      .enum(Object.keys(ROUTES) as [RouteKey, ...RouteKey[]])\n      .describe('La clé de la page la plus pertinente.'),\n    reason: z.string().describe(\"Phrase courte expliquant pourquoi cette page aide l'utilisateur.\"),\n  }),\n  execute: async ({ intent, reason }) => {\n    const r = ROUTES[intent];\n    return { label: r.label, url: `${COVALBA_BASE}${r.path}`, reason };\n  },\n});\n\nexport type SuggestPageInvocation = UIToolInvocation<typeof suggestPage>;\n```\n\nUI renders the `tool-suggest_page` part as a clickable button (see §5).\n\n### `capture_lead` — console.log + Resend, graceful when key absent\n\n```ts\n// lib/tools/capture-lead.ts\nimport { tool } from 'ai';\nimport { z } from 'zod';\n\nexport function captureLead(originUrl?: string) {\n  return tool({\n    description:\n      \"Enregistre une demande de contact (lead) quand l'utilisateur fournit ses coordonnées ou demande à être rappelé. \" +\n      \"Ne PAS inventer de valeurs : ne l'appeler qu'avec les infos réellement fournies.\",\n    inputSchema: z.object({\n      nom: z.string(),\n      email: z.string().email(),\n      telephone: z.string().optional(),\n      besoin: z.string().describe('Besoin / projet décrit par l’utilisateur.'),\n      raison: z.string().describe('Pourquoi ce lead est qualifié / contexte.'),\n    }),\n    execute: async (lead) => {\n      const record = { ...lead, originUrl, at: new Date().toISOString() };\n      console.log('[covalba-lead]', JSON.stringify(record)); // visible in Vercel logs\n\n      const key = process.env.RESEND_API_KEY;\n      if (!key) {\n        return { ok: true, emailed: false, note: 'Lead enregistré (email désactivé en local).' };\n      }\n      try {\n        const { Resend } = await import('resend');\n        const resend = new Resend(key);\n        await resend.emails.send({\n          from: process.env.RESEND_FROM ?? 'Covalba Bot <onboarding@resend.dev>',\n          to: process.env.LEAD_INBOX ?? 'contact@covalba.com',\n          subject: `Nouveau lead chatbot — ${lead.nom}`,\n          text:\n            `Nom: ${lead.nom}\\nEmail: ${lead.email}\\nTél: ${lead.telephone ?? '—'}\\n` +\n            `Besoin: ${lead.besoin}\\nRaison: ${lead.raison}\\nPage d’origine: ${originUrl ?? '—'}`,\n        });\n        return { ok: true, emailed: true, message: 'Merci, votre demande a bien été transmise.' };\n      } catch (e) {\n        console.error('[covalba-lead] resend failed', e);\n        return { ok: true, emailed: false, note: 'Lead enregistré; email en échec (non bloquant).' };\n      }\n    },\n  });\n}\n```\n\n`capture_lead` is a factory so the route can inject `originUrl`. Failure of Resend never throws — the lead is always logged.\n\n### `log_unanswered` — structured console.log\n\n```ts\n// lib/tools/log-unanswered.ts\nimport { tool } from 'ai';\nimport { z } from 'zod';\n\nexport const logUnanswered = tool({\n  description:\n    \"À appeler quand la base de connaissances NE contient PAS la réponse à la question. \" +\n    'Journalise la question pour amélioration ultérieure. Puis propose le formulaire de contact.',\n  inputSchema: z.object({\n    question: z.string().describe(\"La question de l'utilisateur restée sans réponse.\"),\n  }),\n  execute: async ({ question }) => {\n    console.log('[covalba-unanswered]', JSON.stringify({ question, at: new Date().toISOString() }));\n    return { logged: true };\n  },\n});\n```\n\n---\n\n## 5. Frontend chat UI (`chatbot/app/embed/Chat.tsx`, client component)\n\nMinimal v6 `useChat` with `DefaultChatTransport`. Manage input with `useState` (v6 removed managed input). Forward `originUrl` (from `?origin=` query param) in the request body so `capture_lead` attaches it. Render text parts and the `tool-suggest_page` part as a link/button.\n\n```tsx\n'use client';\nimport { useChat } from '@ai-sdk/react';\nimport { DefaultChatTransport } from 'ai';\nimport { useState } from 'react';\n\nexport function Chat({ originUrl }: { originUrl: string }) {\n  const [input, setInput] = useState('');\n  const { messages, sendMessage, status } = useChat({\n    transport: new DefaultChatTransport({\n      api: '/api/chat',\n      // attach origin page to every request body\n      prepareSendMessagesRequest: ({ messages }) => ({ body: { messages, originUrl } }),\n    }),\n  });\n\n  return (\n    <div className=\"flex h-screen flex-col\">\n      <div className=\"flex-1 overflow-y-auto p-4 space-y-3\">\n        {messages.map((m) => (\n          <div key={m.id} className={m.role === 'user' ? 'text-right' : 'text-left'}>\n            {m.parts.map((part, i) => {\n              if (part.type === 'text') return <p key={i}>{part.text}</p>;\n              if (part.type === 'tool-suggest_page' && part.state === 'output-available') {\n                const { label, url, reason } = part.output;\n                return (\n                  <a key={i} href={url} target=\"_top\" rel=\"noopener\"\n                     className=\"block rounded border p-2 hover:bg-gray-50\">\n                    <span className=\"font-medium\">→ {label}</span>\n                    <span className=\"block text-sm text-gray-600\">{reason}</span>\n                  </a>\n                );\n              }\n              return null;\n            })}\n          </div>\n        ))}\n      </div>\n      <form\n        onSubmit={(e) => { e.preventDefault(); if (!input.trim()) return; sendMessage({ text: input }); setInput(''); }}\n        className=\"flex gap-2 border-t p-3\"\n      >\n        <input value={input} onChange={(e) => setInput(e.target.value)}\n               placeholder=\"Posez votre question…\" className=\"flex-1 rounded border px-3 py-2\" />\n        <button disabled={status !== 'ready'} className=\"rounded bg-black px-4 py-2 text-white\">Envoyer</button>\n      </form>\n    </div>\n  );\n}\n```\n\nNotes: `target=\"_top\"` on the suggestion link breaks out of the iframe so the parent WordPress page navigates. `prepareSendMessagesRequest` is the v6 transport hook for injecting `originUrl` into the body (verify exact field name against installed `ai` version; alternative is to pass `body` on `sendMessage`).\n\n`chatbot/app/embed/page.tsx` (server) reads the `origin` query param and renders full-screen, no chrome:\n\n```tsx\nexport default function EmbedPage({ searchParams }: { searchParams: { origin?: string } }) {\n  return <Chat originUrl={searchParams.origin ?? ''} />;\n}\n```\n\nA close affordance inside the embed posts a message up to the parent (§6): `window.parent.postMessage({ type: 'covalba-chat-close' }, '*')`.\n\n---\n\n## 6. Embeddable widget (`chatbot/public/widget.js`, ~100 lines vanilla)\n\nSingle `<script src=\"https://<app>.vercel.app/widget.js\" async></script>` on WordPress. Injects a floating bubble bottom-right labeled \"Une question ?\". Click → opens an iframe to `/embed?origin=<encoded host URL>`. Iframe gives full CSS isolation from the WordPress theme. Responsive: desktop = fixed panel bottom-right; mobile = full-screen. Close via button inside (postMessage) or the bubble toggle.\n\nStructure:\n```js\n(function () {\n  var APP = 'https://covalba-chatbot.vercel.app'; // self-derive from script src in real impl\n  var origin = encodeURIComponent(location.href);\n  // 1. bubble button (fixed, bottom-right, z-index high)\n  // 2. iframe (hidden by default): src = APP + '/embed?origin=' + origin\n  //    style: desktop 380x600 panel; mobile (max-width:480px) → 100vw/100vh\n  // 3. toggle open/close on bubble click\n  // 4. window.addEventListener('message', e => { if (e.origin===APP && e.data?.type==='covalba-chat-close') close(); })\n  // 5. derive APP from document.currentScript.src so the host/port is never hard-coded\n})();\n```\n\nKey points:\n- **Derive `APP` from `document.currentScript.src`** — never hard-code the deploy URL (matches repo rule about not hard-coding host/port).\n- `origin=location.href` → lets `capture_lead` record which page the lead came from.\n- `postMessage` close pattern with origin check (`e.origin === APP`).\n- CSS injected inline via a `<style>` tag scoped to a unique id prefix to avoid clobbering theme styles; the chat itself is fully isolated in the iframe.\n\n**Embedding headers — `/embed` must be framable.** Next.js default has no `X-Frame-Options`, but be explicit. In `chatbot/next.config.ts`, set headers for `/embed` and `/widget.js`:\n\n```ts\nasync headers() {\n  return [\n    {\n      source: '/embed',\n      headers: [\n        // allow framing from the WordPress origin(s); use CSP frame-ancestors (X-Frame-Options can't list multiple hosts)\n        { key: 'Content-Security-Policy', value: \"frame-ancestors 'self' https://*.covalba.com https://covalba.com;\" },\n      ],\n    },\n    {\n      source: '/widget.js',\n      headers: [\n        { key: 'Access-Control-Allow-Origin', value: '*' }, // widget.js is a static script, CORS-open\n        { key: 'Cache-Control', value: 'public, max-age=300' },\n      ],\n    },\n  ];\n}\n```\n\n`/api/chat` is same-origin (called from inside the iframe which is served by the app) → **no CORS needed on the API**. That's the whole point of the iframe approach. Do NOT set `X-Frame-Options: DENY` anywhere. `frame-ancestors` is the correct mechanism (X-Frame-Options is single-origin only).\n\n---\n\n## 7. Rate limiting (`chatbot/lib/rate-limit.ts`) — pluggable\n\nIn-memory token bucket by IP for local POC; Upstash Redis when env vars present. Vercel KV is deprecated — use Upstash directly.\n\n```ts\n// lib/rate-limit.ts\nlet limiter: ((ip: string) => Promise<{ success: boolean }>) | null = null;\n\nasync function getLimiter() {\n  if (limiter) return limiter;\n  if (process.env.UPSTASH_REDIS_REST_URL && process.env.UPSTASH_REDIS_REST_TOKEN) {\n    const { Ratelimit } = await import('@upstash/ratelimit');\n    const { Redis } = await import('@upstash/redis');\n    const rl = new Ratelimit({\n      redis: Redis.fromEnv(),\n      limiter: Ratelimit.slidingWindow(20, '1 m'), // 20 msgs/min/IP\n    });\n    limiter = async (ip) => ({ success: (await rl.limit(ip)).success });\n  } else {\n    // in-memory token bucket fallback (per-instance; fine for POC)\n    const buckets = new Map<string, { tokens: number; ts: number }>();\n    const CAP = 20, REFILL = 20 / 60_000; // 20/min\n    limiter = async (ip) => {\n      const now = Date.now();\n      const b = buckets.get(ip) ?? { tokens: CAP, ts: now };\n      b.tokens = Math.min(CAP, b.tokens + (now - b.ts) * REFILL);\n      b.ts = now;\n      if (b.tokens < 1) { buckets.set(ip, b); return { success: false }; }\n      b.tokens -= 1; buckets.set(ip, b);\n      return { success: true };\n    };\n  }\n  return limiter;\n}\n\nexport async function rateLimit(ip: string) {\n  return (await getLimiter())(ip);\n}\n```\n\nHooks in at the top of the `POST` handler (§3) before any model call. In-memory buckets are per-serverless-instance (not global) — acceptable for a POC; Upstash gives true global limiting in prod.\n\n---\n\n## 8. File tree (`chatbot/`)\n\n```\nchatbot/\n├── package.json                 # own deps, independent project\n├── next.config.ts               # frame-ancestors headers for /embed, CORS for /widget.js\n├── tsconfig.json\n├── .env.example                 # documented env vars (§9)\n├── knowledge.md                 # the ONLY knowledge source, injected into system prompt\n├── README.md                    # gateway/OIDC vs local, deploy-as-separate-project notes\n├── app/\n│   ├── layout.tsx               # minimal root layout\n│   ├── api/\n│   │   └── chat/route.ts        # streamText handler + rate limit + tools  (§3)\n│   └── embed/\n│       ├── page.tsx             # server: reads ?origin, renders <Chat/>     (§5)\n│       └── Chat.tsx             # client: useChat UI, tool-part rendering    (§5)\n├── lib/\n│   ├── model.ts                 # gateway vs direct provider resolver        (§1)\n│   ├── system-prompt.ts         # builds SYSTEM_PROMPT from knowledge.md      (§2)\n│   ├── rate-limit.ts            # in-memory + Upstash limiter                 (§7)\n│   └── tools/\n│       ├── routes.ts            # Covalba route map (single source of truth)  (§4)\n│       ├── suggest-page.ts      # suggest_page tool                           (§4)\n│       ├── capture-lead.ts      # capture_lead factory (Resend + log)         (§4)\n│       └── log-unanswered.ts    # log_unanswered tool                         (§4)\n└── public/\n    └── widget.js                # embeddable floating bubble + iframe          (§6)\n```\n\n---\n\n## 9. Env vars (`chatbot/.env.example`)\n\n```bash\n# --- AI Gateway (prod: provided automatically via OIDC; local: set this OR rely on ANTHROPIC_API_KEY) ---\nAI_GATEWAY_API_KEY=            # Vercel AI Gateway key. On Vercel deploys, OIDC covers this automatically.\n\n# --- Direct Anthropic fallback (pure-local dev without gateway access) ---\nANTHROPIC_API_KEY=            # sk-ant-...  used only when no gateway key/OIDC token present\n\n# --- Resend (lead emails). If absent, capture_lead logs only and degrades gracefully. ---\nRESEND_API_KEY=              # re_...\nRESEND_FROM=                 # \"Covalba Bot <bot@covalba.com>\" (verified sender/domain)\nLEAD_INBOX=                  # destination inbox, e.g. contact@covalba.com\n\n# --- Rate limiting (optional; in-memory fallback if unset) ---\nUPSTASH_REDIS_REST_URL=\nUPSTASH_REDIS_REST_TOKEN=\n\n# --- Site base URL for suggest_page links (no trailing slash) ---\nNEXT_PUBLIC_SITE_BASE_URL=https://www.covalba.com\n```\n\n---\n\n## 10. Dependencies (`chatbot/package.json`)\n\nInstall (versions: pin to current major lines; `ai` v6, React 19 / Next 15+ which is App-Router default):\n\n```bash\nnpm i ai@^6 @ai-sdk/react@^2 @ai-sdk/anthropic@^2 zod@^3 resend@^4 \\\n      @upstash/ratelimit@^2 @upstash/redis@^1 next@^15 react@^19 react-dom@^19\nnpm i -D typescript @types/react @types/node\n```\n\n- `ai` — core (`streamText`, `tool`, `convertToModelMessages`, `stepCountIs`, `Output`, `DefaultChatTransport`).\n- `@ai-sdk/react` — `useChat`.\n- `@ai-sdk/anthropic` — direct provider fallback (`createAnthropic`). Gateway path needs no extra provider package (plain string).\n- `zod` — tool input schemas.\n- `resend` — lead emails (dynamically imported so absence of key/package never blocks).\n- `@upstash/ratelimit` + `@upstash/redis` — optional prod rate limiting.\n- `next` / `react` / `react-dom` — the app.\n\nConfirm exact published versions at install time (`npm view ai version`); the majors above match current v6 / @ai-sdk/react v2 / @ai-sdk/anthropic v2.\n\n---\n\n## 11. Build order (when implementation begins)\n\n1. Scaffold `chatbot/` (package.json, next.config.ts, tsconfig, minimal layout).\n2. `knowledge.md` placeholder + `lib/system-prompt.ts` + `lib/model.ts`.\n3. Tools (`routes.ts`, three tools).\n4. `app/api/chat/route.ts` + `lib/rate-limit.ts`.\n5. `app/embed/` (page + Chat client) — test in-browser at `/embed`.\n6. `public/widget.js` + `next.config.ts` headers — test embedding from a scratch HTML page.\n7. README (OIDC vs local, deploy as separate Vercel project root=`chatbot/`).\n8. Deploy: new Vercel project, root directory `chatbot/`, env vars, OIDC auto. Add WordPress origin to `frame-ancestors`.\n\n---\n\n## Open questions to confirm before coding\n- Real Covalba production base URL (`NEXT_PUBLIC_SITE_BASE_URL`) — assumed `https://www.covalba.com`.\n- Exact WordPress origin(s) to allow in `frame-ancestors` (prod domain + any staging).\n- Verified Resend sender domain.\n- Content of `knowledge.md` (who provides it).\n```"},"caller":{"type":"direct"}}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":2,"cache_creation_input_tokens":816,"cache_read_input_tokens":222037,"cache_creation":{"ephemeral_5m_input_tokens":816,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBFwsydHozsX2JNX8PH","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"7b13606d-05ce-47d3-8466-88823f0ac57b","timestamp":"2026-06-07T16:02:00.173Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"7b13606d-05ce-47d3-8466-88823f0ac57b","isSidechain":true,"promptId":"ec938d38-8bb0-4964-bb2c-502c6ee78f9d","agentId":"aed0c7b7da27aaf24","type":"user","message":{"role":"user","content":[{"tool_use_id":"toolu_01B8mu5fiYVGokB2WRPBU8F9","type":"tool_result","content":"File created successfully at: /home/dev/.claude/plans/brief-chatbot-pure-swing-agent-aed0c7b7da27aaf24.md (file state is current in your context — no need to Read it back)"}]},"uuid":"6fa2d31a-36a1-42a4-a61a-9117faea069b","timestamp":"2026-06-07T16:02:00.275Z","sourceToolAssistantUUID":"7b13606d-05ce-47d3-8466-88823f0ac57b","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}
{"parentUuid":"6fa2d31a-36a1-42a4-a61a-9117faea069b","isSidechain":true,"agentId":"aed0c7b7da27aaf24","message":{"model":"claude-opus-4-8","id":"msg_01EgesMXc8stv2fVt19xSscn","type":"message","role":"assistant","content":[{"type":"text","text":"I've written the full implementation design to the plan file. Here's the concrete design, with the API surface verified against current Vercel AI Gateway / AI SDK v6 docs.\n\n## Key correction flagged up front\n\n**The gateway Haiku id is dotted: `anthropic/claude-haiku-4.5`** — confirmed live from `GET https://ai-gateway.vercel.sh/v1/models`. The brief's `anthropic/claude-haiku-4-5` (hyphenated) is wrong **for the gateway**. The hyphenated `claude-haiku-4-5` is the id for the **direct** `@ai-sdk/anthropic` provider. The design keeps both, one per code path.\n\n## 1. Model wiring\nPlain string `'anthropic/claude-haiku-4.5'` routes through the gateway. On Vercel deploys, OIDC is automatic (`VERCEL_OIDC_TOKEN` injected) — no `AI_GATEWAY_API_KEY` needed in prod. Local: `AI_GATEWAY_API_KEY` (correct env var per brief), or fall back to `createAnthropic({ apiKey: ANTHROPIC_API_KEY })('claude-haiku-4-5')`. A `getModel()` helper picks based on whether a gateway key/OIDC token is present.\n\n## 2. Prompt caching\nVerified two mechanisms, both confirmed working through the gateway: per-message-part `providerOptions.anthropic.cacheControl = { type: 'ephemeral' }` on the system message, **plus** the gateway shortcut `providerOptions.gateway.caching = 'auto'`. Both included for belt-and-suspenders.\n\n## 3–4. Route handler + tools\n`streamText` + `convertToModelMessages` + `stopWhen: stepCountIs(5)` + `toUIMessageStreamResponse()` (all v6-correct; verified `stopWhen` IS supported on `streamText`, not just `ToolLoopAgent`). Three `tool()` definitions with zod: `suggest_page` (enum over a fixed `ROUTES` map of all your real Covalba routes), `capture_lead` (factory injecting `originUrl`; console.log + dynamically-imported Resend with graceful degradation when `RESEND_API_KEY` is absent), `log_unanswered` (structured console.log visible in Vercel logs).\n\n## 5. Frontend\nv6 `useChat` from `@ai-sdk/react` + `DefaultChatTransport` → `/api/chat`. Input managed via `useState` (v6 removed managed input). Renders text parts and `tool-suggest_page` parts (state `output-available`) as `target=\"_top\"` links that break the iframe.\n\n## 6. Embeddable widget (key deliverable)\n~100-line vanilla `public/widget.js`: floating \"Une question ?\" bubble → iframe to `/embed?origin=<encoded host URL>` for CSS isolation. `APP` derived from `document.currentScript.src` (never hard-coded). `postMessage` close pattern with origin check. Framing handled via **CSP `frame-ancestors`** in `next.config.ts` (not `X-Frame-Options`, which can't list multiple hosts). `/api/chat` is same-origin from inside the iframe → no CORS needed.\n\n## 7. Rate limiting\nPluggable: in-memory token bucket by IP for POC, Upstash `@upstash/ratelimit` when env vars present. Noted Vercel KV is deprecated. Hooks in at the top of `POST`.\n\n## 8–10\nFull file tree, `.env.example` (all vars with comments), and exact deps (`ai@^6`, `@ai-sdk/react@^2`, `@ai-sdk/anthropic@^2`, `zod`, `resend`, `@upstash/ratelimit`+`@upstash/redis`, `next`/`react`).\n\nPlan file: `/home/dev/.claude/plans/brief-chatbot-pure-swing-agent-aed0c7b7da27aaf24.md`\n\nA few open items worth confirming before coding (in the plan): the real production base URL for `suggest_page` links, the exact WordPress origin(s) for `frame-ancestors`, the verified Resend sender domain, and who supplies `knowledge.md`. One implementation detail to verify against the installed `ai` version: the exact transport hook name for injecting `originUrl` into the request body (`prepareSendMessagesRequest` vs passing `body` on `sendMessage`)."}],"stop_reason":null,"stop_sequence":null,"stop_details":null,"usage":{"input_tokens":131,"cache_creation_input_tokens":10378,"cache_read_input_tokens":222853,"cache_creation":{"ephemeral_5m_input_tokens":10378,"ephemeral_1h_input_tokens":0},"output_tokens":2,"service_tier":"standard","inference_geo":"not_available"},"diagnostics":null},"requestId":"req_011CbpBQnPUkHotDG6vbSmgr","attributionAgent":"vercel:ai-architect","attributionPlugin":"vercel","attributionMcpServer":"claude.ai Vercel","attributionMcpTool":"search_vercel_documentation","type":"assistant","uuid":"3bfcea66-acce-4a8f-8053-9f7d57a7997b","timestamp":"2026-06-07T16:02:29.200Z","userType":"external","entrypoint":"cli","cwd":"/opt/projects/.covalba-refonte-worktrees/tom-test-chatbot","sessionId":"df169d01-2f45-42ae-bcea-94edf9cd2c59","version":"2.1.168","gitBranch":"tom/test-chatbot","slug":"brief-chatbot-pure-swing"}