
Azure Bing Grounding: Web Search in AI Responses
Learn how to add real-time web search capabilities to your Azure AI agents with Bing Grounding, complete with citations and production-ready code.
Azure Bing Grounding: Web Search in AI Responses
Your AI agent can answer questions about your data, but what happens when users ask about current events, recent product releases, or breaking news? The model's knowledge cutoff becomes a hard wall. You could build a custom web scraping system, but that's complex, unreliable, and potentially violates terms of service.
Enter Azure Bing Grounding. It's Microsoft's answer to giving AI agents access to real-time web information, complete with automatic citations and source verification. No web scraping, no custom search indexing, no complex infrastructure. Just add a tool to your agent and let it search when needed.
In this guide, we'll explore how to provision Bing Grounding with Terraform, integrate it with Azure AI Foundry projects, and build production-ready agents that cite their sources.
What is Bing Grounding and Why It Matters
Bing Grounding is an Azure service that allows AI agents to search the web using Bing and incorporate results directly into their responses. Unlike traditional RAG (Retrieval Augmented Generation) systems that search your private data, Bing Grounding searches the public web in real-time.
Key Benefits
Real-time Information: Agents can access current information beyond their training data cutoff. Market prices, weather, news, documentation updates—anything publicly available on the web.
Automatic Citations: Every piece of information from Bing includes a citation with title and URL. This provides transparency and allows users to verify sources, which is critical for trust and compliance.
Agent-Controlled Search: The model decides when to search based on the user's question. You don't need to detect search intent manually—the agent handles it intelligently.
Cost Optimization: A single Bing Grounding resource can be shared across multiple environments (development, staging, production), reducing costs while maintaining isolation at the project level.
How It Works
When you enable Bing Grounding on an Azure AI agent:
- User asks a question
- Agent determines if web search would be helpful
- Agent calls Bing Grounding tool automatically
- Bing returns search results with URLs
- Agent incorporates results into response
- Citations are extracted from annotations
The agent decides when to search based on the context. Ask about last week's tech news, and it searches. Ask about your application's business logic, and it uses its base knowledge.
Provisioning Bing Grounding with Terraform
Bing Grounding requires the Microsoft.Bing/accounts resource type with kind Bing.Grounding. This is different from the older Bing Search API v7 that uses Cognitive Services.
Setting Up the Resource Group
First, create a dedicated resource group for Bing. This allows you to share the Bing resource across environments while maintaining isolation.
# Resource Group for Bing grounding (on Bing subscription)
# Shared across environments for cost optimization
# Dev creates it, Prod imports it via data source
resource "azurerm_resource_group" "bing_grounding" {
count = var.env == "dev" ? 1 : 0
provider = azurerm.bing
name = "shared-bing-rg"
location = "westeurope" # Must be a real region
tags = {
environment = "shared"
project = "ai-platform"
managedBy = "terraform"
purpose = "shared-grounding"
}
}
# Data source to reference existing Bing RG in prod
data "azurerm_resource_group" "bing_grounding_existing" {
count = var.env == "prod" ? 1 : 0
provider = azurerm.bing
name = "shared-bing-rg"
}
# Local to get the resource group ID regardless of environment
locals {
bing_rg_id = var.env == "dev" ?
azurerm_resource_group.bing_grounding[0].id :
data.azurerm_resource_group.bing_grounding_existing[0].id
}
This pattern creates the resource group in development and references it in production, avoiding duplicate costs.
Creating the Bing Grounding Resource
Use the azapi provider to create the Bing Grounding resource. The standard azurerm provider doesn't support this resource type yet.
# Bing Grounding resource
# Uses Microsoft.Bing/accounts API (not CognitiveServices Bing.Search.v7)
# Only created in dev, shared with prod via data source
resource "azapi_resource" "bing_grounding" {
count = var.env == "dev" ? 1 : 0
type = "Microsoft.Bing/accounts@2020-06-10"
name = "shared-bing-grounding"
parent_id = local.bing_rg_id
location = "global"
schema_validation_enabled = false
body = {
kind = "Bing.Grounding"
sku = {
name = "G1"
}
properties = {
statisticsEnabled = false
}
}
tags = {
environment = var.env
project = "ai-platform"
managedBy = "terraform"
purpose = "shared-grounding"
}
response_export_values = ["properties.endpoint"]
}
# Data source to reference existing Bing Grounding in prod
data "azapi_resource" "bing_grounding_existing" {
count = var.env == "prod" ? 1 : 0
type = "Microsoft.Bing/accounts@2020-06-10"
name = "shared-bing-grounding"
parent_id = local.bing_rg_id
response_export_values = ["properties.endpoint"]
}
# Locals to access Bing resources regardless of environment
locals {
bing_grounding_id = var.env == "dev" ?
azapi_resource.bing_grounding[0].id :
data.azapi_resource.bing_grounding_existing[0].id
bing_grounding_endpoint = var.env == "dev" ?
azapi_resource.bing_grounding[0].output.properties.endpoint :
data.azapi_resource.bing_grounding_existing[0].output.properties.endpoint
}
Important Notes:
- Location must be
"global"for Bing Grounding - SKU
G1is the standard tier for grounding - Only dev environment creates the resource; prod references it
Retrieving API Keys
Use the listKeys action to retrieve the API keys securely:
# Get Bing API keys using listKeys action
data "azapi_resource_action" "bing_keys" {
type = "Microsoft.Bing/accounts@2020-06-10"
resource_id = local.bing_grounding_id
action = "listKeys"
response_export_values = ["*"]
}
output "bing_grounding_key" {
description = "Bing grounding API key (sensitive)"
value = data.azapi_resource_action.bing_keys.output.key1
sensitive = true
}
Store the key in Azure Key Vault or as a secret in your deployment pipeline. Never commit it to source control.
Connecting Bing to Azure AI Foundry Projects
Once you have the Bing resource provisioned, connect it to your AI Foundry project. This makes Bing Grounding available as a tool for agents in that project.
Creating the Connection
Use the Microsoft.CognitiveServices/accounts/connections API to create the connection:
# Connection: Bing Grounding to AI Foundry Project
# This enables Bing grounding capabilities for AI agents in the project
# Compatible with GPT-4o, GPT-3.5-turbo, GPT-4, GPT-4-turbo
resource "azapi_resource" "bing_connection" {
type = "Microsoft.CognitiveServices/accounts/connections@2025-06-01"
name = "bing-grounding-connection"
parent_id = azurerm_cognitive_account.ai_foundry.id
schema_validation_enabled = false
body = {
properties = {
category = "ApiKey"
target = local.bing_grounding_endpoint
authType = "ApiKey"
isSharedToAll = true
metadata = {
Location = "westeurope"
ApiType = "Azure"
ResourceId = local.bing_grounding_id
}
credentials = {
key = data.azapi_resource_action.bing_keys.output.key1
}
}
}
depends_on = [
data.azapi_resource_action.bing_keys,
azapi_resource.ai_foundry_project
]
}
Key Properties:
category: Must be"ApiKey"for Bing GroundingisSharedToAll: Set totrueto make available to all agents in the projectcredentials.key: The API key retrieved from the Bing resource
Model Compatibility
Bing Grounding works with most Azure OpenAI models:
- ✅ gpt-4o
- ✅ gpt-4-turbo
- ✅ gpt-4
- ✅ gpt-3.5-turbo
- ❌ gpt-4o-mini-2024-07-18 (not compatible)
If you need to use gpt-4o-mini, verify compatibility with the latest Azure AI documentation.
Building an Agent Service with Bing Grounding
With infrastructure provisioned, let's build a TypeScript service that creates agents with Bing Grounding enabled.
Service Configuration
First, configure environment variables for authentication and project access:
// Environment variables needed:
// - PROJECT_ENDPOINT: Azure AI Foundry project endpoint
// - MODEL_DEPLOYMENT_NAME: Name of your GPT-4o deployment
// - BING_CONNECTION_ID: Connection ID from Terraform output
// - AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET (local dev)
// - Or use Managed Identity in production
import { AgentsClient, ToolUtility } from '@azure/ai-agents';
import { DefaultAzureCredential } from '@azure/identity';
export class AzureAgentService {
private client: AgentsClient;
private modelDeploymentName: string;
private bingConnectionId?: string;
constructor() {
const projectEndpoint = process.env.PROJECT_ENDPOINT;
this.modelDeploymentName = process.env.MODEL_DEPLOYMENT_NAME;
this.bingConnectionId = process.env.BING_CONNECTION_ID;
if (!projectEndpoint || !this.modelDeploymentName) {
throw new Error('PROJECT_ENDPOINT and MODEL_DEPLOYMENT_NAME environment variables are required');
}
// DefaultAzureCredential supports:
// 1. EnvironmentCredential (local dev)
// 2. ManagedIdentityCredential (production)
// 3. AzureCliCredential (fallback)
const credential = new DefaultAzureCredential();
this.client = new AgentsClient(projectEndpoint, credential);
}
}
Creating Agents with Bing Grounding
Enable Bing Grounding by adding it as a tool during agent creation:
interface AgentConfig {
name: string;
instructions?: string;
enableBingGrounding?: boolean; // Default: true
}
async createAgent(config: AgentConfig) {
const tools = [];
// Always add Bing grounding tool unless explicitly disabled
const shouldEnableBing = config.enableBingGrounding !== false;
if (shouldEnableBing && this.bingConnectionId) {
const bingTool = ToolUtility.createBingGroundingTool([
{
connectionId: this.bingConnectionId,
count: 20, // Number of search results (max: 50, default: 5)
},
]);
tools.push(bingTool.definition);
}
const agent = await this.client.createAgent(this.modelDeploymentName, {
name: config.name,
instructions: config.instructions,
tools: tools.length > 0 ? tools : undefined,
temperature: 0.2, // Low temperature for consistent, factual responses
});
return agent;
}
Important Parameters:
count: Number of search results to retrieve (1-50). Higher values provide more context but use more tokens. 20 is a good balance for quality.temperature: Lower values (0.1-0.3) work better for factual responses with citations.
Configuring Search Result Count
The count parameter controls how many Bing search results the agent can access. This directly impacts response quality and cost:
Search result count options:
- 5 results (default) - Simple queries, cost-sensitive (~500 tokens)
- 10 results - Standard queries, balanced (~1,000 tokens)
- 20 results - Complex queries, high quality (~2,000 tokens)
- 50 results (maximum) - Research tasks, comprehensive (~5,000 tokens)
Higher counts improve accuracy for complex queries but increase prompt tokens significantly. Start with 10-20 and adjust based on your use case.
Sending Messages and Extracting Citations
Once you have an agent with Bing Grounding, you can send messages and extract citations from responses.
Sending Messages
Create a thread, send a message, and poll for completion:
async sendMessage(
agentId: string,
userMessage: string
): Promise<AgentResponse> {
// Create a thread for this conversation
const thread = await this.client.threads.create();
try {
// Create user message
await this.client.messages.create(thread.id, 'user', userMessage);
// Create and poll run
const runStatus = await this.client.runs.createAndPoll(thread.id, agentId, {
pollingOptions: {
intervalInMs: 15000, // 15 seconds to avoid rate limits
},
});
// Check if run failed
if (runStatus.status === 'failed') {
const errorMessage = runStatus.lastError?.message || 'Unknown error';
throw new Error(`Agent run failed: ${errorMessage}`);
}
// Get the agent's response with citations
const response = await this.getLatestAgentMessage(thread.id);
return {
content: response.content,
citations: response.citations,
metadata: {
agentId,
threadId: thread.id,
runId: runStatus.id,
tokensUsed: runStatus.usage ? {
prompt: runStatus.usage.promptTokens,
completion: runStatus.usage.completionTokens,
total: runStatus.usage.totalTokens,
} : undefined,
},
};
} finally {
// Clean up thread
await this.client.threads.delete(thread.id);
}
}
Polling Considerations:
- Use a longer polling interval (15+ seconds) to avoid rate limit exhaustion
- Azure AI Foundry has rate limits on API calls (e.g., 400 capacity = 40 req/sec)
- Adjust based on your deployment's capacity and concurrent operations
Extracting Citations from Responses
Citations are embedded in message annotations. Extract them using type guards:
import type {
MessageTextContent,
ThreadMessage,
MessageTextUrlCitationAnnotation,
} from '@azure/ai-agents';
private extractMessageContent(message: ThreadMessage): {
content: string;
citations: Array<{ title: string; url: string }>;
} {
const contentParts: string[] = [];
const citations: Array<{ title: string; url: string }> = [];
for (const content of message.content) {
if (isOutputOfType<MessageTextContent>(content, 'text')) {
contentParts.push(content.text.value);
// Extract citations from annotations
if (content.text.annotations) {
for (const annotation of content.text.annotations) {
if (
isOutputOfType<MessageTextUrlCitationAnnotation>(
annotation,
'url_citation'
)
) {
citations.push({
title: annotation.urlCitation.title || annotation.urlCitation.url,
url: annotation.urlCitation.url,
});
}
}
}
}
}
return {
content: contentParts.join('\n'),
citations,
};
}
Citation Structure:
- Each citation includes a title and URL
- Title defaults to URL if not provided
- Citations maintain order of appearance in the response
- Duplicate URLs may appear if referenced multiple times
Example Usage
Here's how to use the service for a one-off query:
const agentService = new AzureAgentService();
// Create an agent with Bing Grounding (enabled by default)
const agent = await agentService.createAgent({
name: 'research-assistant',
instructions:
'You are a helpful research assistant. Use Bing grounding to provide accurate, up-to-date information with citations.',
});
try {
// Send a query
const response = await agentService.sendMessage(
agent.id,
'What are the latest developments in quantum computing from the past month?',
);
console.log('Response:', response.content);
console.log('Citations:');
response.citations?.forEach((citation, index) => {
console.log(`${index + 1}. ${citation.title}`);
console.log(` ${citation.url}`);
});
console.log('Tokens used:', response.metadata.tokensUsed?.total);
} finally {
// Clean up
await agentService.deleteAgent(agent.id);
}
Example Output:
Response: Recent developments in quantum computing include IBM's announcement
of a 1,121-qubit quantum processor, Google's breakthrough in quantum error
correction, and Microsoft's release of Azure Quantum Elements for chemistry
simulations.
Citations:
1. IBM Unveils 1,121-Qubit Quantum Processor
https://www.ibm.com/quantum/news/1121-qubit-processor
2. Google achieves quantum error correction milestone
https://blog.google/technology/quantum-ai/quantum-error-correction/
3. Microsoft Azure Quantum Elements for chemistry
https://azure.microsoft.com/en-us/products/quantum/elements/
Tokens used: 3245
Cost Considerations and Pricing Tiers
Bing Grounding pricing consists of two components: the Bing resource itself and the tokens consumed during agent runs.
Bing Resource Pricing
The G1 SKU for Bing Grounding is billed per 1,000 transactions (queries):
- G1 Tier: Approximately $7 per 1,000 queries
- No minimum commitment: Pay only for what you use
- Shared resource: One resource can serve multiple projects and environments
Token Consumption
Search results add to prompt tokens. Typical token usage:
Typical token usage by search result count:
- 5 results - ~500 tokens added
- 10 results - ~1,000 tokens added
- 20 results - ~2,000 tokens added
- 50 results - ~5,000 tokens added
With GPT-4o pricing at ~$5 per 1M prompt tokens, 20 search results add roughly $0.01 per query in token costs.
Total Cost Example
For 10,000 queries per month with 20 search results each:
- Bing Grounding: 10,000 queries × ($7 / 1,000) = $70
- Token overhead: 10,000 queries × 2,000 tokens × ($5 / 1M) = $100
- Total: ~$170/month for grounded search
Compare this to building a custom web search solution with scraping infrastructure, storage, and maintenance. Bing Grounding is significantly cheaper and more reliable.
Sharing Bing Resource Across Environments
A single Bing Grounding resource can be shared across multiple environments (dev, staging, prod) while maintaining project-level isolation. This reduces costs without sacrificing security.
How Resource Sharing Works
The shared Bing Grounding resource sits at the top of your infrastructure hierarchy. You create it once in the development environment, and then each environment's AI Foundry project - whether dev, staging, production, or disaster recovery - connects to that same resource. Each project maintains its own agents, threads, and conversation history, so there's no data leakage between environments. You get the cost savings of a single Bing resource while preserving complete isolation at the application layer.
Terraform Pattern for Sharing
Use conditional resource creation and data sources:
# Dev creates the resource
resource "azapi_resource" "bing_grounding" {
count = var.env == "dev" ? 1 : 0
# ... resource configuration
}
# Other environments reference it
data "azapi_resource" "bing_grounding_existing" {
count = var.env != "dev" ? 1 : 0
type = "Microsoft.Bing/accounts@2020-06-10"
name = "shared-bing-grounding"
# ... rest of configuration
}
# Each environment creates its own connection
resource "azapi_resource" "bing_connection" {
# All environments create their own connection
# but reference the same Bing resource
parent_id = azurerm_cognitive_account.ai_foundry.id
# ... connection configuration
}
Security Considerations
Project Isolation: Each AI Foundry project has its own threads, agents, and conversation history. Sharing the Bing resource does not compromise data isolation.
API Key Rotation: When rotating Bing API keys, update connections in all environments. Use a secret management system like Azure Key Vault for centralized key distribution.
Rate Limiting: All environments share the same rate limits on the Bing resource. Monitor usage to avoid throttling in production due to dev/staging activity.
Cost Savings
Sharing one Bing resource across 4 environments (dev, staging, prod, DR):
- Without Sharing: 4 × $7/1,000 queries = $28 per 1,000 queries (if each used separately)
- With Sharing: 1 × $7/1,000 queries = $7 per 1,000 queries
- Savings: 75% reduction in Bing resource costs
This is one of the most effective cost optimizations for multi-environment deployments.
Comparing Bing Grounding with Other Approaches
Bing Grounding is one of several ways to provide AI agents with additional context. Understanding the tradeoffs helps you choose the right tool.
Bing Grounding vs. RAG (Retrieval Augmented Generation)
RAG: Private Data Search
- Searches your own documents, databases, knowledge bases
- Full control over content and indexing
- No internet access required
- Ideal for internal company knowledge, proprietary data
Bing Grounding: Public Web Search
- Searches the public web via Bing
- Access to real-time information and current events
- Automatic citations from web sources
- Ideal for current events, market data, public documentation
When to Use Both: Use RAG for internal knowledge (product specs, company policies) and Bing Grounding for external context (market trends, competitor analysis, current events).
// Example: Combining RAG and Bing Grounding
const agent = await agentService.createAgent({
name: 'hybrid-assistant',
instructions: `You are a business intelligence assistant.
Use Azure AI Search (RAG) for:
- Internal sales data
- Product specifications
- Company policies
Use Bing Grounding for:
- Competitor news and announcements
- Market trends and analysis
- Industry regulations and updates`,
// Enable both tools
enableBingGrounding: true,
// RAG would be added here via Azure AI Search connection
});
Bing Grounding vs. Custom Web Tools
Custom Web Tools (Function Calling)
- You write the search/scraping logic
- Full control over sources and parsing
- Complex to maintain and scale
- Legal and ethical considerations for scraping
Bing Grounding
- Microsoft handles search infrastructure
- Built-in citation extraction
- No scraping or legal concerns
- Limited customization (no custom search filters)
When to Use Custom Tools: When you need to search specific sites (e.g., internal documentation hosted externally), parse non-standard formats, or have specific data extraction requirements that Bing doesn't support.
Bing Grounding vs. Web Browser Tools
Some AI frameworks provide browser automation tools (Selenium, Playwright) for agents. These let agents navigate websites interactively.
Web Browser Tools
- Full browser automation (click, scroll, fill forms)
- Can access content behind authentication
- Very slow (seconds to minutes per page)
- High token consumption (full page HTML)
- Complex error handling
Bing Grounding
- Fast search results (sub-second)
- Optimized for token efficiency
- No authentication support
- Limited to search results (can't interact with pages)
When to Use Browser Tools: For tasks requiring interaction (filling forms, clicking through multi-step processes) or accessing authenticated content. For simple information retrieval, Bing Grounding is faster and cheaper.
Comparison Summary
Bing Grounding:
- Data source: Public web
- Latency: Low (~1s)
- Token cost: Medium
- Citations: Automatic
- Real-time: Yes
- Setup complexity: Low
- Best for: Current events, public info
RAG (Retrieval Augmented Generation):
- Data source: Your data
- Latency: Very low (under 500ms)
- Token cost: Low
- Citations: Manual
- Real-time: No
- Setup complexity: Medium
- Best for: Internal knowledge
Custom Web Tools:
- Data source: Custom
- Latency: Medium (2-5s)
- Token cost: Medium
- Citations: Manual
- Real-time: Yes
- Setup complexity: High
- Best for: Specific sources
Browser Tools:
- Data source: Any website
- Latency: High (10-60s)
- Token cost: Very high
- Citations: Manual
- Real-time: Yes
- Setup complexity: Very high
- Best for: Interactive tasks
Key Takeaways
- Bing Grounding adds real-time web search to Azure AI agents with automatic citations
- Provision with Terraform using
Microsoft.Bing/accountsand kindBing.Grounding - Connect to AI Foundry projects via
Microsoft.CognitiveServices/accounts/connections - Configure search result count (1-50) based on quality needs and token budget
- Extract citations from message annotations for transparency and verification
- Share one Bing resource across environments to reduce costs by 75%
- Use Bing Grounding for public web data, RAG for private data, and combine both for comprehensive coverage
- Monitor token consumption: higher search counts improve quality but increase costs
Ready to add web search to your AI agents? Start with the official Azure AI Agents documentation and explore the full range of agent tools.
For more Azure AI content, check out our guide on building production-ready AI applications with Azure AI Foundry.