Data Scraping and Extraction

BrowserQL offers three main approaches to extracting data: grab the page HTML with the html mutation, map DOM elements to structured JSON with mapSelector or querySelectorAll, or intercept raw API responses with the response mutation. Choose the approach that fits your downstream processing needs.

Basic Usage

The html mutation returns the full page HTML. Wait for the page to load before extracting to avoid empty results.

mutation ExtractHTML {
  goto(url: "https://www.browserless.io/", waitUntil: domContentLoaded) {
    status
  }

  html {
    html
  }
}

curl --request POST \
  --url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{"query":"mutation ExtractHTML {\n  goto(url: \"https://www.browserless.io/\", waitUntil: domContentLoaded) {\n    status\n  }\n\n  html {\n    html\n  }\n}","variables":{},"operationName":"ExtractHTML"}'

const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: `mutation ExtractHTML {
  goto(url: "https://www.browserless.io/", waitUntil: domContentLoaded) {
    status
  }

  html {
    html
  }
}`,
    variables: {},
    operationName: "ExtractHTML",
  })
};

const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();

console.log(data);

import requests

endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
    "token": "YOUR_API_TOKEN_HERE",
}
headers = {
    "Content-Type": "application/json",
}
payload = {
    "query": """mutation ExtractHTML {
  goto(url: "https://www.browserless.io/", waitUntil: domContentLoaded) {
    status
  }

  html {
    html
  }
}""",
    "variables": {},
    "operationName": "ExtractHTML",
}

response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())

String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);

HttpResponse<String> response = Unirest.post(endpoint)
    .header("Content-Type", "application/json")
    .body("{\"query\":\"mutation ExtractHTML {\\n  goto(url: \\\"https://www.browserless.io/\\\", waitUntil: domContentLoaded) {\\n    status\\n  }\\n\\n  html {\\n    html\\n  }\\n}\",\"variables\":{},\"operationName\":\"ExtractHTML\"}")
    .asString();

string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";

var payload = new
{
    query = @"mutation ExtractHTML {
  goto(url: ""https://www.browserless.io/"", waitUntil: domContentLoaded) {
    status
  }

  html {
    html
  }
}",
    variables = new { },
    operationName = "ExtractHTML"
};

using (HttpClient httpClient = new HttpClient())
{
    var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
    var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");

    var response = await httpClient.PostAsync(endpoint, content);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody);
}
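
Because a GraphQL response mirrors the shape of the query, the HTML from the ExtractHTML request should arrive under data.html.html. The following is a minimal sketch of reading it in Python, not an official client. extract_html is a hypothetical helper name, the sample payload is made up for illustration, and the errors check assumes the general GraphQL convention of a top-level errors list rather than anything specific to BrowserQL.

```python
def extract_html(result: dict) -> str:
    """Return the page HTML from a decoded ExtractHTML response."""
    # General GraphQL convention (an assumption here): failures are
    # reported in a top-level "errors" list next to, or instead of, "data".
    errors = result.get("errors")
    if errors:
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise RuntimeError(f"BQL request failed: {messages}")

    # The response mirrors the query: html { html } -> data.html.html
    html_field = (result.get("data") or {}).get("html") or {}
    return html_field.get("html") or ""


# Made-up sample payload in the shape the query implies.
sample = {
    "data": {
        "goto": {"status": 200},
        "html": {"html": "<html><body>Hello</body></html>"},
    }
}
print(extract_html(sample))
```

With the Python request shown earlier, the call would be extract_html(response.json()).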

Targeting a Specific Element

Pass a selector to return HTML from a single element instead of the full page:

html(selector: ".navbar_container") {
  html
}

Cleaning the HTML

The clean argument strips non-text nodes (scripts, video, canvas), DOM attributes, and excess whitespace. It can reduce payload size by up to 1,000x:

html(clean: {
  removeAttributes: true
  removeNonTextNodes: true
}) {
  html
}

Creating JSON with `mapSelector`

mapSelector is designed for pages with repetitive, hierarchical structure: product listings, comment threads, search results, or any repeating pattern. It iterates over a NodeList, similar to document.querySelectorAll, and returns a structured array of objects. Use it to extract attributes, text content, or nested elements.

The query below navigates to Hacker News and extracts the href of every post link:

mutation ScrapeHackerNews {
  goto(
    url: "https://news.ycombinator.com"
    waitUntil: firstContentfulPaint
  ) {
    status
  }

  posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
    link: attribute(name: "href") {
      value
    }
  }
}

curl --request POST \
  --url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{"query":"mutation ScrapeHackerNews {\n  goto(\n    url: \"https://news.ycombinator.com\"\n    waitUntil: firstContentfulPaint\n  ) {\n    status\n  }\n\n  posts: mapSelector(selector: \".submission .titleline > a\", wait: true) {\n    link: attribute(name: \"href\") {\n      value\n    }\n  }\n}","variables":{},"operationName":"ScrapeHackerNews"}'

const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: `mutation ScrapeHackerNews {
  goto(
    url: "https://news.ycombinator.com"
    waitUntil: firstContentfulPaint
  ) {
    status
  }

  posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
    link: attribute(name: "href") {
      value
    }
  }
}`,
    variables: {},
    operationName: "ScrapeHackerNews",
  })
};

const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();

console.log(data);

import requests

endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
    "token": "YOUR_API_TOKEN_HERE",
}
headers = {
    "Content-Type": "application/json",
}
payload = {
    "query": """mutation ScrapeHackerNews {
  goto(
    url: "https://news.ycombinator.com"
    waitUntil: firstContentfulPaint
  ) {
    status
  }

  posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
    link: attribute(name: "href") {
      value
    }
  }
}""",
    "variables": {},
    "operationName": "ScrapeHackerNews",
}

response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())

String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);

HttpResponse<String> response = Unirest.post(endpoint)
    .header("Content-Type", "application/json")
    .body("{\"query\":\"mutation ScrapeHackerNews {\\n  goto(\\n    url: \\\"https://news.ycombinator.com\\\"\\n    waitUntil: firstContentfulPaint\\n  ) {\\n    status\\n  }\\n\\n  posts: mapSelector(selector: \\\".submission .titleline > a\\\", wait: true) {\\n    link: attribute(name: \\\"href\\\") {\\n      value\\n    }\\n  }\\n}\",\"variables\":{},\"operationName\":\"ScrapeHackerNews\"}")
    .asString();

string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";

var payload = new
{
    query = @"mutation ScrapeHackerNews {
  goto(
    url: ""https://news.ycombinator.com""
    waitUntil: firstContentfulPaint
  ) {
    status
  }

  posts: mapSelector(selector: "".submission .titleline > a"", wait: true) {
    link: attribute(name: ""href"") {
      value
    }
  }
}",
    variables = new { },
    operationName = "ScrapeHackerNews"
};

using (HttpClient httpClient = new HttpClient())
{
    var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
    var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");

    var response = await httpClient.PostAsync(endpoint, content);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody);
}

Response

{
  "data": {
    "posts": [
      { "link": { "value": "https://churchofturing.github.io/landscapeoflisp.html" } },
      { "link": { "value": "https://www.jjj.de/fxt/fxtbook.pdf" } },
      { "link": { "value": "https://ereader-swedish.fly.dev/" } }
    ]
  }
}
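
The nested posts[].link.value shape is convenient for GraphQL but often more than downstream code needs. The sketch below flattens the response above into a plain list of URLs. post_links is a hypothetical helper name, and resolving against the site root with urljoin is an added precaution on the assumption that some href attributes may be relative.

```python
from urllib.parse import urljoin

BASE_URL = "https://news.ycombinator.com/"


def post_links(result: dict, base_url: str = BASE_URL) -> list:
    """Flatten a ScrapeHackerNews response into a list of absolute URLs."""
    # mapSelector may return null instead of a list, so fall back to [].
    posts = (result.get("data") or {}).get("posts") or []
    links = []
    for post in posts:
        href = (post.get("link") or {}).get("value")
        if href:
            # Absolute URLs pass through unchanged; relative ones are resolved.
            links.append(urljoin(base_url, href))
    return links


# The response shown above, as a Python dict.
sample = {
    "data": {
        "posts": [
            {"link": {"value": "https://churchofturing.github.io/landscapeoflisp.html"}},
            {"link": {"value": "https://www.jjj.de/fxt/fxtbook.pdf"}},
            {"link": {"value": "https://ereader-swedish.fly.dev/"}},
        ]
    }
}
for url in post_links(sample):
    print(url)
```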

Nest mapSelector calls to traverse hierarchical DOM structures. The example below extracts post authors and scores from each submission:

mutation ScrapeHackerNewsMetadata {
  goto(url: "https://news.ycombinator.com") {
    status
  }

  posts: mapSelector(selector: ".subtext .subline") {
    author: mapSelector(selector: ".hnuser") {
      authorName: innerText
    }

    score: mapSelector(selector: ".score") {
      score: innerText
    }
  }
}

When elements match, mapSelector returns an array, even if only one element matches. When no elements are found, it returns null.
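
Since a nested mapSelector can yield either an array or null, client code has to handle both before indexing into the result. The following sketch normalizes the ScrapeHackerNewsMetadata response into flat records. flatten_posts is a hypothetical helper name and the sample values are invented for illustration; only the field names come from the query above.

```python
def flatten_post(post: dict) -> dict:
    """Collapse the nested author/score arrays of one post into scalars."""
    # Each nested mapSelector is an array on a match and null otherwise.
    authors = post.get("author") or []
    scores = post.get("score") or []
    return {
        "author": authors[0].get("authorName") if authors else None,
        "score": scores[0].get("score") if scores else None,
    }


def flatten_posts(result: dict) -> list:
    """Return one flat record per post; an absent or null list gives []."""
    posts = (result.get("data") or {}).get("posts") or []
    return [flatten_post(post) for post in posts]


# Invented sample data in the shape the query implies.
sample = {
    "data": {
        "posts": [
            {"author": [{"authorName": "alice"}], "score": [{"score": "120 points"}]},
            {"author": None, "score": None},
        ]
    }
}
print(flatten_posts(sample))
```

Taking the first element of each nested array is a design choice that fits this page, where each subline is expected to hold at most one author and one score.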

Scraping Network Responses

The response mutation records HTTP responses received by the browser, filtered by URL pattern, method, or resource type. BQL waits for the response automatically.

The example below captures the raw document response from a page load:

mutation DocumentResponses {
  goto(url: "https://example.com/", waitUntil: load) {
    status
  }

  response(type: document) {
    url
    body
    headers {
      name
      value
    }
  }
}

curl --request POST \
  --url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{"query":"mutation DocumentResponses {\n  goto(url: \"https://example.com/\", waitUntil: load) {\n    status\n  }\n  response(type: document) {\n    url\n    body\n    headers {\n      name\n      value\n    }\n  }\n}","variables":{},"operationName":"DocumentResponses"}'

const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: `mutation DocumentResponses {
  goto(url: "https://example.com/", waitUntil: load) {
    status
  }

  response(type: document) {
    url
    body
    headers {
      name
      value
    }
  }
}`,
    variables: {},
    operationName: "DocumentResponses",
  })
};

const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();

console.log(data);

import requests

endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
    "token": "YOUR_API_TOKEN_HERE",
}
headers = {
    "Content-Type": "application/json",
}
payload = {
    "query": """mutation DocumentResponses {
  goto(url: "https://example.com/", waitUntil: load) {
    status
  }

  response(type: document) {
    url
    body
    headers {
      name
      value
    }
  }
}""",
    "variables": {},
    "operationName": "DocumentResponses",
}

response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())

String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);

HttpResponse<String> response = Unirest.post(endpoint)
    .header("Content-Type", "application/json")
    .body("{\"query\":\"mutation DocumentResponses {\\n  goto(url: \\\"https://example.com/\\\", waitUntil: load) {\\n    status\\n  }\\n  response(type: document) {\\n    url\\n    body\\n    headers {\\n      name\\n      value\\n    }\\n  }\\n}\",\"variables\":{},\"operationName\":\"DocumentResponses\"}")
    .asString();

string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";

var payload = new
{
    query = @"mutation DocumentResponses {
  goto(url: ""https://example.com/"", waitUntil: load) {
    status
  }

  response(type: document) {
    url
    body
    headers {
      name
      value
    }
  }
}",
    variables = new { },
    operationName = "DocumentResponses"
};

using (HttpClient httpClient = new HttpClient())
{
    var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
    var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");

    var response = await httpClient.PostAsync(endpoint, content);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody);
}

Combine filters to narrow the captured responses. Setting operator to and keeps only responses that match every filter. The example below captures only XHR GET responses:

mutation AJAXGetCalls {
  goto(url: "https://msn.com/", waitUntil: load) {
    status
  }

  response(type: xhr, method: GET, operator: and) {
    url
    type
    method
    body
    headers {
      name
      value
    }
  }
}

curl --request POST \
  --url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{"query":"mutation AJAXGetCalls {\n  goto(url: \"https://msn.com/\", waitUntil: load) {\n    status\n  }\n  response(type: xhr, method: GET, operator: and) {\n    url\n    type\n    method\n    body\n    headers {\n      name\n      value\n    }\n  }\n}","variables":{},"operationName":"AJAXGetCalls"}'

const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: `mutation AJAXGetCalls {
  goto(url: "https://msn.com/", waitUntil: load) {
    status
  }

  response(type: xhr, method: GET, operator: and) {
    url
    type
    method
    body
    headers {
      name
      value
    }
  }
}`,
    variables: {},
    operationName: "AJAXGetCalls",
  })
};

const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();

console.log(data);

import requests

endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
    "token": "YOUR_API_TOKEN_HERE",
}
headers = {
    "Content-Type": "application/json",
}
payload = {
    "query": """mutation AJAXGetCalls {
  goto(url: "https://msn.com/", waitUntil: load) {
    status
  }

  response(type: xhr, method: GET, operator: and) {
    url
    type
    method
    body
    headers {
      name
      value
    }
  }
}""",
    "variables": {},
    "operationName": "AJAXGetCalls",
}

response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())

String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);

HttpResponse<String> response = Unirest.post(endpoint)
    .header("Content-Type", "application/json")
    .body("{\"query\":\"mutation AJAXGetCalls {\\n  goto(url: \\\"https://msn.com/\\\", waitUntil: load) {\\n    status\\n  }\\n  response(type: xhr, method: GET, operator: and) {\\n    url\\n    type\\n    method\\n    body\\n    headers {\\n      name\\n      value\\n    }\\n  }\\n}\",\"variables\":{},\"operationName\":\"AJAXGetCalls\"}")
    .asString();

string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";

var payload = new
{
    query = @"mutation AJAXGetCalls {
  goto(url: ""https://msn.com/"", waitUntil: load) {
    status
  }

  response(type: xhr, method: GET, operator: and) {
    url
    type
    method
    body
    headers {
      name
      value
    }
  }
}",
    variables = new { },
    operationName = "AJAXGetCalls"
};

using (HttpClient httpClient = new HttpClient())
{
    var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
    var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");

    var response = await httpClient.PostAsync(endpoint, content);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody);
}
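
Captured XHR bodies are frequently JSON, so a common next step is to decode them on the client. The sketch below shows one way to do that for the AJAXGetCalls result. It assumes that data.response is a list of objects and that body is delivered as a string, neither of which is confirmed above. headers_to_dict and json_bodies are hypothetical helper names, and the sample URLs and payloads are invented.

```python
import json


def headers_to_dict(headers) -> dict:
    """Turn the [{name, value}, ...] header list into a lowercase-keyed dict."""
    return {h["name"].lower(): h["value"] for h in headers or []}


def json_bodies(result: dict) -> dict:
    """Map each response URL to its decoded JSON body, skipping non-JSON bodies."""
    responses = (result.get("data") or {}).get("response") or []
    if isinstance(responses, dict):
        # Tolerate a single object in case the field is not a list.
        responses = [responses]

    decoded = {}
    for item in responses:
        body = item.get("body")
        if not body:
            continue
        try:
            decoded[item["url"]] = json.loads(body)
        except (TypeError, ValueError):
            # Not JSON (HTML, plain text, binary): leave it out.
            continue
    return decoded


# Invented sample data in the shape the query implies.
sample = {
    "data": {
        "response": [
            {
                "url": "https://example.com/api/feed",
                "type": "xhr",
                "method": "GET",
                "body": '{"items": [1, 2, 3]}',
                "headers": [{"name": "Content-Type", "value": "application/json"}],
            },
            {
                "url": "https://example.com/ping",
                "type": "xhr",
                "method": "GET",
                "body": "pong",
                "headers": [],
            },
        ]
    }
}
print(json_bodies(sample))
print(headers_to_dict(sample["data"]["response"][0]["headers"]))
```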

Using `querySelectorAll`

The querySelectorAll mutation returns an array of matched elements with their HTML properties. Use it when you need fast element extraction without the nested mapping of mapSelector.

mutation FindLinks {
  goto(url: "https://browserless.io") {
    status
  }

  links: querySelectorAll(selector: "a") {
    innerText
    outerHTML
  }
}

curl --request POST \
  --url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{"query":"mutation FindLinks {\n  goto(url: \"https://browserless.io\") {\n    status\n  }\n\n  links: querySelectorAll(selector: \"a\") {\n    innerText\n    outerHTML\n  }\n}","variables":{},"operationName":"FindLinks"}'

const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";

const options = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: `mutation FindLinks {
  goto(url: "https://browserless.io") {
    status
  }

  links: querySelectorAll(selector: "a") {
    innerText
    outerHTML
  }
}`,
    variables: {},
    operationName: "FindLinks",
  })
};

const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();

console.log(data);

import requests

endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
    "token": "YOUR_API_TOKEN_HERE",
}
headers = {
    "Content-Type": "application/json",
}
payload = {
    "query": """mutation FindLinks {
  goto(url: "https://browserless.io") {
    status
  }

  links: querySelectorAll(selector: "a") {
    innerText
    outerHTML
  }
}""",
    "variables": {},
    "operationName": "FindLinks",
}

response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())

String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);

HttpResponse<String> response = Unirest.post(endpoint)
    .header("Content-Type", "application/json")
    .body("{\"query\":\"mutation FindLinks {\\n  goto(url: \\\"https://browserless.io\\\") {\\n    status\\n  }\\n\\n  links: querySelectorAll(selector: \\\"a\\\") {\\n    innerText\\n    outerHTML\\n  }\\n}\",\"variables\":{},\"operationName\":\"FindLinks\"}")
    .asString();

string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";

var payload = new
{
    query = @"mutation FindLinks {
  goto(url: ""https://browserless.io"") {
    status
  }

  links: querySelectorAll(selector: ""a"") {
    innerText
    outerHTML
  }
}",
    variables = new { },
    operationName = "FindLinks"
};

using (HttpClient httpClient = new HttpClient())
{
    var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
    var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");

    var response = await httpClient.PostAsync(endpoint, content);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody);
}

Each result includes innerHTML, innerText, outerHTML, id, className, and childElementCount. Use innerText to get visible text, or outerHTML to get the full element markup.
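
A selector as broad as "a" tends to return many empty or repeated entries, such as icon links and duplicated navigation. The sketch below shows one way to tidy the FindLinks result on the client, keeping each distinct visible label once. link_texts is a hypothetical helper name and the sample entries are invented; only the innerText and outerHTML field names come from the query above.

```python
def link_texts(result: dict) -> list:
    """Return the distinct, non-empty visible labels of the matched links."""
    links = (result.get("data") or {}).get("links") or []
    seen = set()
    texts = []
    for link in links:
        # Collapse runs of whitespace and newlines inside the label.
        text = " ".join((link.get("innerText") or "").split())
        if text and text not in seen:
            seen.add(text)
            texts.append(text)
    return texts


# Invented sample data in the shape the query implies.
sample = {
    "data": {
        "links": [
            {"innerText": "Docs", "outerHTML": '<a href="/docs">Docs</a>'},
            {"innerText": "", "outerHTML": '<a href="/"><img src="logo.svg"></a>'},
            {"innerText": "  Docs\n", "outerHTML": '<a href="/docs">Docs</a>'},
            {"innerText": "Pricing", "outerHTML": '<a href="/pricing">Pricing</a>'},
        ]
    }
}
print(link_texts(sample))
```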

Processing Data with JavaScript

The evaluate mutation runs JavaScript in the browser context and returns the result. Use it when you need calculations, filtering, or transformations that go beyond what mapSelector or querySelectorAll support.

mutation CountHeadings {
  goto(url: "https://browserless.io") {
    status
  }

  headingCount: evaluate(
    content: "document.querySelectorAll('h2').length"
  ) {
    value
  }
}

The content field accepts any JavaScript expression or block. Wrap multi-step logic in a function body and use return to pass values back. For examples using await, external scripts, and complex transformations, see Multi-line JavaScript.

Next Steps

Multi-line JavaScript Evaluation

Run custom JavaScript in the browser context for complex data processing and transformation.

Waiting for Things

Control when BrowserQL proceeds to avoid scraping incomplete or empty pages.

Best BQL Practices

Patterns for reliable, maintainable browser automation with BrowserQL.
