Data Scraping and Extraction
BrowserQL offers three main approaches to extracting data: grab the page HTML with the html mutation, map DOM elements to structured JSON with mapSelector or querySelectorAll, or intercept raw API responses with the response mutation. Choose the approach that fits your downstream processing needs.
Basic Usage
The html mutation returns the full page HTML. Wait for the page to load before extracting to avoid empty results.
- BQL Query
- cURL
- Javascript
- Python
- Java
- C#
mutation ExtractHTML {
goto(url: "https://www.browserless.io/", waitUntil: domContentLoaded) {
status
}
html {
html
}
}
curl --request POST \
--url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{"query":"mutation ExtractHTML {\n goto(url: \"https://www.browserless.io/\", waitUntil: domContentLoaded) {\n status\n }\n\n html {\n html\n }\n}","variables":{},"operationName":"ExtractHTML"}'
const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";
const options = {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
query: `mutation ExtractHTML {
goto(url: "https://www.browserless.io/", waitUntil: domContentLoaded) {
status
}
html {
html
}
}`,
variables: {},
operationName: "ExtractHTML",
})
};
const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();
console.log(data);
import requests
endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
"token": "YOUR_API_TOKEN_HERE",
}
headers = {
"Content-Type": "application/json",
}
payload = {
"query": """mutation ExtractHTML {
goto(url: "https://www.browserless.io/", waitUntil: domContentLoaded) {
status
}
html {
html
}
}""",
"variables": {},
"operationName": "ExtractHTML",
}
response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())
String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);
HttpResponse<String> response = Unirest.post(endpoint)
.header("Content-Type", "application/json")
.body("{\"query\":\"mutation ExtractHTML {\\n goto(url: \\\"https://www.browserless.io/\\\", waitUntil: domContentLoaded) {\\n status\\n }\\n\\n html {\\n html\\n }\\n}\",\"variables\":{},\"operationName\":\"ExtractHTML\"}")
.asString();
string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
var payload = new
{
query = @"mutation ExtractHTML {
goto(url: ""https://www.browserless.io/"", waitUntil: domContentLoaded) {
status
}
html {
html
}
}",
variables = new { },
operationName = "ExtractHTML"
};
using (HttpClient httpClient = new HttpClient())
{
var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
var response = await httpClient.PostAsync(endpoint, content);
string responseBody = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseBody);
}
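Whichever client you use, the HTML comes back nested under the field names used in the query. The following is a minimal Python sketch for unwrapping it, assuming the response follows the usual GraphQL shape (a top-level data object keyed by field name, plus an optional errors list). extract_html and the sample payload are hypothetical and not part of BrowserQL.

```python
def extract_html(result):
    """Return the page HTML from a parsed ExtractHTML response."""
    # Assumption: failures are reported in a top-level "errors" list,
    # as in standard GraphQL responses.
    errors = result.get("errors")
    if errors:
        messages = "; ".join(
            str(e.get("message")) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise RuntimeError(f"BQL query failed: {messages}")
    data = result.get("data") or {}
    html_field = data.get("html") or {}
    return html_field.get("html", "")


# Hypothetical response, used only for illustration.
sample_html_response = {
    "data": {
        "goto": {"status": 200},
        "html": {"html": "<html><body>Hello</body></html>"},
    }
}
print(extract_html(sample_html_response))
```

With the Python example above, this would be called as extract_html(response.json()).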
Targeting a Specific Element
Pass a selector to return HTML from a single element instead of the full page:
html(selector: ".navbar_container") {
html
}
Cleaning the HTML
The clean argument strips non-text nodes (scripts, video, canvas), DOM attributes, and excess whitespace. It can reduce payload size by up to 1,000x:
html(clean: {
removeAttributes: true
removeNonTextNodes: true
}) {
html
}
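When the selector or the cleaning options vary per request, it can help to assemble the query in code. The sketch below builds the request payload in Python. It assumes that selector and clean can be passed to html in the same call, which the examples above do not show, and build_html_query is a hypothetical helper name.

```python
import json


def build_html_query(url, selector=None, clean=False):
    """Build a BQL payload that loads `url` and returns its HTML."""
    args = []
    if selector is not None:
        # json.dumps produces a double-quoted, escaped string literal.
        args.append(f"selector: {json.dumps(selector)}")
    if clean:
        args.append("clean: { removeAttributes: true, removeNonTextNodes: true }")
    # Assumption: both arguments may be combined in one html call.
    html_call = f"html({', '.join(args)})" if args else "html"
    query = (
        "mutation ExtractHTML {\n"
        f"  goto(url: {json.dumps(url)}, waitUntil: domContentLoaded) {{ status }}\n"
        f"  {html_call} {{ html }}\n"
        "}"
    )
    return {"query": query, "variables": {}, "operationName": "ExtractHTML"}


payload = build_html_query(
    "https://www.browserless.io/", selector=".navbar_container", clean=True
)
print(payload["query"])
```

The returned dictionary can be sent as the JSON body of the POST request, in the same way as the payload in the Python example under Basic Usage.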
Creating JSON with mapSelector
mapSelector is designed for pages with repetitive, hierarchical structure: product listings, comment threads, search results, or any repeating pattern. It iterates over a NodeList, similar to document.querySelectorAll, and returns a structured array of objects. Use it to extract attributes, text content, or nested elements.
The query below navigates to Hacker News and extracts the href of every post link:
- BQL Query
- cURL
- Javascript
- Python
- Java
- C#
mutation ScrapeHackerNews {
goto(
url: "https://news.ycombinator.com"
waitUntil: firstContentfulPaint
) {
status
}
posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
link: attribute(name: "href") {
value
}
}
}
curl --request POST \
--url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{"query":"mutation ScrapeHackerNews {\n goto(\n url: \"https://news.ycombinator.com\"\n waitUntil: firstContentfulPaint\n ) {\n status\n }\n\n posts: mapSelector(selector: \".submission .titleline > a\", wait: true) {\n link: attribute(name: \"href\") {\n value\n }\n }\n}","variables":{},"operationName":"ScrapeHackerNews"}'
const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";
const options = {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
query: `mutation ScrapeHackerNews {
goto(
url: "https://news.ycombinator.com"
waitUntil: firstContentfulPaint
) {
status
}
posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
link: attribute(name: "href") {
value
}
}
}`,
variables: {},
operationName: "ScrapeHackerNews",
})
};
const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();
console.log(data);
import requests
endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
"token": "YOUR_API_TOKEN_HERE",
}
headers = {
"Content-Type": "application/json",
}
payload = {
"query": """mutation ScrapeHackerNews {
goto(
url: "https://news.ycombinator.com"
waitUntil: firstContentfulPaint
) {
status
}
posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
link: attribute(name: "href") {
value
}
}
}""",
"variables": {},
"operationName": "ScrapeHackerNews",
}
response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())
String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);
HttpResponse<String> response = Unirest.post(endpoint)
.header("Content-Type", "application/json")
.body("{\"query\":\"mutation ScrapeHackerNews {\\n goto(\\n url: \\\"https://news.ycombinator.com\\\"\\n waitUntil: firstContentfulPaint\\n ) {\\n status\\n }\\n\\n posts: mapSelector(selector: \\\".submission .titleline > a\\\", wait: true) {\\n link: attribute(name: \\\"href\\\") {\\n value\\n }\\n }\\n}\",\"variables\":{},\"operationName\":\"ScrapeHackerNews\"}")
.asString();
string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
var payload = new
{
query = @"mutation ScrapeHackerNews {
goto(
url: ""https://news.ycombinator.com""
waitUntil: firstContentfulPaint
) {
status
}
posts: mapSelector(selector: "".submission .titleline > a"", wait: true) {
link: attribute(name: ""href"") {
value
}
}
}",
variables = new { },
operationName = "ScrapeHackerNews"
};
using (HttpClient httpClient = new HttpClient())
{
var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
var response = await httpClient.PostAsync(endpoint, content);
string responseBody = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseBody);
}
Response
{
"data": {
"posts": [
{ "link": { "value": "https://churchofturing.github.io/landscapeoflisp.html" } },
{ "link": { "value": "https://www.jjj.de/fxt/fxtbook.pdf" } },
{ "link": { "value": "https://ereader-swedish.fly.dev/" } }
]
}
}
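Downstream code usually wants a flat list rather than the nested objects. A minimal Python sketch, using the response above as sample data; post_links is a hypothetical helper name:

```python
def post_links(result):
    """Flatten the ScrapeHackerNews response into a list of URLs."""
    # posts may be null when nothing matched, so fall back to an empty list.
    posts = (result.get("data") or {}).get("posts") or []
    return [post["link"]["value"] for post in posts if post.get("link")]


sample_posts_response = {
    "data": {
        "posts": [
            {"link": {"value": "https://churchofturing.github.io/landscapeoflisp.html"}},
            {"link": {"value": "https://www.jjj.de/fxt/fxtbook.pdf"}},
            {"link": {"value": "https://ereader-swedish.fly.dev/"}},
        ]
    }
}

for url in post_links(sample_posts_response):
    print(url)
```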
Nest mapSelector calls to traverse hierarchical DOM structures. The example below extracts post authors and scores from each submission:
mutation ScrapeHackerNewsMetadata {
goto(url: "https://news.ycombinator.com") {
status
}
posts: mapSelector(selector: ".subtext .subline") {
author: mapSelector(selector: ".hnuser") {
authorName: innerText
}
score: mapSelector(selector: ".score") {
score: innerText
}
}
}
When at least one element matches, mapSelector returns an array, whether one or many elements match. It returns null when no elements are found.
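Because nested mapSelector calls can each produce an array or null, client code has to unwrap both levels. The Python sketch below does this for the ScrapeHackerNewsMetadata query. The sample data is hypothetical and its shape is inferred from the query's field names rather than copied from a real response.

```python
def first_text(items, key):
    """Return items[0][key], or None when the nested selector matched nothing."""
    if not items:
        return None
    return items[0].get(key)


def flatten_posts(result):
    """Turn nested mapSelector output into one flat dict per post."""
    posts = (result.get("data") or {}).get("posts") or []
    return [
        {
            "author": first_text(post.get("author"), "authorName"),
            "score": first_text(post.get("score"), "score"),
        }
        for post in posts
    ]


# Hypothetical data: the second row stands for a post where the nested
# selectors found no matching elements.
sample_metadata_response = {
    "data": {
        "posts": [
            {"author": [{"authorName": "alice"}], "score": [{"score": "120 points"}]},
            {"author": None, "score": None},
        ]
    }
}
print(flatten_posts(sample_metadata_response))
```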
Scraping Network Responses
The response mutation records HTTP responses received by the browser, filtered by URL pattern, method, or resource type. BQL waits for the response automatically.
The example below captures the raw document response from a page load:
- BQL Query
- cURL
- Javascript
- Python
- Java
- C#
mutation DocumentResponses {
goto(url: "https://example.com/", waitUntil: load) {
status
}
response(type: document) {
url
body
headers {
name
value
}
}
}
curl --request POST \
--url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{"query":"mutation DocumentResponses {\n goto(url: \"https://example.com/\", waitUntil: load) {\n status\n }\n response(type: document) {\n url\n body\n headers {\n name\n value\n }\n }\n}","variables":{},"operationName":"DocumentResponses"}'
const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";
const options = {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
query: `mutation DocumentResponses {
goto(url: "https://example.com/", waitUntil: load) {
status
}
response(type: document) {
url
body
headers {
name
value
}
}
}`,
variables: {},
operationName: "DocumentResponses",
})
};
const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();
console.log(data);
import requests
endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
"token": "YOUR_API_TOKEN_HERE",
}
headers = {
"Content-Type": "application/json",
}
payload = {
"query": """mutation DocumentResponses {
goto(url: "https://example.com/", waitUntil: load) {
status
}
response(type: document) {
url
body
headers {
name
value
}
}
}""",
"variables": {},
"operationName": "DocumentResponses",
}
response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())
String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);
HttpResponse<String> response = Unirest.post(endpoint)
.header("Content-Type", "application/json")
.body("{\"query\":\"mutation DocumentResponses {\\n goto(url: \\\"https://example.com/\\\", waitUntil: load) {\\n status\\n }\\n response(type: document) {\\n url\\n body\\n headers {\\n name\\n value\\n }\\n }\\n}\",\"variables\":{},\"operationName\":\"DocumentResponses\"}")
.asString();
string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
var payload = new
{
query = @"mutation DocumentResponses {
goto(url: ""https://example.com/"", waitUntil: load) {
status
}
response(type: document) {
url
body
headers {
name
value
}
}
}",
variables = new { },
operationName = "DocumentResponses"
};
using (HttpClient httpClient = new HttpClient())
{
var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
var response = await httpClient.PostAsync(endpoint, content);
string responseBody = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseBody);
}
Combine the type and method filters to narrow the captured responses; the operator argument controls how the filters are combined. The example below uses operator: and to capture only XHR GET responses:
- BQL Query
- cURL
- Javascript
- Python
- Java
- C#
mutation AJAXGetCalls {
goto(url: "https://msn.com/", waitUntil: load) {
status
}
response(type: xhr, method: GET, operator: and) {
url
type
method
body
headers {
name
value
}
}
}
curl --request POST \
--url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{"query":"mutation AJAXGetCalls {\n goto(url: \"https://msn.com/\", waitUntil: load) {\n status\n }\n response(type: xhr, method: GET, operator: and) {\n url\n type\n method\n body\n headers {\n name\n value\n }\n }\n}","variables":{},"operationName":"AJAXGetCalls"}'
const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";
const options = {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
query: `mutation AJAXGetCalls {
goto(url: "https://msn.com/", waitUntil: load) {
status
}
response(type: xhr, method: GET, operator: and) {
url
type
method
body
headers {
name
value
}
}
}`,
variables: {},
operationName: "AJAXGetCalls",
})
};
const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();
console.log(data);
import requests
endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
"token": "YOUR_API_TOKEN_HERE",
}
headers = {
"Content-Type": "application/json",
}
payload = {
"query": """mutation AJAXGetCalls {
goto(url: "https://msn.com/", waitUntil: load) {
status
}
response(type: xhr, method: GET, operator: and) {
url
type
method
body
headers {
name
value
}
}
}""",
"variables": {},
"operationName": "AJAXGetCalls",
}
response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())
String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);
HttpResponse<String> response = Unirest.post(endpoint)
.header("Content-Type", "application/json")
.body("{\"query\":\"mutation AJAXGetCalls {\\n goto(url: \\\"https://msn.com/\\\", waitUntil: load) {\\n status\\n }\\n response(type: xhr, method: GET, operator: and) {\\n url\\n type\\n method\\n body\\n headers {\\n name\\n value\\n }\\n }\\n}\",\"variables\":{},\"operationName\":\"AJAXGetCalls\"}")
.asString();
string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
var payload = new
{
query = @"mutation AJAXGetCalls {
goto(url: ""https://msn.com/"", waitUntil: load) {
status
}
response(type: xhr, method: GET, operator: and) {
url
type
method
body
headers {
name
value
}
}
}",
variables = new { },
operationName = "AJAXGetCalls"
};
using (HttpClient httpClient = new HttpClient())
{
var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
var response = await httpClient.PostAsync(endpoint, content);
string responseBody = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseBody);
}
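Captured responses often carry JSON that can be used directly. The Python sketch below picks the JSON bodies out of a parsed result. It assumes that response returns a list of entries, that body arrives as text, and that headers use the name/value pairs requested in the queries above. The helper names and sample data are hypothetical.

```python
import json


def headers_to_dict(headers):
    """Convert [{name, value}, ...] into a dict with lower-cased names."""
    return {h["name"].lower(): h["value"] for h in headers or []}


def json_bodies(result, field="response"):
    """Return (url, parsed_body) pairs for responses that contain JSON."""
    entries = (result.get("data") or {}).get(field) or []
    if isinstance(entries, dict):
        # Tolerate a single object in case only one response is returned.
        entries = [entries]
    parsed = []
    for entry in entries:
        headers = headers_to_dict(entry.get("headers"))
        if "json" not in headers.get("content-type", ""):
            continue
        try:
            parsed.append((entry.get("url"), json.loads(entry.get("body") or "")))
        except json.JSONDecodeError:
            continue  # Skip bodies that claim to be JSON but are not.
    return parsed


# Hypothetical data shaped like the AJAXGetCalls selection set.
sample_network_response = {
    "data": {
        "response": [
            {
                "url": "https://example.com/api/items",
                "type": "xhr",
                "method": "GET",
                "body": '{"items": [1, 2, 3]}',
                "headers": [
                    {"name": "Content-Type", "value": "application/json; charset=utf-8"}
                ],
            },
            {
                "url": "https://example.com/pixel",
                "type": "xhr",
                "method": "GET",
                "body": "",
                "headers": [{"name": "content-type", "value": "text/plain"}],
            },
        ]
    }
}
print(json_bodies(sample_network_response))
```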
Using querySelectorAll
The querySelectorAll mutation returns an array of matched elements with their HTML properties. Use it when you need fast element extraction without the nested mapping of mapSelector.
- BQL Query
- cURL
- Javascript
- Python
- Java
- C#
mutation FindLinks {
goto(url: "https://browserless.io") {
status
}
links: querySelectorAll(selector: "a") {
innerText
outerHTML
}
}
curl --request POST \
--url 'https://production-sfo.browserless.io/chromium/bql?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{"query":"mutation FindLinks {\n goto(url: \"https://browserless.io\") {\n status\n }\n\n links: querySelectorAll(selector: \"a\") {\n innerText\n outerHTML\n }\n}","variables":{},"operationName":"FindLinks"}'
const endpoint = "https://production-sfo.browserless.io/chromium/bql";
const token = "YOUR_API_TOKEN_HERE";
const options = {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
query: `mutation FindLinks {
goto(url: "https://browserless.io") {
status
}
links: querySelectorAll(selector: "a") {
innerText
outerHTML
}
}`,
variables: {},
operationName: "FindLinks",
})
};
const url = `${endpoint}?token=${token}`;
const response = await fetch(url, options);
const data = await response.json();
console.log(data);
import requests
endpoint = "https://production-sfo.browserless.io/chromium/bql"
query_string = {
"token": "YOUR_API_TOKEN_HERE",
}
headers = {
"Content-Type": "application/json",
}
payload = {
"query": """mutation FindLinks {
goto(url: "https://browserless.io") {
status
}
links: querySelectorAll(selector: "a") {
innerText
outerHTML
}
}""",
"variables": {},
"operationName": "FindLinks",
}
response = requests.post(endpoint, params=query_string, headers=headers, json=payload)
print(response.json())
String url = "https://production-sfo.browserless.io/chromium/bql";
String token = "YOUR_API_TOKEN_HERE";
String endpoint = String.format("%s?token=%s", url, token);
HttpResponse<String> response = Unirest.post(endpoint)
.header("Content-Type", "application/json")
.body("{\"query\":\"mutation FindLinks {\\n goto(url: \\\"https://browserless.io\\\") {\\n status\\n }\\n\\n links: querySelectorAll(selector: \\\"a\\\") {\\n innerText\\n outerHTML\\n }\\n}\",\"variables\":{},\"operationName\":\"FindLinks\"}")
.asString();
string url = "https://production-sfo.browserless.io/chromium/bql";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
var payload = new
{
query = @"mutation FindLinks {
goto(url: ""https://browserless.io"") {
status
}
links: querySelectorAll(selector: ""a"") {
innerText
outerHTML
}
}",
variables = new { },
operationName = "FindLinks"
};
using (HttpClient httpClient = new HttpClient())
{
var jsonPayload = System.Text.Json.JsonSerializer.Serialize(payload);
var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json");
var response = await httpClient.PostAsync(endpoint, content);
string responseBody = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseBody);
}
Each result includes innerHTML, innerText, outerHTML, id, className, and childElementCount. Use innerText to get visible text, or outerHTML to get the full element markup.
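The listed properties do not include element attributes, so a value such as href has to be read from the markup. The Python sketch below recovers it from outerHTML with the standard-library HTML parser. The sample data is hypothetical, link_targets is an illustrative helper name, and the code treats a null links field as an empty list as a precaution.

```python
from html.parser import HTMLParser


class _FirstTagAttrs(HTMLParser):
    """Record the attributes of the first start tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def link_targets(result):
    """Return text/href pairs for every matched element that has an href."""
    links = (result.get("data") or {}).get("links") or []
    targets = []
    for link in links:
        parser = _FirstTagAttrs()
        parser.feed(link.get("outerHTML") or "")
        href = (parser.attrs or {}).get("href")
        if href:
            text = (link.get("innerText") or "").strip()
            targets.append({"text": text, "href": href})
    return targets


# Hypothetical data shaped like the FindLinks selection set.
sample_links_response = {
    "data": {
        "links": [
            {"innerText": " Docs ", "outerHTML": '<a href="/docs" class="nav">Docs</a>'},
            {"innerText": "", "outerHTML": '<a name="top"></a>'},
        ]
    }
}
print(link_targets(sample_links_response))
```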
Processing Data with JavaScript
The evaluate mutation runs JavaScript in the browser context and returns the result. Use it when you need calculations, filtering, or transformations that go beyond what mapSelector or querySelectorAll support.
mutation CountHeadings {
goto(url: "https://browserless.io") {
status
}
headingCount: evaluate(
content: "document.querySelectorAll('h2').length"
) {
value
}
}
The content argument accepts any JavaScript expression or block. Wrap multi-step logic in a function body and use return to pass values back. For examples using await, external scripts, and complex transformations, see Multi-line JavaScript.
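JavaScript passed through content often contains quotes of its own, which makes hand-written escaping error-prone. The Python sketch below embeds an expression into the query with json.dumps and reads the value back from the response. It assumes that JSON string escaping is acceptable as a GraphQL string literal for the scripts you pass. The helper names, the Evaluate operation name, and the sample value are hypothetical.

```python
import json


def build_evaluate_payload(url, expression, alias="result"):
    """Build a BQL payload that evaluates `expression` on `url`."""
    query = (
        "mutation Evaluate {\n"
        f"  goto(url: {json.dumps(url)}) {{ status }}\n"
        # json.dumps wraps the script in double quotes and escapes inner ones.
        f"  {alias}: evaluate(content: {json.dumps(expression)}) {{ value }}\n"
        "}"
    )
    return {"query": query, "variables": {}, "operationName": "Evaluate"}


def evaluated_value(result, alias="result"):
    """Read the value returned by the aliased evaluate call, if any."""
    field = (result.get("data") or {}).get(alias) or {}
    return field.get("value")


payload = build_evaluate_payload(
    "https://browserless.io",
    "document.querySelectorAll('h2').length",
    alias="headingCount",
)
print(payload["query"])

# Hypothetical response, used only to show where the value sits.
print(evaluated_value({"data": {"headingCount": {"value": 7}}}, "headingCount"))
```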