Extract Content from a URL
Fetch the fully rendered HTML of any page, including content generated by JavaScript, using the Browserless /content endpoint.
Prerequisites
- A Browserless API token from your account dashboard
Steps
- REST API
- Frameworks
- BQL
Use the /content REST endpoint to retrieve fully rendered HTML. No WebSocket connection needed.
- cURL
- JavaScript
- Python
- Java
- C#
cURL
1. Build the request
The /content endpoint returns the full rendered HTML of a page after JavaScript execution:
https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE
2. Send the request
curl -X POST \
  "https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE" \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{ "url": "https://example.com" }'
3. Check the output
The response body is raw HTML: the full DOM as it exists after JavaScript has run, not just the original server response.
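The request described in the steps above can also be assembled programmatically before it is sent. The following is a minimal sketch using only Python's standard library; the helper name build_content_request and the BASE_URL constant are illustrative assumptions, and the code builds the request without sending it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the steps above; swap in your own region if it differs.
BASE_URL = "https://production-sfo.browserless.io/content"


def build_content_request(target_url: str, token: str) -> Request:
    """Assemble (but do not send) a POST request for the /content endpoint.

    Hypothetical helper for illustration, not part of any Browserless SDK.
    """
    # urlencode escapes the token so special characters cannot break the query string.
    endpoint = f"{BASE_URL}?{urlencode({'token': token})}"
    # The JSON body carries the page to render, as in the cURL example.
    body = json.dumps({"url": target_url}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Cache-Control": "no-cache", "Content-Type": "application/json"},
        method="POST",
    )


request = build_content_request("https://example.com", "YOUR_API_TOKEN_HERE")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen(request, timeout=30) would send it; the response body would then be the rendered HTML described above.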
JavaScript
1. Send the request
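const response = await fetch(
  'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE',
  {
    method: 'POST',
    headers: {
      'Cache-Control': 'no-cache',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url: 'https://example.com' }),
  }
);

// Fail early on a non-2xx status instead of treating an error body as HTML.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const html = await response.text();
console.log(html.slice(0, 500)); // Preview the first 500 characters.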
2. Check the output
Save the code as content.mjs and run it with node content.mjs (Node.js 18 or later, which provides a built-in fetch). The variable html contains the fully rendered page source.
Python
1. Install dependencies
pip install requests
2. Send the request
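import requests

response = requests.post(
    'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE',
    headers={'Cache-Control': 'no-cache', 'Content-Type': 'application/json'},
    json={'url': 'https://example.com'},
)
# Raise an exception on a non-2xx status instead of treating an error body as HTML.
response.raise_for_status()

html = response.text
print(html[:500])  # Preview the first 500 characters.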
3. Check the output
Save the code as content.py and run it with python content.py. The variable html contains the fully rendered page source. Pass it to BeautifulSoup or a similar parser to extract specific elements.
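The parsing step mentioned above can be sketched without extra dependencies. This example uses Python's standard-library html.parser as a stand-in for BeautifulSoup; the class name TitleAndLinks, the helper summarize, and the sample markup are illustrative assumptions rather than anything the /content endpoint prescribes.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every href found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs; keep non-empty hrefs only.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html: str) -> tuple[str, list[str]]:
    """Return the page title and all link targets found in the HTML string."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical sample standing in for the html variable from the request above.
sample_html = """<!DOCTYPE html><html><head><title>Example Domain</title></head>
<body><a href="https://www.iana.org/domains/example">More information</a></body></html>"""

title, links = summarize(sample_html)
print(title)
print(links)
```

In a real script, summarize(response.text) would replace the sample string. BeautifulSoup offers a richer query interface; the standard-library parser is used here only to keep the sketch dependency-free.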
Java
1. Send the request
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Content {
    public static void main(String[] args) throws Exception {
        String token = "YOUR_API_TOKEN_HERE";
        String endpoint = "https://production-sfo.browserless.io/content?token=" + token;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(endpoint))
            .header("Cache-Control", "no-cache")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"url\": \"https://example.com\"}"))
            .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String html = response.body();
        // Preview the first 500 characters without overrunning short responses.
        System.out.println(html.substring(0, Math.min(500, html.length())));
    }
}
2. Check the output
Save the code as Content.java and run it with java Content.java (Java 11 or later). The variable html contains the fully rendered HTML.
C#
1. Send the request
using System.Net.Http;
using System.Text;
using System.Text.Json;

string url = "https://production-sfo.browserless.io/content";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{url}?token={token}";
var payload = new { url = "https://example.com" };

using (HttpClient httpClient = new HttpClient())
{
    var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
    request.Headers.Add("Cache-Control", "no-cache");
    request.Content = new StringContent(
        JsonSerializer.Serialize(payload),
        Encoding.UTF8,
        "application/json"
    );

    var response = await httpClient.SendAsync(request);
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody[..Math.Min(500, responseBody.Length)]);
}
2. Check the output
Run the program. The response body is the fully rendered HTML.
Use a browser connection to navigate to the page and retrieve rendered content directly.
- Puppeteer
- Playwright
Puppeteer
1. Install dependencies
npm install puppeteer-core
2. Connect and extract content
import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://production-sfo.browserless.io?token=YOUR_API_TOKEN_HERE',
});

try {
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  const html = await page.content();
  console.log(html.slice(0, 500));
} finally {
  // Always close to release the session even on error.
  await browser.close();
}
3. Check the output
Save the code as content.mjs and run it with node content.mjs. page.content() returns the serialized DOM as it exists when called, which here is after navigation has reached network idle.
Playwright
1. Install dependencies
npm install playwright-core
2. Connect and extract content
import { chromium } from 'playwright-core';

const browser = await chromium.connect(
  'wss://production-sfo.browserless.io/chromium/playwright?token=YOUR_API_TOKEN_HERE'
);

try {
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle' });
  const html = await page.content();
  console.log(html.slice(0, 500));
} finally {
  // Always close to release the session even on error.
  await browser.close();
}
3. Check the output
Save the code as content.mjs and run it with node content.mjs. page.content() returns the serialized DOM as it exists when called, which here is after navigation has reached network idle.
BQL
1. Write the mutation
Use html to get the rendered content, or innerText for just the visible text:
mutation ExtractContent {
  goto(url: "https://example.com", waitUntil: domContentLoaded) {
    status
  }
  html {
    html
  }
}
2. Run it
Paste into the BQL IDE and click Run.
3. Check the output
{
  "data": {
    "goto": { "status": 200 },
    "html": { "html": "<!DOCTYPE html><html>...</html>" }
  }
}
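When the same mutation is run outside the IDE, the HTML has to be pulled out of the JSON envelope shown above. The following is a minimal sketch in Python, assuming the response text has exactly that shape; the helper name extract_bql_html is hypothetical, and the top-level "errors" check follows the general GraphQL convention rather than anything confirmed on this page.

```python
import json


def extract_bql_html(response_text: str) -> str:
    """Return the rendered HTML from a BQL response shaped like the one above."""
    payload = json.loads(response_text)
    # Assumption: failures appear in a top-level "errors" list, as in GraphQL generally.
    if payload.get("errors"):
        raise RuntimeError(f"BQL returned errors: {payload['errors']}")
    data = payload["data"]
    # goto.status is the HTTP status of the navigation itself.
    status = data["goto"]["status"]
    if not 200 <= status < 300:
        raise RuntimeError(f"Navigation failed with HTTP status {status}")
    return data["html"]["html"]


# Sample copied from the expected output above, with the page body elided as shown there.
sample = '{"data": {"goto": {"status": 200}, "html": {"html": "<!DOCTYPE html><html>...</html>"}}}'
print(extract_bql_html(sample))
```

Checking the navigation status before reading the HTML avoids treating an error page as valid content.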
Next steps
- Scrape Structured Data — extract specific elements rather than the full HTML
- Automate Google Search — navigate and extract content from dynamic pages
- Take a Screenshot — capture a visual snapshot alongside the HTML