Extract content from a URL

Fetch the fully rendered HTML of any page, including content generated by JavaScript, using the Browserless /content endpoint.

Prerequisites

A Browserless API token from your account dashboard

Steps

Choose one of four methods: AI Agent, REST API, Frameworks, or BQL.

AI Agent

Use the Browserless MCP server to extract content from a URL with any MCP-compatible AI agent (Claude Desktop, Cursor, Windsurf, ChatGPT, and others).

1. Connect the MCP server

Send this prompt to your AI agent to install the Browserless MCP server:

Go to https://github.com/browserless/browserless-mcp/blob/main/install.md
and follow the instructions to install the Browserless MCP server
for my client.

2. Extract content

Ask the agent to use the browserless_smartscraper tool, which extracts page content in one call and handles bot protection automatically:

Use the browserless_smartscraper tool to extract the main content
of https://scraping-sandbox.netlify.app/javascript-enabled and return it as clean markdown

REST API

Use the /content REST endpoint to retrieve fully rendered HTML over plain HTTP. No WebSocket connection is needed.

Each example below sends the same request in a different language: cURL, JavaScript, Python, Java, and C#.

cURL

1. Build the request

The /content endpoint returns the full rendered HTML of a page after JavaScript execution:

https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE
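If you prefer to keep the token out of source code, you can assemble the endpoint and JSON body programmatically. A minimal Python sketch, assuming the token is stored in an environment variable named BROWSERLESS_TOKEN (a hypothetical name); build_content_request is an illustrative helper, not part of any Browserless SDK:

```python
import json
import os
from urllib.parse import urlencode

BASE_URL = "https://production-sfo.browserless.io/content"


def build_content_request(target_url: str, token: str) -> tuple[str, bytes]:
    """Hypothetical helper: return the /content URL (token as a query param) and the JSON body."""
    # urlencode percent-escapes any special characters in the token.
    endpoint = f"{BASE_URL}?{urlencode({'token': token})}"
    body = json.dumps({"url": target_url}).encode("utf-8")
    return endpoint, body


if __name__ == "__main__":
    # BROWSERLESS_TOKEN is an assumed variable name; fall back to the placeholder.
    token = os.environ.get("BROWSERLESS_TOKEN", "YOUR_API_TOKEN_HERE")
    endpoint, body = build_content_request(
        "https://scraping-sandbox.netlify.app/javascript-enabled", token
    )
    print(endpoint)
    print(body.decode("utf-8"))
```

The endpoint and body can then be POSTed with Content-Type: application/json by any HTTP client.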

2. Send the request

curl -X POST \
  "https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE" \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{ "url": "https://scraping-sandbox.netlify.app/javascript-enabled" }'

3. Check the output

The response body is raw HTML: the full DOM as it exists after JavaScript has run, not just the original server response.

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...
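To spot-check a response like the one above without extra dependencies, you can parse it with Python's standard-library html.parser. A minimal sketch that pulls out the text of the first title element; TitleParser and page_title are illustrative names:

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> Optional[str]:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


sample = '<!DOCTYPE html><html><head><meta charSet="utf-8"/>\n<title>Javascript Enabled</title>'
print(page_title(sample))  # Javascript Enabled
```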

JavaScript

1. Send the request

const response = await fetch(
  'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE',
  {
    method: 'POST',
    headers: {
      'Cache-Control': 'no-cache',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url: 'https://scraping-sandbox.netlify.app/javascript-enabled' }),
  }
);

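// Fail loudly on 4xx/5xx instead of printing an error page as if it were content.
if (!response.ok) {
  throw new Error(`Browserless returned HTTP ${response.status}`);
}

const html = await response.text();
console.log(html.slice(0, 500));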

2. Check the output

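Save the code as content.mjs and run it with node content.mjs. The .mjs extension enables top-level await, and Node.js 18 or later provides the global fetch. The html variable contains the fully rendered page source.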

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...

Python

1. Install dependencies

pip install requests

2. Send the request

import requests

response = requests.post(
    'https://production-sfo.browserless.io/content?token=YOUR_API_TOKEN_HERE',
    headers={'Cache-Control': 'no-cache', 'Content-Type': 'application/json'},
    json={'url': 'https://scraping-sandbox.netlify.app/javascript-enabled'},
)

response.raise_for_status()  # raise on 4xx/5xx instead of printing an error page

html = response.text
print(html[:500])

3. Check the output

Save the code as content.py and run it with python content.py. The html variable contains the fully rendered page source; pass it to BeautifulSoup or a similar parser to extract specific elements.

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...
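If you only need the readable text and want to avoid a third-party parser, the standard library can approximate it. A minimal sketch using html.parser that keeps text nodes and skips script and style contents; it does not evaluate CSS, so text hidden with display: none is still included. TextExtractor and visible_text are illustrative names:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not self._skip_depth and text:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><script>var x = 1;</script></head>"
    "<body><h1>Hello</h1><p>World</p></body></html>"
)
print(visible_text(sample))  # Hello World
```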

Java

1. Send the request

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Content {
    // send() can throw IOException and InterruptedException.
    public static void main(String[] args) throws Exception {
        String token = "YOUR_API_TOKEN_HERE";
        String endpoint = "https://production-sfo.browserless.io/content?token=" + token;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(endpoint))
            .header("Cache-Control", "no-cache")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"url\": \"https://scraping-sandbox.netlify.app/javascript-enabled\"}"))
            .build();

        // Blocks until Browserless returns the rendered HTML.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String html = response.body();
        System.out.println(html.substring(0, Math.min(500, html.length())));
    }
}

2. Check the output

Compile and run the class (java.net.http requires Java 11 or later). The response body is the fully rendered HTML.

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...

C#

1. Send the request

using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

string baseUrl = "https://production-sfo.browserless.io/content";
string token = "YOUR_API_TOKEN_HERE";
string endpoint = $"{baseUrl}?token={token}";

var payload = new { url = "https://scraping-sandbox.netlify.app/javascript-enabled" };

using (HttpClient httpClient = new HttpClient())
{
    var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
    request.Headers.Add("Cache-Control", "no-cache");
    request.Content = new StringContent(
        JsonSerializer.Serialize(payload),
        Encoding.UTF8,
        "application/json"
    );
    var response = await httpClient.SendAsync(request);
    response.EnsureSuccessStatusCode(); // throw on 4xx/5xx
    string responseBody = await response.Content.ReadAsStringAsync();
    Console.WriteLine(responseBody[..Math.Min(500, responseBody.Length)]);
}

2. Check the output

Run it as a console app; the top-level statements and await require C# 9 or later. The response body is the fully rendered HTML.

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...

Frameworks

Connect Puppeteer or Playwright to Browserless over WebSocket, navigate to the page, and read the rendered content directly.

Examples are provided for Puppeteer and Playwright.

Puppeteer

1. Install dependencies

npm install puppeteer-core

2. Connect and extract content

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://production-sfo.browserless.io?token=YOUR_API_TOKEN_HERE',
});

try {
  const page = await browser.newPage();
  await page.goto('https://scraping-sandbox.netlify.app/javascript-enabled', { waitUntil: 'networkidle2' });
  const html = await page.content();
  console.log(html.slice(0, 500));
} finally {
  // Always close to release the session even on error.
  await browser.close();
}

3. Check the output

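Save the code as content.mjs and run it with node content.mjs. page.content() returns the serialized DOM at the moment it is called; waiting for networkidle2 gives the page's scripts time to finish rendering before that snapshot is taken.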

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...

Playwright

1. Install dependencies

npm install playwright-core

2. Connect and extract content

import { chromium } from 'playwright-core';

const browser = await chromium.connect(
  'wss://production-sfo.browserless.io/chromium/playwright?token=YOUR_API_TOKEN_HERE'
);

try {
  const page = await browser.newPage();
  await page.goto('https://scraping-sandbox.netlify.app/javascript-enabled', { waitUntil: 'networkidle' });
  const html = await page.content();
  console.log(html.slice(0, 500));
} finally {
  // Always close to release the session even on error.
  await browser.close();
}

3. Check the output

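Save the code as content.mjs and run it with node content.mjs. page.content() returns the serialized DOM at the moment it is called; waiting for networkidle gives the page's scripts time to finish rendering before that snapshot is taken.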

<!DOCTYPE html><html><head><meta charSet="utf-8"/>
<title>Javascript Enabled</title>
...

BQL

1. Write the mutation

Use html to get the rendered content, or innerText for just the visible text:

mutation ExtractContent {
  goto(url: "https://scraping-sandbox.netlify.app/javascript-enabled", waitUntil: domContentLoaded) {
    status
  }
  html {
    html
  }
}

2. Run it

Paste into the BQL IDE and click Run.
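Outside the IDE, the same mutation can be sent as an ordinary GraphQL-over-HTTP POST. A minimal sketch that only builds the request, assuming BQL is served at /chromium/bql on the same regional host (an assumption to confirm in the BQL reference) and using the conventional GraphQL JSON body of query, operationName, and variables. build_bql_request is an illustrative helper:

```python
import json
from urllib.parse import urlencode

# Assumption: BQL endpoint path; verify against the Browserless BQL docs.
BQL_ENDPOINT = "https://production-sfo.browserless.io/chromium/bql"

MUTATION = """
mutation ExtractContent {
  goto(url: "https://scraping-sandbox.netlify.app/javascript-enabled", waitUntil: domContentLoaded) {
    status
  }
  html {
    html
  }
}
"""


def build_bql_request(query: str, operation_name: str, token: str) -> tuple[str, str]:
    """Hypothetical helper: return the endpoint URL and a standard GraphQL JSON body."""
    url = f"{BQL_ENDPOINT}?{urlencode({'token': token})}"
    body = json.dumps({"query": query, "operationName": operation_name, "variables": {}})
    return url, body


if __name__ == "__main__":
    url, body = build_bql_request(MUTATION, "ExtractContent", "YOUR_API_TOKEN_HERE")
    print(url)
    print(body)
```

POST the body with Content-Type: application/json using any HTTP client, for example the requests call from the Python REST example.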

3. Check the output

{
  "data": {
    "goto": { "status": 200 },
    "html": { "html": "<!DOCTYPE html><html>...</html>" }
  }
}
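The response follows the usual GraphQL shape: results under data, and failures reported in a top-level errors array. A minimal sketch for pulling out the navigation status and the rendered HTML from a response like the one above; parse_bql_result is an illustrative name:

```python
import json


def parse_bql_result(raw: str) -> tuple[int, str]:
    """Return (goto status, rendered HTML) from a BQL JSON response."""
    payload = json.loads(raw)
    # GraphQL servers report failures in a top-level "errors" array.
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    data = payload["data"]
    return data["goto"]["status"], data["html"]["html"]


raw = '{"data": {"goto": {"status": 200}, "html": {"html": "<!DOCTYPE html><html>...</html>"}}}'
status, html = parse_bql_result(raw)
print(status, html[:15])  # 200 <!DOCTYPE html>
```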

Next steps

Scrape Structured Data: extract specific elements rather than the full HTML

Automate Google Search: navigate and extract content from dynamic pages

Take a Screenshot: capture a visual snapshot alongside the HTML
