What kind of headers are returned by the API?

The API returns two kinds of headers: those sent by our web app server, and those sent by the scraped website.

In normal mode, we prefix headers coming from the scraped website with Spb- to differentiate them from the ones sent by our web app server.

On top of that, we also add 3 headers:
- Spb-cost: "Request cost in credits."
- Spb-initial-status-code: "The initial status code returned by the scraped page. Useful when the page redirects."
- Spb-resolved-url: "The resolved URL of the scraped page. Useful when the page redirects."
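
As an illustration of how these three headers might be read on the client side, here is a minimal Python sketch. The function names `scrape_metadata` and `was_redirected` are hypothetical, not part of the API. It assumes the headers are available as a plain mapping of names to string values and that the cost and status code are integers, as in the examples in this article.

```python
def scrape_metadata(headers):
    """Extract the three added Spb- headers from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        # Assumption: cost and status code are sent as integer strings.
        "cost": int(lowered["spb-cost"]),
        "initial_status_code": int(lowered["spb-initial-status-code"]),
        "resolved_url": lowered["spb-resolved-url"],
    }


def was_redirected(headers):
    """Guess whether the scraped page redirected, based on the initial status code."""
    # 3xx is the standard HTTP status class for redirections.
    status = scrape_metadata(headers)["initial_status_code"]
    return 300 <= status < 400
```

When `was_redirected` returns true, the `resolved_url` value tells you where the scraped page ended up.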

If you use json_response (documentation), the headers key contains only the headers coming from the scraped server, this time without the "Spb-" prefix.

Here is an example. If you scrape a website, the API response might include headers like the ones below. Note that they are not part of the response body; they are headers of the HTTP response itself:
Content-Type: application/json # from our web server
Content-Length: 2128781 # from our web server
Spb-content-type: text/html; charset=utf-8 # from the scraped website
Spb-vary: Accept-Encoding # from the scraped website
Spb-cost: 5 # credit cost of your request
Spb-initial-status-code: 200 # initial status code
Spb-resolved-url: https://www.website.com/ # Resolved URL of the scraped page
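
One possible way to separate these header groups in client code is sketched below in Python. The helper `split_headers` and the constants are hypothetical names chosen for illustration. The sketch assumes that any header starting with Spb- other than the three added ones comes from the scraped website.

```python
PREFIX = "spb-"
# The three headers the API adds on top of the scraped website's own headers.
ADDED = {"spb-cost", "spb-initial-status-code", "spb-resolved-url"}


def split_headers(headers):
    """Partition response headers into (web app server, scraped website, added)."""
    api, site, added = {}, {}, {}
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if key in ADDED:
            added[key[len(PREFIX):]] = value
        elif key.startswith(PREFIX):
            # Strip the prefix to recover the scraped website's header name.
            site[key[len(PREFIX):]] = value
        else:
            api[key] = value
    return api, site, added
```

With the example headers above, `Content-Type: application/json` would land in the first group and `Spb-content-type: text/html; charset=utf-8` in the second, under the name `content-type`.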

And if you're using json_response=True to scrape this same website, the response body will look like this:

{
  # Headers sent by the server
  "headers": {
    "Content-type": "text/html; charset=utf-8",
    "Vary": "Accept-Encoding"
  },
  # Credit cost of your request
  "cost": 5,
  # Initial status code of the server
  "initial-status-code": 200,
  # Resolved URL (following redirection)
  "resolved-url": "https://www.website.com/",
  # Type of the response: "html", "json", or "b64_bytes" for files, images, PDFs, ...
  "type": "html",
  # Content of the response. It is base64-encoded if it is a file, image, PDF, ...
  "body": "<html>... </body>",
  # Cookies sent back by the server
  "cookies": [
    {
        "name": "cookie_name",
        "value": "cookie_value",
        "domain": "test.com",
        ...
    },
    ...
  ],
  # XHR / Ajax requests sent by the browser
  "xhr": [
    {
      # URL
      "url": "https://",
      # status code of the server
      "status_code": 200,
      # Method of the request
      "method": "POST",
      # Headers of the XHR / Ajax request
      "headers": {
        "pragma": "no-cache",
        ...
      },
      # Response of the XHR / Ajax request
      "body": "2d,x"
    },
    ...
  ]
}
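
The `type` and `body` keys above suggest how a client could recover the scraped content. The following Python sketch is only an illustration: `read_body` is a hypothetical helper, and it assumes the real response body is valid JSON (the comments and `...` in the example above are annotations for this article).

```python
import base64
import json


def read_body(response_text):
    """Return the scraped content from a json_response payload.

    Text content ("html" or "json") is returned as a string;
    "b64_bytes" content is decoded from base64 and returned as bytes.
    """
    payload = json.loads(response_text)
    body = payload["body"]
    if payload["type"] == "b64_bytes":
        # Files, images, PDFs, ... arrive base64-encoded.
        return base64.b64decode(body)
    return body
```

Returning bytes for the `b64_bytes` case lets the caller write the result directly to a file opened in binary mode.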

Updated on: 13/10/2021
