What is headless web browsing?
It's using a web-browser-like application to fetch and analyse
web pages automatically, without a human user present. This is
different from simply fetching HTML content via HTTP; headless
web browsers typically also load images, process JavaScript and
CSS, and lay out the page content (albeit invisibly).
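To make the contrast concrete, here is a minimal sketch of the "plain HTML" side only, using Python's standard library. The page content, the NaiveTextExtractor class and the extract_text helper are all made up for illustration; no real site or headless browser is involved. It shows what a tool sees when it reads the markup without running scripts or applying CSS.

```python
from html.parser import HTMLParser

# Hypothetical page: the headline a human would see is written by a script,
# and a CSS rule hides the tracking notice from human readers.
RAW_HTML = """
<html>
  <head>
    <style>.hidden { display: none; }</style>
  </head>
  <body>
    <div id="headline"></div>
    <div class="hidden">Internal tracking notice</div>
    <script>
      document.getElementById("headline").textContent = "Breaking news";
    </script>
  </body>
</html>
"""


class NaiveTextExtractor(HTMLParser):
    """Collects text nodes without running scripts or applying CSS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text a markup-only tool would extract from the page."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


static_view = extract_text(RAW_HTML)
print(static_view)
```

The markup-only view contains the notice that CSS hides from a human reader and misses the headline that the script would insert. A headless browser, by running the script and applying the stylesheet, would be expected to arrive at the opposite result.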
The developer can then use scripting (usually JavaScript) to
examine the page as it is laid out in memory, as if in a "real"
web browser, for example to inspect how text is styled.
We could even use OCR to look for text within images shown in the
page.
Why?
- Analyse the content of pages more effectively. Many pages now contain a huge amount of uninteresting "boilerplate" text, often in HTML elements without semantic meaning (e.g. DIV). Only by applying CSS (and sometimes running JavaScript) can a computer see the page as a human would.
- Generation …