I am scraping some JSON data from a website, which works pretty well: I can log in and download the necessary data. However, in one case I have to download an HTML page and extract the information from the HTML itself.
I've modified the request headers so that they match the ones visible in Chrome's developer tools (F12).
Request request = new Request.Builder().url(url)
        .header("Host", "www.host.com")
        .header("Connection", "Keep-Alive")
        .header("Cache-Control", "max-age=0")
        .header("Upgrade-Insecure-Requests", "1")
        .header("User-Agent", this.user_agent_user_for_this_session)
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
        .header("Accept-Encoding", "gzip, deflate, br")
        .header("Accept-Language", "en-US,en;q=0.9,fr;q=0.8,nl;q=0.7,de;q=0.6,af;q=0.5")
        .get().build();
Response response = client.newCall(request).execute();
String html = IOUtils.toString(new GZIPInputStream(response.body().byteStream()));
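One thing worth noting about the snippet above: the request advertises `gzip, deflate, br`, but the body is always read through `GZIPInputStream`. OkHttp only decompresses transparently when it sets `Accept-Encoding` itself, so with a manual header the decoding is up to the caller. Below is a minimal sketch of choosing the decoder from the `Content-Encoding` response header instead. The class name `BodyDecoder` and its methods are hypothetical, the body is assumed to be UTF-8, and Brotli (`br`) is deliberately rejected because `java.util.zip` cannot decode it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of OkHttp.
public class BodyDecoder {

    // Decodes a raw response body according to the Content-Encoding header value.
    // contentEncoding may be null when the server sent the body uncompressed.
    // Assumption: the decoded bytes are UTF-8 text.
    public static String decode(byte[] body, String contentEncoding) throws IOException {
        String enc = contentEncoding == null
                ? "identity"
                : contentEncoding.trim().toLowerCase(Locale.ROOT);
        InputStream in = new ByteArrayInputStream(body);
        if (enc.equals("gzip")) {
            in = new GZIPInputStream(in);
        } else if (enc.equals("deflate")) {
            // Handles zlib-wrapped deflate only; raw deflate streams are not covered here.
            in = new InflaterInputStream(in);
        } else if (!enc.isEmpty() && !enc.equals("identity")) {
            // e.g. "br": needs a third-party decoder, so fail loudly instead of guessing.
            throw new IOException("Unsupported Content-Encoding: " + enc);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Demo-only helper that gzips a string so the sketch runs without a network call.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>hello</html>");
        System.out.println(decode(compressed, "gzip"));
        System.out.println(decode("<html>plain</html>".getBytes(StandardCharsets.UTF_8), null));
    }
}
```

With OkHttp this would presumably be called as `BodyDecoder.decode(response.body().bytes(), response.header("Content-Encoding"))`. Dropping `br` from the `Accept-Encoding` header, or removing the header entirely and letting OkHttp handle gzip on its own, would avoid the unsupported case altogether.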
In addition, the HTML that is downloaded looks identical to the HTML file shown for the first request in Chrome's network view (I copied and pasted the content, and the file sizes are the same).
So, is there some additional analysis I should perform on the request?