This article represents take-away code sample that could be used to get or scrape content from a given URL. Those wanting to do a quick web scrape could use this piece of code. I shall be posting a series of blogs which would help one to create a web scraper using Java. The reason why I am hooked to Java web scraping is the need to get the data from web for data analysis. Please feel free to comment/suggest if I missed to mention one or more important points. Also, sorry for the typos.
Code Sample – Get Content from URL
Pay attention to some of the following aspects of fetching content from a given URL:
- Create a URL object using actual URL string
- Create a URLConnection object using that URL object created in above step
- Set the configuration parameters. Key is to note the connection and read timeout. At times when scraping the websites, it helps with slow websites.
- Create a BufferedReader for reading the data
- Read line by line
public String getContent(String urlstr) { URL url = null; StringBuilder contentb = new StringBuilder(); try { // get URL content url = new URL(urlstr); // Create a URL Connection Object URLConnection conn = (HttpURLConnection) url.openConnection(); // Set the configuration parameters // Note the readTimeOut set to 30 seconds. // This is quite important when you are planning to scrape URLs. conn.setConnectTimeout(100000); conn.setReadTimeout(30000); conn.connect(); // open the stream and put it into BufferedReader BufferedReader br = new BufferedReader(new InputStreamReader( conn.getInputStream())); String inputLine; while ((inputLine = br.readLine()) != null) { contentb.append(inputLine); contentb.append("\n"); } br.close(); } catch (MalformedURLException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } return contentb.toString(); }
Latest posts by Ajitesh Kumar (see all)
- Agentic Reasoning Design Patterns in AI: Examples - October 18, 2024
- LLMs for Adaptive Learning & Personalized Education - October 8, 2024
- Sparse Mixture of Experts (MoE) Models: Examples - October 6, 2024
I found it very helpful. However the differences are not too understandable for me