Open Source Crawlers in Java

If you need to crawl the web to gather information, I would suggest a few options:

  1. Smart and Simple Web Crawler - https://crawler.dev.java.net/
  2. WebSPHINX - http://www.cs.cmu.edu/~rcm/websphinx/
  3. Archive-Crawler - http://archive-crawler.sourceforge.net

Of these, I would recommend Smart and Simple Web Crawler (https://crawler.dev.java.net/).
Why? Because it is the one I have actually used.

It is a fairly simple crawler, but it serves the purpose: it loads each page and parses it.
While crawling it fires the appropriate events (i.e. it supports an event model), so your own code can react to what the crawler finds.

It has HTML and HTTP parsing capabilities.
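
To give a feel for what "load, parse, fire events" looks like, here is a minimal sketch in plain Java. It is not the API of Smart and Simple Web Crawler or of any other library listed above: CrawlListener, SimpleCrawler, CrawlerDemo, and the example.test URLs are all names I made up for illustration. The pages are served from an in-memory map instead of real HTTP so the example runs offline, and links are pulled out with a naive regex rather than a real HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demo entry point (hypothetical name); crawls three fake pages and prints the events.
class CrawlerDemo {
    public static void main(String[] args) {
        // In-memory "web": URL -> HTML. Stands in for real HTTP fetching.
        Map<String, String> pages = Map.of(
            "http://example.test/", "<a href=\"http://example.test/a\">A</a>",
            "http://example.test/a", "<a href=\"http://example.test/\">home</a>");

        SimpleCrawler crawler = new SimpleCrawler(pages::get);
        crawler.addListener(new CrawlListener() {
            public void pageLoaded(String url, String html) {
                System.out.println("loaded " + url);
            }
            public void linkFound(String fromUrl, String toUrl) {
                System.out.println("link   " + fromUrl + " -> " + toUrl);
            }
        });
        crawler.crawl("http://example.test/", 10);
    }
}

// Hypothetical event interface: the crawler calls these while it works.
interface CrawlListener {
    void pageLoaded(String url, String html);
    void linkFound(String fromUrl, String toUrl);
}

// Hypothetical breadth-first crawler with a pluggable page fetcher.
class SimpleCrawler {
    // Naive link extraction: only matches double-quoted href attributes.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    private final Function<String, String> fetcher; // returns HTML, or null if unavailable
    private final List<CrawlListener> listeners = new ArrayList<>();

    SimpleCrawler(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    void addListener(CrawlListener listener) {
        listeners.add(listener);
    }

    // Visits at most maxPages URLs starting from startUrl; returns the URLs attempted.
    Set<String> crawl(String startUrl, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already handled this URL
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be loaded
            }
            for (CrawlListener l : listeners) {
                l.pageLoaded(url, html);
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                for (CrawlListener l : listeners) {
                    l.linkFound(url, link);
                }
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

The point of the listener is that the crawling loop stays generic: whatever you want to do with a page (index it, save it, count links) goes into your own CrawlListener, and swapping the fetcher for one that does real HTTP would not change the rest of the code.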
