If you need to do some web scraping with Ruby, it can be hard to know where to start. Which Ruby web scraping library should you use? Mechanize, nokogiri, phantomjs, poltergeist, selenium-webdriver, watir, upton, anemone, spidr, scrubyt, or wombat?
As is usually the case, it depends (on your project and your requirements), but here are three simple rules you can follow to make the decision easier.
1. Don't use anything that isn't maintained
This rules out the poltergeist gem, because it depends on phantomjs, which is no longer maintained. It also rules out older Ruby gems like anemone and scrubyt.
The mechanize gem is probably the easiest Ruby web scraping library to get started with, compared to browser-based solutions like selenium-webdriver (both have non-Ruby dependencies, which is where most installation problems come from). It has been around for over a decade, so it is relatively battle tested. It can be used for crawling websites, filling out forms, and getting data out of web pages, so it's a good 80/20 solution.
Learn how to scrape websites with Mechanize
Learn how to scrape websites with Watir