The Internet, with its profusion of information, has made us hungry for
ever more, ever better data. Out of necessity, many of us have become
pretty adept with search engine queries, but there are times when even
the most powerful search engines aren't enough. If you've ever wanted
your data in a different form than it's presented, or wanted to collect
data from several sites and see it side-by-side without the constraints
of a browser, then Spidering Hacks is for you.

Spidering Hacks takes you to the next level in Internet data retrieval,
beyond search engines, by showing you how to create spiders and bots to
retrieve information from your favorite sites and data sources. You'll no
longer feel constrained by the way host sites think you want to see their
data presented: you'll learn how to scrape and repurpose raw data so you
can view it in a way that's meaningful to you.

Written for developers,
researchers, technical assistants, librarians, and power users,
Spidering Hacks provides expert tips on spidering and scraping
methodologies. You'll begin with a crash course in spidering concepts,
tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know
when you've gone too far: what's acceptable and unacceptable). Next,
you'll collect media files and data from databases. Then you'll learn
how to interpret and understand the data, repurpose it for use in other
applications, and even build authorized interfaces to integrate the data
into your own content. By the time you finish Spidering Hacks, you'll
be able to:
- Aggregate and associate data from disparate locations, then store and
manipulate the data as you like
- Gain a competitive edge in business by knowing when competitors'
products are on sale, and comparing sales ranks and product placement
on e-commerce sites
- Integrate third-party data into your own applications or web sites
- Make your own site easier to scrape and more usable to others
- Keep up-to-date with your favorite comic strips, news stories, stock
tips, and more without visiting the site every day

Like the other books in O'Reilly's popular Hacks series, Spidering
Hacks brings you 100 industrial-strength tips and tools from the
experts to help you master this technology. If you're interested in data
retrieval of any type, this book provides a wealth of data for finding a
wealth of data.