Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2574 posts at DZone.

Archive-It: Scaling Beyond a Billion Archival Webpages

Archive-It: Scaling Beyond a Billion Archival Webpages, Aaron Binns, Internet Archive, Eurocon 2011 from Lucene Revolution on Vimeo.

This session describes Archive-It, the Internet Archive's subscription, self-serve web archiving service, with a focus on its full-text search system. With nearly 200 partners and over 2,000 collections, the custom Lucene-based system handles more than 3 million index updates per day across an index that totals over 1.3 billion documents. The session gives a detailed description of the architecture and implementation of the Archive-It search system, highlighting many of the challenges that arise from its scale and its complex use cases.