Channel: BlackDog Foundry

Scraping Websites (and other data) Using Objective-C


Introducing ScrapeKit

I recently needed to scrape some data from a website in a Mac OS X application. I was very conscious that the website's layout could change occasionally, and each change would leave me redistributing a new build of my application.

This was very unappealing, so I started thinking about how I could write a very simple set of text-based rules that describe how my application scrapes data from a page. That way, the application could periodically poll a file on my server to make sure it had the latest set of rules and continue on its merry way.

So, without further ado, allow me to introduce ScrapeKit. An example of how to use it is shown below:

Imagine that your input looks like:

<ol>
  <li>abc</li>
  <li>def</li>
  <li>ghi</li>
</ol>

To extract the list items, your general logic would be:

  • Create an array to hold the resulting items
  • Look for text between <li> and </li> tags
  • Repeat while there are more tags
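The three steps above can be sketched in plain C (which also compiles as Objective-C) without ScrapeKit at all. This is only an illustration of the logic, not how ScrapeKit is implemented, and the helper name `extract_between` and its signature are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of ScrapeKit): collects every substring that
   lies between `open` and `close` in `input`, excluding the tags themselves.
   Returns a malloc'd array of malloc'd strings and stores the count in *count. */
char **extract_between(const char *input, const char *open, const char *close,
                       size_t *count)
{
    size_t capacity = 4, n = 0;
    char **items = malloc(capacity * sizeof *items);  /* step 1: the result array */
    const char *cursor = input;

    if (items == NULL) {
        *count = 0;
        return NULL;
    }

    for (;;) {
        /* step 2: find the next opening tag, then the closing tag after it */
        const char *start = strstr(cursor, open);
        if (start == NULL)
            break;                     /* no more tags: stop */
        start += strlen(open);
        const char *end = strstr(start, close);
        if (end == NULL)
            break;                     /* unterminated tag: stop */

        size_t len = (size_t)(end - start);
        char *item = malloc(len + 1);
        if (item == NULL)
            break;
        memcpy(item, start, len);
        item[len] = '\0';

        if (n == capacity) {           /* grow the array when it is full */
            capacity *= 2;
            char **grown = realloc(items, capacity * sizeof *grown);
            if (grown == NULL) {
                free(item);
                break;
            }
            items = grown;
        }
        items[n++] = item;

        /* step 3: repeat, continuing just after the closing tag */
        cursor = end + strlen(close);
    }

    *count = n;
    return items;
}
```

Calling `extract_between(html, "<li>", "</li>", &count)` on the sample input yields three strings, `abc`, `def` and `ghi`; the caller frees each string and then the array itself.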

A script to achieve this might look something like:

@main
  createvar NSMutableArray elements
  pushbetween <li> exclude </li> exclude
  iffailure end
  :loop
    popIntoVar elements
    pushbetween <li> exclude </li> exclude
    iffailure end
    goto loop
  :end
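One way to read the script is as a tiny stack machine, and it transcribes almost line for line into C using C's own labels and `goto`. The semantics here are assumptions inferred from the instruction names, not taken from ScrapeKit: `pushbetween` searches forward from a cursor and pushes the text it finds, `exclude` drops that delimiter from the pushed text, `iffailure` tests the previous instruction, and a one-slot stack is enough for this script. Every name (`Engine`, `StringArray`, `push_between`, `pop_into_var`, `run_main`) is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ITEMS 64  /* fixed capacity keeps the sketch short */

/* Hypothetical engine state: a cursor into the input, a one-slot stack
   (enough for this script) and the flag that iffailure tests. */
typedef struct {
    const char *cursor;
    char *top;
    bool failed;
} Engine;

/* Hypothetical stand-in for the NSMutableArray created by createvar. */
typedef struct {
    char *items[MAX_ITEMS];
    size_t count;
} StringArray;

/* Assumed meaning of "pushbetween <open> include|exclude <close> include|exclude":
   search forward from the cursor, push the text between the delimiters
   (keeping each delimiter only if asked) and move the cursor past close. */
static void push_between(Engine *e, const char *open, bool keep_open,
                         const char *close, bool keep_close)
{
    const char *start = strstr(e->cursor, open);
    const char *end = start ? strstr(start + strlen(open), close) : NULL;
    e->failed = (end == NULL);
    if (e->failed)
        return;

    const char *from = keep_open ? start : start + strlen(open);
    const char *to = keep_close ? end + strlen(close) : end;
    size_t len = (size_t)(to - from);
    e->top = malloc(len + 1);
    if (e->top == NULL) {
        e->failed = true;
        return;
    }
    memcpy(e->top, from, len);
    e->top[len] = '\0';
    e->cursor = end + strlen(close);
}

/* Assumed meaning of "popIntoVar": move the pushed value into the array. */
static void pop_into_var(Engine *e, StringArray *var)
{
    if (e->top != NULL && var->count < MAX_ITEMS)
        var->items[var->count++] = e->top;
    else
        free(e->top);
    e->top = NULL;
}

/* The example script, transcribed line by line. */
static void run_main(Engine *e, StringArray *elements)     /* @main */
{
    elements->count = 0;                                   /* createvar NSMutableArray elements */
    push_between(e, "<li>", false, "</li>", false);        /* pushbetween <li> exclude </li> exclude */
    if (e->failed) goto end;                               /* iffailure end */
loop:                                                      /* :loop */
    pop_into_var(e, elements);                             /* popIntoVar elements */
    push_between(e, "<li>", false, "</li>", false);        /* pushbetween <li> exclude </li> exclude */
    if (e->failed) goto end;                               /* iffailure end */
    goto loop;                                             /* goto loop */
end:                                                       /* :end */
    return;
}
```

The transcription also shows why the script pushes once before `:loop`: the loop body pops first, so it must only be entered after something has been pushed successfully.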

To run this script with ScrapeKit, you would use (assuming ARC):

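#import <Foundation/Foundation.h>
#import <ScrapeKit/ScrapeKit.h>

NSString *script = ...;  // the ScrapeKit script shown above
NSString *input  = ...;  // the HTML to scrape

SKEngine *engine = [[SKEngine alloc] init];
[engine compile:script error:nil];  // passing nil discards any compile error details
[engine parse:input];

// "elements" is the variable declared by createvar in the script
NSMutableArray *elements = [engine variableFor:@"elements"];
for (NSString *element in elements) {
  NSLog(@"List element = %@", element);
}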

For more info, please see the ScrapeKit readme and associated documentation.

