Trim down Twitter API responses with CakePHP’s Set::extract

September 9th, 2009

I’ve been working a lot with the Twitter API lately; it’s been a frustrating experience, especially after being spoiled by the substantially more robust Facebook platform. One issue of particular concern was the volume of data being returned per request.

I am caching the response data, and I found that a list of 100 followers was taking up over 150KB of cache. One popular Twitter user could then easily account for well over 1MB of cache files, even though I only needed about 4 fields from the XML response. I knew the CakePHP Set class was the answer; I just didn’t know which method. After a lot of trial and error, I figured out how to cut the fat with Set::extract. I’m not sure this is the most efficient solution, or that my regular expression is efficient (I never did take the time to learn regular expressions), but my cache files are now ~18KB vs. the ~150KB they used to be.

I’m using the Twitter datasource, which transforms Twitter’s XML response into an array. This is necessary because Set::extract expects an array, not an XML object. Assuming a $followers array:

$followers = Set::extract($followers, 'Users.User.{n}.{(id|name|profile_image_url|screen_name)}');

The {n} wildcard lets every numeric key within ['Users']['User'] survive the extract.
The {(id|name|profile_image_url|screen_name)} pattern is a regular expression matched against each key, so only the data under those keys (id, name, profile_image_url and screen_name) is kept; my $followers array is therefore reduced to:

Array
(
    [0] => Array
        (
            [id] => XXXXXXXX
            [name] => XXXXXXXX
            [screen_name] => XXXXXXXX
            [profile_image_url] => XXXXXXXX
        )

    [1] => Array
        (
            [id] => XXXXXXXX
            [name] => XXXXXXXX
            [screen_name] => XXXXXXXX
            [profile_image_url] => XXXXXXXX
        )
)
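
For comparison, here is a minimal plain-PHP sketch of the same reduction, for anyone who wants the effect without the Set class. It is not how Set::extract works internally; keepFields() is a hypothetical helper, the sample rows are made up, and it matches exact key names rather than a regular expression.

```php
<?php
// keepFields() is a hypothetical helper, not part of CakePHP.
// It keeps only the listed fields in every numerically keyed row.
function keepFields(array $rows, array $fields)
{
    // Turn the field names into keys so array_intersect_key can use them.
    $wanted = array_flip($fields);
    $slim = array();
    foreach ($rows as $i => $row) {
        // Roughly what {n} does: only numeric keys survive.
        if (is_numeric($i) && is_array($row)) {
            $slim[$i] = array_intersect_key($row, $wanted);
        }
    }
    return $slim;
}

// Made-up sample data, shaped like the array described above.
$followers = array('Users' => array('User' => array(
    array(
        'id' => 1,
        'name' => 'Alice',
        'screen_name' => 'alice',
        'location' => 'NYC',
        'profile_image_url' => 'http://example.com/a.png',
    ),
    array(
        'id' => 2,
        'name' => 'Bob',
        'screen_name' => 'bob',
        'location' => 'LA',
        'profile_image_url' => 'http://example.com/b.png',
    ),
)));

$slim = keepFields(
    $followers['Users']['User'],
    array('id', 'name', 'profile_image_url', 'screen_name')
);
print_r($slim);
```

array_intersect_key preserves the key order of the original rows, which is why the fields come out in the order of the source data rather than the order they were listed in.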

There’s some initial processing overhead, but I expect it to be more than offset by iterating over a much lighter array in the various actions.
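
One rough way to check the savings is to compare the serialized size of the array before and after trimming. The sketch below assumes the cache stores serialized PHP data, and the 100 followers are generated placeholders (the description and status fields stand in for everything that gets thrown away), so the numbers it prints will not match the real Twitter figures above.

```php
<?php
// Placeholder data: 100 fake followers, each with a few extra fields.
$full = array();
for ($i = 0; $i < 100; $i++) {
    $full[$i] = array(
        'id' => $i,
        'name' => "User $i",
        'screen_name' => "user$i",
        'profile_image_url' => "http://example.com/$i.png",
        'description' => str_repeat('x', 140),             // not needed
        'status' => array('text' => str_repeat('y', 140)), // not needed
    );
}

// Keep only the four fields of interest in each row.
$wanted = array_flip(array('id', 'name', 'screen_name', 'profile_image_url'));
$slim = array();
foreach ($full as $i => $row) {
    $slim[$i] = array_intersect_key($row, $wanted);
}

// Compare the size of what would be written to the cache.
$fullBytes = strlen(serialize($full));
$slimBytes = strlen(serialize($slim));
printf("full: %d bytes, trimmed: %d bytes\n", $fullBytes, $slimBytes);
```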

kettle server-side