Trim down Twitter API responses with CakePHP’s Set::extract
I’ve been working a lot with the Twitter API lately; it’s been a frustrating experience, especially after being spoiled by the substantially more robust Facebook platform. One issue of particular concern was the volume of data being returned per request.
I am caching the response data, and I found that retrieving a list of 100 followers was storing over 150KB of cached data. A single popular Twitter user could then easily account for well over 1MB of cache files, yet I only needed about 4 fields from the XML response. I knew the CakePHP Set class was the answer; I just didn’t know which method. After a lot of trial and error, I figured out how to cut the fat with Set::extract. I’m not sure this is the most efficient solution, or that my regular expression is efficient (I never did take the time to learn regular expressions), but my cache files are now ~18KB vs. the ~150KB they used to be.
I’m using the Twitter datasource, which transforms Twitter’s XML response into an array. This is necessary because Set::extract expects an array, not an XML object. Assuming a $followers array:
$followers = Set::extract($followers, 'Users.User.{n}.{(id|name|profile_image_url|screen_name)}');
The {n} wildcard will allow every numeric key within ['Users']['User'] to survive the extract.
The {(id|name|profile_image_url|screen_name)} part is a regular expression wrapped in braces; it keeps only the data stored under matching keys (id, name, profile_image_url and screen_name). My $followers array is therefore reduced to:
Array
(
    [0] => Array
        (
            [id] => XXXXXXXX
            [name] => XXXXXXXX
            [screen_name] => XXXXXXXX
            [profile_image_url] => XXXXXXXX
        )
    [1] => Array
        (
            [id] => XXXXXXXX
            [name] => XXXXXXXX
            [screen_name] => XXXXXXXX
            [profile_image_url] => XXXXXXXX
        )
)
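For anyone who wants to try the idea without a CakePHP install, here is a minimal plain-PHP sketch of the same trimming. It is not how Set::extract works internally; it simply whitelists keys with array_intersect_key. The trimFollowers function and the sample data are hypothetical, and the input is assumed to have the Users.User.{n} shape described above.

```php
<?php
// Hypothetical sample shaped like the datasource output described above
// (Users -> User -> numerically indexed followers). Values are made up.
$followers = array(
    'Users' => array(
        'User' => array(
            array(
                'id' => '1001',
                'name' => 'Alice Example',
                'screen_name' => 'alice',
                'location' => 'Nowhere',
                'description' => 'Sample bio',
                'profile_image_url' => 'http://example.com/alice.png',
                'followers_count' => '42',
            ),
            array(
                'id' => '1002',
                'name' => 'Bob Example',
                'screen_name' => 'bob',
                'location' => 'Elsewhere',
                'description' => 'Another sample bio',
                'profile_image_url' => 'http://example.com/bob.png',
                'followers_count' => '7',
            ),
        ),
    ),
);

// Hypothetical helper: keep only the whitelisted keys of each follower.
function trimFollowers(array $response, array $keep)
{
    // array_flip turns the list of names into keys for array_intersect_key.
    $whitelist = array_flip($keep);
    $trimmed = array();
    foreach ($response['Users']['User'] as $user) {
        // Keys keep the order they had in the original follower array.
        $trimmed[] = array_intersect_key($user, $whitelist);
    }
    return $trimmed;
}

$keep = array('id', 'name', 'profile_image_url', 'screen_name');
$trimmed = trimFollowers($followers, $keep);
print_r($trimmed);
```

The result is a flat, numerically indexed list with only the four fields per follower, which is the same shape as the output shown above.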
There’s some initial processing overhead, but I’m sure that will be more than offset by iterating through a much lighter array in the various actions.
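If you want to check the savings on your own data rather than take my word for it, a rough sketch like the following compares the serialized size of one record before and after trimming. It assumes the cache stores PHP-serialized data; cachedSize is a hypothetical helper, and the record is made-up sample data, so the numbers it prints say nothing about real Twitter responses.

```php
<?php
// Hypothetical helper: size in bytes of the PHP-serialized form of $data,
// used here as a stand-in for what ends up in a cache file.
function cachedSize($data)
{
    return strlen(serialize($data));
}

// Made-up follower record with a few extra fields beyond the four we keep.
$full = array(
    'id' => '1001',
    'name' => 'Alice Example',
    'screen_name' => 'alice',
    'location' => 'Nowhere',
    'description' => 'Sample bio that takes up space in the cache',
    'profile_image_url' => 'http://example.com/alice.png',
    'followers_count' => '42',
);

// Keep only the four fields used in this post.
$keep = array_flip(array('id', 'name', 'profile_image_url', 'screen_name'));
$slim = array_intersect_key($full, $keep);

printf("full: %d bytes, trimmed: %d bytes\n", cachedSize($full), cachedSize($slim));
```

Running the same comparison on a real cached response, and timing the extract with microtime(true), would show whether the processing overhead is worth it in your case.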
