getting and sending cookies
Thursday, May 3 2012
Today as I worked my way through a series of specifications for a "health check" system for a website, I realized that I would have to simulate a logged-in user as I performed the actions of a web spider. A web spider is a piece of software running on a computer that behaves like a human with a web browser but that operates robotically, downloading and parsing content and then maybe following links, perhaps to other pages where the process can repeat. I've written web spiders in the past, but I've never had the challenge of impersonating a user who has logged in. Normally a login works because the website sets a cookie in the user's browser, but a spider has no browser. To simulate the existence of a cookie, the spider has to notice when the website tries to set a cookie, store it, and then send that cookie with every subsequent hit to the site. I'd never done anything like this before, and at first I had no idea how to proceed. But it turns out that the cURL library (which recent versions of PHP support through an extension) can both return the response headers (where a cookie is initially sent) and send a cookie back along with the rest of a web request. The following function sends a PHP session ID along with a web request, allowing the spider to maintain a logged-in session.
function do_curl_request($url, $data, $sessid)
{
    // Build a URL-encoded string (key1=value1&key2=value2) from the POST data
    $fields_string = http_build_query($data);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 0); // fetch the body, not just the headers
    curl_setopt($ch, CURLOPT_POST, 1); // send the request as a POST
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers (and any cookie) in the result
    if($sessid != "")
    {
        // send the stored session cookie back to the server
        curl_setopt($ch, CURLOPT_COOKIE, "PHPSESSID=" . $sessid . ";");
    }
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
To get the sessid in the first place, you can parse that $result string (which will contain the response headers) and look for content that begins with "PHPSESSID=" and ends with ";". It's rare at this point in my career that I impress myself with newly gained powers, but being able to spider a website as if I were a logged-in user was such an occasion.
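As a rough sketch of that parsing step, something like the following might work. The function name extract_sessid and the sample response are made up for illustration; the sketch assumes the session cookie arrives in a "Set-Cookie:" header and that its value ends at a ";" or at the end of the line.

```php
<?php
// Hypothetical helper (not part of the function above): pull the PHPSESSID
// value out of a raw response string that still includes its headers.
function extract_sessid($response)
{
    // Look for a header line such as "Set-Cookie: PHPSESSID=abc123; path=/".
    // The "m" flag lets ^ match at the start of each line; "i" ignores the
    // case of the header name. The value stops at ";" or at whitespace.
    if (preg_match('/^Set-Cookie:\s*PHPSESSID=([^;\s]+)/mi', $response, $matches)) {
        return $matches[1];
    }
    return ""; // no session cookie found
}

// Made-up sample of what curl_exec() might return with CURLOPT_HEADER on
$sample = "HTTP/1.1 200 OK\r\n"
        . "Content-Type: text/html\r\n"
        . "Set-Cookie: PHPSESSID=abc123def456; path=/\r\n"
        . "\r\n"
        . "<html>logged in</html>";

echo extract_sessid($sample), "\n"; // prints abc123def456
```

The value it returns could then be passed as the third argument to do_curl_request on later requests. An empty string means no session cookie was seen, which matches the check for $sessid != "" in that function.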
This evening after a long day of productive work, Gretchen and I drove down to New Paltz on the pretense of picking up Nancy at the bus station there. (Nancy had put in a workday down in Manhattan working as a graphic designer for a print publication.) The exciting thing about going to New Paltz was our plan to eat dinner at P&G, the sprawling murkily-lit expanse of worn wooden booths where the food is cheap and delicious and the atmosphere is cozy and vibrant. The other day Gretchen had been at P&G on a weekend night, when it had been overrun by the fake ID college date rapist demographic, but tonight it was lower key and peopled by a diverse mix of ages. The people at the bar all looked like regulars: aging alcoholics allowed to drink too much and talking just a bit too loud. That part was like a dive bar, but with all of the good things and none of the bad. We all ordered veggie burgers and fries and I had a Lagunitas IPA, the first sip of which tasted exactly like ruby red grapefruit juice.
For linking purposes this article's URL is: http://asecular.com/blog.php?120503