Wednesday, August 09, 2006

Parsing AOL Search Data  [Digg.com This!]

Here's some quick-and-dirty parsing code for parsing the recently 'released' AOL search data. Instructions are specifically for SQL Server 2005 (as CSV import is a nightmare), but it should be ok for SQL Server 2000 and you can modify the Perl for MySQL I'd imagine.

1) Create a database called AOL

2) Create the table:

CREATE TABLE [dbo].[AllData](
[AnonID] [varchar](10)
COLLATE Latin1_General_CI_AS NULL,
[Query] [varchar](150)
COLLATE Latin1_General_CI_AS NULL,
[QueryTime] [varchar](50)
COLLATE Latin1_General_CI_AS NULL,
[ItemRank] [varchar](50)
COLLATE Latin1_General_CI_AS NULL,
[ClickURL] [varchar](250)
COLLATE Latin1_General_CI_AS NULL
) ON [PRIMARY]
3) Create a DSN for your database:

Control Panel -> Administrative Tools -> Data Sources (ODBC)

4) Install ActivePerl or your favourite perl interpreter if you haven't already

5) Install DBI and DBD::ODBC if you haven't already by running ppm3-bin from your Perl\bin directory (in the case of ActivePerl anyway) and type:

install dbi
install dbd-odbc

6) Copy the following Perl into a file and run it. Change the filename (user-....) for each of the ten files and amend as required.

use DBI;
use DBD::ODBC;

$dbh = DBI->connect('DBI:ODBC:aol', "", "");

if (!$dbh)
{
print "z $DBI::err\n$DBI::errstr\n$DBI::state";
}

my $sth = $dbh->prepare(<<SQL);
INSERT INTO AllData
(AnonID, Query, QueryTime, ItemRank, ClickURL)
VALUES (?, ?, ?, ?, ?)
SQL

$FILE = "user-ct-test-collection-01.txt";
open FILE;
$sText = <FILE>;

print "Entering loop\n";

while ($sText ne "")
{
my ($v1, $v2, $v3, $v4, $v5) = split /\t/, $sText;

#print "v1:$v1 v2:$v2 v3:$v3 v4:$v4 v5:$v5\n";

$sth->execute($v1, $v2, $v3, $v4, $v5);

$sText = <FILE>;
}

close FILE;

print "End\n";

1 Comments:

At 1:15 AM, Vaandoo said...

Here's a list of free PERL books for quick reference:
Top 4 Perl Books for free download.

 

Post a Comment

<< Home