DBD::SQLite::Cookbook - The DBD::SQLite Cookbook
This is the DBD::SQLite cookbook.
It is intended to provide a place to keep a variety of functions and formals for use in callback APIs in DBD::SQLite.
This is a simple aggregate function which returns a variance. It is adapted from an example implementation in pysqlite.
package variance; sub new { bless [], shift; } sub step { my ( $self, $value ) = @_; push @$self, $value; } sub finalize { my $self = $_[0]; my $n = @$self; # Variance is NULL unless there is more than one row return undef unless $n || $n == 1; my $mu = 0; foreach my $v ( @$self ) { $mu += $v; } $mu /= $n; my $sigma = 0; foreach my $v ( @$self ) { $sigma += ($x - $mu)**2; } $sigma = $sigma / ($n - 1); return $sigma; } # NOTE: If you use an older DBI (< 1.608), # use $dbh->func(..., "create_aggregate") instead. $dbh->sqlite_create_aggregate( "variance", 1, 'variance' );
The function can then be used as:
SELECT group_name, variance(score) FROM results GROUP BY group_name;
A more efficient variance function, optimized for memory usage at the expense of precision:
package variance2; my $sum = 0; my $count = 0; my %hash; sub new { bless [], shift; } sub step { my ( $self, $value ) = @_; # by truncating and hashing, we can comsume many more data points $value = int($value); # change depending on need for precision # use sprintf for arbitrary fp precision if (defined $hash{$value}) { $hash{$value}++; } else { $hash{$value} = 1; } $sum += $value; $count++; } sub finalize { my $self = $_[0]; # Variance is NULL unless there is more than one row return undef unless $count > 1; # calculate avg my $mu = $sum / $count; my $sigma = 0; foreach my $h (keys %hash) { $sigma += (($h - $mu)**2) * $hash{$h}; } $sigma = $sigma / ($count - 1); return $sigma; }
SELECT group_name, variance2(score) FROM results GROUP BY group_name;
A third variable implementation, designed for arbitrarily large data sets:
package variance; my $mu = 0; my $count = 0; my $S = 0 sub new { bless [], shift; } sub step { my ( $self, $value ) = @_; $count++; $delta = $value - $mu; $mu = $mu + $delta/$count $S = $S + $delta*($value - $mu); } sub finalize { my $self = $_[0]; return $S / ($count - 1); }
SELECT group_name, variance3(score) FROM results GROUP BY group_name;
As explained in http://www.sqlite.org/fts3.html#section_6, each FTS3 table t is stored internally within three regular tables t_content, t_segments and t_segdir. The last two tables contain the fulltext index. The first table t_content stores the complete documents being indexed ... but if copies of the same documents are already stored somewhere else, or can be computed from external resources (for example as HTML or MsWord files in the filesystem), then this is quite a waste of space. SQLite itself only needs the t_content table for implementing the offsets() and snippet() functions, which are not always usable anyway (in particular when using utf8 characters greater than 255).
t
t_content
t_segments
t_segdir
offsets()
snippet()
So an alternative strategy is to use SQLite only for the fulltext index and metadata, and to keep the full documents outside of SQLite : to do so, after each insert or update in the FTS3 table, do an update in the t_content table, setting the content column(s) to NULL. Of course your application will need an algorithm for finding the external resource corresponding to any docid stored within SQLite. Furthermore, SQLite offsets() and snippet() functions cannot be used, so if such functionality is needed, it has to be directly programmed within the Perl application. In short, this strategy is really a hack, because FTS3 was not originally programmed with that behaviour in mind; however it is workable and has a strong impact on the size of the database file.
Bugs should be reported via the CPAN bug tracker at
http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DBD-SQLite
* Add more and varied cookbook recipes, until we have enough to turn them into a separate CPAN distribution.
* Create a series of tests scripts that validate the cookbook recipies.
Adam Kennedy <adamk@cpan.org>
Laurent Dami <dami@cpan.org>
Copyright 2009 Adam Kennedy.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
To install DBD::SQLite, copy and paste the appropriate command in to your terminal.
cpanm
cpanm DBD::SQLite
CPAN shell
perl -MCPAN -e shell install DBD::SQLite
For more information on module installation, please visit the detailed CPAN module installation guide.