TaskPipe::Manual::Installation - how to install and set up TaskPipe
Instructions are presented here for installing TaskPipe on CentOS 7, using MySQL (or MariaDB), PhantomJS and TOR.
Install MySQL
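yum install mariadb-server mariadb
systemctl start mariadb
(Strictly speaking, this installs MariaDB, the MySQL-compatible database that ships with CentOS 7. On its own, yum install mysql only pulls in the MariaDB client, so the mariadb-server package is needed too, and the server must be running before you create any databases.)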
Install PhantomJS
Working in your home directory:
Install the PhantomJS prerequisites:
yum install fontconfig freetype freetype-devel fontconfig-devel libstdc++
Get the bz2 file
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.8-linux-x86_64.tar.bz2
Unpack it
tar -xjvf phantomjs-1.9.8-linux-x86_64.tar.bz2
Create a directory to keep phantomjs
mkdir -p /opt/phantomjs
Copy the files in
cp -r phantomjs-1.9.8-linux-x86_64/* /opt/phantomjs
Remove the unpacked directory (its contents are now in /opt/phantomjs)
rm -rf phantomjs-1.9.8-linux-x86_64
Create symlink to the phantomjs executable
ln -s /opt/phantomjs/bin/phantomjs /usr/bin/phantomjs
Test phantomjs
phantomjs /opt/phantomjs/examples/hello.js
(Should result in "Hello world!" being printed to the terminal.)
Install TOR
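TOR is provided by the EPEL repository on CentOS 7, so enable EPEL first if you have not already done so:
yum install epel-release
yum install tor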
Edit the TOR configuration file, which should be at /etc/tor/torrc:
nano /etc/tor/torrc
Almost all of this file is already commented out. The simplest way to use TOR with TaskPipe is to comment out everything that is not already commented out, i.e. let TaskPipe pass the configuration options to TOR itself when it launches and stops instances.
The active lines may amount to just one section of /etc/tor/torrc, which looks like this:
ControlSocket /run/tor/control
ControlSocketsGroupWritable 1
CookieAuthentication 1
CookieAuthFile /run/tor/control.authcookie
CookieAuthFileGroupReadable 1
Having commented these lines out, save the file.
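If you would rather not edit the file by hand, the same change can be made with sed. The following is a minimal sketch, assuming GNU sed (as on CentOS 7); the function name comment_out_active_lines is hypothetical and not part of TaskPipe. It prefixes # to every line whose first non-blank character is not already #, leaves blank lines and existing comments alone, and keeps a copy of the original file with a .bak suffix.

```shell
# Hypothetical helper: comment out every active (non-blank, non-comment)
# line in a torrc-style file. The original is kept as FILE.bak.
comment_out_active_lines() {
    conf_file=$1
    if [ ! -f "$conf_file" ]; then
        echo "no such file: $conf_file" >&2
        return 1
    fi
    # Prefix '#' to any line whose first non-whitespace character is not '#'.
    # Lines that are blank or already comments do not match and stay as-is.
    sed -i.bak 's/^\([[:space:]]*[^#[:space:]]\)/#\1/' "$conf_file"
}
```

Run it as root, e.g. comment_out_active_lines /etc/tor/torrc, and compare the result with /etc/tor/torrc.bak if anything looks wrong.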
NOTE There is no need to start TOR. TaskPipe will launch TOR instances as and when they are needed.
TaskPipe
Install other dependencies
yum install expat-devel sqlite
You can install TaskPipe using any of the usual methods for installing Perl modules. For example:
You can get TaskPipe from CPAN using the cpan shell. If you don't already have it, type
yum install cpan
Then launch the cpan prompt:
cpan
And at the prompt type:
install TaskPipe
Alternatively, you can use cpanp or cpanm, e.g.
cpanm -i TaskPipe
You can also download the archive file directly and install via make:
tar -xzvf TaskPipe-0.01.tar.gz
cd TaskPipe-0.01
perl Makefile.PL
make
make test
make install
Once you have installed TaskPipe, check that the command line tool works. At the command line, type
taskpipe help
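Before going further, you may want to check that the other programs installed above are also on your PATH. This is a small sketch using the POSIX command -v builtin; missing_cmds is a hypothetical helper name, not a TaskPipe command.

```shell
# Hypothetical helper: print the name of each command given as an argument
# that cannot be found on PATH. Returns 0 if all are found, 1 otherwise.
missing_cmds() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_cmds mysql phantomjs tor taskpipe prints nothing if all four are found, and otherwise lists the ones that are missing.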
The first thing you should do is run taskpipe setup. Before doing this, you need to choose a location for the TaskPipe global files. We will assume you are going to install them in the subdirectory taskpipe inside your home directory, but adjust the directory in the commands below to suit your setup.
Also, before proceeding, make sure your home directory is writable, because TaskPipe will create a file called .taskpipe in it.
Then type
taskpipe setup --root_dir=/home/myusername/taskpipe --job_tracking=none
adjusting /home/myusername/taskpipe to suit your system. You should use an absolute path when executing this command.
The --job_tracking=none switch is necessary because otherwise taskpipe will try to register the job in the global database, which doesn't exist yet.
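The two preconditions mentioned above (a writable home directory and an absolute --root_dir) can be checked before the command runs. The following is a sketch only: taskpipe_global_setup and the DRY_RUN variable are hypothetical names for illustration, and the taskpipe options used are exactly those given above.

```shell
# Hypothetical wrapper around 'taskpipe setup' that checks its preconditions.
# Set DRY_RUN=1 to print the command instead of executing it.
taskpipe_global_setup() {
    root_dir=$1
    # --root_dir should be an absolute path.
    case $root_dir in
        /*) ;;
        *) echo "root_dir must be an absolute path: $root_dir" >&2; return 1 ;;
    esac
    # TaskPipe writes a .taskpipe file into the home directory.
    if [ ! -w "$HOME" ]; then
        echo "home directory is not writable: $HOME" >&2
        return 1
    fi
    # Reuse the function's positional parameters to hold the command.
    set -- taskpipe setup --root_dir="$root_dir" --job_tracking=none
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, taskpipe_global_setup /home/myusername/taskpipe runs the setup command, while DRY_RUN=1 taskpipe_global_setup /home/myusername/taskpipe only prints it.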
Have a look at the files that were created. You should find the following structure:
/home/myusername/taskpipe
    /global
        /conf
            /global.yml
            /system.yml
        /lib
        /logs
    /projects
TaskPipe complained about the missing global database, so let's set that up.
In a MySQL shell, type
create database taskpipe;
(This assumes you will call the global TaskPipe database taskpipe; if not, just change taskpipe in the command above.)
Create a database user that TaskPipe can use to access the database:
create user taskpipe_user@localhost identified by 'somedatabasepassword';
Give your user permissions to the database taskpipe:
grant all privileges on taskpipe.* to taskpipe_user@localhost;
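If you prefer not to type these statements into an interactive MySQL shell, they can be generated and piped into the mysql client. This is a minimal sketch, assuming you connect as the MySQL root user; taskpipe_db_sql is a hypothetical helper, and it does no quoting or escaping, so keep names and passwords free of quote characters.

```shell
# Hypothetical helper: print the SQL that creates a TaskPipe database and
# grants a user access to it. If a password is given, a 'create user'
# statement is emitted as well. No escaping is done on any argument.
taskpipe_db_sql() {
    db=$1
    user=$2
    password=$3
    printf 'create database %s;\n' "$db"
    if [ -n "$password" ]; then
        printf "create user %s@localhost identified by '%s';\n" "$user" "$password"
    fi
    printf 'grant all privileges on %s.* to %s@localhost;\n' "$db" "$user"
}
```

For example: taskpipe_db_sql taskpipe taskpipe_user 'somedatabasepassword' | mysql -u root -p. Leaving out the password argument emits only the create database and grant statements, which is what a project database needs once the user already exists. Note that a password given on the command line ends up in your shell history.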
Tell TaskPipe the details of your database. To do this, edit the global config file that taskpipe setup created earlier:
vi /home/myusername/taskpipe/global/conf/global.yml
global.yml is important because it contains the global TaskPipe settings. There are many settings in this file, but right now you only need to make sure the ones for the global database are correct.
TaskPipe uses MooseX::ConfigCascade to load variables from config files (see the documentation for that module for more information), which means config variables are listed under the names of the modules they are loaded into. You are looking for the module TaskPipe::SchemaManager::Settings_Global.
Find this module and look at the settings underneath it, and make sure they are correct for your database. Specifically, replace the tilde (~) that appears next to username, password and database, and also check that host and method are correct.
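Before deploying, a quick text search can show any settings still left as ~. This is a sketch, not a YAML parser: it only matches lines of the simple form key: ~, and unset_settings is a hypothetical helper name. Other modules in global.yml may legitimately leave some values as ~, so only the matches under TaskPipe::SchemaManager::Settings_Global need attention here.

```shell
# Hypothetical helper: list (with line numbers) any 'key: ~' lines in a
# YAML config file. Plain text matching only; nested structure is ignored.
# Returns non-zero (grep's status) when nothing is found.
unset_settings() {
    grep -n '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*:[[:space:]]*~[[:space:]]*$' "$1"
}
```

For example: unset_settings /home/myusername/taskpipe/global/conf/global.yml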
Deploy the global tables:
taskpipe deploy tables --scope=global
Also, TaskPipe uses the DBIx::Class ORM to talk to the database, so you need to generate the DBIx::Class schema files. You should just be able to type
taskpipe generate schema --scope=global
You'll see that warning again when issuing each of these commands, but this should be the last time, because the database is now set up.
The fastest way to get a TaskPipe project up and running is to deploy the built-in sample project. (At the time of writing TaskPipe has only one built-in sample project, but more may be included later.) This is done by adding --sample=SP500 to the deploy commands below. The sample project, called SP500, scrapes quotes for the companies on the S&P500 list.
If you intend to create a bare project, then omit the --sample parameter from the commands that follow, and change the project name from SP500 to whatever you are going to call your new project.
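To keep track of which commands take --sample, the taskpipe commands used in the rest of this section can be printed in order by a small helper. This is an illustrative sketch: taskpipe_project_cmds is a hypothetical name, and it only prints the commands. The database creation and project.yml edits described below still have to be done by hand between the deploy files and deploy tables steps.

```shell
# Hypothetical helper: print, in order, the taskpipe commands for deploying
# and running a project. --sample is added only to the two deploy commands,
# and only when a sample name is given.
taskpipe_project_cmds() {
    project=$1
    sample=$2
    sample_opt=
    if [ -n "$sample" ]; then
        sample_opt=" --sample=$sample"
    fi
    printf 'taskpipe deploy files --project=%s%s\n' "$project" "$sample_opt"
    printf 'taskpipe deploy tables --project=%s%s\n' "$project" "$sample_opt"
    printf 'taskpipe generate schema --project=%s\n' "$project"
    printf 'taskpipe run plan --project=%s\n' "$project"
}
```

taskpipe_project_cmds SP500 SP500 lists the commands for the sample project; taskpipe_project_cmds MyProject lists them for a bare project called MyProject.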
taskpipe deploy files --project=SP500 --sample=SP500
You should find a new entry under your /projects directory with the following structure:
/projects
    /SP500
        /conf
            /project.yml
        /lib    # some Perl modules here
        /logs
        /plans
            /plan.yml
        /sources
Again, the next step is to tell your project about the database. TaskPipe follows a one-database-per-project philosophy, with each project database kept separate from the global database. So again, you need to create a database in MySQL:
create database SP500;
Grant privileges on it to the MySQL user you created earlier:
grant all privileges on SP500.* to taskpipe_user@localhost;
Edit the project configuration file and enter the details of your project database:
vi /home/myusername/taskpipe/projects/SP500/conf/project.yml
Complete the information in the TaskPipe::SchemaManager::Settings_Project section:
TaskPipe::SchemaManager::Settings_Project:
  database: SP500
  host: localhost
  method: dbi
  module: TaskPipe::Schema
  password: somedatabasepassword
  table_prefix: tp_
  type: mysql
  username: taskpipe_user
And now (back at the command line):
taskpipe deploy tables --project=SP500 --sample=SP500
Generate the DBIx::Class schema files associated with the tables you just created:
taskpipe generate schema --project=SP500
If all went without complaint, you can now go right ahead and run the plan:
taskpipe run plan --project=SP500
This project uses PhantomJS to render the pages. This is necessary for this particular scrape, because the pages containing the quote information get their values via AJAX.
You may notice a pause of 10 to 20 seconds near the beginning of the run while PhantomJS initialises. After that, the process should gather the S&P500 company information and quotes quite quickly.
In a MySQL shell you can type
use SP500;
select * from company;
to see the data being gathered.