The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

LRpt - Perl extension for comparing and reporting results of database selects

SYNOPSIS

  ###################################################################
  #
  # Simple report
  #
  ###################################################################
  
  lks.pl --keys=keys.txt selects.txt > sel_subs.txt
  
  lcsvdmp.pl --conn_file=conn_file.txt --path=data sel_subs.txt 
  
  lrptxml.pl --rldf=rldf.xml --selects=sel_subs.txt --keys_file=keys.txt 
             data > report.xml
  
  
  ###################################################################
  #
  # Diff report
  #
  ###################################################################
  
  lks.pl --keys=keys.txt selects.txt > sel_subs.txt
  
  lcsvdmp.pl --conn_file=conn_file.txt --path=data_state1 sel_subs.txt
  
  ... some actions on the database...
  
  lcsvdmp.pl --conn_file=conn_file.txt --path=data_state2 sel_subs.txt
  
  lcsvdiff.pl --all --keys_file=keys.txt data_state1 data_state2 
             > diffs.txt
  
  lrptxml.pl --diffs --rldf=rldf.xml --selects=sel_subs.txt 
             --keys_file=keys.txt diffs.txt > report_aft.xml
  
  
  ###################################################################
  #
  # Expected-actual diff report
  #
  ###################################################################
  
  lks.pl --keys=keys.txt selects.txt > sel_subs.txt
  
  lcsvdmp.pl --conn_file=conn_file.txt --path=data sel_subs.txt
  
  lcsveadiff.pl --keys_file=keys.txt --expectations=exp.xml
            --cmp_rules=cmp_rules.xml data > eadiffs.xml
  
  

INSTALLATION

You can get the most recent code from lreport project on source forge http://lreport.sourceforge.net. You are given 2 options - you can either download a ppm active state package or a CPAN package.

Currently, the module is not available on CPAN (http://www.cpan.org).

DESCRIPTION

LRpt (LReport) is a set of tools for database row sets comparison and reporting. The core logic operates on csv files, however, LReport also provides tools for converting select results to csv files.

HOW IS THIS MANUAL ORGANIZED

Although the concepts behind this tool are not particularly sophisticated, it is not easy to put shortly and concisely what does the tool actually do. So firstly, I will give you an example, showing what does the tool do. This will help you to quickly decide if this tool is something you are looking for.

If you consider LReport usefull, and want to find out how to use it, go on and read the second example. It will guide you step by step on how to make the tool to do the stuff you want it to do.

Once you are done with examples, you can go on and read the REFERENCE MANUAL chapter, which is a more systematic treament of the topic.

WHAT DOES THE TOOL REALLY DO

I created LReport to help me in my tests. When I worked as a tester of business applications, I have noticed that I am wasting much of my time checking what has changed in the database and documenting my observations. Although you cannot fully replace human beings in testing, you can at least support them in the most repetitive, boring and error prone tasks.

EXAMPLE 1

You want to check how a given transaction affects a database.

You are not going to observe the whole database. Just results of 2 selects:

select1: select * from customer where customer id = 1234
select2: select * from service where customer id = 1234

Say, results of those selects before the transaction are as follows:

  select1 (customer) :
     ---------------------------------------------
     | customer_id | name  | last_name | address  |
     ---------------------------------------------
     |        1234 | Jan   | Nowak     | Warszawa |
     --------------------------------------------- 
  
  select2 (service) :
     ---------------------------------------------------
     | customer_id | service_type | price | status      |
     ---------------------------------------------------
     |        1234 |         MAIL | 1.30  | ACTIVE      |
     ---------------------------------------------------
     |        1234 |        VOICE | 0.34  | ACTIVE      |
     ---------------------------------------------------
     

After the transaction, the same selects return the following results:

  select1:
     ---------------------------------------------
     | customer_id | name  | last_name | address  |
     ---------------------------------------------
     |        1234 | Jan   | Nowak     | Warszawa |
     --------------------------------------------- 
  
  select2:
     ---------------------------------------------------
     | customer_id | service_type | price | status      |
     ---------------------------------------------------
     |        1234 |         GPRS | 2.05  | ACTIVE      |
     ---------------------------------------------------
     |        1234 |        VOICE | 0.34  | DEACTIVATED |
     ---------------------------------------------------
     

You would like to see what are differences between select results before and after the transaction. After processing the data through the tool chain, you get the following diff report

  lcsvdiff.pl before/service.txt after/service.txt 
  INS( 1234#GPRS )
  DEL( 1234#MAIL )
  UPD( 1234#VOICE ): status: ACTIVE =#=> DEACTIVATED
  

Those entries mean the following:

INS( 1234#GPRS )

New row has been inserted with a key 1234#GPRS

DEL( 1234#MAIL )

The row with a key 1234#MAIL has been removed

UPD( 1234#VOICE ): status

In a row with a key (which is customer_id and service_type ) 1234#VOICE, value in a column status has changed from ACTIVE to DEACTIVATED

As you see, there is nothing about select1. This is fine because nothing has changed in customer table.

You can also generate an xml report about the current results of selects. The report will look like this:

  <report>
      <customer>
          <statement><![CDATA[select * from customer where customer_id = 1234]]>
          </statement>
          <header>
              <customer_id/>
              <name/>
              <last_name/>
              <address/>
          </header>
          <data>
              <equal>
                  <customer_id>1234</customer_id>
                  <name>Jan</name>
                  <last_name>Nowak</last_name>
                  <address>Warszawa</address>
              </equal>
          </data>
      </customer>
      <service>
          <statement><![CDATA[select * from service where customer_id = 1234]]>
          </statement>
          <header>
              <customer_id/>
              <service_type/>
              <price/>
              <status/>
          </header>
          <data>
              <additional>
                  <customer_id>1234</customer_id>
                  <service_type>GPRS</service_type>
                  <price>2.05</price>
                  <status>ACTIVE</status>
              </additional>
              <missing>
                  <customer_id>1234</customer_id>
                  <service_type>MAIL</service_type>
                  <price>1.30</price>
                  <status>ACTIVE</status>
              </missing>
              <different>
                  <customer_id>1234</customer_id>
                  <service_type>VOICE</service_type>
                  <price>0.34</price>
                  <status>
                      <old_value>ACTIVE</old_value>
                      <new_value>DEACTIVATED</new_value>
                  </status>
              <different>
          </data>
      </service>
  </report>

You can transform it to some other formats using XSLT. You can transform it to RTF. You can find an RTF document for this example at http:/xxxx.

CONCLUSION

I hope, you now know what can you expect from LReport. There is more LReport can offer you, for example expected-actual comparison. But for starters this example should be enough. If this is what you are looking for, read on. In the next example, I am going to be more specific on how does the LReport tool chain work.

HOW DOES IT WORK

Now I am going to show you another example. It will work on the same data as in the first example. However, now I will be more specific on how do you use LReport tools to achieve results you want.

EXAMPLE 2

Let's see what steps do we need to perform in order to get results presented in previous example.

You can split the whole processing on the following steps:

1. Run selects and dump their results to csv files
2. Run the transaction, which is about to modify a database
3. Run selects again and dump their results to another set of csv files
4. Compare csv files from before- and after-transaction state
5. Generate XML report

Let's observe those steps one by one.

Step 1: Run selects and dump their results to csv files

It is important to understand that LReport operates on csv files. All comparisons and reporting is done on csv files as a source of data. I will elaborate in LIMITATIONS AND WARNINGS, what are potential benefits and flaws of such an approach. LReport makes the following assumptions about the format of csv files:

  • Result of each select is placed in a separate file

  • File extension is txt

  • Field separator is tab (tabulation)

  • First row in each file should be a row with columns' names

You can modify some of those defaults. You can read about it in section DEFAULTS AND ENVIRONMENT VARIABLES.

The goal of step 1 is to put the data into csv files so other tools from LReport tool chain can work with them. If your database server provides you with tools allowing to create such files, you don't have to use the tool provided by LReport. But let's assume we will use LReport to do it for us. The tool we have to use is lcsvdmp.pl which stands for "LReport csv dump". You call the lcsvdmp.pl in a following way:

  lcsvdmp.pl --conn_file=conn_file.txt --path=data_dir selects.txt

Meaning of parameters:

--path

This is a directory in which csv files will be placed. If this parameter is not given, files will be created in the current directory.

--conn_file

This may be difficult to understand on the beginning. This is the name of the database connection file, which contains instructions on how to connect to the database. lcsvdmp.pl uses DBI to connect to database. As you know each database driver (DBD) has a slightly different flavour. So the syntax of the connection string for Sybase is slightly different from that for Oracle, and is slightly different from Mysql and so on. In order to make lcsvdmp.pl flexible and be able to deal with any source for which DBD exists, I passed the responsibility of opening a connection on a user. Connection file should contain a perl code, which opens a connection and assigns a reference to datbase handle to a variable named $dbh. lcsvdmp.pl will load contents of connection file and eval it.

This an example of opening a connection to ODBC source. As you can see you can even include additional configuration code there:

  use Win32::ODBC;
  my $DBName     = "INVOICES";
  my $DBServer   = "MYSERVER";
  my $DBUser     = "pkaluski";
  my $DBPassword = "password";
  
  no strict;
  Win32::ODBC::ConfigDSN( ODBC_CONFIG_DSN, 
                          "Sybase ASE ODBC Driver",
                          "DSN=BolekSybase",
                          "Database=$DBName",
                          "InterfacesFileServerName=$DBServer");
  use strict;
  my $error = Win32::ODBC::Error();
  if( $error ){
      die $error;
  } 
  $dbh = DBI->connect( "DBI:ODBC:DBSybaseSource", $DBUser, $DBPassword,
                       {RaiseError => 1, AutoCommit => 1}
                     );
selects

Points to a file containing selects to be executed and dumped to csv files. For our example it has the following format:

  name: customer
  select: selects * from customer where customer_id = 1234
  %%
  name: service
  select: selects * from customer where customer_id = 1234
  %%

Each select should have a name. LReport for each of selects will create a file with a file name equal to select's name, plus extension. So results of select named 'customer' will go to a file customer.txt.

In our example we run the tool in the following way

  lcsvdmp.pl --conn_file=conn_file.txt --path=before selects.txt

Note that we have given a --path option with value before. It will make lcsvdmp.pl to put all csv files in before directory, which should exist before we run the tool.

Once we run it, in the before directory you should have 2 files - customer.txt and service.txt with results of according selects.

The generated files will look like this:

customer.txt
  customer_id   name    last_name       address
  1234  Jan     Nowak   Warszawa
  
service.txt
  customer_id   service_type    price   status
  1234  VOICE   0.34    ACTIVE
  1234  MAIL    1.30    ACTIVE

Fields are separated by tab characters.

It is time for the step 2.

Step 2: Run the transaction, which will modify the database

Just do it. Once you are done, you have to see what has changed.

Step 3: Run selects again and dump their results to csv files

We use lcsvdmp.pl again. We will run it as follows:

  lcsvdmp.pl --conn_file=conn_file.txt --path=after selects.txt

As you can see, the only difference is that we specify different destination directory for csv files. If we use before, new csvs will simply overwrite the previous ones and we will loose the information about the state before the transaction.

At this stage, we have information about database states before and after the transaction, stored in separate directories. It's time to compare them.

Step 4: Compare csv files from before- and after-transaction state

It's time for the next tool - lcsvdiff.pl.

The command line syntax is as follows

  lcsvdiff.pl --keys_file=keys.txt $before_dir $after_dir

Meaning of parameters:

--keys_file

The name of the file, which contains definitions of row keys for each csv file. Some explanation is needed here. In order to be able to compare 2 sets of rows, you have to define a kind of primary key, which will be a direct link between after and before rows to be compared. You can read more about rows keys in BEFORE-AFTER COMPARISON and ROW KEYS. The file for our example will look like this

  select_name: customer
  key_columns: customer_id:4
  %%
  select_name: service
  key_columns: customer_id:4, service_type
  %%
  

For customer select, the key is a field customer_id. Note the digit in a definition. It describes a length of the field and you should always specify it for numeric fields. Why? As you know lcsvdiff.pl operates on csv files. Those files, by their nature, carry no information about the field type, so they will treat all columns as text. It does not have severe consequences but there is one, which is good to be aware of. 2 is less then 1000 for numeric types. But is greater then 1000 for text comparison. When you specify a field lenght, lcsvdiff.pl will leftpad the value with zeros. So the comparison will be 0002 with 1000 which will give the correct result.

$before_dir

Directory containing csv files with data before the transaction

$after_dir

Directory containing csv files with data after the transaction

lcsvdiff.pl will work as follows:

1. It will load customer.txt files from before and after directories.
2. A row key for each of rows will be calculated

In our case there is one row only

3. It will find out that in both files there is a row with a key 1234 and values in all columns are the same. So there are no differences.
4. It will load service.txt files from both locations.
5. It will calculate a row key for each row.

In case of service, there is a composite key, consisting of columns customer_id and service_type.

6. It will find that a row with a key 1234#GPRS exists in after file but does not exist in before file.

So it concludes that this row was inserted by the transaction. So it will generate an INS entry:

  INS( 1234#GPRS )
7. Then it will find that a row with a key 1234#MAIL exists in before file but does exist in after file.

So it concludes that this row was removed by the transaction. So it generates the DEL entry:

  DEL( 1234#MAIL )
8. It will notice that in both files there is a row with a key (1234,VOICE) (presented as 1234#VOICE).

It will find out that the value of column status is different. So it will generate an UPD entry like below

  UPD( 1234#VOICE ): status: ACTIVE =#=> DEACTIVATED
9. Since there would be no more rows, the tool will finish.

The tool will generate all the output on the standard output.

If you were interested in finding the diffences, you are done at this point. If you would like to generate more readable, nicely formated report, you go to step 5:

Step 5: Generate XML report

We have to use lrptxml.pl tool. But I am not going to describe it here, to avoid bogging you down with too much details. You can read about reporting in manual page for lrptxml.pl.

REFERENCE MANUAL

In this and following chapters you will find a more systematic treatment of LReport. If you haven't read examples above, I recommend you at least to skim over them. They will help you to understand concepts explained below. When appropriate, you are given links to other documentation pages.

LREPORT SUITE

LReport consits of the following tools:

lks.pl

Tool for replacing where key place holders in select files. You can read more about it in man page for LRpt::KeySubst.

lcsvdmp.pl

Tool for dumping results of a group of selects to csv files. You can read more about it in man page for LRpt::CSVDumper.

lcsvdiff.pl

Tool for comparing 2 sets of csv files. You can read more about in in man page for LRpt::CSVDiff.

lcsveadiff.pl

Tool for comparing a group of csv files with expectations. You can read more about it on a man page for LRpt::CSVEADiff.

lrptxml.pl

Tool for creating and xml report from results generated by lcsvdmp.pl and lcsvdiff.pl. You can read more about it on a man page for LRpt::XMLReport.

COMPARISON TYPES

There are 2 types of possible comparisons:

Before-after comparison
Expected-actual comparison

BEFORE-AFTER COMPARISON

A set of selects is executed twice - before and after some transaction, which could potentialy change the data. For each select, rows returned before change are compared with rows returned by the same select after the change. When differences are found, they are reported.

The following differences can be reported:

missing row

The row, which existed before a transaction does not exist any more.

additional row

The row did not exist before a transaction and appeared after the change.

not equal

The row existed before a transaction, still exists after it but some fields of the row are modified.

In order to determine a direct relationship between before and after rows (i.e. to determine which after row should be compared with which before row) a row key is used. A row key is an equivalent of primary key in a database table. It has to be unique and may consist of many columns.

During comparison for particular select, LReport sorts after and before rows by a row key (after and before rows are sorted separately). Then it iterates through all keys found in both collections of rows (before and after rows). If for example before collections have 3 rows with keys (01,AB), (02,GT), (02,JT) and the after collection has rows with keys (01,AB), (03,KP), then LReport will iterate through a following list: (01,AB), (02,GT), (02,JT), (03,KP). So it's basically a sum of sets.

For each value of a row key, LReport finds an after and before row with this key value. If there is no such row in before collection, that means that a new row was created. So it is reported as additional row. If there is no such row in after collection, that means that a row was deleted. It is then reported as missing row. If both after and before row exist, they are compared field by field. If any difference is found, the row is reported as not equal. Otherwise rows are considered equal.

EXPECTED-ACTUAL COMPARISON

NOTE: This functionality is experimental.

This type of comparison may be used for test automation. Sometimes you can create a static set of expected data, which you can use for comparison with actual data. There are many cases, however, when you cannot create static files with expectations, because some column values depend on a date, environment and some circumstances, which are difficult to recreate. So instead, you can implement a script/program, which will predict what could be the output of a test and then generate expectations.

Those expecations may be then compared with results of selects done on actual data. Although it may seem that there is nothing special in expected-actual comparison and it is a slightly modified version of before-after comparison, this is not the case. There are some nuances, which forces an introduction of some new notions and concepts.

In general, expected-actual comparison uses the same core logic as before-after comparison. It parses the expectations and builds collections of expected rows and then compares them with collections of actual rows.

The first big difference is that not all columns are defined in expectations. It makes sense. We often deal with rows, which consist of 40 columns, but we are only interested in 10 of them. It would be infeasible to force a user to define expected values for all those 40 columns. So the user can decide not to specify them.

From the other hand, it would be good to have a protection mechanism, which would warn a user when a really important column has no expectation defined. This is where comparing rules come to the rescue.

Comparing rules are defined for a particular select. They define what to do if an expected value for a given column in not specified. You can make the application terminate, do nothing, warn, use default value. You can find details in COMPARING RULES chapter.

Another big difference is a direct consequence of the previous one. If you are allowed to not specify all columns values, it may happen that a value for column belonging to a row key is not defined. As a result a value of a row key for this row cannot be calculated. So the mechanism of before-after comparison cannot be used in this case. The solution is provided by unkeyed rows and column matching. It is explained below.

The following differences are reported:

missing row

The row was expected to exist but doesn't.

additional row

The row exists even though it is not expected.

not equal

The row exists as expected but some fields contain values different from expected.

unmatched

This type of difference is used for rows, for which row keys are not defined in expectations (due to not giving a value for one more columns belonging to a row key). Such rows are called unkeyed rows

If case of unkeyed row (missing values in columns which are parts of a row key), rows can not be compared on a row by row basis since row key is the only information allowing to determine a direct relationship between rows. Hence, the following solution was developed for such rows:

Each unkeyed row defined in expectations is tried to be matched with all its fields with any of actual rows. If there is no such match, the row is reported as unmatched.

The detailed rules are as follows:

1. Expectations rows with row keys values defined (keyed rows) , have priority.

All actual rows, which are matched with keyed expectation rows are excluded from further comparison.

2. If all keyed expectation rows are processed, unkeyed expectation rows are processed
3. From all unkeyed rows, rows having the biggest number of column values defined are taken first for processing
4. For each unkeyed row, LReport tries to find a remaining actual row (not matched earlier by any other exectation row), whose columns' values match with all column values in expectation row.

If such a row is found, the row is considered matched. If there is no such actual row, the expectation row is reported as unmatched. In addition, the result of comparison with the closest match is reported (see below).

Reporting of matching and unmatching has some additional specifics. Since row key is not defined, it is possible to define a set of identical expectation rows. LReport catches those identical rows and count them. When it reports unmatching rows, it groups identical rows and shows how many of them were not matched. If for example, expectations rows for a given select are

  1, ABC, ACT
  1, ABC, ACT
  1, ABC, ACT

and only 2 of them have been matched with actual rows, row (1, ABC, ACT) is reported to be unmatched once.

You can find details on configuring and using expected-actual comparison (with a step by step example) in manual for LRpt:CSVEADiff.

SELECTS

LReport does not operate on database tables but on select results. Single csv file contain results of one select statement. If the statement is select * from table then it is actually the whole table. But this is a specific example. In general, csv file contain results from one select.

SELECT NAMES

In order to manage select results, choose a proper row key, report format etc. each select should be given a name. In some simple cases it is not required to give select a name, but it is recommended to use a meaningful name whenever possible.

Select name is used for following purposes:

Creating csv file name

Results for each select are stored in a file select_name.ext, where ext is an csv file extension.

Choosing a row key

When a row key is defined, it contains information about names of selects to which it applies

Choosing a report layout

A report layout contains a name of a select, to which it applies

SELECT FILES

Selects are defined in files called select files. They use jar record format. Definition of each select ends with a line beginning with %% with white spaces after it.

Definition of each select consists of 2 fields: name and select. Their meaning is as follows:

name

Name of a select used for csv file creation, row key choosing and so on.

select

Full text of a select. The text can be divided on several lines, no continuation characters are needed on end of lines. The only restrictions you have to obey are:

* No line of a select text may start with string: or %%. Adding single space in front of them is enough to get rid of this limitation.

REPORTING AND FORMATING

LReport provides some facilities for generating nicely formatted reports of selects results and found differences. It does that by generating xml reports, which can be then transformed to some document format using XSLT.

LReport package in the source forge also contain XSLT stylesheet for converting xml report to an RTF document.

Reporting is a wide topic by itself so it is not described in details here. You can read more on reporting in LRpt::XMLReport.

ROW KEYS

The row key is more or less an equivalent of primary key in database theory. It defines a set of columns, which values uniquely identify each row.

Row key is crucial in before-after comparison, since it determines the direct link between before and after row.

In its code LReport treats row key value as a string. If a row key consists of one column, the key value for a row equals the value of the specified column from this row.

If a row key consists of several columns, then <LReport> creates key value by joining values from those columns, using # as a separator. So if values from columns are 3, Kaluski and <20050601>, then the key value will be 3#Kaluski#20050601. Using a column, which values may contain # as a row key part, may lead to unpredicted affects. It would be good to be able to define, what should be the row key value separator. But it is currently not implemented. The hash character # was chosen for the following reasons:

* It is low in characters order. So it not should disturb in string comparisons. The strings abc#1234 and abcd#1234 are compared correctly. abc#1234 is less then abcd#1234. If # would be high in characters order (having ascii code 123 for example), then the comparison above would consider abc#1234 to be greater then abcd#1234

* I found it very unlikely to use such a character in columns which are parts of unique keys.

Row keys values are treated as strings, so they are compared as strings. Therefore keys value 2 is greater then 1000. In order to achieve correct numerical comparison, column value has to be left padded with zeros. 0002 is lower then 1000.

There are several ways of defining a row key. We can define it in a file, give it at command line and use default key. Currently it is recommended to use only a file method. I am still working on the most convinient and expressive way of defining row keys in command line, so things may change. The default row key is a first column of a table/select results.

ROW KEY FILE

Row key file is in jar record format. Definition of each key ends with a line beginning with %% with white spaces after it.

Definition of each key consists of 2 fields: select_name and key_columns. Their meaning is as follows:

select_name

Name or names (comma separarated) of selects, which should use this key. This why giving names to all used selects is important.

key_columns

Comma separated list of columns, which constitute a row key. Column names should be given in the order, in which they appear in the key. In order to left pad with zeros a given column, containing numeric value, the field length has to be specified after the column's name. The field length has to be given after a colon after a column name (no spaces in between). See example below.

Example of a row key file:

  select_name: customer
  key_columns: customer_id:4
  %%
  select_name: service, service_history
  key_columns: customer_id:4, service_type
  %%

As you can see one row key may be defined for several selects. Note that values of customer_id column will be left padded with zeros to create 4 character string.

FORMAT OF CSV FILES

It's an ordinary csv file. LReport expects that a first line of each such a file contains columns' names. Default field separator is tab. You can read more on defaults and changing them in DEFAULTS AND ENVIRONMENT VARIABLES section.

COMPARING RULES

Comparing rules are used in expected-actual comparison. They are used to determine what to do when expectations for a given column are not specified. Currently, the following possibilities are available:

skip_and_warn

Column is important, but not so important to stop the processing. A warning is printed and the comparison goes on

die

Column is so important, that further comparison does not make any sense. LReport (actually lcsveadiff.pl) terminates.

use_default

If column value is not given a default value is used.

use_default_and_warn

If column value is not given a default value is used and a warning is logged

skip

[default] Column is not important. It is not a problem if expectations are not defined. The comparison goes on

You can read more about comparing rules in man page for LRpt::CSVEADiff.

SELECT TEMPLATES

A select template is an additional feature provided by LReport. It's usefull in cases when a user executes the same select or set of selects many times but for different values in a where clause.

EXAMPLE 3

We would like to run the selects from previous example. We want to run it not only for customer_id = 1234 but also for customer_id = 1334 and customer_id = 3476. And maybe some others in future.

We can use a select template here. We define the following file (let's call it sel_tmpl.txt):

  name: customer
  select: select * from customer where --customer_id--
  %%
  name: service
  select: select * from service where --customer_id--

Then we have create a where key file (wkey.txt):

  name: customer_id
  key: customer_id = 1234
  %%

When we now run lks.pl:

  lks.pl --keys=wkey.txt sel_tmpl.txt

Meaning of parameters is as follows:

--keys

A file containing a text, with which a where key placeholder should be replaced.

sel_tmpl.txt

A file containing selects templates

it will print on standard output following lines:

  name: customer
  select: select * from customer where customer_id = 1234
  %%
  name: service
  select: select * from service where customer_id = 1234
  %%

lks.pl will basically replace all entries --entry-- from sel_tmp.txt with a proper definition from wkey.txt. Should we run this to selects for customer_id = 1334, we have to modify wkey.txt only. Output generated by lks.pl can be fed as input to lcsvdmp.pl (in --select option).

You can read more on how lks.pl work in man page for LRpt::KeySubst.

Also have a look at SYNOPSIS.

WHERE KEY FILES

Where keys are defined in files called where key files. They use jar record format. Definition of each key ends with a line beginning with %% with white spaces after it.

Definition of each select consists of 2 fields: name and select. Their meaning is as follows:

name

Name of a key. Has to match with a placeholder in a select template. Where key named abc will replace a placeholder --abc-- in a select template

key

A text with, which a key placeholder in a select template should be replaced. The text can be divided on several lines, no continuation characters are needed on end of lines. The only restrictions you have to obey are:

* No line of a select text may start with string: or %%. Adding single space in front of them is enough to get rid of this limitation.

WORKING WITH DATABASES

Core logic of LReport operates on csv files. However, before we can operate on them, we have to create them first. A tool for creating csv files from select results is lcsvdmp.pl and its usage is explained in example 2.

DATABASE CONNECTION FILE

A database connection file contains instructions on how to connect to the database. It is used by lcsvdmp.pl. As you know each database driver (DBD) has a slightly different flavour. So the syntax of the connection string for Sybase is slightly different from that for Oracle, and is slightly different from Mysql and so on. In order to make lcsvdmp.pl flexible and be able to deal with any source for which DBD exists, I passed the responsibility of opening a connection on a user. Connection file should contain a perl code, which opens a connection and assigns a reference to datbase handle to a variable named $dbh. lcsvdmp.pl will load contents of connection file and eval it.

This an example of opening a connection to ODBC source. As you can see you can even include additional configuration code there:

  use Win32::ODBC;
  my $DBName     = "INVOICES";
  my $DBServer   = "MYSERVER";
  my $DBUser     = "pkaluski";
  my $DBPassword = "password";
  
  no strict;
  Win32::ODBC::ConfigDSN( ODBC_CONFIG_DSN, 
                          "Sybase ASE ODBC Driver",
                          "DSN=DBSybaseSource",
                          "Database=$DBName",
                          "InterfacesFileServerName=$DBServer");
  use strict;
  my $error = Win32::ODBC::Error();
  if( $error ){
      die $error;
  } 
  $dbh = DBI->connect( "DBI:ODBC:DBSybaseSource", $DBUser, $DBPassword,
                       {RaiseError => 1, AutoCommit => 1}
                     );

Do not forget about assigning a reference to opened connection to $dbh variable.

SUPPORTED DATABASES

Since LReport core logic operates on csv files, you can use LReport for any database for which you are able to create csv files for select results. If you use lcsvdmp.pl to do it, you need to have a DBI driver for a database you are working with. LReport uses the following DBI methods:

connect
prepare
execute
fetchall_arrayref

If the DBD for you database supports those methods, lcsvdmp.pl should be able to do its job.

LIMITATIONS AND WARNINGS

LReport core logic works on csv files. This gives the tool significant flexibility, since it can work with virtually any database. However, such an approach does have some consequences, which you should be aware of.

No information on columns data types.

csv files carry no information on data type of each column. LReport treats all columns as texts. This may give surprising results for numeric data. You would expect that 1 and 1.0 are equal, but for LReport they are different texts

LReport does not support database schema comparison

LReport will help you to detect that a field's value has changes from 1 to 2. It also detect that a particular columns is new or does not exist any more. But it will not detect that column datatype has changed from int to char.

No support for null values

LReport does not see null values, since it is problematic how to put null values in csv file. No value (tabtab) in csv file is treated as an empty string. So basically LReport will not be able to find a difference between field containing a null value and a field containing an empty string.

DEFAULTS AND ENVIRONMENT VARIABLES

ENVIRONMENT

LReport has some hardcoded defaults. Each of those defaults can be overriden by environment variable. In some cases, those defaults can be further overriden by command line switches. So the general rule is this:

Check if the command switch specifies a parameter's value. If is does not then =item Check if there is an environmental variable, which defines it. If there is no such variable, use hardcoded default.

There are following parameters, which are used by LReport tools. The list below is given in the format: meaning: hardcoded default -> Environment variable -> command line option

csv filename extension:
  txt -> LRPT_CSV_FILE_EXT -> --ext=ext
Field separator in csv files
  tab -> LRPT_CSV_FIELD_SEPARATOR -> --sep=separator
Location in which csv files should be created
  . (current directory) -> LRPT_CSV_FILE_PATH -> --path=path
Database connection file
  conn_file.txt -> LRPT_CONNECTION_FILE -> --conn_file=file.txt
Chunk size
  1000 -> LRPT_CHUNK_SIZE -> --conn_file=file.txt
Location of global keys file
  keys.txt -> LRPT_GLOBAL_KEYS_FILE --> No command line switch
Separator between different values in diff file
  --#> -> LRPT_DIFF_VALUE_SEPARATOR --> No command line switch

SUPPORTED PLATFORMS

The tool was developed and tested on Windows. I can't think of any reason why it will not work on Unix but it was not tested. Currently it assumes that a name separator in file paths is slash (/) so it probably won't work on VMS and other systems, which do not use slash as a name separator.

DEVELOPMENT RELATED INFORMATION

NOTE: This sections is under construction

If you are brave enough to try to modify the LReport code and the ugliness of the code did not scare you to death, read this chapter. It will provide you with some guidance on how all things are organized. I hope it will make your coding advanture less painfull.

OBJECT/PACKAGE MODEL

All LReport logic is encapsulated in packages. All *.pl tools are only wrappers calling one exported function. For example, lcsvdmp.pl looks as follows:

  use strict;
  use LRpt::CSVDumper;
  
  dump_selects( @ARGV );
  

Some packages are object oriented, some are not.

LRpt provides 4 main objects to perform expected job (see SYNOPSIS):

LRpt::CSVDumper

Dumps results of selects to a csv file. Wrapped by lcsvdmp.pl.

LRpt::CSVDiff

Compares two sets of csv files. Wrapped by lcsvdiff.pl

LRpt::CSVEADiff

Compares set of csv files with expectations. Wrapped by lcsveadiff.pl

LRpt::XMLReport

Converts csv and diff files to xml format. Wrapped by lrptxml.pl

LRpt::KeySubst

Substitutes where keys place holders in select templates.

Apart from those packages there is a bunch of other classes defined. Some of them may appear pretty usefull. Be careful though. I do not consider them public, so I may decide to change them in future. If you find some of them extremely useful, let me know so I will be more careful in my future desing decisions.

LRpt library consists of the following classes:

LRpt::CollDiff

Object for comparing two collections of rows

LRpt::CollEADiff

Object for comparing collection of rows with a set of expectations

LRpt::Collection

Container for data loaded from csv files.

LRpt::CollUnkeyed

Container for unkeyed rows.

LRpt::Config

Object for managing defaults and runtime parameters.

LRpt::JarReader

Object for reading jar records files.

LRpt::RKeysRdr

Object for reading row keys definition files.

Each of this modules have its POD, so if you would like to know more about them, you can go ahead and read it.

REGRESSION TESTS

LReport consists of a bunch of tools. Currently regression tests do not test particular modules separately. Instead there are some end-to-end test scenarios. For each testcase there is set of expectations and the results are compared with those expectations.

As of 10-09-2006 tests are divided into 2 groups: tests of simple reporting and tests of difference reporting. The 't' directory contains 2 directories, 1 for each group - DiffRpt and Report. Let's have a look at DiffRpt directory. It constains the following subdirectories/files:

  dbschema
  dbschema1
  dbschema2
  dbschema3
  t1
  t2
  t3
  t4
  t5
  driver.pl

driver.pl is test driver for diff reporting tests. It contains perl code running scenarious of four tests.

dbschema and dbschema1 are databases. They are based on csv files and are handled by DBD::CVS. They are used to simulate changes in the database. Instead of changing the contents of the database, selects for before state are done on dbschema database and selects for after state are done on dbschema1.

t1 to t4 are directories used by test cases. Let's have a look at t1. The test case code looks like this (it is in t\DiffRpt\driver.pl file):

CODE

On the beginning, t1 contains configuration and input files. It also contains expected directory. It contains a directory structure with files expected to be generated by tested tools. Once you run the test, it will create actual directory in t1. All output files generated by the test case will be placed in the actual directory and its subdirectories. Once the scenario is done, contents of actual and expected directories are compared. If they are the same, that means that the test case generated exactly what we expected. Otherwise, it behaves differently which means that we either introduced a new bug or modified the functionality. We should either remove the bug, or if changes were intentional - modify the test case.

BUGS

As of 15-01-2006 the tool have only one user, which is me. I used it in one environment. You can imagine that there are plenty of undetected bugs which will jump out, once you will start using it for your specific needs. I expect that regression tests may fail on Unix. This is due to end of line incompatibilities.

In case you find a bug, let me know.

TO DO

PERFORMANCE

Thanks to chunking, LReport can dump and compare huge files. Some performance tuning has been already done but there is still some space for improvement.

MANAGING COMPARING RULES

Currently comparing rules (for expected-actual comparison) have to be specified in command lines options. I expect that for intensive use of LReport, it would be convenient to have a common place where all this is defined. LReport tools should look there if nothing is given in the command line.

DOCUMENTATION

This is the first version of a documentation. Some parts of it are still under construction. Some links does not work. I am aware of that. My goal is to provide you with something you can play with. I will keep working on documentation refinements.

SEE ALSO

Project is maintained on Source Forge (http://lreport.sourceforge.net). You can find links to documentation there.

AUTHOR

Piotr Kaluski, <pkaluski@piotrkaluski.com>

COPYRIGHT AND LICENSE

Copyright 2004-2006 by Piotr Kaluski

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.