NAME

typesafety.pm - compile-time object type usage static analysis

ABSTRACT

Perform heuristics on your program before it is run, with a goal of ensuring that object oriented types are used consistently -- the correct class (or a subclass of it) is returned in the right places, provided in method call argument lists in the right places, only assigned to the right variables, and so on. This is a standard feature of non-dynamic languages such as Java, C++, and C#. Lack of this feature is one of the main reasons Perl is said not to be a "real" object oriented language.

SYNOPSIS

  package main;
  use typesafety; # 'summary', 'debug';

  my FooBar $foo;            # establish type-checked variables
  my FooBar $bar;            # FooBar is the base class of references $bar will hold
  my BazQux $baz;

  $foo = new FooBar;         # this is okay, because $foo holds FooBars
  $bar = $foo;               # this is okay, because $bar also holds FooBars
  # $foo = 10;               # this would throw an error - 10 is not a FooBar
  # $baz = $foo;             # not allowed - FooBar isn't a BazQux
  $foo = $baz;               # is allowed -  BazQux is a FooBar because of inheritance
  $bar = $foo->foo($baz, 1); # this is okay, as FooBar::foo() returns FooBars also

  typesafety::check();   # perform type check static analysis

  #

  package FooBar;
  use typesafety;

  # unneeded - new() defaults to prototype to return same type as package
  # proto 'new', returns => 'FooBar'; 

  sub new {
      bless [], $_[0]; 
      # or: bless whatever, __PACKAGE__;
      # or: bless whatever, 'FooBar';
      # or: my $type = shift; bless whatever, $type;
      # or: my $type = shift; $type = ref $type if ref $type; bless whatever, $type;
  }

  sub foo (FooBar; BazQux, undef) { my $me = shift; return $me->new(); } 

  # or: proto 'foo', returns => 'FooBar'; sub foo { my $me = shift; return $me->new(); } 

  #

  package BazQux;
  use typesafety;
  @ISA = 'FooBar';

DESCRIPTION

This module is similar to "strict.pm" or "taint.pm" in that it checks your program for classes of possible errors. It identifies possible data flow routes and performs heuristics on the data flow to rule out the possibility of the

Important

This software is BETA! Critical things seem to work, but it needs more testing (for bugs and usability) from the public before I can call it "1.0". The API is subject to change (and has already changed with each version so far). This is the first version where I'm happy with the basic functionality and consider it usable, so I'm calling it beta. While it correctly makes sense of a lot of code related to types in OO, there's still a lot of code out there in the wild that it mistakes for an object-related construct and causes death and internal bleeding when it foolishly tries to swallow it.

IMPORTANT: This module depends on B::Generate, but the version up on CPAN doesn't build cleanly against current versions of Perl. I have a modified version of B::Generate up in my area that works, at least for me. As I write this, Perl 5.8.8 is current.

IMPORTANT: Like adapting a Perl 4 program to compile cleanly on Perl 5 with strict and warnings in effect, adapting a Perl 5 program to cleanly pass type checking is a major undertaking. And like adapting a Perl 4 program for strict, it takes some self-education and adjustment on the part of the programmer. Also like adapting a program for strict, it's an extremely rewarding habit to get into for a program that might grow to tens of thousands of lines. I suggest making it a corporate project (with large sums of money budgeted towards consulting fees for me) or else for the adventurous and broad-minded.

IMPORTANT-ish: There's a good tutorial on strong typing (type safety, type checking) in my _Perl 6 Now: The Core Ideas Illustrated with Perl 5_ along with loads of other great stuff (you should buy it just for the two chapters on coroutines). See http://perl6now.com for excerpts, more plugging, and links to buy.

Strong Typing

Failure to keep track of what kind of data is in a given variable or returned from a given method is an epic source of confusion and frustration during debugging.

Given a ->get_pet() method, you might try to bathe the output. If it is always a dog during testing, everything is fine, but sooner or later, you're going to get a cat, and that can be rather bloody.

Welcome to Type Safety. Type Safety means knowing what kind of data you have (at least in general - it may be a subclass of the type you know you have). Because you always know what kind of data it is, you see in advance when you try to use something too generic (like a pet) where you want something more specific (like a dog, or at least a pet that implements the "washable" interface).

Think of Type Safety as a new kind of variable scoping - instead of scoping where the variables can be seen from, you're scoping what kind of data they might contain.

"In advance" means when the program is parsed, not while it is running. This prevents bugs from "hiding". I'm sure you're familiar with evil bugs, lurking in the dark, ill-used corners of the program, like so many a grue. Like Perl's use strict and use warnings and use diagnostics, potential problems are brought to your attention before they are proven to be a problem by tripping on them while the program happens on that nasty part of code. You might get too much information, but you'll never have to test every aspect of the program to try to uncover these sorts of warnings. Now you understand the difference between "run time diagnostics" and "compile time warnings".

Asserts in the code, checking return values manually, are an example of run-time type checking:

  # we die unexpectedly, but at least bad values don't creep around!
  # too bad our program is so ugly, full of checks and possible bad
  # cases to check for...

  my $foo = PetStore->get_pet();
  $foo->isa('Dog') or die; 

Run-time type checking misses errors unless a certain path of execution is taken, leaving little time bombs to go off, showing up later. More importantly, it clutters up the code with endless "debugging" checks, known as "asserts", from the C language macro of the same name.
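To make the "time bomb" concrete, here is a minimal sketch in plain Perl, with no typesafety.pm involved. The Pet, Dog, Cat and PetStore packages, the bathe() function, and the argument to get_pet() are all made up for the illustration. The assert only complains when the Cat path is actually executed:

```perl
use strict;
use warnings;

package Pet;  sub new { my $class = shift; return bless {}, $class }
package Dog;  our @ISA = ('Pet');
package Cat;  our @ISA = ('Pet');

package PetStore;
# Hypothetical store: hands back whichever kind of pet is asked for.
sub get_pet { my ($class, $kind) = @_; return $kind->new }

package main;

# Run-time assert: it can only notice a bad type when this line runs.
sub bathe {
    my $pet = shift;
    $pet->isa('Dog') or die "refusing to bathe a " . ref($pet) . "\n";
    return "bathed a " . ref($pet);
}

# The path exercised during testing: always a Dog, so nothing fails.
my $ok = bathe(PetStore->get_pet('Dog'));

# The path hit later in production: the assert finally fires.
my $err = eval { bathe(PetStore->get_pet('Cat')); 1 } ? '' : $@;

print "$ok\n";
print "caught: $err";
```

Nothing about the first call hints that the second one will die. That is the gap a check made before the program runs is meant to close.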

Type Safety is a cornerstone of Object Oriented programming. It works with Polymorphism and Inheritance (including Interface Inheritance).

Use typesafety.pm while developing. Comment out the typesafety::check() statement when placing the code into production. This emulates what is done with compiled languages - types are checked only after the last time changes are made to the code. The type checking is costly in terms of CPU, and as long as the code stays the same, the results won't change. If everything was type safe the last time you tested, and you haven't changed anything, then it still is.

A few specific things are inspected in the program when typesafety::check() is called:

  $a = $b;

Variable assignment. Rules are only applied to variables that are "type safe" - a type safe variable was declared using one of the two constructs shown in the SYNOPSIS. If it isn't type safe, none of these rules apply. Otherwise, $b must be the same type as $a, or a subclass of $a's type. In other words, the types must "match".
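The "match" rule can be pictured with Perl's own isa(). This is only a run-time analogy under assumed names (types_match() is a hypothetical helper, not part of typesafety.pm, which makes its decision by inspecting the compiled program instead):

```perl
use strict;
use warnings;

package FooBar; sub new { return bless [], shift }
package BazQux; our @ISA = ('FooBar');

package main;

# Hypothetical helper: may a value of class $have be stored in a
# variable declared to hold $want? The same class or a subclass passes.
sub types_match {
    my ($want, $have) = @_;
    return $have->isa($want) ? 1 : 0;
}

my $sub_into_base = types_match('FooBar', 'BazQux');   # BazQux is a FooBar
my $base_into_sub = types_match('BazQux', 'FooBar');   # FooBar is not a BazQux

print "BazQux into a FooBar variable: $sub_into_base\n";
print "FooBar into a BazQux variable: $base_into_sub\n";
```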

  $a->meth();

Method call. If $a is type safe, then the method meth() must exist in whichever package $a was prototyped to hold a reference to. Note that type safety can't keep you from trying to use a null reference (uninitialized variable), only from trying to call methods that haven't been proven to be part of the module they're prototyped to hold a reference to. If the method hasn't been prototyped in that module, then a ->can() test is done at compile time. Inheritance is handled this way.
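The reason a ->can() test covers inheritance is that can() follows @ISA. A small sketch in plain Perl, reusing the package names from the SYNOPSIS (no_such_method is a made-up name for the failing case):

```perl
use strict;
use warnings;

package FooBar;
sub new { return bless [], shift }
sub foo { return $_[0]->new }

package BazQux;
our @ISA = ('FooBar');

package main;

# can() walks @ISA, so a method defined only in FooBar is still
# found when asking the subclass.
my $inherited = BazQux->can('foo')            ? 'found' : 'missing';
my $absent    = BazQux->can('no_such_method') ? 'found' : 'missing';

print "BazQux->foo: $inherited\n";
print "BazQux->no_such_method: $absent\n";
```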

  $a = new Foo;

Package constructors are always assumed to return an object of the same type as their package. In this case, $a->isa('Foo') is expected to be true after this assignment. This may be overridden with a prototype for your abstract factory constructors (which really belong in another method anyway, but I'm feeling generous). The return type of Foo->new() must match the type of $a, as a separate matter. To match, it must match exactly or else be a subclass of what $a expects. This is just like the simple case of "variable assignment", above. If new() has arguments prototyped for it, the argument types must also match. This is just like "Method call", above.

  $a = $foo->new();

Same as above. If $foo is type checked and $a is not, then arguments to the new() method are still checked against any prototype. If $a is type checked, then the return value of new() must match. If no prototype exists for new() in whatever package $foo belongs to, then, as above, the return value is assumed to be the same as the package $foo belongs to. In other words, in normal circumstances, you don't have to prototype methods.

  $b = $a->bar();

As above: the return type of bar() must be the same as, or a subclass of, $b's, if $b is type safe. If $a is type safe and there is a prototype on bar(), then argument types are enforced.

  $b = $a->bar($a->baz(), $z->qux());

The above rules apply recursively: if a method call is made to compute an argument, and the arguments of the bar() method are prototyped, then the return values of method calls made to compute the arguments must match the prototype. Any of the arguments in the prototype may be undef, in which case no particular type is enforced. Only object types are enforced - if you want to pass an array reference, then bless that array reference into a package and make it an object.

  bless something, $_[0];
  bless something, __PACKAGE__;
  bless something, 'FooBar';

This is considered to return an object of the type of the hard-coded value or of the present package. This value may "fall through" and be the default return value of the function.

  return $whatever;

Return values in methods must match the return type prototyped for that method.

  push @a, $type;
  unshift @a, $type;
  $type = pop @a;
  $type = shift @a;
  $type = $a[5];

When typed variables and typed expressions are used in conjunction with arrays, the array takes on the types of all of the input values. Arrays only take on types when assigned from another array, a value is pushed onto it, or a value is unshifted onto it. Whenever the array is used to generate a value with an index, via pop, or via shift, the expected type is compared to each of the types the array holds values from. Should a value be assigned to the array that is incompatible with the types expected of the array, the program dies with a diagnostic message. This feature is extremely experimental. In theory, this type of automatic type inference could be applied to method arguments, scalars, and so forth, such that types can be specified by the programmer when desired, but never need to be, and the program is still fully type checked. OCaml reportedly does this, but with a concept of record-like data structures, where different elements of an array are typed separately if the array isn't used as a queue. We only support one type for the whole array, as if it were a queue of sorts.
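The bookkeeping described above can be sketched roughly as follows. This is a run-time imitation under assumed names (note_input(), read_is_safe(), %seen_types and the Other package are all hypothetical); typesafety.pm itself does the equivalent while walking the compiled program, and I'm not claiming this is how it is coded internally:

```perl
use strict;
use warnings;

package FooBar; sub new { return bless [], shift }
package BazQux; our @ISA = ('FooBar');
package Other;  sub new { return bless [], shift }

package main;

# Hypothetical record of every class that has been put into one array
# (by push, unshift, or assignment from another array).
my %seen_types;
sub note_input { my $class = shift; $seen_types{$class} = 1; return }

# A read (index, pop, shift) is safe only if every recorded class is
# compatible with the type the receiving variable expects.
sub read_is_safe {
    my $expected = shift;
    return !grep { !$_->isa($expected) } keys %seen_types;
}

note_input('FooBar');
note_input('BazQux');
my $before = read_is_safe('FooBar') ? 'safe' : 'unsafe';

note_input('Other');
my $after = read_is_safe('FooBar') ? 'safe' : 'unsafe';

print "reading a FooBar before an Other goes in: $before\n";
print "reading a FooBar after an Other goes in: $after\n";
```

One set of types per array is what makes this "queue-like": it says nothing about which slot holds which class.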

  sub foo (FooBar; BazQux, undef) { my $me = shift; return $me->new(); } 

Method prototypes are provided in the () after the method name. You might recognize the () from perlsub. You might also remember perlsub explaining that these prototypes aren't prototypes in the normal meaning of the word. Well, with typesafety.pm, they are. The format is (ReturnType; FirstArgType, SecondArgType, ThirdArgType). Any of them may be undef, in which case nothing is done in the way of enforcement for that argument. The ReturnType is what the method returns - it is separated from the arguments with a semicolon (;). The argument types are then listed, separated by commas (,). Any calls made to that method (well, almost any) will be checked against this prototype.
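To show how the (ReturnType; ArgType, ArgType) layout breaks down, here is a small sketch that takes such a string apart. parse_proto() is a hypothetical helper written for this illustration, working on a plain string; it is not how typesafety.pm reads prototypes:

```perl
use strict;
use warnings;

# Hypothetical parser for the "(ReturnType; Arg1, Arg2)" layout.
sub parse_proto {
    my $proto = shift;
    $proto =~ s/^\s*\(\s*//;     # drop the opening paren
    $proto =~ s/\s*\)\s*$//;     # drop the closing paren
    my ($returns, $args) = split /\s*;\s*/, $proto, 2;
    my @args = defined $args ? split(/\s*,\s*/, $args) : ();
    # 'undef' in any slot means "no type enforced here".
    return map { $_ eq 'undef' ? undef : $_ } ($returns, @args);
}

my ($returns, @args) = parse_proto('(FooBar; BazQux, undef)');

print "returns: $returns\n";
print "args: ", join(', ', map { defined $_ ? $_ : '(unchecked)' } @args), "\n";
```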

  sub foo (FooBar; BazQux) {
    my $b = $_[0];
    my $a = shift;
    # ...
  }

Arguments read from prototyped methods using a simple shift or $_[n] take the correct type from the prototype. shift @_ should work, too - it is the same thing. In this example, $a and $b would be of type BazQux. Of course, you can, and probably should, explicitly specify the type: my BazQux $a = shift;.

  typesafety::check(); 

This must be done after setting things up to perform actual type checking, or it can be commented out for production. The module will still need to be used to provide the proto(), and add the attribute.pm interface handlers.

Giving the 'summary' argument to the use typesafety line generates a report of defined types when typesafety::check() is run:

  typesafety.pm status report:
  ----------------------------
  variable $baz, type BazQux, defined in package main, file test.7.pl, line 36
  variable $bar, type FooBar, defined in package main, file test.7.pl, line 34
  variable $foo, type FooBar, defined in package main, file test.7.pl, line 33

I don't know what this is good for except warm fuzzy feelings.

You can also specify a 'debug' flag, but I don't expect it will be very helpful to you.

DIAGNOSTICS

  unsafe assignment:  in package main, file test.7.pl, line 42 - variable $baz, 
  type BazQux, defined in package main, file test.7.pl, line 36 cannot hold method 
  foo, type FooBar, defined in package FooBar, file test.7.pl, line 6 at 
  typesafety.pm line 303.

There are actually a lot of different diagnostic messages, but they are all somewhat similar. Either something was being assigned to something it shouldn't have been, or else something is being passed in place of something it shouldn't be. The location of the relevant definitions as well as the actual error are included, along with the line in typesafety.pm, which is only useful to me.

EXPORT

proto() is always exported. This is considered a bug.

BUGS

My favorite section!

Yes, every module I write mentions Damian Conway =)

Test 13 is commented out because it was failing ( $foo{bar} = $foo{baz} where each slot held a different object type).

Constructs like $foo->bar->() were kicking its butt (functions that return closures) and probably still are. Not sure about closure handling. This is on my todo. Not having it is an embarrassment.

my Foo $bar is used by fields as well.

Blesses are only recognized as returning a given type when not used with a variable, or when used with $_[0]. E.g., all of these are recognized: bless {}, 'FooBar', bless {}, $_[0], and bless {}, __PACKAGE__. (__PACKAGE__ is a literal as far as perl is concerned). Doing bless {}, $type and other constructs will throw a diagnostic about an unrecognized construct - typesafety.pm loses track of the significance of $_[0] when it is assigned to another variable. To get this going, I'd have to track data as it is shifted from arguments into other things, and I'd have to recognize the result of ref or the first argument to new as a special thing that produces a predictable type when fed to bless as the second argument. Meaty. Update: a few more constructs are supported: my $type = shift; bless whatever, $type; the most significant. Still, you won't have much trouble stumping this thing.

undef isn't accepted in place of an object. Most OO languages permit this - however, it is a well known duality that leads to checking each return value. This is a nasty case of explicit type case analysis syndrome. Rather than each return value being checked for nullness (or undefness, in the case of Perl) and the error handling logic being repeated each place where a return value is expected, use the Introduce Null Object pattern: return values should always be the indicated type - however, a special subclass of that type can throw an error when any of its methods are accessed. Should a method call be performed to a method that promises it will always return a given type, and this return value isn't really needed, and failure is acceptable, the return can be compared to the special null object of that class. The normal situation, where a success return is expected, is handled correctly without having to introduce any ugly return checking code or diagnostics. The error reporting code is refactored into the special null object, rather than being littered around the program, in other words.
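A minimal sketch of the null object idea in plain Perl follows. Pet, Pet::Null, PetStore and the $in_stock flag are names invented for the example, and nothing here depends on typesafety.pm:

```perl
use strict;
use warnings;

package Pet;
sub new  { my $class = shift; return bless {}, $class }
sub name { return 'Rex' }

# Hypothetical null object: still a Pet as far as types go, but any
# real use of it raises the error, and does so in exactly one place.
package Pet::Null;
our @ISA = ('Pet');
sub name { die "no pet was available\n" }

package PetStore;
# Always returns a Pet of some kind, never undef.
sub get_pet {
    my ($class, $in_stock) = @_;
    return $in_stock ? Pet->new() : Pet::Null->new();
}

package main;

my $pet     = PetStore->get_pet(1);
my $missing = PetStore->get_pet(0);

my $name  = $pet->name;                              # normal case, no checks needed
my $typed = $missing->isa('Pet') ? 'yes' : 'no';     # the null object still type-checks
my $error = eval { $missing->name; 1 } ? '' : $@;    # failure surfaces only on use

print "name: $name\n";
print "null object is a Pet: $typed\n";
print "error on use: $error";
```

A caller that can tolerate failure would test $missing->isa('Pet::Null') instead of testing for undef; everyone else just uses the return value.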

We're intimately tied to the bytecode tree, the structure of which could easily change in future versions of Perl. This works on my 5.9.0 pre-alpha. It might not work at all on what you have.

Only operations on lexical my variables are supported. Attempting to assign a global to a typed variable will be ignored - type errors won't be reported. Global variables themselves cannot yet be type checked. All doable, just ran out of steam.

Only operations on methods using the $ob->method(args) syntax are supported - function calls are not prototyped nor recognized. Stick to method calls for now. New - function prototypes might work, but I haven't tested this, nor written a test case.

Types should be considered to match if the last part matches - Foo::Bar->isa('Bar') would be true. This might take some doing. Workaround to :: not being allowed in attribute-prototypes. Presently, programs with nested classes, like Foo::Bar, cannot have these types assigned to variables. No longer true - the declare() syntax is a work-around to this.

Many valid, safe expressions will stump this thing. It doesn't yet understand all operations - only a small chunk of them. map { }, when the last thing in the block is type safe, grep { }, slice operations on arrays, and dozens of other things could be treated as safe. When typesafety.pm encounters something it doesn't understand, it barfs.

We use B::Generate just for the ->sv() method. Nothing else. I promise! We're not modifying the byte code tree, just reporting on it. I do have some ideas for using B::Generate, but don't go off thinking that this module does radical run time self modifying code stuff. XXX this should go anyway; B has equivalents but it needs a wrapper to switch on the object type to get the right one for the situation.

The root (code not in functions) of main:: is checked, but not the roots of other modules. I don't know how to get a handle on them. Sorry. Methods and functions in main:: and other namespaces that use typesafety; get checked, of course. Update: B::Utils will give me a handle on those, I think, but I'm too lazy to add support.

Having to call a "check" function is kind of a kludge. I think this could be done in a CHECK { } block, but right now, the typesafety::check() call may be commented out, and the code should start up very quickly, only having to compile the few thousand lines of code in typesafety.pm, and not having to actually recurse through the bytecode. Modules we use have a chance to run at the root level, which lets the proto() functions all run, if we are used after they are, but the main package has no such benefit. Running at CHECK time doesn't let anything run.

The B tree matching, navigation, and type solving logic should be presented as a reusable API, and a module specific to this task should use that module. After I learn what the pattern is and what facilities are really needed, I'll consider this.

Tests aren't run automatically - I really need to fix this. I keep running them by hand. It is one big file where each commented-out line gets uncommented one by one. This makes normal testing procedures awkward. I'll have to rig something up.

Some things just plain might not work as described. Let me know.

FUTURE DIRECTION

  sub foo (FooBar $a, BazQux $b) { 
  }

This should use B::Generate to insert instructions into the op tree to shift @_ into $a and $b. When foo() runs, $a and $b would contain the argument values. Also, support for named parameters - each key in the parameter list could be associated with a type. This is much more perlish than mere argument order (bleah). That might look something like:

  sub foo (returns => FooBar, name => NameObject, color => ColorObject, file => IO::Handle) {
  }

This would first require support for hashes, period. Then support for types on individual hash keys when hash keys are literal constants.
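Until something like that exists, the same idea can only be approximated at run time. The sketch below is exactly that: a run-time approximation in plain Perl, with a made-up %param_types table and a made-up mistyped_params() helper, not a feature of typesafety.pm:

```perl
use strict;
use warnings;

package NameObject;  sub new { return bless {}, shift }
package ColorObject; sub new { return bless {}, shift }

package main;
use Scalar::Util 'blessed';

# Hypothetical table: parameter name => class its value must belong to.
my %param_types = (name => 'NameObject', color => 'ColorObject');

# Returns the names of the parameters whose values have the wrong type.
# Keys that are not in the table are left unchecked.
sub mistyped_params {
    my %args = @_;
    my @bad = grep {
        exists $param_types{$_}
            && !(blessed($args{$_}) && $args{$_}->isa($param_types{$_}))
    } keys %args;
    return sort { $a cmp $b } @bad;
}

my @good = mistyped_params(name => NameObject->new, color => ColorObject->new);
my @bad  = mistyped_params(name => NameObject->new, color => NameObject->new);

print "mistyped in first call: ", scalar(@good), "\n";
print "mistyped in second call: @bad\n";
```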

Support for hashes is also sorely needed for type safe access to instance variables:

  sub foo (FooBar; undef) {
    my $self = shift; 
    return $self->{0}; # XXX dies, even if we always store only FooBars in $self->{0}!
  }

Scalars without explicitly defined types and method parameters to unprototyped methods should be given the same treatment as arrays - the type usage history should be tracked, and if an inconsistency is found, it should be reported.

map {}, grep {}, and probably numerous other operations should be supported on arrays. Probably numerous other operations should be supported on scalars. If you stub your toe on something and just can't stand it, let me know. I'll look into making it work.

private, public, protected, friendly protection levels, as well as static. Non-static methods aren't callable in packages specified by constant names, only via padsvs and such ($a->meth(), not Foo->meth()). E.g., for FooBar->bleah(), bleah() must be prototyped static if prototyped. Non-static methods should get a $this that they can make method calls in.

See also the comments at the top of the typesafety.pm file.

Even though I have plenty of ideas of things I'd like to do with this module, I'm really pleased with this module as it is now. However, you're likely to try to use it for things that I haven't thought of and be sorely disappointed. Should you find it lacking, or find a way in which it could really shine for your application, let me know. I'm not likely to do any more work on this beyond bug fixes unless I get the clear impression that doing so would make someone happy. If no one else cares, then neither do I.

HISTORY

This is the fifth snapshot. The first was ugly, ugly, ugly and contained horrific bugs and the implementation was lacking. The second continued to lack numerous critical features but the code was radically cleaned up. In the third version, I learned about the context bits in opcodes, and used that to determine whether an opcode pushed nothing onto the stack, or whether it pushed something I couldn't identify, for opcodes that I didn't have explicit heuristics coded for. This was a huge leap forward. The fourth version added support for more bless() idioms and fixed return() to check the return value against the method prototype rather than the type expected by the return's parent opcode, and added support for shift() and indexing the argument array and got generic types for arrays working. Version four also introduced the concept of literal values beyond just object types, needed to make the bless() idioms work. The interface is in flux and has changed between each version. The fourth one was pretty good but has essentially no users, so I kind of ignored the whole mess for a while. The fifth version makes the thing more tolerant of closures but it still doesn't do the right thing. Some constant expressions were stumping it (duh). There were some fixes that really didn't seem right to me... it didn't seem to be able to cope with untyped function calls at all, but I was certain that it just ignored all untyped stuff. Another thing that was confusing it was functions that were exported into the checked namespace from other modules -- it should have been ignoring those, and now it is. Some more places use PVX rather than sv to get strings without meta-information tacked on after nulls. I forget what they were (it happened a few months ago) but there were some other fixes for 5.8.8. I didn't get to clean up (and spell check) the documentation or add new features in this release, but a new release was overdue.

OTHER MODULES

Class::Contract by Damian Conway. Let me know if you notice any others. Class::Contract only examines type safety on arguments to and from method calls. It doesn't delve into the inner workings of a method to make sure that types are handled correctly in there. This module covers the same turf, but with less syntax and fewer bells and whistles. This module is more natural to use, in my opinion.

To the best of my knowledge, no other module attempts to do what this module, er, attempts to do.

Object::PerlDesignPatterns by myself. Documentation. Deals with many concepts surrounding Object Oriented theory, good design, and hotrodding Perl. The current working version is always at http://perldesignpatterns.com.

SEE ALSO

See "Pragmatic Modules" in perlmodlib.

types, by Arthur Bergman - C style type checking on strings, integers, and floats.

http://perldesignpatterns.com/?TypeSafety - look for updated documentation on this module here - this included documentation is sparse - only usage information, bugs, and such are included. The TypeSafety page on http://perldesignpatterns.com, on the other hand, is an introduction and tutorial to the ideas.

http://www.c2.com/cgi/wiki?NoseJobRefactoring - an extreme case of the utility of strong types.

Class::Contract, by Damian Conway

Attribute::Types, by Damian Conway

Sub::Parameters, by Richard Clamp

Object::PerlDesignPatterns, by myself.

The realtest.pl file that comes with this distribution demonstrates exhaustively everything that is allowed and everything that is not.

The source code. At the top of the .pm file is a list of outstanding issues, things that I want to do in the future, and things that have been knocked down. At the bottom of the .pm file is a whole bunch of comments, documentation, and such.

http://perldesignpatterns.com/?PerlAssembly - typesafety.pm works by examining the bytecode tree of the compiled program. This bytecode is known as "B", for whatever reason. I'm learning it as I write this, and as I write this, I'm documenting it (talk about multitasking!) The PerlAssembly page has links to other resources I've found around the net, too.

http://perl.plover.com/yak/typing/ - Mark Jason Dominus did an excellent presentation and posted the slides and notes. His description of OCaml's type system was the inspiration for our handling of arrays.

AUTHOR

Scott Walters - scott@slowass.net

COPYRIGHT

Distribute under the same terms as Perl itself. Copyright 2003 Scott Walters. Some rights reserved.