Björn Höhrmann

NAME

CSS::SAC - SAC CSS parser

SYNOPSIS

  use CSS::SAC qw();
  use My::SACHandler ();
  use My::SACErrors ();

  my $doc_handler = My::SACHandler->new;
  my $err_handler = My::SACErrors->new;
  my $sac = CSS::SAC->new({
                           DocumentHandler => $doc_handler,
                           ErrorHandler    => $err_handler,
                         });

  # generate a stream of events
  $sac->parse({ filename => 'foo.css' });

DESCRIPTION

SAC (Simple API for CSS) is an event-based API much like SAX for XML. If you are familiar with the latter, you should have little trouble getting used to SAC. More information on SAC can be found online at http://www.w3.org/TR/SAC.

Because CSS has more constructs than XML, core SAC is more complex than core SAX. However, if you need to parse a CSS style sheet, SAC probably remains the easiest way to get it done.

Most of the spec is presently implemented. The following interfaces are not yet there: Locator, CSSException, CSSParseException, ParserFactory. They may or may not be implemented at a later date (the most likely candidates are the exception classes, for which I still have to find an appropriate model).

Some places differ slightly from what is in the spec. I have tried to keep those to a justified minimum and to flag them correctly.

the CSS::SAC module itself

The Parser class doesn't exist separately; it is defined in CSS::SAC. It doesn't expose the locale interface because we don't localize errors (yet). It also doesn't have parse_style_sheet but rather parse, which is more consistent with other Perl parsing interfaces.

I have added the charset($charset) callback to the DocumentHandler interface. There are valid reasons why it wasn't there (the declared charset can only occasionally be trusted, and one should look at the actual encoding instead) but given that it's a token in the grammar, I believe that there should still be a way to access it.
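
As a minimal sketch of how a document handler might receive that callback: the package name My::CharsetSniffer is hypothetical, and the event is delivered by hand here rather than by the parser, so nothing below depends on CSS::SAC being installed.

```perl
use strict;
use warnings;

# Hypothetical handler that only records the declared charset.
package My::CharsetSniffer;

sub new { my $class = shift; return bless { declared => undef }, $class }

# Called with the value of the @charset rule. Treat it as a hint:
# the real encoding of the bytes may differ.
sub charset {
    my ($self, $charset) = @_;
    $self->{declared} = $charset;
}

package main;

my $handler = My::CharsetSniffer->new;

# Stand-in for the parser: deliver the event by hand.
$handler->charset('iso-8859-1');
print "declared charset: $handler->{declared}\n";
```

The handler stores the value rather than acting on it, which leaves the decision about the real encoding to the caller.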

METHODS

  • CSS::SAC->new(\%options) or $sac->new(\%options)

    Constructs a new parser object. The options can be:

     - ConditionFactory and SelectorFactory
        the factory classes used to build selector and condition objects.
        See CSS::SAC::{Condition,Selector}Factory for more details on the
        interfaces those classes must expose.
    
     - DocumentHandler and ErrorHandler
        the handler classes used as sinks for the event stream received
        from a SAC Driver. See CSS::SAC::{Document,Error}Factory for more
        details on the interfaces those classes must expose.

    Methods will be called on whatever it is you pass as values to those options. Thus, you may pass in objects as well as class names (I haven't tested this yet, there may be a problem).

    NOTE: an error handler should implement all callbacks, while a document handler may implement only those it is interested in. There is a default error handler (which warns or dies depending on the type of error) but no default document handler.

  • $sac->ParserVersion or $sac->getParserVersion

    Returns the supported CSS version.

    Requesting this parser's ParserVersion will return the string 'CSS3'. While that is (modulo potential bugs of course) believed to be generally true, several caveats apply:

    To begin with, CSS3 has been modularised, and various modules are at different stages of development. Evolving modules may require evolving this parser. I hesitated between making ParserVersion return CSS2, CSS3-pre, or simply CSS3. I chose the latter because I intend to update it as I become aware of the necessity of changes to accommodate new CSS3 stuff, and because it already supports a number of constructs alien to CSS2 (of which namespaces is imho important enough to justify a CSS3 tag). If you are aware of incompatibilities, please contact me.

    More importantly, it is now considered wrong for a parser to return CSSx as its version; instead it is expected to return the URI of the CSS version that it supports. However, there is no URI for CSS3, only one URI per module. Until this issue has been resolved by the WG, I will stick to returning CSS3. The behaviour of this attribute is certain to change in the future, so please avoid relying on it.

  • $cf = $sac->ConditionFactory

  • $sac->ConditionFactory($cf) or $sac->setConditionFactory($cf)

  • $sf = $sac->SelectorFactory

  • $sac->SelectorFactory($sf) or $sac->setSelectorFactory($sf)

  • $dh = $sac->DocumentHandler

  • $sac->DocumentHandler($dh) or $sac->setDocumentHandler($dh)

  • $eh = $sac->ErrorHandler

  • $sac->ErrorHandler($eh) or $sac->setErrorHandler($eh)

    Get or set the ConditionFactory, SelectorFactory, DocumentHandler, or ErrorHandler that the parser uses.

  • $sac->parse(\%options)

  • $sac->parseStyleSheet(\%options)

    parses a style sheet and sends events to the defined handlers. The options that you can use are:

    • string

    • ioref

    • filename

      passes either a string, an open filehandle, or a filename to read the stylesheet from

    • embedded

      tells whether the stylesheet is embedded or not. This is rarely needed, but it influences the interpretation of @charset rules: because they are forbidden in embedded style sheets, they generate an ignorable_style_sheet event instead of a charset event when embedded is set to a true value.

  • $sac->parse_rule($string_ref)

  • $sac->parseRule($string_ref)

    parses a rule (with { and }). You probably don't need this one. It returns nothing, but generates the events.

  • $sac->parse_style_declaration($string_ref)

  • $sac->parseStyleDeclaration($string_ref)

    same as parse_rule, but without the { and }. This is useful when you want to parse style declarations embedded using style attributes in HTML, SVG, etc... It returns nothing, but generates the events.

  • $sac->parse_property_value($string_ref)

  • $sac->parsePropertyValue($string_ref)

    parses a property value and returns an array ref of lexical units (see CSS::SAC::LexicalUnit)

  • $sac->parse_priority($string_ref)

  • $sac->parsePriority($string_ref)

    parses a priority and returns true if there is a priority value there.

  • $sac->parse_selector_list($string_ref)

  • $sac->parseSelectors($string_ref)

    parses a list of selectors and returns an array ref of selectors
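
The calls documented above can be sketched together as follows. This is an illustration under stated assumptions, not a tested recipe: My::EventLog is a hypothetical handler, the argument lists of its callbacks are not confirmed by this document, and the example only parses if CSS::SAC can be loaded. The string refs passed to the helper methods may be modified by the parser.

```perl
use strict;
use warnings;

# Hypothetical handler that records the names of the events it receives.
# Only the two callbacks relevant to @charset handling are defined;
# their argument lists are assumptions and are ignored here.
package My::EventLog;

sub new { my $class = shift; return bless { events => [] }, $class }
sub charset               { my $self = shift; push @{ $self->{events} }, 'charset' }
sub ignorable_style_sheet { my $self = shift; push @{ $self->{events} }, 'ignorable_style_sheet' }

package main;

my $css = qq{\@charset "utf-8";\np { color: red }\n};
my $log = My::EventLog->new;

if (eval { require CSS::SAC; 1 }) {
    # No ErrorHandler is given, so the default error handler is used.
    my $sac = CSS::SAC->new({ DocumentHandler => $log });

    # Same style sheet, parsed as standalone and then as embedded.
    $sac->parse({ string => $css });
    $sac->parse({ string => $css, embedded => 1 });

    # The helper methods take a reference to the string.
    my $declarations = 'color: red; margin: 0';
    $sac->parse_style_declaration(\$declarations);

    my $value = '1px solid red';
    my $units = $sac->parse_property_value(\$value);
    print scalar(@$units), " lexical units\n";

    print "$_\n" for @{ $log->{events} };
}
else {
    print "CSS::SAC is not installed; nothing parsed\n";
}
```

Because the handler defines only two callbacks, every other event is simply not delivered to it, which is the behaviour the note on document handlers describes.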

OTHER METHODS

Methods in this section are of relevance mostly to the internal workings of the parser. I document them here but I don't really consider them part of the interface, and thus may change them if need be. If you are using them directly tell me about it and I will "officialize" them. These have no Java style equivalent.

  • $sac->parse_charset($string_ref)

    parses a charset. It returns nothing, but generates the events.

  • $sac->parse_imports($string_ref)

    parses import rules. It returns nothing, but generates the events.

  • $sac->parse_namespace_declarations($string_ref)

    parses ns declarations. It returns nothing, but generates the events.

  • $sac->parse_medialist($string_ref)

    parses a list of media values and returns that list as an arrayref

  • $sac->parse_comments($string_ref)

    parses as many comments as there are at the beginning of the string. It returns nothing, but generates the events.

  • $sac->parse_simple_selector($string_ref)

    parses a simple selector and returns the selector object

  • $sac->build_condition(\@tokens)

    helper to build conditions (you probably don't want to use this at all...)

CSS::SAC::DefaultErrorHandler

This is pretty much a non package, it is just there to provide the default error handler if you are too lazy to provide one yourself.

What it does is simple. There are three error levels: warning, error, and fatal_error. It warns on the first two and dies on the last. It isn't fancy, but you can plug in something more intelligent at any moment.
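
A handler with equivalent behaviour could look roughly like the sketch below. It is a hypothetical stand-in (My::WarnOrDie), not the module's actual source, and it assumes each callback receives a plain message string.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mirrors the behaviour described above.
# Assumption: each callback is passed a message string.
package My::WarnOrDie;

sub new         { my $class = shift; return bless {}, $class }
sub warning     { my ($self, $msg) = @_; warn "[warning] $msg\n" }
sub error       { my ($self, $msg) = @_; warn "[error] $msg\n" }
sub fatal_error { my ($self, $msg) = @_; die  "[fatal] $msg\n" }

package main;

my $eh = My::WarnOrDie->new;

# Capture warnings so the example can show what was reported.
my @reported;
local $SIG{__WARN__} = sub { push @reported, $_[0] };

$eh->warning('unknown property');
$eh->error('unterminated block');

# fatal_error dies, so the caller has to trap it to keep going.
my $survived = eval { $eh->fatal_error('cannot open file'); 1 };
print "fatal_error died: $@" unless $survived;
```

An object like this can be passed as the ErrorHandler option to new, since it implements all three callbacks.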

CSS3 ISSUES

One problem is that I have modelled this parser after existing SAC implementations that do not take into account as much of CSS3 as they could. Some parts of that are trivial, and I have provided support on my own in this module. Other parts though are more important and I believe that coordination between the SAC authors would be beneficial on these points (once the relevant CSS3 modules have moved to REC).

  • new attribute conditions

    CSS3-selectors introduces a bunch of new things, including new attribute conditions ^= (starts with), $= (ends with) and *= (contains). There are no corresponding constants for conditions, so I suggested SAC_STARTS_WITH_ATTRIBUTE_CONDITION, SAC_ENDS_WITH_ATTRIBUTE_CONDITION, SAC_CONTAINS_ATTRIBUTE_CONDITION.

    Note that these constants have been added, together with the corresponding factory methods. However, they will remain undocumented and considered experimental until some consensus is reached on the matter.

  • :root condition

    The :root token confuses some people because they think it is equivalent to XPath's / root step. That is not so. XPath's root selects "above" the document element. CSS's :root tests whether an element is the document element; there is nothing above a document element. Thus :root on its own is equivalent to *:root. It's a condition, not a selector. E:root matches the E element that is also the document element (if there is one).

    Thus, SAC_ROOT_NODE_SELECTOR does not apply and we need a new SAC_IS_ROOT_CONDITION constant.

    Note that this constant has been added, together with the corresponding factory method. However, it will remain undocumented and considered experimental until some consensus is reached on the matter.

  • other new pseudo-classes

    :empty definitely needs a constant too I'd say.

    Note that this constant has been added, together with the corresponding factory method. However, it will remain undocumented and considered experimental until some consensus is reached on the matter.

  • an+b syntax in positional conditions

    There is new syntax that allows for very customisable positional selecting. PositionalCondition needs to be updated to deal with that.
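
The matching rule behind an+b can be sketched as a small helper. The function name matches_an_plus_b and its argument order are hypothetical and not part of CSS::SAC; the sketch assumes 1-based positions and that a position matches when it equals a*n+b for some integer n >= 0.

```perl
use strict;
use warnings;

# Returns true if 1-based position $pos equals $step * n + $offset
# for some integer n >= 0. Hypothetical helper, not part of CSS::SAC.
sub matches_an_plus_b {
    my ($step, $offset, $pos) = @_;
    my $diff = $pos - $offset;
    return $diff == 0 ? 1 : 0 if $step == 0;    # plain "b": exact position
    return 0 if $diff % $step != 0;             # n would not be an integer
    return $diff / $step >= 0 ? 1 : 0;          # n must not be negative
}

# 2n+1 selects odd positions; -n+3 selects the first three.
my @odd         = grep { matches_an_plus_b( 2, 1, $_) } 1 .. 6;
my @first_three = grep { matches_an_plus_b(-1, 3, $_) } 1 .. 6;
print "2n+1: @odd\n";          # prints "2n+1: 1 3 5"
print "-n+3: @first_three\n";  # prints "-n+3: 1 2 3"
```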

BUGS

 - the problem with attaching pseudo-elements to elements as
 coselectors. I'm not sure which is the right representation. Don't
 forget to update CSS::SAC::Writer too so that it writes it out
 properly.

 - see Bjoern's list

ACKNOWLEDGEMENTS

 - Bjoern Hoehrmann for his immediate reaction and much valuable
 feedback and suggestions. It's certainly much harder to type with all
 those fingers that all those Mafia padres have cut off, but at least
 I get work done much faster than before. And also those nasty bugs he
 kindly uncovered.

 - Steffen Goeldner for spotting bugs and providing patches.

 - Ian Hickson for very very very kind testing support, and all sorts
 of niceties.

 - Manos Batsis for starting a very long discussion on this that
 eventually deviated into other very interesting topics, and for
 giving me some really weird style sheets to feed into this module.

 - Simon St.Laurent for posting this on xmlhack.com and thus pointing a
 lot of people to this module (as seen in my referer logs).

And of course all the other people that have sent encouragement notes and feature requests.

TODO

 - add a pointer to the SAC W3 page

 - create the Exception classes

 - update PositionalCondition to include logic that can normalize the
 an+b notation and add a method that, given a position, will return a
 boolean indicating whether it matches the condition.

 - add stringify overloading to all classes so that they may be
 printed directly

 - have parser version return an overloaded object that circumvents the
 current problems

 - add docs on how to write a {Document,Error}Handler, right now there
 is example code in Writer, but it isn't all clearly explained.

 - find a way to make the '-' prefix to properties optional

 - add a filter that switches events to spec names, and that can be used
 directly through an option

 - add DOM-like hasFeature support (in view of SAC 3)

 - prefix all constants with SAC_. Keep the old ones around for a few 
 versions, importable with :old-constants.

 - update docs

AUTHOR

Robin Berjon <robin@knowscape.com>

This module is licensed under the same terms as Perl itself.



