NAME

Image::Leptonica::Func::psio1

VERSION

version 0.04

psio1.c

  psio1.c

    |=============================================================|
    |                         Important note                      |
    |=============================================================|
    | Some of these functions require libtiff, libjpeg and libz.  |
    | If you do not have these libraries, you must set            |
    |     #define  USE_PSIO     0                                 |
    | in environ.h.  This will link psio1stub.c                   |
    |=============================================================|

     This is a PostScript "device driver" for wrapping images
     in PostScript.  The images can be rendered by a PostScript
     interpreter for viewing, using evince or gv.  They can also be
     rasterized for printing, using gs or an embedded interpreter
     in a PostScript printer.  And they can be converted to a pdf
     using gs (ps2pdf).

     Convert specified files to PS
          l_int32          convertFilesToPS()
          l_int32          sarrayConvertFilesToPS()
          l_int32          convertFilesFittedToPS()
          l_int32          sarrayConvertFilesFittedToPS()
          l_int32          writeImageCompressedToPSFile()

     Convert mixed text/image files to PS
          l_int32          convertSegmentedPagesToPS()
          l_int32          pixWriteSegmentedPageToPS()
          l_int32          pixWriteMixedToPS()

     Convert any image file to PS for embedding
          l_int32          convertToPSEmbed()

     Write all images in a pixa out to PS
          l_int32          pixaWriteCompressedToPS()

  These PostScript converters are used in three different ways.

  (1) For embedding a PS file in a program like TeX.
      convertToPSEmbed() handles this for levels 1, 2 and 3 output,
      and prog/converttops wraps this in an executable.
      converttops is a generalization of Thomas Merz's jpeg2ps wrapper,
      in that it works for all types (formats, depth, colormap)
      of input images and gives PS output in one of these formats
        * level 1 (uncompressed)
        * level 2 (compressed ccittg4 or dct)
        * level 3 (compressed flate)

  (2) For composing a set of pages with any number of images
      painted on them, in either level 2 or level 3 formats.

  (3) For printing a page image or a set of page images, at a
      resolution that optimally fills the page, using
      convertFilesFittedToPS().

  The top-level calls of utilities in category 2, which can compose
  multiple images on a page, and which generate a PostScript file for
  printing or display (e.g., conversion to pdf), are:
      convertFilesToPS()
      convertFilesFittedToPS()
      convertSegmentedPagesToPS()

  All images are output with page numbers.  Bounding box hints are
  more subtle.  They must be included for embeding images in
  TeX, for example, and the low-level writers include bounding
  box hints by default.  However, these hints should not be included for
  multi-page PostScript that is composed of a sequence of images;
  consequently, they are not written when calling higher level
  functions such as convertFilesToPS(), convertFilesFittedToPS()
  and convertSegmentedPagesToPS().  The function l_psWriteBoundingBox()
  sets a flag to give low-level control over this.

FUNCTIONS

convertFilesFittedToPS

l_int32 convertFilesFittedToPS ( const char *dirin, const char *substr, l_float32 xpts, l_float32 ypts, const char *fileout )

  convertFilesFittedToPS()

      Input:  dirin (input directory)
              substr (<optional> substring filter on filenames; can be NULL)
              xpts, ypts (desired size in printer points; use 0 for default)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This generates a PS file for all files in a specified directory
          that contain the substr pattern to be matched.
      (2) Each image is written to a separate page in the output PS file.
      (3) All images are written compressed:
              * if tiffg4  -->  use ccittg4
              * if jpeg    -->  use dct
              * all others -->  use flate
          If the image is jpeg or tiffg4, we use the existing compressed
          strings for the encoding; otherwise, we read the image into
          a pix and flate-encode the pieces.
      (4) The resolution is internally determined such that the images
          are rendered, in at least one direction, at 100% of the given
          size in printer points.  Use 0.0 for xpts or ypts to get
          the default value, which is 612.0 or 792.0, rsp.
      (5) The size of the PostScript file is independent of the resolution,
          because the entire file is encoded.  The @xpts and @ypts
          parameter tells the PS decomposer how to render the page.

convertFilesToPS

l_int32 convertFilesToPS ( const char *dirin, const char *substr, l_int32 res, const char *fileout )

  convertFilesToPS()

      Input:  dirin (input directory)
              substr (<optional> substring filter on filenames; can be NULL)
              res (typ. 300 or 600 ppi)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This generates a PS file for all image files in a specified
          directory that contain the substr pattern to be matched.
      (2) Each image is written to a separate page in the output PS file.
      (3) All images are written compressed:
              * if tiffg4  -->  use ccittg4
              * if jpeg    -->  use dct
              * all others -->  use flate
          If the image is jpeg or tiffg4, we use the existing compressed
          strings for the encoding; otherwise, we read the image into
          a pix and flate-encode the pieces.
      (4) The resolution is often confusing.  It is interpreted
          as the resolution of the output display device:  "If the
          input image were digitized at 300 ppi, what would it
          look like when displayed at res ppi."  So, for example,
          if res = 100 ppi, then the display pixels are 3x larger
          than the 300 ppi pixels, and the image will be rendered
          3x larger.
      (5) The size of the PostScript file is independent of the resolution,
          because the entire file is encoded.  The res parameter just
          tells the PS decomposer how to render the page.  Therefore,
          for minimum file size without loss of visual information,
          if the output res is less than 300, you should downscale
          the image to the output resolution before wrapping in PS.
      (6) The "canvas" on which the image is rendered, at the given
          output resolution, is a standard page size (8.5 x 11 in).

convertSegmentedPagesToPS

l_int32 convertSegmentedPagesToPS ( const char *pagedir, const char *pagestr, const char *maskdir, const char *maskstr, l_int32 numpre, l_int32 numpost, l_int32 maxnum, l_float32 textscale, l_float32 imagescale, l_int32 threshold, const char *fileout )

  convertSegmentedPagesToPS()

      Input:  pagedir (input page image directory)
              pagestr (<optional> substring filter on page filenames;
                       can be NULL)
              maskdir (input mask image directory)
              maskstr (<optional> substring filter on mask filenames;
                       can be NULL)
              numpre (number of characters in name before number)
              numpost (number of characters in name after number)
              maxnum (only consider page numbers up to this value)
              textscale (scale of text output relative to pixs)
              imagescale (scale of image output relative to pixs)
              threshold (for binarization; typ. about 190; 0 for default)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This generates a PS file for all page image and mask files in two
          specified directories and that contain the page numbers as
          specified below.  The two directories can be the same, in which
          case the page and mask files are differentiated by the two
          substrings for string matches.
      (2) The page images are taken in lexicographic order.
          Mask images whose numbers match the page images are used to
          segment the page images.  Page images without a matching
          mask image are scaled, thresholded and rendered entirely as text.
      (3) Each PS page is generated as a compressed representation of
          the page image, where the part of the image under the mask
          is suitably scaled and compressed as DCT (i.e., jpeg), and
          the remaining part of the page is suitably scaled, thresholded,
          compressed as G4 (i.e., tiff g4), and rendered by painting
          black through the resulting text mask.
      (4) The scaling is typically 2x down for the DCT component
          (@imagescale = 0.5) and 2x up for the G4 component
          (@textscale = 2.0).
      (5) The resolution is automatically set to fit to a
          letter-size (8.5 x 11 inch) page.
      (6) Both the DCT and the G4 encoding are PostScript level 2.
      (7) It is assumed that the page number is contained within
          the basename (the filename without directory or extension).
          @numpre is the number of characters in the basename
          preceeding the actual page numer; @numpost is the number
          following the page number.  Note: the same numbers must be
          applied to both the page and mask image names.
      (8) To render a page as is -- that is, with no thresholding
          of any pixels -- use a mask in the mask directory that is
          full size with all pixels set to 1.  If the page is 1 bpp,
          it is not necessary to have a mask.

convertToPSEmbed

l_int32 convertToPSEmbed ( const char *filein, const char *fileout, l_int32 level )

  convertToPSEmbed()

      Input:  filein (input image file -- any format)
              fileout (output ps file)
              level (compression: 1 (uncompressed), 2 or 3)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This is a wrapper function that generates a PS file with
          a bounding box, from any input image file.
      (2) Do the best job of compression given the specified level.
          @level=3 does flate compression on anything that is not
          tiffg4 (1 bpp) or jpeg (8 bpp or rgb).
      (3) If @level=2 and the file is not tiffg4 or jpeg, it will
          first be written to file as jpeg with quality = 75.
          This will remove the colormap and cause some degradation
          in the image.
      (4) The bounding box is required when a program such as TeX
          (through epsf) places and rescales the image.  It is
          sized for fitting the image to an 8.5 x 11.0 inch page.

pixWriteMixedToPS

l_int32 pixWriteMixedToPS ( PIX *pixb, PIX *pixc, l_float32 scale, l_int32 pageno, const char *fileout )

  pixWriteMixedToPS()

      Input:  pixb (<optionall> 1 bpp "mask"; typically for text)
              pixc (<optional> 8 or 32 bpp image regions)
              scale (relative scale factor for rendering pixb
                    relative to pixc; typ. 4.0)
              pageno (page number in set; use 1 for new output file)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This low level function generates the PS string for a mixed
          text/image page, and adds it to an existing file if
          @pageno > 1.
      (2) The two images (pixb and pixc) are typically generated at the
          resolution that they will be rendered in the PS file.
      (3) pixb is the text component.  In the PostScript world, we think of
          it as a mask through which we paint black.
      (4) pixc is the (typically halftone) image component.  It is
          white in the rest of the page.  To minimize the size of the
          PS file, it should be rendered at a resolution that is at
          least equal to its actual resolution.
      (5) @scale gives the ratio of resolution of pixb to pixc.
          Typical resolutions are: 600 ppi for pixb, 150 ppi for pixc;
          so @scale = 4.0.  If one of the images is not defined,
          the value of @scale is ignored.
      (6) We write pixc with DCT compression (jpeg).  This is followed
          by painting the text as black through the mask pixb.  If
          pixc doesn't exist (alltext), we write the text with the
          PS "image" operator instead of the "imagemask" operator,
          because ghostscript's ps2pdf is flaky when the latter is used.
      (7) The actual output resolution is determined by fitting the
          result to a letter-size (8.5 x 11 inch) page.

pixWriteSegmentedPageToPS

l_int32 pixWriteSegmentedPageToPS ( PIX *pixs, PIX *pixm, l_float32 textscale, l_float32 imagescale, l_int32 threshold, l_int32 pageno, const char *fileout )

  pixWriteSegmentedPageToPS()

      Input:  pixs (all depths; colormap ok)
              pixm (<optional> 1 bpp segmentation mask over image region)
              textscale (scale of text output relative to pixs)
              imagescale (scale of image output relative to pixs)
              threshold (threshold for binarization; typ. 190)
              pageno (page number in set; use 1 for new output file)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This generates the PS string for a mixed text/image page,
          and adds it to an existing file if @pageno > 1.
          The PS output is determined by fitting the result to
          a letter-size (8.5 x 11 inch) page.
      (2) The two images (pixs and pixm) are at the same resolution
          (typically 300 ppi).  They are used to generate two compressed
          images, pixb and pixc, that are put directly into the output
          PS file.
      (3) pixb is the text component.  In the PostScript world, we think of
          it as a mask through which we paint black.  It is produced by
          scaling pixs by @textscale, and thresholding to 1 bpp.
      (4) pixc is the image component, which is that part of pixs under
          the mask pixm.  It is scaled from pixs by @imagescale.
      (5) Typical values are textscale = 2.0 and imagescale = 0.5.
      (6) If pixm == NULL, the page has only text.  If it is all black,
          the page is all image and has no text.
      (7) This can be used to write a multi-page PS file, by using
          sequential page numbers with the same output file.  It can
          also be used to write separate PS files for each page,
          by using different output files with @pageno = 0 or 1.

pixaWriteCompressedToPS

l_int32 pixaWriteCompressedToPS ( PIXA *pixa, const char *fileout, l_int32 res, l_int32 level )

  pixaWriteCompressedToPS()

      Input:  pixa (any set of images)
              fileout (output ps file)
              res (of input image)
              level (compression: 2 or 3)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This generates a PS file of multiple page images, all
          with bounding boxes.
      (2) It compresses to:
              cmap + level2:        jpeg
              cmap + level3:        flate
              1 bpp:                tiffg4
              2 or 4 bpp + level2:  jpeg
              2 or 4 bpp + level3:  flate
              8 bpp:                jpeg
              16 bpp:               flate
              32 bpp:               jpeg
      (3) To generate a pdf, use: ps2pdf <infile.ps> <outfile.pdf>

sarrayConvertFilesFittedToPS

l_int32 sarrayConvertFilesFittedToPS ( SARRAY *sa, l_float32 xpts, l_float32 ypts, const char *fileout )

  sarrayConvertFilesFittedToPS()

      Input:  sarray (of full path names)
              xpts, ypts (desired size in printer points; use 0 for default)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) See convertFilesFittedToPS()

sarrayConvertFilesToPS

l_int32 sarrayConvertFilesToPS ( SARRAY *sa, l_int32 res, const char *fileout )

  sarrayConvertFilesToPS()

      Input:  sarray (of full path names)
              res (typ. 300 or 600 ppi)
              fileout (output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) See convertFilesToPS()

writeImageCompressedToPSFile

l_int32 writeImageCompressedToPSFile ( const char *filein, const char *fileout, l_int32 res, l_int32 *pfirstfile, l_int32 *pindex )

  writeImageCompressedToPSFile()

      Input:  filein (input image file)
              fileout (output ps file)
              res (output printer resolution)
              &firstfile (<input and return> 1 if the first image;
                          0 otherwise)
              &index (<input and return> index of image in output ps file)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This wraps a single page image in PS.
      (2) The input file can be in any format.  It is compressed as follows:
             * if in tiffg4  -->  use ccittg4
             * if in jpeg    -->  use dct
             * all others    -->  use flate
      (3) Before the first call, set @firstpage = 1.  After writing
          the first page, it will be set to 0.
      (4) @index is incremented if the page is successfully written.

AUTHOR

Zakariyya Mughal <zmughal@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Zakariyya Mughal.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.