Zakariyya Mughal

NAME

Image::Leptonica::Func::recogident

VERSION

version 0.04

recogident.c

  recogident.c

      Top-level identification
         l_int32             recogaIdentifyMultiple()

      Segmentation and noise removal
         l_int32             recogSplitIntoCharacters()
         l_int32             recogCorrelationBestRow()
         l_int32             recogCorrelationBestChar()
         static l_int32      pixCorrelationBestShift()

      Low-level identification of single characters
         l_int32             recogaIdentifyPixa()
         l_int32             recogIdentifyPixa()
         l_int32             recogIdentifyPix()
         l_int32             recogSkipIdentify()

      Operations for handling identification results
         static L_RCHA      *rchaCreate()
         l_int32            *rchaDestroy()
         static L_RCH       *rchCreate()
         l_int32            *rchDestroy()
         l_int32             rchaExtract()
         l_int32             rchExtract()
         static l_int32      transferRchToRcha()
         static l_int32      recogaSaveBestRcha()
         static l_int32      recogaTransferRch()
         l_int32             recogTransferRchToDid()

      Preprocessing and filtering
         l_int32             recogProcessToIdentify()
         PIX                *recogPreSplittingFilter()
         PIX                *recogSplittingFilter()

      Postprocessing
         SARRAY             *recogExtractNumbers()

      Modifying recog behavior
         l_int32             recogSetTemplateType()
         l_int32             recogSetScaling()

      Static debug helper
         static void         l_showIndicatorSplitValues()

  See recogbasic.c for examples of training a recognizer, which is
  required before it can be used for identification.

  The character splitter repeatedly does a greedy correlation with each
  averaged unscaled template, at all pixel locations along the text to
  be identified.  The vertical alignment is between the template
  centroid and the (moving) windowed centroid, including a delta of
  1 pixel above and below.  The best match then removes part of the
  input image, leaving 1 or 2 pieces, which, after filtering,
  are put in a queue.  The process ends when the queue is empty.
  The filtering is based on the size and aspect ratio of the
  remaining pieces; the intent is to remove anything that is
  unlikely to be text, such as small pieces and line graphics.

  After splitting, the selected segments are identified using
  the input parameters that were initially specified for the
  recognizer.  Unlike the splitter, which uses the averaged
  templates from the unscaled input, the recognizer can use
  either all training examples or averaged templates, and these
  can be either scaled or unscaled.  These choices are specified
  when the recognizer is constructed.

FUNCTIONS

rchDestroy

void rchDestroy ( L_RCH **prch )

  rchDestroy()

      Input:  &rch
      Return: void

rchaDestroy

void rchaDestroy ( L_RCHA **prcha )

  rchaDestroy()

      Input:  &rcha
      Return: void

rchaExtract

l_int32 rchaExtract ( L_RCHA *rcha, NUMA **pnaindex, NUMA **pnascore, SARRAY **psatext, NUMA **pnasample, NUMA **pnaxloc, NUMA **pnayloc, NUMA **pnawidth )

  rchaExtract()

      Input:  rcha
              &naindex (<optional return> indices of best templates)
              &nascore (<optional return> correl scores of best templates)
              &satext (<optional return> character strings of best templates)
              &nasample (<optional return> indices of best samples)
              &naxloc (<optional return> x-locations of templates)
              &nayloc (<optional return> y-locations of templates)
              &nawidth (<optional return> widths of best templates)
      Return: 0 if OK, 1 on error

  Notes:
      (1) This returns clones of the number and string arrays.  They must
          be destroyed by the caller.

recogCorrelationBestRow

l_int32 recogCorrelationBestRow ( L_RECOG *recog, PIX *pixs, BOXA **pboxa, NUMA **pnascore, NUMA **pnaindex, SARRAY **psachar, l_int32 debug )

  recogCorrelationBestRow()

      Input:  recog (with LUT's pre-computed)
              pixs (typically of multiple touching characters, 1 bpp)
              &boxa (<return> bounding boxs of best fit character)
              &nascores (<optional return> correlation scores)
              &naindex (<optional return> indices of classes)
              &sachar (<optional return> array of character strings)
              debug (1 for results written to pixadb_split)
      Return: 0 if OK, 1 on error

  Notes:
      (1) Supervises character matching for (in general) a c.c with
          multiple touching characters.  Finds the best match greedily.
          Rejects small parts that are left over after splitting.
      (2) Matching is to the average, and without character scaling.

recogIdentifyPix

l_int32 recogIdentifyPix ( L_RECOG *recog, PIX *pixs, PIX **ppixdb )

  recogIdentifyPix()

      Input:  recog (with LUT's pre-computed)
              pixs (of a single character, 1 bpp)
              &pixdb (<optional return> debug pix showing input and best fit)
      Return: 0 if OK, 1 on error

  Notes:
      (1) Basic recognition function for a single character.
      (2) If L_USE_AVERAGE, the matching is only to the averaged bitmaps,
          and the index of the sample is meaningless (0 is returned
          if requested).
      (3) The score is related to the confidence (probability of correct
          identification), in that a higher score is correlated with
          a higher probability.  However, the actual relation between
          the correlation (score) and the probability is not known;
          we call this a "score" because "confidence" can be misinterpreted
          as an actual probability.

recogIdentifyPixa

l_int32 recogIdentifyPixa ( L_RECOG *recog, PIXA *pixa, NUMA *naid, PIX **ppixdb )

  recogIdentifyPixa()

      Input:  recog
              pixa (of 1 bpp images to match)
              naid (<optional> indices of components to identify; can be null)
              &pixdb (<optional return> pix showing inputs and best fits)
      Return: 0 if OK, 1 on error

  Notes:
      (1) See recogIdentifyPix().  This does the same operation
          for each pix in a pixa, and optionally returns the arrays
          of results (scores, class index and character string)
          for the best correlation match.

recogPreSplittingFilter

PIX * recogPreSplittingFilter ( L_RECOG *recog, PIX *pixs, l_float32 maxasp, l_float32 minaf, l_float32 maxaf, l_int32 debug )

  recogPreSplittingFilter()

      Input:  recog
              pixs (1 bpp, single connected component)
              maxasp (maximum asperity ratio (width/height) to be retained)
              minaf (minimum area fraction (|fg|/(w*h)) to be retained)
              maxaf (maximum area fraction (|fg|/(w*h)) to be retained)
              debug (1 to output indicator arrays)
      Return: pixd (with filtered components removed) or null on error

recogProcessToIdentify

PIX * recogProcessToIdentify ( L_RECOG *recog, PIX *pixs, l_int32 pad )

  recogProcessToIdentify()

      Input:  recog (with LUT's pre-computed)
              pixs (typ. single character, possibly d > 1 and uncropped)
              pad (extra pixels added to left and right sides)
      Return: pixd (1 bpp, clipped to foreground), or null on error.

  Notes:
      (1) This is a lightweight operation to insure that the input
          image is 1 bpp, properly cropped, and padded on each side.
          If bpp > 1, the image is thresholded.

recogSetScaling

l_int32 recogSetScaling ( L_RECOG *recog, l_int32 scalew, l_int32 scaleh )

  recogSetScaling()

      Input:  recog
              scalew  (scale all widths to this; use 0 for no scaling)
              scaleh  (scale all heights to this; use 0 for no scaling)
      Return: 0 if OK, 1 on error

recogSetTemplateType

l_int32 recogSetTemplateType ( L_RECOG *recog, l_int32 templ_type )

  recogSetTemplateType()

      Input:  recog
              templ_type (L_USE_AVERAGE or L_USE_ALL)
      Return: 0 if OK, 1 on error

recogSkipIdentify

l_int32 recogSkipIdentify ( L_RECOG *recog )

  recogSkipIdentify()

      Input:  recog
      Return: 0 if OK, 1 on error

  Notes:
      (1) This just writes a "dummy" result with 0 score and empty
          string id into the rch.

recogSplitIntoCharacters

l_int32 recogSplitIntoCharacters ( L_RECOG *recog, PIX *pixs, l_int32 minw, l_int32 minh, BOXA **pboxa, PIXA **ppixa, NUMA **pnaid, l_int32 debug )

  recogSplitIntoCharacters()

      Input:  recog
              pixs (1 bpp, contains only mostly deskewed text)
              minw (remove components with width less than this;
                    use -1 for default removing out of band components)
              minh (remove components with height less than this;
                    use -1 for default removing out of band components)
              &boxa (<return> character bounding boxes)
              &pixa (<return> character images)
              &naid (<return> indices of components to identify)
              debug (1 for results written to pixadb_split)

      Return: 0 if OK, 1 on error

  Notes:
      (1) This can be given an image that has an arbitrary number
          of text characters.  It does splitting of connected
          components based on greedy correlation matching in
          recogCorrelationBestRow().  The returned pixa includes
          the boxes from which the (possibly split) components
          are extracted.
      (2) If either @minw < 0 or @minh < 0, noise components are
          filtered out, and the returned @naid array is all 1.
          Otherwise, some noise components whose dimensions (w,h)
          satisfy w >= @minw and h >= @minh are allowed through, but
          they are identified in the returned @naid, where they are
          labelled by 0 to indicate that they are not to be run
          through identification.  Retaining the noise components
          provides spatial information that can help applications
          interpret the results.
      (3) In addition to optional filtering of the noise, the
          resulting components are put in row-major (2D) order,
          and the smaller of overlapping components are removed if
          they satisfy conditions of relative size and fractional overlap.
      (4) Note that the spliting function uses unscaled templates
          and does not bother returning the class results and scores.
          Thes are more accurately found later using the scaled templates.

recogSplittingFilter

l_int32 recogSplittingFilter ( L_RECOG *recog, PIX *pixs, l_float32 maxasp, l_float32 minaf, l_float32 maxaf, l_int32 *premove, l_int32 debug )

  recogSplittingFilter()

      Input:  recog
              pixs (1 bpp, single connected component)
              maxasp (maximum asperity ratio (width/height) to be retained)
              minaf (minimum area fraction (|fg|/(w*h)) to be retained)
              maxaf (maximum area fraction (|fg|/(w*h)) to be retained)
              &remove (<return> 0 to save, 1 to remove)
              debug (1 to output indicator arrays)
      Return: 0 if OK, 1 on error

  Notes:
      (1) We don't want to eliminate sans serif characters like "1" or "l",
          so we use the filter condition requiring both a large area fill
          and a w/h ratio > 1.0.

recogaExtractNumbers

SARRAY * recogaExtractNumbers ( L_RECOGA *recoga, BOXA *boxas, l_float32 scorethresh, l_int32 spacethresh, BOXAA **pbaa, NUMAA **pnaa )

  recogaExtractNumbers()

      Input:  recog
              boxas (location of components)
              scorethresh (min score for which we accept a component)
              spacethresh (max horizontal distance allowed between digits,
                           use -1 for default)
              &baa (<optional return> bounding boxes of identified numbers)
              &naa (<optional return> scores of identified digits)
      Return: sa (of identified numbers), or null on error

  Notes:
      (1) Each string in the returned sa contains a sequence of ascii
          digits in a number.
      (2) The horizontal distance between boxes (limited by @spacethresh)
          is the negative of the horizontal overlap.
      (3) We allow two digits to be combined if these conditions apply:
            (a) the first is to the left of the second
            (b) the second has a horizontal separation less than @spacethresh
            (c) the vertical overlap >= 0 (vertical separation < 0)
            (d) both have a score that exceeds @scorethresh
      (4) Each numa in the optionally returned naa contains the digit
          scores of a number.  Each boxa in the optionally returned baa
          contains the bounding boxes of the digits in the number.
      (5) Components with a score less than @scorethresh, which may
          be hyphens or other small characters, will signal the
          end of the current sequence of digits in the number.

recogaIdentifyMultiple

l_int32 recogaIdentifyMultiple ( L_RECOGA *recoga, PIX *pixs, l_int32 nitems, l_int32 minw, l_int32 minh, BOXA **pboxa, PIXA **ppixa, PIX **ppixdb, l_int32 debugsplit )

  recogaIdentifyMultiple()

      Input:  recoga (with training finished)
              pixs (containing typically a small number of characters)
              nitems (to be identified in pix; use 0 if not known)
              minw (remove components with width less than this;
                    use -1 for removing all noise components)
              minh (remove components with height less than this;
                    use -1 for removing all noise components)
              &boxa (<optional return> locations of identified components)
              &pixa (<optional return> images of identified components)
              &pixdb (<optional return> debug pix: inputs and best fits)
              debugsplit (1 returns pix split debugging images)
      Return: 0 if OK; 1 if more or less than nitems were found (a warning);
              2 on error.

  Notes:
      (1) This filters the input pixa, looking for @nitems if requested.
          Set @nitems == 0 if you don't know how many chars to expect.
      (2) This bundles the filtered components into a pixa and calls
          recogIdentifyPixa().  If @nitems > 0, use @minw = -1 and
          @minh = -1 to remove all noise components.  If @nitems > 0
          and it doesn't agree with the number of filtered components
          in pixs, a warning is issued and a 1 is returned.
      (3) Set @minw = 0 and @minh = 0 to get all noise components.
          Set @minw > 0 and/or @minh > 0 to retain selected noise components.
          All noise components are recognized as an empty string with
          a score of 0.0.
      (4) An attempt is made to return 2-dimensional sorted arrays
          of (optional) images and boxes, which can then be used to
          aggregate identified characters into numbers or words.
          One typically wants the pixa, which contains a boxa of the
          extracted subimages.

recogaIdentifyPixa

l_int32 recogaIdentifyPixa ( L_RECOGA *recoga, PIXA *pixa, NUMA *naid, PIX **ppixdb )

  recogaIdentifyPixa()

      Input:  recoga
              pixa (of 1 bpp images to match)
              naid (<optional> indices of components to identify; can be null)
              &pixdb (<optional return> pix showing inputs and best fits)
      Return: 0 if OK, 1 on error

  Notes:
      (1) See recogIdentifyPixa().  This does the same operation
          for each recog, returning the arrays of results (scores,
          class index and character string) for the best correlation match.

AUTHOR

Zakariyya Mughal <zmughal@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Zakariyya Mughal.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.