Automatic vectorization and text recognition

Brief description: Automatic vectorization is a procedure for converting raster data to appropriate vector objects.
WiseImage recognizes the following objects when vectorizing: lines, arcs, circles, hatches, points, texts and symbols. Raster curves and filled contours are approximated with vector polylines.
The program can recognize line styles and arrows on line and arc ends.
You can train the program to recognize new symbols and texts.
You can round vector object widths to specified values. You can also place vector objects corresponding to raster lines of various widths on different layers and/or assign different colors to them.

Tuning Vectorization

You can use one of the pre-defined templates or tune the parameters yourself. Vectorization settings can be saved as a template for further use.
Automatic vectorization is tuned in the Conversion Options dialog.
To open this dialog, choose Conversion Options from the rConvert menu.

Select the objects to recognize:

Open the Recognition tab of the Conversion Options dialog.

  • Select the checkboxes for the entities that you want to obtain after vectorization.
  • Additional parameters for recognized objects, such as line type, arrows, hatch angle, and others, are located on the second level. Click "+" to access these parameters.

Points
WiseImage recognizes a raster point if its size is at least 2x2 pixels (smaller objects are treated as speckles and ignored) and its vertical and horizontal extents do not exceed the Max Width value.
Do not switch this mode on when recognizing images of poor quality or images containing many speckles, since speckles can be recognized as points.
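
The size rule above can be sketched in Python as follows. This is only an illustration, not WiseImage's implementation: the function name classify_blob is hypothetical, sizes are assumed to be in pixels, and "not less than 2x2" is assumed to mean at least 2 pixels in each direction.

```python
def classify_blob(width_px: int, height_px: int, max_width_px: int) -> str:
    """Classify a raster blob by its bounding-box size (hypothetical helper)."""
    # Assumption: anything below 2 pixels in either direction is a speckle.
    if width_px < 2 or height_px < 2:
        return "speckle"
    # A point must fit within Max Width both horizontally and vertically.
    if width_px <= max_width_px and height_px <= max_width_px:
        return "point"
    # Larger objects are left to the other recognition algorithms.
    return "other"
```

The sketch also shows why noisy images are a problem: any speckle of 2x2 pixels or more that fits within Max Width passes the same test as a real point.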

Lines
Turns on the algorithm for line recognition. As a result, LINE entities are created.

Arcs and Circles
Turns on the algorithm for recognizing raster circles and arcs.

Polylines
This algorithm approximates central lines of raster objects with polylines (POLYLINE). The algorithm creates polylines consisting of straight segments only. You can use this algorithm individually (or with the Outlines algorithm) for vectorization of maps and other images consisting of arbitrary lines.

Outlines
This algorithm approximates the outlines of filled areas with closed polylines that follow the boundaries of raster objects. The boundary polylines consist only of straight segments.
The algorithm traces raster lines whose width is greater than the value of the Min width parameter. To obtain outlines of all raster objects, select Outlines and set the Max Width parameter to 0.

Hatches
Selects the algorithm for hatch recognition. WiseImage Pro recognizes simple raster hatches and creates AutoCAD blocks consisting of line segments. The Hatches algorithm searches for hatches only if the Lines algorithm is also on.

Symbols
Selects the algorithm for raster symbol recognition, which searches for specified samples. The algorithm works only if the Lines and Arcs and Circles algorithms are also on.

Setting the geometry of a vectorized drawing:

Min Length - minimum length of a raster object to be recognized. (For example, enter 1 mm in the Min Length field so that raster fragments smaller than 1 mm are not recognized.)

Max Width - maximum width of raster lines. Set the value of this parameter slightly greater than the measured line width on your drawing.

Max Break – the maximum accepted length of a break in a raster line to be ignored. Set the value of this parameter slightly greater than the distance between the dashes in dashed lines or the broken distance in poor quality lines.
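
The effect of Max Break can be pictured with a one-dimensional sketch in Python. It is not the program's algorithm; the function name bridge_breaks and the representation of dashes as (start, end) runs along one line are assumptions made for illustration.

```python
def bridge_breaks(runs, max_break):
    """Merge 1-D runs (start, end) whose gaps are no longer than max_break."""
    merged = []
    for start, end in sorted(runs):
        if merged and start - merged[-1][1] <= max_break:
            # The break is short enough to ignore: extend the previous run.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # The break is too long: start a new run.
            merged.append((start, end))
    return merged
```

For example, with max_break set to 1.5, the runs (0, 4) and (5, 9) are joined into one line, while a run starting at 15 stays separate.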

Text Height – Set the value for this parameter equal to the maximum height of upper case raster text symbols.

Accuracy – this parameter corresponds to the accuracy of your raster image. Use higher accuracy values for images of good quality, and lower accuracy values for images of poor quality.

Orthogonalization – Select this checkbox to obtain orthogonal vector lines as a result of vectorizing raster lines, which deviate from the orthogonal direction by no more than 2°. You can also align vector objects to a specified base angle.
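
As a rough illustration of orthogonalization, the Python sketch below snaps a line angle to the nearest orthogonal direction when the deviation is within 2°. The function name snap_angle and its parameters are hypothetical, and the way the program actually applies the base angle is assumed here, not documented.

```python
def snap_angle(angle_deg, base_deg=0.0, tolerance_deg=2.0):
    """Snap an angle to the nearest multiple of 90 degrees from base_deg."""
    # Angle measured relative to the base direction.
    rel = angle_deg - base_deg
    # Nearest orthogonal direction relative to the base.
    nearest = round(rel / 90.0) * 90.0
    if abs(rel - nearest) <= tolerance_deg:
        return (base_deg + nearest) % 360.0
    # Deviation is too large: leave the angle unchanged.
    return angle_deg % 360.0
```

With the defaults, a line drawn at 88.7° becomes exactly vertical, while a line at 45° is left alone.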

Separating vector objects by layer and/or by color

The criterion for separating vector objects by layer and/or color is the width of the original raster lines.

In the Separate tab of the R2V Conversion Options dialog, you can specify for resulting vector objects:

  • Width
  • Color
  • Layer
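
Separation by width can be thought of as a lookup table, sketched below in Python. The WidthRule structure, the function assign_properties, and the sample values are all hypothetical; the actual layout of the Separate tab may differ.

```python
from typing import NamedTuple, Optional, Sequence


class WidthRule(NamedTuple):
    max_raster_width: float  # upper bound of the measured raster line width
    vector_width: float      # width assigned to the resulting vector object
    color: str
    layer: str


def assign_properties(raster_width: float,
                      rules: Sequence[WidthRule]) -> Optional[WidthRule]:
    """Return the first rule whose width range contains raster_width."""
    for rule in sorted(rules, key=lambda r: r.max_raster_width):
        if raster_width <= rule.max_raster_width:
            return rule
    # No rule covers this width.
    return None


# Illustrative values only.
RULES = [
    WidthRule(0.3, 0.25, "red", "THIN"),
    WidthRule(0.8, 0.70, "blue", "THICK"),
]
```

Here a raster line measured at 0.5 would be rounded to width 0.70, colored blue, and placed on the THICK layer.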

How to vectorize

  • Set the vectorization parameters.
    See section Tuning Vectorization for detailed information.
  • If you have several images in your document, select the one(s) to vectorize.
    You can also vectorize a selected image fragment or image clip.
  • Choose Raster2Vector from the Conversion menu.

Text Recognition

This section provides information on various algorithms for recognizing raster text, and the procedure and parameters of setting up raster text recognition.

Text Recognition Algorithms

You can use the following algorithms for working with raster texts. You can choose them using the Recognition tab of the R2V Conversion Options dialog.

None - this algorithm searches for raster text areas without vectorizing.

Text Areas - this algorithm recognizes and creates text areas. You can enter text information in these text areas using a separate utility.

Polylines (Outlines) - this algorithm approximates the raster text with vector polylines (outlines).

OCR - this algorithm recognizes the raster text and creates the corresponding text objects.

Text Recognition

  • Select the Text Area checkbox in the Recognition tab of the R2V Conversion Options dialog.
  • Choose your required algorithm for working with the raster text.
  • In the Options tab of the R2V Conversion Options dialog, set the Text Height value equal to the maximum height of upper case raster text symbols.
  • Tune the text recognition options in the Texts tab of the R2V Conversion Options dialog.
    See section Text Recognition Options for detailed information.

Editing recognized texts

The procedure of editing recognized texts (text areas) is used after recognition with the OCR module or with Text Areas recognition.

After applying automatic vectorization, choose the Edit OCR Texts command from the rConvert menu. The program displays the first recognized text fragment (area) in the Text Correction dialog.

If necessary, you can edit the text area contents. To accept the text and move on to the next one, press Accept Recognized Text. To delete the current text, press Delete OCR Text.

Text Recognition Options

Orientation - choose the orientation for raster texts contained in the image.

Overlapped by Graphics - if this option is on, the program also searches for raster texts that are crossed by other raster objects.
TIPS: Avoid this mode when working with complicated documents, because small graphic objects may be incorrectly recognized as texts.

Standalone Letters - if this option is on, the program searches for standalone text characters.
TIPS: If this option is off, the program does not search for standalone text characters, which helps avoid incorrect recognition of small objects as text.

Patterns - If you use the OCR module, you should set patterns for text inscriptions contained in the raster document to obtain better recognition results. Select the Patterns checkbox.
If Patterns are not specified, the program uses a set of standard patterns.
See section Text Patterns for detailed information.

Height Table – if after vectorizing you want to obtain texts of specific height(s), enter the desired text height(s) in the table and select the Height Table checkbox. If you specify several height values, each text height will be rounded to the nearest value from the list.
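
Rounding to the nearest table value can be sketched in a few lines of Python. The function name round_to_height_table is hypothetical, and how the program resolves an exact tie between two table values is not documented; this sketch simply picks the one listed first.

```python
def round_to_height_table(height, table):
    """Return the table value closest to the measured text height."""
    # min() keeps the first value when two are equally close.
    return min(table, key=lambda h: abs(h - height))
```

For example, with a table of 2.5, 3.5 and 5.0, a recognized text 3.2 units high would be given a height of 3.5.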

Template file – this file is used for storing topology models of text characters, which are used when performing OCR.

You can also train the program to recognize other text characters or different forms of characters contained in the standard template file.
See section Training OCR for detailed information.

Setting Text Patterns

Text patterns can consist of fixed and variable parts.
Here is the formal definition of a word pattern:

"[ [%[length]character type] | [letter] ]…"

length - number of letters (omit it if the length varies),

character type - type of letter sequence

Types of letters are presented in the following table:

Type    Description
D       Digits
N       Capital letters of national alphabet
n       Small letters of national alphabet
E       Capital letters of Latin alphabet
e       Small letters of Latin alphabet
S       Special characters (signs of plus, minus, equality, degree, and others)

For example, in texts 5V, 220V, 13.8V:
5, 220, 13.8 - variable part,
V - fixed part.
Therefore, the pattern can be defined as %DV. The length of variable parts is different, so it is not specified.

Examples:

Symbol sequence          Pattern
5; 25; 5559; 22,9        %D
R25; R15; R13            R%2D
Moscow; Oslo             %1E%e
project; design          %n
5V; 220V; 13.8V          %DV
12°; 30°; 45°            %2D%1S
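
To make the pattern notation concrete, here is a small Python sketch that checks whether a text fits a pattern. It is not the OCR module's matcher, and several readings are assumptions: %D is taken to accept "." and "," as well as digits (because the examples match 13.8 and 22,9 with %D), "national alphabet" is approximated by any Unicode letter of the given case, and a variable-length part is taken to mean one or more characters. The names CLASSES, parse_pattern, and matches are hypothetical.

```python
import re

# Character tests for each type letter (assumed readings, see above).
CLASSES = {
    "D": lambda c: c.isdigit() or c in ".,",
    "N": lambda c: c.isalpha() and c.isupper(),
    "n": lambda c: c.isalpha() and c.islower(),
    "E": lambda c: "A" <= c <= "Z",
    "e": lambda c: "a" <= c <= "z",
    "S": lambda c: not c.isalnum() and not c.isspace(),
}

# Either "%[length]type" or a single fixed character.
TOKEN = re.compile(r"%(\d*)([DNnEeS])|(.)", re.DOTALL)


def parse_pattern(pattern):
    """Split a pattern into (test, length) tokens; length None means variable."""
    tokens = []
    for m in TOKEN.finditer(pattern):
        length, kind, literal = m.groups()
        if kind:
            tokens.append((CLASSES[kind], int(length) if length else None))
        else:
            tokens.append((lambda c, lit=literal: c == lit, 1))
    return tokens


def matches(pattern, text):
    """Return True if the whole text fits the pattern."""
    tokens = parse_pattern(pattern)

    def walk(ti, pos):
        if ti == len(tokens):
            return pos == len(text)
        test, length = tokens[ti]
        # Fixed length: exactly that many characters; variable: one or more.
        counts = [length] if length is not None else range(1, len(text) - pos + 1)
        for n in counts:
            chunk = text[pos:pos + n]
            if len(chunk) == n and all(test(c) for c in chunk) and walk(ti + 1, pos + n):
                return True
        return False

    return walk(0, 0)
```

Under these assumptions, matches("%DV", "13.8V") and matches("R%2D", "R25") succeed, while matches("R%2D", "R255") fails because the length of the digit part is fixed at 2.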

Training OCR

You can train the OCR module to recognize new characters. You can add a new character to one of the standard template libraries or create your own.
TIPS: In some cases it is more convenient to use the standard library (e.g., DEFAULT.OCR) as a base for adding custom characters, having saved it with a new name.

To train the program to recognize a new character or different forms of existing characters:

  • Choose Train OCR from the Conversion menu.
  • Open (or create) an OCR-file.
  • Enter a character to recognize in the Character field.
  • Select the corresponding character on the raster image using one of the selection buttons.
  • Press "+" to add the new pattern to the OCR-file.
  • Save the OCR-file.
