Automatic Vectorization, Text Recognition

Brief description: Automatic vectorization is a procedure for converting raster data to appropriate vector objects.
WiseImage recognizes the following objects when vectorizing: lines, arcs, circles, hatches, points, texts and symbols. Raster curves and filled contours are approximated with vector polylines.
The program can recognize line styles and arrows on line and arc ends.
You can train the program to recognize new symbols and texts.
You can round vector object widths to specified values. You can also place vector objects corresponding to raster lines of various widths on different layers and/or assign different colors to them.

Tuning Vectorization

You can use one of the pre-defined templates or tune the parameters yourself.
Vectorization parameters can be saved as a template for further use.
Automatic vectorization is tuned in the R2V Conversion Options dialog.
To open this dialog, choose Conversion Options from the Convert menu.

Select objects to recognize:

Open the Recognition tab of the R2V Conversion Options dialog.

  • Select the checkboxes of the entities that you want to obtain after vectorization.
  • Additional parameters for recognized objects, such as line type, arrows, hatch angle, and others, are located on the second level. Click "+" to access these parameters.

Points
WiseImage will recognize a raster point with a size not less than 2x2 pixels (objects smaller than this are considered as speckles and ignored), and vertical and horizontal extents not greater than the Max Width value.
Do not switch this mode on when recognizing images of poor quality or images containing many speckles, since speckles can be recognized as points.
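
The size rule above can be sketched as a small predicate. This is only an illustration of the stated limits, not WiseImage's actual implementation; the function name and its pixel-based arguments are hypothetical, and it assumes that "smaller than 2x2" means either side is shorter than 2 pixels.

```python
def is_point_candidate(width_px: int, height_px: int, max_width_px: int) -> bool:
    """Return True if a raster blob fits the size limits described for points.

    Hypothetical helper: the blob must be at least 2x2 pixels (smaller blobs
    are treated as speckles) and its horizontal and vertical extents must not
    exceed max_width_px.
    """
    min_side = 2  # from the text: objects smaller than 2x2 are speckles
    big_enough = width_px >= min_side and height_px >= min_side
    small_enough = width_px <= max_width_px and height_px <= max_width_px
    return big_enough and small_enough
```

For example, with a Max Width of 5 pixels, a 3x3 blob qualifies as a point, while a 1x3 blob is treated as a speckle and a 6x3 blob is too large.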

Lines
Turns on the algorithm for line recognition. As a result LINE entities are created.

Arcs and Circles
Turns on the algorithm for recognizing raster circles and arcs.

Polylines
This algorithm approximates central lines of raster objects with polylines (POLYLINE). The algorithm creates polylines consisting of straight segments only. You can use this algorithm individually (or with the Outlines algorithm) for vectorization of maps and other images consisting of arbitrary lines.

Outlines
This algorithm is used to approximate outlines of flooded areas by polylines. This algorithm creates closed polylines, approximating the boundaries of raster objects. The boundary polylines consist only of straight segments.
The algorithm traces raster lines whose width is greater than the value of the Min width parameter. To obtain outlines of all raster objects, select Outlines and set the Max Width parameter to 0.

Hatches
Selects the algorithm for hatch recognition. WiseImage Pro recognizes simple raster hatches and creates AutoCAD blocks, consisting of line segments. The Hatches algorithm searches for hatches only if the Lines algorithm is also on.

Symbols
Selects the algorithm for raster symbol recognition by searching specified samples. The algorithm only works if the algorithms Lines and Arcs and Circles are also on.

Setting the geometry of vectorized drawing:

The geometry is set in the Options tab of the R2V Conversion Options dialog.

Min Length - minimum length of a raster object to be recognized. For example, enter 1 mm in the Min Length field so that raster fragments smaller than 1 mm are not recognized.

Max Width - maximum width of raster lines. Set the value of this parameter slightly greater than the measured line width on your drawing.

Max Break - maximum accepted length of a break in a raster line to be ignored. Set the value of this parameter slightly greater than the distance between dashes in dash lines or the broken distance in poor quality lines.

Text Height - set this parameter equal to the maximum height of the upper-case raster text characters.

Accuracy - this parameter corresponds to the accuracy of your raster image. Use a high Accuracy value for images of good quality and a low Accuracy value for images of poor quality.

Orthogonalization - select this checkbox to obtain orthogonal vector lines as a result of vectorizing raster lines, which deviate from the orthogonal direction by not more than .
You can also align vector objects to a specified base angle; enter the angle value in the corresponding field.
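
To illustrate how a Max Break value affects dashed or damaged lines, the sketch below merges segments lying along one line whenever the gap between them does not exceed the limit. It is a simplified one-dimensional model under stated assumptions, not the program's algorithm; the function name and the (start, end) representation are hypothetical.

```python
def bridge_breaks(segments, max_break):
    """Merge 1-D segments whose gaps do not exceed max_break.

    segments: iterable of (start, end) positions along one line, in the same
    units as max_break. Returns a list of merged (start, end) tuples.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_break:
            # Gap is small enough to ignore: extend the previous segment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap is too large: start a new segment.
            merged.append((start, end))
    return merged
```

With dashes (0, 4), (5, 9) and (15, 20), a limit of 1.5 joins the first two dashes into one segment and leaves the third separate. This is why the value should be only slightly greater than the distance between dashes.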

Separating vector objects by layer and/or by color

The criterion for separating vector objects by layer and/or color is the width of the original raster lines.
In the Separate tab of the R2V Conversion Options dialog, you can:

  • Specify width of resulting vector objects
  • Separate resulting vector objects by color
  • Separate resulting vector objects by layer

In the Width field, you can specify the width of the vector objects obtained by vectorizing raster objects whose line width falls within the interval defined by the Start and End fields. You can also define a layer and/or color for these vector objects in the corresponding fields.
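
The relationship between the Start, End, Width, layer and color fields can be pictured as a lookup table keyed by raster line width. The sketch below is a hypothetical model of that idea; the class, the function, the example layer names and the inclusive interval bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WidthRule:
    start: float  # lower bound of the raster width interval (Start)
    end: float    # upper bound of the raster width interval (End)
    width: float  # width assigned to the resulting vector object (Width)
    layer: str    # layer for the resulting vector object
    color: str    # color for the resulting vector object


def match_rule(raster_width: float, rules: Sequence[WidthRule]) -> Optional[WidthRule]:
    """Return the first rule whose [start, end] interval contains raster_width."""
    for rule in rules:
        if rule.start <= raster_width <= rule.end:
            return rule
    return None  # no interval matches; the object keeps its defaults


# Example table with made-up values: thin lines and thick lines.
EXAMPLE_RULES = [
    WidthRule(0.0, 0.4, 0.25, "THIN", "green"),
    WidthRule(0.4, 1.0, 0.70, "THICK", "red"),
]
```

With these example rules, a raster line 0.3 wide would become a 0.25-wide object on the THIN layer, and a line 0.8 wide would become a 0.70-wide object on the THICK layer.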

How to Vectorize

  • Set the vectorization parameters.
    Information is provided in section Tuning Vectorization.
  • If you have several images in your document, select the one(s) to vectorize.
    You can also vectorize a selected image fragment or image clip.
  • Choose Raster2Vector from the Conversion menu.       

How to Recognize Text

This section describes the algorithms for recognizing raster text, as well as the procedure and parameters for setting up raster text recognition.

Text Recognition Algorithms

You can use the following algorithms for working with raster texts. You can choose them using the Recognition tab of the R2V Conversion Options dialog.

None - this algorithm searches for raster text areas without vectorizing.

Text Areas - this algorithm recognizes and creates text areas. You can enter text information in these text areas using the procedure of editing recognized texts.
Information is provided in section Editing recognized texts.

Polylines (Outlines) - approximates the raster text with vector polylines (outlines).

OCR - recognizes the raster text and creates the corresponding text objects.

How to Recognize Text

  • Select the Text Area checkbox in the Recognition tab of the R2V Conversion Options dialog.
  • Choose your required algorithm for working with the raster text.
  • In the Options tab of the R2V Conversion Options dialog set the Text Height value equal to the maximum height of raster text symbols of upper case.
  • Tune the text recognition options in the Texts tab of the R2V Conversion Options dialog.
    See section Text Recognition Options for detailed information.

Editing recognized texts

The procedure of editing recognized texts (text areas) is used after recognition with the OCR module or with Text Areas recognition.

After applying automatic vectorization, choose the Edit OCR Texts command from the Convert menu. The program displays the first recognized text fragment (area), and its content appears in the Text Correction dialog.

You can edit the text area contents. To accept the text and move on to the next one, press the Accept Recognized Text button in the Text Correction dialog. To delete the current text, press the Delete OCR Text button in the Text Correction dialog.

Text Recognition Options

Orientation - choose the orientation for raster texts contained in the image.

Overlapped by Graphics - if this option is on, the program searches for raster texts, crossed with other raster objects.
TIPS: It is not recommended to use this mode when working with complicated documents to avoid possible mistakes, such as incorrect recognition of small graphic objects as texts.

Standalone Letters - allows searching for standalone text characters.
TIPS: If this option is off, the program does not search for standalone text characters, which helps to avoid incorrect recognition of small objects as text.

Patterns - If you use the OCR module, you should set patterns for text inscriptions contained in the raster document to obtain better recognition results. Select the Patterns checkbox.
If Patterns are not specified, the program uses a set of standard patterns.
See section Text Patterns for detailed information.

Height Table - if you want to obtain texts of specific heights after vectorizing, enter the desired text height(s) in the table and select the Height Table checkbox.
If you specify several height values, each text height is rounded to the nearest value from the list.

Template file - file that stores the topology models of text characters used when performing OCR.

You can also train the program to recognize other text characters or different forms of characters contained in the standard template file.
See section Training OCR for detailed information.
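
The Height Table rounding described above amounts to picking the nearest listed value. A minimal sketch, assuming hypothetical names and assuming that a tie goes to the value listed first (the source does not say how ties are resolved):

```python
from typing import Sequence


def round_text_height(measured: float, height_table: Sequence[float]) -> float:
    """Round a measured text height to the nearest value in height_table.

    If the table is empty, the measured height is returned unchanged.
    On a tie, min() keeps the value that appears first in the table.
    """
    if not height_table:
        return measured
    return min(height_table, key=lambda h: abs(h - measured))
```

For example, with a table of 2.5, 3.5 and 5.0, a measured height of 3.2 is rounded to 3.5 and a measured height of 4.4 is rounded to 5.0.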

Setting Text Patterns

Text patterns can consist of fixed and variable parts.
Here is the formal description of a word pattern definition:

"[ [%[length]character type] || [letter] ]…"

length - number of letters (omit it if the length varies),

character type - type of letter sequence

Types of letters are presented in the following table:

D    Digits
N    Capital letters of national alphabet
n    Small letters of national alphabet
E    Capital letters of Latin alphabet
e    Small letters of Latin alphabet
S    Special characters (signs of plus, minus, equality, degree, and others)

For example, in the texts 5V, 220V, 13.8V:
5, 220, 13.8 - variable part,
V - fixed part.
Therefore, the pattern can be defined as %DV. The length of the variable part differs from text to text, so it is not specified.

Examples:

Symbol sequence        Pattern
5; 25; 5559; 22,9      %D
R25; R15; R13          R%2D
Moscow; Oslo           %1E%e
project; design        %n
5V; 220V; 13.8V        %DV
12°, 30°, 45°          %2D%1S
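
To make the pattern notation concrete, the sketch below translates a pattern into a regular expression and checks sample texts against it. It is an illustration only, not how the OCR module works internally. The character classes are assumptions: "D" also accepts "." and "," because the examples pair 13.8 and 22,9 with %D, the national alphabet is stood in for by Latin letters, and the set of special characters is a small placeholder.

```python
import re

# Assumed character class for each type code (placeholders, see lead-in).
CLASSES = {
    "D": "[0-9.,]",  # digits; '.' and ',' allowed so 13.8 and 22,9 match
    "N": "[A-Z]",    # capital letters of national alphabet (stand-in)
    "n": "[a-z]",    # small letters of national alphabet (stand-in)
    "E": "[A-Z]",    # capital letters of Latin alphabet
    "e": "[a-z]",    # small letters of Latin alphabet
    "S": "[+=°-]",   # a few special characters: plus, equality, degree, minus
}

# A variable part: '%', an optional length, then a type code.
TOKEN = re.compile(r"%(\d*)([DNnEeS])")


def pattern_to_regex(pattern):
    """Translate a text pattern such as 'R%2D' into a compiled regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # fixed part
        length, kind = m.groups()
        # No length given means "one or more characters of this type".
        quantifier = "{%s}" % length if length else "+"
        parts.append(CLASSES[kind] + quantifier)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))  # trailing fixed part
    return re.compile("".join(parts))


def matches(pattern, text):
    """Return True if the whole text fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(text) is not None
```

Under these assumptions, matches("%DV", "13.8V") and matches("R%2D", "R25") are true, while matches("R%2D", "R5") is false because the pattern requires exactly two digits.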

Training OCR

You can train the OCR module to recognize new characters. You can add a new character to one of the standard template libraries or create your own.
TIPS: In some cases it is more convenient to use a standard library (e.g., DEFAULT.OCR) as a basis, after saving it under a new name.

To train the program to recognize a new character or different forms of existing characters:

  • Choose Train OCR from the Conversion menu.
  • Open or create an OCR-file.
  • Enter a character to recognize in the Character field.
  • Select the corresponding character on the raster image using one of the selection buttons.
  • Press "+" to add the new pattern to the OCR-file.
  • Save the OCR-file.

Correcting Vectorization Results

Vectorization results usually need further editing and correcting. The program has both an automatic and interactive correction procedure intended for this purpose.

Automatic correction of vectorization results

This operation restores contact between arcs and circles, "merges" vector fragments, removes vector "speckles", and aligns lines to regular directions (0°, 30°, 45°, 60°, 90°, etc.) if their deviation does not exceed the angle specified by the user.
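
The alignment step can be illustrated with a short sketch that snaps a line angle to the nearest regular direction when the deviation is within the user's tolerance. This is a hypothetical model, not the program's code; the function name is made up, and the default direction list extends the "etc." in the text with 120°, 135° and 150° as an assumption.

```python
def snap_angle(angle_deg, tolerance_deg,
               directions=(0, 30, 45, 60, 90, 120, 135, 150)):
    """Snap a line angle to the nearest regular direction within tolerance.

    Angles are treated modulo 180 because an undirected line at 0 degrees is
    the same as one at 180 degrees. If no direction is close enough, the
    angle is returned unsnapped (but still normalized to [0, 180)).
    """
    angle = angle_deg % 180.0
    best = None
    best_dev = None
    for d in directions:
        dev = abs(angle - d)
        dev = min(dev, 180.0 - dev)  # wrap-around: 179 is 1 degree from 0
        if best_dev is None or dev < best_dev:
            best, best_dev = d, dev
    if best_dev is not None and best_dev <= tolerance_deg:
        return float(best)
    return angle
```

With a tolerance of 2°, a line at 44° would be aligned to 45°, while a line at 37° would be left unchanged because it is 7° away from the nearest regular direction.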

How to correct vectorization results automatically:

  • Choose Vector Correction Options from the Conversion menu.
  • Using the displayed Vector Correction Options dialog, specify the autocorrection operations and parameters. Close the dialog by pressing OK.
  • Select the vector objects to which autocorrection should be applied.
  • Choose Vector Autocorrect from the Conversion menu.     

Interactive correction of vectorization results

After applying automatic correction, it is recommended that you also use manual (interactive) correction. The commands for interactive correction are located on the Vector Correction toolbar.

Commands for interactive vector correction:

  • Join selected vector fragments to a polyline
  • Join selected vector fragments to a circle
  • Join selected vector objects to an arc
  • Join selected vector objects to a line
  • Join selected vector objects to create the closest matching (in terms of geometry) single vector object
  • Trim vector objects
  • Expand vector objects
  • Break at specified point
  • Correct to intersection
  • Align Angle and Distance

 
