Automatic Vectorization, Text Recognition
Brief description: Automatic
vectorization is a procedure
for converting raster data to appropriate vector objects.
WiseImage recognizes the following objects when vectorizing: lines, arcs, circles,
hatches, points, texts and symbols. Raster curves and filled contours
are approximated with vector polylines.
The program can recognize line styles and arrows on line and arc ends.
You can train the program to recognize new symbols and texts.
You can round vector object widths to specified values. You can also place vector
objects corresponding to raster lines of various widths on different layers
and/or assign different colors to them.
Tuning Vectorization
You can use one of the pre-defined templates or tune the parameters yourself.
Vectorization parameters can be saved as a template for further use.
Automatic vectorization is tuned in the R2V Conversion Options dialog.
To open this dialog, choose Conversion Options from the Convert menu.
Select objects to recognize:
Open the Recognition tab of the R2V Conversion Options dialog.
- Select the checkboxes of the entities that you want to obtain after vectorization.
- Additional parameters for recognized objects, such as line type, arrows,
hatch angle, and others, are located on the second level. Click "+" to
access these parameters.
Points
WiseImage will recognize a raster point with a size not less than
2x2 pixels (objects smaller than this are considered as speckles and ignored),
and vertical and horizontal extents not greater than the Max Width
value.
Do not switch this mode on when recognizing images of poor quality or
images containing many speckles, since speckles can be recognized as points.
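The size rule for points can be sketched as follows. This is only an illustration of the rule as described above, not WiseImage's actual implementation. The function name is hypothetical, and the sketch assumes that all sizes, including Max Width, are expressed in pixels and that "not less than 2x2" means both extents are at least 2 pixels.

```python
def classify_blob(width_px: int, height_px: int, max_width_px: int) -> str:
    """Classify a raster blob by the size of its bounding box (hypothetical helper).

    Assumption: all values are in pixels; the real program may use other units.
    """
    # Anything smaller than 2x2 pixels is treated as a speckle and ignored.
    if width_px < 2 or height_px < 2:
        return "speckle"
    # A point must fit within Max Width both horizontally and vertically.
    if width_px <= max_width_px and height_px <= max_width_px:
        return "point"
    # Larger objects are left to the other recognition algorithms.
    return "other"
```

With a Max Width of 5 pixels, a 3x3 blob is classified as a point, while a 3x8 blob is not.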
Lines
Turns on the algorithm for line recognition. As a result LINE entities
are created.
Arcs and Circles
Turns on the algorithm for recognizing raster circles and arcs.
Polylines
This algorithm approximates central lines of raster objects with polylines
(POLYLINE). The algorithm creates polylines consisting of straight
segments only. You can use this algorithm individually (or with the Outlines
algorithm) for vectorization of maps and other images consisting of
arbitrary lines.
Outlines
This algorithm approximates the outlines of filled areas with closed
polylines that follow the boundaries of raster objects. The boundary
polylines consist only of straight segments.
The algorithm traces raster lines whose width is greater than the value
of the Min width parameter. To obtain outlines of all raster objects,
select Outlines and set the Max Width parameter to 0.
Hatches
Selects the algorithm for hatch recognition. WiseImage Pro recognizes
simple raster hatches and creates AutoCAD blocks, consisting of line segments.
The Hatches algorithm searches for hatches only if the Lines
algorithm is also on.
Symbols
Selects the algorithm for raster symbol recognition by searching for specified
samples. The algorithm works only if the Lines and Arcs and Circles
algorithms are also on.
Setting the geometry of vectorized drawing:
The geometry is set in the Options tab of the R2V Conversion
Options dialog.
Min Length - minimum length of a raster object to
be recognized. (For example, enter 1 mm in the Min Length
field so that raster fragments shorter than 1 mm are not recognized.)
Max Width - maximum width of raster lines. Set
the value of this parameter slightly greater than the measured line
width on your drawing.
Max Break - maximum length of a break in a raster line that is ignored. Set
the value of this parameter slightly greater than the distance between
dashes in dashed lines or the length of the breaks in poor-quality lines.
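The effect of Max Break can be illustrated with a small sketch. It assumes, for simplicity, that the dashes of one raster line are given as (start, end) positions along that line; the function name and this data layout are hypothetical and are not taken from the program.

```python
def bridge_breaks(dashes, max_break):
    """Join dashes whose gaps do not exceed max_break (hypothetical helper).

    dashes: list of (start, end) positions measured along one raster line.
    Returns the list of continuous segments after ignoring short breaks.
    """
    merged = []
    for start, end in sorted(dashes):
        # A gap no longer than max_break is ignored, so the dash
        # is joined to the previous segment.
        if merged and start - merged[-1][1] <= max_break:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, with dashes at (0, 4), (5, 9) and (15, 20) and a Max Break of 1.5, the first two dashes are joined into one segment and the third remains separate, because its gap of 6 exceeds the limit.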
Text Height - set this parameter equal to the maximum height of uppercase
raster text characters.
Accuracy - this parameter corresponds to the accuracy of your raster image. Use a high
Accuracy value for images of good quality, and a low Accuracy value for
images of poor quality.
Orthogonalization - select this checkbox to obtain orthogonal vector
lines as a result of vectorizing raster lines, which deviate from the
orthogonal direction by not more than .
You can also align vector objects to a specified base angle; enter the angle
value in the corresponding field.
Separating vector objects by layer and/or by color
The criterion for separating vector objects by layer and/or color is the width
of the original raster lines.
In the Separate tab of the R2V Conversion Options
dialog, you can:
- Specify width of resulting vector objects
- Separate resulting vector objects by color
- Separate resulting vector objects by layer
In the Width field you can specify the width assigned to vector objects
obtained from raster objects whose line widths fall within the interval
defined by the Start and End fields.
You can also define a layer and/or color for these vector objects in the
corresponding fields.
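The separation rule can be pictured as a lookup table keyed by raster line width. The sketch below is an illustration only: the class, the function, the example rule values, and the choice to treat both interval bounds as inclusive are assumptions, not the program's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WidthRule:
    """One row of a hypothetical separation table."""
    start: float   # lower bound of the raster line width interval
    end: float     # upper bound of the raster line width interval
    width: float   # width assigned to the resulting vector objects
    layer: str     # layer for the resulting vector objects
    color: str     # color for the resulting vector objects


def apply_rules(raster_width: float, rules: Sequence[WidthRule]) -> Optional[WidthRule]:
    """Return the first rule whose interval contains raster_width, or None."""
    for rule in rules:
        # Assumption: both bounds are inclusive.
        if rule.start <= raster_width <= rule.end:
            return rule
    return None


# Example values are made up for illustration.
EXAMPLE_RULES = [
    WidthRule(start=0.0, end=0.3, width=0.25, layer="THIN", color="red"),
    WidthRule(start=0.3, end=0.8, width=0.5, layer="THICK", color="blue"),
]
```

With these example rules, a raster line 0.2 units wide becomes a vector object of width 0.25 on layer THIN, and a line 1.0 unit wide matches no rule.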
How to Vectorize
- Set the vectorization parameters.
Information is provided in section Tuning Vectorization.
- If you have several images in your document, select the one(s) to vectorize.
You can also vectorize a selected image fragment or image clip.
- Choose Raster2Vector from the Conversion menu.
How to Recognize Text
This section provides information on the algorithms for recognizing raster
text, and on the procedure and parameters for setting up raster text recognition.
Text Recognition Algorithms
You can use the following algorithms for working with raster texts. You can
choose them using the Recognition tab of the R2V Conversion Options
dialog.
None - this algorithm searches for raster
text areas without vectorizing.
Text Areas - this algorithm recognizes and creates
text areas. You can enter text information in these text areas using
the procedure of editing recognized texts.
Information is provided in section Editing recognized texts.
Polylines (Outlines) - approximates
the raster text with vector polylines (outlines).
OCR - recognizes the raster text and creates
the corresponding text objects.
How to Recognize Text
- Select the Text Area checkbox in the Recognition
tab of the R2V Conversion Options dialog.
- Choose your required algorithm for working with the raster text.
- In the Options tab of the R2V Conversion Options
dialog, set the Text Height value equal to the maximum
height of uppercase raster text characters.
- Tune the text recognition options in the Texts tab of the R2V
Conversion Options dialog.
See section Text Recognition Options for detailed information.
Editing recognized texts
The procedure of editing recognized
texts (text areas) is used after recognition with the OCR module or with
Text Areas recognition.
After applying automatic vectorization, choose the Edit OCR Texts command
from the Convert menu. The program displays the first
recognized text fragment (area), the content of which appears in
the Text Correction dialog.
You can edit the text area contents. To accept the
text and move on to the next one, press the Accept Recognized Text
button of the Text Correction dialog. To delete the current
text, press the Delete OCR Text button in the Text Correction
dialog.
Text Recognition Options
Orientation - choose the orientation for raster
texts contained in the image.
Overlapped by Graphics - if this option is on, the
program searches for raster texts that are crossed by other raster objects.
TIPS: Avoid this mode when working with complicated documents,
because small graphic objects may be incorrectly recognized
as texts.
Standalone Letters - allows searching for standalone text characters.
TIPS: If this option is off, the program does not search for
standalone text characters, which helps avoid incorrect recognition of
small objects as text.
Patterns - If you use the OCR module, you should set patterns for text inscriptions
contained in the raster document to obtain better recognition results.
Select the Patterns checkbox.
If Patterns are not specified, the program uses a set of standard
patterns.
See section Text Patterns for detailed information.
Height Table - to obtain texts of specific heights after vectorizing, enter
the desired text height(s) in the table and select the Height Table
checkbox.
If you specify several height values, text heights are rounded to
the nearest value in the list.
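Rounding to the nearest table value can be sketched in a few lines. The function name is hypothetical, and how the program resolves a height exactly halfway between two table values is not stated in this section; the sketch simply keeps the earlier entry in that case.

```python
def round_height(measured_height, height_table):
    """Return the table value closest to measured_height (hypothetical helper).

    Assumption: on an exact tie, the value listed first in the table wins.
    """
    return min(height_table, key=lambda h: abs(h - measured_height))
```

For a table containing 2.5, 3.5 and 5.0, a recognized text 3.2 units high is rounded to 3.5, and a text 4.5 units high is rounded to 5.0.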
Template file - a file for storing topology models of text characters, which are
used when performing OCR.
You can also train the program to recognize other text characters or different
forms of characters contained in the standard template file.
See section Training OCR for detailed information.
Setting Text Patterns
Text patterns can be comprised of fixed and variable parts.
Here is the formal description of a word pattern definition:
"[%[length]character type] || [letter]…"
length - number of letters (omit it if the length varies),
character type - type of letter sequence.
Types of letters are presented in the following
table:
| D | |
| N | |
| n | Small letters of national alphabet |
| E | Capital letters of Latin alphabet |
| e | Small letters of Latin alphabet |
| S | Special characters (signs of plus, minus, equality, degree, and others) |
For example, in texts 5V, 220V, 13.8V:
5,220,13.8 - variable part,
V - fixed part.
Therefore, the pattern can be defined as %DV. The length of variable parts is different, so it is not
specified.
Examples:
| Symbol sequence | Pattern |
| 5; 25; 5559; 22,9 | %D |
| | R%2D |
| Moscow; Oslo | %1E%e |
| project; design | %n |
| 5V; 220V; 13.8V | %DV |
| 12°, 30°, 45° | %2D%1S |
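To check a pattern against sample inscriptions before vectorizing, the pattern syntax can be approximated with regular expressions. The sketch below is not part of WiseImage and is not its matching algorithm. The character sets are assumptions inferred from the examples above: D is taken to cover digits and decimal separators, N and n are mapped to Latin letters as a stand-in for the national alphabet, and S covers only a few special characters. A length is read as an exact count, and a missing length as "one or more".

```python
import re

# Assumed character sets for each type letter (not taken from the program).
CLASSES = {
    "D": r"[0-9.,]",    # assumption: digits plus decimal separators
    "N": r"[A-Z]",      # assumption: stand-in for national capital letters
    "n": r"[a-z]",      # assumption: stand-in for national small letters
    "E": r"[A-Z]",      # capital letters of the Latin alphabet
    "e": r"[a-z]",      # small letters of the Latin alphabet
    "S": r"[+\-=°]",    # assumption: a small subset of special characters
}

# Either a variable part (%[length]type) or a single fixed character.
TOKEN = re.compile(r"%(\d*)([DNnEeS])|(.)", re.DOTALL)


def pattern_to_regex(pattern):
    """Translate a text pattern such as '%DV' into a compiled regex."""
    parts = []
    for length, char_type, literal in TOKEN.findall(pattern):
        if char_type:
            # An explicit length is an exact count; otherwise one or more.
            quantifier = "{%s}" % length if length else "+"
            parts.append(CLASSES[char_type] + quantifier)
        else:
            parts.append(re.escape(literal))
    return re.compile("".join(parts))


def matches(pattern, text):
    """Return True if the whole text fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(text) is not None
```

Under these assumptions, `matches("%DV", "13.8V")` and `matches("%1E%e", "Moscow")` hold, while `matches("R%2D", "R123")` does not, because the pattern asks for exactly two characters of type D.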
Training OCR
You can train the OCR module to recognize new characters. You can add a new
character to one of the standard template libraries or create your own.
TIPS: In some cases it is more convenient to
use the standard library (e.g., DEFAULT.OCR) as a basis, saving it
under a new name.
To train the program to recognize
a new character or different forms of existing characters:
- Choose Train OCR from the Conversion menu.
- Open (or create) an OCR file.
- Enter a character to recognize in the Character field.
- Select the corresponding character on the raster image using one of the selection
buttons.
- Press "+" to add the new pattern to the OCR file.
- Save the OCR file.
Correcting Vectorization Results
Vectorization results usually need further editing and correcting. The program
has both automatic and interactive correction procedures intended for
this purpose.
Automatic correction of vectorization results
This operation restores contact of arcs and circles, merges vector fragments,
removes vector "speckles", and aligns lines to regular directions (angles of
0°, 30°, 45°, 60°, 90°, etc.) if their deviations do not
exceed the angle specified by the user.
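The alignment of lines to regular directions can be illustrated as follows. This is a sketch of the idea only, not the program's algorithm. It assumes that the regular directions are the multiples of 30° and 45°, that a line's direction is taken modulo 180° because a line has no orientation, and that the names used here are hypothetical.

```python
# Assumed set of regular directions: multiples of 30 and 45 degrees.
REGULAR_ANGLES = sorted(a for a in range(0, 180) if a % 30 == 0 or a % 45 == 0)


def snap_angle(angle_deg, tolerance_deg):
    """Snap a line direction to the nearest regular direction (hypothetical helper).

    The angle is returned unchanged (normalized to [0, 180)) if its deviation
    from the nearest regular direction exceeds tolerance_deg.
    """
    angle = angle_deg % 180.0
    # 180 is included so that angles just below 180 can snap to 0.
    nearest = min(REGULAR_ANGLES + [180], key=lambda r: abs(r - angle))
    if abs(nearest - angle) <= tolerance_deg:
        return float(nearest % 180)
    return angle
```

With a tolerance of 2°, a line at 44° is aligned to 45°, while a line at 37° is left as it is.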
How to correct vectorization results automatically:
- Choose Vector Correction Options from the Conversion
menu.
- Using the displayed Vector Correction Options dialog, specify
the autocorrection operations and parameters. Close the dialog by pressing
OK.
- Select the vector objects to which autocorrection should be applied.
- Choose Vector Autocorrect from the Conversion menu.
Interactive correction of vectorization results
After applying automatic correction, it is recommended that you also use
manual (interactive) correction. The commands for interactive correction
are located on the Vector Correction toolbar.
Using interactive correction you can:
Commands for interactive vector correction
- Join selected vector fragments to a polyline
- Join selected vector objects to a line
- Join selected vector objects to create the closest matching (in terms of geometry)
single vector object
- Trim vector objects
- Expand vector objects
- Break at specified point
- Correct to intersection
- Align Angle and Distance