DETECTING MULTIPLE COLUMNS
K2pdfopt has several options that control when it tries to break a PDF page
into multiple columns. The command-line options are -as, -col, -ch,
-cgr, and -crgh. Alternatively, you can select option "co" from the interactive
menu (as of v1.34) to adjust all of these values. If you are not sure what to enter,
just use the defaults, but I'll describe what each option does here.
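As a rough illustration of how these options fit together on one command line, here is a minimal Python sketch that assembles the arguments. The helper name build_k2pdfopt_command and the file name scan.pdf are hypothetical, the default values are the ones quoted in the sections below, and placing the options before the file name is an assumption about typical usage.

```python
import shlex

def build_k2pdfopt_command(pdf_path, auto_straighten=False,
                           col=2, ch=1.5, cgr=0.33, crgh=1 / 72):
    """Assemble a k2pdfopt argument list (hypothetical helper, not part of k2pdfopt)."""
    args = ["k2pdfopt"]
    if auto_straighten:
        args.append("-as")                    # auto-straighten / de-skew
    args += ["-col", str(col)]                # maximum number of columns
    args += ["-ch", str(ch)]                  # min multi-column region height (inches)
    args += ["-cgr", str(cgr)]                # column gap range (fraction of page width)
    args += ["-crgh", f"{crgh:.4f}"]          # min row gap height (inches)
    args.append(pdf_path)                     # assumed: input file goes last
    return args

cmd = build_k2pdfopt_command("scan.pdf", auto_straighten=True, col=4)
print(shlex.join(cmd))
# k2pdfopt -as -col 4 -ch 1.5 -cgr 0.33 -crgh 0.0139 scan.pdf
```

The resulting list could be passed to subprocess.run if k2pdfopt is installed and on the PATH.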
AUTO-STRAIGHTEN
If you are processing a photocopied or scanned document which might not be perfectly
straight, try turning on auto-straighten (de-skew) with -as. A straightened
document gives k2pdfopt the best chance of correctly detecting
multiple columns.
COLUMN LIMIT
Use the -col option to select the maximum number of columns to be detected
in your document. This can realistically be 1, 2, or 4 (3 is the same as 4).
It essentially sets the level of recursion that k2pdfopt will use on detected
regions. That is, if you set it to 2, k2pdfopt will look for one column break in
the main page. If you set it to 4, k2pdfopt will then (recursively) look for a
column break within each "column" if it finds a column break on the main page.
Setting it to 1 turns off multiple-column detection, i.e. k2pdfopt
will not look for multiple columns. The default value for -col is 2,
so you must manually set it to 4 for documents with more than 2 columns.
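The recursion described above can be sketched in a few lines of Python. This is only a toy model, not k2pdfopt's actual code: it assumes the blank gutters on the page are already known (the gaps list is hypothetical input), and it picks the gutter nearest the center of each region. The function names are made up for illustration.

```python
def split_levels(col):
    """Map a -col value to recursion depth: 1 -> 0, 2 -> 1, 3 or 4 -> 2."""
    if col <= 1:
        return 0
    if col == 2:
        return 1
    return 2

def split_region(region, gaps, levels):
    """Recursively split (left, right) at the known gap nearest its center."""
    left, right = region
    if levels == 0:
        return [region]                       # recursion limit reached
    inside = [g for g in gaps if left < g < right]
    if not inside:
        return [region]                       # no column break found here
    center = (left + right) / 2
    g = min(inside, key=lambda x: abs(x - center))
    return (split_region((left, g), gaps, levels - 1)
            + split_region((g, right), gaps, levels - 1))

page = (0.0, 8.0)                             # page extent in inches (made up)
gaps = [2.0, 4.0, 6.0]                        # gutters of a 4-column page (made up)
for col in (1, 2, 4):
    print(col, split_region(page, gaps, split_levels(col)))
```

With -col 2 the sketch stops after the single break at 4.0, which is why a four-column page needs -col 4.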
COLUMN DETECTION OPTIONS
The figure below will be used to demonstrate three of the column detection options:
The -ch option is used to specify the minimum height for a multi-column
region in inches. That is, if you use -ch 4.0, then any multi-column
region within the PDF page must be at least 4 inches high, otherwise it will not
be broken into multiple columns. The default value is 1.5.
The -cgr option (Column Gap Range) specifies the range over which
k2pdfopt looks for a break (gap) between the columns. If it is set to 1.0,
k2pdfopt will scan the entire horizontal range of the page for a column break.
If it is set to 0.05, k2pdfopt will only scan the middle 5% of the page for
a column break. The default is 0.33.
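To make the -cgr arithmetic concrete, the sketch below computes the horizontal band that would be scanned for a given page width. It assumes the band is centered on the page, as the "middle 5%" wording suggests; the function name is hypothetical.

```python
def column_gap_scan_window(page_width, cgr=0.33):
    """Return (left, right) of the centered band scanned for a column gap."""
    if not 0.0 < cgr <= 1.0:
        raise ValueError("cgr must be in (0, 1]")
    center = page_width / 2
    half = page_width * cgr / 2               # half the width of the scanned band
    return (center - half, center + half)

# An 8.5-inch-wide page with the default, a narrow, and the full range.
for cgr in (0.33, 0.05, 1.0):
    left, right = column_gap_scan_window(8.5, cgr)
    print(f"-cgr {cgr}: scan from {left:.2f} in to {right:.2f} in")
```

A column gap that lies well off-center, for example next to a narrow sidebar, would fall outside the default band and would need a larger -cgr value.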
The -crgh option (Column Row Gap Height) specifies the minimum height, in inches,
of the horizontal blank gaps that must surround each multi-column region. Setting this
value higher makes it harder for k2pdfopt to find a multi-column region, because
it requires the gaps to be larger. The default is 1/72 of an inch.
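The -ch and -crgh thresholds can be pictured as a simple test on each candidate region. The sketch below is an illustration only, not k2pdfopt's implementation: it assumes the "surrounding" gaps are the blank strips directly above and below the region, and the function names are hypothetical.

```python
def inches_to_pixels(inches, dpi):
    """Convert a length in inches to pixels at the given resolution."""
    return inches * dpi

def may_be_multicolumn(region_height_in, gap_above_in, gap_below_in,
                       ch=1.5, crgh=1 / 72):
    """Check a candidate region against the -ch and -crgh thresholds."""
    tall_enough = region_height_in >= ch                   # -ch: minimum region height
    gaps_ok = min(gap_above_in, gap_below_in) >= crgh      # -crgh: minimum gap height
    return tall_enough and gaps_ok

# A 4-inch-high region with 0.1-inch blank strips above and below.
print(may_be_multicolumn(4.0, 0.1, 0.1))                   # defaults
print(may_be_multicolumn(4.0, 0.1, 0.1, crgh=0.25))        # stricter gap requirement
print(round(inches_to_pixels(1 / 72, 300), 2))             # default -crgh at 300 dpi
```

At 300 dpi the default gap height works out to only about four pixels, so the default is quite permissive.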