/* ** k2pdfopt.c Optimize 1 and 2-column PDF's for Kindle-2 by displaying ** columns separately and stripping removing margins and ** excess white space. ** ** Copyright (C) 2013 http://willus.com ** ** This program is free software: you can redistribute it and/or modify ** it under the terms of the GNU Affero General Public License as ** published by the Free Software Foundation, either version 3 of the ** License, or (at your option) any later version. ** ** This program is distributed in the hope that it will be useful, ** but WITHOUT ANY WARRANTY; without even the implied warranty of ** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ** GNU Affero General Public License for more details. ** ** You should have received a copy of the GNU Affero General Public License ** along with this program. If not, see . ** ** v1.65 6 APRIL 2013 ** NEW FEATURES / OPTIONS ** - Added Kobo Glo and Kobo Touch device settings. ** (http://www.mobileread.com/forums/showpost.php?p=2441354&postcount=336) ** - Re-vamped the bmp_source_page_add() function so that the ** logic that breaks the page out into displayable rectangular ** regions can be used in other places (e.g. by the OCR fill-in ** function). ** - Added option -ocrcols which sets the max number of columns for ** processing with OCR (if different from the -col value). You would ** use this if you want to OCR a PDF file using -mode copy, but ** the file has multiple columns of text. ** (http://www.mobileread.com/forums/showpost.php?p=2442523&postcount=341) ** - Added option -rsf (row-split figure-of-merit) which controls a ** new algorithm which goes back and looks for rows of text which ** should be split into two (or three) separate rows. This is meant ** to help catch those cases where k2pdfopt should have split apart ** two rows of text but did not because of a small amount of overlap. ** See breakinfo_find_doubles() in breakinfo.c. ** ** LIBRARY UPDATES ** - Compiled with latest versions of major libraries: MuPDF 1.2, ** DjVu 3.5.25.3, FreeType 2.4.11, Turbo JPEG 1.2.1, PNG 1.5.14, ** Z-lib 1.2.7. ** - Linux version now compiled with gcc 4.7.2 in Ubuntu 12. ** ** TWEAKS ** - Clarified usage for -vb in k2usage.c ** - Changed "destination" to "E-reader" in places on the k2 interactive ** menu and device menu. ** - Put "disclaimer" in OCR usage which clarifies the purpose. ** - Default crop margins are now zero (was 0.25 inches). This was ** confusing too many people. ** (http://www.mobileread.com/forums/showpost.php?p=2456032&postcount=352) ** - In bmp_region_vertically_break(), different width regions and ** regions with different ending/starting row heights cause ** a vertical gap to be inserted in the output. ** ** BUG FIXES ** - Call k2pdfopt_settings_sanity_check() once per source document. ** This fixes a crash when converting multiple files. ** (Certain vars weren't getting correctly initialized on the ** 2nd, 3rd, etc. conversion files.) ** (http://www.mobileread.com/forums/showpost.php?p=2409726&postcount=317) ** - Fixed array-out-of-bounds access in k2proc.c ** (bmpregion_find_multicolumn_divider function) which occasionally ** caused k2pdfopt to terminate abnormally (typically when converting ** mostly blank pages). ** (http://www.mobileread.com/forums/showpost.php?p=2456548&postcount=356) ** - Fixed k2pdfopt_proc_one() in k2file.c so that native PDF output ** is turned off if the source file is not PDF (e.g. DjVu conversion). ** - Fixed spacing between regions with -vb -2 or -vb -1 (gap between ** pages where new chapter starts, for example--font change, etc.). ** (http://www.mobileread.com/forums/showpost.php?p=2373550&postcount=292) ** - Minimum width in vertical line detection is now 1 pixel. ** (http://www.mobileread.com/forums/showpost.php?p=2452356&postcount=345) ** - Better diagnostic output on TESSDATA_PREFIX env var. ** - Fixed native PDF output so that scientific notation is not allowed ** in PDF clipping commands. This was causing native conversions ** not to work correctly in some cases. ** (http://www.mobileread.com/forums/showpost.php?p=2467063&postcount=371) ** ** v1.64a 5 JAN 2013 ** - Fixed bug in Native PDF output introduced in v1.64. ** (stream_deflate function in wmupdf.c) ** ** v1.64 4 JAN 2013 ** - Native PDF output changed so that source pages are converted ** to XObjects (Form type). This should be much more robust when ** putting contents from multiple source pages onto a single ** destination page. ** - Added profile for Kindle paperwhite. (-dev kpw) ** - The fontdata.c file in willus lib has been reduced to only one ** font in order to reduce the size of the k2pdfopt binaries since ** k2pdfopt only uses one font for the -sm option. ** - The page width and height can now be specified in terms of ** the trimmed source page width and height. Use 't' for the ** units, e.g. -w 1t -h 1t. This would typically be used with ** the -mode copy and/or -grid options. ** - The -bp option can now take a numeric argument (inches) to ** insert a gap (of that many inches) between each source page. ** - There is now an interactive menu option for selecting the ** OCR language training file (Tesseract OCR only). ** - Fixed memory leak in bmpregion_find_multicolumn_divider(). ** - Fixed default value for -col in usage. ** - Clarified -ocrlang usage. ** - Compiled Linux versions with -static and -static-libstdc++ ** to hopefully reduce shared library incompatibilities. ** ** v1.63 20 DEC 2012 ** - Now supports OCR in multiple languages using Tesseract with ** Unicode-16 text encoding so that the OCR text can be copy / pasted ** into Unicode-aware applications. ** To select the language for OCR: -ocrlang (or -l). ** Examples: -ocrlang eng (English) ** -ocrlang fra (French) ** -ocrlang chi_sim (Chinese simplified) ** [Note that using the -ocrvis t option will not show the ** OCR text correctly for any character above unicode value ** 255 since I do not use any embedded fonts, but the text ** will convert to the correct Unicode values when copy / pasted.] ** - Tesseract "cube" files are automatically checked so that the ** best OCR detection mode is selected. ** - Updated wmupdf.c in willus lib to account for both CropBox ** and MediaBox to determine page origin (fixes user-reported ** bug in native output mode for pages with non-zero MediaBox ** origins). ** - Made changes to multicolumn divider finder to improve the ** speed. Includes counting pixels by column rather than row, ** making use of trimmed column boundaries, and using a ** 2-D pixel count array. Resulting code runs ~ 5 - 15% faster ** on average in my regression tests. ** - Removed dprintf() and fsincos() from willus lib to prevent ** minor compiling problems on some platforms. Fixed some other ** minor issues for kindlepdfviewer. ** ** v1.62 15 NOV 2012 ** CODE RE-ORGANIZATION ** - This was largely motivated by the kindlepdfviewer app which ** now uses k2pdfopt code: ** https://github.com/hwhw/kindlepdfviewer ** - Moved the bulk of the code to a k2pdfopt library consisting ** of 21 source modules. The main k2pdfopt.c program is now ** only about 150 lines. ** - The willus and k2pdfopt libraries have options to compile ** so that they don't access any other 3rd-party library calls. ** Seach for "THIRD PARTY" in willus.h. ** - There is now a K2PDFOPT_KINDLEPDFVIEWER macro which can ** be defined in k2pdfopt.h to make the code more friendly for ** the kindle viewer app. For example, compiling kview.c with ** all third-party libs disabled and K2PDFOOPT_KINDLEPDFVIEWER ** defined results an executable about 300 KiB in size in Windows. ** ** NEW FEATURES ** - New -neg option inverts the output to be white on black ** ("night mode"). Note that figures and photographs are not ** distinguished, so they will also be inverted. ** - Native mode now defaults to off and -n turns it on no matter ** what (disables text wrapping and OCR). ** ** BUG FIXES ** - Setting re-initialized before final call to k2parsecmd. ** - Fixed minor memory leak in willus lib (vector_nd_free). ** - Safeguard against possible infinite loop in willus lib ** (bmp_more_rows). ** - Word spacing history initialized before each new document. ** - The XObjects dictionary is now merged when crop boxes from ** multiple source pages are put onto the same output page. ** This should improve the native PDF output in certain cases. ** ** v1.61 3 NOV 2012 ** - Some user menu options were not taking effect. This is fixed. ** - Compiled with tesseract 3.02.02 and Leptonica 1.69. ** - User "help" menu fits in 25-line window now. ** ** v1.60 1 NOV 2012 ** MAJOR NEW FEATURES ** - Option to keep native PDF contents (see full details under ** command-line option usage under -n for "native"). ** User menu option "n". This is only available if k2pdfopt ** has been compiled with MuPDF. ** - New grid option grids the source page into a fixed number ** of rows and columns with some overlap. E.g. -grid 2x2. ** This option works well with the -n option above. ** This is under user menu option "mo". ** ** NEW FEATURES / CHANGES ** - The new -dev option selects device profiles (there are ** only two so far: kindle2 and nook simple touch). ** The user menu option for -dev is "d". ** - The new -mode option selects a particular mode of operation ** that is shorthand for a number of options. There are ** currently four modes: default (def), copy, 2-column (2col), ** and fw (fit width). E.g. "-mode copy" makes k2pdfopt behave ** just like my "pdfr" program (thus eliminating the need for me ** to distribute pdfr separately). And "-mode fw" makes k2pdfopt ** behave very much like sopdf's fit width option. For more details, ** see the full command-line usage (k2pdfopt -?). ** The user menu option for -mode is "mo". ** - The user input system was revamped so that the menu options ** build up a set of command-line arguments that are passed to ** the main program. This way the user can easily see the ** command-line options that match the selections from the menu. ** - Command line arguments can be put in directly at the user ** input menu (anything beginning with a - is considered to be ** a command-line option and is appended to the list). ** - At the user input menu, '-' will clear all menu-selected ** options, '--' will clear the command-line options actually ** entered at the command-line, and '---' will clear the options ** from the K2PDFOPT environment variable. ** - The -vb option can now take -2 as an argument, which indicates ** that the vertical spacing is to be exactly preserved from the ** source document. ** - New -t option to specify trimming or no trimming (-t-) of ** excess white space. Default is to trim white space. ** - New -dpi option is same as -odpi. ** - The -w and -h options can take values in inches or cm now ** instead of pixels, and they can be negative to specify that ** the device size should be scaled from the source page size. ** - With -w and -h set to -1 to follow the source page sizes, ** k2pdfopt can now handle varying destination page sizes ** (see set_margins_and_devsize function). ** - The -m and -om command-line arguments can now be a comma- ** -delimited list of margins: left,top,right,bottom. ** E.g. -m 0.5,1.0,0.5,1.0. ** - The "Author", "Title", and "CreationDate" fields in the ** source PDF file are now correctly passed on to the output ** file. If there is no Title field in the source, the base ** output file name is used. ** - Specifying -f2p -2 will now also set -vb -2 automatically. ** Use -fp -2 -vb -1 to revert to v1.51 behavior. ** ** BUG FIXES ** - Fixed OCR word placement bug for tall (centered) bitmaps. ** - Make sure if crop boxes are used that padding/corner marking ** is turned off and OCR is turned off. ** - Fixed some issues when switching to Ghostscript and when ** processing a folder full of bitmaps. ** - Should work with postscript (.eps or .ps) files now ** (requires Ghostscript; output file is still PDF). ** - Fixed bmpregion_find_vertical_breaks() bug where it wasn't ** always correctly interpreting the last section of a region. ** - The -vls option works correctly when combined with -vb -1 ** and no text wrapping now (vls_test.pdf). ** - Eliminated divide by zero issue in bmpregion_is_clear() when ** gt_in gets too small. ** - Put call to wrapbmp_flush() in publish_master() when flushall ** is set (fixes bug reported by hm88 on mobileread) ** - Adjusted critierion for too thick / too thin hyphen in ** bmpregion_hyphen_detect() to be more correct and to allow ** for slightly thinner hyphens. ** - If mupdf can't open the file in Windows, I try the 8.3 alternative ** file name. This solved a problem involving a path that had a ** non-traditional (non-ASCII) character in it. ** - Fixed bug in detect_vertical_lines() that caused problems ** in some cases on 64-bit versions. ** ** v1.51 9-21-12 ** NEW FEATURES ** - New option -jf for special figure justification (under ** "j" in interactive menu). ** - -f2p option applies to small figures as well as tall figures. ** - Compiled w/MuPDF v1.1 and Freetype v2.4.10. ** - Source can be built without DjVuLibre library (see HAVE_DJVU ** macro) or MuPDF (see HAVE_MUPDF macro). If no MuPDF, then ** Ghostscript must be installed to process PDF. ** - Option -o (used to control overwrite) now sets the output ** name via a formatting string. ** Example #1: -o %s_k2opt ** This is the default and and appends _k2opt to the source name. ** (%s is replaced by the base name of the source file.) ** Example #2: -o out%04d ** In this case, each subsequent output file (assuming you ** specify more than one input file) will be out0001.pdf, ** out0002.pdf, out0003.pdf, ... ** ** CHANGES(!) TO COMMAND-LINE OPTIONS/INPUT MENU ** - I generally like to avoid doing this for backwards compatibility, ** but I felt some changes were overdue. ** COMMAND-LINE ** * The overwrite option is now controlled with -ow instead of ** -o since I wanted -o to specify the output name format. ** * The new command-line option for setting OCR visibility (was -wc) ** is -ocrvis. E.g. "-ocrvis st" will show the 's'ource document ** and the OCR 't'ext. ** * The -gtcmax option has been changed to -cgmax to be more ** consistent. It's also been moved under the "co" options ** in the interactive menu. Meaning of -gtc option has been ** clarified somewhat. ** INTERACTIVE MENU ** * Revamped the user input menu a bit by grouping things under ** certain options in order to reduce the number. ** > "o" now sets the output name format string and you ** now set the output DPI (-odpi) under "d" for device resolution. ** > "g" (gamma correction), "s" (sharpening), and "wt" (white ** threshold) are all now under "cs" for contrast/sharpen. ** > "de" (defect size), "evl" (erase vertical lines), and "gs" ** (ghostscript) are all under "s" for Special options. ** > "mc" (mark corners) is now set under "pd" (padding/marking). ** > "ws" (word spacing) is now set under "w" (wrap text). ** ** BUG FIXES ** - OCR text now rendered at a more uniform height. ** - Box around OCR text options fixed (-ocrvis b). ** - Fixed bug in text re-flow when a hyphen is detected. ** - Fixed interpretation of -ocrhmax so that it is based on the ** source dimension. ** - Min figure height from -jf option is applied in the ** bmpregion_find_vertical_breaks() function. ** - Interactive menu should be smarter about when you try to enter ** a file name at the prompt (fixed a bug where it seemed to ** not accept a file you would type in at the prompt even though ** it would eventually process it). ** - No longer needs Jasper or GSL libs for build. ** - Tightened up the pdfwrite_bitmap function in willus lib. ** - Cleaned up / revised source code and build instructions for ** a simpler build (eliminated two basically unneeded libraries). ** - OCR options are ignored in versions compiled without OCR. ** ** v1.50 9-7-12 ** MAJOR NEW FEATURE: OCR ** - For PDFs in English, added optical character recognition ** using two different open source libraries: Tesseract v3.01 ** (http://code.google.com/p/tesseract-ocr/) and GOCR v0.49 ** (http://jocr.sourceforge.net). The OCR'd text ** is embedded into the document as invisible ASCII text ** stored in the same location as the bitmapped words, ** exactly like some document scanning software works ** (e.g. Canon's Canoscan software). This allows the resultant ** PDF document to be searched for text, assuming the OCR ** is successful (won't always be the case, but should ** work reasonably well if the source text is clear enough). ** - Tesseract works far better than GOCR, but requires that ** you download the "trained data" file for your language ** from http://code.google.com/p/tesseract-ocr/downloads/list ** and point the environment var TESSDATA_PREFIX to the root ** folder for your trained data. If the tesseract trained ** data files are not found, k2pdfopt falls back to GOCR. ** E.g. if your data is in c:\tesseract-ocr\tessdata\... ** then set TESSDATA_PREFIX= c:\tessseract-ocr\ ** - I wrestled with this, but the default for OCR (for now) ** is for it to be turned off. You must explicitly turn it ** on with the -ocr command-line option or "oc" at the ** interactive menu. I did this because it does significantly ** slow down processing (about 20 words/second on a fast PC ** using Tesseract). ** - -wc option sets the OCR word color (visibility) ** (e.g. -wc 0 for invisible OCR text, the default). ** - -ocrhmax option sets max height of OCR'd word in inches. ** ** OTHER NEW FEATURES: ** - Added -evl option (menu item "e") to erase vertical lines. ** This allows the option, for example, to get rid of ** column divider lines which often prevent k2pdfopt from ** properly separating columns and/or wrapping text. ** - Detects and eliminates hyphens when wrapping text. Turn ** this off with -hy- ("w" interactive menu option). ** - New option -wrap+ option will unwrap/re-flow narrow columns ** of text to your wider device screen (typically desired ** on a Kindle DX, for example). Best if combined with -fc-. ** - There is now a max column gap threshold option so that ** columns are not detected if the gap between them is too large. ** (-gtcmax command-line option). Default = 1.5. ** - New option -o controls when files get overwritten. E.g. ** -o 10 tells k2pdfopt not to overwrite any existing files ** larger than 10 MB without prompting (the default). ** - New option -f2p ("fit to page") can be used to fit tall ** figures or the "red-boxed" regions (when using -sm) onto ** single pages. ** ** BUG FIXES / MISC ** - Fixed description of -whitethresh in usage (had wrong default). ** - Interactive menu more obvious about what files are specifed ** and allows wildcards for file specification. ** - Fixed bug in word_gaps_add() where the gap array was getting ** erroneously filled with zeros, leading to some cases where ** words got put together with no gap between them during text ** wrapping/re-flow. ** - The -sm option (show marked source) now works correctly when ** used with the -c (color) option. ** - Removed the separate usage note about Ghostscript and put it ** under the -gs option usage. ** - Fixed bug where zero height bitmap was sometimes passed to ** bmp_src_to_dst(). Also rounded off (rather than floor()-ing) ** scaling height used in the bitmap passed to bmp_src_to_dst(). ** - Due to some minor bitmap rendering improvements, this version ** seems to be generally a little faster (~2-4%) compared to ** v1.41 (under same conditions--no OCR). ** ** v1.41 6-11-2012 ** IMPROVEMENTS ** - Compiled w/MuPDF v1.0. ** - Tweaked the auto-straightening algorithm--hopefully more ** accurate and robust. ** - Improved auto-contrast adjust algorithm and added option ** to force a contrast setting by suppying a negative ** value for -cmax. Does a better job on scans of older ** documents with significantly yellowed or browned pages. ** - Options -? and -ui- when specified together now correctly ** echo the entire usage without pausing so that you can ** redirect to a file (as claimed in the usage for -?). ** ** BUG FIXES ** - Fixed bug where the column finding algorithm became far ** too slow on certain types of pages (to the point where ** k2pdfopt appeared to have crashed). ** - Fixed bug where k2pdfopt wasn't working correctly when ** the -c option was specified (color output). ** - Fixed bug where if max columns was set to 3 in the ** interactive menu, it didn't get upgraded to 4. ** - Fixed memory leak in bmpregion_add() (temp bitmap ** wasn't getting freed). ** - Fixed memory leak in bmp_src_to_dst() (temp 8-bit ** bitmap not getting freed). ** - Fixed memory leak in bmpregion_one_row_wrap_and_add() ** (breakinfo_free). ** - Check for zero regions in breakinfo_compute_row_gaps() ** and breakinfo_compute_col_gaps(). ** - Autostraighten no longer inadvertently turned on ** when debugging. ** ** v1.40 4-5-2012 ** - This is my most substantial update so far. ** I did a re-write of many parts of the code, including ** all of the text wrapping functions. I also put this ** version through many hours of regression testing. ** - Major new features: ** * Does true word wrap (brings words up from the ** next line if necessary). ** * Preserves indentation, justification, and vertical ** spacing more faithfully. Overall, particularly for ** cases with text wrapping, I think the output looks ** much better. ** * Ignores defects in scanned documents. ** * Compiled with all of the very latest third party ** libraries, including mupdf 0.9. ** * v1.40 is about 5% faster than v1.35 on average ** (Windows 64 version). ** - New justification command-line option is: ** -j [-1|0|1|2][+/-] ** Using -1 tells k2pdfopt to use the document's own ** justification. A + after will attempt to fully ** justify the text. A - will force no full justification. ** Nothing after the number will attempt to determine ** whether or not to use full justification based on ** if the source document is fully justified. ** - The default defect size to ignore in scanned documents ** is a specified user size (default is 1 point). The ** command-line option is -de (user menu option "de"). ** - Command line options -vls, -vb, and -vs control ** vertical spacing, breaks, and gaps. They are all ** under the interactive user menu under "v". ** - Line spacing is controlled by -vls. ** Example: -vls -1.2 (the default) will preserve ** the default document line spacing up to 1.2 x ** single-spaced. If line spacing exceeds 1.2 x in the ** source document, the lines are spaced at 1.2 x. ** The negative value (-1.2) tells k2pdfopt to use it ** as a limit rather than forcing the spacing to be ** exactly 1.2 x. A positive value, on the other hand, ** forces the spacing. E.g. -vls 2.0 will force line ** spacing to be double-spaced. ** - Regions are broken up vertically using the new -vb ** option. It defaults to 2 which breaks up regions ** separated by gap 2 X larger than the median line gap. ** For behavior more like v1.35, or to not break up the ** document into vertical regions, use -vb -1. Vertical ** breaks between regions are shown with green lines when ** using -sm. ** - The new -vs option sets the maximum gap between regions ** in the source document before they are truncated. ** Default is -vs 0.25 (inches). ** - Added menu option for -cg under "co". ** - Reduced default min column gap from 0.125 to 0.1 inches. ** - The -ws (word spacing threshold) value is now specified ** as a fraction of the lowercase letter height (e.g. a ** small 'o'). The new default is 0.375. ** ** v1.35 2-16-2012 ** - Changed how the columns in a PDF file are interpreted ** when the column divider moves around some. The column ** divider is now allowed to move around on the page ** but still have the columns be considered contiguous. ** This is controlled by the -comax option. Use ** -comax -1 to revert to v1.34 and before. The ** default is -comax 0.2. See example at: ** http://willus.com/k2pdfopt/help/column_divider.shtml ** - Added nice debugging tool with the -sm command-line ** option ("sm" on interactive menu) which shows marked ** source pages so you can clearly see how k2pdfopt ** is interpreting your PDF file and what affect the ** options are having. ** - The last line in a paragraph, if shorter than the ** other lines significantly, will be split differently ** and not fully justified. ** - Modified the column search function to better find ** optimal gaps. ** - The height of a multi-column region is calculated ** more correctly now (does not include blank space, ** and both columns must exceed the minimum height ** requirement). ** - Text immediately after a large rectangular block ** (typically a figure) is now appended to the ** figure region, since it is often the axis labels ** for the figure. ** - Fixed array-out-of-bounds bugs in ** bmpregion_wrap_and_add() and break_point(). ** - Added some performance enhancements regarding how ** regions are trimmed (rowcount[] and colcount[] ** arrays). ** - The file name to be processed is now listed with the ** interactive menu, and a wildcard can now be specified ** as the file name on the interactive menu. ** ** v1.34a 12-30-2011 ** - Some build corrections after the first release of ** v1.34 which had issues in Linux and Windows. ** - Fixed interpretation of -jpg flag when it's the last ** command-line option specified. ** ** v1.34 12-30-2011 ** - I've collected enough bug reports and new feature ** requests that I decided to do an update. ** - Added -cgr and -crgh options to give more control ** over how k2pdfopt selects multi-column regions. ** - Don't switch to Ghostscript on DJVU docs. ** - Continues processing files even if has an error on ** one page. ** - Fixed bug in orientation detection (minimum returned ** value is now 0.01 so as not to kill the average). ** - Added document scale factor (-ds or "ds" in menu) ** which allows users to correct PDF docs that are the ** wrong size (e.g. if your PDF reader says your ** document is 17 x 22 inches when it should be ** 8.5 x 11, use -ds 0.5). ** - Fixed bug in break_point() where bp1 and bp2 did not ** get initialized correctly. ** ** v1.33 11-11-2011 ** - Added autodetection of the orientation of the PDF ** file. This is somewhat experimental and comes with ** several caveats, but I have made it the default ** because I think it works pretty well. ** Caveat #1: It assumes the PDF/DJVU file is mostly ** lines of text and looks for regularly spaced lines ** of text to determine the orientation. ** Caveat #2: If it determines that the page is ** sideways, it rotates it 90 degrees clockwise, so it ** may end up upside down. ** - The autodetection is set with the -rt command-line ** option (or the "rt" menu option): ** 1. Set it to a number to rotate your PDF/DJVU file ** that many degrees counter-clockwise. ** 2. Set it to "auto" and k2pdfopt will examine up ** to 10 pages of the file to determine the ** orientation it will use. ** 3. Set it to "aep" to auto-detect the rotation of ** every page. If you have different pages that ** are rotated differently from each other within ** one file, you can use this option to try to ** auto-rotate each page. ** 4. To revert to v1.32 and turn off the orientation ** detection, just put -rt 0 on the command line. ** - Added option to attempt full justification when ** breaking lines of text. This is experimental and ** will only work well if the output dpi is chosen so ** that rows break approximately evenly. To turn on, ** use the "j" option in the interactive menu or the ** -j command-line option with a + after the selection, ** e.g. ** -j 0+ (left/full justification) ** -j 1+ (center/full justification) ** -j 2+ (right/full justification) ** ** v1.32 10-25-2011 ** - Make sure locale is set so that decimal marker is ** a period for numbers. This was causing problems ** in locales where the decimal marker is a comma, ** resulting in unreadable PDF output files. This ** was introduced by having to compile for the DJVU ** library in v1.31. ** - Slightly modified compile of DJVU lib (re: locale). ** - Remove "cd" option from interactive menu (it was ** obsoleted in v1.27). ** - Warn user if source bitmap is excessively large. ** - Print more info in header (compiler, O/S, chip). ** ** v1.31 10-17-2011 ** - Now able to read DJVU (.djvu) files using ddjvuapi ** from djvulibre v3.5.24. All output is still PDF. ** - Now offer generic i386 versions for Win and Linux ** which are more compatible w/older CPUs, and fixed ** issue with MuPDF so it doesn't crash on older CPUs ** when compiled w/my version of MinGW gcc. ** ** v1.30 10-4-2011 ** - Just after I posted v1.29, I found a bug I'd introduced ** in v1.27 where k2pdfopt didn't quit when you typed 'q'. ** I fixed that. ** - Made user menu a little smarter--allows different ** entries depending on whether a source file has already ** been specified. ** ** v1.29 10-4-2011 ** - Input file dpi now defaults to twice the output dpi. ** (See -idpi option.) ** - Added option to break input pages at the end of each ** output page. ("Break pages" in menu or -bp option.) ** - Set dpi minimums to 50 for input and 20 for output. ** ** v1.28 10-1-2011 ** - Fixed bug that was causing vertical stripes to show ** up on Mac and Linux version output. ** - OSX 64-bit version now available. ** ** v1.27 9-25-2011 ** - Changed default max columns to two. There were ** too many cases of false detection of sub-columns. ** Use -col 4 to detect up to 4 columns (or select ** the "co" option in the user menu). ** - The environment variable K2PDFOPT now can be ** use to supply default command-line options. It ** replaces all previous environment variables, ** which are now ignored. The options on the ** command line override the options in K2PDFOPT. ** - Added -rt ("rt" in menu) option to rotate the source ** pages by 90 (or 180 or 270) degrees if desired. ** - Default startup is now to show the user menu rather ** than command line usage. Type '?' for command line ** usage or use the -? command line option to see usage. ** - Added three new "expert-mode" options for controlling ** detection of gaps between columns, rows, and words: ** -gtc, -gtr, -gtw. The -gtc option replaces ** the -cd option from v1.26. These can all be set ** with the "gt" menu option. Use the "u" option for ** more info (to see usage). ** - In conjunction with the new "expert-mode" options, ** I adjusted how gaps between columns, rows, and words ** are detected and adjusted the defaults to hopefully ** be more robust. ** - You can now enter all four margin settings (left, ** top, right, bottom) from the user input menu for ** "m" and "om". ** - Added -x option to get k2pdfopt to exit without asking ** you to press first. ** ** v1.26 9-18-2011 ** - Added column detection threshold input (-cd). Set ** higher to make it easier to detect multiple columns. ** - Adjusted the default column detection to make column ** detection a bit easier on scanned docs with ** imperfections. ** ** v1.25 9-16-2011 ** - Smarter detection of number of TTY rows. ** ** v1.24 9-12-2011 ** - Input on user menu fixed not to truncate file names ** longer than 32 chars for Mac and Linux. ** ** v1.23 9-11-2011 ** - Added right-to-left (-r) option for scanning pages. ** ** v1.22 9-10-2011 ** - First version compiled under Mac OS X. ** - Made some changes to run on OS X. Kludgey, but works. ** You have to double-click the icon and then drag a file ** to the display window and press . I've made ** linux work similarly. ** - Since Mac and Linux shells default to black on white, ** I've made the the text colors more friendly to that ** scheme for linux and Mac. Use -a- to turn off text ** coloring altogether, or set the env variable ** K2PDFOPT_NO_TEXT_COLORING. ** ** v1.21 9-7-2011 ** - Moved some bmp functions to standard library. ** - JPEG images always done at 8 bpc (no dithering). ** - Fixed dithering of 1-bit-per-colorplane images. ** ** v1.20 9-2-2011 ** - Added dithering for bpc < 8. Use -d- to turn off. ** - Adjusted gamma correction algorithm slightly (so that ** pure white stays pure white). ** ** v1.19 9-2-2011 ** - Added gamma adjust. Setting to a value lower than 1.0 ** will darken the font some and appear to thicken it up. ** Default is 0.5. Thanks to PaperCrop for the idea. ** - Interactive menu now uses letters for the options. ** This should keep the option choices the same even if ** I add new ones, and now the user can enter a page range ** as the final entry. ** ** v1.18 8-30-2011 ** - break_point() function now uses same white threshold ** as all other functions. ** - Added "-wt" option to manually specify "white threshold" ** value above which all pixels are considered white. ** - Tweaked the contrast adjustment algorithm and changed ** the max to 2.0 (was much higher). ** - Added "-cmax" option to limit contrast adjustment. ** ** v1.17 8-29-2011 ** - Min region width now 1.0 inches. Bug fixed when ** output dpi set too large--it is now reduced so that ** the output display has at least 1-inch of display. ** ** v1.16 8-29-2011 ** - Now queries user for options when run (just press ** to go ahead with the conversion). ** Use -ui- to disable this (it is automatically disabled ** when run from the command line in Windows). ** - Fixed bug in MuPDF calling sequence that results in ** more robust reading of PDF files. (Fixes the parsing ** of the second two-column example on my web page.) ** - Fixed bug in MuPDF library that prevented it from ** correctly parsing encrypted sections in PDF files. ** (This bug is not in the 0.8.165 tarball but it ** was in the version that I got via "git".) ** This only affected a small number of PDF files. ** - New landscape mode (not the default) is enabled ** with the -ls option. This turns the output sideways ** on the kindle, resulting in a more magnified display ** for typical 2-column files. Thanks to Taesoo Kwon ** for this idea. ** - Default PDF output is now much smaller--about half ** the original size. This is because the bitmaps are ** saved with 4 bits per colorplane (same as the Kindle). ** You can set this to 1, 2, 4, or 8 with the -bpc option. ** Thanks to Taesoo Kwon and PaperCrop for this idea. ** - Default -m value is now 0.25 inches (was 0.03 inches). ** This ignores anything within 0.25 inches of the edge ** of the source page. ** - Now uses precise Kindle 2 (and 3?) display resolution ** by default. Thanks to the PaperCrop forum for pointing ** out that Shift-ALT-G saves screenshot on Kindle. ** The kindle is a weird beast, though--after lots of ** testing, I figured out that I have to do the ** following to get it to display the bitmaps with ** a 1:1 mapping to the Kindle's 560 x 735 resolution: ** (a) Make the actual bitmap in the PDF file ** 563 x 739 and don't use the excess pixels. ** I.e. pad the output bitmap with 3 extra ** columns and 4 extra rows. ** (b) Put black dots in the corners at the 560x735 ** locations, otherwise the kindle will scale ** the bitmap to fit its screen. ** This is accomplished with the new -pr (pad right), -pb ** (pad bottom), and -mc (mark corners) options. The ** defaults are -pr 3 -pb 4 -mc. ** - New -as option will attempt to automatically straighten ** source pages. This is not on by default since it slows ** down the conversion and is somewhat experimental, but I've ** found it to be pretty reliable and it is good to use on ** scanned PDFs that are a bit tilted since the pages need ** to be straight to accurately detect cropping regions. ** - Reads 8-bit grayscale directly from PDF now for faster ** processing (unless -c is specified for full color). ** - Individual bitmaps created only in debug mode. ** k2_src_dir and k2_dst_dir folders no longer needed. ** ** v1.15 8-3-2011 ** - Substantial code re-write, mostly to clean things up ** internally. Hopefully won't introduce too many bugs! ** - Can handle up to 4 columns now (see -col option). ** - Added -c for full color output. ** - If column width is close to destination screen width, ** the column is fit to the device. Controlled with -fc ** option. ** - Optimized much of code for 8-bit grayscale bitmaps-- ** up to 50% faster than v1.14. ** - Added -wrap- option to disable text wrapping. ** - Can convert specific pages now--see -p option. ** - Added margin ignoring options: -m, -ml, -mr, -mt, -mb. ** - Added options for margins on the destination device: ** -om, -oml, -omr, -omt, -omb. ** - Min column gap now 0.125 inches and min column height ** now 1.5 inches. Options -cg and -ch added to control ** this. ** - Min word spacing now 0.25. See -ws option. ** ** v1.14 7-26-2011 ** - Smarter line wrapping and text sizing based on custom options. ** (e.g. should work better for any size destination screen ** --not just 6-inch.) ** - Bug fix. -w option fixed. ** - First page text doesn't butt right up against top of page. ** ** v1.13 7-25-2011 ** - Added more command-line options: justification, encoding ** type, source and destination dpi, destination width ** and height, and source margin width to ignore. ** Use -ui to turn on user input query. ** - Now applies a sharpening algorithm to the output images ** (can be turned off w/command-line option). ** ** v1.12 7-20-2011 ** - Fixed a bug in the PDF output that was ignored by some readers ** (including the kindle itself), but not by Adobe's reader. ** PDF files should be readable by all software now. ** ** v1.11 7-5-2011 ** - Doesn't put "Press to exit." if launched as a ** command in a console window (in Windows). No change to ** Linux version. ** ** v1.10 7-2-2011 ** - Integrated with mupdf 0.8.165 so that Ghostscript is ** no longer required! Ghostscript can still be used/ ** will be tried if mupdf fails to decypher the pdf file. ** - PDF page number count now much more reliable. ** ** v1.07 7-1-2011 ** - Fixed bugs in the pdf writing that were making the ** pdf files incompatible with the kindle. ** - Compiled w/gcc 4.5.2. ** - Added smarter determination of # of PDF pages in source, ** though it doesn't always work on newer PDF formats. ** This can cause an issue with the win32 version because ** calling Ghostscript on a page number beyond what is in ** the PDF file seems to sometimes result in an exception. ** ** v1.06 6-23-2011 ** - k2pdfopt now first tries to find Ghostscript using the registry ** (Windows only). If not found, searches path and common folders. ** - Compiled w/turbo jpeg lib 1.1.1, libpng 1.5.2, and zlib 1.2.5. ** - Correctly sources single bitmap files. ** ** v1.05 6-22-2011 ** Fixed bug in routine that looks for Ghostscript. ** Also, Win64 version now looks for gsdll64.dll/gswin64c.exe ** before gsdll32.dll/gswin32c.exe. ** ** v1.04 6-6-2011 ** No longer requires Imagemagick's convert program. ** ** v1.03 3-29-2010 ** Made some minor mods for Linux compatibility. ** ** v1.02 3-28-2010 ** Changed rules for two-column detection to hopefully avoid ** false detection. At least 0.1 inches must separate columns. ** ** v1.01 3-22-2010 ** Fixed some bugs with file names having spaces in them. ** Added program icon. Cleaned up screen output some. ** ** v1.00 3-20-2010 ** First released version. Auto adjusts contrast, clears ** edges. ** */