/* ** k2pdfopt.c Optimize 1 and 2-column PDF's for Kindle-2 by displaying ** columns separately and stripping removing margins and ** excess white space. ** ** Copyright (C) 2023 http://willus.com ** ** This program is free software: you can redistribute it and/or modify ** it under the terms of the GNU Affero General Public License as ** published by the Free Software Foundation, either version 3 of the ** License, or (at your option) any later version. ** ** This program is distributed in the hope that it will be useful, ** but WITHOUT ANY WARRANTY; without even the implied warranty of ** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ** GNU Affero General Public License for more details. ** ** You should have received a copy of the GNU Affero General Public License ** along with this program. If not, see . ** ** ** VERSION HISTORY ** ** v2.55 26 DEC 2023 ** ENHANCEMENTS ** -Windows version compiled with MinGW, gcc v13.2.0 on Windows 11 ** core i9-9900 CPU. ** -Mac arm-64 version compiled with clang 17 on macOS 14 Sonoma ** -Mac x64 version cross-compiled on M1-CPU Mac mini with clang 17 ** on macOS 14 Sonoma ** -Linux version compiled with gcc 13.2.1 on Fedora 39 (virtual box) ** -Library updates: Tesseract 5.3.3, Leptonica 1.83, Mupdf 1.23.7, ** Freetype 2.13.2, libpng 1.6.40, jpeg-turbo 2.1.5.1, ** openjpeg 2.5.0, jbig2dec 0.20, zlib 1.3 ** BUG FIXES ** -Bookmarks / TOC page numbering fixed in wmupdf.c (broke in v2.54 ** due to change in mupdf library). ** ** v2.54 6 JAN 2023 ** ENHANCEMENTS ** -Compiled with gcc v12.2.0 in MS Windows ** -Linux version compiled with v12.2.0 in Fedora 37 ** -Mac ARM version compiled with v12.2.0 in Ventura ** -Mac x64 version compiled with Clang v15 in Ventura ** -Library updates: Tesseract 5.3.0, Leptonica 1.83, Mupdf 1.21, ** Freetype 2.12.1, libpng 1.6.39, jpeg-turbo 2.1.4, ** openjpeg 2.5.0, jbig2dec 0.19, zlib 1.2.13 ** BUG FIXES ** -ocrword library updated to fix possible memory leak from not ** freeing bitmaps used by the data structure. ** ** v2.53 18 JUL 2020 ** NEW FEATURES ** -Added more support for DJVU input files, including OCR-layer text ** extraction, reading of bookmarks, and information viewing ** (-i or GUI "Info" view). ** -Added support for CBZ input files. ** ENHANCEMENTS ** -The multithreaded Tesseract OCR capability has been significantly ** improved. The images for OCR processing are now queued up into ** larger groups so that the multithreading is more effective. ** Since the introduction of the -ocrd option in v2.50 (Dec 2018), ** the default OCR method of OCRing an entire line of text at a ** time had effectively turned off OCR multithreading since only ** one row of text at a time was passed to the OCR engine. ** -Updated many of the support libraries to their latest releases: ** Tesseact 4.1.1, Leptonica 1.79, Mupdf 1.17, Freetype 2.10.2 ** libpng 1.6.37, jpeg-turbo 2.0.4, openjpeg 2.3.1 ** -When previewing "marked up" pages, the preview now only ** processes the desired preview page, making it much faster for ** previewing pages above page 1. ** -Added -ocrvbb option to more precisely determine OCR layer text ** bounding boxes in pre-existing OCR layers. ** BUG FIXES ** -Fixed bug in ocr layer sorting algorithm in ** wtextchars_group_by_words() function in k2ocr.c ** -Fixed a bug where one of my png image reading functions was not ** thread safe (not re-entrant). This had the potential to cause ** very random crashing. ** -Fixed multiple issues with text selection of a pre-existing ** OCR layer. Some text wasn't always included in certain cases, ** and sometimes the selection box was over-sized. ** -Altered the detection of words in a row of text to work better / ** more consistently in obvious cases. See v2.53 comments in ** get_word_gap_threshold() in bmpregion.c. ** ** v2.52 12 JUN 2020 ** NEW FEATURES ** -Automatically downloads Tesseract v4.0 training files from github. ** Use -y for no prompting. See -ocrlang command usage for full details. ** -Added new -go (grid order) option to specify the order in which ** areas in gridded pages are processed. Grid area processing also ** defaults to right-to-left if -r is specified. ** ENHANCEMENTS ** -Prints Tesseract init info at the beginning of each file conversion ** now--in the GUI also. ** -Redirects Tesseract library debug statements to a log file. Prints ** them out if it cannot initialize correctly. ** -Re-edited a lot of the -mode usage descriptions. ** -CMakeLists.txt files updated. ** -Added Kobo Clara HD, Kobo Forma, and Kobo Libra H2O to device list ** (using preliminary resolution numbers). ** -Windows version now compiled with gcc 9.3.1 on a core i9-9900. ** MS WINDOWS GUI ** -Tesseract language selection and download fully supported. ** -Added three more preset buttons for a total of seven. ** -Added "View" button to GUI to view PDF source file. ** BUG FIXES ** -Fixed issue where OCR layer placement from an already generated ** OCR layer was not correct if a document scale factor not equal ** to 1 was used (e.g. -ds 2). ** -Hopefully improved detection of out-of-family small rows. ** Modified textrows_remove_small_rows() in textrows.c and also added ** -rhmin option to remove any row w/text height less than a certain value. ** https://www.mobileread.com/forums/showthread.php?p=3693210#post3693210 ** https://www.mobileread.com/forums/showthread.php?p=2781815#post2781815 ** -Fixed some issues with -fr option. Added -ddr option (-ddr- to turn ** off double-row detect) to prevent tall figures from potentially ** getting broken up inadvertently. Modified k2proc.c to ** "rotate the trow" structure next the end of bmpregion_add() function. ** https://www.mobileread.com/forums/showthread.php?p=3549163#post3549163 ** -Fixed issue where the output margins weren't exactly right in the ** case of -mode trim. See k2settings_adjust_devdims_for_om() function ** in k2settings.c. (PM from Tex2002ans on MR.) ** -Added textrows_remove_defects() function to bmpregion_find_textrows() ** in bmpregion.c. This helped fix an issue where rows were not ** getting selected properly in a document due to small scanning ** defects. ** -Fixed implementation of -bp . This had been broken for some ** time. It was getting overridden by other features. See mods to ** k2proc.c and k2master.c. ** https://www.mobileread.com/forums/showthread.php?p=3743393#post37433930 ** -Fixed some bugs with autocrop (-ac)--particularly an issue that was ** causing it to hang for pages with a high aspect ratio--this caused the ** "goes idle after page 16" in this post: ** https://www.mobileread.com/forums/showthread.php?p=3957183#post3957183 ** See mods to k2bmp.c. ** -Fixed ghostscript (DLL API) to work with UTF-8 strings and also ** improved error processing. Also fixed a ghostscript DLL ** initialization error that sometimes caused a crash. ** https://www.mobileread.com/forums/showthread.php?p=3884300#post3884300 ** https://www.mobileread.com/forums/showthread.php?p=3886929#post3886929 ** -Option to auto-rename existing files rather than overwrite them. ** See updated -ow command-line usage (-ow+). Also will prompt for ** auto-renaming. ** -Expanded the usage description for -gtr. ** -Fixed where "u" menu option sometimes printed one too few linefeeds ** (pr1cmdopt() function in k2usage.c) ** -Fixed issue with a French accented "a" (and some other characters) ** having bad text selection dimensions (from 24 Aug 2019). See mods ** to pdfwrite.c. Certain chars had negative widths/heights in the ** Helvetica data set in pdfwrite.c. ** ** V2.51a 5 JAN 2019 ** - Fixed issue with MS Windows GUI not displaying PDF file info ** correctly (due to change in MuPDF behavior). ** ** V2.51 4 JAN 2019 ** - Fixed issue with Tesseract and SSE/AVX support ** - Added -ocrdpi option to optimize Tesseract v4.0.0 performance ** - Added code mods from: https://gitlab.com/axet/android-k2pdfopt/commit/163d9c9248f9ac00068ecb60af33f1ae7ec17540 ** ** V2.50 27 DEC 2018 ** ENHANCEMENTS ** - Compiled w/Tesseract v4.0.0, MuPDF v1.14, Freetype v2.9.1, ** Leptonica 1.74.4, OpenJPEG 2.3.0, Turbo JPEG 2.0.1, ** and PNG lib v1.6.35. ** - New OCR detection methods for Tesseract--can have Tesseract ** analyze entire lines (default) or even the entire page of text ** rather than just individual words as was done in previous versions. ** This should improve OCR accuracy when using Tesseract. ** See -ocrd option. ** - New debugging diagnostic for Tesseract training files: -lang ? ** (Or in MS Windows GUI, select Help -> Tesseract Training File Info) ** - The default device is now a 6-inch 300 dpi screen a la the ** Kindle Paperwhite 3+ ** - Added Kindle Oasis 2 device (7-inch 300 dpi screen) ** - Page straightening feedback added to MS WINDOWS GUI log ** - Fixed Android source code issues pointed out in this post: ** https://www.mobileread.com/forums/showthread.php?p=3655291#post3655291 ** MS WINDOWS ** - Compiled with gcc 7.3. ** LINUX ** - Compiled in Fedora 29 with gcc 8.2. ** MAC OSX ** - Compiled with gcc 8.2. ** ** V2.42 20 MAY 2017 ** NEW FEATURES ** - Bitmap output. The -o option now takes names ending ** in .jpg and .png to tell k2pdfopt to write out bitmaps instead of ** a PDF file. E.g. -o .png writes pages to srcfile0001.png, ** srcfile0002.png, etc. ** - Added de-warp (-dw) option to de-warp scanned pages a la ** ScanTailor. This is most useful for books that have been pressed ** against the glass of a copier and the scans still have some ** curvature (warping) in them. Or for photographs of pages taken ** at a slight angle. Uses Leptonica's built-in functions. ** ENHANCEMENTS ** - Improved autocrop (-ac) algorithm and added an "aggressiveness" factor ** to it. It is now called more efficiently as well (it used to be ** called multiple times per source page). ** - Added Kobo Aura One to built-in device list ** https://www.mobileread.com/forums/showthread.php?p=3513245#post3513245 ** - Improved straightening/deskew algorithm. Faster and uses weighted ** average / fine grain search (25 Mar 2017). ~20% faster for 64-bit. ** - Keyboard shortcut change: 'i' key now brings up file info. ** - PDF information shows dates and page size more clearly now. ** BUG FIXES ** - The -ocrout function works even if native output mode is specified. ** - Pressing ESC key when file overwrite is requested now results in a ** "no" response (it used to result in a "yes" response). ** ** V2.41 25 FEB 2017 ** ENHANCEMENTS ** - Updated libraries: MuPDF v1.10a, Tesseract 3.05.00, ** Leptonica 1.74.1, FreeType 2.7.1, libpng v1.6.28, zlib 1.2.11. ** - Compiled with MinGW/GNU C 6.3 (MS Windows version) ** NEW FEATURE ** - Added bitmap erosion filter (-er option) as an alternative to ** gamma correction (-g) to make text thicker/darker (idea from ** the original "Yet Another PDF to LRF Converter" program). ** ** V2.40 7 JAN 2017 ** NEW FEATURE ** - OCR processing is now multithreaded. This may provide a significant ** boost in OCR processing speed depending on how many CPU threads you ** have available. See the -nt option. As part of this upgrade, the ** OCR and total CPU time are now echoed for each run (using the ** clock() function). Include MS Windows GUI control for number of ** CPUS. Inspired by Harry Shamanski's Elucidate app. ** BUG FIXES ** - Fixed a couple of minor places in the usage where options were not ** in alphabetical order. ** - k2printf() flushes stdout. ** - MS Windows GUI crop selection does not ask for page range if ** document only has one page. ** - MS Windows GUI no longer exceeds size of desktop when maximized. ** ** V2.36 26 NOV 2016 ** ENHANCEMENTS ** - The -colorfg and -colorbg options can take an array of colors, in ** which case each array element is used for the next row of text. ** See usage. Suggested in a 5 Nov 2016 e-mail. ** BUG FIXES ** - If using -mode trim with -bp, blank pages are passed through to ** the output file as if -mode copy. From 11-16-2016 e-mail. ** - Minimum dpi allowed from k2settings_settings_set_margins_and_devsize ** is 1. ** - Fixed bug where -bp was not recognized if at the end of a command line. ** - -ocrvis b did not work before without 's'. Now it does. ** - Fixed Tesseract to correctly parse .traineddata files when certain ** locales are set. It was not correctly parsing Chinese .traineddata ** files on my Windows 10 system because of the locale. ** - Updated GOCR library to GOCR v0.50. Not convinced it is improved. ** - Re-ordered some command line options in usage to keep them alphabetical. ** ** V2.35 22 OCT 2016 ** ENHANCEMENTS ** - Mac OSX version compiled on new machine running Sierra with GCC 6.2.0. ** Also, the binaries are compressed with a newer version of UPX which ** is compatible with Mac OSX 10.12 Sierra. ** - Linux binaries compiled on CentOS 7.2 with GCC 4.8.5. ** - Windows binaries compiled with GCC 6.2.0 (MinGW). ** - Compiled with the latest versions of libpng (1.6.25), freetype (2.7), ** turbo JPEG (1.5.1). ** - I tried newer versions of MuPDF (v1.9a and v1.10 pre-release), but ** they broke more things than I was comfortable with, so I've stayed ** with MuPDF v1.8 for this release. ** NEW FEATURES ** - Added -jfc- option to prevent trying to join figure captions to the ** figures. ** http://www.mobileread.com/forums/showthread.php?p=3342105#post3342105 ** - Added new conversion mode, -mode concat, which keeps the output at ** the same dimensions as the source file and concatenates crop-boxes ** (red boxes) together--as many as can fit on each page without breaking ** them apart. ** - Added option -f2p -3 to support -mode concat. ** BUG FIXES ** - No longer crashes in native PDF output mode if there is no output. ** - Writes more informative message to screen if output file not written. ** - Makes sure output file can be opened for writing before proceeding ** with the conversion. Warns user if file cannot be opened. ** http://www.mobileread.com/forums/showthread.php?p=3343367#post3343367 ** - Correctly processes blank/empty pages in .djvu files. ** http://www.mobileread.com/forums/showthread.php?p=3350691#post3350691 ** - The -title option now substitutes the file name for %s or %b, like ** the -o option. ** http://www.mobileread.com/forums/showthread.php?p=3389292#post3389292 ** - The -grid overlap percentage is more precise now. ** - Warning message to use -fc- with -odpi, -fs, or -mag. Not sure this ** is the best way--should I just turn off -fc if those are specified? ** Reported in 22 April 2016 e-mail. ** Also: http://www.mobileread.com/forums/showthread.php?p=3354548#post3354548 ** - Wide-char (UTF-8) DJVU file names now work. ** http://www.mobileread.com/forums/showthread.php?p=3351085#post3351085 ** - Blank pages no longer cause an improper conversion with -mode trim. ** E-mail from 17 Mar 2016. ** - Clarified usage of -mode copy, explaining about gamma and contrast ** settings (how they are not reset to 1 with -mode copy). ** - In the function where the 2-column divider is detected, a special ** call is made to find_textrows() so that figure caption joining is ** disabled unless -jfc+ is specified. This helps 2-column detection ** work more reliably. ** http://www.mobileread.com/forums/showthread.php?p=3351808#post3351808 ** http://www.mobileread.com/forums/showthread.php?p=3342105#post3342105 ** MS WINDOWS GUI BUG FIXES ** - The GUI now correctly selects the "crop" conversion mode. ** ** V2.34b 21 MAR 2016 ** MS WINDOWS BUG FIXES ** - Native conversions correctly generate unicode file names ** http://www.mobileread.com/forums/showthread.php?p=3284264#post3284264 ** - GUI correctly provides right-click help on the output folder widgets ** ** V2.34a 19 MAR 2016 ** BUG FIX ** - Device "Kobo H2O" can now be correctly assigned. (E-mail 3-17-16) ** MS WINDOWS GUI BUG FIXES ** - Fixed "Additional Options" and "Command-line Options" read-only issue. ** - Fixed file list box overlapping the output folder box. ** - Fixed "Add File" selection of .pdf and .djvu files. ** ** V2.34 18 MAR 2016 ** ENHANCEMENTS ** - Compiled with the latest versions of MuPDF (1.8), Tesseract (3.04.01), ** libpng (1.6.21), freetype (2.6.2), turbo JPEG (1.4.2), and ** leptonica (1.72). ** NEW FEATURES ** - Added an option to scale the output font size to a fixed value: ** -fs . E.g. -fs 12 will scale the median font size ** in the source document to be 12 points in the converted document. ** See details in the command-line usage help. Supported in MS Windows ** GUI as well (new check box with up/down control next to it). ** http://www.mobileread.com/forums/showthread.php?p=3175758#post3175758 ** - The -o has two new format characters--%b for the base name of the ** source file (not including the path). %f for the folder the source ** file is stored in. ** - The -mag feature magnifies the text/images in the output file. It ** as the same effect as increasing -odpi. ** - New -y option is same as -ow (assumes "yes" on overwriting files). ** I just did this for compatibility with other programs that use this ** same option the same way (e.g. ffmpeg). ** - The -cbox and -ibox options now have a "u" appendix that applies to ** any unspecified pages. E.g. -cbox5-10 -cboxu ** This will apply to pages 5 through 10 and to all ** other pages. ** http://www.mobileread.com/forums/showthread.php?p=3204759#post3204759 ** - The -ci option can be used to specify an image for a cover page for ** the PDF file (doesn't work with native PDF output yet). ** http://www.mobileread.com/forums/showthread.php?p=3196307#post3196307 ** NEW MS WINDOWS GUI FEATURES ** - There is now a selection button for the output folder, which uses ** the new %b formatting in the -o option. Requested 6 Jan 2016 via e-mail. ** - Fixed several bugs involving the crop box GUI selection. ** - Crop box setting prompts for an overriding page range now, and then ** reports both the -cbox and the -m dimensions when complete. ** - Added code to detect Calibri vs. Arial font, but this didn't have ** the effect I wanted, so I commented it back out (in k2gui.c, ** search for "fontscale".) ** BUG FIXES ** - Fixed the documentation of -ocrcol. Said it was ignored now, but it ** is not ignored for documents that don't already have an OCR layer. ** - If the folder for the converted file(s) does not exist, it will be ** created (6 Jan 2016 e-mail). ** - Gets page bounding box from "Pages" object if not available for an ** individual page. (E-mail bug report, 10 Oct 2015). ** - Made -bpc / -jpg incompatibility more clear in the command-line usage. ** - k2printf correctly uses a semaphore to prevent threads from both using ** it at the same time. ** - MS Windows GUI: Fixed issues with multi-threading when starting the ** conversion process and starting the process to generate the bitmap for ** setting the crop boxes. Semaphores were not being correctly closed. ** - MS Windows GUI: File opening works more reliably now after multiple ** aborted conversions (new willusgui_file_open_ex function). ** - Postscript files are correctly processed now. Because ghostscript ** cannot pull specific pages from a PS file, they are now converted to ** PDF (using ghostscript) before any further processing by k2pdfopt. ** ** V2.33a 3 OCT 2015 ** BUG FIX ** - Fixed MS Windows GUI bug where preview page controls were missing. ** ** v2.33 27 SEP 2015 ** NEW FEATURES ** - Compiled with GCC v5.2.0 and MuPDF v1.7a (released May 7, 2015). ** The MuPDF upgrade involved modifying a significant amount of the ** MuPDF interface code in the willus library since Artifex changed the ** APIs on several functions, but the bulk of the logic did not change. ** I uncovered a bug in the pdf_dict_del() function as well (reported). ** - The -i option displays information about the source PDF file. Added ** to MS Windows GUI also. ** - Added -fr option to rotate wide-aspect-ratio figures to landscape. ** http://www.mobileread.com/forums/showthread.php?p=3060339#post3060339 ** - Added Kindle Paperwhite 3 (2015 release) and Pocketbook Basic 2 to ** dev list (from http://www.mobileread.com/forums/showthread.php?t=253579) ** - Smarter sorting of red regions on a multiple-column page. See ** pageregion_sort() function in pageregions.c. ** - New -ibox option has same format as -cbox, but these boxes are ignored ** by k2pdfopt--they are "whited out" in the source file. For native ** output, the contents may still be visible in the output. ** - The -neg option now attempts to only negate text passages to white ** on black and to leave figures alone. Use -neg+ to negate everything. ** http://www.mobileread.com/forums/showthread.php?p=3104536#post3104536 ** - Added option -ehl to erase horizontal lines in the document. Works ** exactly like the -evl option. ** - Added -author and -title options to specify the author and title of ** the output PDF. ** http://www.mobileread.com/forums/showthread.php?p=3112052#post3112052 ** - Added -px option to exclude a set of pages, e.g. -px 4,7,10-20. ** http://www.mobileread.com/forums/showthread.php?p=3112052#post3112052 ** - User can use color markings to tell k2pdfopt where to apply page ** breaks to the output file. ** http://www.mobileread.com/forums/showthread.php?p=3152988#post3152988 ** - The -? option can now be followed by a (wildcard) matching string to ** show the usage of a particlar option, e.g. -? -ws. ** ** NEW MS WINDOWS GUI FEATURES ** - Crop margins replaced by three crop-box entries. ** These entries can act as either -cbox or -ibox. ** - Added an "INFO" button to show information on the selected PDF file. ** - If a wildcard is specified for a PDF source file ** on the command line (e.g. *.pdf), and the GUI is launched, all matching ** entries are placed into the GUI file list. ** ** BUG FIXES ** - With notes options turned on (-nl / -nr), k2pdfopt will still search ** for multiple columns if no notes are found on the page. In addition, ** the -crgh option now more directly affects column divider finding. ** See textrows_remove_small_rows() call in bmpregion_find_multicolumn_divider(). ** http://www.mobileread.com/forums/showthread.php?p=3148589#post3148589 ** - Fixed multiple file select (broke when I converted to wide chars in v2.30). ** - Modified bmpregion_hyphen_detect() to be less strict about rejecting ** hyphens that aren't exactly centered. Also modified calculation of ** lcheight in bmpregion_calc_bbox()--see the function. ** http://www.mobileread.com/forums/showthread.php?p=3119501#post3119501 ** - The k2pdfopt web site and help pages work again from the help menu. ** - Turned off some debugging text from the bmp_autocrop2 function in k2bmp.c. ** - Not really a bug fix, but the command-line help is now shown in ** Courier New in MS Windows (a mono-spaced font). ** - In info_update() in wmupdf.c in the willus library, I check to see ** if I can resolve the Info dictionary. This checks to see if it can ** be parsed correctly. If not, I discard the dictionary. This was ** causing a bug that a user submitted to me in an e-mail on 15 April 2015. ** The users had a PDF file with a corrupt "Info" dictionary. ** - WPDFOUTLINE structures correctly freed. ** - MuPDF v1.7 stores ligatured characters differently than previous versions ** in its internal character arrays, so I had to compensate for this. ** ** v2.32 6 MAR 2015 ** NEW FEATURES ** - A new auto-cropping feature (-ac) has been added where k2pdfopt will ** attempt to crop out dark edges due to scanning / copying artifacts. ** There is a checkbox for it in the MS Windows GUI. ** - MS Windows GUI: graphical selection of crop margins has now been ** implemented. When you click the "Select Margins" button, k2pdfopt will ** overlay all of the pages in the "Pages to Convert" box and allow you ** to select a rectangular crop region with the mouse, which it will then ** use to populate the "Crop Margins" values. ** - The -ls option now takes an optional page range so that it can be ** applied to specified pages. A control was added to the MS Windows ** GUI for this. ** http://www.mobileread.com/forums/showthread.php?p=3016119#post3016119 ** - Improved the context sensitive help in the MS Windows GUI. ** ** OTHER ** - Update the list of devices and their dimension from the information ** collected at http://www.mobileread.com/forums/showthread.php?t=253579 ** - Clarified -cbox usage. ** http://www.mobileread.com/forums/showthread.php?p=3048458#post3048458 ** - Added source code flow description to k2pdfopt.c. (2-18-15 e-mail). ** ** BUG FIXES ** - Fixed a bug with notes in the margins (-nl/-nr options), checking ** for notesrows->n==0 (was causing a crash). E-mailed on 13 Jan 2015. ** - Clarified usage for -m option. ** - Added MS Windows GUI confirmation of Tesseract initialization (in the ** conversion dialog box). ** http://www.mobileread.com/forums/showthread.php?p=3016119#post3016119 ** - -cbox- did not work correctly if was beyond the last page of ** the source PDF. This is fixed. ** http://www.mobileread.com/forums/showthread.php?p=3047145#post3047145 ** - Fixed a preview error noted by a mobileread member. Was due to ** not correctly clearing a WTEXTCHARS structure in ** ocrlayer_bounding_box_inches(). ** http://www.mobileread.com/forums/showthread.php?p=3027213#post3027213 ** - Fixed some issues with masterinfo_should_flush() where it wasn't ** correctly figuring out the next page. ** - Fixed issues selecting the text in text edit boxes as they gain focus ** (either through tabbing or mouse clicks). ** - The number of inserted rows added by textrows_find_doubles() is now ** limited to a reasonable number. This was going out of control in ** one oddball case and (I think) causing k2pdfopt to crash. ** ** v2.31 27 DEC 2014 ** NEW FEATURES ** - Added -ppgs option to post process output with ghostscript pdfwrite ** device. This can improve text selection when there are overlapping ** cropped regions. Recommended in 7 Dec 2014 e-mail. ** - In MS Windows GUI, added context-sensitive help. If you right-click ** most controls in the main window, you will get a dialog box explaining ** what that control does. ** NEW DEVICES ** - Added separate Kindle Paperwhite 2 resolution: 710x960. ** From 12 Dec 2014 e-mail. ** - Added kindle voyage. ** BUG FIXES ** - Erase vertical lines correctly works from MS Windows GUI again. ** OTHER ** - Compiled with MuPDF v1.6 library. ** - Separated out functions in wmupdf.c that do not depend on MuPDF. ** Those are now in wpdf.c. This helps the KOReader build. ** https://github.com/koreader/koreader-base/pull/290 (2 Dec 2014) ** Also e-mail on 7 Dec 2014. ** - No longer assumes that WIN32 = GUI. To compile so as not to use ** the WIN32 API, define NO_WIN32_API from the compile command line. ** ** v2.30 26 NOV 2014 ** NEW FEATURES ** - MS Windows version (including GUI) now supports wide-character file ** names (UTF16), so names with non-ASCII characters in them (e.g. ** Chinese characters) should be correctly handled. ** - Added -colorfg and -colorbg options to adjust the foreground (text) ** and background colors of the output file (only works for bitmapped ** output files--doesn't work for native PDF output). You can even use ** a background bitmap (which will be tiled). Suggested in mobileread.com ** private message, 24 Nov 2014. (beta4). ** - Windows versions compiled with gcc 4.9.2 (MinGW). ** ** MS WINDOWS GUI UPDATES ** - As mentioned above, the MS Windows GUI supports wide characters now. ** - Added "Get Folder" button. (beta1) ** http://www.mobileread.com/forums/showthread.php?p=2945278#post2945278 ** ** BUG FIXES ** - Marking corners now works for color bitmapped output. ** http://www.mobileread.com/forums/showpost.php?p=2962214&postcount=935 ** - Compiled with OpenJPEG v2.1.0--fixed some cases with incorrectly ** reading PDF files. (beta2) The typical symptom is reported as ** "JPX stream not read coorectly." Issue reported in 2 Nov 2014 e-mail. ** - Removed "wrectmaps->n=..." debugging output from k2ocr.c. (beta3) ** - Clarified intended use of -rt vs. -ls. ** http://www.mobileread.com/forums/showthread.php?p=2938230#post2938230 ** - Fixed buffer overrun in k2gui_cbox_set_pages_completed(). ** Reported via e-mail on 10 Nov 2014. ** - Preview mode correctly turns on color for native PDF output ** (also turns off gamma). If the user tries to turn off color output ** in the MS Windows GUI while native PDF output is checked, a dialog ** box will pop up explaining why it won't uncheck. ** http://www.mobileread.com/forums/showthread.php?p=2930855#post2930855 ** - Preview mode correctly puts up alert box when source file is not found. ** - File open / folder open puts up alert box if file or folder is not found. ** - The -ocrsp+ command-line option is now correctly recognized. ** http://www.mobileread.com/forums/showthread.php?p=2902672#post2902672 ** - GOCR is correctly used if Tesseract cannot be initialized. ** ** v2.21 25 JUL 2014 ** - Compiled with MuPDF v1.5 (a highly recommened, mostly-bug-fix ** upgrade recommended by the MuPDF folks). ** ** v2.20 25 JUL 2014 ** NEW FEATURES ** - Added k2ocr_ocrwords_get_from_ocrlayer() and supporting functions to ** to more intelligently extract all of the OCR-layer text from a major ** ("red box") region (rather than parsing for the words graphically, ** the OCR layer is queried for any words which are within the box). ** This should eliminate the need to use the -ocrcol option on PDF ** files which already have their own text layer. ** - There is a new option optimized for PDFs that have notes in the ** left or right margins. This option (-nl for notes in the left ** margin or -nr for notes in the right margin) tells k2pdfopt to ** look for notes and intersperse them with the main text. The notes ** can even alternate, e.g. left margin for odd pages and right margin for ** even pages would be -nlo -nre. Works well, for example, with the ** attachment from this post: ** http://www.mobileread.com/forums/showthread.php?p=2539093#post2539093 ** E.g. k2pdfopt -p 18-23 -cg 0.05 -nl -sm -f2p -1 ebook.pdf ** - The word spacing (-ws) option now defaults to -0.20 (the old default ** was 0.375). When a negative value is given, an automatic word spacing ** detection algorithm now used to break apart words in lines. The ** algorithm will try to choose a natural word spacing value, with ** the minimum allowed being the absolute value of the setting (e.g. ** 0.2 for the default). If you want k2pdfopt to aggressively break ** lines (e.g. break apart long words if they don't fit on a line), ** use a smaller absolute value, e.g. -ws -0.01. ** You can use a positive value for the older style of line breaking, ** and the lines are only broken where a gap exceeds that fraction ** of the height of a lower-case 'o'. There is also a new Windows ** GUI checkbox for this option. ** - Entire rows of text within the OCR layer can either have the words ** within rendered individually (the default and original behavior ** of k2pdfopt), or the entire row can be rendered at once with ** spaces used between each word. This may improve the text selection ** behavior for certain readers. The option that controls this is ** -ocrsp: ** -ocrsp puts one space between each word in the row. ** -ocrsp+ puts multiple spaces between each word in the row in ** order to better position the words since k2pdfopt ** does not typically match the exact font used by ** the source document when doing the OCR layer (it ** always uses arial). ** -ocrsp- reverts to the original (default) behavior. ** To test this, you can temporarily make the OCR layer visible by using ** this option: -ocrvis st ** Thanks to a 19 Jan 2014 e-mail for inspiring this change. A user ** was having difficulty selecting text on their Sony PRS-T1 (the ** entire row would select whenever they pressed on a single word). ** - The -m and -om options now can use units, e.g. pixels, inches, cm, ** and "s" for page/screen size. In addition, -m can use the other ** units used by -cbox, -w, and -h ("t" and "x"). ** - The new "x" unit of measure corresponds to the OCR Layer bounding ** box, e.g. -cbox 0x,0x,1x,1x will correspond to a crop box that ** matches the bounding box of the OCR layer. This can be used by ** the -w, -h, and -m options as well. See -h usage. Thanks to ** markom for suggesting this (quite a while ago!): ** http://www.mobileread.com/forums/showthread.php?p=2212714#post2212714 ** - New option -to[-] for text-only output, removes all figures as ** determined by a height limit (see -jf option). Use with -bp m ** to avoid text selection issues if using in conjunction with native ** output. Requested in a 9 May 2014 e-mail. ** - If -wt+ is specified for the white threshold, all pixels >= the ** specified value will be painted pure white (255). ** https://github.com/koreader/koreader/pull/549 ** - While not a perfect work-around, large, stylized first letters ** which frequently begin a book chapter (typically the height of 2 ** or 3 normal-sized text rows) are now detected when wrapping text ** lines so that the lines adjacent to them are more-or-less correctly ** wrapped. Still needs improvement. ** ** MS-WINDOWS GUI ENHANCEMENTS ** - If the custom buttons are not used, a "2-column" and a "Fit Width" ** button are automatically assigned. ** - New MS-Windows GUI check box for -bp m option (Avoid text select overlap). ** - New MS-Windows GUI check box for defects (sets -de 1.5). ** - New MS-Windows GUI check box / text box for line break setting (-ws option) ** - The last settings (other than the custom button presets) are remembered ** between settings (stored in the K2PDFOPT_CUSTOM0 env var). ** (4-15-14 e-mail) ** - There is now a "Restore Defaults" button since k2pdfopt remembers ** its last settings. ** - New option for GUI: -rls[+|-]. Forces/disables restoration of last ** settings from K2PDFOPT_CUSTOM0 environment variable. ** - Environment variables related to the MS Windows GUI are only read ** and set through Windows calls (not through getenv() and putenv())-- ** this was causing problems in trying to clear them. ** - Command-line options that don't impact the GUI are put into the ** "additional options" box upon launch. ** - For file overwriting, the user is now given a "Yes to All" option ** and a "No to all" option. ** - There are two new menu options--to save and restore the settings ** (stored in environment variables) to and from a file. This is done ** in the k2gui_save_settings_to_file() and ** k2gui_restore_settings_from_file() functions in k2gui.c. ** http://www.mobileread.com/forums/showthread.php?p=2865852#post2865852 ** - If the output file cannot be opened (e.g. because another application ** already has it open), a message box is shown informing the user ** rather than just quitting the program. ** http://www.mobileread.com/forums/showthread.php?p=2862339#post2862339 ** ** BUG FIXES ** - Tabbing between crop margin text fields in the MS Windows GUI keeps ** the entire text field selected even when you change a value before ** tabbing. ** - Fixed bug in window positioning at startup. ** - Fixed bug where -ds did not get properly applied in native mode. ** http://www.mobileread.com/forums/showthread.php?p=2828182#post2828182 ** - Fixed bug where tesseract was incorrectly initializing languages ** that do not have CUBE/COMBINED data (e.g. Russian). If that ** initialization fails, it now tries CUBE-only (and then no CUBE at all). ** This required a couple minor mods to the Tesseract library itself, ** which are included in the latest k2pdfopt source code distribution. ** http://www.mobileread.com/forums/showthread.php?p=2859736#post2859736 ** ** SOURCE CODE MODIFICATIONS ** - The bmpregion_add() function and some others use a new parameter ** data structure rather than having so many arguments passed to them ** (ADDED_REGION_INFO). ** - Copied web text at top of main k2 page to k2pdfopt.c intro. ** ** v2.18 14 JUN 2014 ** - Fixed problem when scaling sometimes gets out of control with tall ** regions. Was causing excessively large bitmaps to be allocated ** which would sometimes run the system out of memory. Search for ** "2.18" in k2proc.c. ** http://www.mobileread.com/forums/showthread.php?p=2846916#post2846916 ** ** v2.17a 2 JUN 2014 ** - Fixes MuPDF v1.4 problem where it was not correctly using MS Windows ** system fonts (introduced in v2.17). ** - Compiled w/gcc 4.8.3. ** ** v2.17 17 MAY 2014 ** ENHANCEMENTS ** - Compiled with the latest versions of MuPDF (1.4), Turbo JPEG (1.3.1), ** libpng (1.6.10), and freetype (2.5.3). ** ** v2.16 03 MAY 2014 ** BUG FIXES ** - Avoid zero-value return from masterinfo_break_point(). ** (5-2-14 e-mail). ** - TOC positioning fixed when source pages aren't large enough to ** cause a new destination page (see k2publish.c). Also, wmupdf ** output now correctly handles UTF-8 outline/TOC titles. ** http://www.mobileread.com/forums/showthread.php?p=2815504#post2815504 ** ** v2.15 22 MAR 2014 ** ENHANCEMENTS ** - The -cbox option usage has been rewritten and hopefully clarified. ** It is a very powerful and useful new option as of v2.10. ** ** BUG FIXES ** - Specific variable to track Tesseract initialization (it was sometimes ** getting missed if it was turned on after a conversion in the GUI). ** - Mode selection works in GUI (wasn't correctly selecting "fitpage") ** and in text menu again (1-7-14 e-mail). ** - Fixed memory leak in bmp_detect_vertical_lines() in k2bmp.c. ** http://www.mobileread.com/forums/showthread.php?p=2737816#post2737816 ** - Fixed several memory leaks--made sure bmpregion_free() is called ** for each declared BMPREGION, and also patched wmupdf.c to fix two ** memory leaks. The most significant memory leak was in the dtcompress.c ** library function, though, which has also been fixed. ** http://www.mobileread.com/forums/showthread.php?p=2752370#post2752370 ** - Fixed incorrectly formed #ifdef HAVE_OCR_LIB in k2publish.c. Thanks ** to user facut on mobileread.com, who e-mailed me about this on ** 21 March 2014. ** ** v2.14 31 DEC 2013 ** - Compiled using dtcompress.c module in willus library which avoids ** requiring my custom modification to zlib. Thank you to Dirk Thierbach ** for this. ** - willus lib modules use more standard include callouts for MuPDF and ** DjVu. ** - Added CMakeLists.txt files to source distribution (also from Dirk). ** - Correctly re-compiled Win32 build (wasn't done correctly in v2.13). ** ** v2.13 30 DEC 2013 ** ENHANCEMENTS ** - Added kobo mini to device profile: ** http://www.mobileread.com/forums/showthread.php?p=2704451#post2704451 ** - Applied patches from KOReader team: ** http://www.mobileread.com/forums/showthread.php?p=2701292#post2701292 ** >> rectmap.patch.txt (only on if K2PDFOPT_KINDLEPDFVIEWER defined) ** >> indent_calibration.patch.txt (with modifications so that the ** example case that resulted in the patch still works, but so ** that most of my regression tests don't break.) ** See this link: https://github.com/koreader/koreader/issues/305 ** >> word_gap_calculation.patch.txt, auto-gap not on by default per ** this post: ** http://www.mobileread.com/forums/showthread.php?p=2708795#post2708795 ** >> static.patch.txt ** >> debug.patch.txt ** - The -ws option can take a negative value to turn on automatic word ** spacing from the KOReader team. ** - can have even 'e' and odd 'o' specifiers, e.g. ** -p e (process all even pages numbers) ** -p 1-100e,1-100o (process all even pages from 1-100, then all odd ones.) ** - Cropboxes and gridded areas can now be processed fully into separate ** columns if desired. ** - The -cbox option now takes an optional page list to specify a specific ** range of pages to apply the crop box to. e.g. ** -cbox1-99o applies the crop box to pages 1,3,5,...,99. ** - The -cbox option takes units as well, so you can specify the crop box ** size in pixels, inches, cm, or a fraction of the source page size. The ** units work the same as for the -w and -h options. ** - GUI is smarter about creating command-line versions of margins ** when they are all the same. ** - New "-bp m" option will force page break at end of each major (red-box) ** region. This can improve text selection problems in native mode by ** avoiding the text selection overlap that can occur when multiple ** cropped source regions are placed on to a single output page. ** (12/13/13 e-mail). ** ** BUG FIXES ** - Error checking done on K2PDFOPT_WINPOS variable. ** http://www.mobileread.com/forums/showthread.php?p=2700958#post2700958 ** - Makes sure "(all)" doesn't get put in for the -p option (GUI). ** ** v2.12 30 NOV 2013 ** BUG FIXES ** - No longer writes k2pdfopt_out.png when previewing in the GUI. ** - Removed DLL dependencies from 64-bit Windows compile. ** ** v2.11 28 NOV 2013 ** BUG FIXES (MW WINDOWS GUI) ** - Several routines in k2gui_cbox.c which are called from k2file.c ** during the conversion were not correctly working during a preview ** and were resulting in garbage sometimes being sent to the ** desktop screen if the preview button was clicked after a file ** conversion. This has been fixed. ** ** v2.10 23 NOV 2013 ** NEW FEATURES ** - The PDF "Outlines" tree (often called "bookmarks" by PDF viewers) ** that helps you navigate the PDF file and is usually shown in the left ** pane of the PDF viewer is now preserved in the converted file. Or ** you can create your own bookmarks from a simple text file if your ** PDF source file doesn't have one (or if you want to change it). ** See the -toc, -toclist, and -tocsave command-line options. ** (toc = Table of Contents.) Destination page breaks are forced ** at outline anchor pages by default (see -bp option). ** - A new -cbox option allows you to specify a crop box to be applied ** to each page. You can specify more than one, and each separate ** crop box will be rendered to a different output page, similar to ** the way the -grid option works. See -cbox in the command usage. ** Using -mode crop with -cbox, you can crop a source PDF file to ** a destination PDF file. You can specify different crop boxes ** for even and odd pages, as well. ** - The -bpl option now allows you to specify a list of source pages ** where destination page breaks will be forced. ** - Three new modes: -mode trim causes the source page to be trimmed and ** the destination to be sized to the trimmed source. -mode fitpage ** is similar, but squeezes the trimmed source page into the specified ** device output screen size. -mode crop is a complement to the -cbox ** option and causes each cropped box to be placed on a new page the ** size of the cropped box. ** ** ENHANCEMENTS ** - Windows versions are compiled with gcc 4.8.2. ** - The Win64 binary is now compressed with UPX 3.91w which finally is ** able to compress the Win64/PE format. ** ** BUG FIXES ** - In native output, consecutive streams now delimited by white space. ** http://www.mobileread.com/forums/showthread.php?p=2655550#post2655550 ** - Pages with no "/Contents" entry are correctly handled. ** - Re-wrote masterinfo_break_point() to make use of ** bmpregion_find_textrows() so that decisions on where to break ** pages in the "fitwidth" mode should be more consistent and also ** will be affected by the -gtr option. ** http://www.mobileread.com/forums/showthread.php?p=2686067#post2686067 ** - Removed last vestiges of -pi option (interactive menu 'w' option ** was incorrectly still using it). ** - The vert_line_erase() function in k2bmp.c correctly handle the ** cbmp pointer when it is an 8-bit bitmap now. ** - Fixed a flow problem in k2file.c (k2pdfopt_proc_one() function) ** which was causing the GUI preview not to work with -mode copy. ** - The textrows_remove_small_rows() function no longer includes ** figures (REGION_TYPE_FIGURE) when doing statistics on the row ** heights. ** ** v2.03 21 SEP 2013 ** ENHANCEMENTS ** - MuPDF library now uses the Sumatra versions of pdf-font.c and ** pdf-fontfile.c so that it correctly checks Windows system fonts ** for non-embedded fonts in the PDF file. ** ** BUG FIXES ** - Native mode is correctly turned off as the default setting. ** - Native mode output works correctly from the MS Windows GUI. ** - Check boxes made consistent (native/wrap/OCR) with quick ** sanity check call. ** ** v2.02 16 SEP 2013 ** ENHANCEMENTS ** - The main bitmap resampling function in the willus library, ** bmp_resample(), now uses an alternate fixed-precision version (with ** virtually no accuracy loss) on non-64-bit compiles, where it has ** been tested to be significantly (50% to 100%) faster (on 64-bit ** modern Intel CPUs, the floating-point version is actually fastest). ** This should improve k2pdfopt performance on 32-bit and ARM ** implementations, including on the KOReader, for example. ** The resample routine also checks to see if no size change is ** required--if so, it does a simple bmp_copy() call instead of resampling. ** Thanks to Xin Huang (Chrox) for this suggestion and the initial ** implementation. ** ** GENERIC BUG FIXES ** - echo_source_page_count correctly initialized to zero. ** - hyphen detection turned on (has been disabled since at least v1.66). ** (See bmpregion_hyphen_detect() function.) ** - gap_sorted variable in textwords_add_word_gaps() is no longer static. ** - Changed kindle paperwhite dims to 658 x 889 based on 9-15-13 e-mail ** feedback. ** ** MS WINDOWS GUI BUG FIXES ** - Minimize/Maximize/Restore/Close buttons added to GUI convert dialog ** box so that users can get k2pdfopt out of the way while it is working. ** - GUI Conversion box no longer stays on top of other windows. ** - Text displayed in GUI conversion box is limited in number of lines ** until the very end to speed up scrolling on long conversions. ** - The GUI file list is not cleared the file list after a successful ** conversion anymore in case the user wants to try some different ** command-line options for the conversion. ** ** v2.01 4 SEP 2013 ** BUG FIXES ** - Fixed significant memory leak in wmupdf.c (added wtextchars_free call in ** wtextchars_text_inside()). Was causing k2pdfopt to crash on conversions ** of large PDF files. ** - Better feedback after preview button is pressed. Also works correctly ** when re-sizing the window during a preview. ** - Inserted 0.1-second sleep at end of k2gui_preview_start()--seems to ** prevent occasional problems with the preview. ** ** v2.00 2 SEP 2013 ** MAJOR NEW FEATURES ** - Added a GUI for the MS Windows version. In MS Windows, k2pdfopt now ** has a dual-mode operation: it runs as a GUI or a console program ** depending on the situation and command-line arguments: Use -gui ** to force the GUI, -guimin to minimize the GUI, use -gui- for no GUI. ** - Window size / position stored in K2PDFOPT_WINPOS env. var. ** - Custom presets stored in K2PDFOPT_CUSTOM env. vars. (=1-4). ** - As part of the GUI, printf() and aprintf() calls switched to k2printf(). ** - For PDF documents that contain text (not scanned), I now use ** the MuPDF text analysis functions so that even on re-flowed documents ** (i.e. not using native PDF output) I can include the text from ** the source for searching and highlighting without resorting to OCR. ** I treat this as a sort of "virtual OCR". You explicitly turn it ** on with -ocr m (m for MuPDF), though it is on by default since it ** adds very little time to the PDF file processing. It is ** not necessary if you are using native PDF output (-n). ** Requested at http://www.mobileread.com/forums/showthread.php?p=2513005#post2513005 ** ** OTHER NEW FEATURES ** - Windows 64-bit version compiled with gcc 4.7.3 (on core i5-670). ** The optimized Windows 32-bit version still uses gcc 4.6.3 (it does ** not compile correctly in gcc 4.7.3 on my MinGW platform due to some ** issue with with the Tesseract library). The generic Windows 32-bit ** version uses gcc 4.7.3. ** - Compiled with latest versions of MuPDF (v1.3), OpenJPEG, FreeType, ** Turbo-JPEG, PNG, and Z libraries. ** - A new option, -bmp, will write out a preview bitmap of the specified ** page, e.g. -bmp 2 will write out the 2nd converted page to the file ** k2pdfopt_out.png (sorry, the bitmap file name cannot be specified). ** - New option -sp simply echos the page counts of the source files ** and exits. ** - OCR text can be written to a separate text file now with the ** -ocrout option. E.g. -ocrout %s.txt will write the OCR from ** myfile.pdf to myfile.txt. ** Requested at http://www.mobileread.com/forums/showthread.php?p=2558909 ** and via e-mail on 3 June 2013. ** ** SOURCE CODE RE-WRITE ** - I re-did the way a number of data structures store information ** (BMPREGION, TEXTROWS) and eliminated the BREAKINFO structure. ** This simplifies several function calls and makes it a bit more ** straightforward to follow the flow of the software and to expand ** the functionality. ** - The way gaps are placed between regions of text has been ** substantially re-done. I have done extensive regression testing, ** and overall I think things are improved, but people may find ** minor changes in behavior from v1.66. ** ** BUG FIXES ** - In k2pdfopt_settings_set_margins_and_devsize(), doesn't flush the ** master bitmap unless the width changes. ** - Mode -f2p -2 (put every new red-box region onto a new page) no ** longer add any gap between these regions, so that the page size ** is exactly the region size. ** - When expected numeric arguments on the command-line don't occur ** (e.g. -g is not followed by a numeric value), that parameter is not ** set, a warning is printed, and the non-numeric argument is processed ** as if it were the next argument on the command line (pointed out ** by Jens Wallauer, 6-12-13). ** - Modified tessedit.cpp in Tesseract library to automatically detect ** if .cube files are present so that it uses them even for multiple ** language specification. Correspondingly modified ocrtess_init() ** in willus library. ** - v1.65 was crashing after not finding the Tesseract language file ** and switching to GOCR: ** http://www.mobileread.com/forums/showpost.php?p=2483500&postcount=393 ** This is fixed (tesseract status now stored in global static variable ** in k2ocr.c). ** ** NOTES ** - This version still does not have unicode / wide character file ** name handling in MS Windows (file names are stored in 8-bit ASCII ** strings, as before). ** ** v1.66 23 JUNE 2013 ** NEW FEATURES ** - Option -bp+ will break pages between the green regions as marked by ** the -sm option (feature request from 6-2-13 e-mail). ** ** BUG FIXES ** - Option -mode def correctly sets margins to zero instead of 0.25. It ** also now correctly turns off native mode and landscape mode. ** - Prevents infinite OCR font sizes from being written to PDF file. ** (5-26-2013 private message at mobileread.com about a problem ** converting a DjVu file.) ** - Fixed array-out-of-bounds issue when searching for the column ** divider, particularly with blank pages. See "v1.66 fix" in ** k2proc.c. Fixes these reported issues: ** 1. 4-27-13 e-mail (alice.pdf) ** 2. http://www.mobileread.com/forums/showthread.php?p=2558185 ** 3. 7-10-13 e-mail (failed on pages 1, 2, and 14). ** - Fixed breakinfo_find_doubles() in breakinfo.c to avoid an infinite ** loop situation. See "v1.66 fix" notes. Fixes this reported issue: ** 1. 7-7-13 e-mail (pages 98 and 187 failed). ** - Fixed bug in MuPDF library when fz_ensure_buffer() is called with ** buf->cap==1 (results in infinite loop). Reported in 6-18-13 e-mail ** on a conversion that hung with -mode fw. ** ** v1.65 6 APRIL 2013 ** NEW FEATURES / OPTIONS ** - Added Kobo Glo and Kobo Touch device settings. ** (http://www.mobileread.com/forums/showpost.php?p=2441354&postcount=336) ** - Re-vamped the bmp_source_page_add() function so that the ** logic that breaks the page out into displayable rectangular ** regions can be used in other places (e.g. by the OCR fill-in ** function). ** - Added option -ocrcols which sets the max number of columns for ** processing with OCR (if different from the -col value). You would ** use this if you want to OCR a PDF file using -mode copy, but ** the file has multiple columns of text. ** (http://www.mobileread.com/forums/showpost.php?p=2442523&postcount=341) ** - Added option -rsf (row-split figure-of-merit) which controls a ** new algorithm which goes back and looks for rows of text which ** should be split into two (or three) separate rows. This is meant ** to help catch those cases where k2pdfopt should have split apart ** two rows of text but did not because of a small amount of overlap. ** See breakinfo_find_doubles() in breakinfo.c. ** ** LIBRARY UPDATES ** - Compiled with latest versions of major libraries: MuPDF 1.2, ** DjVu 3.5.25.3, FreeType 2.4.11, Turbo JPEG 1.2.1, PNG 1.5.14, ** Z-lib 1.2.7. ** - Linux version now compiled with gcc 4.7.2 in Ubuntu 12. ** ** TWEAKS ** - Clarified usage for -vb in k2usage.c ** - Changed "destination" to "E-reader" in places on the k2 interactive ** menu and device menu. ** - Put "disclaimer" in OCR usage which clarifies the purpose. ** - Default crop margins are now zero (was 0.25 inches). This was ** confusing too many people. ** (http://www.mobileread.com/forums/showpost.php?p=2456032&postcount=352) ** - In bmp_region_vertically_break(), different width regions and ** regions with different ending/starting row heights cause ** a vertical gap to be inserted in the output. ** ** BUG FIXES ** - Call k2pdfopt_settings_sanity_check() once per source document. ** This fixes a crash when converting multiple files. ** (Certain vars weren't getting correctly initialized on the ** 2nd, 3rd, etc. conversion files.) ** (http://www.mobileread.com/forums/showpost.php?p=2409726&postcount=317) ** - Fixed array-out-of-bounds access in k2proc.c ** (bmpregion_find_multicolumn_divider function) which occasionally ** caused k2pdfopt to terminate abnormally (typically when converting ** mostly blank pages). ** (http://www.mobileread.com/forums/showpost.php?p=2456548&postcount=356) ** - Fixed k2pdfopt_proc_one() in k2file.c so that native PDF output ** is turned off if the source file is not PDF (e.g. DjVu conversion). ** - Fixed spacing between regions with -vb -2 or -vb -1 (gap between ** pages where new chapter starts, for example--font change, etc.). ** (http://www.mobileread.com/forums/showpost.php?p=2373550&postcount=292) ** - Minimum width in vertical line detection is now 1 pixel. ** (http://www.mobileread.com/forums/showpost.php?p=2452356&postcount=345) ** - Better diagnostic output on TESSDATA_PREFIX env var. ** - Fixed native PDF output so that scientific notation is not allowed ** in PDF clipping commands. This was causing native conversions ** not to work correctly in some cases. ** (http://www.mobileread.com/forums/showpost.php?p=2467063&postcount=371) ** ** v1.64a 5 JAN 2013 ** - Fixed bug in Native PDF output introduced in v1.64. ** (stream_deflate function in wmupdf.c) ** ** v1.64 4 JAN 2013 ** - Native PDF output changed so that source pages are converted ** to XObjects (Form type). This should be much more robust when ** putting contents from multiple source pages onto a single ** destination page. ** - Added profile for Kindle paperwhite. (-dev kpw) ** - The fontdata.c file in willus lib has been reduced to only one ** font in order to reduce the size of the k2pdfopt binaries since ** k2pdfopt only uses one font for the -sm option. ** - The page width and height can now be specified in terms of ** the trimmed source page width and height. Use 't' for the ** units, e.g. -w 1t -h 1t. This would typically be used with ** the -mode copy and/or -grid options. ** - The -bp option can now take a numeric argument (inches) to ** insert a gap (of that many inches) between each source page. ** - There is now an interactive menu option for selecting the ** OCR language training file (Tesseract OCR only). ** - Fixed memory leak in bmpregion_find_multicolumn_divider(). ** - Fixed default value for -col in usage. ** - Clarified -ocrlang usage. ** - Compiled Linux versions with -static and -static-libstdc++ ** to hopefully reduce shared library incompatibilities. ** ** v1.63 20 DEC 2012 ** - Now supports OCR in multiple languages using Tesseract with ** Unicode-16 text encoding so that the OCR text can be copy / pasted ** into Unicode-aware applications. ** To select the language for OCR: -ocrlang (or -l). ** Examples: -ocrlang eng (English) ** -ocrlang fra (French) ** -ocrlang chi_sim (Chinese simplified) ** [Note that using the -ocrvis t option will not show the ** OCR text correctly for any character above unicode value ** 255 since I do not use any embedded fonts, but the text ** will convert to the correct Unicode values when copy / pasted.] ** - Tesseract "cube" files are automatically checked so that the ** best OCR detection mode is selected. ** - Updated wmupdf.c in willus lib to account for both CropBox ** and MediaBox to determine page origin (fixes user-reported ** bug in native output mode for pages with non-zero MediaBox ** origins). ** - Made changes to multicolumn divider finder to improve the ** speed. Includes counting pixels by column rather than row, ** making use of trimmed column boundaries, and using a ** 2-D pixel count array. Resulting code runs ~ 5 - 15% faster ** on average in my regression tests. ** - Removed dprintf() and fsincos() from willus lib to prevent ** minor compiling problems on some platforms. Fixed some other ** minor issues for kindlepdfviewer. ** ** v1.62 15 NOV 2012 ** CODE RE-ORGANIZATION ** - This was largely motivated by the kindlepdfviewer app which ** now uses k2pdfopt code: ** https://github.com/hwhw/kindlepdfviewer ** - Moved the bulk of the code to a k2pdfopt library consisting ** of 21 source modules. The main k2pdfopt.c program is now ** only about 150 lines. ** - The willus and k2pdfopt libraries have options to compile ** so that they don't access any other 3rd-party library calls. ** Seach for "THIRD PARTY" in willus.h. ** - There is now a K2PDFOPT_KINDLEPDFVIEWER macro which can ** be defined in k2pdfopt.h to make the code more friendly for ** the kindle viewer app. For example, compiling kview.c with ** all third-party libs disabled and K2PDFOOPT_KINDLEPDFVIEWER ** defined results an executable about 300 KiB in size in Windows. ** ** NEW FEATURES ** - New -neg option inverts the output to be white on black ** ("night mode"). Note that figures and photographs are not ** distinguished, so they will also be inverted. ** - Native mode now defaults to off and -n turns it on no matter ** what (disables text wrapping and OCR). ** ** BUG FIXES ** - Setting re-initialized before final call to k2parsecmd. ** - Fixed minor memory leak in willus lib (vector_nd_free). ** - Safeguard against possible infinite loop in willus lib ** (bmp_more_rows). ** - Word spacing history initialized before each new document. ** - The XObjects dictionary is now merged when crop boxes from ** multiple source pages are put onto the same output page. ** This should improve the native PDF output in certain cases. ** ** v1.61 3 NOV 2012 ** - Some user menu options were not taking effect. This is fixed. ** - Compiled with tesseract 3.02.02 and Leptonica 1.69. ** - User "help" menu fits in 25-line window now. ** ** v1.60 1 NOV 2012 ** MAJOR NEW FEATURES ** - Option to keep native PDF contents (see full details under ** command-line option usage under -n for "native"). ** User menu option "n". This is only available if k2pdfopt ** has been compiled with MuPDF. ** - New grid option grids the source page into a fixed number ** of rows and columns with some overlap. E.g. -grid 2x2. ** This option works well with the -n option above. ** This is under user menu option "mo". ** ** NEW FEATURES / CHANGES ** - The new -dev option selects device profiles (there are ** only two so far: kindle2 and nook simple touch). ** The user menu option for -dev is "d". ** - The new -mode option selects a particular mode of operation ** that is shorthand for a number of options. There are ** currently four modes: default (def), copy, 2-column (2col), ** and fw (fit width). E.g. "-mode copy" makes k2pdfopt behave ** just like my "pdfr" program (thus eliminating the need for me ** to distribute pdfr separately). And "-mode fw" makes k2pdfopt ** behave very much like sopdf's fit width option. For more details, ** see the full command-line usage (k2pdfopt -?). ** The user menu option for -mode is "mo". ** - The user input system was revamped so that the menu options ** build up a set of command-line arguments that are passed to ** the main program. This way the user can easily see the ** command-line options that match the selections from the menu. ** - Command line arguments can be put in directly at the user ** input menu (anything beginning with a - is considered to be ** a command-line option and is appended to the list). ** - At the user input menu, '-' will clear all menu-selected ** options, '--' will clear the command-line options actually ** entered at the command-line, and '---' will clear the options ** from the K2PDFOPT environment variable. ** - The -vb option can now take -2 as an argument, which indicates ** that the vertical spacing is to be exactly preserved from the ** source document. ** - New -t option to specify trimming or no trimming (-t-) of ** excess white space. Default is to trim white space. ** - New -dpi option is same as -odpi. ** - The -w and -h options can take values in inches or cm now ** instead of pixels, and they can be negative to specify that ** the device size should be scaled from the source page size. ** - With -w and -h set to -1 to follow the source page sizes, ** k2pdfopt can now handle varying destination page sizes ** (see set_margins_and_devsize function). ** - The -m and -om command-line arguments can now be a comma- ** -delimited list of margins: left,top,right,bottom. ** E.g. -m 0.5,1.0,0.5,1.0. ** - The "Author", "Title", and "CreationDate" fields in the ** source PDF file are now correctly passed on to the output ** file. If there is no Title field in the source, the base ** output file name is used. ** - Specifying -f2p -2 will now also set -vb -2 automatically. ** Use -fp -2 -vb -1 to revert to v1.51 behavior. ** ** BUG FIXES ** - Fixed OCR word placement bug for tall (centered) bitmaps. ** - Make sure if crop boxes are used that padding/corner marking ** is turned off and OCR is turned off. ** - Fixed some issues when switching to Ghostscript and when ** processing a folder full of bitmaps. ** - Should work with postscript (.eps or .ps) files now ** (requires Ghostscript; output file is still PDF). ** - Fixed bmpregion_find_vertical_breaks() bug where it wasn't ** always correctly interpreting the last section of a region. ** - The -vls option works correctly when combined with -vb -1 ** and no text wrapping now (vls_test.pdf). ** - Eliminated divide by zero issue in bmpregion_is_clear() when ** gt_in gets too small. ** - Put call to wrapbmp_flush() in publish_master() when flushall ** is set (fixes bug reported by hm88 on mobileread) ** - Adjusted critierion for too thick / too thin hyphen in ** bmpregion_hyphen_detect() to be more correct and to allow ** for slightly thinner hyphens. ** - If mupdf can't open the file in Windows, I try the 8.3 alternative ** file name. This solved a problem involving a path that had a ** non-traditional (non-ASCII) character in it. ** - Fixed bug in detect_vertical_lines() that caused problems ** in some cases on 64-bit versions. ** ** v1.51 9-21-12 ** NEW FEATURES ** - New option -jf for special figure justification (under ** "j" in interactive menu). ** - -f2p option applies to small figures as well as tall figures. ** - Compiled w/MuPDF v1.1 and Freetype v2.4.10. ** - Source can be built without DjVuLibre library (see HAVE_DJVU ** macro) or MuPDF (see HAVE_MUPDF macro). If no MuPDF, then ** Ghostscript must be installed to process PDF. ** - Option -o (used to control overwrite) now sets the output ** name via a formatting string. ** Example #1: -o %s_k2opt ** This is the default and and appends _k2opt to the source name. ** (%s is replaced by the base name of the source file.) ** Example #2: -o out%04d ** In this case, each subsequent output file (assuming you ** specify more than one input file) will be out0001.pdf, ** out0002.pdf, out0003.pdf, ... ** ** CHANGES(!) TO COMMAND-LINE OPTIONS/INPUT MENU ** - I generally like to avoid doing this for backwards compatibility, ** but I felt some changes were overdue. ** COMMAND-LINE ** * The overwrite option is now controlled with -ow instead of ** -o since I wanted -o to specify the output name format. ** * The new command-line option for setting OCR visibility (was -wc) ** is -ocrvis. E.g. "-ocrvis st" will show the 's'ource document ** and the OCR 't'ext. ** * The -gtcmax option has been changed to -cgmax to be more ** consistent. It's also been moved under the "co" options ** in the interactive menu. Meaning of -gtc option has been ** clarified somewhat. ** INTERACTIVE MENU ** * Revamped the user input menu a bit by grouping things under ** certain options in order to reduce the number. ** > "o" now sets the output name format string and you ** now set the output DPI (-odpi) under "d" for device resolution. ** > "g" (gamma correction), "s" (sharpening), and "wt" (white ** threshold) are all now under "cs" for contrast/sharpen. ** > "de" (defect size), "evl" (erase vertical lines), and "gs" ** (ghostscript) are all under "s" for Special options. ** > "mc" (mark corners) is now set under "pd" (padding/marking). ** > "ws" (word spacing) is now set under "w" (wrap text). ** ** BUG FIXES ** - OCR text now rendered at a more uniform height. ** - Box around OCR text options fixed (-ocrvis b). ** - Fixed bug in text re-flow when a hyphen is detected. ** - Fixed interpretation of -ocrhmax so that it is based on the ** source dimension. ** - Min figure height from -jf option is applied in the ** bmpregion_find_vertical_breaks() function. ** - Interactive menu should be smarter about when you try to enter ** a file name at the prompt (fixed a bug where it seemed to ** not accept a file you would type in at the prompt even though ** it would eventually process it). ** - No longer needs Jasper or GSL libs for build. ** - Tightened up the pdfwrite_bitmap function in willus lib. ** - Cleaned up / revised source code and build instructions for ** a simpler build (eliminated two basically unneeded libraries). ** - OCR options are ignored in versions compiled without OCR. ** ** v1.50 9-7-12 ** MAJOR NEW FEATURE: OCR ** - For PDFs in English, added optical character recognition ** using two different open source libraries: Tesseract v3.01 ** (http://code.google.com/p/tesseract-ocr/) and GOCR v0.49 ** (http://jocr.sourceforge.net). The OCR'd text ** is embedded into the document as invisible ASCII text ** stored in the same location as the bitmapped words, ** exactly like some document scanning software works ** (e.g. Canon's Canoscan software). This allows the resultant ** PDF document to be searched for text, assuming the OCR ** is successful (won't always be the case, but should ** work reasonably well if the source text is clear enough). ** - Tesseract works far better than GOCR, but requires that ** you download the "trained data" file for your language ** from http://code.google.com/p/tesseract-ocr/downloads/list ** and point the environment var TESSDATA_PREFIX to the root ** folder for your trained data. If the tesseract trained ** data files are not found, k2pdfopt falls back to GOCR. ** E.g. if your data is in c:\tesseract-ocr\tessdata\... ** then set TESSDATA_PREFIX= c:\tessseract-ocr\ ** - I wrestled with this, but the default for OCR (for now) ** is for it to be turned off. You must explicitly turn it ** on with the -ocr command-line option or "oc" at the ** interactive menu. I did this because it does significantly ** slow down processing (about 20 words/second on a fast PC ** using Tesseract). ** - -wc option sets the OCR word color (visibility) ** (e.g. -wc 0 for invisible OCR text, the default). ** - -ocrhmax option sets max height of OCR'd word in inches. ** ** OTHER NEW FEATURES: ** - Added -evl option (menu item "e") to erase vertical lines. ** This allows the option, for example, to get rid of ** column divider lines which often prevent k2pdfopt from ** properly separating columns and/or wrapping text. ** - Detects and eliminates hyphens when wrapping text. Turn ** this off with -hy- ("w" interactive menu option). ** - New option -wrap+ option will unwrap/re-flow narrow columns ** of text to your wider device screen (typically desired ** on a Kindle DX, for example). Best if combined with -fc-. ** - There is now a max column gap threshold option so that ** columns are not detected if the gap between them is too large. ** (-gtcmax command-line option). Default = 1.5. ** - New option -o controls when files get overwritten. E.g. ** -o 10 tells k2pdfopt not to overwrite any existing files ** larger than 10 MB without prompting (the default). ** - New option -f2p ("fit to page") can be used to fit tall ** figures or the "red-boxed" regions (when using -sm) onto ** single pages. ** ** BUG FIXES / MISC ** - Fixed description of -whitethresh in usage (had wrong default). ** - Interactive menu more obvious about what files are specifed ** and allows wildcards for file specification. ** - Fixed bug in word_gaps_add() where the gap array was getting ** erroneously filled with zeros, leading to some cases where ** words got put together with no gap between them during text ** wrapping/re-flow. ** - The -sm option (show marked source) now works correctly when ** used with the -c (color) option. ** - Removed the separate usage note about Ghostscript and put it ** under the -gs option usage. ** - Fixed bug where zero height bitmap was sometimes passed to ** bmp_src_to_dst(). Also rounded off (rather than floor()-ing) ** scaling height used in the bitmap passed to bmp_src_to_dst(). ** - Due to some minor bitmap rendering improvements, this version ** seems to be generally a little faster (~2-4%) compared to ** v1.41 (under same conditions--no OCR). ** ** v1.41 6-11-2012 ** IMPROVEMENTS ** - Compiled w/MuPDF v1.0. ** - Tweaked the auto-straightening algorithm--hopefully more ** accurate and robust. ** - Improved auto-contrast adjust algorithm and added option ** to force a contrast setting by suppying a negative ** value for -cmax. Does a better job on scans of older ** documents with significantly yellowed or browned pages. ** - Options -? and -ui- when specified together now correctly ** echo the entire usage without pausing so that you can ** redirect to a file (as claimed in the usage for -?). ** ** BUG FIXES ** - Fixed bug where the column finding algorithm became far ** too slow on certain types of pages (to the point where ** k2pdfopt appeared to have crashed). ** - Fixed bug where k2pdfopt wasn't working correctly when ** the -c option was specified (color output). ** - Fixed bug where if max columns was set to 3 in the ** interactive menu, it didn't get upgraded to 4. ** - Fixed memory leak in bmpregion_add() (temp bitmap ** wasn't getting freed). ** - Fixed memory leak in bmp_src_to_dst() (temp 8-bit ** bitmap not getting freed). ** - Fixed memory leak in bmpregion_one_row_wrap_and_add() ** (breakinfo_free). ** - Check for zero regions in breakinfo_compute_row_gaps() ** and breakinfo_compute_col_gaps(). ** - Autostraighten no longer inadvertently turned on ** when debugging. ** ** v1.40 4-5-2012 ** - This is my most substantial update so far. ** I did a re-write of many parts of the code, including ** all of the text wrapping functions. I also put this ** version through many hours of regression testing. ** - Major new features: ** * Does true word wrap (brings words up from the ** next line if necessary). ** * Preserves indentation, justification, and vertical ** spacing more faithfully. Overall, particularly for ** cases with text wrapping, I think the output looks ** much better. ** * Ignores defects in scanned documents. ** * Compiled with all of the very latest third party ** libraries, including mupdf 0.9. ** * v1.40 is about 5% faster than v1.35 on average ** (Windows 64 version). ** - New justification command-line option is: ** -j [-1|0|1|2][+/-] ** Using -1 tells k2pdfopt to use the document's own ** justification. A + after will attempt to fully ** justify the text. A - will force no full justification. ** Nothing after the number will attempt to determine ** whether or not to use full justification based on ** if the source document is fully justified. ** - The default defect size to ignore in scanned documents ** is a specified user size (default is 1 point). The ** command-line option is -de (user menu option "de"). ** - Command line options -vls, -vb, and -vs control ** vertical spacing, breaks, and gaps. They are all ** under the interactive user menu under "v". ** - Line spacing is controlled by -vls. ** Example: -vls -1.2 (the default) will preserve ** the default document line spacing up to 1.2 x ** single-spaced. If line spacing exceeds 1.2 x in the ** source document, the lines are spaced at 1.2 x. ** The negative value (-1.2) tells k2pdfopt to use it ** as a limit rather than forcing the spacing to be ** exactly 1.2 x. A positive value, on the other hand, ** forces the spacing. E.g. -vls 2.0 will force line ** spacing to be double-spaced. ** - Regions are broken up vertically using the new -vb ** option. It defaults to 2 which breaks up regions ** separated by gap 2 X larger than the median line gap. ** For behavior more like v1.35, or to not break up the ** document into vertical regions, use -vb -1. Vertical ** breaks between regions are shown with green lines when ** using -sm. ** - The new -vs option sets the maximum gap between regions ** in the source document before they are truncated. ** Default is -vs 0.25 (inches). ** - Added menu option for -cg under "co". ** - Reduced default min column gap from 0.125 to 0.1 inches. ** - The -ws (word spacing threshold) value is now specified ** as a fraction of the lowercase letter height (e.g. a ** small 'o'). The new default is 0.375. ** ** v1.35 2-16-2012 ** - Changed how the columns in a PDF file are interpreted ** when the column divider moves around some. The column ** divider is now allowed to move around on the page ** but still have the columns be considered contiguous. ** This is controlled by the -comax option. Use ** -comax -1 to revert to v1.34 and before. The ** default is -comax 0.2. See example at: ** http://willus.com/k2pdfopt/help/column_divider.shtml ** - Added nice debugging tool with the -sm command-line ** option ("sm" on interactive menu) which shows marked ** source pages so you can clearly see how k2pdfopt ** is interpreting your PDF file and what affect the ** options are having. ** - The last line in a paragraph, if shorter than the ** other lines significantly, will be split differently ** and not fully justified. ** - Modified the column search function to better find ** optimal gaps. ** - The height of a multi-column region is calculated ** more correctly now (does not include blank space, ** and both columns must exceed the minimum height ** requirement). ** - Text immediately after a large rectangular block ** (typically a figure) is now appended to the ** figure region, since it is often the axis labels ** for the figure. ** - Fixed array-out-of-bounds bugs in ** bmpregion_wrap_and_add() and break_point(). ** - Added some performance enhancements regarding how ** regions are trimmed (rowcount[] and colcount[] ** arrays). ** - The file name to be processed is now listed with the ** interactive menu, and a wildcard can now be specified ** as the file name on the interactive menu. ** ** v1.34a 12-30-2011 ** - Some build corrections after the first release of ** v1.34 which had issues in Linux and Windows. ** - Fixed interpretation of -jpg flag when it's the last ** command-line option specified. ** ** v1.34 12-30-2011 ** - I've collected enough bug reports and new feature ** requests that I decided to do an update. ** - Added -cgr and -crgh options to give more control ** over how k2pdfopt selects multi-column regions. ** - Don't switch to Ghostscript on DJVU docs. ** - Continues processing files even if has an error on ** one page. ** - Fixed bug in orientation detection (minimum returned ** value is now 0.01 so as not to kill the average). ** - Added document scale factor (-ds or "ds" in menu) ** which allows users to correct PDF docs that are the ** wrong size (e.g. if your PDF reader says your ** document is 17 x 22 inches when it should be ** 8.5 x 11, use -ds 0.5). ** - Fixed bug in break_point() where bp1 and bp2 did not ** get initialized correctly. ** ** v1.33 11-11-2011 ** - Added autodetection of the orientation of the PDF ** file. This is somewhat experimental and comes with ** several caveats, but I have made it the default ** because I think it works pretty well. ** Caveat #1: It assumes the PDF/DJVU file is mostly ** lines of text and looks for regularly spaced lines ** of text to determine the orientation. ** Caveat #2: If it determines that the page is ** sideways, it rotates it 90 degrees clockwise, so it ** may end up upside down. ** - The autodetection is set with the -rt command-line ** option (or the "rt" menu option): ** 1. Set it to a number to rotate your PDF/DJVU file ** that many degrees counter-clockwise. ** 2. Set it to "auto" and k2pdfopt will examine up ** to 10 pages of the file to determine the ** orientation it will use. ** 3. Set it to "aep" to auto-detect the rotation of ** every page. If you have different pages that ** are rotated differently from each other within ** one file, you can use this option to try to ** auto-rotate each page. ** 4. To revert to v1.32 and turn off the orientation ** detection, just put -rt 0 on the command line. ** - Added option to attempt full justification when ** breaking lines of text. This is experimental and ** will only work well if the output dpi is chosen so ** that rows break approximately evenly. To turn on, ** use the "j" option in the interactive menu or the ** -j command-line option with a + after the selection, ** e.g. ** -j 0+ (left/full justification) ** -j 1+ (center/full justification) ** -j 2+ (right/full justification) ** ** v1.32 10-25-2011 ** - Make sure locale is set so that decimal marker is ** a period for numbers. This was causing problems ** in locales where the decimal marker is a comma, ** resulting in unreadable PDF output files. This ** was introduced by having to compile for the DJVU ** library in v1.31. ** - Slightly modified compile of DJVU lib (re: locale). ** - Remove "cd" option from interactive menu (it was ** obsoleted in v1.27). ** - Warn user if source bitmap is excessively large. ** - Print more info in header (compiler, O/S, chip). ** ** v1.31 10-17-2011 ** - Now able to read DJVU (.djvu) files using ddjvuapi ** from djvulibre v3.5.24. All output is still PDF. ** - Now offer generic i386 versions for Win and Linux ** which are more compatible w/older CPUs, and fixed ** issue with MuPDF so it doesn't crash on older CPUs ** when compiled w/my version of MinGW gcc. ** ** v1.30 10-4-2011 ** - Just after I posted v1.29, I found a bug I'd introduced ** in v1.27 where k2pdfopt didn't quit when you typed 'q'. ** I fixed that. ** - Made user menu a little smarter--allows different ** entries depending on whether a source file has already ** been specified. ** ** v1.29 10-4-2011 ** - Input file dpi now defaults to twice the output dpi. ** (See -idpi option.) ** - Added option to break input pages at the end of each ** output page. ("Break pages" in menu or -bp option.) ** - Set dpi minimums to 50 for input and 20 for output. ** ** v1.28 10-1-2011 ** - Fixed bug that was causing vertical stripes to show ** up on Mac and Linux version output. ** - OSX 64-bit version now available. ** ** v1.27 9-25-2011 ** - Changed default max columns to two. There were ** too many cases of false detection of sub-columns. ** Use -col 4 to detect up to 4 columns (or select ** the "co" option in the user menu). ** - The environment variable K2PDFOPT now can be ** use to supply default command-line options. It ** replaces all previous environment variables, ** which are now ignored. The options on the ** command line override the options in K2PDFOPT. ** - Added -rt ("rt" in menu) option to rotate the source ** pages by 90 (or 180 or 270) degrees if desired. ** - Default startup is now to show the user menu rather ** than command line usage. Type '?' for command line ** usage or use the -? command line option to see usage. ** - Added three new "expert-mode" options for controlling ** detection of gaps between columns, rows, and words: ** -gtc, -gtr, -gtw. The -gtc option replaces ** the -cd option from v1.26. These can all be set ** with the "gt" menu option. Use the "u" option for ** more info (to see usage). ** - In conjunction with the new "expert-mode" options, ** I adjusted how gaps between columns, rows, and words ** are detected and adjusted the defaults to hopefully ** be more robust. ** - You can now enter all four margin settings (left, ** top, right, bottom) from the user input menu for ** "m" and "om". ** - Added -x option to get k2pdfopt to exit without asking ** you to press first. ** ** v1.26 9-18-2011 ** - Added column detection threshold input (-cd). Set ** higher to make it easier to detect multiple columns. ** - Adjusted the default column detection to make column ** detection a bit easier on scanned docs with ** imperfections. ** ** v1.25 9-16-2011 ** - Smarter detection of number of TTY rows. ** ** v1.24 9-12-2011 ** - Input on user menu fixed not to truncate file names ** longer than 32 chars for Mac and Linux. ** ** v1.23 9-11-2011 ** - Added right-to-left (-r) option for scanning pages. ** ** v1.22 9-10-2011 ** - First version compiled under Mac OS X. ** - Made some changes to run on OS X. Kludgey, but works. ** You have to double-click the icon and then drag a file ** to the display window and press . I've made ** linux work similarly. ** - Since Mac and Linux shells default to black on white, ** I've made the the text colors more friendly to that ** scheme for linux and Mac. Use -a- to turn off text ** coloring altogether, or set the env variable ** K2PDFOPT_NO_TEXT_COLORING. ** ** v1.21 9-7-2011 ** - Moved some bmp functions to standard library. ** - JPEG images always done at 8 bpc (no dithering). ** - Fixed dithering of 1-bit-per-colorplane images. ** ** v1.20 9-2-2011 ** - Added dithering for bpc < 8. Use -d- to turn off. ** - Adjusted gamma correction algorithm slightly (so that ** pure white stays pure white). ** ** v1.19 9-2-2011 ** - Added gamma adjust. Setting to a value lower than 1.0 ** will darken the font some and appear to thicken it up. ** Default is 0.5. Thanks to PaperCrop for the idea. ** - Interactive menu now uses letters for the options. ** This should keep the option choices the same even if ** I add new ones, and now the user can enter a page range ** as the final entry. ** ** v1.18 8-30-2011 ** - break_point() function now uses same white threshold ** as all other functions. ** - Added "-wt" option to manually specify "white threshold" ** value above which all pixels are considered white. ** - Tweaked the contrast adjustment algorithm and changed ** the max to 2.0 (was much higher). ** - Added "-cmax" option to limit contrast adjustment. ** ** v1.17 8-29-2011 ** - Min region width now 1.0 inches. Bug fixed when ** output dpi set too large--it is now reduced so that ** the output display has at least 1-inch of display. ** ** v1.16 8-29-2011 ** - Now queries user for options when run (just press ** to go ahead with the conversion). ** Use -ui- to disable this (it is automatically disabled ** when run from the command line in Windows). ** - Fixed bug in MuPDF calling sequence that results in ** more robust reading of PDF files. (Fixes the parsing ** of the second two-column example on my web page.) ** - Fixed bug in MuPDF library that prevented it from ** correctly parsing encrypted sections in PDF files. ** (This bug is not in the 0.8.165 tarball but it ** was in the version that I got via "git".) ** This only affected a small number of PDF files. ** - New landscape mode (not the default) is enabled ** with the -ls option. This turns the output sideways ** on the kindle, resulting in a more magnified display ** for typical 2-column files. Thanks to Taesoo Kwon ** for this idea. ** - Default PDF output is now much smaller--about half ** the original size. This is because the bitmaps are ** saved with 4 bits per colorplane (same as the Kindle). ** You can set this to 1, 2, 4, or 8 with the -bpc option. ** Thanks to Taesoo Kwon and PaperCrop for this idea. ** - Default -m value is now 0.25 inches (was 0.03 inches). ** This ignores anything within 0.25 inches of the edge ** of the source page. ** - Now uses precise Kindle 2 (and 3?) display resolution ** by default. Thanks to the PaperCrop forum for pointing ** out that Shift-ALT-G saves screenshot on Kindle. ** The kindle is a weird beast, though--after lots of ** testing, I figured out that I have to do the ** following to get it to display the bitmaps with ** a 1:1 mapping to the Kindle's 560 x 735 resolution: ** (a) Make the actual bitmap in the PDF file ** 563 x 739 and don't use the excess pixels. ** I.e. pad the output bitmap with 3 extra ** columns and 4 extra rows. ** (b) Put black dots in the corners at the 560x735 ** locations, otherwise the kindle will scale ** the bitmap to fit its screen. ** This is accomplished with the new -pr (pad right), -pb ** (pad bottom), and -mc (mark corners) options. The ** defaults are -pr 3 -pb 4 -mc. ** - New -as option will attempt to automatically straighten ** source pages. This is not on by default since it slows ** down the conversion and is somewhat experimental, but I've ** found it to be pretty reliable and it is good to use on ** scanned PDFs that are a bit tilted since the pages need ** to be straight to accurately detect cropping regions. ** - Reads 8-bit grayscale directly from PDF now for faster ** processing (unless -c is specified for full color). ** - Individual bitmaps created only in debug mode. ** k2_src_dir and k2_dst_dir folders no longer needed. ** ** v1.15 8-3-2011 ** - Substantial code re-write, mostly to clean things up ** internally. Hopefully won't introduce too many bugs! ** - Can handle up to 4 columns now (see -col option). ** - Added -c for full color output. ** - If column width is close to destination screen width, ** the column is fit to the device. Controlled with -fc ** option. ** - Optimized much of code for 8-bit grayscale bitmaps-- ** up to 50% faster than v1.14. ** - Added -wrap- option to disable text wrapping. ** - Can convert specific pages now--see -p option. ** - Added margin ignoring options: -m, -ml, -mr, -mt, -mb. ** - Added options for margins on the destination device: ** -om, -oml, -omr, -omt, -omb. ** - Min column gap now 0.125 inches and min column height ** now 1.5 inches. Options -cg and -ch added to control ** this. ** - Min word spacing now 0.25. See -ws option. ** ** v1.14 7-26-2011 ** - Smarter line wrapping and text sizing based on custom options. ** (e.g. should work better for any size destination screen ** --not just 6-inch.) ** - Bug fix. -w option fixed. ** - First page text doesn't butt right up against top of page. ** ** v1.13 7-25-2011 ** - Added more command-line options: justification, encoding ** type, source and destination dpi, destination width ** and height, and source margin width to ignore. ** Use -ui to turn on user input query. ** - Now applies a sharpening algorithm to the output images ** (can be turned off w/command-line option). ** ** v1.12 7-20-2011 ** - Fixed a bug in the PDF output that was ignored by some readers ** (including the kindle itself), but not by Adobe's reader. ** PDF files should be readable by all software now. ** ** v1.11 7-5-2011 ** - Doesn't put "Press to exit." if launched as a ** command in a console window (in Windows). No change to ** Linux version. ** ** v1.10 7-2-2011 ** - Integrated with mupdf 0.8.165 so that Ghostscript is ** no longer required! Ghostscript can still be used/ ** will be tried if mupdf fails to decypher the pdf file. ** - PDF page number count now much more reliable. ** ** v1.07 7-1-2011 ** - Fixed bugs in the pdf writing that were making the ** pdf files incompatible with the kindle. ** - Compiled w/gcc 4.5.2. ** - Added smarter determination of # of PDF pages in source, ** though it doesn't always work on newer PDF formats. ** This can cause an issue with the win32 version because ** calling Ghostscript on a page number beyond what is in ** the PDF file seems to sometimes result in an exception. ** ** v1.06 6-23-2011 ** - k2pdfopt now first tries to find Ghostscript using the registry ** (Windows only). If not found, searches path and common folders. ** - Compiled w/turbo jpeg lib 1.1.1, libpng 1.5.2, and zlib 1.2.5. ** - Correctly sources single bitmap files. ** ** v1.05 6-22-2011 ** Fixed bug in routine that looks for Ghostscript. ** Also, Win64 version now looks for gsdll64.dll/gswin64c.exe ** before gsdll32.dll/gswin32c.exe. ** ** v1.04 6-6-2011 ** No longer requires Imagemagick's convert program. ** ** v1.03 3-29-2010 ** Made some minor mods for Linux compatibility. ** ** v1.02 3-28-2010 ** Changed rules for two-column detection to hopefully avoid ** false detection. At least 0.1 inches must separate columns. ** ** v1.01 3-22-2010 ** Fixed some bugs with file names having spaces in them. ** Added program icon. Cleaned up screen output some. ** ** v1.00 3-20-2010 ** First released version. Auto adjusts contrast, clears ** edges. ** */