use UTF-8 encoding

2019-05-15 18:20:12 +02:00 · 2019-05-15 18:20:12 +02:00 · 1c3623d719
parent 7fe8e1ffe0
commit 1c3623d719
1 changed files with 27 additions and 27 deletions
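The commit message suggests that this change only re-encoded the file. So do the removed and added lines below, which render identically. Here is a minimal Python sketch of that kind of conversion. It assumes something the commit does not state: that the old file was Latin-1. In Latin-1 the bullet `·` is the single byte 0xB7 and `§` is 0xA7. In UTF-8 each becomes a two-byte sequence, so every line containing one of them is reported as changed even though its text is the same. `latin1_to_utf8` is a hypothetical name.

```python
# Sketch only: assumes the pre-commit file was Latin-1 (ISO-8859-1).
# Python's latin-1 codec maps every byte 0x00-0xFF to one code point,
# so the decode step below cannot fail.


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8; ASCII bytes pass through unchanged."""
    return data.decode("latin-1").encode("utf-8")


old_bullet_line = b"\xb7 Set the tab character to a space.\n"
new_bullet_line = latin1_to_utf8(old_bullet_line)
print(old_bullet_line[:1].hex(), new_bullet_line[:2].hex())  # b7 c2b7
```

To convert a whole file, read it with `pathlib.Path.read_bytes()`, pass the bytes through the function, and write the result with `Path.write_bytes()`. Decoding as Latin-1 never raises an error. If the file is already UTF-8, the function silently double-encodes it, so confirm the source encoding first.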
--- a/54
+++ b/54
@@ -25,7 +25,7 @@ cryptographic software is subject to U.S. export control laws and
 regulations. The new 1997 Commerce Department Export Administration
 Regulations (EAR) explicitly provide that "A printed book or other printed
 material setting forth encryption source code is not itself subject to the
-EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
+EAR." (see 15 C.F.R. §734.3(b)(2)). PGP, in an overabundance of caution,
 has only made available its source code in a form that is not subject to
 those regulations. So, books containing cryptographic source code may be
 published, and after they are published they may be exported, but only
@@ -167,24 +167,24 @@ The first step to getting OmniPage 7 to work well is to set it up with
 options to disable all of its more advanced features for preserving font
 changes and formatting. Look in the Settings menu.

-· Create a Zone Contents File with all of ASCII in it, plus the extra
+· Create a Zone Contents File with all of ASCII in it, plus the extra
  bullet, currency, yen and pilcrow symbols. Name it "Source Code".
-· Create a Source Code style set. Within it, create a Source Code zone style
+· Create a Source Code style set. Within it, create a Source Code zone style
  and make it the default.
-· Set the font to something fixed-width, like Courier.
-· Set a fixed font size (10 point) and plain text, left-aligned.
-· Set the tab character to a space.
-· Set the text flow to hard line returns.
-· Set the margins to their widest.
-· The font mapping options are irrelevant.
+· Set the font to something fixed-width, like Courier.
+· Set a fixed font size (10 point) and plain text, left-aligned.
+· Set the tab character to a space.
+· Set the text flow to hard line returns.
+· Set the margins to their widest.
+· The font mapping options are irrelevant.
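The character set that the Zone Contents File should hold can be enumerated mechanically. The Python sketch below builds it as a single string, under two assumptions. First, "all of ASCII" means the 95 printable characters, space through tilde. Second, the Latin-1 codes given later in this file (244, 245, 266, 267) are octal. That is the only reading under which they name the currency, yen, pilcrow and middle-dot characters: as decimals, 244 and 245 are accented letters, and 266 and 267 fall outside Latin-1 altogether. This file does not describe how OmniPage expects the Zone Contents File to be laid out, so the sketch stops at the character list. All names are hypothetical.

```python
# Hypothetical helper: the characters a "Source Code" Zone Contents File
# would need. Assumes printable ASCII plus four Latin-1 symbols whose
# codes are read as octal.

ASCII_PRINTABLE = "".join(chr(c) for c in range(0x20, 0x7F))  # space .. '~'

EXTRA_SYMBOLS = {
    "bullet (middle dot)": chr(0o267),  # a space that follows a blank
    "currency": chr(0o244),             # a tab
    "yen": chr(0o245),                  # a form feed
    "pilcrow": chr(0o266),              # a long line broken after 79 columns
}

ZONE_CONTENTS = ASCII_PRINTABLE + "".join(EXTRA_SYMBOLS.values())

for name, ch in EXTRA_SYMBOLS.items():
    print(f"{name}: {ch} (U+{ord(ch):04X})")
```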

 Go to the settings panel and:

-· Under Scanner, set the brightness to manual. With careful setting of the
+· Under Scanner, set the brightness to manual. With careful setting of the
  threshold, this generates much better results than either the automatic
  threshold or the 3D OCR. Around 144 has been a good setting for us; you
  may want to start there.
-· Under OCR, you'll build a training file to use later, but turn off
+· Under OCR, you'll build a training file to use later, but turn off
  automatic page orientation and select your Source Code style set in the
  Output Options. Also set a reasonable reject character. (For testing, we
  used the pi symbol, which came across from the Macintosh as a weird
@@ -228,26 +228,26 @@ specific Latin-1 characters to be processed.

 The characters most in need of training are as follows:

-· Zero is printed 'slashed.'
-· Lowercase L has a curled tail to distinguish it clearly from other
+· Zero is printed 'slashed.'
+· Lowercase L has a curled tail to distinguish it clearly from other
  vertical characters like 1 and I.
-· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
+· The or-bar or pipe symbol '|' is printed "broken" with a gap in the
  middle to distinguish it similarly.
-· The underscore character has little "serifs" on the end to distinguish
+· The underscore character has little "serifs" on the end to distinguish
  it from a minus sign. We also raised it just a tad higher than the
  normal underscore character, which was too low in the character cell to
  be reliably seen by OmniPage.
-· Tabs are printed as a hollow right-pointing triangle, followed by blanks
+· Tabs are printed as a hollow right-pointing triangle, followed by blanks
  to the correct alignment position. If not trained enough, OmniPage
  guesses this is a capital D. You should train OmniPage to recognize this
  symbol as a currency symbol (¤, Latin-1 octal 244).
-· Any spaces in the original that follow a space, or a blank on the printed
+· Any spaces in the original that follow a space, or a blank on the printed
  page, are printed as a tiny black triangle. You should train OmniPage to
  recognize this as a center dot or bullet (·, Latin-1 octal 267). We didn't use a
  standard center dot because OmniPage confused it with a period.
-· Any form feeds in the original are printed as a yen currency symbol
+· Any form feeds in the original are printed as a yen currency symbol
  (¥, Latin-1 octal 245).
-· Lines over 80 columns long are broken after 79 columns by appending a big
+· Lines over 80 columns long are broken after 79 columns by appending a big
  ugly black block. You should train OmniPage to recognize this as a
  pilcrow (paragraph symbol ¶, Latin-1 octal 266). We did this because after
  deciding something black and visible was suitable, we found out the font
@@ -264,16 +264,16 @@ to train on, use that.

 Other things that need training:

-· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
+· ~ (tilde), ^ (caret), ` (backquote) and ' (quote). These get dropped
  frequently unless you train them.
-· i, j and ; (semicolon). These get mixed up.
-· 3 and S. These also get mixed up.
-· Q can fail to be recognized.
-· C and [ can be confused.
-· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
+· i, j and ; (semicolon). These get mixed up.
+· 3 and S. These also get mixed up.
+· Q can fail to be recognized.
+· C and [ can be confused.
+· c/C, o/O, p/P, s/S, u/U, v/V, w/W, y/Y and z/Z are often confused. This
  can be helped by some training.
-· r gets confused with c and n. I don't understand why c, but it happens.
-· f gets confused with i.
+· r gets confused with c and n. I don't understand why c, but it happens.
+· f gets confused with i.
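The earlier hunk describes the printed-page conventions: the tab marker, the repeated-space marker, the form-feed marker, and the break after 79 columns. These rules are mechanical. Once OmniPage has been trained to emit ¤, ·, ¥ and ¶ for them, the original text can in principle be recovered. The Python sketch below illustrates this; it is not the tool that produced or read these pages. It assumes:

- tab stops every 8 columns;
- ASCII-only source;
- a space immediately after a tab is always printed as the dot marker, which keeps decoding unambiguous.

`encode_line` and `decode` are hypothetical names.

```python
import re

# Hypothetical sketch of the printed-page markers described above.
# Assumptions (not from the source): 8-column tab stops, ASCII-only input,
# and a space right after a tab is always printed as SPACE_MARK.
TAB_MARK = chr(0o244)    # currency sign: a tab
SPACE_MARK = chr(0o267)  # middle dot: a space following a blank
FF_MARK = chr(0o245)     # yen sign: a form feed
CONT_MARK = chr(0o266)   # pilcrow: line continues on the next printed line
TAB_WIDTH = 8
MAX_COLS = 80   # lines longer than this are broken...
SPLIT_AT = 79   # ...after this many columns


def encode_line(line: str) -> list[str]:
    """Render one source line (without newline) as printed lines."""
    chars: list[str] = []
    prev = ""
    for ch in line:
        if ch == "\t":
            chars.append(TAB_MARK)
            # Pad with blanks up to the next tab stop.
            while len(chars) % TAB_WIDTH:
                chars.append(" ")
        elif ch == " ":
            # Only a space after a non-blank is printed as a real blank.
            chars.append(SPACE_MARK if prev in (" ", "\t") else " ")
        elif ch == "\f":
            chars.append(FF_MARK)
        else:
            chars.append(ch)
        prev = ch
    printed = "".join(chars)
    pieces = []
    while len(printed) > MAX_COLS:
        pieces.append(printed[:SPLIT_AT] + CONT_MARK)
        printed = printed[SPLIT_AT:]
    pieces.append(printed)
    return pieces


def decode(printed_lines: list[str]) -> list[str]:
    """Rejoin continued lines, then turn markers back into characters."""
    joined: list[str] = []
    pending = ""
    for pl in printed_lines:
        if pl.endswith(CONT_MARK):
            pending += pl[:-1]
        else:
            joined.append(pending + pl)
            pending = ""
    if pending:
        joined.append(pending)  # input ended mid-continuation
    out = []
    for text in joined:
        # A tab marker swallows the padding blanks that follow it.
        text = re.sub(re.escape(TAB_MARK) + " *", "\t", text)
        out.append(text.replace(SPACE_MARK, " ").replace(FF_MARK, "\f"))
    return out


source = ["\tif (x)  \ty = 1;", "\f", "x" * 100, "a\t b"]
printed = [p for line in source for p in encode_line(line)]
assert decode(printed) == source
```

Decoding joins continuation lines before expanding tab markers. As a result, a break that falls between a tab marker and its padding blanks is harmless.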

 The OCR training pages have lots of useful examples of troublesome
 characters. Scan a few pages of material, training each page, then scan a