Formatting Guidelines

Version 1.8, generated June 1, 2005

Formatting Guidelines in French / Directives de Formatage en français

Check out the Proofreading Quiz and Tutorial

The Primary Rule

"Don't change what the author wrote!"

The final electronic book seen by a reader, possibly many years in the future, should accurately convey the intent of the author. If the author spelled words oddly, leave them spelled that way. If the author wrote outrageous racist or biased statements, leave them that way. If the author puts italics, bold text or a footnote every third word, mark them italicized, bolded or footnoted.

We do change minor typographical conventions that don't affect the sense of what the author wrote. For example, we rejoin words that were broken at the end of a line (End-of-line Hyphenation). Changes such as these help us produce a consistently formatted version of the book. The proofreading rules we follow are designed to achieve this result. Please carefully read the rest of these Guidelines with this concept in mind.

To assist the next proofreader and the post-processor, we also preserve line breaks. This allows them to easily compare the lines in the text to the lines in the image.

Summary Guidelines

The Formatting Summary is a short, 2-page printer-friendly (.pdf) document that summarizes the main points of these Guidelines, and gives examples of how to proofread. Beginning Proofreaders are encouraged to print out this document and keep it handy while proofreading.

You may need to download and install a .pdf reader. You can get one free from Adobe®.

About This Document

This document is written to reduce formatting differences when the proofreading of one book is distributed among many proofreaders, each working on different pages of the book. This helps us all do formatting the same way. That makes it easier for the post-processor to eventually combine all these proofread pages into one e-book.
It is not intended as any kind of a general editorial or typesetting rulebook.

We've included in this document all the items that new users have asked about formatting while proofreading. If there are any items missing, or items that you consider should be done differently, or if something is vague, please let us know.

This document is a work in progress. Help us to progress by posting your suggested changes in the Documentation Forum.

Project Comments

On the proofreading interface page (Project Page) where you start proofreading pages, there is a section called "Project Comments" containing information specific to that project (book). Read these before you start proofreading pages! If the Project Manager wants you to format something in this book differently from the way specified in these Guidelines, that will be noted here. Instructions in the Project Comments override the rules in these Guidelines, so follow them. (This is also where the Project Manager may give you interesting tidbits of information about the author or the project.)

Please also read the Project Thread: the Project Manager may clarify project-specific guidelines here, and it is often used by proofreaders to alert other proofreaders to recurring issues within the project and how they can best be addressed.

On the Project Page, the link 'Images, Pages Proofread, & Differences' allows you to see how other proofreaders have made changes. A Forum thread discusses different ways to use this information.

Forum/Discuss this Project

On the proofreading interface page (Project Page) where you start proofreading pages, on the line "Forum", there is a link titled "Discuss this Project" (if the discussion has already started), or "Start a discussion on this Project" (if it hasn't). Clicking on that link will take you to a thread in the projects forum dedicated to this specific project. That is the place to ask questions about this book, inform the Project Manager about problems, etc.
Using this project forum thread is the recommended way to communicate with the Project Manager and other proofreaders who are working on this book.

Fixing errors on Previous Pages

When you select a project for proofreading, the Project Comments page is loaded. This page contains links to pages from this project that you have recently proofread. (If you haven't proofread any pages yet, there will be no links shown.)

Pages listed under either "DONE" or "IN PROGRESS" are available to make proofreading corrections or to finish proofreading. Just click on the link to the page. So if you discover that you made a mistake on a page, or marked something incorrectly, you can click on that page here and re-open it to fix the error.

For more detailed information, refer to either the Standard Proofreading Interface Help or the Enhanced Proofreading Interface Help, depending on which interface you are using.

Front/Back Title Page

Proofread all the text, just as it was printed on the page, whether all capitals, upper and lower case, etc., including the years of publication or copyright.

Older books often show the first letter as a large ornate graphic; proofread this as just the letter.

Table of Contents

Proofread the Table of Contents just as it is printed in the book, whether all capitals, upper and lower case, etc., and surround it with /* and */. Leave a blank line between these markers and the rest of the text.

Page number references should be retained and placed at least six spaces past the end of the line. Remove any periods or asterisks (leaders) used to align the page numbers.
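
As a rough illustration of the page-number rule above, the following Python sketch strips the leaders from one Table of Contents line and re-spaces the page number. The name `format_toc_line` is hypothetical, and the sketch assumes page numbers are plain Arabic digits at the end of the line; an entry that itself ends in a period would lose that period, so treat this as a starting point rather than a finished tool.

```python
import re

# Hypothetical pattern: entry text, then a run of leaders (spaces, periods,
# asterisks), then a page number made of Arabic digits at the end of the line.
_TOC_LINE = re.compile(r"^(?P<entry>.*?)[ .*]+(?P<page>\d+)\s*$")


def format_toc_line(line: str, gap: int = 6) -> str:
    """Remove leaders and place the page number `gap` spaces past the entry."""
    match = _TOC_LINE.match(line)
    if match is None:
        # No trailing page number found: leave the line as it is.
        return line.rstrip()
    return match.group("entry") + " " * gap + match.group("page")


print(format_toc_line("CHAPTER I. The Beginning . . . . . 12"))
```

Lines without a trailing page number, such as a bare "CONTENTS" heading, pass through unchanged.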

Blank Page

Proofread as [Blank Page] if both the text and the image are blank.

If there is text in the proofreading text area and a blank image, or if there is an image but no text, follow the directions for a Bad Image or Bad Text.

Page Headers/Page Footers

Remove page headers and page footers, but not footnotes, from the text.

The page headers are normally at the top of the image and have a page number opposite them. Page headers may be the same all through the book (often the title of the book and the author's name), they may be the same for each chapter (often the chapter number), or they may be different on each page (describing the action on that page). Remove them all, regardless, including the page number.

A chapter header will start further down the page and won't have a page number on the same line. See the next section for a specific example.

Chapter Headers

Proofread chapter headers as they appear in the text.

A chapter header may start a bit farther down the page than the page header and won't have a page number on the same line. Chapter Headers are often printed all caps; if so, keep them as all caps.

Put 4 blank lines before the "CHAPTER XXX". Include these blank lines even if the chapter starts on a new page; there are no 'pages' in an e-book, so the blank lines are needed. Then leave 1 (one) blank line between each additional part of the chapter header, such as a chapter description, opening quote, etc., and finally leave 2 (two) blank lines before the start of the text of the chapter.

Old books often printed the first word or two of every chapter in all caps or small caps; change these to upper and lower case (first letter only capitalized).

Watch out for a missing double quote at the start of the first paragraph, which some publishers did not include or which the OCR missed due to a large capital in the original. If the author started the paragraph with dialog, insert the double quote.

Section Headers

Some texts have sections within chapters. Proofread these headers as they appear in the text. Leave 2 blank lines before the header and one after, unless the Project Manager has requested otherwise. If you are not sure if a header indicates a chapter or a section, post a question in the Project Thread, noting the page number.

Other Major Divisions in Texts

Major Divisions in the text such as Preface, Foreword, Introduction, Prologue, Epilogue, Appendix, References, Conclusion, Glossary, Summary, Acknowledgements, Bibliography, etc., should be proofread in the same way as Chapter Headers, i.e. 4 blank lines before the heading and 2 blank lines before the start of the text.

Paragraph Side-Descriptions (Sidenotes)

Some books will have short descriptions of the paragraph along the side of the text. These are called sidenotes. Move sidenotes to just above the paragraph that they belong to. A sidenote should be surrounded by a sidenote tag [Sidenote: and ], with the text of the sidenote placed in between. Proofread the sidenote text as it is printed, preserving the line breaks, italics, etc. Leave a blank line after the sidenote, so that it does not get merged into the paragraph when the text is rewrapped during post-processing.

If there are multiple sidenotes for a single paragraph, put them one after another at the start of the paragraph. Leave a blank line separating each of them.

If the paragraph began on a previous page, put the Sidenote at the top of the page and mark it with * so that the post-processor can see that it belongs on the previous page. Like this: *[Sidenote: (text of sidenote)]. The post-processor will move them to the appropriate place.

Sometimes a Project Manager will request that you put Sidenotes next to the sentence they apply to, rather than at the top or bottom of the paragraph. In this case, don't separate them out with blank lines.

Paragraph Spacing/Indenting

Put a blank line before the start of paragraphs, even if a paragraph starts at the top of a page. You should not indent the start of paragraphs, but if all paragraphs are already indented, don't bother removing those spaces; that can be done automatically during post-processing.

See the Chapter Headers image/text for an example.

Multiple Columns

Proofread ordinary text which has been printed in two columns as a single column.

Spans of multiple-column text within single column sections should be proofread as a single column by placing the text from the left-most column first, the text from the next one after it, and so on. You do not need to mark where the columns were split, just join them together.

If the columns are lists of items, mark the start of the list with /* and the end with */ so that the lines do not get re-wrapped during post-processing. Leave a blank line between these markers and the rest of the text.

See also the Indexes, Lists of Items and Tables sections of these Guidelines.

Illustrations

Text for an illustration should be surrounded by an illustration tag [Illustration: and ], with the caption text placed in between. Proofread the caption text as it is printed, preserving the line breaks, italics, etc.

If an illustration has no caption, add a tag [Illustration].

If the illustration is in the middle of or at the side of a paragraph, move the illustration tag to before or after the paragraph and leave a blank line to separate them. Rejoin the paragraph by removing any blank lines left by doing so.

If there is no paragraph break on the page, mark the illustration tag with an * like so *[Illustration: (text of caption)], move it to the top of the page, and leave 1 (one) blank line after it.
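
The three forms of the illustration tag described above can be sketched in a few lines of Python. The function name `illustration_tag` and its parameters are hypothetical, chosen only for this illustration; the tag text itself follows the forms given in the Illustrations section.

```python
def illustration_tag(caption: str = "", no_paragraph_break: bool = False) -> str:
    """Build an illustration tag; the caption may contain line breaks."""
    tag = f"[Illustration: {caption}]" if caption else "[Illustration]"
    # A leading * flags a tag that was moved to the top of the page because
    # the page contains no paragraph break.
    return "*" + tag if no_paragraph_break else tag


print(illustration_tag())
print(illustration_tag("The old mill"))
print(illustration_tag("The old mill", no_paragraph_break=True))
```

The caption "The old mill" is invented for the example. Where to place the tag, and the blank line after it, is still a judgment made while looking at the page image.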

Footnotes/Endnotes

Footnotes are placed out-of-line; that is, the text of the footnote is left at the bottom of the page and a tag placed where it is referenced in the text.

During proofreading, this means:

1. The number, letter, or other character that marks a footnote location should be surrounded with brackets ([ and ]). Remove any spaces before the [ and keep it right next to the word being footnoted[1] or its punctuation mark,[2] as shown in the text, and the two examples in this sentence. When footnotes are marked with a series of special characters (*, †, ‡, §, etc.) we replace these with Capital letters in order (A, B, C, etc.).

2. A footnote should be surrounded by a footnote tag [Footnote #: and ], with the footnote text placed in between, and the footnote number or letter placed where the # is shown in the tag. Proofread the footnote text as it is printed, preserving the line breaks, italics, etc. Leave the footnote text at the bottom of the page. Be sure to use the same tag in the footnote as you used in the text where the footnote was referenced.

In some books, the Project Manager may ask that you move the footnotes in-line; read the Project Comments for instructions in this case.

See the Page Headers/Page Footers image/text for an example footnote.

If there's a footnote at the bottom of the page with no footnote marker in the text, especially if it starts mid-sentence or mid-word, it's probably a continuation of a footnote from a previous page. Leave it at the bottom of the page near the other footnotes, and surround it with *[Footnote: (text of footnote)] (without any footnote number or marker). The * indicates that the footnote was continued, and brings it to the attention of the post-processor.

If a footnote continues on the next page (the page ends before the footnote does), leave the footnote at the bottom of the page, and just put an asterisk * where the footnote ends, like this: [Footnote 1: (text of footnote)]*. (The * indicates that the footnote ended prematurely, and brings it to the attention of the post-processor, who will eventually join it up with the rest of the footnote text.)

If a continued footnote ends or starts on a hyphenated word, mark both the footnote and the word with *, thus: [Footnote 1: This footnote is continued and its last word is also con-*]* for the leading fragment, and *[Footnote: *tinued onto the next page, which ends the footnote.] for the trailing fragment.

If a footnote or endnote is referenced in the text but does not appear on that page, keep the footnote/endnote number or marker and surround it with square brackets [ and ]. This is common in scientific and technical books, where footnotes are often grouped at the end of chapters. See "Endnotes" below.
In some books, footnotes are separated from the main text by a horizontal line. We don't keep this so please just leave a blank line between the main text and the footnotes. (See example above.) Endnotes are just footnotes that have been located together at the end of a chapter or at the end of the book, instead of on the bottom of each page. These are proofread in the same manner as out-of-line footnotes. Where you find an endnote reference in the text, just surround it with [ and ]. If you are proofreading one of the ending pages with the endnotes text on it, surround the text of each note with [Footnote #: (text of endnote)], with the endnote text placed in between, and the endnote number or letter placed where the # is. Put a blank line after each endnote so that they remain separate paragraphs when the text is rewrapped during post-processing. Footnotes in Poetry or Tables should be treated the same as other footnotes. Proofreaders should tag them and leave them at the bottom of the page; the post-processor will decide on the final placement.
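
The marker and tag rules above lend themselves to a small worked example. The Python sketch below maps symbol markers to capital letters in order of first appearance and builds the matching footnote tag. The names `letters_for_symbols` and `footnote_tag` are hypothetical, the footnote wording is invented, and whether the lettering restarts on each page is not stated in these Guidelines, so the sketch simply letters whatever list of markers it is given.

```python
import string


def letters_for_symbols(markers):
    """Map symbol markers, in order of first appearance, to A, B, C, ..."""
    mapping = {}
    for marker in markers:
        if marker not in mapping:
            mapping[marker] = string.ascii_uppercase[len(mapping)]
    return mapping


def footnote_tag(label: str, text: str) -> str:
    """Build an out-of-line footnote tag using the label from the anchor."""
    return f"[Footnote {label}: {text}]"


mapping = letters_for_symbols(["*", "†", "‡"])
anchor = "word" + "[" + mapping["†"] + "]"      # anchor sits right next to the word
note = footnote_tag(mapping["†"], "See the preface.")
print(anchor)
print(note)
```

Using one mapping for both the anchor in the text and the tag at the bottom of the page keeps the two labels identical, which is the point of the "same tag" rule.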

Italics

Proofread italicized text with <i> inserted at the start and </i> inserted at the end of the italics. (Note the "/" in the closing tag.)

Punctuation goes outside the italics, unless it is an entire sentence or section that is italicized, or the punctuation is itself part of a phrase, title or abbreviation that is italicized.

The periods that mark an abbreviated word in the title of a journal such as Phil. Trans. are part of the title for italicization purposes, and are included within the italic tags, thus: <i>Phil. Trans.</i>.

For dates and similar phrases, proofread the entire phrase as italics, rather than marking the words as italics and the numbers as non-italics. The reason is that many typefaces found in older texts used the same design for numbers in both regular and italics.

If the italicized text consists of a series/list of words or names, mark these up with italics tags individually.

Examples of italics:

Bold Text

Proofread bold text (text printed in a heavier typeface) with <b> inserted before the bold text and </b> after it. (Note the "/" in the closing tag.)

Punctuation goes outside the bold tags, unless it is an entire sentence or section that is in bold, or the punctuation is itself part of a phrase, title or abbreviation that is in bold type.

See the Page Headers/Page Footers image/text for an example.

Some Project Managers may specify in the Project Comments that bold text be rendered as all caps.

Superscripts

Older books often abbreviated words as contractions, and printed them as superscripts, for example:

In scientific & technical works, proofread superscripted characters with curly braces { and } surrounding them, even if there is only one character superscripted.

The Project Manager may specify in the Project Comments that superscripted text be marked up differently.

Subscripts

Subscripted text is often found in scientific works, but is not common in other material. Proofread subscripted text by inserting an underline character _ and surrounding the text with curly braces { and }.
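
A minimal sketch of the subscript markup just described, assuming a hypothetical helper named `subscript`; the chemical formula is only an illustration.

```python
def subscript(text: str) -> str:
    """Mark `text` as subscripted: an underline followed by curly braces."""
    return "_{" + text + "}"


# The formula for water, with the 2 printed as a subscript in the image.
print("H" + subscript("2") + "O")
```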

Underlined Text

Proofread underlined text as Italics, with <i> and </i>. (Note the "/" in the closing tag.) Underlining was often used to indicate emphasis when the typesetter was unable to actually italicize the text, for example in a typewritten document.

Some Project Managers may specify in the Project Comments that underlined text be marked up with the <u> and </u> tags.

S p a c e d O u t Text (gesperrt)

Proofread s p a c e d o u t text as Italics, with <i> and </i>, and remove the extra spaces between letters in each word. (Note the "/" in the closing tag.) This was a typesetting technique used to emphasize a piece of text in older German (and some Italian) books. Italics serve that purpose for modern readers, and extra spacing may not be clear on all the different screen sizes & fonts where people may read the final e-book.

Font size changes

Normally we do not do anything to mark changes in font size. The exception to this is when the font size changes to indicate a block quotation.

Words in all Capitals

Proofread words that are printed in all capital letters as all capital letters.

The exception to this is the first word of a chapter: many old books typeset the first word of these in all caps; this should be changed to upper and lower case, so "ONCE upon a time," becomes "Once upon a time,"

Words in Small Caps

Proofread words that are printed in Small Caps as mixed upper and lowercase, and surround the text with <sc> and </sc> markup. Example: This is Small Caps would correctly be <sc>This is Small Caps</sc>.

Large, Ornate opening Capital letter (Drop Cap)

Proofread large and ornate graphic first letters of a chapter, section, or paragraph as just the letter.

Dashes, Hyphens, and Minus Signs

There are generally four such marks you will see in books:
Note: If an em-dash appears at the start or end of a line of your OCR'd text, join it with the other line so that there are no spaces or line breaks around it. Only if the author used an em-dash to start or end the paragraph or line of poetry or dialog should you leave it at the start or end of a line. See the examples below. Examples—Dashes, Hyphens, and Minus Signs:

End-of-line Hyphenation

Where a hyphen appears at the end of a line, join the two halves of the hyphenated word back together. If it is really a hyphenated word like well-meaning, join the two halves leaving the hyphen in-between. But if it was just hyphenated because it wouldn't fit on the line, and is not a word that is usually hyphenated, then join the two halves and remove the hyphen. Keep the joined word on the top line, and put a line break after it to preserve the line formatting; this makes it easier for the 2nd Round Proofreader. See the Dashes, Hyphens, and Minus Signs section of these Guidelines for examples of each kind (nar-row turns into narrow, but low-lying keeps the hyphen). If the word is followed by punctuation, then carry that punctuation onto the top line, too.

Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure if the author hyphenated it or not, leave the hyphen, put an * after it, and join the word together. Like this: to-*day. The asterisk will bring it to the attention of the post-processor, who has access to all the pages, and can determine how the author typically wrote this word.

End-of-page Hyphenation

Proofread end-of-page hyphens by leaving the hyphen at the end of the last line, and mark it with a * after the hyphen.

On pages that start with part of a word from the previous page or an em-dash, place a * before the partial word or em-dash.

These markings indicate to the post-processor that the word must be rejoined when the pages are combined to produce the final e-book.

Single word at bottom of page

Proofread these by deleting the word, even if it's the second half of a hyphenated word.

In some older books, the single word at the bottom of the page (called a "catchword", usually printed near the right margin) indicates the first word on the next page of the book (called an "incipit"). It was used to alert the printer to print the correct reverse (called "verso"), to make it easier for printers' helpers to make up the pages prior to binding, and to help the reader avoid turning over more than one page.

Initials

Remove all spaces in names printed as initials, even if it appears that the typesetter included spaces (or partial spaces) in the printed version. For example, proofread H. M. S. Pinafore as H.M.S. Pinafore, and G. B. Shaw as G.B. Shaw. This avoids the potential problem of the letters being broken across lines when text is rewrapped.

Contractions

Remove any extra space in contractions, for example: would n't should be proofread as wouldn't.

This was often an early printers' convention, where the space was retained to indicate that 'would' and 'not' were originally separate words. It is also sometimes an artifact of the OCR. Remove the extra space in either case.

Some Project Managers may specify in the Project Comments not to remove extra spaces in contractions, particularly in the case of texts which contain slang, dialect, or are written in languages other than English.

Poetry/Epigrams

This section applies to an occasional Poem or Epigram in a mainly non-poetry book. For an entire book of poetry, see the special guidelines for Poetry Books.

Mark poetry or epigrams so the post-processor can find them more quickly.
Insert a separate line with /* at the start of the poetry or epigram and a separate line with */ at the end. Leave a blank line between these markers and the rest of the text. Preserve the relative indentation of the individual lines of the poem or epigram by adding 2, 4, 6 (or more) spaces in front of the indented lines to make them resemble the original. When a line of verse is too long for the printed page, many texts wrap the continuation onto the next printed line and place a wide indentation in front of it. These continuation lines should be rejoined with the line above. Continuation lines usually start with a lower case letter. They will appear randomly unlike normal indentation, which occurs at regular intervals in the metre of the poem. If the poetry is centered on the printed page, don't try to center the lines of poetry during proofreading. Move the lines to the left margin, and preserve the relative indentation of the lines. Footnotes in poetry should be treated the same as usual footnotes during proofreading. See footnotes for details. Line Numbers in poetry should be kept. Put them at the end of the line, leaving at least 6 spaces between them and the end of the text. See Line Numbers for details. Check the Project Comments for the specific text you are proofreading. Books of poetry often have special instructions from the Project Manager. Many times, you won't have to follow all these formatting guidelines for a book that is mostly or entirely poetry.

Letters/Correspondence

Proofread letters and correspondence as you would paragraphs. Put a blank line before the start of the letter; you do not need to duplicate any indenting.

Surround consecutive heading or footer lines (such as addresses, date blocks, salutations or signatures) with /* and */ markers. Leave a blank line between the markers and the rest of the text. The markers will ensure the individual lines are kept in post-processing and not rewrapped.

Don't indent the heading or footer lines, even if they are indented or right justified in the original; just put them at the left margin. The post-processor will format them as needed.

Lists of Items

Surround lists with /* and */ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the individual lines are not rewrapped during post-processing. Use this markup for any such list that should not be reformatted, including lists of questions & answers, items in a recipe, etc.

Tables

Surround tables with /* and */ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the individual lines are not rewrapped during post-processing. Format the table with spaces to look approximately like the original table. Don't make the table wider than 75 characters. Project Gutenberg's guidelines go on to say "...except where it can't be helped. Never, ever longer than 80...".

Do not use tabs for formatting; use space characters only. Tab characters will line up differently between computers, and your careful formatting will not always display the same way.

It's often hard to format tables in plain ASCII text; just do your best. This is much easier if you use a mono-spaced font such as DPCustomMono or Courier. Remember that the goal is to preserve the Author's meaning, while producing a readable table in an e-book. Sometimes this requires sacrificing the original format of the table on the printed page. Check the Project Comments and discussion thread because other proofreaders may have settled on a specific format. If there is nothing there, you might find something useful in the Gallery of Table Layouts forum thread.

Footnotes in tables should go at the end of the table. See footnotes for details.
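
The width and tab rules above are easy to check mechanically. The following Python sketch, with the hypothetical name `check_table_lines`, reports lines that contain tabs or run past the 75- and 80-character marks. It counts characters rather than measuring displayed width, which is an assumption that holds only for a mono-spaced font.

```python
def check_table_lines(lines, preferred=75, hard_limit=80):
    """Return (line_number, message) pairs for lines that break the table rules."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            problems.append((number, "contains a tab character"))
        width = len(line.rstrip())  # trailing spaces do not count toward width
        if width > hard_limit:
            problems.append((number, f"{width} characters, over the limit of {hard_limit}"))
        elif width > preferred:
            problems.append((number, f"{width} characters, over the preferred {preferred}"))
    return problems


table = ["Name      Age", "Alice\t30", "x" * 78]
for number, message in check_table_lines(table):
    print(number, message)
```

The sample table is invented. A line between 76 and 80 characters is reported separately from one over 80, mirroring the "except where it can't be helped" allowance quoted above.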

Block Quotations

Surround block quotations with /# and #/ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the block quotation is formatted properly during post-processing. Apart from adding the markers, block quotations should be proofread as any other text.

Block quotations are long quotations (typically several lines and sometimes several pages) and are often (but not always) printed with wider margins or in a smaller font size, sometimes both.
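
The two marker pairs used in these Guidelines, /* and */ for text that must not be rewrapped and /# and #/ for block quotations, follow the same pattern: each marker on its own line, with a blank line between the markers and the surrounding text. A minimal Python sketch, in which the name `wrap_block` and the `kind` keys are hypothetical:

```python
# Marker pairs as given in these Guidelines; the dictionary keys are our own labels.
MARKERS = {
    "no_rewrap": ("/*", "*/"),    # poetry, lists, tables, letter headings
    "block_quote": ("/#", "#/"),  # block quotations
}


def wrap_block(lines, kind="no_rewrap"):
    """Surround `lines` with the marker pair and a blank line on each side."""
    open_marker, close_marker = MARKERS[kind]
    return ["", open_marker, *lines, close_marker, ""]


print("\n".join(wrap_block(["2 eggs", "1 cup flour"])))
```

The recipe lines are invented. If the surrounding text already has a blank line at that point, the extra one should be dropped by hand.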

Double Quotes

For quotes in English, proofread these as plain ASCII " double quotes. Do not change double quotes to single quotes. Leave them as the Author wrote them.

For quotes from other languages, use the quotation marks appropriate to that language if they are available. The French equivalent, guillemets, «like this», are available from the pulldown menus in the proofreading interface. Remember to remove space between the guillemets and the quoted text; if needed, it will be added in post-processing. The same applies to languages which use reversed guillemets, »like this«.

The quotation marks used in some texts (in German or other languages), „like this”, are also available in the pulldown menus; for the sake of simplicity, you should always use „ and “ regardless of the actual quotes used in the original text, as long as the quotes used in the original text are clearly lower and upper. If needed, the quotes will be changed to the ones used in the text in post-processing.

The Project Manager may instruct you in the Project Comments to proofread non-English language quotation marks differently for a particular book.

Single Quotes

Proofread these as the plain ASCII ' single quote (apostrophe). Do not change single quotes to double quotes. Leave them as the Author wrote them.

Quote Marks on each line

In general, proofread quotation marks at the beginning of each line of a quotation by removing all of them except for the one at the start of the first line of the quotation.

If the quotation goes on for multiple paragraphs, each paragraph should have an opening quote mark on the first line of the paragraph.

Often there is no closing quotation mark until the very end of the quoted section of text, which may not be on the same page you are proofreading. Leave it that way; do not add closing quotation marks that are not in the page image.

There are some language-specific exceptions.
In French, for example, dialog within quotations uses a combination of different punctuation to indicate various speakers. If you are not familiar with a particular language, check the Project Comments or leave a message for the Project Manager in the Forum Discussion for clarification.

Periods Between Sentences

Proofread periods between sentences with a single space after them.

You do not need to remove extra spaces after periods if they're already in the scanned text; we can do that automatically during post-processing. See the Chapter Headers image and text for an example.

Punctuation

In general, there should be no space before punctuation characters except opening quotation marks. If scanned text has a space before punctuation, remove it. This applies even to languages, such as French, which normally use spaces before punctuation characters.

Spaces before punctuation sometimes appear because books typeset in the 1700's & 1800's often used partial spaces before punctuation such as a semicolon or comma.
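
As a hedged illustration of the punctuation rule above, this Python sketch removes spaces or tabs that sit directly before common punctuation marks on one line. The name `close_up_punctuation` is hypothetical. The sketch deliberately leaves a space that comes before a run of periods, since the ellipsis rules elsewhere in these Guidelines keep that space in English text, and it does not touch quotation marks or dots that are themselves separated by spaces.

```python
import re

# Spaces or tabs directly before ; : , ! ? or before a period that is not
# followed by another period (so " ..." keeps its leading space).
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([;:,!?]|\.(?!\.))")


def close_up_punctuation(line: str) -> str:
    """Remove whitespace that separates a punctuation mark from its word."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", line)


print(close_up_punctuation("A horse ;  my kingdom for a horse ."))
print(close_up_punctuation("Wait ... what ?"))
```

Only the space before each mark is removed; any extra spaces after it are left alone, since those can be cleaned up automatically during post-processing. Working one line at a time keeps the line breaks intact.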

Line Breaks

Leave all line breaks in so that the next proofreader and the post-processor can compare the lines in the text to the lines in the image easily. Be especially careful about this when rejoining hyphenated words or moving words around em-dashes. If the previous proofreader removed the line breaks, please replace them so that they once again match the image.

Extra blank lines that are not in the image should be removed except where we intentionally add them for formatting. But blank lines at the bottom of the page are fine; these are removed during post-processing.

Extra Spaces or Tabs Between Words

Extra spaces and tab characters between words are common in OCR output. You don't need to bother removing these; that can be done automatically during post-processing.

However, extra spaces around punctuation, em-dashes, quote marks, etc. do need to be removed when they separate the symbol from the word.

For example, in A horse ;  my kingdom for a horse. the space between the word "horse" and the semicolon should be removed. But the 2 spaces after the semicolon are fine; you don't have to delete one of them.

Trailing Space at End-of-line

Do not bother inserting spaces at the ends of lines of text. It is a waste of your time for something that we can take care of automatically later. Similarly, do not waste your time removing extra spaces at the ends of lines.

Line Numbers

Keep line numbers. Place them at least six spaces past the right hand end of the line, even if they are on the left side of the poetry/text in the original image.

Line numbers are numbers in the margin for each line, or sometimes every fifth or tenth line, and are common in books of poetry. Since poetry will not be reformatted in the e-book version, the line numbers will be useful to readers.

Extra Spacing/Stars/Line Between Paragraphs

Most paragraphs start on the line immediately after the end of the previous one. Sometimes two paragraphs are separated to indicate a "thought break."
A "thought break" may take the form of a line of stars, hyphens or some other character, a plain or floridly decorated horizontal line, a simple decoration, or even just an extra blank line or two. A "thought break" may represent a change of scene or subject, a lapse in time or a bit of suspense. This is intended by the author, so we preserve them by putting a blank line, 5 *'s indented 7 spaces and then 7 spaces apart, as shown in the example. Sometimes printers used decorative lines to mark the ends of chapters. As we already mark Chapter Headers, there is no need to add a "thought break" marker. The proofreading interface has the "thought break" marker available to cut and paste.
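
The thought break marker described above (5 asterisks, indented 7 spaces and then 7 spaces apart) can be built rather than typed, which avoids miscounting spaces. In this Python sketch the names `THOUGHT_BREAK` and `add_thought_break` are hypothetical, and the blank line after the marker is our assumption, based on the rule that every paragraph starts after a blank line.

```python
# Five asterisks: the first indented 7 spaces, the rest 7 spaces apart.
THOUGHT_BREAK = " " * 7 + (" " * 7).join("*" * 5)


def add_thought_break(lines_before, lines_after):
    """Join two runs of lines with a blank line, the marker, and a blank line."""
    return [*lines_before, "", THOUGHT_BREAK, "", *lines_after]


print("\n".join(add_thought_break(["The door closed behind her."],
                                  ["Three years later, she returned."])))
```

The two sentences are invented. In practice the marker provided in the proofreading interface can simply be cut and pasted; the sketch is only meant to make the spacing explicit.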
Period Pause "..." (Ellipsis)
The guidelines are different for English and Languages Other Than English (LOTE).

ENGLISH: Leave a space before the three dots, and a space after. The exception is at the end of a sentence, when there would be no space, four dots, and a space after. This is also the case for any other ending punctuation mark: the 3 dots follow immediately, without any space.
For example:
Sometimes you will see it with the punctuation at the end; so proofread it that way:
Remove extra dots, if any, or add new ones, if necessary, to bring the number to three (or four) as appropriate.

LOTE: Use the general rule "Follow closely the style used in the printed page." Sometimes the printed page is unclear: in that case, insert a * to draw the attention of the post-processor. If spaces appear to exist between the dots, or between the word and the dots, replace the spaces with underscores: like this_... or like this_._._. depending on the style. This will avoid problems in rewrapping, and the underscores will be replaced by spaces during post-processing.

Accented/Non-ASCII Characters
Please proofread these using the proper UTF-8 characters. For characters which are not in Unicode, see the Project Manager instructions in the Project Comments. If they are not on your keyboard, there are several ways of inputting these characters:
At a minimum, the original Project Gutenberg posts 7-bit ASCII versions of texts, but it also accepts versions using other character encodings which can preserve more of the information from the original text. Project Gutenberg Europe publishes UTF-8 as its default encoding, but other appropriate encodings are also welcomed. Currently for Distributed Proofreaders this means using Latin-1 or ISO 8859-1 and -15, and in the future will include Unicode. Distributed Proofreaders Europe already uses Unicode.

For Windows:
For Apple Macintosh:
‡ Note: No equivalent shortcut, use drop-down menus.

Characters with Diacritical Marks
In some projects, you will find characters with special marks either above or below the normal Latin A..Z character. These are called diacritical marks, and indicate a special pronunciation for this character. If such a character does not exist in Unicode, it should be entered by using combining diacritical marks: these are Unicode symbols which can't appear alone, but appear above (or below) the letter after which they are placed. They can be entered by first entering the base letter, and then the combining mark, using the applets and programs mentioned above. On some systems, diacritical marks may not appear exactly where they should, but, for example, moved to the right. They should still be used, as people with other systems will see them correctly. However, if, for any reason, you can't see or enter combining marks properly, mark such a letter with an *. Note that Modifier diacritical marks also exist; these should not be used.

Non-Latin Characters
There are projects which contain text printed in non-Latin characters; that is, characters other than the Latin A...Z characters, for example Greek, Cyrillic (used in Russian, Slavic and other languages), Hebrew, or Arabic characters. These characters should be entered in the text just as Latin characters are. (WITHOUT transliteration!) If a document is written entirely in a non-Latin script, it is best to install a keyboard driver which supports the language. Consult your operating system manual for instructions on how to do that. If the script appears only occasionally, you may use a separate program to enter it. See above for some of the programs. If you are uncertain about a character or an accent, mark it with an * to bring it to the attention of the second round proofreader or the post-processor.
For scripts which cannot be so easily entered, such as Arabic, surround the text with appropriate markers: [Arabic: **] and leave it as scanned. Include the ** so the post-processor can address it later.

Fractions
Proofread fractions as follows: 2½ becomes 2-1/2. The hyphen prevents the whole and fractional part from becoming separated when the lines are rewrapped during post-processing.

Page References "See Pg. 123"
Proofread page number references within the text such as (see p. 123) as they appear in the image. Check the Project Comments to see if the Project Manager has special requirements for page references.

Indexes
Please retain page numbers in index pages. Surround the index with /* and */ tags, leaving a blank line before /* and after */. You don't need to align the numbers as they appear in the scan; just put a comma or semicolon, followed by the page numbers. Indexes are often printed in 2 columns; this narrower space can cause entries to split onto the next line. Rejoin these back onto a single line. Indexes are a case where long lines created by following this rule are acceptable, since the lines will be re-wrapped to the proper width and indentation during post-processing. Place one blank line between each entry in the index. For sub-topic listings in an index, start each one on a new line, indented 2 spaces. Treat each new section in an index (A, B, C...) the same as a section header by placing 2 blank lines before it. Old books sometimes printed the first word of each letter in the index in all caps or small caps; change this to match the style used for the rest of the index entries.
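The fraction rule given earlier in this section (2½ becomes 2-1/2) can be sketched as a small Python helper. This is a hedged illustration, not a tool used by the project: the function name and the mapping are hypothetical, only three common fraction characters are covered, and the handling of a fraction with no whole number in front of it (written without a hyphen) is an assumption that the guidelines do not state.

```python
import re

# Assumed mapping; only three common single-character fractions are covered.
FRACTIONS = {"¼": "1/4", "½": "1/2", "¾": "3/4"}

# Optional run of digits (the whole part) followed by one fraction character.
_FRACTION_RE = re.compile(r"(\d*)([¼½¾])")


def ascii_fraction(text: str) -> str:
    """Rewrite fraction characters as N-n/d, per the rule 2½ -> 2-1/2."""
    def repl(match: re.Match) -> str:
        whole, frac = match.group(1), FRACTIONS[match.group(2)]
        # The hyphen keeps the whole and fractional parts together on rewrap.
        return f"{whole}-{frac}" if whole else frac

    return _FRACTION_RE.sub(repl, text)


if __name__ == "__main__":
    print(ascii_fraction("Add 2½ cups of flour."))
```

A proofreader would still check each result against the image, since the Project Manager may specify different handling in the Project Comments.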
Plays: Actor Names/Stage DirectionsFor all plays:
For metrical plays: (Plays written as rhymed poetry)
Please check the Project Comments, as the Project Manager may specify different formatting.
Anything else that needs special handling or that you're unsure of
While proofreading, if you encounter something that isn't covered in these guidelines that you think needs special handling or that you are not sure how to handle, post your question, noting the png (page) number, in the Project Discussion thread (a link to the project-specific forum is in the Project Comments), and put a note in the proofread text explaining the problem. Your note will explain to the next proofreader or post-processor what the problem or question is. Start your note with a square bracket and two asterisks [** and end it with another square bracket ]. This clearly separates it from the Author's text and signals the next proofreader to stop and carefully examine this part of the text and the matching image to address any issues. If you are proofreading in a later round and come across a note from a proofreader in a previous round, once you have resolved the issue, please take a moment and provide Feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation in the future.

Specific Guidelines for Special Books
These particular types of books have specific guidelines that add to or modify the normal guidelines given in this document. Projects for these books are often difficult, and are not recommended for beginning proofreaders. They are more appropriate to experienced proofreaders or people who have expertise in the particular field. Click on the link below when you need to see the guidelines for one of these types of books.
Common Problems

OCR Problems: 1-l-I
OCR commonly has trouble distinguishing between the digit '1' (one), the lowercase letter 'l' (ell), and the uppercase letter 'I'. This is especially true for books where the pages may be in poor condition. Watch out for these. Read the context of the sentence to determine which is the correct character, but be careful: often your mind will automatically 'correct' these as you are reading. Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.

OCR Problems: 0-O
OCR commonly has trouble distinguishing between the digit '0' (zero) and the uppercase letter 'O'. This is especially true for books where the pages may be in poor condition. Watch out for these. Normally the context of the sentence is sufficient to determine which is the correct character, but be careful: often your mind will automatically 'correct' these as you are reading. Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.

OCR Problems: Hyphens and Dashes
OCR commonly has trouble distinguishing between dashes and hyphens. Proofread these carefully; OCR'd text often has only one hyphen for an em-dash that should have two. See the rules for hyphenated words and em-dashes for more detailed information. Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.

OCR Problems: Scannos
Another common OCR issue is misrecognition of characters. We call these errors "scannos" (like "typos"). This misrecognition can result in a word which:
Possibly the most common example of the second type is "and" being OCR'ed as "arid." Other examples: "eve" for "eye", "Torn" for "Tom", "train" for "tram". This type is harder to spot and we have a special term for them: "Stealth Scannos." We collect examples of Stealth Scannos in this thread. Spotting scannos is much easier if you use a mono-spaced font such as DPCustomMono or Courier.

Handwritten Notes in Book
Do not include handwritten notes in a book (unless it is overwriting faded, printed text to make it more visible). Do not include handwritten marginal notes made by readers, etc. Some Project Managers may ask that handwritten notes be marked with [HW: (text of the note)].

Bad Images
If an image is bad (not loading, chopped off, unable to be read), please post about this bad image in the Project Comments forum. Do not click on "Return Page to Round"; if you do, the page will be reissued to the next proofreader. Instead, click on the "Report Bad Page" button so this page is 'quarantined'.
Note that some page images are quite large, and it is common for your browser to have difficulty displaying them, especially if you have several windows open or are using an older computer. Before reporting this as a bad page, try clicking on the "Image" line on the bottom of the page to bring up just the image in a new window. If that brings up a good image, then the problem is probably in your browser or system.
It's fairly common for the image to be good, but the OCR scan is missing the first line or two of the text. Please just type in the missing line(s). If nearly all of the lines are missing in the scan, then either type in the whole page (if you are willing to do that), or just click on the "Return Page to Round" button and the page will be reissued to someone else. If there are several pages like this, you might post a note in the Project Comments forum to notify the Project Manager.
Wrong Image for Text
If there is a wrong image for the text given, please post about this bad image in the Project Comments forum. Do not click on "Return Page to Round"; if you do, the page will be reissued to the next proofreader. Instead, click on the "Report Bad Page" button so this page is 'quarantined'.

Previous Proofreader Mistakes
If the previous proofreader made a lot of mistakes or missed a lot of things, please take a moment and provide Feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation so that they will know how in the future.
Please be nice! Everyone here is a volunteer and presumably trying their best. The point of your feedback message should be to inform them of the correct way to proofread, rather than to criticize them. Give a specific example from their work showing what they did, and what they should have done.
If the previous proofreader did an outstanding job, you can also send them a message about that, especially if they were working on a particularly difficult page.

Printer Errors/Misspellings
Correct all of the words which the OCR has misread (scannos), but do not correct what may appear to you to be misspellings or printer errors that occur on the scanned image. Many of the older texts have words spelled differently from modern usage and we retain these older spellings, including any accented characters. If you are unsure, place a note in the txet [**typo for text?] and ask in the Project Discussion thread. If you do make a change, include a note describing what you changed: [*Transcriber's Note: typo fixed, changed from "txet" to "text"]. Include an * so the post-processor will notice it.

Factual Errors in Texts
In general, don't correct factual errors in the author's book. Many of the books we are proofreading have statements of fact in them that we no longer accept as accurate. Leave them as the author wrote them.
A possible exception is in technical or scientific books, where a known formula or equation may be given incorrectly, especially if it is shown correctly on other pages of the book. Notify the Project Manager about these, either by sending them a message via the Forum, or by inserting [**note sic explain-your-concern] at that point in the text.

Uncertain Items
[...to be completed...]