Currently, almost the entire Lenin Archive needs to be repaired and cleaned up because of extremely poor HTML layout and mouseover code. To make this possible we've prepared this to-do list, compiled a list of all works that have yet to be cleaned up, written a script that automates part of the cleanup, and cleaned up Volume One of the Collected Works to serve as a reference, so that volunteers have a guide to follow when helping with the cleanup effort.
A List of All Works to be Cleaned Up
To-Do Index
Things you'll need to set up for cleanup:
Cleanup Script
The first thing to do is copy all of the HTML, paste it into the script's input box, and click the 'Process Text' button. This will clean up the majority of the errors in the HTML. Then click the 'Copy Text' button to copy the cleaned-up text to your clipboard, and paste it into your text editor to complete the rest of the cleanup.
Paragraphs
Each paragraph should be on a single line (in Notepad++, select the paragraph's lines and press Ctrl + J to join them). The same goes for any other element, but the problem is most common with paragraphs.
Like this:
<p>Text for illustration purposes. Text for illustration purposes. Text for illustration purposes. Text for illustration purposes. Text for illustration purposes.</p>
Instead of this:
<p>
Text for illustration purposes. Text for illustration purposes.
Text for illustration purposes. Text for illustration purposes.
Text for illustration purposes.
</p>
Keeping each paragraph on one line makes it easier to search for spelling and markup errors in the future.
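If you'd rather not join lines by hand, the same result can be sketched in Python with a regular expression. This is only a minimal sketch, not the archive's cleanup script: it assumes <p> elements are never nested, and it collapses every run of whitespace inside a paragraph to a single space. The function name join_paragraphs is hypothetical.

```python
import re

# A whole <p ...>...</p> element, including any line breaks inside it.
# \b stops the pattern from matching <pre> or <param>.
PARAGRAPH = re.compile(r"<p\b([^>]*)>(.*?)</p>", re.DOTALL)


def join_paragraphs(html: str) -> str:
    """Put every <p> element on a single line."""
    def one_line(match: re.Match) -> str:
        attrs, body = match.group(1), match.group(2)
        # Collapse each run of whitespace (newlines included) to one space,
        # then trim the spaces left just inside the opening and closing tags.
        body = re.sub(r"\s+", " ", body).strip()
        return f"<p{attrs}>{body}</p>"
    return PARAGRAPH.sub(one_line, html)
```

For example, join_paragraphs("<p>\nText for illustration purposes.\n</p>") returns "<p>Text for illustration purposes.</p>".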
Table and List Cleanup
Tables and lists need to be cleaned up to make them more readable. Indent them with two spaces per nesting level, as in the examples below.
Tables:
<table>
  <tr>
    <td>Example Text</td>
    <td>Example Text</td>
    <td>Example Text</td>
  </tr>
  <tr>
    <td>Example Text</td>
    <td>Example Text</td>
    <td>Example Text</td>
  </tr>
  <tr>
    <td>Example Text</td>
    <td>Example Text</td>
    <td>Example Text</td>
  </tr>
</table>
Unordered lists:
<ul>
  <li>Example Text</li>
  <li>
    <ul>
      <li>Sublist Example Text</li>
      <li>Sublist Example Text</li>
      <li>Sublist Example Text</li>
    </ul>
  </li>
  <li>Example Text</li>
  <li>Example Text</li>
  <li>Example Text</li>
  <li>Example Text</li>
</ul>
Ordered lists:
<ol>
  <li>Example Text</li>
  <li>
    <ol>
      <li>Sublist Example Text</li>
      <li>Sublist Example Text</li>
      <li>Sublist Example Text</li>
    </ol>
  </li>
  <li>Example Text</li>
  <li>Example Text</li>
  <li>Example Text</li>
  <li>Example Text</li>
</ol>
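The indentation shown in these examples can also be produced mechanically. The sketch below is hypothetical (it is not the archive's script) and assumes the table or list has been flattened onto one line, as in the original examples. It only recognises table, tr, td, th, ul, ol and li tags, so wrappers such as <thead> or <tbody> get their own line but do not change the indentation level.

```python
import re

# Structural tags that get a line of their own. The capturing group makes
# re.split() keep the tags in its output.
STRUCTURAL = re.compile(r"(</?(?:table|tr|td|th|ul|ol|li)\b[^>]*>)", re.IGNORECASE)
# Opening tags of elements that may hold inline content on a single line.
CELL_OPEN = re.compile(r"<(td|th|li)\b", re.IGNORECASE)


def indent_blocks(html: str, step: str = "  ") -> str:
    """Put each table/list tag on its own line, indented two spaces per level."""
    tokens = [t.strip() for t in STRUCTURAL.split(html) if t.strip()]
    lines, depth, i = [], 0, 0
    while i < len(tokens):
        tok = tokens[i]
        cell = CELL_OPEN.match(tok)
        if cell:
            close = f"</{cell.group(1).lower()}>"
            # Empty cell such as <td></td>: keep it on one line.
            if i + 1 < len(tokens) and tokens[i + 1].lower() == close:
                lines.append(step * depth + tok + tokens[i + 1])
                i += 2
                continue
            # Cell holding only text or inline markup: <td>Example Text</td>.
            if (i + 2 < len(tokens) and tokens[i + 2].lower() == close
                    and not STRUCTURAL.fullmatch(tokens[i + 1])):
                lines.append(step * depth + "".join(tokens[i:i + 3]))
                i += 3
                continue
        if STRUCTURAL.fullmatch(tok) and tok.startswith("</"):
            depth = max(depth - 1, 0)          # closing tag: dedent first
            lines.append(step * depth + tok)
        elif STRUCTURAL.fullmatch(tok):
            lines.append(step * depth + tok)   # opening tag: indent children
            depth += 1
        else:
            lines.append(step * depth + tok)   # stray text between tags
        i += 1
    return "\n".join(lines)
```

Cells and list items that hold only text or inline links stay on one line; an <li> that contains a nested list is opened on its own line instead, matching the list examples.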
V. I. Lenin Header
The "V. I. Lenin" header needs its links to the previous and next works/chapters removed, and its text changed from "V. I. Lenin" to "Vladimir Ilyich Lenin".
From this:
<h2>
<a title="The New Factory Law" href="../../1897/jun/02.htm">V. I.</a>  
<a title="The Tasks of the Russian Social-Democrats" href="../../1897/dec/31b.htm">Lenin</a></h2>
To this:
<h2>Vladimir Ilyich Lenin</h2>
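For pages that follow the exact pattern shown here, the header swap can be sketched as a single regular-expression substitution. This assumes the <h2> contains nothing but the two navigation links around "V. I." and "Lenin" (whitespace or &nbsp; between them is allowed); any header that doesn't match is left untouched for you to fix by hand. The function name fix_lenin_header is hypothetical.

```python
import re

# The old header: two navigation links around "V. I." and "Lenin",
# separated by whitespace and/or &nbsp;.
LENIN_HEADER = re.compile(
    r"<h2>\s*<a\b[^>]*>\s*V\.\s*I\.\s*</a>(?:\s|&nbsp;)*"
    r"<a\b[^>]*>\s*Lenin\s*</a>\s*</h2>"
)


def fix_lenin_header(html: str) -> str:
    """Replace the linked "V. I. Lenin" header with the plain full name."""
    return LENIN_HEADER.sub("<h2>Vladimir Ilyich Lenin</h2>", html)
```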
Replace footer tables with the following format. You can copy, paste and adjust the template below: anything written as "template" must be changed to match the appropriate work/chapter. If a work is not part of a multi-chapter work, remove the first table (the one before the <hr class="end"> tag). Use an already cleaned-up work as a reference.
<table class="footer">
  <tr>
    <td><a href="template.htm">Previous Chapter</a></td>
    <td>|</td>
    <td><a href="template.htm">Next Chapter</a></td>
  </tr>
  <tr>
    <td colspan="3"><a href="template.htm">Document Index</a></td>
  </tr>
</table>
<hr class="end">
<table class="footer">
  <tr>
    <td class="footer-backward" colspan="3"><a href="template.htm">&lt; Backwards</a></td>
    <td></td>
    <td class="footer-forward" colspan="3"><a href="template.htm">Forwards ></a></td>
  </tr>
  <tr>
    <td><a href="../../index.htm">Works Index</a></td>
    <td>|</td>
    <td><a href="../../cw/template.htm">Current Volume</a></td>
    <td>|</td>
    <td><a href="../../cw/index.htm#template">Collected Works</a></td>
    <td>|</td>
    <td><a href="../../../index.htm">L.I.A. Index</a></td>
  </tr>
</table>
GUESS Links
In index pages, links whose target ends in "-GUESS" need the whole fragment (the # and everything after it) removed.
From this:
<a href="i8i.htm#v02zz99h-135-GUESS">The Economic Theories of Romanticism</a>
To this:
<a href="i8i.htm">The Economic Theories of Romanticism</a>
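The same change can be sketched as one substitution that keeps the page name and drops any fragment ending in "-GUESS", leaving ordinary fragments such as #ch01 alone. It assumes double-quoted href values; the function name is hypothetical.

```python
import re

# href="page.htm#anything-GUESS"; group 1 captures the page before the #.
GUESS_HREF = re.compile(r'href="([^"#]*)#[^"]*-GUESS"')


def strip_guess_fragments(html: str) -> str:
    """Drop the whole #...-GUESS fragment, keeping the page it points to."""
    return GUESS_HREF.sub(r'href="\1"', html)
```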
PLACEHOLDER Footnotes
You'll occasionally find footnotes that read only [PLACEHOLDER FOOTNOTE]. These need to be filled in. To find the original footnote, check the id attribute of its <a> tag. There are two types of footnotes, and each encodes its location differently.
The first type is the editor footnote, written either by Lenin or by the editor of the work at the time.
id="bkV01P001F01"
V01: the Lenin Collected Works volume (here, Volume 1)
P001: the page the footnote appears on (here, page 1)
F01: which footnote on that page it is (here, the first)
The second type is the Collected Works note, written by the Collected Works editors and found at the end of the book.
id="bkV01E001"
V01: the Lenin Collected Works volume (here, Volume 1)
E001: the number of the note (here, note 1)
Then it's simply a matter of finding that footnote in the PDF of the book and filling in the placeholder.
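Because both id formats are fixed codes, decoding them can be sketched in a few lines. This hypothetical helper assumes ids look exactly like the two examples above, with either the bk prefix or the fw prefix used by the matching href targets, and it tolerates a leading # in case you copy the value from an href. It only tells you where to look in the PDF.

```python
import re

# Editor footnotes: V<volume> P<page> F<footnote number on that page>
EDITOR_NOTE = re.compile(r"(?:bk|fw)V(\d+)P(\d+)F(\d+)")
# Collected Works notes at the end of the book: V<volume> E<note number>
CW_NOTE = re.compile(r"(?:bk|fw)V(\d+)E(\d+)")


def describe_footnote_id(footnote_id: str) -> str:
    """Say where in the printed volume a footnote id points to."""
    footnote_id = footnote_id.strip().lstrip("#")
    if m := EDITOR_NOTE.fullmatch(footnote_id):
        volume, page, number = map(int, m.groups())
        return f"Volume {volume}, page {page}, footnote {number} on that page"
    if m := CW_NOTE.fullmatch(footnote_id):
        volume, number = map(int, m.groups())
        return f"Volume {volume}, Collected Works note {number} (end of book)"
    raise ValueError(f"unrecognised footnote id: {footnote_id!r}")
```

For example, describe_footnote_id("bkV01P001F01") returns "Volume 1, page 1, footnote 1 on that page".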
Automated Sections
This section covers the cleanup that the script usually does automatically. It is included so that you can fix by hand anything the script missed.
HTML5 Conversion (partially automated)
We're converting the Lenin Internet Archive from XHTML to HTML5. The Nu HTML Checker will tell you which elements and attributes are obsolete.
Convert the DOCTYPE
From this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/2000/REC-xhtml1-20000126/DTD/xhtml1-transitional.dtd">
To this:
<!DOCTYPE html>
The opening <html> tag needs to be changed:
From this:
<html xmlns="http://www.w3.org/1999/xhtml">
To this:
<html lang="en">
Void elements such as <br> and <hr> no longer need a trailing / to close themselves.
From this:
<br /> <hr /> <hr class="section" />
To this:
<br> <hr> <hr class="section">
The name attribute is obsolete on elements such as <a> (it is still valid on <meta> tags inside the <head>); the id attribute should be used instead. Most footnotes in the LIA already have both, so simply remove the name attribute.
From this:
<a name="ft" id="ft">
To this:
<a id="ft">
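The four conversions in this section can be approximated with regular expressions. This is a minimal sketch, not the archive's script: it assumes double-quoted attributes, assumes every work is in English (hence lang="en"), and only strips the trailing slash from a few common void elements. Where an <a> tag has a name but no id, it renames the name to id rather than deleting it, so existing links to that anchor keep working; that behaviour is our own assumption, not something this guide specifies. The function names are hypothetical.

```python
import re


def _fix_name_attr(match: re.Match) -> str:
    tag = match.group(0)
    if re.search(r'\sid="', tag):
        # The tag already has an id, so the name is redundant: drop it.
        return re.sub(r'\s+name="[^"]*"', "", tag)
    # Only a name: rename it to id so links to this anchor keep working.
    return re.sub(r'(\s)name="', r'\1id="', tag)


def convert_to_html5(html: str) -> str:
    # XHTML DOCTYPE -> <!DOCTYPE html>
    html = re.sub(r"<!DOCTYPE[^>]*>", "<!DOCTYPE html>", html,
                  count=1, flags=re.IGNORECASE)
    # <html xmlns="http://www.w3.org/1999/xhtml"> -> <html lang="en">
    html = re.sub(r"<html\b[^>]*>", '<html lang="en">', html, count=1)
    # Trailing slash on void elements: <hr class="section" /> -> <hr class="section">
    html = re.sub(r"<(br|hr|img|meta|link)\b([^>]*?)\s*/>", r"<\1\2>", html)
    # Obsolete name attributes on <a> tags
    return re.sub(r"<a\b[^>]*>", _fix_name_attr, html)
```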
Mouseover Code (automated)
Remove <a> Tags Related to Mouseover Code
Example of mouseover code:
<a onmouseover="window.status=' 04 . 013 . v04pp64h '" onmouseout="window.status=''"> </a> <a onmouseover="window.status=' 04 . 013 . v04pp64h '" onmouseout="window.status=''">This</a> is some text for illustration purposes [...]
Remove anchor <a> tags that look like this:
<a name='v04pp64h:14' />
<a name='v04pp64h:14'>
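Both kinds of mouseover residue can be sketched as regular expressions. This assumes the attributes appear in the same order and with the same quoting as the examples above. Anchors that wrap only whitespace are deleted, anchors that wrap text are replaced by that text, and the single-quoted name anchors are deleted together with an immediately following </a> if there is one. The function name is hypothetical.

```python
import re

# Opening tag of a mouseover anchor, e.g.
# <a onmouseover="window.status=' 04 . 013 . v04pp64h '" onmouseout="window.status=''">
MOUSEOVER_OPEN = r'<a\s+onmouseover="[^"]*"\s+onmouseout="[^"]*"\s*>'
EMPTY_MOUSEOVER = re.compile(MOUSEOVER_OPEN + r"\s*</a>")
MOUSEOVER = re.compile(MOUSEOVER_OPEN + r"(.*?)</a>", re.DOTALL)
# Page anchors such as <a name='v04pp64h:14' /> or <a name='v04pp64h:14'>,
# plus an immediately following </a> if there is one.
PAGE_ANCHOR = re.compile(r"<a\s+name='[^']*'\s*/?>(?:\s*</a>)?")


def remove_mouseover_code(html: str) -> str:
    html = EMPTY_MOUSEOVER.sub("", html)  # anchors wrapping only whitespace
    html = MOUSEOVER.sub(r"\1", html)      # unwrap anchors wrapping real text
    return PAGE_ANCHOR.sub("", html)
```

Whitespace that followed a deleted empty anchor is kept, so two words are never glued together; any leading space left inside a paragraph disappears once the paragraph is joined onto one line.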
Comments (automated)
Remove comments that look like the following:
<!-- Emacs-File-stamp: "~/Lia/archive/lenin/works/1898/aug/facstats.htm" -->
<!-- Textfile born: "2003-01-01T01:01:01-0100" -->
<!-- Emacs-Time-stamp: "2005-03-04 16:58:19 cymbala" -->
<!-- t2h-body -->
<!-- vol=04 pg=013 src=v04pp64h type= -->
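A sketch that removes only these machine-generated comments and leaves any other comment alone. The prefixes come from the examples above, so comments in other formats won't be caught; the function name is hypothetical.

```python
import re

# Comments starting with one of the machine-generated prefixes listed above,
# plus any trailing spaces and the line break, so no blank lines are left.
GENERATED_COMMENT = re.compile(
    r"<!--\s*(?:Emacs-File-stamp:|Textfile born:|Emacs-Time-stamp:|t2h-body|vol=\d+ pg=)"
    r".*?-->[ \t]*\n?",
    re.DOTALL,
)


def remove_generated_comments(html: str) -> str:
    return GENERATED_COMMENT.sub("", html)
```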
Footnotes (automated)
Footnotes need to be simplified and corrected.
First, class="ednote" and class="anote" need to be changed to class="endnote".
Second, the class="endnote" attribute has no effect on the <a> element; it needs to be on the <sup> element to have its intended effect.
Third, as mentioned above, the obsolete name attribute should be removed.
From this:
<sup><a class="anote" id="bkV01E001" name="bkV01E001" href="#fwV01E001">[1]</a></sup>
<sup><a class="ednote" id="bkV01P001F01" name="bkV01P001F01" href="#fwV01P001F01">[1]</a></sup>
To this:
<sup class="endnote"><a id="bkV01E001" href="#fwV01E001">[1]</a></sup>
<sup class="endnote"><a id="bkV01P001F01" href="#fwV01P001F01">[1]</a></sup>
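The three footnote changes can be combined into one substitution. This sketch assumes, as in the examples above, that class is the first attribute on the <a> tag and that attributes are double-quoted; the function name is hypothetical.

```python
import re

# <sup><a class="anote|ednote|endnote" ...>; group 1 holds the remaining attributes.
FOOTNOTE = re.compile(r'<sup>\s*<a\s+class="(?:anote|ednote|endnote)"([^>]*)>')


def fix_footnotes(html: str) -> str:
    def rewrite(match: re.Match) -> str:
        # Drop the obsolete name attribute, keep id and href as they are.
        attrs = re.sub(r'\s+name="[^"]*"', "", match.group(1))
        # Move the class from <a> to <sup>, normalised to "endnote".
        return f'<sup class="endnote"><a{attrs}>'
    return FOOTNOTE.sub(rewrite, html)
```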
Entities (automated)
Replace entities with the actual characters they represent. This isn't a comprehensive list, but it covers the most common entities; you'll need to look up any that aren't on it.
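Python's standard html.unescape function can do the actual decoding. The sketch below (a hypothetical helper, not the archive's script) decodes each entity on its own, leaves unknown entity names untouched for you to check, and keeps &lt;, &gt;, &amp;, &quot; and &apos; escaped in any form, because turning those back into raw characters could break the markup.

```python
import html
import re
from html.entities import html5

# Named (&eacute;), decimal (&#8217;) and hex (&#x2019;) character references.
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")
# Characters that must stay escaped or they could be read as markup.
MARKUP_CHARS = {"<", ">", "&", '"', "'"}


def _decode(match: re.Match) -> str:
    entity = match.group(0)
    if not entity.startswith("&#") and entity[1:] not in html5:
        return entity  # unknown name: leave it for a human to look up
    char = html.unescape(entity)
    return entity if char in MARKUP_CHARS else char


def replace_entities(text: str) -> str:
    return ENTITY.sub(_decode, text)
```

Note that &nbsp; becomes an invisible non-breaking space character with this approach; check whether that is what you want before relying on it.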
.txt Links (automated)
The Source line in the infoblock usually contains a link to a .txt file (normally on the year). Remove the link from the HTML but keep its text.
Example:
<a href="../../cw/v13pp72.txt">1972</a>
From this:
Source: Lenin Collected Works, Progress Publishers, <a href="../../cw/v13pp72.txt">1972</a>, Moscow, Volume 13, pages 15-49.
To this:
Source: Lenin Collected Works, Progress Publishers, 1972, Moscow, Volume 13, pages 15-49.
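Unwrapping these links while keeping the link text can be sketched as follows. It assumes href is the only attribute on the link, as in the example above; the function name is hypothetical.

```python
import re

# <a href="....txt">text</a>; group 1 is the link text that should stay.
TXT_LINK = re.compile(r'<a\s+href="[^"]*\.txt"\s*>(.*?)</a>', re.DOTALL)


def unwrap_txt_links(html: str) -> str:
    return TXT_LINK.sub(r"\1", html)
```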
README Links (automated)
You'll sometimes find a "• README" link in the infoblock that leads to a dead end. Simply remove it, along with its bullet, from the HTML.
From this:
Public Domain: Lenin Internet Archive (2004). You may freely copy, distribute, display and perform this work; as well as make derivative and commercial works. Please credit “Marxists Internet Archive” as your source. • README
To this:
Public Domain: Lenin Internet Archive (2004). You may freely copy, distribute, display and perform this work; as well as make derivative and commercial works. Please credit “Marxists Internet Archive” as your source.
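The README link's markup isn't shown above, so the sketch below assumes it is a bullet (written as •, &bull; or &#8226;) followed by an <a> element whose text is README. Check a real page before relying on it; the function name and the sample href in the usage are hypothetical.

```python
import re

# Assumed markup: a bullet (literal, &bull; or &#8226;) followed by a link
# whose text is README. Whitespace before the bullet is removed too.
README_LINK = re.compile(
    r"\s*(?:\u2022|&bull;|&#8226;)\s*<a\b[^>]*>\s*README\s*</a>"
)


def remove_readme_link(html: str) -> str:
    return README_LINK.sub("", html)
```

For example, remove_readme_link('as your source. &bull; <a href="readme.htm">README</a>') returns 'as your source.'.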
class="title" Attributes (automated)
Remove the class="title" attribute from <h2>, <h3>, <h4>, etc. headings, as it has no effect.
From this:
<h3 class="title">The Fourth Duma Election Campaign and the Tasks of the Revolutionary Social-Democrats</h3>
To this:
<h3>The Fourth Duma Election Campaign and the Tasks of the Revolutionary Social-Democrats</h3>
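Removing the attribute can be sketched as a substitution that only touches <h1> to <h6> tags and keeps any other attributes they have, assuming the value is double-quoted. The function name is hypothetical.

```python
import re

# class="title" on any <h1>..<h6>; groups 2 and 3 keep the other attributes.
TITLE_CLASS = re.compile(r'<(h[1-6])([^>]*?)\s+class="title"([^>]*)>')


def remove_title_class(html: str) -> str:
    return TITLE_CLASS.sub(r"<\1\2\3>", html)
```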
When You're Finished
Run your HTML through the Nu HTML Checker to catch any remaining mistakes and any HTML5 conversions that were missed.
If you wish, credit yourself by adding the following line to the work's info header:
Refactored by: [Your Name/Handle]
Email the cleaned-up HTML file, along with a link to the work, to the current LIA admin to let them know about your work.