Some customizations of rtftohtml require a little understanding of how the filter works; others require a lot. All of the customizations involve editing either html-trans or one of the character translation files.
In html-trans there are four tables: .PTag, .TTag, .PMatch and .TMatch. Each table begins with its name (in column one) and continues until the next table starts. All blank lines and lines beginning with a '#' are discarded; '#' lines are typically used for comments. The tables themselves are composed of records containing a fixed number of fields separated by commas. The fields are either strings (which should be quoted), integers or bitmasks.
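The table layout described above can be sketched in Python. This is only an illustration of the rules as stated, not the filter's own parser: the function name read_tables is hypothetical, records that appear before the first table name are assumed to be ignored, and escapes such as \n are left uninterpreted.

```python
import csv

TABLE_NAMES = (".PTag", ".TTag", ".PMatch", ".TMatch")

def read_tables(text):
    """Split html-trans style text into {table name: [records]}."""
    tables = {}
    current = None
    for line in text.splitlines():
        # Blank lines and lines beginning with '#' are discarded.
        if not line.strip() or line.startswith("#"):
            continue
        # A table name in column one starts a new table.
        name = line.rstrip()
        if name in TABLE_NAMES:
            current = tables.setdefault(name, [])
            continue
        if current is None:
            continue  # assumption: skip records outside any table
        # Fields are comma separated; strings are double quoted.
        # Every field comes back as a string, including the integers.
        current.append(next(csv.reader([line])))
    return tables

sample = r'''
.PTag
# a comment line
"h1","<h1>\n","</h1>\n","\t","\t","<br>\n",0,0,0,1

.PMatch
"heading 1",0,"h1"
'''

tables = read_tables(sample)
print(tables[".PMatch"])  # [['heading 1', '0', 'h1']]
```

The csv module is used here only because it already handles quoted fields that contain commas; converting the integer and bitmask fields is left out of the sketch.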
Each entry in the .PTag table describes an HTML paragraph markup. The format is:
.PTag
#"name","starttag","endtag","col2mark","tabmark","parmark",allowtext,cannest,DeleteCol1,fold,TocStyl
"h1","<h1>\n","</h1>\n","\t","\t","<br>\n",0,0,0,1
This is a level 1 heading. The "\n" in the start and end-tag fields forces a newline in the HTML markup. Since newlines are ignored in HTML (except in <pre>), its only effect is to make the HTML output more readable. There is no difference between the first tab and any other; both translate to a tab mark. Paragraph marks generate "<br>" followed by a newline (just for looks). Text markup (like <b>) is not allowed within <h1> text, because we leave that up to the HTML client. No nesting is allowed (see the discussion on nested styles in section 7). No text is deleted. Every paragraph using this markup will also generate a level-1 table of contents entry.
"Normal","","\n","\t","\t","<p>\n",1,0,0,0
This is the default for normal text. Regular text in HTML has no required start and end-tags. The "\n" in the end-tag field forces a newline in the HTML markup. Since newlines are ignored in HTML (except in <pre>), its only effect is to make the HTML output more readable. There is no difference between the first tab and any other; both translate to a tab mark. Paragraph marks generate "<p>" followed by a newline (just for looks). Text markup (like <b>) is allowed within Normal text. No nesting is allowed (see the discussion on nested styles in section 7). No text is deleted.
"ul","<ul>\n<li>","</ul>","\t","\t","\n<li>",1,1,0,0
This is the entry for unordered lists. It generates "<ul>\n<li>" at the start of the list and "</ul>" at the end. There is no difference between the first tab and any other; both translate to a tab mark. Paragraph marks generate "<li>" preceded by a newline (just for looks). Text markup (like <b>) is allowed. This entry may be nested, and other entries may be nested within it, which is what makes nested lists possible. No text is deleted.
"ul-d","<ul>\n<li>","</ul>","\t","\t","\n<li>",1,1,1,0
This entry is identical to the previous one except that the DeleteCol1 field is set to 1. This is used to remove bullets (which actually appear in the RTF) because we don't want to see them in the HTML.
Each entry in the .TTag table describes an HTML text markup. The format is:
.TTag
"name","starttag","endtag"
Each entry in the .PMatch table correlates a paragraph style name to some entry in the .PTag table. The format is:
.PMatch
"Paragraph Style",nesting_level,"PTagName"
5.1.3.1 Sample .PMatch Entries
"heading 1",0,"h1"
This is a level 1 heading. Any paragraphs with this paragraph style will be mapped to the entry in the .PTag table named "h1".
"numbered list",0,"ol-d"
This is used for numbered lists. Any paragraphs with this paragraph style will be mapped to the entry in the .PTag table named "ol-d".
"numbered list 2",2,"ol-d"
This is an entry for a nested paragraph style. The nesting level of two is used to indicate that this paragraph should appear in the HTML nested within two levels of paragraph markups. The paragraph marked with this style may only appear after a paragraph style that has a nesting level of 1 or greater.
Each entry in the .TMatch table describes processing for text styles. The format is:
.TMatch
"Font",FontSize,Match,Mask,"TextStyleName"
The order of bits in the Match and Mask bit-maps is:
# v^bDWUHACSOTIB - Bold
# v^bDWUHACSOTI - Italic
# v^bDWUHACSOT - StrikeThrough
# v^bDWUHACSO - Outline
# v^bDWUHACS - Shadow
# v^bDWUHAC - SmallCaps
# v^bDWUHA - AllCaps
# v^bDWUH - Hidden
# v^bDWU - Underline
# v^bDW - Word Underline
# v^bD - Dotted Underline
# v^b - Double Underline
# v^ - SuperScript
# v - SubScript
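As a reading aid for the bit order above, the following Python sketch turns a 14-digit bit string into the attribute names it selects. The helper name describe_bits is hypothetical and is not part of rtftohtml; the name list simply follows the "v^bDWUHACSOTIB" order from left to right.

```python
# Attribute names in the same left-to-right order as "v^bDWUHACSOTIB".
BIT_NAMES = [
    "SubScript", "SuperScript", "Double Underline", "Dotted Underline",
    "Word Underline", "Underline", "Hidden", "AllCaps", "SmallCaps",
    "Shadow", "Outline", "StrikeThrough", "Italic", "Bold",
]

def describe_bits(bits):
    """Return the attribute names whose position holds a '1'."""
    if len(bits) != len(BIT_NAMES) or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 14 binary digits")
    return [name for name, bit in zip(BIT_NAMES, bits) if bit == "1"]

print(describe_bits("00100010000000"))  # ['Double Underline', 'Hidden']
print(describe_bits("00000000000001"))  # ['Bold']
```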
5.1.4.1 Sample .TMatch Entries
# double-underline/not hidden -> hot text
# double-underline/hidden -> href
# v^bDWUHACSOTIB,v^bDWUHACSOTIB
"",0,00100000000000,00100010000000,"_Hot"
"",0,00100010000000,00100010000000,"_HRef"
The first entry will match any text formatted with double underline EXCEPT if it is hidden text. This is accomplished by using those two bits to compare (the Mask field) and having a 1 in the double underline bit and a zero for the hidden text bit. The second entry will match any text formatted with BOTH double underline and hidden text. Any text that matches the first will be treated as the hot text of a link. Any text that matches the second will be taken as the href itself. (The filter requires that the HRef text immediately precede the Hot text.)
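The Match and Mask comparison described above can be sketched in Python as a bitwise test. This is an illustration of the rule as described, not the filter's source: the function name style_matches is hypothetical, and ignoring Match bits that fall outside the Mask is an assumption.

```python
def style_matches(attrs, match, mask):
    """True when the attribute bits selected by mask equal match.

    All three arguments are strings of binary digits in the
    "v^bDWUHACSOTIB" order.
    """
    a, m, k = int(attrs, 2), int(match, 2), int(mask, 2)
    # Only the bits set in the mask take part in the comparison.
    # Assumption: match bits outside the mask are ignored as well.
    return (a & k) == (m & k)

# (Match, Mask) pairs taken from the two sample entries.
HOT = ("00100000000000", "00100010000000")
HREF = ("00100010000000", "00100010000000")

double_underline = "00100000000000"
double_underline_hidden = "00100010000000"

print(style_matches(double_underline, *HOT))          # True
print(style_matches(double_underline, *HREF))         # False
print(style_matches(double_underline_hidden, *HOT))   # False
print(style_matches(double_underline_hidden, *HREF))  # True
```

Because the mask selects only the double underline and hidden bits, other attributes such as bold do not affect either entry.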
# Regular matches - You can have multiple of these active
# monospace fonts -> tt
"Courier",0,00000000000000,00000000000000,"tt"
This will match any text that uses the Courier font and mark it using the HTML text markup appearing in the .TTag table with the entry name "tt".
# bold -> bold
# v^bDWUHACSOTIB,v^bDWUHACSOTIB
"",0,00000000000001,00000000000001,"b"
This will match any text that has bold attributes and will mark it using the HTML text markup appearing in the .TTag table with the entry name "b". Note that bold text using the Courier font would match both this entry and the previous one. This will yield markup of the form <b><tt>hi</tt></b>. Note that "b" is the name of an entry in the .TTag table, not the HTML markup that is used!
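To illustrate how a single run of text can match more than one regular entry, here is a small Python sketch that collects the .TTag names of every matching entry. The names TMatch and matching_names are hypothetical. Treating an empty font string as "any font" is an assumption drawn from the samples, and the FontSize field is carried but not tested because its meaning is not described here. The sketch does not decide the order in which the resulting markup is nested.

```python
from typing import NamedTuple

class TMatch(NamedTuple):
    font: str
    size: int
    match: str
    mask: str
    name: str

# The two regular sample entries, in table order.
ENTRIES = [
    TMatch("Courier", 0, "00000000000000", "00000000000000", "tt"),
    TMatch("", 0, "00000000000001", "00000000000001", "b"),
]

def matching_names(font, attrs, entries):
    """Return the .TTag entry names of every entry that matches."""
    a = int(attrs, 2)
    names = []
    for e in entries:
        # Assumption: an empty font string means "any font".
        if e.font and e.font != font:
            continue
        mask = int(e.mask, 2)
        if (a & mask) == (int(e.match, 2) & mask):
            names.append(e.name)
    return names

bold = "00000000000001"
plain = "00000000000000"

print(matching_names("Courier", bold, ENTRIES))   # ['tt', 'b']
print(matching_names("Times", bold, ENTRIES))     # ['b']
print(matching_names("Courier", plain, ENTRIES))  # ['tt']
```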
To add a new paragraph style, simply go to the .PMatch table and add an entry to the end. Put the name of the paragraph style (quoted), the nesting level (usually zero) and the name of the .PTag entry that should be used.