Download interlinear.pl.
Currently, this script expects your computerized lexicon to be in the same format as mine. If you would like to use this script, email me and I'll try to add support for your format (assuming it's a consistent format that a computer program can easily parse; if you keep your language information in prose format, I can't help ya ;)).
Documentation
Usage
At the command line, run:
perl ./interlinear.pl < source-file.txt > interlinearized-file.html
Syntax
See the interlinearized texts in the Writing section of Arthaey's website:
http://www.arthaey.com/conlang/writing/
Configuration
You must begin each interlinear source file with a configuration section, which defines the names of the languages used and specifies where the lexicon file and the dictionary HTML page are located. For example:
<config>
L0 = LanguageToBeInterlinearized
L1 = SmoothTranslationLang
L2 = OtherSmoothTranslationLang
dictionary = ../www/dictionary.html
lexicon = saved-lexicon
</config>
The language codes must be L0..L9, and L0 must be the language whose lines are to be interlinearized. You must define dictionary to be the relative path to the HTML version of your dictionary (morphemes will be linked to $dictionary#$morpheme). You must also define lexicon to be the relative path to the FreezeThaw-saved version of your lexicon.
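To make the configuration rules concrete, here is a minimal Python sketch of how such a section could be read. It is an illustration only, not the code interlinear.pl uses: the function names `parse_config` and `morpheme_link` are hypothetical, and the sketch assumes one `key = value` setting per line.

```python
import re

def parse_config(source):
    """Return the key = value settings found between <config> and </config>."""
    match = re.search(r"<config>(.*?)</config>", source, re.DOTALL)
    if match is None:
        raise ValueError("source file has no <config> section")
    settings = {}
    for line in match.group(1).splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not a setting
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    # L0, dictionary, and lexicon are the settings the text calls mandatory.
    for required in ("L0", "dictionary", "lexicon"):
        if required not in settings:
            raise ValueError("missing required setting: " + required)
    return settings

def morpheme_link(settings, morpheme):
    """Build the $dictionary#$morpheme link target for one morpheme."""
    return settings["dictionary"] + "#" + morpheme

example = """<config>
L0 = LanguageToBeInterlinearized
L1 = SmoothTranslationLang
dictionary = ../www/dictionary.html
lexicon = saved-lexicon
</config>"""

settings = parse_config(example)
print(morpheme_link(settings, "bat"))
```

Checking for the required settings up front means a misconfigured source file fails before any glossing is attempted.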
You may optionally include extra words in a "temporary lexicon" section, before the interlinear text itself. Words defined here will override words in the lexicon defined in the config section (although only for this one text). Use the same format as for your main lexicon (which currently must be SIL Shoebox's format). Proper names are the most likely thing to be defined here. For example:
<lexicon>
\lx Arthei
\ph 'Ar\Te
\ps prop
\ge Arthaey
</lexicon>
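The override behaviour can be sketched in Python as follows. This is a simplified illustration rather than the script's own logic: it assumes each Shoebox field sits on its own line and starts with a backslash marker, that `\lx` opens a new entry, and that each headword has a single entry (multiple senses are ignored). The names `parse_entries` and `apply_overrides` are hypothetical.

```python
def parse_entries(text):
    """Parse Shoebox-style lines into a list of {marker: value} entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # not a field line (for example the <lexicon> tags)
        # Only the leading backslash is a marker; later ones belong to the value.
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            entries.append({})  # \lx starts a new entry
        if entries:
            entries[-1][marker] = value.strip()
    return entries

def apply_overrides(main_lexicon, temporary_entries):
    """Return a copy of main_lexicon in which temporary entries win."""
    merged = dict(main_lexicon)
    for entry in temporary_entries:
        merged[entry["lx"]] = entry
    return merged

temporary = parse_entries(r"""<lexicon>
\lx Arthei
\ph 'Ar\Te
\ps prop
\ge Arthaey
</lexicon>""")

# A made-up main lexicon with a different gloss for the same headword.
main = {"Arthei": {"lx": "Arthei", "ge": "placeholder"}}
merged = apply_overrides(main, temporary)
print(merged["Arthei"]["ge"])
```

Because the merge works on a copy, the main lexicon itself is left untouched, which matches the "only for this one text" behaviour described above.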
Interlinear Markup
After the <config> ... </config> section comes the interlinear text. These lines begin with one of the Ln language codes defined in the configuration section, followed by a colon and whitespace, and then the text itself. For the L0 line, you will further mark the text up so that it can be properly broken down into morphemes and automatically glossed.
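As a rough illustration of the line format just described, the following Python sketch separates the language code from the text. The regular expression and the name `parse_line` are assumptions made for this example, not something taken from interlinear.pl.

```python
import re

# A language code L0..L9, a colon, some whitespace, then the text itself.
LINE_RE = re.compile(r"^(L[0-9]):\s+(.*)$")

def parse_line(line):
    """Return (language_code, text), or None if the line is not an Ln line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_line("L1: A smooth translation of the line."))
```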
Place | at the end of each morpheme. To select a morpheme's sense that isn't the first one, append the sense's number directly after the pipe. Thus, bat and bat|1 will gloss to the first meaning of the word bat, and bat|2 will gloss to the second meaning of the word bat. The order of words' senses is determined by order of entry in the lexicon.
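The sense-selection rule can be sketched in Python like this. It is only an illustration under stated assumptions: the lexicon is modelled as a hypothetical dictionary mapping each morpheme to its glosses in order of entry, the glosses themselves are made up, and `split_morphemes` and `gloss` are not names from the script. Note that the sketch always reads digits directly after a pipe as a sense number, so it cannot handle a following morpheme that itself begins with a digit.

```python
import re

# One morpheme: its form, a pipe, and an optional sense number.
MORPHEME_RE = re.compile(r"([^|\s]+)\|(\d*)")

def split_morphemes(word):
    """Return (form, sense) pairs; a missing sense number means sense 1."""
    return [(form, int(number) if number else 1)
            for form, number in MORPHEME_RE.findall(word)]

def gloss(form, sense, lexicon):
    """Look up the gloss for a 1-based sense number."""
    return lexicon[form][sense - 1]

# Hypothetical lexicon: senses are listed in order of entry.
lexicon = {"bat": ["flying mammal", "wooden club"]}

for form, sense in split_morphemes("bat|2"):
    print(form, sense, gloss(form, sense, lexicon))
```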
Surround with { and } characters that belong in the final orthographic version but that aren't part of the dictionary form of the morpheme. These characters will be displayed in the final version, but will not be used to look up the glosses of morphemes. (Punctuation marks will need to be included in curly braces, for example.)
Add parts of morphemes that have been left out of the final orthographic version with [ and ]. These characters will not be displayed in the final version, but they will be used to look up the glosses of morphemes.
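Curly braces and square brackets are mirror images of each other: one marks display-only text, the other lookup-only text. A small Python sketch shows how the two forms could be derived from the same marked-up string. It assumes the delimiters are never nested, it ignores pipes and sense numbers, and the sample word `kat[a]{,}` and both function names are invented for the example.

```python
import re

def display_form(marked):
    """Text as shown: drop [...] entirely, keep the contents of {...}."""
    text = re.sub(r"\[[^\]]*\]", "", marked)
    return re.sub(r"[{}]", "", text)

def lookup_form(marked):
    """Text used for glossing: drop {...} entirely, keep the contents of [...]."""
    text = re.sub(r"\{[^}]*\}", "", marked)
    return re.sub(r"[\[\]]", "", text)

marked = "kat[a]{,}"
print(display_form(marked))  # the comma is shown, the bracketed "a" is not
print(lookup_form(marked))   # the "a" is looked up, the comma is not
```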
A # will become a newline (HTML <br/>), and two together (##) will become a new paragraph tag (HTML <p/>) in the big orthographic version.
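One detail worth noting when implementing this rule is the order of replacement: the double marker has to be handled before the single one, or every ## would turn into two line breaks. A minimal Python sketch, with the hypothetical name `breaks_to_html`:

```python
def breaks_to_html(text):
    """Turn ## into <p/> and # into <br/>, replacing the longer marker first."""
    return text.replace("##", "<p/>").replace("#", "<br/>")

print(breaks_to_html("first line#second line##new paragraph"))
```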
To preserve the case of a particular word, prefix it with ^. This is most useful for proper names.
Any HTML (or anything, really) between < and > will be passed verbatim to the big orthographic version of the text, although not to the line-by-line orthographic version.
Links to each line's line number are automatically placed at the very beginning of each line. Normally, this is what you want. Sometimes, however, you will want more explicit control over the link's placement: for example, HTML headings will otherwise cause a line break between the link and the line itself. Anywhere an @ appears in a line, it will be replaced by the link to the line number.
Source
Download interlinear.pl.