What is it?
lex2xml is a Perl script that converts a Shoebox .lex file into XML. It reads the .lex file from STDIN and outputs the XML to STDOUT.
Note that the current version is written specifically for my conlang Asha'ille. You will have to modify the script in the following places to use it yourself:
- replace @hierarchy with your own word-categories
- replace <lexicon> attributes with your own values
- replace <person> node with your own values
- replace x-asha with your own ISO language code
- replace Ashaille.pm with your own language-specific module, defining (and renaming) the following functions:
  - replace kateinu and kateinu_sort with your own conlang-specific romanization schemes
Requirements
Download
The current version of lex2xml can be downloaded here (last modified July 25, 2011).
Usage
Run the Perl script like so:
lex2xml < dictionary.lex > dictionary.xml
See lexicon-update.sh for how to use this script to go from Shoebox .lex file to the "pretty" dictionary and thesaurus that I use for Asha'ille.
Examples
Example dictionary.lex input:
\lx caea
\ph 'ke.A
\ps n
\ge world
\sd fe:nature
\et fv:aea
\dc 19/Nov/2002
\dt 16/Feb/2004

\lx jhurla
\ph 'Zur\lA
\ps interj
\ge hello
\xv Vedá aró jhurla vel das.
\xe Good morning, everyone.
\xt 17/Dec/2004
\ue Cannot be used to greet someone you don't know
\dc before 17/Dec/2004
\dt 17/Dec/2004
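To make the input format concrete, here is a minimal sketch of how records in this format could be split apart. It is written in Python for illustration only and is not taken from lex2xml, which is a Perl script. The function name parse_lex is hypothetical, and the sketch assumes every field sits on its own line, starts with a backslash marker, and that \lx opens a new entry.

```python
import re

# A field line: backslash, marker name, then the (possibly empty) value.
FIELD_RE = re.compile(r"^\\(\w+)\s*(.*)$")

def parse_lex(text):
    """Split Shoebox text into records, each a list of (marker, value) pairs.

    Assumption: one field per line; lines that do not start with a
    marker (blank lines, wrapped continuation text) are skipped.
    """
    records = []
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue
        marker, value = match.group(1), match.group(2).strip()
        if marker == "lx":
            records.append([])  # \lx starts a new entry
        if records:
            records[-1].append((marker, value))
    return records

sample = r"""
\lx caea
\ph 'ke.A
\ps n
\ge world

\lx jhurla
\ph 'Zur\lA
\ps interj
\ge hello
"""

for record in parse_lex(sample):
    print(dict(record))
```

Note that the pronunciation field can itself contain a backslash (as in 'Zur\lA), so only a backslash at the start of a line is treated as a marker here.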
Example dictionary.xml output:
<?xml version="1.0" encoding="UTF-8" ?>
<lexicon
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.arthaey.com/conlang/lexicon lexicon.xsd"
  lexeme-lang="x-asha"
  document-lang="en"
  src="http://www.arthaey.com/conlang/lexicon/lexicon.xml"
>
  <person role="author">
    <name>Arthaey Angosii</name>
    <email>arthaey@gmail.com</email>
    <url>http://www.arthaey.com/</url>
  </person>
  <entry>
    <lexeme>caea</lexeme>
    <lexeme-sort>CAEA</lexeme-sort>
    <ipa>ˈke.ɑ</ipa>
    <cxs>'ke.A</cxs>
    <kateinu>cæa</kateinu>
    <kateinu-sort>cDB</kateinu-sort>
    <word-class>n</word-class>
    <gloss lang="en">world</gloss>
    <gloss-sort>WORLD</gloss-sort>
    <domain lang="en">nature</domain>
    <domain-path>Nature</domain-path>
    <xref type="etymology">aea</xref>
    <date>2002-11-19</date>
  </entry>
  <entry>
    <lexeme>jhurla</lexeme>
    <lexeme-sort>JHURLA</lexeme-sort>
    <ipa>ˈʒuɹlɑ</ipa>
    <cxs>'Zur\lA</cxs>
    <kateinu>Jurპla</kateinu>
    <kateinu-sort>TGJRB</kateinu-sort>
    <word-class>interj</word-class>
    <gloss lang="en">hello</gloss>
    <gloss-sort>HELLO</gloss-sort>
    <example>
      <text lang="x-asha">Vedá aró jhurla vel das.</text>
      <text lang="en">Good morning, everyone.</text>
    </example>
    <note type="usage">Cannot be used to greet someone you don't know</note>
    <date>2004-12-17</date>
  </entry>
</lexicon>
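The correspondence between the two examples can be sketched roughly as follows. This is a Python illustration, not lex2xml's actual Perl code, and the mapping is only inferred from the first example entry: \lx to lexeme, \ph to cxs, \ps to word-class, \ge to gloss, and \dc to date. The names iso_date and build_entry are hypothetical. The ipa and kateinu elements are left out because they depend on the Asha'ille-specific conversions, and it is an assumption that \dc is always the source of the date (the second entry's date matches its \dt value instead).

```python
import xml.etree.ElementTree as ET

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def iso_date(shoebox_date):
    """Rewrite a DD/Mon/YYYY date (e.g. 19/Nov/2002) as YYYY-MM-DD."""
    day, month, year = shoebox_date.split("/")
    return f"{int(year):04d}-{MONTHS.index(month) + 1:02d}-{int(day):02d}"

def build_entry(fields):
    """Build an <entry> element from a dict of Shoebox fields.

    Assumption: the sort keys are simply the upper-cased text,
    as the example output suggests for lexeme-sort and gloss-sort.
    """
    entry = ET.Element("entry")
    ET.SubElement(entry, "lexeme").text = fields["lx"]
    ET.SubElement(entry, "lexeme-sort").text = fields["lx"].upper()
    ET.SubElement(entry, "cxs").text = fields["ph"]
    ET.SubElement(entry, "word-class").text = fields["ps"]
    gloss = ET.SubElement(entry, "gloss", {"lang": "en"})
    gloss.text = fields["ge"]
    ET.SubElement(entry, "gloss-sort").text = fields["ge"].upper()
    ET.SubElement(entry, "date").text = iso_date(fields["dc"])
    return entry

fields = {"lx": "caea", "ph": "'ke.A", "ps": "n",
          "ge": "world", "dc": "19/Nov/2002"}
print(ET.tostring(build_entry(fields), encoding="unicode"))
```

A value such as "before 17/Dec/2004" would not split into three parts and would raise an error in this sketch, so real input needs more careful date handling than shown here.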