Text Conversion Programs

By Shlomo Perets, MicroType

This column addresses groups of programs that can be of great use to the technical writer: saving hours of work, enhancing the final product, and sometimes locating and dealing with problems that are difficult to trace. In this issue, I would like to discuss text conversion programs.

When we incorporate raw files from the outside world in our work (part lists, database reports, or text files created on other computing platforms), we frequently encounter some or all of the following problems:
In all of these cases, replacing existing character strings with new strings will solve the problems, provided you know exactly what is wrong and what it needs to be changed to. While text editors and word processors enable this using the "find and replace" function, this is generally impractical and the degree of success is limited. A conversion program will do the job much more successfully and efficiently.

Conversion programs can run on many types of files. They can also be used to process FrameMaker or Word files, but you must operate on the textual representation of the file (e.g. MIF or RTF) and not on the native binary format.

A conversion program enables execution of a series of custom-defined search and replace actions (known as rules), but the success of the conversion depends on you. First, you need to know exactly what in the input file you want to change. You can analyze the input file using programs such as List (by Vernon D. Buerg), which function like an X-ray of the file, showing the contents both as regular text (where possible) and as hexadecimal codes.

You then formulate your search and replace rules, and run the conversion program. Invariably, this is a trial and error process, and each time you run the program you will discover nuances you have not yet addressed. If necessary, update the rules you formulated in order to improve the handling of the different variations encountered in the input files.

Although this may seem a tedious process while you are doing it, once the rules are finalized the conversion program can be run on any number of files of the same type with minimal intervention, thus automating the processing and manipulation of massive quantities of text.

Simple rules might be in a find and replace format such as XFMR -> transformer, and even at this level, using a conversion program to run batches of numerous such replacements is much more efficient than using the Find and Replace function in your word processor.
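To make the idea of a batch of simple rules concrete, here is a minimal Python sketch. It is not taken from any of the conversion programs discussed in this column; the names `RULES`, `apply_rules` and `convert_file` are hypothetical, and only the XFMR rule comes from the text above (the other two rules are invented for illustration).

```python
from pathlib import Path

# Hypothetical rule table: (search string, replacement string) pairs,
# applied in the order listed. Only the XFMR rule comes from the article.
RULES = [
    ("XFMR", "transformer"),
    ("\r\n", "\n"),   # assumed rule: DOS line endings to Unix line endings
    ("\t", " "),      # assumed rule: tab to a single space
]


def apply_rules(text, rules=RULES):
    """Run every search-and-replace rule over the whole text, in order."""
    for search, replacement in rules:
        text = text.replace(search, replacement)
    return text


def convert_file(src, dst, rules=RULES):
    """Convert one file. Latin-1 maps every byte to one character,
    so bytes not touched by a rule pass through unchanged."""
    text = Path(src).read_bytes().decode("latin-1")
    Path(dst).write_bytes(apply_rules(text, rules).encode("latin-1"))
```

Once the rule table is settled, the same `convert_file` call can be looped over any number of input files, which is the kind of batch operation a word processor's Find and Replace does not offer.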
As you progress, your rules can get complex. To make them
more readable you can use abbreviations and include annotations.
There are also several public domain or shareware conversion programs available. The functionality and abilities of these programs vary widely.
In any case, I recommend that you search the internet for the most appropriate program for your specific needs.

The following Reform(at) script removes commands (starting and ending with angled brackets) and changes multiple spaces into a single space. The DELETE logical variable is initially set to false. When a left angled bracket is encountered, the DELETE variable is set to true. When DELETE is true, characters in the ALMSTANY category are skipped. A right angled bracket toggles the value of the DELETE variable. Scripts can get complicated, and sometimes you need multiple scripts, with the output of one script piped as input to another script.

    \def DELETE false
    \def ALMSTANY { 20 .. 3B, 3D, 3F..FF }
    \def SPACEBAND \2{" "}        ; \n{x} means "n or more occurrences of x"
    "</P>" --> 0d 0a 0d 0a        ; end of paragraph
    ?~DELETE : DELETE "<" -->
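For readers who do not have Reform(at) at hand, the logic described above can be approximated in Python. This is a rough sketch of the same idea, not a translation of the script: the function names are hypothetical, and for simplicity a right angled bracket clears the `delete` flag rather than toggling it.

```python
import re


def strip_commands(text):
    """Drop everything between '<' and '>', after turning '</P>' into
    two CR/LF pairs (the end-of-paragraph rule)."""
    text = text.replace("</P>", "\r\n\r\n")
    kept = []
    delete = False              # plays the role of the DELETE variable
    for ch in text:
        if ch == "<":
            delete = True       # start skipping characters
        elif ch == ">":
            delete = False      # simplification: clear instead of toggle
        elif not delete:
            kept.append(ch)
    return "".join(kept)


def collapse_spaces(text):
    """Replace any run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


def run_pipeline(text, passes):
    """Feed the output of each pass into the next, like piped scripts."""
    for one_pass in passes:
        text = one_pass(text)
    return text
```

Calling `run_pipeline(raw_text, [strip_commands, collapse_spaces])` chains the two passes, in the same spirit as piping the output of one script into another.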