upCast Processing Language (UPL) Specification

Stefan Christian Roth &

R

Revision History
Revision 1    Sun, 17 Jan 2010 13:33:00 CET

1. Introduction
1. Goals of UPL
2. UPL in context
3. An example of UPL usage
2. UPL components
1. UPL Core
2. UPL Tree Processor
3. UPL Core (Language Reference)
1. Introduction
2. Lexical Structure
2.1. Escape Sequences
2.2. Line Terminators
2.3. Tokens
2.4. White Space
2.5. Comments
2.6. Identifiers
2.7. Keywords
2.8. Literals
2.9. Separators
2.10. Operators
3. Types
3.1. Bool
3.2. Color
3.3. Id
3.4. List
3.5. Null
3.6. Numeric
3.7. String
3.8. Void
4. Operator/Type Matrices
4.1. ++ (increment), -- (decrement)
4.2. + (unary plus), - (unary minus)
4.3. ! (not)
4.4. * (multiplication), div (division), mod (modulo)
4.5. = (equals)
4.6. != (is not equal to)
4.7. + (addition/concatenation)
4.8. - (subtraction)
4.9. < (less-than), > (greater-than)
4.10. <= (less-than or equal), >= (greater-than or equal)
4.11. or (logical), and (logical), xor
4.12. := (assignment)
5. Blocks
6. Flow control
6.1. if
6.2. if … else/elseif
6.3. while
6.4. do … while
6.5. for
6.6. for-each
6.7. break
6.8. return
7. Exceptions and exception handling
8. Expressions
9. Variables
9.1. Definition
9.2. Reference
9.3. Assignment
10. Parameters
10.1. Definition
10.2. Reference
11. Functions
11.1. Defining a function
11.2. Calling a function
12. Java Function Bindings
12.1. Defining a function binding
12.2. Calling a bound Java function
13. Directives
13.1. #charset
13.2. #set
13.3. #include
13.4. #namespace
4. UPL Tree Processor
1. Building blocks
1.1. UPL Core constructs
1.2. Rules
2. Processing model
2.1. Run initialize()
2.2. Walk the document tree
2.3. Run finalize() / error-finalize()
5. UPL Function reference
1. Type casting functions
1.1. to-bool
1.2. to-color
1.3. to-id
1.4. to-list
1.5. to-null
1.6. to-numeric
1.7. to-string
2. Functions on Colors
2.1. get-color-component
3. Date & Time functions
3.1. current-dateTime
3.2. format-dateTime
4. File functions
4.1. add-to-zipfile
4.2. copy-file
4.3. create-zipfile
4.4. delete-file
4.5. file-exists
4.6. fs-copy
4.7. fs-create
4.8. fs-delete
4.9. fs-move
4.10. get-path-component
4.11. is-file
4.12. is-filetype
4.13. is-folder
4.14. list-files
4.15. list-files
4.16. list-files-recursively
4.17. list-files-recursively
4.18. move-file
4.19. write
4.20. writeln
5. Functions for grouping
5.1. is-painted
5.2. mark-end
5.3. mark-start
5.4. paint-adjacent
5.5. paint-following
5.6. paint-preceding
5.7. set-paint-attr
5.8. set-paint-value
5.9. set-painter
6. Graphical UI functions
6.1. set-progress
6.2. set-ui-text
6.3. show-dialog
7. Functions on Lists
7.1. append
7.2. append-all
7.3. count
7.4. flatten
7.5. index-of
7.6. is-in
7.7. remove
7.8. value-at
8. Logging functions
8.1. clear-log-messages
8.2. forward-log-message
8.3. forward-log-messages
8.4. get-log-messages
8.5. log
8.6. log-custom
8.7. start-logger
8.8. stop-logger
9. Boolean logic functions
9.1. exists
9.2. exists-var
9.3. false
9.4. is-null
9.5. not
9.6. true
10. Functions on DOM nodes
10.1. attach-value
10.2. attach-value
10.3. comment
10.4. delete
10.5. detach-values
10.6. detach-values
10.7. element
10.8. element
10.9. filter-attrs
10.10. get-attr
10.11. get-value
10.12. insert-nodes
10.13. insert-nodes
10.14. mark-split
10.15. name
10.16. processing-instruction
10.17. remove-attrs
10.18. remove-attrs
10.19. rename-element
10.20. replace-with-children
10.21. replace-with-text
10.22. set-attr
10.23. set-attr
10.24. specifies
10.25. string
10.26. text
11. Numeric functions
11.1. abs
11.2. max
11.3. min
12. Other functions
12.1. app-buildnumber
12.2. debug
12.3. delay
12.4. entering
12.5. eval-xpath
12.6. get-environment-value
12.7. get-outline-level
12.8. get-realm-value-names
12.9. get-rulemode
12.10. get-var
12.11. hoist-single-listpar
12.12. leaving
12.13. markup-regex
12.14. print
12.15. println
12.16. run-module
12.17. set-grouping
12.18. set-heading-level
12.19. set-process-children
12.20. set-rulemode
12.21. set-var
12.22. single-listpar-level
12.23. stop
12.24. stop
12.25. test
12.26. throw
12.27. unique-timestamp
12.28. unmangle-string
12.29. wl-convert-doc-to-rtf
12.30. wl-convert-doc-to-rtf
12.31. wl-convert-doc-to-rtf
12.32. wl-convert-rtf-to-doc
12.33. wl-convert-rtf-to-doc
13. Functions for working with styles
13.1. %
13.2. markup-style
13.3. markup-style
14. Functions on Strings
14.1. codepoints-to-string
14.2. contains
14.3. ends-with
14.4. escape-characters
14.5. format-numeric
14.6. index-of
14.7. index-of
14.8. lower-case
14.9. matches
14.10. matches-list
14.11. normalize-space
14.12. parse-numbering
14.13. process-adjacent-text
14.14. replace
14.15. replace
14.16. starts-with
14.17. string-join
14.18. string-length
14.19. string-to-codepoints
14.20. substring
14.21. substring
14.22. substring-after
14.23. substring-before
14.24. substring-tail
14.25. substring-tail
14.26. upper-case

Chapter 1. Introduction

UPL (upCast Processing Language) is a specialized document processing language aimed at making document conversion from graphically marked-up documents into rich logically marked-up documents easy.

UPL gives its users the flexibility to perform a broad range of typical and complex document conversion tasks while hiding the underlying complexity. It is intuitive to use because it borrows established concepts from well-known languages in adjacent realms such as XSLT, XPath and CSS, and it is robust and extensible. The development of UPL was not purely academic; it was equally driven by requirements that emerged in document conversion projects around the world.

1. Goals of UPL

The goal of UPL is to optimize the process of converting documents in three major respects: ease of use, effort minimization and real-world usability.

UPL was developed to overcome the limitations of current document conversion and processing approaches, which are poorly suited to converting documents with mostly flat, graphically marked-up structures into logically marked-up documents. As such, it serves well as a pre-processor for further transformations with XSLT by creating easily accessible structures.

For ease of use, UPL has been designed as a simple rule-based declarative language component at the top level, with an imperative language core that typical users of document processing systems can easily learn and use.

For effort minimization, UPL offers a large number of convenience functions that solve many recurring, complex tasks in the application domain of document conversion, including “fuzzy” functions that help you deal with imprecise input.

And finally, for real-world usability, UPL provides programming mechanisms known from well-known languages such as Java, C and XPath. This extensibility ensures that users of UPL can process all the real-world documents they need to handle.

2. UPL in context

Most electronically authored documents are created with Microsoft Word, at least at some point during their history. But in contrast to the requirements of modern document pipelines and their underlying systems, Word documents are mainly graphically marked-up, and their logical structure (even though easily recognized by humans) is not directly accessible to computer systems.

In order to be used in modern systems, Word documents have to be enriched with structure. Doing this manually, or with the help of manually implemented systems, is time-consuming, expensive and error-prone.

infinity-loop’s well-known software upCast is widely used to bridge the gap between the different styles of document markup. upCast converts graphically marked-up Word documents into logically marked-up documents by analyzing the logical structure as it is perceived by humans.

As an off-the-shelf product, upCast comes with a set of parameters that control details of this "up-casting" process. But without some help from its users (who knows more about the documents to be handled than they do?), upCast cannot automatically mark up all the valuable information available in a document. This is why the conversion and structural enrichment process, which was monolithic in earlier versions, has been split into two phases in upCast 7.

The first phase in upCast is an importer that converts a Word document into an intermediate XML representation that contains both graphical and logical markup. In doing so upCast cleans up the source document and applies a comprehensive set of heuristics in order to extract all the standard XML constructs (like lists, tables, footnotes, …) available in Word documents automatically.

In the second phase, UPL comes into play. UPL is used to specify the additional complex transformations that need to be applied to the intermediate document tree structure and that upCast cannot perform automatically because it lacks knowledge about the specific type of document.

A UPL Tree Processing specification is declarative and consists of simple rules that apply certain specialized actions or transformations on the intermediate document tree when certain conditions match. In a way, you can think of UPL as XSLT working in-place with a set of highly specialized operators.

3. An example of UPL usage

To give you a first impression of a UPL specification, here's an example of a typical application:

Example 1.1. 

Let’s say we have determined that all paragraphs in which more than 85% of the text is (1) bold and (2) set in a font size between 16pt and 18pt are level 1 headings, even if they are not marked up with Word paragraph styles. To turn all paragraphs that match this informal description into headings, the following UPL code could be used:

[element(uci:par) 
  and %(@css:font-weight="bold" and @css:font-size >= 16pt and @css:font-size <= 18pt) > 0.85]
{
  set-heading-level(1);
}

Chapter 2. UPL components

UPL as a whole consists of two functionally different components: the UPL Core and the UPL Tree Processor.

1. UPL Core

UPL Core provides the core functionality: support for typed variables, flow control constructs, function definitions, execution of statement sequences, block building and expression evaluation. UPL Core is used in several places in upCast.

2. UPL Tree Processor

The UPL Tree Processor defines an algorithm as well as a controlling and selection mechanism to process the nodes of a document tree using UPL Core functionality. It uses a selector/actions metaphor for this, similar to the design of CSS. The UPL Tree Processor is implemented in upCast's UPL Tree Processor module.

Both parts are described in detail in the following two chapters.

Chapter 3. UPL Core (Language Reference)

1. Introduction

A UPL specification starts from a single primary (“initial”) input source (e.g. text typed in the GUI or a string set via the API). Unlike in all further (included) input sources, global variables and module parameters are resolved (textually replaced) in the primary input source before it is passed to the UPL processor.

The primary input source and any other input source may contain references to other input sources to be included. Any input source is an ordered sequence of raw characters in a specific encoding (e.g. ISO-8859-1 or UTF-8) that is initially converted into a sequence of Unicode characters. Next, so-called escape sequences occurring in the input are resolved and also translated to Unicode characters. Together, these two steps produce a sequence of input characters that are grouped into tokens (e.g. string literals). Tokens in turn are used to form the building blocks (e.g. a rule) of a UPL specification.

More formally, a UPL specification is processed by a UPL processor in six steps before it can be applied to a document:

Transliteration: Converts the given input characters from a particular character encoding into a sequence of Unicode characters. Subsequently, characters specified in one of the possible escape notations are resolved into the Unicode characters they denote.

Preprocessing: Included input sources are imported. Note that this happens before lexical analysis.

Lexical Analysis: Translates the sequence of Unicode characters into a sequence of tokens (e.g. keywords, identifiers).

Syntactic Analysis: Parses the sequence of tokens, performs syntax checking.

Semantic Analysis: Statically checks the meaning of a UPL specification to ensure it obeys the rules of the UPL language.

Code Generation: Translates the analyzed specification into executable code.

2. Lexical Structure

This section deals with the lexical structure of UPL specifications.

UPL specifications can be written in many different character encodings (e.g. ASCII, ISO-8859-1 or UTF-8), but a UPL processor converts them into Unicode before processing them further. This step is called lexical translation or transliteration.

Actually a UPL specification is nothing more than a sequence of raw characters stored in a specific encoding somewhere on a storage medium. The transliteration step converts sequences of raw characters into input characters by applying two lexical translations.

  • input characters are translated from their given encoding (e.g. ISO-8859-1) into Unicode characters.

  • so-called escape sequences are resolved and the resulting Unicode characters are substituted into the input sequence.

By default, UPL supports all input encodings that are supported by the Java implementation it is running on.

2.1. Escape Sequences

UPL provides two kinds of escape sequences that allow authors to refer to characters they cannot easily put into a UPL specification in specific situations: character escapes and unicode escapes.

First, character escapes cancel the meaning of special characters in UPL. Any character (except a hexadecimal digit) can be escaped with a backslash to remove its special meaning.

Example 3.1. 

For example,

"\"" 

is a string consisting of one double quote; the \ escapes the quotation mark.


Secondly, unicode escapes allow authors to refer to characters they cannot easily put into a UPL specification because the encoding they use (e.g. ISO-8859-1) does not provide the desired character (e.g. a Chinese character). A unicode escape consists of a backslash \ followed by a hexadecimal number of at most six hexadecimal digits (0-9, a-f, A-F), which stands for the Unicode character at that code point. If the character following the escape is itself a hexadecimal digit, the end of the hexadecimal number needs to be made clear. There are two ways to do that: with a white space character or by providing exactly six hexadecimal digits. These two methods may also be combined. Exactly one white space character is ignored after a hexadecimal escape.

Note

This means that "real" white space after the escape sequence must itself either be escaped or doubled.

Example 3.2. 

"\22"

is the same as

"\22 "

which is the same as

"\""

which is the same as

"\000022"

which is the same as

"\000022 "

and all examples identify a string consisting of one double quote character.

U+0022 is the Unicode codepoint for the double quotation mark.


Note

Unicode escapes are always considered to be part of an identifier or a string (i.e., "\7B" (the Unicode escape for the character {) is not punctuation, even though { is, and "\32" (the Unicode escape for the character 2) is allowed at the start of an identifier, even though 2 is not).

Note

The common escapes \n (newline), \r (carriage return), \f (form feed) and \t (tab) are not supported in UPL. If you want to use these characters, you must use Unicode escapes instead:

Common escape    Meaning                 UPL unicode escape (followed by a single space)
\n               LINE FEED               \a
\r               CARRIAGE RETURN         \d
\f               FORM FEED               \c
\t               CHARACTER TABULATION    \9
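The escape rules of this section can be modeled in a few lines. The following Python sketch is illustrative only and is not part of UPL: the name resolve_escapes is hypothetical, the set of white space characters is taken from section 2.4, and the sketch assumes that every escaped number is a valid Unicode code point.

```python
import re

# Either a backslash followed by 1-6 hexadecimal digits and at most one
# white space character (unicode escape), or a backslash followed by any
# other single character (character escape).
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\f\r\n]?|(.))", re.DOTALL)


def resolve_escapes(text):
    """Resolve UPL-style character and unicode escapes in text."""

    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is not None:
            # Unicode escape: the number names a code point.
            return chr(int(hex_digits, 16))
        # Character escape: the escaped character stands for itself.
        return match.group(2)

    return _ESCAPE.sub(replace, text)
```

Under these assumptions, the inputs \22, \22 followed by a space, \" and \000022 all resolve to a single double quote, and \a followed by a space resolves to a line feed.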

2.2. Line Terminators

The transliterated Unicode characters read from input sources are divided into lines by recognizing line terminators. This definition of lines determines the line numbers produced by the UPL processor e.g. in case of issued warning and error messages. It also specifies the termination of the end-of-line form of a comment.

There are three line terminators defined in UPL: the ASCII LF character (also known as line feed or newline, U+000A), the ASCII CR character (also known as carriage return or simply return, U+000D) and the combination of the ASCII CR character immediately followed by the ASCII LF character.

Thus, lines in UPL are terminated by the ASCII characters LF or CR or CR LF.

Note:

The two characters CR immediately followed by LF are counted as one line terminator, not two.
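As a rough illustration of how these rules determine line numbers, the following Python sketch splits text at the three line terminators. It is not part of UPL; the names split_lines and line_number_at are hypothetical.

```python
import re

# CR LF must be tried first so that it counts as one terminator, not two.
_LINE_TERMINATOR = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines at LF, CR or CR LF."""
    return _LINE_TERMINATOR.split(text)


def line_number_at(text, offset):
    """Return the 1-based line number of the character at offset."""
    return len(_LINE_TERMINATOR.findall(text, 0, offset)) + 1
```

For example, the text "a", CR, LF, "b", CR, "c" consists of three lines, and the character "c" is reported on line 3.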

2.3. Tokens

The input characters and line terminators that result from transliteration, escape processing and line recognition are scanned and produce a sequence of input elements. Those input elements that are not white space or comments are so-called tokens. Tokens are the terminal symbols that are used to describe how syntactically correct UPL specifications may be written.

White space and comments can serve to separate tokens that, if adjacent, might be tokenized in another manner.

2.4. White Space

White space is defined as the ASCII characters space (U+0020), horizontal tab (U+0009), form feed (U+000C), as well as the line terminators (U+000A, U+000D).

2.5. Comments

Like Java and other modern languages, UPL supports two types of comments: traditional comments and end-of-line comments. In both cases, the content of a comment is completely discarded by the UPL processor.

A traditional comment consists of all the text from the start-marker /* up to the end-marker */.

Note

Both markers and the text in between belong to the comment.

Note

In contrast to most other languages, UPL supports nested traditional comments. A traditional comment may therefore contain child comments, and the number of end-markers for traditional comments in a UPL specification must balance the number of start-markers.

Example 3.3. 

/* stop(); /* stop execution */ */

will be a valid UPL program with the comment spanning from the first /* to the last */ (in contrast to parsing this e.g. in Java, where the comment would end after the first */).

This comes in handy when you want to temporarily and quickly comment out a section of code that already contains multi-line style comments.


The second supported comment type is the end-of-line comment. It comprises all the text from the start-marker // up to the end of the line, designated by the next line terminator.

Note

The markers /* and */ have no special meaning in end-of-line comments and the marker // has no special meaning in traditional comments.

Note

As usual, comments are not analyzed within strings.
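The comment rules above (nesting, no cross-interpretation of markers, no comments inside strings) can be sketched as a small scanner. The Python below is an illustrative model, not the UPL implementation; the name strip_comments is hypothetical, and the handling of backslashes inside strings is a simplifying assumption.

```python
def strip_comments(source):
    """Remove nested /* ... */ comments and // comments from source."""
    out = []
    i, n = 0, len(source)
    depth = 0     # nesting depth of traditional comments
    quote = None  # delimiter of the string literal being scanned, if any
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if depth > 0:
            # Inside a traditional comment only /* and */ are meaningful.
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif quote is not None:
            # Inside a string nothing starts a comment.
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(source[i + 1])  # keep the escaped character
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "//":
            # Skip up to, but not including, the line terminator.
            while i < n and source[i] not in "\r\n":
                i += 1
        else:
            if ch in "\"'":
                quote = ch
            out.append(ch)
            i += 1
    if depth > 0:
        raise ValueError("unbalanced traditional comment")
    return "".join(out)
```

Applied to the text of Example 3.3, this sketch removes the whole input, because the comment only ends at the last */.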

2.6. Identifiers

A UPL identifier is made from the following characters, but has to start with a letter:

Letters: a-z, A-Z, and other alphabetic characters from other languages

Digits: 0-9

Specials: - (minus) and _ (underscore)

Colon: An identifier may also contain the colon (:) character, which is used to separate a namespace prefix part from the local name part e.g. when using an identifier to designate namespaced element nodes.

Additionally, UPL identifiers may contain character and unicode escape sequences.

No identifier can have the same name as a UPL keyword, a boolean literal or the null literal.

2.7. Keywords

The following character sequences are reserved for use as keywords and cannot be used as identifiers:

and
as
break
case
catch
continue
default
div
do
else
elseif
finally
for
for-each
function
goto
if
javaclass
method
mod
or
parameter
return
switch
then
try
variable
while

While true, false, void and null might appear to be keywords, they are technically literals.

Note

UPL is still something of a moving target. Therefore, please also avoid identifiers that have a specific meaning in well-known programming languages such as Java, C# or XSLT; one day UPL may have to adopt one of those languages’ keywords.

2.8. Literals

A literal is the source code representation of a value of one of the data types defined in UPL.

Boolean literals

The UPL type Bool has two values, represented by the literals true and false.

String literals

A string literal consists of zero or more characters enclosed in double or single quotes (both quotes, the starting and the ending one, for a string have to be of the same kind). Each character in the string may be represented by an escape sequence.

Numeric literals

In UPL, all values dealing with numbers (like the integers and floats known from other programming languages) are of the same type: Numeric. Even lengths with dimensions are of type Numeric.

A Numeric has the following parts: a whole-number part, a decimal point (represented by a period character), a fractional part and a dimension identifier. Not all parts are mandatory in a Numeric literal. A Numeric literal may consist of any combination of the whole-number part, the decimal point and the fractional part, as long as it contains at least one digit. A Numeric literal may, but does not have to, end with a dimension identifier.

The following dimension identifiers are defined in UPL: tw (twips), px (pixel), cm, mm, in, pt, pc.

A pt is 1/72 inch.

A px is 1/96 inch.

A cm is 1/2.54 inch.
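The literal syntax and the unit definitions above can be illustrated with a short Python sketch. It is not part of UPL: the names parse_numeric and convert are hypothetical, a leading sign is treated as a unary operator rather than as part of the literal, and the factors for tw, pc and mm are the customary typographic values (1/1440, 1/6 and 1/25.4 inch), which this section does not state.

```python
import re

# Units per inch. pt, px and cm follow the definitions in the text;
# tw, pc and mm are customary values and are assumptions here.
UNITS_PER_INCH = {
    "in": 1.0,
    "pt": 72.0,
    "px": 96.0,
    "cm": 2.54,
    "mm": 25.4,
    "pc": 6.0,
    "tw": 1440.0,
}

# At least one digit, an optional decimal point, an optional dimension.
_NUMERIC = re.compile(r"(\d+\.?\d*|\.\d+)(tw|px|cm|mm|in|pt|pc)?")


def parse_numeric(literal):
    """Split a Numeric literal into (value, dimension or None)."""
    match = _NUMERIC.fullmatch(literal)
    if match is None:
        raise ValueError("not a Numeric literal: %r" % literal)
    return float(match.group(1)), match.group(2)


def convert(value, from_unit, to_unit):
    """Convert a length between two dimension identifiers."""
    return value / UNITS_PER_INCH[from_unit] * UNITS_PER_INCH[to_unit]
```

With these factors, 2.54cm corresponds to 72pt and 1in to 1440tw.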

Color literals

The basic color literal construct is the well-known hexadecimal notation: a # immediately followed by either three or six hexadecimal digits. In a context where a Color is expected (e.g. in an assignment or a comparison), colors may also be identified by one of the Ids aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow (see http://www.w3.org/TR/CSS21/syndata.html#color-units).
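A minimal Python sketch of reading the hexadecimal notation follows. It is illustrative only: the name parse_color is hypothetical, and the expansion of the three-digit form by doubling each digit is an assumption borrowed from the CSS 2.1 rules referenced above, not something this section states.

```python
import re

_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def parse_color(literal):
    """Return the (red, green, blue) components of a color literal."""
    match = _COLOR.fullmatch(literal)
    if match is None:
        raise ValueError("not a color literal: %r" % literal)
    digits = match.group(1)
    if len(digits) == 3:
        # Assumption: #rgb is shorthand for #rrggbb, as in CSS 2.1.
        digits = "".join(digit * 2 for digit in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
```

Under that assumption, #f00 and #ff0000 denote the same color.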

List literals

A list literal is denoted by a pair of braces {} enclosing the elements of the list. The individual list elements are separated by commas (,).

Null literal

null is the literal for the value of type Null.

Void literal

void is the literal for the one single value of type Void.

2.9. Separators

The following characters are separators (punctuators) known in UPL:

(  )  {  }  [  ]  ;  ,  :

2.10. Operators

The following tokens are the operators known in UPL (ordered by precedence as specified from highest at the top to lowest at the bottom):


Operator syntax    Description                 Precedence
++                 increment                   7 (highest)
--                 decrement                   7
+                  unary plus                  7
-                  unary minus                 7
!                  not                         7
*                  multiplication              6
div                division                    6
mod                modulo                      6
+                  addition; concatenation     5
-                  subtraction                 5
<                  less than                   4
<=                 less than or equal to       4
>                  greater than                4
>=                 greater than or equal to    4
=                  equals                      3
!=                 is not equal to             3
or                 conditional or              2
and                conditional and             2
xor                logical exclusive or        2
:=                 assignment                  1 (lowest)

Operators && and ||

The && and || operators perform conditional AND and conditional OR operations on two boolean expressions. These operators exhibit "short-circuiting" behavior, which means that the second operand is evaluated only if needed.

Operator --

Note that the operator '--' must be separated from the variable name by whitespace. This requirement follows from the fact that variable names may contain the '-' character.

So, instead of

$i--; /* wrong! */

you need to write

$i --; /* correct */
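The reason for this rule becomes visible when a variable reference is scanned greedily. The Python sketch below is illustrative only: the name variable_name is hypothetical, the $ prefix is taken from the examples in this document, and the pattern is restricted to ASCII letters for brevity.

```python
import re

# A variable reference: $ followed by an identifier. Identifiers start
# with a letter and may continue with letters, digits, '-', '_' and ':'.
_VARIABLE = re.compile(r"\$([A-Za-z][A-Za-z0-9_:\-]*)")


def variable_name(source):
    """Return the variable name at the start of source, or None."""
    match = _VARIABLE.match(source)
    return match.group(1) if match else None
```

Scanning $i--; yields the name i--, so no decrement operator is left over, whereas scanning $i --; yields the name i.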

3. Types

UPL is not a strongly typed language, which means that not all types of UPL language components can be determined at compile time. Types limit the values that a variable or a parameter can hold or that an expression can produce, limit the operations supported on those values and determine the meaning of those operations.

There are seven data types defined in UPL: Bool, Color, Id, List, Null, Numeric and String. All types are derived from the type Value that is the common superclass for all other classes.

3.1. Bool

The Bool type represents a logical quantity with two possible values, indicated by the literals true and false. Besides the literals the additional constructors true() and false() are available.

The cast function to-bool() may be used to convert values of other types to Bool.

3.2. Color

The data type Color is used to represent colors in UPL. Color values may be written in the well-known hexadecimal notation: a # immediately followed by either three or six hexadecimal digits. Colors may also be identified by one of the Ids aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow (see http://www.w3.org/TR/CSS21/syndata.html#color-units).

The cast function to-color() may be used to convert values of selected other types to Color.

3.3. Id

The data type Id represents either named values used as parameters for functions (listed in the respective function's documentation), or element or attribute names. An Id may therefore carry a namespace, expressed through its declared namespace prefix. The syntax for an Id is:

(nsprefix ':')? name

Both the nsprefix and name components of an Id must be UPL identifiers, with the exception that nsprefix must not contain a colon (':').

The cast function to-id() may be used to convert values of selected other classes to an Id.

3.4. List

The data type List is used to represent an ordered sequence of values of any type (including List itself). The syntax for a constant List is:

'{' value? (',' value)* '}'

The empty list is therefore created by writing {}.

value can be a constant value represented by a literal, a variable reference or any other expression yielding a value result. Values are inserted into the list with their respective type. A list can be heterogeneous, i.e. it can contain elements of different types at the same time. A UPL List is therefore similar to Java's ArrayList class (on which it is also based internally).

The cast function to-list() may be used to convert values of other classes to a List.

Example 3.4. 

{}

creates an empty list

{ 1, 2, 5.5 }

creates a list with the three Numeric values 1, 2 and 5.5

{ "Error: ", $code }

creates a list of the String value "Error: " and the contents of the variable $code at the time of List construction

{ { 1, 1 }, { 2, 4 }, {3, 9 } }

creates a List of two-element lists which contain two Numeric values each

{ { 1, square(1) }, { 2, square(2) }, { 3, square(3) } }

creates the same list as in the previous example, assuming the square() function is defined to calculate the squared value of its argument


3.5. Null

The data type Null is used to represent exceptional values or non-existing values.

Example 3.5. 

@color

either returns the contents of the XML attribute color on the context node as a String, or a value of type Null when the context node does not have such an attribute.

Note that just returning an empty string is not a valid alternative solution in the latter case as it would not allow you to distinguish the situation where the attribute is present, but has an (allowed) value of the empty string, from the situation where the attribute is not present at all.


3.6. Numeric

The data type Numeric is used to represent dimensionless or dimensioned numbers, e.g. 42, 3.1415, 2.54cm, -360tw.

They are either the result of function calls or can be created using Numeric literals.

The cast function to-numeric() may be used to convert values of selected other classes to a Numeric.

3.7. String

The data type String is used to represent strings in UPL. As usual, string literals are characters enclosed in double or single quotes.

The following operators are defined for the data type String: = (equals), != (not-equals), < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), + (addition, string concatenation), * (multiplication, concatenates the string n times).

Note

The + operator in UPL is overloaded as it is in Java. As a convenience you may simply concatenate strings by using the + operator. Additionally, if the left-hand operand for + is a string, any other data type on the right hand side of the + operator will automatically be cast to String and a string concatenation will be carried out.

Example 3.6. 

If you write the following in UPL:

"number:" + 10 + 1

you will get

"number:101"

which is a string concatenation of "number:", "10" and "1".

If you want to carry out a mathematical addition of 10 and 1 before appending the result to the string, you have to put the expression 10 + 1 in parentheses and write

"number:" + (10 + 1)

The cast function to-string() may be used to convert values of other classes to a String.
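The left-to-right behavior of + in Example 3.6 can be modeled as follows. This Python sketch is illustrative and covers only String, Numeric and Null operands; the names upl_add and to_string are hypothetical stand-ins, the textual form of numbers is assumed to match Python's, and the rejection of other combinations follows the matrix in section 4.7.

```python
class EvalException(Exception):
    """Stand-in for the error raised by an undefined operation."""


def to_string(value):
    """Model of the cast to String for the types covered here."""
    if value is None:
        return "null"
    return str(value)


def _is_numeric(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def upl_add(left, right):
    """Model of the + operator for String, Numeric and Null operands."""
    if isinstance(left, str):
        # A String on the left turns + into concatenation.
        return left + to_string(right)
    if _is_numeric(left) and _is_numeric(right):
        return left + right
    raise EvalException("+ is not defined for these operand types")


# "number:" + 10 + 1 is evaluated from left to right.
unparenthesized = upl_add(upl_add("number:", 10), 1)
# "number:" + (10 + 1) performs the addition first.
parenthesized = upl_add("number:", upl_add(10, 1))
```

Here unparenthesized is "number:101" and parenthesized is "number:11".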

3.8. Void

The data type Void is used to declare functions that do not return a value.

Example 3.7. 

function hello-world() as Void {
  print( "Hello, world!" );
}

hello-world() is defined to not return a value and therefore is not required to include a return statement.


4. Operator/Type Matrices

This section lists the possible combinations of operand types and operators. Each table is to be read in the form A ○ B, where:

○: the operator given in the upper-left table corner

A: the type of the first operand (left column)

B: the type of the second operand (top row)

✕: the operation is not defined and will throw an EvalException

4.1. ++ (increment), -- (decrement)

++, --     Result
Bool       ✕
Color      ✕
Id         ✕
List       ✕
Null       ✕
Numeric    ++: A+1   --: A-1
String     ✕

4.2. + (unary plus), - (unary minus)

+, -       Result
Bool       ✕
Color      ✕
Id         ✕
List       ✕
Null       ✕
Numeric    +: A   -: (-A)
String     ✕

4.3. ! (not)

!          Result
Bool       !A
Color      !to-bool(A)
Id         !to-bool(A)
List       !to-bool(A)
Null       !to-bool(A)
Numeric    !to-bool(A)
String     !to-bool(A)

4.4. * (multiplication), div (division), mod (modulo)

*, div, mod   Bool   Color   Id   List   Null   Numeric   String
Bool          ✕      ✕       ✕    ✕      ✕      ✕         ✕
Color         ✕      ✕       ✕    ✕      ✕      ✕         ✕
Id            ✕      ✕       ✕    ✕      ✕      ✕         ✕
List          ✕      ✕       ✕    ✕      ✕      ✕         ✕
Null          ✕      ✕       ✕    ✕      ✕      ✕         ✕
Numeric       ✕      ✕       ✕    ✕      ✕      A○B       ✕
String        ✕      ✕       ✕    ✕      ✕      ✕         ✕

4.5. = (equals)

=         Bool    Color           Id              List         Null    Numeric   String
Bool      A=B     ✕               ✕               ✕            false   ✕         ✕
Color     ✕       (*)             A=to-color(B)   ✕            false   ✕         ✕
Id        ✕       to-color(A)=B   A=B             ✕            false   ✕         ✕
List      ✕       ✕               ✕               ∀i(Ai=Bi)    false   ✕         ✕
Null      false   false           false           false        false   false     false
Numeric   ✕       ✕               ✕               ✕            false   A=B       ✕
String    ✕       ✕               ✕               ✕            false   ✕         A=B

(*) Ar=Br & Ag=Bg & Ab=Bb & Aa=Ba

4.6. != (is not equal to)

!=        Bool    Color            Id               List          Null   Numeric   String
Bool      A!=B    ✕                ✕                ✕             true   ✕         ✕
Color     ✕       (*)              A!=to-color(B)   ✕             true   ✕         ✕
Id        ✕       to-color(A)!=B   A!=B             ✕             true   ✕         ✕
List      ✕       ✕                ✕                ∃i(Ai!=Bi)    true   ✕         ✕
Null      true    true             true             true          true   true      true
Numeric   ✕       ✕                ✕                ✕             true   A!=B      ✕
String    ✕       ✕                ✕                ✕             true   ✕         A!=B

(*) Ar!=Br or Ag!=Bg or Ab!=Bb or Aa!=Ba

!= is !(=)

You can think of the values in the above table as being derived by calculating !(A=B), i.e. as the negated results of the = operator as described in section 4.5.

4.7. + (addition/concatenation)

+         Bool             Color            Id               List             Null       Numeric          String
Bool      ✕                ✕                ✕                ✕                ✕          ✕                ✕
Color     ✕                ✕                ✕                ✕                ✕          ✕                ✕
Id        ✕                ✕                ✕                ✕                ✕          ✕                ✕
List      A⊕B              A⊕B              A⊕B              A⊕B (*)          A⊕B        A⊕B              A⊕B
Null      ✕                ✕                ✕                ✕                ✕          ✕                ✕
Numeric   ✕                ✕                ✕                ✕                ✕          A+B              ✕
String    A⊕to-string(B)   A⊕to-string(B)   A⊕to-string(B)   A⊕to-string(B)   A⊕"null"   A⊕to-string(B)   A⊕B

(*) element by element

+ : addition    ⊕: concatenation

4.8. - (subtraction)

-         Bool   Color   Id   List   Null   Numeric   String
Bool      ✕      ✕       ✕    ✕      ✕      ✕         ✕
Color     ✕      ✕       ✕    ✕      ✕      ✕         ✕
Id        ✕      ✕       ✕    ✕      ✕      ✕         ✕
List      ✕      ✕       ✕    ✕      ✕      ✕         ✕
Null      ✕      ✕       ✕    ✕      ✕      ✕         ✕
Numeric   ✕      ✕       ✕    ✕      ✕      A-B       ✕
String    ✕      ✕       ✕    ✕      ✕      ✕         ✕

4.9. < (less-than), > (greater-than)

<, >      Bool   Color   Id   List   Null   Numeric   String
Bool      A○B    ✕       ✕    ✕      ✕      ✕         ✕
Color     ✕      ✕       ✕    ✕      ✕      ✕         ✕
Id        ✕      ✕       ✕    ✕      ✕      ✕         ✕
List      ✕      ✕       ✕    ✕      ✕      ✕         ✕
Null      ✕      ✕       ✕    ✕      ✕      ✕         ✕
Numeric   ✕      ✕       ✕    ✕      ✕      A○B       ✕
String    ✕      ✕       ✕    ✕      ✕      ✕         comp(A,B)○0

4.10. <= (less-than or equal), >= (greater-than or equal)

The operators <= and >= are calculated as follows, according to the matrices above:

A<=B ::= (A=B) or (A<B)

where an exception during evaluation of (A=B) is treated as false.

A>=B ::= (A=B) or (A>B)

where an exception during evaluation of (A=B) is treated as false.

The evaluation is succeed-fast, i.e. if (A=B) is true, the second operand of the or-expression is not evaluated.
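The definition of <= and >= above can be expressed directly in code. The following Python sketch is a model, not the UPL implementation: the comparison callables equals, less_than and greater_than and the EvalException class are hypothetical stand-ins for the operator matrices of the preceding sections.

```python
class EvalException(Exception):
    """Stand-in for the error raised by an undefined operation."""


def _equals_or_false(a, b, equals):
    # An exception during the evaluation of (A=B) is treated as false.
    try:
        return bool(equals(a, b))
    except EvalException:
        return False


def less_or_equal(a, b, equals, less_than):
    """A<=B ::= (A=B) or (A<B), evaluated succeed-fast."""
    if _equals_or_false(a, b, equals):
        return True  # (A<B) is not evaluated at all
    return bool(less_than(a, b))


def greater_or_equal(a, b, equals, greater_than):
    """A>=B ::= (A=B) or (A>B), evaluated succeed-fast."""
    if _equals_or_false(a, b, equals):
        return True  # (A>B) is not evaluated at all
    return bool(greater_than(a, b))
```

Note that only an exception from the equality test is swallowed; in this sketch, an exception raised by the second comparison still propagates to the caller.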

4.11. or (logical), and (logical), xor

or, and, xor   Value
Value          to-bool(A)○to-bool(B)

The operators or and and evaluate their operands from left to right, and it is guaranteed that only as many operands are evaluated as are necessary to determine the final result.

For xor, both operands are always evaluated, and the order of evaluation is not defined.
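
Because and stops as soon as the result is known, the left operand can guard the right one. In the following sketch (the variable name and value are made up), to-numeric() is never called for the empty string, which presumably could not be parsed as a number and would otherwise raise a TypeConversionException.

```upl
variable $s as String := "";
if( $s != "" and to-numeric( $s ) > 0 )
{
  print( "positive" );
}
```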

4.12. := (assignment)

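:=      | Bool | Color | Id             | List | Null | Numeric | String
--------+------+-------+----------------+------+------+---------+-------
Bool    | A:=B |       |                |      |      |         |
Color   |      | A:=B  | A:=to-color(B) |      |      |         |
Id      |      |       | A:=B           |      |      |         |
List    |      |       |                | A:=B |      |         |
Null    |      |       |                |      | A:=B |         |
Numeric |      |       |                |      |      | A:=B    |
String  |      |       |                |      |      |         | A:=B

Rows: type of A; columns: type of B.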

5. Blocks

A block groups a sequence of statements. A block must begin with a left brace { and end with a right brace }.

Example 3.8. A Block

{
   /* some statements here... */
}

Blocks are particularly useful in combination with flow control statements because they allow you to execute a group of statements rather than just a single statement. Additionally, they allow you to define the scope in which variables are visible. Blocks that delimit a scope are also used when defining custom loggers (see start-logger(), stop-logger()).
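
The scoping effect can be sketched as follows. The variable names are made up, and the sketch assumes that a block may be used as a statement of its own, as the description above suggests.

```upl
variable $total as Numeric := 10;
{
  variable $step as Numeric := 2;   /* local: visible only inside this block */
  $total := $total + $step;
}
print( $total );                    /* $step is out of scope here */
```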

6. Flow control

The statements inside your UPL source files are generally executed from top to bottom, in the order that they appear. Flow control statements, however, break up the flow of execution by employing decision making, looping, and branching, enabling your program to conditionally execute particular blocks of code. This section describes the decision-making statements (if, if … else), the looping statements (for, for-each, while, do … while), and the branching statements (break, return).

6.1. if

The if statement is the most basic of all the control flow statements. It tells your program to execute a certain section of code only if a particular test evaluates to true.

Example 3.9. 

if($number = 1) 
{
  print("unicycle");
}

Note

Please note that in contrast to other programming languages like Java or C, the opening and closing braces are required.

6.2. if … else/elseif

The if … else statement provides a secondary path of execution when the test evaluates to false.

Example 3.10. 

if($number = 1) 
{
  print("unicycle");
} else {
  print("bike");
}

There is also a variant of the if … else statement that selects among more than two sections of code, using the elseif keyword:

Example 3.11. 

if($number = 1) 
{
  print("unicycle"); 
} elseif($number = 2)
{ 
  print("bike"); 
} else { 
  print("trike"); 
}

Note

Please note that in contrast to other programming languages like Java or C, the opening and closing braces are required.

6.3. while

The while statement continually executes a block of statements while a particular condition is true.

Example 3.12. 

while( $number > 0 ) 
{ 
  $number := $number - 1; 
}

Tip

You can implement an infinite loop using the while statement as follows:

while( true() ) {
  /* place your code here */ 
}

6.4. do … while

UPL also includes a do … while statement.

Example 3.13. 

do { 
  $number := $number - 1; 
} while( $number > 0 );

The difference between do … while and while is that do … while evaluates its expression at the bottom of the loop instead of the top. Therefore, the statements within the do block are always executed at least once.

6.5. for

The for statement provides a compact way to iterate over a range of values. Programmers often refer to it as the "for loop" because of the way in which it repeatedly loops until a particular condition is satisfied. The general form of the for statement can be expressed as follows:

for (initialization; termination; post-iteration) {
  /* your code goes here */
}

When using this version of the for statement, keep in mind that:

  • The initialization expression initializes the loop; it's executed once, as the loop begins.

  • When the termination expression evaluates to false, the loop terminates.

  • The post-iteration expression is invoked after each iteration through the loop; it is perfectly acceptable for this expression to increment or decrement a value, or to perform any other statement.

Example 3.14. 

variable $number as Numeric;
for( $number := 0 ; $number < 5 ; $number++ ) 
{ 
  /* place your code here */ 
}

Note that the variable $number must be defined outside the for statement; you cannot define it in the initialization expression.


The three expressions of the for loop are optional. An infinite loop can be created as follows:

for ( ; ; ) {
  /* your code goes here */ 
}

6.6. for-each

The for-each statement provides a compact way to iterate over all the items contained in a list.

Example 3.15. 

variable $number as Numeric;
for-each( $number in {1,4,5} )
{
  print($number); 
}

The variable $number must already be defined outside the for-each statement. It is set in turn to each of the values contained in the list, in element order.


6.7. break

A break statement terminates the innermost while, do … while, for or for-each statement.

Example 3.16. 

for-each( $number in {1,4,5} ) 
{
  if($number = 4) 
  { break; }
}

6.8. return

The return statement exits from the current function or method and control flow returns to where the function/method was invoked. The return statement has two forms: one that returns a value, and one that doesn't. To return a value, simply put the value (or an expression that calculates the value) after the return keyword.

Example 3.17. 

return $number;

The data type of the returned value must match the type of the function's declared return value. For methods, use the form of return that doesn't return a value:

return;

7. Exceptions and exception handling

UPL uses exceptions to handle errors and other exceptional events. When an error occurs during the execution of a statement, UPL throws an exception. This means that the normal flow of the program is interrupted and that the UPL processor attempts to find the innermost exception handling block that declares it can handle the type of exception (error) that occurred. The exception handler can attempt to recover from the error or, if it determines that the error is unrecoverable, provide a gentle exit from the program.

Three statements play a part in handling exceptions:

  • The try keyword identifies a block of statements within which an exception might be thrown.

  • The catch keyword must be associated with a try-block and identifies a block of statements that can handle a particular type of exception. The statements within this block are executed if an exception of a particular type occurs within the try-block.

  • The finally keyword must be associated with a try-block and identifies a block of statements that are executed regardless of whether or not an exception occurred within the try-block.

The general form of the try{} … catch(){} statement can be expressed as follows:

try {
  /* statements where an exception might be thrown */
} catch ($e as ExceptionType) {
  /* statements that handle an exception
     $e is opaque, do not use it directly! Use to-string($e) to get a textual
     representation of the exception. */
} finally {
  /* statements executed regardless of whether or not an error occurred */
}

UPL lets you specify multiple catch statements for a single try statement. Each catch statement specifies the exception type it handles. Details about the exception can be queried through the variable bound in the catch statement.

The following exception types are currently defined and used:

Exception

This type is the super-type of all the following, more specific types. You should therefore put the catch statement for Exception last after all more specific catch statements.

IOException

This exception signals an error during file IO operations. Functions that might throw this exception: write(), writeln()

EvalException

This exception indicates a dynamic error during UPL program execution. Details can be retrieved from the exception's message component, available through the to-string() function evaluated on the exception variable.

UserDefinedException

This exception is currently never thrown by the UPL implementation. You should use this type with the throw() function.

TypeConversionException

This exception is thrown when a value of a specific type cannot be cast to a different type. It can occur in explicit casting functions (like to-color(), to-numeric(), …) as well as in implicit casting operations.
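
The following sketch combines several catch statements, with the most specific type first and Exception last. The variable name and the color string are made up; the string is assumed not to be a valid color, so to-color() should throw a TypeConversionException.

```upl
variable $c as Color;
try {
  $c := to-color( "no-such-color" );
} catch ($e as TypeConversionException) {
  print( "cast failed: " + to-string($e) );
} catch ($e as Exception) {
  print( "other error: " + to-string($e) );
} finally {
  print( "done" );
}
```

If the catch statement for Exception came first, it would also match the TypeConversionException, because Exception is the super-type of all other exception types.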

8. Expressions

An expression consists of one or more operands and zero or more operators linked together to compute a value. Operands and operators can be grouped for evaluation using parentheses ( and ). Essentially, precedence, associativity and order of evaluation are the same as in Java (and therefore as in most programming languages).

9. Variables

UPL allows you to define variables like in Java or C. Variables defined outside of any block are called global variables and can be accessed from anywhere within the program. Variables defined within a block are local to it, i.e. their scope is the block they were defined in.

Variable names are constructed as follows:

'$' identifier

To clearly differentiate variables from other, mostly constant, identifier usage, variable names always start with the dollar sign.

9.1. Definition

A variable is defined as follows:

'variable' varname 'as' type ( ':=' init-expression )? ';'

For varname, you can use any valid variable name, e.g. $counter or $i.

For type, any UPL type (except Void) is allowed, including the generic Value type which allows storing any of the specific types in the variable.

For the optional initialization part, init-expression can be any expression that evaluates to a value whose type is the same as the declared type for the variable.

Example 3.18. 

To declare a variable $s to hold a string and initialize it with the empty string, you'd write:

variable $s as String := "";

9.2. Reference

To refer to the contents of a variable, you simply write the variable's name, e.g.:

print( $s );

9.3. Assignment

To assign a new value to a variable, you use

varname ':=' expression ';'

Example 3.19. 

$s := "Hello world!";

would assign the string value "Hello world!" to the variable $s.


10. Parameters

Sometimes, you'd like to pass initial values, so-called parameters, to a UPL program from the outside. In UPL, this works much like <xsl:param> in XSLT. Parameters are very similar to variables; the only difference is that they can take their initial values from outside the UPL execution environment by way of an implementation-defined mechanism.

Once a parameter has been defined, it cannot be assigned a new value.

10.1. Definition

A parameter is defined as follows:

'parameter' parname 'as' type ( 'default' default-expression )? ';'

For parname, you can use any valid variable name, e.g. $mode or $p.

For type, any UPL type (except Void) is allowed, including the generic Value type which allows storing any of the specific types in the parameter. It is up to the implementation how parameter values of a specific type are created.

For the optional default part, default-expression can be any expression that evaluates to a value whose type is the same as the declared type for the parameter. Note that the default-expression is only relevant (and used) in the case where the surrounding environment does not provide a value for that specific parameter at all. In all other cases, the supplied value is used instead of the default-expression's value.

Important

Parameters can only be defined at the top level of a UPL program, outside of any block, and before any rule, function or method definition.
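
As a sketch, two parameter definitions with default values could look like this. The parameter names, the default values and the function are made up for illustration; the parameters are placed before the function definition, as required above.

```upl
parameter $mode as String default "draft";
parameter $maxLevel as Numeric default 3;

function is-draft() as Bool {
  return $mode = "draft";
}
```

If the surrounding environment supplies a value for $mode, that value is used and the default "draft" is ignored.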

10.2. Reference

To refer to the contents of a parameter, you simply write the parameter's name, e.g.:

print( $mode );

11. Functions

UPL allows you to define custom functions. Functions do not need to be declared before being used. Their definition, however, must always be part of the executed UPL program, though it does not matter whether they have been #included from another file or are part of the main UPL program unit.

11.1. Defining a function

A function definition takes the following form:

'function' funcname '(' ( param 'as' type (',' param 'as' type )* )? ')' 'as' restype '{' body '}'

with

funcname

the name of the function, which must be a UPL identifier

param

the name of a formal parameter, which must be a UPL variable name

type

the required type for the formal parameter param, which must be one of the UPL types Bool, Id, List, Numeric, String. Note that the generic type Value is not an allowed type for a function parameter.

restype

the type of the result of the function, which must be one of the UPL types Bool, Id, List, Numeric, String, Value (generic type for returning any of the five specific types) and the special type Void (signalling that the function does not return a value at all).

body

a sequence of statements and/or variable definitions; each code path must end with a return statement returning a value of the specified restype, unless the function is defined with a return type of Void, in which case you must either use return without any value or not use return at all.

Example 3.20. Function definition

function hello( $name as String ) as String {
  variable $result as String := "Hello ";
  if( $name = "" ) {
    $result := $result + "you!";
  } else {
    $result := $result + $name + "!";
  }
  return $result;
}

defines a function returning a value.

function greeting( $name as String ) as Void {
  print( "Hello " + $name );
}

defines a function without a return value.


11.2. Calling a function

To call a built-in or user-defined function, use:

name '(' ( param (',' param)* )? ')'

Example 3.21. Calling a function

greeting( "Christian" );        /* prints "Hello Christian" on the console */
log( INFO, hello( "Steven" ) ); /* writes "Hello Steven!" to the logfile */
print( hello( "" ) );           /* prints "Hello you!" on the console */
$res := hello( 5 );             /* throws a TypeConversionException: parameter type does not match */

Function definitions are looked up based on the actual types of parameters used in the call at the time of the call. As a consequence, when the actual types do not exactly match one of the available function definitions, UPL throws a TypeConversionException (which can be caught in a try … catch construct).

You can overload functions of the same name by specifying different signatures.

You cannot overload a function on its return type only.
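
Overloading can be sketched with two definitions that differ in their parameter type. The function name describe is made up for illustration.

```upl
function describe( $v as String ) as String {
  return "string: " + $v;
}

function describe( $v as Numeric ) as String {
  return "number: " + to-string( $v );
}

print( describe( "abc" ) );  /* selects the String variant */
print( describe( 5 ) );      /* selects the Numeric variant */
```

A further definition such as function describe( $v as String ) as Numeric would not be allowed, since it differs from the first one only in its return type.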

12. Java Function Bindings

UPL allows you to define function bindings to static Java functions. The defining Java classes need to be on the Java classpath at function execution time.

The mechanism used is similar to the one XSLT processors use for extension functions: The function identifier is a namespaced Id, with the namespace name designating the implementing Java class, and the local name being the name of the static function to call in that class.

12.1. Defining a function binding

A Java function binding takes the following form:

'javafunction' prefix:funcname '(' ( param 'as' type (',' param 'as' type )* )? ')' 'as' restype ';'

with

prefix

the namespace prefix bound to the namespace name that identifies the Java class whose static member function is being declared. The namespace name must have the form java:Fully.Qualified.Classname

funcname

the name of the Java method in the class designated by the namespace name bound to prefix

param

the name of a formal parameter, which must be a UPL variable name

type

the required type for the formal parameter param, which must be one of the UPL types Bool, Id, List, Numeric, String, Value.

restype

the type of the result of the function, which must be one of the UPL types Bool, Id, List, Numeric, String, Value (generic type for returning any of the five specific types) and the special type Void (signalling that the function does not return a value at all).

12.2. Calling a bound Java function

To call a bound Java function, use:

prefix:funcname '(' ( param (',' param)* )? ')'

Example 3.22. Java function binding example

Suppose we wanted to write a custom Java class that reverses the order of characters of a string. The Java code could look like the following:

package com.example;
import de.infinityloop.upcast.upl.val.UPLString;

public class StringReverser {
  public static UPLString reverse( UPLString source ) {
    return new UPLString( new StringBuffer( source.getAsString() ).reverse().toString() );
  }
}

To create the function binding to that Java function in UPL, use the following code:

#namespace sr "java:com.example.StringReverser";
javafunction sr:reverse( $source as String ) as String;

And to finally call that function somewhere, you could use code like the following:

...
println( sr:reverse("never odd or even") ); // a palindrome :-)
...

which should print the following to the system console:

neve ro ddo reven

13. Directives

UPL supports a small number of directives that either allow setting general execution options for the program at hand or operations like including external source files or defining a namespace binding to a prefix.

Directives always start with a hash mark character.

13.1. #charset

This directive allows you to declare the encoding (or character set) used for the following UPL program (including the #charset directive). The use of the #charset directive has the same requirements as the @charset rule defined in CSS2.1. We therefore just include the relevant parts of section 4.4 of the Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification by reference, the only difference being that in UPL, the directive is #charset (with a hash at the beginning) instead of @charset.
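
Assuming the directive mirrors the CSS 2.1 form @charset "UTF-8"; exactly, apart from the leading hash, a UPL file encoded in UTF-8 would start like this. This is a sketch based on that assumption only, not a confirmed syntax.

```upl
#charset "UTF-8";
/* the rest of the UPL program follows */
```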

13.2. #set

The generic #set directive allows setting options for processing the UPL program at hand in a certain way. The syntax for the #set directive is as follows:

'#set' option ':=' value ';'

with:

option

the name of an option as listed below

value

the value constant to set the option to

The following options are available (default value when not specified printed in bold):

Each entry below gives the option name (an Id), its value range, and a description.

singleStep

Value range: true | false

When true, UPL program execution is stopped after each statement, and user interaction is required to continue execution. This is intended for debugging.

defaultRuleMode

Value range: "break" | "continue" | "exit" | "label:labelname"

Sets the default rule mode to use for tree traversal using this file. This can be overridden in rules using set-rulemode(), and the current value can be queried using get-rulemode().

traceRuleApplication

Value range: true | false

When true, the execution of all selected rules is logged to the logging system. This is intended for debugging only, since it can generate an enormous amount of data.

leaveEvents

Value range: true | false

When true, the rule selection and execution algorithm is performed twice per node: once at entering time (i.e. before processing the node's children) and once at leaving time (after the node's children have been processed). You can check for the current execution mode using the entering() and leaving() functions.

IDAttributes

Value range: '"' (((elemqname|'*') '/' )? '@' attqname ' ')* '"'

A string of whitespace-separated patterns identifying attributes that need to be treated as having the type ID. This ensures that in element splitting operations (as can happen in e.g. markup-regex() or mark-split()), the cloned element does not get a copy of this attribute, which would violate its ID status.

Example 3.23. 

#set IDAttributes := "figure/@figid @elemid";

would identify both the figid attribute on figure elements and the elemid attribute on all elements as attributes of type ID.


The default value is "".

Whenever the option is set, it replaces any previous setting.

Any attribute that has a specified type of ID (e.g. due to information gathered by the parser from a DTD) will be treated as such, regardless of whether it is listed or not.
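
As a sketch of how an option interacts with rules, the following program switches on leaveEvents and uses entering() to tell the two visits of a node apart. The rule and its messages are made up, and entering() is assumed to return a Bool.

```upl
#set leaveEvents := true;

[element(heading)]
{
  if( entering() )
  {
    print( "entering heading" );
  } else {
    print( "leaving heading" );
  }
}
```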

13.3. #include

The #include directive allows including UPL code verbatim from external files. The result of an #include directive is exactly the same as if the directive itself were replaced by the contents of the included file.

The syntax of the #include directive is as follows:

'#include(' 'source:' srcfile ('encoding:' encodingname)? ('encrypted:' cryptkey)? ');'

with

srcfile

the source file specification of the file to include. This may be either an absolute path to the file, in local file system convention or URL format, or a relative path. If the path is relative, it is resolved against the current UPL file's base URI. The base URI of a top-level UPL file (as specified e.g. in an upCast Module) is the base URI of the pipeline document containing that module or that top-level UPL code.

encodingname

the (optional) Java name of the encoding of the file to be included. This value is a fallback for those cases where the encoding cannot be detected automatically from a BOM at the beginning of the file and the file does not include a #charset directive at its very beginning.

encrypted

this (optional) parameter indicates that the file to be read has been encrypted, and specifies its decryption key. When the key string is empty or less than 30 characters long, an upCast-internal decryption key known only to infinity-loop is used. If the key is 30 characters long, it is used as the decryption key for the file to be included.

Example 3.24. 

#include( source:"file:///C:/functions.upl" encoding:"UTF-8" );

reads the file C:\functions.upl with UTF-8 encoding (unless the file specifies a different encoding itself) and pastes it at the current location in the including file.


13.4. #namespace

To be able to use namespace prefixes in identifiers, you need to bind them to their respective namespace name. This is accomplished using the #namespace directive, which declares namespaces and their prefixes for use in UPL:

'#namespace' nsprefix '"' nsname '";'

with

nsprefix

the namespace prefix you want to use in identifiers to refer to the declared namespace. This must be an identifier, except that it may not contain the colon ':' character.

nsname

the namespace name, typically a URI, to bind the namespace prefix to

Common namespace declarations

To access properties in the internal tree created by the RTF Importer module, we recommend defining the following namespace declarations at the top of your UPL code for convenience:

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";
#namespace css "http://www.infinity-loop.de/namespace/2006/upcast-css";
#namespace cssc "http://www.infinity-loop.de/namespace/2006/upcast-cssclass";
#namespace csso "http://www.infinity-loop.de/namespace/2006/upcast-cssoverride";

To access commonly used realm variables easily, the following namespace declarations are recommended:

#namespace application "http://www.infinity-loop.de/namespace/upcast-realm/application";
#namespace environment "http://www.infinity-loop.de/namespace/upcast-realm/environment";
#namespace pipeline "http://www.infinity-loop.de/namespace/upcast-realm/pipeline";
#namespace module "http://www.infinity-loop.de/namespace/upcast-realm/module";
#namespace javaproperty "http://www.infinity-loop.de/namespace/upcast-realm/javaproperty";

Chapter 4. UPL Tree Processor

The UPL Tree Processor is a language extension to UPL Core. It is designed to run defined actions for a node in upCast's internal document tree when a condition for that node matches. In this sense, it is very similar to CSS' notion of selectors and associated declaration blocks.

This module allows you to specify actions to be taken in a declarative way.

1. Building blocks

A UPL program to be run by the UPL Tree Processor consists of several building blocks.

1.1. UPL Core constructs

A UPL Tree Processor program can contain any constructs of UPL Core.

1.2. Rules

A UPL Tree Processor program typically also contains rules. Rules are a pair consisting of a selector and an action block. The selector specifies a condition the context node must satisfy before the actions specified in the action block are applied to it.

The syntax for a rule is

(label ':')? selectorlist actionblock

where selectorlist is a list of selectors separated by commas (',') and actionblock is an action block.

Example 4.1. 

The following rule prints the value of the context node's level attribute to the console if the context node is a heading element and has that attribute:

[element(heading) and exists(@level)]
{
  print( @level );
}

The following rule sets the delete attribute to true on inline elements that do not have any textual content and on all span elements whose textual content is the string "empty":

[element(inline) and string()=''],
[element(span) and string()="empty"]
{
  set-attr( delete, true );
}

1.2.1. Selector

A selector looks very similar to an XPath predicate or predicate list:

('[' expression ']')+

expression can be any UPL expression.

If its expression (or, with several predicates, every expression) evaluates to true on the context node, the selector is considered true. If a selector is true, the actions in the following action block are executed.

Tip

You can think of a UPL selector as one or more XPath predicates being applied from left to right to a source sequence that contains just the context node. If that sequence still contains the context node after all predicates have been applied, the action block is executed.
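
As a sketch of a selector with more than one predicate, the following rule applies two predicates in sequence. Under the reading given in the tip above, it should select the same nodes as a single predicate that joins both expressions with and.

```upl
[element(heading)][exists(@level)]
{
  print( @level );
}
```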

1.2.2. Action block

An action block looks like just an ordinary block in UPL Core:

'{' statements '}'

statements can be any sequence of statements as defined in UPL Core. They are executed on the context node, in the order specified, when the associated selector yields true. You can think of the action block as the body of a function that takes no explicit parameters, returns a value, and is run when the selector matches.

2. Processing model

When the UPL Tree Processor runs, the following steps are executed in order:

2.1. Run initialize()

If the UPL Tree Processor program defines a function of the signature

function initialize() as Value

then it is run.

If the function returns the Numeric zero (0), the next step in the processing model is executed, i.e. the tree traversal is started.

If the function returns a non-zero Numeric value, UPL Tree Processor execution is aborted and the following steps are not executed.
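
A minimal initialize() function that lets the traversal proceed could look like this. The message text is made up for illustration.

```upl
function initialize() as Value {
  print( "starting tree traversal" );
  return 0;   /* Numeric zero: continue with the tree traversal */
}
```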

2.2. Walk the document tree

Next, the document tree is traversed in a depth-first traversal. The document tree is the internal upCast document tree. It is usually constructed in an earlier running importer module like the RTF Importer or the XML Importer.

The starting node of the traversal is the Document node, which is then the first context node. The context node changes during the tree traversal. Many pre-defined functions require that context node to be defined to be useful.

For the context node, the set of rules defined in the UPL Tree Processor program are considered in the same order as they are written (=defined), from top to bottom. The first rule for which the selector expression yields true on the context node is chosen and the action block is executed.

  • If after executing the action block, the internal rule mode variable is set to break, no further rules are considered, but the next node in document order is chosen as the new context node and with this node, the process of rule consideration starts again at the first defined rule.

  • If after executing the action block, the internal rule mode variable is set to continue, the next (and possibly further) rules are considered, still with the same context node. If one of the remaining rules' selector matches, its action block is executed and depending on the state of the internal rule mode variable after that, the corresponding action as described is taken. If no further rule matches, the next node in document order is chosen as the new context node and with this node, the process of rule consideration starts again at the first defined rule.

  • If after executing the action block, the internal rule mode variable is set to jump:label, the next rule after the specified label marker (and possibly further ones down) are considered, still with the same context node. If one of the selectors of the rules following label matches, its action block is executed and depending on the state of the internal rule mode variable after that, the corresponding action as described is taken. If no further rule matches, the next node in document order is chosen as the new context node and with this node, the process of rule consideration starts again at the first defined rule.

This process continues until all nodes have been visited.
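
The effect of the rule mode can be sketched with two rules that can match the same node. The messages are made up, and the sketch assumes that neither action block changes the rule mode, so the default set by the #set directive stays in effect.

```upl
#set defaultRuleMode := "continue";

[element(heading)]
{
  print( "rule 1" );
}

[element(heading) and exists(@level)]
{
  print( "rule 2" );
}
```

With the rule mode continue, a heading element that has a level attribute runs both action blocks, in the order in which the rules are written. With the rule mode break, only the first matching rule would run for each heading element.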

Traversal is sequential and deterministic

The traversal takes place in a defined, deterministic, strictly sequential manner. This is in contrast e.g. to XSLT, where template selection and application is non-deterministic and often parallel (though the result is of course well defined), depending on the implementation.

2.3. Run finalize() / error-finalize()

After the tree traversal has finished without any errors, and if it is defined, the UPL Tree Processor tries to call the function

function finalize() as Value

The result of this function is set as the upCast UPL Tree Processor module's result value in the pipeline variable ModuleResult.

If errors occurred during UPL Tree Processor execution, however, it will try to call the function

function error-finalize() as Value

instead. The result of this function is set as the upCast UPL Tree Processor module's result value in the pipeline variable ModuleResult.
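
As a sketch, a program could define both functions so that the module result reflects the outcome. The returned strings are arbitrary example values, not values prescribed by upCast.

```upl
function finalize() as Value {
  return "ok";       /* becomes ModuleResult after a traversal without errors */
}

function error-finalize() as Value {
  return "failed";   /* becomes ModuleResult when errors occurred */
}
```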

Chapter 5. UPL Function reference

1. Type casting functions

1.1. to-bool

to-bool(value as Value) as Bool

Casts its argument to a Bool value.

The boolean value of a Bool is its own value.

The boolean value of a Color is always true.

The boolean value of an Id is true if the identifier length is greater than zero (0), false otherwise.

The boolean value of a List is true only if it has at least one item and if all of its items cast to Bool are true.

The boolean value of a Null is false.

The boolean value of a Numeric is false if it is zero (0), true otherwise.

The boolean value of a String is true if its length is greater than zero (0), false otherwise.

Example 5.1. 

to-bool( "test" )

returns true.

to-bool( { "abc", 0 } )

returns false because the numeric second element of the list cast to Bool is false.


1.2. to-color

to-color(value as Value) as Color

Casts its argument to a Color value. The argument must be a valid CSS 2.1 color value string. Additionally, rgba() from the CSS 3 Color Module is supported.

The Color value of an Id is its value parsed to a color as described above.

The Color value of a String is its value parsed to a color as described above.

If a value cannot be parsed into a color as described above, a TypeConversionException is thrown.

Trying to cast Bool, List, Numeric or Null to a Color throws a TypeConversionException.

Example 5.2. 

to-color( red )

is the same as

to-color( "#F00" )

which is the same as

to-color( "#ff0000" )

which is the same as

to-color( "rgb(255,0,0)" )

which is the same as

to-color( "rgb(100%,0%,0%)" )

which is the same as

to-color( "rgba(255,0,0, 1.0)" )

which all designate the color red.


1.3. to-id

to-id(value as Value) as Id

Casts its argument to an Id value.

The Id value of an Id is its own value.

The Id value of a String is its contents.

Trying to cast a Bool, Color, List, Numeric or Null to an Id throws a TypeConversionException.

Example 5.3. 

to-id( "uci:par" )

returns the Id uci:par.


1.4. to-list

to-list(value as Value) as List

Casts its argument to a List value.

The list value of a List is itself.

The list value of a Numeric, Color, Id, String or Null is a one-element list with value as its element.
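
For illustration, the two cases can be sketched as follows. The argument values are made up.

```upl
to-list( 5 )            /* a one-element list containing the Numeric 5 */
to-list( { 1, 4, 5 } )  /* a List is returned as it is */
```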

1.5. to-null

to-null(value as Value) as Null

Casts its argument to a Null value, i.e. it always returns the Null value null.

1.6. to-numeric

to-numeric(value as Value) as Numeric

Casts its argument to a Numeric value.

The numeric value of a Bool is 1 (if it is true), or 0 (if it is false).

The numeric value of a Numeric is itself.

The numeric value of a String is its value parsed as a decimal number (either with or without a dimension specification). If the value cannot be parsed, a TypeConversionException is thrown.

For Color, Id, List and Null a TypeConversionException is thrown.

1.7. to-string

to-string(value as Value) as String

Casts its argument to a String value.

The string value of a Bool is "true" or "false", respectively.

The string value of a Color is the hex notation of the represented color as defined in HTML, format: #rrggbb

The string value of an Id is its value.

The string value of a List is the concatenation of its member elements cast to String, with U+0020 (space character) as a separator added between two individual values.

The string value of a Numeric is its human readable representation.

The string value of a String is itself.

Trying to cast Null to a String throws a TypeConversionException.

Example 5.4. 

to-string( 5 )

returns the string "5".

to-string( { "ab", 2, someId } )

returns the string "ab 2 someId".
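The List rule is recursive: each member is cast to a String first, then the results are joined with a single space. A rough Python model of that rule follows. The name to_string is hypothetical, Ids are modeled as plain Python strings, Colors are omitted, TypeError stands in for TypeConversionException, and the way whole numbers are printed is an assumption about what "human readable" means.

```python
def to_string(value):
    """Model of the to-string rules for Bool, List, Numeric and String."""
    if value is None:
        # Stand-in for the TypeConversionException thrown for Null.
        raise TypeError("Null cannot be cast to String")
    if isinstance(value, bool):  # must be checked before numbers
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # Cast every member, then join with U+0020.
        return " ".join(to_string(v) for v in value)
    if isinstance(value, float) and value.is_integer():
        return str(int(value))  # assumption: 5.0 is shown as "5"
    return str(value)


print(to_string(["ab", 2, "someId"]))  # prints: ab 2 someId
```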


2. Functions on Colors

2.1. get-color-component

get-color-component(color as Color, component as Id) as Numeric

Returns an individual color component value of the Color value passed as argument color. The component to return is determined by the component parameter, which can take the Id values RED, GREEN, BLUE and ALPHA.

The Numeric value returned is between 0 and 255 for the components RED, GREEN and BLUE, and between 0.0 (=fully transparent) and 1.0 (opaque) for ALPHA.

Example 5.5. 

get-color-component( to-color( red ), RED )

returns 255

get-color-component( to-color( red ), ALPHA )

returns 1.0

get-color-component( to-color( "#123456" ), GREEN )

returns 52


3. Date & Time functions

3.1. current-dateTime

current-dateTime() as String

Returns the current date and time in the form of an ISO date string.

Example 5.6. 

The code

current-dateTime()

will for example return a String similar to

"2008-07-31T23:49:22.599Z"

3.2. format-dateTime

format-dateTime(dateTime as String, format as String) as String

Returns a formatted representation of an ISO date string. dateTime must be an ISO date and time string. format is a formatting string for the individual components of the passed ISO date string.

The formatting options are the same as for the Java class java.text.SimpleDateFormat – please see there for details.
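To give a feel for how pattern letters are substituted, here is a small Python sketch that handles only a handful of SimpleDateFormat letters (yyyy, MM, dd, HH, h, mm, ss, a). It is an illustration, not the upCast implementation: the name format_date_time is hypothetical, quoted literals and locale handling are not supported, and the time is formatted as given, without any time zone conversion.

```python
import re
from datetime import datetime


def format_date_time(iso, pattern):
    """Format an ISO date-time string using a tiny subset of pattern letters."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S.%fZ")
    fields = {
        "yyyy": "%04d" % dt.year,
        "MM": "%02d" % dt.month,
        "dd": "%02d" % dt.day,
        "HH": "%02d" % dt.hour,
        "h": str(dt.hour % 12 or 12),   # hour in am/pm, 1..12
        "mm": "%02d" % dt.minute,
        "ss": "%02d" % dt.second,
        "a": "AM" if dt.hour < 12 else "PM",
    }
    # Replace each run of one repeated letter if it is a known field;
    # anything else (punctuation, unknown letters) is copied unchanged.
    return re.sub(r"([A-Za-z])\1*",
                  lambda m: fields.get(m.group(0), m.group(0)),
                  pattern)


print(format_date_time("2008-07-31T23:44:20.923Z", "h:mm a"))  # prints: 11:44 PM
```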

Example 5.7. 

The code

format-dateTime( "2008-07-31T23:44:20.923Z", "h:mm a")

will return the String

"11:44 PM"

4. File functions

4.1. add-to-zipfile

add-to-zipfile(zipfile as String, pathPrefix as String, fileOrFolder as String) as Void

This function lets you add files or complete folders to an existing ZIP file.

The parameter zipfile specifies the absolute path of the ZIP file to add files or folders to.

pathPrefix specifies a path prefix to use within the ZIP file for the added content.

fileOrFolder specifies the file or folder (including all of its contents) to be added to the ZIP file.

The ZIP file must already exist. To create a new, empty ZIP file, use create-zipfile() .

Example 5.8. 

add-to-zipfile( "/dist/files.zip", "input", "/data/in/sample.xml" );

will add the file sample.xml under the path input/sample.xml into the existing zipfile at /dist/files.zip.


4.2. copy-file

copy-file(fromFile as String, toFile as String) as Bool

This method copies the file fromFile to the file toFile. Returns true when the copy was successful, false otherwise. The paths need to be absolute, but can be specified either using URL notation (preferred for platform independence) or in local file system naming convention.

4.3. create-zipfile

create-zipfile(zipfile as String) as Void

This function creates a new, empty ZIP with name and location as specified by the absolute path in zipfile. If that file already exists, it is overwritten.

You can add files to a ZIP file using add-to-zipfile() .

4.4. delete-file

delete-file(filepath as String) as Bool

Deletes the file with absolute path filepath.

4.5. file-exists

file-exists(file as String) as Bool

This method checks whether the specified file exists in the local file system. The path must be absolute, but can be specified either using URL notation (preferred for platform independence) or in local file system naming convention.

4.6. fs-copy

fs-copy(mode as Id, src as String, dest as String) as Bool

This function lets you copy file system objects (files or complete folders). You can choose from several modes specifying how the copy operation should be performed and how the passed absolute paths src and dest are to be interpreted.

FILE-TO-FILE

copies the file src to the file specified by dest. Note that dest must be a full filename path, not a path to just the folder where the file should wind up. (Use FILE-TO-FOLDER for the latter.) The advantage of this mode is that during the copy, you can rename the file. If the destination file already exists, it is silently overwritten.

FILE-TO-FOLDER

copies the file src into the folder dest, keeping its original name. If a file system object by that name already exists in dest, it is silently overwritten.

FOLDER-TO-FOLDER

recursively copies the folder src into the folder dest, creating it and any intermediate folders if necessary. If a file system object by that name already exists in dest, it is silently overwritten.

CONTENTS-TO-FOLDER

recursively copies all file system objects within src into the folder specified by dest (creating it and any intermediate folders when necessary), replacing any existing objects that might already exist. The folder src will not be deleted.

Example 5.9. 

fs-copy( FILE-TO-FILE, "/usr/dev/draft.rtf", "/usr/project/final.rtf" );

copies the file draft.rtf into the project folder under the name final.rtf

fs-copy( FILE-TO-FOLDER, "/usr/dev/manual.rtf", "/usr/dist/doc/" );

copies the file manual.rtf into the doc folder

fs-copy( FOLDER-TO-FOLDER, "/usr/dev/doc", "/usr/dist" );

copies the complete doc folder into the dist folder

fs-copy( CONTENTS-TO-FOLDER, "/usr/dev/doc", "/usr/dist/documentation" );

copies the contents of the doc folder into the documentation folder
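The difference between the four modes is mostly a question of how dest is interpreted. The following Python sketch models that interpretation with the standard shutil module. It is an approximation under stated assumptions: the name fs_copy and the string mode constants are illustrative, existing target folders are merged rather than replaced, and the sketch does not create missing parent folders for the two file modes because the text does not say whether UPL does.

```python
import os
import shutil


def fs_copy(mode, src, dest):
    """Sketch of the four copy modes; True on success, False on an I/O error."""
    try:
        if mode == "FILE-TO-FILE":
            # dest names the target file itself, so the file can be renamed.
            shutil.copyfile(src, dest)
        elif mode == "FILE-TO-FOLDER":
            # dest names a folder; the file keeps its original name.
            shutil.copyfile(src, os.path.join(dest, os.path.basename(src)))
        elif mode == "FOLDER-TO-FOLDER":
            # The folder itself ends up inside dest: dest/<name of src>.
            name = os.path.basename(os.path.normpath(src))
            shutil.copytree(src, os.path.join(dest, name), dirs_exist_ok=True)
        elif mode == "CONTENTS-TO-FOLDER":
            # Only the children of src end up inside dest.
            shutil.copytree(src, dest, dirs_exist_ok=True)
        else:
            raise ValueError("unknown mode: " + mode)
        return True
    except OSError:
        return False
```

With src set to /usr/dev/doc, FOLDER-TO-FOLDER with dest /usr/dist produces /usr/dist/doc, while CONTENTS-TO-FOLDER with dest /usr/dist/documentation places the children of doc directly in documentation. The dirs_exist_ok argument requires Python 3.8 or later.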


4.7. fs-create

fs-create(type as Id, abspath as String) as Bool

This function lets you create a new file system object, i.e. a new empty file or a folder.

Parameter type specifies what to create using which semantics:

FILE

creates a new, empty file at the absolute file location specified by abspath if it does not already exist. If creation was successful or the file already exists at that location, true is returned, otherwise the result is false. When a file at that location already exists, it is not modified in any way, meaning that data already present in the file is not cleared.

FOLDER

creates a new, empty folder at the absolute location specified by abspath if it does not already exist. If creation was successful or the folder already exists at that location, true is returned, otherwise the result is false. When a folder at that location already exists, it is not modified in any way, meaning that files or folders in it are not deleted.

FILE-REPLACE

similar to FILE, but deletes any existing object at that location beforehand, regardless of whether it is a file or a folder (including all of its contents!). Use with caution!

FOLDER-REPLACE

similar to FOLDER, but deletes any existing object at that location beforehand, regardless of whether it is a file or a folder (including all of its contents!). Use with caution!

4.8. fs-delete

fs-delete(mode as Id, fsobject as String) as Bool

This function deletes a file system object (file or complete folder). You can choose from several modes specifying how the deletion operation should be performed and how the passed absolute path fsobject is to be interpreted.

SELF

deletes the file system object fsobject. This can be a file or a folder, in which case the deletion operation is performed recursively on its contents before it is deleted itself.

CONTENTS

recursively deletes all file system objects within the folder specified by fsobject, leaving you with an empty fsobject folder.

Example 5.10. 

fs-delete( SELF, "/usr/dev/draft.rtf" );

deletes the file draft.rtf

fs-delete( SELF, "/usr/dist/doc/" );

deletes the folder doc, including all of its contents

fs-delete( CONTENTS, "/usr/dev/doc/" );

deletes all file system objects (files or folders) within the doc folder, resulting in a now empty doc folder.
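The two modes differ only in whether the named folder itself survives. A short Python sketch of that distinction, assuming a hypothetical fs_delete helper with string mode constants (it is not the upCast implementation, and returning False on any I/O error is an assumption about the Bool result):

```python
import os
import shutil


def fs_delete(mode, fsobject):
    """Sketch of SELF versus CONTENTS; True on success, False on an I/O error."""
    def remove(path):
        # Folders are removed recursively, files directly.
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

    try:
        if mode == "SELF":
            remove(fsobject)
        elif mode == "CONTENTS":
            # Collect the children first, then delete them; the folder stays.
            with os.scandir(fsobject) as entries:
                children = [entry.path for entry in entries]
            for child in children:
                remove(child)
        else:
            raise ValueError("unknown mode: " + mode)
        return True
    except OSError:
        return False
```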


4.9. fs-move

fs-move(mode as Id, src as String, dest as String) as Bool

This function lets you move file system objects (files or complete folders) from one location to another. You can choose from several modes specifying how the move operation should be performed and how the passed absolute paths src and dest are to be interpreted.

FILE-TO-FILE

moves the file src to the file specified by dest. Note that dest must be a full filename path, not a path to just the folder where the file should wind up. (Use FILE-TO-FOLDER for the latter.) The advantage of this mode is that during the move, you can rename the file. If the destination file already exists, it is silently overwritten.

FILE-TO-FOLDER

moves the file src into the folder dest, keeping its original name. If a file system object by that name already exists in dest, it is silently overwritten.

FOLDER-TO-FOLDER

moves the folder src into the folder dest, creating it and any intermediate folders if necessary. If a file system object by that name already exists in dest, it is silently overwritten.

CONTENTS-TO-FOLDER

moves all file system objects within src into the folder specified by dest (creating it and any intermediate folders when necessary), replacing any existing objects that might already exist. The folder src will not be deleted.

Example 5.11. 

fs-move( FILE-TO-FILE, "/usr/dev/draft.rtf", "/usr/project/final.rtf" );

moves the file draft.rtf into the project folder under the (new) name final.rtf

fs-move( FILE-TO-FOLDER, "/usr/dev/manual.rtf", "/usr/dist/doc/" );

moves the file manual.rtf into the doc folder

fs-move( FOLDER-TO-FOLDER, "/usr/dev/doc", "/usr/dist" );

moves the complete doc folder into the dist folder

fs-move( CONTENTS-TO-FOLDER, "/usr/dev/doc", "/usr/dist/documentation" );

moves the contents of the doc folder into the documentation folder


4.10. get-path-component

get-path-component(path as String, component as Id) as String

This function extracts a component of a file path. The component parameter can be one of:

LOCAL

returns path in local file system format

URL

returns path in URL format

LOCALPATH

returns only the path component of path (without file name and without trailing file separator). If path denotes a folder, the value is returned unchanged.

URLPATH

same as LOCALPATH, but returns the value in URL format

LOCALNAME

returns only the file name component of path in local format

URLNAME

same as LOCALNAME, but returns the value in URL format

LOCALEXTENSION

returns the file extension of path in local format, or the empty string if it has no extension

URLEXTENSION

same as LOCALEXTENSION, but returns the value in URL format

LOCALBASENAME

returns the same value as LOCALNAME, but with the trailing dot and extension stripped if they exist

URLBASENAME

same as LOCALBASENAME, but returns the value in URL format

LOCALBASENAMEPATH

essentially LOCALPATH + LOCALBASENAME, i.e. path minus its extension (including the trailing dot)

URLBASENAMEPATH

same as LOCALBASENAMEPATH, but returns the value in URL format

Example 5.12. 

Calls to get-path-component() will have the following results (as String):

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCAL )

C:\Documents and Settings\upCast\The file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URL )

file:///C:/Documents%20and%20Settings/upCast/The%20file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALPATH )

C:\Documents and Settings\upCast

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLPATH )

file:///C:/Documents%20and%20Settings/upCast

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALNAME )

The file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLNAME )

The%20file.xml

get-path-component( "C:\Documents and Settings\upCast\The file.x m l", LOCALEXTENSION )

x m l

get-path-component( "C:\Documents and Settings\upCast\The file.x m l", URLEXTENSION )

x%20m%20l

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALBASENAME )

The file

get-path-component( "C:\Documents and Settings\upCast\The file.xml", URLBASENAME )

The%20file

get-path-component( "C:\Documents and Settings\upCast\The file.xml", LOCALBASENAMEPATH )

C:\Documents and Settings\upCast\The file

file:///C:/Documents%20and%20Settings/upCast/The%20file
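The relationship between the twelve components can be reproduced with Python's pathlib and urllib modules. This sketch is an illustration only: path_components is a hypothetical helper, it works on Windows-style paths as in the example above, and because it never touches the file system it cannot apply the rule that a folder path is returned unchanged for LOCALPATH.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def path_components(path):
    """Return all twelve components for a Windows-style file path."""
    p = PureWindowsPath(path)
    local = {
        "LOCAL": str(p),
        "LOCALPATH": str(p.parent),
        "LOCALNAME": p.name,
        "LOCALEXTENSION": p.suffix[1:],            # drop the leading dot
        "LOCALBASENAME": p.stem,
        "LOCALBASENAMEPATH": str(p.with_suffix("")),
    }

    def as_url(value):
        # Forward slashes, percent-encoded, with the file protocol prefix.
        return "file:///" + quote(PureWindowsPath(value).as_posix(), safe="/:")

    result = dict(local)
    for key in ("", "PATH", "BASENAMEPATH"):       # full paths become file URLs
        result["URL" + key] = as_url(local["LOCAL" + key])
    for key in ("NAME", "EXTENSION", "BASENAME"):  # fragments are only encoded
        result["URL" + key] = quote(local["LOCAL" + key])
    return result


parts = path_components(r"C:\Documents and Settings\upCast\The file.xml")
print(parts["URLBASENAMEPATH"])
# prints: file:///C:/Documents%20and%20Settings/upCast/The%20file
```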


4.11. is-file

is-file(path as String) as Bool

Tests if the object found under the specified path is a file. The object must exist for the result to be correct.

4.12. is-filetype

is-filetype(type as Id, file as String) as Bool

Returns whether file is of the specified type or not. This function looks into the file's content for making the best possible decision, i.e. it does not rely on the file extension.

When file does not exist, false is returned.

When an unsupported type is specified, an exception is thrown.

The following values for type are supported:

DOC

test if file is a Microsoft Word binary file (most often with a *.doc extension)

RTF

test if file is an RTF (Rich Text Format) file (most often with an *.rtf extension)

WORD

test if file is a Microsoft Word file, i.e. either an RTF or a DOC file. This is a shortcut for (is-filetype(DOC, $file) or is-filetype(RTF, $file) ).

Example 5.13. 

is-filetype( DOC, "/test/somefile.ext" )

returns true when somefile.ext is a Microsoft Word binary file, false otherwise.


4.13. is-folder

is-folder(path as String) as Bool

Tests if the object found under the specified path is a folder. The object must exist for the result to be correct.

4.14. list-files

list-files(baseDir as String) as List

This method generates a list of all flat files (i.e., only the direct file children, no directories, only files that are visible) within baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file as a URL string (file protocol).

4.15. list-files

list-files(baseDir as String, flags as Id) as List

This method generates a list of all file system objects (i.e., only direct children) within baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file system object as a URL string (file protocol).

You must specify the search algorithm using the additional flags parameter by concatenating the desired flags to an Id (in any order):

F

include file objects

D

include folder (directory) objects

H

include hidden objects

Example 5.14. 

list-files( "/Users/test/", HF );

creates a list of all files in /Users/test/, including hidden files.

list-files( "/Users/test/", D );

creates a list of all folders in /Users/test/.

list-files( "/Users/test/", FDH );

creates a list of all files and folders in /Users/test/, including hidden files or folders.
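How the three flags combine can be sketched in Python. The helper below is a model, not the upCast implementation: the name list_files mirrors the UPL function for readability only, the flags are passed as a plain string, and treating names that start with a dot as hidden is an assumption that matches Unix conventions but not necessarily upCast's definition of hidden.

```python
import os
from pathlib import Path


def list_files(base_dir, flags="F"):
    """List direct children of base_dir as file URLs, filtered by F, D and H."""
    want_files = "F" in flags
    want_dirs = "D" in flags
    want_hidden = "H" in flags
    found = []
    with os.scandir(base_dir) as entries:
        for entry in entries:
            hidden = entry.name.startswith(".")  # assumption: dot prefix = hidden
            if hidden and not want_hidden:
                continue
            if (entry.is_dir() and want_dirs) or (entry.is_file() and want_files):
                found.append(Path(entry.path).resolve().as_uri())
    return sorted(found)
```

Because each flag is tested independently, the order in which they are written (HF or FH) makes no difference.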


4.16. list-files-recursively

list-files-recursively(baseDir as String) as List

This method generates a list of all visible flat files that are descendants of baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file as a URL string (file protocol). Invisible files or folders are neither included in the list nor traversed.

4.17. list-files-recursively

list-files-recursively(baseDir as String, flags as Id) as List

This method generates a list of all file system objects that are descendants of baseDir in the file system hierarchy.

Each list element contains the absolute path to a found file system object as a URL string (file protocol).

You must specify the search algorithm using the additional flags parameter by concatenating the desired flags to an Id (in any order):

F

include file objects

D

include folder (directory) objects

H

include hidden objects

Example 5.15. 

list-files-recursively( "/Users/test/", HF );

creates a list of all file descendants of /Users/test/, including hidden files and traversing hidden folders.

list-files-recursively( "/Users/test/", D );

creates a list of all descendant folders of /Users/test/.

list-files-recursively( "/Users/test/", FDH );

creates a list of all descendant files and folders under /Users/test/, including hidden files or folders, and traversing hidden folders.


4.18. move-file

move-file(src as String, dest as String) as Bool

Moves the file src to a new location dest. This can be used either to move a file to some other location or just to rename it.

4.19. write

write(filename as String, encoding as String, mode as Id, data as String) as Void

This method allows you to write its data argument to the specified file with name filename. You can set the encoding to be used (e.g. "UTF-8"), and set the writing mode to either replace the existing content of that file (WRITE) or append to it (APPEND).

4.20. writeln

writeln(filename as String, encoding as String, mode as Id, data as String) as Void

This method allows you to write its data argument to the specified file with name filename. The platform-specific line separator code sequence is automatically appended.

You can set the encoding to be used (e.g. "UTF-8"), and set the writing mode to either replace the existing content of that file (WRITE) or append to it (APPEND).
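The two writing modes and the only difference between write and writeln can be modeled in a few lines of Python. The function names mirror the UPL ones for readability, but this is a sketch of the described behaviour, not upCast code; the mode constants are passed as plain strings.

```python
import os


def write(filename, encoding, mode, data):
    """WRITE replaces the file content, APPEND adds to the end of it."""
    flag = {"WRITE": "w", "APPEND": "a"}[mode]
    # newline="" stops Python from translating line endings on its own.
    with open(filename, flag, encoding=encoding, newline="") as handle:
        handle.write(data)


def writeln(filename, encoding, mode, data):
    """Same as write, plus the platform-specific line separator."""
    write(filename, encoding, mode, data + os.linesep)
```

A typical sequence starts a file with writeln in WRITE mode and then adds further lines in APPEND mode.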

5. Functions for grouping

5.1. is-painted

is-painted(paintColor as Id) as Bool

Returns true when the context node is painted with the specified paintColor.

Important

This function does not report whether a painter of the respective color is set on that node; it only reports whether the node is already painted. Within a UPL execution, painting can only be achieved by the methods paint-adjacent(), paint-following(), paint-preceding(), or by set-painter() if and only if its list of specified painter types contains the type "this".

5.2. mark-end

mark-end(paintColor as Id) as Void

This function places an end marker of the specified paintColor on the context node.

For details, see the section on Painters in the upCast Manual.

Example 5.16. 

mark-end( address );

will place an end marker of color address on the context node.


5.3. mark-start

mark-start(paintColor as Id) as Void

This function places a start marker of the specified paintColor on the context node.

For details, see the section on Painters in the upCast Manual.

Example 5.17. 

mark-start( address );

will place a start marker of color address on the context node.


5.4. paint-adjacent

paint-adjacent(paintColor as Id, siblingCount as Numeric) as Numeric

This function paints the context node and (siblingCount – 1) following sibling nodes with the specified paintColor.

This function places a start marker of the specified paintColor on the context node, and an end marker of the specified paintColor on the last node that was painted.

Note

This function, in contrast to e.g. set-painter(), immediately paints the respective nodes, which means the paint color can be queried by the is-painted() function.

The function returns the number of nodes actually painted (which may be smaller than siblingCount when there are fewer sibling nodes).

For details, see the section on Painters in the upCast Manual.

Example 5.18. 

paint-adjacent( address, 2 );

will paint the context node and its following sibling with color address immediately.


5.5. paint-following

paint-following(paintColor as Id, condition as BoolExpression, endMode as Id) as Numeric

This function paints all contiguously following siblings of the context node with the specified paintColor for which condition matches or until the end of the following-sibling axis of the context node is reached.

endMode can have the following values:

NONE

no special handling of the last painted node

START

the last node painted receives a start marker of the specified paintColor

END

the last node painted receives an end marker of the specified paintColor

The method returns the number of nodes that were painted.

Note

This function, in contrast to e.g. set-painter(), immediately paints the respective nodes, which means the paint color can be queried by the is-painted() function.

For details, see the section on Painters in the upCast Manual.

Example 5.19. 

paint-following( address, @css:font-family="Times", END );

will paint all contiguously following sibling nodes of the context node whose font-family is "Times". Additionally, the last painted node will get an end marker set as if by calling mark-end( address ) on it.
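The word "contiguously" is the important part: painting stops at the first following sibling that fails the condition, even if later siblings would match again. A small Python model makes this explicit. Nodes are represented as plain dictionaries and the condition as a Python callable; both are assumptions for illustration, as is the helper name paint_following.

```python
def paint_following(siblings, context_index, paint_color, condition, end_mode="NONE"):
    """Paint matching siblings after the context node; return how many were painted."""
    painted = 0
    last = None
    for node in siblings[context_index + 1:]:
        if not condition(node):
            break  # the contiguous run ends at the first non-matching sibling
        node.setdefault("painted", set()).add(paint_color)
        last = node
        painted += 1
    if last is not None and end_mode in ("START", "END"):
        # The last painted node additionally receives a marker.
        last.setdefault("markers", []).append((end_mode.lower(), paint_color))
    return painted


nodes = [{"font": "Arial"}, {"font": "Times"}, {"font": "Times"},
         {"font": "Arial"}, {"font": "Times"}]
count = paint_following(nodes, 0, "address", lambda n: n["font"] == "Times", "END")
print(count)  # prints: 2
```

In this run the fifth node is not painted although it matches, because the fourth node interrupts the sequence.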


5.6. paint-preceding

paint-preceding(paintColor as Id, condition as BoolExpression, endMode as Id) as Numeric

This function paints all contiguously preceding siblings of the context node with the specified paintColor for which condition matches or until the end of the preceding-sibling axis of the context node is reached.

endMode can have the following values:

NONE

no special handling of the last painted node

START

the last node painted receives a start marker of the specified paintColor

END

the last node painted receives an end marker of the specified paintColor

The method returns the number of nodes that were painted.

Note

This function, in contrast to e.g. set-painter(), immediately paints the respective nodes, which means the paint color can be queried by the is-painted() function.

For details, see the section on Painters in the upCast Manual.

Example 5.20. 

paint-preceding( address, @uci:class="addr", START );

will paint all contiguously preceding sibling nodes of the context node whose class is "addr". Additionally, the last painted node will get a start marker set as if by calling mark-start( address ) on it.


5.7. set-paint-attr

set-paint-attr(paintColor as Id, attrQName as Id, attrValue as String) as Void

This method lets you set an attribute (with qualified name attrQName) with associated value attrValue on the context node that will be promoted to the immediate grouping parent of the context node during grouping (if that exists) for the specified paintColor.

During grouping of a sequence of nodes, the attrValue (among all attributes of the same attrQName) of the node that last (in document order) specifies a value for it is used for the value of that attribute on the grouping element.

Paint attributes, though set on the grouped nodes, will not be serialized on the individual nodes, only on the created grouping element.

Example 5.21. 

set-paint-attr( address, uci:language, @xml:lang );

will create an additional grouping attribute on a final address grouping dependent on the last set value for uci:language in a group, e.g.

<uci:block uci:type="address" uci:language="en">
   ...
</uci:block>
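The rule that the last node in document order wins can be modeled as a simple overwrite while walking the grouped nodes. In this Python sketch the node representation (a dictionary with a paint_attrs entry keyed by color and attribute name) and the helper name group_attributes are hypothetical; only the precedence rule is taken from the description above.

```python
def group_attributes(nodes, paint_color):
    """Collect the attributes promoted to the grouping element for one color."""
    promoted = {}
    for node in nodes:  # nodes are expected in document order
        for (color, qname), value in node.get("paint_attrs", {}).items():
            if color == paint_color:
                promoted[qname] = value  # later nodes overwrite earlier ones
    return promoted


group = [
    {"paint_attrs": {("address", "uci:language"): "de"}},
    {},  # a node that sets no paint attribute
    {"paint_attrs": {("address", "uci:language"): "en"}},
]
print(group_attributes(group, "address"))  # prints: {'uci:language': 'en'}
```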

5.8. set-paint-value

set-paint-value(paintColor as Id, name as Id, value as Value) as Void

This method lets you set a UPL value (with name name) with associated value value on the context node that will be promoted to the immediate grouping parent of the context node during grouping (if that exists) for the specified paintColor.

During grouping of a sequence of nodes, the value (among all values of the same name) of the node that last (in document order) specifies a value for it is used for the value of that UPL value on the grouping element.

Paint values are never serialized. They can be queried in subsequent UPL module runs on the grouping elements that have been created in the internal document tree using the get-value() function.

Example 5.22. 

set-paint-value( address, cdata, string() );

will set a UPL value with the CDATA contents of the last grouped node on the grouping element for paintColor address, using the key cdata for later retrieval.


5.9. set-painter

set-painter(paintColor as Id, painterTypes as List) as Void

This function lets you set a painter of specified paintColor and painterTypes on the context node.

paintColor is the value which will be used as the type attribute value of the grouping element created by a subsequent Grouper module. When you set a qualified name, the resulting value will be its expanded name.

painterTypes is an ordered (from start to end) list of painter types and fallback painter types to be used if a painter fails.

This method does not actually paint any nodes, unless painterTypes also contains the type "this", in which case the context node is immediately painted.

For details, see the section on Painters in the upCast Manual.

Example 5.23. 

set-painter( address, { "start-end", "this" } );

will place a painter of color address on the context node, with a preferred type of "start-end" and a fallback type of "this".


6. Graphical UI functions

6.1. set-progress

set-progress(curval as Numeric, maxval as Numeric) as Void

Lets you set progress information for the currently running (UPL-) module. The current state is defined as the current progress value curval compared to the maximum progress value maxval.

Example 5.24. 

set-progress( 4.0, 8.0 );

sets the progress bar to 50% of the currently running task's full duration.


6.2. set-ui-text

set-ui-text(elementId as Id, labeltext as String) as Void

This method lets you set the text in various components of upCast's user interface. The text is updated immediately in the UI, so you can e.g. provide the user with more detailed progress information while a lengthy UPL code sequence is taking place.

You specify the element for which you want to set the text using a symbolic constant in elementId, and the text is passed in the labeltext parameter.

The following symbolic constants are available:

PROGRESS-LABEL

sets the main label of the progress bar in upCast's pipeline window (at the lower left)

PROGRESS-SUBLABEL

sets the sub-label of the progress bar in upCast's pipeline window (at the lower right). Note that this label may overlap the PROGRESS-LABEL when both labels' texts are sufficiently long.

Example 5.25. 

The code

function initialize() as Numeric {
  set-ui-text( PROGRESS-SUBLABEL, "Initializing UPL...");
}

will display the text "Initializing UPL..." in the progress bar's sub-label when the UPL module starts to execute.


6.3. show-dialog

show-dialog(dialogType as Id, windowTitle as String, dialogText as String, buttonDescription as String) as Numeric

This function displays a customizable dialog with up to three buttons.

The dialogType determines the overall style of the dialog:

PLAIN

a plain dialog

INFO

a dialog with an info-type icon

QUESTION

a dialog that asks the user a question, usually with a question mark icon

WARNING

a dialog that warns the user, usually with an exclamation mark icon

ERROR

a dialog that informs the user of an error, usually with a stop-sign icon

You can specify the dialog window's title in the parameter windowTitle.

The body text of the dialog is passed in the dialogText parameter.

Finally, the buttonDescription parameter takes a specially formatted string specifying the button(s) you want to be displayed, and which one should be the default button. The syntax is as follows:

buttonDescription ::= button ('|' button){0,2}
           button ::= [*]?[^|]+

In other words, button text specifications are separated by the pipe character '|', and you specify the default button by prefixing it with an asterisk '*'. The maximum number of buttons is 3. Note that buttons are specified in a virtual OK, Cancel, Alternative order, which means that depending on the OS you are running on, the displayed order of the buttons may vary from the specification order.
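The grammar above translates directly into a regular expression. The following Python sketch validates a buttonDescription string and splits it into labels. The helper name parse_buttons is hypothetical, the default button is reported as a 0-based position in specification order purely for illustration, and the behaviour for more than one asterisk-marked button (here: the last one wins) is not defined by the text.

```python
import re

# buttonDescription ::= button ('|' button){0,2} ; button ::= [*]?[^|]+
BUTTONS = re.compile(r"[*]?[^|]+(?:\|[*]?[^|]+){0,2}")


def parse_buttons(description):
    """Return (labels, default_index) for a valid button description."""
    if not BUTTONS.fullmatch(description):
        raise ValueError("invalid button description: " + description)
    labels, default_index = [], None
    for index, spec in enumerate(description.split("|")):
        if spec.startswith("*") and len(spec) > 1:
            default_index = index  # the asterisk marks the default button
            spec = spec[1:]
        labels.append(spec)
    return labels, default_index


print(parse_buttons("OK|*Cancel|Abort"))  # prints: (['OK', 'Cancel', 'Abort'], 1)
```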

The function returns the number of the button clicked. Closing the dialog via its window decoration (instead of one of its buttons) returns the value 1.

The show-dialog() function will only work when running in a GUI environment. When running in commandline or API mode where there is no GUI available, the function immediately returns the value (-1).

Example 5.26. 

show-dialog( WARNING, "Warning Dialog", "This is a warning dialog.", "OK|*Cancel|Abort");

will display the following dialog when running in GUI mode:

[Figure: resulting dialog]


7. Functions on Lists

7.1. append

append(list as List, element as any) as List

Appends the value in element to the List list so that it becomes the new last element.

The function returns the value passed in list, which is the List modified as described.

7.2. append-all

append-all(list as List, appendList as List) as List

Appends all elements in appendList to the List list so that they become the new last elements. The elements are appended in the order they are stored in appendList.

The function returns the value passed in list, which is the List modified as described.

7.3. count

count(list as List) as Numeric

Returns the number of elements in list.

7.4. flatten

flatten(source as List) as List

Returns a List where all items in source of type List have been (recursively) flattened.

A List that has one or more Lists as its members is considered flattened when all its items of type List have been replaced, in order, by the individual elements of that List, possibly recursively. The result is a List that no longer contains any elements of type List. This is best shown with an example.

Example 5.27. 

flatten( { a, { b1, b2 }, c } )

returns { a, b1, b2, c }

flatten( { a, { b1, { b2i, b2ii }, b3 }, c, { d1, d2 } } )

returns { a, b1, b2i, b2ii, b3, c, d1, d2 }

flatten( { a, b, c } )

returns { a, b, c }, that is the list is returned unchanged because there aren't any sub-lists.
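The recursion behind flatten is short enough to spell out. This Python sketch models UPL Lists as Python lists and Ids as strings; it illustrates the definition above and is not the upCast implementation.

```python
def flatten(source):
    """Return a new list in which nested lists are replaced by their elements."""
    result = []
    for item in source:
        if isinstance(item, list):
            result.extend(flatten(item))  # recurse into sub-lists, keeping order
        else:
            result.append(item)
    return result


print(flatten(["a", ["b1", ["b2i", "b2ii"], "b3"], "c", ["d1", "d2"]]))
# prints: ['a', 'b1', 'b2i', 'b2ii', 'b3', 'c', 'd1', 'd2']
```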


7.5. index-of

index-of(list as List, value as any) as Numeric

Returns the index within list of the first occurrence of value. If value is not a member of list, -1 is returned.

7.6. is-in

is-in(searchString as String, listValue as List) as Bool

Determines if the List listValue contains an element that, after having been cast to a String as if by applying the to-string() function, is equal to the string searchString.

7.7. remove

remove(list as List, index as Numeric) as List

Removes the element with index index from the List list.

The function returns the value passed in list, which is the List modified as described.

7.8. value-at

value-at(list as List, index as Numeric) as Value

Extracts the value at position index of the passed list. The index is 1-based, i.e. the first element has index 1, the second has index 2 etc.

If the requested index position does not exist, the Null value is returned.
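Because most programming languages index from 0, the 1-based convention is easy to get wrong. A minimal Python model of the lookup, with None standing in for the UPL Null value and value_at as a hypothetical helper name:

```python
def value_at(items, index):
    """Return the element at a 1-based index, or None if it does not exist."""
    if 1 <= index <= len(items):
        return items[index - 1]  # shift to Python's 0-based indexing
    return None


print(value_at(["a", "b", "c"], 1))  # prints: a
print(value_at(["a", "b", "c"], 4))  # prints: None
```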

8. Logging functions

8.1. clear-log-messages

clear-log-messages(logRealm as Id) as Void

This function clears the internally collected log messages. The logRealm parameter determines which log event collector to clear. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this also includes the log events created by the currently running module, accessible separately by the logger MODULE, and all modules run earlier in the pipeline)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

This function can be useful if you want to periodically clear collected log messages you no longer need in long-running or looping pipelines to prevent excessive or indefinitely increasing memory usage.

8.2. forward-log-message

forward-log-message(level as Id, messagecode as Numeric, logmessage as Value, ...) as Void

This method lets you create a new custom log message and place it into the logical parent component (module or pipeline) of the current component (module or pipeline).

If the logical parent is the application (in other words: if called from the top-level pipeline component), this method does nothing.

In level, you specify the log message level you want to set for that message. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL.

messagecode lets you specify a custom message code. You can use this to find your own specific messages in a logger later using the get-log-messages() function by specifying the respective codes in its includeCodes list.

Important

Your custom message codes must be greater than 0. Any negative codes are reserved by upCast for its own error message constants.

See also: de.infinityloop.msg.Msg

Finally, you can add an arbitrary list of Value objects to be output as the logmessage.

Example 5.28. 

forward-log-message( WARN, 5, "The value ", $number, " is not equal to 5." );

will create a log message with level WARN and message code 5 in the logical parent with the concatenated string representations of the remaining Value objects in the specified order.


8.3. forward-log-messages

forward-log-messages(levels as List, includeCodes as List, excludeCodes as List) as Numeric

Currently, there is no documentation available for this function.

8.4. get-log-messages

get-log-messages(name as Id, levels as List, includeCodes as List, excludeCodes as List) as List

This method lets you retrieve a (possibly) filtered list of log messages for the currently running module, the whole workflow or a custom logger created with start-logger(). The result is a list of two-element lists as described below.

The name parameter specifies the logger from which to retrieve the log messages. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this also includes the log events created by the currently running module, accessible separately by the logger MODULE, and all modules run earlier in the pipeline)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

With levels, you specify the list of log message levels you are interested in. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL. If the list is the empty list, all types are considered.

Each log message has a numerical code, which is defined in the class de.infinityloop.msg.Msg. With includeCodes, you can specify the list of numerical codes or symbolic names (ids) that should be included in the result. With excludeCodes, you can specify the list of numerical codes or symbolic names (ids) that should be excluded from the result. In both cases, when the list is the empty list, no respective filtering is applied.

The described filters are applied in the following order:

  1. From the set of all log messages of the logger designated by the name parameter, only those are considered that match any of the levels in levels. When levels is the empty list, no filtering is applied.

  2. From the subset created in step 1, only those log messages are considered whose numerical message code matches any of the codes listed in includeCodes. When includeCodes is the empty list, no filtering is applied.

  3. From the subset created in step 2, only those log messages are considered whose numerical message code does not match any of the codes listed in excludeCodes. When excludeCodes is the empty list, no filtering is applied.

The set of log messages remaining after step 3 is the final result set of size N. It is used to build the result of the function, which is a list of two-element lists as follows:

{
  { message-level1 as String, message-text1 as String }
  { message-level2 as String, message-text2 as String }
    ...
  { message-levelN as String, message-textN as String }
}

Example 5.29. 

The call

get-log-messages( MODULE, {ERROR, WARN}, {}, { -12, -14});

creates its result from all log messages created up to the call in the currently running module by selecting any ERROR or WARN messages whose error codes are not -12 or -14.

The call

get-log-messages( PIPELINE, {WARN}, {-2, -3, -5}, {});

creates its result from all log messages created up to the call in the currently executing pipeline by selecting any WARN messages whose error codes are either -2, -3 or -5.

The call

get-log-messages( sub, {}, {}, {});

creates its result from all log messages created up to the call in the in-scope custom logger named sub started using an earlier call to start-logger() like start-logger( sub, DEBUG ).
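
The three filter steps of get-log-messages() can be sketched in Python as follows. This is a model under stated assumptions, not the upCast implementation: the function name, the (level, code, text) tuple representation of a log event, and the restriction to numerical codes (no symbolic names) are all choices made for this example.

```python
def filter_log_messages(messages, levels, include_codes, exclude_codes):
    """Apply the level, include and exclude filters in the documented order.

    messages is a list of (level, code, text) tuples (hypothetical representation).
    An empty filter list means that the respective filter is not applied.
    """
    result = []
    for level, code, text in messages:
        if levels and level not in levels:
            continue  # step 1: level filter
        if include_codes and code not in include_codes:
            continue  # step 2: keep only the listed codes
        if exclude_codes and code in exclude_codes:
            continue  # step 3: drop the listed codes
        result.append([level, text])  # two-element list: level, message text
    return result
```

Calling the sketch with the levels ERROR and WARN, an empty include list and the exclude list -12, -14 mirrors the first call in Example 5.29.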


8.5. log

log(level as Id, message as Value, ...) as Void

Outputs the specified message via upCast's logging system. The type of log entry to be generated can be set using the level parameter, which can take the values DETAIL, VERBOSE, DEBUG, INFO, WARN, ERROR or FATAL.

8.6. log-custom

log-custom(name as Id, level as Id, messagecode as Numeric, logmessage as Value, ...) as Void

This method lets you add your own, attributed log messages to a pre-defined or custom logger.

name designates the logger to add the message to. Possible values are:

PIPELINE

all log events for the currently executing pipeline (this also includes the log events created by the currently running module, accessible separately by the logger MODULE, and all modules run earlier in the pipeline)

MODULE

all log events created during execution of the currently running module

<name>

log events created by the custom logger named name (see also start-logger())

In level, you specify the log message level you want to set for that message. This can be any of the following levels: FATAL, ERROR, WARN, INFO, DEBUG, VERBOSE, DETAIL.

messagecode lets you specify a custom message code. You can use this to find your own specific messages in a logger later using the get-log-messages() function by specifying the respective codes in its includeCodes list.

Important

Your custom message codes must be greater than 0. Any negative codes are reserved by upCast for its own error message constants.

See also: de.infinityloop.msg.Msg

Finally, you can add an arbitrary list of Value objects to be output as the logmessage.

Example 5.30. 

log-custom( sub, WARN, 5, "The value ", $number, " is not equal to 5." );

will create a log message with level WARN and message code 5 in the custom logger sub with the concatenated string representations of the remaining Value objects in the specified order.


8.7. start-logger

start-logger(name as Id, level as Id) as Void

This method starts a custom logger with name name at the current code nesting level, accepting log messages of the specified level and higher-level messages.

If the named custom logger already exists, it is not cleared. Newly generated log messages are appended to it if they satisfy the level condition, which may differ from the one specified when the logger was created. You can also use this to resume collecting log events in this logger after a call to stop-logger().

The following logger names are not allowed (in any combination of uppercase and lowercase characters), since they designate loggers pre-defined by upCast: PIPELINE, MODULE

Important

The custom logger is only valid at the nesting level at which it was first created (and all deeper levels). It is automatically disposed of when the UPL program execution flow leaves the defining block.

8.8. stop-logger

stop-logger(name as Id) as Void

This method stops adding log messages to the custom logger named name. Logging to this logger can be resumed by calling start-logger() again.
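
The interplay of start-logger() and stop-logger() can be pictured with the following Python sketch. It is only a model: the class CustomLogger is hypothetical, block scoping is not modeled, and the severity order (DETAIL lowest, FATAL highest) is an assumption derived from the order in which the levels are listed in this reference.

```python
# Assumption: levels ordered from lowest to highest severity.
LEVELS = ["DETAIL", "VERBOSE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


class CustomLogger:
    """Hypothetical model of a custom logger that can be started, stopped and resumed."""

    def __init__(self):
        self.messages = []     # collected messages survive stop and restart
        self.threshold = None  # index into LEVELS of the lowest accepted level
        self.active = False

    def start(self, level):
        # Restarting does not clear the logger; it only updates the level condition.
        self.threshold = LEVELS.index(level)
        self.active = True

    def stop(self):
        self.active = False

    def log(self, level, text):
        # Accept the message only while active and at or above the threshold level.
        if self.active and LEVELS.index(level) >= self.threshold:
            self.messages.append((level, text))
```

In this model, a logger started with WARN ignores INFO messages, and a later start with DEBUG widens the condition without discarding what was collected before.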

9. Boolean logic functions

9.1. exists

exists(value as any) as Bool

Returns true when the passed value is not Null.

9.2. exists-var

exists-var(varname as Id) as Bool

Returns true when an in-scope UPL variable with the same name as the Id passed in varname exists.

If (and only if) the namespace of the Id is one of the variable realm namespaces, this returns true when a realm variable of the same name as varname exists. When this function returns true, $varname and get-var( varname ) will not fail, although the returned value could still be of type Null when the respective realm variable's value is null.

See: get-var()

9.3. false

false() as Bool

Returns a Bool with value false.

9.4. is-null

is-null(value as any) as Bool

Returns true when the passed value is Null.

9.5. not

not(value as Bool) as Bool

Returns the negated value of the passed Bool value.

9.6. true

true() as Bool

Returns a Bool with value true.

10. Functions on DOM nodes

10.1. attach-value

attach-value(key as Id, value as Value) as Void

This method lets you attach a UPL value with name key and value value to the context node. This value can later be queried using the get-value() function.

The function always returns true.

Note

Unlike an attribute set on the context node, a value can also be attached to nodes that do not support attributes (like Text nodes, PI nodes or Comment nodes), and the type information of the value is retained.

However, attached values are never serialized to XML.

10.2. attach-value

attach-value(target as String, ..., key as Id, value as Value) as Void

This method lets you attach a UPL value with name key and value value to all nodes selected by the target XPath 1.0 expression (evaluated relative to the context node). This value can later be queried using the get-value() function.

The function returns true when the target expression selects at least one target node, false otherwise.

Example 5.31. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par) and @uci:heading-level > 0] {
  attach-value( "descendant::text()", heading-text, true() );
}

will attach the Bool value true to all text nodes that are descendants of a heading paragraph.


10.3. comment

comment() as Bool

Tests whether the context node is a Comment node.

10.4. delete

delete() as Void

This function deletes the context node (including all of its children) from the internal document tree.

Note

The context node is not removed until the next context node is selected during processing, as otherwise no context node would be available during processing. This function merely flags it as to be deleted at the next context node change.

10.5. detach-values

detach-values(valNamePatt as Id) as Bool

Removes a named single attached value or a number of values matching a pattern from the context node. The value name or pattern can be specified in valNamePatt.

When valNamePatt is a qualified name, that value is removed.

When valNamePatt is nsprefix:* , all values in the specified namespace with prefix nsprefix are removed.

When valNamePatt is *:key, all values with the local name key are removed from the node, regardless of the namespace they might be in. This does not include the null namespace!

When valNamePatt is :*, all values are removed from the context node that are in the null namespace.

The method returns a List of Ids representing the qualified names of all values that were actually removed from the context node.

To remove all values from the context node, regardless of name and/or namespace, use the function detach-values().

If valNamePatt does not exist or does not match any value present on the context node, the method does nothing.
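
The pattern rules above can be modeled in Python as shown below. This is a sketch under simplifying assumptions, not the upCast implementation: names are plain prefix:local strings, namespaces are compared by prefix rather than by namespace URI, and the helper names matches and detach_values are hypothetical.

```python
def matches(pattern, name):
    """Return True when a value name matches a valNamePatt-style pattern."""
    p_prefix, _, p_local = pattern.rpartition(":")
    n_prefix, _, n_local = name.rpartition(":")
    if p_prefix == "*":
        # "*:key": any namespace except the null namespace
        return n_prefix != "" and n_local == p_local
    if p_local == "*":
        # "nsprefix:*" matches one namespace, ":*" matches the null namespace
        return n_prefix == p_prefix
    # plain qualified name: exact match
    return (n_prefix, n_local) == (p_prefix, p_local)


def detach_values(attached, pattern):
    """Remove all matching entries from the dict and return the removed names."""
    removed = [name for name in attached if matches(pattern, name)]
    for name in removed:
        del attached[name]
    return removed
```

When nothing matches, the model returns an empty list and leaves the attached values untouched.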

10.6. detach-values

detach-values() as Bool

This method removes all attached values from the context node.

The method returns a List of Ids representing the qualified names of all values that were actually removed from the context node.

10.7. element

element() as Bool

Tests whether the context node is an element.

10.8. element

element(qname as Id) as Bool

Tests whether the context node has the same namespace and local name as the passed qualified name.

10.9. filter-attrs

filter-attrs(filterSpec as List) as Void

This function should only be used in an XML Exporter's attribute filter program. This method lets you filter attributes by name or pattern from an element before it is serialized to a file or character stream. Since upCast internally produces a huge number of attributes (mostly by exploding CSS style properties into real attributes), the result can get unmanageably large. In most cases, however, you are only interested in a small set of attributes for further processing, and this function helps you specify which attributes to keep in the serialized XML tree and which to discard.

The filterSpec is a List of two-element Lists. Each two-element List consists of two bits of info:

  1. The first element is either INCLUDE or EXCLUDE and defines whether the attribute(s) matched by the second element are to be included in or excluded from the serialized tree.

  2. The second element is either an attribute name or an attribute pattern (see below).

The second element supports the following patterns:

"*"

matches all attributes, regardless of whether they are in a namespace or the null namespace. Note that this is the only pattern that can (and must) be specified as a String.

:*

matches all attributes in the null namespace

*:*

matches all attributes that are in a non-null namespace

prefix:*

matches all attributes in the namespace bound to the specified prefix

*:name

matches any attribute that has a local name of name

prefix:name

matches exactly the attribute name that is in the namespace that is bound to the prefix prefix

name

matches the attribute name in the null namespace

Each attribute present on the context node is filtered against the list of inclusion/exclusion patterns. The matching is done in order of listing.

The workings of this function are best described with an example:

Example 5.32. 

Suppose we want to only include all attributes in the uci namespace except for uci:fullStyle and uci:diffStyle on the current context element. How can this be expressed easily?

Well, we just translate the above sentence word by word into the respective filter expression as follows:

filter-attrs(
  {
    { EXCLUDE, "*" }, // "only include..." means we start with nothing, i.e. exclude all
    { INCLUDE, uci:* }, // "...all attributes in the uci namespace..."
    { EXCLUDE, uci:fullStyle }, // "...except for uci:fullStyle..."
    { EXCLUDE, uci:diffStyle } // "...and uci:diffStyle."
  } );

Now, filter-attrs() takes one attribute after the other on the context element and matches it in turn against each of the filter entries. If the attribute matches the pattern, it is tagged with the specified action, INCLUDE or EXCLUDE, overwriting the tag previously assigned to it.

If the attribute doesn't match a pattern, its current action tag is not changed.

The initial tag for each attribute is INCLUDE.

After the attribute has been matched against the complete list of filters, it carries either an EXCLUDE or an INCLUDE tag, and that tag determines how the attribute is handled.

Simple, isn't it? You can mix EXCLUDE and INCLUDE filter elements as you like and achieve complex filters with very few lines of code.

Let's examine that algorithm for the sample attribute uci:fullStyle. Initially, its tag is INCLUDE. Now, for each entry in the filter list, in order, the following happens:

  1. The attribute matches the pattern *. It is assigned the EXCLUDE tag.

  2. The attribute matches the pattern uci:*. It is assigned the INCLUDE tag.

  3. The attribute matches the pattern uci:fullStyle. It is assigned the EXCLUDE tag.

  4. The attribute doesn't match the pattern uci:diffStyle. Its tag is not changed.

At this point, all filter entries have been matched and possibly applied, and the last tag assigned to the attribute is EXCLUDE. Therefore, the attribute uci:fullStyle is excluded, i.e. removed from the context element (and as a result e.g. excluded from serialization).
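
The tagging algorithm walked through above (the last matching filter entry wins, starting from INCLUDE) can be sketched in Python. This is a model, not the upCast implementation: attribute names are plain prefix:local strings, namespaces are compared by prefix, and the names attr_matches and filter_attrs are hypothetical. Whether *:name also matches attributes in the null namespace is not stated above; the sketch assumes it does.

```python
def attr_matches(pattern, name):
    """Return True when an attribute name matches a filter-attrs style pattern."""
    if pattern == "*":
        return True  # the String pattern "*" matches every attribute
    p_prefix, _, p_local = pattern.rpartition(":")
    n_prefix, _, n_local = name.rpartition(":")
    if p_prefix == "*":
        if p_local == "*":
            return n_prefix != ""   # "*:*": any non-null namespace
        return n_local == p_local   # "*:name": local name only (assumption, see lead-in)
    if p_local == "*":
        return n_prefix == p_prefix  # "prefix:*", or ":*" for the null namespace
    return (n_prefix, n_local) == (p_prefix, p_local)


def filter_attrs(attr_names, filter_spec):
    """Return the attribute names that still carry the INCLUDE tag after all entries."""
    kept = []
    for name in attr_names:
        action = "INCLUDE"  # initial tag of every attribute
        for entry_action, pattern in filter_spec:
            if attr_matches(pattern, name):
                action = entry_action  # a later match overwrites the earlier tag
        if action == "INCLUDE":
            kept.append(name)
    return kept


spec = [
    ("EXCLUDE", "*"),
    ("INCLUDE", "uci:*"),
    ("EXCLUDE", "uci:fullStyle"),
    ("EXCLUDE", "uci:diffStyle"),
]
kept = filter_attrs(["uci:fullStyle", "uci:diffStyle", "uci:class", "id"], spec)
```

With the filter list from Example 5.32, only uci:class survives in this model.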


10.10. get-attr

get-attr(attrName as Id) as String

Gets the value of the attribute attrName on the context node. When the attribute does not exist or the context node is not an element, a Null value is returned.

This method backs the shortcut @attrName in UPL Core.

Example 5.33. 

@uci:class

is equivalent to

get-attr( uci:class )

and will return the value of the attribute uci:class on the context node (or a Null value, if it does not exist).


10.11. get-value

get-value(key as Id) as Value

This method retrieves any value with name key from the context node that was previously set on that node using the attach-value() function.

If such a value does not exist on the context node, the method returns Null.

10.12. insert-nodes

insert-nodes(mode as Id, xmlsource as String) as Bool

This function parses xmlsource as an XML document fragment and inserts it relative to the context node in accordance with the specified mode.

The mode parameter can have the following values:

BEFORE

the document fragment is inserted immediately before the context node as its preceding sibling

AFTER

the document fragment is inserted immediately after the context node as its following sibling

Since xmlsource is parsed before being inserted into the document tree, it must be well-formed.

Any namespaces and prefixes used in the xmlsource string must be declared and in-scope in the UPL program at the calling position of the function.

The function returns true when the target expression selects at least one target node, false otherwise.

Note

Obviously, you cannot call this method on the document root element. Trying to do so will result in an exception being thrown.

Example 5.34. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:item)] {
  insert-nodes( BEFORE, "<uci:marker>" + @uci:numberingtext + "</uci:marker>" );
}

inserts an element uci:marker before every list item element and makes its text content the numbering text string. When the document tree looks like

<uci:item uci:numberingtext="(a)">Item a</uci:item>
<uci:item uci:numberingtext="(b)">Item b</uci:item>
…

then running the above UPL Tree Processor rule results in the following:

<uci:marker>(a)</uci:marker><uci:item uci:numberingtext="(a)">Item a</uci:item>
<uci:marker>(b)</uci:marker><uci:item uci:numberingtext="(b)">Item b</uci:item>
…
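
For readers who want to experiment with the insertion semantics outside of upCast, the following Python sketch models them with the standard xml.dom.minidom module. It is an approximation under stated assumptions: the function name insert_nodes is reused for a hypothetical Python helper, unprefixed element names are used to keep the example self-contained, and a ValueError stands in for the exception mentioned in the note above.

```python
from xml.dom import minidom


def insert_nodes(context, mode, xmlsource):
    """Insert the parsed fragment as sibling(s) before or after the context node."""
    parent = context.parentNode
    if parent is None or parent.nodeType == parent.DOCUMENT_NODE:
        raise ValueError("cannot insert siblings of the document root element")
    # Wrap the fragment so that several top-level nodes can be parsed at once.
    fragment = minidom.parseString("<wrapper>" + xmlsource + "</wrapper>")
    # Inserting before None appends at the end of the parent's children.
    anchor = context if mode == "BEFORE" else context.nextSibling
    for child in list(fragment.documentElement.childNodes):
        copy = context.ownerDocument.importNode(child, True)
        parent.insertBefore(copy, anchor)


doc = minidom.parseString(
    '<list><item n="(a)">Item a</item><item n="(b)">Item b</item></list>'
)
for item in doc.getElementsByTagName("item"):
    insert_nodes(item, "BEFORE", "<marker>" + item.getAttribute("n") + "</marker>")
result = doc.documentElement.toxml()
```

As with the UPL function, the fragment must be well-formed, because it is parsed before it is inserted.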

10.13. insert-nodes

insert-nodes(target as String, mode as Id, xmlsource as String) as Bool

This function parses xmlsource as an XML document fragment and inserts it, in accordance with the specified mode, relative to each of the target nodes selected by the target XPath 1.0 expression.

The mode parameter can have the following values:

BEFORE

the document fragment is inserted immediately before each target node as its preceding sibling

AFTER

the document fragment is inserted immediately after each target node as its following sibling

Since xmlsource is parsed before being inserted into the document tree, it must be well-formed.

Any namespaces and prefixes used in the xmlsource string must be declared and in-scope in the UPL program at the calling position of the function.

The function returns true when the target expression selects at least one target node, false otherwise.

Note

You cannot call this method with a target that is the document root element or any other target node that does not allow the document fragment to be inserted as its previous or following sibling. Trying to do so will result in an exception being thrown.

Example 5.35. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par)] {
  insert-nodes( "descendant::uci:inline", BEFORE, "<uci:start/>" );
  insert-nodes( "descendant::uci:inline", AFTER, "<uci:end/>" );
}

will surround each descendant uci:inline element of a paragraph by the empty elements uci:start and uci:end. When the document tree looks like

<uci:par>This is 
  <uci:inline csso:font-weight="bold">bold and 
    <uci:inline csso:font-style="italic">bold-italic</uci:inline>
  </uci:inline> text.</uci:par>

(indented for clarity), then running the above UPL Tree Processor rule results in the following:

<uci:par>This is 
  <uci:start/><uci:inline csso:font-weight="bold">bold and 
    <uci:start/><uci:inline csso:font-style="italic">bold-italic</uci:inline><uci:end/>
  </uci:inline><uci:end/> text.</uci:par>

10.14. mark-split

mark-split(where as Id, condition as BoolExpression, mode as Id) as Void

This function puts a split marker onto the context node which indicates various properties of a tree splitting action to be performed after the tree traversal of this UPL Tree Processor run.

The parameter where indicates where in relation to the context node the split should be performed:

BEFORE

split the tree immediately before the context node

AFTER

split the tree after the context node

BOTH

split the tree both before and after the context node

OFF

remove any previously set splitting marks on this node; the parameters condition and mode are not used

The parameter condition identifies the node in the ancestor chain up to which the tree should be split. That node is the first node on the ancestor axis (starting from the context node) for which the specified condition evaluates to true.

The mode parameter determines how the node identified by the condition expression is to be interpreted:

SPLIT

the node identified is the last one in the ancestor chain to get split

BELOW

the node identified is the first one in the ancestor chain that must not get split

Example 5.36. 

Assuming the following XML source

<document>
  <par>ABC<br/>DEF.</par>
  <par>AB<span class="a">C<br/>D</span>EF.</par>
  <par>AB<i class="b"><br/>C</i>DEF.</par>
  <par>AB<em class="c">C<br/></em>DEF.</par>
</document>

and a UPL rule of

[element(br)] {
  mark-split( AFTER, element(span) or element(par), SPLIT );
}

the following result is achieved after tree traversal and performing the actual splitting:

1  <document>
2    <par>ABC<br/></par><par>DEF.</par>
3    <par>AB<span class="a">C<br/></span><span class="a">D</span>EF.</par>
4    <par>AB<i class="b"><br/></i></par><par><i class="b">C</i>DEF.</par>
5    <par>AB<em class="c">C<br/></em></par><par>DEF.</par>
6  </document>

Explanation:

line 2: The nearest ancestor of the br element that first satisfies the condition is the par element. Therefore, that is the topmost one to get split into two.

line 3: The nearest ancestor of the br element that first satisfies the condition is the span element. Therefore, that is the topmost one to get split into two. Note how the attribute is automatically copied to the cloned span node as well.

line 4: The nearest ancestor of the br element that first satisfies the condition is the par element. Therefore, that is the topmost one to get split into two. Note how the i element is cloned (incl. its class attribute) as required.

line 5: The nearest ancestor of the br element that first satisfies the condition is the par element. Therefore, that is the topmost one to get split into two. Although the split takes place after the br element but still within the em element, no clone of the em element appears in the second, cloned par element, because that clone would be completely empty.
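
The splitting behavior shown in this example (where AFTER, mode SPLIT) can be reproduced with a short Python model based on xml.dom.minidom. It is a sketch that matches the four result lines above, not the upCast implementation: the function name split_after is hypothetical, the split is performed immediately rather than after tree traversal, and dropping empty clones is inferred from line 5.

```python
from xml.dom import Node, minidom


def split_after(node, condition):
    """Split the tree after node, up to the first ancestor for which condition is true."""
    current = node
    while current.parentNode is not None and current.parentNode.nodeType == Node.ELEMENT_NODE:
        parent = current.parentNode
        clone = parent.cloneNode(False)  # shallow clone: same name and attributes
        sibling = current.nextSibling
        while sibling is not None:
            following = sibling.nextSibling
            clone.appendChild(sibling)   # appendChild moves the node into the clone
            sibling = following
        if clone.hasChildNodes():        # completely empty clones are dropped
            parent.parentNode.insertBefore(clone, parent.nextSibling)
        if condition(parent):
            break                        # SPLIT: this ancestor is the last one to get split
        current = parent


source = (
    '<document><par>ABC<br/>DEF.</par>'
    '<par>AB<span class="a">C<br/>D</span>EF.</par>'
    '<par>AB<i class="b"><br/>C</i>DEF.</par>'
    '<par>AB<em class="c">C<br/></em>DEF.</par></document>'
)
doc = minidom.parseString(source)
for br in doc.getElementsByTagName("br"):
    split_after(br, lambda n: n.tagName in ("span", "par"))
result = doc.documentElement.toxml()
```

The whitespace between the par elements is omitted from the source string so that the serialized result can be compared directly with the result lines above.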


10.15. name

name() as String

Returns the qualified name of the context node.

Note

This returns the qualified name of the context node as it is specified in the source document. This means that any namespace prefixes are the ones declared and used in the internal document, not the ones declared and used in the UPL code.

10.16. processing-instruction

processing-instruction() as Bool

Tests whether the context node is a Processing Instruction node.

10.17. remove-attrs

remove-attrs(attrNamePattern as Id) as List

Removes a named single attribute or a number of attributes matching a pattern from the context node. The attribute name or pattern can be specified in attrNamePattern.

When attrNamePattern is a qualified attribute name, that attribute is removed.

When attrNamePattern is nsprefix:*, all attributes in the specified namespace with prefix nsprefix are removed.

When attrNamePattern is *:attname, all attributes with the local name attname are removed from the node, regardless of the namespace they might be in. This does not include the null namespace!

When attrNamePattern is :*, all attributes in the null namespace are removed from the context node.

The method returns a List of Ids representing the qualified names of all attributes that were actually removed from the context node.

If attrNamePattern does not exist or does not match any attribute present on the context node, the method does nothing.

To remove all attributes, regardless of name and/or namespace, use the function remove-attrs() with no parameters.

10.18. remove-attrs

remove-attrs() as List

Removes all attributes from the context node, regardless of (local) name and/or namespace.

The method returns a List of Ids representing the qualified names of all attributes that were actually removed from the context node.

10.19. rename-element

rename-element(newName as Id) as Void

When the context node is an element, renames that element to the specified newName.

10.20. replace-with-children

replace-with-children() as Void

This function replaces the context node by its children, i.e. it effectively removes the context node from the tree, moving its children onto its parent.

Note

The context node is not removed until the next context node is selected during processing, as otherwise no context node would be available during processing. However, it no longer has children: they have already been moved to directly follow it as siblings. Keep this in mind when performing further actions or evaluating XPath expressions from the context node after calling this function.
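
The net effect of this function on the tree can be sketched in Python with xml.dom.minidom. This is a simplified model: the helper name replace_with_children is hypothetical, and the node is removed immediately instead of being flagged for removal at the next context node change as described in the note above.

```python
from xml.dom import minidom


def replace_with_children(node):
    """Move all children of node onto its parent, then remove node itself."""
    parent = node.parentNode
    while node.firstChild is not None:
        # insertBefore detaches the child from node and re-attaches it to parent.
        parent.insertBefore(node.firstChild, node)
    parent.removeChild(node)


doc = minidom.parseString("<p>a<b>c<i>d</i></b>e</p>")
replace_with_children(doc.getElementsByTagName("b")[0])
result = doc.documentElement.toxml()
```

After the call, the former children of the b element are direct children of p, in their original order.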

10.21. replace-with-text

replace-with-text(data as String) as Void

This function replaces the context node (including its descendants, if any), by a single Text node with the string contents as specified in the data parameter.

Example 5.37. 

<city>M<entity name="uuml"/>nchen</city>

Given the XML above, a rule like

[element(entity)] {
  if( @name="auml" ) {
    replace-with-text( "ä" );
  } else if( @name="ouml" ) {
    replace-with-text( "ö" );
  } else if( @name="uuml" ) {
    replace-with-text( "ü" );
  }
}

will result in the following XML:

<city>München</city>

10.22. set-attr

set-attr(attrName as Id, attrValue as any) as Void

Creates or sets an attribute of name attrName with value attrValue on the context element. The value is converted to a String before being set as if using the to-string() function on the value.

The function always returns true.

Calling this method on a non-element context node will throw an exception.

Important

You cannot set attributes in the css namespace http://www.infinity-loop.de/namespace/2006/upcast-css. Attributes in that namespace are read-only. Trying to set an attribute in this namespace will throw an EvalException.

10.23. set-attr

set-attr(target as String, attrName as Id, attrValue as any) as Void

Creates or sets an attribute of name attrName with value attrValue on all element nodes selected by the target XPath 1.0 expression (evaluated relative to the context node). The value is converted to a String before being set as if using the to-string() function on the value.

The function returns true when the target expression selects at least one target element, false otherwise.

Calling this method on a non-element context node will throw an exception.

Important

You cannot set attributes in the css namespace http://www.infinity-loop.de/namespace/2006/upcast-css. Attributes in that namespace are read-only. Trying to set an attribute in this namespace will throw an EvalException.

Example 5.38. 

#namespace uci "http://www.infinity-loop.de/namespace/2006/upcast-internal";

[element(uci:par)] {
  set-attr( "descendant::uci:image", inPara, true() );
}

will set the attribute inPara to value "true" on all descendant uci:image elements in a paragraph.


10.24. specifies

specifies(propertyName as Id) as Bool

This function returns true when the given attribute or CSS property propertyName is actually specified on the current context node (as opposed to its value merely being available through inheritance from an ancestor element on which it was specified), and false otherwise.

The special upCast namespaces for CSS properties are handled in the following manner:

csso:<propname>

The function returns true when on the context node, the CSS property propname is explicitly specified as a local style override on that node.

cssc:<propname>

The function returns true when on the context node, the CSS property propname is specified by way of a reference to a named style rule. This means that as a consequence, this node also specifies a uci:class attribute, i.e. specifies( uci:class ) is also always true on such a node.

css:<propname>

The function returns true when the property propname was explicitly specified on that node either by a local style override or a named style reference. Effectively, this is a shortcut for (specifies( csso:propname ) or specifies( cssc:propname )).

Example 5.39. 

<uci:inline uci:diffStyle="color: red;">
  <uci:inline uci:diffStyle="font-weight: bold;">Text that is red and bold.</uci:inline>
</uci:inline>

Given the XML above,

specifies( csso:color )

returns true for the outer uci:inline element and false for the inner uci:inline element. Contrast this to querying the color value, where

@css:color

will return "red" for both the outer and inner uci:inline element because the color is inherited from the outer element to the inner.
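
The difference between a property being specified on a node and being available through inheritance can be illustrated with a small Python model. The class StyledNode and both functions are hypothetical names for this sketch, and the model assumes that every property inherits, which in CSS is true for color but not for all properties.

```python
class StyledNode:
    """Hypothetical node carrying its own style properties and a link to its parent."""

    def __init__(self, own_style, parent=None):
        self.own_style = own_style
        self.parent = parent


def specifies(node, prop):
    # True only when the property is set on this very node.
    return prop in node.own_style


def effective_value(node, prop):
    # Walk up the ancestor chain until some node specifies the property.
    while node is not None:
        if prop in node.own_style:
            return node.own_style[prop]
        node = node.parent
    return None  # stands in for UPL's Null


outer = StyledNode({"color": "red"})
inner = StyledNode({"font-weight": "bold"}, parent=outer)
```

In this model, specifies( inner, "color" ) is false although effective_value( inner, "color" ) yields "red", which mirrors the example above.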


10.25. string

string() as String

Returns the concatenated PCDATA content of the descendant Text nodes of the context node (i.e. "the text content").

10.26. text

text() as Bool

Tests whether the context node is a Text node.

11. Numeric functions

11.1. abs

abs(val as Numeric) as Numeric

Returns the absolute value of the Numeric val.

Example 5.40. 

abs( -3 )

returns 3

abs( -1.5in )

returns 2160tw (=1.5in)

abs( 12.67 )

returns 12.67


11.2. max

max(v1 as Numeric, v2 as Numeric) as Numeric

Returns the maximum of the two Numerics v1 and v2. Both must have the same power.

Example 5.41. 

max( -3, 5.2 )

returns 5.2

max( 1in, 15mm )

returns 1440tw (=1in)

max( 10, 20mm )

throws an exception, because the two values do not have the same power


11.3. min

min(v1 as Numeric, v2 as Numeric) as Numeric

Returns the minimum of the two Numerics v1 and v2. Both must have the same power.

Example 5.42. 

min( -3, 5.2 )

returns -3

min( 3cm, 1in )

returns 1440tw (=1in)

min( 10, 20mm )

throws an exception, because the two values do not have the same power
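
The requirement that both operands have the same power can be made concrete with a small Python model. Everything here is an assumption made for illustration: the Numeric dataclass, the helper names, the representation of lengths in twips (1in = 1440tw, as in the examples above), and the reading of "power" as 0 for plain numbers and 1 for lengths.

```python
from dataclasses import dataclass

# Twips per unit; 1in = 1440tw and 1in = 25.4mm.
TWIPS_PER_UNIT = {"tw": 1.0, "in": 1440.0, "mm": 1440.0 / 25.4, "cm": 14400.0 / 25.4}


@dataclass(frozen=True)
class Numeric:
    value: float  # plain number, or a length expressed in twips
    power: int    # assumption: 0 = unitless, 1 = length


def num(value, unit=None):
    """Build a Numeric from a plain number or from a length with a unit."""
    if unit is None:
        return Numeric(float(value), 0)
    return Numeric(value * TWIPS_PER_UNIT[unit], 1)


def _check_same_power(v1, v2):
    if v1.power != v2.power:
        raise ValueError("both values must have the same power")


def upl_abs(v):
    return Numeric(abs(v.value), v.power)


def upl_max(v1, v2):
    _check_same_power(v1, v2)
    return v1 if v1.value >= v2.value else v2


def upl_min(v1, v2):
    _check_same_power(v1, v2)
    return v1 if v1.value <= v2.value else v2
```

Storing all lengths in one base unit is what makes comparisons such as 1in against 15mm straightforward in this model, while mixing a plain number with a length is rejected.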


12. Other functions

12.1. app-buildnumber

app-buildnumber() as Numeric

This function returns the build number (as integer) of the upCast application running it.

You can use this to determine whether the build is high enough (or within a range) in which all required functions will be available, or you can make sure that you are running on a build where some important bug fix your code relies on is implemented.

12.2. debug

debug(items as Value, ...) as Void

Outputs the string values of the individual elements in items in a special debug format on the system console.

12.3. delay

delay(milliseconds as Numeric) as Void

This method pauses execution for the specified number of milliseconds.

Example 5.43. 

This lets you create simple watched-folder functionality directly within UPL, where you process any files in a folder every 60 seconds:

function watchFolder( $folder as String ) {
  while( true() ) { // Loop forever until user clicks Stop
    variable $files as List := { };
    variable $f as String :="";

    $files := list-files( $folder );
    for-each( $f in $files ) {
      process-file( $f );
    }

    delay( 60000 ); // Wait 60 seconds
  }
}

12.4. entering

entering() as Bool

This function lets you query the processing state of the rule for the current node. It returns true when the processing state is on entering the node, i.e. before processing its children, and false otherwise.

Note

You must enable leaveEvents support in UPL before you can take different actions in a rule depending on whether you enter or leave a node. leaveEvents is false (=off) by default, and entering() will always return true in this case.

12.5. eval-xpath

eval-xpath(xpathExpression as String) as List

This function lets you evaluate an XPath 1.0 expression against the internal document tree, with the XPath context node being the same as the UPL context node.

The result is always a List.

It contains one element for each item selected by the XPath expression.

When no items have been selected by the XPath expression, an empty List is returned.

The items returned by the XPath engine are converted to UPL values as follows:

XPath result type    UPL value type   Remarks

Document, Element    String           the XML serialization of the document or element, without XML declaration, in UTF-8 encoding
Attribute            String           the value of the attribute
Text                 String           the text content of the Text node
java.lang.Boolean    Bool
java.lang.Integer    Numeric
java.lang.Long       Numeric
java.lang.Float      Numeric
java.lang.Double     Numeric
any other            String           created by performing toString() on the returned Java object representation

The implementation currently uses the Jaxen XPath engine.

12.6. get-environment-value

get-environment-value(key as String) as Value

This function lets you query several environment variables. Available key values and their meaning are described here.

12.7. get-outline-level

get-outline-level() as Numeric

Returns the outline level (as specified in an RTF document imported by the RTF Importer) of the context node, if it is a paragraph. The returned value is an integer between 1 and 9 for a paragraph if it has an outline level assigned.

The function returns 0 for a paragraph at body text level or if the context node is not a paragraph element.

12.8. get-realm-value-names

get-realm-value-names(realm as Id) as List

Returns a List of all variable names stored and available in the specified realm.

This method is only defined for the realms pipeline and module.

12.9. get-rulemode

get-rulemode() as String

Returns the currently set rule mode. See set-rulemode().

12.10. get-var

get-var(varname as Id) as Value

Returns the value of the in-scope UPL variable with the same name as the Id passed in varname.

If (and only if) the namespace of the Id is one of the variable realm namespaces, the value returned is the same as $varname would have returned. This construction is useful when the name of the variable to fetch was calculated at runtime or retrieved from get-realm-value-names(). When the specified realm variable does not exist, an EvalException is thrown. You can test for the existence of a realm variable using exists-var(). The method performs a type coercion for realm variables according to the following table:

Source type          Returned type

java.lang.Boolean    Bool
java.lang.Double     Numeric
java.lang.Float      Numeric
java.lang.Integer    Numeric
java.lang.String     String
java.util.List       List
java.lang.Object     String
null                 Null

Example 5.44. 

get-var( pipeline:SourceFile )

will return the current value of the pipeline variable SourceFile.
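
The coercion table above can be read as a simple type dispatch. The following Java sketch is illustrative only: RealmCoercionSketch and uplTypeFor() are hypothetical names, and the method merely returns the name of the resulting UPL type rather than constructing a UPL value. A java.lang.Long is not listed in the table, so in this sketch it falls through to the generic java.lang.Object row; whether upCast behaves that way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not part of the upCast API.
final class RealmCoercionSketch {

    // Maps a Java value stored in a realm to the name of the UPL type
    // listed for it in the get-var() coercion table.
    static String uplTypeFor(Object value) {
        if (value == null) {
            return "Null";
        }
        if (value instanceof Boolean) {
            return "Bool";
        }
        if (value instanceof Double || value instanceof Float || value instanceof Integer) {
            return "Numeric";
        }
        if (value instanceof String) {
            return "String";
        }
        if (value instanceof List) {
            return "List";
        }
        // Any other java.lang.Object is represented as a String.
        return "String";
    }

    public static void main(String[] args) {
        System.out.println(uplTypeFor(Boolean.TRUE));
        System.out.println(uplTypeFor(42));
        System.out.println(uplTypeFor(new ArrayList<String>()));
        System.out.println(uplTypeFor(null));
    }
}
```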


12.11. hoist-single-listpar

hoist-single-listpar() as Void

When called with a uci:par element as context node for which single-listpar-level() returns a value greater than zero, this method removes all surrounding list structures and moves the leaf paragraph to the top (replacing the former top-level uci:list item).

Example 5.45. 

Calling the method

hoist-single-listpar()

on the uci:par element as context node with a structure like this:

<uci:list>
    <uci:item>
        <uci:par>Heading 1</uci:par>
    </uci:item>
</uci:list>

results in a document tree that looks like this:

<uci:par>Heading 1</uci:par>

A potentially useful sequence of code would be a rule like this:

[element(uci:par) and single-listpar-level() > 0]
{
  set-heading-level( single-listpar-level() );
  hoist-single-listpar();
}

This will effectively make a heading (with corresponding sectioning in a subsequent Sectioner module in the pipeline) for all "headings" constructed in this way.

Tip

To reduce the possibility of making single-item, "real" lists a heading, you may want to add some more conditions to the selector, e.g. base your decision also on font size of the paragraph, length of text (headings are usually rather short) or similar.


12.12. leaving

leaving() as Bool

This function lets you query the processing state of the rule for the current node. It returns true when the processing state is on leaving the node, i.e. after having processed its children, and false otherwise.

Note

You must enable leaveEvents support in UPL before you can take different actions in a rule depending on whether you enter or leave a node. leaveEvents is false (=off) by default, and leaving() will always return false in this case.

12.13. markup-regex

markup-regex(regExpr as String, markupActions as List) as Bool

This function evaluates the regular expression on the plain character content of all descendants of the context node, optionally creating markup over the matched character sequences. This is similar to matches-list() with the source string being the CDATA content of the context node, but you can additionally specify one of several predefined actions to perform for each matching group. These actions are:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent level of the group, using the specified type

group-deep(type)

group as direct child of the context node, using the specified type

delete-shallow()

delete the group's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the group's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

For the actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-group as String, $position as Numeric, $groups as List) as String

where:

current-group

is the text of the current group, i.e. the group for which the function is called

position

is the number of the group, with 0 being the complete matched pattern, and a value between 1 and n (the number of defined groups in the pattern) the groups from left to right in the pattern

groups

is a List of Strings of all the matched groups in the pattern, with the first element being the complete match of the pattern, and indices 2 (first group) to n+1 (n being the number of defined groups in the pattern) their respective text contents. Note that due to List indices starting at 1, the following holds: $current-group = value-at( $groups, $position + 1 )

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.

For actions other than replace-…(), on grouping, the function will wrap the character run to be grouped by either an uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with an uci:type attribute that has the value of the specified type id.

The function returns true when there was at least one match for the regular expression, false otherwise.

Note on repeating groups

This function is backed by Java's regular expression support. As such, please note that you do not get a match for each occurrence of a repeating group, but only the last repetition that matched.

Example 5.46. 

With the context node

<uci:par>1.2.3.</uci:par>

and the call

markup-regex( "([0-9]\\.)*", { "ignore()", "group-shallow(levelnum)" } )

the result will be

<uci:par>1.2.<uci:inline uci:type="levelnum">3.</uci:inline></uci:par>

The pattern group matches on "1." and "2." will not be grouped as only the last match of the repeating group is available from the regular expression engine.


We therefore recommend not using repeating groups in patterns at all or, if you are aware of this limitation, using them only for groups whose associated action is "ignore()".
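
The limitation described in the note on repeating groups can be reproduced directly with java.util.regex. The following minimal sketch (RepeatingGroupDemo and firstMatch() are hypothetical names used for illustration only) applies the pattern from Example 5.46 to the same text. The complete match covers all three repetitions, but the capturing group only retains the last one:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not part of the upCast API.
final class RepeatingGroupDemo {

    // Returns { complete match, text of group 1 } for the first match,
    // or null when the pattern does not match at all.
    static String[] firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] result = firstMatch("([0-9]\\.)*", "1.2.3.");
        System.out.println("complete match: " + result[0]); // 1.2.3.
        System.out.println("group 1:        " + result[1]); // 3.
    }
}
```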

The workings of this function are best described with an example. Suppose we have the following XML fragment, the context node being the <p> element:

<p>[ID42] A paragraph with an <link>http://www.abc.de/<ins>new/ link</ins></link>.</p>

Let's have a look at some example applications of markup-regex():

Example 5.47. Marking up a textual ID

Let's say we want to mark up the text ID42 with an element, at the same time discarding the surrounding square brackets. We'd write code like this:

markup-regex( "(\\[)(ID\\d+)(\\])",
    { "ignore()", "delete-shallow()", "group-shallow(myid)", "delete-shallow()" } );

We make three groups out of a matching string: (1) the leading square bracket, (2) the actual ID string, and (3) the trailing square bracket. We must therefore give four actions in the actions list, where the first action is the one to apply to the complete pattern match.

We do nothing with the complete pattern match, therefore we specify "ignore()". We want to delete the first matching pattern group (the leading square bracket), therefore we specify the action "delete-shallow()" for the first group. Next comes the ID group, which we want to mark up using an inline element with a type attribute value of myid. We therefore use "group-shallow(myid)". The third group (the trailing square bracket) should be deleted just like the first one, so we again use "delete-shallow()".

The result after executing the above markup-regex() function will be:

<p><inline type="myid">ID42</inline> A paragraph with an <link>http://www.abc.de/<ins>new/ link</ins></link>.</p>

Example 5.48. Marking up the complete URL string (shallow)

Now, we want to mark up just the full URL string, i.e. the text http://www.abc.de/new/. We'll use markup-regex() as follows:

markup-regex( "http[^\\s]+", { "group-shallow(url)" } );

As you see, we detect links by their prefix "http". We then make the simple assumption that a link ends at the first whitespace character following. Now, you'll see that the start and end of the character run we wish to markup will not form a well-formed subdivision in the XML fragment – so, what happens?

Here's a step-by-step schema of what's happening internally:

In Step 1, the complete sequence of text node children is considered for matching against the regular expression, and the character offsets for the start and end points of a matching group are determined.

In Step 2, the splitting of nodes is performed. Since it is a shallow grouping action, the function will only split and cut up to the first common ancestor of both the matching group's start and end point, which in this example is the link element. It is important to see that for this splitting action, the ins element must be duplicated (which is exactly what happens internally). The algorithm tries to take care of any known ID type attributes on any split elements and remove them from the copy so that the document remains valid. All meta data attached to the node being split is also copied.

In Step 3, a new grouping element is created. This is either block (when splitting above paragraph level), or – as in our example – inline if the split is in mixed content (inline level). Both of these elements will reside in the upCast Internal namespace with the default prefix uci. The grouping element gets exactly one attribute, uci:type, which holds as value the type you specified in the action.

The resulting XML serialization would look like:

<p>[ID42] A paragraph with an <link><inline type="url">http://www.abc.de/<ins>new/</ins></inline><ins> link</ins></link>.</p>

Example 5.49. Marking up the complete URL string (deep)

Finally, we again want to mark up just the full URL string, i.e. the text http://www.abc.de/new/. We'll use markup-regex() as follows:

markup-regex( "http[^\\s]+", { "group-deep(url)" } );

The difference between this and the preceding example is that we now do a group-deep(), i.e. we want to make sure the grouping element is an immediate child of the context node. In this case, we need to cut through the tree until we reach the context node. Here's a step-by-step schema of what's happening internally:

In Step 1, the complete sequence of text node children is considered for matching against the regular expression, and the character offsets for the start and end points of a matching group are determined.

In Step 2, the splitting of nodes is performed. Since it is a deep grouping action, the function will split and cut up the ancestor axis until the context node p is reached. It is important to see that for this splitting action, the ins and the link elements must be duplicated (which is exactly what happens internally).

In Step 3, a new grouping element is created as described.

The resulting XML serialization would look like:

<p>[ID42] A paragraph with an <inline type="url"><link>http://www.abc.de/<ins>new/</ins></link></inline><link><ins> link</ins></link>.</p>

Note

Note that in the UPL example code above, we needed to quote the backslash character because it is used in UPL as an escape character.

Example 5.50. Marking up levels in a multi-level numbering string

Here's an example of how you might mark up the individual numbering levels in a multi-level numbering string where the separator character is the dot (.). Suppose we have the following context node:

<uci:par>2.4.3 Using markup-regex()</uci:par>

You may then mark up the individual components of the numbering (Word supports nesting up to 9 levels deep, but we show only 5 here for clarity) using the following code:

markup-regex( "^(\\d+\\.?)(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?", { "ignore()", "group-shallow(level1)", 
    "group-shallow(level2)", "group-shallow(level3)", "group-shallow(level4)", "group-shallow(level5)" } );

This will work for all numbering strings that contain at most 5 levels. It will work unchanged with any numbering strings that have fewer levels, as all groups except for the first one are optional.

The result of the above will be:

<uci:par><uci:inline uci:type="level1">2.</uci:inline><uci:inline uci:type="level2">4.</uci:inline><uci:inline uci:type="level3">3</uci:inline> Using markup-regex()</uci:par>
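
To check which groups such a pattern produces before any markup is created, the same regular expression can be tried out in plain Java. This is a sketch only; NumberingLevels and groups() are hypothetical names. The returned list is laid out as described for the $groups parameter above, with the complete match first, followed by one entry per defined group. Optional groups that did not take part in the match are reported by Java as null; how markup-regex() itself represents such groups is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not part of the upCast API.
final class NumberingLevels {

    // The pattern from the example: one mandatory and four optional levels.
    static final Pattern LEVELS = Pattern.compile(
        "^(\\d+\\.?)(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?(\\d+\\.?)?");

    // Returns the complete match followed by the text of each group,
    // or an empty list when the text does not start with a number.
    static List<String> groups(String text) {
        Matcher m = LEVELS.matcher(text);
        List<String> result = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i)); // null when the group did not participate
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [2.4.3, 2., 4., 3, null, null]
        System.out.println(groups("2.4.3 Using markup-regex()"));
    }
}
```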

Example 5.51. Custom replace function: Keeping markup intact

Suppose the following XML:

<root>
  <p>fax 555-1234</p>
  <p>Fax 555-1234</p>
  <p>FAX 555-1234</p>
  <p><b>f</b>ax 555-1234</p>
</root>

We want to normalize the writing of "fax" to full uppercase, but keep any formatting intact. Note that in the last p, the 'f' of "fax" is bolded. We might use the following UPL code in a UPL Tree Processor:

[element(p)]{
  // normalize "Fax" et al. into "FAX", keeping style properties intact
  markup-regex( "([fF])([aA])([xX])\\s", { "ignore()","replace-custom-shallow(replace-with-uppercase)", "replace-custom-shallow(replace-with-uppercase)", "replace-custom-shallow(replace-with-uppercase)" } );
}

function replace-with-uppercase( $current-group as String, $position as Numeric, $groups as List ) as String 
{
  return upper-case( $current-group );
}

This yields the result

<root>
  <p>FAX 555-1234</p>
  <p>FAX 555-1234</p>
  <p>FAX 555-1234</p>
  <p><b>F</b>AX 555-1234</p>
</root>

i.e. it normalizes the word to "FAX", keeping the style properties (even of individual characters) within the word intact.


12.14. print

print(items as Value, ...) as Void

This function prints all its parameters to stdout (using System.out.print()).

For parameters other than String, the output will be as if passing the parameter to the to-string() function first.

12.15. println

println(items as Value, ...) as Void

This function prints all its parameters to stdout (using System.out.print(…)) and finally appends the newline character for the platform we are running on (using System.out.println()).

For parameters other than String, the output will be as if passing the parameter to the to-string() function first.

12.16. run-module

run-module(moduleID as Id, parameters as List) as Value

This method allows you to run a regular upCast pipeline module instance from anywhere within an UPL function or action part.

The moduleID parameter lets you set the module class you wish to run an instance of by ID. These are as specified in the upCast manual for the various modules:

  • pipelinevars

  • rtfimport

  • upl

  • uplcode

  • sectioner

  • grouper

  • xmlexport

  • css

  • commandline

  • unicodetranslator

  • validator

  • rtfexport

  • xslt

  • xmlimport

  • extpipeline

To set parameters on the created instance of the module, the function expects a List of Lists as its second parameter. Each element of the outer list represents one parameter, which itself is represented by an ordered, two-element list { name, value }.

The parameter list can be constructed programmatically, or specified as a constant directly in the call, as in the following example.

Example 5.52. 

Call the XSLT Processor module directly from within UPL code:

#namespace pipeline "http://www.infinity-loop.de/namespace/upcast-realm/pipeline";

run-module( xslt,
  {
    { "SourceFile", $pipeline:SourceFile },
    { "Stylesheet", $pipeline:PipelineBase + "/Resources/xslt/transformation.xsl" },
    { "DestinationFile", $pipeline:DestinationFolder + "/out.xml" },
    { "XSLTProcessor", "saxon" },
    { "StylesheetParameters", "debugmode=\"1\" rootelem=\"document\"" }
  }
);

The function returns an UPL value depending on the module. Currently, this is true for a successful execution.

The function throws an EvalException when there were errors during executing the module.

12.17. set-grouping

set-grouping(groupingFlag as Bool) as Void

This method, when applied to an uci:part element, determines whether that uci:part should group all contents up to the next occurrence of an uci:part (groupingFlag set to true) or whether it should be just an empty element serving as a position marker for the original section break in the imported RTF document (groupingFlag set to false).

The actual grouping is performed by a subsequent Sectioner module running on the internal tree.

12.18. set-heading-level

set-heading-level(level as Numeric) as Void

Sets the uci:heading-level attribute on the context node (if it is an uci:par element; otherwise, this method does nothing). This information is used by a subsequent Sectioner module to create a corresponding uci:section element nesting.

12.19. set-process-children

set-process-children(doProcess as Bool) as Void

This method lets you set whether you want the context node's children to be processed. When doProcess is true, the UPL processor will continue processing the context node's children. When it is false, processing will continue with the following (as per XPath use of this term) node.

This method is only useful when called on entering node processing state (as on leaving, children will already have been processed). The default value for each new context node is true, i.e. children will be processed.

12.20. set-rulemode

set-rulemode(ruleMode as String) as Void

Sets the rule mode for this rule.

Each rule can decide how to proceed once its actions have been executed. Normally, no further rules in the UPL program are considered; instead, the next node in document order (or the next mode, when leave events are enabled and the execution mode changes) is chosen and the UPL program is applied to it. However, you can also force the processor to continue evaluating subsequent selectors (in the order specified), abort UPL processing completely, or jump to a specific, labelled rule. The available rule modes are:

"break"

stop processing rules for the current context node and mode, and continue with the next mode or the next node in document order. This is the default when no rule mode is set explicitly in a rule, unless the default has been overridden using a #set defaultRuleMode directive, in which case the value given there is used as the default.

"continue"

continue processing the rules list and execute the actions of the next matching rule (if that exists)

"exit"

same as "break", but also stops any further tree traversal, meaning that this was the last node in the tree for which UPL rule application has been performed

"jump:label"

like "continue", but continue processing at the rule prefixed by the label label.

The rule mode is reset to its default value (either the value specified using a #set defaultRuleMode directive or, if that is not specified, "break") whenever the context node or execution mode changes.

12.21. set-var

set-var(name as Id, value as Value) as Value

Lets you set an UPL variable of name name to the new value value.

Example 5.53. 

set-var( pipeline:SourceFile, "abc.txt" );

is equivalent to

$pipeline:SourceFile := "abc.txt";

You should always prefer the $varname := value notation except for those cases where the variable name is dynamically calculated at runtime.

12.22. single-listpar-level

single-listpar-level() as Numeric

This method returns a value greater than zero if and only if the context node is an uci:par element and it is the only leaf element in a nested uci:list/uci:item structure.

Some authors use plain list numbering (instead of heading styles with outline level property) to create headings. When imported by the RTF Importer module, a structure similar to this one is created in this case:

<uci:list>
    <uci:item>
        <uci:par>Heading 1</uci:par>
    </uci:item>
</uci:list>

for what they want to be a heading at level 1. The above method will return 1 for this example when called with the uci:par node as context node, because that paragraph has no siblings and it is the only leaf within the uci:list/uci:item structure, and it is at the first uci:list nesting level.

You can use this for finding possible cases of thusly constructed "headings" and use that information in tandem with the flattening counterpart method, hoist-single-listpar().

12.23. stop

stop() as Void

This method immediately stops execution of the current pipeline module by throwing an EvalException.

12.24. stop

stop(scope as Id) as Void

This method immediately stops execution of the current UPL program by throwing an EvalException. You can choose whether you only want to stop the execution of the currently running module, or the whole pipeline using the scope parameter:

MODULE

stops the currently running module by throwing an EvalException

PIPELINE

stops the currently running pipeline by throwing an EvalException and setting the internal cancel flag, i.e. to the application it looks like the user has additionally clicked the Cancel button, aborting any further pipeline execution.

12.25. test

test() as String

Currently, there is no documentation available for this function.

12.26. throw

throw(exceptionType as Id) as Null

Throws an exception of the type set in exceptionType.

12.27. unique-timestamp

unique-timestamp() as String

This method creates a unique time stamp (within this Java Virtual Machine instance) of the form n…n-nnnn, with n being a decimal digit.

The first part of the time stamp identifier is determined by calling System.currentTimeMillis(), the second, four-digit part is generated by a ring counter incremented for each call to this method to make up for a lower, system dependent millisecond resolution.
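
The scheme described above can be sketched in Java as follows. This is not the upCast implementation: UniqueTimestampSketch and next() are hypothetical names, and the choice of an AtomicInteger for the ring counter and the wrap-around at 10000 (to obtain four digits) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: not part of the upCast API.
final class UniqueTimestampSketch {

    // Ring counter shared by all callers within this JVM instance.
    private static final AtomicInteger RING = new AtomicInteger();

    // Builds "<milliseconds>-<four-digit counter>", advancing the counter
    // on every call so that calls within the same millisecond still differ.
    static String next() {
        int count = RING.getAndUpdate(c -> (c + 1) % 10000);
        return String.format(Locale.ROOT, "%d-%04d", System.currentTimeMillis(), count);
    }

    public static void main(String[] args) {
        System.out.println(next());
        System.out.println(next());
    }
}
```

Note that in this sketch uniqueness only holds as long as fewer than 10000 time stamps are requested within the same millisecond.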

12.28. unmangle-string

unmangle-string(mangledString as String) as List

Some internally generated attributes that can have a dynamically varying number of components are mangled in a proprietary way before storing them as a textual value in an element attribute. This method lets you convert that mangled string into a List that can be used for further processing in UPL.

mangledString is the mangled string that should be converted into a List.

When applying this method to a string value that is not actually mangled, it will return a one-element List with that value as its sole element.

12.29. wl-convert-doc-to-rtf

wl-convert-doc-to-rtf(sourcedoc as String, destrtf as String, command as String, timeout as Numeric) as Bool

This function uses the WordLink component to convert a Word binary (.doc) file to the equivalent RTF file.

The source Word binary file's absolute path must be specified in sourcedoc.

The desired RTF result file's absolute path must be specified in destrtf.

You can specify the WordLink command string to be additionally executed using the command parameter. Available commands are (case-sensitive!):

Pages

create page marker bookmarks according to the current document formatting into pages

Update

updates any contained fields

Premacro

runs the VB macro named il_premacro (if defined)

Lines

create line marker bookmarks according to the current document formatting into lines

Includelinkedimages

images only linked to the document will be converted to embedded images

Updatelinks

hyperlinks will be updated/re-created

Concatenate the commands, without any whitespace in between, in the order in which they should be executed before exporting to RTF.

In timeout, you can specify a maximum timeout in seconds the conversion is allowed to take. After that time, the command is aborted and false is returned.

The function returns true when the conversion to RTF was performed successfully, false otherwise (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.54. 

wl-convert-doc-to-rtf( "/word/doc/test.doc", "/word/doc/converted.rtf", "UpdateIncludelinkedimagesPages", 60 );

will try to convert test.doc to converted.rtf, first updating fields, including linked images and marking pagebreaks, with a timeout of 60 seconds.


12.30. wl-convert-doc-to-rtf

wl-convert-doc-to-rtf(sourcedoc as String, destrtf as String, command as String) as Bool

This function uses the WordLink component to convert a Word binary (.doc) file to the equivalent RTF file.

The source Word binary file's absolute path must be specified in sourcedoc.

The desired RTF result file's absolute path must be specified in destrtf.

You can specify the WordLink command string to be additionally executed using the command parameter. Available commands are (case-sensitive!):

Pages

create page marker bookmarks according to the current document formatting into pages

Update

updates any contained fields

Premacro

runs the VB macro named il_premacro (if defined)

Lines

create line marker bookmarks according to the current document formatting into lines

Includelinkedimages

images only linked to the document will be converted to embedded images

Updatelinks

hyperlinks will be updated/re-created

Concatenate the commands, without any whitespace in between, in the order in which they should be executed before exporting to RTF.

The (default) timeout used for the command is 300 seconds (= 5 minutes).

The function returns true when the conversion to RTF was performed successfully, false otherwise (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.55. 

wl-convert-doc-to-rtf( "/word/doc/test.doc", "/word/doc/converted.rtf", "UpdateIncludelinkedimagesPages" );

will try to convert test.doc to converted.rtf, first updating fields, including linked images and marking pagebreaks, with a timeout of 5 minutes (=the default timeout).


12.31. wl-convert-doc-to-rtf

wl-convert-doc-to-rtf(sourcedoc as String, destrtf as String) as Bool

This function uses the WordLink component to convert a Word binary (.doc) file to the equivalent RTF file.

The source Word binary file's absolute path must be specified in sourcedoc.

The desired RTF result file's absolute path must be specified in destrtf.

The (default) timeout used for the command is 300 seconds (= 5 minutes).

The function returns true when the conversion to RTF was performed successfully, false otherwise (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.56. 

wl-convert-doc-to-rtf( "/word/doc/test.doc", "/word/doc/converted.rtf" );

will try to convert test.doc to converted.rtf with a timeout of 5 minutes (=the default timeout).


12.32. wl-convert-rtf-to-doc

wl-convert-rtf-to-doc(sourcertf as String, destdoc as String, timeout as Numeric) as Bool

This function uses the WordLink component to convert an RTF file to a Word binary (.doc) file.

The source RTF file's absolute path must be specified in sourcertf.

The desired Word binary result file's absolute path must be specified in destdoc.

In timeout, you can specify a maximum timeout in seconds the conversion is allowed to take. After that time, the command is aborted and false is returned.

The function returns true when the conversion was performed successfully, false otherwise (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.57. 

wl-convert-rtf-to-doc( "/word/doc/test.rtf", "/word/doc/converted.doc", 60 );

will try to convert test.rtf to converted.doc with a timeout of 60 seconds.


12.33. wl-convert-rtf-to-doc

wl-convert-rtf-to-doc(sourcertf as String, destdoc as String) as Bool

This function uses the WordLink component to convert an RTF file to a Word binary (.doc) file.

The source RTF file's absolute path must be specified in sourcertf.

The desired Word binary result file's absolute path must be specified in destdoc.

The (default) timeout used for the command is 300 seconds (= 5 minutes).

The function returns true when the conversion was performed successfully, false otherwise (e.g. when running on a system where WordLink is not available or no Word application is installed; see also get-environment-value() with key wordlink-wordversion).

Example 5.58. 

wl-convert-rtf-to-doc( "/word/doc/test.rtf", "/word/doc/converted.doc" );

will try to convert test.rtf to converted.doc with a default timeout of 300 seconds.


13. Functions for working with styles

13.1. %

%(condition as BoolExpression) as Numeric

This method calculates the percentage of descendant text characters for which the passed boolean condition evaluates to true.

Conceptually, each descendant character is treated as if it was a context node and the expression applied. If it evaluates to true, that character is flagged as fulfilling the condition. After all descendant characters of the context node have been flagged this way, the percentage of all characters that are flagged as fulfilling the condition is returned.

Example 5.59. 

Suppose we have the following XML fragment (indented for legibility):

<par style="font-weight: normal; font-size: 12pt">
  <inline style="font-size: 18pt">
    <inline style="font-weight: bold">Creation</inline>
    vs.
    <inline style="font-weight: bold">Destruction?</inline>
  </inline>
</par>

The context node is par. The following UPL function call

%(@css:font-weight="bold" and @css:font-size > 16pt)

will evaluate to the value (8 + 12) / 25 = 0.8.

The text content of the context node is (25 characters):

Creation vs. Destruction?
xxxxxxxx.....xxxxxxxxxxxx

with the characters marked x (8 and 12) fulfilling the condition by being typeset in bold and having a font size greater than 16pt.

You can use the above e.g. to find headings based on the assumption that at least 75% of their characters will be bold and have a font size greater than 16pt. So you could use the above in a rule like the following:

[element(par) and %(@css:font-weight="bold" and @css:font-size > 16pt) >= 0.75]
{
  set-heading-level(1); /* make this a heading of level 1 */
} 
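The character counting behind Example 5.59 can be reproduced outside UPL. The following Python sketch is only a model of the described calculation, not upCast's implementation: the run tuples, the function name percentage and the result for empty content are assumptions made for illustration.

```python
# Hypothetical model: each text run is (text, is_bold, font_size_in_pt),
# with the inherited style already resolved for the run.
runs = [
    ("Creation", True, 18),
    (" vs. ", False, 18),
    ("Destruction?", True, 18),
]

def percentage(runs, condition):
    """Fraction of characters whose run fulfils the condition."""
    total = sum(len(text) for text, _, _ in runs)
    if total == 0:
        return 0.0  # assumption: the specification does not define this case
    flagged = sum(len(text) for text, bold, size in runs if condition(bold, size))
    return flagged / total

# Bold and larger than 16pt: 8 + 12 flagged characters out of 25.
ratio = percentage(runs, lambda bold, size: bold and size > 16)
print(ratio)
```

With the fragment from the example, 20 of the 25 characters are flagged, so the heading rule's threshold of 0.75 is met.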

13.2. markup-style

markup-style(condition as BoolExpression, matchAction as String) as Bool

This function evaluates the style condition condition on the plain character content of all descendants of the context node, optionally creating markup over contiguous runs of matching characters according to the matchAction, which can have the following values:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent level of the matching run, using the specified type

group-deep(type)

group as direct child of the context node, using the specified type

delete-shallow()

delete the matching run's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the matching run's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-run as String, $position as Numeric, $runs as List) as String

where:

current-run

is the text of the current run, i.e. the text run for which the function is called

position

This is always 0, since matched runs cannot nest. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

runs

is a List that contains as its sole element the current text run as String. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.

For actions other than replace-…(), on grouping, the function will wrap the character run to be grouped by either an uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with an uci:type attribute that has the value of the specified type id.

This function works very similarly to markup-regex(), except that a boolean expression on character properties, rather than a regular expression, determines the groups to mark up. Each group is a contiguous run of characters for which the boolean expression yields the same result.

The properties in the condition parameter that can be used are restricted to CSS style attributes only. This means you can only use (synthesized) attributes from upCast's css, cssc and csso namespaces. You cannot query regular attributes (like e.g. uci:diffStyle), since those real attributes are not inherited to text nodes (which cannot have attributes in the first place).

The actions therefore work identically; only the mechanism for determining the groups (text runs) differs. For a graphic of the result of the various actions, please see the examples for markup-regex().

The function returns true when at least one run of characters matched the condition, false otherwise.

13.3. markup-style

markup-style(condition as BoolExpression, matchAction as String, nonmatchAction as String) as Void

This function evaluates the style condition condition on the plain character content of all descendants of the context node, optionally creating markup over contiguous runs of characters according to the matchAction (for characters matching the condition) and the nonmatchAction (for characters not matching the condition), which can have the following values:

ignore()

ignore the group (i.e. leave everything as-is)

group-shallow(type)

group as child of the nearest common parent level of the matching run, using the specified type

group-deep(type)

group as direct child of the context node, using the specified type

delete-shallow()

delete the matching run's contents as if by doing a group-shallow(), then deleting that

delete-deep()

delete the matching run's contents as if by doing a group-deep(), then deleting that

replace-shallow(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-deep(…xml source fragment…)

replaces the matching run's contents with the XML tree fragment parsed from the action's argument as if by doing a group-deep(), then replacing that group as described

replace-custom-shallow(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-shallow(), then replacing that group as described

replace-custom-deep(functionname)

replaces the matching run's contents with the XML tree fragment parsed from the String returned from the custom UPL function named like the action's argument as if by doing a group-deep(), then replacing that group as described

For actions replace-custom-shallow() and replace-custom-deep(), the custom UPL function whose name is specified as parameter must have the following signature:

function functionname( $current-run as String, $position as Numeric, $runs as List) as String

where:

current-run

is the text of the current run, i.e. the text run for which the function is called

position

This is always 0, since matched runs cannot nest. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

runs

is a List that contains as its sole element the current text run as String. This parameter is only there for compatibility reasons of this function with functions written for markup-regex().

The function must return a (well-formed, if elements are contained) XML fragment serialized to a String. That returned value is then parsed into an internal XML tree representation and replaces the complete group (incl. its children) that would have been created at that position in the tree.

For actions other than replace-…(), on grouping, the function will wrap the character run to be grouped by either an uci:inline element (when currently at inline level) or a uci:block element (when currently at block level) with an uci:type attribute that has the value of the specified type id.

This function works like markup-style(), except that you can also specify an action for characters not matching the boolean expression. It effectively partitions the complete CDATA content of the context node into contiguous groups of character runs that either match or do not match the expression.

The properties in the condition parameter that can be used are restricted to CSS style attributes only. This means you can only use (synthesized) attributes from upCast's css, cssc and csso namespaces. You cannot query regular attributes (like e.g. uci:diffStyle), since those real attributes are not inherited to text nodes (which cannot have attributes in the first place).

The following equivalence holds:

markup-style( expr, matchAction )

has the same effect as if writing

markup-style( expr, matchAction, "ignore()" )

The function returns true when at least one run of characters matched the condition, false otherwise.
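The partitioning into contiguous matching and non-matching runs that markup-style() performs can be pictured with a short Python sketch. It is a simplified model on a flat character sequence and ignores the element tree entirely; the function name partition_runs and the per-character flags are assumptions for illustration.

```python
from itertools import groupby

def partition_runs(flagged_chars):
    """Group (character, matches_condition) pairs into contiguous runs.

    Returns a list of (matches_condition, run_text) tuples in document order.
    """
    return [
        (matches, "".join(char for char, _ in group))
        for matches, group in groupby(flagged_chars, key=lambda pair: pair[1])
    ]

# Per-character flags for "Creation vs. Destruction?" with the bold words matching.
text = "Creation vs. Destruction?"
flags = [True] * 8 + [False] * 5 + [True] * 12
runs = partition_runs(list(zip(text, flags)))
print(runs)
```

In this model, the matchAction would be applied to every run flagged True and the nonmatchAction to every run flagged False.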

14. Functions on Strings

14.1. codepoints-to-string

codepoints-to-string(codepoints as List) as String

Creates a String from codepoints, which must be a List of Numerics representing Unicode code points. Returns the zero-length String if codepoints is an empty List.

14.2. contains

contains(source as String, searchText as String) as Bool

Determines if the string source contains the text searchText as substring.

14.3. ends-with

ends-with(source as String, searchText as String) as Bool

Determines if the string source ends in the text searchText.

14.4. escape-characters

escape-characters(sourceString as String, escapeChars as String, escapingMode as Numeric) as String

Escapes all the characters listed in escapeChars in sourceString using the specified escapingMode.

Currently, the following escaping modes are defined:

0

This mode escapes the desired characters using the backslash '\' character. It automatically escapes any backslash character already present in the string; you do not need to specify the backslash in the escapeChars parameter.

Example 5.60. 

escape-characters( "A*back\slash", "*c", 0)

will return the string "A\*ba\ck\\slash".
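The behaviour of escaping mode 0 can be sketched in Python as follows. This is an illustrative model of the description above, not the actual implementation; the function name escape_characters and the error raised for other modes are assumptions.

```python
def escape_characters(source, escape_chars, mode=0):
    """Prefix every listed character, and every backslash, with a backslash."""
    if mode != 0:
        # Assumption: only mode 0 is described, so anything else is rejected here.
        raise ValueError("unsupported escaping mode")
    result = []
    for char in source:
        if char == "\\" or char in escape_chars:
            result.append("\\")
        result.append(char)
    return "".join(result)

# Python string literals need doubled backslashes; the value is A*back\slash.
print(escape_characters("A*back\\slash", "*c"))
```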


14.5. format-numeric

format-numeric(num as Numeric, dimension as Id, precision as Numeric) as String

This function formats a Numeric value into a string. This is most useful for length-valued Numerics, as the function includes automatic unit conversion in this case.

num is the Numeric to be formatted.

dimension is the target unit or dimension (as an Id) to convert the result into.

precision is the integer number of decimals to which the formatted value should be rounded.

Example 5.61. 

format-numeric( 1in, cm, 2 )

will return the String "2.54cm".

format-numeric( 45.67mm, cm, 2 )

will return the String "4.57cm".
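The unit conversion and rounding in Example 5.61 can be checked with a small Python sketch. Python has no length literals, so the value and its unit are passed separately here. The conversion table, the function name format_numeric and the half-up rounding mode are assumptions; the specification does not state which rounding mode is used.

```python
from decimal import Decimal, ROUND_HALF_UP

# Millimetres per unit; a deliberately small, assumed table.
MM_PER_UNIT = {"mm": Decimal("1"), "cm": Decimal("10"), "in": Decimal("25.4")}

def format_numeric(value, unit, target, precision):
    """Convert value from unit to target and round to precision decimals."""
    millimetres = Decimal(str(value)) * MM_PER_UNIT[unit]
    converted = millimetres / MM_PER_UNIT[target]
    quantum = Decimal(1).scaleb(-precision)  # e.g. 0.01 for precision 2
    rounded = converted.quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{rounded}{target}"

print(format_numeric(1, "in", "cm", 2))
print(format_numeric("45.67", "mm", "cm", 2))
```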


14.6. index-of

index-of(text as String, substring as String) as Numeric

Returns the index within text of the first occurrence of substring. If substring does not occur, -1 is returned.

14.7. index-of

index-of(text as String, substring as String, fromIndex as Numeric) as Numeric

Returns the index within text of the first occurrence of substring, starting at the specified fromIndex. If substring does not occur, -1 is returned.

14.8. lower-case

lower-case(text as String) as String

Returns the passed string converted to all lower-case. The result is the same as if calling Java's java.lang.String.toLowerCase() on the source.

14.9. matches

matches(sourceString as String, regExpr as String) as Bool

Determines if the sourceString matches the regular expression regExpr (passed as String).

14.10. matches-list

matches-list(sourceString as String, regExpr as String) as List

This function returns all matches of regExpr in sourceString as a List of Lists.

The inner lists contain as their first element the complete matching subsequence, followed by all capturing groups as defined in the match pattern. Each match of the whole pattern in sourceString creates one entry in the outer list.
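The shape of the returned List of Lists can be illustrated with Python's re module, used here as a stand-in for the Java regular expression engine. The function name matches_list and the sample input are assumptions; how UPL represents a capturing group that did not participate in a match is not specified here (Python yields None).

```python
import re

def matches_list(source, pattern):
    """One inner list per match: the whole match first, then each capturing group."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, source)]

result = matches_list("a1 b22", r"([a-z])(\d+)")
print(result)
```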

14.11. normalize-space

normalize-space(sourceString as String) as String

Normalizes whitespace in the passed string argument and returns the normalized string. Whitespace normalization is performed by stripping leading and trailing whitespace and replacing a sequence of two or more whitespace characters by a single space.
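Read literally, the normalization rule above corresponds to the following Python sketch. It is a model only: the function name normalize_space is an assumption, and Python's notion of whitespace (str.strip and \s) may be broader than the one upCast uses.

```python
import re

def normalize_space(source):
    """Strip leading/trailing whitespace, then collapse runs of two or more."""
    return re.sub(r"\s{2,}", " ", source.strip())

print(normalize_space("  a   b \n c  "))
```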

14.12. parse-numbering

parse-numbering(numbering as String, format as List) as List

This function lets you parse a formatted, possibly multi-level number into the (list of) integers it represents.

The function throws an EvalException when the numbering string does not match the format.

It is intended to e.g. parse a textual list or heading numbering string to determine the resulting nesting level and nesting structure of the respective logical item. This can be useful when the nesting structure has not been marked up explicitly in the source document by applying appropriate styles or choosing descriptive markup and must be inferred solely from the textual numbering string present.

numbering contains the numbering string (which may be multi-level) to parse.

format is the list of Strings defining the expected format of numbering.

The individual items of format are classified either as separator tokens or numbering format tokens.

Separator tokens: '*' | '?' | '+' | literalcharseq

The tokens have the following meanings:

*

matches 0, 1 or more characters

?

matches 0 or 1 character

+

matches 1 or more characters

literalcharseq

matches exactly the specified sequence of characters. If you want to match *, ?, +, # or \ literally, you must quote them with a backslash (\).

Numbering format tokens: '#' ( 'i' | 'I' | 'a' | 'A' | '1' | 'h' | 'H' | 'b' | 'lower-greek' | 'upper-greek' )

Numbering tokens are identified by a leading hash mark (#). The tokens have the following meaning:

i, I

roman numbering, either all lowercase or uppercase, respectively

a, A

alphabetic numbering, either all lowercase or uppercase, respectively

1

decimal numbering

h, H

hex numbering, either all lowercase or uppercase, respectively

b

binary numbering (only 0 or 1 as a sequence)

lower-greek, upper-greek

greek numbering, either all lowercase or uppercase, respectively. See also: CSS3 Lists.

Additionally, each token may have options, each of which is added with a leading '/' character. No whitespace is allowed between options.

Separator token options:

/allowed=charseq

[only for wildcards *,?,+] charseq lists the characters allowed in a matching wildcard string. To include the forward slash (/) in the list of allowed characters, you need to quote it with a backward slash (\) as otherwise, it would be treated as the start marker of the next option.

Numbering format token options:

/repeat=integer

indicates that each numbering character must be repeated the specified number of times. The default is 1. For a numbering like "aa", "bb" use the format token "#a/repeat=2".

/ignore-case

the case of the numbering characters is disregarded. Use this option when the case of numbering may vary without changing its semantics. Most useful in hex numbering format (h, H) when you cannot be sure whether the numbers are specified in upper or lower case (or even mixed).

The result is a List of Numeric values, one for each numbering format token in format, with the (1-based) numeric value of the numbering string.

Example 5.62. Examples:

parse-numbering( "A.iv", { "#A/ignore-case", "*", "#i" } )

will return

{ 1, 4 }

 

parse-numbering( "1bb", { "#1", "*", "#a/repeat=2" } )

will return

{ 1, 2 }

 

parse-numbering( "#4", { "\\#", "#1" } )

will return

{ 4 }

 

parse-numbering( "1.2", { "#1", ".", "#1", ".", "#1" } )

will throw an EvalException because the format does not match the numbering (it contains more components than the input has).
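To make the token semantics more concrete, the following Python sketch reimplements a subset of parse-numbering(): the numbering formats i, I, a, A and 1, the options /repeat and /ignore-case, the wildcard separators and backslash-quoted literals. It is a rough model under stated assumptions, not the actual algorithm: wildcards are matched lazily, multi-letter alphabetic numbers are read as bijective base 26, the /allowed option and the remaining formats are not supported, and ValueError stands in for EvalException.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(text):
    values = [ROMAN[char] for char in text.lower()]
    # A digit standing before a larger one is subtracted (the i in "iv").
    return sum(-v if i + 1 < len(values) and v < values[i + 1] else v
               for i, v in enumerate(values))

def alpha_to_int(text):
    number = 0
    for char in text.lower():
        number = number * 26 + (ord(char) - ord("a") + 1)
    return number

# Character class and converter per supported numbering format.
FORMATS = {"i": ("ivxlcdm", roman_to_int), "I": ("IVXLCDM", roman_to_int),
           "a": ("a-z", alpha_to_int), "A": ("A-Z", alpha_to_int),
           "1": ("0-9", int)}
WILDCARDS = {"*": ".*?", "?": ".?", "+": ".+?"}

def parse_numbering(numbering, fmt):
    pattern, converters = "", []
    for token in fmt:
        if token.startswith("#"):  # numbering format token
            name, *options = token[1:].split("/")
            chars, convert = FORMATS[name]
            prefix = "(?i:" if "ignore-case" in options else "(?:"
            repeat = next((int(o[7:]) for o in options if o.startswith("repeat=")), 1)
            pattern += "(" + prefix + "[" + chars + "]+))"
            converters.append((convert, repeat))
        elif token in WILDCARDS:
            pattern += WILDCARDS[token]
        else:  # literal separator; drop the quoting backslashes first
            pattern += re.escape(re.sub(r"\\(.)", r"\1", token))
    match = re.fullmatch(pattern, numbering)
    if match is None:
        raise ValueError("numbering does not match format")
    result = []
    for text, (convert, repeat) in zip(match.groups(), converters):
        chunks = [text[i:i + repeat] for i in range(0, len(text), repeat)]
        if any(len(c) != repeat or len(set(c.lower())) != 1 for c in chunks):
            raise ValueError("numbering characters are not repeated as required")
        result.append(convert("".join(c[0] for c in chunks)))
    return result

print(parse_numbering("A.iv", ["#A/ignore-case", "*", "#i"]))
```

Under these assumptions the sketch reproduces the results listed in Example 5.62, including the failure of the last call.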


14.13. process-adjacent-text

process-adjacent-text(where as Id, action as String, regex as String, borderCondition as BoolExpression) as Void

This function lets you process text content that is adjacent to the context node’s left or right sides (i.e., start or end). This can be either text within the element or outside – which one is determined by the action. The text content to be processed is specified using a regular expression.

The where parameter specifies at which side of the context node adjacent text should be processed:

LEFT

process text adjacent to the left side (=start) of the element

RIGHT

process text adjacent to the right side (=end) of the element.

The action parameter string specifies what to do with any adjacent text that matches the specified regular expression:

pull-text()

Text adjacent to the outside of the context node is pulled into the node and inserted as first (where=LEFT) or last (where=RIGHT) child.

pull-text(group)

Text adjacent to the outside of the context node is pulled into the node and inserted as first (where=LEFT) or last (where=RIGHT) child. Additionally, that text is surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group.

push-text()

Text adjacent to the inside of the context node is pushed out of the node and inserted as a left sibling (where=LEFT) or right sibling (where=RIGHT) Text node of the context node.

push-text(group)

Text adjacent to the inside of the context node is pushed out of the node and inserted as left sibling (where=LEFT) or right sibling (where=RIGHT) Text node of the context node. Additionally, that text is surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group.

markup-text-inside(group)

Text adjacent to the inside of the context node is extracted from the tree, surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group and re-inserted as first (where=LEFT) or last (where=RIGHT) child of the context node.

markup-text-outside(group)

Text adjacent to the outside of the context node is extracted from the tree, surrounded by an uci:inline (or uci:block element, respectively) with an uci:type attribute with value group and re-inserted as left sibling (where=LEFT) or right sibling (where=RIGHT) of the context node.

delete-text-inside()

Text adjacent to the inside of the context node at the respective side is deleted from the tree. Existing element structures are not changed, with the exception that any Text nodes becoming empty after deleting matching content will be removed from the tree as well.

delete-text-outside()

Text adjacent to the outside of the context node at the respective side is deleted from the tree. Existing element structures are not changed, with the exception that any Text nodes becoming empty after deleting matching content will be removed from the tree as well.

The text to match is specified by the regex parameter. The range of supported expressions is identical to the one supported by the Java classes in the java.util.regex package. You should not include start of text (^) or end of text ($) match codes in your regular expression. The checking for adjacency is automatically taken care of by this function.

The parameter borderCondition lets you specify a node in the ancestor axis from the context node which serves as bounding element for regex matches. That node is identified as the first node in the ancestor axis (starting from the context node) for which the specified condition evaluates to true. Any text outside the subtree of that node is not considered for an adjacent outside match.

Example 5.63. 

The typical use case for this highly specific function is to account for markup inaccuracies in the source. Let’s assume that the author was supposed to mark up year numbers in green, since you want to create an index on them later. However, the document you get looks like this:

<uci:par>In <uci:inline css:color="green">1999</uci:inline>, upCast development started and continues to date <uci:inline css:color="green">(2008)</uci:inline>.</uci:par>

You’ll notice that for the second year number, the author included the parentheses within the green markup, which is undesirable. To fix such markup mistakes, you could use the following UPL rule:

[element(uci:inline) and @css:color="green"] {
  process-adjacent-text( LEFT, "push-text()", "\\(", element(uci:par) );
  process-adjacent-text( RIGHT, "push-text()", "\\)", element(uci:par) );
}

This will yield the following result:

<uci:par>In <uci:inline css:color="green">1999</uci:inline>, upCast development started and continues to date (<uci:inline css:color="green">2008</uci:inline>).</uci:par>

As you can see, the undesirable parentheses within the green year markup have been pushed out of the uci:inline element. Furthermore, that rule has no effect on markup that is already correct (as is the case with the year number 1999).
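The adjacency check that push-text() performs in Example 5.63 can be approximated on plain strings. The Python sketch below only models the text content of the context node and ignores the element tree and the borderCondition; the function name push_text and its return value are assumptions for illustration, and Python's re module stands in for java.util.regex.

```python
import re

def push_text(inner_text, where, regex):
    """Return (pushed_out_text, remaining_inner_text) for a LEFT or RIGHT push."""
    if where == "LEFT":
        match = re.match(regex, inner_text)  # anchored at the start
        if match is None:
            return "", inner_text
        return match.group(0), inner_text[match.end():]
    if where == "RIGHT":
        match = re.search("(?:" + regex + r")\Z", inner_text)  # anchored at the end
        if match is None:
            return "", inner_text
        return match.group(0), inner_text[:match.start()]
    raise ValueError("where must be LEFT or RIGHT")

pushed_left, rest = push_text("(2008)", "LEFT", r"\(")
pushed_right, rest = push_text(rest, "RIGHT", r"\)")
print(pushed_left, rest, pushed_right)
```

The anchoring is added by the sketch itself, which mirrors the advice not to include ^ or $ in the regular expression.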


Here's a more complex example:

Example 5.64. 

Given the following XML:

<par>A number <bold>123</bold>4<italic>5 example</italic>.</par>

a rule definition like

[element(bold)] {
  process-adjacent-text( RIGHT, "markup-text-outside(num)", "[0-9]*", element(par) );
}

will result in the following:

<par>A number <bold>123</bold><uci:inline uci:type="num">45</uci:inline><italic> example</italic>.</par>

Note how the action markup-text-outside() only works on the text and does not change or take into account any element structures at a higher level. What happens is that in a first step, the adjacent digits are removed from the document, and are then re-inserted with the grouping uci:inline element wrapped around them. The italic info for the digit 5 is not retained.


Another example:

Example 5.65. 

Given the following XML:

<par>The <term><firstchar>A</firstchar>bc</term>.</par>

a rule definition like

[element(term)] {
  process-adjacent-text( LEFT, "push-text(caps)", "[A-Z]+", element(par) );
}

will result in the following:

<par>The <uci:inline uci:type="caps">A</uci:inline><term><firstchar></firstchar>bc</term>.</par>

Note again how the action push-text() only works on the text and does not change or take into account any element structures at a higher level. What happens is that in a first step, the capital letter A is removed from the document and then re-inserted with the grouping uci:inline element wrapped around it. The firstchar info is not retained and remains a (now) empty element within the term element.


14.14. replace

replace(source as String, pattern as String, replacement as String) as String

The function returns the String that is obtained by replacing each non-overlapping substring of source that matches the given pattern with an occurrence of the replacement string.

If two overlapping substrings of source both match pattern, then only the first one (that is, the one whose first character comes first in the source string) is replaced.

The regular expression syntax used and understood by this function is the same as for the implementation of the java.lang.String.replaceAll() function of the underlying Java VM.

For Java 1.4.2, the regular expression syntax is documented in the API documentation of the java.util.regex.Pattern class.

Example 5.66. 

replace( "a green glass", "gr(e+)n", "b$1r" )

returns the String "a beer glass".


14.15. replace

replace(source as String, patternlist as List, replacementlist as List) as String

This function is similar to replace() except that it performs a sequence of replacement operations on source. The two Lists patternlist and replacementlist must contain an equal number of corresponding pattern/replacement Strings, which are applied to source in order. The following equivalence holds:

$result := replace( $s, { "ä", "ö", "ü" }, { "ae", "oe", "ue" } );

is equivalent to

$result := replace( replace( replace( $s, "ä", "ae" ), "ö", "oe" ), "ü", "ue" );

which is equivalent to

$result := replace( $s, "ä", "ae" );
$result := replace( $result, "ö", "oe" );
$result := replace( $result, "ü", "ue" );

Example 5.67. 

replace( "ä ö ö ü ü", { "ä", "ö", "ü" }, { "ae", "oe", "ue" } )

returns the String "ae oe oe ue ue".
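The in-order application of pattern/replacement pairs can be modelled in Python as a fold over the two lists. This is a sketch, not the actual implementation: Python's re module stands in for the Java engine, the helper names replace and replace_list are assumptions, and the translation of Java-style group references covers single-digit references such as $1 only.

```python
import re
from functools import reduce

def replace(source, pattern, replacement):
    """Model of the single-pattern form, accepting Java-style $1 references."""
    # Python writes group references as \g<1>, so "$1" is translated first.
    python_replacement = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return re.sub(pattern, python_replacement, source)

def replace_list(source, patterns, replacements):
    """Apply each pattern/replacement pair to the result of the previous one."""
    if len(patterns) != len(replacements):
        raise ValueError("both lists must have the same number of entries")
    return reduce(lambda text, pair: replace(text, *pair),
                  zip(patterns, replacements), source)

print(replace("a green glass", "gr(e+)n", "b$1r"))
print(replace_list("ä ö ö ü ü", ["ä", "ö", "ü"], ["ae", "oe", "ue"]))
```

Because each step works on the output of the previous one, the order of the list entries matters whenever a replacement text can itself be matched by a later pattern.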


14.16. starts-with

starts-with(source as String, searchText as String) as Bool

Determines if the string source starts with the text searchText.

14.17. string-join

string-join(values as List, separator as String) as String

Returns a String created by concatenating the string values of the members of values (applying to-string(), where necessary) using separator as the separator string. If the value of separator is the zero-length string, then the members of values are concatenated without a separator.

If values is an empty List, the zero-length string is returned.

Example 5.68. 

string-join( { "a", "b", "c" }, ", " )

returns "a, b, c"

string-join( { "a", "b", "c" }, "" )

returns "abc"

string-join( { "a", { b1, b2, b3}, c, 5.123 }, "/" )

returns "a/b1 b2 b3/c/5.123". Note how the to-string() function implicitly called on the second list member (which itself is a list) creates a string separated by one whitespace character.
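The implicit to-string() conversion of nested lists can be pictured with the following Python sketch. It models only what Example 5.68 shows; the helper names are assumptions, Ids are represented as plain strings, and the string form of other UPL types may differ from Python's str().

```python
def to_string(value):
    """Lists become their members' string values separated by one space."""
    if isinstance(value, (list, tuple)):
        return " ".join(to_string(member) for member in value)
    return str(value)

def string_join(values, separator):
    return separator.join(to_string(member) for member in values)

print(string_join(["a", ["b1", "b2", "b3"], "c", 5.123], "/"))
```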


14.18. string-length

string-length(theString as String) as Numeric

Returns the length in characters of the passed string argument.

14.19. string-to-codepoints

string-to-codepoints(s as String) as List

Returns the List of Unicode code points (as Numerics) that constitute the String s. If s is a zero-length string, an empty List is returned.

14.20. substring

substring(sourceString as String, startingLoc as Numeric) as String

Returns the portion of the value of sourceString beginning at the position indicated by the value of startingLoc. The characters returned do not extend beyond sourceString. If startingLoc is zero or negative, only those characters in positions greater than zero are returned.

14.21. substring

substring(sourceString as String, startingLoc as Numeric, length as Numeric) as String

Returns the portion of the value of sourceString beginning at the position indicated by the value of startingLoc and continuing for the number of characters indicated by the value of length. The characters returned do not extend beyond sourceString. If startingLoc is zero or negative, only those characters in positions greater than zero are returned.

14.22. substring-after

substring-after(text as String, afterThis as String) as String

Returns the substring from text that follows the first occurrence of afterThis. If text does not contain afterThis, it returns the empty string.

14.23. substring-before

substring-before(text as String, beforeThis as String) as String

Returns the substring from text that precedes the first occurrence of beforeThis. If text does not contain beforeThis, it returns the empty string.

14.24. substring-tail

substring-tail(sourceString as String, length as Numeric) as String

Returns the last length characters of sourceString. When length is greater than the size of sourceString, the empty string is returned.

Example 5.69. 

substring-tail( "Hello", 2 )

returns "lo".

substring-tail( "abc", 5 )

returns the empty string.


14.25. substring-tail

substring-tail(sourceString as String, startingLoc as Numeric, length as Numeric) as String

Returns length characters of sourceString, counted towards the beginning of the string and starting at (and including) the character that is startingLoc positions from the end, where 1 denotes the last character. When startingLoc is smaller than 1, it is set to 1. When startingLoc is greater than the size of sourceString, the empty string is returned. When length is less than or equal to zero, the empty string is returned. When length is greater than the number of characters available from that starting character towards the beginning, only the actually available characters are returned.

Example 5.70. 

substring-tail( "Hello", 2, 2 )

returns "ll".

substring-tail( "abc", 2, 5 )

returns "ab".

substring-tail( "abc", 4, 2 )

returns "".
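The index arithmetic of both substring-tail() variants can be checked with a Python sketch. The function names are assumptions, as is the result of the two-argument form for a length of zero or less, which the text does not define.

```python
def substring_tail(source, length):
    """Two-argument form: the last length characters, or "" if too long."""
    if length <= 0 or length > len(source):
        return ""  # length <= 0 is an assumed case
    return source[len(source) - length:]

def substring_tail_from(source, starting_loc, length):
    """Three-argument form: count backwards from the starting_loc-th last character."""
    starting_loc = max(starting_loc, 1)
    if starting_loc > len(source) or length <= 0:
        return ""
    end = len(source) - starting_loc + 1  # exclusive end index
    start = max(end - length, 0)          # clip at the beginning of the string
    return source[start:end]

print(substring_tail("Hello", 2), substring_tail_from("Hello", 2, 2))
```

Note that the two forms treat an oversized length differently: the two-argument form returns the empty string, while the three-argument form returns the characters that are available.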


14.26. upper-case

upper-case(text as String) as String

Returns the passed string converted to all upper-case. The result is the same as if calling Java's java.lang.String.toUpperCase() on the source.