1.0 Introduction
This document explains how to parse an XML document that resides on the application server, using the DOM-based approach. It describes the step-by-step design procedure for parsing the document, so that the XML file can be used as the input file for an inbound interface.
2.0 What is XML?
XML 1.0 is a subset of an existing, widely used international text processing standard (Standard Generalized Markup Language) intended for use on the World Wide Web. XML retains ISO 8879's basic features - vendor independence, user extensibility, complex structures, validation, and human readability - in a form that is much easier to implement and understand. XML can be processed by existing commercial tools and a rapidly growing number of free ones.
XML is primarily intended to meet the requirements of large-scale Web content providers for industry-specific markup, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of Web documents by intelligent clients. It is also expected to find use in metadata applications. XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format.
3.0 Implementation Considerations
More and more software vendors are realizing the advantage of making their software XML-compatible, because it allows heterogeneous systems to be integrated. Typical implementation scenarios include EDI, workflow, e-commerce or media-independent publishing.
The SAP system, which today provides access to these types of applications and data in more or less proprietary data formats, is an ideal candidate to act as both a server and a client for other XML-enabled systems. All it needs is a way to give SAP applications programmatic access to XML documents.
4.0 A Few Keywords Related to XML Processing
This section discusses the definition, use, structure and integration of some keywords related to XML processing.
4.1 DOM
Definition
The DOM presents documents as a hierarchy of "Node" objects that also implement other, more specialized interfaces. Some types of nodes may have child nodes of various types, and others are leaf nodes that cannot have anything below them in the document structure.
Use
The DOM also specifies a "NodeList" interface to handle ordered lists of Nodes, such as the children of a Node, or the elements returned by the Element:getElementsByTagName method, and also a NamedNodeMap interface to handle unordered sets of Nodes referenced by their name attribute, such as the Attributes of an Element. NodeLists and NamedNodeMaps in the DOM are "live", that is, changes to the underlying document structure are reflected in all relevant NodeLists and NamedNodeMaps. For example, if a DOM user gets a NodeList object containing the children of an Element, then subsequently adds more children to that element (or removes children, or modifies them), those changes are automatically reflected in the NodeList without further action on the user's part. Likewise changes to a Node in the tree are reflected in all references to that Node in NodeLists and NamedNodeMaps.
Structure
The package DOM contains the following interfaces: (1) if_ixml_att_list_decl, (2) if_ixml_attribute, (3) if_ixml_attribute_decl, (4) if_ixml_cdata_section, (5) if_ixml_character_data, (6) if_ixml_comment, (7) if_ixml_cond_dtd_section, (8) if_ixml_content_particle, (9) if_ixml_document, (10) if_ixml_document_fragment, (11) if_ixml_document_type, (12) if_ixml_element, (13) if_ixml_element_decl, (14) if_ixml_entity_decl, (15) if_ixml_entity_ref, (16) if_ixml_named_node_map, (17) if_ixml_namespace_context, (18) if_ixml_namespace_decl, (19) if_ixml_node, (20) if_ixml_node_collection, (21) if_ixml_node_filter, (22) if_ixml_node_filter_combining, (23) if_ixml_node_iterator, (24) if_ixml_node_list, (25) if_ixml_notation_decl, (26) if_ixml_pi, (27) if_ixml_pi_parsed, (28) if_ixml_pi_unparsed, (29) if_ixml_text.
Integration
The iXML library implements a superset of the W3C DOM Level 1 Core and XML specification as defined in the document PR-DOM-Level-1-19980818.
Differences between the DOM Level 1 Specification and the iXML implementation are usually:
1) Differences in method or interface names: The DOM specification is somewhat inconsistent in its naming conventions for methods and interfaces. In addition, a few classes/interfaces had to be renamed due to the iXML extensions for the DTD representation (e.g. Notation and Entity have been renamed to NotationDecl and EntityDecl). In some instances the W3C DOM provides direct access to attributes of classes, where iXML explicitly provides a set and a get method. In order to get a consistent naming scheme, some W3C method names have been prefixed with "get".
2) iXML provides extensions to the W3C DOM classes and interfaces to represent the document type definition (DTD) as well. The W3C has not yet released a specification for the DOM representation of the DTD. As soon as this document becomes publicly available, the iXML interfaces might adopt the suggested methods.
3) In addition to NodeList and NamedNodeMap, iXML uses the additional class NodeCollection. The NodeList class as defined by the W3C suggests an implementation as an iterator, whereas in a lot of cases a container or collection would be much more appropriate. For performance reasons, the iXML implementation of the DOM separates these two concepts clearly: NodeLists act as iterators over lists of consecutive nodes in the DOM tree (e.g. the child nodes of a node). NodeCollections in contrast are a set of references to arbitrary nodes, which don't have to be in a particular relation to each other. NodeCollections are - as the name suggests - used to collect references to otherwise unrelated nodes.
Other than that the iXML library follows the W3C DOM Level 1 specification very closely and does not deliberately introduce differences in the implementation. Everybody familiar with the W3C DOM will immediately recognize the similarities with the iXML implementation.
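As a rough illustration of the two container types described above, the following sketch parses a small inline document and reads it once through a node list (the children of the root element) and once through a node collection (all elements with a given tag name). The report name, the variable names and the sample XML are hypothetical. The method names are the ones the iXML interfaces if_ixml_document, if_ixml_node_list and if_ixml_node_collection are understood to offer, so they should be checked against the release in use.

```abap
REPORT zixml_dom_sketch.                     "hypothetical report name

DATA: l_ixml       TYPE REF TO if_ixml,
      l_factory    TYPE REF TO if_ixml_stream_factory,
      l_istream    TYPE REF TO if_ixml_istream,
      l_document   TYPE REF TO if_ixml_document,
      l_parser     TYPE REF TO if_ixml_parser,
      l_root       TYPE REF TO if_ixml_element,
      l_children   TYPE REF TO if_ixml_node_list,
      l_collection TYPE REF TO if_ixml_node_collection,
      l_node       TYPE REF TO if_ixml_node,
      l_xml        TYPE string,
      l_name       TYPE string,
      l_value      TYPE string,
      l_index      TYPE i,
      l_length     TYPE i.

* Hypothetical sample document.
l_xml = '<banks><bank>A</bank><bank>B</bank><note>x</note></banks>'.

* Parse the string into a DOM tree.
l_ixml     = cl_ixml=>create( ).
l_factory  = l_ixml->create_stream_factory( ).
l_istream  = l_factory->create_istream_string( string = l_xml ).
l_document = l_ixml->create_document( ).
l_parser   = l_ixml->create_parser( stream_factory = l_factory
                                    istream        = l_istream
                                    document       = l_document ).
IF l_parser->parse( ) = 0.
* Node list: consecutive nodes, here the children of the root element.
  l_root     = l_document->get_root_element( ).
  l_children = l_root->get_children( ).
  l_length   = l_children->get_length( ).
  l_index    = 0.
  WHILE l_index < l_length.
    l_node  = l_children->get_item( index = l_index ).
    l_name  = l_node->get_name( ).
    l_value = l_node->get_value( ).
    WRITE: / 'child:', l_name, l_value.
    l_index = l_index + 1.
  ENDWHILE.

* Node collection: references to all <bank> elements in the document.
  l_collection = l_document->get_elements_by_tag_name( name = 'bank' ).
  l_length     = l_collection->get_length( ).
  l_index      = 0.
  WHILE l_index < l_length.
    l_node  = l_collection->get_item( index = l_index ).
    l_value = l_node->get_value( ).
    WRITE: / 'bank:', l_value.
    l_index = l_index + 1.
  ENDWHILE.
ENDIF.
```

The explicit get_children( ) and get_length( ) calls also show the naming difference from point 1: where the W3C DOM exposes childNodes and length as attributes, iXML uses get methods.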
4.2 Event
Definition
The iXML library implements two modes of operation for the XML parser:
1) A mode in which the parser creates a DOM (Document Object Model) representation of the XML document and
2) A mode in which the parser signals the occurrence of certain logical elements in the XML document in the form of so-called events as they are encountered during the parsing process. A logical element is e.g. an attribute, an element, a notation or entity declaration, a processing instruction etc.
Use
The Event package contains all definitions necessary for the eventing mode approach to XML parsing with the iXML library.
Each event can be signaled at two points in time:
(1) the time at which it is known which logical element (or node) has been found by the parser - the pre-event
(2) the time at which the complete logical element has been parsed - the post-event.
In addition to the distinction between pre- and post-events, there are different events for different logical elements. For each logical element or node class defined by the DOM, there is one matching event with two points in time at which it can occur. An example makes this concept clearer: the DOM defines a node type Attribute, which represents element attributes in an XML document. For this node type there is a matching event type with two points in time (pre and post), i.e. there is an event AttributePre and an event AttributePost. AttributePre is signaled when the parser encounters an attribute in the XML input stream (i.e. when a name has been parsed in a start tag and the following character is an equal sign (=)). AttributePost is signaled when the parser has finished parsing the attribute, i.e. when the attribute's value has been parsed. The same concept applies to all logical elements - or node types - in a similar way.
Structure
The package Event contains the following interfaces: if_ixml_event.
Integration
An event is implemented as an interface - the iXMLEvent interface. Signaling an event to the caller means returning a reference to an iXMLEvent interface.
To find out what type of event an iXMLEvent interface reference represents, call iXMLEvent::getType(). This call will return the event's type.
For each iXMLEvent interface instance there is a corresponding DOM node instance. This DOM node stores the information that is relevant for the particular event (e.g. the name and value of an attribute). To get to the DOM node (or DOM interface, to be more precise) associated with an Event interface, you can call the Event::getNode() method.
In order to avoid this step of indirection when accessing event related information (e.g. the element's name) a few convenience methods have been added to the Event interface: getName() to retrieve the logical element's name, getValue() to retrieve the logical element's value, getAttributes() to retrieve the associated attributes (if defined) and getParent() to find a node's parent node.
You have to be aware though that these methods are not always meaningful for a particular event (or associated node); e.g. calling getAttributes() on a CommentPre/Post event is undefined since comment nodes don't have attributes, or calling getValue() on a TextPre event is undefined since at this point in time the value of the Text node has not yet been parsed. To find out what information is available for each event and point in time, please refer to the Event::EventTypes documentation.
Calling update methods (e.g. setAttribute()) on the node associated with an event might lead to undesired and undefined results; it is therefore strongly discouraged! Calling read-only methods (e.g. Text::isWSOnly()) is allowed of course.
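The event mode described in this section might be used along the following lines. This is a minimal sketch rather than a definitive implementation: it assumes that the parser offers set_event_subscription( ) and parse_event( ), and that if_ixml_event defines the constants co_event_element_pre2 and co_event_element_post, as recalled from the SAP iXML documentation. The report name, the variable names and the sample XML are hypothetical, and the constant names in particular should be checked in the target system.

```abap
REPORT zixml_event_sketch.                   "hypothetical report name

DATA: l_ixml     TYPE REF TO if_ixml,
      l_factory  TYPE REF TO if_ixml_stream_factory,
      l_istream  TYPE REF TO if_ixml_istream,
      l_document TYPE REF TO if_ixml_document,
      l_parser   TYPE REF TO if_ixml_parser,
      l_event    TYPE REF TO if_ixml_event,
      l_xml      TYPE string,
      l_name     TYPE string,
      l_events   TYPE i,
      l_type     TYPE i,
      l_errors   TYPE i.

* Hypothetical sample document.
l_xml = '<banks><bank>A</bank><bank>B</bank></banks>'.

l_ixml     = cl_ixml=>create( ).
l_factory  = l_ixml->create_stream_factory( ).
l_istream  = l_factory->create_istream_string( string = l_xml ).
l_document = l_ixml->create_document( ).
l_parser   = l_ixml->create_parser( stream_factory = l_factory
                                    istream        = l_istream
                                    document       = l_document ).

* Subscribe to the pre- and post-events of elements only
* (constant names are an assumption, see the text above).
l_events = if_ixml_event=>co_event_element_pre2 +
           if_ixml_event=>co_event_element_post.
l_parser->set_event_subscription( events = l_events ).

DO.
* Each call returns the next subscribed event, or an initial
* reference at the end of the document or after an error.
  l_event = l_parser->parse_event( ).
  IF l_event IS INITIAL.
    EXIT.
  ENDIF.
  l_name = l_event->get_name( ).
  l_type = l_event->get_type( ).
  CASE l_type.
    WHEN if_ixml_event=>co_event_element_pre2.
      WRITE: / 'element start:', l_name.
    WHEN if_ixml_event=>co_event_element_post.
      WRITE: / 'element end:  ', l_name.
  ENDCASE.
ENDDO.

* An initial event reference alone does not distinguish success from failure.
l_errors = l_parser->num_errors( ).
IF l_errors NE 0.
  WRITE: / l_errors, ' parse errors have occurred.'.
ENDIF.
```

Only read access to the event is used here, in line with the warning above against calling update methods on the node associated with an event.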
4.3 Stream
Definition
The stream package contains all definitions to handle XML stream I/O.
Input and output of XML documents is handled in terms of XML streams in the iXML library. Even though the streams used in the iXML library have a lot in common with the C++ standard streams, an independent implementation - not derived from the C++ standard streams - has been chosen for mainly two reasons:
(1) It should be very easy to add a new stream type (e.g. for internal/RFC tables) without having to implement the full C++ stream interface.
(2) The iXML implementation uses only a little of the functionality of the C++ standard streams, whereas a lot of other functionality that they lack has to be implemented. In short: there are more differences than commonalities.
The stream package defines three major concepts: an XML stream factory (iXMLStreamFactory), an XML input stream (iXMLIStream) and an XML output stream (iXMLOStream).
Use
The stream factory is used to create XML input and output streams. Since different input sources and output destinations have to be considered, the iXMLIStream and iXMLOStream interfaces are implemented by different classes, each one capable of serving a particular source or destination. Each of these classes is registered or can be registered with the XML stream factory (prototype pattern) and can be queried about its capabilities. This allows the stream factory to create streams for all supported sources or destinations on request, independently of the client application. One of the advantages of this approach is the XML parser's capability to automatically resolve external entity references, as long as a stream type that can handle the protocol defined by the URL is registered with the factory.
Structure
The package Stream contains the following interfaces: (1) if_ixml_istream, (2) if_ixml_ostream, (3) if_ixml_stream, (4) if_ixml_stream_factory.
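To make the roles of the stream factory, the input stream and the output stream more concrete, the following sketch reads a document from a string through an input stream and writes it back into another string through an output stream. It is an illustration under assumptions, not a definitive implementation: the report name, the variable names and the sample XML are hypothetical, and the factory methods create_istream_string( ) and create_ostream_cstring( ) as well as the renderer obtained via create_renderer( ) are the calls iXML is understood to provide for this purpose.

```abap
REPORT zixml_stream_sketch.                  "hypothetical report name

DATA: l_ixml     TYPE REF TO if_ixml,
      l_factory  TYPE REF TO if_ixml_stream_factory,
      l_istream  TYPE REF TO if_ixml_istream,
      l_ostream  TYPE REF TO if_ixml_ostream,
      l_document TYPE REF TO if_ixml_document,
      l_parser   TYPE REF TO if_ixml_parser,
      l_renderer TYPE REF TO if_ixml_renderer,
      l_input    TYPE string,
      l_output   TYPE string.

* Hypothetical sample document.
l_input = '<banks><bank>A</bank></banks>'.

l_ixml    = cl_ixml=>create( ).
l_factory = l_ixml->create_stream_factory( ).

* Input stream: the factory wraps the source (here a string).
l_istream  = l_factory->create_istream_string( string = l_input ).
l_document = l_ixml->create_document( ).
l_parser   = l_ixml->create_parser( stream_factory = l_factory
                                    istream        = l_istream
                                    document       = l_document ).

IF l_parser->parse( ) = 0.
* Output stream: the factory wraps the destination (again a string).
  l_ostream  = l_factory->create_ostream_cstring( string = l_output ).
  l_renderer = l_ixml->create_renderer( ostream  = l_ostream
                                        document = l_document ).
* Render the DOM tree into the output stream.
  l_renderer->render( ).
  WRITE: / l_output.
ENDIF.
```

The parser and the renderer only see the stream interfaces, so switching the source to an internal table (create_istream_itable, as used in section 5.2) changes a single factory call.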
5.0 XML file processing from the Application Server
The following steps will be used to process the XML file from the application server.
5.1 Read the XML file in the binary mode
Read the XML file from the application server in binary mode and store it in an internal table whose lines are of hexadecimal type (type X).
* Open the XML file on the application server.
OPEN DATASET g_physical_path FOR INPUT IN BINARY MODE.
IF sy-subrc = 0.
* Read the XML file from the application server in chunks.
  DO.
    CLEAR l_xml_line.
*   The line is of type X and of length 256.
    READ DATASET g_physical_path INTO l_xml_line-data
         ACTUAL LENGTH l_length.             "l_length TYPE i
    l_subrc = sy-subrc.                      "l_subrc TYPE sy-subrc
*   The last chunk is usually shorter than 256 bytes and is returned
*   together with sy-subrc = 4, so check the length first.
    IF l_length > 0.
      APPEND l_xml_line TO g_t_xml_table.
*     Add up the file size in bytes.
      g_xml_table_size = g_xml_table_size + l_length.
    ENDIF.
    IF l_subrc <> 0.
*     End of file reached.
      EXIT.
    ENDIF.
  ENDDO.
  CLOSE DATASET g_physical_path.
ENDIF.
The file size will be used in the later part of the processing.
5.2 Create the document to hold the DOM tree
Create a DOM representation of an XML document as follows:
You will need the cl_ixml main factory and, in addition, a cl_ixml_stream_factory object to create the input stream. The DOM tree is then placed into the cl_ixml_document object.
* Create the main iXML factory.
g_ixml = cl_ixml=>create( ).
"g_ixml TYPE REF TO if_ixml
* Create a stream factory.
g_streamfactory = g_ixml->create_stream_factory( ).
"g_streamfactory TYPE REF TO if_ixml_stream_factory
* Wrap the table containing the file into a stream.
g_istream = g_streamfactory->create_istream_itable( table = g_t_xml_table
                                                    size  = g_xml_table_size ).
"g_istream TYPE REF TO if_ixml_istream
* Create the document.
g_document = g_ixml->create_document( ).
"g_document TYPE REF TO if_ixml_document
5.3 Parse the XML document in the DOM-based approach
The XML document can be parsed in two ways: either creating a DOM representation of the XML document, or by the parser firing events as logical elements are encountered in a run through an XML document. The DOM-based parsing is discussed here.
In order to parse the document, you will also need a cl_ixml_parser object. The Parser can be obtained from the iXML factory by the following call.
* Create the parser.
g_parser = g_ixml->create_parser( stream_factory = g_streamfactory
                                  istream        = g_istream
                                  document       = g_document ).
"g_parser TYPE REF TO if_ixml_parser
A cl_ixml_parser object is a "use once and throw away" object. That means that you create a new cl_ixml_parser, call it to parse one document and then throw the parser object away. There is no way of reusing the parser for an additional XML document.
Our goal was to parse an XML document into a DOM tree, so here we go now:
g_parser->parse( ).
If there haven't been any errors in the XML document we just parsed from the input stream provided, then the document object we passed to the factory method of the parser will now contain the DOM representation we were looking for.
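Since errors usually happen, we should play this game a little safer and replace the plain call above with a version that does some error checking and prints out diagnostic messages. Because the parser cannot be reused, parse( ) must only be called once per parser object: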
DATA: parseerror TYPE REF TO if_ixml_parse_error,
      str        TYPE string,
      i          TYPE i,
      count      TYPE i,
      index      TYPE i.
* Parse the stream.
IF g_parser->parse( ) NE 0.
  IF g_parser->num_errors( ) NE 0.
    count = g_parser->num_errors( ).
    WRITE: / count, ' parse errors have occurred:'.
*   Errors are numbered starting from 0.
    index = 0.
    WHILE index < count.
      parseerror = g_parser->get_error( index = index ).
      i = parseerror->get_line( ).
      WRITE: / 'line: ', i.
      i = parseerror->get_column( ).
      WRITE: 'column: ', i.
      str = parseerror->get_reason( ).
      WRITE: str.
      index = index + 1.
    ENDWHILE.
  ENDIF.
ENDIF.
Now the cl_ixml_document object can be used for further processing.
5.4 Traverse the DOM tree with iterators
Iterators have three important features when used with the DOM:
1. They provide a consistent interface for accessing the different data structures.
2. They hide the internal workings of the data structures they work on.
3. By doing so, they allow iterator-internal optimizations that utilize specific aspects of the data structure over which they iterate.
To traverse the complete DOM tree in a DFS (Depth-First-Search) traversal, you can use an iterator created on a cl_ixml_document object. If you want to iterate over any sub-tree of the document, simply create the iterator on the root node of the sub-tree over which you want to iterate.
Once you have obtained the iterator, you can repeatedly call the if_ixml_node_iterator::get_next() method until a null (initial) reference is returned, which signals that the iteration has come to an end.
DATA: iterator TYPE REF TO if_ixml_node_iterator,
      node     TYPE REF TO if_ixml_node.
* g_document is the document filled by the parser in section 5.3.
iterator = g_document->create_iterator( ).
node = iterator->get_next( ).
WHILE node IS NOT INITIAL.
* Process the node values.
  ...
  node = iterator->get_next( ).
ENDWHILE.
Sometimes you want to restrict the depth of the traversal. You can do so by passing the factory method of an iterator an additional depth parameter as in the following example, which will only iterate the level immediately below the cl_ixml_element instance.
DATA: iterator TYPE REF TO if_ixml_node_iterator.
* element is a reference to the cl_ixml_element instance to start from.
iterator = element->create_iterator( depth = 1 ).
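To show what "process the node values" could look like in practice, the following sketch walks the whole tree with an iterator and collects every text node together with the name of its parent element into an internal table. It is a minimal sketch under assumptions: the report name, the structure ty_field, the variable names and the sample XML are hypothetical, and the node type constant if_ixml_node=>co_node_text and the methods get_type( ), get_parent( ), get_name( ) and get_value( ) are the ones the if_ixml_node interface is understood to offer.

```abap
REPORT zixml_iterator_sketch.                "hypothetical report name

TYPES: BEGIN OF ty_field,                    "hypothetical structure
         name  TYPE string,
         value TYPE string,
       END OF ty_field.

DATA: l_ixml     TYPE REF TO if_ixml,
      l_factory  TYPE REF TO if_ixml_stream_factory,
      l_istream  TYPE REF TO if_ixml_istream,
      l_document TYPE REF TO if_ixml_document,
      l_parser   TYPE REF TO if_ixml_parser,
      l_iterator TYPE REF TO if_ixml_node_iterator,
      l_node     TYPE REF TO if_ixml_node,
      l_parent   TYPE REF TO if_ixml_node,
      l_xml      TYPE string,
      l_field    TYPE ty_field,
      l_t_fields TYPE STANDARD TABLE OF ty_field.

* Hypothetical sample document.
l_xml = '<bank><key>1000</key><name>Sample Bank</name></bank>'.

l_ixml     = cl_ixml=>create( ).
l_factory  = l_ixml->create_stream_factory( ).
l_istream  = l_factory->create_istream_string( string = l_xml ).
l_document = l_ixml->create_document( ).
l_parser   = l_ixml->create_parser( stream_factory = l_factory
                                    istream        = l_istream
                                    document       = l_document ).

IF l_parser->parse( ) = 0.
* Depth-first traversal of the complete document.
  l_iterator = l_document->create_iterator( ).
  l_node     = l_iterator->get_next( ).
  WHILE l_node IS NOT INITIAL.
*   Keep only text nodes; their parent element supplies the field name.
    IF l_node->get_type( ) = if_ixml_node=>co_node_text.
      l_parent      = l_node->get_parent( ).
      l_field-name  = l_parent->get_name( ).
      l_field-value = l_node->get_value( ).
      APPEND l_field TO l_t_fields.
    ENDIF.
    l_node = l_iterator->get_next( ).
  ENDWHILE.
ENDIF.

* Show the collected name/value pairs.
LOOP AT l_t_fields INTO l_field.
  WRITE: / l_field-name, l_field-value.
ENDLOOP.
```

In a file that is formatted with line breaks and indentation, whitespace-only text nodes may also be reported, so a real program would probably have to skip those.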
6.0 Example
The requirement is taken from the object ‘AP2PLE-AP-TDD-E-10’ of the ‘Johnson & Johnson, Lynx Cornerstone’ project. The requirement was to upload an XML file containing bank records from the application server. The bank records were then created in SAP using the BAPI ‘BAPI_BANK_CREATE’ and changed in SAP using the BAPI ‘BAPI_BANK_CHANGE’.
The code dump attached here populates an internal table with the bank records from the XML file on the application server. The internal table can then be used as the requirement demands.
The test file with the bank records is also attached here, and the content of the internal table is shown in report format for reference.