How To: Write an XML packet that conforms to XMP into a PDF file without using Acrobat or PDF Library APIs
Issue:
The Acrobat metadata SDK samples rely on Acrobat plug-in functionality, i.e. they call an API from within a running copy of Acrobat. I want to write to the PDF file from my own C++ program, which is linked only with the XMP toolkit and is not a plug-in.
Which data structures need to be modified to put the XMP packet into a PDF file? Can I just append it to the end of the file?
Solution:
An XML packet that was created with the XMP toolkit and conforms to XMP can be added at the document level by attaching it to the Catalog dictionary. It is also possible to add object-level metadata to any PDF component that is represented as a dictionary or stream. In either case, the reserved key "Metadata" in the dictionary holds an indirect reference to the XMP metadata stream.
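As a rough illustration of the structure described above, the following sketch serializes a packet as an indirect metadata stream object and builds the dictionary entry that references it. The helper names makeMetadataStreamObject and makeMetadataEntry are hypothetical, generation number 0 is assumed, and the sketch deliberately ignores everything else a valid update needs (cross-reference entries, trailer, byte offsets).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: serialize an XMP packet as an indirect
// metadata stream object with generation number 0.
std::string makeMetadataStreamObject(int objNum, const std::string& packet) {
    assert(objNum > 0);  // object number 0 is reserved in PDF
    std::string obj = std::to_string(objNum) + " 0 obj\n";
    // /Length counts only the bytes between "stream\n" and the
    // end-of-line that precedes "endstream".
    obj += "<< /Type /Metadata /Subtype /XML /Length " +
           std::to_string(packet.size()) + " >>\n";
    obj += "stream\n";
    obj += packet;
    obj += "\nendstream\nendobj\n";
    return obj;
}

// Hypothetical helper: the dictionary entry that indirectly
// references the metadata stream, e.g. inside the Catalog.
std::string makeMetadataEntry(int objNum) {
    return "/Metadata " + std::to_string(objNum) + " 0 R";
}
```

Calling makeMetadataEntry(19) yields the same "/Metadata 19 0 R" entry that appears in the sample Catalog dictionary further down.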
In practice, however, writing an XML packet into a PDF file without using the Acrobat APIs is complicated and requires a good understanding of PDF. The client must take care to:
- construct an XMP-conformant XML packet as the document metadata, in a writable XML packet with enough padding for in-place edits and expansion;
- synchronize the information in the metadata stream with that in the document information dictionary;
- write the XML packet to the right place in the PDF file;
- interpret multiple versions of the XML packet correctly.
As recommended in the XMP framework specification, applications should allocate 50% of the XML data size as padding, with a minimum of 4 KB. The padding enables in-place edits and expansion of the embedded XML when the "end" attribute in the packet trailer is set to "w": <?xpacket end='w'?>.
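The padding rule above can be sketched as follows. This is only an illustration: paddingSize and wrapWritablePacket are hypothetical names, the serialized XML is assumed to be UTF-8, and laying the padding out as 100-character lines of spaces is an assumption about layout rather than a requirement stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Padding per the recommendation above: 50% of the XML size,
// but never less than 4 KB.
std::size_t paddingSize(std::size_t xmlSize) {
    const std::size_t kMinPadding = 4096;
    return std::max(xmlSize / 2, kMinPadding);
}

// Hypothetical helper: wrap serialized XMP (assumed UTF-8) in a
// writable packet with whitespace padding before the trailer.
std::string wrapWritablePacket(const std::string& xml) {
    // The begin attribute holds U+FEFF encoded as UTF-8.
    const std::string header =
        "<?xpacket begin='\xEF\xBB\xBF' id='W5M0MpCehiHzreSzNTczkc9d'?>\n";
    const std::string trailer = "<?xpacket end='w'?>";
    std::string padding(paddingSize(xml.size()), ' ');
    assert(padding.size() >= 4096);
    // Break the padding into 100-character lines (layout assumption).
    for (std::size_t i = 99; i < padding.size(); i += 100) {
        padding[i] = '\n';
    }
    return header + xml + "\n" + padding + trailer;
}
```

A later in-place edit can then grow the XML by consuming padding bytes, as long as the total packet length stays the same.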
In addition, applications that create PDF 1.4 documents (such as Acrobat 5.0) should include the metadata for a document in the document information dictionary as well as in the document’s metadata stream. Applications that support PDF 1.4 should check for the existence of a metadata stream and synchronize the information in it with that in the document information dictionary (see Implementation Note 104 on p. 804 of "PDF Reference: Third Edition, version 1.4").
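One small part of that synchronization is date formatting: dates in the document information dictionary (such as ModDate) use the PDF date syntax, while XMP date properties use the ISO 8601 style shown in the sample below. The following sketch converts one to the other. The function name pdfDateToXmp is hypothetical, and the sketch assumes the full-length form of a PDF date (all fields through seconds present); shorter forms are rejected rather than guessed at.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: convert "D:YYYYMMDDHHmmSS" followed by an
// optional "Z" or "+HH'mm'" / "-HH'mm'" into "YYYY-MM-DDTHH:mm:SS"
// plus the matching zone suffix. Returns "" for any other form.
std::string pdfDateToXmp(const std::string& d) {
    if (d.size() < 16 || d.compare(0, 2, "D:") != 0) return "";
    for (std::size_t i = 2; i < 16; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(d[i]))) return "";
    }
    std::string out = d.substr(2, 4) + "-" + d.substr(6, 2) + "-" +
                      d.substr(8, 2) + "T" + d.substr(10, 2) + ":" +
                      d.substr(12, 2) + ":" + d.substr(14, 2);
    assert(out.size() == 19);
    if (d.size() == 16) return out;      // no time zone given
    const char sign = d[16];
    if (sign == 'Z') return out + "Z";   // universal time
    if ((sign == '+' || sign == '-') && d.size() >= 22 && d[19] == '\'') {
        for (std::size_t i : {17u, 18u, 20u, 21u}) {
            if (!std::isdigit(static_cast<unsigned char>(d[i]))) return "";
        }
        return out + sign + d.substr(17, 2) + ":" + d.substr(20, 2);
    }
    return "";
}
```

For example, a ModDate of D:20020211134324-08'00' corresponds to the XMP value 2002-02-11T13:43:24-08:00.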
Moreover, in Acrobat 5.0 (PDF 1.4), the document-level metadata is constructed automatically from the document information dictionary. However, because of the incremental update mechanism of PDF, a file can end up containing more than one copy of the XML packet. Whenever the PDF is saved, a new copy of the XMP metadata stream is appended after the existing cross-reference section, even though only one or two properties (e.g., ModDate and MetadataDate) may have changed.
The following PDF sample illustrates the complication. When the PDF was first created, the metadata stream was defined in object 19, which is referenced by the /Metadata key in the /Catalog dictionary. The value of its <xap:MetadataDate> property is 2002-02-11T13:43:24-08:00. When the PDF was saved later, a new XMP metadata packet (object 21) was appended after the original cross-reference section and is referenced by the updated entry (/Metadata 21 0 R) in the Catalog dictionary. The value of its <xap:MetadataDate> property is 2002-02-11T13:46:52-08:00.
%PDF-1.4
...
7 0 obj
<<
/Type /Catalog
/Pages 3 0 R
/Metadata 19 0 R
/PageLabels 2 0 R
>>
endobj
...
19 0 obj
<< /Type /Metadata /Subtype /XML /Length 1338 >>
stream
...
<xap:MetadataDate>2002-02-11T13:43:24-08:00</xap:MetadataDate>
...
endstream
endobj
...
startxref
...
7 0 obj
<<
/Type /Catalog
/Pages 3 0 R
/Metadata 21 0 R
/PageLabels 2 0 R
>>
endobj
...
21 0 obj
<< /Type /Metadata /Subtype /XML /Length 1338 >>
stream
...
<xap:MetadataDate>2002-02-11T13:46:52-08:00</xap:MetadataDate>
...
endstream
endobj
...
xref
...
%%EOF
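To see how a file like the sample above ends up with several packets, a client can scan the raw bytes for packet headers. The sketch below is a naive illustration only: findPacketOffsets is a hypothetical name, and it assumes the metadata streams are stored unfiltered in an 8-bit encoding such as UTF-8. It reports where packets start but does not say which one is current; a correct reader would follow the latest cross-reference section to the Catalog and its /Metadata reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the byte offset of every XMP packet
// header found in the raw file contents, in file order.
std::vector<std::size_t> findPacketOffsets(const std::string& fileBytes) {
    const std::string marker = "<?xpacket begin=";
    assert(!marker.empty());
    std::vector<std::size_t> offsets;
    std::size_t pos = fileBytes.find(marker);
    while (pos != std::string::npos) {
        offsets.push_back(pos);
        // Continue searching after the header just found.
        pos = fileBytes.find(marker, pos + marker.size());
    }
    return offsets;
}
```

Run against the sample above, such a scan would be expected to report two offsets, one inside object 19 and one inside object 21.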
Because of these complications, Adobe can support only the use of the PDF Library and Acrobat SDK plug-ins for adding new XMP packets to PDF files.