External storage of attachments recommended

Introduction

In this blog we look at how data management is affected by the choice of database, and in particular what consequences this choice has for attachments in SAP. One issue that often remains overlooked is the way attachments and generated documents are stored in SAP. It is not uncommon for attachment storage to take up a significant portion of the database.

Many SAP users link local PC files to SAP documents. Via the "Generic object services" button, available in almost all standard SAP transactions, PDF, Word or Excel files, for example, can be attached to and opened from the underlying SAP document. In practice, quotation files are linked to purchase orders, for instance, or incoming purchase orders are linked to sales orders. In addition, many documents, especially PDFs, are generated by SAP itself; think of outgoing purchase orders. All these documents end up in the SAP database.

In this blog we want to make clear that storing these attachments becomes more efficient and much cheaper by moving them from the database to an external content server. Especially when an SAP HANA database is introduced in the SAP landscape, this is, from an architectural point of view, more of a requirement than an option. We will also briefly discuss how this move from database to content server can be accomplished technically.

Research among some of our customers shows that in some cases the table where attachments and generated documents are stored (table SOFFCONT1) takes up to 25% of the total database. Given the cost of storage in an SAP HANA environment, reducing the database size can realize significant cost savings. The benefits can be significant with traditional databases as well; think of backup procedures or system copies.

Data as a new currency

Due to the continuous developments in the ICT sector over recent decades, not only have the possibilities of IT applications increased exponentially, but so have the possibilities for storing data. As a result, companies now have access to large amounts of data and ever more opportunities to analyze and process it. A frequently used phrase is "Data is the new currency". That data is valuable seems obvious to everyone. To optimize and accelerate business processes, not only the data itself but also rapid access to and processing of that data is increasingly a requirement. Real-time data analysis is high on the priority list of most managers, but it is often obstructed by the current hardware and software within companies.

Traditional database versus HANA database

The limiting factor for fast, real-time data processing is the traditional database. A traditional database can only handle one work process at a time for a specific application. For each application that is developed, the data in the database is configured and optimized for that application. To do this, data is continuously moved and duplicated to meet the specific needs of each application. The more applications a company uses, the harder it becomes to give each application quick access to the data in the database. A traditional solution for this is a data warehouse: the data is aggregated and consolidated in such a way that applications can report on it quickly. However, the consequence is that the data is no longer accessible in real time and that compromises are made regarding its level of detail.

To overcome the shortcomings of traditional databases, SAP has developed an in-memory database where data is directly accessible: the SAP HANA database. In addition to quick access to the in-memory data, processing of queries, for example, is also faster. Data does not have to be duplicated in order to be processed; this can take place directly on the database. Among other things, the performance gain achieved with the introduction of a HANA database removes the need to consolidate and aggregate data into, for example, data warehouses. Data is directly and quickly accessible without compromising its level of detail.

The SAP HANA database has been available for some time (since 2010) and can therefore be used with, for example, an SAP ECC system. However, where an SAP HANA database is optional for an SAP ECC system, it is a requirement for an S/4HANA system. More and more companies are therefore switching to an SAP HANA database.

Cost of the HANA database

Where the introduction of an SAP HANA database brings new possibilities for processing data, there are also some concerns. Storing data in-memory provides significant performance benefits for applications, but it is a more expensive form of data storage than traditional databases. With a traditional database, high performance is achieved through fast storage and the processing (CPU) speed of servers; with SAP HANA, the size of the memory itself becomes the most important resource.

Although storing data is becoming cheaper all the time, this cannot compensate for the continuous growth of the data itself. Furthermore, storing all data in memory is certainly not necessary; why make data directly accessible if it is never requested? Estimates are that on average 85% of the data in a database is so-called "cold data" that is rarely or never touched. That leaves only 15% "hot data", which is estimated to account for 90% of the interactions with the database.

To properly deal with hot and cold data, one can choose to only store part of the data (hot) as in-memory and the other data (cold) on other databases. This cold data is still accessible, but will only be put in-memory when requested by a specific application.
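The arithmetic behind this tiering argument can be sketched as follows. Note that the total database size and the 85/15 split below are illustrative assumptions based on the estimates quoted above, not measured figures:

```python
# Back-of-the-envelope estimate: how much memory is needed if only the
# "hot" fraction of the data is kept in-memory. All numbers here are
# illustrative assumptions, not measurements.
def hot_footprint_gb(total_db_gb: float, hot_fraction: float = 0.15) -> float:
    """Memory needed when only the hot fraction stays in-memory."""
    return total_db_gb * hot_fraction

total_gb = 2000.0                      # assumed total database size in GB
hot_gb = hot_footprint_gb(total_gb)    # 15% hot data
cold_gb = total_gb - hot_gb            # 85% cold data can live on cheaper storage
print(f"In-memory: {hot_gb:.0f} GB, on cheaper storage: {cold_gb:.0f} GB")
```

With these assumed numbers, only 300 of 2,000 GB would have to sit in expensive in-memory storage; the remaining 1,700 GB of cold data could be served from cheaper media and loaded on demand.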

One of the types of data that can be considered "cold" is attachments within a system. Examples include outgoing purchase orders, outgoing invoices, incoming invoices, emails, notes, et cetera. As an example, take the generated PDF of a purchase order that is sent to a supplier. This PDF attachment is created once, then printed or e-mailed to the supplier and linked to the purchase order in SAP. When everything is delivered by the supplier and paid for, the purchasing process is basically finished. Many of these created PDF documents will never or rarely be opened again. So why load these documents into memory at all times? It seems like nonsense, and in fact it is!

Storage of attachments

Traditionally, attachments and generated documents in SAP are stored in the underlying database; this is the default setting in SAP. But even a traditional database is, in principle, not meant to store so-called binary documents. What we often see at our customers is that this was once the initial setup of the system and has unintentionally remained so for years. The recommended setup is that "flat" data is stored in the database and documents, or "binary" data, are stored on a content server (e.g. the SAP Content Server). A content server has the specific purpose of storing documents and retrieving them quickly when required. It is also more efficient and cheaper to store these types of documents on a content server than in a database; think of backup procedures or system copies.

A content server can be deployed by means of the SAP Content Server, which is part of the regular SAP licence and can be used free of charge. An external content server can also be chosen, such as OpenText Archive. If an external content server is already in use for other processes, it can simply be connected as well.

Migrate to HANA

Where in an SAP landscape with a traditional database storing documents on a content server was merely a recommended option, the introduction of S/4HANA and the associated SAP HANA database has made switching from the SAP database to a (SAP) content server far more pressing. When migrating to an SAP HANA database, with or without an S/4HANA system, our advice is to definitely set up a content server for the document flow in SAP. Without a properly installed content server, all documents and attachments are by default stored in the underlying HANA database, which is unnecessary. If you are migrating to an S/4HANA system, our advice is to migrate all documents to a content server first. That way the database migration does not need to include all existing attachments; they are already on the content server.

Example scenario

In transaction ME21N or ME22N (create or change purchase order), attachments can be linked via the "Services for objects" button.

When an attachment is linked to the purchase order, the document is stored in the SAP database. The goal is to link PC documents such as Excel and PDF files to a purchase order without storing these attachments in the SAP database; they should be stored on an external storage medium instead.

To analyze whether it makes sense to relocate attachments, we first look at the size of table SOFFCONT1. In the example below, this table has a size of about 1,074,542 MB (roughly 1,074 GB). Compared to a total database size of 5,661 GB, this is about 19%! So in this example significant savings are possible.
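The percentage can be checked with simple arithmetic, using the figures from this example:

```python
# Share of the total database taken up by table SOFFCONT1, using the
# example figures from the text (1,074,542 MB out of 5,661 GB).
soffcont1_mb = 1_074_542      # size of SOFFCONT1 in MB
total_db_gb = 5_661           # total database size in GB

share = (soffcont1_mb / 1024) / total_db_gb   # convert MB to GB, then take the fraction
print(f"SOFFCONT1 share of the database: {share:.0%}")  # roughly 19%
```

Whether MB are converted with a factor of 1,000 or 1,024 barely matters here; either way the attachment table accounts for close to a fifth of the database.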

In the next section we explain how to realize these savings.

Storing attachments on an external storage medium

Through configuration it is possible to save attachments to an external data source by default, e.g. an SAP Content Server. The SAP Content Server is a stand-alone SAP component on which external documents can be stored; the files can still be retrieved directly from within SAP.

Documents are stored in a MaxDB database (part of the installation). SAP applications can access the Content Server to upload, download or display documents using the ArchiveLink protocol.
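To give a feel for that protocol, the sketch below builds a document-retrieval URL in the style of the SAP Content Server HTTP interface on which ArchiveLink is based. The host name, port, content repository and document ID are made-up placeholders, and a production system would normally also require a signed request (security key), which is omitted here:

```python
# Hypothetical sketch of a "get" request URL for the SAP Content Server
# HTTP interface. Host, port, repository and document ID are placeholder
# assumptions; real systems usually require signed URLs (omitted here).
from urllib.parse import urlencode

def content_server_get_url(host: str, port: int, cont_rep: str, doc_id: str,
                           p_version: str = "0046") -> str:
    """Build an unsigned 'get' URL for an HTTP content server."""
    params = urlencode({"pVersion": p_version, "contRep": cont_rep, "docId": doc_id})
    return f"http://{host}:{port}/ContentServer/ContentServer.dll?get&{params}"

url = content_server_get_url("contentserver.example.com", 1090,
                             "Z1", "4F2A0C8B9D3E4E1AA0B1C2D3E4F50617")
print(url)
```

In practice an SAP application never constructs such URLs by hand; the ArchiveLink layer does this transparently when a user opens an attachment.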

By default, attachments are maintained via KPro in the content repository SOFFDB (maintained via transaction OAC0), which refers to database table SOFFCONT1.

To store documents in an external content repository the following procedure must be followed: 

1. Create a content repository linked to an SAP Content Server.

In the Implementation Guide (IMG) (transaction SPRO) choose SAP NetWeaver → Application Server → Basic Services → ArchiveLink → Basic Customizing → Define Content Repositories.

2. Next, create a content repository for the storage category HTTP content server. This content repository contains the details about the connection to the SAP Content Server. 

There are now two categories available: SOFFDB, which is linked to the SAP database, and SOFFHTTP, which is linked to an SAP Content Server.


3. Assign class SOFFPHIO to content category SOFFHTTP

  • Choose transaction SKPR08.

  • For class SOFFPHIO, the field Previous Category contains SOFFDB. Enter the value SOFFHTTP in the field New Category.

  • Choose "Save".

If the value SOFFPHIO is not available there, it must be made available via transaction SE16N: in table SDOKPHCL, change the field value CAT MAINT to X in the line for SOFFPHIO.

From now on, all new attachments are stored in the SAP Content Server. This can be checked by looking at the size of table SOFFCONT1 before and after attaching a document to, for example, a purchase order.

Migration

After the standard behaviour for saving documents has been changed, it is important to also migrate the documents already present in the SAP database to the external storage. SAP provides a number of tools for this purpose that make it easy to move the already linked attachments.

SAP reports RSIRPIRL and RSGOS_RELOCATE_ATTA can be used to physically move the existing documents from SOFFCONT1 to an external content server.

These SAP programs for attachment migration are described in SAP Notes 1634908 and 2459712.

When the migration report is run for a given period, all attachments stored in the SAP database during that period are moved to an SAP Content Server; it is advisable to run it in test mode first, as in this example. End users will not notice any difference after the migration; nothing changes for them. The SAP database, however, will decrease significantly in size, in some cases by more than 20%, as reported above.

Please note that this reduction in table size will not automatically result in a smaller database size. Your database administrator can easily help you with the reorganization of the database.

For questions or additional information on this topic, please contact Sander van der Wijngaart.