Archiving in the Federal Archive

Data for Eternity


by Heide Witte



The [German] Federal Archive preserves more than 300 kilometers of files and photos. A growing number of digitized documents are also being added to its collection.


“We archive all records that other federal agencies no longer need on an ongoing basis,” says IT director Wilhelm Valder, describing the Federal Archive's mandate. “We store these records securely, index them, and make them available to the public.” For more than twenty years, the Archive has also been storing digital originals. Its digital holdings currently comprise approximately 9.2 million files, divided among 203 digital archival objects, i.e. sets of data whose contents belong together. These include, for example, census data gathered in the GDR between 1971 and 1981, records of forced laborers from the National Socialist era and the compensation they subsequently received, and employment statistics of the Federal Labor Office. “The oldest digitized data come from the 1970s and can still be interpreted today,” Valder says. Paper-based record keeping is expected to keep declining as government agencies increasingly work with specialized electronic systems and electronic case processing, so the Federal Archive must be prepared to take in ever larger volumes of data. But its existing infrastructure was not designed for this task.

The Solution: “ARCHIVEMANAGER” from Grau Data

As part of the “Digital Archive” pilot project, electronic documents have been archived on a storage system from Hewlett-Packard (HP) since the summer of last year. As its archiving software, the Federal Archive uses “ARCHIVEMANAGER” from Grau Data AG, headquartered in Schwäbisch Gmünd. The solution can store data in the petabyte range (1 petabyte = 1,000 terabytes) and archives this data automatically in a central storage system. For example, data can be copied in parallel from one hard disk to another and/or to magnetic tape. The software supports both the CIFS (Common Internet File System) and NFS (Network File System) network protocols and also provides an open file-system interface. Because this interface is compatible with all common third-party applications, the software is used wherever large volumes of data are generated and must be stored, e.g. in video production and hospitals, at Bavarian Radio, in police organizations, and in the French and Dutch ministries of defense.

IT landscape with Linux

For IT director Valder, the following criteria were decisive in choosing the new solution: it had to be able to store data indefinitely and in an audit-proof manner (protected against subsequent alteration), in compliance with various regulations such as GoBS (principles of proper computer-assisted bookkeeping systems) and GDPdU (principles of data access and the verifiability of digital documents / SOX). In addition, it had to be able to archive extremely large volumes of data cost-effectively.

As a central archive storage system, ARCHIVEMANAGER fits these requirements: “This solution supports the Linux operating system, which we primarily use,” Valder explains, “and it ensures that the right information is available at the right time and in the right place.”

Selection Criteria

  • Unlimited, audit-proof storage;
  • Fulfillment of compliance regulations such as GDPdU and GoBS;
  • Administration of large volumes of data;
  • Linux support;
  • Cost savings.

Realizing Potential Savings

ARCHIVEMANAGER first writes incoming data to a hard disk (the performance disk) and prepares it for archiving. After a predefined interval has elapsed, and according to individually configured policies, the data is then stored on various media, for example on WORM (Write Once Read Many) media, which cannot be overwritten and are therefore protected against subsequent alteration. In Valder's view, cost savings are the main argument for archiving on tape. Another advantage of ARCHIVEMANAGER is that it can write to several media simultaneously. Data on the storage media can be read at any time, and the program works independently of specific hardware and software. Finally, it also improves service quality: automated file storage simplifies the administration of multi-tier storage in the sense of hierarchical storage management (HSM) and makes it easier to adapt these setups to specific business requirements. This reduces administrative effort and lightens the workload for IT staff.
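The disk-first workflow described above (land on the performance disk, then copy to several media once an interval has passed) can be illustrated with a toy policy sketch. This is not ARCHIVEMANAGER's interface: the `Policy` and `ArchivedFile` types, the tier names `"worm"` and `"tape"`, and the 30-day interval are hypothetical assumptions chosen only to show the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Policy:
    # Hypothetical policy: how long a file stays only on the performance
    # disk, and which media it is copied to once that interval has passed.
    min_age: timedelta
    targets: tuple


@dataclass
class ArchivedFile:
    name: str
    written_at: datetime
    # Every file starts out on the performance disk only.
    copies: set = field(default_factory=lambda: {"performance_disk"})


def apply_policy(files, policy, now):
    """Copy every file older than policy.min_age to all target media.

    Returns the names of the files migrated in this run; files that
    already have copies on all targets are skipped.
    """
    migrated = []
    for f in files:
        old_enough = now - f.written_at >= policy.min_age
        already_done = set(policy.targets) <= f.copies
        if old_enough and not already_done:
            # A real system would write these copies in parallel;
            # this sketch only records which media now hold a copy.
            f.copies.update(policy.targets)
            migrated.append(f.name)
    return migrated


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    files = [
        ArchivedFile("old_dataset.dat", datetime(2023, 12, 1)),
        ArchivedFile("new_upload.tif", datetime(2024, 1, 30)),
    ]
    policy = Policy(min_age=timedelta(days=30), targets=("worm", "tape"))
    print(apply_policy(files, policy, now))  # ['old_dataset.dat']
```

Keeping the disk copy after migration mirrors the idea of the performance disk as a fast front end; whether and when a real HSM system releases that copy is a separate, product-specific policy.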

Valder also cites savings achieved outside the archiving project itself: “We sometimes receive digitized collections stored on disks that are accessed extremely rarely. That’s very expensive.” ARCHIVEMANAGER can help by migrating these collections to significantly less expensive tape. Millions of photos in TIFF format (Tagged Image File Format) that were held on costly disks in a picture archive could likewise be migrated to tape, freeing up additional disk space. “Now, only pictures that are accessed daily are still stored on fast disks.”
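The rule Valder describes, keeping only daily-accessed pictures on fast disk and moving everything else to tape, amounts to an access-frequency placement policy. The sketch below is a hypothetical illustration, not the Federal Archive's or Grau Data's actual configuration; it assumes "accessed daily" can be approximated by "last read within the past day", and the tier names `"fast_disk"` and `"tape"` are made up for the example.

```python
from datetime import datetime, timedelta


def choose_tier(last_access: datetime, now: datetime,
                hot_window: timedelta = timedelta(days=1)) -> str:
    """Return the storage tier for a file based on when it was last read.

    Files read within `hot_window` stay on fast disk; everything else
    is assigned to cheaper tape storage.
    """
    return "fast_disk" if now - last_access <= hot_window else "tape"


def plan_migration(last_access_by_file: dict, now: datetime) -> dict:
    """Group file names (sorted) by the tier choose_tier assigns them."""
    plan = {"fast_disk": [], "tape": []}
    for name, last_access in sorted(last_access_by_file.items()):
        plan[choose_tier(last_access, now)].append(name)
    return plan


if __name__ == "__main__":
    now = datetime(2024, 1, 31, 12, 0)
    photos = {
        "img_0001.tif": datetime(2024, 1, 31, 9, 0),  # read this morning
        "img_0002.tif": datetime(2022, 6, 15),        # untouched for years
    }
    print(plan_migration(photos, now))
    # {'fast_disk': ['img_0001.tif'], 'tape': ['img_0002.tif']}
```

A production policy would likely also weigh file size and access counts rather than a single last-access timestamp, since one stray read would otherwise pull a file back onto expensive disk.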