Choosing the Right Storage for Application Data

What types of data are you dealing with? We will try to roughly classify them into the following five categories. Naturally, this is not a comprehensive classification, but it will help us understand the options and approaches we have to keep in mind.

1. Homogeneous data arrays containing elements of the same type
2. Multimedia - audio, video and graphics files
3. Temporary data for internal use (logs of various types, caches)
4. Streams of calculated data of various types (e.g. recorded video stream or massive computation results)
5. Documents (simple or compound)

The ways of storing such data are as follows.

1. Files in a file system
2. Databases
3. Structured storages
4. Archives (as a specific form of structured storage)
5. Remote (distributed, cloud) storages

Let us now discuss which storage mechanism is best suited for each of the types of data mentioned above.

Homogeneous data arrays

Homogeneous data arrays contain elements of the same type. Examples of a homogeneous data array are a simple table, temperature data over time, or last year's stock values.

1. For homogeneous data arrays, regular files do not provide convenient and fast search. You have to create, maintain and constantly update special indexing files. Modification of the data structure is almost impossible. Metainformation is limited. There is no built-in run-time compression or encryption of data.
2. Relational databases are well suited for homogeneous data. They comprise a set of predefined records with a rigid internal format. The main advantage of relational databases is the ability to locate data quickly according to a specified criterion, as well as transactional support of data integrity. Their significant shortcoming is that relational databases do not work well for large-size data of variable length (BLOB fields are usually stored separately from the rest of the record). Moreover, keeping data in relational databases requires: a) use of a specific DBMS, which severely limits portability of the data and of the application itself, b) pre-planning of the database structure, including interrelational links and indexing policy, c) researching details of peak loads for efficient database development, which also may be a serious overhead.
3. Structured storages are somewhat analogous to a file system, i.e. a storage is a specific set of enveloped named streams (files). Such a storage can be kept at any location, e.g. in a single file on a disk, in a database record, or even in RAM. The main advantage of this approach is that it allows efficient adding or deleting of data in an existing storage and provides effective manipulation of data of various sizes (from small to huge). The storages represent separate units (files) and therefore can be easily relocated, copied, duplicated and backed up. There is no need to track all files generated by an application. Moreover, journal keeping makes it possible to restore content completely or partially after accidents or failures. The disadvantage may be relatively slower search inside these huge data arrays.
4. ZIP archives, as a specific form of structured storage, can be used for storing homogeneous data arrays, but only when most of the access is read-only. The standardized nature of the ZIP format makes it easy to use, especially in cross-platform applications, but this format is not suitable for data that is modified after packing, since adding and deleting data is a time-consuming operation.
5. Remote and distributed storages are the next level of storage, in which actual data location and data access are provided by a specific layer that encapsulates the access mechanics. In such storages data can actually be stored in databases or be distributed among different file systems, but the actual storage organization does not matter for an end-user. The user observes only a set of objects accessed through an API or, as a variant, through file system calls. Cloud storages are a good example. These types of data storages are to be used in large software complexes. Among their advantages one can mention unified data access without a need to think about how the data are actually stored. Their disadvantages are that they cannot be efficiently managed and controlled, and that backup or migration of data is complicated.

Audio, video and graphic files

Storing a single multimedia file (or several) is simple. Complexities appear when you need to maintain a large number of files and want to perform a search across the multimedia collection.

1. Only very simple and sparse multimedia files can be stored as regular files. Even for an average home collection, simple file-based multimedia data storage becomes unmanageable very quickly. This is mostly due to the size of these files, the inability to handle any annotation, tags or metadata, and the low speed of copying or relocation.
2. Relational databases are a dubious way of storing audio, video or similar types of data. RDBMS are not well suited for keeping large BLOBs, especially when it comes to storing video files of big size. Also, each type of data requires its own table (due to the different sets of metadata that need to be stored). On the other hand, RDBMS can be handy as they offer powerful search capabilities, which is very suitable for read-only collections.
3. Structured storages work perfectly well for storing multimedia files when the storage supports metadata and fast search through them. If this search is not supported, structured storage becomes a variant of the file system.
4. Remote and distributed storages are among the best solutions when it comes to storing video, music or similar data. The storage represents a single unit where all elements of a multimedia collection or video game can be safely stored. There is no risk of losing a single but important file. Searches are fast and efficient if the storage supports tags and metadata.

Temporary data

Temporary data are generated by software on the fly and usually have a validity term. Most updates are very frequent. In addition, such intermediate information should stay easily accessible, integral and, in many cases, encrypted and secured. It is still possible to use regular files for these purposes. This approach will result in high resource consumption, there is no reliable way to control and enforce integrity of the data, and encryption functions have to be implemented by your software.

1. For a long time files have been used as a way of interim data storage. They are quite suitable for storing low-priority unsecured temporary data of insignificant size. Meanwhile, modern legislation of several countries dictates more careful and responsible treatment of interim data. As a result, a regular file system becomes less suitable when the issues of data security, vulnerability, and protection from tampering become paramount.
2. Relational databases are not usually used for interim data storage due to the absence (as a rule) of a clearly defined structure and interrelated nature of elements. Low speed of updates and issues of compression and security add to this unsuitability. At the same time, a relational database can contain interim data related to the database itself and its operation. Also, a database can be used for some kind of data cache or for storing activity logs (journal files). RDBMS are not well suited if the data are required to be stored for a long term (years) and to be signed or encrypted.
3. Structured storages may be considered an optimal solution when a large volume of interim data needs to be stored, accessed, indexed and searched, compressed and encrypted on-the-fly. Structured storages may be built with anti-tampering functions or, should the requirements be present, provide an easy way for data removal or replacement. As always, such storages can be easily copied or moved without the need for taking special care to preserve data integrity.
4. ZIP archives are rarely used for interim data storage. Fast (as a rule) interim data turnaround makes them impractical in most situations. An encrypted archive may be suitable for this type of data only when snapshots are to be stored for a long time and need to be protected from loss or tampering.
5. Remote and distributed storages are used for interim data streams basically due to space considerations. They don't provide the speed or the easy management and backup often required for interim data.

Data streams

Large volumes of quickly generated data, such as output data feeds, need to be stored efficiently. Regular file systems significantly limit file sizes, necessitating the design of specific handlers for data overflow at the expense of lost integrity and reliability. Since data of this type often contain privileged or sensitive materials, fast on-the-fly encryption is a must. The same applies to the efficiency of data compression, since, obviously, the sizes of these data feeds are usually very significant.

1. Regular files are not well suited for this type of data. Quickly increasing file sizes require creating many intermediate caching files that need to be copied back. Even with careful designs, the amount of memory or media consumed tends to grow in geometrical progression. Handling, indexing, searching and encrypting data streams stored in regular files become a nightmare.
2. Relational databases pose almost exactly the same problems as regular files. Add to that the inefficiency of database updates and the rigid structure, and it can be seen that relational databases are among the least suitable storage solutions for streams of data.
3. Repositories may be used for data stream storage when there are requirements for security and low vulnerability at the expense of easy searches and fast retrievals. Data can be compressed, but fast and efficient searches become almost impossible.
4. Structured storages have the advantages of security, integrity and efficient searches. Data storages are autonomous single-file units, which can be easily transferred or copied. Access is easy and efficient. Data streams kept in them can be encrypted and protected from tampering. The presence of thin partitioning provides another convenience for storage users: the storage will automatically grow as the data size increases.
5. Remote and distributed storages are well suited for streaming data and are commonly used in projects generating vast amounts of data. Since such data are frequently analyzed by distributed systems or clusters, the use of remote storages is the best fit. This type of storage provides easy but well controlled data access and a guarantee against illegal tampering or removal.

Documents

Documents are a rigidly structured data type specifically designed to store human-readable textual or graphical information. Documents are one of the most common forms of information, produced and used in business and personal activities.

1. Files are the most common way of storing documents. But when concurrent access to documents is required, the use of regular files is complicated. Since the whole compound document structure is stored sequentially in a flat file, any document modification requires the creation of a set of temporary files, which contain a subset of the document's elements to be edited. In addition, deletion of any elements from the document will not reduce the file size automatically. To optimize the size, an additional document copy must be created and saved into yet another file. After the edit operation is completed, the original file must be deleted. If this is to be done automatically by the editing software, the developer of this software has yet another task to remember.
2. Relational databases will work well for some types of documents and can provide fast and efficient indexing, search and retrieval, if an on-the-fly conversion to plain text is available. Databases suffer from the same shortcomings that apply to the storage of homogeneous data arrays. Keeping data in relational databases requires a) use of a specific DBMS, b) pre-planning of the database structure, including interrelational links and indexing policy, c) researching details of peak loads for efficient database development, which also may be a serious overhead.
3. Structured customizable storages are among the best choices when it comes to corporate use of documents. The main advantage of structured storages is that they allow efficient adding or deleting of documents or their parts in an existing storage, provide effective document access restrictions, etc. Complex documents that contain embedded images or other multimedia can be handled more easily by keeping the text apart from the multimedia (doing this will reduce load/save time, make text search easier, etc.). Moreover, journal keeping makes it possible to restore content (completely or partially) after accidents or failures. One more benefit is the possibility to store multiple editions or multiple alternative views of the data within one document. The disadvantage may be slower search, which should be implemented by using on-the-fly conversion to plain text.
4. ZIP files are used in some document formats, such as Open Document Format, to store document data. Most of the advantages described above for structured storage are applicable to ZIP file storage, but again, addition, modification and deletion of information are time-consuming operations and sometimes require a complete rewrite of the file. Also, the ZIP file format doesn't allow you to attach metadata to the entries inside, and ZIP encryption capabilities are limited (strong AES encryption is a recent addition to the standard and it's not supported by many ZIP compression and decompression tools and libraries).
5. Remote and distributed storages are becoming widespread and popular. They allow easy collaboration during document creation and use, and remote but tightly controlled and secured access to documents. Unlike homogeneous data arrays, a document usually constitutes one object accessed and modified in its entirety, and this makes document retrieval and management quite simple. The cons are the same as in the previous paragraphs.

Suggested solutions

The simple rule "use the right tool for the right job" is even more important in the area of software design. Incorrect or under-thought data and information storage planning can lead to disastrous results.

1. For use of files, you are faced with a choice of file systems.
2. There is a wide choice of commercial database systems (Oracle, DB2, etc.) and open source solutions.
3. Repositories can be created by commercial and public archiving solutions, such as Zip, etc.
4. Examples of structured storages include OLE Structured Storage by Microsoft (offers basic storage capabilities, i.e. no encryption, compression or search are available) and Solid File System by EldoS Corporation.
5. Remote storages can be designed with Solid File System OS Edition and Callback File System by EldoS Corporation, FUSE for Unix-based systems, etc.

In any case, only the project developer knows the exact requirements and understands all the technologies, their features and restrictions, and can therefore make an adequate choice of tools for successful implementation of the software project.
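
To make the relational database option for homogeneous data arrays a little more concrete, here is a minimal sketch of the "temperature data over time" example. It assumes SQLite, accessed through Python's standard sqlite3 module, purely as an illustration; the article does not name SQLite, and the table, column and function names (readings, ts, celsius, build_readings_db, readings_between) are hypothetical. It shows the two advantages mentioned for databases: locating data by a criterion through an index, and transactional inserts.

```python
import sqlite3


def build_readings_db(rows):
    """Create an in-memory database holding (timestamp, celsius) rows.

    The schema is a hypothetical example of a homogeneous data array:
    every record has the same two fields.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (ts INTEGER NOT NULL, celsius REAL NOT NULL)"
    )
    # The index lets the engine locate rows by timestamp without scanning
    # the whole table; with regular files we would maintain it by hand.
    conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")
    # Using the connection as a context manager commits the inserts as one
    # transaction, or rolls all of them back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO readings (ts, celsius) VALUES (?, ?)", rows
        )
    return conn


def readings_between(conn, start_ts, end_ts):
    """Return rows with start_ts <= ts <= end_ts, ordered by timestamp."""
    cur = conn.execute(
        "SELECT ts, celsius FROM readings "
        "WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (start_ts, end_ts),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # One synthetic reading per hour for a single day.
    sample = [(hour * 3600, 15.0 + hour * 0.5) for hour in range(24)]
    db = build_readings_db(sample)
    print(readings_between(db, 3600, 3 * 3600))
    db.close()
```

The same sketch also hints at the shortcomings listed earlier: the record layout has to be fixed up front, and the data become tied to one particular database engine.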