What types of data are you dealing with? We will try to classify them roughly into the following five categories. Naturally, this is not a comprehensive classification, but it will help us understand the options and approaches we have to keep in mind.

1. Homogeneous data arrays containing elements of the same type
2. Multimedia: audio, video and graphics files
3. Temporary data for internal use (logs of various types, caches)
4. Streams of calculated data of various types (e.g. a recorded video stream or massive computation results)
5. Documents (simple or compound)

The ways of storing such data are as follows.

1. Files in a file system
2. Databases
3. Structured storages
4. Archives (as a specific form of structured storage)
5. Remote (distributed, cloud) storages

Let us now discuss which storage mechanism is best suited for each of the types of data mentioned above.

Homogeneous data arrays
Homogeneous data arrays contain elements of the same type. Examples of a homogeneous data array are a simple table, temperature data over time or last year's stock values.

1. For homogeneous data arrays, regular files do not provide convenient and fast search. You have to create, maintain and constantly update special index files. Modification of the data structure is almost impossible. Metainformation is limited. There is no built-in run-time compression or encryption of data.
2. Relational databases are well suited for homogeneous data. They comprise a set of predefined records with a rigid internal format. The main advantages of relational databases are the ability to locate data quickly according to a specified criterion and transactional support of data integrity. Their significant shortcoming is that relational databases do not work well for large data of variable length (BLOB fields are usually stored separately from the rest of the record). Moreover, keeping data in relational databases requires: a) use of a specific DBMS, which severely limits portability of the data and of the application itself, b) pre-planning of the database structure, including interrelational links and indexing policy, c) research into peak loads, which is required for efficient database development and may also be a serious overhead.
3. Structured storages are somewhat analogous to a file system, i.e. a storage is a set of enveloped named streams (files). Such a storage can be kept at any location: in a single file on a disk, in a database record, or even in RAM. The main advantage of this approach is that it allows efficient adding or deleting of data in an existing storage and effective manipulation of data of various sizes (from small to huge). The storages are separate units (files) and therefore can be easily relocated, copied, duplicated and backed up. There is no need to track all files generated by an application. Moreover, journal keeping makes it possible to restore content completely or partially after accidents or failures. The disadvantage may be relatively slower search inside these huge data arrays.
4. ZIP archives, as a specific form of structured storage, can be used for storing homogeneous data arrays, but only when most of the access is read-only. The standardized nature of the ZIP format makes it easy to use, especially in cross-platform applications, but the format is not suitable for data that will be modified after packing: adding and deleting data are time-consuming operations.
5. Remote and distributed storages are the next level of storage, in which the actual data location and data access are provided by a specific layer that encapsulates the access mechanics. In such storages data can actually be stored in databases or be distributed among different file systems, but the actual storage organization does not matter to the end user. The user sees only a set of objects accessed through an API or, as a variant, through file system calls. Cloud storages are a good example. These types of data storage are meant for large software complexes. Among their advantages is unified data access without a need to think about how the data are actually stored. Their disadvantages are that they cannot be efficiently managed and controlled, and that backup or migration of data is complicated.

Audio, video and graphic files
Storing a single multimedia file (or several) is simple. Complexities appear when you need to maintain a large number of files and want to perform a search across the multimedia collection.

1. Only very simple and sparse multimedia collections can be stored as regular files. Even for an average home collection, simple file-based multimedia storage becomes unmanageable very quickly. This is mostly due to the size of these files, the inability to handle any annotations, tags or metadata, and the low speed of copying or relocation.
2. Relational databases are a dubious way of storing audio, video or similar types of data. RDBMS are not well suited for keeping large BLOBs, especially when it comes to storing large video files. Also, each type of data requires its own table (due to the different sets of metadata that need to be stored). On the other hand, RDBMS can be handy as they offer powerful search capabilities, which suits read-only collections well.
3. Structured storages work well for storing multimedia files when the storage supports metadata and fast search through them. If this search is not supported, a structured storage becomes a variant of the file system.
4. Remote and distributed storages are among the best solutions when it comes to storing video, music or similar data. The storage represents a single unit where all elements of a multimedia collection or video game can be safely stored. There is no risk of losing a single but important file. Searches are fast and efficient if the storage supports tags and metadata.

Temporary data
Temporary data are generated by software on the fly and usually have a validity term. Updates are mostly very frequent. In addition, such intermediate information should stay easily accessible, integral and, in many cases, encrypted and secured. It is still possible to use regular files for these purposes, but this approach results in high resource consumption, there is no reliable way to control and enforce the integrity of the data, and encryption has to be implemented by your software.

1. For a long time files have been used as a way of interim data storage. They are quite suitable for storing low-priority unsecured temporary data of insignificant size. Meanwhile, modern legislation in several countries dictates more careful and responsible treatment of interim data. As a result, a regular file system becomes less suitable when data security, vulnerability and protection from tampering become paramount.
2. Relational databases are not usually used for interim data storage because such data, as a rule, lack a clearly defined structure and interrelated elements. Low speed of updates and issues of compression and security add to this unsuitability. At the same time, a relational database can contain interim data related to the database itself and its operation. A database can also be used for some kind of data cache or for storing activity logs (journal files). An RDBMS is not well suited if the data have to be stored for a long term (years) and have to be signed or encrypted.
3. Structured storages may be considered an optimal solution when a large volume of interim data needs to be stored, accessed, indexed and searched, compressed and encrypted on the fly. Structured storages may be built with anti-tampering functions or, should the requirements be present, provide an easy way of removing or replacing data. As always, such storages can be easily copied or moved without taking special care to preserve data integrity.
4. ZIP archives are rarely used for interim data storage. The fast (as a rule) turnaround of interim data makes them impractical in most situations. An encrypted archive may be suitable for this type of data only when snapshots are to be stored for a long time and need to be protected from loss or tampering.
5. Remote and distributed storages are used for interim data streams mainly due to space considerations. They do not provide the speed or the easy management and backup that are often required for interim data.

Data streams
Large volumes of quickly generated data, such as output data feeds, need to be stored efficiently. Some regular file systems significantly limit file sizes (FAT32, for instance, caps a file at 4 GB), necessitating the design of specific handlers for data overflow at the expense of integrity and reliability. Since data of this type often contain privileged or sensitive materials, fast on-the-fly encryption is a must. The same applies to efficient data compression, since the sizes of these data feeds are usually very significant.

1. Regular files are not well suited for this type of data. Quickly increasing file sizes require creating many intermediate caching files that need to be copied back. Even with careful design, the amount of memory or media consumed tends to grow geometrically. Handling, indexing, searching and encrypting data streams stored in regular files becomes a nightmare.
2. Relational databases pose almost exactly the same problems as regular files. Add to that the inefficiency of database updates and the rigid structure, and it can be seen that relational databases are among the least suitable storage solutions for streams of data.
3. Repositories may be used for storing data streams when security and low vulnerability are required at the expense of easy searches and fast retrievals. Data can be compressed, but fast and efficient searches become almost impossible.
4. Structured storages have the advantages of security, integrity and efficient searches. The storages are autonomous single-file units, which can be easily transferred or copied. Access is easy and efficient. Data streams kept in them can be encrypted and protected from tampering. Thin partitioning provides another convenience for storage users: the storage grows automatically as the data size increases.
5. Remote and distributed storages are well suited for streaming data and are commonly used in projects generating vast amounts of data. Since such data are frequently analyzed by distributed systems or clusters, the use of remote storages is the best fit. This type of storage provides easy but well controlled data access and a guarantee against illegal tampering or removal.

Documents
Documents are a rigidly structured data type specifically designed to store human-readable textual or graphical information. Documents are one of the most common forms of information produced and used in business and personal activities.

1. Files are the most common way of storing documents. But when concurrent access to documents is required, the use of regular files is complicated. Since the whole compound document structure is stored sequentially in a flat file, any document modification requires the creation of a set of temporary files, which contain the subset of the document's elements to be edited. In addition, deleting elements from the document does not reduce the file size automatically. To optimize the size, an additional copy of the document must be created and saved into yet another file. After the edit operation is completed, the original file must be deleted. If this is to be done automatically by the editing software, its developer has yet another task to remember.
2. Relational databases work well for some types of documents and can provide fast and efficient indexing, search and retrieval, provided that on-the-fly conversion to plain text is available. Databases suffer from the same shortcomings as in the storage of homogeneous data arrays. Keeping data in relational databases requires: a) use of a specific DBMS, b) pre-planning of the database structure, including interrelational links and indexing policy, c) research into peak loads, which is required for efficient database development and may also be a serious overhead.
3. Structured customizable storages are among the best choices when it comes to corporate use of documents. The main advantage of structured storages is that they allow efficient adding or deleting of documents or their parts in an existing storage, provide effective document access restrictions, etc. Complex documents that contain embedded images or other multimedia can be handled more easily by keeping the text apart from the multimedia (doing this reduces load/save time, makes text search easier, etc.). Moreover, journal keeping makes it possible to restore content (completely or partially) after accidents or failures. One more benefit is the possibility to store multiple editions or multiple alternative views of the data within one document. The disadvantage may be slower search, which should be implemented using on-the-fly conversion to plain text.
4. ZIP files are used in some document formats, such as Open Document Format, to store document data. Most of the advantages described above for structured storage apply to ZIP file storage, but again, addition, modification and deletion of information are time-consuming operations and sometimes require a complete rewrite of the file. Also, the ZIP file format offers only limited means (per-entry comments and extra fields) for attaching metadata to the entries inside, and ZIP encryption capabilities are limited (strong AES encryption is a recent addition to the standard and is not supported by many ZIP compression and decompression tools and libraries).
5. Remote and distributed storages are becoming widespread and popular. They allow easy collaboration during document creation and use, and remote but tightly controlled and secured access to documents. Unlike homogeneous data arrays, a document usually constitutes one object accessed and modified in its entirety, and this makes document retrieval and management quite simple. The cons are the same as those listed for remote storages in the previous sections.

Suggested solutions
The simple rule "use the right tool for the right job" is even more important in the area of software design. Incorrect or insufficiently considered planning of data and information storage can lead to disastrous results.

1. For files, you are faced with a choice of file systems.
2. There is a wide choice of commercial database systems (Oracle, DB2, etc.) and open source solutions.
3. Repositories can be created by commercial and public archiving solutions, such as Zip, etc.
4. Examples of structured storages include OLE Structured Storage by Microsoft (which offers basic storage capabilities only, i.e. no encryption, compression or search) and Solid File System by EldoS Corporation.
5. Remote storages can be designed with Solid File System OS Edition and Callback File System by EldoS Corporation, FUSE for Unix-based systems, etc.

In any case, only the project developer knows the exact requirements, understands all the technologies with their features and restrictions, and can therefore make an adequate choice of tools for successful implementation of the software project.
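The remark made earlier about ZIP archives (reading is cheap, while deleting data may require a complete rewrite of the file) can be illustrated with a minimal Python sketch. It uses only the standard zipfile module and keeps everything in memory. The helper names build_archive, append_entry and delete_entry and the sample file names are hypothetical, introduced purely for illustration, and the rewrite-on-delete strategy is one common approach rather than the only one.

```python
import io
import zipfile


def build_archive(entries):
    """Pack a dict of {name: bytes} into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


def append_entry(archive_bytes, name, data):
    """Add one entry to an existing archive using append mode."""
    buf = io.BytesIO(archive_bytes)
    with zipfile.ZipFile(buf, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def delete_entry(archive_bytes, name):
    """Remove one entry by copying every other entry into a new archive.

    This sketch does not rely on any in-place removal, so the cost
    grows with the size of the whole archive, not of the deleted entry.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != name:
                dst.writestr(info, src.read(info.filename))
    return out.getvalue()


if __name__ == "__main__":
    archive = build_archive({"jan.csv": b"1,2,3", "feb.csv": b"4,5,6"})
    archive = append_entry(archive, "mar.csv", b"7,8,9")
    archive = delete_entry(archive, "feb.csv")
    print(zipfile.ZipFile(io.BytesIO(archive)).namelist())
```

Note that delete_entry reads and recompresses every surviving entry, which is why archives fit read-mostly data better than data that changes often.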
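The two strengths of relational databases named for homogeneous data arrays, locating data quickly by a specified criterion and transactional integrity, can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the readings table, its columns and the helper functions are hypothetical names, and SQLite stands in for whatever DBMS a project would actually choose.

```python
import sqlite3


def open_store():
    """Create an in-memory database for temperature readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (taken_at TEXT NOT NULL, celsius REAL NOT NULL)"
    )
    # The index is what makes lookups by time fast; with plain files
    # you would have to build and maintain such an index yourself.
    conn.execute("CREATE INDEX idx_readings_taken_at ON readings (taken_at)")
    return conn


def add_readings(conn, rows):
    """Insert a batch of (timestamp, value) rows as one transaction.

    "with conn" commits on success and rolls back if any row fails,
    so a batch is stored either completely or not at all.
    """
    with conn:
        conn.executemany(
            "INSERT INTO readings (taken_at, celsius) VALUES (?, ?)", rows
        )


def readings_between(conn, start, end):
    """Return readings with start <= taken_at < end, ordered by time."""
    cur = conn.execute(
        "SELECT taken_at, celsius FROM readings "
        "WHERE taken_at >= ? AND taken_at < ? ORDER BY taken_at",
        (start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_store()
    add_readings(conn, [
        ("2024-01-01T00:00", 1.5),
        ("2024-01-02T00:00", -0.5),
        ("2024-02-01T00:00", 3.0),
    ])
    print(readings_between(conn, "2024-01-01", "2024-02-01"))
```

The rigid schema is visible here too: storing a new kind of value means altering the table, which is the pre-planning cost the text warns about.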