The need to use multiple data systems...

I’ve come across a number of customer sites that have only a single backup stream from each client, running sequentially to a tape drive device.  While there is a concern that the time to recover increases if you have multiplexed data on a tape, there is also a need to properly stream the tape drive.  For the most part these nice little pieces of hardware do not have a slow speed, so when data arrives too slowly the drive has to stop, reposition, and restart each time more data is presented.  This is known as “shoe shining” and can take a large toll on the device, resulting in frequent repairs and even slower backups.

The following examples expand each step in multiplexing backups to illustrate how all tape drive devices should be used.

  1. Each example shows 5 clients, each client having 4 possible job streams
  2. Each stream runs at 5 MB/Sec
  3. The tape device can handle 80MB/Sec

Figure A
The example in Figure A shows the consecutive approach currently deployed, which causes a cascading slowdown effect.  There is simply not enough data being presented to the tape drive for it to stream effectively.

Figure B
The example in Figure B is an improvement: each client can send a backup to the tape drive at the same time, shortening the overall backup window and driving the tape drive at 25MB/Sec (5 clients at 5 MB/Sec each).

Figures C and D
The examples in Figures C and D show the possible configurations when a specific client cannot handle multiple data streams, or when everything can be sent to the tape drive at the same time.  The bottom right needs 100MB/Sec, but with only a slight compression ratio this scenario would keep the tape drive streaming at an optimized rate.  If the original example had each stream taking one hour, the backup window would need to be 20 hours to complete, whereas the final example could finish all backups in only 1 hour.
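The arithmetic behind Figures A through D can be sketched in a few lines of Python. This is a simplified model under the assumptions listed above (5 clients, 4 streams each, 5 MB/Sec per stream, an 80MB/Sec drive, one hour per stream). The function names are made up for illustration, and the model ignores real-world effects such as network contention or streams slowing down when the drive is oversubscribed.

```python
import math

# Assumptions taken from the numbered list above.
CLIENTS = 5
STREAMS_PER_CLIENT = 4
STREAM_MB_PER_SEC = 5
DRIVE_MB_PER_SEC = 80
HOURS_PER_STREAM = 1  # "each stream taking one hour"

TOTAL_STREAMS = CLIENTS * STREAMS_PER_CLIENT


def offered_rate(concurrent_streams):
    """MB/Sec presented to the drive when this many streams run together."""
    return concurrent_streams * STREAM_MB_PER_SEC


def backup_window_hours(concurrent_streams):
    """Hours needed to finish every stream, run in equal-sized batches."""
    batches = math.ceil(TOTAL_STREAMS / concurrent_streams)
    return batches * HOURS_PER_STREAM


def compression_ratio_needed(concurrent_streams):
    """Compression the drive would need to absorb the offered rate (1.0 = none)."""
    return max(1.0, offered_rate(concurrent_streams) / DRIVE_MB_PER_SEC)


for label, streams in [
    ("One stream at a time", 1),
    ("One stream per client", CLIENTS),
    ("All streams at once", TOTAL_STREAMS),
]:
    print(
        f"{label}: {offered_rate(streams)} MB/Sec offered, "
        f"{backup_window_hours(streams)} hour window, "
        f"compression needed {compression_ratio_needed(streams):.2f}:1"
    )
```

With all 20 streams running, the model asks for 100 MB/Sec, which is 1.25 times the drive's 80MB/Sec rating; that is the "slight compression ratio" mentioned above.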

Data Deduplication, Backup, Disaster Rec...

And a few parting thoughts when considering a new backup solution, in general:

We strongly encourage customers to explore reference architecture design alternatives before choosing a specific product for deduplication. In most cases, backup architecture redesign is the best way to maximize re-use of existing backup assets and take full advantage of next generation backup technologies, such as VTL and deduplication. In addition, performing a vendor-neutral performance and sizing analysis for core and remote sites is the only way to properly plan for deduplication both from a technical and budgetary perspective.

-John Merryman, GlassHouse Service Director

Data Deduplication, Backup, Disaster Rec...

Archiving

Should backup and archive be the same service? The GlassHouse general position is ‘no’. However, every environment faces a need for long term retention of data, and in lieu of a true enterprise archiving platform, backup data retention is the only option for many enterprises. In general, we see tape playing a valuable role for long term retention of data, and most VTL/deduplication vendors tend to agree with this position. Operationally, backup and archiving should be logically segregated, with archives treated clearly as the exception rather than the rule, and with a clear distinction made between server archives (to backup environments) and true archives, such as those found in email or database archiving systems.

-John Merryman, GlassHouse Service Director

Data Deduplication, Backup, Disaster Rec...

Data replication and DR solutions

Disk-based replication of deduplicated data has a profound impact on backup architectures and their role in disaster recovery. The traditional backup environment today uses offsite vaulting for tape, and disk- or host-based replication for more aggressive disaster recovery capabilities. The role of backup in disaster recovery can be significantly enhanced through any combination of backup software or hardware deduplication and replication schemes.

-John Merryman, GlassHouse Service Director

Data Deduplication, Backup, Disaster Rec...

Data Deduplication

Data deduplication is one of the most disruptive technology features to affect enterprise backup. This technology feature makes disk economical for backup data storage, and opens up options for remote site and core data center replication, previously unavailable. Software and hardware based deduplication have distinct differences in terms of scale/performance/capacity, and should be applied appropriately to production backup needs. The vendors are typically very aggressive positioning all technologies to do all things, and the amount of vendor hype, poor information, and quasi-solution designing is absolutely staggering.

-John Merryman, GlassHouse Service Director

Why are backups important?...

I’ve spent over 17 years working in the backup space, starting out as an evening ‘Tape Jockey’ contracting for Digital Equipment Corporation.  Over that time I’ve heard that administering a backup environment is “the most important job that no one wants”.  This has held true for a long time, and it has always intrigued me why people shy away from it.  From a near-primal standpoint, if data is worth saving to disk then assuredly it’s worth making a copy of.  Hasn’t everyone lost something they thought was saved and wanted it brought back?

It is obvious that you need last Tuesday’s meeting minutes, which were of course accidentally deleted, back before this week’s agenda shows you missed an action item.  On the surface, restores from these backups keep the day-to-day business running, since having information readily at hand can make all the difference.  When you pull back a layer you find that there are far more corporate and legal reasons for keeping and being able to access information over the long term.  These include government regulatory requirements based on the type of business and data that is being protected.  There are also binding agreements with customers and clients that warrant the retention of data.

How to retain data, for what retention periods, and the need to bring it back will be topics of future blog postings.  If you’re a backup person or are just interested in the art, stay tuned for more details from nearly two decades of living it.

-Richard Witherow, GlassHouse Senior Consultant

Why backups still fail…...

Someone recently asked me why backups continue to fail on a nightly basis. His contention was that, with all the technology advances of the past 10 years, backup failures should be a thing of the past. That got me thinking – what are the reasons for backup failures…?

The most obvious and common causes for backup failures have very little to do with the technology deployed. Simply put, most failures are the result of operational issues, such as misconfiguring a policy or file listing, or making changes that aren’t appropriately tracked. When someone fat-fingers a file listing or enters the hostname incorrectly, backups will fail. Repetitive failures of this nature are indicative of operational control issues that should be corrected via documented procedures and rigorous testing prior to “go-live” for the backups. Ensuring that the backup works prior to enabling regular nightly backups should become part of the standard deployment of any new server.

Once a backup client is properly configured within the backup application, there are multiple ways its backups may fail on any given night, but nearly all of them can be traced back to change management. Backup administrators often receive requests for ad-hoc backups, changes to the schedules, or modified file listings. If these changes are performed with minimal adherence to a change control policy, backup failures may occur. For instance, many times I’ve seen admins remove a client from a nightly backup policy due to planned maintenance on the server. I’ve also seen too many cases where the admin forgot to put that server back into the policy the following day! The result is that the host is no longer backed up. While it doesn’t register as a backup failure in the application, recovering that server will be impossible. Proper change control processes (tracking the change, notifying users of the change, and following up on the change) can prevent these types of issues.
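One way to catch the "forgot to put the server back" case is a scheduled comparison between the hosts that are supposed to be protected and the hosts actually present in the backup policies. The sketch below assumes both lists can be exported as plain hostnames (from an asset inventory and from the backup application, respectively). The function name and sample hosts are hypothetical, and no particular backup product's API is implied.

```python
def missing_from_backup(expected_hosts, policy_hosts):
    """Return hosts that should be protected but appear in no backup policy."""

    def normalize(host):
        # Treat "DB01 " and "db01" as the same host.
        return host.strip().lower()

    protected = {normalize(h) for h in policy_hosts}
    expected = {normalize(h) for h in expected_hosts}
    return sorted(expected - protected)


# Hypothetical data: db01 was pulled for maintenance and never re-added.
inventory = ["app01", "db01", "web01"]
nightly_policy = ["APP01", "web01 "]

unprotected = missing_from_backup(inventory, nightly_policy)
print("Hosts with no backup policy:", unprotected)
```

Because a host dropped from a policy never shows up as a failed job, a report like this has to be driven from the inventory side rather than from the backup application's failure log.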

I know most administrators hate talking about process, so stay tuned for my next entry when I discuss some of the technical reasons why backups continue to fail!

-Jeff Harbert, GlassHouse Engagement Partner

Deduplication: Why for Backup Storage &#...

In the big picture, deduplication is a feature, not a function, of disk technology. Nonetheless, it is a really disruptive gateway feature because it means you can store a lot more logical data on disk than you ever could before. The implications for backup data storage are tremendous; the implications for primary storage are less so.

Just for fun, let’s put the Terminator ‘Rise of the Machines’ spin on this topic. In primary storage (data spinning on disks for real-time usage), you have roughly 40-50% duplicate data (at a very detailed level) in data sets that are created by humans (files, emails, etc.), and typically much less duplicate data that is created by machines. So transaction logs for instance are highly unique and not going to contain much duplicate data. Video files (created by processors and software) are nearly entirely unique, unless you’re videotaping grass grow (again, stupid human behavior). Data within databases is mostly unique and machine generated, but the way humans design and deploy databases creates a large amount of duplicate data by spawning multiple copies of the same database or data sets. And that’s just for primary storage.

Then, when humans use backup software to create copies of data for purposes of data protection, huge amounts of duplicate data are created. When you start creating additional copies for backup purposes, the amount of duplicate data created goes up exponentially.

Ironically, if a human applies machine generated compression or encryption algorithms to data, it won’t deduplicate at all because duplicate data patterns are scrambled and undecipherable to deduplication algorithms. So any way you cut it, humans are sloppy and machines aren’t.

So in the land of primary data, the overall amount of duplicate data is roughly 10-15%, and best case 50% if you’re purely running a business on human generated files and emails. At best, then, you’re looking at a 2:1 deduplication ratio for primary data. You can buy a disk storage device with the deduplication feature and potentially double your logical storage capacity. If you’re about to buy a NAS device with this feature, eat your heart out, but don’t expect any revolutionary changes to life in the world of storage management.
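The link between a duplicate-data percentage and a deduplication ratio is simple arithmetic: if a fraction d of the data is duplicate, only 1 - d has to be stored, so the ratio is 1 / (1 - d). A small sketch, with a function name chosen purely for illustration:

```python
def dedup_ratio(duplicate_fraction):
    """Logical-to-physical ratio when this fraction of the data is duplicate."""
    if not 0 <= duplicate_fraction < 1:
        raise ValueError("duplicate_fraction must be in the range [0, 1)")
    return 1 / (1 - duplicate_fraction)


# Percentages quoted above for primary storage.
for pct in (10, 15, 50):
    print(f"{pct}% duplicate data -> {dedup_ratio(pct / 100):.2f}:1")
```

At 50% duplicate data the ratio is 2:1, matching the best case above; at 10-15% it is only about 1.1:1 to 1.2:1.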

Depending on the data volatility, data types, the backup platform, and backup policies, deduplication ratios ranging from 5:1 to 20:1 are typical in backup storage. That’s a game changer for disk vs. tape, making disk more economically viable for backup storage. Deduplication is also a gateway feature, enabling disruptive changes such as:

* Tape reduction and/or elimination strategies
* Remote replication of deduplicated data
* Enhanced disaster recovery capabilities for backup
* Tape elimination in remote sites
* Tape media encryption avoidance strategies
* Wholesale innovations in backup that have not been possible up until now.
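To put the 5:1 to 20:1 range in perspective, a ratio can be turned back into the share of disk saved and the physical capacity required. The sketch below uses a hypothetical 100 TB of logical backup data purely as an illustration; the function names are not taken from any product.

```python
def space_saved(ratio):
    """Fraction of disk saved at a given deduplication ratio (5 means 5:1)."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


def physical_tb(logical_tb, ratio):
    """Physical capacity needed to hold logical_tb at the given ratio."""
    return logical_tb / ratio


LOGICAL_TB = 100  # hypothetical amount of backup data

for ratio in (2, 5, 20):
    print(
        f"{ratio}:1 -> {space_saved(ratio):.0%} saved, "
        f"{physical_tb(LOGICAL_TB, ratio):.0f} TB of disk for {LOGICAL_TB} TB of backups"
    )
```

Under these assumptions, 5:1 fits the 100 TB into 20 TB of disk and 20:1 fits it into 5 TB, against 50 TB for the 2:1 best case on primary data.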

That’s why my chips are on the machine (in this case deduplication algorithms run by machines) and backup storage. Next topic, let’s peel back the onion on backup.

-John Merryman, GlassHouse Service Director