UAB computer scientist Ragib Hasan, Ph.D., has been attracting attention in the computing community for his research into “waste data”: the vast amount of hard-drive space occupied by files that are never used. “I got interested in this idea and did some tests on my own computers,” Hasan says. “I found that anywhere from 50 to 90 percent of the files on my laptop were not being read. I later repeated that on my desktop and on the department file server at Johns Hopkins and found the same numbers. So then I thought about the impact of this much waste and how we could handle it.”
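
For readers curious what such a test might look like, here is a minimal Python sketch. It is not Hasan's method, which the article does not describe; it simply assumes that a file's last-access timestamp is a fair proxy for "being read" and that 90 days without access counts as unused. The function name `unread_fraction` and the `days` threshold are illustrative choices. Many systems update access times lazily or not at all (for example, volumes mounted with `noatime` or `relatime`), so the result is a rough estimate at best.

```python
import os
import stat
import time


def unread_fraction(root, days=90, now=None):
    """Estimate the share of regular files under `root` that have not been
    accessed in the last `days` days, based on last-access time (atime)."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    total = 0
    stale = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is not readable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            total += 1
            if info.st_atime < cutoff:
                stale += 1
    return stale / total if total else 0.0


if __name__ == "__main__":
    share = unread_fraction(os.path.expanduser("~"))
    print(f"{share:.0%} of files not accessed in the last 90 days")
```

Counting files, as here, matches the way the figures above are quoted; weighting each file by its size instead would estimate wasted space rather than wasted files.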

The initial reaction to the waste-data problem is obvious, Hasan continues: Just delete the files. “But deletion isn’t free,” he points out. “If the files are on a hard disk, it takes considerable time to delete them. And if you have a smartphone or something else with a flash drive, you are going to be using up the available life cycle of the drive. Many people don’t realize this, but a flash drive has a fixed life cycle—often 10,000 times.”

Another seemingly simple solution, adding more storage, is also more complicated than it first appears. “A terabyte drive may only cost $50 these days, but the total amount of person hours to maintain that amount of storage is five to seven times the amount that is spent on the drive itself,” Hasan says.

The problem is especially important for the “hundreds and thousands of disk drives” in cloud computing centers, Hasan says. He is convinced that the computing industry needs to take inspiration from real-life waste-management operators and focus on three steps: reduce, reuse, recycle. “In my research I have come up with a number of techniques for reducing data waste,” Hasan says. He will continue to work with student researchers at UAB to refine these techniques.

Matt Windsor
