Projects - Advanced Storage Technology Lab


  • Research on Key Technologies of Reliability in Deduplication-based Storage Systems (NSFC No. 61772439, 2018.01~2018.12, PI: Dr. Mao).

  • Research on the Data Layout and Buffer Management Strategies for SSD-based Disk Arrays (NSFC No. 61472336, 2015.01~2018.12, PI: Dr. Wu).

  • Research on the Data Deduplication in Cloud Storage Systems (NSFC No. 61402385, 2015.01~2017.12, PI: Dr. Mao).

  • Research on the Performance and Reliability Improvement of eMMC Storage in Android Platforms (Huawei, 2014.12-, PI: Dr. Mao).

  • Completed: Research on the Availability Technology of the Cloud Storage Systems (Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, 2014.01~2016.12, PI: Dr. Mao).

  • Completed: Research on the Design and Implementation of a Variable-Size Chunking Based Data Deduplication System (The State Key Laboratory of High-end Server & Storage Technology, 2014.10~2016.04, PI: Dr. Wu).

  • Completed: Research on the Performance and Energy Efficiency of Data Restore in Deduplication-based Storage Systems (NSFC No. 61100033, 2012.01~2014.12, PI: Dr. Wu).

  • Completed: Research on the Availability Technology in Big Data Storage Systems (Huawei Innovation Research Program, 2014.01~2014.12, PI: Dr. Mao, Evaluation: Excellent).

Ongoing Research

1. Primary Data Deduplication: Data deduplication has proven very effective in cloud backup and archiving applications. More recently, applying deduplication to primary storage systems, such as VM platforms, has become popular. However, the latency and power consumption of restore operations on deduplicated primary storage can be significantly higher than on storage without deduplication. The main reason is that, after deduplication, a file or block is split into multiple small data chunks that are often stored at non-sequential locations on HDDs, so a subsequent read can trigger many HDD I/O requests involving multiple disk seeks. We are investigating several approaches to address this problem. Preliminary results have appeared at IPDPS'17, MSST'17, ICPADS'16, IPDPS'14, ACM Transactions on Storage (2014), NAS'12, OSDI'12 (poster), and SOCC'13 (poster).


2. Systems for the New Storage Devices: Recently, various new storage devices (such as flash-based SSDs and PRAM) have become popular in storage systems, and some are already deployed in production environments. However, traditional storage systems were designed for HDDs, not for these new devices, so upper-layer applications do not fully exploit the advantages the new devices offer. We are designing new file systems, I/O scheduling and data placement schemes, and storage systems built on hybrid devices. Preliminary results have appeared in IEEE Transactions on Parallel and Distributed Systems (2017), IPDPS'17, ACM Transactions on Storage (2016, 2012), ICS'16, ICA3PP'15, ICA3PP'14, Cluster'12 and IPDPS'10.


3. Highly Available Storage Systems: Data centers consist of hundreds or thousands of electronic components, including NICs, CPUs, disks and so on. Device failures in such an environment are the common case rather than the exception. Fast recovery from device failures, both partial and complete, is therefore critical to the availability of large-scale storage systems. We have proposed several efficient data recovery algorithms and systems; please see our papers in IEEE Transactions on Parallel and Distributed Systems (2016), IEEE Transactions on Computers (2016, 2015, 2011), IPDPS'15, USENIX LISA'12, USENIX FAST'09 and ICPADS'09.
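
The fragmentation effect described under item 1 (Primary Data Deduplication) can be illustrated with a small toy model. The sketch below is not the lab's system: it assumes fixed-size chunking, an in-memory append-only chunk store, and a simplified "seek" defined as any read whose slot does not directly follow the previous one. The names DedupStore, count_seeks and CHUNK_SIZE are hypothetical and exist only for this example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to trace


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (an assumption; real systems often
    use variable-size, content-defined chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy append-only chunk store with a fingerprint index."""

    def __init__(self) -> None:
        self.index: dict[str, int] = {}  # fingerprint -> slot number on "disk"
        self.slots: list[bytes] = []     # slot i holds one unique chunk

    def write(self, data: bytes) -> list[int]:
        """Store data and return its recipe: the slot of each chunk in order."""
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            if fp not in self.index:
                # New chunk: append it at the next free slot.
                self.index[fp] = len(self.slots)
                self.slots.append(c)
            # Duplicate chunks are not stored again, only referenced.
            recipe.append(self.index[fp])
        return recipe

    def read(self, recipe: list[int]) -> bytes:
        """Rebuild the original data by following the recipe."""
        return b"".join(self.slots[s] for s in recipe)


def count_seeks(recipe: list[int]) -> int:
    """Count reads whose slot does not directly follow the previous slot."""
    seeks = 0
    prev = None
    for slot in recipe:
        if prev is None or slot != prev + 1:
            seeks += 1
        prev = slot
    return seeks


store = DedupStore()
file_a = b"AAAABBBBCCCCDDDD"
file_b = b"CCCCXXXXAAAAYYYY"  # shares two chunks with file_a
recipe_a = store.write(file_a)
recipe_b = store.write(file_b)

print(recipe_a, count_seeks(recipe_a))  # [0, 1, 2, 3] 1
print(recipe_b, count_seeks(recipe_b))  # [2, 4, 0, 5] 4
```

In this model the first file is laid out sequentially and needs a single positioning operation, while the second file, which reuses two chunks of the first, is scattered across the store and every chunk read becomes a separate seek. Real HDD behaviour depends on caching, request merging and physical layout, so the counts here only indicate the trend.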