+44 (0) 345 058 2374 info@arcastream.com
Case Study:
Imperial College London

Managing & Protecting Vital Research Data Throughout Its Lifecycle

All featured image credits: Stewart Oak, Imperial College London.

Imperial College London efficiently stores, manages and protects large volumes of world-leading research data throughout its lifecycle with a tailored, high-performance, future-proof software-defined storage and data-management platform from ArcaStream.

Background

Imperial College London is home to 17,000 students and 8,000 staff, attracting undergraduates from more than 125 countries and awarding over 6,700 degrees every year. The University focuses on the four main disciplines of science, engineering, medicine and business and is one of the world’s leading university research centres – sharing ideas, expertise and technology to find answers to today’s big scientific questions and tackle global challenges.

As a centre for high-impact research, the University’s Research Computing Service (RCS) – part of the ICT department – plays a vital role in addressing the computing and storage needs of the research community. In 2018, the RCS team launched the Research Data Store (RDS) to provide robust, reliable new storage services that efficiently manage and protect large volumes of research data throughout its entire lifecycle. The innovative solution at the heart of these services was designed and delivered by ArcaStream.

Client objectives

  • Replace 30 fragmented islands of storage with a single centrally managed and supported system
  • Guarantee consistent high-performance with an optimum user experience
  • Ensure responsible data management compliant with regulations
  • Charge users by consumption, not capacity
  • Deliver a continual service for decades

The Solution

The requirement for guaranteed performance, scalability and integration with current and future compute systems substantially increased the complexity of the project and a bespoke solution was sought. ArcaStream was selected to provide a high-performance, scalable research storage solution to seamlessly integrate legacy infrastructure and support the University’s future storage strategies.

ArcaStream’s PixStor framework was deployed to deliver a protected, adaptable, scalable and collaborative central platform across the institution. A high-performance, scalable storage platform based on IBM Spectrum Scale, it combines flash, disk, tape and cloud storage into a single global namespace. Its software-defined architecture runs on open-standard commodity hardware to avoid vendor lock-in, and it adds powerful data management tools, including tiering, cloud integration, monitoring, search and analytics, to drive workflow efficiencies and reduce costs.

PixStor guarantees consistent high performance with no degradation as the file system fills. The platform has been designed to be easily extended, upgraded and replaced for the foreseeable future, with no limit on how large it can grow or on how long it can operate as a continuous service.

Highlights

  • 10 PB research storage repository with a single centrally managed platform serving the existing 2,500 HPC nodes.
  • Simultaneously serving 3,000+ users seamlessly via the desktop.
  • 20 GB/s throughput with no loss of interactive user performance.
  • Disaster Recovery using Ngenea to tier replicated data to colder external storage without expanding the physical footprint as the system grows.
  • 1 billion files replicated each night in less than 8 hours.
  • Analytics to make informed decisions on expansion and the true cost of existing data.

System Overview

Imperial College’s Research Data Store (RDS) infrastructure is data-centre based, with geographically dispersed primary and secondary sites running PixStor with asynchronous replication and intelligent automated tiering to external storage targets.

Performance & a Single Global Namespace

At the primary site, ArcaStream provided a 10 PB research storage repository to the University using PixStor. Capable of simultaneously serving their existing 2,500-node high-performance computing estate as well as the desktop requirements of the wider research community, the combined solution delivered over 20 GB/s of throughput with no loss in interactive usage performance – all consolidated into a single usable namespace.

DR & data tiering

A second site provides an offsite replica of the data for Disaster Recovery, with a dark fibre connection between the sites. This system uses ArcaStream’s Ngenea to tier replicated data to a colder, external tier of storage, providing the University with a more cost-effective replication site without the need to expand the physical DR footprint as the production system grows. Using Ngenea, the team has been able to redeploy existing Spectra Logic BlackPearl® object storage and a Spectra® T950 tape library for deep storage at the primary site, delivering a significant return on investment in their legacy hardware.

Replication

ArcaStream’s PixStor Sync utility handles the replication of data. With over one billion files and directories in the RDS, PixStor Sync was able to meet Imperial College’s requirement for a 24-hour recovery point objective. The full replication run completes in less than 8 hours each night, while also preserving all metadata (file ownership, access control lists, extended attributes, etc.), so that the service is immediately and continually “live” at the DR site.
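
To put the replication figures in perspective, the short Python sketch below works out the average rate implied by processing one billion entries within an 8-hour window, and how much of the 24-hour recovery point objective that window uses. It is illustrative arithmetic only: the function name is hypothetical, and it assumes every file and directory is examined once per nightly run, which the case study does not state. It says nothing about how PixStor Sync works internally.

```python
def min_sustained_rate(entries: int, window_hours: float) -> float:
    """Average entries per second needed to process `entries` within the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return entries / (window_hours * 3600)


# Figures quoted in the case study ("over" one billion, "less than" 8 hours),
# so the result is a lower bound on the average rate.
ENTRIES = 1_000_000_000   # files and directories in the RDS
WINDOW_HOURS = 8          # nightly replication window
RPO_HOURS = 24            # recovery point objective

rate = min_sustained_rate(ENTRIES, WINDOW_HOURS)
window_share = WINDOW_HOURS / RPO_HOURS

print(f"Average rate: {rate:,.0f} entries/s")           # Average rate: 34,722 entries/s
print(f"Window uses {window_share:.0%} of the RPO")     # Window uses 33% of the RPO
```

Under that assumption the nightly run must average roughly 35,000 entries per second, and it finishes in about a third of the time the recovery point objective allows.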

Analytics

PixStor Capacity Analytics and Search were also deployed, providing the University with a far greater awareness of the contents of the repository, the age of the data and who is using the system. This has enabled RCS to make more informed decisions on expansion requirements, realise the true cost of existing data and become more aware of the behaviour and needs of the repository’s users.

Sensitive Data

A smaller, isolated solution was also deployed, providing one petabyte of available capacity for sensitive data and other secure workloads, in compliance with data protection and GDPR regulations. The system provides equivalent functionality but is subject to higher levels of auditing.

Featured Technologies

ArcaStream’s PixStor combines best-of-breed technologies from Dell, Excelero and Mellanox into a single integrated platform.

Dell PowerEdge servers and Dell PowerVault storage deliver exceptional reliability and performance at a commodity price point. R740 servers run the filesystem and serve it to the HPC clients as well as to the general research community via the ArcaStream NAS stack.

Excelero’s NVMesh® software provides a scalable NVMe tier for extreme metadata performance, running on Dell PowerEdge R740XD servers. NVMesh enables shared NVMe across any network and supercharges any file system – accelerating interactive performance.

Mellanox Spectrum and ConnectX® technologies provide a hybrid network infrastructure to deliver data via both InfiniBand and Ethernet. The 100GbE network backbone allows the solution to scale seamlessly as the University’s requirements increase.

Future-proof Confidence

Researchers at Imperial College London are now using ArcaStream’s PixStor platform with absolute confidence in its ability to support their research data storage needs. High performance, enterprise-level reliability and integrity, and robust continuous service through expansion, replacement, upgrades and refreshes mean that the University can confidently plan for the long term – addressing continual multi-petabyte-per-year growth of its data holdings, safe in the knowledge that both the primary and DR solutions can keep up.

The PixStor platform has already been expanded with additional capacity, and further expansion into tape storage using Ngenea is underway to provide a massive capacity boost without compromising on the service provided. This ongoing investment in the service speaks volumes about Imperial College London’s confidence in the PixStor platform and ArcaStream’s ability to deliver the level of support and service they require to strategically meet their always-evolving research data requirements.

End Results

  • Greater agility and insight to manage capacity and performance.
  • Reduced complexity and silos.
  • Improved security to meet stringent data regulations.
  • Efficient control of expenditure and growth strategies.
  • Significant ROI through integration of legacy and new systems.
  • Future-proof scalability and flexibility without hardware lock-in.
  • Peace of mind and an improved user experience.

Empower & Simplify Your Data Workflows

Contact ArcaStream today to discuss your specific workflow & data-management challenges with our expert team.

+44 (0) 345 058 2374
sales@arcastream.com
