EMC ScaleIO Part 1 – Overview
During one of my recent projects we had to figure out which storage solution to use for our VMware and OpenStack environment. One of the products we are considering is EMC ScaleIO software-defined storage.
ScaleIO is a pure software-defined storage solution. You can run it on any kind of x86 server and you are not tied to a particular hardware vendor (Dell, for example), which gives you great flexibility in defining exactly the storage characteristics you need.
You can use ScaleIO to provide generic storage for any kind of workload, but in this series we will be talking about deployment on VMware vSphere.
A nice thing is that you can mix and match the SDC (client) and SDS (server) components: run both on one server (a hyper-converged solution) or run the SDS on separate storage nodes (for example, when you need to scale out your storage but don't need to add compute resources).
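To make the mix-and-match idea concrete, here is a minimal Python sketch that classifies nodes by which components they run. The `Node` class, the host names and the layout labels are made up for illustration; none of this is part of ScaleIO itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one server in the cluster."""
    name: str
    runs_sdc: bool  # the node consumes ScaleIO volumes
    runs_sds: bool  # the node contributes local disks to the pool

    @property
    def layout(self) -> str:
        if self.runs_sdc and self.runs_sds:
            return "hyper-converged"
        if self.runs_sds:
            return "storage-only"
        if self.runs_sdc:
            return "compute-only"
        return "unused"


def summarize(nodes):
    """Count how many nodes fall into each layout."""
    counts = {}
    for node in nodes:
        counts[node.layout] = counts.get(node.layout, 0) + 1
    return counts


# Two hyper-converged ESXi hosts plus one node added only for capacity.
cluster = [
    Node("esx01", runs_sdc=True, runs_sds=True),
    Node("esx02", runs_sdc=True, runs_sds=True),
    Node("stor01", runs_sdc=False, runs_sds=True),
]
print(summarize(cluster))
```

Adding `stor01` grows the storage pool without adding any compute, which is exactly the scale-out case mentioned above.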
There are three key components of the system.
ScaleIO Data Client – SDC
The SDC is a lightweight block device driver that exposes ScaleIO shared block volumes to applications. The SDC runs on the same server as the application. This enables the application to issue an IO request and the SDC fulfills it regardless of where the particular blocks physically reside. The SDC communicates with other nodes (beyond its own local server) over a TCP/IP-based protocol, so it is fully routable.
The SDC can be a small application running on your physical server (Windows or Linux), so you can mount ScaleIO volumes directly in the operating system, or it can be a VMware kernel driver (or an OpenStack Cinder driver) that provides storage access at the hypervisor level.
ScaleIO Data Server – SDS
The SDS owns local storage that contributes to the ScaleIO Storage Pools. An instance of the SDS runs on every server that contributes some or all of its local storage space (HDDs, SSDs, or PCIe flash cards) to the aggregated pool of storage within the ScaleIO virtual SAN. Local storage may be disks, disk partitions, even files. The role of the SDS is to actually perform the Back-End IO operations as requested by an SDC.
ScaleIO Metadata Manager – MDM
The Metadata Manager manages the ScaleIO system. The MDM contains all the metadata required for system operation, including configuration changes. The MDM also provides monitoring capabilities to assist users with most system management tasks.
The MDM manages the metadata, SDCs, SDSs, device mappings, volumes, snapshots, system capacity (including device allocation and release of capacity), RAID protection, errors and failures, and system rebuild tasks including rebalancing. In addition, the MDM responds to all user commands or queries. In a normal IO flow, the MDM is not part of the data path and user data does not pass through the MDM. Therefore, the MDM is never a performance bottleneck for IO operations.
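The division of labour between the three components can be illustrated with a toy model. This is only a conceptual sketch in Python, not ScaleIO's real API or on-disk layout: the class and method names are invented, the round-robin chunk placement is an assumption made for simplicity, the sketch assumes the SDC fetches its volume map from the MDM once and then keeps it, and data protection (mirroring) is left out entirely.

```python
class SDS:
    """Toy data server: keeps chunks on its 'local storage' (a dict)."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}

    def write(self, volume, chunk_no, data):
        self.chunks[(volume, chunk_no)] = data

    def read(self, volume, chunk_no):
        return self.chunks[(volume, chunk_no)]


class MDM:
    """Toy metadata manager: knows where chunks live, never sees user data."""

    def __init__(self, sds_nodes):
        self.sds_nodes = sds_nodes
        self.map_requests = 0

    def get_map(self, volume, num_chunks):
        self.map_requests += 1
        # Round-robin placement is an assumption of this sketch only.
        return {c: self.sds_nodes[c % len(self.sds_nodes)]
                for c in range(num_chunks)}


class SDC:
    """Toy data client: asks the MDM for the map once, then talks to SDSs."""

    def __init__(self, mdm, volume, num_chunks):
        self.volume = volume
        self.chunk_map = mdm.get_map(volume, num_chunks)

    def write(self, chunk_no, data):
        self.chunk_map[chunk_no].write(self.volume, chunk_no, data)

    def read(self, chunk_no):
        return self.chunk_map[chunk_no].read(self.volume, chunk_no)


nodes = [SDS("sds1"), SDS("sds2"), SDS("sds3")]
mdm = MDM(nodes)
sdc = SDC(mdm, "vol1", num_chunks=6)

for i in range(6):
    sdc.write(i, f"block-{i}".encode())

print(sdc.read(4))                       # data comes straight from an SDS
print(mdm.map_requests)                  # MDM was contacted for metadata only
print([len(n.chunks) for n in nodes])    # chunks are spread across all SDSs
```

In this model every read and write goes directly from the SDC to an SDS, and the MDM is only consulted for the mapping, which is the property that keeps it out of the data path.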
What does this look like inside VMware vSphere, in terms of SDS and SDC, in a hyper-converged deployment?
The SDS is implemented as a VM that owns the local storage resources (HDDs or SSDs). The SDC, running as a kernel module, then connects over the network to the SDS to access the data itself.
This is a standard approach that can be seen in other solutions as well (HPE StoreVirtual, for example).
I will describe all the features in later posts, so for now here is just a summary:
- Elasticity and rebalancing
- Encryption at rest
- Hardware independent
- Mix and match nodes
- Protection domains
- QoS and IOPS limitation
- Scale-out architecture
- Storage pools
- Volume sharing
ScaleIO vs VMware VSAN
This is quite tricky because at first glance both solutions offer the same functionality: software-defined storage. But there is a lot more to think about.
VMware VSAN is a perfect solution for vSphere (but only for vSphere at this time). Its tight integration into the whole VMware ecosystem is nice for VMware admins.
ScaleIO, on the other hand, is great for heterogeneous deployments (VMware, OpenStack, physical servers) and is more flexible in terms of the hardware that can be used for storage.
- VSAN is object based, ScaleIO is block based
- VSAN offers some functionality that ScaleIO does not support (deduplication & compression)
- VSAN scales up to 64 hosts, ScaleIO up to 1000.
- VSAN will create one monolithic datastore, ScaleIO can represent hundreds of datastores (LUNs)
- VSAN is fully integrated into vSphere ecosystem
- ScaleIO can be used as a generic storage solution (not only for vSphere but for any kind of application)
If you run a pure VMware-based environment and scaling to hundreds of nodes is not an objective, I would stick with VMware VSAN, but if you are looking for generic software-defined storage, I would recommend evaluating ScaleIO as well.
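The recommendation can be written down as a small rule of thumb. The Python sketch below only encodes the points listed in this post (the 64-host and 1000-node limits quoted here, vSphere-only support for VSAN, and deduplication and compression being VSAN-only); the function name and its parameters are invented for illustration, and a real evaluation would of course weigh far more than this.

```python
VSAN_MAX_HOSTS = 64        # figure quoted in this post
SCALEIO_MAX_NODES = 1000   # figure quoted in this post


def suggest(platforms, node_count, need_dedup_compression=False):
    """Return the candidates that fit the criteria listed in this post."""
    candidates = []
    # VSAN: vSphere only, up to 64 hosts.
    if set(platforms) <= {"vsphere"} and node_count <= VSAN_MAX_HOSTS:
        candidates.append("VSAN")
    # ScaleIO: any platform, up to 1000 nodes, no dedup/compression.
    if node_count <= SCALEIO_MAX_NODES and not need_dedup_compression:
        candidates.append("ScaleIO")
    return candidates


print(suggest({"vsphere"}, 16))               # both are worth evaluating
print(suggest({"vsphere", "openstack"}, 16))  # heterogeneous environment
print(suggest({"vsphere"}, 200))              # beyond the VSAN host limit
```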
In the next part I will go through installation on VMware vSphere and basic configuration.