How a good RESTful API can benefit storage management

How a good RESTful API integration can change your world! I’m not a developer, and yet within a few hours I had written an app with the Pure Storage PowerShell Toolkit to generate Visio diagrams from live arrays.

I work for Pure Storage, and I have previously worked for NetApp and HDS as a pre-sales engineer. Pure recently released our PowerShell toolkit, and I took it upon myself to see how easy it was to use, so I built an application that connects to live controllers and generates Visio diagrams.
Within a couple of hours I had a working application, and in my downtime over the last week or so I have built it into something quite solid.
Over the years I have tried many different tools and integration points, including SMI-S, and this is by far the easiest mechanism I have used.
In its simplest form the toolkit provides a shell for you to build applications around. My application is a read-only tool, but you can also use it to take snapshots, create clones and so on with very little effort.
Take a look at the blog posts that Barkz, the author of the toolkit, has written to see just how easy it is.
http://www.purestorage.com/blog/pure-storage-rest-api-windows-powershell-part-1/
http://www.purestorage.com/blog/pure-storage-rest-api-windows-powershell-part-2/
I also needed .NET installed, obviously, along with the VisioAutomation PowerShell module.
https://visioautomation.codeplex.com/
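
If you want a feel for how little code is involved, here is a minimal, illustrative sketch using raw Invoke-RestMethod calls against the FlashArray REST API rather than the toolkit cmdlets themselves. The array name, API token and API version below are placeholders, so check the endpoints against your own array’s REST documentation:

# Illustrative sketch only: raw REST calls rather than the toolkit cmdlets.
# $arrayName, $apiToken and the API version are placeholders for your environment.
$arrayName = "flasharray.example.com"
$apiToken  = "00000000-0000-0000-0000-000000000000"
$baseUri   = "https://$arrayName/api/1.1"

# Arrays commonly present self-signed certificates; skip validation for this demo only.
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }

# Establish a session; the array hands back a cookie that later calls reuse.
Invoke-RestMethod -Method Post -Uri "$baseUri/auth/session" `
    -Body (@{ api_token = $apiToken } | ConvertTo-Json) `
    -ContentType 'application/json' -SessionVariable pureSession | Out-Null

# Read-only data gathering: every volume with its name, size and serial,
# ready to be fed into the Visio generation step.
$volumes = Invoke-RestMethod -Method Get -Uri "$baseUri/volume" -WebSession $pureSession
$volumes | Select-Object name, size, serial | Format-Table -AutoSize

From there it is just a matter of piping that data into the VisioAutomation module to draw the shapes.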

Not only do you have GET (read) access, but also PUT/POST (write) access, so you can create snapshots and clones, eradicate LUNs and so on, and it is all just so easy to do.
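
A write operation is barely any more work. The sketch below creates a snapshot using the session established above; the volume name is made up and the endpoint shape is assumed from the public REST 1.x documentation, so verify it against your array:

# Write access: snapshot a volume over the same session established above.
# The volume name "sql-data-01" is made up; the endpoint shape is assumed from
# the public FlashArray REST 1.x docs, so verify it against your array.
$body = @{ snap = $true; source = @("sql-data-01") } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri "$baseUri/volume" `
    -Body $body -ContentType 'application/json' -WebSession $pureSession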

I hope to have this published as a tool shortly, so if you have some Pure Storage in your DC and want to gather LUN, host and port data and present it in a Visio diagram, you will easily be able to do so.

The next step is to look into developing something for a mobile platform using REST and JSON.

 

Location, Location, Location

When you decided where to live you had to make some conscious decisions. You had to decide where your house was in relation to things that are important to you – schools, the office, transport hubs, freeways or motorways.

Based on your method of transport, you had to decide how close you needed to be to those things so that your experience was acceptable, or even great. If you worked in New York, it would not be practical to live in beautiful New Zealand, as the commute just wouldn’t work unless you could work remotely. So location matters for the things that are important to you.

Storage works in much the same way. A traditional HDD has many platters, and data is typically laid out from the outside of the platter towards the inside. Physics is physics: the outer tracks pass more data under the head per revolution than the shorter inner tracks, so information on the outside of the platter reads and writes faster than information towards the centre.

Back in the old days, DBAs really had to think about data placement, whether by specific spindle and RAID allocation or even right down to block location on disk. The outer tracks of the platter were faster than the inner ones, so that is where high-performing tables and things like tempdb needed to live.

Memory has really helped with that issue, as have other forms of cache, so long as the blocks that need to be cached fit in the allocated memory or cache; if they don’t, you hit a latency wall as you have to go off and seek from disk. When they do fit, your reads are essentially free, but writes still carry some overhead and an I/O tax.

SSDs really do go a long way towards resolving those issues. SSDs and other flash media are free of locality-of-reference penalties. Essentially the data is stored as electrical charge, on or off, with nothing mechanical to move, so reads and writes are basically location-free, which is why I/O latencies are typically sub-millisecond.

DBAs no longer need to worry about where and how they place their data, or even whether they use shared storage. DBAs will always worry, of course, but now they can worry about things further up the stack.

Imagine if you could live in beautiful New Zealand, work in New York and holiday in the Caribbean without the challenges and cost overheads of mechanical travel. That is what locality-free data access with SSDs and the right storage OE can give you.

So many mistruths

I have been in the vendor world for many years now and it’s great; I really enjoy my job and the companies I have worked for.

What really gets me is some of the blatant lies and mistruths that get reported, especially when they come from so-called analysts or the like. I recently read an Edison article on HP Thin Deduplication and laughed out loud more than once at the claims in the incredibly biased piece.

Let’s start with “Post processing removes any direct performance impact”. It may remove it at the time of ingest, but it does not remove the impact on CPU and memory altogether. Normally post-process deduplication is a scheduled event that runs when you schedule it. What happens if you run a busy workload while that process is running? It definitely impacts performance.

Flash helps enable deduplication inline, as long as you also have a decent amount of cache and an OE that can handle metadata efficiently. As capacities get bigger, so do your metadata table needs. NetApp was the first to bring deduplication to market, and it was, and still is, a post-process. I have seen first-hand the impact when this goes wrong because it hasn’t been managed correctly.

Next, let’s cover the cost of flash. Data reduction is an integral part of moving to flash, and also an integral part of the TCO of flash, so it should be mentioned. What’s key, however, is that when you talk to customers about this you validate your data. The company I work for, Pure Storage, will quote you an average data reduction of 3:1 for database, 6:1 for VSI and 10:1 for VDI workloads. This information is gathered by our cloud assist portal, which collects data from all our arrays in the wild and reports on the non-thin-provisioned capacity.

The article in question expressly mentions deduplication ratios easily being 10:1, over and over again, whereas HP’s own website states that their average data reduction is 4:1. Why is that important? If you base your cost per GB on data reduction, then 10:1 makes your cost per GB look a lot better than 4:1, but which of the two figures is accurate? The Pure information, like the NetApp ticker information, is consistent messaging across the board. It is key that the information you base your decision on is consistent and verifiable.
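
To see why the quoted ratio matters so much, run the arithmetic yourself. The raw cost per GB below is a made-up number purely for illustration:

# Illustrative only: how the claimed data reduction ratio changes effective cost per GB.
$rawCostPerGB = 10.0   # hypothetical raw flash cost in $/GB, not a real quote

foreach ($ratio in 4, 10) {
    $effective = $rawCostPerGB / $ratio
    "{0,2}:1 reduction -> effective `${1:N2}/GB" -f $ratio, $effective
}

At 4:1 a $10/GB raw cost becomes an effective $2.50/GB; at 10:1 it becomes $1.00/GB. The claimed ratio swings the headline number by a factor of 2.5, which is exactly why it needs to be verifiable.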

Take a look at the competitive comparison, where they rate everyone and claim HP is the best. All four vendors rated have very different data reduction methodologies: some, like Pure, include deduplication and compression but don’t count thin provisioning; HP counts thin provisioning and deduplication; XtremIO and SolidFire are different again. How can they say that theirs is the best?

The report said HP looked at telemetry data from tens of thousands of systems? The report states that an analysis was done between 16KiB and 4KiB block sizes and that there was little difference between the two, with modest savings of 15%. That information was gained from telemetry data from their phone-home system. I have seen similar data sent from NetApp, Hitachi and Pure, and I would be very surprised if this was true. The data sent is extremely dense and takes masses of processing power just to manage fault calls, let alone do deep statistical analysis. I would like to see some more information on this. IMHO it is simply a very expensive exercise to change the block size. Look what happened with EMC XtremIO recently: going from XIOS 2.4 to 3.0 was a destructive upgrade because the block size changed from 4KB to 8KB.

 

That’s enough of a rant for now, but I encourage you to do your homework before investing in any new technology. When you look at analyst reports, remember they are all paid for, but some are more biased than others.

My advice: ask your vendor to prove it. Put a controller on the floor and run some real-world workloads, not synthetic ones, on your own data.

 

SPAM

Well, I learnt my lesson. I had my settings set up to allow anyone to comment, which was a bad mistake, as I got spammed.
So now if you want to comment on this blog, you will need to register.

What was interesting is that they were all hitting the same blog entry, the one about XtremeIOMG, and not even the latest one.

Must be a Google bot thing!