
Hadoop: Analytics

Analytics applications are rapidly becoming the key applications for Big Data workloads. They mine the large datasets generated by transactional processing to find patterns in the data that can be used to take decisive action in a fast-moving marketplace.

Analytics is an umbrella term for a number of specific workloads that are widely deployed in financial services companies. These workloads are needed to cope with the tsunami of data, generated from a variety of sources, that is hitting financial services firms. Customers must quickly find the patterns in that data to make accurate and timely business decisions.

Hadoop is a leading software framework that allows customers to identify the key data points in extremely large, multi-terabyte (TB) datasets. SanDisk® evaluated a six-data-node Hadoop cluster, running the Terasort workload, to determine the impact of running such workloads on flash-enabled servers.

“Analytics is rapidly becoming the key application for Big Data workloads”

Jean S. Bozman


Hadoop is well known both for its applicability to financial services applications and for its ability to “scale out” as servers are added to a Hadoop cluster. Hadoop’s ability to work with large datasets, and to divide the computing work into tasks that it maps to servers within the cluster, accounts for its wide adoption in the financial services world.

Parallelized workloads like Hadoop are ideally suited to a scale-out computing world, in which more servers can be added as demand for computing increases. In fact, with cloud computing, customers can tap into the processing power of more than 100 servers for compute capacity, if needed.

In Hadoop, the master server maps the computing tasks to specific worker servers, which lets each individual server perform well even as more servers are added to the cluster.
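The master’s role can be illustrated with a simplified, locality-aware assignment loop: each input split goes to a worker that already stores a replica of that split, with ties broken by how many tasks each worker already has. This is a sketch under stated assumptions, using hypothetical node and split names; it is not the scheduler Hadoop actually ships (YARN’s ResourceManager, or the JobTracker in older releases), which also accounts for rack locality, queues, and container resources.

```python
def assign_splits(splits, workers):
    """Assign each input split to a worker node (simplified model).

    splits:  dict mapping a split id to the set of nodes holding a replica of it
    workers: list of worker node names
    Returns a dict mapping each worker name to its list of split ids.
    """
    assignment = {w: [] for w in workers}
    for split_id in sorted(splits):
        # Prefer nodes that already store this split locally.
        local = [w for w in workers if w in splits[split_id]]
        candidates = local or workers
        # Among the candidates, pick the least-loaded node (name breaks ties).
        target = min(candidates, key=lambda w: (len(assignment[w]), w))
        assignment[target].append(split_id)
    return assignment


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 7)]  # six hypothetical data nodes
    # Hypothetical placement: each split is replicated on two adjacent nodes.
    placement = {
        f"split{i:02d}": {nodes[i % 6], nodes[(i + 1) % 6]} for i in range(12)
    }
    for node, tasks in assign_splits(placement, nodes).items():
        print(node, tasks)
```

Because each split is replicated on two nodes, every node in this example receives two data-local tasks, which is the kind of balance the master aims for as nodes are added.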

Testing Hadoop

SanDisk ran Terasort benchmark tests on Hadoop servers to measure how much solid-state drives (SSDs) accelerate Hadoop running on those servers.
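The Terasort suite that ships with Hadoop’s MapReduce examples runs in three steps: TeraGen writes synthetic 100-byte records, TeraSort sorts them, and TeraValidate checks the output order. The sketch below computes the TeraGen row count for a 1TB dataset and assembles the three command lines. The JAR name and HDFS directories are hypothetical placeholders, the terabyte is assumed to be decimal, and SanDisk’s exact test parameters are not stated in this brief.

```python
import shlex

RECORD_BYTES = 100  # each TeraGen record is a 10-byte key plus a 90-byte value


def teragen_rows(dataset_bytes: int) -> int:
    """Number of TeraGen rows needed to produce the given dataset size."""
    return dataset_bytes // RECORD_BYTES


def terasort_commands(
    dataset_bytes: int,
    jar: str = "hadoop-mapreduce-examples.jar",  # hypothetical JAR path
    base: str = "/benchmarks/terasort",          # hypothetical HDFS directory
) -> list[str]:
    """Build the TeraGen -> TeraSort -> TeraValidate command lines."""
    rows = teragen_rows(dataset_bytes)
    steps = [
        ["hadoop", "jar", jar, "teragen", str(rows), f"{base}/input"],
        ["hadoop", "jar", jar, "terasort", f"{base}/input", f"{base}/output"],
        ["hadoop", "jar", jar, "teravalidate", f"{base}/output", f"{base}/report"],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    ONE_TB = 10**12  # assumed decimal terabyte
    for cmd in terasort_commands(ONE_TB):
        print(cmd)
```

Running the printed commands requires a working Hadoop installation; the real examples JAR name usually includes the Hadoop version number.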

In a test of a six-data-node cluster, the Hadoop instance processing a 1TB dataset across all six nodes completed the workload 32% faster, at 15% lower cost, than a configuration using traditional hard disk drives (HDDs).

These results were measured on a six-data-node Hadoop cluster, but the findings can be applied to larger clusters with more server nodes. Every phase of the Hadoop job, including loading the data, sorting it, and completing the computation, benefits from the use of flash SSDs.
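The brief does not say how the two percentages were computed. Assuming “32% faster” means 32% less elapsed time and “15% less cost” compares the total cost of the two configurations, both figures are simple relative reductions, as in the sketch below. The runtime and cost values in the usage lines are hypothetical placeholders, not SanDisk’s measurements.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


def speedup(baseline_time: float, new_time: float) -> float:
    """Throughput ratio: how many times faster the new run is."""
    return baseline_time / new_time


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not measured values).
    hdd_minutes, ssd_minutes = 100.0, 68.0
    hdd_cost, ssd_cost = 100_000.0, 85_000.0

    print(f"time reduction: {percent_reduction(hdd_minutes, ssd_minutes):.0f}%")
    print(f"speedup: {speedup(hdd_minutes, ssd_minutes):.2f}x")
    print(f"cost reduction: {percent_reduction(hdd_cost, ssd_cost):.0f}%")
```

Under the other common reading, where “32% faster” means a 1.32x speedup, the SSD run would take about 76% of the HDD runtime rather than 68%.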

[Figure: Six-node Hadoop cluster example]

Advantages of Flash

Flash technology accelerates the performance of Hadoop clusters, and its benefits extend as the cluster grows through the addition of nodes. Hadoop’s design keeps the growing data traffic off the master node: processing is pushed out to the individual nodes, and the results are then gathered. Customers who acquire flash-enabled servers will see performance benefits, with dramatically reduced I/O latency and a shorter time to results.
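The “process on each node, then gather the results” pattern is MapReduce, and Terasort is a good example of it. The sketch below simulates a Terasort-style job in a single Python process: the input is sampled to choose key-range boundaries, each simulated node sorts only its own key range, and a final step gathers the sorted ranges. It is a conceptual model with hypothetical function names, not Hadoop code; a real cluster moves data between nodes over the network and spills intermediate output to local disk, which is where storage latency enters the picture.

```python
import bisect
import random


def range_partition(records, boundaries):
    """Route each record to the partition whose key range contains it.

    Partition i holds keys in [boundaries[i-1], boundaries[i]), with the
    first and last partitions open-ended.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        parts[bisect.bisect_right(boundaries, rec)].append(rec)
    return parts


def distributed_sort(records, num_nodes):
    """Simulate a Terasort-style sort across `num_nodes` workers in one process."""
    if not records:
        return []
    # Sample the input to pick split points, as TeraSort does before its job runs.
    sample = sorted(random.sample(records, min(len(records), 100 * num_nodes)))
    step = len(sample) / num_nodes
    boundaries = [sample[int(step * i)] for i in range(1, num_nodes)]
    # Each simulated node receives one key range and sorts only that range.
    partitions = [sorted(part) for part in range_partition(records, boundaries)]
    # Gather: the key ranges do not overlap, so concatenating them in order
    # yields a globally sorted result.
    return [rec for part in partitions for rec in part]


if __name__ == "__main__":
    data = [random.randrange(10**6) for _ in range(10_000)]
    print(distributed_sort(data, num_nodes=6) == sorted(data))  # True
```

Because the boundaries come from a sample, the partitions are only roughly equal in size; adding nodes adds boundaries, so each node sorts a smaller share of the data.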

Using SSDs brings customers advantages in both capital expenditures (CapEx) and operational expenditures (OpEx). First, deployments with SSDs need fewer servers to deliver the same storage capacity as server deployments built on HDDs. Second, because SSDs have no seek or rotational delays, they are much less subject to the response-time issues that affect HDDs. Operational expenditures are also lower, because fewer servers are required in the data center.
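The CapEx and OpEx argument reduces to a sizing calculation: the number of servers needed to meet a requirement, multiplied by each server’s purchase and running costs. The sketch below models that with a throughput target. It is a hypothetical model; the `ServerConfig` fields and every per-server figure in the usage lines are placeholders to be replaced with real quotes and measurements, not data from SanDisk’s test.

```python
import math
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Per-server figures for one hardware option (all values hypothetical)."""
    name: str
    throughput_mb_s: float  # sustained I/O throughput per server
    capex: float            # purchase price per server
    annual_opex: float      # power, cooling, space, and support per server per year


def servers_needed(target_mb_s: float, cfg: ServerConfig) -> int:
    """Smallest whole number of servers that meets the throughput target."""
    return math.ceil(target_mb_s / cfg.throughput_mb_s)


def total_cost(target_mb_s: float, cfg: ServerConfig, years: int) -> float:
    """CapEx plus OpEx over `years` for the servers this option requires."""
    n = servers_needed(target_mb_s, cfg)
    return n * (cfg.capex + cfg.annual_opex * years)


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    hdd = ServerConfig("HDD", throughput_mb_s=1_000, capex=8_000, annual_opex=1_500)
    ssd = ServerConfig("SSD", throughput_mb_s=2_500, capex=12_000, annual_opex=1_200)
    for cfg in (hdd, ssd):
        print(cfg.name, servers_needed(10_000, cfg), total_cost(10_000, cfg, years=3))
```

The rounding up in `servers_needed` matters: a fractional server cannot be bought, so small per-server differences can change the server count, and with it both CapEx and OpEx.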

With fewer drives and fewer systems required, power and cooling costs are lower than for an HDD-based server solution. SSDs save time and money because they reduce latency while improving quality of service (QoS). And with no moving parts, SSDs do not experience failures caused by mechanical parts wearing out. For high availability of mission-critical data, SSDs’ non-volatile memory preserves data, reducing time to recovery from outages.


The digital universe is expanding, creating new demands on those who must analyze it and act on the results. SanDisk SSDs can be put into use immediately through simple on-site replacement of existing HDDs. Alternatively, SSDs can be acquired as built-in devices in products from OEM systems vendors that are purchased for new projects.

For technology refresh, SanDisk SSDs plug directly into standard SAS, SATA, and PCIe interfaces, so they fit into existing data center systems without disrupting the infrastructure. New deployments bring the benefits of flash technology as well: flash SSDs are built into servers from major systems vendors worldwide, and SanDisk SSDs are shipped by 6 of the top 7 server and storage OEMs.

Fast-paced financial markets value technology that lets firms analyze transactional data and predict where the market is heading. Solid-state drives provide rapid processing and shorter turnaround times, matching customers’ short decision horizons.

