Hadoop is an open-source software platform from the Apache Software Foundation for building server clusters for distributed computing. Server clustering is nothing new, but Hadoop is designed specifically for mass-scale computing involving thousands of servers. Based on a paper Google published about its MapReduce system, Hadoop leverages concepts from functional programming to solve large computing problems. It is an ideal solution for working with large volumes of data in applications ranging from scientific research to indexing web pages.
Leveraging the Power of Functional Programming
Functional programming is a style of programming with roots in lambda calculus. It is based largely on the idea of applying functions to sets of data; there is no shared state or mutable data. This makes it a natural style of development for systems that analyze large data sets: a function is applied to each member of a set, the results form the output, and the input data remains unchanged. That is exactly what you need when you are working through a very large set of data to analyze or transform it in some way.
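As a minimal sketch of the idea in Java (the language Hadoop itself is written in; Java 9 or later for List.of), with hypothetical class and variable names:

    import java.util.List;
    import java.util.stream.Collectors;

    public class MapExample {
        public static void main(String[] args) {
            // The input data set: never modified by the operations below.
            List<String> pages = List.of("Hadoop", "MapReduce", "HDFS");

            // Apply a function to each member of the set; the results
            // form a new output collection.
            List<Integer> lengths = pages.stream()
                    .map(String::length)   // pure function: no shared state
                    .collect(Collectors.toList());

            System.out.println(lengths);   // [6, 9, 4]
            System.out.println(pages);     // the original input is unchanged
        }
    }

Because the function has no side effects and never touches its input, nothing constrains the order in which the elements are processed.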
MapReduce: Splitting a Big Problem into Little Pieces
Google applied the ideas of functional programming to the problem of manipulating enormous amounts of data about the contents of pages on the World Wide Web. The system it developed, MapReduce, was described in a paper Google published in 2004.
MapReduce is what allows Google to search and index the huge volume of web pages its crawlers collect. A map is an element of functional programming languages in which a function is applied to every item in a set of data. The number of items in the original input is immaterial, since the function is simply applied to each one.
Google's system takes the idea a step further: its MapReduce clusters consist of thousands of servers. Because each element is processed independently, applying the function to the input can happen on one machine or be split across as many servers as needed. This allows Google to harness the combined power of many inexpensive machines to sort, analyze, and transform large sets of data.
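The same property can be seen in miniature on a single machine. In this hypothetical Java sketch, switching from a sequential to a parallel stream splits identical map work across CPU cores without changing the result, which is the same reason MapReduce can split it across machines:

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class ParallelMapExample {
        public static void main(String[] args) {
            // A large input: one million independent records (here, numbers).
            List<Integer> records = IntStream.rangeClosed(1, 1_000_000)
                    .boxed().collect(Collectors.toList());

            // Because each element is processed independently, the runtime is
            // free to partition the work across many cores -- or, in a system
            // like MapReduce, across many machines -- with the same result.
            long evens = records.parallelStream()
                    .map(n -> n * 2)        // same function, every element
                    .filter(n -> n % 4 == 0)
                    .count();

            System.out.println(evens);      // 500000
        }
    }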
A Java-Based Open Source Implementation of MapReduce
While Google's system is proprietary, the company did publish the concepts behind it, and that publication led to the creation of Hadoop. Hadoop is implemented in the Java programming language and provides a suite of tools for building mass-scale computer clusters. It includes a distributed file system (HDFS), a scalable database (HBase), and a programming framework for writing code that applies the MapReduce concepts to problems involving large sets of data. Together, these components let you build highly reliable and scalable architectures for distributed computing.
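To give a feel for that programming framework, here is the classic word-count example, modeled on the one in the Hadoop documentation and sketched against the org.apache.hadoop.mapreduce API (Hadoop 2.x or later). The map step emits a (word, 1) pair for every word in the input, and the reduce step sums the counts for each word:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map step: emit (word, 1) for every word in every input line.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reduce step: sum the counts emitted for each distinct word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Submitted with two arguments, an input directory and an output directory, the job spreads the map work across however many nodes the cluster provides.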
Hadoop Uses and Benefits
While it is easy to see that Hadoop is a powerful suite of tools, it may not be readily apparent why it is useful or which kinds of tasks benefit from it. You can get some ideas by looking at who is using Hadoop and what they are running on it.
Log File & Web Analytics
Large web sites generate tremendous amounts of log data about their visitors, and these log files can grow to hundreds of gigabytes in size. They contain important information about how users interact with a site and where they are coming from, and they can even help detect attacks and suspicious activity. Analyzing that data is a great job for Hadoop, and it is exactly what companies like Facebook and Rackspace use it for, as the sketch below suggests.
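As a rough illustration only (the class name and log format here are assumptions, not anything these companies have published), a mapper that counts requests per visitor might look like this, paired with the same summing reducer as in the word-count example above:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: emits (clientIP, 1) for every access-log line so a
    // summing reducer can produce per-visitor request counts. Assumes the
    // common log format, where the client address is the first
    // whitespace-delimited field on each line.
    public class LogHitsMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text ip = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\\s+");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                ip.set(fields[0]);
                context.write(ip, ONE);  // the reduce step sums these
            }
        }
    }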
Ad Targeting
The amount of data analyzed to decide which ads to display on your favorite web site is staggering. Large advertising networks must also collect data on millions of clicks and turn it into useful information for their clients about how users respond to those ads. Determining how best to serve ads from this data is another ideal application for Hadoop, and networks like Adknowledge use it to decide which ads to display and when.
Scientific Applications
Web logs and advertising are not the only applications that require analyzing large volumes of data; there are a number of practical scientific uses for a system like Hadoop as well. Physics, biochemistry, and genetics research all involve applying some analysis to every item in a large data set. One company, Spadac.com, is using Hadoop to power geospatial processing.
Financial Analysis
The finance industry is another segment that generates large volumes of data. Hadoop can help by analyzing large sets of transactions or stock prices: it can be used to spot patterns that suggest fraud, find ways to improve the bottom line, or uncover market trends that help investors pick better investments. Pronux, for example, uses Hadoop to analyze the transactions posted by the bookkeeping departments of large organizations, and another company uses it for technical analysis of stocks.
Search
Of course, searching a large data set, as Google does, led to the concepts that inspired Hadoop in the first place. Hadoop is a powerful tool for indexing large amounts of data and searching through it. It is used for exactly this purpose by Baidu, China's leading Internet search engine; by Amazon, for searching its vast product catalog; and by LinkedIn, to power suggestions such as people you might know.
Built on the idea that a large problem can be split into smaller pieces and tackled by many computers, Hadoop provides an open-source, scalable system for building clusters of thousands of servers. It is designed above all to apply the concepts of functional programming to the analysis of large volumes of data, and as such it powers numerous systems in the web, search, finance, and scientific market segments.
Michael Dorf is a professional software architect and instructor with an M.S. in Software Engineering and 12 years of industry experience. He teaches for LearnComputer! (learncomputer.com), which offers instructor-led local, online, and onsite Hadoop training for companies and individuals. Our Hadoop Training with MapReduce course is recognized in the industry; see our Hadoop training schedule for the list of upcoming courses!