See on hortonworks.com
[W]hat if you want to schedule both memory and CPU, and you need them in possibly different and changing proportions? If you have 6GB and three cores, and I have 4GB and two cores, it’s pretty clear that I should get the next container. What if you have 6GB and three cores, but I have 4GB and four cores? Sure, you have a larger total number of units, but cores might be more valuable. To complicate things further, I might care about CPU more than you do.
See on blog.cloudera.com
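One way to settle the question in the excerpt above is to compare users by their dominant share: the largest fraction of any single cluster resource that each one holds. This is the idea behind Dominant Resource Fairness. The sketch below is a minimal illustration, not any scheduler's actual code. The cluster totals (20 GB, 10 cores) are an assumption, since the excerpt gives none, and the names `dominant_share` and `next_recipient` are hypothetical.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources) -> float:
    """Largest fraction of any one cluster resource this user holds."""
    return max(usage[name] / capacity[name] for name in capacity)


def next_recipient(usages: Dict[str, Resources], capacity: Resources) -> str:
    """Pick the user with the smallest dominant share for the next container."""
    return min(usages, key=lambda user: dominant_share(usages[user], capacity))


# Assumed cluster totals; the excerpt does not state them.
CAPACITY = {"memory_gb": 20.0, "cores": 10.0}

# First scenario from the excerpt: 6 GB / 3 cores versus 4 GB / 2 cores.
case1 = {
    "you": {"memory_gb": 6, "cores": 3},
    "me": {"memory_gb": 4, "cores": 2},
}

# Second scenario: 6 GB / 3 cores versus 4 GB / 4 cores.
case2 = {
    "you": {"memory_gb": 6, "cores": 3},
    "me": {"memory_gb": 4, "cores": 4},
}

if __name__ == "__main__":
    print(next_recipient(case1, CAPACITY))  # "me": shares are 0.3 vs 0.2
    print(next_recipient(case2, CAPACITY))  # "you": shares are 0.3 vs 0.4
```

Under these assumed totals, the second scenario goes to "you" because four of ten cores is a larger dominant share than six of twenty gigabytes. The answer depends on the cluster's capacity, and the sketch does not model the last point in the excerpt, that one user may value CPU more than another.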
Hadoop 2.0 underscores the fact that it is becoming a de facto ecosystem for Big Data developers across the globe. Got an idea for a new app or web service? Build it on top of Hadoop. Take the core product in a different direction. If others find that app or web service useful, expect further development on top of your work.
See on smartdatacollective.com
Google didn’t stop with MapReduce; they developed other approaches for applications where MapReduce wasn’t a good fit, and I think this is an important message for the whole Big Data landscape. You cannot solve everything with MapReduce. You can make it faster by getting rid of the disks and moving all the data into memory, but there are tasks whose inherent structure makes it hard for MapReduce to scale.
Open source projects have picked up on the more recent ideas and papers by Google. For example, Apache Drill is reimplementing the Dremel framework, while projects like Apache Giraph and Stanford’s GPS are inspired by Pregel.
See on architects.dzone.com
Do you think that you’re working with “Big Data”, or is it “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance or, in other words, “near real-time”. We’re not talking about batch analytics, but about more interactive, iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small with ad hoc analytics in either case.
See on jameskaskade.com