Big Data - Guilherme Braccialli

articles, news and opinion about big data / data science

Wednesday, June 11, 2014

Biggest Hadoop environments

Some interesting numbers found in posts about the biggest Hadoop clusters: Yahoo:  32,000 nodes benchmarking environment: 300 nodes Twitt...

Hive: Materialized Queries / Memory Storage / Query Optimization

Worth reading: new proposals to boost Hive performance using Materialized Queries and much more advanced in-memory resources / cache: Foll...

Video - Hadoop Founders (and competitors) discussion

This epic Beyond MapReduce panel explores what's driving new data processing models in Hadoop. Hadoop founders discuss how the competiti...

Python + Data Science - Quick Start Guide

Python is one of the most used languages for Data Science. Where to start? IPython notebook is an interactive web-environment and scikit-...
Monday, June 9, 2014

Where Silicon Valley gets its talent

source: http://venturebeat.com/2014/06/06/this-map-reveals-exactly-where-silicon-valley-gets-its-talent/

HDFS Raid at Facebook

Facebook deployed HDFS RAID, an implementation of Erasure Codes in HDFS to reduce the replication factor of data in HDFS. It maintains ...

Hive presentations at HadoopSummit 2014 San Jose

Very interesting Hive presentations at Hadoop Summit 2014 - San Jose: 1- A Perfect Hive Query For A Perfect Meeting- Hive performance tuni...
Thursday, June 5, 2014

SAS University Edition - FREE for students

Now you can download a VMware image with fully functional SAS software, FREE for students. Features: - An intuitive interface that...
Tuesday, June 3, 2014

5 R's instead of 3 V's

5 R's: Relevant, Real-time, Realistic, Reliable, ROI source: http://www.mapr.com/blog/business-leaders-need-r%E2%80%99s-not-v%E2%8...

Dataviz - Languages

Languages of the world according to Twitter: Wikipedia Languages: source: http://www.vox.com/a/internet-maps#list-37
Monday, June 2, 2014

Kaggle tips to avoid pitfalls in Machine Learning

"At Kaggle, we run machine learning projects internally and also crowdsources some projects through open competitions. We’ll cover the ...

Agile + Big Data

Interesting post about Agile + Big Data projects: http://strata.oreilly.com/2014/05/how-to-be-agile-with-your-big-data.html

Spark - difficulties

That's the first article I've read about Spark that talks about problems and difficulties. Pay special attention to tuning parameters: http://b...

R + Hadoop

Tutorial to set up R-Hadoop packages, making it possible to run R code using the MapReduce paradigm: http://www.rdatamining.com/tutorials/r...
Thursday, May 29, 2014

The 10 Algorithms That Dominate Our World

1. Google Search There was a time not too long ago when search engines battled it out for Internet supremacy. But along came Google and it...

Feature Selection - methods and algorithms

"Feature selection is often an important step in applications of machine learning methods and there are good reasons for this. Modern ...
Wednesday, May 28, 2014

Courses (MOOC) - Data Science

MOOC stands for Massive Open Online Course. MOOCs became popular in 2012 with Coursera (the most famous MOOC website/platform). You can find ...

Deep Learning - Image tagging at Flickr

Batch (using Hadoop) and streaming (using Storm) image tagging at Flickr. article: http://code.flickr.net/2014/05/20/computer-vision-at-s...

Deep Learning - Skype real-time speech translation

"Microsoft will by the end of 2014 start offering on-the-fly language translation within Skype, firstly in a Windows 8 beta app and the...
Tuesday, May 27, 2014

Deep Learning - GPU + Neural Networks

Article about Andrew Ng's experiment (called Google Brain) using 16 computers with NVIDIA GPUs with performance compared to 1,000 comput...

Deep Learning - Google House Number in Street View

"Google can identify and transcribe all the views it has of street numbers in France in less than an hour, thanks to a neural network t...

Deep Learning - MIT finds out what's happening in videos

"MIT researchers have developed an algorithm that learns what’s happening in videos by piecing together the things it sees into a compl...

Deep Learning - Facebook recognizes people

Article and paper about Facebook's deep learning recognizing people in images. links: http://gigaom.com/2014/03/18/facebook-shows-off...

91 job interview questions for data scientists

What is the biggest data set that you processed, and how did you process it, what were the results? Tell me two success stories about yo...
Monday, May 26, 2014

Prediction APIs - Automating Data Scientists Tasks

It's time to start automating data science tasks. Nowadays, most data scientists spend too much time choosing the best set of features...

List of skills

Just for fun... Somebody asked in a forum: What are some good resources for learning about distributed computing? The answer is a huge-...

Is there a Big Bubble?

No doubt we are at the top of the Big Data Hype Cycle, but is there a bubble? Check out this post: http://inside-bigdata.com/2014/03/10/big-d...
Friday, May 23, 2014

Google Papers and open source projects - where it all started

Most (if not all) open-source big data projects were inspired by Google's technologies after Google published papers describing how the...

Sampling-based Database

Everyone knows that the amount of data has exploded. Although technology has also advanced, tasks involving exploration of petabyte datasets are not...

Typical steps of analytics projects

I read the article at: http://inside-bigdata.com/2014/05/23/introduction-machine-learning/ It's not exactly an article, there are lot...

Webinar: Analyzing Data with Python

Upcoming webinar: Analyzing Data with Python "Python is quickly becoming the go-to language for data analysis, but it can be difficu...

Webinar: Java8 - Lambda

"There's a revolution calling! Lambda expressions are coming in Java 8 but how can developers benefit? We'll go through a serie...

Trick behind Google's Self-Driving Car

I confess I was a little disappointed reading the article below. OK, it doesn't matter if they have tricks or not, a self-driving ...

Data Scientists to follow

Have you finished reading all the news, from all the blogs? (Not my case.) If you have enough time, this link ( http://www.informationweek....

Large-scale Video Classification with Convolutional Neural Networks

Google paper about Video Classification using Neural Networks. paper:  https://plus.google.com/+ResearchatGoogle/posts/eqSPSviY2CH Look ...

Webinar: Data Analysis on Streams

Upcoming webinar: Data Analysis on Streams "Analyzing real-time data poses special kinds of challenges, such as dealing with large e...
Thursday, May 22, 2014

TED DataViz

Using visualization to identify cultural connections among @TEDx source:  https://twitter.com/alexgiess/status/468832127283232768/pho...
Wednesday, May 21, 2014

50 Big Data Startups

1) Actian — Business-oriented data management solutions to transact, analyze and take automated action across business operations. They hav...

Top 10 billion-dollar tech startup founders

Airbnb ($10 billion) Ask CEO Brian Chesky about Airbnb, and he'll put it simply: "We are a hospitality company." Indeed, in t...