According to Niall Kennedy’s Weblog:
Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters. The average MapReduce job ran across approximately 400 machines in September 2007, crunching approximately 11,000 machine years in a single month.
Footnote: 20 PBytes = 20,000 TBytes or 20,000,000 GBytes
Google processes its data on a standard machine cluster node consisting of two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives, and a gigabit Ethernet link. This type of machine costs approximately $2,400 through providers such as Penguin Computing or Dell, or approximately $900 a month through a managed hosting provider such as Verio (for startup comparisons).
The average MapReduce job runs across a $1 million hardware cluster, not including bandwidth fees, datacenter costs, or staffing.
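The quoted figures hang together on some quick arithmetic. As a sanity check, here is a rough back-of-envelope sketch in Python using only the numbers from the post; the per-job breakdown (data per job, machine-hours per job, wall-clock time) is my own inference from those figures, not anything Google published:

```python
# Back-of-envelope check of the figures quoted above. All constants come
# from the post; the derived per-job numbers are my own rough arithmetic.

JOBS_PER_DAY = 100_000            # average MapReduce jobs per day
DATA_PER_DAY_PB = 20              # petabytes processed per day
MACHINES_PER_JOB = 400            # average machines per job (Sept 2007)
MACHINE_YEARS_PER_MONTH = 11_000  # machine-years consumed per month
COST_PER_MACHINE_USD = 2_400      # quoted price of one cluster node

# Data handled by an average job: 20 PB / 100,000 jobs = 200 GB.
data_per_job_gb = DATA_PER_DAY_PB * 1_000_000 / JOBS_PER_DAY

# Hardware cost of one 400-machine cluster: 400 * $2,400 = $960,000,
# which matches the "$1 million hardware cluster" figure above.
cluster_cost_usd = MACHINES_PER_JOB * COST_PER_MACHINE_USD

# 11,000 machine-years per month spread over ~3 million jobs per month
# works out to about 32 machine-hours per job, i.e. roughly five minutes
# of wall-clock time when spread across 400 machines.
machine_hours_per_month = MACHINE_YEARS_PER_MONTH * 365 * 24
jobs_per_month = JOBS_PER_DAY * 30
machine_hours_per_job = machine_hours_per_month / jobs_per_month
wallclock_minutes_per_job = machine_hours_per_job / MACHINES_PER_JOB * 60

print(f"data per average job:   {data_per_job_gb:.0f} GB")
print(f"cluster hardware cost:  ${cluster_cost_usd:,}")
print(f"machine-hours per job:  {machine_hours_per_job:.1f}")
print(f"wall-clock per job:     {wallclock_minutes_per_job:.1f} min")
```

So the "average" job in these figures is on the order of 200 GB processed in about five minutes on a roughly $1 million slice of hardware.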
KKai: In short, that is a lot of data, a lot of servers, a lot of manpower (or machine-power), a lot of energy consumption, and of course, a lot of cost. Sorry for using the term "a lot of" a lot.
Right now, the largest hard drive on the market is only 1 TByte (1,000 GBytes).
By the way, I don’t know whether the figure includes YouTube’s traffic volume. If it does, I suspect YouTube contributes the majority of the data. I would love to see a breakdown of what makes up this volume and the percentage in each category. I also know that Google keeps copies of pages from across the web, so the biggest contributor could be either YouTube or that crawled web data. Without the actual numbers, though, I can only speculate. =D
Source: Google processes over 20 petabytes of data per day