Finding needles in a 20 TB haystack, 200 million times per day.
Dr Eran Gabber, Google
Tel Aviv University, Tel Aviv
Monday - 17/5/2004, 12:00 - 14:00, Schreiber - 309
Abstract:
Google faces two large technical challenges: ensuring that our search results are as relevant as possible, and serving hundreds of millions of queries, each in a fraction of a second, at a reasonable cost. To solve the first problem, we perform an offline matrix computation to produce PageRank, a query-independent measure of page reputation, and combine it with more traditional query-specific scoring. To solve the distributed computing problem, we use tens of thousands of commodity PCs and highly fault-tolerant software. I will discuss some details of these solutions, and also share some interesting statistical tidbits about search and the web.
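For readers unfamiliar with the "offline matrix computation" the abstract mentions, here is a minimal sketch of PageRank as a power iteration over a toy link graph. It is an illustration only, not Google's production method: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages with no outgoing links are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to a list of pages it links to.
    damping and iterations are illustrative defaults, not values
    taken from the talk.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly
        # (an assumption; other treatments of dangling pages exist).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        rank = new_rank

    return rank


# Toy graph: "c" is linked to by three pages, "d" by none.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Because the score depends only on the link graph, it can be computed ahead of time and stored; at query time it would then be blended with a query-specific relevance score, which is the combination the abstract refers to.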
Now this looks promising!