Posted by ehxz on 2008-1-22 22:30:29

eBay's Data Volume

Just how much data does eBay, the leader in e-commerce, actually have? Many readers are probably curious about this. In the report
Web 2.0: How High-Volume eBay Manages Its Storage (clues drawn from the "+1 GB per minute" figure), Paul Strong, a Distinguished Engineer at eBay Research Labs, gives some details about the data volume. It is only a glimpse, but the numbers make a useful reference.

Site throughput
More than 1 billion page views per day on average;
About $1,700 worth of goods traded every second;
A vehicle sells every minute;
A motor part or accessory sells every second;
A piece of diamond jewelry sells every two minutes;
600 million listings and more than 200 million registered users; over 1.3 million people consider doing business on eBay part of their lives.
Under that kind of load, availability reaches 99.94%, which works out to a little over 5 hours of downtime per year. From what I hear in the industry, the availability of the core business is even higher than that.

The data storage engineering group manages eBay's 2 PB (1 petabyte = 1,000 terabytes) of usable space. To get a sense of what that means, just compare it with Google's storage. About 10 TB of new storage has to be allocated every week; do the math and that is roughly 1 GB of space consumed per minute (a quick check of these numbers follows).
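
A rough back-of-the-envelope check of the 99.94% availability figure and the ~1 GB/minute growth rate quoted above (a minimal sketch; it assumes 1 TB = 1,000 GB and a 365-day year):

# Sanity-check the availability and storage-growth figures (approximations only).
AVAILABILITY = 0.9994                     # 99.94% uptime
downtime_hours_per_year = (1 - AVAILABILITY) * 365 * 24
downtime_seconds_per_day = (1 - AVAILABILITY) * 24 * 3600

NEW_STORAGE_GB_PER_WEEK = 10 * 1000       # 10 TB of new storage per week
minutes_per_week = 7 * 24 * 60
gb_per_minute = NEW_STORAGE_GB_PER_WEEK / minutes_per_week

print(f"downtime: ~{downtime_hours_per_year:.1f} h/year, ~{downtime_seconds_per_day:.0f} s/day")
print(f"storage growth: ~{gb_per_minute:.2f} GB/minute")

That comes to roughly 5.3 hours of downtime per year (about 52 seconds per day) and just under 1 GB of new storage per minute, which matches the figures above.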

Computing power
eBay uses a traditional grid computing system to build the site. Some figures for that system:
About 170 Win2000/Win2003 servers;
About 170 Linux (RHES3) servers;
Three Solaris servers: build and deploy eBay.com to QA; compile and optimize Java / C++ and other web assets;
Time to build the whole site: it used to be 10 hours, now 30 minutes;
Over the past two and a half years there have been 2 million builds, a frightening number (a rough rate calculation follows).
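
For a sense of scale, here is the same kind of rough arithmetic applied to the build count (approximate, assuming 365-day years):

# Approximate build rate implied by 2 million builds in 2.5 years.
builds = 2_000_000
days = 2.5 * 365
print(f"~{builds / days:.0f} builds per day, ~{builds / days / 24:.0f} builds per hour")

That is on the order of 2,200 builds a day, around the clock.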
Storage hardware
Every vendor has to pass rigorous testing before it has any chance of being selected. The vendors and products are as follows:
Switches: Brocade
Management software: IBM Tivoli
NAS: NetApp (5% of total capacity, 2 PB x 0.05, roughly 100 TB)
Array storage: HDS (95%; no small investment, since HDS is not cheap, and EMC is the loser at eBay)
Load balancing and failover: Resonate
Search: Thunderstone indexing system
Database software: Oracle. Most databases have 4 copies. The database servers are Sun E10000s. In addition, as far as I know, eBay bought a global Quest SharePlex license for data replication.


Application servers

What are the characteristics of the application servers?

A single two-tier architecture (I have some doubts on this point; it looks like a home-grown application server)
3.3 million lines of C++ in an ISAPI DLL (the binary is about 150 MB)
Hundreds of engineers developing it
The number of methods per class is close to the compiler's limit

Interestingly, according to the eWeek article, the (since struck-through) passage above was still there yesterday; when I checked today it had already been revised to:

Architecture
Highly distributed
The auction site is Java-based; the search infrastructure is written in C++
Hundreds of engineers developing, all working in the same code environment
Probably the interviewee saw the eWeek report and contacted the writer to make the correction. I had found the original claim of a "two-tier" architecture a little odd anyway.

Other information
Centralized application logging;
Global billing: real-time integration with a third-party application (presumably eBay's own PayPal?);
Business event streams: a unified, efficient and reliable message queue, with cookie-cutter patterns used to optimize the user experience (this seems to be a technique large e-commerce sites commonly use); a minimal illustrative sketch follows this list.
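
To illustrate the business-event-stream idea, here is a minimal, hypothetical producer/consumer sketch using Python's standard queue module; the event names and handler are invented for illustration and are not eBay's actual system:

import queue
import threading

# A minimal in-process stand-in for a reliable business-event queue.
# A real deployment would use a durable, distributed message queue instead.
events = queue.Queue()

def publish(event_type, payload):
    """Producer side: business code emits events such as 'item_sold'."""
    events.put({"type": event_type, "payload": payload})

def consume():
    """Consumer side: downstream systems (billing, logging, search) react."""
    while True:
        event = events.get()
        if event is None:          # sentinel: shut down the consumer
            break
        print(f"handling {event['type']}: {event['payload']}")
        events.task_done()

worker = threading.Thread(target=consume)
worker.start()

publish("item_sold", {"item_id": 42, "price_usd": 17.0})
publish("bid_placed", {"item_id": 42, "bid_usd": 18.5})

events.join()          # wait until all published events are processed
events.put(None)       # stop the consumer
worker.join()

The in-process queue here is only a stand-in; the producer/consumer shape is what the "business event stream" description refers to.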
Postscript
These are just some scattered running notes. As a DBA, perhaps one day I too will face data volumes like these; when that day comes, I can look back at this piece of digital junk.
Update: for more details see Web 2.0: How High-Volume eBay Manages Its Storage. Possibly because of caching, several readers have seen different versions of the original article.

Reposted from: http://www.dbanotes.net

Posted by ehxz on 2008-1-22 22:30:47

Web 2.0: How High-Volume eBay Manages Its Storage

2006-10-27
http://www.eweek.com/c/a/Storage/Web-20-How-HighVolume-eBay-Manages-Its-Storage/

The ultrapopular auction/sales Web site continues its exponential growth and finds itself adding 10 terabytes of new storage every week. That's a lot of data.

eBay, like Xerox and Google, is fast becoming its own generic verb for what it does ("Oh, just eBay it"). And when a company itself becomes the name for what it does, then a certain level of success has been reached.

Make that a very high level of success. Among Web 2.0 companies, San Jose, Calif.-based eBay is up there with Google, Amazon, Yahoo, eHarmony, Digg.com, and social networking sites MySpace.com and FaceBook.com as far as traffic, popularity and profitability are concerned.

Some of the facts and figures -- according to eBay itself -- about the world's largest auction business and most popular commercial Web site are downright staggering:


The site averages more than 1 billion page views per day.
Users trade about $1,700 worth of goods on the site every second.
26 billion SQL queries per day.
A vehicle sells every minute.
A motor part or accessory sells every second.
Diamond jewelry sells every 2 minutes.
The site currently posts about 600 million listings per quarter and has about 204 million registered users.
This one, in particular, is striking: 1.3 million people make all or part of their living selling on eBay.

To put this into a little perspective, FDR's two largest New Deal jobs programs, the Civilian Conservation Corps and the Civil Works Administration, employed a total of 6.5 million workers in the 1930s.


eBay is handling all those transactions, Web site page views and money changing hands on a near-no-latency, 24/7, international basis. The site's availability has been charted at 99.94 percent per day (a hiccup of about 50 seconds per day). When it was charted in June 1999, the site was latent an average of 43 minutes per day.

How eBay got to this level of technical efficiency and success is a long story. What we can do is offer an overview about how the company approaches its storage strategy—something that the company hasn't talked about with the media before.

Not many Web-based businesses have run into the kind of traffic and server-availability issues that eBay has experienced.

"Our growth has just been exponential for 11 years," eBay Research Labs Distinguished Engineer Paul Strong told eWEEK. "And since our job is to provide available, efficient, low-latency, 24/7 performance, we know we have a difficult job to do every day to keep the site running as perfectly as we need it to run."
eBay's storage engineering team ("Eleven people," Strong said) utilizes 2 petabytes of raw digital space on a daily basis to run the site and store its data, yet has to add about 10 terabytes (or 75 volumes) of new storage every week to cover new transactions, Strong said.

That follows alongside the eHarmony story: That highly successful social networking site has to purchase additional storage about every 90 days.

eBay said it uses a traditional grid computing system with the following features to build the site:


about 170 Win2000/Win2003 servers
about 170 Linux (RHES3) servers
three Solaris servers: build and deploy eBay.com to QA; compile Java & C++; consolidate/optimize/compress XSL, JS and HTML
time to build site: was once 10 hours; now only 30 minutes
in the last 2.5 years, there have been 2 million builds.
Then, the content is deployed to a system of about 15,000 servers.

eBay uses a number of different products in its storage setup, including switches from Brocade, software framework from IBM Tivoli, NAS (network-attached storage) hardware from NetApp (5 percent of the system) and large arrays from Hitachi Data Systems (95 percent of the system), Strong said. It also runs Oracle DB, he said.

"Oh, Im sure Im leaving somebody out. Theres probably something from each of the major storage manufacturers somewhere in our system," Strong said.

eBay maintains four copies of most of its databases, according to Strong.

eBay's main data centers are spread out over the continental United States, and it also has co-locations all around the world, he said.

Becoming a trusted eBay supplier is not an easy task, according to Strong. "It takes a long time for a company to prove itself enough for us to use them," Strong said. "There is a great deal of testing that goes on before we select a vendor, for anything."

The storage environment is modular in design, so adding incremental storage containers or servers is merely a difficult—but not daunting—exercise.

"However, we really hesitate to add new brands of software and hardware, if possible," Strong said. "Wed love to be in a position where everything appears homogenous, so that it minimizes the skill sets our engineers need."

eBay home-cooks some of its own software to use within its system—customized specifically for the online auction/sales environment and eBay's unique business needs, Strong said.

eBay's structure is, according to Strong:


highly distributed
the auction site is Java-based; the search infrastructure is written in C++
hundreds of developers, all working on the same code
To keep scaling with the growth of the business, Strong said eBay uses:


Centralized Application Logging, a scalable platform for logging fine-grained application information
Global billing: real-time integration with a third-party package
Business event streams: a unifying technology for efficient and reliable message queues. Cookie-cutter patterns are used within the system for optimal user experience
Reliable multicast infrastructure: allows for distributed analysis of massive amounts of data and keeps the company's growing search infrastructure up to date.
When it comes to handling all that traffic, Strong said the world's time zones provide a kind of natural "load-balancer."

"When were busiest here in the U.S., thats generally when Europe is asleep—and vice versa," Strong said. "Although we have surges now and then, the natural divide between the two continents works well for us."