BitTorrent
BitTorrent is a peer-to-peer (P2P) file sharing communications protocol. It is a method of distributing large amounts of data widely without the original distributor bearing the entire cost of hardware, hosting, and bandwidth. Instead, when data is distributed using the BitTorrent protocol, each recipient supplies pieces of the data to newer recipients. This reduces the cost and burden on any single source, provides redundancy against system problems, and reduces dependence on the original distributor.
The protocol is the brainchild of programmer Bram Cohen, who designed it in April 2001 and released a first implementation on 2 July 2001. It is now maintained by Cohen's company BitTorrent, Inc.[1]
Use of the protocol accounts for a significant share of Internet traffic, but the precise amount has proven difficult to measure.
There are numerous compatible BitTorrent clients, written in a variety of programming languages and running on a variety of computing platforms.
A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer is any computer running an instance of a client.
To share a file or group of files, a peer first creates a "torrent". This small file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. Peers that want to download the file first obtain a torrent file for it and connect to the specified tracker, which tells them which other peers to download the pieces of the file from.
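The tracker's coordinating role can be pictured as a registry of peers kept per torrent. The following is a minimal in-memory sketch only; it ignores the network protocol a real tracker speaks, and the `Tracker` class, its `announce` and `leave` methods, and the use of a plain string as the torrent identifier are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical in-memory tracker: maps a torrent ID to the peers in its swarm."""

    # torrent ID -> set of (host, port) pairs currently participating
    swarms: dict[str, set[tuple[str, int]]] = field(default_factory=dict)

    def announce(self, torrent_id: str, peer: tuple[str, int]) -> list[tuple[str, int]]:
        """Register `peer` in the swarm and return the other peers it can contact."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(p for p in swarm if p != peer)
        swarm.add(peer)
        return others

    def leave(self, torrent_id: str, peer: tuple[str, int]) -> None:
        """Remove a peer that has stopped participating in the torrent."""
        self.swarms.get(torrent_id, set()).discard(peer)
```

A peer that calls `announce` learns which peers are already in the swarm and is itself added, so later callers are told about it.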
Though both ultimately transfer files over a network, a BitTorrent download differs from a classic full-file HTTP request in several fundamental ways:
- BitTorrent makes many small P2P requests over different TCP sockets, while web browsers typically make a single HTTP GET request over a single TCP socket.
- BitTorrent downloads pieces in a random or "rarest-first" order that keeps pieces widely available in the swarm, while HTTP downloads in a contiguous manner.
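Rarest-first selection can be sketched as counting, for each piece a downloader still needs, how many connected peers advertise it, and then requesting one of the least common. In this hypothetical helper, each peer's holdings are assumed to be a set of piece indices and ties are broken at random; real clients combine this rule with other policies.

```python
import random
from collections import Counter
from typing import Optional


def pick_rarest_piece(
    needed: set[int],
    peer_pieces: list[set[int]],
    rng: random.Random,
) -> Optional[int]:
    """Return a needed piece held by the fewest peers, or None if no peer has one.

    Hypothetical helper. Ties are broken at random so that different
    downloaders do not all request the same rare piece at once.
    """
    # Count how many connected peers advertise each piece we still need.
    availability = Counter(
        piece for pieces in peer_pieces for piece in pieces if piece in needed
    )
    if not availability:
        return None
    rarest_count = min(availability.values())
    candidates = sorted(p for p, n in availability.items() if n == rarest_count)
    return rng.choice(candidates)
```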
Taken together, these differences give BitTorrent much lower cost, much higher redundancy, and much greater resistance to abuse or "flash crowds" than a regular HTTP server. This resilience has a price, however: downloads take time to reach full speed, because the many peer connections take time to establish and a node needs time to obtain enough data to become an effective uploader. As a result, a typical BitTorrent download gradually rises to very high speeds and then slowly falls back toward the end of the download. By contrast, an HTTP server, while more vulnerable to overload and abuse, reaches full speed very quickly and maintains it throughout.
In general, BitTorrent's non-contiguous download method has prevented it from supporting "progressive downloads" or "streaming playback". However, recent comments by Bram Cohen suggest that streaming torrent downloads will soon be commonplace.
Creating and publishing torrents
The peer distributing a data file treats it as a number of identically sized pieces, typically between 64 kB and 1 MB each. A piece size greater than 512 kB reduces the size of the torrent file for a very large payload, but is claimed to reduce the efficiency of the protocol.[1] The peer computes a checksum for each piece using a hashing algorithm and records it in the torrent file. When another peer later receives that piece, it computes the checksum again and compares it with the recorded one to check that the piece is error-free.[2] Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder.
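The per-piece checksums can be sketched with Python's `hashlib`, using SHA-1 as the torrent file description below states. The helper names `piece_hashes` and `verify_piece` are hypothetical. When the payload is not an exact multiple of the piece length, the final piece is simply shorter than the others.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split `data` into fixed-size pieces and return the SHA-1 digest of each.

    Hypothetical helper; the last piece may be shorter than `piece_length`.
    """
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check a received piece against the digest recorded for that index."""
    return hashlib.sha1(piece).digest() == expected[index]
```

A downloader that gets a piece failing `verify_piece` would discard it and request it again, rather than writing corrupt data to disk.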
The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By convention, the name of a torrent file has the suffix .torrent. Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients to verify the integrity of the data they receive.
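Torrent files are serialized with bencoding, a simple format from the original BitTorrent specification that this article does not otherwise describe: integers as `i<n>e`, byte strings as `<length>:<bytes>`, lists as `l...e`, and dictionaries as `d...e` with keys in sorted order. The sketch below builds a single-file torrent with the "announce" and "info" sections described above. The function names and the example tracker URL are hypothetical, and multi-file torrents and optional keys are omitted.

```python
import hashlib


def bencode(value) -> bytes:
    """Serialize ints, str/bytes, lists, and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((key, item))
        # Keys are emitted as byte strings in sorted (raw byte) order.
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(announce: str, name: str, data: bytes, piece_length: int) -> bytes:
    """Build a hypothetical single-file torrent with 'announce' and 'info' sections."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "name": name,                  # suggested file name
        "length": len(data),           # file length in bytes
        "piece length": piece_length,  # bytes per piece
        "pieces": pieces,              # concatenated 20-byte SHA-1 digests
    }
    return bencode({"announce": announce, "info": info})
```

For example, `make_metainfo("http://tracker.example/announce", "a.txt", data, 262144)` would return the bytes to write to `a.txt.torrent`.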
Completed torrent files are typically published on websites or elsewhere and registered with a tracker. The tracker maintains lists of the clients currently participating in the torrent.[2] Alternatively, in a trackerless system (decentralized tracking), every peer acts as a tracker. The BitTorrent, µTorrent, BitComet, and KTorrent clients implement this through a distributed hash table (DHT). Azureus also supports a trackerless method, but as of April 2007 it is incompatible with the DHT used by all the other supporting clients.
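The DHT shared by these clients (often called the Mainline DHT) is based on Kademlia. Node IDs and the SHA-1 hash of a torrent's info section (its info-hash) live in the same 160-bit space, and the distance between two IDs is their bitwise XOR. A peer looking for a torrent asks the nodes whose IDs are closest to the info-hash. The minimal sketch below shows only the distance metric and closest-node selection; `xor_distance`, `closest_nodes`, and the default of eight results are illustrative assumptions, and routing tables and network messages are left out.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance between two IDs: their XOR read as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return up to k known node IDs nearest to `target` under the XOR metric."""
    return sorted(node_ids, key=lambda node: xor_distance(node, target))[:k]
```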
In November 2006, BitTorrent, Inc. introduced its "Publish Torrent" service, which creates and hosts a torrent file (seeded from an existing web-hosted media file) and tracks downloads. The service (http://www.bittorrent.com/publish) requires a client that supports web seeding (at the time, only the official client, Azureus, and µTorrent).
Downloading torrents and sharing files
Users browse the web to find a torrent of interest, download it, and open it with a BitTorrent client. The client connects to the tracker(s) specified in the torrent file, from which it receives a list of peers currently transferring pieces of the file(s) specified in the torrent. The client connects to those peers to obtain the various pieces. Such a group of peers connected to each other to share a torrent is called a swarm. If the swarm contains only the initial seeder, the client connects directly to it and begins to request pieces. As peers enter the swarm, they begin to trade pieces with one another, instead of downloading directly from the seeder.
Clients incorporate mechanisms to optimize their download and upload rates. For example, they download pieces in random order to increase the opportunity to exchange data, which is only possible if two peers have different pieces of the file.
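Whether an exchange is possible can be checked with set arithmetic over piece indices: a peer is worth requesting from only if it holds a piece the downloader lacks (the peer wire protocol signals this with an "interested" message), and choosing randomly among those pieces spreads requests across the swarm. The function names below are hypothetical, and the sketch ignores pieces that are already being requested from other peers.

```python
import random
from typing import Optional


def is_interested(my_pieces: set[int], their_pieces: set[int]) -> bool:
    """True if the other peer has at least one piece we do not have."""
    return bool(their_pieces - my_pieces)


def pick_random_piece(
    my_pieces: set[int],
    their_pieces: set[int],
    rng: random.Random,
) -> Optional[int]:
    """Choose uniformly at random among pieces the other peer has and we lack."""
    candidates = sorted(their_pieces - my_pieces)
    return rng.choice(candidates) if candidates else None
```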
The effectiveness of this data exchange depends largely on the policies that clients use to decide whom to send data to. Clients may prefer to send data to peers that send data back to them (a tit-for-tat scheme), which encourages fair trading. But strict policies often result in suboptimal situations: newly joined peers cannot receive any data (because they do not yet have any pieces to trade), and two peers with a good connection between them do not exchange data simply because neither takes the initiative. To counter these effects, the official BitTorrent client uses a mechanism called "optimistic unchoking", in which the client reserves a portion of its available bandwidth for sending pieces to random peers (not necessarily known-good partners, the so-called preferred peers), in the hope of discovering even better partners and to ensure that newcomers get a chance to join the swarm.
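The choking policy can be sketched as a per-round decision: rank peers by how fast they have recently sent us data, unchoke the best few (tit for tat), and give one extra slot to a random peer from the rest (optimistic unchoking). Modeling "a portion of bandwidth" as a single upload slot is a simplifying assumption, and the function name and slot count are hypothetical; real clients repeat this choice periodically and rotate the optimistic slot over time.

```python
import random


def choose_unchoked(
    upload_rates: dict[str, float],
    regular_slots: int,
    rng: random.Random,
) -> set[str]:
    """Pick which peers to upload to for the next round (hypothetical helper).

    `upload_rates` maps peer ID -> rate at which that peer recently sent us data.
    The fastest reciprocators fill the regular slots; one more slot goes to a
    randomly chosen remaining peer so that newcomers get a chance.
    """
    ranked = sorted(upload_rates, key=lambda peer: upload_rates[peer], reverse=True)
    unchoked = set(ranked[:regular_slots])
    remaining = ranked[regular_slots:]
    if remaining:
        # Optimistic unchoke: ignore past behavior for this one slot.
        unchoked.add(rng.choice(remaining))
    return unchoked
```

A newcomer with a recorded rate of zero never wins a regular slot, but it can still be picked for the optimistic one and so receive its first pieces.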