Routing in P2P Overlay Networks

[Abstract] P2P (Peer-to-Peer) overlay network is one of the most popular distributed networks. It not only has advantages in scalability, but also can maximize the use of computing and storage resources of each terminal nodes. In the last ten years, P2P system is a hot topic in IT field. Recently, most researchers study structed P2P system, which is better than the former P2P structure. This paper focus on the routing algorithms based on DHT in second generation of P2P overlay networks.

1.Introduction

The emergence of P2P system is the consequence of the rapid development of Internet. The world’s population is increasing, and technology is also improving. The demand for Internet services is also expanding. Traditional network can only expand its service scope by increasing the number of servers, while the purchase and maintenance of servers will increase the costs of service providers and consume a lot of resources. In addition, the single server needs to extend the network bandwidth to expand the number of services. This is also one of the limiting factors in the traditional network itself.

Peer-to-Peer system is an instance of distributed system. Unlike traditional network architecture, there is no concept of centralized control in P2P system. In other words, each terminal node in this type of network is both the service provider and user. Peer-to-peer systems aim to support useful distributed services and applications using data and computing resources available in the personal computers and workstations that are present on the Internet and other networks in ever-increasing numbers [1].

In the P2P system, resources are stored in the nodes of the network. The structured overlay network is responsible for routing between any two P2P nodes. In the first generation of structured overlay network algorithms, each node records all other nodes’ pointer (one hop overlay), so that most of the message communication can be completed directly. These algorithms are suitable for small scale networks. In the second generation of structured overlay network algorithms, each node only records a small number of pointers of other nodes. The algorithms ensure that the message routing is completed within a certain number of hops. Typical algorithms such as Chord, Pastry, Tapestry and CAN. This paper mainly introduces the second generation of structured overlay network algorithm, then analyzes and compares its performance.

2.Routing Overlays

Routing overlay is a distributed algorithm in P2P system which takes responsibility for locating nodes and objects. Peer-to-peer systems usually store multiple replicas of objects to ensure availability. In that case, the routing overlay maintains knowledge of the location of all the available replicas and delivers requests to the nearest ‘live’ node (i.e., one that has not failed) that has a copy of the relevant object [1].

2.1.DHT

In P2P overlay networks, each object has unique GUID (Globally Unique Identifier). This value is calculated by a hash function (such as SHA-1) according to all or part of the state of the object. Because of this, routing overlay networks are sometimes called distributed hash tables (DHT).

In DHT model, a data item with a GUID of X will be stored in a node whose GUID is closest to X in value. Copies of the data item will also be stored on R hosts. The GUID of these hosts is last close to X, and R is the replication factor to ensure high availability.

The GUIDs cannot be read artificially, so you must get the object’s GUID through some indexing services. Then, the P2P system will find the location of these objects in the overlay networks through the following routing algorithms.

2.2.Pastry

Pastry is a network structure using DOLR (Distributed Object Location and Routing) technology. Messages are routed according to the keywords provided. In pastry, each node is assigned a 128 bits ID which is generated by the unified hash function according to the IP address or public host name of the node.

Each node has a routing table, a set of neighboring nodes and a set of leaf nodes. The IP address of the node is included in the routing table, and the first n bits of the ID of these nodes are the same, where n is the number of rows of the entry in the routing table. The IP addresses listed in the neighbor node set, and the corresponding IDs of these nodes are very similar to those of the nodes with the routing table. The ID of the node in the leaf node set is smaller than that of the node. Both routing table and leaf node set are used to route messages, but neighbor node set is only used to keep nodes locatable. When a node receives a message that it should route, it first checks the leaf node set, followed by the routing table. With the progress of routing, the ID gradually approaches the ID of the target node, and finally the message arrives at the target node.

2.3.Tapestry

Tapestry is another network structure using DOLR technology. Each tapestry node contains pointers to other nodes and maps between the object’s GUID and node ID. Queries are routed along nodes of adjacent links until appropriate object pointers are found.

Tapestry network is a multi-node overlay network. Each tapestry node contains links to the set of neighbors with the shared node ID prefix. The neighbor nodes of a node constitute the neighbor node set. All the sets come together to form a routing table. When you want a node to publish an object, a mapping message will be sent to the target node whose node ID is closest to the object ID. When a node searches for an object, it is not necessary to find the target node that is close to or matches the object. It is only necessary to find the node that has the location information of the object.

2.4.Chord

Chord algorithm is very simple. It uses a keyword to identify the file and stores the file on the node corresponding to the keyword. Chord stores keywords on the corresponding nodes in chord by using the consistent hash function. Each node in chord only needs to know the routing messages of a few other nodes. This is because the routing table in chord is decentralized, and each node gets path information by communicating with a few other nodes.

2.5.CAN

CAN (Content-Addressable Network) is also a network structure that maps keywords to nodes. CAN uses multidimensional identifier space to implement DHT algorithm. CAN maps all nodes into an n-dimensional Cartesian space and allocates a region to each node as evenly as possible. CAN uses hash function to hash the K in (K, V) pair to get a point in Cartesian space, and stores the (K, V) pair in the node which has the region of the point. The routing algorithm adopted by CAN is relatively direct and simple. After finding out the coordinates of the target point, it will send the request to the node whose coordinates are closest to the target point.

Each node in CAN system maintains a routing table, which stores the IP address and coordinate area of adjacent nodes. When looking for messages, the node routes the lookup message to the neighbor node closest to the coordinate of the target node.

3.Comparison

In this section, the paper will compare overlay networks with traditional networks and different overlay networks algorithms we mentioned in section 2.

3.1.Comparison of Overlay Networks and Traditional Networks

When it comes to the traditional network, the first thing to think of is the TCP/IP model. First, in terms of scale, the overlay network allows more usable space. Second, the placement of overlay network objects can be randomized, unlike the traditional network traffic model, which is associated with network topology. Thirdly, the update speed of overlay network routing table is much faster than that of traditional network. Fourth, in traditional networks, the target node is usually unique. While in overlay networks, it is enough to find the nearest copy of the target object. Finally, because the biggest characteristic of P2P network is decentralization, it has advantages in security and privacy compared with traditional networks.

3.2.Comparison of Different Routing Algorithms Based on DHT

Obviously, the management of P2P system based on centralized directory mechanism is simpler. The bandwidth cost of maintaining the network is also smaller. However, because the server needs to centrally manage all the node information of the whole network, the performance and network liaison of the server will become the bottleneck of the whole system. The robustness of distributed P2P system is strengthened, and it will not cause the network paralysis because of the failure of a few nodes. Neither Napster nor Gnutella can ensure that the target node is found through an attempt. Servers in Napster may be paralyzed, and file location failure may occur due to the limit of flooding times in Gnutella. The DHT algorithm of structured coverage network can solve these problems.

Several algorithms based on DHT are similar in scalability, distribution, load balancing and self-organization. But it is different in some ways. Chord provides a naming mechanism, the consistency problem when nodes join and the treatment when nodes fail are relatively perfect. Chord can be used in large-scale file sharing network, time sharing effective storage system and large-scale distributed computing platform. Chord, Pastry and Tapestry all have the same number of search hops, and the length of CAN search path is relatively longer, so it is not suitable for voice phone and real-time information service. However, due to the good scalability index mechanism of CAN network, it can effectively support content insertion and retrieval.

The following table comprehensively compares the four algorithms mentioned in this paper from several aspects.

Algorithms Insertion complexity Spatial complexity Average searching hops
Pastry O(logb⁡N) O(logb⁡N) O(logb⁡N)
Tapestry O(logb⁡N) O(Nlogb⁡N) O(logb⁡N)
Chord O(log2⁡N) O(Nlog2⁡N) O(log2⁡N)
CAN O(d) O(d) dN1/d

Illustration:

  • b is the length of the identifier.
  • N is the scale of the network.
  • d is the dimension of coordinate space.

4.Conclusions and Future Directions

At this stage, the main research work is around the current algorithm, which shows that the current routing and location algorithms need to be further improved. The delay of searching in chord network and the fault-tolerant mechanism in case of failure nodes need to be improved. When malicious nodes appear in can network, the validity of data also needs to be improved. Now a lot of research is to improve the existing Chord, Pastry, Tapestry and CAN network algorithm. But at present, there are also some new network structures and corresponding algorithms based on the existing network algorithms and combining the advantages of several algorithms.

Another research hotspot is to improve the performance of overlay network by using the physical distribution characteristics of nodes. Because in the above four network structures, the logical ID of the node in the overlay network is obtained by hashing according to the information of the node, which basically ignores the physical distribution characteristics of the node. Many new studies have improved and perfected this aspect.

There are two directions for successful P2P applications. One is blockchain applications, such as bitcoin. The other is file sharing applications, such as BitTorrent. The application prospect of overlay network is very bright, and the problems to be solved are also prominent and urgent. Therefore, further research in this field will be very valuable.

5.References

  • [1] Coulouris, G. , Dollimore, G. , Kindberg, J. , & Blair, T. . (2012). Distributed Systems: Concepts and Design (5th Edition).

  • [2] Chen, G. , Xu, C. Z. , Shen, H. , & Chen, D. . (2003). P2P Overlay Networks of Constant Degree. Grid & Cooperative Computing, Second International Workshop, Gcc, Shanghai, China, December, Revised Papers.

  • [3] Stoica, I. , Morris, R. T. , Karger, D. , Kaashoek, F. , & Balakrishnan, H. . (2001). Chord : a scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Computer Communication Review, 31.

  • [4] Peng, X. Y., Yang, S. B., & Chen, D. F.. (2004). Mcan: A Scalable Modified Content -Addressable Network. Computer Science (11), 130-134.

  • [5](2006). Analysis of structured P2P overlay networks algorithms based on DHT. Journal of Chongqing University, 000(0z1), 131-134.

  • [6] Yang, Y. H. . (2014). Compare with Algorithms of P2P Resource Location based on DHT. Information Security and Technology.

文章作者: ColdSnap
文章链接: https://coldwave96.github.io/2021/09/17/RoutingInP2P/
版权声明: 本博客所有文章除特别声明外,均采用 CC BY-NC-SA 4.0 许可协议。转载请注明来自 ColdSnap の Blog