Replica Selection Algorithm in Data Grids: The Best-Fit Approach

The design of Data Grids allows grid facilities to manage data files and their corresponding replicas from all around the globe. Replica selection in Data Grids is a complex service that selects the best replica place among several scattered places based on quality of service parameters. Existing replica selection algorithms look for the best replica for the requesting users without taking into account the limitations of their network or hardware capabilities, and so do not find the best fit. As a result, the best replica places may serve users who cannot fully utilize their download speed, leaving capable users with limited access to them. This furthermore compromises the best replica places, shifts capable users to lower-quality replica places, and degrades the whole Data Grid environment. To improve quality of service parameters, we propose a matching algorithm that matches the capabilities of grid users with the replica providers that are the best fit. This best-fit approach takes into account both the capabilities of grid users and the capabilities of replica places, and creates matches of similar capabilities. Simulation results showed that the best-fit algorithm outperforms previous replica selection algorithms.


INTRODUCTION
Data Grids are a powerful mechanism for problem solving in virtual organizations [1]. The emergence of Grid Computing laid crucial groundwork for many disciplines such as engineering, science, earth sciences, high-energy physics, astronomy and molecular biology. Grid computing can support different types of applications, for example data-intensive applications, compute-intensive applications and applications demanding distributed services. Three sorts of Grids have evolved to serve these applications: Service Grids, Computational Grids and Data Grids. Data Grids are anticipated to be the solution to the huge data storage and computational power issues of numerous current scientific projects. The evolution of scientific applications in various fields such as climate simulation [2], high-energy physics and data mining demonstrates that such applications manipulate and produce enormous amounts of data [3]. These huge data sets need to be stored for further exploration and shared with scholars collaborating within the scientific community, who are scattered all over the world.
Replica selection [4] is a mechanism to choose the best replica place among several replica locations according to quality of service (QoS) parameters. Several QoS parameters, such as response time (RsT), security, availability, reliability and cost, are very important and have a crucial impact on the Grid environment, as explained in [1,5,6,7]. RsT, or simply time, is a vital element that affects replica selection and thus the job turnaround time. Earlier replica selection algorithms addressed time as the only QoS parameter, focused on estimating it, and put all their effort into selecting the replica place with the fastest replica movement from source to sink.
However, selecting the best replica place for a user who cannot utilize the full transfer speed, due to the limitations of his network or hardware capabilities compared to the selected location, prevents other capable users from fully utilizing that place and switches them to poorer-quality grid sites. This harms the capable users' performance and, in turn, the whole data grid environment. Therefore, in this research, replica selection is addressed from both sides, the sender and the receiver. It is, in all situations, better to choose a replica location that is consistent with or similar to the receiver's capabilities. This converts the problem from selecting the best replica location to selecting the best-fit replica location, in order to improve the whole grid environment. The proposed algorithm therefore takes two variables into account to decide the best replica location: first, the capability of the user and second, the capability of the data grid site. This algorithm is called the best-fit algorithm (BFA). In this research, the terms suitable replica and best-fit replica are used interchangeably.
The rest of the paper is organized as follows. Section II describes the related work in replica management. Section III presents the system design, while Section IV describes the performance evaluation. Section V presents the results and discussion. Finally, the conclusion and future work are given.

RELATED WORK
Research on replica selection began by considering RsT as the only QoS factor. RsT is the amount of time required to transfer the replica from the source to the sink's local storage, where the running task is executed [8]. In this context, many studies addressed the problem of finding the replica place with the minimum RsT. However, since RsT cannot be computed in advance, the main challenge was how to estimate it [9], as many factors play a role in estimating RsT, including the previous RsT histories experienced by grid users.
Early replica selection paradigms [10] aimed to choose the replica location closest to the grid user based on static metrics, such as topological distance in hop counts or geographical distance in miles, and used them for future prediction. On the other hand, the authors of [11] use probing messages sent from the replica sites to the grid user, first to check the availability of the machine that holds the replica and second to find the grid site with the shortest RsT, which is taken as the best replica location. Nevertheless, these approaches ignore the dynamic nature of the network, which makes these static metrics inadequate estimators of RsT. The authors of [8] argued that RsT is the sum of the storage access latency (SAL), the transfer time and the request waiting time in the queue.
Dynamic paradigms for selecting the best replica [2,9,12] have arisen to improve the RsT estimates anticipated by data grid users, based on network criteria such as the latency of the hosting server for the request and the network bandwidth. A prediction based on historic system logs is used to select the replica location with the shortest transfer time. These techniques rely on data grid services that observe resource capacities and network conditions, such as the Grid Resource Information Services (GRIS) and the Network Weather Service (NWS) [13]. The authors of [2] used the network bandwidth at run time to automatically decide on the proper site that holds the replica; this approach adapts to bandwidth fluctuations. On the other hand, the estimation tool of [12] relied only on logs originating from GridFTP. However, the authors of [14] explained why GridFTP logs alone are not adequate for RsT estimation and instead constructed a regression method to estimate the time to move the best replica to the required location using grid services: disk I/O, NWS and GridFTP. Moreover, the researchers of [4,7] incorporated SAL into RsT, using past storage latency history and data transfer time to forecast future SAL; however, such forecasts cannot be very precise because grid resources such as storage are dynamic and fluctuate or are upgraded over time.
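To illustrate the kind of history-based estimation described above, the following Python sketch fits a straight line, transfer time = a + b · size, to past (size, time) log entries by ordinary least squares. It is a deliberately simple stand-in, not the regression model of [14], which also draws on disk I/O, NWS and GridFTP measurements; the function names and the single-predictor model are assumptions for illustration.

```python
from typing import List, Tuple


def fit_transfer_model(history: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit t = a + b * size by ordinary least squares (illustrative model only).

    history: (file_size_MB, observed_transfer_time_s) pairs taken from past logs.
    Returns (a, b): a fixed overhead in seconds and a per-MB cost in s/MB.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two log entries")
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    if sxx == 0:
        raise ValueError("log entries must cover at least two distinct sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_transfer_time(model: Tuple[float, float], size_mb: float) -> float:
    """Estimate the transfer time of a replica of the given size from the fitted model."""
    a, b = model
    return a + b * size_mb
```

With hypothetical logs of 12 s for 100 MB, 22 s for 200 MB and 42 s for 400 MB, the fit gives a 2 s overhead and 0.1 s/MB, so a 1000 MB replica is predicted to take about 102 s. As the text notes, such history-based estimates degrade when resource conditions change.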
For instance, the best replica place chosen on a given storage will not remain the best place after a period of time, due to utilization by the same or other grid users. Hence, techniques that rely on historic information are more applicable in a steady grid environment. On the other hand, the authors of [8] take into account two new parameters. First, storage media specifications, which differ in speed from one type to another or from brand to brand [15]; the speed is measured as an I/O data transfer rate. It is well known that tape drives are slower than hard disks, and that hard disks and tape drives come in different types and speeds. Second, orders in the queue: most mass storage devices cannot serve more than one order simultaneously, so incoming orders have to wait in queues before being served. In a grid environment there may be thousands of orders for a given storage device, so the orders are queued in a storage handler queue, which affects the RsT [16].
Furthermore, some strategies [17,18] use parallel replica movement to reduce RsT, transferring the needed replica concurrently from all the grid sites that hold it. In these techniques the needed replica is divided into parts, and each part is moved from an available data grid site. The researchers of [18] introduced a novel replica transfer technique called rFTP that fetches replica fragments simultaneously, while the authors of [17] introduced three fetching approaches: a matching-with-prediction approach, a greedy approach and a uniform approach. In the matching-with-prediction approach, every replica site is responsible for a non-fixed quota of fragments that is consistent with its past performance saved in the log files. In the greedy approach, the needed data file is divided into fragments and each data grid site is assigned one fragment, while in the uniform approach the needed replica is divided into equal, fixed-size fragments according to the number of available data grid sites. However, concurrent approaches are suitable when there are few replica orders and many replica sites, whereas the opposite usually occurs: a huge number of orders and a limited number of replica locations.
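As a rough illustration of how the uniform and matching-with-prediction approaches of [17] could divide a replica among sites, the sketch below assigns byte counts to sites. The function names are hypothetical, and giving each site a share proportional to its logged past throughput is an assumed reading of "consistent with its past performance", not the exact rule of [17].

```python
from typing import Dict, List


def uniform_split(file_size: int, sites: List[str]) -> Dict[str, int]:
    """Uniform approach: equal fixed-size fragments, one per available site.

    Remainder bytes go to the first sites so that the shares sum to file_size.
    """
    if not sites:
        raise ValueError("no replica sites available")
    base, extra = divmod(file_size, len(sites))
    return {site: base + (1 if i < extra else 0) for i, site in enumerate(sites)}


def prediction_split(file_size: int, past_throughput: Dict[str, float]) -> Dict[str, int]:
    """Assumed prediction-style split: shares proportional to logged past throughput."""
    total = sum(past_throughput.values())
    if total <= 0:
        raise ValueError("past throughput must be positive")
    shares = {site: int(file_size * t / total) for site, t in past_throughput.items()}
    # Truncation can leave a few bytes unassigned; give them to the fastest site.
    fastest = max(past_throughput, key=past_throughput.get)
    shares[fastest] += file_size - sum(shares.values())
    return shares
```

For example, a 100-byte file split between a site with a logged throughput of 3 and one with 1 yields shares of 75 and 25 bytes, while the uniform split of 10 bytes over three sites yields 4, 3 and 3.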
Finally, as mentioned above, several studies integrated a number of QoS parameters apart from RsT, such as reliability, security and availability, into the selection process [1,19-24]. In addition to these QoS parameters, some works also added user preferences to guide the selection process [6,25]. Moreover, a recent work adopted group decision making, considering multiple users' preferences simultaneously before assigning users to replica grid sites [6].

SYSTEM DESIGN
The structure of data grids is built in two levels, lower and upper. The lower level provides fundamental services, and the upper level consists of high-level services that utilize the fundamental services of the lower level. The new proposed algorithm is among the high-level services that utilize several fundamental services. BFA obtains the data grid user's order from the Resource Broker (RB) and queries the Replica Location Service (RLS) for the associated physical replica names and their places. BFA receives the status of the data grid nodes and of the network from GRIS [13], such as GridFTP, the Monitoring and Discovery Service (MDS) and NWS. Consequently, the best-fit place is chosen for the grid user's task. In this research, the best-fit replica place is the place whose specifications allow it to send the file to the grid user at a speed the user can absorb without delay or congestion at the user site. Figure 1 presents an overview of BFA and its related entities. BFA is therefore a high-level, dynamic optimization service: because grid resources are dynamic, the best-fit replica place for a certain data grid user may not be the best-fit place for other grid users, or for the same user as time passes. To select the best-fit replica place, the algorithm behaves as follows:
1. Obtain the tasks from the RB.
2. Collect the places of the replicas from the RLS.
3. Collect historical logs from the system files.
4. Collect the instant values of QoS parameters, such as the bandwidth, from information service providers like GridFTP, MDS and NWS.
5. Evaluate each replica place based on its RsT and assign it a rate value.
6. Evaluate the requesting user's QoS parameters and assign the user an RsT value.
7. Sort the replica places by their RsT values in ascending order.
8. Select as the best-fit replica place for the underlying user the first place whose RsT is equal to or greater than the user's RsT value.
9. Record the latest information concerning the data transmission speed in the historic logs.
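The selection in steps 7 and 8 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes that the user's RsT value from step 6 is the time the user's own network and hardware need to absorb the replica, so a place with a lower RsT would deliver faster than the user can absorb. The ReplicaPlace type, the best_fit function and the fallback used when no place qualifies are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaPlace:
    name: str
    rst: float  # estimated response time (s) to deliver the replica from this place


def best_fit(places: List[ReplicaPlace], user_rst: float) -> Optional[ReplicaPlace]:
    """Steps 7-8: sort places by RsT (ascending) and return the first one whose
    RsT is equal to or greater than the user's RsT value, i.e. the fastest
    place whose delivery speed the user can still absorb."""
    if not places:
        return None
    ranked = sorted(places, key=lambda p: p.rst)
    for place in ranked:
        if place.rst >= user_rst:
            return place
    # Assumption (not specified in the text): if every place is faster than the
    # user can absorb, fall back to the slowest one, the closest match to user_rst.
    return ranked[-1]
```

For places with RsT values of 10 s, 20 s and 30 s, a user whose RsT value is 15 s is matched to the 20 s place, leaving the 10 s place free for a more capable user, while a user whose RsT value is 5 s still receives the fastest place.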
RsT in this study is the sum of three parameters: transfer time (TT), storage access latency (SAL) and the waiting time in the queue (WTQ). It is calculated by the following equation:

$$\mathit{RsT} = \mathit{TT} + \mathit{SAL} + \mathit{WTQ} \tag{1}$$

TT denotes the time to move the replica through the network; it depends on the network bandwidth and the file size [16] and is obtained by the formula:

$$\mathit{TT} = \frac{S}{BW} \tag{2}$$

where S is the replica size and BW is the network bandwidth between the replica place and the user.

An important role of the operating system is to schedule I/O requests in a manner that enhances system performance [26]. Scheduling considers the requests queued for the storage device; therefore, the number of requests in the queue and the speed of the storage device have a significant influence on the average RsT.

SAL is the time delay required by the storage device to reply to an order; it depends on the file size and the storage speed, so larger replicas yield higher SAL, which can be calculated by the following formula:

$$\mathit{SAL} = \frac{S}{SS} \tag{3}$$

where SS is the storage speed (I/O data transfer rate). Typically, orders arrive at a storage device that can serve only one order at a time, so several orders queue up. The current order therefore has to wait for all preceding queued orders. Serving each queued order takes that order's SAL, so the current order waits for the sum of the SALs of the orders preceding it in the queue. Accordingly, WTQ is calculated by the following formula:

$$\mathit{WTQ} = \sum_{i=1}^{n} \mathit{SAL}_i \tag{4}$$

where n is the number of orders waiting in the queue preceding the current order and SAL_i is the storage access latency of the i-th of these orders.
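The following Python sketch combines the response time components. It assumes TT = size / bandwidth and SAL = size / storage speed, consistent with the dependencies stated in the text, and takes WTQ as the sum of the SALs of the orders queued ahead of the current one; the function names and the MB and MB/s units are illustrative assumptions.

```python
from typing import List


def transfer_time(size_mb: float, bandwidth_mb_s: float) -> float:
    # Assumed form: TT = size / network bandwidth
    return size_mb / bandwidth_mb_s


def storage_access_latency(size_mb: float, storage_speed_mb_s: float) -> float:
    # Assumed form: SAL = size / storage speed (I/O data transfer rate)
    return size_mb / storage_speed_mb_s


def waiting_time_in_queue(queued_sizes_mb: List[float], storage_speed_mb_s: float) -> float:
    # WTQ = sum of the SALs of the n orders queued ahead of the current order
    return sum(storage_access_latency(s, storage_speed_mb_s) for s in queued_sizes_mb)


def response_time(size_mb: float, bandwidth_mb_s: float,
                  storage_speed_mb_s: float, queued_sizes_mb: List[float]) -> float:
    # RsT = TT + SAL + WTQ
    return (transfer_time(size_mb, bandwidth_mb_s)
            + storage_access_latency(size_mb, storage_speed_mb_s)
            + waiting_time_in_queue(queued_sizes_mb, storage_speed_mb_s))
```

For example, a 1000 MB replica sent over a 100 MB/s link from a 200 MB/s storage device, with orders of 400 MB and 600 MB queued ahead of it, gives TT = 10 s, SAL = 5 s and WTQ = 2 s + 3 s = 5 s, so RsT = 20 s.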

PERFORMANCE EVALUATION
A collection of simulation tools is available to support data grid structures [27], for example MicroGrid, ChicSim, Monarc, OptorSim, SimGrid and Bricks. After carrying out a comprehensive search of simulation tools for parallel and distributed algorithms, we concluded that OptorSim is the most suitable simulator for the proposed algorithm, as it mainly focuses on replica selection tactics and data replication techniques [3,28]. Accordingly, this research adopted OptorSim, making small amendments to make it a better fit for the proposed algorithm.

Simulation Setup
OptorSim was built to assess the performance of replica selection methods, unlike other simulators that target task scheduling algorithms. OptorSim consists of several elements to achieve a realistic grid environment: Computing Elements (CEs), to which tasks are directed; Storage Elements (SEs), where data is stored; and network elements that link the data grid nodes. As in real grids, the bandwidth between the grid nodes is modelled in the simulation. The other elements are the Resource Broker (RB), which assigns tasks to grid nodes based on the scheduling technique, and the Replication Manager (RM), which handles the replication optimization techniques. To be consistent with real grids, OptorSim imitates the real EU DataGrid configuration and topology. The topology includes 20 grid nodes in Europe and the USA that were used during the data production of the CMS experiment [28]. The FNAL and CERN sites produce the large data sets and store them locally, each with a storage capacity of 100 GB, while the other grid nodes each have at least one CE and a storage capacity of 50 GB.

Performance metrics
Grid users submit their tasks to the RB, which searches for the best grid location to execute each task. These tasks usually need data files, so the optimizer's role is to find the best place from which to obtain the needed files. In addition, each task has to wait in the queue and requires some time to be accomplished.
A task's duration therefore begins when the RB sends the task and ends once the task is accomplished. This duration is known as the task turnaround time, and it includes the RsT. Best-fit replica selection with the proposed algorithm decreases the RsT and accordingly decreases the task turnaround time. For that reason, the Mean Task Turnaround Time (MTTT) is an appropriate metric to assess the performance of the proposed algorithm. It is calculated by the following formula:

$$\mathit{MTTT} = \frac{1}{N}\sum_{j=1}^{N} T_j \tag{5}$$

where N is the number of tasks submitted to the grid and T_j is the turnaround time of task j.
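A minimal Python sketch of the MTTT metric, assuming each task is recorded as a pair of the time the RB sent it and the time it completed; the function name and the record format are hypothetical.

```python
from typing import List, Tuple


def mean_task_turnaround_time(tasks: List[Tuple[float, float]]) -> float:
    """Average turnaround time over all submitted tasks.

    tasks: (sent_time, completion_time) per task, in seconds. Turnaround starts
    when the RB sends the task and ends when the task is accomplished.
    """
    if not tasks:
        raise ValueError("no tasks submitted")
    return sum(done - sent for sent, done in tasks) / len(tasks)
```

For three tasks with turnaround times of 10 s, 20 s and 30 s, the MTTT is 20 s.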

RESULTS AND DISCUSSION
MTTT is the metric used to evaluate and compare the new algorithm; it is the average of the times needed to complete all the tasks submitted to the grid. Because the replica sizes and the number of tasks (workload) affect the replica transfer time, the performance of the new algorithm was evaluated in nine different cases, varying the replica size and the number of tasks in each.
In the first case, small replicas with sizes ranging from 100 to 1000 MB were used with a workload of 600 tasks; the second case used a workload of 1200 tasks and the third a workload of 1800 tasks, as shown in Table 1. Cases four to six are similar to cases one to three but use medium replicas with sizes ranging from 1 to 10 GB, as shown in Table 2. Cases seven to nine are similar to the previous ones but use large replicas with sizes ranging from 10 to 100 GB, as shown in Table 3. Each case was run ten times, and each time the replica sizes and the sites' QoS were varied randomly. The experiments were carried out using the proposed algorithm (BFA) and the previous algorithm (PA), which chooses the replica place with the lowest transfer time and is already included in OptorSim [2,18]. Neither algorithm performs replication or caching; both read the chosen replicas remotely. The simulation results show that the MTTT experienced with the proposed algorithm is lower than that experienced with the previous algorithm in all cases, as presented in Tables 1 to 3 and Figure 2. Additionally, the efficiency of the proposed algorithm relative to the previous one is calculated using the following formula:

$$\mathit{Efficiency} = \frac{\mathit{MTTT}_{PA} - \mathit{MTTT}_{BFA}}{\mathit{MTTT}_{PA}} \times 100\% \tag{6}$$

Table 1 demonstrates that BFA performs better than PA in all the experiments. With 600 tasks, the efficiency of BFA is 15.6% at its best and 14.3% at its worst; with 1200 tasks it ranges between 13.9% and 15.6%; and with 1800 tasks it ranges between 12.8% and 15.3%. The efficiencies differ only slightly across experiments, which indicates the robustness of the proposed algorithm. BFA performs better in all scenarios: because the response time is reduced, the task turnaround time is reduced accordingly. Furthermore, the proposed algorithm scales from hundreds to thousands of tasks (600 to 1800 tasks in these experiments).
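The efficiency figures can be computed from two MTTT values with the following sketch, which assumes efficiency is the relative reduction in MTTT achieved by BFA with respect to PA, expressed in percent. The function name and the example values (200 s and 170 s) are hypothetical, not results from the experiments.

```python
def efficiency_percent(mttt_pa: float, mttt_bfa: float) -> float:
    """Relative MTTT reduction of BFA over PA, in percent (assumed definition)."""
    if mttt_pa <= 0:
        raise ValueError("MTTT of the previous algorithm must be positive")
    return 100.0 * (mttt_pa - mttt_bfa) / mttt_pa
```

With an MTTT of 200 s for PA and 170 s for BFA, the efficiency is 15%, the same order as the values reported for Table 1.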
Tables 2 and 3 show results similar to those in Table 1, which indicates scalability in terms of replica size as well as number of tasks; BFA outperformed PA in every case tested.

CONCLUSION
Previous replica selection algorithms addressed response time as the only variable when choosing the best replica place for the running task, whereas the proposed best-fit algorithm also matches the capabilities of the grid user with those of the replica places. Simulation results showed that the proposed algorithm surpasses the previous algorithm and delivers the replicas to grid tasks in less time, with a significant performance gain: a 12.8% to 16% decrease in the time needed to complete all the tasks submitted to the grid. Additionally, we found that the best performance (≈16% decrease in turnaround time) was exhibited with the highest number of tasks. This in turn decreases the turnaround time for all jobs, improving the whole data grid environment. The proposed algorithm can be incorporated into real grid middleware such as Globus. Our future work will focus on implementing the proposed algorithm in cloud computing.