REDUCING TRANSFER COSTS OF FRAGMENTS ALLOCATION IN REPLICATED DISTRIBUTED DATABASE USING GENETIC ALGORITHMS

Distributed databases were developed to meet the needs of distributed computing. Unlike a traditional database system, a distributed database system is a set of nodes connected to each other by a network; each node has its own database, which is also available to the other nodes. Thus, each node can access all data on the entire network. The main objective of allocation algorithms is to assign fragments to nodes so as to reduce the shipping cost. To this end, first, the fragments must be accessible to all nodes in each period; second, the cost of transmitting fragments to nodes must be reduced; and third, the cost of updating the fragments held by the nodes must be optimized, which results in increased reliability and availability of the network. In this study, a more efficient hybrid algorithm is produced by combining genetic algorithms with previous algorithms.


INTRODUCTION
Advances in networking and database technology in recent decades have led to the development of distributed database systems. Data allocation is used in a distributed database in order to achieve two objectives. The first objective is to minimize the total transmission cost of processing, and the second is to unify the implementation strategy. The primary concerns of distributed database systems are the fragmentation of the main database and the allocation of its fragments. The unit of data fragmentation can be a file; in this case, the allocation problem is the file allocation problem, which is NP-hard and therefore requires fast heuristics in order to produce effective solutions. In addition, the optimal allocation of database fragments depends strongly on the query execution strategy implemented by the distributed database. The fragment allocation problem has been addressed in many ways, for both replicated and non-replicated distributed databases; in this article, we discuss this approach combined with a genetic algorithm.

RELATED WORKS
Fragment allocation solutions can be divided into two categories, static and dynamic. Articles related to the static method are briefly examined here, and their advantages and disadvantages are discussed.

STATIC ALLOCATION ALGORITHM
In 2002, Quang Cook & Goode Berg et al. presented a genetic algorithm by which fragments can be distributed among sites so that the transmission cost is reduced. This work evaluates the allocation of fragments against two basic parameters: transmission cost reduction and update cost.

Transmission cost reduction
A node that requests a fragment must send its request to a node that holds the fragment, chosen so that the shipping cost does not increase.

Update cost
Since a fragment is provided to several sites in each period, every write operation on a fragment makes it necessary to update its copies, and this must be done automatically by the system. Meybodi et al. [2010] presented a genetic algorithm that, like the previous method, uses the reduction of transmission cost and the update cost as parameters of the fitness function; a further parameter, based on machine learning, distinguishes their system from other systems.
We now discuss combining the genetic algorithm with the allocation algorithm in a distributed database. All sites form a set F = {S1, S2, ..., Sn}. Each distributed database is described by an array ArrSizeNode[], and each Si is determined by its capacity, which is the sum of the sizes of all its fragments: Si = Fragment 1 + Fragment 2 + ... + Fragment n.

REQUIREMENT MATRIX
Each fragment may be required by at least one site in the near future. The need of each site for each fragment is recorded in a matrix called the requirement matrix, where Rij represents the need of site i for fragment j. A site that does not have a fragment in its local database must enter its demand in the requirement matrix so that the distributed system becomes aware of it. In general, a requirement can be expressed as a real-valued weight, but another way is to use a binary value. For example, if node 5 demands fragment 25, the entry in row 5 and column 25 of the requirement matrix changes from 0 to 1 (Figure 1).
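The binary form of the requirement matrix can be sketched in a few lines of Python. This is only an illustration: the matrix dimensions and the helper names make_requirement_matrix and register_demand are assumptions, not taken from the paper.

```python
# Hypothetical sizes: the paper does not state how many sites or fragments exist.
NUM_SITES = 10
NUM_FRAGMENTS = 30


def make_requirement_matrix(num_sites, num_fragments):
    """Return a num_sites x num_fragments matrix of zeros (no demands yet)."""
    return [[0] * num_fragments for _ in range(num_sites)]


def register_demand(matrix, site, fragment):
    """Record that `site` needs `fragment`; both are 1-based, as in the text."""
    matrix[site - 1][fragment - 1] = 1


requirement = make_requirement_matrix(NUM_SITES, NUM_FRAGMENTS)
register_demand(requirement, site=5, fragment=25)  # node 5 demands fragment 25
```

A weighted variant would store a real number instead of 1 in the same cell.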

Transmission Cost Matrix
This matrix contains the cost of transmitting a fragment from one node to each of the other nodes. A random number generation function is used to determine the weights; it is called as shown below, and the random value will be at least 50 and less than 100 (assuming the .NET convention, in which the lower bound is inclusive and the upper bound is exclusive).
Rand.Next (50, 100);
It must be noted that, given the vast connectivity of the World Wide Web, each node can communicate with any other node that follows the protocols of the distributed system. The cost of transmission between two nodes does not depend on the direction, so the transmission cost matrix can be reduced: instead of a full matrix whose numbers of rows and columns equal the number of nodes, it can be stored as an upper triangular or lower triangular matrix.
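The triangular storage can be illustrated with a minimal Python sketch. The function names, the node count of 6, the fixed seed, and the zero cost of a node reaching itself are all assumptions made for the example; randrange(50, 100) is used because it draws integers from 50 up to 99, which mirrors the call quoted above.

```python
import random


def make_cost_matrix(num_nodes, rng):
    """Upper-triangular storage: row i holds the costs from node i to nodes i+1..n-1."""
    return [[rng.randrange(50, 100) for _ in range(i + 1, num_nodes)]
            for i in range(num_nodes)]


def transmission_cost(costs, a, b):
    """Symmetric lookup with 0-based node indices."""
    if a == b:
        return 0  # assumption: a node reaches its own data at no cost
    lo, hi = min(a, b), max(a, b)
    return costs[lo][hi - lo - 1]


costs = make_cost_matrix(6, random.Random(1))  # 6 nodes and the seed are illustrative
```

With n nodes, this layout stores n(n-1)/2 weights instead of n*n, and transmission_cost(costs, a, b) returns the same value as transmission_cost(costs, b, a).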
As shown in Figure 2, the transmission cost of transferring a fragment from node 1 to node 2 is 67. According to the definitions given in the previous sections, the evaluation formula for allocating fragments is formed of three relationships:
2) The transmission cost will be optimal.
3) The node that performs the transmission will be the one with the lowest cost of transmission and update.

Chromosomes view in genetic algorithms
1) The function of initial population
In this function, the number of rows is equal to the number of chromosomes, the number of columns is equal to the number of fragments, and the number of genes within each chromosome is equal to the number of nodes that use these fragments. The initial population of each generation is set to 50 (Figure 4).
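One possible encoding of this population is sketched below in Python. It assumes that a chromosome is a list with one entry per fragment and that each entry is the non-empty set of nodes holding a replica of that fragment; this encoding, the function names, and the problem sizes are illustrative assumptions, and only the population size of 50 comes from the text.

```python
import random

POPULATION_SIZE = 50  # value given in the text


def random_chromosome(num_fragments, num_nodes, rng):
    """One entry per fragment: a non-empty set of the nodes that hold it."""
    chromosome = []
    for _ in range(num_fragments):
        replica_count = rng.randint(1, num_nodes)  # at least one holder per fragment
        chromosome.append(set(rng.sample(range(num_nodes), replica_count)))
    return chromosome


def initial_population(num_fragments, num_nodes, rng, size=POPULATION_SIZE):
    """Build `size` random chromosomes (the rows of the population)."""
    return [random_chromosome(num_fragments, num_nodes, rng) for _ in range(size)]


# 10 fragments, 6 nodes and the seed are hypothetical values for the example.
population = initial_population(num_fragments=10, num_nodes=6, rng=random.Random(0))
```

Requiring at least one holder per fragment keeps every fragment available from the first generation onward.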

2) Combinational function
In the combinational function, following the conventional methods, a two-parent, one-point crossover is used for this algorithm, and the combination (crossover) rate is set to 0.7 (Figure 5).
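A two-parent, one-point crossover with this rate might look as follows in Python. The sketch assumes that a chromosome is a list with one set of holder nodes per fragment; that encoding and the function name are assumptions for illustration.

```python
import random

CROSSOVER_RATE = 0.7  # value given in the text


def one_point_crossover(parent_a, parent_b, rng, rate=CROSSOVER_RATE):
    """Return two children; with probability 1 - rate they are copies of the parents."""
    child_a = [set(gene) for gene in parent_a]
    child_b = [set(gene) for gene in parent_b]
    if len(parent_a) > 1 and rng.random() < rate:
        # Pick one cut point and swap everything after it between the children.
        cut = rng.randint(1, len(parent_a) - 1)
        child_a[cut:], child_b[cut:] = child_b[cut:], child_a[cut:]
    return child_a, child_b
```

Because whole per-fragment entries are exchanged, a child never ends up with a fragment that has no holder, provided both parents were valid.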

3) The mutation function
The mutation is a one-parent, one-point method. A random number, either 0 or 1, is produced using the generation function. If this number is 1, a node is added to the nodes holding the fragment. One thing must not be forgotten: it must be checked whether the node that is going to own the fragment already holds it; if it does, a new node is chosen in its place, otherwise the node is selected. If the generated random value is 0, the mutation removes a node from the current nodes holding the fragment, and a problem can arise. To solve it, it is first checked whether more than one node holds the fragment. If so, the node is removed; otherwise, removal would eliminate the only copy, the availability of the distributed system would be lost, and the system would fail in the near future. In that case it is better to choose another gene from the chromosome, and this is repeated until the problem is resolved and the desired result is reached. The mutation rate is set to 0.3. Figure 7 shows the removal of node 5 from the list of nodes holding fragment 9.
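The add-or-remove rule can be sketched in Python as follows. The sketch assumes that a chromosome is a list with one set of holder nodes per fragment, and it interprets "choose another gene" as trying the fragments in random order until one can be changed safely; both points, and the function name, are assumptions rather than the authors' implementation.

```python
import random

MUTATION_RATE = 0.3  # value given in the text


def mutate(chromosome, num_nodes, rng, rate=MUTATION_RATE):
    """Return a mutated copy: one holder is added to, or removed from, one fragment."""
    mutant = [set(gene) for gene in chromosome]
    if rng.random() >= rate:
        return mutant  # no mutation this time
    add_node = rng.randint(0, 1) == 1  # 1 means add a holder, 0 means remove one
    # Try the fragments in random order until one can be changed safely.
    for index in rng.sample(range(len(mutant)), len(mutant)):
        holders = mutant[index]
        if add_node:
            # Only nodes that do not already hold the fragment are candidates.
            candidates = [n for n in range(num_nodes) if n not in holders]
            if candidates:
                holders.add(rng.choice(candidates))
                return mutant
        elif len(holders) > 1:
            # Never remove the last copy, or the fragment becomes unavailable.
            holders.remove(rng.choice(sorted(holders)))
            return mutant
    return mutant  # no fragment could be changed; the copy is returned as is
```

If every fragment has a single holder and removal is drawn, the chromosome is returned unchanged, which preserves availability at the price of a skipped mutation.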

Fitness function
The fitness function keeps the parameters established and tested in the previous method, namely transmission cost and update cost, and adds two further parameters in order to improve the selection of the optimal chromosome. The first concerns availability. If a node holding the desired fragment fails for any reason, its fragments can be restored only from the copies provided to other sites. The hardware components of a distributed system are not located together, so they cannot be readily repaired; if no other copy exists, the fragment and the site are out of control, the availability of the distributed system is in crisis, and the whole system may fail. To address this problem, we use a counter that counts the number of genes in a chromosome that face this problem.
The second idea discussed in this paper concerns dependent fragments. When the information is fragmented by a system, it is better to give the same number to fragments that are interdependent. Then, when fragments are assigned, a node that demands one of these fragments is asked whether the dependent fragments should be sent along with it. If the node accepts, the fragment is sent to the node together with its related parts. This allows all the fragments to be sent to the destination in one package and reduces the transmission cost, since a transaction on the fragment may need the related fragments, and packaging avoids the cost of a second transfer. In other words, the node that demands a fragment obtains the fragments in one package instead of searching the nodes twice and paying the cost twice.
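The paper does not give the fitness formula, so the following Python sketch only illustrates how the transmission cost, the update cost, and the availability counter could be computed for one chromosome. Everything in it is an assumption: the encoding (one set of holder nodes per fragment), the rule that a requesting site fetches from its cheapest holder, the rule that an update is propagated from the lowest-numbered holder to all other holders, and the example data.

```python
def fitness(chromosome, requirement, cost):
    """Return (transmission, update, fragile); lower is better for all three.

    chromosome[j]     : set of nodes holding fragment j
    requirement[i][j] : 1 if site i needs fragment j, else 0
    cost[a][b]        : symmetric cost of sending a fragment between nodes a and b
    """
    transmission = 0
    update = 0
    fragile = 0
    for j, holders in enumerate(chromosome):
        # Each site that needs fragment j fetches it from its cheapest holder.
        for i, row in enumerate(requirement):
            if row[j] and i not in holders:
                transmission += min(cost[i][h] for h in holders)
        # Assumed update model: one holder propagates a write to all the others.
        source = min(holders)
        update += sum(cost[source][h] for h in holders if h != source)
        # Counter for genes at risk: the fragment exists on a single node only.
        if len(holders) == 1:
            fragile += 1
    return transmission, update, fragile


# Hypothetical 3-node example; the weight 67 between nodes 0 and 1 echoes Figure 2.
cost = [[0, 67, 80], [67, 0, 55], [80, 55, 0]]
requirement = [[0, 1], [0, 0], [1, 0]]
chromosome = [{0, 1}, {2}]
result = fitness(chromosome, requirement, cost)
```

How the three values are weighted into a single score is left open here, because the source does not specify it.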

Selection function
The best chromosome is chosen from the population of chromosomes by tournament. The tournament evaluates the chromosomes of each generation according to the main parameters, such as the minimum transmission cost, the update cost, and the number of fragments available at a node; the chromosome that can perform the allocation at the lowest cost is selected as the optimal chromosome.
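Tournament selection can be sketched in Python as shown below. The tournament size of 3 and the score callable are assumptions, since the paper does not state them; score stands for any function that maps a chromosome to a cost in which lower is better.

```python
import random


def tournament_select(population, score, rng, tournament_size=3):
    """Draw `tournament_size` distinct chromosomes and return the cheapest one."""
    contenders = rng.sample(population, tournament_size)
    return min(contenders, key=score)


# Hypothetical usage: the score here is simply the total number of replicas.
population = [[{0}], [{0, 1}], [{0, 1, 2}]]
winner = tournament_select(population,
                           score=lambda c: sum(len(g) for g in c),
                           rng=random.Random(0))
```

A larger tournament size increases the selection pressure toward the cheapest chromosomes, while a smaller one preserves more diversity.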

SIMULATION RESULTS
The tests for the two proposed measurement factors are shown separately, as run on the simulation software; finally, with both parameters applied, the transmission cost is evaluated for the previous method and the proposed method.
Applying the first measurement parameter: the number of fragments that belong to a node. As the results show, in the proposed GA-F algorithm the number of fragments available to a node declines, and this directly affects availability and reliability (Figure 8).

Applying the second parameter of measurement
The number of fragments provided to the node with a given ID number (Figs. 9, 10 and 11).

Recommendations and future works
This paper describes different methods of reducing the transmission cost when assigning fragments of a replicated distributed database using a genetic algorithm. In addition, our proposed method could affect the optimality of the GA. For future work, we suggest research on the following:
• combining cellular automata with the genetic algorithm in order to increase efficiency,
• examining non-randomized and intelligent methods for generating the initial population of the GA,
• using a clustering method for this system and combining it with the GA.

CONCLUSION
In this paper, we examined the reduction of transmission cost when allocating fragments of a replicated distributed database using a genetic algorithm. Building on previous methods that implemented this algorithm, we tried to take the advantages of each method and to add more effective measurement parameters so that the generated output becomes more effective. We also changed some settings of the genetic algorithm that had been fixed as assumptions. The output of the proposed method shows that, if the allocation of fragments is done reasonably in the basic steps, it can be very effective in reducing the cost of transmission. We must note that the cost of doing this must not be too high.