Data comparison is a difficult and resource-intensive process.
For convenience, this process can be divided into several
steps.
First, you should compare tables from one database on one server
with the database on the other server. You should choose columns
for data comparison, and also choose a column that will be a
comparison key.
The next step is to choose all data from these tables or some
specified part of the data.
The third and the most important step is comparison of the two
tables by the selected comparison key itself. During this process
the status of each record is set to “only in source”, “only in
target”, “different”, or “equal”.
The final steps of the data comparison process are including
records to the synchronization and synchronization itself. During
these steps records needed for synchronization are chosen, update
script is created, and after that the script is executed.
You can read a detailed description of the comparison process
here.
Now let’s look at the third step (data comparison) thoroughly.
There are several ways of data comparison that differ only by the side where data comparison is going to be performed – on the server side or on the client PC.
Data comparison on the server side is performed using the
resources of the server.
The algorithm of comparison is the following:
1. For each record of each of the two tables its checksum is
calculated;
2. Then the checksum of every record from one table is compared
to the checksum of the corresponding record from another table
and conclusion if the records are equal or different is
made;
3. The comparison result is stored in a temporary table on the
server.
Performance indicators:
1. The speed of data comparison directly depends on the server
capacity and occupancy;
2. The maximal size of database for comparison is limited by the
resources of the server itself.
Advantages:
1. There is no need to transfer large amounts of data for
comparison to the client PC through network. This way we save
network traffic;
2. The speed of comparison does not depend on the client PC
resources;
3. Ability to compare blob data of any size.
Disadvantages:
1. Because of the record checksum calculation algorithm in some
cases different data can result in equal checksum, and instead of
the expected “different” status the “equal” status will be
received;
2. There is no flexibility in the synchronization and comparison
options usage;
3. There is no possibility to view records differences and
exclude a part of the records from the synchronization
manually;
4. During the synchronization script creation you should perform
data transfer from the server to the client side;
5. The control checksum calculation of a large amount of records
consumes all server resources;
6. One should provide extra space on the server for the
comparison results storage in the temporary table.
As we can see, this way of comparison has more disadvantages than advantages, that’s why this way is rarely used.
Data comparison on the client PC is performed using the client machine resources, and the server only provides data for comparison. In turn, this way of comparison can be divided into several more ways depending on the way how comparison information will be stored.
Comparing Data on local PC when comparison result is
stored in RAM.
The comparison algorithm is the following:
1. Server passes all data from both tables to the local PC;
2. Every record of every table is placed to RAM and is compared
without checksum calculation;
3. If a record gets “only in source”, “only in target” or “equal”
status, only comparison key is stored in RAM. If records get
“different” status, they are placed to RAM for storage
completely.
Performance indicators:
1. The speed of data comparison directly depends on the client PC
resources and on the speed of data transfer through the
network;
2. Maximum size of the database for comparison is limited by the
size of RAM on the client PC, and this maximum size also depends
on the degree to which the databases that should be compared are
different – the smaller is the amount of different records, the
larger databases can be compared.
Advantages:
1. Minimal server occupancy – server performs only simple data
selection;
2. The simplest algorithm of data comparison because records are
sorted on the client side;
3. Flexibility in the comparison options usage;
4. Minimal size of the comparison data store;
5. Status of every record for any data is always correct.
Disadvantages:
1. To view records with “only in source”, “only in target”, or
“equal” status an extra data selection is needed;
2. An extra data selection is needed to create a synchronization
script;
3. OutOfMemory Exception may be arisen when there are a lot of
differences in data in databases;
4. Possibility to compare blob data only of the size that equals
to the size of free RAM.
This way of comparison is implemented in dbForge Data Compare for SQL Server v1.10,
dbForge Data Compare for MySQL v2.00 and allows to compare
databases of any size if data in these databases does not differ
a lot.
Comparing Data on local PC when comparison result is
stored as a cashed file on the disk.
The algorithm of comparison is the following:
The server passes all data from both tables sorted by comparison
key to a local PC. Data is read by bytes, compared without
checksum calculation and written to a file on the disk.
Performance indicators:
1. The speed of data comparison directly depends on the client PC
resources and on the speed of data transferring through the
network;
2. The maximum size of a database to compare is limited by free
disk space and does not depend on the degree of data difference
in databases.
Advantages:
1. Medium server occupancy – server performs data sorting and
selection;
2. To view records and synchronization script creation extra
requests to the server are not necessary;
3. The status for every record is always correct for any
data;
4. Possibility to compare blob data of the size equal to the size
of free space available on the disk.
Disadvantages:
1. Difficult algorithm of data comparison for the records
comparison key of which is of the string data type;
2. Difficult algorithm of disk cash for temporary information
storage creation.
We can see that in this case the only disadvantage of this way of
comparison is the difficulty of implementation. There are more
advantages than in the ways of comparison listed above. That’s
why this way of comparison will be used in the new version of
dbForge Data Compare for SQL Server v2.00
and
dbForge Data Compare for MySQL v3.00 for data comparison.