How to Reduce Network Transfer between Storage & SQL Engine
A distributed database handles large data sets across remote network nodes, so reducing the amount of data sent over the network is an essential concern.
This matters most when the database is used for OLTP.
A distributed database is built around a storage engine architecture. The architecture of Alinous Elastic DB is as follows.
The storage engine is accessed by the SQL table access coordinator, which is called the Region Manager in this database. The Region Manager requests a table scan from the storage engine, and the storage engine returns the result.
This exchange produces network traffic. SELECT, UPDATE, and DELETE statements all cause it.
If the result data is very large, it puts a heavy burden on the network. Therefore the result has to be made as small as possible.
Alinous Elastic DB has a distributed algorithm to reduce the amount of result data. It works in the following way.
Before executing SQL, the Transaction Engine calculates an execution plan. In the SQL optimization phase, it makes a plan to scan each table.
For each scan it determines the following:
- Which index key to use (or a full scan)
- The essential additional conditions to apply to the scanned result
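The planning step above can be sketched roughly as follows. This is only an illustration, not the actual Alinous Elastic DB optimizer: the names `Condition`, `ScanPlan`, and `plan_scan` are hypothetical, and the rule "use the first equality condition on an indexed column" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Condition:
    """One simple predicate, e.g. customer_id = 42 (hypothetical type)."""
    column: str
    op: str          # "=", "<" or ">"
    value: object


@dataclass
class ScanPlan:
    """Per-table result of the optimization phase (hypothetical type)."""
    table: str
    index_key: Optional[str]                 # None means full scan
    index_condition: Optional[Condition]     # condition served by the index
    additional_conditions: List[Condition]   # shipped to the storage engine


def plan_scan(table: str, conditions: List[Condition],
              indexed_columns: Set[str]) -> ScanPlan:
    # Assumption: the first equality condition on an indexed column
    # becomes the index key; all other conditions become additional
    # conditions to be applied to the scanned result.
    for cond in conditions:
        if cond.op == "=" and cond.column in indexed_columns:
            rest = [c for c in conditions if c is not cond]
            return ScanPlan(table, cond.column, cond, rest)
    # No usable index: fall back to a full scan with every condition attached.
    return ScanPlan(table, None, None, list(conditions))


plan = plan_scan(
    "orders",
    [Condition("status", "=", "open"), Condition("customer_id", "=", 42)],
    indexed_columns={"customer_id"},
)
```

In this sketch `plan.index_key` is `"customer_id"`, and the `status` condition is left over as an additional condition.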
The additional conditions are sent to the storage engine together with the scan request, and the storage engine applies them while scanning, so the result it returns is smaller.
This method is effective for SELECT statements that have conditions in JOIN and WHERE clauses.
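A minimal sketch of this idea, assuming conditions are passed as simple `(column, operator, value)` tuples. The `StorageNode` class and its `scan` method are hypothetical names used for illustration, not the real storage engine API.

```python
import operator

# Supported comparison operators (an assumption for this sketch).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def matches(row, conditions):
    """Return True if the row satisfies every (column, op, value) condition."""
    return all(OPS[op](row[col], val) for col, op, val in conditions)


class StorageNode:
    """Stands in for a remote storage engine holding some rows."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, conditions=()):
        # The filter runs here, on the storage side, so only matching
        # rows are returned to the caller over the network.
        return [row for row in self.rows if matches(row, conditions)]


node = StorageNode([
    {"id": 1, "status": "open", "amount": 120},
    {"id": 2, "status": "closed", "amount": 80},
    {"id": 3, "status": "open", "amount": 30},
])

unfiltered = node.scan()
filtered = node.scan([("status", "=", "open"), ("amount", ">", 100)])
```

Without the conditions all three rows would cross the network and be filtered by the caller; with them, only the row with `id` 1 is returned.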
If a table partition key is included in the condition used to filter the scanned result, the Region Manager does not send a scan request to remote storage nodes that cannot hold any matching rows.
That makes network transfer a little smaller. Filtering the result on the storage side has a much larger effect on the network.
But the main reason I want to introduce this technique is that it greatly reduces the CPU cost of the storage engine, more than the network cost.
If the storage node is a replicated cluster, the number of packets transferred over the network decreases further, though only slightly.
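The partition key check described above might look like the following sketch. It assumes hash partitioning and an equality condition on the partition key. The actual partitioning scheme of Alinous Elastic DB may differ, and `node_for_key` and `target_nodes` are hypothetical helper names.

```python
import zlib


def node_for_key(key, node_count):
    """Map a partition key value to a node index.

    Assumption: a stable hash (CRC32) modulo the number of nodes.
    """
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def target_nodes(conditions, partition_key, node_count):
    """Return the indexes of the storage nodes that must be scanned."""
    for col, op, val in conditions:
        if col == partition_key and op == "=":
            # Only one node can hold rows with this key value, so the
            # other nodes receive no scan request and spend no CPU on it.
            return [node_for_key(val, node_count)]
    # No condition on the partition key: every node has to be asked.
    return list(range(node_count))


pruned = target_nodes(
    [("customer_id", "=", 42), ("status", "=", "open")],
    partition_key="customer_id",
    node_count=4,
)
broadcast = target_nodes(
    [("status", "=", "open")],
    partition_key="customer_id",
    node_count=4,
)
```

Here `pruned` contains a single node index, while `broadcast` contains all four. The skipped nodes never run the scan, which is where the CPU saving comes from.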