Friday, September 20, 2019

How To Bulk Import Data Into InnoDB Cluster?

If you need to do bulk importing into InnoDB Cluster, it is certainly possible to do so by using any of:


Unfortunately both imports will add load to instances and to channels interconnecting instances: data imported on the primary instance needs to be replicated to the rest of instances. And the bigger the data to import, the higher the load (and this could end up affecting performance). The import operation could be batched to reduce load, and Group Replication allows at least to throttle workload with flow control or to split messages in several smaller messages with group_replication_communication_max_message_size option.

How to import data into InnoDB Cluster?
But in case data to import is a whole table (MySQL 8 adds also flexibility to swap partitions and tables, may become handy), or data can be first loaded into an InnoDB table, there's simple way to have an arbitrary amount of data pushed to InnoDB Cluster, and it takes advantage of tablespaces copying feature. I made a quick test to import a table.

I created the table t5 on an arbitrary instance and added a few rows. Then exported as in the instructions (does nothing but flush it and create an auxiliary .cnf file for definition validation at import time, not mandatory to use it but recommended):

FLUSH TABLES t5 FOR EXPORT;

On the InnoDB Cluster setup, I created the table t5 with same definition from the primary, then again on the primary:

ALTER TABLE t5 DISCARD TABLESPACE;

This will remove the t5.ibd tablespace on all the 3 instances. And with a simple SELECT, I made sure that this is as expected:

mysql> select * from test.t5;
ERROR 1814 (HY000): Tablespace has been discarded for table 't5'

After that, I copied t5.ibd from the former instance under the related schema folder in *each* GR node.
Let's check initial GTID set:

mysql> select @@GLOBAL.GTID_EXECUTED;
+------------------------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                                     |
+------------------------------------------------------------+
| 550fa9ee-a1f8-4b6d-9bfe-c03c12cd1c72:1-270:1000011-1000014 |
+------------------------------------------------------------+
1 row in set (0.00 sec)

Then on the primary, did:

mysql> ALTER TABLE t5 IMPORT TABLESPACE;
Query OK, 0 rows affected, 1 warning (0.03 sec)

I am lazy and did not perform validation using .cfg (more from the instructions):

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                 |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1810 | InnoDB: IO Read error: (2, No such file or directory) Error opening './test/t5.cfg', will attempt to import without schema verification |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

And all the tablespaces are loaded from local file system into the GR member. And new GTID set is:

mysql> select @@GLOBAL.GTID_EXECUTED;
+------------------------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                                     |
+------------------------------------------------------------+
| 550fa9ee-a1f8-4b6d-9bfe-c03c12cd1c72:1-271:1000011-1000014 |
+------------------------------------------------------------+
1 row in set (0.00 sec)

Let's test it's all ok:

mysql> select * from test.t5;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|  777 |
+------+
8 rows in set (0.00 sec)

So data will be available on the rest of the nodes at no bandwidth and protocol cost, only this is indeed replicated, from binlog.

# at 2714
#190919  1:34:34 server id 1  end_log_pos 2821  Query   thread_id=34    exec_time=0     error_code=0
SET TIMESTAMP=1568849674/*!*/;
ALTER TABLE t5 IMPORT TABLESPACE
/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Most important, I broke nothing!

mysql> SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members;
+--------------------------------------+--------------+
| MEMBER_ID | MEMBER_STATE |
+--------------------------------------+--------------+
| 43a67ea3-e1a0-11e7-8a9a-0021f66e910e | ONLINE |
| 45ab082d-e1a0-11e7-8b73-0021f66e910e | ONLINE |
| 48be28c0-e1a0-11e7-8c19-0021f66e910e | ONLINE |
+--------------------------------------+--------------+
3 rows in set (0.00 sec)


To wrap up, instead of loading GB or TB into InnoDB Cluster and have the cluster replicate massive amount of rows, this trick can push your data at no cost.
Comments are welcome!

4 comments:

  1. Hi Mortensi,
    Awesome workaround!
    Does it work with encrypted (TDE) tables as well ?

    ReplyDelete
    Replies
    1. I haven't tested, but the idea is the same, with a little caveat, per docs https://dev.mysql.com/doc/refman/8.0/en/tablespace-copying.html.

      .cfp files must be imported on all instances "When exporting an encrypted tablespace, InnoDB generates a .cfp file in addition to a .cfg metadata file"

      Delete
  2. This is my first time i visit here. I found so many interesting stuff in your blog especially its discussion. From the tons of comments on your articles, I guess I am not the only one having all the enjoyment here keep up the good work hikedatabase.com/united-states/hiking-in-rhode-island/

    ReplyDelete
  3. Very efficiently written information. It will be beneficial to anybody who utilizes it, including me. Keep up the good work. For sure i will check out more posts. This site seems to get a good amount of visitors. Outlook CRM

    ReplyDelete