Interfacing with Relational Database using MySQL.jl and PyMySQL

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//julia/python/packages/relational-databases/2019/07/07/Interfacing-with-Relational-Database-using-MySQL.jl-and-PyMySQL.html

Prior to the advent of computing, relational database can be thought of log books typically used for inventory and visitor’s time-in time-out. These books contain rows that define the item/transaction, and columns describing the features of each row. Indeed, these are the core attributes of any relational database. Unlike spreadsheets, which are used for handling small datasets, databases are mostly used for storing huge transactional data for later use. They run on a server and often at the backend of any user (client) interface such as websites and mobile applications. These applications communicate with database via processing servers (e.g. Flask and Django). The figure below illustrates the request and response communcations between client and servers.

As mentioned earlier, databases are meant to store data for later use — in the sense that we can use it as a response to client’s requests, such as viewing or data extraction for insights. In this article, we are interested in data extraction from the database. In particular, the objective is to illustrate how to send request to MySQL server, and how to process response both from Julia and Python.

MySQL Server Setup

To start with, we need to setup MySQL server in our machine. Click the following link to download and install the application.

Note that I recommend you to download the latest version of MySQL since the setup above is using the old version.

Query: Creating Database

In order to appreciate what we are aiming in this article, we need to go through some basic SQL queries to understand what type of request to send to MySQL server. I’m using macOS, but the following should work on Windows as well. For macOS users, open the MySQL Server Shell by running mysql -u root -p (hit return or enter , and type in your MySQL root password you specified during the installation setup from the previous section) in the terminal. For windows, try to look for it in the Start Menu.



From here, we are going to check the available databases in MySQL server. To do this, run the following:

SHOW DATABASES;
-- +--------------------+
-- | Database |
-- +--------------------+
-- | information_schema |
-- | mysql |
-- | performance_schema |
-- | sys |
-- +--------------------+
-- 4 rows in set (0.00 sec)

Indeed, there are four out-of-the-box defined databases already, and we don’t want to touch that. Instead, we are going to create our own database, let’s call it tutorial. To do this, run the following codes:
CREATE DATABASE tutorial;
-- Query OK, 1 row affected (0.01 sec)
SHOW DATABASES;
-- +--------------------+
-- | Database |
-- +--------------------+
-- | information_schema |
-- | mysql |
-- | performance_schema |
-- | sys |
-- | tutorial |
-- +--------------------+
-- 5 rows in set (0.00 sec)
USE tutorial;
-- Database changed
SELECT DATABASE();
-- +------------+
-- | DATABASE() |
-- +------------+
-- | tutorial |
-- +------------+
-- 1 row in set (0.00 sec)

The best thing about SQL syntax is that, everything is self-explanatory, except maybe for line 19, which simply confirmed that we are using tutorial as our database.

Query: Creating Table

Next is to create a table for our database, we are going to use the 2019 Philippine Election results with columns: Last Name, First Name, Party, Votes. Further, for purpose of illustration, we are going to use the top 5 senators only.

CREATE TABLE senators(last_name VARCHAR(10), first_name VARCHAR(20), party VARCHAR(10), votes INT);
-- Query OK, 0 rows affected (0.03 sec)
SHOW TABLES;
-- +--------------------+
-- | Tables_in_tutorial |
-- +--------------------+
-- | senators |
-- +--------------------+
-- 1 row in set (0.01 sec)

Query: Inserting Values

The following codes will insert the top five senators from the 2019 Philippine election results.

INSERT INTO senators VALUES ("VILLAR", "CYNTHIA", "NP", 25128563);
-- Query OK, 1 row affected (0.01 sec)
INSERT INTO senators VALUES ("POE", "GRACE", "IND", 21922209);
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO senators VALUES ("GO", "BONG GO", "PDPLBN", 20465005);
-- Query OK, 1 row affected (0.01 sec)
INSERT INTO senators VALUES ("CAYETANO", "PIA", "NP", 19645110);
-- Query OK, 1 row affected (0.00 sec)
INSERT INTO senators VALUES ("DELA ROSA", "BATO", "PDPLBN", 18815388);
-- Query OK, 1 row affected (0.00 sec)

Query: Show Data

To view the encoded data, we simply select all (*) the columns from the table.

SELECT * FROM senators;
-- +-----------+------------+--------+----------+
-- | last_name | first_name | party | votes |
-- +-----------+------------+--------+----------+
-- | VILLAR | CYNTHIA | NP | 25128563 |
-- | POE | GRACE | IND | 21922209 |
-- | GO | BONG GO | PDPLBN | 20465005 |
-- | CAYETANO | PIA | NP | 19645110 |
-- | DELA ROSA | BATO | PDPLBN | 18815388 |
-- +-----------+------------+--------+----------+
-- 5 rows in set (0.00 sec)

MySQL Clients on Julia and Python

In this section, we are going to interface with MySQL server on Julia and Python. This is possible using MySQL.jl and PyMySQL libraries. To start with, install the necessary libraries as follows:


using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("MySQL")

For this exercise, our goal is to save the NYC Flights (2013) data into the database and query it from Julia and Python.

Downloading NYC Flights Data

I have a copy of the dataset on Github, and so the following code will download the said data:


using CSV
using DataFrames
using MySQL # if error try running Pkg.build("DecFP")
url = "https://raw.githubusercontent.com/estadistika/assets/master/data/nycflights13.csv";
dir = joinpath(homedir(), "Downloads", "nycflights13.csv");
download(url, dir);
# read the downloaded data
data = CSV.read(dir);

Connect to MySQL Server

In order for the client to send request to MySQL server, the user/client needs to connect to it using the credentials set in the installation.


con = MySQL.connect("localhost", "root", "12345678", db = "tutorial");

Note that you need to have a strong password, and this configuration should not be exposed to the public. The above snippets are meant for illustration.

First Request

To test the connection, let’s send our first request — to show the tables in the database:


MySQL.Query(con, "SHOW TABLES;") |> DataFrame
#> 1×1 DataFrame
#> │ Row │ Tables_in_tutorial │
#> │ │ String⍰ │
#> ├─────┼────────────────────┤
#> │ 1 │ senators │

In Julia, the response is recieved as a MySQL.Query object and can be viewed using DataFrame. For Python, however, you will get a tuple object.

Create NYC Flights Table

At this point, we can now create the table for our dataset. To do this, run the following:


MySQL.Query(con, "DROP TABLE IF EXISTS nycflights");
query = "CREATE TABLE nycflights(" *
join(
string.(names(data))[2:end] .*
repeat([" "], inner = length(names(data)) - 1) .*
map(x -> x == "Int64" ? "INT" : x == "String" ? "VARCHAR(20)" : "VARCHAR(20)", string.(eltypes(data)))[2:end], ", "
) *
");";
query
#> "CREATE TABLE nycflights(year INT, month INT, day INT, dep_time VARCHAR(20), sched_dep_time INT, dep_delay VARCHAR(20), arr_time VARCHAR(20), sched_arr_time INT, arr_delay VARCHAR(20), carrier VARCHAR(20), flight INT, tailnum VARCHAR(20), origin VARCHAR(20), dest VARCHAR(20), air_time VARCHAR(20), distance INT, hour INT, minute INT, time_hour VARCHAR(20));"
MySQL.Query(con, query);

As shown in the previous section, sending request to the server both in Julia and in Python is done by simply using a string of SQL queries as input to MySQL.jl and PyMySQL APIs. Hence, the query object (in line 3 of Julia code and line 4 of Python code) above, simply automates the concatenation of SQL query for creating a table. Having said, you can of course write the query manually. To check if we have indeed created the table, run the following codes:


MySQL.Query(con, "SHOW TABLES;") |> DataFrame
#> 2×1 DataFrame
#> │ Row │ Tables_in_tutorial │
#> │ │ String⍰ │
#> ├─────┼────────────────────┤
#> │ 1 │ nycflights │
#> │ 2 │ senators │
MySQL.Query(con, "SELECT * FROM nycflights;") |> DataFrame
#> 0×19 DataFrame

As you can see, we’ve created it already, but with no entry yet.

Populating the Table

Finally, we are going to populate the table in the database by inserting the values row by row.


stmt = MySQL.Stmt(con,
"INSERT INTO nycflights VALUES (" *
join(repeat(["?"], inner = length(names(data)) - 1), ", ") *
");"
);
stmt
#> MySQL.Stmt(Ptr{Nothing} @0x00007f80fdb495f0, "INSERT INTO nycflights VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", 19, 0)
@time for i in 1:nrow(data)
@info("processing row " * string(i))
MySQL.execute!(stmt, Vector(data[1, names(data)[2:end]]))
end
# 219.653593 seconds (52.89 M allocations: 3.547 GiB, 0.46% gc time)

From the above Julia code, the result of the stmt is an SQL INSERT query with placeholder values indicated by ?. The timed (@time in Julia code) loop in line 9 above maps the values of the vector, one-to-one, to the elements (?) of the tuple in stmt. Having said, MySQL.Stmt has no equivalent in PyMySQL. Further, one major difference between these libraries is that, PyMySQL will not populate the table even after executing all sorts of SQL queries unless you commit it (con.commit), as shown above. This is contrary to MySQL.jl which automatically commits every execution of the SQL queries. I do like the idea of having con.commit in PyMySQL, since this avoids accidental deletion or modification in the database, thus adding a layer of security. To check if we have indeed populated the table, run the following:


resp = @time MySQL.Query(con, "SELECT * FROM nycflights;");
#> 0.566995 seconds (58 allocations: 2.859 KiB)
nycf = resp |> DataFrame;
nycf
#> 336776×19 DataFrame. Omitted printing of 6 columns
#> │ Row │ year │ month │ day │ dep_time │ sched_dep_time │ dep_delay │ arr_time │ sched_arr_time │ arr_delay │ carrier │ flight │ tailnum │ origin │
#> │ │ Int32⍰ │ Int32⍰ │ Int32⍰ │ String⍰ │ Int32⍰ │ String⍰ │ String⍰ │ Int32⍰ │ String⍰ │ String⍰ │ Int32⍰ │ String⍰ │ String⍰ │
#> ├────────┼────────┼────────┼────────┼──────────┼────────────────┼───────────┼──────────┼────────────────┼───────────┼─────────┼────────┼─────────┼─────────┤
#> │ 1 │ 2013 │ 1 │ 1 │ 517 │ 515 │ 2 │ 830 │ 819 │ 11 │ UA │ 1545 │ N14228 │ EWR │
#> │ 2 │ 2013 │ 1 │ 1 │ 517 │ 515 │ 2 │ 830 │ 819 │ 11 │ UA │ 1545 │ N14228 │ EWR │
#> ⋮
#> │ 336774 │ 2013 │ 1 │ 1 │ 517 │ 515 │ 2 │ 830 │ 819 │ 11 │ UA │ 1545 │ N14228 │ EWR │
#> │ 336775 │ 2013 │ 1 │ 1 │ 517 │ 515 │ 2 │ 830 │ 819 │ 11 │ UA │ 1545 │ N14228 │ EWR │
#> │ 336776 │ 2013 │ 1 │ 1 │ 517 │ 515 │ 2 │ 830 │ 819 │ 11 │ UA │ 1545 │ N14228 │ EWR │

To disconnect from the server, run MySQL.disconnect(con) (Julia) or con.close() (Python).

Benchmark

For the benchmark, I added a timelapse recorder in populating and reading the table in the previous section. The figure below summarizes the results.

The figure was plotted using Gadfly.jl. Install this package using Pkg as described above (see the first code block under MySQL Clients on Julia and Python section), along with Cario.jl and Fontconfig.jl. The latter two packages are used to save the plot in PNG format. See the code below to reproduce:

using Cairo
using Compose
using DataFrames
using Gadfly
using Measures
Gadfly.push_theme(:dark)
benchmark = DataFrame(
Time = [219.653593, 305.19222807884216, 0.566995, 12.32638692855835],
Clients = repeat(["MySQL.jl", "PyMySQL"], outer= 2),
Tasks = repeat(["Populating", "Reading"], inner = 2)
)
p = plot(
benchmark,
x = :Time,
y = :Tasks,
color = :Clients,
Guide.xlabel("Processing Time in Seconds (Shorter is Better)"),
Guide.colorkey(pos = [.775w, -0.13h]),
Geom.bar(position = :dodge, orientation = :horizontal),
Guide.annotation(
compose(
context(),
(context(), Compose.text(0.05w, 0.14h, "MySQL.jl took 0.57 second in reading the server response."), Compose.fill("#a0a0a0"))
)
),
Guide.annotation(
compose(
context(),
(context(), Compose.text(0.15w, 0.74h, "Populating MySQL Table"), Compose.fill(colorant"#ffffff"), Compose.stroke(nothing))
)
),
Guide.annotation(
compose(
context(),
(context(), Compose.text(0.15w, 0.8h, "with 336,776 rows."), Compose.fill("#ffffff"))
)
)
);
p |> PNG("benchmark-database.png", 6.5inch, 4inch, dpi = 200)

Conclusion

The aim of this article was simply to illustrate the usage of MySQL.jl APIs in comparison to the PyMySQL; and I would say both libraries have similarities in APIs (as expected) and are stable for the tasks. I should emphasize though that, I do like the con.commit of PyMySQL since this adds a level of security, and I think this is a good addition to MySQL.jl in the future.

Complete Codes

If you are impatient, here are the complete codes excluding the MySQL codes and the plots. These should work after installing the required libraries shown above:


using CSV
using DataFrames
using MySQL # if error try running Pkg.build("DecFP")
url = "https://raw.githubusercontent.com/estadistika/assets/master/data/nycflights13.csv";
dir = joinpath(homedir(), "Downloads", "nycflights13.csv");
download(url, dir);
# read the downloaded data
data = CSV.read(dir);
# make connection
con = MySQL.connect("localhost", "root", "12345678", db = "tutorial");
# check if there are tables available
MySQL.Query(con, "SHOW TABLES;") |> DataFrame
# before we create the nycflights table,
# make sure we don't have the same table name in the server
MySQL.Query(con, "DROP TABLE IF EXISTS nycflights");
# create the table
query = "CREATE TABLE nycflights(" *
join(
string.(names(data))[2:end] .*
repeat([" "], inner = length(names(data)) - 1) .*
map(x -> x == "Int64" ? "INT" : x == "String" ? "VARCHAR(20)" : "VARCHAR(20)", string.(eltypes(data)))[2:end], ", "
) *
");";
MySQL.Query(con, query);
MySQL.Query(con, "SHOW TABLES;") |> DataFrame
MySQL.Query(con, "SELECT * FROM nycflights;") |> DataFrame
stmt = MySQL.Stmt(con,
"INSERT INTO nycflights VALUES (" *
join(repeat(["?"], inner = length(names(data)) - 1), ", ") *
");"
);
@time for i in 1:nrow(data)
@info("processing row " * string(i))
MySQL.execute!(stmt, Vector(data[1, names(data)[2:end]]))
end
#> 219.653593 seconds (52.89 M allocations: 3.547 GiB, 0.46% gc time)
resp = @time MySQL.Query(con, "SELECT * FROM nycflights;");
#> 0.566995 seconds (58 allocations: 2.859 KiB)
nycf = resp |> DataFrame;
MySQL.disconnect(con)

References and Resources

  • MySQL.jl Github Repo: https://github.com/JuliaDatabases/MySQL.jl
  • PyMySQL Github Repo: https://github.com/PyMySQL/PyMySQL
  • Flaticon: https://www.flaticon.com/

Software Versions

========
Julia
========
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA = /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia
JULIA_EDITOR = "/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper"
========
Python
========
3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)]