Tag Archives: Data Visualization

Diagramming + Data Visualization with Julia

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/diagramming-data-visualization-with-julia-37147ce63168?source=rss-8bd6ec95ab58------2

A new approach to Data Visualization using Vizagrams.jl

Data visualization and diagramming are usually treated as distinct subjects. Yet, for the keen observer, they are quite similar. In fact, one could say that data visualization is a subset of diagramming, where data is used in a structured manner to generate a diagram drawing. This idea is explored in the Julia package Vizagrams.jl, which implements a diagramming Domain Specific Language (DSL) with a data visualization grammar built on top of it.

In this article, we show how to use Vizagrams.jl both for diagramming and data visualization. I highly recommend using a notebook such as Jupyter or Pluto in order to follow along.

Installation

The package is registered in the Julia general repository, hence, it can be installed by simply doing:

julia>]
pkg> add Vizagrams

The Basics of Diagramming

Vizagrams implements a diagramming DSL inspired in the great Haskell library Diagrams. We can think of a diagram as simply a collection of primitive graphical objects (circles, rectangles, lines…), where the final drawing simply renders each object one after the other, much like an SVG.

We start with the most basic example, a single circle:

using Vizagrams
# My first Diagram
d = Circle()
draw(d)

By default, our circle is drawn as black. We can modify its color by applying a style transformation.

d = S(:fill=>:red)*Circle()
draw(d)

We can make things more interesting by adding another object to our drawing. How can we do this? Well, just add it:

d = S(:fill=>:red)*Circle() + Square()
draw(d)

Note that the order of the summation matters. By adding a square after the circle, Vizagrams renders the square above the circle.

We have used the S(:fill=>:red) to apply the fill color red to the circle. Besides stylistic transformations, we can also apply geometric transformations, such as translation T , rotation R and scaling U .

d = S(:fill=>:red)*Circle() +
S(:stroke=>:blue,:fill=>:white)T(2,0)*Square() +
S(:fill=>:white)R(3.14/4)*Square()

draw(d)

Finally, we can combine also combine existing diagrams to form new ones.

d_2 = d + T(4,0) * d
draw(d_2)

Constructing Data Visualizations

Let us now go to plotting. Again, we start with the most basic example:

plt = plot(x=rand(100),y=rand(100),color=rand(["a","b"],100))
draw(plt)

As expected, the result is a simple scatter plot. Yet, there is something interesting going on. The variable plt is actually holding a diagram. Which means that we can manipulate it just like we did previously with other diagrams, and we can also combine it with other diagrams.

d = R(π/4)plt +
T(250,0)*plt +
Line([[180,250],[350,350],[350,200]]) +
T(350,350)Circle(r=10)

draw(d)

Although the example above was silly, it does illustrates the possibilities of what can be done. In fact, it is easy to see how one can combine diagramming operations to construct more useful visualizations, such as:

Scatter Plot with a PCA fit over the data. The histogram represents the error distribution in the PCA fit. This plot was draw using Vizagrams.jl.

Visualization Grammars with Diagrams

In Vizagrams, the diagramming DSL was used to build a data visualization grammar, i.e. a specification that can produce a variety of visualizations. The syntax for the grammar is based on Vega-Lite, which is a very popular grammar in the Data Visualization community (the Python package Altair is based on Vega-Lite, and there is also a Julia package called VegaLite.jl).

Explaining visualization grammars would take an article on its own. Yet, the specification style if fairly simple, so hopefully even those not familiar with Vega-Lite will understand what is going on.

# Importing DataFrames to store the data
using DataFrames

# VegaDatasets is used to load the dataset `cars`
using VegaDatasets
df = DataFrame(dataset("cars"));
df = dropmissing(df);

# Here is where Vizagrams plot specification actually starts
plt = Plot(
data=df,
encodings=(
x=(field=:Horsepower,),
y=(field=:Miles_per_Gallon,),
color=(field=:Origin,),
size=(field=:Acceleration,),
),
graphic=Circle()
)
draw(plt)

Note that we specified our plot by first passing on the dataset. Then, we defined the “encodings”, which were x , y , color and size. Each encoding variable has a parameter field which says which column in the dataset is mapped to it. Thus, in our example, we are mapping “Horsepower” to the x-axis, “Miles_per_Gallon” to the y-axis, the color is varying according to the “Origin” and the size according to “Acceleration”. At last, the “graphic” is specifying what is to be drawn in this plot. Since we passed Circle() , this means that we are drawing circles.

Here is where things get interesting. We can pass diagrams to this “graphic” parameter in the plot specification. First, let us create a diagram:

d = S(:fill=>:white,:stroke=>:black)*Circle(r=2) +
Circle() +
T(2,0)*Square()
draw(d)

Now, we place this diagram inside the plot specification, and…

plt = Plot(
data=df,
encodings=(
x=(field=:Horsepower,),
y=(field=:Miles_per_Gallon,),
color=(field=:Origin,),
size=(field=:Acceleration,),
),
graphic = Mark(d)
)

draw(plt)

Again, this example is not very useful, yet, it illustrates the sort of things that can be achieved. Here is perhaps a more “useful” example:

Scatter plot using Penguin mark for the famous Palmer Penguis dataset. The complete example can be found in Vizagrams documentation.

Some Final Words

This article was a mere introduction to Vizagrams. There is much more to be explored, such as graphical marks creation, graphic expressions, stacking operations, and so on.

If you are interested in learning more about Vizagrams, check out the documentation. Again, I recommend using a notebook (Jupyter or Pluto), as one can quickly experiment with different designs.


Diagramming + Data Visualization with Julia was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Using DataFrames and PyPlot in Julia

By: Jaafar Ballout

Re-posted from: https://www.supplychaindataanalytics.com/using-dataframes-and-pyplot-in-julia/

Julia includes many packages that could be used in operation research and optimization in general. This article serves as a brief introduction to different packages available in Julia’s ecosystem. I will focus on two packages that are stepping-stones in future work: DataFrames.jl and PyPlots. DataFrames.jl provides a set of tools for working with tabular data similar to Pandas in Python. The PyPlots module provides a Julia interface to the Matplotlib plotting library from Python. Although PyPlots is a distribution of Python and Matplotlib, Julia can install a private distribution that can’t be accessed outside Julia’s environment.

Adding the necessary packages

Open a new terminal window and run Julia. Use the code below to add the packages required for this article.

julia> using Pkg
julia> Pkg.add("DataFrames")
julia> Pkg.add("CSV")
julia> Pkg.add("Arrow")

Open a new terminal window and run Julia. Initialize the PYTHON environment variable:

julia> ENV["PYTHON"] = ""
""

Install PyPlot:

julia> using Pkg
julia> Pkg.add("PyPlot")

After adding all the required packages using Julia REPL, the following code is used to import the packages in the Jupyter editor (i.e. Jupyter Notebook).

using DataFrames
using CSV
using Arrow
using PyPlot

Using DataFrames in Julia

DataFrames requires some packages in the backend like CSV and Arrow to complete its operations properly. Thus, these packages were added initially in the section before.

Using PyPlots in Julia

Because PyPlots is an interface for Matplotlib in Julia, all the documentation is available on Matplotlib’s main page.

Covid-19 showcase using Julia

The first covid-19 case was detected on the 17th of November, 2019. Although two years passed since the pandemic started, the virus is still persisting and cases are increasing exponentially. Thus, data analysis is important to understand the growth of cases around the world. I will be using a .csv file containing data about the cases in country X. The file is available in a public repository on my GitHub page, so I can copy it to an excel sheet and move it to the directory file where the Jupyter notebook is located. Also, I can import the data from the web using excel and attach the link of the raw format of the database on GitHub. In both cases, saving the file as time.csv is necessary to fit with the code in later stages.

Here, I am showing the database, in csv format, that I opened via excel on my desktop.

I will read the csv file, in the Jupyter notebook, using the code block below:

df  = CSV.File("time.csv") |> DataFrame; # reading the csv file using CSV package and changing it to a DataFrame using the arrow operation |>
df[1:5,:] # output the first five rows

Importing data files is smoother using DataFrames. Some of the operations present in the code blocks below are explained in a previous post introducing Julia. Plotting is another important tool to understand data and visualize it better. Matplotlib is introduced before to the blog but in Python.

The code block below creates a bar chart showing the number of cumulative tests and cumulative negative cases over a period of six days from the DataFrame, df, imported above.

y1 = df[20:25,2]; 
y2 = df[20:25,3];
x = df[20:25,1];

fig = plt.figure() 
ax = plt.subplot() 
ax.bar(x, y1, label="cumulative tests",color="black")
ax.bar(x, y2, label="cumulative negative cases",color="grey")
ax.set_title("Covid-19 data in Country X",fontsize=18,color="green")
ax.set_xlabel("date",fontsize=14,color="red")
ax.set_ylabel("number of cases",fontsize=14,color="red")
ax.legend(fontsize=10)
ax.grid(b=1,color="blue",alpha=0.1)
plt.show()

The bar chart above could be improved by allocating a bar for each category. This is done using the code block below.

barWidth = 0.25
br1 = 1:1:length(x)
br2 = [x + barWidth for x in br1]

fig = plt.figure()
ax = plt.subplot() 
ax.bar(br1, y1,color="r", width=barWidth, edgecolor ="grey",label="cumulative tests")
ax.bar(br2, y2,color ="g",width=barWidth, edgecolor ="grey", label="cumulative negative cases")
ax.set_title("Covid-19 data in Country X",fontsize=18,color="green")
ax.set_xlabel("date", fontweight ="bold",fontsize=14)
ax.set_ylabel("number of cases", fontweight ="bold",fontsize=14)
ax.legend(fontsize=10)
ax.grid(b=1,color="blue",alpha=0.1)
plt.xticks([r + barWidth for r in br1],  ["2/8/2020", "2/9/2020", "2/10/2020", "2/11/2020", "2/12/2020","2/13/2020"])
plt.legend()
plt.show()

Understanding the exponential growth associated with covid19 requires plotting the whole dataset which is over the span of 44 days. Therefore, the next code block aims to show the covid cases of the complete dataset.

days = 1:1:44
days_array = collect(days)
confirmed_cases = df[:,4]

fig = plt.figure(figsize=(10,10))
ax = plt.subplot()
ax.bar(days_array,confirmed_cases , color ="blue",width = 0.4)
ax.set_title("A bar chart showing cumulative confirmed covid-19 cases in country X",fontsize=22,color="darkgreen")
ax.set_xlabel("day number",fontsize=16,color="darkgreen")
ax.set_ylabel("number of cases",fontsize=16,color="darkgreen")
ax.xaxis.set_ticks_position("none")
ax.yaxis.set_ticks_position("none")
ax.xaxis.set_tick_params(pad = 5)
ax.yaxis.set_tick_params(pad = 10)
ax.grid(b = 1, color ="grey",linestyle ="-.", linewidth = 0.5,alpha = 0.2)
fig.text(0.3, 0.8, "SCDA-JaafarBallout", fontsize = 10,color ="grey", ha ="right", va ="bottom",alpha = 0.6)
plt.show()

Even though bar charts are powerful in our case, it is still nice to observe the exponential curve. For that, a code block is presented below to plot the growth curve of confirmed cases.

fig = plt.figure(figsize=(10,10))
ax = plt.subplot()
ax.plot(days_array,confirmed_cases , color ="blue",marker="o",markersize=6, linewidth=2, linestyle ="--")
ax.set_title("A bar chart showing cumulative confirmed covid-19 cases in country X",fontsize=22,color="darkgreen")
ax.set_xlabel("day number",fontsize=16,color="darkgreen")
ax.set_ylabel("number of cases",fontsize=16,color="darkgreen")
ax.grid(b = 1, color ="grey",linestyle ="-.", linewidth = 0.5,alpha = 0.2)
fig.text(0.3, 0.8, "SCDA-JaafarBallout", fontsize = 10,color ="grey", ha ="right", va ="bottom",alpha = 0.6)
plt.xticks(size=16, color ="black")
plt.yticks(size=16, color ="black")
plt.show()

Realizing a meme! (with Julia)

Recently, this meme got viral. So, it is nice to figure the missing functions by plotting.

fig, axs = plt.subplots(2, 2);
fig.tight_layout(pad=4);

x1 = range(-10, 10, length=1000);
y1 = range(1, 1,length =1000);

axs[1].plot(x1, y1, color="blue", linewidth=2.0, linestyle="-");
axs[1].set_xlabel("x1");
axs[1].set_ylabel("y1");
axs[1].set_title("Constant");

x2 = range(-10, 10, length=1000);
y2 = x2.^3;

axs[2].plot(x2, y2, color="blue", linewidth=2.0, linestyle="-");
axs[2].set_xlabel("x2");
axs[2].set_ylabel("y2");
axs[2].set_title("Y = X^(3)");

x3 = range(-10, 10, length=1000);
y3 = x2.^2;

axs[3].plot(x3, y3, color="blue", linewidth=2.0, linestyle="-");
axs[3].set_xlabel("x3");
axs[3].set_ylabel("y3");
axs[3].set_title("Y = X^(2)");

x4 = range(-5, 5, length=1000);
y4 = cos.(x4);

axs[4].plot(x4, y4, color="blue", linewidth=2.0, linestyle="-");
axs[4].set_xlabel("x4");
axs[4].set_ylabel("y4");
axs[4].set_title("Y = cos(X)");

Graphing networks by hand or traditional software seems exhausting. In future posts, I will demonstrate how to draw complicated networks using Julia and its dependencies. Then, I will solve the network problem using the JuMP package.

The post Using DataFrames and PyPlot in Julia appeared first on Supply Chain Data Analytics.

JuliaCon 2015: Everyday Analytics and Visualization (video)

By: randyzwitch - Articles

Re-posted from: http://randyzwitch.com/juliacon-2015-everyday-analytics-and-visualization-video/

At long last, here’s the video of my presentation from JuliaCon 2015, discussion common analytics tasks and visualization. This is really two talks, the first being an example of using the citibike NYC API to analyze ridership of their public bike program, and the second a discussion of the Vega.jl package.

Speaking at JuliaCon 2015 at MIT CSAIL is the professional highlight of my year; hopefully even more of you will attend next year.

Enjoy!

Edit: For those of you who would like to follow-along using the actual presentation code, it is available on GitHub.