Diagramming + Data Visualization with Julia

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/diagramming-data-visualization-with-julia-37147ce63168?source=rss-8bd6ec95ab58------2

A new approach to Data Visualization using Vizagrams.jl

Data visualization and diagramming are usually treated as distinct subjects. Yet, for the keen observer, they are quite similar. In fact, one could say that data visualization is a subset of diagramming, where data is used in a structured manner to generate a diagram drawing. This idea is explored in the Julia package Vizagrams.jl, which implements a diagramming Domain Specific Language (DSL) with a data visualization grammar built on top of it.

In this article, we show how to use Vizagrams.jl both for diagramming and data visualization. I highly recommend using a notebook such as Jupyter or Pluto in order to follow along.

Installation

The package is registered in the Julia General registry, so it can be installed by simply doing:

julia>]
pkg> add Vizagrams

The Basics of Diagramming

Vizagrams implements a diagramming DSL inspired by the great Haskell library Diagrams. We can think of a diagram as a collection of primitive graphical objects (circles, rectangles, lines…), where the final drawing renders each object one after the other, much like an SVG.
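To make the "collection rendered in order" idea concrete, here is a toy model in plain Julia. It is a hypothetical illustration only: the names Primitive, Diagram and render are made up here, and this is not how Vizagrams is actually implemented.

```julia
# Toy model (hypothetical, not Vizagrams' implementation): a diagram is an
# ordered list of primitives that are painted one after the other.
struct Primitive
    shape::Symbol   # e.g. :circle or :square
    fill::Symbol    # fill color
end

struct Diagram
    prims::Vector{Primitive}   # painting order: earlier entries end up underneath
end

Diagram(p::Primitive) = Diagram([p])

# "Adding" diagrams concatenates their primitives, so the right-hand
# operand is painted later, i.e. on top of the left-hand one.
Base.:+(a::Diagram, b::Diagram) = Diagram(vcat(a.prims, b.prims))

# Stand-in for rendering: list the primitives in the order they would be painted.
render(d::Diagram) = [string(p.fill, " ", p.shape) for p in d.prims]

d = Diagram(Primitive(:circle, :red)) + Diagram(Primitive(:square, :black))
println(render(d))   # ["red circle", "black square"]
```

In this model, swapping the operands of + swaps the painting order, which is exactly the behavior described for Vizagrams below.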

We start with the most basic example, a single circle:

using Vizagrams
# My first Diagram
d = Circle()
draw(d)

By default, our circle is drawn in black. We can modify its color by applying a style transformation.

d = S(:fill=>:red)*Circle()
draw(d)

We can make things more interesting by adding another object to our drawing. How can we do this? Well, just add it:

d = S(:fill=>:red)*Circle() + Square()
draw(d)

Note that the order of the summation matters. By adding the square after the circle, Vizagrams renders the square on top of the circle.
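To see the effect of ordering, we can swap the two terms. This sketch uses only the calls shown above; the circle is now painted after the square, so it is drawn on top of it.

```julia
using Vizagrams

# Same two shapes, opposite order: the square comes first,
# so the red circle is rendered above it.
d = Square() + S(:fill=>:red)*Circle()
draw(d)
```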

We have used S(:fill=>:red) to apply the fill color red to the circle. Besides stylistic transformations, we can also apply geometric transformations, such as translation T, rotation R and scaling U.

d = S(:fill=>:red)*Circle() +
    S(:stroke=>:blue,:fill=>:white)T(2,0)*Square() +
    S(:fill=>:white)R(π/4)*Square()

draw(d)
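The example above uses T and R but not U. Here is a small sketch that also scales a shape, assuming that U(s) takes a single uniform scale factor s (the article only names U, so check the documentation for its exact signature).

```julia
using Vizagrams

# Assumption: U(0.5) scales the square to half its size.
# Transformations compose from right to left: scale first, then translate by (3, 0).
d = Square() +
    S(:fill=>:red)*T(3,0)*U(0.5)*Square()
draw(d)
```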

Finally, we can also combine existing diagrams to form new ones.

d_2 = d + T(4,0) * d
draw(d_2)
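Since diagrams combine with the ordinary + operator, standard Julia tools for folding a collection also apply. A sketch, using only calls shown above: it places three translated copies of a diagram side by side with reduce(+, ...).

```julia
using Vizagrams

# A small base diagram built from the operations shown above.
d = S(:fill=>:red)*Circle() + T(2,0)*Square()

# Three copies, 4 units apart; reduce(+, ...) just chains the same `+`.
row = reduce(+, [T(4i, 0)*d for i in 0:2])
draw(row)
```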

Constructing Data Visualizations

Let us now go to plotting. Again, we start with the most basic example:

plt = plot(x=rand(100),y=rand(100),color=rand(["a","b"],100))
draw(plt)

As expected, the result is a simple scatter plot. Yet, there is something interesting going on: the variable plt actually holds a diagram. This means that we can manipulate it just like we did previously with other diagrams, and we can also combine it with other diagrams.

d = R(π/4)plt +
    T(250,0)*plt +
    Line([[180,250],[350,350],[350,200]]) +
    T(350,350)Circle(r=10)

draw(d)

Although the example above is silly, it does illustrate what can be done. In fact, it is easy to see how one can combine diagramming operations to construct more useful visualizations, such as:

Scatter plot with a PCA fit over the data. The histogram represents the error distribution of the PCA fit. This plot was drawn using Vizagrams.jl.

Visualization Grammars with Diagrams

In Vizagrams, the diagramming DSL was used to build a data visualization grammar, i.e. a specification that can produce a variety of visualizations. The syntax for the grammar is based on Vega-Lite, which is a very popular grammar in the Data Visualization community (the Python package Altair is based on Vega-Lite, and there is also a Julia package called VegaLite.jl).

Explaining visualization grammars would take an article of its own. Yet, the specification style is fairly simple, so hopefully even those not familiar with Vega-Lite will understand what is going on.

# Importing DataFrames to store the data
using DataFrames

# VegaDatasets is used to load the dataset `cars`
using VegaDatasets
df = DataFrame(dataset("cars"));
df = dropmissing(df);

# Here is where the Vizagrams plot specification actually starts
plt = Plot(
    data=df,
    encodings=(
        x=(field=:Horsepower,),
        y=(field=:Miles_per_Gallon,),
        color=(field=:Origin,),
        size=(field=:Acceleration,),
    ),
    graphic=Circle()
)
draw(plt)

Note that we specified our plot by first passing the dataset. Then, we defined the “encodings”, which were x, y, color and size. Each encoding variable has a parameter field that says which column of the dataset is mapped to it. Thus, in our example, we map “Horsepower” to the x-axis and “Miles_per_Gallon” to the y-axis, while the color varies according to “Origin” and the size according to “Acceleration”. Finally, the “graphic” specifies what is drawn in the plot. Since we passed Circle(), we are drawing circles.
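To see that each encoding is just a column name attached to a visual channel, here is a sketch on a small made-up DataFrame (the table toy and its columns a, b and group are hypothetical). It assumes that encodings can be omitted, as with the earlier plot call, which had no size.

```julia
using Vizagrams, DataFrames

# Hypothetical toy data: two numeric columns and one categorical column.
toy = DataFrame(a = rand(50), b = rand(50), group = rand(["u", "v"], 50))

# Each encoding points to a column through `field`; only x, y and color are used here.
plt = Plot(
    data = toy,
    encodings = (
        x = (field = :a,),
        y = (field = :b,),
        color = (field = :group,),
    ),
    graphic = Circle(),
)
draw(plt)
```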

Here is where things get interesting. We can pass diagrams to this “graphic” parameter in the plot specification. First, let us create a diagram:

d = S(:fill=>:white,:stroke=>:black)*Circle(r=2) +
    Circle() +
    T(2,0)*Square()
draw(d)

Now, we place this diagram inside the plot specification, and…

plt = Plot(
    data=df,
    encodings=(
        x=(field=:Horsepower,),
        y=(field=:Miles_per_Gallon,),
        color=(field=:Origin,),
        size=(field=:Acceleration,),
    ),
    graphic = Mark(d)
)

draw(plt)

Again, this example is not very useful, but it illustrates the sort of things that can be achieved. Here is perhaps a more “useful” example:

Scatter plot using a penguin mark for the famous Palmer Penguins dataset. The complete example can be found in the Vizagrams documentation.

Some Final Words

This article was a mere introduction to Vizagrams. There is much more to be explored, such as creating graphical marks, graphic expressions, stacking operations, and so on.

If you are interested in learning more about Vizagrams, check out the documentation. Again, I recommend using a notebook (Jupyter or Pluto), as one can quickly experiment with different designs.


Diagramming + Data Visualization with Julia was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

The main thing in Julia 1.11

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/07/05/main.html

Introduction

This is my last blog post with previews of the upcoming Julia 1.11 release.
The functionality I want to cover today is the option of defining an entry point for a Julia script.

The code was tested under Julia 1.11 RC1.

A traditional Julia script

Traditionally, you ran a Julia script with the julia some_script.jl command.
In this case Julia sequentially executes the contents of the some_script.jl file and terminates.

When I was writing Julia code that was meant to be executed in this way my typical approach was to always encapsulate all executed code in functions.
In this way we can avoid many problems that are introduced by writing code that is executed in global scope, including some of the common issues:

  • scope of variables (no need to think about the global keyword);
  • performance (code inside functions is compiled, thus fast);
  • accidental use of the same name for different objects in global-scope spaghetti code (I think everyone has been bitten by this issue);
  • pollution of RAM memory (large objects that have bindings in global scope are kept alive and it is easy to forget to unbind them to allow garbage collection).
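The first two points can be illustrated with a short sketch (the function name sum_to is made up for this example). The commented-out loop shows what goes wrong at top level in a script file, while the function version needs no global keyword.

```julia
# At top level in a file run with `julia some_script.jl`, this loop prints an
# "ambiguous soft scope" warning and then throws an UndefVarError, because
# `total` inside the loop is treated as a new local variable:
#
#     total = 0
#     for i in 1:3
#         total += i
#     end
#
# Inside a function there is no ambiguity: `total` is an ordinary local
# variable and the whole function is compiled as a unit.
function sum_to(n)
    total = 0
    for i in 1:n
        total += i
    end
    return total
end

println(sum_to(3))  # prints 6
```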

Therefore a typical structure of my code was:

# ...
# some definitions of data structures and code inside functions
# ...

function main(ARGS)
    # ...
    # the operations I want to have executed by the script
    # ...
end

main(ARGS)

This is a style that is natural for programmers used to languages such as C, where the main function is the entry point.

Script under Julia 1.11

Julia 1.11 adds an option to mark the main function as an entry point. It makes sure that main(ARGS) gets called after execution of the script.

It is quite easy to mark the main function as an entry point. It is enough to replace main(ARGS) with (@main)(ARGS) in the function definition of my example above and to drop the explicit main(ARGS) call at the end of the script.
Thus, starting from Julia 1.11 I can write my scripts as:

# ...
# some definitions of data structures and code inside functions
# ...

function (@main)(ARGS)
    # ...
    # the operations I want to have executed by the script
    # ...
end

This seemingly small change is in my opinion significant as it standardizes the way Julia scripts are written.
And such standardization is a good feature improving code readability and maintainability.
Additionally, this feature helps to unify the interactive and compiled workflows of using Julia.

Let me show a minimal working example of writing a script using the @main macro:

$ julia -e "using InteractiveUtils; (@main)(args) = versioninfo()"
Julia Version 1.11.0-rc1
Commit 3a35aec36d (2024-06-25 10:23 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × 12th Gen Intel(R) Core(TM) i7-1250U
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
$

In this example we invoke the versioninfo function inside the main(args) function defined using the @main macro.
Note that we did not have to explicitly call the main function in the code. It was invoked automatically because it had been created using the @main macro.
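To see how the entry point receives command-line arguments, here is a sketch of a small script file. The file name greet.jl and the helper greeting are made up for this example, and it needs Julia 1.11 or later. Since @main expands to the name main, the function stays an ordinary function that can also be called directly, e.g. main(["Alice"]).

```julia
# greet.jl: a sketch of a script with a Julia 1.11 entry point.
# Run it with, e.g.:  julia greet.jl Alice Bob

# Helper code lives in functions rather than in global scope.
greeting(name::AbstractString) = "Hello, $(name)!"

# `@main` marks this function as the entry point: after the file has been
# executed, Julia calls it with the command-line arguments (ARGS).
function (@main)(args)
    people = isempty(args) ? ["World"] : args
    for person in people
        println(greeting(person))
    end
end
```

Running julia greet.jl Alice Bob should print one greeting per argument, and running it without arguments falls back to greeting "World".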

Conclusions

Now I hope you know what @main macro does and how to use it in Julia 1.11. Enjoy scripting with Julia!