Tag Archives: Visualization

The Chart Missing From ALL Spreadsheet Software

By: DSB

Re-posted from: https://davisbarreira.medium.com/the-chart-missing-from-all-spreadsheet-software-b7fede90d634?source=rss-8bd6ec95ab58------2

And how to implement it in Julia

This chart was discussed in a recent video on the Minute Physics YouTube channel. It is actually quite simple: a floating stacked bar chart. The idea of such a plot is to visualize ranges of values, with a color assigned to each range.

Despite its simplicity, one might be surprised to find that this chart is not readily available in many spreadsheet programs. It is possible to draw it, but one needs to resort to some ad-hoc methods. This and more is discussed in the Minute Physics video.

The Challenge

We want to implement this visualization using the Julia programming language; but we want to be able to do this without resorting to a bunch of ad-hoc methods, and without having to write a humongous amount of code.

One possible solution would be to search for a charting library that covers this example. Yet, as is the case with spreadsheets, this search might be futile. Another approach would be to use a more generic visualization library that lets us specify such a graphic “directly”.

The Graphic Description

Before we can implement a solution, we must first get a clear picture of how this graphic is constructed. Consider the following dataset:

In order to reproduce the desired chart, we start by rearranging the dataset. We transform the locations (“Ontario”, “England” and “Kentucky”) into row values, and we pair together the temperature labels, for example, “winter mean low | annual mean low”.
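As a rough illustration of this rearrangement, here is a minimal sketch in plain Julia. The temperature values and all labels other than “winter mean low | annual mean low” are made up for the example (they are not the dataset from the article), and the variable names are our own.

```julia
# Illustrative values only (degrees Celsius); this is NOT the article's dataset.
locations = ["Ontario", "England", "Kentucky"]
labels = ["winter mean low", "annual mean low", "annual mean high", "summer mean high"]
temps = Dict(
    "Ontario"  => [-10.0, 3.0, 13.0, 26.0],
    "England"  => [1.0, 6.0, 14.0, 21.0],
    "Kentucky" => [-4.0, 7.0, 19.0, 31.0],
)

# One row per (location, pair of adjacent labels).
# Each row describes one floating bar running from low_temp to high_temp.
rows = [
    (location = loc,
     low_temp = temps[loc][i],
     high_temp = temps[loc][i + 1],
     label_temp = string(labels[i], " | ", labels[i + 1]))
    for loc in locations for i in 1:(length(labels) - 1)
]
```

Each element of rows is a named tuple with the columns location, low_temp, high_temp and label_temp. If DataFrames.jl is available, DataFrame(rows) should turn this vector into a dataframe.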

With the dataset organized this way, we can more easily describe how to construct our graphic. Here is the construction logic: for each row, draw a “bar” whose horizontal position is given by the location column, whose lower end is given by the low_temp column, whose upper end is given by the high_temp column, and whose color is based on the label_temp column.

That’s it! With the dataset structured in such a way, the procedure to construct our graphic is fairly intuitive. We are now ready to implement our solution.

The Implementation

In order to draw our graphic, we need a visualization library that allows us to implement the construction logic we just described. This can be done using the package called Vizagrams.jl.

Vizagrams is a visualization grammar with a syntax very similar to Vega-Lite. If you have used packages such as Altair or ggplot, you probably know what I am talking about. The difference compared to these grammars is that Vizagrams implements what are called “graphic expressions”.

A graphic expression is a constructive description of a graphic. In other words, we can use graphic expressions to encode our constructive logic without having to resort to ad-hoc methods or write a bunch of code.

This tutorial does not aim to thoroughly explain Vizagrams or graphic expressions. So let us cut short our explanation here, and move on to the solution.

In the code below, we start by importing Vizagrams. We assume that df holds the dataframe already properly transformed, and we use my_colors to specify the colors to be used (one could also just pick an existing colorscheme). We then specify the plot, which is stored in the plt variable.

The plot specification contains the data, the encoding variables (x, y, color, low_y) and the graphic, which is where we write the graphic expression. Our graphic expression simply iterates over each row in the dataset, and draws a trail having width 20 and with the respective color, lower point and higher point. The trail mark is simpler to use (in this example) and delivers the same result as using the bar mark.

using Vizagrams

df # Our transformed dataframe
my_colors = ["#F28E2B","#4E79A7","#FFBE7D","#A0CBE8"] # Our bar colors

plt = Plot(
    data=df,
    x = :location,
    # Upper and lower ends of each bar share the same domain and range
    y = (field=:high_temp, scale_domain=(-10,40), scale_range=(0,200)),
    low_y = (field=:low_temp, scale_domain=(-10,40), scale_range=(0,200)),
    color = (field=:label_temp, scale_range=my_colors),
    # Graphic expression: for each row, draw a trail of width 20 from low_y to y
    graphic =
        ∑() do row
            S(:fill=>row.color)*Trail([[row.x,row.low_y],[row.x,row.y]],20)
        end
)

draw(plt)
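To get a feel for what scale_domain and scale_range are doing in the specification, here is a minimal sketch in plain Julia. It assumes the scale is a simple linear map from the domain to the range, which is our reading of the parameters and not Vizagrams' own code; linear_scale is a hypothetical helper written for this example.

```julia
# Hypothetical helper: linearly map a value v from the interval `dom` to the interval `rng`.
# This mirrors how we read scale_domain and scale_range; it is not taken from Vizagrams.
linear_scale(v, dom, rng) =
    rng[1] + (v - dom[1]) * (rng[2] - rng[1]) / (dom[2] - dom[1])

dom = (-10, 40)   # temperature interval, as in scale_domain
rng = (0, 200)    # drawing coordinates, as in scale_range

low_y  = linear_scale(0.0, dom, rng)    # lower end of a bar starting at 0 degrees
high_y = linear_scale(15.0, dom, rng)   # upper end of a bar ending at 15 degrees
bar_height = high_y - low_y
```

Under this assumption, using the same domain and range for both y and low_y is what keeps the two ends of each bar on a common vertical axis.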

Conclusion

And there we have it! We implemented the desired chart. The example with all the code and data necessary can be found in Vizagrams’ gallery under the “Floating Stacked Bar” section.

To get a better understanding of what is going on, you should hop on a notebook and try the code yourself. Try changing the graphic expression using different graphical marks, or changing how the encoding variables are used.

Diagramming + Data Visualization with Julia

By: DSB

Re-posted from: https://medium.com/coffee-in-a-klein-bottle/diagramming-data-visualization-with-julia-37147ce63168?source=rss-8bd6ec95ab58------2

A new approach to Data Visualization using Vizagrams.jl

Data visualization and diagramming are usually treated as distinct subjects. Yet, for the keen observer, they are quite similar. In fact, one could say that data visualization is a subset of diagramming, where data is used in a structured manner to generate a diagram drawing. This idea is explored in the Julia package Vizagrams.jl, which implements a diagramming Domain Specific Language (DSL) with a data visualization grammar built on top of it.

In this article, we show how to use Vizagrams.jl both for diagramming and data visualization. I highly recommend using a notebook such as Jupyter or Pluto in order to follow along.

Installation

The package is registered in Julia's General registry, so it can be installed by simply doing:

julia>]
pkg> add Vizagrams

The Basics of Diagramming

Vizagrams implements a diagramming DSL inspired by the great Haskell library Diagrams. We can think of a diagram as simply a collection of primitive graphical objects (circles, rectangles, lines…), where the final drawing simply renders each object one after the other, much like an SVG.

We start with the most basic example, a single circle:

using Vizagrams
# My first Diagram
d = Circle()
draw(d)

By default, our circle is drawn in black. We can modify its color by applying a style transformation.

d = S(:fill=>:red)*Circle()
draw(d)

We can make things more interesting by adding another object to our drawing. How can we do this? Well, just add it:

d = S(:fill=>:red)*Circle() + Square()
draw(d)

Note that the order of the summation matters. Because the square is added after the circle, Vizagrams renders the square above the circle.

We have used S(:fill=>:red) to apply the fill color red to the circle. Besides stylistic transformations, we can also apply geometric transformations, such as translation T, rotation R and scaling U.

d = S(:fill=>:red)*Circle() +
    S(:stroke=>:blue,:fill=>:white)T(2,0)*Square() +
    S(:fill=>:white)R(π/4)*Square()

draw(d)
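To make the geometric transformations a bit more concrete, here is a small sketch of the underlying math in plain Julia. It is not how Vizagrams implements T, R and U; the functions translate, rotate and scale_uniform are hypothetical stand-ins that act on a single 2D point, and we assume the usual right-to-left composition, where the transformation closest to the object is applied first.

```julia
# Hypothetical stand-ins for translation, rotation and uniform scaling of a 2D point.
translate(dx, dy) = p -> (p[1] + dx, p[2] + dy)
rotate(θ) = p -> (cos(θ) * p[1] - sin(θ) * p[2], sin(θ) * p[1] + cos(θ) * p[2])
scale_uniform(k) = p -> (k * p[1], k * p[2])

# Composition reads right to left: rotate first, then translate.
move = translate(2, 0) ∘ rotate(π / 2)
corner = move((1.0, 0.0))            # approximately (2.0, 1.0)

# Scaling a point by 3 multiplies both coordinates by 3.
scaled = scale_uniform(3)((1.0, 2.0))
```

Swapping the order, rotate(π / 2) ∘ translate(2, 0), gives a different result, which is why the order in which transformations are written matters.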

Finally, we can also combine existing diagrams to form new ones.

d_2 = d + T(4,0) * d
draw(d_2)

Constructing Data Visualizations

Let us now go to plotting. Again, we start with the most basic example:

plt = plot(x=rand(100),y=rand(100),color=rand(["a","b"],100))
draw(plt)

As expected, the result is a simple scatter plot. Yet, there is something interesting going on. The variable plt actually holds a diagram, which means that we can manipulate it just like we did previously with other diagrams, and we can also combine it with other diagrams.

d = R(π/4)plt +
    T(250,0)*plt +
    Line([[180,250],[350,350],[350,200]]) +
    T(350,350)Circle(r=10)

draw(d)

Although the example above was silly, it does illustrate the possibilities of what can be done. In fact, it is easy to see how one can combine diagramming operations to construct more useful visualizations, such as:

Scatter plot with a PCA fit over the data. The histogram represents the error distribution in the PCA fit. This plot was drawn using Vizagrams.jl.

Visualization Grammars with Diagrams

In Vizagrams, the diagramming DSL was used to build a data visualization grammar, i.e. a specification that can produce a variety of visualizations. The syntax for the grammar is based on Vega-Lite, which is a very popular grammar in the Data Visualization community (the Python package Altair is based on Vega-Lite, and there is also a Julia package called VegaLite.jl).

Explaining visualization grammars would take an article of its own. Yet, the specification style is fairly simple, so hopefully even those not familiar with Vega-Lite will understand what is going on.

# Importing DataFrames to store the data
using DataFrames

# VegaDatasets is used to load the dataset `cars`
using VegaDatasets
df = DataFrame(dataset("cars"));
df = dropmissing(df);

# Here is where Vizagrams plot specification actually starts
plt = Plot(
    data=df,
    encodings=(
        x=(field=:Horsepower,),
        y=(field=:Miles_per_Gallon,),
        color=(field=:Origin,),
        size=(field=:Acceleration,),
    ),
    graphic=Circle()
)
draw(plt)

Note that we specified our plot by first passing the dataset. Then, we defined the “encodings”, which were x, y, color and size. Each encoding variable has a field parameter that says which column in the dataset is mapped to it. Thus, in our example, we are mapping “Horsepower” to the x-axis and “Miles_per_Gallon” to the y-axis, while the color varies according to “Origin” and the size according to “Acceleration”. Finally, “graphic” specifies what is to be drawn in the plot. Since we passed Circle(), we are drawing circles.
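As a rough picture of what an encoding such as color does with a categorical column, here is a sketch in plain Julia. It is only an illustration of the idea of mapping data values to visual values, not Vizagrams' implementation; the sample values and the palette are chosen for the example.

```julia
# A few sample values from a categorical column such as Origin (illustrative sample).
origins = ["USA", "Europe", "Japan", "USA", "Japan"]

# An arbitrary palette chosen for this example.
palette = ["#4E79A7", "#F28E2B", "#A0CBE8"]

# Categorical scale: each distinct data value gets one color, in order of first appearance.
color_scale = Dict(zip(unique(origins), palette))

# Encoding: replace each data value by its visual value.
colors = [color_scale[o] for o in origins]
```

Rows with the same Origin end up with the same color, which is what lets the reader tell the groups apart in the scatter plot.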

Here is where things get interesting. We can pass diagrams to this “graphic” parameter in the plot specification. First, let us create a diagram:

d = S(:fill=>:white,:stroke=>:black)*Circle(r=2) +
    Circle() +
    T(2,0)*Square()
draw(d)

Now, we place this diagram inside the plot specification, and…

plt = Plot(
    data=df,
    encodings=(
        x=(field=:Horsepower,),
        y=(field=:Miles_per_Gallon,),
        color=(field=:Origin,),
        size=(field=:Acceleration,),
    ),
    graphic = Mark(d)
)

draw(plt)

Again, this example is not very useful, yet it illustrates the sort of things that can be achieved. Here is perhaps a more “useful” example:

Scatter plot using a Penguin mark for the famous Palmer Penguins dataset. The complete example can be found in the Vizagrams documentation.

Some Final Words

This article was a mere introduction to Vizagrams. There is much more to be explored, such as graphical marks creation, graphic expressions, stacking operations, and so on.

If you are interested in learning more about Vizagrams, check out the documentation. Again, I recommend using a notebook (Jupyter or Pluto), as one can quickly experiment with different designs.


Diagramming + Data Visualization with Julia was originally published in Coffee in a Klein Bottle on Medium, where people are continuing the conversation by highlighting and responding to this story.

Summary of Julia Plotting Packages

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/summary-of-julia-plotting-packages/

This is a repost of my response on the Julia Discourse on this topic. I was asked to make a blog post so here you go!

The “Main” Plotting Packages

Here’s a quick summary of the most widely used plotting packages. I may have missed one, but I haven’t missed one that is very widely used.

  • Plots.jl is the most used. It’s probably the most documented, used in the most tutorials, and is used in many videos.
    • Pros: Its main draw is that it has a lot of plugins to other packages through its recipes system, which means that a lot of odd things like `plot(sol::ODESolution)` or showing the sparsity of a `BandedMatrix` just work. With all of these integrations, it’s normally what I would recommend first to newcomers since they will generally get the most done with the least work. It has a backend system, so you can make it generate plots via GR (the default), Plotly (i.e. make webpages), pyplot, PGFPlots (LaTeX output), or UnicodePlots (i.e. output plots as text). This ease of use and flexibility is its main draw.
    • Cons: Its downside has traditionally been its startup time, though it’s nearly a second now so that’s fine. Its main downside now is mostly that it’s not as configurable as something like Makie, and it’s not as optimized if you get up to millions of points. Its flexibility means it’s not just for standard plots but also for animations, building small graphical user interfaces, and building small apps.
  • Makie is probably the second most popular. It’s natively Julia, which is cool in that you can see the code all the way down.
    • Pros: It’s very optimized for large plots, especially with GPU acceleration via the OpenGL backend (GLMakie). It has a lot of examples these days.
    • Cons: Its downside is that it’s a bit less “first user friendly”, given that its flexibility means there’s a lot more options you’re forced to specify everywhere. It has a recipe system now but it’s fairly new and not well-integrated with most of the ecosystem, so it’s not as seamless as Plots, though by 2024 I would assume that would largely be fixed. It has the longest startup time, used to be in minutes but now it’s like 5-10 seconds.
  • AlgebraOfGraphics.jl is a grammar of graphics front-end to Makie. This essentially means it has an API that looks and acts like R’s ggplot2. Thus it has largely the same pros and cons as Makie, since it’s just calling Makie under the hood, but with the additional pro of being more familiar to users coming from R or con if you haven’t worked with grammar of graphics before (or don’t like the style).
  • Gadfly is a grammar of graphics based library.
    • Pros: It’s very familiar to a ggplot2 user. Its default theme is pretty nice looking.
    • Cons: It’s a bit high on startup time, closer to Makie than Plots. Also, it’s pretty feature poor. In particular, it is missing 3D plots, animations, the ability to make interactive apps with buttons, etc. For these reasons more and more people are adopting AlgebraOfGraphics, but if you’re just doing some standard statistics it’s fine.
  • Vega and VegaLite are of the same camp as Gadfly in the focus towards “standard” statistics and data science, but using wrappers to Javascript libraries.
    • Pro: Fast startups
    • Cons: Similar to Gadfly, little to no flexibility (making apps, animations, …) and integration with Julia libraries beyond Queryverse.
  • PlotlyLight is a no-frills wrapper to Plotly.
    • Pro: No startup time
    • Cons: Requires reading the Plotly docs to know how to use it and has little flexibility or integration into Julia libraries.
  • GR is a front end to the C library GR. It’s actually used as the default backend of Plots.jl. Many more people use it from Plots.jl than directly due to the integrations and docs, but it is nice for some things on its own.
    • Pros: It’s fast, scales fairly well, has a fast startup time, has a nice GUI for investigating results, integrates well with iTerm, and is very flexible.
    • Cons: Its docs are a bit difficult, and it doesn’t have any integrations with Julia libraries.
  • PGFPlotsX.jl is a front-end to generate plots for LaTeX.
    • Pros: Fast startup, output to LaTeX which makes it easy to then further modify in publication documents.
    • Cons: Its interface is wonky, even if you are familiar with the pgfplots LaTeX package. This makes it quite hard to use and teach. Very few integrations with Julia libraries (Measurements and Colors only?). Lacking flexibility in terms of animations and making apps, though it’s quite flexible in its ability to modify the plots and make weird things.
  • UnicodePlots.jl is very simple, fast startup, and plots to text. Its downside of course is that text is the only output it has.
  • Gaston.jl is a front-end to gnuplot.
    • Pros: Fast startup.
    • Cons: Pretty basic, lacking flexibility and integrations with Julia packages. Requires gnuplot so limitations on where it can be installed (only supports linux?).
  • GMT.jl is “generic mapping tools”. It has some plotting tools highlighted here.
    • Pros: Has good examples in the docs. Nice extra tools for maps.
    • Cons: Missing some standard plot types like trisurf, missing integrations with other Julia packages.
  • GNUPlot.jl uses gnuplot under the hood.
    • Pros: Instant startup, has some interesting data science integrations for things like named datasets, very complete set of plots
    • Cons: Not the most complete documentation, requires gnuplot so limitations on where it can be installed (only supports linux?)

tl;dr on plotting in Julia

Plots.jl is the most used package in the Julia programming language for a reason. It’s very flexible, integrates with the most Julia packages so you’ll find it all throughout other docs, and it has many of the advantages of the other libraries through its backend system. Thus if you need LaTeX output, use the PGFPlots backend. If you need a webpage, use the Plotly backend. Use the UnicodePlots backend when you want text output, or the GR default for the basics. With Julia v1.9 its startup time is much improved (and it’s like sub second on v1.10 beta), which was the major complaint about it before. If you’re going to use one plotting library and don’t care too much about every little detail, then Plots.jl is a good one to go with. It’s definitely not the best in any of the cases (animations are better in Makie, LaTeX is better in PGFPlotsX, etc.), but it’s capable everywhere.
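For a rough idea of what the backend system looks like in practice, here’s a minimal sketch. It assumes Plots.jl is installed; the commented-out backends additionally need their own packages available, so treat those lines as pointers rather than something guaranteed to run on your setup.

```julia
using Plots

x = range(0, 2π; length = 200)

gr()                                     # select the GR backend (the default)
p = plot(x, sin.(x); label = "sin(x)", xlabel = "x", ylabel = "y")
plot!(p, x, cos.(x); label = "cos(x)")   # add a second series to the same plot

# Other backends are selected the same way, provided their packages are installed:
# plotly()        # interactive HTML output
# pgfplotsx()     # LaTeX output
# unicodeplots()  # text output
```

The plotting calls stay the same across backends, which is the point: you switch the output format without rewriting the plot.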

Makie.jl is catching up and may be the default in the near future. It scales well and it’s getting all of the niceties of Plots.jl. I wouldn’t learn it first if you’re new to Julia (right now, though that will likely change by 2024). But if you need animations or want to add custom buttons to a window (make a quick GUI-like thing), Makie is unmatched. If it makes its standard plotting interface a bit simpler, gets a few more integrations, and thus matches Plots.jl in simplicity, it may hit a “best of most worlds” soon.

Otherwise it’s a bit domain specific. If you were using Plots.jl and needed more flexibility for publication-quality plots, PGFPlotsX.jl can help. Or if you prefer grammar of graphics, AlgebraOfGraphics.jl is good. If you’re a stats person you may find Gadfly or VegaLite familiar, though I wouldn’t recommend them first because these don’t satisfy general user needs (try making a plot of an FEM output and see what I mean).

All of these are pretty good. You have a lot of options. In the end, pick the one that suits your needs best.

The post Summary of Julia Plotting Packages appeared first on Stochastic Lifestyle.