Tag Archives: Data Visualization

JuliaCon 2015: Everyday Analytics and Visualization (video)

Re-posted from: http://randyzwitch.com/juliacon-2015-everyday-analytics-and-visualization-video/

At long last, here’s the video of my presentation from JuliaCon 2015, discussing common analytics tasks and visualization. This is really two talks: the first is an example of using the Citi Bike NYC API to analyze ridership of their public bike program, and the second is a discussion of the Vega.jl package.

Speaking at JuliaCon 2015 at MIT CSAIL is the professional highlight of my year; hopefully even more of you will attend next year.

Enjoy!

Edit: For those of you who would like to follow along using the actual presentation code, it is available on GitHub.

Vega.jl Rebooted – Now with 100% More Pie and Donut Charts!

By: randyzwitch - Articles

Re-posted from: http://randyzwitch.com/vega-jl-julia/

Mmmmm, chartjunk!

Rebooting Vega.jl

Recently, I’ve found myself without a project to hack on, and I’ve always been interested in learning more about browser-based visualization. So I decided to revive the work that John Myles White had done in building Vega.jl nearly two years ago. And since I’ll be giving an analytics & visualization workshop at JuliaCon 2015, I figure I’d better study the topic in a bit more depth.

Back In Working Order!

The first thing I tackled here was to upgrade the syntax to target v0.4 of Julia. This is just my developer preference, to avoid using Compat.jl when there are so many more visualizations I’d like to support. So if you’re using v0.4, you shouldn’t see any deprecation errors; if you’re using v0.3, well, eventually you’ll use v0.4!

Additionally, I modified the package to recognize the traction that Jupyter Notebook has gained in the community. Whereas the original version of Vega.jl only displayed output in a tab in a browser, I’ve overloaded the writemime method to display :VegaVisualization inline for any environment that can display HTML. If you use Vega.jl from the REPL, you’ll still get the same default browser-opening behavior as existed before.
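As a rough illustration of the inline-display mechanism described above, the sketch below defines a rich-display method for a made-up type. It is not Vega.jl's actual code: `DemoVisualization` and its `html` field are hypothetical, and the sketch uses the `Base.show(io, ::MIME"text/html", x)` form from later Julia releases, which replaced `writemime` after v0.4.

```julia
# Hypothetical stand-in for a visualization type; not Vega.jl's real type.
struct DemoVisualization
    html::String   # pre-rendered HTML fragment for the chart
end

# Environments that can display HTML (such as Jupyter) look for this method.
# On Julia v0.4 the equivalent hook was `writemime`.
function Base.show(io::IO, ::MIME"text/html", v::DemoVisualization)
    print(io, v.html)
end

v = DemoVisualization("<div id=\"chart\">chart goes here</div>")

# Render to a string, roughly the way a notebook front end would request it.
rendered = sprint(show, MIME("text/html"), v)
println(rendered)
```

Because the method is attached to the MIME type rather than to a particular front end, any environment that asks for `text/html` gets the inline output, while a plain REPL falls back to whatever default behavior the package provides.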

The First Visualization You Added Was A Pie Chart…

…And Followed With a Donut Chart?

Yup. I’m a troll like that. Besides, being loudly against pie charts is blowhardy (even if studies have shown that people are too stupid to evaluate them).

Adding these two charts (besides trolling) was a proof of concept that I understood the codebase well enough to extend the package. Now that the syntax works on Julia v0.4, I understand how the package works (important!), and the workflow supports Jupyter Notebook, I plan to create all of the visualizations featured in the Trifacta Vega Editor, plus other standard visualizations such as boxplots. If the community has requests for the order of implementation, I’ll try to accommodate them. Just add a feature request on Vega.jl GitHub issues.

Why Not Gadfly? You’re Not Starting A Language War, Are You?

No, I’m not that big of a troll. Besides, I don’t think we’ve squeezed all the juice (blood?!) out of the R vs. Python infographic yet; we don’t need another pointless debate.

My sole reason for not improving Gadfly is just that I plain don’t understand how the codebase works! There are many amazing computer scientists & developers in the Julia community, and I’m not really one of them. I do, however, understand how to generate JSON strings and in that sense, Vega is the perfect platform for me to contribute.
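To give a sense of what "generating JSON strings" means in practice, the following sketch assembles a small Vega-style data block by hand. It is an illustration under assumptions, not code from Vega.jl: the function name `vega_data_json`, the data set name "table", and the field names `category` and `value` are all made up, and a real package would more likely rely on a JSON library.

```julia
# Hypothetical helper: turn paired labels and values into a JSON string shaped
# like one entry of a Vega "data" array ({"name": ..., "values": [...]}).
# Labels are assumed to contain no characters that need JSON escaping.
function vega_data_json(labels::Vector{String}, values::Vector{<:Real})
    length(labels) == length(values) || error("labels and values must match in length")
    rows = ["{\"category\": \"$(l)\", \"value\": $(v)}" for (l, v) in zip(labels, values)]
    return "{\"name\": \"table\", \"values\": [" * join(rows, ", ") * "]}"
end

spec_fragment = vega_data_json(["A", "B", "C"], [30, 50, 20])
println(spec_fragment)
```

The rest of a chart specification (scales, marks, and so on) is more JSON of the same kind, which is why string generation is enough to drive the library.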

Collaborators Wanted!

If you’re interested in visualization, as well as learning Julia and/or contributing to a package, Vega.jl might be a good place to start. I’m always up for collaborating with people, and creating new visualizations isn’t that difficult (especially with the Trifacta examples). So hopefully some of you will be interested enough to join me in adding one more great visualization library to the Julia community.

Visualizing Analytics Languages With VennEuler.jl

By: randyzwitch - Articles

Re-posted from: http://randyzwitch.com/visualizing-analytics-languages-venneuler-jl/

It often doesn’t take much to get me off track, and on a holiday weekend…well, I was just begging for a fun way to shirk. Enter Harlan Harris:

someone redo this area-prop’l Venn w/ my Julia pkg! http://t.co/Mh8rXZbRgY http://t.co/RDWNQHTw3S http://t.co/ljujd9DG0T via @revodavid

— Harlan Harris (@HarlanH) August 29, 2014

Hey, I’m someone looking for something to do! And I like writing Julia code! So let’s have a look at recreating this diagram in Julia using VennEuler.jl (IJulia Notebook link):

Source: Revolution R/KDNuggets

http://blog.revolutionanalytics.com/2014/08/r-tops-kdnuggets-data-analysis-software-poll-for-4th-consecutive-year.html

Installing VennEuler.jl

Because VennEuler.jl is not in METADATA as of the time of writing, instead of using Pkg.add() you’ll need to run:

 Pkg.clone("https://github.com/HarlanH/VennEuler.jl.git")

Note that VennEuler uses some of the more exotic packages (at least to me) like NLopt and Cairo, so you might need to have a few additional dependencies installed with the package.

Data

The data was a bit confusing to me at first, since the percentages add up to more than 100% (people could vote multiple times). In order to create a dataset to use, I took the percentages, multiplied by 1000, then re-created the voting pattern. The data for the graph can be downloaded from this link.

Code – Circles

With a few modifications, I basically re-purposed Harlan’s code from the package test files. The circle result is as follows:

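	# Circles
	using VennEuler

	# `labels` and `data` are assumed to be loaded already from the downloaded dataset
	eo = make_euler_object(labels, data, EulerSpec()) # circles, for now

	# Optimize the layout, starting from a random state
	(minf,minx,ret) = optimize(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=120, pop=1000)
	println("got $minf at $minx (returned $ret)")

	# Write the resulting diagram to an SVG file
	render("/home/rzwitch/Desktop/kd.svg", eo, minx)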

Since the percentages of R, SAS, and Python users aren’t dramatically different (49.81%, 33.42%, and 40.97% respectively) and the shapes are circles, it’s a bit hard to tell that R is about 16 percentage points higher than SAS and 9 percentage points higher than Python.
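Part of the difficulty with circles is plain geometry: area grows with the square of the radius, so a gap in area looks smaller when judged by width. Here is a quick check using the three percentages quoted above, assuming each circle's area is proportional to its percentage (the variable names are illustrative only):

```julia
# Shares quoted in the text, in percent.
shares = Dict("R" => 49.81, "SAS" => 33.42, "Python" => 40.97)

# Gaps relative to R, in percentage points.
gap_sas = shares["R"] - shares["SAS"]
gap_python = shares["R"] - shares["Python"]

# For area-proportional circles, the radius scales with the square root of the area.
radius_ratio_sas = sqrt(shares["R"] / shares["SAS"])
radius_ratio_python = sqrt(shares["R"] / shares["Python"])

println("R vs SAS: ", round(gap_sas, digits=2), " points, radius ratio ", round(radius_ratio_sas, digits=2))
println("R vs Python: ", round(gap_python, digits=2), " points, radius ratio ", round(radius_ratio_python, digits=2))
```

R's circle has roughly 49% more area than SAS's, yet it is only about 22% wider, which helps explain why the difference is hard to judge by eye.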

Code – Rectangles

Alternatively, we can use rectangles to represent the areas:

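	# Rectangles
	# The second spec passes explicit extra arguments; the other two use the defaults
	eo = make_euler_object(labels, data, [EulerSpec(:rectangle), EulerSpec(:rectangle, [.5, .5, .4], [0, 0, 0]),
	EulerSpec(:rectangle)],
	sizesum=.3)

	# Phase 1: short iterative pass from a random state to get a rough layout
	(minf,minx,ret) = optimize_iteratively(eo, random_state(eo), ftol=-1, xtol=0.0025, maxtime=5, pop=100)
	println("phase 1: got $minf at $minx (returned $ret)")
	# Phase 2: refine the phase 1 result with a tighter tolerance and a longer time limit
	(minf,minx,ret) = optimize(eo, minx, ftol=-1, xtol=0.001, maxtime=30, pop=100)
	println("phase 2: got $minf at $minx (returned $ret)")

	# Write the resulting diagram to an SVG file
	render("/home/rzwitch/Desktop/kd-rects.svg", eo, minx)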

Here, it’s slightly easier to see that SAS and Python are about the same area-wise and that R is larger, although the different dimensions do obscure this fact a bit.

Summary

If I spent more time with this package, I’m sure I could make something even more aesthetically pleasing. And for that matter, it’s still a pre-production package that will no doubt get better in the future. But at the very least, there is a way to create an area-accurate representation of relationships using VennEuler.jl in Julia.

juliabloggers.com

A Julia Language Blog Aggregator