My first Twitch live streaming session

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/05/12/twitch.html

Introduction

In 24 hours I will have my first Twitch live streaming session. It will begin on
Friday, May 13, at 7 PM EDT on the ManningPublications channel.

In this post I want to share the source material I am going to present so that
everyone interested can easily follow it.

The code is a shortened version of the contents of chapters 8 and 9 of my
upcoming Julia for Data Analysis book.

Environment setup

I will run the code under Julia 1.7.2. You will need to install the following
packages (I list the versions that I use):

  • CSV.jl 0.10.4
  • CodecBzip2.jl 0.7.2
  • DataFrames.jl 1.3.4
  • Loess.jl 0.5.4
  • Plots.jl 1.28.1
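
One possible way to install exactly these versions is sketched below. It
assumes you want a separate project environment for the session; the folder
name "twitch_session" is only an illustrative choice and is not part of the
original material.

```julia
import Pkg

# Create and activate a dedicated project environment
# (the folder name "twitch_session" is a hypothetical choice).
Pkg.activate("twitch_session")

# Install the packages pinned to the versions listed above.
Pkg.add([
    Pkg.PackageSpec(name="CSV", version="0.10.4"),
    Pkg.PackageSpec(name="CodecBzip2", version="0.7.2"),
    Pkg.PackageSpec(name="DataFrames", version="1.3.4"),
    Pkg.PackageSpec(name="Loess", version="0.5.4"),
    Pkg.PackageSpec(name="Plots", version="1.28.1"),
])
```

Pinning versions this way should make it more likely that the code behaves the
same for you as it does in the session, although newer versions may well work
too.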

The problem

In the session I will analyze the Lichess puzzles database. It contains
information about over 2,000,000 puzzles, including the number of times
a given puzzle was played, how hard the puzzle is, how much Lichess users like
the puzzle, and what chess themes the puzzle features. My goal is to check the
relationship between puzzle difficulty and how much users like the puzzle.

Source code

Here is the source code that I am going to present and explain during the
session.

I will start with fetching the data from the internet, unpacking it, and reading
it into a data frame:

import Downloads
Downloads.download("https://database.lichess.org/lichess_db_puzzle.csv.bz2",
                   "puzzles.csv.bz2")

using CodecBzip2
compressed = read("puzzles.csv.bz2")
plain = transcode(Bzip2Decompressor, compressed)

using CSV
using DataFrames
puzzles = CSV.read(plain, DataFrame;
                   header=["PuzzleId", "FEN", "Moves", "Rating",
                           "RatingDeviation", "Popularity", "NbPlays",
                           "Themes", "GameUrl"])

describe(puzzles)

Next, I will perform exploratory data analysis of the database and subset it
to keep only the puzzles that I want to analyze later:

using Plots
plot([histogram(puzzles[!, col]; label=col) for
      col in ["Rating", "RatingDeviation", "Popularity", "NbPlays"]]...)

using Statistics
plays_lo = median(puzzles.NbPlays)
rating_lo = 1500
rating_hi = quantile(puzzles.Rating, 0.99)
row_selector = (puzzles.NbPlays .> plays_lo) .&&
               (rating_lo .< puzzles.Rating .< rating_hi)

sum(row_selector)
count(row_selector)
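
To make the row selection step easier to follow, here is a minimal sketch of
the same logic on a few made-up values. The vectors nb_plays and rating are
hypothetical stand-ins for puzzles.NbPlays and puzzles.Rating, and the upper
rating bound is a fixed number here instead of a quantile.

```julia
using Statistics

# Made-up values standing in for puzzles.NbPlays and puzzles.Rating.
nb_plays = [10, 200, 35, 500, 80]
rating = [1400, 1600, 1750, 2900, 1550]

plays_lo = median(nb_plays)
rating_lo = 1500
rating_hi = 2500  # fixed cutoff used for illustration instead of a quantile

# Both conditions are evaluated elementwise and combined with broadcasted &&.
row_selector = (nb_plays .> plays_lo) .&&
               (rating_lo .< rating .< rating_hi)

# true counts as 1 in a sum, so both calls give the number of selected rows.
n_selected_sum = sum(row_selector)
n_selected_count = count(row_selector)
```

Only rows for which both conditions hold are selected, which is why sum and
count return the same number.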

good = puzzles[row_selector, ["Rating", "Popularity"]]

plot(histogram(good.Rating; label="Rating"),
     histogram(good.Popularity; label="Popularity"))

describe(good)

Finally, I will aggregate the data stored in the Lichess database and analyze
the relationship between puzzle difficulty and popularity:

grouped_good = groupby(good, :Rating, sort=true)
agg_good = combine(grouped_good, :Popularity => mean)
scatter(agg_good.Rating, agg_good.Popularity_mean;
        xlabel="rating", ylabel="mean popularity", legend=false)

using Loess
model = loess(agg_good.Rating, agg_good.Popularity_mean)
agg_good.pred = predict(model, float.(agg_good.Rating))
plot!(agg_good.Rating, agg_good.pred; width=5)
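
To illustrate what the groupby and combine calls above compute, here is a
minimal sketch in plain Julia on made-up data. It is not how DataFrames.jl
implements the operation; the names ratings, popularity, groups, and agg are
hypothetical and serve only this illustration.

```julia
using Statistics

# Made-up (rating, popularity) values standing in for the rows of `good`.
ratings = [1600, 1500, 1600, 1700, 1500]
popularity = [90, 80, 70, 100, 60]

# Collect the popularity values observed for each distinct rating.
groups = Dict{Int, Vector{Int}}()
for (r, p) in zip(ratings, popularity)
    push!(get!(() -> Int[], groups, r), p)
end

# One entry per rating, sorted by rating, holding the group's mean popularity.
agg = [(rating=r, popularity_mean=mean(groups[r]))
       for r in sort(collect(keys(groups)))]
```

Each distinct rating ends up with a single mean popularity value, which is
what the scatter plot and the LOESS curve are then drawn from.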

Conclusions

I invite everyone to join me during the Twitch live streaming session.
If you have any questions, please do not hesitate to ask them in the chat and I
will try to answer them live. I hope you will enjoy it!