Author Archives: Navi

World Happiness Report – EDA & clustering with Julia

By: Navi

Re-posted from: https://indymnv.dev/posts/005_happines/index.html

World Happiness Report – EDA & clustering with Julia

Date: 2023-11-23

Summary: An exploration of Happiness Report using Julia

tags: #Julia #economy #clustering #EDA

Introduction
Packages used
Clustering
Conclusions

Introduction

The purpose of this post is to show Julia as a language for data analysis and Machine Learning. Sadly Kaggle does not support Julia Kernels (hopefully, they will add it in the future). Therefore I wanted to take advantage of this space to show a reimplementation of Python/R Notebooks to Julia. In this context, I took data on happiness in countries in 2021 and some factors considered in this exciting survey.

You can get the dataset in Kaggle
The full code is in my Github

Packages used

I'm using Julia version 1.8.0 in this project, and the library versions are in the Project.toml, there are some installed that I didn't end up using for this analysis, but these are the important ones

using DataFrames
using DataFramesMeta
using CSV
using Plots
using StatsPlots
using Statistics
using HypothesisTests
Plots.theme(:ggplot2)

Let's start reading the file.

df_2021 = DataFrame(CSV.File("./data/2021.csv", normalizenames=true))

You can see the dataset in the REPL.

julia> df_2021 = DataFrame(CSV.File("./data/2021.csv", normalizenames=true))
149×20 DataFrame
 Row │ Country_name    Regional_indicator            Ladder_score  Standard_error_of_ladder_score  upperwhi ⋯
     │ String31        String                        Float64       Float64                         Float64  ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Finland         Western Europe                       7.842                           0.032         7 ⋯
   2 │ Denmark         Western Europe                       7.62                            0.035         7
   3 │ Switzerland     Western Europe                       7.571                           0.036         7
   4 │ Iceland         Western Europe                       7.554                           0.059         7
   5 │ Netherlands     Western Europe                       7.464                           0.027         7 ⋯
   6 │ Norway          Western Europe                       7.392                           0.035         7
   7 │ Sweden          Western Europe                       7.363                           0.036         7
   8 │ Luxembourg      Western Europe                       7.324                           0.037         7
   9 │ New Zealand     North America and ANZ                7.277                           0.04          7 ⋯
  10 │ Austria         Western Europe                       7.268                           0.036         7
  11 │ Australia       North America and ANZ                7.183                           0.041         7
  12 │ Israel          Middle East and North Africa         7.157                           0.034         7
  13 │ Germany         Western Europe                       7.155                           0.04          7 ⋯
  14 │ Canada          North America and ANZ                7.103                           0.042         7
  ⋮  │       ⋮                      ⋮                     ⋮                      ⋮                      ⋮   ⋱
 136 │ Togo            Sub-Saharan Africa                   4.107                           0.077         4
 137 │ Zambia          Sub-Saharan Africa                   4.073                           0.069         4
 138 │ Sierra Leone    Sub-Saharan Africa                   3.849                           0.077         4 ⋯
 139 │ India           South Asia                           3.819                           0.026         3
 140 │ Burundi         Sub-Saharan Africa                   3.775                           0.107         3
 141 │ Yemen           Middle East and North Africa         3.658                           0.07          3
 142 │ Tanzania        Sub-Saharan Africa                   3.623                           0.071         3 ⋯
 143 │ Haiti           Latin America and Caribbean          3.615                           0.173         3
 144 │ Malawi          Sub-Saharan Africa                   3.6                             0.092         3
 145 │ Lesotho         Sub-Saharan Africa                   3.512                           0.12          3
 146 │ Botswana        Sub-Saharan Africa                   3.467                           0.074         3 ⋯
 147 │ Rwanda          Sub-Saharan Africa                   3.415                           0.068         3
 148 │ Zimbabwe        Sub-Saharan Africa                   3.145                           0.058         3
 149 │ Afghanistan     South Asia                           2.523                           0.038         2

To see the columns name, simply use

names(df_2021)

getting a vector with all column names

julia> names(df_2021)
20-element Vector{String}:
 "Country_name"
 "Regional_indicator"
 "Ladder_score"
 "Standard_error_of_ladder_score"
 "upperwhisker"
 "lowerwhisker"
 "Logged_GDP_per_capita"
 "Social_support"
 "Healthy_life_expectancy"
 "Freedom_to_make_life_choices"
 "Generosity"
 "Perceptions_of_corruption"
 "Ladder_score_in_Dystopia"
 "Explained_by_Log_GDP_per_capita"
 "Explained_by_Social_support"
 "Explained_by_Healthy_life_expectancy"
 "Explained_by_Freedom_to_make_life_choices"
 "Explained_by_Generosity"
 "Explained_by_Perceptions_of_corruption"
 "Dystopia_residual"

To see what is a regional indicator, we can see how every country is grouped.

julia> unique(df_2021.Regional_indicator)
10-element Vector{String}:
 "Western Europe"
 "North America and ANZ"
 "Middle East and North Africa"
 "Latin America and Caribbean"
 "Central and Eastern Europe"
 "East Asia"
 "Southeast Asia"
 "Commonwealth of Independent States"
 "Sub-Saharan Africa"
 "South Asia"

Let's do a simple operation with the dataframe getting the number of countries by regional indicator and sorting those

sort(
    combine(groupby(df_2021, :Regional_indicator), nrow), 
    :nrow
)

Getting this output

julia> sort(
           combine(groupby(df_2021, :Regional_indicator), nrow),
           :nrow
       )
10×2 DataFrame
 Row │ Regional_indicator                 nrow
     │ String                             Int64
─────┼──────────────────────────────────────────
   1 │ North America and ANZ                  4
   2 │ East Asia                              6
   3 │ South Asia                             7
   4 │ Southeast Asia                         9
   5 │ Commonwealth of Independent Stat…     12
   6 │ Middle East and North Africa          17
   7 │ Central and Eastern Europe            17
   8 │ Latin America and Caribbean           20
   9 │ Western Europe                        21
  10 │ Sub-Saharan Africa                    36

With this, we can see a more significant number of countries in Sub-Saharan Africa and only a smaller group of countries in North America and ANZ.

Now, let's try to slice our data. We will create a data frame called float_df that contains only the Float64 variables but excludes the "explained_" variables. This new dataframe will help us with some operations later.

#Get all columns Float64
float_df = select(df_2021, findall(col -> eltype(col) <: Float64, eachcol(df_2021)))#Take away the Explained variables
float_df = float_df[:,Not(names(select(float_df, r"Explained")))]

Let's make our first plot.

scatter(
    df_2021.Social_support,
    df_2021.Ladder_score,
    size = (1000,800),
    label="country",
    xaxis = "Social Support",
    yaxis = "Ladder Score",
    title = "Relation between Social Support and Happiness Index Score by country"
)

![scatterplot with ladder score and social support](/assets/005_happines/scatterplot.png)

If we want a view of all float variables in several histograms, we can add this code using Statsplots.

N = ncol(float_df)
numerical_cols = Symbol.(names(float_df,Real))
@df float_df Plots.histogram(cols();
                             layout=N,
                             size=(1400,800),
                             title=permutedims(numerical_cols),
                             label = false)

Histogram of all variables

And If we want to compare it with boxplots.

@df float_df boxplot(cols(), 
                     fillalpha=0.75, 
                     linewidth=2,
                     title = "Comparing distribution for all variables in dataset",
                     legend = :topleft)

Boxplot all variables

Without going into so much detail, we can affirm that the Ladder Score is the variable related to the result of the survey on the degree of happiness in the country (our dependent variable). Explained variables correspond to the preprocessing to build the Ladder Score, for this reason, we remove them from the dataframe and will hold with only the raw data.

What are the top 5 countries and bottom 5?

# Top 5 and bottom 5 countries by ladder score
sort!(df_2021, :Ladder_score, rev=true)
plot(
    bar(
        first(df_2021.Country_name, 5 ),
        first(df_2021.Ladder_score, 5 ),
        color= "green",
        title = "Top 5 countries by Happiness score",
        legend = false,
    ),
    bar(
        last(df_2021.Country_name, 5 ),
        last(df_2021.Ladder_score, 5 ),
        color ="red",
        title = "Bottom 5 countries by Happiness score",
        legend = false,
    ),
size=(1000,800),
yaxis = "Happines Score",
)

top5 and bottom 5

And the classic heatmap for correlation with the following function.

function heatmap_cor(df)
    cm = cor(Matrix(df))
    cols = Symbol.(names(df))    (n,m) = size(cm)
    display(
    heatmap(cm, 
        fc = cgrad([:white,:dodgerblue4]),
        xticks = (1:m,cols),
        xrot= 90,
        size= (800, 800),
        yticks = (1:m,cols),
        yflip=true))
    display(
    annotate!([(j, i, text(round(cm[i,j],digits=3),
                       8,"Computer Modern",:black))
           for i in 1:n for j in 1:m])
    )
end

heatmap

And now, we can build a function where we can get the mean ladder score by regional indicator and compare it with the distribution of all countries.

function distribution_plot(df)
    display(
        @df df density(:Ladder_score,
        legend = :topleft, size=(1000,800) , 
        fill=(0, .3,:yellow),
        label="Distribution" ,
        xaxis="Happiness Index Score", 
        yaxis ="Density", 
        title ="Comparison Happiness Index Score by Region 2021") 
    )
    display(
        plot!([mean(df_2021.Ladder_score)],
        seriestype="vline",
        line = (:dash), 
        lw = 3,
        label="Mean")
    )
    for element in unique(df_2021.Regional_indicator)
        display(
            plot!(
            [mean(mean([filter(row->row["Regional_indicator"]==element, df).Ladder_score]))],
            seriestype="vline",
            lw = 3,
            label="$element") 
        )
    end
end

distribution region

Suppose we want to try the same idea but with countries. In that case, we can take advantage of multiple dispatch and create a function that receives a list of countries and creates a variation of the distribution with countries.

function distribution_plot(df, var_filter, list_elements)
    display(
        @df df density(:Ladder_score,
        legend = :topleft, size=(1000,800) , 
        fill=(0, .3,:yellow),
        label="Distribution" ,
        xaxis="Happiness Index Score", 
        yaxis ="Density", 
        title ="Happiness index score compare by countries 2021") 
    )
    display(
        plot!([mean(df_2021.Ladder_score)],
        seriestype="vline",
        line = (:dash), 
        lw = 3,
        label="Mean")
    )
    for element in list_elements
        display(
            plot!(
            mean([filter(row->row[var_filter]==element, df).Ladder_score]),
            seriestype="vline",
            lw = 3,
            label="$element") 
        )
    end
end

Let's test our new function, comparing three countries.

distribution_plot(df_2021, "Country_name", ["Chile",
                                            "United States",
                                            "Japan",
                                           ])

distribution countries

Here we can see how the USA has the highest score, followed by Chile and Japan.

To end the first part, let's apply some statistical tests. We will use an equal variance T-test to compare distribution from different regions. The function is as follows.

# Perform a simple test to compare distributions
# This function performs a two-sample t-test of the null hypothesis that s1 and s2 
# come from distributions with equal means and variances 
# against the alternative hypothesis that the distributions have different means 
# but equal variances.
function t_test_sample(df, var, x , y)
    x = filter(row ->row[var] == x, df).Ladder_score
    y = filter(row ->row[var] == y, df).Ladder_score
    EqualVarianceTTest(vec(x), vec(y))
end

We will have this output if we compare Western Europe and North America and ANZ.

t_test_sample(df_2021, "Regional_indicator", "Western Europe", "North America and ANZ")

julia> t_test_sample(df_2021, "Regional_indicator", "Western Europe", "North America and ANZ")
Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -0.213595
    95% confidence interval: (-0.9068, 0.4796)Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.5301Details:
    number of observations:   [21,4]
    t-statistic:              -0.6374218416101513
    degrees of freedom:       23
    empirical standard error: 0.3350924366753546

We don't have enough evidence to reject the hypothesis that these samples come from distributions with equal means and variance. On another side, if we try comparing Western Europe with South Asia, we can see this:

julia> t_test_sample(df_2021, "Regional_indicator", "South Asia", "Western Europe")
Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -2.47305
    95% confidence interval: (-3.144, -1.802)Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-07Details:
    number of observations:   [7,21]
    t-statistic:              -7.576776118465833
    degrees of freedom:       26
    empirical standard error: 0.32639840222022687

In this case, we can reject that hypothesis.

Clustering

Now we will cluster the countries using the popular algorithm Kmeans. My first option was to use clustering.jl. However, determining the ideal number of clusters is necessary to get the Wcss (within-cluster sum of the square). With this, we can evaluate it with the elbow method, so I used Scikit-learn wrapper. I also include an issue. Well, let's continue with the last part. I started adding some libraries.

using Random
using ScikitLearn
using PyCall@sk_import preprocessing: StandardScaler
@sk_import cluster: KMeans

Let's take out from the float_df all the variables related to Ladder_score, and keep only the variables considered in the survey.

select!(float_df, Not([:Standard_error_of_ladder_score, 
                           :Ladder_score, 
                           :Ladder_score_in_Dystopia, 
                           :Dystopia_residual]))

To train our model, we need to standardize the data, and then we will create a list to retrieve the wcss in every iteration. The function is as follows:

function kmeans_train(df)
    X = fit_transform!(StandardScaler(), Matrix(df))    wcss = []
    for n in 1:10        Random.seed!(123)
        cluster =KMeans(n_clusters=n,
                        init = "k-means++",
                        max_iter = 20,
                        n_init = 10,
                        random_state = 0)
        cluster.fit(X)
        push!(wcss, cluster.inertia_)
    end
    return wcss
end

Let's invoke the function and plot the wcss.

wcss = kmeans_train(float_df)plot(wcss, title = "wcss in each cluster",
    xaxis = "cluster",
   yaxis = "Wcss")

Elbow Method

In this case, I decided to go for three clusters. We can <del>abuse</del> make use of multiple dispatch again, adding n for a defined number of clusters.

function kmeans_train(df, n)
    X = fit_transform!(StandardScaler(), Matrix(df))    Random.seed!(123)
    cluster =KMeans(n_clusters=n,
                    init = "k-means++",
                    max_iter = 20,
                    n_init = 10,
                    random_state = 0)
    cluster.fit(X)
    return cluster
endcluster= kmeans_train(float_df, 3)

If we take the first plot we did at the beginning of the post, but now we add the cluster labels, we have this plot.

scatter(df.Social_support,
        df.Ladder_score,
        marker_z = cluster.labels_,
        legend = false,
        size = (1000,800),
        xaxis = "Social Support",
        yaxis = "Ladder Score",
        title = "Comparison between social support and ladder score by country incorporating clustering")

Scatter with cluster

With these clusters, we have a group with developed countries with the highest happiness index score. For example, Finland, Australia and Germany, followed by a group of emerging countries. Finally, countries that still have a significant debt for the well-being of their population.

histogram(filter(row ->row.cluster ==1,df).Ladder_score, label = "cluster 1", title = "Distribution of Happiness Score by Cluster", xaxis = "Ladder Score", yaxis="n° countries")
histogram!(filter(row ->row.cluster ==2,df).Ladder_score, label = "cluster 2")
histogram!(filter(row ->row.cluster ==3,df).Ladder_score, label = "cluster 3")

histogram happiness cluster

Finally, we can compare how this cluster affects all the variables.

@df float_df Plots.density(cols();
                             layout=N,
                             size=(1600,1200),
                             title=permutedims(numerical_cols),
                             group = df.cluster,
                             label = false)

Distribution by variables with cluster

Conclusions

From my experience using Python for about two years in data analysis and recently dabbling with Julia, I can say that the ecosystem generally seems quite mature for this purpose. I had some questions that the community immediately answered on Julia Discourse. More content like this is needed so that the data science community can more widely adopt this technology.

Creating your own blog with Julia and Franklin

By: Navi

Re-posted from: https://indymnv.dev/posts/006_build_blog/index.html

Creating your own blog with Julia and Franklin

Date: 2023-08-10

Summary: Describing the steps to create your own blog, so you can stop posting your code on Tinder

tags: #Julia #Writing #WebDevelopment #Franklin

Introduction
Some Reasons to Create Your Own Blog
Installation
First Steps
1. Selecting a template
2. Cleaning the template
3. Creating your first post
Deployment
1. Hosting in a different domain (optional)
RSS and Tags
1. Host your Feed to JuliaBloggers (optional)
Conclusions
Acknowledgment

Introduction

In this post, we are going to discuss how to build your own blog with Julia and Franklin.jl, a popular static site generator among Julia users who create their own blogs or even build websites for tutorials. I hope that if you are reading this entry and you don't have your own space, it can motivate you to build your own website.

Some Reasons to Create Your Own Blog

Blogs may sound old-fashioned, something created by people who are still living in the 90s, typing with passion about the political system while listening to Soundgarden in the background and drinking some kind of cheap beer… or programmers. And because if you are reading this content, you're probably at least the second one, you should consider that having a blog is a nice way to:

Track your progress in your field
Generate content that can be useful for somebody else
Help the open-source community with diffusion, tutorials, etc.
Create your own space and adapt it you your needs
Build your personal brand and help you to find a job

But why Franklin? Franklin is one of the most popular libraries for this purpose in Julia. It offers seamless integration with running Julia scripts so you can use julia for demostrations in your blog this coud be harder with other static site generators. If you only want to create basic entries with some code and images, perhaps Franklin.jl might not be that different from Hugo or Jekyll.

Installation

The first step is to create a folder where you will save your project. Once you are ready, open the Julia REPL in the location where the folder should be. When it's ready, type ] to activate the package manager and then type:

(@v1.9) pkg> add Franklin

then, return to the Julia Repl and import the library:

julia> using Franklin

Remember to make sure you have successfully installed the Franklin library before trying to import it.

First Steps

To create your website, you can choose one of the templates available. In my case, I just used the basic one, but if you have a different preference, feel free to go ahead; they all follow similar structures. You can also import another template that you like more and adapt it to your website. Please read the documentation for instructions on how to do this.

Selecting a template

Once you have decided your template, type in the REPL the next instruction

julia> newsite("myBlog", template="basic") #you can choose another name and template

This will create a folder with various directories and elements. It will also activate the environment inside the project. So, if you verify the project with ], it should display the name of your project.

.
├── 404.md            # Page for error 404
├── Manifest.toml     # The typical toml files for Julia development project
├── Project.toml
├── __site            # Generate your full website.
├── _assets           # You can add pictures and images here
├── _css              # All related to styling your website
├── _layout           # All related to the structure of your website
├── _libs             # Here will go all elements for website like katex, searchbar, etc  
├── _rss              # A couple of files related to rss feed, 
├── config.md         # Set Global variables for your website
├── index.md          # Main landing page
├── pages.md          # All your pages / you can create your folder or organize in different way
└── utils.jl          # Julia File for setting some configurations

Finally type:

julia> serve()

It should open your website locally in the browser, and it should look exactly the same as the template website you chose.

starting template

From this point, it's time to delete some files and content. You might also want to add some pages for your projects, about, contact, etc. This is up to you, but for now, we are going to keep just 2 pages: one for the main "about" page and another to host all your posts.

Cleaning the template

Now, go to the "index.md" page and delete all its content. This page will become your main page, and you can mix HTML and Markdown in this file to add whatever you want to it.

# Welcome to my blog
## I am using Franklin~~~
    <img src="/assets/rndimg.jpg" height="300" class="main-picture" >
    <p>
    <p>
~~~
This is an introductory message

You might have noticed that in our main page, there are four links to different pages. You can choose to keep those links or delete them all. However, for the purpose of creating a blog section, let's use one of those links. To do that, follow these steps:

Go to the "header.html" file located in the "layout" folder.
Modify the code in the "header.html" file to something like this:

<header>
<div class="blog-name"><a href="/"></a>Amazing Blog</div>
<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/menu1/">Blog</a></li>
  </ul>
  <img src="/assets/hamburger.svg" id="menu-icon">
</nav>
</header>

If you're looking to change the background color to something more interesting than white, now is the time to showcase your frontend skills. Follow these steps:

Navigate to the "franklin.css" file.
In the first block of code, add the background color that you prefer. For instance:

:root {
  --block-background: hsl(0, 0%, 94%);
  --output-background: hsl(0, 0%, 98%);
  --small: 14px;
  --normal: 19px;
  --text-color: hsv(0, 0%, 20%);
	background-color: aqua;
}

Finally, after making these modifications, the result should look something like this:

frontend

Creating your first post

Now, if you're ready to start your own blog, here's how you can set up the "posts" folder to add your articles, create a new folder named "posts" in the same root directory as your other folders. Is important to consider this things.

Inside the "posts" folder, you can add all your articles. You have the flexibility to use both Markdown files and HTML files for your articles.
If you're doing literate programming with tools like Pluto or Jupyter, you can export your notebooks to HTML format and place them in the "posts" folder. This way, anyone can easily view your data science projects.

For now, let's add a file called test1.md inside the posts folder and you can add some text

# This is a title in my first post
 
So I can write anything## Here is an introductionWe are going to write some code:using LinearAlgebra
a = [1, 2, 3, 3, 4, 5, 2, 2]
@show dot(a, a)
println(dot(a, a))

Then, go to the menu1.md file, erase the remaining content, and create a link to the test1.md file. This is as simple as:

* [This is the title of my blog](../posts/test1)

If you save it, and navigate to http://localhost:8000/posts/test1/, you should see your post displayed clearly. This page will include your "about" section and the space to write your blog content. Congratulations! You now have a basic understanding of how Franklin works and can make any further edits or modifications you desire.

If you wish to further style your website, please go ahead and customize it to your heart's content.

Deployment

Now it's time to host your website in some place. One of the most straightforward options is using GitHub. Here's how you can do it:

Create a Repository: Go to your GitHub account and create an empty repository. When entering the name of your project, you have two paths to choose from:

a. If this is a personal website or organization, the name of your project should be something like username.github.io.

b. You can create your own custom name for your project, like myblog.

If you're unsure which option to choose, I recommend going with option (a) because it's more straightforward. If you choose option (b), you'll need to define a prepath variable in your config.md with the name of that project. For instance: @def prepath = "myblog".

Upload Your Project: Now upload your project to GitHub, following the instructions in your repository.
Configure GitHub Pages: Once you've pushed your project, go to the Settings tab in your repository. Then navigate to GitHub Pages. In the Source dropdown, select gh-pages. If you see a message indicating success, your project is now live.
Check Your Website: You can now open your web browser and enter the link of your project, which would be username.github.io. If you can see your website, congratulations! Your blog is now live on the internet.

By following these steps, you've successfully hosted your Franklin-generated website on GitHub Pages. It's now accessible to anyone with the link, and you can share your content with the world.

Hosting in a different domain (optional)

If you're hesitant to share your GitHub username due to its lengthy or unconventional extension, or if you prefer a more professional-looking link, you might want to consider an alternative domain, such as .com or .dev. You can purchase a domain and link it to your website. For example, you can use services like Google Domains to find and purchase a domain that suits your preference.

Once you've found and acquired the domain you like, you can proceed to link it to your website. To do this, you need to configure the DNS settings. You can find detailed explanations about custom domains and GitHub Pages in the documentation. In a nutshell, follow these steps:

Go to Google Domains, select your domain, and navigate to the DNS section.
Configure the DNS records, as shown below:

dns_setup

After correctly setting up the DNS records, go to your GitHub project repository's settings, then navigate to Pages and enter your custom domain:

custom_domain

If everything is set up correctly, GitHub will confirm the configuration. In a few minutes, your website should become accessible via your new custom domain.

By following these steps, you'll be able to link a custom domain to your Franklin-generated website, providing a more personalized and professional web presence.

RSS and Tags

Now that your website is up and running, setting up an RSS feed is important for people who want to stay updated on your new articles without having to visit your website daily. Tools like Newsboat or Inoreader help users keep track of updates from various websites, making an RSS feed a valuable addition to your blog.

Thankfully, Franklin makes setting up an RSS feed quite simple. All you need to do is go to each page in your "posts" folder and add a small description within +++ brackets, like this:

+++
tags = ["Julia", "Writing"] rss_title = "Creating your own blog with Julia and Franklin"
rss_description = "Describing the steps to create your own blog, so you can stop posting your code on Instagram"
rss_pubdate = Date(2023, 8, 10) 
+++

The RSS fields you add will be included in the information extracted by platforms like Newsboat. From these applications, I can read the title, a brief description, and the publication date and all the content if it's available. Additionally, you'll notice a "tags" section. This is also important because it allows users to filter by topics. For example, if you write different blogs about topics ranging from Julia programming to analysis of Shakira's new songs, users can select the topics they're specifically interested in.

To share your blog's RSS feed, you'll need a URL like https://www.yourdomain.com/feed.xml. Make sure to prominently display this URL in your website so that readers can easily find and subscribe to your feed.

Host your Feed to JuliaBloggers (optional)

Lastly, if you're considering writing about Julia and want to contribute to the community, don't hesitate to share your work. Whether it's a calculator project, a website, a 2D game, or a cutting-edge machine learning algorithm, your contributions will help the Julia community grow and provide valuable insights for others to learn from.

Visit the JuliaBloggers Website and add your information. In the "Feed URL" field, you can use a URL similar to the first example you mentioned, like:

http://indymnv.dev/tag/julia/feed/

Once you've submitted this information, every time you publish a new post on your website, the community will be able to see it. If you want to test this process first, you can use an RSS reader like Newsboat or Inoreader to ensure that your updates are being picked up as expected.

Conclusions

I hope you enjoyed reading this article. If you haven't yet created your own website, I hope it serves as motivation to get started, whether you choose to use Franklin or another static site generator. Having your own online space to write about your interests and dive as deep as you like is a rewarding endeavor. Don't hesitate to embark on this journey and create a platform that showcases your passion and expertise. Happy blogging!

Acknowledgment

I also want to thank Thibaut Lienart, who is the main developer of Franklin. His work has been incredibly beneficial for the community.

set up NeoVim + Tmux for a Data Science Workflow with Julia

By: Navi

Re-posted from: https://indymnv.dev/posts/004_nvim/index.html

set up NeoVim + Tmux for a Data Science Workflow with Julia

Date: 2023-07-04

Summary: Notes to start walking with Neovim and Tmux in the Data Science World

tags: #Python #Julia #rstats #tmux #neovim #tooling

Introduction
Why start using Neovim and Tmux? My motivations
Instalation process required
Editing the init.lua and tmux.config
Some challenges for improving the workflow
Conclusions

Introduction

In this post, I will provide some notes on getting started with Neovim for a Data Science Workflow. This setup is not strictly related to Julia and can also be used with Python and R. The idea is to work with a double panel structure, where one side contains your code and the other side has the REPL, which receives the snippets of code you send from the code side.

In this blog, I will mention the things you can add to make it comfortable for data analysis or more serious development. With this typical kick starter in Neovim and Tmux, I will explain some changes and new packages that are important for this purpose. Finally, I will dive into the details that still need improvement.

Why start using Neovim and Tmux? My motivations

I must say, I like notebooks. I used them extensively in my first job in analytics, and they really helped me dive into the problem and experiment with different use cases. I have also worked with VSCode, although I am not a big fan of it, it has helped me in some specific use cases where a more "software engineering" perspective is needed.

With that in mind, when I read the book "Approaching Almost Any Machine Learning Problem" by Abhishek Thakur, and then watched the controversial and yet funny conference by Joel Grus on why he dislikes notebooks, I started to think deeply about the perspective of writing software that follows good practices, is expressive, and still easy to prototype. Unfortunately, I think Grus is right about it. Working with notebooks can lead to some weird behaviors, like running cells in different positions, populating your analysis with too much unnecessary information and plots, not creating abstractions when needed, and having issues with reproducibility. On the other hand, the script perspective didn't help me with fast iteration when I needed quick answers to simple questions.

When I moved to Julia, I realized that the REPL is in another level, and I understood that an important part of this community uses (Neo)Vim or Emacs for development. I was curious about using these tools for data science projects. Although there are not many articles about it, and the community using Vim/Emacs is quite small compared to other options, I found it pretty cool because it is minimalistic (though you can customize it extensively), fast, and promises to increase productivity after mastering the Vim keybindings (in 10 years).

What I realized is that you can still prototype like notebooks (but with a perspective closer to RStudio) with one pane for your code and another pane with your REPL open. Then, you can start transforming your code to make it look like serious software, all within one window without moving all the .ipynb file contents to another .py or .jl file. I found this workflow more enjoyable.

Instalation process required

First of all, make sure to install Neovim and Tmux. There are plenty of tutorials out there on this topic, so I won't go into the details here.

The important thing is to create an init.lua file. If you don't want to install everything one by one, I recommend following the kick starter, provided by nvim-lua/kickstart.nvim. It provides the basic tools for working with Neovim, including a package manager, Treesitter, LSP integration, etc. This kickstarter uses LazyVim, which should be faster and doesn't require frequent updates like PackerNvim. Just create the init.lua file, copy and paste all the content into that file, save it, and quit. When you open it again, it should start installing or upgrading everything.

Also you want to create a Tmux config, the default config in Tmux is already ok for working with Data Science or the Julia experience, but anyway you would want to edit a file for setting a colorscheme or edit some shortcuts, to create just add ~/.tmux.conf

You also want to create a Tmux config file. The default config in Tmux is already suitable for working with Data Science or the Julia experience, but you may want to edit it to set a colorscheme or modify some shortcuts. Just add ~/.tmux.conf to create the file.

Editing the init.lua and tmux.config

Here are the things required for editing the init.lua file:

Add Julia in the init.lua for treesitter and also consider to add the julials = {} for local servers
Make sure to add the sysimage for the languageserver. in this discussion, they summarize the procedures well, Follow the instructions and add the snippet of code to your init.lua

-- Run Julia LSP
require'lspconfig'.julials.setup{
    on_new_config = function(new_config, _)
        local julia = vim.fn.expand("~/.julia/environments/nvim-lspconfig/bin/julia")
        if require'lspconfig'.util.path.is_file(julia) then
	    -- vim.notify("Hello!")
            new_config.cmd[1] = julia
        end
    end
}

This should be enough for setting up Julia. If you open a Julia file, Neovim should be able to detect the LSP and work with other properties like jump to definition, etc. For R and Python, this should be a bit more straightforward for now (not need step 2).
There are other things you want to add, for a data science project, one is vim-slime, This package is great for sending snippets of code from your file to a Julia REPL. Make sure to install it and add the following code to your init.lua. This will allow you to open a Tmux pane and start interacting with the file. You can modify the snippet below to change your target_pane (if you prefer the REPL on your left side or above, you can change this). The actual shortcut is Ctrl-c + Ctrl-c, which you can modify if you prefer.

vim.g.slime_target = 'tmux'
-- vim.g.slime_default_config = {"socket_name" = "default", "target_pane" = "{last}"}
vim.g.slime_default_config = {
  -- Lua doesn't have a string split function!
  socket_name = vim.api.nvim_eval('get(split($TMUX, ","), 0)'),
  target_pane = '{top-right}',
}

For Tmux, there are some things you can add. Here is my config, which is really simple. However, I encourage you to find your own taste with Tmux.

set -g mouse on
set -g history-limit 102400
set -g base-index 1
set -g pane-base-index 1
set -g renumber-windows onunbind C-b
set -g prefix C-x# vim key movements between panes
# smart pane switching with awareness of vim splits
bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R
# reloading for now:wunbind r 
bind r source-file ~/.tmux.conf \; display "Reloaded ~/.tmux.conf"
# plugin
# Initialize TMUX plugin manager (keep this line at the very bottom of tmux.conf)set -g @plugin 'egel/tmux-gruvbox'
set -g @tmux-gruvbox 'dark' # or 'light'
run '~/.tmux/plugins/tpm/tpm'

For this purpose, I have considered the following in my Tmux config: activate the mouse, increase the history limit in the panes (this is necessary because the default limit in Tmux is quite constrained), count from 1 with panes, change the prefix (I found it easier to use Ctrl-x rather than Ctrl-b), and add the keybindings to move between paneslike vim. The r shortcut is used to restart the config file when you add or modify features, so you can use prefix + r to apply your changes. Finally, in Tmux, you can use a package manager called TPM. Make sure to added in your config file.

Some challenges for improving the workflow

So far, the workflow with Neovim and Tmux has been set up nicely. However, there are some areas that can be improved. One of them is the visualization aspect. As a data scientist, you need to constantly iterate and visualize your data. If you want to have a deep understanding of your dataset and generate plenty of visualizations, the current setup may not be the best. However, in Julia, you can easily switch to Pluto to display all the figures you want. One thing I have tried is to constantly display those plots you are working on inside the terminal. One way to do this is by using unicodeplots. If you like working with Plots.jl, you can change your backend from gr() to unicodeplots(). In my opinion, the quality of the visualization may not be the best, but it allows for instant plots in your terminal without the need for third-party software. For fast iteration, it is good enough.

Another important point to consider is maintaining consistency in the workflow between Julia code and the Julia REPL. Currently, I have a workflow with Vim and my code, but the REPL follows a different logic. This is where the aforementioned repository could potentially help, as it aims to bridge the gap and maintain homogeneity between the two panels. Integrating Vim keybindings into the REPL would provide a seamless experience, allowing for a smoother transition and enhancing the overall workflow. It is definitely an area I look forward to exploring in the future to further improve my development process.

Conclusions

In this blog, I have explained how to set up Neovim and Tmux with Julia (or any other data science programming language). This setup provides a minimalist perspective. For people who like to have a variety of tools at hand, it may feel a bit lacking. However, if you are someone who is looking for a lightweight tool, minimalistic design, and enjoys working within the terminal, I highly recommend giving it a try.

juliabloggers.com

A Julia Language Blog Aggregator

Author Archives: Navi

World Happiness Report – EDA & clustering with Julia

World Happiness Report – EDA & clustering with Julia

Table of Contents

Introduction

Packages used

Clustering

Conclusions

Creating your own blog with Julia and Franklin

Creating your own blog with Julia and Franklin

Table of contents

Introduction

Some Reasons to Create Your Own Blog

Installation

First Steps

Selecting a template

Cleaning the template

Creating your first post

Deployment

Hosting in a different domain (optional)

RSS and Tags

Host your Feed to JuliaBloggers (optional)

Conclusions

Acknowledgment

set up NeoVim + Tmux for a Data Science Workflow with Julia

set up NeoVim + Tmux for a Data Science Workflow with Julia

Table of Contents

Introduction

Why start using Neovim and Tmux? My motivations

Instalation process required

Editing the init.lua and tmux.config

Some challenges for improving the workflow

Conclusions