Category Archives: Julia

In-Place Modifications

By: Jonathan Carroll

Re-posted from: https://jcarroll.com.au/2024/09/25/in-place-modifications/

In this post I explore some differences between R, python, julia, and APL in
terms of mutability, and try to make something that probably shouldn’t exist.

I watched this code_report video which describes
a leetcode problem;

You are given an integer array nums, an integer k, and an integer multiplier.

You need to perform k operations on nums. In each operation:

  • Find the minimum value x in nums. If there are multiple occurrences of the minimum value, select the one that appears first.
  • Replace the selected minimum value x with x * multiplier.

Return an integer array denoting the final state of nums after performing all k operations.

Conor’s python solution in the video was

def getFinalState(nums, k, m): 
  for _ in range(k): 
    i = nums.index(min(nums)) 
    nums[i] *= m
  return nums

x = [2, 1, 3, 5, 6]
k = 5
mult = 2

getFinalState(x, k, mult)
## [8, 4, 6, 5, 6]

and, as always, I wanted to see how I’d do that in R. I came up with this

getFinalState = function(nums, k, mult) {
  for (i in 1:k) {
    m <- which.min(nums)[1]
    nums[m] <- mult * nums[m]
  }
  nums
}

x <- c(2, 1, 3, 5, 6)
k <- 5
mult <- 2

getFinalState(x, k, mult)
## [1] 8 4 6 5 6

It’s worth noting that I can’t use a map in this function because iterations
are dependent; the minimum value at any iteration depends on the previous
values.
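To make that dependence explicit, the loop can also be written as a fold, where each step receives the state left behind by the previous one. This is only a minimal python sketch; the names `final_state` and `step` are made up for illustration and are not from the video.

```python
from functools import reduce

def final_state(nums, k, m):
    """Apply the 'multiply the first minimum' step k times as a fold."""
    def step(state, _):
        # Each step needs the state produced by the previous step,
        # which is why an independent map over the k iterations cannot work.
        i = state.index(min(state))
        return state[:i] + [state[i] * m] + state[i + 1:]
    return reduce(step, range(k), list(nums))

print(final_state([2, 1, 3, 5, 6], 5, 2))
## [8, 4, 6, 5, 6]
```

Because `step` builds a new list each time, the input list is left untouched here, unlike the python solution above.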

I also had a chance to discuss this solution with some APL’ers at a meetup and
a J solution was presented, but I don’t think I wrote it down.

My solution is nearly word-for-word the same as the python solution with a
couple of notable exceptions arising from the difference between the two
languages:

First, R has which.min() as a built-in rather than needing to query the index
of the minimum value (and two references to nums). Also, R has no compound
assignment like x *= 2 which modifies in-place – the closest thing I can think
of is the %<>% operator in {magrittr} (not re-exported in {dplyr} because this
behaviour is considered bad practice in R, despite not really being “in-place”)

library(magrittr)

m <- data.frame(x = 1:6, y = letters[1:6])
m
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
m %<>% head(2)
m
##   x y
## 1 1 a
## 2 2 b

although I can certainly see the case for it – this operator avoids repeating
the variable being used and assigned, because the alternative using the
traditional pipe is

m <- data.frame(x = 1:6, y = letters[1:6])
m
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
m <- m %>% head(2)
m
##   x y
## 1 1 a
## 2 2 b

One could argue that writing out even a longer variable name twice still makes
it clear that shadowing is taking place; the value is being overwritten with
a new value, but it does feel a little frustrating to have to type it out twice

important_variable <- important_variable * 2

Back to my R solution, the indexing at a specific set of values got me thinking
that it would be clean if we could pass a function to [ so that we could
write

nums[which.min] <- value

(maybe not so much for this example where m is used twice, but it piqued my
interest)

Let’s say I want to set all the even values of a vector to some other value.
That’s easy enough to do

x[x %% 2 == 0] <- 0

but I don’t love that it requires two references to x, which may (should?) be
a much longer name

important_variable[important_variable %% 2 == 0] <- 0

I want something like x[f] <- y to set the values of x where f(x) is
TRUE to y. This seemed like it might be possible, maybe with a function
method to [<-, but [<- dispatches on the class of x, not what’s inside
[, so no dice. In theory (which will never happen) the built-in [<- could
have some branch logic for dealing with a function passed as the indices to be
modified, but I’m not about to go rebuilding R from source myself just to play
with that idea.

Nonetheless, if I define some functions that do accomplish this

is_even <- function(z) z %% 2 == 0

set_if <- function(x, f, value) {
  x[f(x)] <- value
  x
}

then I can try this out on a vector

a <- 1:10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
set_if(a, is_even, 0)
##  [1] 1 0 3 0 5 0 7 0 9 0
a # unchanged
##  [1]  1  2  3  4  5  6  7  8  9 10

It works, but I’m back to having to write a <- do_stuff(a) because a isn’t
actually modified by this function.

Ideally, my function would operate the same as this does

a <- 1:10
a[is_even(a)] <- 0
a
##  [1] 1 0 3 0 5 0 7 0 9 0

which does modify a in-place; R is not entirely pure, and does occasionally
allow what looks like direct mutation, though under the hood, it’s not – a new
object is actually created

# not using a range e.g. 1:n because that's internally 
# a "compact" representation
a <- c(2, 3, 4)
.Internal(inspect(a))
## @63a4d9b05be8 14 REALSXP g0c3 [REF(2)] (len=3, tl=0) 2,3,4
a[2] <- 9
.Internal(inspect(a))
## @63a4d9b0fbf8 14 REALSXP g0c3 [REF(1)] (len=3, tl=0) 2,9,4

Note that the memory address has changed.
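For contrast, here is a small python check of the same idea. It uses `id()` as a stand-in for the memory address (that equivalence is a CPython detail, so treat it as an assumption) and suggests that item assignment on a list keeps the same object.

```python
a = [2, 3, 4]
before = id(a)                 # identity of the list object
a[1] = 9                       # item assignment mutates the existing list
same_object = id(a) == before
print(a, same_object)
## [2, 9, 4] True
```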

If I was working with a language which did support (enable?) modify-in-place
then that might look like

def is_even(x):
    return x % 2 == 0

def set_if(x, f, value):
    for i in range(len(x)):
        if f(x[i]):
            x[i] = value

a = list(range(10))
a
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
set_if(a, is_even, 0)
a
## [0, 1, 0, 3, 0, 5, 0, 7, 0, 9]

Now, that’s not always a great thing. In such a language with mutable structures
(e.g. lists) we can do maddening things like this

x = [3, 4, 5]
y = x
y is x
## True
y[1] = 9
x # still 'bound' to y
## [3, 9, 5]

Here, is means “are these two things identical in the sense of referring to
the same block of memory”, noting that CPython caches some literals (e.g. small
integers) so they are referenced that way, but separately created tuples
generally aren’t

abc = (11, 99)
xyz = (11, 99)
abc is xyz
## False
abc == xyz
## True
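When the aliasing shown above is not wanted, one common way out is to copy the list before modifying it. A minimal sketch using `list.copy()`, which makes a shallow copy (nested mutable elements would still be shared):

```python
x = [3, 4, 5]
y = x.copy()   # shallow copy: a new list holding the same elements
y[1] = 9
print(x, y, y is x)
## [3, 4, 5] [3, 9, 5] False
```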

The big question is can I hack together some solution that does work in-place
in R? Yeah, with some ill-advised calls

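set_if <- function(x, f, value) {
  # can't use <<- because the value passed in as the x argument isn't 
  # necessarily named 'x' in the parent scope
  .x <- x
  .x[f(.x)] <- value
  # parent.frame() is the caller's environment; parent.env(environment())
  # is where the function was defined, which only matches at top level
  e <- parent.frame()
  assign(deparse(substitute(x)), .x, envir = e)
  invisible(.x)
}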


a <- 1:10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
set_if(a, is_even, 0)
a
##  [1] 1 0 3 0 5 0 7 0 9 0

As I note in the comment there, I can’t use the super-assignment arrow <<-
inside this function because I don’t know the name of the variable I’m updating;
it needs to be deparsed from the incoming argument.

This means that it works regardless of the name of the variable being modified

b <- 10:20
b
##  [1] 10 11 12 13 14 15 16 17 18 19 20
set_if(b, is_even, 0)
b
##  [1]  0 11  0 13  0 15  0 17  0 19  0

I tried to think of some other languages which might support this sort of in-place
set_if(x, f, value) modification and (Dyalog) APL was worth a thought.

    ⍝ create a vector from 1 to 10
    x←⍳10
    x
1 2 3 4 5 6 7 8 9 10

    ⍝ the function {0=2|⍵} calculates a boolean vector with 
    ⍝ 1 where the value is even
    {0=2|⍵} x
0 1 0 1 0 1 0 1 0 1

    ⍝ the `@` operator takes a value (or function) on the left and 
    ⍝ a function (or boolean values) on the right and applies it to the 
    ⍝ other argument on the right
    0@{0=2|⍵} x 
1 0 3 0 5 0 7 0 9 0

    ⍝ alternatively a point-free function defined as the negation (`~`) of a 
    ⍝ binding (`∘`) of the value 2 to modulo (`|`); the negation is needed
    ⍝ otherwise this returns the result of the modulo, not where it is 0
    0@(~2∘|)⍳10
1 0 3 0 5 0 7 0 9 0

    ⍝ x is, however, unchanged as APL is typically immutable
    x
1 2 3 4 5 6 7 8 9 10

So there’s no way to do the in-place modification. It is nice, though, that
0@(~2∘|)x only refers to x once.

Julia makes a nice distinction between functions which mutate arguments and
those which don’t; (by convention) the former are named ending with an
exclamation mark, e.g.

vec = collect(1:5)
## 5-element Vector{Int64}:
##  1
##  2
##  3
##  4
##  5
# non-mutating
reverse(vec)
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
vec
## 5-element Vector{Int64}:
##  1
##  2
##  3
##  4
##  5
# mutating
reverse!(vec)
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
vec
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
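Python has no such naming convention. The usual signal there is the return value: built-in methods that mutate in place return None, while the non-mutating counterparts return a new object. A small sketch of the same reverse example:

```python
vec = [1, 2, 3, 4, 5]

# non-mutating: reversed() gives an iterator over the elements in reverse
# order, and list() collects it into a new list
rev = list(reversed(vec))
print(rev, vec)
## [5, 4, 3, 2, 1] [1, 2, 3, 4, 5]

# mutating: list.reverse() reverses in place and returns None
result = vec.reverse()
print(result, vec)
## None [5, 4, 3, 2, 1]
```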

In julia, the iseven() function is already built-in, but vectorisation is
explicit via a broadcast operator . and the setting of even values to 0
looks like

x = collect(1:10);
x[iseven.(x)] .= 0;
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

which looks very much like the R version with some dots where scalar functions
are vectorised. If I don’t use the last . to perform vectorised assignment,
the error tells me that the failure involved the setindex! function which does
sound like what I want, but this doesn’t work

setindex!(x, 0, iseven.(x))

because it’s trying to assign the value 0 multiple times and I only provided one
of them. Instead,

x = collect(1:10);
setindex!(x, zeros(Int8, 5), iseven.(x));
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

does work, but I had to manually count how many 0 entries this requires, so the
[ approach seems cleaner. Either way, I’ve had to explicitly calculate
iseven.(x) and pass that result somewhere.

Since Julia allows users to extend methods, I could do that modification myself!

import Base.setindex! 
  
function setindex!(A::Vector{Int64}, v::Int64, f::Function) 
  A[f.(A)] .= v
end
## setindex! (generic function with 240 methods)
x = collect(1:10);
setindex!(x, 0, iseven);
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

which I could just as easily call set_if!

set_if! = setindex!;
x = collect(1:10);
set_if!(x, 0, iseven);
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

Nice! I do wonder if I can “hack” (ahem, extend) Julia’s [ to get my prized
x[f] = 0 solution but I doubt it’s worth it when the above does the right
thing.
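For comparison, the x[f] = 0 form is reachable in python, because item assignment is dispatched to a `__setitem__` method that a subclass can override. The sketch below uses a hypothetical `PredList` class (a made-up name, not an existing library type) that treats a callable index as a predicate.

```python
class PredList(list):
    """A list whose item assignment also accepts a predicate function."""

    def __setitem__(self, key, value):
        if callable(key):
            # key is a predicate: overwrite every element it selects
            for i, item in enumerate(self):
                if key(item):
                    super().__setitem__(i, value)
        else:
            # ordinary integer or slice assignment
            super().__setitem__(key, value)

def is_even(v):
    return v % 2 == 0

x = PredList(range(1, 11))
x[is_even] = 0
print(x)
## [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]
```

The variable is mentioned only once and is modified in place, though whether hiding a loop behind `[` is a good idea is a separate question.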

I don’t imagine I’ll package up my set_if() anywhere, and I should probably
even avoid using it myself, but it’s been an interesting journey thinking about
this stuff. Maybe there’s a better way to do it? Maybe there’s a language which
better supports something like that? If you know, or you have comments or
suggestions, I can be found on
Mastodon or use the comment section below.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.3 (2024-02-29)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2024-09-25
##  pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.19    2024-02-01 [1] CRAN (R 4.3.3)
##  bookdown      0.36    2023-10-16 [1] CRAN (R 4.3.2)
##  bslib         0.8.0   2024-07-29 [1] CRAN (R 4.3.3)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
##  callr         3.7.3   2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.3)
##  crayon        1.5.2   2022-09-29 [3] CRAN (R 4.2.1)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.2)
##  digest        0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
##  ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.3.3)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
##  fs            1.6.4   2024-04-25 [1] CRAN (R 4.3.3)
##  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.3)
##  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.2)
##  httpuv        1.6.12  2023-10-23 [1] CRAN (R 4.3.2)
##  icecream      0.2.1   2023-09-27 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.3)
##  JuliaCall     0.17.5  2022-09-08 [1] CRAN (R 4.3.3)
##  knitr         1.48    2024-07-07 [1] CRAN (R 4.3.3)
##  later         1.3.1   2023-05-02 [1] CRAN (R 4.3.2)
##  lattice       0.22-5  2023-10-24 [4] CRAN (R 4.3.1)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
##  magrittr    * 2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
##  Matrix        1.6-5   2024-01-11 [4] CRAN (R 4.3.3)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.3)
##  mime          0.12    2021-09-28 [1] CRAN (R 4.3.3)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.2)
##  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.2)
##  pkgload       1.3.3   2023-09-22 [1] CRAN (R 4.3.2)
##  png           0.1-8   2022-11-29 [1] CRAN (R 4.3.2)
##  prettyunits   1.2.0   2023-09-24 [3] CRAN (R 4.3.1)
##  processx      3.8.3   2023-12-10 [3] CRAN (R 4.3.2)
##  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.2)
##  promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.2)
##  ps            1.7.6   2024-01-18 [3] CRAN (R 4.3.2)
##  purrr         1.0.2   2023-08-10 [3] CRAN (R 4.3.1)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.3)
##  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.2)
##  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.2)
##  reticulate    1.34.0  2023-10-12 [1] CRAN (R 4.3.2)
##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.3.3)
##  rmarkdown     2.28    2024-08-17 [1] CRAN (R 4.3.3)
##  rstudioapi    0.15.0  2023-07-07 [3] CRAN (R 4.3.1)
##  sass          0.4.9   2024-03-15 [1] CRAN (R 4.3.3)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
##  shiny         1.7.5.1 2023-10-14 [1] CRAN (R 4.3.2)
##  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.3.3)
##  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.3.3)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.2)
##  usethis       3.0.0   2024-07-29 [1] CRAN (R 4.3.3)
##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
##  xfun          0.47    2024-08-17 [1] CRAN (R 4.3.3)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.2)
##  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.3
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /home/jono/.virtualenvs/r-reticulate/bin/python
##  libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
##  pythonhome:     /home/jono/.virtualenvs/r-reticulate:/home/jono/.virtualenvs/r-reticulate
##  version:        3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
##  numpy:           [NOT FOUND]
##  
##  NOTE: Python version was forced by VIRTUAL_ENV
## 
## ──────────────────────────────────────────────────────────────────────────────

Everything you need to know about the Gemini API as a developer in less than 5 minutes

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/everything-you-need-to-know-about-the-gemini-api-as-a-developer-in-less-than-5-minutes-5e75343ccff9?source=rss-2c8aac9051d3------2

Get started building with the Gemini API

Image by Author

Gemini is Google’s family of frontier generative AI models, built from the ground up to be multi-modal and long context (more on this later). Gemini is available across the entire Google suite, from Gmail to the Gemini App. For developers who want to build with Gemini, the Gemini API is the best place to get started.

In this article, we will explore what the Gemini API offers, how to get started using Gemini for free, and more advanced use cases like fine-tuning. As always, you are reading my personal blog, so you guessed it, these are my personal views. Let’s dive in!

How can I test the latest Gemini models?

If you want to first test the Gemini models (everything from the latest experimental models to production models) without writing or running any code, you can head to Google AI Studio. Once you are done testing there, you can also generate a Gemini API key in AI Studio (“Get API Key” in the top left corner). AI Studio is free and there is a generous free tier on the API as well, which includes 1,500 requests per day with Gemini 1.5 Flash.

Image captured by Author in aistudio.google.com

What does the Gemini API offer?

The Gemini API comes standard with most of the things developers are looking for. At a high level, it comes with:

And much more! In general, the Gemini API offers most if not all of the features developers have come to expect when building with large language model APIs, in addition to many things that are unique to Gemini (like long context, video understanding, and more).

What models does the Gemini API support?

By default, the two model variants available in the Gemini API as of September 21st, 2024 are Gemini 1.5 Flash and Gemini 1.5 Pro. There are different instances of these models available, some of which are newer and have performance updates. Each model also offers different features, such as the context length or the ability for the model to be tuned. You can check out the Gemini models page for more details.

Image captured by Author on ai.google.dev

Sending your first Gemini API request

With as little as 6 lines of code, you can send your first API request. Make sure to get your API key from Google AI Studio before running the code below:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain how AI works")
print(response.text)

The Gemini API SDKs also support creating a chat object, which lets you append messages to a simple structure:

model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hello"},
        {"role": "model", "parts": "Great to meet you. What would you like to know?"},
    ]
)
response = chat.send_message("I have 2 dogs in my house.")
print(response.text)
response = chat.send_message("How many paws are in my house?")
print(response.text)
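The "simple structure" is easiest to picture as a list of role/parts dictionaries that grows by one entry per message. The sketch below is plain Python and does not call the SDK; `append_turn` is a hypothetical helper, and how the SDK stores history internally is not something this example claims to show.

```python
def append_turn(history, role, text):
    """Append one message to a chat history list (hypothetical helper)."""
    history.append({"role": role, "parts": text})
    return history

# Same shape as the history argument passed to start_chat above
history = [
    {"role": "user", "parts": "Hello"},
    {"role": "model", "parts": "Great to meet you. What would you like to know?"},
]
append_turn(history, "user", "I have 2 dogs in my house.")
print(len(history), history[-1]["role"])
## 3 user
```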

If you want a simple repo with a little more complexity to get started with, check out the official Gemini API quickstart repo on GitHub.

How much does the Gemini API cost?

There are two tiers in the Gemini API, the free tier and paid. The former is, well, free, and the latter comes with an increased rate limit intended to support production workloads. Gemini 1.5 Flash is the most competitively priced large language model in its capability class and recently had its price decreased by 70%.

Image captured from Google Developers Blog

Or put another way, you can access 1.5 billion tokens for free with Gemini every single day.
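One way to arrive at that figure, assuming every one of the 1,500 free daily requests uses a full 1 million token context (the per-request token count is an assumption, not something stated in this post):

```python
requests_per_day = 1_500          # free-tier figure for Gemini 1.5 Flash
tokens_per_request = 1_000_000    # assumption: each request fills a 1M-token context
tokens_per_day = requests_per_day * tokens_per_request
print(f"{tokens_per_day:,}")
## 1,500,000,000
```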

Fine-tuning Gemini 1.5 Flash

Gemini 1.5 Flash can be fine-tuned for free through Google AI Studio and the tuned model does not cost more to use than the base model, a benefit that is rather unique in the AI ecosystem. Once you tune the model, it can be used as a drop-in replacement in the existing code you have. Google AI Studio also comes with sample datasets to test tuning with and a mode called “Structured prompting” which is useful for creating fine-tuning datasets.

Image captured by Author in Google AI Studio

Closing thoughts

The Gemini API continues to get better week over week, with a steady stream of new features landing that continue to improve the developer experience. If you have feedback, suggestions, or questions, join the conversation on the Google AI developer forum. Happy building!


Everything you need to know about the Gemini API as a developer in less than 5 minutes was originally published in Around the Prompt on Medium, where people are continuing the conversation by highlighting and responding to this story.

Alpha Capture and Acquired

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2024/09/19/Alpha-Capture-and-Acquired.html

People are never short of a trade idea. There is a whole industry of
researchers, salespeople and amateurs coming up with trading ideas and
making big calls on what stock will go up, what country will cut
interest rates and what the price of gold will do next. Alpha capture
is about systematically assessing ideas and working out who has
alpha and generates profitable ideas and who is just making it up as
they are going along.




Alpha capture started as a way of profiling a broker’s stock
recommendation. If you have 50 people recommending you 50 different
ideas, how do you know who is good? You’ll quickly run out of money if
you blindly follow all the recommendations that hit your
inbox. Instead, you need to profile each person’s idea and see
who on average can make good recommendations. Whoever is good at
picking stocks probably deserves more of your business.

It has since expanded to the point that some hedge funds have internal
desks doing a similar analysis on their portfolio managers (PMs) to double
down on profitable bets and mitigate the risk of all the PMs picking the
same stock. Picking stocks and managing a portfolio across many PMs
are two different skills, and different departments, at your modern
hedge fund.

A simple way to measure the alpha of a PM or broker recommendation
is to see if the price of a stock they buy (or recommend) goes up
after the day they suggest it. Those with alpha would see their
picks move higher on a large enough sample, and those without alpha
would average out to zero: some ideas would go higher, some
lower, the net result being 0 alpha. If a PM has the opposite effect
and every stock they buy goes down, they are a contrarian
indicator, so take their idea and do the opposite!

Alpha capture markout graph

Alpha Capture Systems: Past, Present, and Future Directions
goes through the history of alpha capture and is a good short read
that inspired this blog post.

Basic Alpha Capture

What if we wanted to try our own Alpha Capture? We need some stock recommendations and a way of calculating what happens to the price after the recommendation. This is where the Acquired podcast comes in.

Acquired logo

Acquired tells the stories and strategies of great companies (taken from their website). It’s a pretty popular podcast and each episode gets close to a million listeners. So this makes it an ideal Alpha Capture study – when they release an episode about a company does the stock price of that company go higher or lower on average?
If it were to go higher then each time an episode is released call your broker and go long the stock!

They aren’t explicitly recommending a stock by talking about
it, as they say in their intro. So it’s just a toy exercise to see if
there is any correlation between the stock price and the release date
of an episode.

To systematically test this we need to get a list of the episodes and calculate a ‘markout’ from each episode.

Collecting Podcast Data

The internet is a wonderful thing and each episode of Acquired is
available as an XML feed from transistor.fm. With some fun XML
parsing I can get the full history of the podcast with each date
and title.

function parseEpisode(x)
  rawDate = first(simplevalue.(x[tag.(x) .== "pubDate"]))
  date = ZonedDateTime(rawDate, dateformat"eee, dd uuu yyyy HH:MM:ss z")

  Dict("title" => first(simplevalue.(x[tag.(x) .== "title"])),
       "date" =>date)
end

function parse_date(t)
   Date(string(split(t, "T")[1]))
end

url = "https://feeds.transistor.fm/acquired"

data = parse(Node, String(HTTP.get(url).body))

episodes = children(data[3][1])
filter!(x -> tag(x) == "item", episodes)
episodes = children.(episodes)

episodeData = parseEpisode.(episodes)

episodeFrame = vcat(DataFrame.(episodeData)...)
CSV.write("episodeRaw.csv", episodeFrame)
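For readers following along without Julia, the same extraction of title and publication date can be sketched with the Python standard library. The feed text below is a made-up sample in the usual RSS shape, not real Acquired data, and `parse_episodes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up sample in the shape of an RSS feed; not real Acquired data
sample = """<rss><channel>
  <title>Example Podcast</title>
  <item>
    <title>Example Co</title>
    <pubDate>Mon, 18 Mar 2024 00:54:40 +0000</pubDate>
  </item>
  <item>
    <title>Another Co</title>
    <pubDate>Tue, 20 Feb 2024 01:56:41 +0000</pubDate>
  </item>
</channel></rss>"""

def parse_episodes(xml_text):
    """Return one dict per <item> with its title and parsed date."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        episodes.append({
            "title": item.findtext("title"),
            # RSS dates use the RFC 2822 style that email.utils understands
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return episodes

episodes = parse_episodes(sample)
print(episodes[0]["title"], episodes[0]["date"].date())
## Example Co 2024-03-18
```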

After writing the data to a CSV I need to somehow parse the episode
title into a stock ticker. This is a tricky task as the episode names
are human-friendly, not computer-friendly. So time for our LLM
overlords to lend a hand and do the heavy lifting. I drop the CSV into
Perplexity and prompt it to add the relevant stock ticker to the
file. I then reread the CSV into my notebook.

episodeFrame = CSV.read("episodeTicker.csv", DataFrame)
episodeFrame.date = ZonedDateTime.(String.(episodeFrame.date), dateformat"yyyy-mm-ddTHH:MM:SS.sss-z")

vcat(first(@subset(episodeFrame, :stock_ticker .!= "-"), 4),
        last(@subset(episodeFrame, :stock_ticker .!= "-"), 4))
date                           title                                stock_ticker  sector_etf
ZonedDateTime                  String                               String15      String7
2024-03-17T17:54:00.400+07:00  Renaissance Technologies             RNR           PSI
2024-02-19T17:56:00.410+08:00  Hermès                               RMS.PA        GXLU
2024-01-21T17:59:00.450+08:00  Novo Nordisk (Ozempic)               NOVO-B.CO     IHE
2023-11-26T16:24:00.250+08:00  Visa                                 V             IPAY
2018-09-23T18:28:00.550+07:00  Season 3, Episode 5: Alibaba         BABA          KWEB
2018-08-20T09:20:00.370+07:00  Season 3, Episode 3: The Sonos IPO   SONO          GAMR
2018-08-05T18:15:00.030+07:00  Season 3, Episode 2: The Xiaomi IPO  XIACF         KWEB
2018-07-16T21:40:00.560+07:00  Season 3, Episode 1: Tesla           TSLA          TSLA

It’s done an ok job. Most of the episodes seem to correspond to the
right ticker but we can see it has hallucinated the RenTech stock
ticker as RNR. RenTech is a private company with no stock ticker, and
instead Perplexity has decided that RNR (a reinsurance company) is the
correct stock ticker. So not 100% accurate. Still, it has saved me a
good chunk of time and we can move on to getting the stock price data.

We want to measure the average price move of a stock after an episode is released. If Acquired had stock-picking skill, you would expect the price to increase after the release of an episode, as they generally speak positively about the various companies.

So using AlpacaMarkets.jl we get the stock price for the days before and the days after the episode. As AlpacaMarkets only has US stock data, only some of the episodes end up with a full dataset.

What is a Markout?

We calculate the percentage change relative to the episode date and then aggregate all the stock tickers together.

\[\text{Markout} = \frac{p - p_{\text{episode released}}}{p_{\text{episode released}}}\]
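As a quick worked example of the formula, here is a minimal Python sketch with made-up prices. The function name `markout` and the numbers are for illustration only; the scaling by 1e4 mirrors the Julia code in this post, which reports markouts in basis points.

```python
def markout(prices, arrival_price):
    """Fractional change of each price relative to the arrival price."""
    return [(p - arrival_price) / arrival_price for p in prices]

# Made-up prices: the episode-date price first, then two later days
values = markout([100.0, 102.0, 99.0], arrival_price=100.0)
print([round(v * 1e4) for v in values])  # in basis points
## [0, 200, -100]
```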

Acquired is about great companies, so they tend to speak favourably about a company. I therefore think it’s a reasonable assumption that the stock price will increase after everyone gets round to listening to the episode.
So once we aggregate all the episodes we should hopefully have
enough data to decide if this is true.

function getStockData(stock, startDate)
  prices = AlpacaMarkets.stock_bars(stock, "1Day", startTime=startDate - Month(1), limit=10000)[1]
  prices.date .= startDate
  prices.t = parse_date.(prices.t)
  prices[:, [:t, :symbol, :vw, :date]]
end

function calcMarkout(data)
   arrivalInd = findlast(data.t .<= data.date)
   arrivalPrice = data[arrivalInd, :vw]
   data.arrivalPrice .= arrivalPrice
   data.ts = [x.value for x in (data.t .- data.date)]
   data.markout = 1e4*(data.vw .- data.arrivalPrice) ./ data.arrivalPrice
   data
end

res = []

for row in eachrow(episodeFrame)
    
    try 
        stockData = getStockData(row.stock_ticker, Date(row.date))
        stockData = calcMarkout(stockData)
        append!(res, [stockData])
    catch e
        println(row.stock_ticker)
    end
end

res = vcat(res...)
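The arrival price in calcMarkout comes from the last trading day on or before the episode date, which matters because episodes can be released on a weekend. That lookup can be sketched in Python with a binary search; `arrival_index` and the calendar below are made up for illustration.

```python
from bisect import bisect_right
from datetime import date

def arrival_index(trading_days, episode_date):
    """Index of the last trading day on or before episode_date.

    trading_days must be sorted ascending. Returns None if every
    trading day falls after the episode date.
    """
    i = bisect_right(trading_days, episode_date) - 1
    return i if i >= 0 else None

# Made-up calendar: Friday, Monday, Tuesday; the episode lands on the Sunday
days = [date(2024, 3, 15), date(2024, 3, 18), date(2024, 3, 19)]
print(arrival_index(days, date(2024, 3, 17)))
## 0
```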

With the data pulled we now aggregate by each day before and after the episode.

markoutRes = @combine(groupby(res, :ts), :n = length(:markout), 
                                         :avgMarkout = mean(:markout),
                                         :devMarkout = std(:markout))
markoutRes = @transform(markoutRes, :errMarkout = :devMarkout ./sqrt.(:n))

Always need error bars as this data gets noisy.
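The error bar here is the standard error of the mean, i.e. the sample standard deviation divided by the square root of the number of observations at each day offset. A minimal Python sketch of the same aggregation, using made-up markouts and a hypothetical `aggregate` function:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def aggregate(rows):
    """rows: iterable of (day_offset, markout) pairs.

    Returns {day_offset: (n, mean, standard_error)}.
    """
    groups = defaultdict(list)
    for ts, m in rows:
        groups[ts].append(m)
    out = {}
    for ts, ms in sorted(groups.items()):
        n = len(ms)
        # the sample standard deviation needs at least two points
        err = stdev(ms) / sqrt(n) if n > 1 else float("nan")
        out[ts] = (n, mean(ms), err)
    return out

# Made-up markouts (basis points) for three episodes, one day after release
rows = [(1, 10.0), (1, 20.0), (1, 30.0)]
n, avg, err = aggregate(rows)[1]
print(n, avg, round(err, 2))
## 3 20.0 5.77
```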


markoutResSub = @subset(markoutRes, :ts .<= 60, :n .>= 10)
plot(markoutResSub.ts, markoutResSub.avgMarkout, yerr=markoutResSub.errMarkout, 
     xlabel = "Days", ylabel = "Markout", title = "Acquired Alpha Capture", label = :none)
hline!([0], ls = :dash, color = "grey", label = :none)
vline!([0], ls = :dash, color = "grey", label = :none)

Average markout

Not really a pattern. The majority of the error bars overlap zero after the podcast is released.
If you squint a little bit there seems to be a bit of a downward trend post-episode, which would suggest they talk about a company at the peak of the stock price.

Beforehand there is a bit of positive momentum, again suggesting that
they release the podcast at the peak of the stock price. Now this is
even more of a stretch given there is only 1 podcast a month and it
takes more than 20 days to prepare an episode (I imagine!), so
more noise than signal.

markoutIndRes = @combine(groupby(res, [:symbol, :ts]), :n = length(:markout), 
                                         :avgMarkout = mean(:markout),
                                         :devMarkout = std(:markout))
markoutIndRes = @transform(markoutIndRes, :errMarkout = :devMarkout ./sqrt.(:n))

p = plot()
hline!(p, [0], ls = :dash, color = "grey", label = :none)
vline!(p, [0], ls = :dash, color = "grey", label = :none)
for sym in ["TSLA", "V", "META"]
   markoutResSub = sort(@subset(markoutIndRes, :symbol .== sym, :ts .<= 60, :n .>= 1), :ts)
    plot!(p, markoutResSub.ts, markoutResSub.avgMarkout, yerr=markoutResSub.errMarkout, 
     xlabel = "Days", ylabel = "Markout", title = "Acquired Alpha Capture", label = sym, lw =2) 
end
p

Individual markouts

When we pull out 3 examples of episodes we can see the randomness and specifically the volatility of TSLA here.

Conclusion

From this, we would not put any specific weight on the stock
performance after an episode is released. There doesn’t appear to be
any statistical pattern to exploit. No alpha means no alpha
capture. It is a nice exercise though and has hopefully explained the
concept of a markout.