Author Archives: Logan Kilpatrick

Everything you need to know about the Gemini API as a developer in less than 5 minutes

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/everything-you-need-to-know-about-the-gemini-api-as-a-developer-in-less-than-5-minutes-5e75343ccff9?source=rss-2c8aac9051d3------2

Get started building with the Gemini API

Image by Author

Gemini is Google’s family of frontier generative AI models, built from the ground up to be multi-modal and long context (more on this later). Gemini is available across the entire Google suite, from Gmail to the Gemini App. For developers who want to build with Gemini, the Gemini API is the best place to get started.

In this article, we will explore what the Gemini API offers, how to get started using Gemini for free, and more advanced use cases like fine-tuning. As always, you are reading my personal blog, so you guessed it, these are my personal views. Let’s dive in!

How can I test the latest Gemini models?

If you want to first test the Gemini models (everything from the latest experimental models to production models) without writing or running any code, you can head to Google AI Studio. Once you are done testing there, you can also generate a Gemini API key in AI Studio (“Get API Key” in the top left corner). AI Studio is free, and there is a generous free tier on the API as well, which includes 1,500 requests per day with Gemini 1.5 Flash.

Image captured by Author in aistudio.google.com

What does the Gemini API offer?

The Gemini API comes standard with most of the things developers are looking for. In general, it offers most if not all of the features developers have come to expect when building with large language model APIs, in addition to many things that are unique to Gemini, like long context, video understanding, and more.
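To make the multimodal and long context points concrete, here is a minimal sketch using the Python SDK introduced later in this post. It uploads a local file through the File API and asks the model about it; the file name report.pdf and the prompt are just illustrative placeholders.

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

# Upload a local file via the File API (the file name here is a placeholder)
sample_file = genai.upload_file(path="report.pdf")

# Pass the uploaded file alongside a text prompt in a single request
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([sample_file, "Summarize this document in three bullet points."])
print(response.text)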

What models does the Gemini API support?

By default, the two model variants available in the Gemini API as of September 21st, 2024 are Gemini 1.5 Flash and Gemini 1.5 Pro. There are different instances of these models available, some of which are newer and have performance updates. Each model also offers different features, such as its context length or whether the model can be tuned. You can check out the Gemini models page for more details.

Image captured by Author on ai.google.dev
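If you would rather check programmatically which model variants your API key can see, the SDK can list them. A quick sketch, assuming the same API_KEY environment variable used in the examples below:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

# Print every model that supports the generateContent method
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)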

Sending your first Gemini API request

With as little as six lines of code, you can send your first API request. Make sure to get your API key from Google AI Studio before running the code below:

import google.generativeai as genai
import os

# Read the API key generated in Google AI Studio from the environment
genai.configure(api_key=os.environ["API_KEY"])

# Create a model instance and send a single text prompt
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain how AI works")
print(response.text)

The Gemini API SDKs also support creating a chat object, which lets you append messages to a simple conversation structure:

model = genai.GenerativeModel("gemini-1.5-flash")

# Seed the chat with prior turns; "user" and "model" are the two supported roles
chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hello"},
        {"role": "model", "parts": "Great to meet you. What would you like to know?"},
    ]
)

# Each send_message call appends to the chat history automatically
response = chat.send_message("I have 2 dogs in my house.")
print(response.text)

response = chat.send_message("How many paws are in my house?")
print(response.text)

If you want a simple repo with a little more complexity to get started with, check out the official Gemini API quickstart repo on GitHub.

How much does the Gemini API cost?

There are two tiers in the Gemini API: the free tier and the paid tier. The former is, well, free, and the latter comes with increased rate limits intended to support production workloads. Gemini 1.5 Flash is the most competitively priced large language model in its capability class and recently had its price decreased by 70%.

Image captured from Google Developers Blog

Or put another way, you can access 1.5 billion tokens for free with Gemini every single day: 1,500 free Gemini 1.5 Flash requests per day, each able to use up to the model's roughly 1-million-token context window, works out to about 1.5 billion tokens.
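If you want to see how quickly a prompt eats into that budget, the SDK can count tokens before you spend any quota. A small sketch, reusing the setup from the earlier examples:

model = genai.GenerativeModel("gemini-1.5-flash")

# Count the tokens in a prompt without generating a response
token_info = model.count_tokens("Explain how AI works")
print(token_info.total_tokens)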

Fine-tuning Gemini 1.5 Flash

Gemini 1.5 Flash can be fine-tuned for free through Google AI Studio, and the tuned model does not cost more to use than the base model, a benefit that is rather unique in the AI ecosystem. Once you tune the model, it can be used as a drop-in replacement in your existing code (see the sketch below). Google AI Studio also comes with sample datasets to test tuning with, and a mode called “Structured prompting” which is useful for creating fine-tuning datasets.

Image captured by Author in Google AI Studio
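Once a tuning job finishes, using the tuned model looks just like using the base model. A minimal sketch, where the tunedModels name is a placeholder for whatever ID AI Studio assigns to your model:

# Swap the base model name for your tuned model's ID; nothing else in your code changes
model = genai.GenerativeModel("tunedModels/my-writing-helper-0001")
response = model.generate_content("Rewrite this sentence in a formal tone: the api is real good")
print(response.text)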

Closing thoughts

The Gemini API continues to get better week over week, with a steady stream of new features landing that continue to improve the developer experience. If you have feedback, suggestions, or questions, join the conversation on the Google AI developer forum. Happy building!



The future of AI agents with Yohei Nakajima

By: Logan Kilpatrick

Re-posted from: https://logankilpatrick.medium.com/the-future-of-ai-agents-with-yohei-nakajima-2602e32a4765?source=rss-2c8aac9051d3------2

Delving into AI agents and where we are going next

The future is going to be full of AI agents, but there are still a lot of open questions about how to get there and what that world will look like. I had the chance to sit down with one of the deepest thinkers in the world of AI agents, Yohei Nakajima. If you want to check out the video of our conversation, you can watch it on YouTube.

Where are we today?

There has been a lot of talk of agents over the last year since the initial viral explosion of HustleGPT, where the creator famously told the chatbot system that it had $100 and asked it to try and help him make money for his startup.

Since then, the conversation and interest around agents have not stopped, despite there being a shockingly low number of successful agent deployments. Even as someone who is really interested in AI and has tried many of the agent tools, I still have a grand total of zero agents actually running in production right now helping me (which is pretty disappointing).

Despite the lack of large-scale deployments, companies are still investing heavily in the space, as it is widely assumed this is the application of LLMs that will end up providing the most value. I have been looking more and more into Zapier as the potential launching point for large-scale agent deployments. Most of the initial challenge with agent platforms is that they don't actually hook up to all the things you need them to. They might support Gmail but not Outlook, etc. But Zapier already does the dirty work of connecting with the world's tools, which gets me excited about the prospect that it could work out as an agent platform.

Why haven’t AI agents taken off yet?

To understand why agents have not taken off, you need to really understand the flow that autonomous agents take when solving tasks. I talked about this in depth when I explored what agents were in another post from earlier last year. The TLDR is that current agents typically use the LLM system itself as the planning mechanism for the agent. In many cases, this is sufficient to solve a simple task, but as anyone who uses LLMs frequently knows, the limitations of these planners are very real.

Simply put, current LLMs lack sufficient reasoning capabilities to really solve problems without human input. I am hopeful this will change in the future with forthcoming new models, but it might also be that we need to move the planning capabilities to more deterministic systems that are not controlled by LLMs. You could imagine a world where we also fine-tune LLMs to specifically perform the planning task, and potentially fine-tune other LLMs to do the debugging task in cases where the models get stuck.

Image by Simform
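To make the “LLM as planner” idea concrete, here is a rough sketch of the loop most current agents run, written against the Gemini SDK from the previous post. The task, the prompt wording, and the stop condition are my own illustrative assumptions, not any particular framework:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])
planner = genai.GenerativeModel("gemini-1.5-flash")

task = "Find three recent papers on LLM planning and summarize each."
history = []

for step in range(10):  # cap the loop so a stuck planner cannot run forever
    # The LLM itself decides the next action based on the goal and what has happened so far
    prompt = (
        f"Goal: {task}\nCompleted so far: {history}\n"
        "Reply with the single next action to take, or DONE if finished."
    )
    action = planner.generate_content(prompt).text.strip()
    if action.startswith("DONE"):
        break
    # In a real agent, a tool (search, browser, code runner) would execute the action here
    result = f"(pretend result of: {action})"
    history.append((action, result))

The weak link is the planning step: when the model proposes a bad next action, the loop happily executes it, which is exactly the limitation described above.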

Beyond the model limitations, the other challenge is tooling. Likely the closest thing to a widely used LLM agent framework is the OpenAI Assistants API. However, it lacks many of the true agentic features that you would need to really build an autonomous agent in production. Companies like https://www.agentops.ai/ and https://e2b.dev are taking a stab at providing a different layer of tooling / infra to help developers building agents, but these tools have not gained widespread adoption.

Where are we going from here?

The agent experience that gets me excited is the one that is spun up in the background for me and just automates away some task / workflow I used to do manually. It still feels like we are a very long way away from this, but many companies are trying this using browser automation. In those workflows, you can perform a task once and the agent will learn how to mimic the workflow in the browser and then do it for you on demand. This could be one possible way to decrease the friction in making agents work at scale.

Another innovation will certainly come at the model layer. Increased reasoning / planning capabilities, while coupled with increased safety risks, present the likeliest path to improved adoption of agents. Some models, like Cohere's Command R, are being optimized for tool use, which is a common pattern agents rely on to do the things they need. It is not clear yet whether these workflows will require custom-made models; my guess is that general-purpose reasoning models will perform the best in the long term, but the short term will be won by models tailored for tool use.
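To show what tool use looks like in practice, here is a minimal sketch using the Gemini SDK's automatic function calling from the previous post; the pattern is similar across providers, and the weather function is a hard-coded stand-in for whatever real tool an agent would need:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

def get_current_weather(city: str) -> str:
    """Return the current weather for a city (hard-coded stand-in for a real API call)."""
    return f"It is 18 degrees Celsius and sunny in {city}."

# The SDK turns the Python function signature into a tool declaration for the model
model = genai.GenerativeModel("gemini-1.5-flash", tools=[get_current_weather])
chat = model.start_chat(enable_automatic_function_calling=True)

# The model decides to call the tool, the SDK runs it, and the final answer comes back as text
response = chat.send_message("Should I pack a jacket for Paris today?")
print(response.text)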

Don’t forget about GPT-4

By: Logan Kilpatrick

Re-posted from: https://logankilpatrick.medium.com/dont-forget-about-gpt-4-d5ab8c9493fc?source=rss-2c8aac9051d3------2

Exploring the model that changed the path of AI and machine learning history

Image created by Author and DALL-E 3

The age of powerful language-based AI is upon us, and few players compare to the might and potential of OpenAI’s GPT-4. Let’s delve into the intricacies, capabilities, and potential applications of this revolutionary language model.

Picture the Power of GPT-4

Image captured from source video [1]

GPT-4 has truly broken barriers with its ability to handle up to 25,000 words of text, a monumental increase of about eight times compared to its predecessor, ChatGPT. This leap forward enhances GPT-4’s abilities in handling long passages of text, making it a significant tool for a range of applications requiring long-duration interactions or wide-spanning narratives.

Advanced Image Understanding

Image captured from source video [1]

GPT-4’s advance into understanding, interpreting, and coherently describing images revolutionizes the idea of automated systems. Imagine snapping a picture of a scene, uploading it to GPT-4, and having the AI describe the visual elements perfectly. The idea that an AI can not only “see” an image but also make sense of different elements and predict outcomes, like explaining that cutting the strings of balloons would make them fly away, is fascinatingly next-gen.

GPT-4’s ability to understand images makes it an invaluable assistant in several fields, from virtual education to any area where describing visuals in words is required.

Unique Challenges and Improvements

Image captured from source video [1]

Like any technology, AI language models come with their challenges, including adversarial usage, unwanted content, and privacy concerns. However, OpenAI has put substantial effort into mitigating these issues. With GPT-4, the team has implemented further measures for safety, alignment, and usefulness to make the model more user-friendly and secure.

Groundbreaking Applications in Education

GPT-4’s potential in revolutionizing education is immense. Imagine enriching every classroom with a personal AI tutor capable of addressing questions on a wide range of subjects. Or a fifth-grader getting unlimited time for personalized math tutoring with this AI that never gets tired or impatient. GPT-4 makes tailor-made tutoring accessible to all, directly in the comfort of their homes.

Image captured from source video [1]

Ultimately, GPT-4 elevates everyday life through advancements in AI. Whether it’s boosting productivity, teaching new skills, or simply organizing our days, AI like GPT-4 stands to ameliorate our lives in countless ways.

Shaping the Future of AI with Microsoft

The strategic partnership between OpenAI and Microsoft is aimed at transforming AI technology into useful tools accessible to everyone. Their concerted efforts lay the groundwork for harnessing AI’s full potential to enhance productivity, ultimately leading to an improved quality of life. GPT-4, a product born from the convergence of numerous technology advances, holds incredible promise for the future.

From enhancing education with AI-powered tutors to bringing valuable assistance into our lives, GPT-4 is on the verge of redefining our interactions with technology. As with any tool, ensuring that AI serves us correctly and safely is essential to leverage its benefits fully. As we sculpt the future of AI, learning, updating, improving, and transparency stand as our guiding tenets.

As we eagerly anticipate wider access to GPT-4 and similar AI, it’s critical to approach this revolutionary technology with informed understanding and responsible usage. OpenAI’s breakthrough serves as a testament to humanity’s unyielding prowess to innovate and evolve, even in the realms of artificial intelligence. Happy coding!

Source video [1]: https://www.youtube.com/watch?v=--khbXchTeE

Note: This blog post was generated by a GPT-4 pipeline as part of a demo for the AI Engineer Summit presentation in collaboration with Simon Posada Fishman.