Author Archives: Logan Kilpatrick

Everything you need to know about the Gemini API as a developer in less than 5 minutes

By: Logan Kilpatrick

Re-posted from: https://medium.com/around-the-prompt/everything-you-need-to-know-about-the-gemini-api-as-a-developer-in-less-than-5-minutes-5e75343ccff9?source=rss-2c8aac9051d3------2

Get started building with the Gemini API

Image by Author

Gemini is Google’s family of frontier generative AI models, built from the ground up to be multi-modal and long context (more on this later). Gemini is available across the entire Google suite, from Gmail to the Gemini App. For developers who want to build with Gemini, the Gemini API is the best place to get started.

In this article, we will explore what the Gemini API offers, how to get started using Gemini for free, and more advanced use cases like fine-tuning. As always, you are reading my personal blog, so you guessed it, these are my personal views. Let’s dive in!

How can I test the latest Gemini models?

If you want to first test the Gemini models (everything from the latest experimental models to production models) without writing or running any code, you can head to Google AI Studio. Once you are done testing there, you can also generate a Gemini API key in AI Studio (“Get API Key” in the top left corner). AI Studio is free, and there is a generous free tier on the API as well, which includes 1,500 requests per day with Gemini 1.5 Flash.

Image captured by Author in aistudio.google.com

What does the Gemini API offer?

The Gemini API comes standard with most of the things developers are looking for. In general, it offers most if not all of the features developers have come to expect when building with large language model APIs, in addition to many things that are unique to Gemini, like long context, video understanding, and more.
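To make the multimodal and long context points concrete, here is a minimal sketch using the Python SDK introduced later in this post. It uploads a local file through the File API and asks the model about it; the file name report.pdf and the prompt are just illustrative placeholders.

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

# Upload a local file via the File API (the file name here is a placeholder)
sample_file = genai.upload_file(path="report.pdf")

# Pass the uploaded file alongside a text prompt in a single request
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([sample_file, "Summarize this document in three bullet points."])
print(response.text)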

What models does the Gemini API support?

By default, the two model variants available in the Gemini API as of September 21st, 2024 are Gemini 1.5 Flash and Gemini 1.5 Pro. There are different instances of these models available, some of which are newer and have performance updates. Each model also offers different features, such as its context length or whether the model can be tuned. You can check out the Gemini models page for more details.

Image captured by Author on ai.google.dev
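If you would rather check programmatically which model variants your API key can see, the SDK can list them. A quick sketch, assuming the same API_KEY environment variable used in the examples below:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

# Print every model that supports the generateContent method
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)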

Sending your first Gemini API request

With as little as six lines of code, you can send your first API request. Make sure to get your API key from Google AI Studio before running the code below:

import google.generativeai as genai
import os

# Read the API key generated in Google AI Studio from the environment
genai.configure(api_key=os.environ["API_KEY"])

# Create a model instance and send a single text prompt
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain how AI works")
print(response.text)

The Gemini API SDKs also support creating a chat object, which lets you append messages to a simple conversation structure:

model = genai.GenerativeModel("gemini-1.5-flash")

# Seed the chat with prior turns; "user" and "model" are the two supported roles
chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hello"},
        {"role": "model", "parts": "Great to meet you. What would you like to know?"},
    ]
)

# Each send_message call appends to the chat history automatically
response = chat.send_message("I have 2 dogs in my house.")
print(response.text)

response = chat.send_message("How many paws are in my house?")
print(response.text)

If you want a simple repo with a little more complexity to get started with, check out the official Gemini API quickstart repo on GitHub.

How much does the Gemini API cost?

There are two tiers in the Gemini API: the free tier and the paid tier. The former is, well, free, and the latter comes with increased rate limits intended to support production workloads. Gemini 1.5 Flash is the most competitively priced large language model in its capability class and recently had its price decreased by 70%.

Image captured from Google Developers Blog

Or put another way, you can access 1.5 billion tokens for free with Gemini every single day: 1,500 free Gemini 1.5 Flash requests per day, each able to use up to the model's roughly 1-million-token context window, works out to about 1.5 billion tokens.
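If you want to see how quickly a prompt eats into that budget, the SDK can count tokens before you spend any quota. A small sketch, reusing the setup from the earlier examples:

model = genai.GenerativeModel("gemini-1.5-flash")

# Count the tokens in a prompt without generating a response
token_info = model.count_tokens("Explain how AI works")
print(token_info.total_tokens)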

Fine-tuning Gemini 1.5 Flash

Gemini 1.5 Flash can be fine-tuned for free through Google AI Studio, and the tuned model does not cost more to use than the base model, a benefit that is rather unique in the AI ecosystem. Once you tune the model, it can be used as a drop-in replacement in your existing code (see the sketch below). Google AI Studio also comes with sample datasets to test tuning with, and a mode called “Structured prompting” which is useful for creating fine-tuning datasets.

Image captured by Author in Google AI Studio
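Once a tuning job finishes, using the tuned model looks just like using the base model. A minimal sketch, where the tunedModels name is a placeholder for whatever ID AI Studio assigns to your model:

# Swap the base model name for your tuned model's ID; nothing else in your code changes
model = genai.GenerativeModel("tunedModels/my-writing-helper-0001")
response = model.generate_content("Rewrite this sentence in a formal tone: the api is real good")
print(response.text)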

Closing thoughts

The Gemini API continues to get better week over week, with a steady stream of new features landing that continue to improve the developer experience. If you have feedback, suggestions, or questions, join the conversation on the Google AI developer forum. Happy building!



The future of AI agents with Yohei Nakajima

By: Logan Kilpatrick

Re-posted from: https://logankilpatrick.medium.com/the-future-of-ai-agents-with-yohei-nakajima-2602e32a4765?source=rss-2c8aac9051d3------2

Delving into AI agents and where we are going next

The future is going to be full of AI agents, but there are still a lot of open questions about how to get there and what that world will look like. I had the chance to sit down with one of the deepest thinkers in the world of AI agents, Yohei Nakajima. If you want to check out the video of our conversation, you can watch it on YouTube.

Where are we today?

There has been a lot of talk of agents over the last year since the initial viral explosion of HustleGPT, where the creator famously told the chatbot system that it had $100 and asked it to try and help him make money for his startup.

Since then, the conversation and interest around agents have not stopped, despite there being a shockingly low number of successful agent deployments. Even as someone who is really interested in AI and has tried many of the agent tools, I still have a grand total of zero agents actually running in production right now helping me (which is pretty disappointing).

Despite the lack of large-scale deployments, companies are still investing heavily in the space, as it is widely assumed this is the application of LLMs that will end up providing the most value. I have been looking more and more into Zapier as the potential launching point for large-scale agent deployments. Most of the initial challenge with agent platforms is that they don't actually hook up to all the things you need them to. They might support Gmail but not Outlook, etc. But Zapier already does the dirty work of connecting with the world's tools, which gets me excited about the prospect that it could work out as an agent platform.

Why haven’t AI agents taken off yet?

To understand why agents have not taken off, you need to really understand the flow that autonomous agents take when solving tasks. I talked about this in depth when I explored what agents were in another post from earlier last year. The TLDR is that current agents typically use the LLM system itself as the planning mechanism for the agent. In many cases, this is sufficient to solve a simple task, but as anyone who uses LLMs frequently knows, the limitations of these planners are very real.

Simply put, current LLMs lack sufficient reasoning capabilities to really solve problems without human input. I am hopeful this will change in the future with forthcoming new models, but it might also be that we need to move the planning capabilities to more deterministic systems that are not controlled by LLMs. You could imagine a world where we also fine-tune LLMs to specifically perform the planning task, and potentially fine-tune other LLMs to do the debugging task in cases where the models get stuck.

Image by Simform
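To make the “LLM as planner” idea concrete, here is a rough sketch of the loop most current agents run, written against the Gemini SDK from the previous post. The task, the prompt wording, and the stop condition are my own illustrative assumptions, not any particular framework:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])
planner = genai.GenerativeModel("gemini-1.5-flash")

task = "Find three recent papers on LLM planning and summarize each."
history = []

for step in range(10):  # cap the loop so a stuck planner cannot run forever
    # The LLM itself decides the next action based on the goal and what has happened so far
    prompt = (
        f"Goal: {task}\nCompleted so far: {history}\n"
        "Reply with the single next action to take, or DONE if finished."
    )
    action = planner.generate_content(prompt).text.strip()
    if action.startswith("DONE"):
        break
    # In a real agent, a tool (search, browser, code runner) would execute the action here
    result = f"(pretend result of: {action})"
    history.append((action, result))

The weak link is the planning step: when the model proposes a bad next action, the loop happily executes it, which is exactly the limitation described above.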

Beyond the model limitations, the other challenge is tooling. Likely the closest thing to a widely used LLM agent framework is the OpenAI Assistants API. However, it lacks many of the true agentic features that you would need to really build an autonomous agent in production. Companies like https://www.agentops.ai/ and https://e2b.dev are taking a stab at providing a different layer of tooling / infra to help developers building agents, but these tools have not gained widespread adoption.

Where are we going from here?

The agent experience that gets me excited is the one that is spun up in the background for me and just automates away some task / workflow I used to do manually. It still feels like we are a very long way away from this, but many companies are trying this using browser automation. In those workflows, you can perform a task once and the agent will learn how to mimic the workflow in the browser and then do it for you on demand. This could be one possible way to decrease the friction in making agents work at scale.

Another innovation will certainly come at the model layer. Increased reasoning / planning capabilities, while coupled with increased safety risks, present the likeliest path to improved adoption of agents. Some models, like Cohere's Command R, are being optimized for tool use, which is a common pattern agents rely on to do the things they need. It is not clear yet whether these workflows will require custom-made models; my guess is that general-purpose reasoning models will perform the best in the long term, but the short term will be won by models tailored for tool use.
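To show what tool use looks like in practice, here is a minimal sketch using the Gemini SDK's automatic function calling from the previous post; the pattern is similar across providers, and the weather function is a hard-coded stand-in for whatever real tool an agent would need:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

def get_current_weather(city: str) -> str:
    """Return the current weather for a city (hard-coded stand-in for a real API call)."""
    return f"It is 18 degrees Celsius and sunny in {city}."

# The SDK turns the Python function signature into a tool declaration for the model
model = genai.GenerativeModel("gemini-1.5-flash", tools=[get_current_weather])
chat = model.start_chat(enable_automatic_function_calling=True)

# The model decides to call the tool, the SDK runs it, and the final answer comes back as text
response = chat.send_message("Should I pack a jacket for Paris today?")
print(response.text)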

Don’t forget about GPT-4

By: Logan Kilpatrick

Re-posted from: https://logankilpatrick.medium.com/dont-forget-about-gpt-4-d5ab8c9493fc?source=rss-2c8aac9051d3------2

Exploring the model that changed the path of AI and machine learning history

Image created by Author and DALL-E 3

The age of powerful language-based AI is upon us, and few players compare to the might and potential of OpenAI’s GPT-4. Let’s delve into the intricacies, capabilities, and potential applications of this revolutionary language model.

Picture the Power of GPT-4

Image captured from source video [1]

GPT-4 has truly broken barriers with its ability to handle up to 25,000 words of text, a monumental increase of about eight times compared to its predecessor, ChatGPT. This leap forward enhances GPT-4’s abilities in handling long passages of text, making it a significant tool for a range of applications requiring long-duration interactions or wide-spanning narratives.

Advanced Image Understanding

Image captured from source video [1]

GPT-4’s advance into understanding, interpreting, and coherently describing images revolutionizes the idea of automated systems. Imagine snapping a picture of a scene, uploading it to GPT-4, and having the AI describe the visual elements perfectly. The idea that an AI can not only “see” an image but also make sense of different elements and predict outcomes, like explaining that cutting the strings of balloons would make them fly away, is fascinatingly next-gen.

GPT-4’s ability to understand images makes it an invaluable assistant in several fields, from virtual education to any area where describing visuals in words is required.

Unique Challenges and Improvements

Image captured from source video [1]

Like any technology, AI language models come with their challenges, including adversarial usage, unwanted content, and privacy concerns. However, OpenAI has put substantial effort into mitigating these issues. With GPT-4, the team has implemented further measures for safety, alignment, and usefulness to make the model more user-friendly and secure.

Groundbreaking Applications in Education

GPT-4’s potential in revolutionizing education is immense. Imagine enriching every classroom with a personal AI tutor capable of addressing questions on a wide range of subjects. Or a fifth-grader getting unlimited time for personalized math tutoring with this AI that never gets tired or impatient. GPT-4 makes tailor-made tutoring accessible to all, directly in the comfort of their homes.

Image captured from source video [1]

Ultimately, GPT-4 elevates everyday life through advancements in AI. Whether it’s boosting productivity, teaching new skills, or simply organizing our days, AI like GPT-4 stands to ameliorate our lives in countless ways.

Shaping the Future of AI with Microsoft

The strategic partnership between OpenAI and Microsoft is aimed at transforming AI technology into useful tools accessible to everyone. Their concerted efforts lay the groundwork for harnessing AI’s full potential to enhance productivity, ultimately leading to an improved quality of life. GPT-4, a product born from the convergence of numerous technology advances, holds incredible promise for the future.

From enhancing education with AI-powered tutors to bringing valuable assistance into our lives, GPT-4 is on the verge of redefining our interactions with technology. As with any tool, ensuring that AI serves us correctly and safely is essential to leverage its benefits fully. As we sculpt the future of AI, learning, updating, improving, and transparency stand as our guiding tenets.

As we eagerly anticipate wider access to GPT-4 and similar AI, it’s critical to approach this revolutionary technology with informed understanding and responsible usage. OpenAI’s breakthrough serves as a testament to humanity’s unyielding prowess to innovate and evolve, even in the realms of artificial intelligence. Happy coding!

Source video [1]: https://www.youtube.com/watch?v=--khbXchTeE

Note: This blog post was generated by a GPT-4 pipeline as part of a demo for the AI Engineer Summit presentation in collaboration with Simon Posada Fishman.