Basic use of the Gemini API
To use the Gemini API, first set up the code environment, then register for an API key and save the key in a file called .gemini_api in your home directory.
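As a minimal sketch of reading that key file a little more defensively than a bare open(), the helper below returns the stored key and fails with a clear message if the file is missing or empty. The function name load_api_key and its path argument are illustrative assumptions, not part of the Gemini library.

```python
from pathlib import Path


def load_api_key(path=None):
    """Return the API key stored in a text file, stripped of whitespace.

    load_api_key is a hypothetical helper for illustration; by default it
    looks for the .gemini_api file in the user's home directory.
    """
    key_file = Path(path) if path is not None else Path.home() / ".gemini_api"
    if not key_file.is_file():
        raise FileNotFoundError(f"No API key file at {key_file}")
    key = key_file.read_text().strip()
    if not key:
        raise ValueError(f"API key file {key_file} is empty")
    return key
```

The returned string could then be passed to genai.configure(api_key=...) in place of the inline file read used in the script below.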
Once that is done, the API is simple to use: authenticate, load the image, and send a query that combines the image with a question in natural language. Essentially we can just ask something like ‘In this image, what is the rainfall for January 5th?’. The script below starts with an even simpler question.
#!/usr/bin/env python3
# Basic test of the Gemini API
import os
import PIL.Image
import google.generativeai as genai
# You will need an API key - get one from https://ai.google.dev/gemini-api/docs/api-key
# I keep my API key in the .gemini_api file in my home directory.
with open("%s/.gemini_api" % os.getenv("HOME"), "r") as file:
    api_key = file.read().strip()
# Default protocol is 'GRPC' - but that is blocked by the Office firewall.
# Use 'REST' instead.
genai.configure(api_key=api_key, transport="rest")
# Load the sample image
img = PIL.Image.open("../images/jpgs_300dpi/Devon_1941-1950_RainNos_1651-1689-293.jpg")
# Pick an AI to use - this one is the latest as of 2025-01-29
model = genai.GenerativeModel("gemini-2.0-flash-exp")
# Ask a question about the image
result = model.generate_content([img, "\n\n", "What is this document?"])
# Save the full response structure to a text file for inspection
with open("sample_result.txt", "w") as file:
    file.write(str(result))
# print(result)
The output is a data structure including the answer to the question.
response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "This is a **British Rainfall Organization register of rainfall** for the year 1947, specifically for the station located at **Badworthy Cottage, S. Brent** in **Devon**. It records daily rainfall measurements in inches."
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "safety_ratings": [
            {
              "category": "HARM_CATEGORY_HATE_SPEECH",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HARASSMENT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
              "probability": "NEGLIGIBLE"
            }
          ],
          "avg_logprobs": -0.4376134669527094
        }
      ],
      "usage_metadata": {
        "prompt_token_count": 265,
        "candidates_token_count": 47,
        "total_token_count": 312
      },
      "model_version": "gemini-2.0-flash-exp"
    }),
)
We asked “What is this document?” and got the response “This is a British Rainfall Organization register of rainfall for the year 1947, specifically for the station located at Badworthy Cottage, S. Brent in Devon. It records daily rainfall measurements in inches.”
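To show where the answer sits inside that structure, here is a small sketch that works on a plain Python dictionary shaped like the "result" part of the output above (trimmed to the fields used). The helper names answer_text and total_tokens are made up for illustration. When working with the live response object from google.generativeai, the result.text accessor should give the same string without any manual digging.

```python
# A dictionary mirroring the relevant fields of the printed response above.
response = {
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "This is a **British Rainfall Organization register of rainfall** for the year 1947, specifically for the station located at **Badworthy Cottage, S. Brent** in **Devon**. It records daily rainfall measurements in inches."
                    }
                ],
                "role": "model",
            },
            "finish_reason": "STOP",
        }
    ],
    "usage_metadata": {
        "prompt_token_count": 265,
        "candidates_token_count": 47,
        "total_token_count": 312,
    },
}


def answer_text(response):
    """Join the text parts of the first candidate into one string."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def total_tokens(response):
    """Return the total token count reported for the query."""
    return response["usage_metadata"]["total_token_count"]


print(answer_text(response))
print(total_tokens(response))
```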
This answer is correct in all respects (compare the image), and shows the power of modern AIs. Gemini has read and understood the document, so we can transcribe data just by asking for what we want to know. This is vanilla Gemini - no specialist fine-tuning, and it’s never seen this document before. This is basically magic - how far can we go with it?
The output also tells us what the query cost: 312 tokens in total (265 for the prompt, 47 for the answer). At the time of writing, tokens cost approximately $0.30 per million, so this query cost about $0.0001 (one hundredth of one U.S. cent).
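The arithmetic behind that estimate can be sketched as follows. It assumes a single flat price of $0.30 per million tokens, as quoted above; real pricing may differ between input and output tokens and may change over time, and the function name query_cost_usd is illustrative only.

```python
# Approximate price quoted in the text; treat as an assumption, not a tariff.
PRICE_PER_MILLION_TOKENS_USD = 0.30


def query_cost_usd(total_tokens, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Estimate the cost of one query from its total token count."""
    return total_tokens * price_per_million / 1_000_000


cost = query_cost_usd(312)
print(f"${cost:.6f}")  # prints $0.000094, i.e. roughly $0.0001
```

At this rate, about 10,000 queries of the same size would cost roughly one U.S. dollar.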