This post will demonstrate how to query LLM APIs (in this case, OpenAI) using Python. It walks through installing the official SDK, loading your API key from environment variables, sending a basic text request, streaming tokens as they arrive, and requesting structured JSON via a schema. You’ll also see a lightweight retry pattern for transient errors and equivalent cURL calls so you can test endpoints from the command line.
1) Setup
Install the SDK and helpers:
pip install --upgrade openai python-dotenv
Create a .env file next to your script and add your key:
OPENAI_API_KEY=sk-...your key...
Load it and initialize the client:
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
If you prefer, you can omit api_key= and rely on the environment variable directly; the SDK reads OPENAI_API_KEY on its own.
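A missing key usually shows up only later, as a 401 from the API. The sketch below fails early with a clearer message instead. Note that require_api_key is a hypothetical helper written for this post, not part of the OpenAI SDK.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return key
```

After load_dotenv(), you could then build the client with OpenAI(api_key=require_api_key()).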
2) Basic: send a prompt and get text back
def summarize(text: str) -> str:
    res = client.responses.create(
        model="gpt-4.1-mini",
        input=f"Summarize in 3 bullet points:\n\n{text}"
    )
    return res.output_text

if __name__ == "__main__":
    print(summarize("""Astro is a content-first web framework. It ships little client-side JS by default and supports islands of interactivity."""))
Run it:
python app.py
You should see three concise bullets. Swap gpt-4.1-mini for another model if you like.
3) Stream tokens as they arrive (SSE)
Streaming gives users feedback immediately. The SDK provides a convenient stream context manager:
def stream_answer(prompt: str):
    with client.responses.stream(
        model="gpt-4.1-mini",
        input=prompt
    ) as stream:
        # Print tokens as they come in
        for event in stream:
            if event.type == "response.output_text.delta":
                print(event.delta, end="", flush=True)

        # Wait for completion and fetch the final text (optional)
        final_text = stream.get_final_response().output_text
    print("\n\n[done]\n")
    return final_text

if __name__ == "__main__":
    stream_answer("Give me 5 quick tips for writing clearer commit messages.")
Tip: In a web app, forward these deltas to the browser via Server‑Sent Events or websockets.
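If you go the Server-Sent Events route, each delta has to be wrapped in the SSE wire format before it is written to the response. Here is a minimal sketch of that framing step. The function name sse_frame and the event name "delta" are assumptions made for this example; how you write the frames out depends on your web framework.

```python
from typing import Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # A payload containing newlines must be split across several data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

Inside the streaming loop you would send sse_frame(event.delta, event="delta") for each text delta rather than printing it.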
4) Ask for strict JSON (schema enforced)
When you need machine‑readable output, define a JSON schema and let the model conform to it.
import json
def plan_todo(item: str):
    schema = {
        "name": "TodoItem",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "due_date": {"type": "string", "description": "ISO-8601 date or empty if none"},
                "priority": {"type": "integer", "minimum": 1, "maximum": 5}
            },
            # Strict mode expects every property to be listed as required
            "required": ["title", "due_date", "priority"],
            "additionalProperties": False
        },
        "strict": True
    }

    res = client.responses.create(
        model="gpt-4.1-mini",
        input=f"Create a task for: {item}",
        # The Responses API takes structured-output settings under text.format
        text={"format": {"type": "json_schema", **schema}},
        temperature=0
    )

    # Get the schema-conforming JSON string and load it into Python
    json_text = res.output_text
    data = json.loads(json_text)
    return data

if __name__ == "__main__":
    print(plan_todo("Ship v1 of the blog theme"))
if __name__ == "__main__": print(plan_todo("Ship v1 of the blog theme"))
This will return something like:
{ "title": "Ship v1 of the blog theme", "due_date": "", "priority": 3 }
Because strict is True and additionalProperties is False, the output is constrained to your schema, so fields outside it cannot appear.
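Even with schema enforcement, it can be worth checking the parsed dictionary before the rest of your app relies on it, for example when the data comes from a cache or from a model call made without strict mode. The sketch below mirrors the TodoItem schema using only the standard library. validate_todo is a hypothetical helper, not an SDK function.

```python
def validate_todo(data: dict) -> dict:
    """Check a parsed TodoItem against the schema used above (illustrative)."""
    allowed = {"title", "due_date", "priority"}
    extra = set(data) - allowed
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    priority = data.get("priority")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")
    if not isinstance(data.get("due_date", ""), str):
        raise ValueError("due_date must be a string")
    return data
```

You could call validate_todo(json.loads(res.output_text)) in place of the bare json.loads call.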
5) Robust calls with retries (rate limits, timeouts)
Use a small wrapper with exponential backoff. The example below retries on common transient failures.
import time
import openai

def call_with_retries(make_call, attempts=5):
    for i in range(attempts):
        try:
            return make_call()
        # The SDK wraps transport errors in its own exception types;
        # APIConnectionError also covers timeouts (APITimeoutError).
        except (openai.APIConnectionError, openai.RateLimitError, openai.InternalServerError) as e:
            if i == attempts - 1:
                raise
            sleep_s = 2 ** i
            print(f"{type(e).__name__}, retrying in {sleep_s}s...")
            time.sleep(sleep_s)

def ask(prompt: str) -> str:
    def _do():
        res = client.responses.create(
            model="gpt-4.1-mini",
            input=prompt
        )
        return res.output_text
    return call_with_retries(_do)
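Plain doubling has two weak spots: the wait grows without bound, and many clients that fail together also retry together. One common refinement is to cap the delay and add random jitter. The sketch below only computes the delay schedule so you can inspect it. The name backoff_delays and its default values are assumptions for illustration, not something the SDK provides.

```python
import random


def backoff_delays(attempts, base=1.0, cap=30.0, jitter=True, rng=None):
    """Return the sleep durations between `attempts` tries (illustrative)."""
    rng = rng or random.Random()
    delays = []
    for i in range(attempts - 1):  # no sleep after the final attempt
        delay = min(cap, base * 2 ** i)  # exponential growth, capped
        if jitter:
            # "Full jitter": pick a random point between 0 and the capped delay.
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

To use it in a retry loop, sleep for each value in turn between attempts in place of the fixed 2 ** i.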
Bonus: cURL examples
If you just want to test from a shell, here are minimal cURL calls. Make sure OPENAI_API_KEY is set in your environment.
Basic text generation
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "input": "Write a single-sentence summary of the benefits of unit tests."
  }'
Structured JSON (same schema as above)
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "text": {
      "format": {
        "type": "json_schema",
        "name": "TodoItem",
        "schema": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "due_date": {"type": "string"},
            "priority": {"type": "integer", "minimum": 1, "maximum": 5}
          },
          "required": ["title", "due_date", "priority"],
          "additionalProperties": false
        },
        "strict": true
      }
    },
    "input": "Create a todo for fixing flaky tests"
  }'
Troubleshooting
- 401/403: Check your API key and that it’s in the environment where your code runs.
- 404: Verify endpoint and model name.
- 429: You’re rate‑limited; add backoff and reduce concurrency.
- 5xx: Transient; retry with exponential backoff.
- Invalid JSON: Request structured output with a JSON schema (as above) or set temperature lower.
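The list above splits into two groups: statuses worth retrying and statuses that need a fix on your side. A minimal sketch of that decision follows. should_retry is a hypothetical helper; in the Python SDK, the status is available as status_code on exceptions derived from openai.APIStatusError.

```python
def should_retry(status: int) -> bool:
    """Return True for statuses the list above treats as transient."""
    # 429: rate-limited; 5xx: server-side failure. Both may succeed later.
    # 401/403/404 point to a key, permission, or naming problem, so
    # retrying the same request will not help.
    return status == 429 or 500 <= status <= 599
```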
Where to go next
- Add conversation state by sending previous messages/outputs along with your new prompt.
- Stream to the browser with SSE for real‑time UX.
- Wrap the client in your own module so you can swap models later without touching the rest of your app.
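As a starting point for the first item, here is a minimal sketch of keeping conversation state as a list of role/content messages. It assumes you pass that list as input in place of a single string. add_turn and the max_turns limit are hypothetical names chosen for this example.

```python
def add_turn(history, role, content, max_turns=20):
    """Return a new history with one message appended (illustrative).

    Only the most recent `max_turns` messages are kept so the prompt
    does not grow without bound.
    """
    if role not in {"user", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    updated = history + [{"role": role, "content": content}]
    return updated[-max_turns:]
```

A typical loop would call add_turn(history, "user", prompt), send the result as input, then call add_turn again with role "assistant" and res.output_text.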