Ever wonder how ChatGPT streams responses token-by-token? Or how to push real-time data from a server to a browser? The secret is Server-Sent Events (SSE).
I built an SSE server from scratch (now streaming LLM responses!) to understand how it works. Turns out, it's ridiculously simple.
## What's SSE?
SSE is a way for servers to push data to browsers. Unlike WebSockets, it's:
- One-way: Server → Client only
- Just HTTP: No special protocol, no handshake dance
- Auto-reconnect: Browser handles reconnection automatically
- Simple: You can implement it in 20 lines of code
Use cases:
- LLM streaming (ChatGPT-style responses)
- Live notifications
- Stock tickers
- Social media feeds
- Log streaming
- Any "subscribe to updates" scenario
## The Entire Protocol in 30 Seconds
Here's how SSE works:
1. Client makes a regular HTTP request:
```javascript
const eventSource = new EventSource('/events');
```

2. Server responds with special headers:
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream   ← This is the magic
Cache-Control: no-cache
Connection: keep-alive
```

3. Server keeps sending data:
```text
data: Hello!\n
\n
data: Another message\n
\n
data: Keep going...\n
\n
```

That's it. The connection stays open, and the server pushes messages whenever it wants.
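That "20 lines of code" claim from earlier holds up. Here's a minimal sketch using nothing but Python's standard library (the port and messages are placeholders, not from the demo server):

```python
import socket
import time

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("localhost", 8080))
server.listen()

while True:
    client, _ = server.accept()
    client.recv(4096)  # read and discard the HTTP request
    client.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/event-stream\r\n"
        b"Cache-Control: no-cache\r\n"
        b"\r\n"
    )
    for i in range(5):  # push five messages, one per second
        client.sendall(f"data: tick {i}\n\n".encode())
        time.sleep(1)
    client.close()
```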
## The Code
Here's the core of an SSE server (streaming LLM responses):
```python
from openai import OpenAI

def stream_events(client, prompt):
    # SSE headers - this is what makes it work
    response = "HTTP/1.1 200 OK\r\n"
    response += "Content-Type: text/event-stream\r\n"  # Magic!
    response += "Cache-Control: no-cache\r\n"
    response += "\r\n"
    client.sendall(response.encode())

    # Connect to LM Studio (or any OpenAI-compatible API)
    llm_client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    # Stream the LLM response token-by-token
    stream = llm_client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            token = chunk.choices[0].delta.content
            # Each token becomes its own SSE message
            client.sendall(f"data: {token}\n\n".encode())
```

On the client side:
```javascript
// encodeURIComponent keeps special characters in the prompt from breaking the URL
const events = new EventSource('/events?prompt=' + encodeURIComponent(userInput));
events.onmessage = function(e) {
    // Each token arrives here as it's generated
    outputDiv.innerHTML += e.data; // Accumulate the response
};
```

The browser does all the heavy lifting. It keeps the connection open and fires `onmessage` for each token as it streams from the LLM.
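The snippet above also assumes some glue around it: something has to accept the connection and pull `prompt` out of the query string before calling `stream_events`. Reusing the accept loop from the minimal server earlier, that glue might look like this (`handle` is a hypothetical name, and a real server would parse the request more carefully):

```python
from urllib.parse import parse_qs, urlparse

def handle(client):
    # Hypothetical glue; the request line looks like: GET /events?prompt=hello HTTP/1.1
    request = client.recv(4096).decode()
    path = request.split(" ", 2)[1]
    # parse_qs also percent-decodes, matching encodeURIComponent on the client
    prompt = parse_qs(urlparse(path).query).get("prompt", [""])[0]
    stream_events(client, prompt)
    client.close()
```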
## SSE Message Format
Messages are dead simple:
```text
data: your message here\n
\n
```

- Start with `data: `
- End with `\n\n` (two newlines)
For multi-line messages:
```text
data: line one\n
data: line two\n
\n
```

You can also add event types and IDs:
```text
event: notification\n
id: 42\n
data: You have a new message!\n
\n
```

The `id` also powers reconnection: when the browser reconnects, it sends the last ID it saw in a `Last-Event-ID` header so the server can resume where it left off. The client can listen for specific events:
```javascript
events.addEventListener('notification', function(e) {
    alert(e.data);
});
```
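Generating these messages is just string formatting. A small helper (my naming, not part of any standard library) that follows the rules above:

```python
def format_sse(data, event=None, event_id=None):
    """Build one SSE message; multi-line data becomes multiple data: fields."""
    msg = ""
    if event is not None:
        msg += f"event: {event}\n"
    if event_id is not None:
        msg += f"id: {event_id}\n"
    for line in data.splitlines() or [""]:
        msg += f"data: {line}\n"
    return msg + "\n"  # the blank line terminates the message

# format_sse("You have a new message!", event="notification", event_id=42)
# -> "event: notification\nid: 42\ndata: You have a new message!\n\n"
```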
## SSE vs WebSockets

| Feature | SSE | WebSockets |
|---|---|---|
| Direction | Server → Client | Bidirectional |
| Protocol | HTTP | Custom (ws://) |
| Complexity | Simple | More complex |
| Reconnection | Automatic | Manual |
| Binary data | No (text only) | Yes |
| Browser support | All modern | All modern |
Use SSE when:
- You only need server-to-client
- You want simplicity
- You're already using HTTP
Use WebSockets when:
- You need client-to-server too (chat, games)
- You need binary data
- You need absolute minimum latency
## Why SSE Is Underrated
- It's just HTTP — works through proxies, load balancers, firewalls
- No library needed — `EventSource` is built into browsers
- Auto-reconnect — if the connection drops, the browser retries automatically
- Simple to debug — it's plain text; use curl or browser devtools (see the example after this list)
- No handshake — starts immediately, no back-and-forth
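For example, assuming the demo server from this post is running on port 8080, you can watch the raw stream in a terminal (`-N` disables curl's output buffering, so messages print as they arrive):

```bash
curl -N 'http://localhost:8080/events?prompt=hello'
```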
## Why SSE Is Perfect for LLM Streaming
This is why every major LLM provider (OpenAI, Anthropic, Google) uses SSE for streaming:
- Token-by-token delivery — Each token is a separate SSE event, displayed immediately
- Built-in buffering — Browser handles all the complexity of assembling partial responses
- Automatic recovery — If connection drops mid-generation, browser can reconnect
- Unidirectional — LLMs only need server → client (no need for WebSocket bidirectional complexity)
- HTTP-based — Works everywhere, through all proxies and firewalls
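On the wire, those provider streams are ordinary SSE messages carrying JSON. A simplified excerpt of OpenAI-style output (most fields omitted for brevity):

```text
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo!"}}]}

data: [DONE]
```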
The alternative would be:
- Long polling → Horrible latency, terrible UX
- WebSockets → Overkill for one-way streaming
- Wait for full response → No streaming feel, slow perceived performance
SSE is the sweet spot. That's why ChatGPT, Claude, and every other AI chat interface uses it under the hood.
## Try It
First, set up LM Studio:
- Download from https://lmstudio.ai/
- Load any model (Llama, Mistral, Phi, etc.)
- Start the local server (defaults to port 1234)
Then:
```bash
pip install openai
python server.py
```

Open http://localhost:8080, type a prompt, and click "Generate". Watch the LLM response stream in token-by-token, just like ChatGPT.
Open DevTools → Network tab. You'll see a single /events request that stays open. The "Size" column keeps growing as tokens stream in.
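You don't even need a browser to consume the stream. A quick sketch with the `requests` library (assuming the same server and endpoint):

```python
import requests

# Open the SSE stream and print tokens as they arrive
with requests.get(
    "http://localhost:8080/events",
    params={"prompt": "hello"},
    stream=True,  # don't buffer the whole response
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)
```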
That's SSE powering real-time LLM responses.
## The Bottom Line
SSE is HTTP's built-in streaming solution. If you need real-time server-to-client updates and don't want the complexity of WebSockets, SSE is your answer.
It's been around since 2006, it works in all browsers, and you can implement it from scratch in an afternoon.
Sometimes the simple solution is the right one.