https://github.com/raohai/aws-lambda-response-streaming
An AWS lambda python streaming response example
- Host: GitHub
- URL: https://github.com/raohai/aws-lambda-response-streaming
- Owner: RaoHai
- License: apache-2.0
- Created: 2024-04-09T07:34:11.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-09T10:54:43.000Z (almost 2 years ago)
- Last Synced: 2025-04-09T00:23:20.101Z (12 months ago)
- Topics: lambda, openai, serverless, streaming
- Language: Python
- Homepage:
- Size: 3.56 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Building streaming functions with Python on AWS Lambda
This example shows how to stream responses from OpenAI completions with FastAPI on AWS Lambda.
Credit to [aws-lambda-web-adapter](https://github.com/awslabs/aws-lambda-web-adapter)

## How does it work?
This example uses FastAPI to provide an inference API. The inference endpoint invokes OpenAI and streams the response. Both the Lambda Web Adapter and the function URL have response streaming mode enabled, so the response from OpenAI is streamed all the way back to the client.
This function is packaged as a Docker image. Here is the content of the Dockerfile.
```dockerfile
FROM public.ecr.aws/docker/library/python:3.12.0-slim-bullseye
# Install the Lambda Web Adapter as a Lambda extension
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter
WORKDIR /var/task
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip3 install -r requirements.txt -U --no-cache-dir
# Copy function code from your project folder
COPY . .
CMD ["python", "main.py"]
```
Notice that installing the Lambda Web Adapter only requires adding a single `COPY` line:
```dockerfile
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/
```
In the SAM template, we use the environment variable `AWS_LWA_INVOKE_MODE: RESPONSE_STREAM` to configure the Lambda Web Adapter in response streaming mode, and add a function URL with `InvokeMode: RESPONSE_STREAM`.
```yaml
FastAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    PackageType: Image
    MemorySize: 512
    Environment:
      Variables:
        AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
    FunctionUrlConfig:
      AuthType: NONE
      InvokeMode: RESPONSE_STREAM
    Policies:
      - Statement:
          - Sid: BedrockInvokePolicy
            Effect: Allow
            Action:
              - bedrock:InvokeModelWithResponseStream
            Resource: '*'
```
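To make the URL easy to find after deployment, the template can also export it as a stack output so `sam deploy` prints it. This is a sketch, assuming SAM's convention of generating a `<FunctionLogicalId>Url` resource when `FunctionUrlConfig` is set:

```yaml
Outputs:
  FastAPIFunctionUrl:
    Description: Function URL for the FastAPI streaming endpoint
    # SAM creates a FastAPIFunctionUrl resource from FunctionUrlConfig above
    Value: !GetAtt FastAPIFunctionUrl.FunctionUrl
```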
## Build and deploy
Run the following commands to build and deploy this example.
```bash
sam build --use-container
sam deploy --guided
```
## Test the example
After the deployment completes, curl the `FastAPIFunctionUrl` shown in the stack outputs.
```bash
curl -v -N --location '${{FastAPIFunctionUrl}}/api/chat/stream' \
--header 'Content-Type: application/json' \
--header 'Transfer-Encoding: chunked' \
--data '{"messages":[{"role":"user","content":"Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ..."}],"prompt":""}'
```
