https://github.com/scrapegraphai/scrapegraphai-ruby
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/scrapegraphai/scrapegraphai-ruby
- Owner: ScrapeGraphAI
- License: apache-2.0
- Created: 2025-08-12T10:40:20.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-08-25T02:14:00.000Z (5 months ago)
- Last Synced: 2025-09-06T15:01:46.204Z (5 months ago)
- Language: Ruby
- Size: 167 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
README
# Scrapegraphai Ruby API library
The Scrapegraphai Ruby library provides convenient access to the Scrapegraphai REST API from any Ruby 3.2.0+ application. It ships with comprehensive types & docstrings in Yard, RBS, and RBI – [see below](https://github.com/stainless-sdks/scrapegraphai-ruby#Sorbet) for usage with Sorbet. The standard library's `net/http` is used as the HTTP transport, with connection pooling via the `connection_pool` gem.
It is generated with [Stainless](https://www.stainless.com/).
## Documentation
Documentation for releases of this gem can be found [on RubyDoc](https://gemdocs.org/gems/scrapegraphai).
The REST API documentation can be found on [scrapegraphai.com](https://scrapegraphai.com).
## Installation
To use this gem, install via Bundler by adding the following to your application's `Gemfile`:
```ruby
gem "scrapegraphai", "~> 0.0.1"
```
## Usage
```ruby
require "bundler/setup"
require "scrapegraphai"
scrapegraphai = Scrapegraphai::Client.new(
  api_key: ENV["SCRAPEGRAPHAI_API_KEY"], # This is the default and can be omitted
  environment: "environment_1" # defaults to "production"
)
completed_smartscraper = scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")
puts(completed_smartscraper.request_id)
```
### Handling errors
When the library is unable to connect to the API, or if the API returns a non-success status code (i.e., a 4xx or 5xx response), a subclass of `Scrapegraphai::Errors::APIError` is raised:
```ruby
begin
  smartscraper = scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")
rescue Scrapegraphai::Errors::APIConnectionError => e
  puts("The server could not be reached")
  puts(e.cause) # an underlying Exception, likely raised within `net/http`
rescue Scrapegraphai::Errors::RateLimitError => e
  puts("A 429 status code was received; we should back off a bit.")
rescue Scrapegraphai::Errors::APIStatusError => e
  puts("Another non-200-range status code was received")
  puts(e.status)
end
```
Error codes are as follows:
| Cause | Error Type |
| ---------------- | -------------------------- |
| HTTP 400 | `BadRequestError` |
| HTTP 401 | `AuthenticationError` |
| HTTP 403 | `PermissionDeniedError` |
| HTTP 404 | `NotFoundError` |
| HTTP 409 | `ConflictError` |
| HTTP 422 | `UnprocessableEntityError` |
| HTTP 429 | `RateLimitError` |
| HTTP >= 500 | `InternalServerError` |
| Other HTTP error | `APIStatusError` |
| Timeout | `APITimeoutError` |
| Network error | `APIConnectionError` |
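The HTTP rows of the table can be read as a lookup from status code to error class name. The sketch below only mirrors those rows in plain Ruby; `error_name_for` and `STATUS_ERRORS` are hypothetical names for illustration and are not part of the gem.

```ruby
# Hypothetical helper mirroring the table above; NOT part of the scrapegraphai gem.
STATUS_ERRORS = {
  400 => "BadRequestError",
  401 => "AuthenticationError",
  403 => "PermissionDeniedError",
  404 => "NotFoundError",
  409 => "ConflictError",
  422 => "UnprocessableEntityError",
  429 => "RateLimitError"
}.freeze

# Returns the error class name the table associates with an HTTP error status.
def error_name_for(status)
  return "InternalServerError" if status >= 500

  # Any other error status falls back to the generic APIStatusError row.
  STATUS_ERRORS.fetch(status, "APIStatusError")
end

puts(error_name_for(404))
puts(error_name_for(503))
puts(error_name_for(418))
```

Because the specific classes are described as subclasses of `APIError`, rescuing the more specific class first (as in the example above) keeps the generic `APIStatusError` branch as a catch-all.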
### Retries
Certain errors will be automatically retried 2 times by default, with a short exponential backoff.
Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, >=500 Internal errors, and timeouts will all be retried by default.
You can use the `max_retries` option to configure or disable this:
```ruby
# Configure the default for all requests:
scrapegraphai = Scrapegraphai::Client.new(
  max_retries: 0 # default is 2
)
# Or, configure per-request:
scrapegraphai.smartscraper.create(
  user_prompt: "Extract the product name, price, and description",
  request_options: {max_retries: 5}
)
```
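For intuition about what "retried 2 times with exponential backoff" means, here is a generic sketch in plain Ruby. It is not the gem's internal retry logic: `with_retries`, `TransientError`, the base delay, and the doubling factor are all assumptions chosen for illustration.

```ruby
# Generic retry-with-backoff sketch; NOT the gem's internal implementation.
# TransientError, with_retries, and the delay values are hypothetical.
class TransientError < StandardError; end

def with_retries(max_retries: 2, base_delay: 0.01)
  attempts = 0
  begin
    attempts += 1
    yield(attempts)
  rescue TransientError
    # Give up once the initial attempt plus max_retries retries are used up.
    raise if attempts > max_retries

    sleep(base_delay * (2**(attempts - 1))) # 1x, 2x, 4x, ... the base delay
    retry
  end
end

calls = 0
result = with_retries do |attempt|
  calls += 1
  raise TransientError, "simulated failure" if attempt < 3

  "succeeded on attempt #{attempt}"
end
puts(result)
```

With `max_retries: 0` the first failure propagates immediately, which matches the idea of disabling retries.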
### Timeouts
By default, requests will time out after 60 seconds. You can use the `timeout` option to configure or disable this:
```ruby
# Configure the default for all requests:
scrapegraphai = Scrapegraphai::Client.new(
  timeout: nil # default is 60
)
# Or, configure per-request:
scrapegraphai.smartscraper.create(
  user_prompt: "Extract the product name, price, and description",
  request_options: {timeout: 5}
)
```
On timeout, `Scrapegraphai::Errors::APITimeoutError` is raised.
Note that requests that time out are retried by default.
## Advanced concepts
### BaseModel
All parameter and response objects inherit from `Scrapegraphai::Internal::Type::BaseModel`, which provides several conveniences, including:
1. All fields, including unknown ones, are accessible with `obj[:prop]` syntax, and can be destructured with `obj => {prop: prop}` or pattern-matching syntax.
2. Structural equivalence for equality; if two API calls return the same values, comparing the responses with `==` will return true.
3. Both instances and the classes themselves can be pretty-printed.
4. Helpers such as `#to_h`, `#deep_to_h`, `#to_json`, and `#to_yaml`.
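
The conveniences listed above behave much like those of a plain Ruby `Struct`. The sketch below uses a stand-in `Struct` rather than the gem's `BaseModel`, so it runs without the gem; `FakeResponse` and its values are hypothetical, and `BaseModel` may differ in details (for example, in how unknown fields are handled).

```ruby
# Stand-in for a response object: a plain Struct, NOT Scrapegraphai's BaseModel.
# The field names echo the README's examples; the values are made up.
FakeResponse = Struct.new(:request_id, :status, keyword_init: true)

a = FakeResponse.new(request_id: "req_123", status: :queued)
b = FakeResponse.new(request_id: "req_123", status: :queued)

# 1. Bracket access and pattern-matching destructuring
puts(a[:request_id])
a => {request_id:, status:}
puts(status)

# 2. Structural equality: two objects with the same values compare equal
puts(a == b)

# 4. Conversion helper
puts(a.to_h)
```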
### Making custom or undocumented requests
#### Undocumented properties
Note: `extra_` parameters override documented parameters of the same name.
You can send undocumented parameters to any endpoint, and read undocumented response properties, like so:
```ruby
completed_smartscraper =
  scrapegraphai.smartscraper.create(
    user_prompt: "Extract the product name, price, and description",
    request_options: {
      extra_query: {my_query_parameter: value},
      extra_body: {my_body_parameter: value},
      extra_headers: {"my-header": value}
    }
  )
puts(completed_smartscraper[:my_undocumented_property])
```
#### Undocumented request params
If you want to explicitly send an extra param, you can do so with the `extra_query`, `extra_body`, and `extra_headers` options under the `request_options:` parameter when making a request, as seen in the example above.
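The precedence rule (an `extra_` value wins over a documented parameter of the same name) can be pictured with `Hash#merge`. This is only an illustration of the rule; how the gem merges parameters internally is an assumption here, and the overriding `user_prompt` value is a made-up example.

```ruby
# Illustration of "extra_ values win" using Hash#merge; the gem's actual
# merging code may differ. All values below are made up.
documented_body = {user_prompt: "Extract the product name"}
extra_body = {user_prompt: "Extract everything", my_body_parameter: 1}

# Hash#merge keeps the right-hand value when a key appears on both sides.
final_body = documented_body.merge(extra_body)
puts(final_body)
```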
#### Undocumented endpoints
To make requests to undocumented endpoints while retaining the benefit of auth, retries, and so on, you can make requests using `client.request`, like so:
```ruby
response = client.request(
  method: :post,
  path: '/undocumented/endpoint',
  query: {"dog": "woof"},
  headers: {"useful-header": "interesting-value"},
  body: {"hello": "world"}
)
```
### Concurrency & connection pooling
`Scrapegraphai::Client` instances are threadsafe, but are only fork-safe when there are no in-flight HTTP requests.
Each instance of `Scrapegraphai::Client` has its own HTTP connection pool with a default size of 99. As such, we recommend instantiating the client once per application in most settings.
When all available connections from the pool are checked out, requests wait for a new connection to become available, with queue time counting towards the request timeout.
Unless otherwise specified, other classes in the SDK do not have locks protecting their underlying data structure.
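To picture how requests wait when every pooled connection is checked out, here is a toy pool built on the standard library's `Queue`. It is not the gem's pool (which uses the `connection_pool` gem); `TinyPool` and its size of 2 are hypothetical, and the sketch does not model queue time counting towards a request timeout.

```ruby
# Toy connection pool built on Queue; NOT the gem's pool implementation.
# TinyPool and the pool size of 2 are hypothetical.
class TinyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Blocks until a connection is free, then hands it back afterwards.
  def with
    conn = @slots.pop
    yield(conn)
  ensure
    @slots << conn if conn
  end
end

pool = TinyPool.new(2) # one shared pool, as with one shared client
lock = Mutex.new
in_use = 0
max_in_use = 0

threads = 6.times.map do
  Thread.new do
    pool.with do |_conn|
      lock.synchronize do
        in_use += 1
        max_in_use = [max_in_use, in_use].max
      end
      sleep(0.01) # simulate a request holding the connection
      lock.synchronize { in_use -= 1 }
    end
  end
end
threads.each(&:join)
puts("peak connections in use: #{max_in_use}")
```

The peak never exceeds the pool size, which is why a single long-lived client shared across threads is usually preferable to creating a client (and a pool) per request.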
## Sorbet
This library provides comprehensive [RBI](https://sorbet.org/docs/rbi) definitions, and has no dependency on sorbet-runtime.
You can provide typesafe request parameters like so:
```ruby
scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")
```
Or, equivalently:
```ruby
# Hashes work, but are not typesafe:
scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")
# You can also splat a full Params class:
params = Scrapegraphai::SmartscraperCreateParams.new(
  user_prompt: "Extract the product name, price, and description"
)
scrapegraphai.smartscraper.create(**params)
```
### Enums
Since this library does not depend on `sorbet-runtime`, it cannot provide [`T::Enum`](https://sorbet.org/docs/tenum) instances. Instead, we provide "tagged symbols", which are always primitives at runtime:
```ruby
# :queued
puts(Scrapegraphai::CompletedSmartscraper::Status::QUEUED)
# Revealed type: `T.all(Scrapegraphai::CompletedSmartscraper::Status, Symbol)`
T.reveal_type(Scrapegraphai::CompletedSmartscraper::Status::QUEUED)
```
Enum parameters have a "relaxed" type, so you can either pass in enum constants or their literal value:
```ruby
Scrapegraphai::CompletedSmartscraper.new(
  status: Scrapegraphai::CompletedSmartscraper::Status::QUEUED,
  # …
)
Scrapegraphai::CompletedSmartscraper.new(
  status: :queued,
  # …
)
```
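The reason both forms are interchangeable is that, at runtime, an enum constant is just a `Symbol`. The plain-Ruby sketch below illustrates the idea without the gem; the `Status` module and `describe` helper are hypothetical stand-ins, not the gem's generated code.

```ruby
# Plain-Ruby illustration of "tagged symbols"; this Status module is a
# hypothetical stand-in, NOT the gem's generated enum.
module Status
  QUEUED = :queued
end

# Hypothetical helper with a "relaxed" parameter: the constant and the
# literal symbol are the same object, so either can be passed.
def describe(status)
  status == Status::QUEUED ? "waiting" : "other"
end

puts(Status::QUEUED.class)
puts(describe(Status::QUEUED))
puts(describe(:queued))
```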
## Versioning
This package follows [SemVer](https://semver.org/spec/v2.0.0.html) conventions. As the library is in initial development and has a major version of `0`, APIs may change at any time.
This package considers improvements to the (non-runtime) `*.rbi` and `*.rbs` type definitions to be non-breaking changes.
## Requirements
Ruby 3.2.0 or higher.
## Contributing
See [the contributing documentation](https://github.com/stainless-sdks/scrapegraphai-ruby/tree/main/CONTRIBUTING.md).