https://github.com/scrapegraphai/scrapegraphai-ruby

Last synced: 4 months ago
JSON representation

Host: GitHub
URL: https://github.com/scrapegraphai/scrapegraphai-ruby
Owner: ScrapeGraphAI
License: apache-2.0
Created: 2025-08-12T10:40:20.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-08-25T02:14:00.000Z (5 months ago)
Last Synced: 2025-09-06T15:01:46.204Z (5 months ago)
Language: Ruby
Size: 167 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

Awesome Lists containing this project

README

# Scrapegraphai Ruby API library

The Scrapegraphai Ruby library provides convenient access to the Scrapegraphai REST API from any Ruby 3.2.0+ application. It ships with comprehensive types & docstrings in Yard, RBS, and RBI – [see below](https://github.com/stainless-sdks/scrapegraphai-ruby#Sorbet) for usage with Sorbet. The standard library's `net/http` is used as the HTTP transport, with connection pooling via the `connection_pool` gem.

It is generated with [Stainless](https://www.stainless.com/).

## Documentation

Documentation for releases of this gem can be found [on RubyDoc](https://gemdocs.org/gems/scrapegraphai).

The REST API documentation can be found on [scrapegraphai.com](https://scrapegraphai.com).

## Installation

To use this gem, install via Bundler by adding the following to your application's `Gemfile`:

```ruby
gem "scrapegraphai", "~> 0.0.1"
```

## Usage

```ruby
require "bundler/setup"
require "scrapegraphai"

scrapegraphai = Scrapegraphai::Client.new(
api_key: ENV["SCRAPEGRAPHAI_API_KEY"], # This is the default and can be omitted
environment: "environment_1" # defaults to "production"
)

completed_smartscraper = scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")

puts(completed_smartscraper.request_id)
```

### Handling errors

When the library is unable to connect to the API, or if the API returns a non-success status code (i.e., 4xx or 5xx response), a subclass of `Scrapegraphai::Errors::APIError` will be thrown:

```ruby
begin
smartscraper = scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")
rescue Scrapegraphai::Errors::APIConnectionError => e
puts("The server could not be reached")
puts(e.cause) # an underlying Exception, likely raised within `net/http`
rescue Scrapegraphai::Errors::RateLimitError => e
puts("A 429 status code was received; we should back off a bit.")
rescue Scrapegraphai::Errors::APIStatusError => e
puts("Another non-200-range status code was received")
puts(e.status)
end
```

Error codes are as follows:

| Cause | Error Type |
| ---------------- | -------------------------- |
| HTTP 400 | `BadRequestError` |
| HTTP 401 | `AuthenticationError` |
| HTTP 403 | `PermissionDeniedError` |
| HTTP 404 | `NotFoundError` |
| HTTP 409 | `ConflictError` |
| HTTP 422 | `UnprocessableEntityError` |
| HTTP 429 | `RateLimitError` |
| HTTP >= 500 | `InternalServerError` |
| Other HTTP error | `APIStatusError` |
| Timeout | `APITimeoutError` |
| Network error | `APIConnectionError` |

### Retries

Certain errors will be automatically retried 2 times by default, with a short exponential backoff.

Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, >=500 Internal errors, and timeouts will all be retried by default.

You can use the `max_retries` option to configure or disable this:

```ruby
# Configure the default for all requests:
scrapegraphai = Scrapegraphai::Client.new(
max_retries: 0 # default is 2
)

# Or, configure per-request:
scrapegraphai.smartscraper.create(
user_prompt: "Extract the product name, price, and description",
request_options: {max_retries: 5}
)
```

### Timeouts

By default, requests will time out after 60 seconds. You can use the timeout option to configure or disable this:

```ruby
# Configure the default for all requests:
scrapegraphai = Scrapegraphai::Client.new(
timeout: nil # default is 60
)

# Or, configure per-request:
scrapegraphai.smartscraper.create(
user_prompt: "Extract the product name, price, and description",
request_options: {timeout: 5}
)
```

On timeout, `Scrapegraphai::Errors::APITimeoutError` is raised.

Note that requests that time out are retried by default.

## Advanced concepts

### BaseModel

All parameter and response objects inherit from `Scrapegraphai::Internal::Type::BaseModel`, which provides several conveniences, including:

1. All fields, including unknown ones, are accessible with `obj[:prop]` syntax, and can be destructured with `obj => {prop: prop}` or pattern-matching syntax.

2. Structural equivalence for equality; if two API calls return the same values, comparing the responses with == will return true.

3. Both instances and the classes themselves can be pretty-printed.

4. Helpers such as `#to_h`, `#deep_to_h`, `#to_json`, and `#to_yaml`.

### Making custom or undocumented requests

#### Undocumented properties

You can send undocumented parameters to any endpoint, and read undocumented response properties, like so:

Note: the `extra_` parameters of the same name overrides the documented parameters.

```ruby
completed_smartscraper =
scrapegraphai.smartscraper.create(
user_prompt: "Extract the product name, price, and description",
request_options: {
extra_query: {my_query_parameter: value},
extra_body: {my_body_parameter: value},
extra_headers: {"my-header": value}
}
)

puts(completed_smartscraper[:my_undocumented_property])
```

#### Undocumented request params

If you want to explicitly send an extra param, you can do so with the `extra_query`, `extra_body`, and `extra_headers` under the `request_options:` parameter when making a request, as seen in the examples above.

#### Undocumented endpoints

To make requests to undocumented endpoints while retaining the benefit of auth, retries, and so on, you can make requests using `client.request`, like so:

```ruby
response = client.request(
method: :post,
path: '/undocumented/endpoint',
query: {"dog": "woof"},
headers: {"useful-header": "interesting-value"},
body: {"hello": "world"}
)
```

### Concurrency & connection pooling

The `Scrapegraphai::Client` instances are threadsafe, but are only are fork-safe when there are no in-flight HTTP requests.

Each instance of `Scrapegraphai::Client` has its own HTTP connection pool with a default size of 99. As such, we recommend instantiating the client once per application in most settings.

When all available connections from the pool are checked out, requests wait for a new connection to become available, with queue time counting towards the request timeout.

Unless otherwise specified, other classes in the SDK do not have locks protecting their underlying data structure.

## Sorbet

This library provides comprehensive [RBI](https://sorbet.org/docs/rbi) definitions, and has no dependency on sorbet-runtime.

You can provide typesafe request parameters like so:

```ruby
scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")
```

Or, equivalently:

```ruby
# Hashes work, but are not typesafe:
scrapegraphai.smartscraper.create(user_prompt: "Extract the product name, price, and description")

# You can also splat a full Params class:
params = Scrapegraphai::SmartscraperCreateParams.new(
user_prompt: "Extract the product name, price, and description"
)
scrapegraphai.smartscraper.create(**params)
```

### Enums

Since this library does not depend on `sorbet-runtime`, it cannot provide [`T::Enum`](https://sorbet.org/docs/tenum) instances. Instead, we provide "tagged symbols" instead, which is always a primitive at runtime:

```ruby
# :queued
puts(Scrapegraphai::CompletedSmartscraper::Status::QUEUED)

# Revealed type: `T.all(Scrapegraphai::CompletedSmartscraper::Status, Symbol)`
T.reveal_type(Scrapegraphai::CompletedSmartscraper::Status::QUEUED)
```

Enum parameters have a "relaxed" type, so you can either pass in enum constants or their literal value:

```ruby
Scrapegraphai::CompletedSmartscraper.new(
status: Scrapegraphai::CompletedSmartscraper::Status::QUEUED,
# …
)

Scrapegraphai::CompletedSmartscraper.new(
status: :queued,
# …
)
```

## Versioning

This package follows [SemVer](https://semver.org/spec/v2.0.0.html) conventions. As the library is in initial development and has a major version of `0`, APIs may change at any time.

This package considers improvements to the (non-runtime) `*.rbi` and `*.rbs` type definitions to be non-breaking changes.

## Requirements

Ruby 3.2.0 or higher.

## Contributing

See [the contributing documentation](https://github.com/stainless-sdks/scrapegraphai-ruby/tree/main/CONTRIBUTING.md).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/scrapegraphai/scrapegraphai-ruby

Awesome Lists containing this project

README