{"id":31600242,"url":"https://github.com/xaquake/yandex-images-parser","last_synced_at":"2026-05-16T00:38:22.112Z","repository":{"id":316763502,"uuid":"1064752156","full_name":"xaquake/yandex-images-parser","owner":"xaquake","description":"library for parsing and downloading images from Yandex Images via geckodriver with C API","archived":false,"fork":false,"pushed_at":"2025-09-27T18:30:21.000Z","size":72,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-06T06:57:08.853Z","etag":null,"topics":["c","firefox","firefox-esr","gcc","gecko","geckodriver","libcurl","parser","parsing","parsing-library","web","webdriver","yandex","yandex-api","yandex-image-parser","yandex-images","yandex-images-crawler","yandex-images-parser"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xaquake.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-26T13:55:51.000Z","updated_at":"2025-09-27T18:40:08.000Z","dependencies_parsed_at":"2025-09-29T20:17:07.982Z","dependency_job_id":null,"html_url":"https://github.com/xaquake/yandex-images-parser","commit_stats":null,"previous_names":["xaquake/yandex_images_parser"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/xaquake/yandex-images-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xaquake%2Fyandex-images-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xaquake%2Fyandex-images-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xaquake%2Fyandex-images-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xaquake%2Fyandex-images-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xaquake","download_url":"https://codeload.github.com/xaquake/yandex-images-parser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xaquake%2Fyandex-images-parser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281397300,"owners_count":26493908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-28T02:00:06.022Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","firefox","firefox-esr","gcc","gecko","geckodriver","libcurl","parser","parsing","parsing-library","web","webdriver","yandex","yandex-api","yandex-image-parser","yandex-images","yandex-images-crawler","yandex-images-parser"],"created_at":"2025-10-06T06:54:29.941Z","updated_at":"2025-10-28T07:03:13.806Z","avatar_url":"https://github.com/xaquake.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Yandex Images Parser Library\n\nA high-performance C library for parsing and downloading images and videos from Yandex Images search engine using WebDriver automation.\n\n## Features\n\n- **Image \u0026 Video Search** - Extract media URLs from Yandex Images search results\n- **Batch Downloads** - Download multiple images and videos with optimized performance\n- **Multi-threaded Downloads** - Concurrent downloads with significant speed improvement\n- **Streaming Downloads** - Chunked I/O for faster video downloads\n- **WebDriver Integration** - Seamless GeckoDriver/Firefox automation\n- **Range Selection** - Download specific image and video ranges\n- **File Operations** - Save image and video URLs to files for later processing\n- **Optimized Performance** - Assembly-optimized with maximum compiler optimizations\n- **Error Handling** - Comprehensive error codes and status reporting\n\n## Prerequisites\n\n- **libcurl4-openssl-dev** - HTTP client library\n- **firefox-esr** - Firefox browser for headless operation  \n- **geckodriver** - WebDriver implementation for Firefox\n- **cmake** - Build system\n- **pthread** - Multi-threading support\n\n## Building\n\nThe library uses CMake with maximum optimization settings:\n\n```bash\n# Create build directory\nmkdir build\ncd build\n\n# Configure with CMake\ncmake ..\n\n# Build with optimizations\nmake\n\n# Install globally (requires sudo)\nsudo make install\n```\n\n## Performance\n\n### Memory Usage\n- **Resident Memory**: ~12MB (RSS)\n- **Virtual Memory**: ~93MB (VSZ)\n\n### Optimization Features\n- **Compiler Flags**: `-O3 -flto -march=native -mtune=native -funroll-loops -finline-functions -fomit-frame-pointer`\n- **Linker Flags**: `-Wl,--gc-sections -Wl,--strip-all`\n- **Assembly Optimization**: Native CPU optimizations applied\n- **Size Optimization**: Function and data sections optimization\n\n## API Reference\n\n### Session Management\n```c\n// Create a new search session\nyandex_session_t* yandex_create_session(const char* query, int scroll_count);\n\n// Free session memory\nvoid yandex_free_session(yandex_session_t* session);\n\n// Search for images\nint yandex_search_images(yandex_session_t* session);\n\n// Search for videos\nint yandex_search_videos(yandex_session_t* session);\n\n// Get image range\nint yandex_get_image_range(yandex_session_t* session, size_t start, size_t end, yandex_image_t** result, size_t* count);\n\n// Get video range\nint yandex_get_video_range(yandex_session_t* session, size_t start, size_t end, yandex_video_t** result, size_t* count);\n\n// Free images\nvoid yandex_free_images(yandex_image_t* images, size_t count);\n\n// Free videos\nvoid yandex_free_videos(yandex_video_t* videos, size_t count);\n```\n\n### Download Functions\n```c\n// Download single image\nint yandex_download_single_image(const char* url, const char* filename);\n\n// Download single video\nint yandex_download_single_video(const char* url, const char* filename);\n\n// Download single video with streaming\nint yandex_download_single_video_stream(const char* url, const char* filename, size_t chunk_size);\n\n// Download all images\nint yandex_download_images(yandex_session_t* session, const char* download_dir);\n\n// Download all videos\nint yandex_download_videos(yandex_session_t* session, const char* download_dir);\n\n// Download image range\nint yandex_download_image_range(yandex_session_t* session, size_t start, size_t end, const char* download_dir);\n\n// Download video range\nint yandex_download_video_range(yandex_session_t* session, size_t start, size_t end, const char* download_dir);\n```\n\n### Multi-threaded Downloads\n```c\n// Download images with multiple threads\nint yandex_download_images_multithreaded(yandex_session_t *session, const char *download_dir, int max_threads);\n\n// Download videos with multiple threads  \nint yandex_download_videos_multithreaded(yandex_session_t *session, const char *download_dir, int max_threads);\n\n// Download with custom thread pool\nint yandex_download_multithreaded_pool(yandex_download_pool_t *pool);\n\n// Free download pool\nvoid yandex_free_download_pool(yandex_download_pool_t *pool);\n```\n\n### File Operations\n```c\n// Save image URLs to file\nint yandex_save_images(yandex_session_t* session, const char* filename);\n\n// Save video URLs to file\nint yandex_save_videos(yandex_session_t* session, const char* filename);\n\n// Save image range to file\nint yandex_save_image_range(yandex_session_t* session, size_t start, size_t end, const char* filename);\n\n// Save video range to file\nint yandex_save_video_range(yandex_session_t* session, size_t start, size_t end, const char* filename);\n```\n\n### Utility Functions\n```c\n// Get error string\nconst char* yandex_get_error_string(int error_code);\n\n// Get image count\nint yandex_get_image_count(yandex_session_t* session);\n\n// Get video count\nint yandex_get_video_count(yandex_session_t* session);\n\n// Get download status\nint yandex_get_download_status(yandex_image_t* image);\nint yandex_get_video_download_status(yandex_video_t* video);\n```\n\n### GeckoDriver Management\n```c\n// Start GeckoDriver\nint yandex_start_geckodriver(void);\n\n// Stop GeckoDriver\nvoid yandex_stop_geckodriver(void);\n\n// Check if GeckoDriver is running\nint yandex_is_geckodriver_running(void);\n\n// Set GeckoDriver port\nint yandex_set_geckodriver_port(int port);\n\n// Get GeckoDriver port\nint yandex_get_geckodriver_port(void);\n```\n\n## Data Structures\n\n### Session Structure\n```c\ntypedef struct {\n    char *session_id;\n    char *query;\n    yandex_image_t *images;\n    size_t image_count;\n    yandex_video_t *videos;\n    size_t video_count;\n    int scroll_count;\n} yandex_session_t;\n```\n\n### Image Structure\n```c\ntypedef struct {\n    char *url;\n    char *title;\n    char *source;\n    size_t index;\n    char *local_path;\n    int download_status;\n} yandex_image_t;\n```\n\n### Video Structure\n```c\ntypedef struct {\n    char *url;\n    char *title;\n    char *source;\n    char *thumbnail_url;\n    size_t index;\n    char *local_path;\n    int download_status;\n    int duration;\n    char *format;\n} yandex_video_t;\n```\n\n## Error Codes\n\n```c\n#define YANDEX_SUCCESS                   0\n#define YANDEX_ERROR_INVALID_PARAM      -1\n#define YANDEX_ERROR_MEMORY             -2\n#define YANDEX_ERROR_NETWORK            -3\n#define YANDEX_ERROR_DRIVER             -4\n#define YANDEX_ERROR_NO_IMAGES          -5\n#define YANDEX_ERROR_DOWNLOAD           -6\n#define YANDEX_ERROR_NO_VIDEOS          -7\n#define YANDEX_ERROR_DRIVER_NOT_FOUND   -8\n#define YANDEX_ERROR_DRIVER_START_FAILED -9\n#define YANDEX_ERROR_DRIVER_CONNECTION -10\n#define YANDEX_ERROR_FIREFOX_NOT_FOUND  -11\n#define YANDEX_ERROR_SESSION_TIMEOUT   -12\n#define YANDEX_ERROR_HTTP_ERROR         -13\n#define YANDEX_ERROR_FILE_WRITE         -14\n#define YANDEX_ERROR_STREAM_INIT        -15\n#define YANDEX_ERROR_THREAD_CREATE      -16\n#define YANDEX_ERROR_THREAD_JOIN        -17\n```\n\n## Usage Examples\n\n### Basic Image Download\n```c\n#include \u003cyandex_parser.h\u003e\n#include \u003cstdio.h\u003e\n\nint main() {\n    printf(\"Starting GeckoDriver...\\n\");\n    if (yandex_start_geckodriver() != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Error: Failed to start GeckoDriver\\n\");\n        return 1;\n    }\n    \n    yandex_session_t* session = yandex_create_session(\"nature\", 3);\n    if (!session) {\n        fprintf(stderr, \"Error: Failed to create session\\n\");\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    int result = yandex_search_images(session);\n    if (result != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Error: %s\\n\", yandex_get_error_string(result));\n        yandex_free_session(session);\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    int image_count = yandex_get_image_count(session);\n    if (image_count \u003e 0) {\n        printf(\"Found %d images\\n\", image_count);\n        printf(\"Downloading first 5 images...\\n\");\n        \n        result = yandex_download_image_range(session, 0, 5, \"images\");\n        if (result == YANDEX_SUCCESS) {\n            printf(\"Images downloaded successfully\\n\");\n        } else {\n            fprintf(stderr, \"Download error: %s\\n\", yandex_get_error_string(result));\n        }\n    } else {\n        printf(\"No images found\\n\");\n    }\n    \n    yandex_free_session(session);\n    yandex_stop_geckodriver();\n    return 0;\n}\n```\n\n### Video Download\n```c\n#include \u003cyandex_parser.h\u003e\n#include \u003cstdio.h\u003e\n\nint main() {\n    printf(\"Starting GeckoDriver...\\n\");\n    if (yandex_start_geckodriver() != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Error: Failed to start GeckoDriver\\n\");\n        return 1;\n    }\n    \n    yandex_session_t* session = yandex_create_session(\"animals\", 2);\n    if (!session) {\n        fprintf(stderr, \"Error: Failed to create session\\n\");\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    int result = yandex_search_videos(session);\n    if (result != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Error: %s\\n\", yandex_get_error_string(result));\n        yandex_free_session(session);\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    int video_count = yandex_get_video_count(session);\n    if (video_count \u003e 0) {\n        printf(\"Downloading %d videos...\\n\", video_count);\n        \n        result = yandex_download_videos(session, \"videos\");\n        if (result == YANDEX_SUCCESS) {\n            printf(\"Videos downloaded successfully\\n\");\n        } else {\n            fprintf(stderr, \"Download error: %s\\n\", yandex_get_error_string(result));\n        }\n    } else {\n        printf(\"No videos found\\n\");\n    }\n    \n    yandex_free_session(session);\n    yandex_stop_geckodriver();\n    return 0;\n}\n```\n\n### Multi-threaded Download Example\n```c\n#include \u003cyandex_parser.h\u003e\n#include \u003cstdio.h\u003e\n#include \u003cstdlib.h\u003e\n\nint main() {\n    printf(\"Starting GeckoDriver...\\n\");\n    if (yandex_start_geckodriver() != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Error: Failed to start GeckoDriver\\n\");\n        return 1;\n    }\n    \n    // Create session for images\n    yandex_session_t* image_session = yandex_create_session(\"landscape\", 5);\n    if (!image_session) {\n        fprintf(stderr, \"Error: Failed to create image session\\n\");\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    // Create session for videos\n    yandex_session_t* video_session = yandex_create_session(\"wildlife\", 3);\n    if (!video_session) {\n        fprintf(stderr, \"Error: Failed to create video session\\n\");\n        yandex_free_session(image_session);\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    printf(\"Searching for images...\\n\");\n    int result = yandex_search_images(image_session);\n    if (result != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Image search error: %s\\n\", yandex_get_error_string(result));\n        yandex_free_session(image_session);\n        yandex_free_session(video_session);\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    printf(\"Searching for videos...\\n\");\n    result = yandex_search_videos(video_session);\n    if (result != YANDEX_SUCCESS) {\n        fprintf(stderr, \"Video search error: %s\\n\", yandex_get_error_string(result));\n        yandex_free_session(image_session);\n        yandex_free_session(video_session);\n        yandex_stop_geckodriver();\n        return 1;\n    }\n    \n    int image_count = yandex_get_image_count(image_session);\n    int video_count = yandex_get_video_count(video_session);\n    \n    printf(\"Found %d images, %d videos\\n\", image_count, video_count);\n    \n    if (image_count \u003e= 30 \u0026\u0026 video_count \u003e= 10) {\n        printf(\"Downloading 30 images with multi-threading...\\n\");\n        result = yandex_download_images_multithreaded(image_session, \"images\", 4);\n        if (result != YANDEX_SUCCESS) {\n            fprintf(stderr, \"Image download error: %s\\n\", yandex_get_error_string(result));\n        } else {\n            printf(\"Images downloaded successfully\\n\");\n        }\n        \n        printf(\"Downloading 10 videos with multi-threading...\\n\");\n        result = yandex_download_videos_multithreaded(video_session, \"videos\", 4);\n        if (result != YANDEX_SUCCESS) {\n            fprintf(stderr, \"Video download error: %s\\n\", yandex_get_error_string(result));\n        } else {\n            printf(\"Videos downloaded successfully\\n\");\n        }\n    } else {\n        printf(\"Not enough media found (need 30+ images, 10+ videos)\\n\");\n    }\n    \n    yandex_free_session(image_session);\n    yandex_free_session(video_session);\n    yandex_stop_geckodriver();\n    return 0;\n}\n```\n\n## Compilation\n\n### CMake Build System (Recommended)\n```bash\n# Create build directory\nmkdir build\ncd build\n\n# Configure and build\ncmake ..\nmake\nsudo make install\n```\n\n### Manual Compilation\n```bash\n# Basic compilation\ngcc -o my_program my_program.c -lyandex_parser -lcurl -lpthread\n\n# With optimization\ngcc -O3 -flto -march=native -o my_program my_program.c -lyandex_parser -lcurl -lpthread\n```\n\n## Troubleshooting\n\n- **GeckoDriver not found**: Install geckodriver and ensure it's in PATH\n- **Firefox not found**: `sudo apt install firefox-esr`\n- **Compilation errors**: Check that all dependencies are installed\n- **WebDriver errors**: Ensure GeckoDriver is running on the correct port\n- **Memory errors**: Always call `yandex_free_session()` to prevent memory leaks\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Acknowledgments\n\n- libcurl for HTTP operations\n- Firefox ESR for WebDriver automation\n- GeckoDriver for WebDriver implementation\n- pthread for multi-threading support\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxaquake%2Fyandex-images-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxaquake%2Fyandex-images-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxaquake%2Fyandex-images-parser/lists"}