{"id":34636776,"url":"https://github.com/0xarchit/duckduckgo-webscraper","last_synced_at":"2025-12-24T17:04:10.797Z","repository":{"id":303652436,"uuid":"1016209927","full_name":"0xarchit/duckduckgo-webscraper","owner":"0xarchit","description":"Python based basic webscraper that uses rotating proxies from free working only proxylists (updates every 60minutes): https://webscrape.0xcloud.workers.dev/?key=test\u0026query=","archived":false,"fork":false,"pushed_at":"2025-07-15T14:05:03.000Z","size":131,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-16T01:37:32.435Z","etag":null,"topics":["proxy","proxy-list","proxylist","webscraper","webscraping"],"latest_commit_sha":null,"homepage":"https://duckduckgo-webscraper.onrender.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0xarchit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-08T16:48:14.000Z","updated_at":"2025-07-15T17:19:35.000Z","dependencies_parsed_at":"2025-07-08T19:05:57.687Z","dependency_job_id":null,"html_url":"https://github.com/0xarchit/duckduckgo-webscraper","commit_stats":null,"previous_names":["0xarchit/duckduckgo-webscraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/0xarchit/duckduckgo-webscraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2Fduckduckgo-webscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2Fduckduckgo-webscraper/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2Fduckduckgo-webscraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2Fduckduckgo-webscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0xarchit","download_url":"https://codeload.github.com/0xarchit/duckduckgo-webscraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2Fduckduckgo-webscraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28005408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-24T02:00:07.193Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["proxy","proxy-list","proxylist","webscraper","webscraping"],"created_at":"2025-12-24T17:02:36.980Z","updated_at":"2025-12-24T17:04:10.790Z","avatar_url":"https://github.com/0xarchit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DuckDuckGo Query Scraper\n\nA simple Python-based web scraper that uses DuckDuckGo Lite search to fetch top result pages, then extracts structured content (titles, descriptions, headings, summaries, links, and more) from each page.\n\n🌟 Give this repo a star to show support 🌟\n\n## Features\n\n- Uses DuckDuckGo Lite search for query results\n- 
Automatically rotates through a free proxy list sourced from GitHub\n- Parses and extracts:\n  - Page title, meta description, and main heading\n  - First few paragraphs as a summary\n  - External, internal, and social links\n  - Email/contact info and keywords\n  - Author and publication date (if available)\n  - JSON-LD structured data when present\n- Handles DuckDuckGo redirect links and skips ad redirects\n- Gracefully falls back when pages fail to load via proxy\n\n## Web Interface (FastAPI + Tailwind CSS)\n\nA modern web UI to demo and visualize scraping results in your browser.\n\n- Install additional dependencies:\n  ```powershell\n  pip install fastapi uvicorn[standard] jinja2\n  ```\n- Run the FastAPI server:\n  ```powershell\n  uvicorn app:app --reload --host 0.0.0.0 --port 8000\n  ```\n- Open your browser at http://localhost:8000\n- Enter a search query, select or provide a proxy list, and watch the live loading animation while results are fetched and displayed.\n\nThe web interface features:\n- Dropdown menu to choose from preset proxy lists or enter a custom URL\n- Loading overlay with spinner and warning message\n- Styled result cards showing metadata and content excerpts\n\nFor a minimal, standalone scraper script without FastAPI, see `basescript/scraper_base.py` in the `basescript/` folder.\n\n## Installation of Script\n\n1. Clone this repository:\n   ```powershell\n   git clone https://github.com/0xarchit/duckduckgo-webscraper.git\n   cd duckduckgo-webscraper\n   ```\n\n2. Install dependencies:\n   ```powershell\n   pip install -r requirements.txt\n   ```\n\n## Proxy List\n\nThis project maintains a free, public proxy list for scraping, automatically updated every 20 minutes via GitHub Actions. 
The list is sourced from multiple providers and tested for working status before publishing.\n\n- **Latest proxy list (auto-updated every 20 minutes):**  \n  https://raw.githubusercontent.com/0xarchit/duckduckgo-webscraper/refs/heads/main/proxies.txt\n\nThe update workflow fetches fresh proxies, tests them, and commits the results to the repository every 20 minutes.  \nYou can use this list directly in your own scraping projects or scripts.\n\n\u003e **Note:** Free proxies may be slow or unreliable. For best results, consider using a\n\n## Usage\n\nRun the scraper and enter your search query (single or multi-word):\n```powershell\npython basescript\\scraper_base.py\n```\n\nThe tool uses free proxy lists fetched from GitHub, so it can be relatively slow. If you need faster and more reliable scraping, consider using a paid proxy list and updating the `fetch_proxies()` URL accordingly.\n\n## Configuration\n\n- Proxy list URL is defined in `fetch_proxies()` within `webscrapper.py`.\n- Adjust `max_results` to change the number of pages scraped per query.\n- Modify delays (`time.sleep()`) to tune rate limits.\n\n## Sample Output\n\n\u003cdetails\u003e\n  \u003csummary\u003eClick to expand output content\u003c/summary\u003e\n\n```text\n\n======================================================================\n                     🌐 DuckDuckGo Query Scraper 🌐\n                            By 0xArchit\n======================================================================\n\n🔎 Enter your search query: Gen AI\n🚀 Fetching fresh proxy list...\n✅ Loaded 2393 proxies.\n\n🔍 Initiating DuckDuckGo Lite search for: 'Gen AI'\n\n🔌 Testing proxy: 51.81.245.3:17981\n✅ Proxy working: 51.81.245.3:17981\n🌐 Attempt 1/5 Scraping: https://lite.duckduckgo.com/lite/?q=Gen AI with proxy: http://51.81.245.3:17981\n  ✅ Successfully scraped https://lite.duckduckgo.com/lite/?q=Gen AI\n⏳ Waiting 5 seconds after DuckDuckGo search to avoid rate limits...\n\n➡️ Found result: 'What is generative AI? 
- IBM'\n   🔗 Link: https://www.ibm.com/think/topics/generative-ai\n⏳ Waiting 5 seconds before scraping this result page...\n🌐 Attempt 1/5 Scraping: https://www.ibm.com/think/topics/generative-ai with proxy: http://51.81.245.3:17981\n  ✅ Successfully scraped https://www.ibm.com/think/topics/generative-ai\n\n➡️ Found result: 'Generative artificial intelligence - Wikipedia'\n   🔗 Link: https://en.wikipedia.org/wiki/Generative_artificial_intelligence\n⏳ Waiting 5 seconds before scraping this result page...\n🌐 Attempt 1/5 Scraping: https://en.wikipedia.org/wiki/Generative_artificial_intelligence with proxy: http://51.81.245.3:17981\n  ✅ Successfully scraped https://en.wikipedia.org/wiki/Generative_artificial_intelligence\n\n➡️ Found result: 'What is Generative AI? - GeeksforGeeks'\n   🔗 Link: https://www.geeksforgeeks.org/artificial-intelligence/what-is-generative-ai/\n⏳ Waiting 5 seconds before scraping this result page...\n🌐 Attempt 1/5 Scraping: https://www.geeksforgeeks.org/artificial-intelligence/what-is-generative-ai/ with proxy: http://51.81.245.3:17981\n  ✅ Successfully scraped https://www.geeksforgeeks.org/artificial-intelligence/what-is-generative-ai/\n\n==================== UNIVERSAL SCRAPE REPORT ====================\n\n\n---------- ✨ RESULT #1 ✨ ----------------------------------------\n\n  📌 Title: What is generative AI? - IBM\n  🌐 URL  : https://www.ibm.com/think/topics/generative-ai\n\n  📊 Detailed Analysis (General Web Page):\n    - Url: https://www.ibm.com/think/topics/generative-ai\n    - Title: What is Generative AI?  
| IBM\n    - Meta Description: Generative AI is artificial intelligence (AI) that can create original content in response to a user’s prompt or request.\n    - Main Heading: What is generative AI?\n    - Summary Text: Editorial Lead, AI Models  Editor, Topics \u0026 Insights for IBM Think  Generative AI, sometimes calledgen AI,isartificial intelligence(AI) that can create original content such as text, images, video, audio or software code in response to a user’s prompt or request.\n    - Links: (Complex Data - See raw content)\n    - Email: xxx@ccc.com\n    - Keywords: Generative AI\n    - Author: Cole Stryker\n    - Published Date: Not specified\n    - Structured Data: (Complex Data - See raw content)\n\n  📄 Raw Content Excerpt:\n    What is Generative AI? | IBM What is generative AI? 22 March 2024\n    Link copied Authors Cole Stryker Editorial Lead, AI Models Mark\n    Scapicchio Editor, Topics \u0026 Insights for IBM Think What is\n    generative AI? Generative AI, sometimes called gen AI, is\n    artificial intelligence (AI) that can create original content such\n    as text, images, video, audio or software code in response to a\n    user’s prompt or request. Generative AI relies on sophisticated\n    machine learning models called deep learning models algorithms\n    that simulate the learning and decision-making processes of the\n    human brain. These models work by identifying and encoding the\n    patterns and relationships in huge amounts of data, and then using\n    that information to understand users' natural language requests or\n    questions and respond with relevant new content. AI has been a hot\n    technology topic for the past decade, but generative AI, and\n    specifically the arrival of ChatGPT in 2022, has thrust AI into\n    worldwide headlines and launched an unprecedented surge of AI\n    innovation and adoption. 
Generative AI offers enormous\n    productivity benefits for individuals and organizations, and while\n    it also presents very real challenges and risks, businesses are\n    forging ahead, exploring how the technology can improve their\n    internal workflows and enrich their products and services.\n    According to research by the management consulting firm McKinsey,\n    one third of organizations are already using generative AI\n    regularly in at least one business function.¹ Industry analyst\n    Gartner projects more than 80% of organizations will have deployed\n    generative AI applications or used generative AI application\n    programming interfaces (APIs) by 2026. 2 How generative AI works\n    For the most part, generative AI operates in three phases:\n    Training , to create a foundation model that can serve as the\n    basis of multiple gen AI applications. Tuning , to tailor the\n    foundation model to a specific gen AI application. Generation ,\n    evaluation and retuning , to assess the gen AI application's\n    output and continually improve its quality and accuracy. Training\n    Generative AI begins with a foundation model, a deep learning\n    model that serves as the basis for multiple different types of\n    generative AI applications. The most common foundation models\n    today are large language models (LLMs) , created for text\n    generation applications, but there are also foundation models for\n    image generation, video generation, and sound and music generation\n    as well as multimodal foundation models that can support several\n    kinds content generation. To create a foundation model,\n    practitioners train a deep learning algorithm on huge volumes of\n    raw, unstructured, unlabeled data e.g., terabytes of data culled\n    from the internet or some other huge data source. 
During training,\n    the algorithm performs and evaluates millions of ‘fill in the\n    blank’ exercises, trying to predict the next element in a sequence\n    e.g., the next word in a sentence, the next element in an image,\n    the next command in a line of code and continually adjusting\n    itself to minimize the difference between its predictions and the\n    actual data (or ‘correct’ result). The result of this training is\n    a neural network of parameters, encoded representations of the\n    entities, patterns and relationships in the data, that can\n    generate content autonomously in response to inputs, or prompts.\n    This training process is compute-intensive, time-consuming and\n    expensive: it requires thousands of clustered graphics processing\n    units (GPUs) and weeks of processing, all of which costs millions\n    of dollars. Open-source foundation model projects, such as Meta's\n    Llama-2, enable gen AI developers to avoid this step and its\n    costs. Tuning Metaphorically speaking, a foundation model is a\n    generalist: It knows a lot about a lot of types of content, but\n    often can’t generate specific types of output with desired\n    accuracy or fidelity. For that, the model must be tuned to a\n    specific content generation task. This can be done in a variety of\n    ways. Fine tuning Fine tuning involves feeding the model labeled\n    data specific to the content generation application questions or\n    prompts the application is likely to receive, and corresponding\n    correct answers in the desired format. For example, if a\n    development team is trying to create a customer service chatbot,\n    it would create hundreds or thousands of documents containing\n    labeled customers service questions and correct answers, and then\n    feed those documents to the model. Fine-tuning is labor-intensive.\n    Developers often outsource the task to companies with large data-\n    labeling workforces. 
Reinforcement learning with human feedback\n    (RLHF) In RLHF , human users respond to generated content with\n    evaluations the model can use to update the model for greater\n    accuracy or relevance. Often, RLHF involves people ‘scoring’\n    different outputs in response to the same prompt. But it can be as\n    simple as having people type or talk back to a chatbot or virtual\n    assistant, correcting its output. Generation, evaluation, more\n    tuning Developers and users continually assess the outputs of\n    their generative AI apps, and further tune the model even as often\n    as once a week for greater accuracy or relevance. (In contrast,\n    the foundation model itself is updated much less frequently,\n    perhaps every year or 18 months.) Another option for improving a\n    gen AI app's performance is retrieval augmented generation (RAG).\n    RAG is a framework for extending the foundation model to use\n    relevant sources outside of the training data, to supplement and\n    refine the parameters or representations in the original model.\n    RAG can ensure that a generative AI app always has access to the\n    most current information. As a bonus, the additional sources\n    accessed via RAG are transparent to users in a way that the\n    knowledge in the original foundation model is not. Generative AI\n    model architectures and how they have evolved Truly generative AI\n    models deep learning models that can autonomously create content\n    on demand have evolved over the last dozen years or so. The\n    milestone model architectures during that period include\n    Variational autoencoders (VAEs) , which drove breakthroughs in\n    image recognition, natural language processing and anomaly\n    detection. Generative adversarial networks (GANs) and diffusion\n    models , which improved the accuracy of previous applications and\n    enabled some of the first AI solutions for photo-realistic image\n    generation. 
Transformers , the deep learning model architecture\n    behind the foremost foundation models and generative AI solutions\n    today. Variational autoencoders (VAEs) An autoencoder is a deep\n    learning model comprising two connected neural networks: One that\n    encodes (or compresses) a huge amount of unstructured, unlabeled\n    training data into parameters, and another that decodes those\n    parameters to reconstruct the content. Technically, autoencoders\n    can generate new content, but they’re more useful for compressing\n    data for storage or transfer, and decompressing it for use, than\n    they are for high-quality content generation. Introduced in 2013,\n    variational autoencoders (VAEs) can encode data like an\n    autoencoder, but decode multiple new variations of the content .\n    By training a VAE to generate variations toward a particular goal,\n    it can ‘zero in’ on more accurate, higher-fidelity content over\n    time. Early VAE applications included anomaly detection (e.g.,\n    medical image analysis) and natural language generation.\n    Generative adversarial networks (GANs) GANs, introduced in 2014,\n    also comprise two neural networks: A generator, which generates\n    new content, and a discriminator, which evaluates the accuracy and\n    quality the generated data. These adversarial algorithms\n    encourages the model to generate increasingly high-quality\n    outpits. GANs are commonly used for image and video generation,\n    but can generate high-quality, realistic content across various\n    domains. 
They've proven particularly successful at tasks as style\n    transfer (altering the style of an image from, say, a photo to a\n    pencil sketch) and data augmentation (creating new, synthetic data\n    to increase the size and diversity of a training data set).\n    Diffusion models Also introduced in 2014, diffusion models work by\n    first adding noise to the training data until it’s random and\n    unrecognizable, and then training the algorithm to iteratively\n    diffuse the noise to reveal a desired output. Diffusion models\n    take more time to train than VAEs or GANs, but ultimately offer\n    finer-grained control over output, particularly for high-quality\n    image generation tool. DALL-E, Open AI’s image-generation tool, is\n    driven by a diffusion model. Transformers First documented in a\n    2017 paper published by Ashish Vaswani and others, transformers\n    evolve the encoder-decoder paradigm to enable a big step forward\n    in the way foundation models are trained, and in the quality and\n    range of content they can produce. These models are at the core of\n    most of today’s headline-making generative AI tools, including\n    ChatGPT and GPT-4, Copilot, BERT, Bard, and Midjourney to name a\n    few. Transformers use a concept called attention, determining and\n    focusing on what’s most important about data within a sequence to;\n    process entire sequences of data e.g., sentences instead of\n    individual words simultaneously; capture the context of the data\n    within the sequence; encode the training data into embeddings\n    (also called hyperparameters ) that represent the data and its\n    context. 
In addition to enabling faster training, transformers\n    excel at natural language processing (NLP) and natural language\n    understanding (NLU), and can generate longer sequences of data\n    e.g., not just answers to questions, but poems, articles or papers\n    with greater accuracy and higher quality than other deep\n    generative AI models. Transformer models can also be trained or\n    tuned to use tools e.g., a spreadsheet application, HTML, a\n    drawing program to output content in a particular format. What\n    generative AI can create Generative AI can create many types of\n    content across many different domains. Text Generative models.\n    especially those based on transformers, can generate coherent,\n    contextually relevant text, everything from instructions and\n    documentation to brochures, emails, web site copy, blogs,\n    articles, reports, papers, and even creative writing. They can\n    also perform repetitive or tedious writing tasks (e.g., such as\n    drafting summaries of documents or meta descriptions of web\n    pages), freeing writers’ time for more creative, higher-value\n    work. Images and video Image generation such as DALL-E, Midjourney\n    and Stable Diffusion can create realistic images or original art,\n    and can perform style transfer, image-to-image translation and\n    other image editing or image enhancement tasks. Emerging gen AI\n    video tools can create animations from text prompts, and can apply\n    special effects to existing video more quickly and cost-\n    effectively than other methods. Sound, speech and music Generative\n    models can synthesize natural-sounding speech and audio content\n    for voice-enabled AI chatbots and digital assistants, audiobook\n    narration and other applications. The same technology can generate\n    original music that mimics the structure and sound of professional\n    compositions. 
Software code Gen AI can generate original code,\n    autocomplete code snippets, translate between programming\n    languages and summarize code functionality. It enables developers\n    to quickly prototype, refactor, and debug applications while\n    offering a natural language interface for coding tasks. Design and\n    art Generative AI models can generate unique works of art and\n    design, or assist in graphic design. Applications include dynamic\n    generation of environments, characters or avatars, and special\n    effects for virtual simulations and video games. Simulations and\n    synthetic data Generative AI models can be trained to generate\n    synthetic data , or synthetic structures based on real or\n    synthetic data. For example, generative AI is applied in drug\n    discovery to generate molecular structures with desired\n    properties, aiding in the design of new pharmaceutical compounds.\n    Industry newsletter The latest AI trends, brought to you by\n    experts Get curated insights on the most important—and\n    intriguing—AI news. Subscribe to our weekly Think newsletter. See\n    the IBM Privacy Statement . Thank you! You are subscribed. Your\n    subscription will be delivered in English. You will find an\n    unsubscribe link in every newsletter. You can manage your\n    subscriptions or unsubscribe here . 
Refer to our IBM Privacy\n    Statement for more informa\n\n------------------------------------------------------------\n\n\n---------- ✨ RESULT #2 ✨ ----------------------------------------\n\n  📌 Title: Generative artificial intelligence - Wikipedia\n  🌐 URL  : https://en.wikipedia.org/wiki/Generative_artificial_intelligence\n\n  📊 Detailed Analysis (General Web Page):\n    - Url: https://en.wikipedia.org/wiki/Generative_artificial_intelligence\n    - Title: Generative artificial intelligence - Wikipedia\n    - Meta Description: No Meta Description\n    - Main Heading: Generative artificial intelligence\n    - Summary Text:  Generative artificial intelligence(Generative AI,GenAI,[1]orGAI) is a subfield ofartificial intelligencethat usesgenerative modelsto produce text, images, videos, or other forms of data.[2][3][4]These modelslearnthe underlying patterns and structures of theirtraining dataand use them to produce new data[5][6]based on the input, which often comes in the form of natural languageprompts.[7][8] Generative AI tools have become more common since theAI boomin the 2020s. This boom was made possible by improvements intransformer-baseddeepneural networks, particularlylarge language models(LLMs). 
Major tools includechatbotssuch asChatGPT,Copilot,Gemini,Claude,Grok, andDeepSeek;text-to-imagemodels such asStable Diffusion,Midjourney, andDALL-E; andtext-to-videomodels such asVeoandSora.[9][10][11][12]Technology companies developing generative AI includeOpenAI,Anthropic,Meta AI,Microsoft,Google,DeepSeek, andBaidu.[7][13][14] Generative AI has raised many ethical questions as it can be used forcyb...\n    - Links: (Complex Data - See raw content)\n    - Keywords: description, artificial, wikipedia, meta, generative, intelligence\n    - Author: Contributors to Wikimedia projects\n    - Published Date: Not specified\n    - Structured Data: (Complex Data - See raw content)\n\n  📄 Raw Content Excerpt:\n    Generative artificial intelligence - Wikipedia Jump to content\n    From Wikipedia, the free encyclopedia Subset of AI using\n    generative models Not to be confused with Artificial general\n    intelligence . Théâtre D'opéra Spatial (2022), an image made using\n    generative AI Part of a series on Artificial intelligence (AI)\n    Major goals Artificial general intelligence Intelligent agent\n    Recursive self-improvement Planning Computer vision General game\n    playing Knowledge representation Natural language processing\n    Robotics AI safety Approaches Machine learning Symbolic Deep\n    learning Bayesian networks Evolutionary algorithms Hybrid\n    intelligent systems Systems integration Applications\n    Bioinformatics Deepfake Earth sciences Finance Generative AI Art\n    Audio Music Government Healthcare Mental health Industry Software\n    development Translation Military Physics Projects Philosophy\n    Artificial consciousness Chinese room Friendly AI Control problem\n    / Takeover Ethics Existential risk Turing test Uncanny valley\n    History Timeline Progress AI winter AI boom Glossary Glossary v t\n    e Generative artificial intelligence ( Generative AI , GenAI , [ 1\n    ] or GAI ) is a subfield of artificial intelligence that uses\n    
generative models to produce text, images, videos, or other forms\n    of data. [ 2 ] [ 3 ] [ 4 ] These models learn the underlying\n    patterns and structures of their training data and use them to\n    produce new data [ 5 ] [ 6 ] based on the input, which often comes\n    in the form of natural language prompts . [ 7 ] [ 8 ] Generative\n    AI tools have become more common since the AI boom in the 2020s.\n    This boom was made possible by improvements in transformer -based\n    deep neural networks , particularly large language models (LLMs).\n    Major tools include chatbots such as ChatGPT , Copilot , Gemini ,\n    Claude , Grok , and DeepSeek ; text-to-image models such as Stable\n    Diffusion , Midjourney , and DALL-E ; and text-to-video models\n    such as Veo and Sora . [ 9 ] [ 10 ] [ 11 ] [ 12 ] Technology\n    companies developing generative AI include OpenAI , Anthropic ,\n    Meta AI , Microsoft , Google , DeepSeek , and Baidu . [ 7 ] [ 13 ]\n    [ 14 ] Generative AI has raised many ethical questions as it can\n    be used for cybercrime , or to deceive or manipulate people\n    through fake news or deepfakes . [ 15 ] Even if used ethically, it\n    may lead to mass replacement of human jobs . [ 16 ] The tools\n    themselves have been criticized as violating intellectual property\n    laws, since they are trained on copyrighted works. [ 17 ]\n    Generative AI is used across many industries. Examples include\n    software development, [ 18 ] healthcare, [ 19 ] finance, [ 20 ]\n    entertainment, [ 21 ] customer service, [ 22 ] sales and\n    marketing, [ 23 ] art, writing, [ 24 ] fashion, [ 25 ] and product\n    design. 
[ 26 ] History [ edit ] Main article: History of\n    artificial intelligence Early history [ edit ] The first example\n    of an algorithmically generated media is likely the Markov chain .\n    Markov chains have long been used to model natural languages since\n    their development by Russian mathematician Andrey Markov in the\n    early 20th century. Markov published his first paper on the topic\n    in 1906, [ 27 ] [ 28 ] and analyzed the pattern of vowels and\n    consonants in the novel Eugeny Onegin using Markov chains. Once a\n    Markov chain is trained on a text corpus , it can then be used as\n    a probabilistic text generator. [ 29 ] [ 30 ] Computers were\n    needed to go beyond Markov chains. By the early 1970s, Harold\n    Cohen was creating and exhibiting generative AI works created by\n    AARON , the computer program Cohen created to generate paintings.\n    [ 31 ] The terms generative AI planning or generative planning\n    were used in the 1980s and 1990s to refer to AI planning systems,\n    especially computer-aided process planning , used to generate\n    sequences of actions to reach a specified goal. [ 32 ] [ 33 ]\n    Generative AI planning systems used symbolic AI methods such as\n    state space search and constraint satisfaction and were a\n    \"relatively mature\" technology by the early 1990s. They were used\n    to generate crisis action plans for military use, [ 34 ] process\n    plans for manufacturing [ 32 ] and decision plans such as in\n    prototype autonomous spacecraft. [ 35 ] Generative neural networks\n    (2014–2019) [ edit ] See also: Machine learning and deep learning\n    Above: An image classifier , an example of a neural network\n    trained with a discriminative objective. Below: A text-to-image\n    model , an example of a network trained with a generative\n    objective. Since inception, the field of machine learning has used\n    both discriminative models and generative models to model and\n    predict data. 
Beginning in the late 2000s, the emergence of deep\n    learning drove progress, and research in image classification ,\n    speech recognition , natural language processing and other tasks.\n    Neural networks in this era were typically trained as\n    discriminative models due to the difficulty of generative\n    modeling. [ 36 ] In 2014, advancements such as the variational\n    autoencoder and generative adversarial network produced the first\n    practical deep neural networks capable of learning generative\n    models, as opposed to discriminative ones, for complex data such\n    as images. These deep generative models were the first to output\n    not only class labels for images but also entire images. In 2017,\n    the Transformer network enabled advancements in generative models\n    compared to older Long-Short Term Memory models, [ 37 ] leading to\n    the first generative pre-trained transformer (GPT), known as GPT-1\n    , in 2018. [ 38 ] This was followed in 2019 by GPT-2 , which\n    demonstrated the ability to generalize unsupervised to many\n    different tasks as a Foundation model . [ 39 ] The new generative\n    models introduced during this period allowed for large neural\n    networks to be trained using unsupervised learning or semi-\n    supervised learning , rather than the supervised learning typical\n    of discriminative models. Unsupervised learning removed the need\n    for humans to manually label data , allowing for larger networks\n    to be trained. [ 40 ] Generative AI boom (2020–) [ edit ] Main\n    article: AI boom AI generated images have become much more\n    advanced. In March 2020, the release of 15.ai , a free web\n    application created by an anonymous MIT researcher that could\n    generate convincing character voices using minimal training data,\n    marked one of the earliest popular use cases of generative AI. 
[\n    41 ] The platform is credited as the first mainstream service to\n    popularize AI voice cloning ( audio deepfakes ) in memes and\n    content creation , influencing subsequent developments in voice AI\n    technology . [ 42 ] [ 43 ] In 2021, the emergence of DALL-E , a\n    transformer -based pixel generative model, marked an advance in\n    AI-generated imagery. [ 44 ] This was followed by the releases of\n    Midjourney and Stable Diffusion in 2022, which further\n    democratized access to high-quality artificial intelligence art\n    creation from natural language prompts . [ 45 ] These systems\n    demonstrated unprecedented capabilities in generating\n    photorealistic images, artwork, and designs based on text\n    descriptions, leading to widespread adoption among artists,\n    designers, and the general public. In late 2022, the public\n    release of ChatGPT revolutionized the accessibility and\n    application of generative AI for general-purpose text-based tasks.\n    [ 46 ] The system's ability to engage in natural conversations ,\n    generate creative content , assist with coding, and perform\n    various analytical tasks captured global attention and sparked\n    widespread discussion about AI's potential impact on work ,\n    education , and creativity . [ 47 ] In March 2023, GPT-4 's\n    release represented another jump in generative AI capabilities. A\n    team from Microsoft Research controversially argued that it \"could\n    reasonably be viewed as an early (yet still incomplete) version of\n    an artificial general intelligence (AGI) system.\" [ 48 ] However,\n    this assessment was contested by other scholars who maintained\n    that generative AI remained \"still far from reaching the benchmark\n    of 'general human intelligence'\" as of 2023. 
[ 49 ] Later in 2023,\n    Meta released ImageBind , an AI model combining multiple\n    modalities including text, images, video, thermal data, 3D data,\n    audio, and motion, paving the way for more immersive generative AI\n    applications. [ 50 ] In December 2023, Google unveiled Gemini , a\n    multimodal AI model available in four versions: Ultra, Pro, Flash,\n    and Nano. [ 51 ] The company integrated Gemini Pro into its Bard\n    chatbot and announced plans for \"Bard Advanced\" powered by the\n    larger Gemini Ultra model. [ 52 ] In February 2024, Google unified\n    Bard and Duet AI under the Gemini brand, launching a mobile app on\n    Android and integrating the service into the Google app on iOS . [\n    53 ] In March 2024, Anthropic released the Claude 3 family of\n    large language models, including Claude 3 Haiku, Sonnet, and Opus.\n    [ 54 ] The models demonstrated significant improvements in\n    capabilities across various benchmarks, with Claude 3 Opus notably\n    outperforming leading models from OpenAI and Google. [ 55 ] In\n    June 2024, Anthropic released Claude 3.5 Sonnet, which\n    demonstrated improved performance compared to the larger Claude 3\n    Opus, particularly in areas such as coding, multistep workflows,\n    and image analysis. [ 56 ] Private investment in AI (pink) and\n    generative AI (green). Asia–Pacific countries are significantly\n    more optimistic than Western societies about generative AI and\n    show higher adoption rates. Despite expressing concerns about\n    privacy and the pace of change, in a 2024 survey, 68% of Asia-\n    Pacific respondents believed that AI was having a positive impact\n    on the world, compared to 57% globally. [ 57 ] According to a\n    survey by SAS and Coleman Parkes Research, China in particular has\n    emerged as a global leader in generative AI adoption, with 83% of\n    Chinese respondents using the technology, exceeding both the\n    global average of 54% and the U.S. 
rate of 65%. This leadership is\n    further evidenced by China's intellectual property developments in\n    the field, with a UN report revealing that Chinese entities filed\n    over 38,000 generative AI patents from 2014 to 2023, substantially\n    surpassing the United States in patent applications. [ 58 ] A 2024\n    survey on the Chinese social app Soul reported that 18% of\n    respondents born after 2000 used generative AI \"almost every day\",\n    and that over 60% of respondents like or love AI-generated\n    content, while less than 3% dislike or hate it. [ 59 ]\n    Applications [ edit ] Notable types of generative AI models\n    include generative pre-trained transformers (GPTs), generative\n    adversarial networks (GANs), and variational autoencoders (VAEs).\n    Generative AI systems are multimodal if they can process multiple\n    types of inputs or generate multiple types of outputs. [ 60 ] For\n    example, GPT-4o can both process and generate text, images and\n    audio. [ 61 ] Generative AI has made its appearance in a wide\n    variety of industries, radically changing the dynamics of content\n    creation, analysis, and delivery. In healthcare, [ 62 ] generative\n    AI is instrumental in accelerating drug discovery by creating\n    molecular structures with target characteristics [ 63 ] and\n    generating radiology images for training diagnostic models. This\n    extraordinary ability not only enables faster and cheaper\n    development but also enhances medical decision-making. In finance,\n    generative AI is invaluable as it generates datasets to train\n    models and automates report generation with natural language\n    summarization capabilities. It automates content creation,\n    produces synthetic financial data, and tailors customer\n    communications. 
It also powers chatbots and virtual agents.\n    Collectively, these technologies enhance efficiency, reduce\n    operational costs, and support data-driven decision-making in\n    financial institutions. [ 64 ] The media industry makes use of\n    generative AI for numerous creative activities such as music\n    composition, scriptwriting, video editing, and digital art. The\n    educational sector is impacted as well, since the tools make\n    learning personalized through creating quizzes, study aids, and\n    essay composition. Both the teachers and the learners benefit from\n    AI-based platforms that suit various learning patterns. [ 65 ]\n    Text and software code [ edit ] Main article: Large language model\n    See also: Code completion , Autocomplete , and Vibe coding Jung\n    believed that the shadow self is not entirely evil or bad, but\n    rather a potential source of creativity and growth. He argued that\n    by embracing, rather than ignoring, our shadow self, we can\n    achieve a deeper understand\n\n------------------------------------------------------------\n\n\n---------- ✨ RESULT #3 ✨ ----------------------------------------\n\n  📌 Title: What is Generative AI? - GeeksforGeeks\n  🌐 URL  : https://www.geeksforgeeks.org/artificial-intelligence/what-is-generative-ai/\n\n  📊 Detailed Analysis (General Web Page):\n    - Url: https://www.geeksforgeeks.org/artificial-intelligence/what-is-generative-ai/\n    - Title: What is Generative AI? 
- GeeksforGeeks\n    - Meta Description: Your All-in-One Learning Portal: GeeksforGeeks is a comprehensive educational platform that empowers learners across domains-spanning computer science and programming, school education, upskilling, commerce, software tools, competitive exams, and more.\n    - Main Heading: What is Generative AI?\n    - Summary Text: Generative artificial intelligence, often called generative AI or gen AI, is a type of AI that can create new content like conversations, stories, images, videos, and music. It can learn about different topics such as languages, programming, art, science, and more, and use this knowledge to solve new problems. For example: It can learn about popular design styles and create a unique logo for a brand or an organisation. Businesses can use generative AI in many ways, like building chatbots, creating media, designing products, and coming up with new ideas. Generative AI has come a long way from its early beginnings. Here's how it has evolved over time, step by step: Generative AI is versatile, with different models designed for specific tasks. Here are some types:\n    - Links: (Complex Data - See raw content)\n    - Keywords: Generative AI, machine learning, deep learning, Generative Adversarial Networks, Large Language Models, multimodal generative AI, text-to-image generation, image-to-image transformation, speech-to-text technology, text-to-video models, creative content generation, personalized marketing campaigns, ethical concerns in AI, AI-powered design tools\n    - Author: GeeksforGeeks\n    - Published Date: 2023-08-16 12:11:46+00:00\n    - Structured Data: (Complex Data - See raw content)\n\n  📄 Raw Content Excerpt:\n    What is Generative AI? 
- GeeksforGeeks Data Science Data Science\n    Projects Data Analysis Data Visualization Machine Learning ML\n    Projects Deep Learning NLP Computer Vision Artificial Intelligence\n    Open In App Next Article: Generative Adversarial Network (GAN)\n    What is Generative AI? Last Updated : 23 Jan, 2025 Summarize\n    Comments Improve Suggest changes Share Like Article Like Report\n    Generative artificial intelligence, often called generative AI or\n    gen AI, is a type of AI that can create new content like\n    conversations, stories, images, videos, and music. It can learn\n    about different topics such as languages, programming, art,\n    science, and more, and use this knowledge to solve new problems.\n    For example: It can learn about popular design styles and create a\n    unique logo for a brand or an organisation. Businesses can use\n    generative AI in many ways, like building chatbots, creating\n    media, designing products, and coming up with new ideas. Evolution\n    of Generative AI Generative AI has come a long way from its early\n    beginnings. Here's how it has evolved over time, step by step: 1.\n    The Early Days: Rule-Based Systems AI systems followed strict\n    rules written by humans to produce results. These systems could\n    only do what they were programmed for and couldn't learn or adapt.\n    For example, a program could create simple shapes but couldn’t\n    draw something creative like a landscape. 2. Introduction of\n    Machine Learning (1990s-2000s) AI started using machine learning,\n    which allowed it to learn from data instead of just following\n    rules. The AI was fed large datasets (e.g., pictures of animals),\n    and it learned to identify patterns and make predictions. Example:\n    AI could now recognize a dog in a picture, but it still couldn’t\n    create a picture of a dog on its own. 3. 
The Rise of Deep Learning\n    (2010s) Deep learning improved AI significantly by using neural\n    networks, which mimic how the human brain works. AI could now\n    process much more complex data, like thousands of photos, and\n    start generating new content. Example: AI could now create a\n    realistic drawing of a dog by learning from millions of dog\n    photos. 4. Generative Adversarial Networks (2014) GANs, introduced\n    in 2014, use two AI systems that work together: one generates new\n    content, and the other checks if it looks real. This made\n    generative AI much better at creating realistic images, videos,\n    and sounds. Example: GANs can create life like images of people\n    who don’t exist or filters (used in apps like FaceApp or Snapchat\n    ). 5. Large Language Models (LLMs) and Beyond (2020s) Models like\n    GPT-3 and GPT-4 can understand and generate human-like text. They\n    are trained on massive amounts of data from books, websites, and\n    other sources. AI can now hold conversations, write essays,\n    generate code, and much more. Example: ChatGPT can help you draft\n    an email, write a poem, or even solve problems. 6. Multimodal\n    Generative AI (Present) New AI models can handle multiple types of\n    data at once—text, images, audio, and video. This allows AI to\n    create content that combines different formats. Example: AI can\n    take a written description and turn it into an animated video or a\n    song with the help of different models integrating together. Types\n    of Generative AI Models Generative AI is versatile, with different\n    models designed for specific tasks. Here are some types: Text-to-\n    Text : These models generate meaningful and coherent text based on\n    input text. They are widely used for tasks like drafting emails,\n    summarizing lengthy documents, translating languages, or even\n    writing creative content. 
Tools like ChatGPT is brilliant at\n    understanding context and producing human-like responses. Text-to-\n    Image : This involves generating realistic images from descriptive\n    text. For Example, tools like DALL-E 2 can create a custom digital\n    image based on prompts such as \"A peaceful beach with palm trees\n    during a beautiful sunset,\" offering endless possibilities for\n    designers, artists, and marketers. Image-to-Image : These models\n    enhance or transform images based on input image . For example,\n    they can convert a daytime photo into a night time scene, apply\n    artistic filters, or refine low-resolution images into high-\n    quality visuals. Image-to-Text : AI tools analyze and describe the\n    content of images in text form. This technology is especially\n    beneficial for accessibility, helping visually impaired\n    individuals understand visual content through detailed captions.\n    Speech-to-Text : This application converts spoken words into\n    written text. It powers virtual assistants like Siri,\n    transcription software, and automated subtitles, making it a vital\n    tool for communication, accessibility, and documentation. Text-to-\n    Audio : Generative AI can create music, sound effects, or audio\n    narrations from textual prompts. This empowers creators to explore\n    new soundscapes and compose unique auditory experiences tailored\n    to specific themes or moods. Text-to-Video : These models allow\n    users to generate video content by describing their ideas in text.\n    For example, a marketer could input a vision for a promotional\n    video, and the AI generates visuals and animations, streamlining\n    content creation. Multimodal AI : These systems integrate multiple\n    input and output formats, like text, images, and audio, into a\n    unified interface. 
For instance, an educational platform could let\n    students ask questions via text and receive answers as interactive\n    visuals or audio explanations, enhancing learning experiences.\n    Relationship Between Humans and Generative AI In today’s world,\n    Generative AI has become a trusted best friend for humans, working\n    alongside us to achieve incredible things. Imagine a painter\n    creating a masterpiece, while they focus on the vision, Generative\n    AI acts as their assistant, mixing colors, suggesting designs, or\n    even sketching ideas. The painter remains in control, but the AI\n    makes the process faster and more exciting. This partnership is\n    like having a friend who’s always ready to help. A writer stuck on\n    the opening line of a story can turn to Generative AI for\n    suggestions that spark creativity. A business owner without design\n    skills can rely on AI to draft a sleek website or marketing\n    materials. Even students can use AI to better understand complex\n    topics by generating easy-to-grasp explanations or visual aids.\n    Generative AI is not here to replace humans but to empower them.\n    It takes on repetitive tasks, offers endless possibilities, and\n    helps people achieve results they might not have imagined alone.\n    At the same time, humans bring their intuition, creativity, and\n    ethical judgment, ensuring the AI’s contributions are meaningful\n    and responsible. In this era, Generative AI truly feels like a\n    best friend—always there to support, enhance, and inspire us while\n    letting us stay in charge. Together, humans and AI make an\n    unbeatable team, achieving more than ever before. 
Generative AI Vs\n    AI Criteria Generative AI Artificial Intelligence Purpose It is\n    designed to produce new content or data Designed for a wide range\n    of tasks but not limited to generation Application Art creation,\n    text generation, video synthesis, and so on Data analysis,\n    predictions, automation, robotics, etc Learning Uses Unsupervised\n    learning or reinforcement learning Can use supervised, semi-\n    supervised, or reinforcement Outcome New or original output is\n    created Can produce an answer and make a decision, classify, data,\n    etc. Complexity It requires a complex model like GANs It has\n    ranged from simple linear regression to complex neural networks\n    Data Requirement Required a large amount of data to produce\n    results of high-quality data Data requirements may vary; some need\n    little data, and some need vast amounts Interactivity Can be\n    interactive, responding to user input Might not always be\n    interactive, depending on the application Benefits of Generative\n    AI Generative AI offers innovative tools that enhance creativity,\n    efficiency, and personalization across various fields. Enhances\n    Creativity : Generative AI enables the creation of original\n    content like images, music, and text, helping artists, designers,\n    and writers explore fresh ideas. It bridges the gap between human\n    creativity and machine-generated innovation, making the creative\n    process more dynamic. Accelerates Research and Development : In\n    fields like science and technology, Generative AI reduces the time\n    needed for research by generating multiple outcomes and\n    predictions, such as molecular structures in drug development.\n    This speeds up innovation and helps solve complex problems\n    efficiently. Improves Personalization : Generative AI creates\n    tailored content based on user preferences. 
From personalized\n    product designs to customized marketing campaigns, it enhances\n    user engagement and satisfaction by delivering exactly what users\n    need or want. Empowers Non-Experts : Even users without expertise\n    can create high-quality content using Generative AI. This helps\n    individuals learn new skills, access creative tools, and open\n    doors to personal and professional growth. Drives Economic Growth\n    : Generative AI introduces new roles and opportunities by\n    fostering innovation, automating tasks, and enhancing\n    productivity. This leads to economic expansion and the creation of\n    jobs in emerging fields. Limitations of Generative AI While\n    Generative AI offers many benefits, it also comes with certain\n    limitations that need to be addressed Data Dependence : The\n    accuracy and quality of Generative AI outputs depend entirely on\n    the data it is trained on. If the training data is biased,\n    incomplete, or inaccurate, the generated content will reflect\n    these flaws. Limited Control Over Outputs : Generative AI can\n    produce unexpected or irrelevant results, making it challenging to\n    control the content and ensure it aligns with specific user\n    requirements. High Computational Requirements : Training and\n    running Generative AI models demand significant computing power,\n    which can be costly and resource-intensive. This limits\n    accessibility for smaller organizations or individuals. Ethical\n    and Legal Concerns : Generative AI can be misused to create\n    harmful content, like deepfakes or fake news, which can spread\n    misinformation or violate privacy. These ethical and legal\n    challenges require careful regulation and oversight to prevent\n    abuse. Q1. Is generative AI replacing jobs? Generative AI isn’t\n    about replacing jobs but transforming them. 
It automates\n    repetitive tasks, allowing people to focus on more creative and\n    strategic aspects of their work. For example, content writers can\n    use AI for inspiration or to speed up first drafts, while\n    designers can use it to generate quick mockups. Q2. How does\n    Generative AI work? Generative AI works by teaching computer\n    programs (like GPT-3 or GANs) from lots of examples. These\n    programs learn how things are usually done from the data they\n    study. Then, they can use this knowledge to create new stuff when\n    given a starting point or a request. Q3. What are common use cases\n    for Generative AI? Generative AI has a wide range of applications,\n    including content generation, language translation, chatbots,\n    image and video creation, data augmentation, and personalized\n    marketing. It can also be used in artistic creation, medical image\n    generation, and more. Q4. Is Generative AI different from other AI\n    types? Yes, Generative AI is different from other AI types, like\n    classification or regression models. While those models make\n    predictions or classify data, generative models focus on creating\n    new, original data based on the patterns they’ve learned. They are\n    versatile and used for creative tasks. Q5. How can I get started\n    with generative AI? You can start by exploring tools and platforms\n    like ChatGPT for text generation, DALL-E for image generation, or\n    similar tools for your needs. Many platforms also provide APIs,\n    allowing developers to integrate AI capabilities into their own\n    applications. Learning basic prompt engineering can also help you\n    get the most out of these tools. 
Next Article Generative\n    Adversarial Network (GAN) A anushka_jain_gfg Improve Article Tags\n    : Artificial Intelligence AI-ML-DS Generative AI Similar Reads\n    Artificial Intelligence Tutorial | AI Tutorial Artificial\n    Intelligence (AI) refers to the simulation of human intelligence\n    in machines which helps in allowing them to think and act like\n    humans. It involves creating algorithms and systems that can\n    perform tasks which requiring human abilities such as visual\n    perception, speech recognition, decisio 5 min read Introduction to\n    AI What is Artificial I\n\n------------------------------------------------------------\n\n\n======================================================================\n                   ✨ Scraping Process Completed ✨\n======================================================================\n```\n\u003c/details\u003e\n\n## Cloudflare Worker Version\n\nA serverless scraper implemented as a Cloudflare Worker that uses Jina AI for search and a Groq LLM for content analysis. It supports rotating multiple API keys via GetPantry.\n\nSee `Cloudflare worker based jina ai \u0026 groq scraper/README.md` for full details.\n\n## Requirements\n\n- Python 3.7+\n- `requests`\n- `beautifulsoup4`\n- `lxml`\n\n## License\n\nMIT License\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xarchit%2Fduckduckgo-webscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0xarchit%2Fduckduckgo-webscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xarchit%2Fduckduckgo-webscraper/lists"}