{"id":31782072,"url":"https://github.com/denesepro/question-extractor","last_synced_at":"2025-10-10T09:14:29.724Z","repository":{"id":316119178,"uuid":"1062046976","full_name":"Denesepro/question-extractor","owner":"Denesepro","description":" An end-to-end automation tool to extract quiz questions from PDF files using Gemini AI and automatically upload them to biazmoon.com with Selenium.","archived":false,"fork":false,"pushed_at":"2025-09-22T18:44:10.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-22T20:38:59.648Z","etag":null,"topics":["automation","gemini-api","pdf-processing","pdf-to-json","python","question-extractor","quiz-automation","selenium","web-automation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Denesepro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-22T18:25:38.000Z","updated_at":"2025-09-22T18:51:39.000Z","dependencies_parsed_at":"2025-09-22T20:39:16.840Z","dependency_job_id":null,"html_url":"https://github.com/Denesepro/question-extractor","commit_stats":null,"previous_names":["denesepro/question-extractor"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Denesepro/question-extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Denesepro%2Fquestion-extractor","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/Denesepro%2Fquestion-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Denesepro%2Fquestion-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Denesepro%2Fquestion-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Denesepro","download_url":"https://codeload.github.com/Denesepro/question-extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Denesepro%2Fquestion-extractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003388,"owners_count":26083579,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","gemini-api","pdf-processing","pdf-to-json","python","question-extractor","quiz-automation","selenium","web-automation"],"created_at":"2025-10-10T09:14:21.433Z","updated_at":"2025-10-10T09:14:29.716Z","avatar_url":"https://github.com/Denesepro.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF to Web Form: Automated Question Extractor and Uploader\n\nThis project provides a complete, two-stage solution for automating the process of transferring multiple-choice quizzes from a PDF file into 
a web-based platform.\n\n1.  **AI-Powered Extraction (`extractor.py`):** A Python script that leverages the **Gemini 1.5 Flash** multimodal AI to analyze images of PDF pages. It intelligently extracts all questions, options, and correct answers, saving them into a structured `JSON` file.\n2.  **Web Automation (`automator.py`):** A second Python script using **Selenium** that reads the generated `JSON` file. It then automatically logs into a target website (e.g., biazmoon.com), navigates to the question creation form, and systematically enters and submits each question.\n\n---\n\n## ✨ Key Features\n\n* **Accurate PDF to JSON Extraction**: Directly converts quiz questions from PDF files into a clean, structured `JSON` format.\n* **Powered by Gemini AI**: Utilizes the powerful `gemini-1.5-flash` model for high-accuracy visual content analysis.\n* **Flexible Answer Key Processing**: Can extract correct answers from either a dedicated answer key page or by detecting **bolded** text within the options.\n* **Advanced Text Post-Processing**: Automatically cleans the extracted text, correcting common punctuation and spacing errors (e.g., for Persian ZWNJ).\n* **End-to-End Web Automation**: Handles the entire web workflow, from logging in to filling out and submitting forms.\n* **Multi-Tag Support**: Allows for a predefined list of tags to be automatically added to each question on the website.\n* **Robust Error Handling**: Implements smart waits and error management to ensure the scripts run stably.\n\n---\n\n## ⚙️ How It Works\n\nThe project follows a simple, two-script workflow:\n\n1.  **Initial Input**: A quiz file, `Test.pdf`.\n    `⬇️`\n2.  **Script 1: `extractor.py`**:\n    * Converts the PDF into a series of high-resolution images.\n    * Sends each image to the **Gemini API** for analysis.\n    * Receives and processes the structured data.\n    * Saves the output to `questions.json`.\n    `⬇️`\n3.  **Intermediate File**: `questions.json`.\n    `⬇️`\n4.  
**Script 2: `automator.py`**:\n    * Reads and parses `questions.json`.\n    * Launches a browser with **Selenium** and logs into the target website.\n    * Navigates to the \"Create Question\" page.\n    * Loops through each question, populating the web form and submitting it.\n    `⬇️`\n5.  **Final Result**: All questions are successfully uploaded to the website.\n\n---\n\n## 📦 Prerequisites \u0026 Installation\n\nTo run this project, you will need the following:\n\n1.  **Python 3.7+**\n2.  **Poppler**: The `pdf2image` library requires this utility. Download it and add its `bin` directory to your system's `PATH`.\n    * [Download Poppler for Windows](https://github.com/oschwartz10612/poppler-windows/releases/)\n3.  **Python Libraries**: Install the necessary packages using pip:\n    ```bash\n    pip install google-generativeai selenium pdf2image Pillow\n    ```\n4.  **Google Chrome** and a compatible **ChromeDriver**. (Note: Selenium 4.6 and later can download and manage ChromeDriver automatically through Selenium Manager.)\n\n---\n\n## 🔧 Configuration\n\nBefore running the scripts, you must configure the following settings:\n\n#### In `extractor.py`:\n\n* `API_KEY`: Set your Google AI Studio API key.\n    ```python\n    API_KEY = \"YOUR_GOOGLE_AI_API_KEY\"\n    ```\n\n#### In `automator.py`:\n\n* **Login Credentials**: Enter your username and password for the target website.\n    ```python\n    YOUR_USERNAME = \"your_email@example.com\"\n    YOUR_PASSWORD = \"your_password\"\n    ```\n    \u003e **⚠️ Security Warning**: Never commit this file with your real credentials to a public GitHub repository.\n\n* **URLs and Settings**: Adjust the `LOGIN_URL`, `CREATE_QUESTION_URL`, `TAGS_TO_ADD`, and `QQQ_SESSION_NUMBER` variables to match your specific needs.\n\n---\n\n## 🚀 Usage Guide\n\n1.  **Clone the Repository**:\n    ```bash\n    git clone https://github.com/your-username/your-repo-name.git\n    cd your-repo-name\n    ```\n2.  
**Install Prerequisites**: Follow the installation guide above to set up your environment.\n3.  **Configure Scripts**: Edit the Python files to set your API key and user credentials.\n4.  **Place PDF**: Put your quiz PDF file in the main project directory.\n5.  **Run the Extractor Script**:\n    ```bash\n    python extractor.py\n    ```\n    The script will prompt you for the PDF filename, the total number of questions, and the answer key method. Once finished, it will generate a `_extracted_questions.json` file.\n\n6.  **Run the Automator Script**:\n    ```bash\n    python automator.py\n    ```\n    This will launch the browser, log in, and begin uploading the questions automatically.\n\n---\n\n## 📁 Project Structure\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenesepro%2Fquestion-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdenesepro%2Fquestion-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenesepro%2Fquestion-extractor/lists"}