{"id":23402622,"url":"https://github.com/martincastroalvarez/keras-document-classifier","last_synced_at":"2026-05-02T04:33:41.284Z","repository":{"id":40981121,"uuid":"199917868","full_name":"MartinCastroAlvarez/keras-document-classifier","owner":"MartinCastroAlvarez","description":"Neural Network using Keras, Google Search API and AsyncIO.","archived":false,"fork":false,"pushed_at":"2023-02-15T22:08:08.000Z","size":330,"stargazers_count":0,"open_issues_count":11,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-14T17:37:28.352Z","etag":null,"topics":["aiohttp","asyncio","google-search-using-python","keras","machine-learning","neural-networks","newspaper3k","python3","text-classification"],"latest_commit_sha":null,"homepage":"https://martincastroalvarez.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MartinCastroAlvarez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-31T19:36:04.000Z","updated_at":"2022-04-04T11:06:59.000Z","dependencies_parsed_at":"2024-12-22T12:29:54.558Z","dependency_job_id":"83baf507-ec31-4a1b-b72e-802a2bbefccc","html_url":"https://github.com/MartinCastroAlvarez/keras-document-classifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinCastroAlvarez%2Fkeras-document-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinCastroAlvarez%2Fkeras-document-classifier/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinCastroAlvarez%2Fkeras-document-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinCastroAlvarez%2Fkeras-document-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MartinCastroAlvarez","download_url":"https://codeload.github.com/MartinCastroAlvarez/keras-document-classifier/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247935891,"owners_count":21020911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aiohttp","asyncio","google-search-using-python","keras","machine-learning","neural-networks","newspaper3k","python3","text-classification"],"created_at":"2024-12-22T12:29:47.849Z","updated_at":"2026-05-02T04:33:41.231Z","avatar_url":"https://github.com/MartinCastroAlvarez.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# France\nNeural Network using Keras, Google Search API and AsyncIO.\n\n![image-alt](./france.jpg)\n\n## References\n- [Python3 Newspaper](https://pypi.org/project/newspaper3k/)\n- [Google Search API](https://github.com/abenassi/Google-Search-API)\n- [Python3 Begins](https://pypi.org/project/begins/)\n- [AioHttp](https://aiohttp.readthedocs.io/en/stable/)\n- [Common mistakes using AsyncIO](https://xinhuang.github.io/posts/2017-07-31-common-mistakes-using-python3-asyncio.html)\n- [AsyncIO Generators](https://github.com/python-trio/async_generator)\n- [AsyncIO Event 
Loop](https://docs.python.org/3/library/asyncio-eventloop.html)\n\n## Instructions\n\n#### Setup\nClone this repo:\n```bash\ngit clone ssh://git@github.com/MartinCastroAlvarez/france\n```\nInstall all the dependencies:\n```bash\ncd france\nvirtualenv -p python3 .env\nsource .env/bin/activate\npip install -r requirements.txt\n```\n\n#### Generating a Dataset\nSearch for sample pages in Google:\n```bash\npython3 paris.py search --term \"Coffee Shop New Orleans\" --limit 5\npython3 paris.py search --term \"Coffee Shop\" --limit 5\npython3 paris.py search --term \"Restaurant\" --limit 5\npython3 paris.py search --term \"Italian Dinner\" --limit 5\npython3 paris.py search --term \"Food Bad Hamburger\" --limit 5\npython3 paris.py search --term \"Indian Food Restaurant\" --limit 5\npython3 paris.py search --term \"Mexican Food Bar\" --limit 5\npython3 paris.py search --term \"Night Theatre\" --limit 5\npython3 paris.py search --term \"History Tour New Orleans\" --limit 5\npython3 paris.py search --term \"History Tour San Francisco\" --limit 5\npython3 paris.py search --term \"History Tour Los Angeles\" --limit 5\npython3 paris.py search --term \"New Orleans Organic Span\" --limit 5\npython3 paris.py search --term \"Beer Bar California\" --limit 5\npython3 paris.py search --term \"Night Bar California\" --limit 5\npython3 paris.py search --term \"Night Out Las Vegas\" --limit 5\npython3 paris.py search --term \"Cafe Las Vegas\" --limit 5\npython3 paris.py search --term \"Lounge Bar California\" --limit 5\npython3 paris.py search --term \"Rooftop Bar New York\" --limit 5\npython3 paris.py search --term \"Wedding Cake House\" --limit 5\npython3 paris.py search --term \"Donuts Los Angeles\" --limit 5\npython3 paris.py search --term \"Pizza Bar New York\" --limit 5\npython3 paris.py search --term \"Pizza Bar Los Angeles\" --limit 5\npython3 paris.py search --term \"Donuts New York\" --limit 5\npython3 paris.py search --term \"Parks\" --limit 5\npython3 paris.py search 
--term \"Tea Cafe\" --limit 5\npython3 paris.py search --term \"Town Bar\" --limit 5\npython3 paris.py search --term \"Night Concerts in New Orleans\" --limit 5\npython3 paris.py search --term \"Night Concerts in San Francisco\" --limit 5\npython3 paris.py search --term \"Night Concerts in Los Angeles\" --limit 5\npython3 paris.py search --term \"Museum in New Orleans\" --limit 5\npython3 paris.py search --term \"Museum in Los Angeles\" --limit 5\npython3 paris.py search --term \"Museum in San Francisco\" --limit 5\npython3 paris.py search --term \"Donuts San Francisco\" --limit 5\npython3 paris.py search --term \"Donuts Los Angeles\" --limit 5\npython3 paris.py search --term \"Parks in San Francisco\" --limit 5\npython3 paris.py search --term \"Parks in New Orleans\" --limit 5\n```\nAdd negative search results to your dataset:\n```bash\npython3 paris.py search --term \"Amazon Web Services\" --limit 50 --is-negative\npython3 paris.py search --term \"Buy New Car Used\" --limit 50 --is-negative\npython3 paris.py search --term \"Stack Overflow\" --limit 50 --is-negative\npython3 paris.py search --term \"Java Tutorial Kubernetes\" --limit 50 --is-negative\npython3 paris.py search --term \"Java Hadoop\" --limit 50 --is-negative\npython3 paris.py search --term \"Docker Cassandra\" --limit 50 --is-negative\npython3 paris.py search --term \"BBC News\" --limit 50 --is-negative\npython3 paris.py search --term \"Clarin Economia\" --limit 50 --is-negative\npython3 paris.py search --term \"Que es el cambio climatico\" --limit 50 --is-negative\npython3 paris.py search --term \"Global Warming\" --limit 50 --is-negative\npython3 paris.py search --term \"Black Hole\" --limit 50 --is-negative\npython3 paris.py search --term \"NASA News\" --limit 50 --is-negative\npython3 paris.py search --term \"China Technology\" --limit 50 --is-negative\npython3 paris.py search --term \"Venezuela News\" --limit 50 --is-negative\npython3 paris.py search --term \"Financial Consulting\" --limit 50 
--is-negative\npython3 paris.py search --term \"Insurance\" --limit 50 --is-negative\npython3 paris.py search --term \"Hospital\" --limit 50 --is-negative\npython3 paris.py search --term \"Top 10\" --limit 50 --is-negative\npython3 paris.py search --term \"Top 5\" --limit 50 --is-negative\npython3 paris.py search --term \"Best 10 Places\" --limit 50 --is-negative\npython3 paris.py search --term \"Best 10 Beer\" --limit 50 --is-negative\npython3 paris.py search --term \"Best 5 Beer\" --limit 50 --is-negative\n```\nYou may also generate search terms by combining every city with every place type:\n```bash\nCITIES=(\"Los Angeles\" \"Las Vegas\" \"San Francisco\" \"New Orleans\" \"New York\"\n        \"Boston\" \"Phoenix\" \"Chicago\" \"Houston\" \"Philadelphia\" \"San Antonio\"\n        \"San Diego\" \"Dallas\" \"Austin\" \"Seattle\" \"Detroit\" \"Miami\" \"Orlando\")\nPLACES=(\"Restaurant\" \"Bar\" \"Lounge\" \"Donuts\" \"Coffee\" \"Cafe\" \"Museum\" \"Park\" \"BBQ\"\n        \"Theatre\" \"Disco\" \"Club\" \"Square\" \"Neighborhood\" \"Pizza\" \"Italian Food\"\n        \"Mexican Food\" \"Indian Food\" \"Sushi\" \"Taqueria\" \"Beer\" \"Brewery\")\nfor CITY in \"${CITIES[@]}\"\ndo\n    for PLACE in \"${PLACES[@]}\"\n    do\n        SEARCH=\"${PLACE} in ${CITY} -best -list\"\n        python3 paris.py search --term \"${SEARCH}\" --limit 5\n    done\ndone\n```\nExport the dataset:\n```bash\npython3 paris.py export --name \"my-dataset-1.csv\"\n```\n\n#### Persisting the Dataset\nOptionally, you can upload the dataset to S3.\nFirst, create a new S3 Bucket:\n```bash\naws s3api --profile \"martin\" create-bucket \\\n    --bucket \"datasets.martincastroalvarez.com\" --region \"us-east-1\"\n```\nThen, upload the dataset to the S3 Bucket:\n```bash\naws s3 --profile \"martin\" cp \"./datasets/my-dataset-1.csv\" \\\n    \"s3://datasets.martincastroalvarez.com/my-dataset-1.csv\"\n```\nFinally, generate a pre-signed public URL that expires automatically:\n```bash\naws s3 --profile \"martin\" presign --expires-in 10000 \\\n    
\"s3://datasets.martincastroalvarez.com/my-dataset-1.csv\"\n```\n\n#### Training a Prediction Model\nGenerate a prediction model using Neural Networks:\n```bash\npython3 paris.py learn --dataset \"my-dataset-1.csv\"\n```\n\n#### Making Predictions\nMake a prediction:\n```bash\npython3 paris.py predict --urls \"https://www.iacovonekitchen.com/\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartincastroalvarez%2Fkeras-document-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmartincastroalvarez%2Fkeras-document-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartincastroalvarez%2Fkeras-document-classifier/lists"}