{"id":23299113,"url":"https://github.com/emilyfelker/bloganalysis","last_synced_at":"2026-05-09T06:34:40.944Z","repository":{"id":223032646,"uuid":"746232893","full_name":"emilyfelker/BlogAnalysis","owner":"emilyfelker","description":"What can ChatGPT tell me about my younger self? Program to extract and analyze personal blog posts with basic features and ChatGPT, including a RAG pipeline.","archived":false,"fork":false,"pushed_at":"2025-01-20T19:52:39.000Z","size":2096,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-04T03:44:33.609Z","etag":null,"topics":["chatgpt","chatgpt-api","generative-ai","langchain","llm","natural-language-processing","nlp","openai-api","poetry","pytest","python","rag","sql","sqlite3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emilyfelker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-21T13:22:04.000Z","updated_at":"2025-01-20T19:52:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"a56380de-7a41-4cbc-85fa-3a4c57c5b6f8","html_url":"https://github.com/emilyfelker/BlogAnalysis","commit_stats":null,"previous_names":["emilyfelker/bloganalysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/emilyfelker/BlogAnalysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emilyfelker%2FBlogAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emilyfelker%2FBlogAnalysis/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emilyfelker%2FBlogAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emilyfelker%2FBlogAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emilyfelker","download_url":"https://codeload.github.com/emilyfelker/BlogAnalysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emilyfelker%2FBlogAnalysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32809797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"online","status_checked_at":"2026-05-09T02:00:06.633Z","response_time":123,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","chatgpt-api","generative-ai","langchain","llm","natural-language-processing","nlp","openai-api","poetry","pytest","python","rag","sql","sqlite3"],"created_at":"2024-12-20T08:14:24.256Z","updated_at":"2026-05-09T06:34:40.926Z","avatar_url":"https://github.com/emilyfelker.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Blog Analysis with ChatGPT\n\n## Introduction\n\nThis program analyzes a set of around 800 blog posts, mostly about my \npersonal life, that I wrote during my teenage years and into college. 
It uses\n`BeautifulSoup` to parse the HTML files that constitute the blog archive\nand extract each post's title, date, and body. Various features of interest \nabout each blog post are calculated. For some features, ChatGPT is called \nvia the `OpenAI` API, and its responses are cached in a `sqlite3` database\nto avoid unnecessary API calls when re-running the program.\nThe program uses `Matplotlib` and `NumPy` to create graphs that show how\ncertain numerical features change over time. One question I had was: based\non each blog post's text, how \naccurately can ChatGPT estimate my age?\n\nAdditionally, the project implements \n**Retrieval-Augmented Generation \n(RAG)** with `LangChain` to \nallow for natural language queries \nabout the blog posts, which are \nsplit into smaller chunks for better \nretrieval and embedded with \n`OpenAIEmbeddings`. This \nfunctionality uses `FAISS` \nfor vector-based similarity search and leverages OpenAI's GPT model to generate \nanswers based on the retrieved blog post content.\n\n\n## RAG Usage Examples\n\nExample interaction with the \ninteractive question-answering tool:\n```\nWelcome to the Blog Analysis Query Tool!\nEnter your query (or type 'exit' to quit): What were my career aspirations back then?\nAnswer:\nYour career aspirations included becoming a librarian or library technician, \nas well as exploring service-related jobs and computer-related careers. You \nalso expressed an interest in becoming a research psychologist, although you \nwere uncertain about your future direction. Ultimately, you planned to major \nin psychology in college, with the possibility of switching to other interests \nsuch as French language, library science, computer science, or political science. 
\nYou aimed to pursue graduate and doctorate degrees and sought a job associated \nwith a university or a federal government position.\n\nEnter your query (or type 'exit' to quit): exit\nGoodbye!\n```\n\nAnswers can also be generated \nprogrammatically by using the RAG \nfunctions directly in Python:\n```python\nfrom BlogAnalysis.rag import generate_answer\n\nquestion = \"Who were Emily's lovers?\"\nanswer = generate_answer(question)\nprint(f\"Answer:\\n{answer}\")\n```\n\n\n## Data Visualization\n\nThis scatterplot of ChatGPT's age estimate plotted against my actual age shows (thankfully!) a positive correlation between the two variables, with a few outliers:\n![Scatterplot with linear regression line showing a positive correlation between age estimate and actual age](output/real_data_graphed.png \"Actual Age vs. GPT Estimate\")\n\nIn the future, the program could be expanded to see whether my age at the time of writing correlates better with more traditional predictors like word count or difficulty, sentence length, or other measures of writing complexity.\n\n\n## Data Analysis Examples\n\nLoading the dataset and calculating features:\n```python\ndataset = get_blogpost_dataset(\"data/XangaBlogPosts\")\ndataset_with_features = add_features_to_dataset(dataset)\n```\nPreviewing the post title, beginning of post body, and features:\n```python\npreview_features(dataset_with_features[:10])  # just the first ten posts\n```\nWhich prints output like this per post:\n```\nTitle: Which Language to Learn Next? 
| Date: 2012-02-07\nContent: After seven years of study, my level of French seems to have reached a point of ...\nFeatures:\n  word_count: 743\n  day_of_week: Tuesday\n  age_of_emily: 21.078713210130047\n  topic: Choosing between German and Spanish\n  age_estimate: 30.0\n```\nFor a fun trip down memory lane, I also wanted a quick and easy way to look at just the topic\nsummaries ChatGPT generated for all my posts:\n```python\nshow_summaries(dataset_with_features[:10])  # just the first ten posts\n```\nWhich prints output like this:\n```\nChoosing between German and Spanish\nFirst Christmas away from family.\nHighlights of trip: living with cats\nVisiting Pretoria and its attractions.\nProfessor uses powerful songs, poems.\nParents and others demand smiles\nPet peeves on self-referential posts.\nRegret ending online friendship, loneliness\nChallenging senior year with rigorous courses\nAnxiety-induced sleep troubles and volunteering.\nUnhappy with P.E. teacher, election excitement, terrified of squirrels\nDream of exploring Chinese city.\nPhysics project success, English class discomfort, unexpected reunion\nKey Club election results and socializing\nWYSE competition and school fundraiser.\nPhotos from colorful park outing.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femilyfelker%2Fbloganalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femilyfelker%2Fbloganalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femilyfelker%2Fbloganalysis/lists"}