{"id":20393498,"url":"https://github.com/alexandroskyriakakis/lessons","last_synced_at":"2025-07-15T12:34:48.545Z","repository":{"id":38997265,"uuid":"212348868","full_name":"AlexandrosKyriakakis/Lessons","owner":"AlexandrosKyriakakis","description":"Web scraping, Probs, Randomized Algorithms and a lot of Coffees...","archived":false,"fork":false,"pushed_at":"2023-03-03T09:42:56.000Z","size":13498,"stargazers_count":1,"open_issues_count":14,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-14T20:58:03.339Z","etag":null,"topics":["nlp","probabilistic-programming","randomized-algorithm","web-scraping"],"latest_commit_sha":null,"homepage":"https://alexandroskyriakakis.github.io/Lessons_Schedule/?fbclid=IwAR02q373JO1KcdDC7JMR534wpSNJeUr4-Sb8E7coQEmlCul-23i-soeso1g","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlexandrosKyriakakis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":["https://pay.revolut.com/profile/alexanog30"]}},"created_at":"2019-10-02T13:28:23.000Z","updated_at":"2023-04-22T20:29:13.000Z","dependencies_parsed_at":"2025-03-05T00:39:26.408Z","dependency_job_id":null,"html_url":"https://github.com/AlexandrosKyriakakis/Lessons","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AlexandrosKyriakakis/Lessons","repository_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandrosKyriakakis%2FLessons","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandrosKyriakakis%2FLessons/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandrosKyriakakis%2FLessons/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandrosKyriakakis%2FLessons/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlexandrosKyriakakis","download_url":"https://codeload.github.com/AlexandrosKyriakakis/Lessons/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandrosKyriakakis%2FLessons/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265436806,"owners_count":23765035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","probabilistic-programming","randomized-algorithm","web-scraping"],"created_at":"2024-11-15T03:49:05.986Z","updated_at":"2025-07-15T12:34:48.494Z","avatar_url":"https://github.com/AlexandrosKyriakakis.png","language":"Python","funding_links":["https://pay.revolut.com/profile/alexanog30"],"categories":[],"sub_categories":[],"readme":"# Lessons Schedule\n\nLessons Schedule is a project that calculates the expected release date of the grades of every course at the Electrical and Computer Engineering department of the National Technical University of Athens. 
The project combines web scraping, probability and randomized algorithms.\n\n## Web Scraping\n\nI collected my data from the official forum of my department, [SHMMY](https://shmmy.ntua.gr/forum/index.php), with the well-known library [BeautifulSoup4](https://pypi.org/project/beautifulsoup4/), using the following code:\n\n```python\n# Track information from \"shmmy.ntua.gr\"\nimport requests\nfrom bs4 import BeautifulSoup\n\nRelease_dates_of_posts = []\nContent_of_Posts = []\n\ndef Track_shmmy_Data(page_id, no_of_pages):\n    # page_id: topic id as a string; start: post offset, 20 posts per page,\n    # fetched while it stays \u003c= no_of_pages\n    headers = {\"User-Agent\": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15'}\n    next_page = 0\n\n    while next_page \u003c= no_of_pages:\n        URL = 'https://shmmy.ntua.gr/forum/viewtopic.php?f=290\u0026t=' + page_id + '\u0026start=' + str(next_page)\n        page = requests.get(URL, headers=headers)\n        soup = BeautifulSoup(page.content, 'html.parser')\n        # the \"author\" paragraph of each post holds its date\n        for author in soup.find_all(\"p\", {\"class\": \"author\"}):\n            Release_dates_of_posts.append(author.text)\n        for content in soup.find_all(\"div\", {\"class\": \"content\"}):\n            Content_of_Posts.append(content.text)\n        next_page += 20\n```\nI also extracted the grade release dates using regular expressions:\n```python\n# Functions that return dates\nimport re\n\nMonths = ['ιαν', 'φεβ', 'μαρ', 'απρ', 'μαιος', 'ιουν', 'ιουλ', 'αυγ', 'σεπ', 'οκτ', 'νοεμ', 'δεκ']\n\ndef Aux_Define_Dates(word):\n    return word in Months\n\ndef Define_dates(list_of_words):\n    # locate the first month name; the day and the year are the two words after it\n    only_month = list(filter(Aux_Define_Dates, list_of_words))\n    position = list_of_words.index(only_month[0])\n    day = int(re.sub(\"[^0-9]\", \"\", list_of_words[position + 1]))\n    month = Months.index(only_month[0]) + 1\n    year = int(re.sub(\"[^0-9]\", \"\", list_of_words[position + 2]))\n    return [day, 
month, year]\n```\nThe scraper therefore fills one list with the content of every post and another with the release date of every post.\n\n## Export data from the exam schedule PDFs\n\nTo collect the data from the PDFs, I converted them into Excel files with [a web service](https://www.ilovepdf.com/pdf_to_excel) and then processed them with the library [xlrd](https://pypi.org/project/xlrd/), as the following code shows.\n\n```python\n# Read and extract text cells from Excel\n# note: xlrd 2.0 and later only read .xls files, so opening .xlsx needs xlrd \u003c 2.0 (or openpyxl)\nimport xlrd\n\nExcel_Cell_Values = []\n\ndef Extract_From_Excel(folder):\n    book = xlrd.open_workbook(\"/Users/alexandroskyriakakis/MyProjects/Python_Projects/Project_One/EXAMS SCHEDULE/\" + folder + \"/excel/Read_Info.xlsx\")\n    for book_sheet_i in range(book.nsheets):\n        current_sheet = book.sheet_by_index(book_sheet_i)\n        for row_j in range(current_sheet.nrows):\n            for column_k in range(current_sheet.ncols):\n                current_cell = current_sheet.cell(row_j, column_k)\n                # keep only non-empty text cells\n                if isinstance(current_cell.value, str) and current_cell.value != '':\n                    Excel_Cell_Values.append(current_cell.value)\n```\nRecognizing the exam dates was a bit more challenging, but regular expressions did the job once more: every cell is tagged with the last date found up to that cell.\n\n```python\n# Tracking the dates in the Excel cells       ----day/month/year----\nimport re\n\nExcel_lessons_date_of_exam = []\n\ndef Track_Dates_in_posts(list_of_strings):\n    current_date_of_exam = [0, 0, 0]\n    for cell_i in list_of_strings:\n        a = re.search(r\"\\d\\d/\\d\\d/\\d\\d\\d\\d\", cell_i)\n        if a is not None:\n            # a new dd/mm/yyyy date applies to this cell and the ones after it\n            current_date_of_exam = list(map(int, a.group().split('/')))\n        Excel_lessons_date_of_exam.append(current_date_of_exam)\n```\n## Recognizing course names in posts\n\nFirst, I normalized everything: I converted it to lowercase, removed the accents and dropped meaningless words.\n\n```python\nimport unicodedata\n\n# Function to keep only letters\ndef Keep_only_letters(list_of_words):\n    only_letters_list = []\n    for word_i in list_of_words:\n        # keep only the letters; hyphens, newlines and other symbols are dropped\n        current_word = ''.join(filter(str.isalpha, word_i))\n        # words with no letters are skipped (the input list is left unchanged)\n        if current_word != '':\n            only_letters_list.append(current_word)\n    return only_letters_list\n\n# Function that removes accents (') in a given string\ndef remove_accents(input_str):\n    nfkd_form = unicodedata.normalize('NFKD', input_str)\n    return ''.join(c for c in nfkd_form if not unicodedata.combining(c))\n\n# Remove some meaningless words\nMeaningless_Words = ['εξαμηνο','για','ηλ', 'αμφ', 'αιθ','και','η','το','τα','ο','οι','\u0026','\\n','της','των','απο','στα','στο','στη','επι', 'πτυχιο', 'πτυχιω', 'θεματα', 'με','την']\n\ndef Remove_Meaningless_Words(words):\n    return [w for w in words if w not in Meaningless_Words]\n\ndef Remove_words_greater_than_six(Input_list_of_words):\n    # despite its name, this keeps the first 8 words\n    return Input_list_of_words[:8]\n\n# Do the changes: input string, return list of words\ndef Normalize(Input_String):\n    a = remove_accents(Input_String.lower()).split()  # lowercase, remove accents, split into words\n    a = Remove_Meaningless_Words(a)\n    return a\n```\n\n## Matching the PDF values with the posts\n\nFinally, I matched the values with an O(n^3) algorithm. First, I compare every Excel cell with every post. For each word they have in common I count its letters, and then I sort the candidate matches by their total number of common letters. 
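The scoring and the greedy selection described in this section can be sketched compactly as follows. This is a simplified, hypothetical reimplementation (greedy_match and shared_letters are illustrative names, not functions from the repository). It assumes both sides have already been normalized into word lists, scores every remaining (cell, post) pair with the same shared-letter count as Max_common_letters_in_words, accepts the best pair and removes both sides. With n cells and n posts that is about n rounds of n^2 pair scores, which is where the O(n^3) cost comes from.

```python
from itertools import product


def shared_letters(cell_words, post_words):
    # Total length of the cell's words that also appear in the post,
    # the same score Max_common_letters_in_words computes.
    post_set = set(post_words)
    return sum(len(word) for word in cell_words if word in post_set)


def greedy_match(cells, posts):
    # cells and posts are lists of normalized word lists.
    # Each round scores every remaining (cell, post) pair, keeps the
    # first best-scoring one and removes both of its sides.
    remaining_cells = list(range(len(cells)))
    remaining_posts = list(range(len(posts)))
    matches = []
    while remaining_cells and remaining_posts:
        score, cell, post = max(
            ((shared_letters(cells[c], posts[p]), c, p)
             for c, p in product(remaining_cells, remaining_posts)),
            key=lambda triple: triple[0],
        )
        if score == 0:
            break  # no remaining pair shares a word
        matches.append((cell, post, score))
        remaining_cells.remove(cell)
        remaining_posts.remove(post)
    return matches
```

For example, greedy_match([['βασεις', 'δεδομενων']], [['βαθμοι', 'βασεις', 'δεδομενων']]) returns [(0, 0, 15)]: both course words appear in the post, with 6 and 9 letters.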
Then I consider the match with the most common letters valid, and I remove its cell and its post from their lists. I repeat this idea until every value has a match or no candidate is left.\n\n```python\nfrom operator import itemgetter\n\n# For a lesson, search a post for common words and return the total number of letters in those words\ndef Max_common_letters_in_words(lesson, posts_content):\n    max_sentence_size = 0\n    for lesson_word_i in lesson:\n        # each lesson word that also appears in the post adds its length\n        if any(lesson_word_i == post_word_j for post_word_j in posts_content):\n            max_sentence_size += len(lesson_word_i)\n    return max_sentence_size\n\n# For each Excel cell, record the posts with the most common letters with the cell\n# Excel_Cell_Values_Copy and Content_of_Posts_copy are copies of the lists taken before matching (not shown)\nCommonLetters_Post_CellValue_Date = []\n\ndef Create_List_with_all_data(Content_of_Posts_copy):\n    for Excel_value_j in Excel_Cell_Values:\n        max_common_letter_post = 0\n        for Post_i in Content_of_Posts:\n            a = Max_common_letters_in_words(Excel_value_j, Post_i)\n            # record the post whenever it ties or beats the best score so far\n            if max_common_letter_post \u003c= a:\n                max_common_letter_post = a\n                if a != 0:\n                    exam_date = Excel_lessons_date_of_exam[Excel_Cell_Values_Copy.index(Excel_value_j)]\n                    grades_release_date = Release_dates_of_posts[Content_of_Posts_copy.index(Post_i)]\n                    CommonLetters_Post_CellValue_Date.append([a, Post_i, Excel_value_j, exam_date, grades_release_date])\n\ndef Repeat_Idea():\n    global CommonLetters_Post_CellValue_Date\n    Repeated_list = []\n    while Excel_Cell_Values != [] and Content_of_Posts != []:\n        Create_List_with_all_data(Content_of_Posts_copy)\n        # highest score first\n        CommonLetters_Post_CellValue_Date.sort(key=itemgetter(0))\n        CommonLetters_Post_CellValue_Date.reverse()\n        if CommonLetters_Post_CellValue_Date == []:\n            break\n        # accept the best match and drop its cell and its post from the next rounds\n        current_max_value = CommonLetters_Post_CellValue_Date.pop(0)\n        if current_max_value[2] in Excel_Cell_Values:\n            Excel_Cell_Values.remove(current_max_value[2])\n        if current_max_value[1] in Content_of_Posts:\n            Content_of_Posts.remove(current_max_value[1])\n        Repeated_list.append(current_max_value)\n        CommonLetters_Post_CellValue_Date = []\n    CommonLetters_Post_CellValue_Date = Repeated_list\n```\n## Database\n\nThen I created a database and uploaded the data to it with project2.py. It looked like this:\n\n![Database for Winter exams](https://raw.githubusercontent.com/AlexandrosKyriakakis/Lessons/master/Images/Database.png)\n\n## Calculating Mean and STD (test.py)\n\nTo get more realistic means, I weighted the results with frequencies taken from the harmonic series, starting from the latest value. 
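Concretely, with n results ordered from oldest to latest, the k-th latest result is repeated in proportion to 1/k. A minimal sketch, assuming NumPy, that computes the same weighted mean and STD directly with np.average instead of drawing 100 * lcm samples (weighted_mean_std and harmonic_weights are hypothetical names, not part of test.py):

```python
import numpy as np


def harmonic_weights(n):
    # Values ordered oldest to latest: weights 1/n, ..., 1/2, 1,
    # proportional to the lcm // k copy counts of the sampled version.
    return 1.0 / np.arange(n, 0, -1)


def weighted_mean_std(values, noise_std=0.0):
    # Weighted mean and standard deviation of a non-empty list.
    # noise_std=1.0 adds the spread of N(value, 1) draws in quadrature,
    # which is what the sampled version approaches with many samples.
    values = np.asarray(values, dtype=float)
    weights = harmonic_weights(len(values))
    mean = np.average(values, weights=weights)
    variance = np.average((values - mean) ** 2, weights=weights)
    return float(mean), float(np.sqrt(variance + noise_std ** 2))
```

Skipping the samples matters as the history grows: the lcm of 1, ..., 10 is already 2520, so the latest value alone would get 252000 draws.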
The more recent a value, the more weight it carries in the mean.\n\n```python\nimport numpy as np\n\n# Apply the harmonic series to the results: the list is ordered from oldest to latest,\n# and the k-th latest value is repeated in proportion to 1/k\ndef Harmonic_Series(lista):\n    if len(lista) == 0:\n        return lista\n    if len(lista) == 1:\n        return np.random.normal(lista[0], 1, 100)\n    Result_list = []\n    freq_list = Aux_Series(lista)  # [33,45,67] -\u003e [3,2,1]\n    lcm = np.lcm.reduce(freq_list)  # lcm = 6\n    for value, freq in zip(lista, freq_list):\n        # 100 * (lcm // freq) draws from N(value, 1): for lcm = 6 that is\n        # 2 copies of 33, 3 of 45 and 6 of 67 (each times 100)\n        Result_list.extend(np.random.normal(value, 1, 100 * (lcm // freq)))  # randomize the result\n    return Result_list\n\ndef Aux_Series(lista):\n    # count positions from the end, so repeated values still get distinct frequencies\n    return [len(lista) - index for index in range(len(lista))]\n```\n\nThe naive way to do this is to add the latest value n times, the second latest n/2 times, and so on.\n\n![Weighted values without randomization](https://raw.githubusercontent.com/AlexandrosKyriakakis/Lessons/master/Images/Without_Randomize.png)\n\nInstead, I drew each copy from a normal distribution centered on the current value (with standard deviation 1).\n\n![Weighted values with randomization](https://raw.githubusercontent.com/AlexandrosKyriakakis/Lessons/master/Images/With_Randomize.png)\n\nFollowing these steps, I calculated the means and the standard deviations.\n\n## Web Page\n\nFor the web page, I started from the code of [a React example on GitHub](https://github.com/ahfarmer/emoji-search.git), changed its content with test32.py and deployed it with GitHub Pages. 
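Assuming the delay between an exam and its grade release is normally distributed with the mean and STD computed above, the probability shown for a given day can be sketched as the normal CDF. The README does not show the page's exact formula, so treat this as an illustration; release_probability is a hypothetical name and days are assumed to be counted from the exam date.

```python
import math


def release_probability(days_since_exam, mean_days, std_days):
    # Normal CDF: probability that the grades are out by the given day,
    # assuming the release delay follows N(mean_days, std_days ** 2).
    if std_days <= 0:
        # degenerate case: a zero spread makes the CDF a step at the mean
        return 1.0 if days_since_exam >= mean_days else 0.0
    z = (days_since_exam - mean_days) / (std_days * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))
```

For instance, with a mean of 40 days and an STD of 5, day 45 gives about 0.84.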
Once the expected release date has passed, the probability that the grades are out is above 50%, and it keeps growing every day.\n\n## Conclusion\n\nThe project's underlying purpose was to get comfortable with Python and its libraries, try web scraping and build something useful for my impatient classmates...\n\nHave Fun!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexandroskyriakakis%2Flessons","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexandroskyriakakis%2Flessons","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexandroskyriakakis%2Flessons/lists"}