{"id":20909713,"url":"https://github.com/lovenui/weblogs-analysis-system","last_synced_at":"2025-07-20T16:04:19.608Z","repository":{"id":165934298,"uuid":"641355673","full_name":"LoveNui/WebLogs-Analysis-System","owner":"LoveNui","description":"A big data platform for analyzing web access logs","archived":false,"fork":false,"pushed_at":"2023-07-15T16:42:16.000Z","size":3944,"stargazers_count":13,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-08T17:02:09.407Z","etag":null,"topics":["hbase","javascript","log-analysis","python","scala","spark"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LoveNui.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-16T09:51:41.000Z","updated_at":"2023-10-06T11:47:56.000Z","dependencies_parsed_at":"2024-11-18T14:36:22.597Z","dependency_job_id":"2d601a88-b6c9-405a-aed5-91c5fdab395f","html_url":"https://github.com/LoveNui/WebLogs-Analysis-System","commit_stats":null,"previous_names":["superstar512/gender_classification","lovenui/gender_classification","lovenui/weblogs-analysis-system"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LoveNui/WebLogs-Analysis-System","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FWebLogs-Analysis-System","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FWebLogs-Analysis-System/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/LoveNui%2FWebLogs-Analysis-System/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FWebLogs-Analysis-System/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LoveNui","download_url":"https://codeload.github.com/LoveNui/WebLogs-Analysis-System/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FWebLogs-Analysis-System/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266152255,"owners_count":23884475,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hbase","javascript","log-analysis","python","scala","spark"],"created_at":"2024-11-18T14:12:26.522Z","updated_at":"2025-07-20T16:04:19.586Z","avatar_url":"https://github.com/LoveNui.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HBase Actual Data Analysis System\n  \n## instructions\n\n### 1. 
Database design\n\n#### LogData\n\n- This table is used to store the data after data cleaning and transformation\n- Database type: HBase\n- Table Structure\n\n  Rowkey|prop|\n  ----|:------------------------------:|\n  | rowkey | IP / BYTES / URL / DATES / METHOD / FYDM / BYTES|\n- RowKey structure design description\n\u003e RowKey is divided into date + last three digits of company code + six-digit ID\n\u003e Each part is described as follows:\n\n  Field | Explanation | Example\n----| ----- |----\nDate |The date when the log file was generated (digits only, without spaces or hyphens) | 20170808\nCompany code | The last three digits of the company code |200\nID | Six digits starting from 100000, used to uniquely identify each record and keep row keys aligned | 100001\n\u003e Complete example\n\u003e 20170808200100000 means a request made by the company with code 200 on 2017-08-08\n\n- Create table statement\n\u003e create \"LogData\", \"prop\"\n\n#### LogAna\n- This table is used to store the analyzed data\n- Database type: HBase\n- Table Structure \n\nRowKey | IP | URL | BYTES | METHOD_STATE |REQ\n-------|----|-----|-------|-------------|---\nrowkey |IPSumVal IPTotalNum IPList |URLList MaxURL | BytesSecList BytesHourList / TotalBytes | MethodList StateList | ReqHourList ReqSecList ReqSum\n- Field description\n\nField | Explanation | Example\n----| ----- |----\nIPTotalNum| The total number of IPs, excluding duplicates | 100 means that 100 distinct IPs visited the website that day\nIPSumVal | Total number of IPs, including duplicates | 100 means that 100 visits were recorded, and the same IP may be counted more than once\nIPList | Ranking of IPs by number of visits, stored as JSON converted from mutable.Map[String, Int] | {\"190.1.1.1\": 1000} means that the IP 190.1.1.1 generated 1000 requests on the website in total\nURLList | The 10 most requested URLs, stored as JSON | {\"test.aj\":100, \"test2.aj\":90, ...}\nMaxURL | The URL with the most requests (now the front end has given up using this 
field) |{\"test.aj\": 100}\nBytesSecList | Traffic generated per second, in bytes (converted to MB when displayed on the front end) | {\"2017-08-08 01:00:00\":9000, \"2017-08-08 01:00:00\" : 500, ...}\nBytesHourList | Traffic generated during each hour of the day, in bytes (converted to MB when displayed on the front end) | {\"08\": 9000, \"09\": 500, ...}, where 08 means the traffic generated between 8 o'clock and 9 o'clock\nTotalBytes | Total traffic generated in one day, in bytes (converted to MB when displayed on the front end) | 3000, indicating that 3000 bytes of traffic were generated on that day\nMethodList | Counts of the request methods that appeared | {\"POST\":3446,\"OPTIONS\":5,\"HEAD\":4}\nStateList | Counts of the response status codes that appeared | {\"501\":8,\"302\":801,\"404\":1,\"200\":14738,\"400\":2,\"405\":4}\nReqHourList | The number of requests by hour | {\"15\":2350,\"09\":3503,\"00\":690,\"11\":1903}\nReqSecList | The number of requests by second | {\"2017-08-08 10:44:08\":1,\"2017-08-08 09:45:05\":4,\"2017-08-08 10:06:58\":4}\nReqSum | The total number of requests in a day | 1000, indicating that there were 1000 requests that day\n\n- RowKey structure design description\n\u003e RowKey is divided into date + last three digits of company code\n\u003e Each part is described as follows:\n\nField | Explanation | Example\n----| ----- |----\nDate |The date when the log file was generated (digits only, without spaces or hyphens) | 20170808\nCompany code | The last three digits of the company code | 200; note that 000 means the data of all websites for that day\n\n\u003e Example:\n20170808200 means all the data of Tianjin High Court on 2017-08-08\n20170808000 means all the data of all courts on 2017-08-08\n\n- Create table statement\n\u003e create \"LogAna\", \"IP\", \"URL\", \"BYTES\", \"METHOD_STATE\", \"REQ\"\n\n\n### 2. 
Project code description\n- This project is divided into three sub-projects: data collection, data storage and display, and offline data analysis\n\n#### Data collection\n\n- Project name: CollectTomcatLogs\n- Function Description:\n\n\u003e Collect Tomcat logs under the specified path\n\u003e Upload to HDFS or an FTP server after renaming the file\n\u003e Write a log recording whether the upload succeeded\n\n- Deployment instructions: Deploy on each server that needs to collect logs, and specify the company code and log path in my.properties\n- Configuration management: Maven\n- Main technologies: Java FTPClient, HDFS\n- Test case description: mainly tests whether files are renamed correctly\n- File renaming: Prefix the localhost_XXXXX.txt file name with the court code to distinguish the data of each company\n\n#### Data storage and display\n- Project name: RestoreData\n- Function Description:\n\n\u003e Data preprocessing: including data parsing, cleaning and transformation\n\u003e Data storage: save the converted data in a List and insert it into the HBase database in batches\n\u003e Front-end display: display the analyzed data\n\u003e Data query: query the corresponding data according to various input conditions\n- Development environment:\n\u003e JDK 8\n\u003e Hadoop 2.7\n\u003e HBase 1.2\n\u003e Tomcat 8\n- Deployment instructions: Configure the settings in my.properties, and pay attention to the compatibility of the JDK and Hadoop versions\n- Configuration management: Maven\n- Main technology: Spring MVC / Hadoop / JSP\n- Test case description:\n\u003e HbaseBatchInsertTest.java: for testing batch insertion\n\u003e HbaseConnectionTest.java: for testing whether the HBase connection works\n\u003e ParseLogTest.java: for testing log parsing\n\u003e ListBean.java: prints all beans, used to diagnose @Autowired failures\n- Front end part:\n\u003e #### code section\n\u003e index.jsp: The page is loaded by default, and the data will be requested after loading, 
showing all the website data of the previous day\n\u003e index.js: used to process various requests and data analysis in index.jsp\n\n\u003e ----------\n\u003e queryData.jsp: Used to query the data of various websites; the input is date + website, and multiple selection is supported\n\u003e queryData.js: used to process various requests and data analysis in queryData.jsp (to be completed)\n\n\u003e ---------\n\u003e dataGrid.jsp: display data in table form (to be completed)\n\n\u003e --------\n\u003e myCharts.js: Use echarts to draw various charts (note that the DOM is initialized externally)\n\u003e inputCheck.js: Check whether the input is valid\n\n\u003e ---------\n\u003e mystyle.css: Customize various styles\n\u003e #### Third-party libraries\n\u003e Bootstrap: mainly for its grid system\n\u003e Bootstrap-select: multiple-selection boxes\n\u003e BootstrapDatepickr: date input\n\u003e echarts: draw various charts\n\u003e jQuery: framework\n\u003e font-awesome: various small icons\n\n\n#### Data offline analysis\n\n- Project name: ScalaReadAndWrite\n- Function Description:\n\n\u003e Offline analysis of various data, 13 indicators in total; see the design of the database table LogAna for details\n\n- Development environment:\n\u003e Scala 2.11\n\u003e Spark 1.52\n\u003e Hadoop 2.7\n- Special Note:\n\u003e Spark offers only two kinds of global (shared) variables, broadcast variables and accumulators; this project uses accumulators\n\u003e When customizing an accumulator, it is very important to get the input and output types right\n\u003e Be sure to implement all six overridden functions\n\u003e An accumulator can only pass one kind of variable, which can be a complex object\n\u003e Failure to do so will invalidate the accumulator!\n- Deployment instructions: None\n- Configuration management: Maven\n- Main technology: Spark\n- Description of project structure:\n\u003e Accumulator: accumulator, including various custom 
accumulators\n\u003e analysis: main analysis code\n\u003e DAO: parse the entity class and store it in HBase\n\u003e Entity: two entity classes\n\u003e util: various utilities\n\n### 3. Project screenshots\n\n- HBase database screenshot\n![image](https://github.com/LoveNui/WebLogs-Analysis-System/blob/master/image/p2.png)\n\n- Data display interface\n![image](https://github.com/LoveNui/WebLogs-Analysis-System/blob/master/image/p1.png)\n\n- Data display interface\n![image](https://github.com/LoveNui/WebLogs-Analysis-System/blob/master/image/p3.png)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovenui%2Fweblogs-analysis-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flovenui%2Fweblogs-analysis-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovenui%2Fweblogs-analysis-system/lists"}