{"id":15433387,"url":"https://github.com/beiyuouo/mi-store-log-analysis","last_synced_at":"2025-06-24T21:06:26.794Z","repository":{"id":107740338,"uuid":"422425080","full_name":"beiyuouo/mi-store-log-analysis","owner":"beiyuouo","description":"👨‍🦽 伪·小米商城-大数据电商日志分析","archived":false,"fork":false,"pushed_at":"2021-11-07T13:15:58.000Z","size":19144,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-18T08:16:02.583Z","etag":null,"topics":["flask","full-stack","java","kafka","python","spark"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/beiyuouo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-29T03:02:49.000Z","updated_at":"2022-10-31T03:12:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"a25cd375-e207-43e7-a5bd-8b3e48d6778b","html_url":"https://github.com/beiyuouo/mi-store-log-analysis","commit_stats":{"total_commits":1,"total_committers":1,"mean_commits":1.0,"dds":0.0,"last_synced_commit":"b23acc414ac92ba794eebdb026931a60e7540a7b"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/beiyuouo/mi-store-log-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beiyuouo%2Fmi-store-log-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beiyuouo%2Fmi-store-log-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beiyuouo%2Fmi-sto
re-log-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beiyuouo%2Fmi-store-log-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/beiyuouo","download_url":"https://codeload.github.com/beiyuouo/mi-store-log-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beiyuouo%2Fmi-store-log-analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261756746,"owners_count":23205156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask","full-stack","java","kafka","python","spark"],"created_at":"2024-10-01T18:33:48.469Z","updated_at":"2025-06-24T21:06:26.776Z","avatar_url":"https://github.com/beiyuouo.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mi-store-log-analysis\n\nBigData Project\n\n## How to use\n0. Prerequisites\n\n- Hadoop\n- Flume\n- Zookeeper\n- Kafka\n- Storm\n- Nginx\n- Nodejs\n\n1. Start the frontend\n```sh\ncd mi-store-fn\nnpm install\nnpm run serve\n```\n\n2. Configure and start the backend\n```sh\ncd mi-store-bn\nmysql -uroot -p\u003cpassword\u003e \u003c storeDB.sql\nmysql -uroot -p\u003cpassword\u003e -DstoreDB \u003c analogDataSql.sql\nnode app.js\n```\n\n3. 
Configure Nginx\n```sh\n\n#user  nobody;\nworker_processes  1;\n\n#error_log  logs/error.log;\n#error_log  logs/error.log  notice;\n#error_log  logs/error.log  info;\n\npid        logs/nginx.pid;\n\n\nevents {\n    worker_connections  1024;\n}\n\n\nhttp {\n    include       mime.types;\n    default_type  application/octet-stream;\n\n    log_format  main  '$remote_addr - $remote_user [$time_local] \"$request\" '\n                      '$status $body_bytes_sent \"$http_referer\" '\n                      '\"$http_user_agent\" \"$http_x_forwarded_for\"';\n\n    access_log  /var/log/nginx/access.log  main;\n\n    sendfile        on;\n    #tcp_nopush     on;\n\n    #keepalive_timeout  0;\n    keepalive_timeout  65;\n\n    #gzip  on;\n\n    server {\n        listen       80;\n        server_name  localhost;\n\n        #charset koi8-r;\n\n        #access_log  logs/host.access.log  main;\n\n        location / {\n            proxy_pass http://localhost:8080;\n            # root   html;\n            # index  index.html index.htm;\n        }\n        #error_page  404              /404.html;\n\n        # redirect server error pages to the static page /50x.html\n        #\n        error_page   500 502 503 504  /50x.html;\n        location = /50x.html {\n            root   html;\n        }\n\n        # proxy the PHP scripts to Apache listening on 127.0.0.1:80\n        #\n        #location ~ \\.php$ {\n        #    proxy_pass   http://127.0.0.1;\n        #}\n\n        # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000\n        #\n        #location ~ \\.php$ {\n        #    root           html;\n        #    fastcgi_pass   127.0.0.1:9000;\n        #    fastcgi_index  index.php;\n        #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;\n        #    include        fastcgi_params;\n        #}\n\n        # deny access to .htaccess files, if Apache's document root\n        # concurs with nginx's one\n        #\n        #location ~ /\\.ht {\n        #    deny 
 all;\n        #}\n    }\n\n\n    # another virtual host using mix of IP-, name-, and port-based configuration\n    #\n    #server {\n    #    listen       8000;\n    #    listen       somename:8080;\n    #    server_name  somename  alias  another.alias;\n\n    #    location / {\n    #        root   html;\n    #        index  index.html index.htm;\n    #    }\n    #}\n\n\n    # HTTPS server\n    #\n    #server {\n    #    listen       443 ssl;\n    #    server_name  localhost;\n\n    #    ssl_certificate      cert.pem;\n    #    ssl_certificate_key  cert.key;\n\n    #    ssl_session_cache    shared:SSL:1m;\n    #    ssl_session_timeout  5m;\n\n    #    ssl_ciphers  HIGH:!aNULL:!MD5;\n    #    ssl_prefer_server_ciphers  on;\n\n    #    location / {\n    #        root   html;\n    #        index  index.html index.htm;\n    #    }\n    #}\n\n}\n\n```\n\n\n4. Configure and start Kafka\n```sh\n# Start the ZooKeeper instance bundled with Kafka\nbin/zookeeper-server-start.sh config/zookeeper.properties\n# Start Kafka\nbin/kafka-server-start.sh config/server.properties\n```\n5. 
Configure and start Flume\n\n```sh[nginx-kafka.conf]\n# Name the components on this agent\na1.sources = r1\na1.sinks = k1\na1.channels = c1\n\n# Describe/configure the source\na1.sources.r1.type = exec\na1.sources.r1.command = tail -F /var/log/nginx/access.log\n\n# Describe the sink\na1.sinks.k1.channel = c1\na1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink\na1.sinks.k1.kafka.topic = test\na1.sinks.k1.kafka.bootstrap.servers = localhost:9092\na1.sinks.k1.kafka.flumeBatchSize = 2\na1.sinks.k1.kafka.producer.acks = 1\na1.sinks.k1.kafka.producer.linger.ms = 1\na1.sinks.k1.kafka.producer.compression.type = snappy\n\n# Use a channel which buffers events in memory\na1.channels.c1.type = memory\na1.channels.c1.capacity = 1000\na1.channels.c1.transactionCapacity = 100\n\n# Bind the source and sink to the channel\na1.sources.r1.channels = c1\na1.sinks.k1.channel = c1\n```\n\nStart Flume\n```sh\nflume-ng agent -c conf/ -f conf/nginx-kafka.conf -n a1\n```\n\nYou can start a Kafka console consumer to inspect the contents of the `test` topic\n```sh\nbin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092\n```\n\n6. Start Spark to consume the Kafka data and visualize it\n\n```sh\n# Start Hadoop (HDFS) to store checkpoints\nbin/start-dfs.sh\n\n# Submit the Spark job\nspark-submit \\\n    --master local[*] \\\n    --jars file:///opt/pkg/spark/jars/org.apache.commons_commons-pool2-2.6.2.jar,file:///opt/pkg/spark/jars/org.apache.kafka_kafka-clients-2.4.1.jar,file:///opt/pkg/spark/jars/mysql-connector-java-8.0.21.jar,file:///opt/pkg/spark/jars/spark-sql-kafka-0-10_2.12-3.0.1.jar,file:///opt/pkg/spark/jars/spark-token-provider-kafka-0-10_2.12-3.0.1.jar \\\n    file:///opt/pkg/spark/kafka/kafka_mysql.py\n```\n\n7. Start the data visualization frontend\n```sh\npython app.py\n```\n\n8. 
Windows port forwarding\n```sh\n# Show port forwarding rules\nnetsh interface portproxy show v4tov4\n\n# Add a port forwarding rule\nnetsh interface portproxy add v4tov4 listenport=3090 listenaddress=192.168.0.106 connectaddress=192.168.186.100 connectport=80\n\n# Delete a port forwarding rule\nnetsh interface portproxy delete v4tov4 listenport=3090 listenaddress=192.168.0.106\n\n\n```\n\n\n## Other solutions and software installation\n```sh\n# Build log analysis workflow\n# Copy dist/GeoLite2-City.mmdb to anywhere you like, and configure its path in log-analysis-workflow/src/main/resources/application.properties\n\n# Download and install redis\nwget https://download.redis.io/releases/redis-6.2.6.tar.gz\ntar xzf redis-6.2.6.tar.gz\ncd redis-6.2.6\nmake \u0026\u0026 make test \u0026\u0026 make install\n\n# Launch redis server and client\nnohup ./redis-server \u0026\n./redis-cli\n\n# Test redis\n127.0.0.1:6379\u003e ping\n\n# Also, you need to configure redis and access.log in application.properties\n\n\n# Visualization\n# https://www.vultr.com/docs/how-to-install-goaccess-on-centos-7\nsudo goaccess /var/log/nginx/access.log --log-format=COMBINED -a -o /var/www/html/report.html\n```\n\n# Reference\n- https://github.com/fenlan/storm-nginx-log\n- https://github.com/hai-27/store-server\n- https://github.com/hai-27/vue-store\n- https://github.com/allinurl/goaccess\n- https://www.vultr.com/docs/how-to-install-goaccess-on-centos-7\n- https://github.com/TurboWay/bigdata_practicese\n- https://github.com/debatosh99/pyspark-kafka-mysql","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeiyuouo%2Fmi-store-log-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeiyuouo%2Fmi-store-log-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeiyuouo%2Fmi-store-log-analysis/lists"}