{"id":18893417,"url":"https://github.com/xtaci/log_analysis","last_synced_at":"2025-08-30T17:08:16.969Z","repository":{"id":147664151,"uuid":"74732565","full_name":"xtaci/log_analysis","owner":"xtaci","description":"Practical Log Analysis","archived":false,"fork":false,"pushed_at":"2016-12-07T15:08:58.000Z","size":845,"stargazers_count":15,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-24T05:37:14.717Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xtaci.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-11-25T06:43:30.000Z","updated_at":"2019-04-11T03:43:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"cded2e94-55a7-43e3-8e7f-d1b8ac0f4a7a","html_url":"https://github.com/xtaci/log_analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/xtaci/log_analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtaci%2Flog_analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtaci%2Flog_analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtaci%2Flog_analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtaci%2Flog_analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xtaci","download_url":"https://codeload.github.com/xtaci/log_analysis/
tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtaci%2Flog_analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272878320,"owners_count":25008336,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T08:13:37.477Z","updated_at":"2025-08-30T17:08:16.963Z","avatar_url":"https://github.com/xtaci.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Practical Log Analysis\n\n## scenario\n![scenario](log.png)\n\ntested on the versions below:\n* apache-hive-2.1.0-bin.tar.gz\n* elasticsearch-5.0.1.tar.gz\n* kafka_2.11-0.10.1.0.tgz\n* kibana-5.0.1-linux-x86_64.tar.gz\n* logstash-5.0.0.tar.gz\n* mysql-connector-java-5.1.40.tar.gz\n* spark-1.6.3-bin-hadoop2-without-hive.tgz\n* hadoop-2.6.5.tar.gz\n\n## hadoop\n* http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html  -- single-node HDFS deployment\n* https://github.com/chrislusf/gleam -- Fast, efficient, and scalable distributed map/reduce system written in Go and LuaJIT\n\n## kafka\n* https://kafka.apache.org/documentation   -- official Kafka documentation\n* https://www.elastic.co/blog/just-enough-kafka-for-the-elastic-stack-part1  -- best practices for Elasticsearch and Kafka\n* 
https://www.elastic.co/blog/just-enough-kafka-for-the-elastic-stack-part2\n* https://github.com/travisjeffery/jocko   -- a Kafka reimplementation in Go\n* https://github.com/oldratlee/translations/blob/master/log-what-every-software-engineer-should-know-about-real-time-datas-unifying/README.md -- a classic\n* https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations  -- Kafka papers and presentations\n* https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/\n* https://www.youtube.com/watch?v=77huw-31oZg\n* https://www.youtube.com/watch?v=k_Y5ieFHGbs\n* https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines\n\n## logstash\n* https://www.elastic.co/guide/en/logstash/current/index.html -- Centralize, Transform \u0026 Stash Your Data\n* https://github.com/influxdata/telegraf -- The plugin-driven server agent for collecting \u0026 reporting metrics.\n* https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html -- Logstash deployment\n\n## hive\n* https://cwiki.apache.org/confluence/display/Hive/GettingStarted -- Hive setup\n* https://cwiki.apache.org/confluence/display/Hive/LanguageManual -- Hive SQL language manual\n* https://github.com/xtaci/json2hive -- build a Hive schema from JSON\n\n## metastore\n* https://hub.docker.com/_/mysql/  -- MySQL image that can be used for the metastore\n* https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf   -- metastore structure\n* https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin -- metastore configuration\n* https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool -- schema creation\n\n## spark\n* https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started -- Hive and Spark integration\n* http://spark.apache.org/docs/latest/spark-standalone.html -- Spark configuration\n* http://mangocool.com/1467770109867.html -- version issues with Hive on Spark\n* http://www.csdn.net/article/2015-04-24/2824545 -- Rui Li (Intel): an analysis of Hive on Spark\n\n## elasticsearch\n* 
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html -- Elasticsearch and Hive integration\n* https://www.elastic.co/blog/found-sizing-elasticsearch -- Elasticsearch index planning and capacity planning\n* https://www.elastic.co/blog/performance-indexing-2-0 -- Elasticsearch indexing\n* https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up -- Elasticsearch internals\n* https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html -- index templates\n* https://www.elastic.co/blog/found-elasticsearch-in-production -- Elasticsearch production deployment\n* https://www.smashingmagazine.com/2012/05/stop-redesigning-start-tuning-your-site/\n* https://www.elastic.co/blog/customizing-your-document-routing -- Elasticsearch read optimization\n* https://www.elastic.co/videos/big-data-search-and-analytics\n* https://www.elastic.co/blog/disk-based-field-data-a-k-a-doc-values\n* https://aphyr.com/posts/288-the-network-is-reliable\n* https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions\n* https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html -- rebuilding mappings\n* http://www.cnblogs.com/Creator/p/3722408.html -- rebuilding mappings\n* http://wzktravel.github.io/2016/05/11/elasticsearch-reindex/  -- rebuilding mappings\n\n## s3\n* https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html  -- Elasticsearch data backup\n* https://www.elastic.co/guide/en/elasticsearch/plugins/5.0/repository-s3.html -- plugin for backing up Elasticsearch to S3\n* https://github.com/minio/minio -- S3-compatible storage\n\n## mongodb\n* https://github.com/mongodb/mongo-hadoop \n* https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage -- Hive and MongoDB integration\n* https://docs.mongodb.com/manual/tutorial/deploy-replica-set/ -- MongoDB replica set deployment\n* https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-1-introduction-setup -- MongoDB integration with Spark/Hive\n* https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-2-hive-example\n* https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-3-spark-example-key-takeaways\n\n## application library\n* https://github.com/gliderlabs/logspout -- 
collects the standard output of Docker containers\n* https://github.com/Sirupsen/logrus -- structured log output\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxtaci%2Flog_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxtaci%2Flog_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxtaci%2Flog_analysis/lists"}