https://github.com/pravj/ospi
Open Source Presence Infographic of Indian Startups
- Host: GitHub
- URL: https://github.com/pravj/ospi
- Owner: pravj
- Created: 2014-12-23T11:14:07.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-01-19T12:27:41.000Z (almost 10 years ago)
- Last Synced: 2024-04-14T19:59:23.359Z (8 months ago)
- Topics: data-analysis, data-visualization, india, open-source, startup
- Language: Python
- Homepage:
- Size: 770 KB
- Stars: 24
- Watchers: 3
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
ospi
====

> Open Source Presence Infographic of Indian Startups
* Source code supplement for my post: [Open Source Presence Infographic of Indian Startups](http://pravj.github.io/blog/open-source-presence-infographic)
* [How to use it](https://github.com/pravj/ospi/blob/master/Manual.md)
* [Terms and Conditions](https://github.com/pravj/ospi/blob/master/Terms-and-conditions.md)

---

Report
======

Technically speaking, the organizations in this report are no longer just *startups*, but I hope you won't mind this and won't launch a drone at me.
## Abstract
I think *something* is clear from the name itself, isn't it? Well, it should be.
This report tries to map the involved organizations onto the open-source landscape. It tries to show what these organizations, in the race to achieve their goals, are doing there, in and for the community.
It's pretty biased, though, because it uses only one platform of the open-source community: GitHub.
*This report doesn't measure the success of the involved organizations; it simply can't. They are all doing well in their fields; that's why they are here.*
## Motive
It was around mid-December last year when I came across a YourStory [interview](http://yourstory.com/2014/12/techie-tuesdays-amod-malviya-cto-flipkart/) with Flipkart's CTO, *Amod Malviya*. I started reading it and kept reading till the end. My reaction at the end was, *wow! this man is awesome*, and he indeed is. I have watched many of his talks since reading that interview.
That interview left a distinct impression on me. I liked what he said about building top-class internet infrastructure in India. I don't know what you think of Flipkart, Myntra etc., but I think they are evolving continuously, at least technically. That's why they are still in the marathon, with Amazon itself racing against them.
So, after a while, I found myself on Flipkart's GitHub organization, scrolling through their projects. Then the idea for this report popped into my mind, and here I am, struggling with it.
## For What Joy? Is there a need?
The earth will keep rotating without this report, but it's kind of necessary for technical organizations to be part of the current open-source era. As they say in *group dynamics*, if you're part of a group, you learn from the other members and they learn from you. Do you remember something named Facebook? Let's take an example from them.
Maybe you think of PHP as *a language for kids*, but keep in mind that *The Social Network* was initially built in that same PHP. As they grew and started running into its limitations, seeing that *santa* was not coming to help them, they set out to build something of their own. Today we know the results as [HHVM](http://hhvm.com/) and the [Hack language](http://hacklang.org/).
So the lesson is: *don't wait for santa; build cool things that matter*. Big organizations are already doing it, be it [hhvm](https://github.com/facebook/hhvm) and [react](https://github.com/facebook/react) by Facebook, [typeahead.js](https://github.com/twitter/typeahead.js) by Twitter, [web-starter-kit](https://github.com/google/web-starter-kit) by Google, and many more by others.
## Involved Organizations
I admit that the organization selection was a bit biased, as I wanted my favorite organizations on the list first, like HackerEarth, HasGeek, Housing, Flipkart, Wingify and Zomato.
It was disappointing to see that Housing was not on GitHub at that time and that Zomato's organization had zero public activity.
Finally, I selected 15 startups, giving priority to my favorites.
* [Cucumbertown](http://www.cucumbertown.com/) - Follow great cooks, showcase your cooking, build a following
* [Exotel](http://exotel.in/) - Reliable Cloud Telephony System for your business
* [Flipkart](http://www.flipkart.com/) - Online Shopping India
* [Freshdesk](http://freshdesk.com/) - Online customer support software and helpdesk solution
* [HackerEarth](http://www.hackerearth.com/) - Programming challenges and Developer jobs
* [HasGeek](https://hasgeek.com/) - HasGeek organises events for geeks
* [Instamojo](https://www.instamojo.com/v2/) - Easiest Way to Collect Payments Online
* [Myntra](http://www.myntra.com/) - Online Shopping India
* [MySmartPrice](http://www.mysmartprice.com/) - Compare the best prices from online retailers
* [Practo](https://www.practo.com/) - Find Best Doctors and Book Appointments Online
* [ShepHertz](http://www.shephertz.com/) - Complete Cloud Ecosystem for App/Game Developers
* [Urban Ladder](https://www.urbanladder.com/) - Furniture Online Shopping Store
* [WebEngage](http://webengage.com/) - On-Site Customer Engagement Suite
* [Wingify](https://wingify.com/) - Website Optimization tools that simply work
* [Zomato](https://www.zomato.com/) - Discover great places to eat around you

One section of this report uses the organizations' GitHub activity from last year, so I dropped my idea of replacing Zomato with someone else: the year was already over, and it was tough to get past the usual API [bumper](https://developer.github.com/v3/repos/statistics/#participation) to collect that data.
As I said, Zomato had zero public activity last year, but that doesn't mean they are not good; they are doing pretty well, acquiring at hurricane speed and serving in more cities than you've ever been to in your life. Maybe they use some other platform, a self-hosted Git server or something.
---
*You'd better zoom in on the images or open them in a new tab.*
---
## 1. Appearance Timeline of Organizations
Do you know when all of these organizations were founded? Not sure?
![Appearance-Timeline](https://raw.githubusercontent.com/pravj/ospi/master/images/appearance-timeline.png?token=ADRywhE4xlNONt10EmIA36gAgeKkrVMkks5Uw7DbwA%3D%3D)
This plot shows the *relative appearance* of the selected organizations, both in the public world and in the open-source world.
> Add legend text in the image.
* I didn't know that Myntra was founded a bit earlier than Flipkart, which recently [acquired](http://timesofindia.indiatimes.com/tech/tech-news/Flipkart-acquires-Myntra/articleshow/35472797.cms) the older player.
* Myntra and Flipkart came into existence before GitHub itself.
* There is a large gap between the two appearances for Flipkart, Myntra and Zomato, with Myntra being the slowest to join GitHub.
* Some organizations, like Instamojo, HackerEarth and HasGeek, sensed the need of the hour and joined without any significant delay.

Well! In case you're thinking this information is all chatter, let me present something interesting.
Go back, look at the image carefully, and you'll notice that Cucumbertown and HasGeek differ from the others.
Yes! The GitHub organizations for these two were created before their public launch itself. *Sounds interesting, right?*
I can't speak for Cucumbertown, but I can present a supporting theory that explains it for HasGeek.
Do you remember the first event HasGeek organised? It was [DocType HTML5](http://wiki.hasgeek.in/DocType_HTML5), you silly. The event was held in October 2010, and HasGeek was publicly launched in December 2010. You can fly over to their GitHub account and check that they have been developing [hasgeek/doctypehtml5](https://github.com/hasgeek/doctypehtml5) since then.
Maybe organising this event was the inspiration behind launching HasGeek; I'd need to hear from HasGeek founder [Kiran](https://twitter.com/jackerhack) on it, though.
## 2. Repository Status
As we all know, repositories are an important component of GitHub's ecosystem.

### 2.1. Public Repository Status
This section deals with the number of public repositories of each involved organization.
![Public-Repository-Count](https://raw.githubusercontent.com/pravj/ospi/master/images/repository-status.png?token=ADRywg5tr-hivEgluPp7-94WVCvKak-Bks5Uw7ERwA%3D%3D)

Cloud services provider ShepHertz has the largest number of public repositories, mainly based on their *App42* service stack. Flipkart and HasGeek also have a significant number of repositories; the rest of the organizations are building up their collections gradually.
The number of repositories on GitHub is not the right thing to measure, though.
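For readers who want to reproduce counts like these, here is a minimal sketch assuming GitHub's public REST endpoint `GET /orgs/{org}/repos` with `type=public` and page-based pagination. The helper names `fetch_public_repos` and `repo_counts` are hypothetical; this is not the ospi code, just one way to get comparable numbers.

```python
import json
import urllib.request
from typing import Dict, List, Optional, Tuple

API_ROOT = "https://api.github.com"


def fetch_public_repos(org: str, token: Optional[str] = None) -> List[dict]:
    """Collect every public repository of `org`, 100 per page, until a page comes back empty."""
    repos: List[dict] = []
    page = 1
    while True:
        url = f"{API_ROOT}/orgs/{org}/repos?type=public&per_page=100&page={page}"
        request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        if token:
            request.add_header("Authorization", f"token {token}")
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return repos
        repos.extend(batch)
        page += 1


def repo_counts(repos_by_org: Dict[str, List[dict]]) -> List[Tuple[str, int]]:
    """Return (organization, repository count) pairs, largest count first, ties by name."""
    counts = [(org, len(repos)) for org, repos in repos_by_org.items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))
```

For example, `repo_counts({org: fetch_public_repos(org) for org in ["flipkart", "hasgeek"]})` would rank two organizations. Unauthenticated requests are limited to 60 per hour, so passing a token helps when covering all 15 organizations.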
### 2.2. Stars Distribution
As I said, having more repositories doesn't by itself show popularity. This isn't one of the old wars between kingdoms, where the king with more elephants was supposed to be the winner.
But the number of stars on a GitHub repository can represent its popularity, *leaving aside the cases where they're fake.*
![Stars-Distribution](https://raw.githubusercontent.com/pravj/ospi/master/images/stars-distribution.png?token=ADRywjvt0gT0uoecBrlQfPB9kfua8oXMks5Uw7EhwA%3D%3D)

This graph represents the distribution of stars across all the repositories of the involved organizations.
> Top 10 repositories by number of stars
* [wingify/please.js](https://github.com/wingify/please.js) · 211 ☆
* [flipkart/HostDB](https://github.com/flipkart/hostdb) · 190 ☆
* [myntra/MYNStickyFlowLayout](https://github.com/myntra/MYNStickyFlowLayout) · 133 ☆
* [wingify/dom-comparator](https://github.com/wingify/dom-comparator) · 129 ☆
* [hasgeek/lastuser](https://github.com/hasgeek/lastuser) · 113 ☆
* [flipkart/phantom](https://github.com/flipkart/phantom) · 71 ☆
* [hasgeek/hasjob](https://github.com/hasgeek/hasjob) · 70 ☆
* [wingify/agentredrabbit](https://github.com/wingify/agentredrabbit) · 49 ☆
* [wingify/lua-resty-rabbitmqstomp](https://github.com/wingify/lua-resty-rabbitmqstomp) · 45 ☆
* [hackerearth/hackerearth.vim](https://github.com/hackerearth/hackerearth.vim) · 28 ☆

You can see that Wingify, Flipkart and HasGeek rule the leaderboard here.
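A leaderboard like this one can be derived from the `stargazers_count` field that GitHub returns for each repository. The sketch below assumes repository records shaped like the REST API's JSON; `top_starred` and `format_leaderboard` are hypothetical helpers, and the sample records reuse three star counts from the list.

```python
from typing import Iterable, List, Tuple


def top_starred(repos: Iterable[dict], n: int = 10) -> List[Tuple[str, int]]:
    """Rank repository records by `stargazers_count` and keep the first `n`."""
    ranked = sorted(repos, key=lambda repo: repo["stargazers_count"], reverse=True)
    return [(repo["full_name"], repo["stargazers_count"]) for repo in ranked[:n]]


def format_leaderboard(rows: List[Tuple[str, int]]) -> List[str]:
    """Render rows in the "name · stars ☆" style used by the list."""
    return [f"* {name} · {stars} ☆" for name, stars in rows]


# Sample records using three star counts from the list.
sample = [
    {"full_name": "flipkart/HostDB", "stargazers_count": 190},
    {"full_name": "wingify/please.js", "stargazers_count": 211},
    {"full_name": "hasgeek/hasjob", "stargazers_count": 70},
]
for line in format_leaderboard(top_starred(sample, n=2)):
    print(line)
```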
### 2.3. Relative Repository Attributes
GitHub provides a feature named *fork*, which lets you contribute to other people's awesome projects as if they were your own.
This section deals with repository attributes, counting which repositories are *forked* and which are *source* repositories.
![Repository-Attributes](https://raw.githubusercontent.com/pravj/ospi/master/images/repository-attributes.png?token=ADRywnhxZCKJ7p8JVmi1oXapN-t3yYrTks5Uw7E1wA%3D%3D)

This plot shows which organizations have only their own *source* repositories and which also have *forked* ones.
During development, I also calculated the *active* and *inactive* percentages of the *forked* repositories. You can see *how this was calculated* [here](https://github.com/pravj/ospi/blob/master/Terms-and-conditions.md#section-7).
HasGeek is doing fairly well here, with a larger share of *source* repositories than *forked* ones. A large portion of Flipkart's and Freshdesk's repositories are *inactive-forked*.
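The `fork` flag on each repository record makes the source/forked split easy to compute. The *active*/*inactive* rule in this sketch is an assumption made only for illustration (a fork counts as active if it was pushed to after it was created); the report's actual definition is the one in its Terms and Conditions, which may differ. All function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def parse_ts(value: str) -> datetime:
    """Parse GitHub's UTC timestamp format, e.g. "2014-12-23T11:14:07Z"."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def classify(repo: dict) -> str:
    """Label a repository record as source, active-forked or inactive-forked."""
    if not repo["fork"]:
        return "source"
    # ASSUMPTION (illustrative only): a fork is "active" if it was pushed to after it was created.
    pushed = repo.get("pushed_at")
    if pushed and parse_ts(pushed) > parse_ts(repo["created_at"]):
        return "active-forked"
    return "inactive-forked"


def attribute_shares(repos: Iterable[dict]) -> Dict[str, float]:
    """Fraction of an organization's repositories falling into each label."""
    counts = Counter(classify(repo) for repo in repos)
    total = sum(counts.values())
    if total == 0:
        return {}
    labels = ("source", "active-forked", "inactive-forked")
    return {label: counts[label] / total for label in labels}
```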
## 3. Development Activity
All the involved organizations have contributed something to the community: projects born as solutions to problems, projects born at hackathons, and so on. They're gradually building things to enhance their infrastructure and market position.
### 3.1. Repository Creation
This section deals with the creation of repositories by all the organizations.
![Repository-Creation](https://raw.githubusercontent.com/pravj/ospi/master/images/repository-creation.png?token=ADRywjpieLD8QNLf7rF3iHc-sJhah5j_ks5Uw7FSwA%3D%3D)

* You can see that Urban Ladder, HasGeek and Exotel created their first repository at almost the same time as their GitHub organization.
* ShepHertz, HasGeek and Flipkart have fairly continuous repository creation events throughout the timeline.

Again, if you think that this is *general knowledge*, then let me show you the *magic*.
Go back, look at the image carefully, and you'll notice something weird about HackerEarth. Do you?
Yes! As you can see there, HackerEarth's first repository was created before their GitHub organization itself. *How is this even possible?*
Well, ladies and gentlemen, it is possible. Let me introduce a theory to explain it.
HackerEarth's oldest repository in the time series is [django-storages](https://github.com/hackerearth/django-storages). It's this repository that creates the confusion. It was initially *forked* by HackerEarth's co-founder *Vivek* into his personal GitHub [account](https://github.com/vivekp). After a separate organization was created for HackerEarth, he moved that repository to the organization.
That's why this repository's creation date precedes the creation of their organization. Again, I'd need [Vivek](https://github.com/vivekp)'s confirmation on this.
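An anomaly like this one can be spotted mechanically by comparing each repository's `created_at` with the organization's own `created_at` (GitHub's REST API returns both as UTC timestamps such as `2014-12-23T11:14:07Z`). `created_before_org` is a hypothetical helper, and the repository names and timestamps in the example are made up.

```python
from datetime import datetime
from typing import Iterable, List


def parse_ts(value: str) -> datetime:
    """Parse GitHub's UTC timestamp format, e.g. "2014-12-23T11:14:07Z"."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def created_before_org(org_created_at: str, repos: Iterable[dict]) -> List[str]:
    """Names of repositories whose `created_at` precedes the organization's, oldest first."""
    org_start = parse_ts(org_created_at)
    early = [repo for repo in repos if parse_ts(repo["created_at"]) < org_start]
    early.sort(key=lambda repo: parse_ts(repo["created_at"]))
    return [repo["name"] for repo in early]


# Made-up records: one repository predates the organization, one does not.
repos = [
    {"name": "new-tool", "created_at": "2013-03-01T00:00:00Z"},
    {"name": "moved-fork", "created_at": "2011-05-01T00:00:00Z"},
]
print(created_before_org("2012-01-01T00:00:00Z", repos))  # ['moved-fork']
```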
### 3.2. Commit Activity
This section deals with the commit activity of all the organizations.
![Commit-Activity](https://raw.githubusercontent.com/pravj/ospi/master/images/commit-activity-zoom.png?token=ADRywqH04ZqDrpchKMH8qaRylP1bpuPVks5Uw7FywA%3D%3D)

This plot shows the weekly commit activity of all the organizations. It is pretty mixed-up, but this was the only plot type I had in mind when I was developing this.
You can see relatively more development activity at the start of the year.
Flipkart's development team [keeps](https://github.com/Flipkart/linux) a *fork* of *linux*, though it's not marked as a *forked repo* on GitHub. I removed its activity because it was making the plot even more cluttered. You can check [that plot](https://raw.githubusercontent.com/pravj/ospi/master/images/commit-activity.png?token=ADRywlIX6usOu11z34wv-NZrVt_lmL00ks5Uw7GBwA%3D%3D) too, though.
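Weekly totals like these can be built from GitHub's repository statistics endpoint `GET /repos/{owner}/{repo}/stats/participation`, which returns 52 weekly commit counts (oldest week first) under the key `all`, and may answer `202 Accepted` while the statistics are still being computed. The sketch sums those series per organization and lets you exclude a repository such as Flipkart's linux fork. The function names and the simple retry loop are assumptions, not the report's code.

```python
import json
import time
import urllib.request
from typing import Dict, Iterable, List, Optional


def fetch_participation(owner: str, repo: str, retries: int = 5, wait: float = 2.0) -> Optional[List[int]]:
    """Weekly commit counts (oldest week first) for one repository, or None if never ready."""
    url = f"https://api.github.com/repos/{owner}/{repo}/stats/participation"
    for _ in range(retries):
        with urllib.request.urlopen(url) as response:
            if response.status == 202:
                # GitHub is still computing the statistics; wait and ask again.
                time.sleep(wait)
                continue
            return json.load(response)["all"]
    return None


def org_weekly_commits(weekly_by_repo: Dict[str, List[int]], exclude: Iterable[str] = ()) -> List[int]:
    """Sum weekly series across an organization's repositories, skipping excluded names."""
    skip = set(exclude)
    series = [weeks for name, weeks in weekly_by_repo.items() if name not in skip]
    # zip(*series) walks week by week; every series is expected to have the same 52 entries.
    return [sum(week) for week in zip(*series)]
```

Dropping a repository through `exclude`, rather than deleting it from the input, keeps the uncluttered and full plots reproducible from the same fetched data.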
## 4. Technology Stack
Different organizations work in different fields of technology, be it medical services, developer events, online shopping, food, cloud services or online payments, so they encounter different problems along the way and handle them accordingly.
### 4.1. Programming Languages in Production
This section deals with the use of different programming languages in the involved organizations' infrastructure.
![Language-Use](https://raw.githubusercontent.com/pravj/ospi/master/images/language-uses.png?token=ADRywp4Fkhgv0CSCIBoMOPiC0yBfWc-qks5Uw7GWwA%3D%3D)

This plot uses colors from GitHub's [linguist](https://github.com/linguist) for the different programming languages.
This helps us understand the tech stack of each organization.
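A per-organization language summary can be assembled from GitHub's `GET /repos/{owner}/{repo}/languages` endpoint, which maps each language name to the number of bytes of code detected in that repository. This sketch assumes byte counts are the right weighting (the report doesn't say how it weighted languages) and uses hypothetical helper names; it yields a primary language and the number of distinct languages, the two quantities the observations refer to.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def org_language_bytes(per_repo: Iterable[Dict[str, int]]) -> Counter:
    """Add up the {language: bytes} maps of all repositories of one organization."""
    total: Counter = Counter()
    for languages in per_repo:
        total.update(languages)
    return total


def language_summary(per_repo: Iterable[Dict[str, int]]) -> Tuple[str, int]:
    """Return (primary language by bytes, number of distinct languages)."""
    total = org_language_bytes(per_repo)
    if not total:
        return ("", 0)
    primary, _ = total.most_common(1)[0]
    return primary, len(total)


# Made-up responses for two repositories of one organization.
print(language_summary([{"Java": 500, "Shell": 20}, {"Java": 100, "Python": 300}]))  # ('Java', 3)
```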
* Flipkart uses Java, HasGeek uses Python, Practo uses PHP and Freshdesk uses Ruby as their major programming language.
* Organizations have started using *non-traditional* languages like Lua, Erlang and Scala.
* ShepHertz uses the largest number of programming languages (14), in their quest to support every *in-demand* programming language in their service.

### 4.2. Field of work
This section deals with the fields the different organizations are working in.
To calculate the results, I used *repository names* and *their descriptions*.
I actually wanted to capture each organization's relative share across fields of work. So, initially, my plan was to use [Latent Dirichlet Allocation](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) on the *repository-description-text* corpus for *topic modeling*.
There, I would have used each organization's *concatenated repository descriptions* as one *document*, but I dropped this idea because of the asymmetrical distribution of repositories: it resulted in a corpus of only 14 documents (Zomato excluded).
You can get a brief introduction to LDA [here](http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/).
Then I changed the plan, moved to a *Naive Bayes classifier*, and used word frequencies only.
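The repository's `classifier` directory holds the actual implementation; what follows is only a minimal multinomial Naive Bayes sketch over word frequencies with add-one (Laplace) smoothing, to show the idea. The field labels, training descriptions and class name are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric words, so "Angular.js" becomes ["angular", "js"]."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text: str, label: str) -> float:
        """log P(label) plus the sum of log P(word | label) over known words."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        label_total = sum(self.word_counts[label].values())
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][word] + 1) / (label_total + len(self.vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Hypothetical training data: field labels and descriptions are made up.
nb = NaiveBayes()
nb.train("REST API for MySQL load balancer", "backend")
nb.train("HTTP proxy and Redis cache", "backend")
nb.train("Angular.js DOM comparison library", "frontend")
nb.train("Bootstrap theme for DOM widgets", "frontend")
print(nb.classify("a DOM diffing tool"))  # frontend
```

Working in log space avoids underflow when many small word probabilities are multiplied together.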
> You can check the [classifier](https://github.com/pravj/ospi/tree/master/classifier) and its topic results, based on probability or frequency, [here](https://github.com/pravj/ospi/blob/master/results/).
Some of the classifier's topic results for the organizations are:
* Cucumbertown : *Django, Gearman, Email, Commit, Notifier*
* Exotel : *Audio, IVR, Music, SMS*
* Flipkart : *REST APIs, MySQL, Lucene, Redis, HTTP proxy, load balancer*
* Freshdesk : *Databases, Rails, API, Websockets, Socket.io, YUI, Resque*
* HasGeek : *Workshop, Lastuser, App management, TV, Job, GitHub*
* HackerEarth : *Django, API documentation and clients, extensions and editors*
* Instamojo : *API clients, WordPress, Frameworks, Huxley*
* Myntra : *iOS, Cocoa, Android, ElasticSearch, Docker, Librato*
* MySmartPrice : *Technology Blog, Gearman workers, Cookbook*
* Practo : *OpenID, Flask, Sentry, Symfony, Raven, Mail clients, Messages*
* ShepHertz : *App42, PaaS, SDK, API clients, MongoDB, MySQL, Redis*
* WebEngage : *Message, API, Website, Speech*
* Wingify : *Angular.js, DOM, RabbitMQ, iOS, Data, Bootstrap, VWO*

Here we can see that Flipkart's stack includes things related to distributed computing, networking and databases, while Wingify's stack includes things related to frontend, data and networking.
---
So, this is it: the Open Source Presence Infographic of Indian Startups.
If you're thinking that *santa* helped me with all this, then you are wrong, my friend.
I was all alone the whole time: thinking about it, collecting the data, managing R source files in RStudio, writing Python for it and all that.

If you feel that you can do *something* much more awesome than this:
> You can do whatever you want; it's hosted on GitHub, [pravj/ospi](https://github.com/pravj/ospi).