I feel like a lot of attempts to recreate ggplot2 end up being superficial because they don't recognize / duplicate the power of the underlying Grid graphics that ggplot2 uses.
I know that web technologies are all the rage these days, but at least for static, publication-ready graphics, Grid is a really nice substrate, with well-thought-out lower-level abstractions.
The ports I've seen feel like an incomplete enumeration of the plots in the mainline ggplot2 package. Yet there are heaps of great extensions to ggplot2, and I suspect that's in part because the carefully thought-out abstractions at the low level of Grid mesh nicely with the high-level abstractions of ggplot2.
ggplot2 being built on top of Grid means that modestly complex plots are easy in ggplot2 by itself, and that it's relatively easy to drop down into the lower layer (Grid) when you need more.
Surprised to see this at the top; I am the creator of plotnine. The most common question seems to be: what should you expect of plotnine? The answer: a high-quality implementation of a grammar of graphics with an API that closely matches ggplot2, and more.
I also want other packages to be able to build on top of plotnine; e.g. a package with the functionality of Seaborn could be built on plotnine. The only constraint is whether the backend -- in this case Matplotlib -- stands in the way. Matplotlib is evolving (though slowly) and has a very receptive community, so there is lots of hope.
I watched your refactor of yhat's py ggplot branch, and was disappointed when glamp dropped in a totally new implementation out of the blue. Thanks for all your hard work -- glad it is its own package now :).
Well, I think it was a matter of different priorities. My main objective in contributing was to have a full-on grammar-of-graphics package in Python. I appreciate those warm feelings from afar.
Seaborn is not lacking in any way; it has a goal and it accomplishes it. However, I think Seaborn would have been easier to create if it had been based on a grammar-based package, a few caveats notwithstanding.
Recreating and keeping up with Hadley's hard work is challenging, particularly because ggplot2's layout and extensions are really nice and continue to evolve.
As an alternative that preserves the full power of Wickham's implementation, pygg[1] is a Python wrapper that provides R's ggplot2 syntax in Python and runs everything in R.
I too am interested in what the differences are. I have used the yhathq ggplot library for a while, and it is quite useful, but I sometimes find it lacking in certain plot types and documentation.
Also, the last commit to the yhathq ggplot library was on Nov 20, 2016, so this library looks like it is currently under more active development.
This library started as a refactor of that project, after it lay broken and unmaintained for a long time. Having followed yhat's ggplot for a long time, I've lost faith that it will be actively maintained.
These comparisons are pointless. Would you like to compare LOC as well? How about man-hours spent?
The only comparison that is important is how well the two projects work. I have no idea how well plotnine works yet (but I intend to find out). I do know that ggplot works OK - and since it leverages matplotlib, if there is anything that isn't implemented I can finish the plot off manually.
EDIT: it seems that plotnine also leverages matplotlib, and produces nicer plots for some common cases :).
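For example, a rough sketch of that escape hatch in plotnine -- assuming, as I believe, that .draw() returns the underlying matplotlib Figure (the annotation is just an arbitrary manual tweak for illustration):

    from plotnine import ggplot, aes, geom_point
    from plotnine.data import mtcars  # sample data shipped with plotnine

    p = ggplot(mtcars, aes("wt", "mpg")) + geom_point()
    fig = p.draw()                 # plain matplotlib Figure from here on
    ax = fig.axes[0]
    ax.annotate("heaviest car", xy=(5.4, 10.4))  # any matplotlib call works
    fig.savefig("tweaked.png")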
It's not pointless at all. For long-term maintenance, community strength and the level of active development are just as important as (and sometimes more important than) minor feature differences.
I would often rather use a decade-old project that was developed solo by a world-class expert dumping code over the wall once every 6 months than a community project being hacked on by 100 amateurs.
Without additional context I find recency of last commit and number of committers to be almost impossible to draw useful conclusions from.
I wish this both-lazy-and-condescending missing-the-point hand-wavy-analogy argument style would die already.
The earlier poster in this thread implied that the number of contributors and the recency of commits in one of two competing GitHub projects was evidence that it was better.
My point is that these are inadequate (often totally misleading) heuristics unless both projects are otherwise extremely similar, which they usually are not, and even then are usually not very useful heuristics compared to other ways of comparing the projects.
Unless you know who the authors are, what the project management/organization style is, how the project is funded / what level of commitment the authors have, what the project release cycle is like, etc., or unless you directly examine the code yourself, the only thing that looking at the most recent git commit tells you is how recently someone published public code changes. Which is not something that anyone evaluating two projects cares about directly, but only as some heuristic signal of other features that might be more costly to examine.
But note that commit recency doesn't give a remotely useful sense of how extensible the project is, how readable or efficient the code is, how well designed the API is, how good the documentation is, how friendly the community is, how competent the project management is, and so on.
If we want to make a car analogy, it’s like choosing which car to buy based on how frequently the company introduces new models, or how many engineers they employ, rather than based on customer reviews, reliability estimates, accessibility of mechanics, gas mileage, top speed, or storage capacity.
Your argument is basically analogous to: “because the average car with frequent updates is better than the average car with infrequent model updates, criticizing that as a primary criterion for choosing a car is an invalid argument”. Notice that you haven’t even bothered to examine whether your premise about the relation between updates and quality is true, or whether that average relationship makes update frequency a practically useful heuristic or not.
You are arguing a red herring. I never posited that update frequency is the only useful metric, or that you shouldn't consider how well the library itself works. You should consider all the aspects that are relevant to your use cases...and often this should include the community strength, along with the intrinsic library design, funding, documentation, etc. Certainly you shouldn't stop at the library design and code itself, that is just one of many considerations to weigh when adopting a dependency.
In the absence of any other information, the more recently updated codebase is preferable to the less recently updated one, for the same reason that an abandoned codebase is undesirable.
Alternative equally speculative conclusion from the same data: the very-in-flux code is so shoddy that there are constant security bugs needing weekly fixes, whereas the stable and relatively inactive code is so rock solid that nobody ever needs to touch it for it to keep working.
For example, how often does DJB publish new code changes to his various projects?
You are right, but we don't know the proficiency of the developers. The best information we have is the repository update activity; that's all you have for quickly comparing the projects.
I'm no expert, but I think that one of the main ideas is to separate the elements of making a plot from the way that the data is presented. For example, in ggplot2, you have the data that will go into the graph, the type of plot (or "geometry") that defines how the data are presented (scatterplot, bar plot, etc.), and then various "layers" that can be added that affect style.
To split a plot into subplots, you simply define how it is to be faceted (which column should be used to define the groups). The grammar of graphics moves plotting away from the "turtle graphics" model and lets you specify what should be done; ggplot then figures out how to do it, kind of like SQL vs. writing for loops to retrieve information (see the sketch below).
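To make that concrete, here's a minimal plotnine sketch (plotnine mirrors the ggplot2 API; mtcars is a sample dataset that ships with plotnine, and the column choices are just for illustration):

    # Declare WHAT to show -- data, mappings, geometry, faceting --
    # not HOW to draw it.
    from plotnine import ggplot, aes, geom_point, facet_wrap
    from plotnine.data import mtcars

    p = (
        ggplot(mtcars, aes(x="wt", y="mpg", color="factor(cyl)"))
        + geom_point()          # the geometry: a scatterplot
        + facet_wrap("~gear")   # one subplot per number of gears
    )
    p.save("mtcars.png")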
I find it the opposite. The syntax is the most intuitive of any plotting library.
Layers are as follows [1] (see the sketch after the list):
1. Data
2. Aesthetic mappings
3. Statistical transformation (stat)
4. Geometric object (geom)
5. Position adjustment
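For concreteness, a hedged sketch of how those five components might map onto a plotnine call (mpg is a sample dataset that ships with plotnine; the column choices are mine):

    from plotnine import ggplot, aes, geom_bar
    from plotnine.data import mpg

    p = (
        ggplot(mpg)                   # 1. data
        + aes(x="class", fill="drv")  # 2. aesthetic mappings
        + geom_bar(                   # 4. geometric object (geom): bars
            stat="count",             # 3. statistical transformation
            position="dodge",         # 5. position adjustment
        )
    )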
Once you get the hang of this, it becomes easy to create new plots purely from an understanding of the layers. In matplotlib, or even in Seaborn, I find myself constantly Googling for examples.
ggplot2 is the most beautiful thing to happen in visualization space!
[1] Wickham, Hadley, and Carson Sievert. "4.4.1 Layers." ggplot2: Elegant Graphics for Data Analysis. Springer, 2016.
There are a few concepts to learn. With a grammar, you can create plots in 5 minutes that would take an hour using "the standard syntax". Many people, once they experience it, do not want to go back. You could be one of them.
EDIT: I should also add that Grid is documented within an inch of its life, should anyone feel that it's worth recreating: https://stat.ethz.ch/R-manual/R-devel/library/grid/html/grid...