Top 10 popular Open Source Projects in R Language
dplyr is mainly used for data manipulation in R. The package is built around five core functions (often called verbs), and these cover the majority of the data manipulation you tend to do. You can work with local data frames as well as with remote database tables.
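The five functions are not named in the text above. A common reading is dplyr's core verbs filter(), select(), mutate(), arrange() and summarise(), and the sketch below chains them on the built-in mtcars data set. The data set, the column choices and the variable names four_cyl and overview are illustrative assumptions, not something the original article specifies.

```r
library(dplyr)

# Work on the 4-cylinder cars from the built-in mtcars data set.
four_cyl <- mtcars %>%
  filter(cyl == 4) %>%              # keep only rows that match a condition
  select(mpg, hp, wt) %>%           # keep only the columns of interest
  mutate(hp_per_wt = hp / wt) %>%   # add a derived column
  arrange(desc(mpg))                # sort, most fuel-efficient first

# Collapse the table to a one-row summary.
overview <- four_cyl %>%
  summarise(n_cars = n(), mean_mpg = mean(mpg))

print(four_cyl)
print(overview)
```

Each verb takes a data frame and returns a data frame, which is what makes the pipe-based chaining work.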
ggplot2 is one of the best libraries for data visualization in R. It implements a “grammar of graphics” (Wilkinson, 2005). This approach gives us a coherent way to produce visualizations by expressing relationships between the attributes of data and their graphical representation. ggplot2 offers a wide range of functions for layers, scales, facets and themes.
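To make the "grammar" idea concrete, here is a minimal sketch that maps data attributes to graphical properties. The mtcars data set and the particular mappings (weight to x, fuel economy to y, cylinder count to colour) are assumptions picked for illustration.

```r
library(ggplot2)

# Map data attributes to visual properties, then add a layer that draws them.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  )

# In an interactive session, printing the object renders the plot.
print(p)
```

Swapping geom_point() for another geom changes how the same mappings are drawn, which is the practical benefit of describing a plot as data plus mappings plus layers.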
The esquisse package brings one of Tableau's most popular features to R: just drag and drop, and your visualization is done in minutes. It builds on ggplot2. The add-in lets you interactively explore your data by visualizing it with the ggplot2 package. You can draw bar graphs, curves, scatter plots and histograms, then export the graph or retrieve the code that generates it. It’s awesome, isn’t it?
When you get into data science, you deal with different kinds of data, and you may not know what sort of data you will have to deal with in the future. If you work in the health industry, trust me, you’ll find this one very useful, especially when you are working on genomic data. Bioconductor is an open source project that hosts a wide range of tools for analyzing biological data with R. To install Bioconductor packages, you first install the BiocManager package from CRAN and then use it to install the packages you need.
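A minimal sketch of the installation step, assuming the current BiocManager workflow. The helper name install_bioc and the example package GenomicRanges are illustrative choices, not something the article prescribes.

```r
# Hypothetical helper: make sure BiocManager is available, then use it
# to install one or more Bioconductor packages.
install_bioc <- function(pkgs) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Example call (it downloads from the network, so it is left commented out):
# install_bioc("GenomicRanges")
```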
lubridate serves its purpose really well. It’s mainly used for data wrangling, and it makes dealing with date-times in R easier. You can do nearly everything you ever wanted to do with date arithmetic using this library, although understanding and using the available functionality can be somewhat complex. When you are analyzing time series data and want to aggregate it by month, floor_date() from the lubridate package gets the work done quite easily. It has a wide range of functions.
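The monthly aggregation mentioned above can be sketched as follows. The sales data frame is made up for illustration, and base R's aggregate() is just one possible way to do the grouping.

```r
library(lubridate)

# Made-up daily records spanning three months.
sales <- data.frame(
  date = ymd(c("2023-01-05", "2023-01-20", "2023-02-03",
               "2023-02-28", "2023-03-15")),
  amount = c(10, 20, 5, 15, 30)
)

# Round every date down to the first day of its month.
sales$month <- floor_date(sales$date, unit = "month")

# Sum the amounts within each month.
monthly <- aggregate(amount ~ month, data = sales, FUN = sum)
print(monthly)
```

Changing the unit argument to "week" or "year" gives the same kind of aggregation at a different granularity.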
knitr is used for dynamic report generation in R. Its purpose is to enable reproducible research in R by means of literate programming. The package integrates R code into LaTeX, Markdown, LyX, HTML, AsciiDoc and reStructuredText documents. You can add R to a Markdown document and easily generate reports in HTML, Word and other formats. It is a must-have if you’re interested in reproducible research and in automating the journey from data analysis to report creation.
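As a small sketch of what "adding R to a Markdown document" means, the snippet below builds a tiny document in memory and lets knitr run the embedded chunk. Real reports normally live in a file; the in-memory text, the cars data set and the variable names are assumptions made to keep the example self-contained.

```r
library(knitr)

# Build the chunk delimiter programmatically to keep this example readable.
fence <- strrep("`", 3)

# A tiny Markdown document with one embedded R chunk.
src <- c(
  "Average speed in the cars data set:",
  "",
  paste0(fence, "{r}"),
  "mean(cars$speed)",
  fence
)

# knit() executes the chunk and returns Markdown with the result woven in.
out <- knit(text = src, quiet = TRUE)
cat(out, sep = "\n")
```

The returned Markdown contains both the code and its computed result, so the numbers in the report always match the analysis that produced them.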
mlr is excellent for machine learning tasks. It provides almost all of the important and useful algorithms through a single interface. It can be described as an extensible framework for classification, regression, clustering, multilabel classification and survival analysis. It also has filter and wrapper methods for feature selection. On top of that, most operations can be parallelized.
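A minimal sketch of the usual mlr workflow (task, learner, resampling). The iris data set, the rpart decision tree learner and the 3-fold cross-validation are assumptions chosen to keep the example small.

```r
library(mlr)

set.seed(1)

# Describe the problem: predict Species from the other iris columns.
task <- makeClassifTask(data = iris, target = "Species")

# Pick an algorithm; "classif.rpart" is a decision tree from the rpart package.
learner <- makeLearner("classif.rpart")

# Evaluate with 3-fold cross-validation, reporting accuracy.
rdesc <- makeResampleDesc("CV", iters = 3)
res <- resample(learner, task, rdesc, measures = acc, show.info = FALSE)

print(res$aggr)
```

Trying another algorithm only requires changing the string passed to makeLearner(); the task and resampling description stay the same.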
quanteda.dictionaries extends the capabilities of the quanteda package. It consists of dictionaries for text analysis. It’s mainly designed to work with quanteda but also works well with other text analysis libraries such as tm, tidytext and udpipe. With the liwcalike() function from the quanteda.dictionaries package, you can easily analyze text corpora using existing or custom dictionaries. You can install this package from its GitHub page.
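A hedged sketch of liwcalike() on two made-up sentences. It assumes the package is already installed from GitHub and that the bundled data_dictionary_NRC dictionary is used; the repository path in the comment is an assumption to check against the package's GitHub page.

```r
# Assumed installation step (GitHub only; verify the repository path first):
# remotes::install_github("kbenoit/quanteda.dictionaries")

library(quanteda)
library(quanteda.dictionaries)

# Two made-up documents for illustration.
txt <- c(
  doc1 = "I am happy and delighted with this wonderful result.",
  doc2 = "This is a sad and terrible day."
)

# Score each document against the dictionary categories.
out <- liwcalike(txt, dictionary = data_dictionary_NRC)
print(out)
```

The result is a data frame with one row per document, so it can be passed straight to the usual data frame tools.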
Rcrawler is a contributed R package for domain-based web crawling and content scraping. It adds the crawling functionality that the rvest package lacks. Rcrawler can crawl, parse and store pages, extract contents, and produce data that can be directly employed for web content mining applications. A crawling operation is performed by several concurrent processes or nodes in parallel, so using the 64-bit version of R is recommended.