rOpenSci | Blog

All posts

All the fake data that’s fit to print

charlatan makes fake data. I'm excited to announce a new package called charlatan. While perusing packages from other programming languages, I saw a neat Python library called faker. charlatan is inspired by, and ports many things from, Python's faker library (https://github.com/joke2k/faker). In turn, faker was inspired by PHP's faker, Perl's Faker, and Ruby's faker. It appears that the PHP library was the original, so nice work, PHP. Use cases: What could you do with this package?...

Tackling the Research Compendium at runconf17

Two years ago at #runconf15, there was a great discussion about best practices for organizing R-based analysis projects that yielded a nice guidance document describing research compendia. Compendia, as we described them, were minimal products of reproducible research, using parts of R package structure to organize the inputs, analyses, and outputs of research projects. Since then, we’ve seen more examples and models of research compendia emerge (the organization of such projects is something of an obsession for some of the community)....

New rOpenSci Packages for Text Processing in R

Textual data and natural language processing are still a niche domain within the R ecosystem. The NLP task view gives an overview of existing work; however, a lot of basic infrastructure is still missing. At the rOpenSci text workshop in April, we discussed many ideas for improving text processing in R, which revealed several core areas that need improvement: Reading: better tools for extracting text and metadata from documents in various formats (doc, rtf, pdf, etc.)....

Unconf projects 5: mwparser, Gargle, arresteddev

And finally, we end our series of unconf project summaries (day 1, day 2, day 3, day 4). mwparser Summary: Wikimarkup is the language used on Wikipedia and similar projects, and as such it contains a lot of valuable data, both for scientists studying collaborative systems and for people studying things documented on or in Wikipedia. mwparser parses wikimarkup, allowing a user to filter down to specific types of tags, such as links or templates, and then extract components of those tags....
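
mwparser itself is an R package, and its actual interface is not shown here. The following is only a rough Python sketch of the same "filter by tag type, then extract components" idea. It assumes simple, non-nested markup: `[[target|label]]` for links and `{{name|param|...}}` for templates. The helpers `extract_links` and `extract_templates` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical helpers for illustration only; this is NOT mwparser's API.
# The patterns deliberately ignore nesting and other edge cases of real wikimarkup.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")


def extract_links(text):
    """Return (target, label) pairs for [[target|label]] links; label defaults to target."""
    links = []
    for match in LINK_RE.finditer(text):
        target = match.group(1).strip()
        label = match.group(2)
        links.append((target, label.strip() if label is not None else target))
    return links


def extract_templates(text):
    """Return (name, [params]) pairs for non-nested {{name|p1|p2}} templates."""
    templates = []
    for match in TEMPLATE_RE.finditer(text):
        name = match.group(1).strip()
        raw = match.group(2)  # e.g. "|name=Paris|country=France", or "" when there are no params
        params = [p.strip() for p in raw[1:].split("|")] if raw else []
        templates.append((name, params))
    return templates


sample = "{{Infobox city|name=Paris|country=France}} [[Paris]] is the capital of [[France|the French Republic]]."
print(extract_links(sample))      # [('Paris', 'Paris'), ('France', 'the French Republic')]
print(extract_templates(sample))  # [('Infobox city', ['name=Paris', 'country=France'])]
```

A real wikimarkup parser has to handle nested templates and many other constructs, which is why a dedicated parser is preferable to regular expressions for anything beyond quick exploration.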

Unconf projects 4: cityquant, notary, packagemetrics, pegax

Continuing our series of blog posts (day 1, day 2, day 3) this week about unconf 17. cityquant Summary: The goal of the cityquant project was to build a digital dashboard for sustainable cities. They also had a "spin-off" project called selfquant to get data from a quantified-self Google Sheets template to keep track of weekly performance in various categories. Team: Reka Solymosi, Ben Best, Chelsea Ursaner, Tim Phan, Jasmine Dumas...

Working together to push science forward

Happy rOpenSci users can be found at