MongoDB is a scalable, high-performance, document-oriented NoSQL database. Since R is one of my main computing environments, I have been experimenting with the rmongodb package to access MongoDB from R.
The rmongodb package provides an interface from R to MongoDB and back. Installing rmongodb was simple:

.. code-block:: r

    > install.packages('rmongodb')

I assume below that a MongoDB server is up and running on the local machine. On Mac OS, I used Homebrew to install the software and followed its instructions on how to run it. By default, MongoDB accepts connections from localhost only and requires no authentication. For the purposes of this short article, I have left these defaults in place and have not turned on authentication.
First, I load the library and connect to the database.

.. code-block:: r

    > library(rmongodb)
    > m = mongo.create()
    > print(m)
    [1] 0
    attr(,"mongo")
    <pointer: 0x100675860>
    attr(,"class")
    [1] "mongo"
    attr(,"host")
    [1] "127.0.0.1"
    attr(,"name")
    [1] ""
    attr(,"username")
    [1] ""
    attr(,"password")
    [1] ""
    attr(,"db")
    [1] "admin"
    attr(,"timeout")
    [1] 0

I had previously used Python to load a data dump from the COSMIC database into the CosmicMutation collection of the bio database. To fetch and view a single record:

.. code-block:: r

    > res = mongo.find.one(m,'bio.CosmicMutation')
    > res
    _id : 2 35540
    start : 16 56493965
    end : 16 56493965
    samples : 4
        0 : 2 TCGA-02-0083
    hgncid : 2 3431
    cdsmutation : 2 c.3137G>A
    gene : 2 ERBB3
    aamutation : 2 p.S1046N
    chromosome : 2 chr12

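The result is an object of type mongo.bson. In the printout, the number after each colon is the BSON type code (2 for a string, 16 for a 32-bit integer, 4 for an array). Conversion to an R list is easy: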

.. code-block:: r

    > resl = mongo.bson.to.list(res)
    > resl
    $`_id`
    [1] "35540"
    $start
    [1] 56493965
    $end
    [1] 56493965
    $samples
    [1] "TCGA-02-0083"
    $hgncid
    [1] "3431"
    $cdsmutation
    [1] "c.3137G>A"
    $gene
    [1] "ERBB3"
    $aamutation
    [1] "p.S1046N"
    $chromosome
    [1] "chr12"

Most of the time, I want to run a full query rather than fetch a single record. One way to do so is to build a query object:

.. code-block:: r

    > bson = mongo.bson.buffer.create()
    > mongo.bson.buffer.append(bson,'gene','TP53')
    [1] TRUE
    > query = mongo.bson.from.buffer(bson)
    > query
    gene : 2 TP53

Executing the query is similar in principle to SQL, though the syntax is quite different. Here, limit=12 caps the result at 12 records and skip=10 skips the first 10 matches. The result of the query is a cursor object; to get the actual records, one iterates over it. Below I convert each record to a list, but that is optional, since printing the bson objects directly also gives meaningful output.

.. code-block:: r

    > cursor = mongo.find(m,'bio.CosmicMutation',query,limit=12,skip=10)
    > while (mongo.cursor.next(cursor)) {
    +   l = mongo.bson.to.list(mongo.cursor.value(cursor))
    +   print(l[['cdsmutation']])
    + }
    [1] "c.537T>C"
    [1] "c.136_147del12"
    [1] "c.646_660del15"
    [1] "c.392A>T"
    [1] "c.618G>C"
    [1] "c.498_499insA"
    [1] "c.590T>A"
    [1] "c.376-4delACAGTACTCCCCT"
    [1] "c.97-1G>C"
    [1] "c.673-2A>C"
    [1] "c.767C>A"
    [1] "c.817delC"

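
Printing one field per record is fine for a quick look, but for analysis in R it is often handier to have the records in a data frame. The following is a minimal sketch in base R, not something rmongodb provides: records_to_df is a hypothetical helper name, and it assumes each record has already been converted with mongo.bson.to.list as in the loop above. Fields that are missing or hold more than one value (as samples may) become NA.

.. code-block:: r

    # Hypothetical helper (not part of rmongodb): turn a list of records, each
    # already converted with mongo.bson.to.list(), into a data frame with one
    # row per record and one column per requested field.
    records_to_df <- function(records, fields) {
      rows <- lapply(records, function(rec) {
        vals <- lapply(fields, function(f) {
          v <- rec[[f]]
          # Missing or multi-valued fields cannot fill a single cell; use NA.
          if (is.null(v) || length(v) != 1) NA else v
        })
        names(vals) <- fields
        as.data.frame(vals, stringsAsFactors = FALSE)
      })
      do.call(rbind, rows)
    }

    # Example using values from the single record shown earlier in this article.
    rec <- list(gene = "ERBB3", cdsmutation = "c.3137G>A",
                samples = "TCGA-02-0083", chromosome = "chr12", start = 56493965)
    df <- records_to_df(list(rec), c("gene", "cdsmutation", "chromosome", "start"))

    # With a live cursor, the records could be gathered first, for example:
    # records <- list()
    # while (mongo.cursor.next(cursor)) {
    #   records[[length(records) + 1]] <- mongo.bson.to.list(mongo.cursor.value(cursor))
    # }

Building the rows one at a time and binding them at the end is simple rather than fast; for a few thousand records it should be adequate.
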
Queries can also be given as a plain R list. If a list is supplied instead of a mongo.bson object, it is converted to BSON automatically. A simple count query gives the idea:

.. code-block:: r

    > res = mongo.count(m,'bio.CosmicMutation',list(gene='TP53'))
    > res
    [1] 3055

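
The count also tells how many pages a limit/skip walk over the results would take. As a small sketch in base R (page_skips is a hypothetical helper, not an rmongodb function), the skip offsets for pages of 12 records over the 3055 TP53 matches could be computed like this:

.. code-block:: r

    # Hypothetical helper: the skip offsets needed to walk `total` records in
    # pages of `page_size` records each.
    page_skips <- function(total, page_size) {
      stopifnot(total >= 0, page_size > 0)
      n_pages <- ceiling(total / page_size)
      seq(0, by = page_size, length.out = n_pages)
    }

    skips <- page_skips(3055, 12)
    length(skips)   # 255 pages; the last one holds the remaining 7 records
    tail(skips, 1)  # 3048, the offset of the last page

Each offset would then be passed as skip, with limit set to the page size, in a mongo.find call like the one shown earlier.
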
There is quite a bit more that one can do with MongoDB from R. In particular, I have not inserted any data or used the server-side JavaScript functionality. The documentation is fairly extensive, so see help('mongo') after loading the library for details. When finished, close the connection:

.. code-block:: r

    > mongo.destroy(m)
    NULL

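
An explicit mongo.destroy(m) is easy to forget, and it is skipped entirely if an error interrupts the work. One possible way to guard against that is a small wrapper built on on.exit. This is only a sketch: with_connection is a hypothetical helper name, not part of rmongodb, and its create and destroy arguments are parameters purely so the pattern can be tried without a running server.

.. code-block:: r

    # Hypothetical helper (not part of rmongodb): open a connection, run `f` on
    # it, and always close the connection afterwards, even if `f` signals an
    # error. The defaults are the rmongodb functions used in this article; they
    # are only looked up if no replacement is supplied.
    with_connection <- function(f,
                                create = rmongodb::mongo.create,
                                destroy = rmongodb::mongo.destroy) {
      m <- create()
      on.exit(destroy(m), add = TRUE)
      f(m)
    }

    # Intended use against a local server holding the collection from this article:
    # n <- with_connection(function(m) {
    #   rmongodb::mongo.count(m, 'bio.CosmicMutation', list(gene = 'TP53'))
    # })

The value of f is returned to the caller, and the connection is released on the way out in either case.
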

.. code-block:: r

    > sessionInfo()
    R Under development (unstable) (2012-01-19 r58141)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
    other attached packages:
    [1] ascii_2.1 rmongodb_1.0.2
    loaded via a namespace (and not attached):
    [1] tools_2.15.0