Accessing MongoDB from R

MongoDB is a scalable, high-performance, document-oriented NoSQL database. Since R is one of my main computing environments and languages, I have been experimenting with the rmongodb package to access MongoDB from R.

The rmongodb package provides an interface between R and MongoDB. Installing rmongodb was simple:

.. code-block:: r

  > install.packages('rmongodb')



The examples below assume that a MongoDB server is up and running on the local machine. On Mac OS, I used Homebrew to install the software and followed its instructions for running it. By default, MongoDB accepts connections from localhost only and has no authentication. For the purposes of this short article, I have left these defaults in place.

First, I load the library and connect to the database.

.. code-block:: r

  > library(rmongodb)
  > m = mongo.create()
  > print(m)
  [1] 0
  attr(,"mongo")
  <pointer: 0x100675860>
  attr(,"class")
  [1] "mongo"
  attr(,"host")
  [1] "127.0.0.1"
  attr(,"name")
  [1] ""
  attr(,"username")
  [1] ""
  attr(,"password")
  [1] ""
  attr(,"db")
  [1] "admin"
  attr(,"timeout")
  [1] 0
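
Since ``mongo.create`` hands back an object either way, it can be worth checking the connection before going further. The following is a minimal sketch, assuming a server on localhost at the default port; the host value simply restates the default shown in the printed object, and ``mongo.is.connected`` and ``mongo.get.err`` are rmongodb functions.

.. code-block:: r

  library(rmongodb)

  # Connect explicitly to the local server (the same host as the default above).
  m <- mongo.create(host = "127.0.0.1")

  # Report whether the connection succeeded instead of failing later on.
  if (mongo.is.connected(m)) {
    message("connected to ", attr(m, "host"))
  } else {
    message("connection failed, error code ", mongo.get.err(m))
  }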


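I had already loaded (using Python) a data dump from the COSMIC database into the bio.CosmicMutation namespace, that is, the CosmicMutation collection of a database named bio. To fetch and view a single record: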

.. code-block:: r

  > res = mongo.find.one(m,'bio.CosmicMutation')
  > res
        _id : 2          35540
        start : 16       56493965
        end : 16         56493965
        samples : 4
                0 : 2    TCGA-02-0083

        hgncid : 2       3431
        cdsmutation : 2          c.3137G>A
        gene : 2         ERBB3
        aamutation : 2   p.S1046N
        chromosome : 2   chr12


The result is an object of type mongo.bson. Conversion to an R list is easy:

.. code-block:: r

  > resl = mongo.bson.to.list(res)
  > resl
  $`_id`
  [1] "35540"

  $start
  [1] 56493965

  $end
  [1] 56493965

  $samples
  [1] "TCGA-02-0083"

  $hgncid
  [1] "3431"

  $cdsmutation
  [1] "c.3137G>A"

  $gene
  [1] "ERBB3"

  $aamutation
  [1] "p.S1046N"

  $chromosome
  [1] "chr12"
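
Once the record is an ordinary list, the usual R tools apply. As a small illustration, the fields can be combined into a one-line summary. The variable names ``label`` and ``summary_line`` are mine, and the list is retyped from the output above so that the snippet runs without a database.

.. code-block:: r

  # The record from above, retyped as a plain R list (no database needed).
  resl <- list(`_id` = "35540", start = 56493965, end = 56493965,
               samples = "TCGA-02-0083", hgncid = "3431",
               cdsmutation = "c.3137G>A", gene = "ERBB3",
               aamutation = "p.S1046N", chromosome = "chr12")

  # Fields are reached with $ or [[ ]], like any other list.
  label <- sprintf("%s:%d-%d", resl$chromosome,
                   as.integer(resl$start), as.integer(resl$end))
  summary_line <- paste(resl$gene, resl$aamutation, label)
  print(summary_line)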


Most of the time, I do not want a single record but the results of a full query. One way to do this is to build a query object.

.. code-block:: r

  > bson = mongo.bson.buffer.create()
  > mongo.bson.buffer.append(bson,'gene','TP53')
  [1] TRUE
  > query = mongo.bson.from.buffer(bson)
  > query
        gene : 2         TP53
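
Queries that use operators need a nested object inside the buffer. As a sketch, assuming the ``start`` field is stored as a number as in the record shown earlier, a query for TP53 mutations at or beyond some position could be built with ``mongo.bson.buffer.start.object`` and ``mongo.bson.buffer.finish.object``. The cutoff value is made up purely for illustration.

.. code-block:: r

  library(rmongodb)

  buf <- mongo.bson.buffer.create()
  mongo.bson.buffer.append(buf, "gene", "TP53")

  # Open a sub-document so the result reads: start : { "$gte" : 7578000 }
  mongo.bson.buffer.start.object(buf, "start")
  mongo.bson.buffer.append(buf, "$gte", 7578000)  # hypothetical cutoff
  mongo.bson.buffer.finish.object(buf)

  range_query <- mongo.bson.from.buffer(buf)
  print(range_query)

The resulting object can be passed wherever the simpler ``query`` object is accepted.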


Executing the query is similar in principle to SQL, though the syntax is quite different. The result of the query is a cursor object; to get the actual records, one needs to iterate over it. Here limit=12 and skip=10 ask for twelve records after skipping the first ten. Below I convert each record to a list, but that is optional, since printing the bson objects directly also gives meaningful output.

.. code-block:: r

  > cursor = mongo.find(m,'bio.CosmicMutation',query,limit=12,skip=10)
  > while (mongo.cursor.next(cursor)) {
  +   l = mongo.bson.to.list(mongo.cursor.value(cursor))
  +   print(l[['cdsmutation']])
  + }
  [1] "c.537T>C"
  [1] "c.136_147del12"
  [1] "c.646_660del15"
  [1] "c.392A>T"
  [1] "c.618G>C"
  [1] "c.498_499insA"
  [1] "c.590T>A"
  [1] "c.376-4delACAGTACTCCCCT"
  [1] "c.97-1G>C"
  [1] "c.673-2A>C"
  [1] "c.767C>A"
  [1] "c.817delC"
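
When more than one field is needed, it is often handier to collect the records into a data frame than to print them. The helper below is a sketch of my own (``records_to_df`` is not part of rmongodb). It works on any list of converted records, keeps only the fields named in ``fields``, and is demonstrated on two hand-typed records so that it runs without a server.

.. code-block:: r

  # Turn a list of records (each an R list, e.g. from mongo.bson.to.list)
  # into a data frame with one row per record and one column per field.
  records_to_df <- function(records, fields) {
    rows <- lapply(records, function(rec) {
      # Missing fields become NA; only the first element of a field is kept.
      vals <- lapply(fields, function(f) {
        if (is.null(rec[[f]])) NA else rec[[f]][1]
      })
      names(vals) <- fields
      as.data.frame(vals, stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
  }

  # Two hand-typed records standing in for cursor results.
  recs <- list(list(gene = "TP53", cdsmutation = "c.537T>C"),
               list(gene = "TP53", cdsmutation = "c.392A>T"))
  df <- records_to_df(recs, c("gene", "cdsmutation", "aamutation"))
  print(df)

In the loop above, one would append each ``mongo.bson.to.list(mongo.cursor.value(cursor))`` result to a list and pass that list to the helper once the cursor is exhausted.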


Queries can also be written in another form. If a list is supplied instead of a mongo.bson object, it is converted as necessary to the bson notation for MongoDB. A simple count query gives the idea.

.. code-block:: r

  > res = mongo.count(m,'bio.CosmicMutation',list(gene='TP53'))
  > res
  [1] 3055
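
To see what such a conversion produces, a list can also be converted by hand with ``mongo.bson.from.list``. This is only a sketch: I assume that the automatic conversion gives an equivalent object, and the cutoff in the second query is made up for illustration.

.. code-block:: r

  library(rmongodb)

  # Explicit conversion of a list to a mongo.bson query object.
  q1 <- mongo.bson.from.list(list(gene = "TP53"))
  print(q1)

  # Nested lists become sub-documents, which is how operators are written.
  q2 <- mongo.bson.from.list(list(gene = "TP53",
                                  start = list("$gte" = 7578000)))  # made-up cutoff
  print(q2)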


There is quite a bit more that one can do with MongoDB from R. In particular, I have not inserted any data or used the server-side JavaScript functionality. The documentation is fairly extensive, so see help('mongo') after loading the library for details. When finished, I close the connection:

.. code-block:: r

  > mongo.destroy(m)
  NULL


.. code-block:: r

  > sessionInfo()
  R Under development (unstable) (2012-01-19 r58141)
  Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

  locale:
  [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base

  other attached packages:
  [1] ascii_2.1      rmongodb_1.0.2

  loaded via a namespace (and not attached):
  [1] tools_2.15.0

Written: Monday, January 23, 2012