
Ethereum Blockchain Parsing

/ Sven Boesiger / Blockchain

About two weeks ago, two fellow researchers and I sat down to analyze the Ethereum blockchain with machine learning techniques. It immediately became clear that we first had to parse the blockchain into a more convenient format, such as a CSV file or rows in a Mongo database, before we could work with it. Searching for a solution, we came across a script by Alex Miller on GitHub that did exactly that: a Python script that parses the entries of the blockchain and transfers them to a Mongo database.

Unfortunately, the script was already a year old and didn’t work at first. We therefore had to fork it and introduce some bug fixes to get it running. To transfer the data on a Mac OS X or Linux machine, the following steps were necessary:

Attention: Python 3 needs to be available. Please note that the following steps were written for a Linux system. We also tried them on Mac OS X, where some parts need to be amended.

1. Download and extract the script from GitHub


2. Synchronize the blockchain with Geth

geth --rpc

3. Install the required Python packages

pip install pymongo contractmap tqdm requests

4. Set the environment variable for the Mongo database


5. Start the Mongo database

sudo systemctl start mongodb

6. Finally, launch the parser

cd Ethereum_Blockchain_Parser-master/Scripts
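Before launching the parser, it can help to confirm that Geth has actually finished synchronizing. The following is a minimal sketch, not part of the parser script: it assumes Geth's HTTP-RPC endpoint is reachable at http://localhost:8545 and uses the standard `eth_syncing` JSON-RPC method. The helper names `sync_progress` and `query_sync_status` are made up for illustration.

```python
import json
from urllib import request as urlrequest

# Assumption: the endpoint opened by `geth --rpc` on its default host and port.
GETH_URL = "http://localhost:8545"


def sync_progress(result):
    """Turn an eth_syncing result into a fraction between 0 and 1.

    Geth answers False when it is not syncing. That is treated here as
    "finished", although a node that has not found any peers yet also
    reports False. Otherwise the result carries hex-encoded block numbers.
    """
    if result is False:
        return 1.0
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    return current / highest if highest else 0.0


def query_sync_status(url=GETH_URL):
    """Ask the node for its sync status via JSON-RPC over HTTP."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode("utf-8")
    req = urlrequest.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)["result"]
```

With Geth running, `print(sync_progress(query_sync_status()))` would print the share of blocks already downloaded.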

In total, the process took us about three days, mostly because synchronizing the blockchain alone took about two days. The script then sits between Geth and MongoDB: it requests individual blocks over HTTP and transfers them one by one to the Mongo database. After all the waiting, we had the data ready for analysis. So plan well ahead if you are thinking of using this approach to analyze the blockchain.
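To illustrate the idea of sitting between Geth and MongoDB, here is a rough sketch of such a loop. It is not the code of the forked parser: the function names, the document layout and the endpoint http://localhost:8545 are assumptions made for this example. It uses the standard `eth_getBlockByNumber` JSON-RPC method and expects any object with an `insert_one` method, such as a pymongo collection.

```python
import json
from urllib import request as urlrequest

# Assumption: Geth's HTTP-RPC endpoint on its default host and port.
GETH_URL = "http://localhost:8545"


def rpc_payload(block_number):
    """Build the JSON-RPC body that asks for one block with full transactions."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        "params": [hex(block_number), True],
        "id": block_number,
    }


def fetch_block(block_number, url=GETH_URL):
    """Request a single block from Geth; returns None if the block is unknown."""
    body = json.dumps(rpc_payload(block_number)).encode("utf-8")
    req = urlrequest.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)["result"]


def to_document(block):
    """Convert a raw block into a small document (hypothetical layout).

    Values are stored in ether as floats because wei amounts can exceed
    the 64-bit integers that MongoDB supports.
    """
    return {
        "number": int(block["number"], 16),
        "timestamp": int(block["timestamp"], 16),
        "transactions": [
            {
                "from": tx["from"],
                "to": tx["to"],  # None for contract creations
                "value_ether": int(tx["value"], 16) / 1e18,
            }
            for tx in block["transactions"]
        ],
    }


def copy_blocks(start, end, collection, fetch=fetch_block):
    """Copy blocks start..end (inclusive) one by one; return how many were stored."""
    stored = 0
    for number in range(start, end + 1):
        block = fetch(number)
        if block is None:  # the node does not have this block (yet)
            break
        collection.insert_one(to_document(block))
        stored += 1
    return stored
```

With pymongo installed and MongoDB running locally, a call such as `copy_blocks(0, 999, pymongo.MongoClient()["blockchain"]["blocks"])` would store the first thousand blocks; the database and collection names here are placeholders. One HTTP round trip per block is also a plausible reason why this kind of transfer is slow on a full chain.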

I hope you find our notes useful. Please leave any comments or questions below.

One Reply to “Ethereum Blockchain Parsing”

    • 8th September 2017, 1:44 pm

      Thanks for fixing and updating the repository. It is able to do the job, partially. While running Scripts/, I came across a simple module import error. I tried solving it, but was unable to do so. The error is:
      `Traceback (most recent call last):
      File “”, line 9, in
      from Crawler import Crawler
      File “./../Preprocessing/Crawler/”, line 1, in
      from Crawler import Crawler
      File “./../Preprocessing/Crawler/”, line 4, in
      import crawler_util
      ImportError: No module named ‘crawler_util’`

      BTW, runs with an error:
      ‘Traceback (most recent call last):
      File “”, line 56, in
      blocks = ParsedBlocks(t)
      File “Analysis/”, line 60, in __init__
      self.contracts = ContractMap(load=True).addresses
      File “Analysis/”, line 57, in __init__
      File “Analysis/”, line 132, in load
      assert os.path.isfile(self.filepath), no_file’
      , although it does the job it was supposed to. Please suggest how I should proceed.