Ethereum, Cryptokitties, Data Science, Research

Exploring CryptoKitties — Part 1: Data Extraction


Written by Markus Buhatem Koch

Source: https://www.cryptokitties.co/kitty/101

Introduction to CryptoKitties

If you are reading this, you’ve probably heard of the game that has caught everyone’s attention on the Ethereum network over the last few months: CryptoKitties!

In short, the game consists of collecting virtual cats. Cats are created by the players of the game, who can breed two cats to generate a new one. Each cat has its own genetic sequence, which determines its physical attributes. A cat's genome is a function of its parents' genes plus some randomness.

In addition to breeding, up to 50,000 cats with predefined characteristics can be created by Axiom Zen, the company behind the game. There is a market for buying and selling cats and another one for “renting” cats for breeding purposes. You can read more about the game here.

BlockScience is a technology research and analytics firm specializing in the design and evaluation of decentralized economic systems. Analyzing aspects of the CryptoKitties economy seemed like a great opportunity to improve our data extraction tools while at the same time getting our hands on some real-world data from a live (and lively!) decentralized application.

This blog post has been split into two parts:

  • Part 1 (this post) covers technical aspects related to extracting and transforming data from the Ethereum blockchain.
  • Part 2 contains the actual analysis of the game data extracted.

Extracting Data from the Ethereum Blockchain

Even though everything that ever happened on the Ethereum network is recorded on the blockchain, turning those bits into meaningful data is not always straightforward.

It is simple to extract transaction data stating that in a given block account A sent some ether (ETH) to account B and set a certain gas price for that transaction to be processed. However, when we’re working with transactions sent to contracts, decoding blockchain data is akin to implementing an ETL from multiple fixed-width text files whose formats are described only in the source code of the software that created them.

Transactions that Call Functions in Smart Contracts

Take, for instance, a transaction sent to contract 0xb1690c08e213a35ed9bab7b318de14420fb57d8c with the following content in the data field:

0x454a2ab300000000000000000000000000000000000000000000000000000000000871ad

What does it do?

The first part of the data field (0x454a2ab3) refers to the function inside the smart contract that is being called by the transaction. Those are the first four bytes of the hash of the function signature, which is defined as the name of the function followed by the data types of its parameters:

keccak256("<function>(<type_of_data_1>,<…>,<type_of_data_N>)")

The remaining bytes are the values of the function parameters. You can read about it in detail here.

Even knowing those four bytes, how can we tell what function is being called, or how many parameters it has? In this specific case, we know that contract 0xb1690c… is the CryptoKitties auction smart contract, the market for buying and selling cats. And because its source code has been made public, we know that it has a function called bid:

/// Bids on an open auction, completing the auction and
/// transferring ownership of the NFT if enough Ether is supplied.
/// param _tokenId: ID of token to bid on.
function bid (uint256 _tokenId)

If we calculate the hash of the bid function signature, we can see that the first four bytes are exactly those present in the transaction data:

keccak256("bid(uint256)") = 454a2ab3c602fd9…

And because the function only takes one argument, we can tell that everything following those first four bytes in the transaction data is that parameter. In other words, the transaction is bidding on cat number 0x871ad (553389).
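The split described above can be sketched in a few lines of Python. This is only an illustration that assumes the call takes exactly one uint256 argument, as bid does; the helper name decode_single_uint256_call is hypothetical and not part of any library.

```python
def decode_single_uint256_call(data_hex):
    """Split calldata into its 4-byte selector and one uint256 argument."""
    hex_body = data_hex[2:] if data_hex.startswith("0x") else data_hex
    raw = bytes.fromhex(hex_body)
    selector, argument = raw[:4], raw[4:]
    if len(argument) != 32:
        raise ValueError("expected exactly one 32-byte argument after the selector")
    # ABI-encoded integers are big-endian and left-padded with zeroes.
    return "0x" + selector.hex(), int.from_bytes(argument, "big")


# The data field of the transaction discussed in the text.
calldata = "0x454a2ab300000000000000000000000000000000000000000000000000000000000871ad"

selector, token_id = decode_single_uint256_call(calldata)
print(selector)                  # function selector (first four bytes)
print(token_id, hex(token_id))   # cat ID in decimal and hexadecimal
```

Functions with more parameters, or with dynamic types such as strings and arrays, follow more involved encoding rules, so a general decoder needs the full function signature rather than a fixed 32-byte slice.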

Smart Contracts that Log Information

It is common for smart contracts to log information during their execution.

Logs recorded by a contract can be obtained by calling the eth_getLogs method of the JSON-RPC API. As is the case with transactions that call contract functions, we need to know the source code of the contract in order to decode the data returned by this API. For example, what does a log with the following data mean?

blockNumber: 0x51968f
topics: [0x0a5311bd2a6608f08a180df2ee7c5946819a649b204b554bb8e39825b2c50ad5]
data: 0x0000000000000000000000001b8f7b13b14a59d9770f7c1789cf727046f7e542000000000000000000000000000000000000000000000000000000000009fac1000000000000000000000000000000000000000000000000000000000009f80e000000000000000000000000000000000000000000000000000000000008957200004a50b390a6738697012a030ac21d585b4c8214ae39446194054b98e0b98f
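For reference, a request that asks a node for logs like the one above might be built as follows. This is a sketch rather than the tooling used for this post: build_get_logs_request is a hypothetical helper, and the contract address is a zero-filled placeholder because the emitting contract's address is not quoted here. Only the request body is constructed; nothing is sent over the network.

```python
import json


def build_get_logs_request(contract_address, topic0, from_block, to_block, request_id=1):
    """Return the JSON-RPC request body for an eth_getLogs call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [{
            "address": contract_address,
            # Block numbers are sent as hex strings without leading zeroes.
            "fromBlock": hex(from_block),
            "toBlock": hex(to_block),
            # topics[0] filters on the hash of the event signature.
            "topics": [topic0],
        }],
    }


# Placeholder only: replace with the address of the contract of interest.
PLACEHOLDER_ADDRESS = "0x" + "00" * 20
SAMPLE_TOPIC = "0x0a5311bd2a6608f08a180df2ee7c5946819a649b204b554bb8e39825b2c50ad5"
SAMPLE_BLOCK = int("0x51968f", 16)

request = build_get_logs_request(PLACEHOLDER_ADDRESS, SAMPLE_TOPIC, SAMPLE_BLOCK, SAMPLE_BLOCK)
print(json.dumps(request, indent=2))
```

The resulting body would be sent as an HTTP POST to whichever Ethereum node endpoint is available. Restricting the block range keeps the response small, since nodes commonly limit how many logs a single call can return.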

Logs are recorded when a contract triggers an event. The first element of the topics array (which only has one element in our example) is the hash of the event signature. In the case of CryptoKitties, logs are recorded when a cat gets pregnant and when a cat is born, for example.

/// The Pregnant event is fired when two cats successfully breed
/// and the pregnancy timer begins for the matron.
event Pregnant (address owner, uint256 matronId, uint256 sireId, uint256 cooldownEndBlock);

/// The Birth event is fired whenever a new kitten comes into
/// existence. This obviously includes any time a cat is created
/// through the giveBirth method, but it is also called when
/// a new gen0 cat is created.
event Birth (address owner, uint256 kittyId, uint256 matronId, uint256 sireId, uint256 genes);

See how the hash of the Birth event signature corresponds to the value in the log in our example:

keccak256("Birth(address,uint256,uint256,uint256,uint256)") = 0x0a5311bd2a6608f08a180df2ee7c5946819a649b204b554bb8e39825b2c50ad5
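Both signature hashes quoted so far can be recomputed locally. One caveat: Ethereum's keccak256 is the original Keccak-256, which pads its input differently from the standardized SHA3-256, so Python's hashlib.sha3_256 gives different digests. The sketch below is a compact pure-Python Keccak-256 written for illustration only (it is slow, and the helper names are my own); in practice a maintained library would be the safer choice.

```python
MASK64 = (1 << 64) - 1
RATE_BYTES = 136  # Keccak-256: 1088-bit rate, 512-bit capacity


def rol64(value, shift):
    """Rotate a 64-bit integer left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Run the 24-round Keccak-f[1600] permutation on a 5x5 grid of lanes."""
    lfsr = 1  # generates the round constants on the fly
    for _ in range(24):
        # Theta: mix each column's parity into the neighbouring columns.
        parity = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
                  for x in range(5)]
        mix = [parity[(x + 4) % 5] ^ rol64(parity[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ mix[x] for y in range(5)] for x in range(5)]
        # Rho and pi: rotate each lane and move it to a new position.
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # Chi: non-linear step applied along each row.
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # Iota: XOR the round constant into lane (0, 0).
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data):
    """Keccak-256 as used by Ethereum (padding byte 0x01, not SHA3's 0x06)."""
    padded = bytearray(data)
    pad_len = RATE_BYTES - len(padded) % RATE_BYTES
    padded += b"\x01" + bytes(pad_len - 1)
    padded[-1] |= 0x80
    state = bytearray(200)
    for offset in range(0, len(padded), RATE_BYTES):
        # Absorb one block, then permute the whole 1600-bit state.
        for i in range(RATE_BYTES):
            state[i] ^= padded[offset + i]
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(state[:32])


def signature_hash(signature):
    """Hash a function or event signature such as 'bid(uint256)'."""
    return keccak256(signature.encode("ascii"))


bid_selector = signature_hash("bid(uint256)")[:4].hex()
birth_topic = "0x" + signature_hash("Birth(address,uint256,uint256,uint256,uint256)").hex()
print(bid_selector)   # compare with the first four bytes of the transaction data
print(birth_topic)    # compare with topics[0] of the sample log
```

Note that the signature string uses only the parameter types, with no spaces and no parameter names; writing "bid(uint256 _tokenId)" would produce an unrelated hash.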

So far, we know that in block number 0x51968f (5346959) a CryptoKitty was born! The next step in our decoding process is to split the data field according to the five parameters of the Birth event. The first parameter is an Ethereum address, which is 160 bits long but is encoded in 256 bits (zeroes are added to the left of the address). The other parameters are 256-bit integers. The data field is therefore divided into five parts, each 256 bits (64 hexadecimal characters) long.

owner:
0000000000000000000000001b8f7b13b14a59d9770f7c1789cf727046f7e542
kittyId:
000000000000000000000000000000000000000000000000000000000009fac1
matronId:
000000000000000000000000000000000000000000000000000000000009f80e
sireId:
0000000000000000000000000000000000000000000000000000000000089572
genes:
00004a50b390a6738697012a030ac21d585b4c8214ae39446194054b98e0b98f
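The same split can be expressed as code. The sketch below assumes the log is a Birth event whose five parameters are all stored in the data field, which is what the event definition above implies; decode_birth_log and BIRTH_FIELDS are illustrative names, not part of any library.

```python
BIRTH_FIELDS = ["owner", "kittyId", "matronId", "sireId", "genes"]
WORD_HEX_CHARS = 64  # one 256-bit ABI word


def decode_birth_log(data_hex):
    """Split a Birth log's data field into its five 256-bit words."""
    hex_body = data_hex[2:] if data_hex.startswith("0x") else data_hex
    if len(hex_body) != WORD_HEX_CHARS * len(BIRTH_FIELDS):
        raise ValueError("unexpected data length for a Birth event")
    words = [hex_body[i:i + WORD_HEX_CHARS]
             for i in range(0, len(hex_body), WORD_HEX_CHARS)]
    fields = dict(zip(BIRTH_FIELDS, words))
    return {
        # An address occupies the low 160 bits (last 40 hex characters).
        "owner": "0x" + fields["owner"][-40:],
        "kittyId": int(fields["kittyId"], 16),
        "matronId": int(fields["matronId"], 16),
        "sireId": int(fields["sireId"], 16),
        "genes": int(fields["genes"], 16),
    }


# The data field of the sample log, one 256-bit word per line.
log_data = (
    "0x"
    "0000000000000000000000001b8f7b13b14a59d9770f7c1789cf727046f7e542"
    "000000000000000000000000000000000000000000000000000000000009fac1"
    "000000000000000000000000000000000000000000000000000000000009f80e"
    "0000000000000000000000000000000000000000000000000000000000089572"
    "00004a50b390a6738697012a030ac21d585b4c8214ae39446194054b98e0b98f"
)

birth = decode_birth_log(log_data)
block_number = int("0x51968f", 16)  # blockNumber of the sample log
for name, value in birth.items():
    print(name, value)
print("block", block_number)
```

The fixed slicing only works because every Birth parameter is a static 32-byte type. Events with indexed parameters place those values in the topics array instead, and dynamic types add offsets and lengths to the data field.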

See what I meant by “implementing an ETL from multiple fixed-width text files whose formats are described only in the source code of the software that created them”? :-)

Move on to Part 2, where we’ll share some interesting facts we came across while analyzing the CryptoKitties game data! Special thanks to the BlockScience team for the research, insights, and review.


About BlockScience

BlockScience® is a complex systems engineering, R&D, and analytics firm. Our goal is to combine academic-grade research with advanced mathematical and computational engineering to design safe and resilient socio-technical systems. We provide engineering, design, and analytics services to a wide range of clients, including for-profit, non-profit, academic, and government organizations, and contribute to open-source research and software development.
