
3 posts tagged with "python"


· 5 min read
Dominik Harz

In Ethereum and other blockchains, there are still many proof-of-concept implementations, and developers are still trying out how to cope with the new concepts. As part of the dInvest post series I was also looking into Ethereum and trying to implement a hedge fund in a blockchain. In a previous post I discussed how to get a quantitative framework in python up and running. In this post I will describe how to integrate python programs with Ethereum smart contracts. For one reason or another you might also be faced with the issue that, although Ethereum offers a Turing-complete language, not everything is actually doable there.

Let's say you have created one of the simple tutorial contracts in Ethereum and now want to look at something more advanced. I personally liked the Hitchhiker's Guide to Smart Contracts by Manuel Aráoz to get started with more complex code and to set up testrpc and truffle. Take a look at it.

dInvest smart contract

dInvest is composed of one smart contract that is responsible for making investments, verifying investment criteria, and distributing returns. The contract exposes public functions to create new investments and to withdraw funds, which act as the main functions of a hedge fund. Users of the hedge fund are identified by their Ethereum address, which is derived from their public key. Suggesting investment strategies and executing them is done by separate agents that also have Ethereum addresses. These agents can be set by the contract creator only. When a user creates an investment, it is possible to specify a list of industry sectors, each identified by a two-digit number based on the Standard Industrial Classification codes. These sectors are treated as a blacklist when making the investments. Users therefore have the ability to control which sectors the hedge fund invests in.

The contract can be found in the GitHub repo.

Interaction with smart contracts

To interact with smart contracts, there are a couple of options, including RPC or a JavaScript API. I found the easiest way to interact with Ethereum smart contracts from other programs (like python programs) was using the web3 JavaScript API. As the majority of dInvest is written in python, I wanted to stick to the language and not include JS as well. Luckily, there is a web3 implementation in python. To get it up and running for the dInvest setting, I switched to the virtualenv where I had also installed zipline and then installed web3 simply with pip install web3.

Using web3, there are three steps to get you up and running to interact with your smart contract:

  1. Getting your ABI
  2. Set up the RPC connection
  3. Interact with the smart contract

In the next sections, I will go into detail how to achieve the three steps. I am using this mostly as a python module for other programs. In the end our python module structure might look like this:

contract
|-- __init__.py
|-- ContractHandler.py
|-- your-contract-name.json

Getting your ABI

Now, to interact with any smart contract you need the Application Binary Interface (ABI) defined by the contract. The ABI is a static, strongly typed interface. Whenever you create a new contract or change an existing one, chances are your ABI changes as well. In my experience the easiest way to get the current ABI of a smart contract (which might be yours or any contract for which you have the source code) is to go to https://ethereum.github.io/browser-solidity/ and copy/paste your code there. Then press the "Compile" button on the upper right side and copy the entire string in the "Interface" field into a your-contract-name.json file. Once you have that JSON, your web3 interface will know how to interact with the contract.
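To make the ABI a bit less abstract, here is a minimal sketch of what such a JSON file contains and how it looks once loaded in python. The single entry below is hand-written and hypothetical (the function name is borrowed from the dInvest example further down, the return type is an assumption); the real file is whatever the compiler prints in the "Interface" field.

```python
import json

# Hypothetical ABI with one read-only function. The real dInvest ABI has more
# entries and may use different types.
abi_json = """
[
  {
    "constant": true,
    "inputs": [],
    "name": "blackListCompanies",
    "outputs": [{"name": "", "type": "uint256[]"}],
    "payable": false,
    "type": "function"
  }
]
"""

abi = json.loads(abi_json)


def describe_functions(abi):
    """Return (name, input types, output types) for every function entry."""
    described = []
    for entry in abi:
        if entry.get("type") != "function":
            continue  # skip constructors, events, and fallback entries
        inputs = [arg["type"] for arg in entry.get("inputs", [])]
        outputs = [arg["type"] for arg in entry.get("outputs", [])]
        described.append((entry["name"], inputs, outputs))
    return described


for name, inputs, outputs in describe_functions(abi):
    print(name, inputs, "->", outputs)
```

Listing the functions this way is a quick check that the JSON you copied is complete before handing it to web3.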

Setting up the RPC provider

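As a next step you will need to connect to the RPC provider. In your python file (e.g. ContractHandler.py) include the following lines of code:

import json
from os import path

from web3 import Web3, RPCProvider

# Folder of this module, so the ABI file is found from any working directory
dir_path = path.dirname(path.realpath(__file__))

class ContractHandler:
    def __init__(self):
        # Connect to the RPC endpoint of the local Ethereum client
        self.web3 = Web3(RPCProvider(host='localhost', port='8545'))
        # Load the ABI saved in the previous step
        with open(path.join(dir_path, 'your-contract-name.json'), 'r') as abi_definition:
            self.abi = json.load(abi_definition)
        # your_contract_address is a placeholder for the deployed contract's address
        self.contract_address = your_contract_address
        self.contract = self.web3.eth.contract(self.abi, self.contract_address)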

I prefer having my configurations in a separate file. There are many ways to do it and it seems like there is no standard in python. I guess using a txt file is not the best option though and I plan to switch to yml soon. See also here https://martin-thoma.com/configuration-files-in-python/. Make sure to run your favorite Ethereum client before starting your program (e.g. geth --rpc).
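As one possible way to keep the settings out of the code, here is a small sketch using configparser from the standard library. This is neither the txt file I currently use nor the yml file I plan to switch to, and the section and key names are made up for illustration.

```python
import configparser

# Hypothetical configuration; in practice this text would live in its own
# file (for example config.ini) and be loaded with parser.read(filename).
EXAMPLE_CONFIG = """
[ethereum]
rpc_host = localhost
rpc_port = 8545
contract_address = 0x0000000000000000000000000000000000000000
abi_file = your-contract-name.json
"""


def load_settings(text):
    """Parse the INI text and return the Ethereum settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["ethereum"]
    return {
        "rpc_host": section["rpc_host"],
        "rpc_port": section.getint("rpc_port"),  # converted to int
        "contract_address": section["contract_address"],
        "abi_file": section["abi_file"],
    }


settings = load_settings(EXAMPLE_CONFIG)
print(settings["rpc_host"], settings["rpc_port"])
```

The resulting dict could then be passed into the ContractHandler constructor instead of hard-coding host, port, and address.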

Interacting with the smart contract

Note: Before interacting with your own account you need to unlock it first. This is achieved in web3 via:

self.web3.personal.unlockAccount(your_ethereum_account, your_ethereum_password)

There are some standard web3 calls you can make, like getting the current balance of an account in wei:

wei_balance = self.web3.eth.getBalance(some_ethereum_address)
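Since the balance comes back in wei, it is usually converted before showing it to a user. One ether is 10^18 wei; the helper below is a plain python sketch of that conversion (the function name is my own, not part of web3).

```python
from decimal import Decimal

WEI_PER_ETHER = 10 ** 18  # 1 ether = 10^18 wei


def wei_to_ether(wei_balance):
    """Convert an integer wei amount to ether without float rounding."""
    return Decimal(wei_balance) / Decimal(WEI_PER_ETHER)


print(wei_to_ether(1500000000000000000))  # 1.5
```

Decimal is used instead of float because balances can be larger than what a float represents exactly.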

In case you want to call a function in the contract you can do this by calling the command as defined by the contract ABI. In our dInvest example there is a contract call which returns the blacklisted companies for our sustainable investment. It is callable with:

blacklist = self.contract.call().blackListCompanies()

More examples are available in the GitHub code.

Final note

As a final note, I would like to point out that there are other blockchain solutions like Hyperledger Fabric or Tendermint that aim to solve issues around compatibility with other programming languages, transaction throughput, etc. As they are permissioned blockchains I haven't given them a try yet, but they might be interesting to take a look at.

· 10 min read
Dominik Harz

As stated in the dInvest post series, the idea is to build a hedge fund in a blockchain. Due to computational limitations, it is not feasible to implement investment agents in the blockchain. In dInvest an investment agent should do the following: (1) get a list of all available financial assets to trade; (2) based on the data given (i.e. financial data and fundamentals data) make a recommendation which assets to buy; (3) keep track of which assets the agent is currently holding; and (4) send the recommended assets to buy to the blockchain. In this blog post I will cover the first three tasks. But what does a hedge fund actually do and what do financial investment strategies look like? I had taken some courses in my undergrad on international finance, but as a computer scientist I had to learn some new concepts while doing this project.

Financial investments can take different forms. A hedge fund allows multiple individuals to acquire a part of the pooled investment, which itself invests in publicly traded financial assets. It is measured according to its absolute returns [1]. A manager administers the investment of the hedge fund and takes the decisions to buy, sell, or hold financial assets to balance absolute returns and risks. A hedge fund may apply different strategies. Hedge funds can follow one specific strategy, mix established strategies, or create a fund out of hedge funds to diversify [2]. The majority of hedge fund strategies try to optimize the absolute return and risk based on financial indicators of the assets to be invested in. An overview of the different strategies that are currently applied by different hedge funds is provided by HFR, Inc in [3]. Measuring the performance of hedge funds as well as specific influence factors is a wide research area, which is covered in more detail in our report.

Measuring success of investments

A common standard in literature and the investment practice to compare hedge funds is based on their absolute returns, Sharpe ratio, alpha and beta as well as a comparison to benchmarks [2].

  • Returns: The return of the fund is influenced by how well the strategy is able to determine assets that are increasing (for long buys) or decreasing (for short buys) in value over time.
  • Sharpe: With the Sharpe ratio one can determine the return with respect to the risk involved. The Sharpe ratio is calculated by dividing the difference between the asset return and a benchmark return by the standard deviation of the asset return. The higher the Sharpe ratio, the higher the return with the same risk or the lower the risk with the same return.
  • Alpha: The alpha value expresses the risk-adjusted performance of the fund in comparison to a benchmark. An alpha of zero means exact performance as the benchmark. A positive value indicates outperforming the benchmark and a negative value represents falling behind the benchmark.
  • Beta: The beta value shows the volatility of the fund in comparison to a benchmark. The beta value baseline is one and represents the same volatility as the benchmark. A value below one indicates a lower volatility and consequently a value above one a higher volatility than the benchmark.
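The measures above can be sketched in a few lines of plain python. This is a simplified illustration and not the code used by zipline: it assumes equally spaced periodic returns, uses the benchmark as the reference return in the Sharpe ratio, ignores the risk-free rate in alpha, and the return series are invented.

```python
from statistics import mean, pstdev


def covariance(xs, ys):
    """Population covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def sharpe_ratio(returns, benchmark_returns):
    """Mean excess return over the benchmark divided by the fund's volatility."""
    excess = [r - b for r, b in zip(returns, benchmark_returns)]
    return mean(excess) / pstdev(returns)


def beta(returns, benchmark_returns):
    """Sensitivity of the fund to the benchmark; 1 means same volatility."""
    return covariance(returns, benchmark_returns) / covariance(
        benchmark_returns, benchmark_returns
    )


def alpha(returns, benchmark_returns):
    """Return not explained by the benchmark (risk-free rate assumed zero)."""
    return mean(returns) - beta(returns, benchmark_returns) * mean(benchmark_returns)


# Invented periodic returns: the fund moves exactly twice as much as the benchmark.
benchmark = [0.01, -0.02, 0.03, 0.00]
fund = [2 * r for r in benchmark]

print("beta:", round(beta(fund, benchmark), 6))
print("alpha:", round(alpha(fund, benchmark), 6))
print("sharpe:", round(sharpe_ratio(fund, benchmark), 6))
```

In this toy series the fund has a beta of two and an alpha of zero: it is twice as volatile as the benchmark but adds no return beyond what that extra exposure explains.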

Quantitative finance

To implement an investment strategy, certain rule sets are developed and applied. Quantitative finance "translates" economic models into the mathematical world and allows, for example, applying stochastic analysis to shares or calculating the value of derivatives [4]. Quantitative finance techniques express investment strategies in mathematical formulas and thus enable a translation into an algorithm [4]. When starting out with dInvest, we were looking into existing quantitative frameworks that (1) offer a simulation environment to test our algorithms without using actual money, (2) provide realistic conditions of the trading market, such as fees and process delays, (3) are able to trade shares and derivatives based on historical data to test our algorithm, (4) have little delay (lower than 10 minutes) in trading current assets, and (5) can be controlled using a programming language (such as Python, Java, or Scala). We evaluated several solutions and decided to use zipline, the open source implementation of the Quantopian framework.

Getting started with zipline

Zipline is based on python and is able to run with python 2 and 3. Quantopian has created a beginners tutorial on their website. Before going into the details of zipline, let us look into how to set it up locally. Note: I will be using python3 throughout this post. To install zipline you first need all the required C extensions. For Debian based distributions use the command shown below, otherwise there are some more details provided here.

sudo apt-get install libatlas-base-dev python-dev gfortran pkg-config libfreetype6-dev

I then created a virtual environment for zipline. If you have never set up a virtualenv, follow this tutorial. In case you want to use python3 like me, make sure to use the python3 executable. Also, when trying python2 I ran into a couple of issues, as Linux Mint uses older versions of python2 which seem to be incompatible with the required version of numpy. Once you have set up your virtualenv, run pip install zipline from the terminal with your virtualenv activated. This will install all the required packages including numpy and will most likely take some time.

The first investment algorithm

In the /zipline/examples folder you will find some example algorithms to try out. In their beginners tutorial the zipline authors describe some of the algorithms, how they are generally structured and how to execute them. I will go into more detail about having a value investment algorithm and not take one of the example algorithms.

The algorithm to be implemented is inspired by Benjamin Graham [6]:

  1. Select shares and derivatives of companies with a market capitalization of at least 100 million dollars
  2. Determine the SIC code of each asset [5] and exclude the ones contained in the blacklist from the hedge fund
  3. Based on the sectors of the companies find the two sectors with the price of assets per earning ratio (PE ratio)
  4. Invest every month in all the companies listed in the two sectors
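The four steps can be sketched in plain python, independent of zipline. Note the assumptions: the list above does not say how the two sectors are ranked, so this sketch picks the sectors with the lowest average PE ratio (a common value-investing choice), and the companies, SIC codes, and numbers are invented.

```python
from collections import defaultdict

MIN_MARKET_CAP = 100_000_000  # step 1: at least 100 million dollars
NUM_SECTORS = 2               # step 3: number of sectors to invest in


def select_assets(assets, blacklist):
    """Return the symbols to invest in, given asset dicts and blacklisted SIC codes."""
    # Steps 1 and 2: filter by market capitalization and sector blacklist.
    eligible = [
        a for a in assets
        if a["market_cap"] >= MIN_MARKET_CAP and a["sic"] not in blacklist
    ]

    # Group the remaining companies by their two-digit sector code.
    by_sector = defaultdict(list)
    for asset in eligible:
        by_sector[asset["sic"]].append(asset)

    # Step 3 (assumption): rank sectors by their average PE ratio, lowest first.
    average_pe = {
        sic: sum(a["pe"] for a in group) / len(group)
        for sic, group in by_sector.items()
    }
    chosen = sorted(average_pe, key=average_pe.get)[:NUM_SECTORS]

    # Step 4: invest in every company of the chosen sectors.
    return sorted(a["symbol"] for a in eligible if a["sic"] in chosen)


# Invented example universe.
universe = [
    {"symbol": "AAA", "sic": 21, "market_cap": 5_000_000_000, "pe": 8},
    {"symbol": "BBB", "sic": 35, "market_cap": 2_000_000_000, "pe": 12},
    {"symbol": "CCC", "sic": 35, "market_cap": 50_000_000, "pe": 3},
    {"symbol": "DDD", "sic": 73, "market_cap": 1_000_000_000, "pe": 30},
    {"symbol": "EEE", "sic": 49, "market_cap": 300_000_000, "pe": 15},
    {"symbol": "FFF", "sic": 49, "market_cap": 400_000_000, "pe": 11},
]

print(select_assets(universe, blacklist={21}))  # ['BBB', 'EEE', 'FFF']
```

With sector 21 blacklisted and CCC dropped for its small market capitalization, sectors 35 and 49 have the lowest average PE ratios, so all their remaining companies are selected.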

Let's take a look at our algorithm. It is basically taken from the web-based version of zipline called Quantopian and can be found here. I have tried to include quite a few comments, so if you are familiar with python then the code should be quite readable. Zipline does not include the full functionality and data to realize the investment algorithm. The market capitalization data, sector codes, and PE ratio are not accessible from zipline. For a quick overview of the current code and the changes I made, I will go through the main functions. Also, please note that I only used free sources of financial data. There seem to be a lot of other sources available, but they have their price tag.

initialize(context) Defines the number of sectors we are interested in and also schedules the execution of our algorithm for back-testing.

rebalance(context, data) This essentially takes the selected assets and distributes them evenly across our portfolio.

get_fundamentals(context, data) In Quantopian this is fairly easy, since the Morningstar fundamentals dataset is included. However, in zipline I had to find a workaround. Quandl offers the SF0 fundamentals for free: www.quandl.com/data/SF0-Free-US-Fundamentals-Data. To get a full view of the data, I chose to download the whole dataset as a CSV file and automated the process. An example of how to do this can be found in the TradeHandler.getData(self) function here.

get_sectors(key) Again, this is included in Quantopian, but needs a manual workaround. To get the sector codes it is quite simple. You need to download a txt file, which includes the sector codes and the function will read them into a dict.

get_sector_code() Also, this part is included in Quantopian. Here we download a JSON file to get a mapping of sector codes to the financial assets we are trading.
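The even distribution done in rebalance can be illustrated without zipline. The helper below is a hypothetical sketch, not the function from the repository: it computes a target weight per asset, with zero for assets that are held but no longer selected. Inside zipline, each target could then be handed to order_target_percent.

```python
def equal_weights(selected, currently_held):
    """Map each asset to its target portfolio share.

    Assets that are held but no longer selected get a target of 0.0,
    which means the position should be closed.
    """
    targets = {asset: 0.0 for asset in currently_held}
    if selected:
        weight = 1.0 / len(selected)
        for asset in selected:
            targets[asset] = weight
    return targets


# Invented symbols: ZZZ was bought last month but is no longer selected.
targets = equal_weights(["BBB", "EEE", "FFF", "GGG"], currently_held=["BBB", "ZZZ"])
for asset, weight in sorted(targets.items()):
    print(asset, weight)
```

Computing the targets first and ordering afterwards keeps the selection logic testable without running a back-test.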

To analyse the performance of the investment algorithm, the absolute returns, Sharpe, alpha, and beta for the period of January 1, 2009 to November 15, 2016 are compared. The investment algorithm uses the Quandl financial data, i.e. opening and closing prices of specific assets during that period [8]. The algorithm is executed multiple times: first with no sectors excluded, and then with an exclusion list based on sectors. The defense, beer and alcohol, tobacco, and coal industries are excluded based on ESG criteria and common practices [9].

The first graph in the figure below presents the returns achieved by the unrestricted algorithm, the algorithm with single excluded sectors, and the benchmark return. These returns show that the unrestricted and single-exclusion algorithms have an overall higher profit than the benchmark. The second graph in the figure displays combinations of two excluded sectors and the benchmark return, while the third graph shows the combinations of three and all four sectors excluded.

Single exclusions and combinations of exclusions consistently perform better than the benchmark until the beginning of 2012. Then the unrestricted algorithm, the single exclusions of defense and of alcohol, and the combinations of tobacco and coal, defense and alcohol, and defense and coal drop until they recover in 2014. The other single and combined exclusions do not show this drop, but also have a strong increase in returns in 2014. After that, all variations show high volatility.

The drop in 2012 in the unrestricted algorithm is caused by three of the sectors it is investing in. When tobacco is restricted, there is no strong drop in 2012. However, unless defense and/or coal are restricted as well, the algorithm selects one of those two industries instead and another drop occurs. In the third graph of the figure these exclusions are combined and no significant drop occurs; however, the volatility remains.

There is further analysis and discussion of the results in our report.

References

  1. D. Harper. (2016) Hedge funds hunt for upside, regardless of the market. [Online]. Available: http://www.investopedia.com/articles/03/112603.asp
  2. M. Agarwal, Hedge Fund Strategies. John Wiley & Sons, Inc., 2009, pp. 45–55. ISBN 9781118258187. [Online]. Available: http://dx.doi.org/10.1002/9781118258187.ch4
  3. Hedge Fund Research. (2016) HFR hedge fund strategy classification system. [Online]. Available: https://www.hedgefundresearch.com/hfr-hedge-fund-strategy-classification-system
  4. P. Wilmott, Paul Wilmott on quantitative finance. John Wiley & Sons, 2013.
  5. U.S.D. of Labor. (2016) SIC division structure. [Online]. Available: https://www.osha.gov/pls/imis/sic_manual.html
  6. B. Graham and D. Dodd, Security Analysis: Sixth Edition, ser. Security Analysis Prior Editions. McGraw-Hill Education, 2008.
  7. Quantopian. (2016) zipline: Pythonic algorithmic trading library. [Online]. Available: https://github.com/quantopian/zipline/
  8. Quandl. (2016) Quandl: financial database. [Online]. Available: https://www.quandl.com/browse?idx=database-browser
  9. J. R. Evans and D. Peiris, “The relationship between environmental social governance factors and stock returns,” UNSW Australian School of Business Research Paper No. 2010ACTL02, 2010.

· 13 min read

Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a way of differentiating humans and machines and was coined by von Ahn, Blum, Hopper, and Langford [5]. The core idea is that reading distorted letters, numbers, or images is achievable for a human but very hard or impossible for a computer. CAPTCHAs might look like the one below. Most likely the reader has already seen one, when trying to register at a website or write a comment online.

Penguin-Pal_Captcha

There are several use cases for CAPTCHAs, including the ones presented in [6]: preventing comment spam, protecting website registration, protecting e-mail addresses from scrapers, protecting online polls, preventing dictionary attacks, and blocking or hindering search engine bots.

CAPTCHAs do not guarantee protection in every one of these cases, as there are known attack vectors. These include cheap or unwitting human labor, insecure implementations, and machine learning based attacks. We will not go into detail on insecure implementations, as the focus of this article is on deep learning based approaches.

Human-based CAPTCHA breaking

Out of curiosity and to compare results achieved by machine learning approaches, we take a look at the human-based approach. For example, BypassCAPTCHA offers breaking CAPTCHAs with cheap human labor in packages (e.g. 20,000 CAPTCHAs for $130). There are also other services including Image Typerz, ExpertDecoders, and 9kw.eu, as well as hybrid solutions that use both OCR and human labor like DeathByCAPTCHA. These vendors list the following accuracies and response times (averages):

| Service | Accuracy (daily average) | Response time (daily average) |
| --- | --- | --- |
| BypassCAPTCHA | N/A | N/A |
| Image Typerz | 95% | 10+ sec |
| ExpertDecoders | 85% | 12 sec |
| CAPTCHABOSS (premium version of ExpertDecoders) | 99% | 8 sec |
| 9kw.eu | N/A | 30 sec |
| DeathByCAPTCHA | 96.8% | 10 sec |

The values are advertised and self-reported. We did not verify the stated numbers, but they give an orientation on human performance and serve as a reference for our machine learning algorithms.

Learning-based CAPTCHA breaking

CAPTCHAs are based on unsolved or hard AI problems. However, with the progress of AI techniques and computing power, sequences of characters or CAPTCHAs can be recognized using deep learning techniques, as shown by Goodfellow et al. in [1], Hong et al. in [2], Bursztein et al. in [3] and [7], and Stark et al. in [4]. Goodfellow et al. predict numbers from Google Street View images directly (without pre-processing) utilizing a CNN. They make use of DistBelief by Dean et al. to scale the learning to multiple computers and to avoid out of memory issues [1]. This technique was later used to solve CAPTCHAs, whereby the researchers achieved an accuracy of up to 99.8%. Hong et al. pre-process CAPTCHAs to rotate them and segment the characters. Afterwards they apply a CNN with three convolutional layers and two fully connected layers [2]. Bursztein et al. use pre-processing, segmentation, and recognition techniques (based on KNN) and later on various CNNs to detect CAPTCHAs from multiple websites including Baidu, Wikipedia, reCAPTCHA, and Yahoo [3],[7]. Stark et al. researched a way of detecting CAPTCHAs with limited training data. They use a technique called Active Learning to feed the network with new training data, where the added data has a high classification uncertainty, to improve the overall performance [4]. The table below gives an overview of the reported accuracies in the different papers and blog posts.

| Researcher | Dataset | Technique | Accuracy (maximum) | Reference |
| --- | --- | --- | --- | --- |
| Goodfellow et al. | Google Street View image files | CNN with DistBelief | 96% | [1] |
| Hong et al. | Microsoft CAPTCHAs | Preprocessing, segmentation and CNN | 57% | [2] |
| Stark et al. | Cool PHP CAPTCHA generated CAPTCHAs | CNN with Active Deep Learning | 90% | [4] |
| Bursztein et al. | Baidu, CNN, eBay, ReCAPTCHA, Wikipedia, Yahoo CAPTCHAs | Reinforcement Learning, k-Nearest Neighbour | 54% (on Baidu) | [7] |
| Bursztein | Simple CAPTCHA | CNN | 92% | [3] |
| Bursztein | Simple CAPTCHA | RNN | 96% | [8] |

Current state of CAPTCHAs

Google introduced NoCAPTCHA in December 2014. It brings multiple new features, including evaluation based on cookies, movement of the mouse, and recognition of multiple images. Google has also announced an invisible CAPTCHA to get rid of the checkbox.

The previous version of reCAPTCHA was very popular on many websites. It included typically two words with rotation and had an audio option. Further CAPTCHA techniques can include simple logic or math questions, image recognition, recognition of friends (social CAPTCHA), or user interaction (like playing a game) [9].

Our objectives and motivation

The aim of the project is to break CAPTCHAs using deep learning technologies without pre-segmentation. Initially we focus on simple CAPTCHAs to evaluate the performance and then move on to more complex CAPTCHAs. The training dataset is generated with an open source CAPTCHA generation software. TensorFlow is used to create and train a neural network.

Creating the datasets

We are generating the datasets using a Java based CAPTCHA generator (SimpleCAPTCHA). We have created the following datasets.

| Description | Size | Training samples | Test samples |
| --- | --- | --- | --- |
| Digits only | 38 MB | 9502 | 100 |
| Digits and characters | 197 MB | 49796 | 100 |
| Digits and characters with rotation | 39 MB | 10000 | 100 |
| Digits and characters with rotation | 198 MB | 49782 | 500 |
| Digits and characters with rotation | 777 MB | 196926 | 500 |

Each dataset consists of JPEG images, each containing a CAPTCHA with five characters. The characters are lowercase letters (a-z) or numbers (0-9). We used the fonts "Arial" and "Courier" with noise. An example of the created CAPTCHAs is displayed below. Our intention was to mimic the CAPTCHAs created by Microsoft. We have extended the SimpleCAPTCHA library to add character rotation and outlines in order to achieve the same look as Microsoft CAPTCHAs.

54563 5p23r ycn2m

Generated CAPTCHAs are 152x80 greyscale images. This resolution was chosen because it is small enough to reduce the memory footprint when training the CNN and still large enough to recognize the CAPTCHA easily.

Deep CNN model

Based on the research in [1], [3] and [4] we use a deep CNN with three convolutional layers with ReLU activation and two fully connected layers to solve the CAPTCHAs. Each digit is represented by 36 neurons in the output layer. The three convolutional layers have the sizes of 32, 64, and 128. A 5x5 filter size was used in all layers. After each convolutional layer there is a max pooling of 2. After the last convolutional layer, there is a fully connected layer with ReLU of size 1024 and finally another fully connected layer that has an output size of 180. In the ReLU layers, a dropout of 0.75 is applied.
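To get a feeling for the size of this network, the layer dimensions can be worked out by hand. The sketch below assumes that the convolutions use "same" padding (so only the pooling changes the image size) and that each pooling step halves width and height, rounding up; neither detail is stated above, so treat the numbers as an estimate.

```python
import math

WIDTH, HEIGHT = 152, 80         # input image size, one greyscale channel
CONV_CHANNELS = [32, 64, 128]   # sizes of the three convolutional layers
FILTER_SIZE = 5                 # 5x5 filters
FC_SIZE = 1024                  # first fully connected layer
OUTPUT_SIZE = 180               # 5 characters x 36 classes


def pooled(size, layers, pool=2):
    """Size of one image dimension after `layers` max pooling steps."""
    for _ in range(layers):
        size = math.ceil(size / pool)
    return size


final_width = pooled(WIDTH, len(CONV_CHANNELS))    # 152 -> 76 -> 38 -> 19
final_height = pooled(HEIGHT, len(CONV_CHANNELS))  # 80 -> 40 -> 20 -> 10
flattened = final_width * final_height * CONV_CHANNELS[-1]

# Weights plus biases per layer.
conv_params = []
in_channels = 1
for out_channels in CONV_CHANNELS:
    conv_params.append(FILTER_SIZE * FILTER_SIZE * in_channels * out_channels + out_channels)
    in_channels = out_channels
fc_params = flattened * FC_SIZE + FC_SIZE
output_params = FC_SIZE * OUTPUT_SIZE + OUTPUT_SIZE

print("feature map:", final_width, "x", final_height)  # 19 x 10
print("flattened inputs:", flattened)                   # 24320
print("conv parameters:", conv_params)
print("fully connected parameters:", fc_params, output_params)
```

Under these assumptions almost all parameters sit in the first fully connected layer, which is one reason why memory becomes a concern on smaller GPUs.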

In the output layer, the digits 0-9 are represented by neurons 1 to 10 and the characters a to z by neurons 11 to 36. Therefore, there are 5 x 36 neurons which identify the CAPTCHA. The network outputs the predictions for all 5 characters, and each character is determined by the maximum probability among its 36 neurons. We have set the learning rate to 0.001.
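The mapping between a five-character CAPTCHA and the 180 output neurons can be written down in plain python. This is an illustrative sketch with helper names of my own (encode, decode), not the TensorFlow code of the project; positions are counted from zero here, so the digit 0 is index 0 and the letter a is index 10.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 classes: 0-9, then a-z
CAPTCHA_LENGTH = 5
CLASSES = len(ALPHABET)


def encode(text):
    """Turn a CAPTCHA string into a 180-element one-hot label vector."""
    vector = [0.0] * (CAPTCHA_LENGTH * CLASSES)
    for position, char in enumerate(text):
        vector[position * CLASSES + ALPHABET.index(char)] = 1.0
    return vector


def decode(outputs):
    """Pick, for each of the 5 positions, the class with the highest score."""
    chars = []
    for position in range(CAPTCHA_LENGTH):
        group = outputs[position * CLASSES:(position + 1) * CLASSES]
        best = max(range(CLASSES), key=group.__getitem__)
        chars.append(ALPHABET[best])
    return "".join(chars)


label = encode("5p23r")
print(len(label))     # 180
print(decode(label))  # 5p23r
```

The same decode step works on the raw network output, because only the position of the largest value within each group of 36 matters.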

CNN

Results and discussion

First, we trained the CNN with 10000 five letter and digit CAPTCHAs without rotation on a GTX660M. We had 100 batches with a batch size of 100 and ran it for 20 epochs. The hyperparameters were set as described in the previous section. The figure below shows that the network did not perform well with these settings. We then increased the training size to 50000 CAPTCHAs, but the results stayed the same. We then tried with 10000 simplified CAPTCHAs with only five digits without rotation. However, this still did not improve our situation. We noted that the loss function reduced quite quickly and stayed constant. We hence introduced another convolutional layer to the network to allow it to further differentiate the digits. Again, this resulted in almost the same result.

DigitsOnly660M
CNN with three conv. layers and two fully connected layers accuracy of CAPTCHAs with five digits or lowercase letters without rotation. Training in 100 batches and 10000 training samples.

These results match the ones presented in [4]. The authors then introduce Active Learning to circumvent the problem. In [1] a larger amount of samples is used. However, we do not have sufficient computing power available to use millions of images as training data. Also, in [3] the batch size is larger, resulting in a considerably higher accuracy. We decided to change our batch size, but required a more powerful GPU for that. Hence, we used an Nvidia Tesla K80 from AWS to conduct our training. We also changed the CNN back to three conv. layers and two fully connected layers. On the simple case with five digit CAPTCHAs without rotation we used 39250 CAPTCHAs in 157 batches and 10 epochs. We conducted testing with a very small dataset of 100 CAPTCHAs. The results did improve considerably as shown in the figure below. We managed to achieve a best training accuracy of 94.9% and a test accuracy of 99.2%. The very high test accuracy is quite likely also caused by the small number of test items. However, running a K80 in AWS is quite costly and therefore we kept the test set to a minimum also in the upcoming experiments.

DigitsOnly
CNN with three conv. layers and two fully connected layers accuracy of CAPTCHAs with five digits without rotation. Training in 157 batches, 39250 training samples, and testing with 100 CAPTCHAs.

We then tried with a bit more complex CAPTCHAs with digits and lowercase characters. We used 49750 training and 100 test CAPTCHAs with the same CNN used in the simple case above. The figure below presents our results and shows that we can achieve a training accuracy of 65% and a test accuracy above 80% in these cases. We stopped the CNN prematurely after 10 epochs to try more complex use cases.

DigitsChar
CNN with three conv. layers and two fully connected layers accuracy of CAPTCHAs with five digits or lowercase letters without rotation. Training in 199 batches, 49750 training samples, and testing with 500 CAPTCHAs.

Next, we added rotation to our CAPTCHAs and trained the CNN on 49782 samples. This resulted in almost random results with an accuracy of around 10%. We thus increased the samples to 196926 and tested it on 500. Again, we kept the same hyperparameters as presented in the model section. This time we trained for 15 epochs, to prevent premature interruption of the training. Our results are presented in the figure below. With the increased training size we achieve a training accuracy of 97.1% and a test accuracy of 99.5%. However, this is again on a very small test set.

DigitsCharRot
CNN with three conv. layers and two fully connected layers accuracy of CAPTCHAs with five digits or lowercase letters with rotation. Training in 787 batches, 196926 training samples, and testing with 500 CAPTCHAs.

From the tests conducted above, we have a few examples to show correct and false predictions.

| Correctly classified | Incorrectly classified |
| --- | --- |
| Prediction: 54563 Image: 54563 | Prediction: 82298 Image: 82290 |
| Prediction: grh56 Image: grh56 | Prediction: k76ap Image: h76ap |
| Prediction: fb2x4 Image: fb2x4 | Prediction: fffgr Image: k76ap |
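Examples like these also show that there are two ways to count accuracy: per whole CAPTCHA or per character. The sketch below uses the first five prediction and image pairs from the examples above; it only illustrates the difference and is not necessarily how the accuracies reported in this post were computed.

```python
def captcha_accuracy(pairs):
    """Share of CAPTCHAs where all characters were predicted correctly."""
    exact = sum(1 for predicted, actual in pairs if predicted == actual)
    return exact / len(pairs)


def character_accuracy(pairs):
    """Share of single characters that were predicted correctly."""
    total = sum(len(actual) for _, actual in pairs)
    correct = sum(
        p == a
        for predicted, actual in pairs
        for p, a in zip(predicted, actual)
    )
    return correct / total


# (prediction, text shown in the image)
pairs = [
    ("54563", "54563"),
    ("grh56", "grh56"),
    ("fb2x4", "fb2x4"),
    ("82298", "82290"),
    ("k76ap", "h76ap"),
]

print(captcha_accuracy(pairs))    # 0.6
print(character_accuracy(pairs))  # 0.92
```

A single wrong character makes the whole CAPTCHA count as a miss, so the per-CAPTCHA number is always the stricter of the two.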

Conclusion

With this project we have shown that it is possible to create large enough datasets automatically to mimic certain CAPTCHAs (i.e. Microsoft). This provides large labeled datasets, which serve as a foundation to train neural networks. We have chosen to use two different CNNs with three and four convolutional layers and two fully connected layers. Our experiments show, however, that adding more layers to the network did not increase the accuracy.

We noticed that optimizing a CNN can be cumbersome. While running the CNN on a GTX 660M, we were not able to get satisfying results. Most likely we would have needed more training time on the batches to receive better results. When we switched to a Tesla K80 we managed to train the network with larger amounts of data and a larger batch size. This resulted in higher accuracy on the simple and more complex datasets. We realized that memory poses a quite severe limitation on applying large scale machine learning.

Our approach based on a CNN is limited to CAPTCHAs with exactly the length defined in the network. Hence, classifying CAPTCHAs with any other length than five would fail. As an alternative, an RNN could be used to resolve this issue. In [8] the use of an RNN to break CAPTCHAs is discussed with fairly good results. However, this approach also requires a powerful GPU. Moreover, using a CNN requires large datasets to be trained on. For a combination of digits, characters, and rotation we required a dataset of around 200000 CAPTCHAs (~780MB). On small GPUs such datasets either cause out of memory errors or require quite a long training time. Even with the Tesla K80 the training takes around 2 hours and 30 minutes.

References

  1. Goodfellow, Ian J., et al. "Multi-digit number recognition from street view imagery using deep convolutional neural networks." arXiv preprint arXiv:1312.6082 (2013).
  2. Hong, Colin et al. "Breaking Microsoft’s CAPTCHA." (2015).
  3. Using deep learning to break a CAPTCHA system in Deep Learning. 3 Jan. 2016, https://deepmlblog.wordpress.com/2016/01/03/how-to-break-a-CAPTCHA-system/. Accessed 6 Dec. 2016.
  4. Stark, Fabian, et al. "CAPTCHA Recognition with Active Deep Learning." Workshop New Challenges in Neural Computation 2015. 2015.
  5. Von Ahn, Luis, et al. "CAPTCHA: Using hard AI problems for security." International Conference on the Theory and Applications of Cryptographic Techniques. Springer Berlin Heidelberg, 2003.
  6. "CAPTCHA: Telling Humans and Computers Apart Automatically" 2010, http://www.CAPTCHA.net/. Accessed 7 Jan. 2017.
  7. Elie Bursztein et al., "The end is nigh: generic solving of text-based CAPTCHAs". Proceedings of the 8th USENIX conference on Offensive Technologies, p.3-3, August 19, 2014, San Diego, CA
  8. Recurrent neural networks for decoding CAPTCHAs in Deep Learning. 12 Jan. 2016, https://deepmlblog.wordpress.com/2016/01/12/recurrent-neural-networks-for-decoding-CAPTCHAs/. Accessed 9 Jan. 2017.
  9. CAPTCHA Alternatives and thoughts. 15 Dec. 2015, https://www.w3.org/WAI/GL/wiki/CAPTCHA_Alternatives_and_thoughts. Accessed 9 Jan. 2017.