One of the business problems we encountered a while ago involved identifying customers who are likely to subscribe to a company’s next best product. Neural networks can help us determine that.
There was a tremendous amount of customer data, with a plethora of information hidden in it, both useful and not useful.
As always, we took the BADIR approach to solve the problem. We had already formed a Business Question, the first step of the process. The next steps were the Analysis Plan and Data Collection.
After these steps, we had a good set of data, both extracted and engineered, that our hypotheses suggested would be relevant in identifying these customers.
Since we had over a million rows of data, and neural networks love massive amounts of data, we decided to get our hands dirty by training a deep neural network.
What are neural networks?
So what are Neural Networks in the first place?
The biological inspiration for neural networks comes from the brain, specifically from the connections between neurons. This structure inspires the building blocks of neural networks. A simple neural network unit, called a perceptron, makes this clearer.
Three things are happening in a perceptron:
- All the inputs are multiplied by their corresponding weights (weights are also referred to as parameters): [latex]$x_i \times w_{ij}$[/latex]
- Then all the products of input and weight pairs are summed up (the transfer function): [latex]$\sum (x_i \times w_{ij})$[/latex]
- The sum is transformed by a function called the activation function, and the result is passed on: [latex]$f(\sum (x_i \times w_{ij}) + \theta)$[/latex]
Here θ is called the threshold, or more commonly, the bias term. The neural networks that we use involve multiple such perceptron units. Each perceptron unit corresponds to a node, and several nodes make up a layer in our system.
If we keep increasing layers of nodes, the network grows in depth. Hence the term deep neural network.
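The three steps of a perceptron can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid activation, the function names `perceptron` and `layer`, and the example numbers are assumptions made for this sketch, not something taken from the project described here.

```python
import math


def sigmoid(z):
    """Logistic activation, used here only as one example choice of f."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(inputs, weights, bias, activation=sigmoid):
    """Compute f(sum(x_i * w_i) + bias) for a single unit."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Step 1: multiply each input by its corresponding weight.
    products = [x * w for x, w in zip(inputs, weights)]
    # Step 2: sum the products (the transfer function).
    total = sum(products)
    # Step 3: add the bias term and apply the activation function.
    return activation(total + bias)


def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is several units (nodes) that all read the same inputs."""
    return [
        perceptron(inputs, weights, bias, activation)
        for weights, bias in zip(weight_rows, biases)
    ]


# Illustrative values only (hypothetical inputs and weights).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]
single_output = perceptron(x, w, bias=0.0)
layer_output = layer(x, [w, w], [0.0, 0.0])
```

Stacking calls to `layer`, with each layer's output used as the next layer's input, is what makes a network grow in depth.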
What is the right architecture?
Well, there is no one correct answer. It all depends on the patterns in the data and on finding the architecture that gives the best results. Still, you need to start somewhere, so we did some brainstorming and came up with a few suggestions that can provide guidance.
A few pointers to keep in mind before you start building your neural network:
- In practice, increasing the depth (adding more layers) often works better than adding more nodes to a layer
- Start with a simple architecture, then increase the complexity if you think the data patterns are not being captured adequately
- Try to ensure that the ratio of the number of rows to the number of parameters is more than 50. Therefore, don’t build a complex architecture if you don’t have a large dataset
- In most architectures, the number of nodes decreases from one layer to the next
For example, if you have a dataset with about 100,000 rows and about 40 input columns, you could build a neural network with two hidden layers: the first with 30 nodes and the second with 20 nodes. Assuming a single output node, this network has (40 × 30 + 30) + (30 × 20 + 20) + (20 × 1 + 1) = 1,871 parameters, so the ratio of the number of rows to the number of parameters is about 53, which is above 50.
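The parameter count in this example can be checked with a small helper. The sketch below assumes a fully connected network in which all 40 columns are inputs, every node has one bias term, and there is a single output node; the function name `count_parameters` is made up for illustration.

```python
def count_parameters(n_inputs, layer_sizes):
    """Count weights and biases in a fully connected feed-forward network.

    layer_sizes lists the node count of every layer after the input,
    including the output layer.
    """
    total = 0
    fan_in = n_inputs
    for n_nodes in layer_sizes:
        # One weight per incoming connection, plus one bias per node.
        total += fan_in * n_nodes + n_nodes
        fan_in = n_nodes
    return total


n_rows = 100_000
# 40 inputs, hidden layers of 30 and 20 nodes, 1 output node (assumed).
n_params = count_parameters(40, [30, 20, 1])
rows_per_param = n_rows / n_params

print(n_params, round(rows_per_param, 1))  # prints: 1871 53.4
```

The same helper makes it easy to test a candidate architecture against the rows-to-parameters guideline before training anything.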