Alt
Fluxible
Reflux
Redux
27 March 2017
26 March 2017
IT Skills Shortage
Many news sources claim there is a skills shortage in various sectors of IT. On closer inspection, this is really not the case at all. There is a huge talent pool of candidates eager and willing to work; in most cases, every job attracts hundreds of applicants. Often organizations are simply filtering for cultural fit: a candidate may have the skills but not be someone the employer feels they would get along with. That does not mean there is a skills shortage. It means the organization is discriminating on attributes that are not even relevant to the role or the skills. Outsourcing is also a key factor, buying in talent on the cheap while leaving local talent behind. Many human resources evaluations concede that the larger part of an interview is about the person rather than the quantifiable skills they can bring to the team. It is more about likability, which is itself a form of discrimination: people want to work with people who are like them. None of this implies a shortage of skills in the market. There are plenty of women who are very good at Big Data and Machine Learning, but does one see them on the team? Employers are waving a wand of cultural nonsense and overlooking what really matters: the candidate's ability to perform on their skills and to innovate. Large organizations like Twitter, Facebook, IBM, Google, Microsoft, and Amazon harp on about a skills shortage, yet they are not focused on diversity and inclusion within their teams, and they buy in cheap outsourcing contracts from places like India while ignoring the thriving local talent pools in their own regions.
We need to recruit more women, and more generally people of all types and backgrounds, into IT, supporting the local communities where multinationals are based. That starts with each individual being well positioned to evaluate applications on merit with objectivity. If people hire people, and people are generally not objective, then perhaps artificial intelligence needs to take over some of the responsibilities of the human resources function in organizations.
Labels:
artificial intelligence
,
big data
,
data science
,
economics
,
machine learning
,
society
,
software engineering
25 March 2017
Future of Software Engineering
In the world of Big Data we essentially have four separate encompassing roles: Big Data Engineer, Data Scientist, Data Architect, and Data Analyst. In most companies the roles overlap considerably. However, most of these role types are really specializations of one main role: the Computer Scientist, or more simply the Software Engineer. A Big Data Engineer is ordinarily a Software Engineer who is able to look at the big picture, which implies that the Data Scientist, Data Analyst, and Data Architect are really subsets of the Big Data Engineer. In future, we will witness the convergence of these roles into one, where many of the separate responsibilities merge into the Software Engineer role. As memory capacities grow to meet Big Data requirements, Software Engineers will take on a broader scope of work, and the complexity once split across disparate roles will become more manageable and accessible to a standard engineering team. Hiring a Data Scientist with a PhD will no longer be the conservative default for organizations. Many Software Engineers can do the work of Data Scientists as well as Big Data Engineers while still looking at the big picture from a business standpoint. In almost all cases, algorithms can be taught, approaches can be taught, and skills can be relearnt; what teams really require is a flexible mindset that adapts to change. Data Scientists present several limitations for organizations: they tend to have questionable programming skills in R and/or Python, are familiar with only a specific set of statistical and/or machine learning approaches, and often apply imperfect or overfitted models to a small subset of data, models which are not always adaptable to the realistic Big Data requirements of a business.
They also tend to have a hacking mindset and a strong dependency on Big Data Engineers to provide the backbone for ingesting data sources, refactoring, feature engineering, data pipelining, and model scaling, and to ease their model-building process. That is almost three quarters of the Big Data work. Software Engineers, by contrast, can ordinarily multitask across the entire stack as well as practically apply data science concepts. In future, the roles of Data Analyst, Data Scientist, Data Architect, and Big Data Engineer will no longer be necessary as separate recruitment targets; they will eventually be eclipsed by the standard Software Engineer. It is not so difficult to attend conferences, read journal papers, conduct research, and build models on data; one really only needs the right business mindset or domain knowledge. Learning and applying new approaches is the continuous process by which Software Engineers keep their skills moving with the times. Organizations will demand more of such hybrid Software Engineers, who can adapt to changes guided by the business need and the data landscape. Applied Artificial Intelligence will mean that even building generalizable models on data becomes a simpler engineering process that does not require specialist skills.
Cybersecurity with Spot
Labels:
big data
,
data science
,
hadoop
,
information retrieval
,
intelligent web
,
internet
,
machine learning
,
security
,
spark
Swarm Bandit Robotics
Border control is an issue for most countries, whether landlocked or with direct access to waterways. Controlling every aspect of a border is not humanly possible in most cases, but AI can help cover a larger distance with a greater amount of force and attributable control. Artificial Intelligence, in the form of swarm intelligence and reinforcement learning, can create an effective force for border security and control. Building a great wall is pointless and overly expensive; eventually the wall comes down. But an army of drones and robots, if well engineered, becomes a force to be reckoned with, adaptable to patterns of attack and infiltration. The ultimate goal is an autonomous army of swarm robots that can apply tactical understanding and manoeuvrability across the entire map of a country, advanced in strategic alliance and combat when necessary, to protect the sovereignty of a nation and its people.
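The reinforcement learning component mentioned above can be illustrated with a minimal epsilon-greedy multi-armed bandit, the simplest setting in which an agent (say, a patrol drone choosing among candidate sectors) learns which action pays off. This is a generic sketch, not a border-control system; the arms, reward values, and epsilon are invented for illustration.

```python
import random

def epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=42):
    """Learn action-value estimates for each 'arm' (e.g. a patrol sector).

    rewards: list of callables, one per arm, each returning a numeric reward.
    Returns the estimated value of each arm after `steps` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(rewards)
    counts = [0] * n_arms    # times each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:  # explore a random arm
            arm = rng.randrange(n_arms)
        else:                       # exploit the best current estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        r = rewards[arm]()
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
    return values

# Three hypothetical patrol sectors with fixed pay-offs; arm 1 is best.
estimates = epsilon_greedy([lambda: 0.2, lambda: 0.8, lambda: 0.5])
best = estimates.index(max(estimates))
```

With deterministic rewards the running means converge to the true values, so the agent identifies arm 1 as best; swarm settings extend this to many agents sharing or partitioning the estimates.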
Military Swarming
Kilobots
pentagon drone swarm autonomous war machines
24 March 2017
Cultural Fit
Cultural fit is a shield many organizations use to cover up a set of attributes they look for that are intrinsic to their company. But if one inspects the concept closely, it becomes fairly transparent that the notion is really a form of discrimination, and technically such biases are illegal in the workplace, not to mention in secular societies. Could this be one reason why diversity is still an issue in some organizations, and why there are so few women in technology roles today? Two attributes drive a business: performance and innovation. It is really these two attributes that should drive the recruitment process. Yet many companies are still hung up on the idea of cultural fit; they end up recruiting individuals who are too similar to each other, possibly treating their roles as a daily grind where innovation gets inhibited and the enjoyment of work becomes a foregone conclusion. The glue holding a team together eventually drifts apart from frustration, and an exceptionally high turnover of employees emerges within the organizational dynamics, all with the view of maintaining some abstract concept of a culture. No two people are ever alike. Such concepts of cultural fit are implemented and developed by human resources, turning workers into an almost factory concept. But if one inspects the human resources function closely, do they really understand how to recruit the right people, if all they do is hunt for keywords on a CV and check whether the candidate acts a certain way? In most cases, the entire human resources function at organizations could be replaced by artificial intelligence and data science. Many companies have already given up on the nonsensical idea of cultural fit, favoring diversity and inclusion instead.
Perhaps it is time organizations moved on from nonsensical approaches to recruitment, developed better, more objective means of evaluation, and focused on what really matters as relevant to the role. Performance is a major driver for business. What does it matter whether someone fits in or not? They could just as easily be lazy at work, bored out of their wits with the repetitive grind, and still fit comfortably into the bubble of a large organization's dynamics without bringing any quantifiable output to the business, other than being another salaried employee who could become a victim of redundancy down the road. We do need to inspire, encourage, and be more accepting of people who are not like us, especially for a balanced workplace and for maintaining the healthy, inspiring energy needed for innovation. All of this is important for the survival of a business and for continuously maintaining a competitive advantage. Is it any wonder that history repeatedly shows it is the people who don't fit in who eventually become the real success stories in society? Why are we so hung up on maintaining the status quo in every form of society when even time eventually brings about change?
Labels:
big data
,
data science
,
economics
,
jobsearch
,
politics
,
programming
,
society
,
software engineering
23 March 2017
Entrepreneur First
The majority of investors associated with Entrepreneur First expect you to have a PhD, or a collaboration with someone who does. The funny thing is that most of these investors had previously worked for companies made successful by founders who were college dropouts. Have there been many companies made successful by a PhD holder? It is very rare. Most innovators, past and present, have had no PhD, often not even a masters, and in many cases were simply college dropouts. Yet they still managed to build successful businesses, engineer innovative products, and gain investment. Looking back at the likes of Google, Microsoft, Apple, Oracle, Dell, PayPal, Facebook, and others: would any of the investors of this generation have asked back then whether the founders had a PhD before they received investment? Invariably, an individual with a PhD is unlikely to have the mindset for taking risks and building a product that can actually sell. And if one needs to build an entire team around them, then the value of a PhD becomes fairly redundant. Why do investors have such double standards? And why do people with ideas sometimes struggle to find investment? Perhaps it all boils down to risk: the gamble investors have to make on a potential idea and their return on investment. Arguably, a large share of academic research amounts to nothing and holds no real qualitative or quantitative value in terms of substance and insight. Businesses don't make money from publishing papers; they make money from selling a viable product. In most industry sectors, innovation in organizations is often driven by non-PhD people with many years of practical experience who have been able to spot gaps in the market, or problem areas for which there is high customer demand for solutions.
The narrow-minded attitude of investors does not pay dividends, and it certainly does not help individuals with a viable product who may be seeking investment. However, such is the conservative attitude of many investors today that either a) one has a PhD, b) one has PhD holders on the team, no matter how practically useless, clueless, inexperienced, unprofessional, or unethical they may be, or c) one has a collaboration with a large multinational. Otherwise, one might struggle to convince an investor to buy into the innovative idea. Sometimes all it takes is an investor seeking potential, investing in the individual (the likability factor) and not just the product as a sum package - for that there are better alternative avenues of investment.
Entrepreneur First
Labels:
artificial intelligence
,
big data
,
Cloud
,
data science
,
event
,
finance
,
machine learning
,
natural language processing
,
predictive analytics
,
programming
,
society
,
software engineering
,
text analytics
21 March 2017
Automatic Keyphrase Extraction
intro to automatic keyphrase extraction
survey of the state of the art in keyphrase extraction
KEA
automatic keyphrase extraction based on statistical NLP
Keyphrase Extraction using DeepLearning on Twitter
keyword extraction tutorial
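As a baseline to compare against the approaches linked above, keyphrase extraction can be reduced to scoring candidate terms by frequency after stop-word filtering. The sketch below is a deliberately naive illustration (real systems add POS filtering, noun-phrase chunking, TF-IDF or graph-based ranking, as in the surveys above); the stop-word list is a tiny placeholder.

```python
import re
from collections import Counter

# Placeholder stop-word list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def naive_keyphrases(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens as 'keyphrases'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

phrases = naive_keyphrases(
    "Keyphrase extraction assigns keyphrases to documents. "
    "Extraction of keyphrases helps index documents."
)
```

Tools like KEA improve on this baseline by scoring candidates with TF-IDF and first-occurrence features learned from training documents.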
Labels:
big data
,
data science
,
deep learning
,
linked data
,
machine learning
,
natural language processing
,
semantic web
,
text analytics
17 March 2017
Solace
Labels:
big data
,
data science
,
distributed systems
,
event-driven
,
message-driven
,
microservices
,
software engineering
16 March 2017
Clerezza & UIMA Integration
Clerezza-UIMA
domeo text mining uima and clerezza
debategraph
UIMA Annotators
UIMA Tools
UIMA Addons
UIMA Resources
UIMA Ruta
UIMA Fit
UIMA DKPro
Labels:
big data
,
data science
,
information retrieval
,
linked data
,
machine learning
,
natural language processing
,
scala
,
semantic web
,
text analytics
Basel
Labels:
interaction design
,
interface design
,
JavaScript
,
nodejs
,
programming
,
reactive
,
software engineering
,
web design
15 March 2017
Github Alternatives
GitLab
BitBucket
Git/Gitolite (self-hosted & setup own authorization layer)
Perforce
Fog Creek Kiln
git-hosting-services-compared
github-alternatives
Enterprise Natural Language Generation
Automated Insights
Narrative Science
Arria
Linguastat
SmartLogic
OnlyBoth
SimpleNLG
Types of NLG:
- Canned Text Systems
- Template Systems
- Phrase-based Systems
- Feature-based Systems
- Neural Generative Systems
Text Plan->Discourse Plan->Surface Realization->Narratives
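Of the system types listed, template systems are the easiest to sketch: a fixed surface form with slots filled from structured data. A minimal stdlib example follows; the template wording and field names are invented for illustration.

```python
from string import Template

# A canned/template hybrid: fixed wording, data-driven slots.
headline = Template("$team beat $rival $score in the $competition final")

text = headline.substitute(
    team="Rovers", rival="United", score="2-1", competition="league"
)
```

Phrase-based and feature-based systems generalize this by choosing and inflecting the surface forms themselves, which is where realizers such as SimpleNLG come in.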
Code Coverage Metrics
- Function Coverage
- Statement Coverage
- Decision Coverage
- Condition Coverage
- Condition/Decision Coverage
- Loop Coverage
- Path Coverage
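The gap between statement and decision coverage is easiest to see on a branch with no else: one test input can execute every statement, yet decision coverage additionally requires exercising the false outcome. The function and inputs below are invented for illustration.

```python
def apply_discount(price, is_member):
    # Decision: `is_member` has two outcomes (True / False).
    discounted = price
    if is_member:
        discounted = price * 0.9  # the only statement inside the branch
    return discounted

# One call with is_member=True executes every statement (100% statement
# coverage) but only the True outcome of the decision; a second call
# with is_member=False is needed to reach 100% decision coverage.
member = apply_discount(100, True)   # exercises the branch
guest = apply_discount(100, False)   # exercises the fall-through
```

Condition coverage and condition/decision coverage extend the same idea to each boolean sub-expression inside a compound decision.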
14 March 2017
12 March 2017
AWS Open Guides
Labels:
API
,
big data
,
Cloud
,
data science
,
devops
,
distributed systems
,
microservices
,
software engineering
,
webservices
11 March 2017
Insurance Ontologies
Insurance ontologies are useful for building a semantic decision tree in the various contexts of policies and claims, as well as for providing the right insurance for a potential member. They also build formalisms towards a knowledge representation on the basis of which policy and claim decisions can be made. There is an ever-growing variety of insurance types, and even more variety among providers with their individual policy features. Semantic discovery of the right insurance product, with the right policy features for a policy holder, is also important for full alignment of coverage and the right balance of risk premium. Another place where such ontologies apply is as a thesaurus service of terms for semantic extraction and understanding of an insurance document, often in the form of a SKOS schema.
Example List of Some Attributes for Insurance:
- Insurable Object Id
- Insurable Object Type
- Insurable Object Name
- Insurable Object Detail
- Insurer Id
- Insurer Name
- Insurer Product Id
- Insurer Product Name
- Insurer Product Type
- Insurer Product Detail
- Member Id
- Member Name
- Member Type
- Member Detail
- Organization Id
- Organization Name
- Organization Type
- Organization Detail
- Agreement Id
- Agreement Name
- Agreement Type
- Agreement Detail
- Claim Type
- Claim Id
- Claim Amount
- Claim Folder
- Claim Document
- Claim Offer
- Claim History
- Claim Status
- Claim Link
- Claim Decision
- Party Role
- Assessment Id
- Assessment Result
- Assessment Score
- Fraud Assessment
- Policy Type
- Policy Id
- Policy Name
- Policy Coverage Detail
- Policy Premium
- Policy Condition
- Policy Term
- Policy Member
- Policy Duration
- Policy Pre-exist Condition
- Policy Sub-Members
- Policy Risk Score
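The attributes above can be grouped into a small machine-readable concept scheme. Below is a pure-Python sketch in the spirit of SKOS broader/narrower relations; the labels mirror the list, but the structure, the chosen hierarchy, and the helper function are illustrative assumptions rather than a published ontology.

```python
# Minimal SKOS-like concept scheme: each concept maps to its broader concept.
BROADER = {
    "Policy Premium": "Policy",
    "Policy Coverage Detail": "Policy",
    "Claim Amount": "Claim",
    "Claim Status": "Claim",
    "Policy": "Agreement",
    "Claim": "Agreement",
    "Agreement": None,  # scheme root
}

def broader_path(concept):
    """Walk the broader relations up to the root, thesaurus-style."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = BROADER.get(concept)
    return path
```

In a real deployment the same relations would live in an RDF store under `skos:broader`, queryable with SPARQL rather than a Python walk.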
Labels:
big data
,
data science
,
finance
,
linked data
,
machine learning
,
natural language processing
,
ontology
,
predictive analytics
,
recommender
,
semantic web
,
text analytics
Digital News
News is a central cornerstone of society, keeping one informed of events of local and global scope across a multitude of topics. Paper newspapers are in decline as many are now digitally available online. However, plenty of news services remain untapped, and news outlets still depend on conventional approaches. Accessibility of news should also be available to all, though this has been made possible to a degree with social media. The below provides some ideas for extending into a new generation of news services that provide for objectivity, as well as unlocking untapped potential for news gathering, semantics, analytics, and reporting.
- real-time bidding of news
- submission of news by journalists (but one doesn't have to be a journalist to provide news of significant value, i.e., crowdsourcing of news for all)
- return on investment for news sites and writers
- analytics on news
- seo of textual and image content through natural language processing and deep learning
- scoring news for influence, trust, and authority
- digital news gathering and discovery
- circulation graph
- alerting and viral monitoring
- sentiment analysis
- defining knowledge actors/agents for: anchor, journalist, reporter, writer, editor, correspondent
- event/story timelines
- automated/assistive headlines and summaries
- cloud based services and api
- news content publishing
- collaborative and content filtering
- increasing readership and news consumption
- news curation
- story semantics and connected news sources
- identifying fake news
- news accessibility
- firehose of automated generation of news feeds
- queryability for connected news in a linked data graph of stories
- prevent risk events through proactive text-driven forecasting and reporting
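Several of the ideas above (sentiment analysis, scoring for trust and influence) bottom out in assigning a score to text. The toy lexicon-based sentiment scorer below illustrates the simplest form; the word lists are invented placeholders, and a production system would use a trained model instead.

```python
import re

# Placeholder lexicons; real systems use large, weighted lexicons or models.
POSITIVE = {"good", "great", "success", "win", "growth"}
NEGATIVE = {"bad", "crisis", "loss", "fraud", "decline"}

def sentiment_score(headline):
    """Return (#positive - #negative) / #tokens, in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    if not tokens:
        return 0.0
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return score / len(tokens)

pos = sentiment_score("Strong growth and a great win for the markets")
neg = sentiment_score("Fraud crisis deepens as losses mount")
```

Aggregating such scores per source over time is one simple route towards the influence/trust/authority scoring mentioned above.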
Labels:
API
,
big data
,
data science
,
deep learning
,
linked data
,
machine learning
,
microservices
,
natural language processing
,
news
,
publishing
,
semantic web
,
sentiment analysis
,
text analytics
10 March 2017
9 March 2017
Google Addons List
Mail Merge
Scheduler
Email Extractor
Phone Number Extractor
Name/Address Extractor
Advanced Spam Filter
Digital Signature Maker
Digital Signature Protector
Signature Recognizer
Fraud Recognizer
Drive Auditor
Attachment/Link Scanner
Malware Scanner
Bulk Forward
Bulk Send
Contacts Update
Contacts Map
Purging
Advanced Filters
Self-destructing messages
Read Tracker
Auto-Expire Attachments/Links
Unsubscriber
Auto-Responder
Draft Templates
Recall Message
Regex Evaluator
Task Manager
Docs Filters
Personal Assistant
Diary Manager
Micro eCards
Message Resource Lookup (SPARQL)
Send Guard (don't send to unintended recipients by mistake)
Smart Rules
Mail Protector
Inbox Auditor
Gmailify
Gmail Learning Center
Google App Script
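Several of the add-on ideas above (Email Extractor, Phone Number Extractor) are essentially pattern extraction over message text. A regex sketch of the email case follows; note the pattern is a pragmatic simplification, since fully RFC 5322 compliant matching is far more involved.

```python
import re

# Pragmatic (not RFC-complete) email pattern.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(text):
    """Return unique email-like strings in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)  # dict preserves insert order
    return list(seen)

emails = extract_emails(
    "Contact alice@example.com or Bob <bob.smith@mail.example.org>; "
    "alice@example.com again."
)
```

An actual Gmail add-on would run the equivalent logic in Google Apps Script against `GmailApp` threads, but the extraction step is the same idea.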
8 March 2017
7 March 2017
6 March 2017
5 March 2017
CQRS
Labels:
groovy
,
Java
,
JavaScript
,
microservices
,
programming
,
python
,
scala
,
software engineering
3 March 2017
2 March 2017
Scalable Machine Learning
Reasons why machine learning needs to scale:
- training data doesn't fit on a single machine
- time to train model is too long
- volume of incoming data is too high
- low latency requirements for predictions
How to spend less time on a scalable infrastructure:
- choose a fast, lean ML algorithm that can work accurately on a single machine
- subsampling data
- vertical scalability
- sacrificing accuracy if it is cheaper
Horizontal scalability options:
- Hadoop ecosystem with Mahout
- Spark ecosystem with MLlib
- Turi from GraphLab
- Streaming Technologies like Kafka, Storm, AWS Kinesis, Flink, Spark Streaming
Scalability considerations for a model-building pipeline:
- choose a scalable algorithm such as logistic regression or SVM
- scaling up nonlinear algorithms by making approximations
- use a distributed infrastructure to scale out
How to scale predictions in both volume and velocity:
- infrastructure that allows scaling out across a number of workers
- sending the same prediction request to multiple workers and returning the first response to optimize prediction velocity
- choosing an algorithm that can parallelize across multiple machines
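The "send the same request to multiple workers and take the first response" tactic above can be sketched with stdlib concurrency; the workers here are simulated with sleeps, and the prediction call is a placeholder for a real model server.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def make_worker(name, delay):
    """Simulated prediction worker; real code would call a model server."""
    def predict(features):
        time.sleep(delay)            # stand-in for network + inference time
        return (name, sum(features)) # placeholder 'prediction'
    return predict

workers = [make_worker("slow", 0.2), make_worker("fast", 0.01)]

def first_prediction(features):
    # Fan the same request out to every worker; return the first result.
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(w, features) for w in workers]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

source, prediction = first_prediction([1.0, 2.0, 3.0])
```

Note that exiting the `with` block still waits for the stragglers to finish; a production version would cancel or fire-and-forget them to actually bank the latency win.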
A curious alternative to Hadoop for scalability is Vowpal Wabbit, which can build models on large datasets without requiring a big data system. Feature selection also comes in handy when one wants to reduce the size of the training data by selecting and retaining the most predictive subset of features; Lasso is a linear algorithm often used for feature selection. With respect to prediction velocity and volume: scaling in volume means being able to handle more data, while scaling in velocity means being able to do it fast enough for the use case. One also has to weigh the trade-off between speed and accuracy of predictions.
Labels:
big data
,
data science
,
distributed systems
,
hadoop
,
machine learning
,
predictive analytics
Text Summarization with Deep Learning
- Text Summarization with Tensorflow
- Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
- Sequence-to-Sequence with Attention Model for Text Summarization
- A Neural Attention Model for Abstractive Sentence Summarization
- Generating News Headlines with Recurrent Neural Networks
- ATTSum: Joint Learning of Focusing and Summarization with Neural Attention
- A Convolutional Attention Network for Extreme Summarization of Source Code
- Sequence-to-Sequence RNNs for Text Summarization
- Learning Summary Statistic for Approximate Bayesian Computation via Deep Neural Network
- LCSTS: A Large Scale Chinese Short Text Summarization Dataset
- Deep Dependency Substructure-Based Learning for Multidocument Summarization
- Ranking with Recursive Neural Networks and Its Application to Multi-document Summarization
- Query-oriented Unsupervised Multi-document Summarization via Deep Learning
- Abstractive Multi-Document Summarization via Phrase Selection
- Modelling, Visualising and Summarizing Documents with a Single Convolutional Neural Network
- SRRank: Leveraging Semantic Roles for Extractive Multi-Document Summarization