Alt
Fluxible
Reflux
Redux
27 March 2017
26 March 2017
IT Skills Shortage
Many news sources claim there is a skills shortage in various sectors of IT. On closer inspection, this is really not the case at all. There is a huge talent pool of candidates eager and willing to work; in most cases, every job attracts hundreds of applicants. Often organizations are simply filtering for cultural fit: a candidate may have the skills but not be someone the employer feels they would get along with. That does not mean there is a skills shortage. It means the organization is discriminating on attributes that are not even relevant to the role or the skills. Outsourcing is also a key factor, buying in talent on the cheap while leaving local talent behind. Many human resources evaluations concede that the larger part of an interview is about the person rather than the quantifiable skills they can bring to the team. It is more about likability, which is itself a form of discrimination: people want to work with people who are like them. None of this implies a shortage of skills in the market. There are plenty of women who are very good at Big Data and Machine Learning, but does one see them on the team? Employers are waving a wand of cultural nonsense and overlooking what really matters: the candidate's ability to perform on their skills and to innovate. Large organizations like Twitter, Facebook, IBM, Google, Microsoft, and Amazon harp on about a skills shortage, yet they are not focused on diversity and inclusion within their teams, and they buy in cheap outsourcing contracts from places like India while ignoring the thriving local talent pools in their own regions.
We need to recruit more women, and more generally people of all types and backgrounds, into IT, supporting the local communities where multinationals are based. That starts with each individual being well positioned to evaluate applications on merit with objectivity. If people hire people, and people are generally not objective, then perhaps artificial intelligence needs to take over some of the responsibilities of the human resources function in organizations.
Labels:
artificial intelligence
,
big data
,
data science
,
economics
,
machine learning
,
society
,
software engineering
25 March 2017
Future of Software Engineering
In the world of Big Data we essentially have four separate encompassing roles: Big Data Engineer, Data Scientist, Data Architect, and Data Analyst. In most companies the roles overlap considerably. However, most of these role types are really specializations of one main role: the Computer Scientist, or more simply the Software Engineer. A Big Data Engineer is ordinarily a Software Engineer who is able to look at the big picture, which implies that the Data Scientist, Data Analyst, and Data Architect are really subsets of the Big Data Engineer. In future, we will witness the convergence of these roles into one, where many of the separate responsibilities merge into the Software Engineer role. As memory capacities grow to meet Big Data requirements, Software Engineers will take on a broader scope of work, and the complexity once split across disparate roles will become more manageable and accessible to a standard engineering team. Hiring a Data Scientist with a PhD will no longer be the conservative default for organizations. Many Software Engineers can do the work of Data Scientists as well as Big Data Engineers while still looking at the big picture from a business standpoint. In almost all cases, algorithms can be taught, approaches can be taught, and skills can be relearnt; what teams really require is a flexible mindset that adapts to change. Data Scientists present several limitations for organizations: they tend to have questionable programming skills in R and/or Python, are familiar with only a specific set of statistical and/or machine learning approaches, and often apply imperfect or overfitted models to a small subset of data, models which are not always adaptable to the realistic Big Data requirements of a business.
They also tend to have a hacking mindset and a strong dependency on Big Data Engineers to provide the backbone for ingesting data sources, refactoring, feature engineering, data pipelining, and model scaling, and to ease their model-building process. That is almost three quarters of the Big Data work. Software Engineers, by contrast, can ordinarily multitask across the entire stack as well as practically apply data science concepts. In future, the roles of Data Analyst, Data Scientist, Data Architect, and Big Data Engineer will no longer be necessary as separate recruitment targets; they will eventually be eclipsed by the standard Software Engineer. It is not so difficult to attend conferences, read journal papers, conduct research, and build models on data; one really only needs the right business mindset or domain knowledge. Learning and applying new approaches is the continuous process by which Software Engineers keep their skills moving with the times. Organizations will demand more of such hybrid Software Engineers, who can adapt to changes guided by the business need and the data landscape. Applied Artificial Intelligence will mean that even building generalizable models on data becomes a simpler engineering process that does not require specialist skills.
Cybersecurity with Spot
Labels:
big data
,
data science
,
hadoop
,
information retrieval
,
intelligent web
,
internet
,
machine learning
,
security
,
spark
Swarm Bandit Robotics
Border control is an issue for most countries, whether landlocked or with direct access to waterways. Controlling every aspect of a border is not humanly possible in most cases, but AI can help cover a larger distance with a greater amount of force and attributable control. Artificial Intelligence, in the form of swarm intelligence and reinforcement learning, can create an effective force for border security and control. Building a great wall is pointless and overly expensive; eventually the wall comes down. But an army of drones and robots, if well engineered, becomes a force to be reckoned with, adaptable to patterns of attack and infiltration. The ultimate goal is an autonomous army of swarm robots that can apply tactical understanding and manoeuvrability across the entire map of a country, advanced in strategic alliance and combat when necessary, to protect the sovereignty of a nation and its people.
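The reinforcement learning component mentioned above can be illustrated with a minimal epsilon-greedy multi-armed bandit, the simplest setting in which an agent (say, a patrol drone choosing among candidate sectors) learns which action pays off. This is a generic sketch, not a border-control system; the arms, reward values, and epsilon are invented for illustration.

```python
import random

def epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=42):
    """Learn action-value estimates for each 'arm' (e.g. a patrol sector).

    rewards: list of callables, one per arm, each returning a numeric reward.
    Returns the estimated value of each arm after `steps` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(rewards)
    counts = [0] * n_arms    # times each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:  # explore a random arm
            arm = rng.randrange(n_arms)
        else:                       # exploit the best current estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        r = rewards[arm]()
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
    return values

# Three hypothetical patrol sectors with fixed pay-offs; arm 1 is best.
estimates = epsilon_greedy([lambda: 0.2, lambda: 0.8, lambda: 0.5])
best = estimates.index(max(estimates))
```

With deterministic rewards the running means converge to the true values, so the agent identifies arm 1 as best; swarm settings extend this to many agents sharing or partitioning the estimates.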
Military Swarming
Kilobots
pentagon drone swarm autonomous war machines
24 March 2017
Cultural Fit
Cultural fit is a shield many organizations use to cover up a set of attributes they look for that are intrinsic to their company. But if one inspects the concept closely, it becomes fairly transparent that the notion is really a form of discrimination, and technically such biases are illegal in the workplace, not to mention in secular societies. Could this be one reason why diversity is still an issue in some organizations, and why there are so few women in technology roles today? Two attributes drive a business: performance and innovation. It is really these two attributes that should drive the recruitment process. Yet many companies are still hung up on the idea of cultural fit; they end up recruiting individuals who are too similar to each other, possibly treating their roles as a daily grind where innovation gets inhibited and the enjoyment of work becomes a foregone conclusion. The glue holding a team together eventually drifts apart from frustration, and an exceptionally high turnover of employees emerges within the organizational dynamics, all with the view of maintaining some abstract concept of a culture. No two people are ever alike. Such concepts of cultural fit are implemented and developed by human resources, turning workers into an almost factory concept. But if one inspects the human resources function closely, do they really understand how to recruit the right people, if all they do is hunt for keywords on a CV and check whether the candidate acts a certain way? In most cases, the entire human resources function at organizations could be replaced by artificial intelligence and data science. Many companies have already given up on the nonsensical idea of cultural fit, favoring diversity and inclusion instead.
Perhaps it is time organizations moved on from nonsensical approaches to recruitment, developed better, more objective means of evaluation, and focused on what really matters as relevant to the role. Performance is a major driver for business. What does it matter whether someone fits in or not? They could just as easily be lazy at work, bored out of their wits with the repetitive grind, and still fit comfortably into the bubble of a large organization's dynamics without bringing any quantifiable output to the business, other than being another salaried employee who could become a victim of redundancy down the road. We do need to inspire, encourage, and be more accepting of people who are not like us, especially for a balanced workplace and for maintaining the healthy, inspiring energy needed for innovation. All of this is important for the survival of a business and for continuously maintaining a competitive advantage. Is it any wonder that history repeatedly shows it is the people who don't fit in who eventually become the real success stories in society? Why are we so hung up on maintaining the status quo in every form of society when even time eventually brings about change?
Labels:
big data
,
data science
,
economics
,
jobsearch
,
politics
,
programming
,
society
,
software engineering
23 March 2017
Entrepreneur First
The majority of investors associated with Entrepreneur First expect you to have a PhD, or a collaboration with someone who does. The funny thing is that most of these investors had previously worked for companies made successful by founders who were college dropouts. Have there been many companies made successful by a PhD holder? It is very rare. Most innovators, past and present, have had no PhD, often not even a masters, and in many cases were simply college dropouts. Yet they still managed to build successful businesses, engineer innovative products, and gain investment. Looking back at the likes of Google, Microsoft, Apple, Oracle, Dell, PayPal, Facebook, and others: would any of the investors of this generation have asked back then whether the founders had a PhD before they received investment? Invariably, an individual with a PhD is unlikely to have the mindset for taking risks and building a product that can actually sell. And if one needs to build an entire team around them, then the value of a PhD becomes fairly redundant. Why do investors have such double standards? And why do people with ideas sometimes struggle to find investment? Perhaps it all boils down to risk: the gamble investors have to make on a potential idea and their return on investment. Arguably, a large share of academic research amounts to nothing and holds no real qualitative or quantitative value in terms of substance and insight. Businesses don't make money from publishing papers; they make money from selling a viable product. In most industry sectors, innovation in organizations is often driven by non-PhD people with many years of practical experience who have been able to spot gaps in the market, or problem areas for which there is high customer demand for solutions.
The narrow-minded attitude of investors does not pay dividends, and it certainly does not help individuals with a viable product who may be seeking investment. However, such is the conservative attitude of many investors today that either a) one has a PhD, b) one has PhD holders on the team, no matter how practically useless, clueless, inexperienced, unprofessional, or unethical they may be, or c) one has a collaboration with a large multinational. Otherwise, one might struggle to convince an investor to buy into the innovative idea. Sometimes all it takes is an investor seeking potential, investing in the individual (the likability factor) and not just the product as a sum package - for that there are better alternative avenues of investment.
Entrepreneur First
Labels:
artificial intelligence
,
big data
,
Cloud
,
data science
,
event
,
finance
,
machine learning
,
natural language processing
,
predictive analytics
,
programming
,
society
,
software engineering
,
text analytics
21 March 2017
Automatic Keyphrase Extraction
intro to automatic keyphrase extraction
survey of the state of the art in keyphrase extraction
KEA
automatic keyphrase extraction based on statistical NLP
Keyphrase Extraction using DeepLearning on Twitter
keyword extraction tutorial
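As a baseline to compare against the approaches linked above, keyphrase extraction can be reduced to scoring candidate terms by frequency after stop-word filtering. The sketch below is a deliberately naive illustration (real systems add POS filtering, noun-phrase chunking, TF-IDF or graph-based ranking, as in the surveys above); the stop-word list is a tiny placeholder.

```python
import re
from collections import Counter

# Placeholder stop-word list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def naive_keyphrases(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens as 'keyphrases'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

phrases = naive_keyphrases(
    "Keyphrase extraction assigns keyphrases to documents. "
    "Extraction of keyphrases helps index documents."
)
```

Tools like KEA improve on this baseline by scoring candidates with TF-IDF and first-occurrence features learned from training documents.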
Labels:
big data
,
data science
,
deep learning
,
linked data
,
machine learning
,
natural language processing
,
semantic web
,
text analytics
17 March 2017
Solace
Labels:
big data
,
data science
,
distributed systems
,
event-driven
,
message-driven
,
microservices
,
software engineering
16 March 2017
Clerezza & UIMA Integration
Clerezza-UIMA
domeo text mining uima and clerezza
debategraph
UIMA Annotators
UIMA Tools
UIMA Addons
UIMA Resources
UIMA Ruta
UIMA Fit
UIMA DKPro
Labels:
big data
,
data science
,
information retrieval
,
linked data
,
machine learning
,
natural language processing
,
scala
,
semantic web
,
text analytics
Basel
Labels:
interaction design
,
interface design
,
JavaScript
,
nodejs
,
programming
,
reactive
,
software engineering
,
web design
15 March 2017
Github Alternatives
GitLab
BitBucket
Git/Gitolite (self-hosted & setup own authorization layer)
Perforce
Fog Creek Kiln
git-hosting-services-compared
github-alternatives
Enterprise Natural Language Generation
Automated Insights
Narrative Science
Arria
Linguastat
SmartLogic
OnlyBoth
SimpleNLG
Types of NLG:
- Canned Text Systems
- Template Systems
- Phrase-based Systems
- Feature-based Systems
- Neural Generative Systems
Text Plan->Discourse Plan->Surface Realization->Narratives
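Of the system types listed, template systems are the easiest to sketch: a fixed surface form with slots filled from structured data. A minimal stdlib example follows; the template wording and field names are invented for illustration.

```python
from string import Template

# A canned/template hybrid: fixed wording, data-driven slots.
headline = Template("$team beat $rival $score in the $competition final")

text = headline.substitute(
    team="Rovers", rival="United", score="2-1", competition="league"
)
```

Phrase-based and feature-based systems generalize this by choosing and inflecting the surface forms themselves, which is where realizers such as SimpleNLG come in.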
Code Coverage Metrics
- Function Coverage
- Statement Coverage
- Decision Coverage
- Condition Coverage
- Condition/Decision Coverage
- Loop Coverage
- Path Coverage
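The gap between statement and decision coverage is easiest to see on a branch with no else: one test input can execute every statement, yet decision coverage additionally requires exercising the false outcome. The function and inputs below are invented for illustration.

```python
def apply_discount(price, is_member):
    # Decision: `is_member` has two outcomes (True / False).
    discounted = price
    if is_member:
        discounted = price * 0.9  # the only statement inside the branch
    return discounted

# One call with is_member=True executes every statement (100% statement
# coverage) but only the True outcome of the decision; a second call
# with is_member=False is needed to reach 100% decision coverage.
member = apply_discount(100, True)   # exercises the branch
guest = apply_discount(100, False)   # exercises the fall-through
```

Condition coverage and condition/decision coverage extend the same idea to each boolean sub-expression inside a compound decision.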
14 March 2017
12 March 2017
AWS Open Guides
Labels:
API
,
big data
,
Cloud
,
data science
,
devops
,
distributed systems
,
microservices
,
software engineering
,
webservices
11 March 2017
Insurance Ontologies
Insurance ontologies are useful for building a semantic decision tree in the various contexts of policies and claims, as well as for providing the right insurance for a potential member. They also build formalisms towards a knowledge representation on the basis of which policy and claim decisions can be made. There is an ever-growing variety of insurance types, and even more variety among providers with their individual policy features. Semantic discovery of the right insurance product, with the right policy features for a policy holder, is also important for full alignment of coverage and the right balance of risk premium. Another place where such ontologies apply is as a thesaurus service of terms for semantic extraction and understanding of an insurance document, often in the form of a SKOS schema.
Example List of Some Attributes for Insurance:
- Insurable Object Id
- Insurable Object Type
- Insurable Object Name
- Insurable Object Detail
- Insurer Id
- Insurer Name
- Insurer Product Id
- Insurer Product Name
- Insurer Product Type
- Insurer Product Detail
- Member Id
- Member Name
- Member Type
- Member Detail
- Organization Id
- Organization Name
- Organization Type
- Organization Detail
- Agreement Id
- Agreement Name
- Agreement Type
- Agreement Detail
- Claim Type
- Claim Id
- Claim Amount
- Claim Folder
- Claim Document
- Claim Offer
- Claim History
- Claim Status
- Claim Link
- Claim Decision
- Party Role
- Assessment Id
- Assessment Result
- Assessment Score
- Fraud Assessment
- Policy Type
- Policy Id
- Policy Name
- Policy Coverage Detail
- Policy Premium
- Policy Condition
- Policy Term
- Policy Member
- Policy Duration
- Policy Pre-exist Condition
- Policy Sub-Members
- Policy Risk Score
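The attributes above can be grouped into a small machine-readable concept scheme. Below is a pure-Python sketch in the spirit of SKOS broader/narrower relations; the labels mirror the list, but the structure, the chosen hierarchy, and the helper function are illustrative assumptions rather than a published ontology.

```python
# Minimal SKOS-like concept scheme: each concept maps to its broader concept.
BROADER = {
    "Policy Premium": "Policy",
    "Policy Coverage Detail": "Policy",
    "Claim Amount": "Claim",
    "Claim Status": "Claim",
    "Policy": "Agreement",
    "Claim": "Agreement",
    "Agreement": None,  # scheme root
}

def broader_path(concept):
    """Walk the broader relations up to the root, thesaurus-style."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = BROADER.get(concept)
    return path
```

In a real deployment the same relations would live in an RDF store under `skos:broader`, queryable with SPARQL rather than a Python walk.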
Labels:
big data
,
data science
,
finance
,
linked data
,
machine learning
,
natural language processing
,
ontology
,
predictive analytics
,
recommender
,
semantic web
,
text analytics
Digital News
News is a central cornerstone of society, keeping one informed of events of local and global scope across a multitude of topics. Paper newspapers are in decline as many are now digitally available online. However, plenty of news services remain untapped, and news outlets still depend on conventional approaches. Accessibility of news should also be available to all, though this has been made possible to a degree with social media. The below provides some ideas for extending into a new generation of news services that provide for objectivity, as well as unlocking untapped potential for news gathering, semantics, analytics, and reporting.
- real-time bidding of news
- submission of news by journalists (but one doesn't have to be a journalist to provide news of significant value, i.e., crowdsourcing of news for all)
- return on investment for news sites and writers
- analytics on news
- seo of textual and image content through natural language processing and deep learning
- scoring news for influence, trust, and authority
- digital news gathering and discovery
- circulation graph
- alerting and viral monitoring
- sentiment analysis
- defining knowledge actors/agents for: anchor, journalist, reporter, writer, editor, correspondent
- event/story timelines
- automated/assistive headlines and summaries
- cloud based services and api
- news content publishing
- collaborative and content filtering
- increasing readership and news consumption
- news curation
- story semantics and connected news sources
- identifying fake news
- news accessibility
- firehose of automated generation of news feeds
- queryability for connected news in a linked data graph of stories
- prevent risk events through proactive text-driven forecasting and reporting
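Several of the ideas above (sentiment analysis, scoring for trust and influence) bottom out in assigning a score to text. The toy lexicon-based sentiment scorer below illustrates the simplest form; the word lists are invented placeholders, and a production system would use a trained model instead.

```python
import re

# Placeholder lexicons; real systems use large, weighted lexicons or models.
POSITIVE = {"good", "great", "success", "win", "growth"}
NEGATIVE = {"bad", "crisis", "loss", "fraud", "decline"}

def sentiment_score(headline):
    """Return (#positive - #negative) / #tokens, in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    if not tokens:
        return 0.0
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return score / len(tokens)

pos = sentiment_score("Strong growth and a great win for the markets")
neg = sentiment_score("Fraud crisis deepens as losses mount")
```

Aggregating such scores per source over time is one simple route towards the influence/trust/authority scoring mentioned above.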
Labels:
API
,
big data
,
data science
,
deep learning
,
linked data
,
machine learning
,
microservices
,
natural language processing
,
news
,
publishing
,
semantic web
,
sentiment analysis
,
text analytics
10 March 2017
9 March 2017
Google Addons List
Mail Merge
Scheduler
Email Extractor
Phone Number Extractor
Name/Address Extractor
Advanced Spam Filter
Digital Signature Maker
Digital Signature Protector
Signature Recognizer
Fraud Recognizer
Drive Auditor
Attachment/Link Scanner
Malware Scanner
Bulk Forward
Bulk Send
Contacts Update
Contacts Map
Purging
Advanced Filters
Self-destructing messages
Read Tracker
Auto-Expire Attachments/Links
Unsubscriber
Auto-Responder
Draft Templates
Recall Message
Regex Evaluator
Task Manager
Docs Filters
Personal Assistant
Diary Manager
Micro eCards
Message Resource Lookup (SPARQL)
Send Guard (don't send to unintended recipients by mistake)
Smart Rules
Mail Protector
Inbox Auditor
Gmailify
Gmail Learning Center
Google App Script
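Several of the add-on ideas above (Email Extractor, Phone Number Extractor) are essentially pattern extraction over message text. A regex sketch of the email case follows; note the pattern is a pragmatic simplification, since fully RFC 5322 compliant matching is far more involved.

```python
import re

# Pragmatic (not RFC-complete) email pattern.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(text):
    """Return unique email-like strings in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)  # dict preserves insert order
    return list(seen)

emails = extract_emails(
    "Contact alice@example.com or Bob <bob.smith@mail.example.org>; "
    "alice@example.com again."
)
```

An actual Gmail add-on would run the equivalent logic in Google Apps Script against `GmailApp` threads, but the extraction step is the same idea.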
8 March 2017
7 March 2017
6 March 2017
5 March 2017
CQRS
Labels:
groovy
,
Java
,
JavaScript
,
microservices
,
programming
,
python
,
scala
,
software engineering
3 March 2017
2 March 2017
Scalable Machine Learning
Reasons why machine learning needs to scale:
- training data doesn't fit on a single machine
- time to train model is too long
- volume of incoming data is too high
- low latency requirements for predictions
How to spend less time on a scalable infrastructure:
- choose a fast, lean ML algorithm that can work accurately on a single machine
- subsampling data
- vertical scalability
- sacrificing accuracy if it is cheaper
Horizontal scalability options:
- Hadoop ecosystem with Mahout
- Spark ecosystem with MLlib
- Turi from GraphLab
- Streaming Technologies like Kafka, Storm, AWS Kinesis, Flink, Spark Streaming
Scalability considerations for a model-building pipeline:
- choose a scalable algorithm such as logistic regression or SVM
- scaling up nonlinear algorithms by making approximations
- use a distributed infrastructure to scale out
How to scale predictions in both volume and velocity:
- infrastructure that allows scaling out across a number of workers
- sending the same prediction request to multiple workers and returning the first response to optimize prediction velocity
- choosing an algorithm that can parallelize across multiple machines
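The "send the same request to multiple workers and take the first response" tactic above can be sketched with stdlib concurrency; the workers here are simulated with sleeps, and the prediction call is a placeholder for a real model server.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def make_worker(name, delay):
    """Simulated prediction worker; real code would call a model server."""
    def predict(features):
        time.sleep(delay)            # stand-in for network + inference time
        return (name, sum(features)) # placeholder 'prediction'
    return predict

workers = [make_worker("slow", 0.2), make_worker("fast", 0.01)]

def first_prediction(features):
    # Fan the same request out to every worker; return the first result.
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(w, features) for w in workers]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

source, prediction = first_prediction([1.0, 2.0, 3.0])
```

Note that exiting the `with` block still waits for the stragglers to finish; a production version would cancel or fire-and-forget them to actually bank the latency win.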
A curious alternative to Hadoop for scalability is Vowpal Wabbit, which can build models on large datasets without requiring a big data system. Feature selection also comes in handy when one wants to reduce the size of the training data by selecting and retaining the most predictive subset of features; Lasso is a linear algorithm often used for feature selection. With respect to prediction velocity and volume: scaling in volume means being able to handle more data, while scaling in velocity means being able to do it fast enough for the use case. One also has to weigh the trade-off between speed and accuracy of predictions.
Labels:
big data
,
data science
,
distributed systems
,
hadoop
,
machine learning
,
predictive analytics
Text Summarization with Deep Learning
- Text Summarization with Tensorflow
- Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
- Sequence-to-Sequence with Attention Model for Text Summarization
- A Neural Attention Model for Abstractive Sentence Summarization
- Generating News Headlines with Recurrent Neural Networks
- ATTSum: Joint Learning of Focusing and Summarization with Neural Attention
- A Convolutional Attention Network for Extreme Summarization of Source Code
- Sequence-to-Sequence RNNs for Text Summarization
- Learning Summary Statistic for Approximate Bayesian Computation via Deep Neural Network
- LCSTS: A Large Scale Chinese Short Text Summarization Dataset
- Deep Dependency Substructure-Based Learning for Multidocument Summarization
- Ranking with Recursive Neural Networks and Its Application to Multi-document Summarization
- Query-oriented Unsupervised Multi-document Summarization via Deep Learning
- Abstractive Multi-Document Summarization via Phrase Selection
- Modelling, Visualising and Summarizing Documents with a Single Convolutional Neural Network
- SRRank: Leveraging Semantic Roles for Extractive Multi-Document Summarization