1 July 2021

Customer Satisfaction Metrics

CX = Customer Experience

NPS = Net Promoter Score (use for long-term customer loyalty)

CSAT = Customer Satisfaction Score (use for short-term customer satisfaction)

CES = Customer Effort Score


NPS is useful for:

  • Growth Targeting
  • Brand Loyalty
  • Happiness
  • Feedback
  • Honesty
  • Overall Experience
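
To make the scoring concrete, here is a minimal sketch of how NPS and CSAT are commonly calculated; the 0-10 recommendation scale and 1-5 satisfaction scale are the usual conventions, and the sample responses are made up for illustration.

```python
# Minimal sketch of NPS and CSAT calculations; sample responses are made up.

def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

def csat(scores):
    """Customer Satisfaction Score: % of satisfied responses (4 or 5) on a 1-5 scale."""
    satisfied = sum(1 for s in scores if s >= 4)
    return 100.0 * satisfied / len(scores)

recommend = [10, 9, 8, 7, 3, 10, 6, 9]   # hypothetical "how likely are you to recommend us?" answers
satisfaction = [5, 4, 4, 2, 5, 3]        # hypothetical "how satisfied were you?" answers

print(f"NPS:  {nps(recommend):.1f}")     # ranges from -100 to +100
print(f"CSAT: {csat(satisfaction):.1f}%")
```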

28 June 2021

What Does An AI Researcher Do?

There is nothing that an AI Researcher does that an AI Engineer cannot do. In fact, the following make up the majority of a researcher's job function.

They research new ways of doing things; the magic word here is explore. They don't build systems that can scale to hundreds of thousands of data points; for that they need an AI Engineer.

They read research papers and apply them in context, which an AI Engineer also does.

They stay current on AI trends and apply them in context, which an AI Engineer also does.

They hold a Phd, which an AI Engineer may or may not hold, nor necessarily need, to do their work.

They spend time building novel algorithms, which is a bit questionable as they have poor programming skills and lack the ability to scale any of those algorithms. On top of that, solving a problem requires practical experience, which one does not get from reading research papers. Again, an AI Engineer can pretty much do this, and likely better, and has the mindset to apply design patterns by doing things the correct way from prototypes to scalable solutions.

They get sponsorship funding for research work, which they get from holding a Phd but not necessarily from having any skill at converting theory into practice; for that they again need an AI Engineer. They also lack the ability to manage data, and they find it difficult to adapt to a change in research approach. They tend to stick to what they know rather than learn what they don't know as part of adapting to change. They rarely approach things outside the boundaries of their so-called specialist area. Funnily enough, in most cases it is a research assistant at an academic institution who builds the practical application of the theory, not the professor.

They are specialized in a certain field, have likely taught in academia, given conference talks, and published papers in the area. One cannot be a specialist in an area without any practical skill at applying it. It is questionable whether much of what they publish is even worthy of research, which is precisely why 80% of research coming out of industry and academic institutions amounts to nothing. The 20% that does get classed as a research breakthrough tends to be built by someone sponsored by an organization, looking to solve a specific practical issue, who doesn't hold a Phd and likely has a practical engineering background. Furthermore, it is surprising how many AI Researchers make poor teachers. Invariably, many also dislike teaching but have to do it as part of being linked to an academic institution.

They collaborate on standardization efforts and open source artefacts as part of converting theory into practice. Even here, the majority of open standards are produced by people with practical experience who understand the gaps in an application area. An AI Researcher rarely produces anything of significant value but usually tries to take credit for a large portion of the effort, which is likely done by an AI Engineer.

One can notice that practical application is far more important than theory. It is from practical experience that one can understand problems and work towards a solution. There is also no single way of solving a problem. A basic theoretical background can be acquired without going to university or even attaining a Phd. There are books and online courses for literally anything and everything one can think of.

They have greater opportunities for academic influence and research. This may be true because they have built up a network within the area for collaboration. However, most academic institutions get their funding from government trusts, grants, or the private sector. Unless an AI Engineer has a sponsorship or a network of associates that can provide funding for their work, they may be at a slight disadvantage. AI Researchers, as a result of networking, also tend to have a greater reach of influence, but even this can be matched by an AI Engineer who develops influence through practical applications. One way of beating an AI Researcher at their own game is to build open source projects, publish papers of one's own, and build a portfolio of practical solutions delivered to organizations. One doesn't need an advanced degree for practical achievement. Invariably, it is far more important to have the tenacity, curiosity, and enthusiasm to learn, explore, and extend towards building a practical, novel solution.

AI Researchers tend to have a more focused academic background, with limited practical experience, as evidenced on their resume by publications and conference talks, whereas an AI Engineer tends to have a stronger practical bias and may or may not have published papers. However, at times the job functions may be interchangeable, given the confusion and disarray in communication at many clueless organizations. A Phd background is not an automatic pass to being an expert in the area. One may at times come across people calling themselves AI Researchers with a Phd who are in fact completely clueless about the field or its application. One has to be mindful when hiring such people. There are many ways of spotting the fake AI Researcher, many of which relate to their lack of objectivity, their questionable attitude, their questionable understanding of a topic, their confused sense of ethics, and their lack of a critical evaluation process:

  • they confuse Neuro-Linguistic Programming (NLP) with Natural Language Processing (NLP)
  • they lack professionalism and respect when interacting with non-Phd people
  • they have a history of practicing academic dishonesty, which in workplace interactions converts into unethical practices and a false sense of entitlement
  • they are often hypocritical with their notion of ethical principles and code of conduct
  • they don't treat others with respect and have an ingrained tendency to be overly defensive and biased in their communication
  • they have published papers that show very little critical evaluation
  • they have published papers that are likely plagiarised
  • they have published papers that are not theoretically correct
  • someone else may have written a published paper for them, for which they took credit, e.g. via crowdsourcing or with most of the work done by a supervisor to meet passing research indicators
  • they are generally clueless, contradict themselves with their own actions and explanations, and dig themselves into an even bigger hole of illogical thinking
  • they haven't really published much after their thesis work
  • they have a mediocre citation score for the majority of their papers
  • they lie about their background
  • they violate basic privacy laws during meetings, are rude in their interaction, or appear insecure by trying to invalidate others
  • they are not self-critical of their own work and their own deficiencies; they spend more time criticizing others than on self-reflection
  • they think they know what they are talking about just because they hold a Phd
  • they have peculiar mannerisms, and the way they come across on a topic makes you question their qualifying background
  • they display an unwelcoming or condescending attitude
  • they like to use a lot of flowery language and impress upon others how busy they are, even if they aren't busy at all
  • they will try to impress with their background, but will get caught using incorrect terms and incorrect logical reasoning, automatically invalidating their position
  • they may use a lot of assumptions in their speech without backing up their claims
  • they are unable to translate anything into any form of practical output, and what little they produce is packaged up as an API wrapper around someone else's work
  • they are enamored with the academic institution they attended for their Phd and keep using it as their defense, but have little to no contextual understanding of concepts at any level of technical depth when applied in practice
  • it is very easy to put them on the spot and leave them speechless at being caught out
  • they have a tendency of using cognitive biases in their actions and speech
  • if the workplace CCTV of their interactions were played back to them, it would not only be embarrassing but would display both their rudeness and unprofessionalism

Open Standardization In Artificial Intelligence

Open Standards in Artificial Intelligence

Knowledge Graph Embedding Libraries

  • Ampligraph
  • OpenKE
  • Scikit-KGE
  • OpenNRE
  • LibKGE
  • PyKG2Vec
  • GraphVite
  • PyKeen
  • DGL-KE
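
As a quick illustration of what these libraries provide, below is a minimal sketch using PyKeen's pipeline API to train a TransE embedding model on one of its bundled toy datasets; the dataset, model, epoch count, and seed are arbitrary choices for the example, not a recommendation.

```python
# Minimal PyKeen sketch (pip install pykeen): train a TransE knowledge graph
# embedding model on the small bundled 'Nations' dataset. All choices here
# (dataset, model, epochs, seed) are arbitrary example values.
from pykeen.pipeline import pipeline

result = pipeline(
    dataset="Nations",                      # toy dataset shipped with PyKeen
    model="TransE",                         # translational embedding model
    training_kwargs=dict(num_epochs=20),
    random_seed=42,
)

print(result.metric_results.get_metric("hits@10"))   # link-prediction quality
result.save_to_directory("nations_transe")           # persist the trained model and metrics
```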

27 June 2021

Incompetent Data Scientists

What does a Machine Learning Engineer do?

Everything a Data Scientist is supposed to be able to do

What does a Data Engineer do?

Everything a Data Scientist is supposed to be able to do

What does a Knowledge Engineer do?

Everything a Data Scientist is supposed to be able to do

What does an NLP Engineer do?

Everything a Data Scientist is supposed to be able to do

What does an AI Engineer do?

Everything a Data Scientist, a Machine Learning Engineer, an NLP Engineer, a Knowledge Engineer, and a Software Engineer is supposed to be able to do

So, why isn't a Data Scientist doing any of this, and why are there so many additional roles? Why not just hire an AI Engineer and skip the mumbo jumbo of roles?

Because the majority of Data Scientists in industry only like to build models, and the models they build are not suitable for prime-time production, nor are they evaluated correctly, nor are they built correctly. Basically, they are incompetent; many hold Phd degrees and were never taught basic Computer Science concepts. In industry, roles are created to fill gaps in particular skills. Shockingly, all the above roles can be done by anyone with a Computer Science degree. Because Machine Learning is part of AI, Knowledge Representation and Reasoning is part of AI, NLP is part of AI, Data Engineering is part of Computer Science, Software Engineering is part of Computer Science, Data Science is part of Computer Science, and AI is part of Computer Science. And all these concepts are taught in a Computer Science degree. The industry has become a hodgepodge of roles because of hiring incompetent Phd people who need help with literally everything to do their job. And the job they do amounts to nothing because everyone else is pretty much doing the job for them. Even the aspect of research requires some artefact to be produced, which they need help with to complete their work. In fact, they even need others to help peer review their papers. The useless Phd researcher is more of a redundant role in industry, and as the accounting paperwork would show, hiring one of them leads to hiring a whole list of other people to help them do their work. Badly designed, academically inclined recruitment processes and badly designed academic courses: it is a right mess that Phd individuals have created in industry and at academic institutions, with a total lack of ability to convert theory into practice. Such things are only going to get worse, as more organizations look to hire clueless Phd people under the false pretence that they actually have practical experience and expertise in their respective domains. In fairness, they are likely to be more hypocritical, arrogant, and egotistic, and come with a huge number of shortcomings both in their approach to work and in the practical application of it. Fundamentally, a Phd individual lacks the mindset to think in terms of abstractions, constraints, and distributed systems, does not take into account real-world complexity, and does not account for the aspects of uncertainty that come with modeling data and the relevant scope for error. There are just too many clueless people in industry who will simply follow the crowd and never really question whether any of it makes sense. In fact, management and investors, in particular, tend to dictate pretentious attitudes to recruitment and to designated work roles. So long as organizations keep desiring Phd individuals, they will have to keep recruiting more engineers to support them with everything: what an utter waste of budgets, displaced talent, and resources. The areas of Machine Learning, NLP, and Knowledge Graph techniques have been around for decades compared to the more recent role of the Data Scientist. In fact, traditionally, the role of Data Scientist did not even incorporate aspects of AI; this has been a fairly recent addition to the role function. Traditionally, Data Science used to be mostly about the application of a limited subset of Machine Learning approaches in the context of Data Mining, which now overshadows the domain of Data Analysis.
Even here, many people in industry make the assumption that AI is all about Machine Learning, or that the Data Science profession is all about Machine Learning application, which could not be further from the truth. Organizations need to re-evaluate their hiring requirements and hire for such roles with an engineering mindset, where the person is involved in the entire end-to-end data science method, rather than a Data Scientist whose only interest is in building Machine Learning models with a subjective evaluation that are inherently overfitted to the data, with little to no appreciation of the entire work effort.

26 June 2021

Fake Devops Engineers

  • It takes them 1 to 2 months to spin up an instance on the cloud when it should take a couple of minutes at most (the whole process literally takes a few seconds on most cloud environments), apart from additional time for setting up security groups, which should take 2 days or possibly a week.
  • Negating everything you say, then using your suggestions as their own
  • Taking longer than is normal to provision and setup an environment
  • Having excuses for everything when things go wrong
  • Playing blame games
  • Not provisioning sufficient monitoring and automation services
  • Have they ever attended a devops conference?
  • They prefer windows to linux environments
  • They get frustrated very quickly at the most silly things
  • They confuse ops with devops
  • They find it difficult to understand that any regular polygon can fit into a square (this is a typical test of being able to understand abstractions, which even works for identifying a fake architect)
  • Don't understand infrastructure as code
  • Don't understand the relationship between development and operations
  • Don't understand how to manage and use automation
  • Don't understand what small deployments means
  • Don't understand what feature switches mean
  • Don't understand how to use, nor have heard of, Kanban, Lean, and other Agile methods
  • Don't understand how to manage builds and operate in a high-velocity environment
  • Don't understand how to make sense of automation, tools, and processes
  • They don't understand the devops workflow
  • They lack empathy
  • They don't understand trunk-based development
  • They don't understand what a container is used for
  • They don't know how to manage an orchestration process
  • They don't know how to manage a staging environment
  • They don't know what serverless means
  • They don't understand the difference between microservices and a monolith
  • They don't understand immutable infrastructure
  • They don't know what type of devops specialist they are
  • They don't know how to create a one-step build/deploy process
  • They don't know how to instil trust and respect
  • Not having any favorite devops tools
  • Not having any specific devops strategies or best practices
  • They cannot explain how they decide whether something needs to be automated
  • They find it difficult to solve issues when things go wrong
  • They find it difficult to embrace failure and learn from their mistakes
  • They have difficulty in problem-solving in production environments
  • They find it difficult to link up tools and technical skills with culture and teamwork
  • They have a big ego rather than a humble nature when it comes to self-characterization
  • They over-compensate with self-promotion but do not acknowledge their deficiencies

Kubernetes Ecosystem

Backup

  • Velero

CI/CD

  • Argo
  • Flagger
  • Flux
  • Keda
  • Skaffold
  • Spinnaker

CLI

  • helm
  • k9s
  • ktunnel
  • Kubealias
  • Kubebox
  • Kubectx
  • Kubens
  • Kubeprompt
  • Kubeshell
  • Kubetail
  • Kubetree
  • Stern

Clustering

  • Eksctl
  • k3s
  • kind
  • kops
  • Kube-AWS
  • Kubeadm
  • Kubespray
  • Minikube
  • Gravity
  • Kaniko
  • Ingress
  • KubeDB

Data Processing

  • Kubeflow

Development

  • Garden
  • Makisu
  • Telepresence
  • Tilt
  • Tye
  • Teresa

Mesh

  • Istio
  • Linkerd
  • Nginx Mesh

Monitoring

  • Dashboard
  • Grafana
  • Kiali
  • Prometheus
  • Kube-state-metrics
  • Kubecost

Networking

  • Coredns
  • Externaldns
  • Kubedns

Security

  • Falco
  • Gatekeeper
  • SealedSecrets

Storage

  • Rook

Testing

  • Popeye
  • k6s
  • Kube-Monkey

Native

  • KNative
  • Tekton
  • Kubeless
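
Most of the tools above sit on top of the same Kubernetes API. As a small illustration of interacting with that API programmatically, here is a hedged sketch using the official Kubernetes Python client; it assumes a reachable cluster, a valid local kubeconfig, and the default namespace.

```python
# Minimal sketch using the official Kubernetes Python client (pip install kubernetes).
# Assumes a reachable cluster and a valid local kubeconfig; namespace is an example.
from kubernetes import client, config

config.load_kube_config()          # reads ~/.kube/config (use load_incluster_config() inside a pod)

core = client.CoreV1Api()
apps = client.AppsV1Api()

for pod in core.list_namespaced_pod(namespace="default").items:
    print("pod:", pod.metadata.name, pod.status.phase)

for dep in apps.list_namespaced_deployment(namespace="default").items:
    print("deployment:", dep.metadata.name, dep.status.ready_replicas)
```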

25 June 2021

Bad Processes for Applications And Interviews

Interview and application processes at many organizations generally require a massive overhaul across the board, but for IT-related jobs in particular. The points below highlight the glaring truths about such recruitment practices:

  • Providing puzzles to solve that no one in their right mind will ever need to do on the job
  • Providing Codility or HackerRank tests that anyone can cheat their way through
  • Asking bookish questions to test memorization skills or to impress upon the candidate
  • Asking questions that are totally irrelevant to the role function
  • Asking one to do a test, like why? Do you ask a builder to build you a sample wall before you hire them to build you a wall?
  • Coming into an interview with certain assumptions about the candidate even before interviewing them
  • Asking them badly worded questions like "what gives you energy?"
  • Asking someone to do pair programming. Do people naturally talk out loud in any job function? Has anyone ever heard of pairing in other job functions like finance, admin, marketing, or operations?
  • Having an excessive number of interview stages
  • Having a bad attitude or being unprofessional while interviewing a candidate
  • Hypocritical behavior, like making a candidate wait for a long time for an interview, but having an issue when the candidate is running late
  • Having silly tests that have no basis
  • Applying for one role but trying to interview the candidate for another role without candidate consent
  • Messing around with candidates and wasting their time during application and interview stages
  • Not providing application and interview feedback. Providing feedback is a requirement if one wants to be in compliance with GDPR as part of the processing and storage of candidate information.
  • Changing the job role part way through an interview process
  • Interviewing candidates before securing funding for the work
  • Not being honest about the job role
  • Overselling and underdelivering on the job function
  • Using terms like cultural fit to justify their biases
  • Expecting certain educational backgrounds that are unnecessary for the job function
  • Showing interest in a candidate purely on basis of where they got their degree or which company they had previously worked for
  • Not focusing on candidates' practical skills
  • Giving candidates silly tests to do throughout the interview stage
  • Making candidates feel uncomfortable during interview stage
  • Not providing water or asking for refreshments before a face-to-face interview
  • Not listening to what the candidate has to say or not letting them speak
  • Evaluating candidates purely on basis of likability and subconscious and unconscious biases
  • Rejecting candidates for roles just because they are women or part of a minority group
  • Rejecting candidates for roles on basis of religion or other such prejudices
  • Not being considerate and respectful with candidates
  • Being overly distrustful and pessimistic of candidates through the entire process
  • Not answering basic questions of candidates that would help them evaluate the job function
  • Being difficult, unapproachable, and not being forthcoming with candidates
  • Taking too long to provide feedback or not providing any at all
  • Not realizing that job interviews are a two way process
  • Rejecting candidates for using American English rather than British English for spelling words
  • Rejecting candidates for grammatical mistakes and being too pedantic
  • Rejecting candidates based on their looks and appearances
  • Rejecting candidates based on disability and not being sufficiently accommodating
  • Changing interview times or cancelations at the last minute
  • Purging the entire job application database so the candidate who might have spent time on the application has no chance to be reviewed, and likely has to apply again
  • Advertising for jobs that do not exist
  • Advertising for jobs but having a preferred source of candidates
  • Advertising for a job where the job title does not match the job description
  • Advertising for a job when the job has already been filled
  • Using job ads as a marketing gimmick
  • Asking for age, date of birth, and race on the job application
  • Not focusing the application and interview to what the job actually requires and entails
  • Not interviewing candidates on their relevant merits
  • Using silly benchmarks and psychometric tests
  • Not reviewing every job application and candidate
  • Using non-technical people, who have no background in the skills required for the job function, to pre-screen candidates
  • Using job titles rather than the context and content of work when evaluating job applications
  • Screening candidates by keywords and not context and content of work
  • Making it difficult for candidates to approach organizational recruitment teams for enquiries or feedback
  • Not acknowledging job applications nor the deletion of job applications
  • Refusing to shake the candidate's hand after an interview
  • Using cognitive and prejudicial biases to screen a candidate
  • Having badly designed job application forms
  • Having poor communication skills as interviewer but expecting amazing communication skills from interviewee
  • Going off on a tangent and losing focus
  • Rejecting candidates because they didn't feel comfortable with, or didn't take a liking to, your pet dog or cat in the office
  • Being rude and offensive to candidates
  • Talking about diversity awareness, but not having much of a diverse workforce in the office, nor displaying an open-mind about diversity of cultures
  • Using excessive stereotypes and generalizations in communication with candidates
  • Not being careful with using gender pronouns
  • Using innuendos whether sexual or otherwise to invade privacy or personal space of candidates
  • Don't ask silly questions like "how are you" during a lockdown period or a pandemic, as you can expect a diplomatic answer at best; in many cases the candidate could find the question quite inappropriate, given the circumstances of the situation
  • Don't expect a candidate to have video on during a virtual interview session; in fact, you should not even care what the person looks like in the first instance

24 June 2021

People That Memorize Things

There appear to be people in the IT industry who memorize literally everything, from APIs, libraries, and functions to the whole lot. In some job interviews they will even ask questions that purely test one's memorization skills. Such efforts at memorization are futile. In many cases, it is unnecessary and a pointless exercise in wasted time. One should never have to memorize something that they can look up or autocomplete, especially if it is being used as a tool to complete work. Technology moves at such a fast pace that new versions are released, APIs are changed, new ways of doing things are introduced, and in time previous methods may be deprecated. In many cases, people who memorize such tools are likely doing it to pass a certification exam. Perhaps something else to question is the need for such a pointlessly designed certification. Invariably, memorization is tested by people who are academically inclined, who use it as a yardstick for others, and who have very poor practical skills at applying any of it themselves. Phd people tend to fall into such an academically inclined group with poor practical skills. All in all, memorization in most contexts, from academia to practical life, adds little value.

20 June 2021

Why Pure Theoretical Degrees Are Useless

Theoretical degrees are utterly useless in the practical world. However, they may be useful for teaching. The reason being that they have zero element of practical reasoning. If one can't apply theory in practice, then what is the point of such degrees? Math degrees teach concepts; they provide the formula and they provide the problem. In physics, they provide the problem and they provide the formula to solve that problem. Such degrees amount to little in application. When such people enter the practical world, they need to be taught how to do literally everything. One wonders where, along the way of attaining such a theoretical degree, they forgot how to think in order to apply themselves. In the real world, one has to find the problem and then find a way to solve that problem, and this is the case ninety-nine percent of the time. The only way one can combine such excessive theory is to add an element of engineering to it. In biology, chemistry, and other such courses, the degree is transformed into application for medicine, pharmacology, and life science disciplines. Any degree that only provides an element of theory is pointless, as it is only good for academic purposes. In the practical world, one has to be taught how to apply such theory into practice to be productive and useful in society. Increasingly, universities are failing to combine theory with practice, because they aim to meet numbers for educational measures and indicators so as to achieve more academic funding. The biggest mistake employers can make in the IT world is to recruit graduates with purely theoretical backgrounds to do practical AI work. Don't hire a math, physics, or statistics graduate to build a machine learning model; they will require a lot of mentoring and training. The majority of practical and theoretical AI work requires a computer science background, where such material is formally taught in the degree course.

Data Journalism

Data Journalism

18 June 2021

Why Pure Probabilistic Solutions Are Bad

In Data Science, there is a tendency to focus on machine learning models that are inherently based on statistical outcomes and are essentially probabilistic in nature. However, to test these models one uses an evaluation method that is also steeped in statistics. Then, to apply further analysis on explainability and interpretability, one again uses a statistical method. What this turns into is a vicious cycle of using statistics to explain statistics, with an uncertainty of outcomes. At some point, one will need to incorporate certainty to gain confidence in the models being derived for a business case. Essentially, knowledge graphs serve multiple purposes here: they increase certainty in the models and provide logical semantics that can be derived through constructive machine-driven inference. Logic, through relative inference, can give a definite answer, while the machine learning model can at most provide a confidence score for whether something holds or doesn't hold, with a total lack of care for contextual semantics. A machine learning model rarely provides a guaranteed solution, as it is based on targets of approximations and error. Hence the tendency to measure bias and variance in training, testing, and validation data. The evaluation is also relatively based on approximations of false positives, false negatives, true positives, and true negatives. Logical methods can be formally tested. A machine learning model can at most be subjectively evaluated with a degree of bias. Invariably, at any iterative time slice, a purely statistically derived model will always be overfitted to the data to some degree. Statistics derives rigid models that don't lend themselves to providing definite guarantees in a highly uncertain world. Invariably, the use of statistics is to simplify the problem into mathematical terms that a human can understand, solve, and constructively communicate. Hence the huge statistical bias in academia, which tends to be a traditionally very conservative domain of processing thoughts and reasoning over concepts as a critical evaluation method within the research community. One could say that such a suboptimal solution may be good enough. But is it really good enough? One can always provide garbage data and train the model to provide garbage output. In fact, all the while, the statistical model never really understands the semantics of the data well enough to correct itself. Even the aspect of transfer learning in a purely statistical model is derived in a probabilistic manner. The most a statistically derived model can do is pick up on patterns. But the semantic interpretability of such data patterns is still yet to be determined with any guarantee of certainty; in fact, it is presumably lost in translation. Even the notion of a state-of-the-art model is fairly subjective. Evaluations that only look at best-cost analysis in terms of higher accuracy are flawed. If someone says their model is 90% accurate, one should ask: in terms of what? And what happens to the other 10% that they failed to account for in their calculations, which is an error that the person pipelining the work will have to take into account? Invariably, such a model will likely then have to be re-evaluated in terms of average-cost and worst-cost, which is likely to mean an increase in variable error of between 5% and 15%. The average-cost error is likely to lie somewhere around 10% and the worst-cost somewhere near 15%.
So, 90% of the time in production, the idealized performance accuracy of the model would be 90% - 10% = 80% on an average-cost basis, plus or minus 5% on a best-cost basis, and anywhere between minus 10% and minus 15% on a worst-cost basis. This implies that 5% of the time the model will perform at best-cost, 5% of the time at worst-cost, and 90% of the time at average-cost, where the idealized predictive accuracy, when the full extent of the error is taken into account, would be 80%. Even though this is still fairly subjective, an idealized metric that takes environmental factors into account at least improves on the certainty. This is because in most cases a model is built under the assumption of perfect conditions, without taking into account the complexity and uncertainty that would be present in a production environment. There is also a need to be mindful, sensible, and rational about the accuracy paradox. One can conclude here that a hybrid solution combining probabilistic and logical approaches would be the best alternative for reaching a model generalization of sufficient certainty to tackle the adaptability mechanism for process control, as well as to capture the complexity and uncertainty of the world.
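
The arithmetic above can be made explicit with a small sketch that folds the best-, average-, and worst-cost errors into a single expected production accuracy; the percentages are the illustrative figures from the paragraph above, not measurements.

```python
# Sketch of the best/average/worst-cost adjustment described above.
# All figures are the illustrative numbers from the text, not real measurements.
reported_accuracy = 0.90          # the claimed "90% accurate" evaluation result

scenarios = {
    # name: (share of production time, error subtracted from the reported accuracy)
    "best-cost":    (0.05, 0.05),
    "average-cost": (0.90, 0.10),
    "worst-cost":   (0.05, 0.15),
}

expected = sum(share * (reported_accuracy - error) for share, error in scenarios.values())

for name, (share, error) in scenarios.items():
    print(f"{name:12s}: {share:.0%} of the time at {reported_accuracy - error:.0%} accuracy")
print(f"expected production accuracy: {expected:.1%}")   # dominated by the 80% average-cost case
```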

15 June 2021

Bias, Slant, and Spin

Hydroponics Fruits, Herbs and Veggies

The agrotech industry is a fairly active area of AI research. The setup requires an indoor growing medium, lighting, and AI to make it happen. The list below includes the most common fruits, herbs, and vegetables grown in hydroponics:

  • Cucumber
  • Tomato
  • Lettuce
  • Strawberry
  • Basil
  • Coriander
  • Spring Onion
  • Pepper
  • Spinach
  • Blueberry
  • Radish
  • Kale
  • Green Bean
  • Chive
  • Mint
  • Cabbage
  • Carrot
  • Cauliflower
  • Celery
  • Beet
  • Chard
  • Broccoli
  • Corn
  • Eggplant
  • Leek
  • Sage
  • Melon
  • Onion
  • Pea
  • Zucchini
  • Squash
  • Parsley
  • Oregano
  • Rosemary
  • Thyme
  • Chamomile
  • Dill
  • Lavender
  • Chicory
  • Fennel
  • Mustard Cress
  • Lemon Balm
  • Watercress
  • Stevia
  • Peppermint
  • Tarragon
  • Bok Choy

11 June 2021

Thorny Weeds of Data Science

In industry, the area of data science is a bit like navigating through a large field of thorny weeds. There are just too many people pitching themselves as experts who don't understand what they are doing. Many of them have Phd backgrounds and a complete inability to translate theory into practice. The field is a breeding ground for insecure people in teams with purely academic backgrounds. For everything, they require help and additional resources, adding to the frustration of their peers and of the management who have to support their work with unnecessary inefficiencies and extensive funding. The patterns of recruitment, or lack thereof, seem to be typical across many organizations. Often these false hopes of hiring Phd individuals to lead research work in business stem from clueless investors who have neither the interest nor the sense to understand the practical aspects of the field. Interview rounds for candidates become a foregone conclusion of misplaced, inept, and aberrant hiring. As a result, the entire organization becomes guilty of compounding issues by hiring incapable, pretentious, and arrogant individuals, lacking basic common sense, who may apparently have stellar academic credentials. In many cases, the merit of a Phd qualification is questionable, where much of it may have been gained via crowdsourcing platforms or even via the extensive help of the academic advisor with the thesis write-up. Most things that a Phd-caliber individual can do, a non-Phd individual can do better and translate into a product. If a Phd individual cannot convert theory into practice, then what is the point of such a hire? Considering that only one percent of the population holds a Phd, is it any wonder that organizations are so ill-informed as to call it a skills shortage when there isn't one to begin with? They should really focus on correcting their idealized job requirements. Invariably, organizations learn the hard way when projects fail to deliver and there is no tangible return on investment from a Phd hire. Is this evidence of a failure of the education system, of the entire technology industry, or perhaps both? Flawed online training courses, mentoring, and certification courses further amplify this ineffective practice. Bad code and bad models just breed bad products for the end-user, which ultimately affects investment returns through lack of business performance, where targets need to be offset through additional end-user support plans. It still stands to reason that, for any company, people are the biggest asset. Hiring the right people and looking beyond the fold of their credentials is paramount. In fact, hiring astute generalists is far more important than hiring specialists in the long term. A specialist in a certain area, at a given point in time, is likely to be an outdated specialist in the short-term to long-term cycle of work. Organizations that function as businesses need to strategize their game plan and forecast for the future, which may just be a few quarterly cycles ahead of time. The quickest way to a failed startup is to increase costs by hiring Phds and increasing the headcount of staff to support their work. One needs to wonder why so many people should be hired to support someone with a Phd unless that person is practically incompetent. Academics invariably cannot translate theory into practice, which impacts delivery cycles of work.
Phds, as a result, become the weakest link in many cases, inhibiting and hampering the cycle of productivity and innovation for both the short-term and the long-term growth of an organization.

7 June 2021

Why Microsoft Products Are Terrible

  • Tight coupling approach to products and services
  • Documentation bias towards own products and services
  • This only works on windows
  • Plagiarism and stolen code from other vendors
  • Security risks and software glitches
  • Business model built on stolen products, services, ideas, and code
  • Market hijacking
  • Consolidation rather than any significant innovation
  • Windows copied from Mac
  • DOS copied from CP/M (Digital Research)
  • Bing copied from Google
  • Explorer copied from Netscape
  • All windows versions come with design flaws
  • Lack of separation between OS and application
  • Unrepairable codebase
  • Waste of resources
  • The dreadful blue screen of death
  • Unreliable as cloud servers
  • Trying to do everything but master of none
  • Terrible network management
  • Terrible at internet related services and products
  • Enjoys copying other competitors
  • Lots of security vulnerabilities
  • Forced sales targets for substandard products and services
  • Marketing gimmicks that breed lies and failed promises
  • Buying open source solutions to kill off the competition
  • Doesn't support open source community
  • Works on the vulnerabilities of ignorant customers
  • Ease of use can be subjective and come at the detriment of quality
  • Ignorant users are happy users
  • Forcing updates and patch releases for security failures in quality
  • Bad practices and foul play
  • Forcing users to use windows instead of linux or mac
  • Vendor lock-in and further use of the cloud to apply the same methodologies
  • Business as usual with anti-trust
  • Rigged tests and distorted reality
  • Bogus accusations
  • Censorship
  • Limited memory protection and memory management
  • Insufficient process management
  • No code sharing
  • No real separation between user-level and kernel-level
  • No real separation between kernel-level code types
  • No maintenance mode
  • No version control on DLL
  • A weak security model
  • Very basic multi-user support
  • Lacks separation of management between OS, user, and application data  
  • Does not properly follow protocol standards
  • Code follows bad programming practices
  • Anti-competitive practices to block open source innovation and technological progress

6 June 2021

Why Build Your Own Cloud Infrastructure

It can benefit organizations to move away from third-party cloud providers and develop their own in-house cloud strategy. The following highlights some reasons:

  • Too dependent on third-party infrastructure from cloud provider
  • Too much trust in third-party cloud provider for your organizational needs
  • Compliance and privacy breaches from cloud provider
  • Leaked secrets to competitors from cloud provider
  • You don't own your own data from cloud provider
  • You don't know where your data is held from cloud provider
  • Geo-located third-party services make it difficult to keep track of governance
  • Tight coupling to the cloud provider
  • Have to build design architecture dependent on cloud provider services and orchestration process
  • Have to build design architecture according to cloud provider access/role services and policies
  • Cloud provider can block your services at anytime
  • Cloud provider could be using other third-parties
  • Cloud provider may lack customer care when you require support
  • Your service uptime is dependent on cloud provider uptime
  • If your choice of cloud provider shuts down permanently, it will require a massive migration
  • A cloud provider decommissioning a service leads to sudden re-engineering and rethinking of services
  • Logging anything from cloud provider can be limited and at times problematic in transparency
  • Cloud provider cost is variable and can change at anytime
  • Loss of data from cloud provider
  • Your customers will be affected by the downtime of the cloud provider and any lack of support
  • Control your own destiny, security, orchestration, and architecture
  • Control your own backups
  • Control your own data governance and user management
  • Control your own cost of maintenance
  • Control your own reliability and scale out needs
  • Control your own data and storage
  • Control your own organizational assets
  • Control your own organizational liabilities
  • Recruit and screen your own employees that manage your cloud infrastructure (know your employees)
  • Flexibility to sell your own cloud to other third-parties
  • Build services that measure up to organizational requirements
  • Know exactly where your data is stored and meet regulatory requirements for compliance and audit
  • Build your own AI and data science infrastructure
  • Make your own cloud strategy fully automated
  • Make it more responsive to failure and fault tolerance
  • Build your own secret knowledge graph sauce for your organization using own infrastructure
  • No longer dependent on specialist resources for your organizational needs
  • Technology is more advanced than it used to be, things are getting simpler to manage
  • Don't use Azure, it sucks, an organization is definitely in a better position to develop their own cloud strategy
  • Don't use GCP, it sucks, an organization is definitely in a better position to develop their own cloud strategy
  • Don't use AWS, it sucks, an organization is definitely in a better position to develop their own cloud strategy
  • Don't use some other opinionated cloud provider, an organization is definitely in a better position to develop their own cloud strategy

5 June 2021

Unpredictable Google

Google services are the worst. One minute they are available for use; the next minute they are going through a decommissioning process. Then there is the aspect of their page ranking algorithms, which keep changing and affecting publisher revenue. Not to mention the way they have recently been giving preferential treatment through a preferred advertising supplier network. One minute an API is available to use, the next minute it is gone. The same is the case on GCP. Nothing seems to stay around for very long before it is changed, with a total lack of regard for the user and no time frames given for planning a migration. Not to mention the fact that to find any information one has to literally hunt for it. One would think that, as a search company, they would know how to make their search and findability functions user-friendly, but no. And it takes ages to remove anything from their search engine. The company is also slack in following basic privacy and regulatory compliance. The company just gives off an air of arrogance, as if they can get away with everything without really being responsible with user data. There seems to be a complete disconnect across the internal organization, which shows in their product and service initiatives. Over the years, with multiple court cases in the international community, Google has slowly but surely been losing the credibility of its services with users. A large company like Google eventually meets its fate when more issues with the reliability and security of its services come into question, while frustration increases for users over its lack of responsive customer care and dodgy business practices. A perfect example of a company that just doesn't care about the end-user.

28 May 2021

TypeDB

TypeDB

Drawbacks of SHACL

SHACL is a constraint language for validating RDF graphs against a set of conditions. It works under a closed-world assumption when the end goal is an open-world assumption. The validation is only really defined against those set conditions of constraint. There will be cases where validations are missed at the point of inference. The entire search space cannot be tested against constraints defined over a set of conditions. The approach can be seen as rather the opposite of the intended goal. What follows from a SHACL validation can lead to a form of reverse engineering that results in partially closed-world assumption criteria. It may be better to introduce SHACL earlier in the process rather than later, so as to avoid conflicting outcomes. SHACL validation tests can quickly get out of hand if acceptance tests against requirements defined in question/answer form become a form of integration validation tests where constraints have overlapping dependencies. One can notice how quickly SHACL tests form unmaintainable targets and impervious constraints derived from a set of conditions. In all fairness, one may want to validate the graph at the point when it is in a closed-world state and not after it has been generalized to an open-world assumption.
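
For readers unfamiliar with what such validation looks like in practice, here is a minimal sketch using pySHACL with rdflib; the ex: namespace, the shape, and the data are made up purely to show that the validation only covers the conditions one has chosen to write down.

```python
# Minimal pySHACL sketch (pip install pyshacl rdflib). The ex: namespace, shape,
# and data are made-up examples; real shapes would be domain-specific.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" .
ex:bob   a ex:Person .                    # missing ex:name
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] .   # only this condition is ever checked
""", format="turtle")

conforms, report_graph, report_text = validate(data, shacl_graph=shapes, inference="rdfs")
print(conforms)        # False: ex:bob violates the one constraint that happened to be defined
print(report_text)
```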

23 May 2021

Who is regarded as an expert?

Is an expert someone with years of practical experience, or someone with a Phd who doesn't know how to apply any of the theory in practice? In most cases, an expert is someone who has a balance of both and the practical experience to match, but not necessarily someone with a Phd. In fact, invariably, an expert does not have a Phd and usually finds the whole approach pointless. However, the way things move in the world, one is an expert one minute and pretty much an outdated expert the next. It takes at least 10 years of practically applied experience to become an expert at something. By which time, either the approach or the technology becomes outdated. So, is it even worth becoming an expert? In some areas there is no other way. However, in most cases an expert is a pointless notion, as it is all fairly subjective. But conservatives will still place more weight on academics than on experience. Ultimately, people learn the hard way when their financial pockets run dry and they need someone to deliver a solution in the shortest amount of time, rather than take any further risks with useless Phds who have no clue whatsoever and have provided quantifiably and qualitatively zero results in terms of return on investment to an organization. The safest bet for organizations is to develop a team of astute generalists who can adapt to change and who have the ability to self-learn, self-explore, and self-grow in their practical endeavors, which is what a degree is supposed to prepare and enable an individual to do at the point of graduation.

22 May 2021

Key Things Missing In Academics

Academics cannot replace practical skills. Many academically inclined people, who tend to have a Phd, have theoretical backgrounds but lack sufficient skills to convert them into practice for them to be of any use to business. Many academic courses at universities also discount certain key areas that almost always occur in practice, or that in the practical world would be defined as common sense. The following highlights four core areas that are almost always necessary for solving business problems in practice and are never sufficiently taught in academia:

  • Rationalization of complexity:
    • Understanding complexity of the world we live in and the fact something that can be done in academic theory may not be possible in practice, given the resources at the time. Understanding such aspects of computational cost, architectural constraints, input/output dependencies, third-party dependencies, scalability requirements, resilience, latency, load, bandwidth, performance, time constraints, funding limitations, management buy-in, cloud resources, access to data, licensing requirements, state management, copyright restrictions, regulatory requirements, skill availability, and other such complexity constraints.
  • Rationalization of uncertainty:
    • Understanding that in practice there are many uncertainty variables, that need to be taken into account, that often may get ignored in academic theory. Often this involves a certain degree of risk. Such risks could take on many forms of economic, geopolitical, social, environmental, government regulation, criminal, accidents, errors, event state outcomes, customer behavior, market demand/trends, failure to deliver, health and safety, loss of team resource, loss of funding, sudden eventual changes, and any such factors outside of immediate control.
  • Rationalization of noisy data:
    • Understanding that in practice data is almost always noisy and that no one will hand you clean data on a silver platter.
  • Rationalization of context:
    • Understanding that in practice everything has context, and it is in this context that things can be constructively applied within the bounds of rational pragmatic thinking. There is no silver bullet that can magically solve all problems of the world. Often context understanding comes with practice as it involves all the above key areas of complexity, uncertainty, and being able to handle noisy data. In most cases, a formula is not provided and the problem is not defined. One has to formulate the problem present in the data, and discover a formula to solve such a problem.
As a side note, it seems the more academic one gets, the less inclined one becomes to use common sense. Even the most simple tasks become difficult to accomplish or require additional help from others. So it seems academics is a trajectory that promises a lot initially but, quantifiably or qualitatively, delivers little in return to the individual in society over the long term.

Matterhorn

Matterhorn

16 May 2021

Six Elements of Web Intelligence

  • Look 
  • Listen 
  • Learn 
  • Connect 
  • Predict 
  • Correct

What is a Knowledge Graph

A knowledge graph has a few key characteristics:

  • Is a type of knowledge base containing semantically integrated and interconnected graph-structured data
  • The representation is in a machine-readable standard that follows an open-world assumption
  • The representation forms an abstraction of a connected graph of relationships 
  • An ability to reason over the connected representation, defined in an open-world assumption, in order to build inference on new relationships
  • The queryable representation is iteratively evolvable through machine-inference, therefore dynamic and not static
  • The ability to semantically infer over machine-readable data allows the abstract representation of data to extend from information into knowledge
  • The semantic context is prevalent and embedded in the knowledge representation, namespaced, and tends to be defined as interlinked URI resources
  • The representation can be stored in any noSQL database, that is able to support the serialized data format, but for performance reasons it tends to be stored in a native graph-like triplestore or a quadstore
  • A property graph without the machine-readable representation and inference layer is not a knowledge graph
  • Without a machine-readable representation and inference layer, the representation is static, merely data, likely defined under a closed-world assumption, that can be queried and managed in a database, leaving much of the inference to the person running the analytics
  • With an additional inference layer and a rich machine-readable representation, the database can take a performance hit in both reads and writes; in most cases writes are sub-optimal to reads, and therefore a write operation tends to be done in the background as a periodic bulk function, with only a read operation made available to the end user (an example of this can be seen between Wikipedia, which is available for reads/writes, and DBpedia, a knowledge graph which is bulk loaded every so often and made available as read-only)
  • Many semantic graph stores still work on a client-server basis with the vertical scaling option of the pre-2000s era, which may pose an issue for heavy reads/writes where reliance is on one core server; to avoid downtime, make bulk writes offline on a hot-swappable replicated instance, and once resolved, swap the instances
  • In a knowledge graph, only metadata is stored (data about data which is in a connected and machine-readable form) and not the changing data itself (keep the data separate from the metadata)
  • A knowledge graph is generally intended for searchability, queryability, discoverability, and findability cases and not for heavy transactional cases
  • If one wants heavy reads/writes then opt for a transactional option; in most cases such databases may not support inference nor provide semantic compliance, but may provide sharding, horizontal scaling, and static property graphs in their serialized data representations, which may be compliant with the TinkerPop stack; alternatively, an in-memory analytical graph may also be an option to avoid heavy I/O performance hits (bear in mind, without inference plus a machine-readable representation it is no longer a knowledge graph but merely static connected graph-structured data stored in a database)
  • A Knowledge Graph is a special case of a Knowledge Base, defined by the graph representation that it holds in the data abstraction in the form of subject-predicate-object triples; however, machine-readable knowledge does not necessarily have to be stored in the form of a triple: the W3C has defined a specific set of standards for working with semantic data, but that is not necessarily the only way
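
To make the triple and query aspects concrete, here is a minimal rdflib sketch using a made-up example.org vocabulary; a real knowledge graph would add an inference layer (e.g. RDFS/OWL reasoning) on top of this kind of machine-readable representation.

```python
# Minimal rdflib sketch of subject-predicate-object triples plus a SPARQL query.
# The ex: vocabulary is made up for illustration; no inference layer is included here.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

g.add((EX.ada, RDF.type, EX.Person))
g.add((EX.ada, EX.name, Literal("Ada Lovelace")))
g.add((EX.ada, EX.knows, EX.charles))
g.add((EX.charles, RDF.type, EX.Person))

# Who does Ada know?
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?who WHERE { ex:ada ex:knows ?who . }
""")
for row in results:
    print(row.who)     # http://example.org/charles
```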

26 April 2021

Five Phases of AI Project

  • Definition and Hypothesis (business problem cases and identify value targets) 
  • Data Acquisition and Exploration 
  • Model Building Pipeline and Evaluation 
  • Interpretation and Communication 
  • Automation and Deployment Operations

31 March 2021

Three Approaches to Word Similarity Measures

  • Geometric/Spatial to evaluate relative positions of two words in semantic space defined as context vectors 
  • Set-based that relies on analysis of overlap of the set of contexts in which words occur 
  • Probabilistic using probabilistic models and measures as proposed in information theory
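
As a rough sketch of the first two approaches, the example below computes a geometric (cosine) similarity over made-up context-count vectors and a set-based (Jaccard) overlap over made-up context sets; a probabilistic measure such as pointwise mutual information would instead be estimated from corpus co-occurrence probabilities.

```python
# Sketch of geometric (cosine) and set-based (Jaccard) word similarity.
# The context vectors and context sets below are made up for illustration.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

# co-occurrence counts with the context words ["drink", "pet", "bark", "milk"]
cat = [3, 9, 0, 6]
dog = [2, 8, 7, 1]
print("cosine(cat, dog)  =", round(cosine(cat, dog), 3))

# contexts in which each word was observed
cat_contexts = {"drink", "pet", "milk"}
dog_contexts = {"drink", "pet", "bark"}
print("jaccard(cat, dog) =", round(jaccard(cat_contexts, dog_contexts), 3))
```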

17 March 2021

TDD and BDD for Ontology Modelling

The end goal of most ontologies is to meet the semantic representation in a specific context of generalization that follows the open-world assumption. However, such model building approaches can manifest in different ways. One approach is to apply test-driven development and behavior-driven development techniques towards building domain-level ontologies where constraint-based testing can be applied as part of the process. The process steps are elaborated below.

  • Create a series of high-level question/answering requirements, which can be defined in the form of specification by example
  • Create SHACL/SHEX tests granular to individual specification examples in context. Each SHACL/SHEX validation basically tests a 'ForSome' case as part of predicate logic per defined question, where a subset of domains/ranges can be tested.
  • Create BDD based acceptance tests and programmatic unit tests that can test logic constraints
  • At this stage all tests fail. In order to make them pass, implement the 'ForSome' closed-world assumption defined in the SHACL/SHEX validation, i.e. implement the representation so that a SPARQL query can answer the given contextual question for the subset cases. Then make the tests pass.
  • Keep repeating the test-implement-refactor stages until all tests pass for the given set of constraints. Incrementally refactor the representation ontology. The refactoring is more about building working generalizations that can transform the closed-world assumption of asserted facts into the partial open-world assumption of unknowns for the entire set.
  • Finally, when all tests pass, refactor the entire ontology solution so it conforms to the open-world assumption for the entire set, i.e. 'ForAll, there exists', which can be further tested using SPARQL against the subsumption hypothesis.
  • If the ontology needs to be integrated with other ontologies build a set of specification by examples for that and implement a set of integration tests in a similar manner.
  • Furthermore, in any given question/answer case, identify topical keywords that provide bounded constraints for a separate ontology initiative; it may be helpful here to apply natural language processing techniques in order to utilize entity linking for reuse.
  • All tests and implementations can be engineered so that they follow best practices for maintainability, extensibility, and readability. The tests can be wired into a continuous integration and maintainable living documentation process.
  • Expose the ontology as a SPARQL API endpoint
  • Apply release and versioning process to your ontologies that complies with the W3C process
  • It is easier to go from a set of abstractions in a closed-world assumption to an open-world assumption than from an open-world assumption to a closed-world assumption. One can use a similar metaphor of going from relational to graph vs graph to relational in context. 
  • Focus on making ontologies accessible to users
  • OWA is all about incomplete information and the ability to infer new information; constraint-based testing may not be exhaustive over the search space, but one can try to test against a subsumption hypothesis
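
As a minimal sketch of what one such programmatic acceptance test could look like, the example below uses rdflib with pytest and a SPARQL ASK query standing in for a competency question; the ontology file name, namespace, and question are hypothetical.

```python
# Sketch of a programmatic acceptance test for a competency question,
# using rdflib and pytest. The file name, namespace, and question are hypothetical.
from rdflib import Graph

COMPETENCY_QUESTION = """
    PREFIX ex: <http://example.org/lab#>
    ASK { ?experiment a ex:Experiment ; ex:hasResult ?result . }
"""

def test_ontology_answers_competency_question():
    g = Graph().parse("lab_ontology.ttl", format="turtle")   # hypothetical ontology + asserted facts
    # Red: this fails until the representation (and any inferred triples) can answer the question.
    assert g.query(COMPETENCY_QUESTION).askAnswer
```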

5 March 2021

Allotrope Framework

Allotrope is used as a semantic framework in the context of scientific data. Or is it really? The framework seems to have received awards. But, on deeper inspection, it looks like a hacked-out solution: not only does it increase complexity, it also enforces tight coupling of semantics, lacks consistency, is laced with bugs, lacks sufficient test coverage, and impedes the data lineage steps necessary for integrating with machine learning components for better explainability. The framework looks like a work in progress and lacks clarity and accessible documentation. In fact, the documentation is inaccessible to the public unless one becomes a member. It also only supports Java and C#, with no real equivalent Python API. An eventual Python API does appear to be in the works, but only time will tell when such a solution is available for use. Although the use of HDF5 as a reference data format is a good choice, the implementation as a whole as a semantic execution platform is not. And worst of all, most of the testing is enforced via SHACL validation, using reverse engineering from an open-world assumption to a closed-world assumption, especially as data can take on multiple dimensions of complexity, not to mention running into the vicious cycle of unmaintainable and unreadable recursive validation cases where there is no clear-cut way for requirement elicitation and subsumption testing. Enforcing what and how one validates semantic data is questionable at best. The framework tries to tackle data complexity with more complexity rather than keeping things simple, accessible, and reusable. After all, it is supposed to be an effort in using standards but seems to apply them in a very opinionated fashion. That is another case in point: the entire framework lacks reusability, with a lot of duplicated work and reinventing of the wheel. Data and design patterns are half-hearted and not well baked into the framework. There must be better ways of doing things than to use frameworks that impede productivity and only increase the frustration of working with semantic data complexity and constraints, where inevitably the solution becomes the problem.

22 January 2021

Federated Protocol

  • Mastodon
  • NextCloud 
  • PeerTube 
  • Friendica 
  • Mobilizon
  • Pixelfed 
  • Pleroma 
  • Misskey

13 January 2021

Machine Learning in Rust

Machine Learning in Rust

Chatbot Evaluations

  • ChatEval 
  • Acute-Eval
  • SSA 
  • NUC 
  • SASSI 
  • WER 
  • DSTC 
  • DSTC2 
  • BLEU 
  • PARADISE
  • QoE
  • IQ
  • Perplexity
  • F1
  • Hits@k
  • Average Utterance Length
  • Ratio of Rare Words
  • Number of Repetitions
  • Number of System Questions
  • Comprehensible
  • Interesting
  • Topical Relevance
  • Response Incorrectness
  • Conversation Continuity
  • Engagement
  • Conversational Depth
  • Coherence
  • Domain Coverage
  • Conversational Diversity and Breadth
  • Naturalness
  • Informativeness
  • Unification Quality
  • User Satisfaction (Questionnaires)
  • User Simulations
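
As a small illustration of the automatic end of this list, the sketch below computes a sentence-level BLEU score with NLTK and a token-level F1 between a reference and a candidate response; the two sentences are made up, and human-judged criteria such as engagement or coherence obviously cannot be computed this way.

```python
# Sketch of two automatic chatbot metrics: sentence BLEU (via NLTK) and token-level F1.
# The reference and candidate responses are made up for illustration.
from collections import Counter
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the weather in london is rainy today".split()
candidate = "the weather is rainy in london".split()

bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)

overlap = sum((Counter(reference) & Counter(candidate)).values())
precision = overlap / len(candidate)
recall = overlap / len(reference)
f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0

print(f"BLEU: {bleu:.3f}")
print(f"F1:   {f1:.3f}")
```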