24 November 2020

Hybrid Methods

Combining semantic rule-based methods with machine learning yields the best hybrid methods for optimal contextualized results. A pure machine learning solution is rarely going to understand the semantics of data, and will always return some form of confidence score as an approximation rather than an exact result. A pure semantic method will provide exact results based on inference and reasoning constraints. A semantic solution is also logically testable through standard programmatic methods. A machine learning solution can at best be evaluated on approximations, for which formal explainability and interpretability methods are required for compliance. Semantics can take the form of ontological representations such as knowledge graphs and commonsense reasoning methods. When both approaches are combined there is an increase in connected context and semantics, which is beneficial for formal artificial intelligence driven systems. The combination also provides for better transfer learning and a way of managing governance. Such methods, when combined, add significant accessibility value to business in the form of natural language interpretability, integration, feature engineering from standards compliance, and human-computer interaction interfaces. In essence, data is transformed into information and knowledge through the enrichment of machine-interpretable context that can grow through the mechanisms of self-learning and self-experience, and inevitably develop self-awareness about the environment. The semantic aspects act as a simulated form of associated memories that are available for forming new data associations and relationships. Such memories are formed through persistence in a knowledge graph, which could be treated as an in-memory cache for short-term processing and as storage for long-term processing. In the process, the machine learning and semantic methods feed off each other to increase learnability and comprehension of the domain targets within an open world. 
The entire world wide web is based on semantic resources that make the internet accessible for browsing, searching, and findability. Metadata is in every piece of software and hardware in use today. However, such metadata requires semantic enrichment to enable machine-interpretability, which can be achieved through semantic standards. In fact, many programming languages are built using similar theoretical underpinnings of compilers, interpreters, parsers, semantics, and syntax. For decades, many of these methods have been shown to be sufficiently viable in industrial business use cases. In many respects, these are similar aspects, albeit in simpler abstractions, to human intelligence processes that are far more intricate and complex in nature. Rarely do humans think in statistics for pattern matching and recognition; many such processes are driven through semantic associations and are reinforced through experience. A pure statistical machine is always going to be fairly unsure about the world if it is based on approximations. Semantics will give it an edge to formulate meaningful associations from the world through domain-relevant experiences, which it is then able to interpret and analyze in context.
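
As a rough illustration of the hybrid idea, the sketch below pairs an exact, rule-based lookup over a toy knowledge graph with a stand-in statistical scorer. Everything in it is an assumption made for illustration: the triples, the `is_a`, `ml_score`, and `classify` names, the threshold, and the character-overlap heuristic that takes the place of a trained model. It also treats a missing edge as a definite "no", which is a closed-world simplification.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge graph of (subject, predicate, object) triples.
TRIPLES = {
    ("aspirin", "is_a", "nsaid"),
    ("nsaid", "is_a", "drug"),
    ("ibuprofen", "is_a", "nsaid"),
}


def is_a(entity: str, category: str) -> bool:
    """Exact semantic check: follow is_a edges transitively."""
    frontier, seen = [entity], set()
    while frontier:
        node = frontier.pop()
        if node == category:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(o for s, p, o in TRIPLES if s == node and p == "is_a")
    return False


def ml_score(mention: str, category: str) -> float:
    """Stand-in for a trained classifier: returns a confidence, never a proof.
    A crude character-overlap heuristic, purely for illustration."""
    a, b = set(mention.lower()), set(category.lower())
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass(frozen=True)
class Verdict:
    label: bool
    confidence: float
    source: str  # "semantic" (exact) or "statistical" (approximate)


def classify(mention: str, category: str, threshold: float = 0.5) -> Verdict:
    # Prefer the exact, testable semantic answer when the graph knows the entity.
    # Closed-world simplification: a missing path is reported as False.
    known = any(s == mention for s, _, _ in TRIPLES)
    if known:
        return Verdict(is_a(mention, category), 1.0, "semantic")
    # Otherwise fall back to the approximate statistical score.
    score = ml_score(mention, category)
    return Verdict(score >= threshold, score, "statistical")
```

Calling `classify("aspirin", "drug")` is answered by inference over the graph and can be unit-tested like any other function, whereas `classify("paracetamol", "drug")` falls back to the scorer and only returns a confidence.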

20 November 2020

Why do you need a PhD

A PhD, to all intents and purposes, adds little value for business. That said, pretentious investors do look for PhD-caliber individuals at startups before they will provide funding. In such cases, the hypocrisy can be seen in the background of such investors, who may even have worked for companies that were founded by college dropouts. Publication of research papers does not generate any revenue for an organization either, other than to gain visibility. Considering that 80% of all published research work amounts to nothing, that is a lot of investment in wasted time and man-hours. By the time a PhD candidate completes their dissertation, the work is already too outdated to be of any significant real use to industry. A PhD can last anywhere between 3 and 6 years; considering market movements and the advancements in technology in industry over that time, graduates would have very little to offer apart from outdated practical skills. In many cases, an organization needs to develop an entire support function to cater for PhD individuals, who will even need help in refactoring and scaling their work. In most cases, they will need a lot of mentoring and training in all the basic skills that they should have learned in academia, and that a practically minded individual with work experience would already have to offer an organization. It is questionable what quantifiable work they produce if additional resources are needed to make it of any use to business. Invariably, theory does not supersede practice. In many cases, theory may not be feasible to implement in practice because of uncertainty and complexity, of which many PhD individuals have very little experience outside of academia. And putting them in a position of influence over business product initiatives is a big risk, as they come with very little practical experience. Academia is very different from how things get done in the practical world. Who really is defined as a domain expert? 
Is it one who has studied a topic for decades with published papers in a sheltered environment, or one who has learned the art through delivering practical projects across industry domains? In fact, as more clueless PhD holders find work in industry, this has driven a shortage in academic institutions. This is also an indication of how bad teaching is in academia and how out of touch it is with the complexities of the real world. Considering that only one percent of the world's educated population holds a PhD, it is hardly a wonder that they won't be missed much in industry. Perhaps people with PhDs should really stay in academia, where they can teach (and improve their teaching skills, which are lacking) and publish papers (and improve the quality of their research, which is also lacking) within the confines of a protected institution, and leave the practical aspects to the experienced experts in business.

13 November 2020

Why do you require a Data Engineer

Companies need to ask a fundamental question - why do you require a data engineer on your team? The role of a data engineer spans 80% of the data science method. If a data engineer is expected to work alongside data scientists, then one can validly assume that such data scientists are incompetent, especially as they are only able to fulfil 20% of their job responsibilities as per the data science method. The follow-up question naturally arises - why do you require a data engineer on your team if you already have a team of data scientists? A competent data scientist's job is to be responsible for the entire end-to-end data science method, which means not just building the models but also pipelining and scaling them so they are of use to the company. In fact, they are also responsible for doing their own feature engineering. Otherwise, this raises the question - how can one build a model without doing the feature engineering oneself? And it raises a further concern: if the feature engineering is done by someone other than the person building the model, it is highly likely to lead to an overfitted model that only partially solves the business case. This means there is no real need to hire a data engineer if there is already a team of data scientists. In fact, companies can significantly reduce their hiring costs by hiring only people who know and understand the data science method. Either have only a team of data engineers or only a team of data scientists, as it makes no sense to hire both in a team. The next follow-up question could be - if you already have data scientists on your team, then is there any point in hiring a machine learning engineer? So many role overlaps only spell more and more incompetence and clueless team members, not to mention the frustration of hiring so many people. 
Companies can save significantly on costs by hiring capable people who have the sense to understand their job functions and have the relevant practical skills. Furthermore, one can then proceed to ask the next question - why do you require a data engineer, a data scientist, and even a machine learning engineer, if you could just hire a team of AI engineers? Machine learning is part of AI, data engineering is part of the data science function, and data science is pretty much what AI engineers can also do. And to be able to act as an AI engineer you have to understand aspects of computer science. Which leads to the last and final question - ultimately, don't they all require a competent software engineer with practical skills who can deliver solutions for business? Invariably, what we find in industry is that the weakest link is the data scientist, and this is usually because companies are saddled with recruiting PhDs who have no practical skills apart from their academic theory, which adds little quantifiable value for business in delivering products. They neither come with software engineering skills nor have the applied mindset to appreciate them. This is precisely why so many data scientists in industry are only interested in building models: in academia, they are rarely taught the importance of feature engineering, how to build a scalable pipeline, or the entire data science method. Essentially, the data engineer role becomes a gap filler for all the inadequacies and deficiencies of an incompetent data scientist.

2 November 2020

Are Men and Women Equal

Mathematically, two things are equal when they are exactly the same and when they represent the same object. Equality is reflexive, symmetric, and transitive. For men and women to be equal, they would have to be identical in attributes. Biologically, men and women are distinctly different. Men cannot give birth. Men cannot get periods. Physically, men and women may look similar to some degree. However, mentally and hormonally they may also differ. Essentially, they have different attributes that define their aggregate makeup. Generally, the laws of most countries, on paper, only recognize two genders, male and female, so we can reduce the complexity here from infinite genders. Since a male has XY chromosomes and a female has XX chromosomes, for female to equal male, X would have to equal Y. The laws of equality dictate that if x = y, then y = x. That would invalidate the existence of two sexes. For a woman to exist or continue to exist, one would need XY for reproduction and variety, assuming the first woman born cannot be born pregnant. In order for the population to grow, there is a need for the XY chromosome pair. This is, however, quite an oversimplification given the various mutations that can occur. Let's assume X = 1 and Y = 2. In which case, through addition, X+X = 2 and X+Y = 3, and 2 is not equal to 3. Let's assume the second scenario of X = Y and Y = X. In which case X+X = 2 and X+Y (where Y can be replaced by X) = 2, but this is not possible because that would mean both are women under the laws of equality, which would refute the claim of having two sexes, a man and a woman. The alternative, in which Y is substituted for X, also does not hold, because both cannot be men. But since men and women do exist, equality cannot hold. This is precisely why a woman is called a woman, and a man is called a man: the two are dissimilar. In fact, one can substitute multiplication for addition and arrive at the same answer. 
Which would imply that Y cannot equal X, and therefore XX cannot equal XY. Invariably, in society we still find women expecting men to pay for them, and even to bend the knee for a proposal. Rarely do we see a woman bending the knee to propose to a man. Women often take pride in the fact that they can multitask, while at the same time implying that men cannot. By making such sweeping statements, perhaps they don't realize that they undermine any justifiable evidence for equality. Also, stating that biological males should not be competing against biological females automatically dismisses any argument in support of equality. Separate public bathrooms for men and women also disprove the notion of equality. Laws in most societies are there to protect women in terms of abortions, custody battles, alimony, and child support. When it comes to abortions, women talk about "my body, my choice" and ignore the rights of the father. However, when the child is born, suddenly the rights of the father become apparent to the woman when she demands child support. Such expectations clearly do not define equality. However, what they do display is hypocritical double standards. Hence, men and women may have different needs but similar wants, which is what society defines as equality. An obvious case can be found in their differing needs in a typical supermarket, where the women's personal care aisle is separate from the men's personal care aisle. If they were equal, they would share essentially the same needs in personal care products. That would imply that, emotions aside, logically men and women are two distinct types of humans that cannot be equal to each other. In fact, one can go further and make the assumption, under the laws of equality, that both men and women are essentially unique individuals with uniquely defined attributes that sum to their identities. 
No two men are ever alike. Just like no two women are ever alike. We are all unique in our own way. In fact, even identical twins eventually develop variants. In multi-faceted societies, gender roles are often affected by environmental factors that go beyond economic, social, and cultural divides and that influence the makeup of individual personalities and identities.

1 November 2020

Linear Annotations

In theory, annotations can be implemented in multiple different ways. However, in practice they almost always tend to be linear. Language is hierarchical and therefore not very linear. Applying non-linear steps programmatically would mean multiple dependency points, lots of side effects, an increase in complexity, and a requirement for state. In fact, it also means the annotations become dynamic rather than static. In big data terms, this is not only a huge computational cost; it also means many raw sources will fail the annotation process and bring the entire pipeline to a sudden grinding halt as the data sources and annotations grow. In fact, with just two to five annotations and thousands of data sources, the computational requirements can be exhausting and time-consuming. In practice, the number of annotations can vary anywhere from one to thousands. In media and publishing, annotation sets can be so large that they are given their own numbering system against a set standard. There are fundamentally a few abstractions in a callable annotation process, which may include: model, score, label, annotation, and metrics. Invariably, there is also a frontend component that talks to the backend model components, as well as an evaluation and adjudication process to validate and mediate corpus production. A separate method may be applied to quality-check the human annotations, for example via active learning and predetermined evaluation metrics. The model can be linear or non-linear. However, the annotation process itself in a pipeline is almost always linear in nature. An example of basic, linearly defined and aggregated parsing steps might be an Entity Recognizer. One can utilize annotation tree structures on the frontend, but this creates complexity issues in the backend model process from an input/output perspective. 
In fact, some model steps may follow a serially defined dependency where one model process step leads into another. In the cloud, data tends to follow an immutability constraint both for processing and for storage. In industry, as a result of the huge cost of exponential complexity, not to mention the increase in errors when processing a large number of data sources, non-linear structured annotations have not received widespread adoption outside of the scientific research community. It also seems reasonable to point out that people who suggest non-linear annotations to businesses as an alternative are likely too inexperienced to understand the practical complexities that come with such architectural issues in production, unless they can point to at least one successfully deployed production pipeline in industry that used non-linear annotations, with the associated metrics to back up their claims. In fairness, it may just be confusion on their part, for something that they regard as non-linear may just be an aggregation of linear annotation steps.
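
A linear annotation pipeline of the kind described above might be sketched as follows. This is only a minimal illustration under assumptions of its own: the `Annotation` and `Document` classes, the two toy steps, and the fixed scores are all hypothetical, and a regex tokenizer plus a capitalization heuristic stand in for real models. Each step is a pure function over an immutable document, and the second step depends serially on the output of the first.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    label: str
    score: float
    model: str  # name of the (hypothetical) model that produced it


@dataclass(frozen=True)
class Document:
    text: str
    annotations: Tuple[Annotation, ...] = ()


# A step is a pure function: it reads a document and returns new annotations.
Step = Callable[[Document], Tuple[Annotation, ...]]


def token_step(doc: Document) -> Tuple[Annotation, ...]:
    return tuple(
        Annotation(m.start(), m.end(), "TOKEN", 1.0, "regex-tokenizer")
        for m in re.finditer(r"\w+", doc.text)
    )


def entity_step(doc: Document) -> Tuple[Annotation, ...]:
    # Serial dependency: consumes the TOKEN annotations of the previous step.
    return tuple(
        Annotation(a.start, a.end, "ENTITY", 0.6, "capitalized-heuristic")
        for a in doc.annotations
        if a.label == "TOKEN" and doc.text[a.start].isupper()
    )


def run_pipeline(doc: Document, steps: Tuple[Step, ...]) -> Document:
    for step in steps:
        # Never mutate: build a new document with the annotations appended.
        doc = Document(doc.text, doc.annotations + step(doc))
    return doc


result = run_pipeline(Document("Alice met Bob in paris"), (token_step, entity_step))
entities = [result.text[a.start:a.end] for a in result.annotations if a.label == "ENTITY"]
```

Because every step only appends to a flat tuple, the input/output contract of each stage stays the same however many steps are chained, which is the property that tree-structured annotations give up.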