18 May 2022

Knowledge Graph Libraries

  • PyKEEN
  • LibKGE
  • OpenKE
  • GraphVite
  • PyKg2Vec
  • DGL
  • DGL-KE
  • DeepGraph
  • PBG
  • AmpliGraph
  • Graphscore
  • KarateClub
  • Scikit-KGE
  • StellarGraph
  • KGTK
  • MyDig
  • Limes
  • Dedupe
  • Karma
  • Silk

15 May 2022

Foundations of Mathematics

One of the oddest things about the machine learning and data science community is its complete disregard for logic. The fact of the matter is that without logic nothing is possible. Logic is the basis of computation, and one can derive Bayesian statistics from an axiomatization of logic. It is quite shocking that many statisticians never gain a foundational appreciation of logic. When one looks at decision theory, one is back in the world of logic. Statistics is built on three foundational elements of mathematics: logic, arithmetic, and set theory.
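
As a small illustration of the step from logic to Bayesian statistics, here is a sketch that assumes the product rule of Cox-style plausible reasoning as its starting point. The propositions A, B and C are placeholders chosen for this example and are not taken from any particular book.

```latex
% Product rule, written for both orderings of the conjunction:
\begin{align}
P(A \land B \mid C) &= P(A \mid C)\, P(B \mid A \land C) \\
P(B \land A \mid C) &= P(B \mid C)\, P(A \mid B \land C)
\end{align}
% In propositional logic A \land B and B \land A are equivalent,
% so the two left-hand sides are equal. Equating the right-hand
% sides and dividing by P(B \mid C) > 0 gives Bayes' theorem:
\begin{equation}
P(A \mid B \land C) = \frac{P(A \mid C)\, P(B \mid A \land C)}{P(B \mid C)}
\end{equation}
```

The only logical fact used is the commutativity of conjunction; everything else is algebra on the product rule.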

The following books highlight the logic embedded in probability and statistics.

Commonsense Reasoning Timeline

  • Cyc (1984)
  • Open Mind Common Sense (1999)
  • OpenCyc (2004)
  • ConceptNet (2004)
  • ResearchCyc (2006)
  • NELL (2010)
  • WebChild (2014)
  • NELL (2015)
  • WebChild 2.0 (2017)
  • ConceptNet 5.5 (2017)
  • ATOMIC (2019)

WebChild

Atomic

14 May 2022

Research Directions for Self-Attention Efficiency

  • Low-Rank Kernels
  • Fixed/Factorized Random Patterns
  • Learnable Patterns
  • Memory
  • Recurrence

11 May 2022

Fundamental Methods of Prediction Speed-Ups

There are four fundamental ways in which one can speed up prediction and reduce the memory footprint of transformer models:
  • Knowledge Distillation
  • Quantization
  • Pruning
  • Graph Optimization
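
Two of the techniques above, quantization and pruning, can be sketched on a toy weight vector. This is only an illustration under simplifying assumptions: the function names and the example weights are made up here, and real toolkits work on whole tensors with calibration and fine-tuning steps that are left out.

```python
# Toy versions of quantization and pruning on a plain list of floats.
# All names and values are hypothetical and chosen for illustration.

def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.82, -0.05, 0.31, -1.27, 0.002, 0.64]

quantized, scale = quantize_int8(weights)   # one small integer per weight
restored = dequantize(quantized, scale)     # close to, but not equal to, the originals
pruned = prune_by_magnitude(weights, sparsity=0.5)
```

Quantization trades a small rounding error (at most half of `scale` per weight) for storing each weight in 8 bits, while pruning keeps the large weights exact and drops the small ones entirely.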

8 May 2022

DiffBot

Diffbot is one of the most useless solutions out there for harvesting the web. Their solution is basically what Google already provides for free, and it relies on methods that multiple providers have used for the last twenty years. They do not, in fact, provide a knowledge graph. The solution is simply an indexed crawl that one can replicate with Elasticsearch, or even with Common Crawl data. What they are doing is trying to make a fool out of organizations and charging a premium for it. There are so many free alternatives out there that do a better job. Their notion of a knowledge graph is a marketing gimmick: the knowledge graph has no real semantics and provides for no meaningful inference. Even the data they extract is basically raw data, and not machine-readable. They add virtually no real metadata. Their solution does not even utilize schema.org, let alone any W3C standards. They also don't follow the web etiquette of obeying robots.txt. Diffbot utilizes a ruthless form of crawling, hiding itself as a human visitor via spoofing. In most cases, their approach is also likely to violate GDPR. There are also no real deep learning models being used for computer vision, AI, or natural language processing. This is a perfect example of an organization trying to sell something that has no real value to would-be customers.
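
For reference, obeying robots.txt is cheap to implement. Below is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the ExampleBot user agent and the example.com URLs are made up for illustration and say nothing about how any particular crawler is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download this file
# from the site it is about to crawl before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler checks every URL before fetching it and identifies
# itself honestly in the user agent string.
public_ok = allowed("ExampleBot", "https://example.com/articles/1")
private_ok = allowed("ExampleBot", "https://example.com/private/report")
```

Here the first URL is permitted and the second is not, so a well-behaved crawler would skip it.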

6 May 2022

Fake Job Ads

The growing number of fake job ads in the data science and AI sectors, ads whose job titles don't match the job description, is an indication of incompetent data scientists. If a job ad says 'AI Engineer' but doesn't require any AI skills, then why call it an AI Engineer role? Such practices need to stop. Usually, an indication of a fake job ad is a mention that you will be working alongside data scientists. Either the company does not know what it is doing, or it doesn't know that AI Engineers are more than capable of doing the data scientist's job. In fact, one will find that even job ads for machine learning engineers are simply about supporting some clueless data scientist whose job should include being able to pipeline and scale out their own work. Don't hire incompetent data scientists and then waste your time creating new roles to fill all their skills gaps, roles that are themselves fake because the job titles don't match the job descriptions. Such practices only lead to recruitment fraud, big teams where roles overlap significantly, and unnecessary cost to organizations.

Similar indications of clueless hiring processes, where the role could simply be called a software engineer role:
 
  • Computer Vision Engineer - where they don't need computer vision skills because clueless data scientists are building their models in Jupyter notebooks
  • NLP Engineer - where they don't need NLP skills because clueless data scientists are building their models in Jupyter notebooks
  • ML Engineer - where they don't need ML skills because clueless data scientists are building their models in Jupyter notebooks
  • Ontologists - who need KG Engineers to help them build a pipeline, to integrate ontologies, to build the codified ontologies, and to do everything else that makes you wonder why you even bothered to hire an ontologist in the first place
  • Researchers - who need software engineers to help them scale out their hacked-together artefacts, and who make you wonder why you bothered to hire a researcher with a PhD who hasn't even got the basic skills
  • Data Engineers - who are needed to help with 80% of the data scientist's job, like feature engineering, cleaning data, ETL, and pipelining, which makes you wonder why you bothered to hire a data scientist

Essentially, any data scientist who is only interested in building models is a clueless data scientist:

  • they are only capable of doing 20% of their job
  • the models they build are invariably overfitted to the data because someone else is doing the feature engineering for them
  • they don't understand that they are responsible for the entire data science method, including pipelining and scaling out their work into production
  • you need to keep hiring and creating new roles to fill all their skills gaps, like data engineer, MLOps engineer, and so on
  • this is especially true if they are building a deep learning model
  • and, if you are having to fill all those roles, not only are you stuck creating fake job ads but you also end up with a lot of skills overlap in teams

Why not then simply either remove the role of the data scientist/researcher, or hire more capable people and save on the cost of hiring for incompetence? An academic background does not replace practical skills. Such practices will only get worse in organizations that seek advice from PhD holders who have academic backgrounds but limited practical skills. In most organizations, practical skills should trump academic backgrounds when costs and revenues matter for a business to meet its profitability targets.

3 May 2022

Weird Interviews

The worst interview questions and situations are ones where the interviewer is antagonistic and makes assumptions about the candidate rather than trying to explore the candidate's background and experience. They also tend to be situationally awkward and make the candidate feel uncomfortable in some way. A lot of the questions and situations below are about discrediting the candidate, likely because the interviewers feel threatened by the candidate, don't like the candidate, are clueless themselves, generally employ clueless people, or prefer someone else. Sometimes, it can also be linked to institutional racism and gender bias. An interviewer who keeps asking the same questions over and over again shows poor listening skills or bad organizational skills, could not be bothered to take notes, or is simply uninterested in the candidate. Interviewers who are disorganized and guided by assumptions are usually a red flag about either the team or the organization. Generally, such interactions signal a lot of negativity around the role and unprofessional practices, and such roles are ones that a candidate should not bother to pursue.

  • You didn't do this yourself, did you?
  • Someone else did this for you, right?
  • This wasn't done by you, right?
  • So, what part of the work did you do?
  • Most of this work was done for you, right?
  • Asking the candidate the same question about their past roles multiple times, via email and at every interview stage?
  • Any other questions where they make more assumptions about the candidate?
  • Reading a bunch of questions from a crib sheet, some of which the interviewer does not even know the answer to, or funnily enough has the wrong answers to?
  • Giving puzzles for the candidate to solve on the board that even the interviewer cannot solve themselves?
  • Interviewer arriving 1 to 2 hours late to an interview and expecting the candidate to be ok with it?
  • Interviewer having an issue with the way you greet them?
  • Interviewer refusing to shake your hand after the interview?
  • Interviewer commenting on your racial background or sharing their opinion of people of specific backgrounds?
  • Interviewer openly sharing their gender biases during the interview?
  • Interviewer that can't stop playing politics even while they are in the interview?
  • Interviewer that asks the candidate to make tea or coffee for them while they are sat there with a grin on their face?
  • Interviewer that doesn't want the candidate to have that job because they applied to it as well?
  • Interviewer that actively practices nepotism and cronyism for job placements?
  • Interviewer with poor sense of hygiene?
  • Interviewer making sexual remarks and advances directed towards the candidate?
  • Interviewer being overly touchy towards the candidate?
  • Interviewer dressing inappropriately to the interview?
  • Interviewer constantly having their pet cat or dog interrupt the interview?
  • Interviewer having a snack or lunch during the interview?
  • Interviewer that, rather than bothering to interview you, spends their time checking emails or doing work on their laptop?
  • Interviewer that, rather than interviewing you, spends the time talking about their spouse or their divorce?
  • Interviewer that dismisses the candidate's technical replies as inaccurate while being incorrect and inaccurate themselves?