Mabble Rabble: March 2020

31 March 2020

CSVW

CSVW Tabular Data
CSVW Namespace Vocabulary Terms
Generating RDF from Tabular Data on Web
Embedding Tabular Data in HTML
CSV on Web Use Cases
Generating JSON from Tabular Data on Web
Metadata Vocabulary for Tabular Data
Model for Tabular Data and Metadata on Web
CSVLint

30 March 2020

Representation Learning on Networks

Snap
Comprehensive Survey
Graph Neural Networks

26 March 2020

Gist Upper Ontology

Gist

25 March 2020

Fake Data Scientists

How to spot the fake data scientist?

they have no clue about the processes of a data science method
they skip the feature engineering part of the data science method
they require data engineers to provide them cleaned data through an ETL process
they need a whole team of technical people to support their work
they are only interested in building models and the models they build inherently are almost always overfitted as they never bother to do the feature engineering work themselves
they don't consider creating their own corpus as an important step of model build work
they don't understand the value of features when training a model to solve a business case
they have no clue how to scale, build, deploy, evaluate their models into production
they think with a phd they know everything but practically they are zero
they rarely bother to understand the business case nor ask the right questions
they don't know how to augment the data to create their own corpus for training
they don't know how to apply feature selection
they don't know how to generalize a model so they are sat there re-tunning their overfitted models
they spend years and years sitting in organizations building overfitted models when they could have built generalizable models in weeks and months
they don't understand the value of metadata or the value of knowledge graphs for feature engineering
they raise ridiculously dumb issues during agile standups like they have built a model but it doesn't have certain features (i.e they skip the feature engineering step)
they build a model straight out of a research paper and assume the exploratory step is the entire data science method
they use classification approaches when they should be using clustering methods
they are unwilling to learn new ways of doing things nor are willing to adapt to change
they prefer to use notebooks rather than build a full structured implementation of their models that can be deployed to production
they build models that contain no formal evaluation or testing metrics
they only partially solve a business case because they skipped the feature engineering or passed that effort to a data engineer
they are only interested in quantitative methods and not willing to think outside the box of what they have been taught in academics
they build academic models that are not fit for purpose for production nor do they add business value
they require a lot of handholding and mentoring to be taught basic coding skills
they struggle to understand research papers nor the fact that 80% of such research work is useless and of no inherent value
they literally assume that something is state of the art when it is mentioned in a research paper rather than contextualize the model appropriateness to solve a business case
they don't bother to visualize the data as part of exploration stage
they don't bother to do background research to identify use cases where a certain approach has worked or not worked for a business
they don't bother to look at reuse appropriately
they have no understanding of how to clean data
they try every model type until something sticks
they don't have clarity on how the different model types work
they don't fully understand the appropriate context of when to apply a model type
they only know very few model methods and how to approach them for a narrow set of business cases
they don't understand bias and variance
they don't know whether they want accuracy or interpretability nor how to pick
they don't know what a baseline is
they use the wrong sets of metrics
they incorrectly apply the train, validation, test split
they go to the other extreme of focusing on optimization before actually solving the problem
they have a phd and the arrogance to match, but literally no practical experience of how to be productive in applying any of it in the workplace especially against noisy unstructured data
they come with fancy phds and spend time teaching others how to do their job, but usually require the help of everyone on team to do their own work
they come with a phd in a specific area but have no willingness to understand other scientific disciplines in the application of data or have a tendency to outright dismiss such methods
they think AI is just machine learning
they want someone to hand them a clean dataset on a gold platter because they can't be bothered to do it themselves nor do they think it is an important aspect of their work
they can't seem to think beyond statistics to solve a problem
they have tendency of looking down on people and dismissing any one that doesn't hold a phd
they struggle to understand basic concepts in computer science
they need a separate resource to help them refactor their code nor will they be bothered to do it themselves
they find services like datarobot helps their work in automating machine learning especially feature engineering which inherently allows them to build overfitted models much faster
they can't tell the difference between structured and unstructured data
they don't have a clue how to deal with noisy data
they not very resourceful in hunting for datasets as part of a curation step
they need to be shown how to google for things and basically someone constantly showing them how to do things to be practical in the workplace
they prefer to use GUI interfaces that allow them to simply use buttons and drag/drop to build models rather than hand build it themselves
they state that they have been a data scientist for last 20 years when the field only went mainstream in industry for last 4 or 5 years (an indication of the designated role is evidence from when it first started advertising on recruitment boards and within organizations)
they want to apply machine learning to everything, even where it may be overkill
they hold phd but are more than happy to plagiarize other people's work and try to take credit for it, in many cases their bit is probably just exposing it as an API
they hold a phd but try to take credit of the entire work, even when someone else or an entire team has probably done 80% of their work
they use personal pronouns like 'I' in most cases, but rarely do they use 'We' when working in the team
they only care about their inputs, outputs, and dependencies for building a model rather than being flexible, considerate, and thinking as a team in looking at the bigger picture
if your 'head of data science' uses terms like 'I don't understand' to the point of annoyance then it is a likely indication of their technical incompetence and ability in that capacity
they think decision trees is just a bunch of rules and not a type of machine learning technique

24 March 2020

ScholarlyHTML

JATS-XML

23 March 2020

Multi-Armed Bandits

Multi-Armed Bandits
Bandit Algorithms
Bandit Book
Survey of Practical Applications
Regret Analysis
Survey of Contextual Bandits

MQTT

20 March 2020

Metadata Validators

Google Rich Results Testing Tool
Google Structured Data Testing Tool
Yandex Structured Data Testing Tool
Structured Data Linter
RDF to SVG Bookmarklet
Bing Markup Validator
Apple App Search API Validation Tool
Open Graph Debugger
RDFa Play
Microdata Parser
Google Data Feed Validation Tool
JSON-LD Playground
Schema-DTS
Email Markup Tester
OpenLink Structured Data Sniffer
Microdata.reveal
Microdata/JSON-LD sniffer
Semantic Inspector
META SEO Inspector
Green Turtle RDFa
RDF Translator
Convert RDFa to JSON-LD
Convert Wikipedia URL to DBPedia URL
Wikidata Lookup by Name
Sindice Web Data Inspector
Corporate Contacts Markup Tester
Event Markup Tester

OpenFog

17 March 2020

Security Threat Modeling

Stride
Attack Tree
Dread
OWASP

9 March 2020

Individualism & Economic Order

Informal To Formal Ontology Terminology

building a model of some subject matter -> building an ontology
things -> individuals
kinds of things -> classes
generalizations/specializations -> subClassOf
some kind of thing -> instance or member of class
literal is of a certain kind -> has a datatype
relationships between things -> object properties
attributes of things -> data properties
kinds of literals -> datatypes
saying something -> asserting a triple
drawing conclusions -> inference

8 March 2020

Bartoc

Bartoc.org

Industry-Specific Taxonomies and Ontologies

General Domains:

Media & Publishing:

Financial:

Energy & Environment:

Ecommerce & SEO:

Pharma & Healthcare:

Business:

Science & Technology:

Government Organizations:

Chebi

Knowledge Graph Tools

Slipo
Semantify.it
XLWrap
Mapping Master
XMLToRDF
Tripliser
Silk
LDIF
R2RML

Six Types of OWL Assertions

Individual things
Kinds of things
Individual of a certain kind
More specific things and more general kinds of things
Relationships that connect things to other things
Relationships that connect things to literals

Semantic Desktop

Strigi

SIMILE Projects

Haystack Projects

6 March 2020

Core Representations in OWL

Individual Things
Kinds of Things
Kinds of Relationships

5 March 2020

Ontology Preliminary Development Activities

Select subset of a domain, use cases, and term list
Identify concepts and preliminary relationships between them
Research existing ontologies/vocabularies to determine reusability
Identify and extend concepts in existing ontologies/vocabularies
Connect concepts through relationships and constraints, derived from term list and competency questions
Conduct basic tests to make sure what is produced is consistent

31 March 2020

30 March 2020

26 March 2020

25 March 2020

24 March 2020

23 March 2020

20 March 2020

17 March 2020

9 March 2020

8 March 2020

6 March 2020

5 March 2020

2 March 2020

1 March 2020