CSVW Tabular Data
CSVW Namespace Vocabulary Terms
Generating RDF from Tabular Data on the Web
Embedding Tabular Data in HTML
CSV on the Web: Use Cases and Requirements
Generating JSON from Tabular Data on the Web
Metadata Vocabulary for Tabular Data
Model for Tabular Data and Metadata on the Web
CSVLint
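The CSVW specifications above center on a JSON metadata document that describes a CSV table. A minimal sketch of such a document (the file name and columns are illustrative, not from any real dataset):

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "tableSchema": {
    "columns": [
      {"name": "country", "titles": "Country", "datatype": "string"},
      {"name": "population", "titles": "Population", "datatype": "integer"}
    ],
    "primaryKey": "country"
  }
}
```

Tools such as CSVLint can validate a CSV file against this kind of schema, and CSVW processors can use it when generating RDF or JSON from the table.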
31 March 2020
CSVW
Labels: big data, data science, linked data, metadata, natural language processing, semantic web, text analytics
25 March 2020
Fake Data Scientists
How to spot a fake data scientist:
- they have no clue about the processes of a data science method
- they skip the feature engineering part of the data science method
- they require data engineers to provide them cleaned data through an ETL process
- they need a whole team of technical people to support their work
- they are only interested in building models, and the models they build are almost always overfitted because they never bother to do the feature engineering work themselves
- they don't consider creating their own corpus as an important step of model build work
- they don't understand the value of features when training a model to solve a business case
- they have no clue how to scale, build, deploy, evaluate their models into production
- they think a PhD means they know everything, but in practice they can do very little
- they rarely bother to understand the business case nor ask the right questions
- they don't know how to augment the data to create their own corpus for training
- they don't know how to apply feature selection
- they don't know how to generalize a model, so they sit there re-tuning their overfitted models
- they spend years sitting in organizations building overfitted models when they could have built generalizable models in weeks or months
- they don't understand the value of metadata or the value of knowledge graphs for feature engineering
- they raise ridiculously dumb issues during agile standups, like having built a model that lacks certain features (i.e. they skip the feature engineering step)
- they build a model straight out of a research paper and assume the exploratory step is the entire data science method
- they use classification approaches when they should be using clustering methods
- they are unwilling to learn new ways of doing things and unwilling to adapt to change
- they prefer to use notebooks rather than build a full structured implementation of their models that can be deployed to production
- they build models that contain no formal evaluation or testing metrics
- they only partially solve a business case because they skipped the feature engineering or passed that effort to a data engineer
- they are only interested in quantitative methods and unwilling to think outside the box of what they were taught in academia
- they build academic models that are not fit for purpose for production nor do they add business value
- they require a lot of handholding and mentoring to be taught basic coding skills
- they struggle to understand research papers, and fail to recognize that 80% of such research work is useless and of no inherent value
- they assume something is state of the art simply because a research paper says so, rather than judging the model's appropriateness for solving a business case
- they don't bother to visualize the data as part of the exploration stage
- they don't bother to do background research to identify use cases where a certain approach has worked or not worked for a business
- they don't give appropriate consideration to reusing existing work
- they have no understanding of how to clean data
- they try every model type until something sticks
- they don't have clarity on how the different model types work
- they don't fully understand the appropriate context of when to apply a model type
- they only know very few model methods and how to approach them for a narrow set of business cases
- they don't understand bias and variance
- they don't know whether they want accuracy or interpretability nor how to pick
- they don't know what a baseline is
- they use the wrong sets of metrics
- they incorrectly apply the train, validation, test split
- they go to the other extreme of focusing on optimization before actually solving the problem
- they have a PhD and the arrogance to match, but literally no practical experience of productively applying any of it in the workplace, especially against noisy, unstructured data
- they come with fancy PhDs and spend time teaching others how to do their jobs, but usually require the help of everyone on the team to do their own work
- they come with a PhD in a specific area but have no willingness to understand other scientific disciplines in the application of data, or tend to dismiss such methods outright
- they think AI is just machine learning
- they want someone to hand them a clean dataset on a silver platter because they can't be bothered to do it themselves, nor do they think it is an important aspect of their work
- they can't seem to think beyond statistics to solve a problem
- they have a tendency to look down on people, dismissing anyone who doesn't hold a PhD
- they struggle to understand basic concepts in computer science
- they need a separate resource to help them refactor their code and won't be bothered to do it themselves
- they find that services like DataRobot help their work by automating machine learning, especially feature engineering, which inherently lets them build overfitted models much faster
- they can't tell the difference between structured and unstructured data
- they don't have a clue how to deal with noisy data
- they are not very resourceful in hunting for datasets as part of a curation step
- they need to be shown how to google for things, and basically need someone constantly showing them how to do things to be practical in the workplace
- they prefer to use GUI interfaces that allow them to simply use buttons and drag/drop to build models rather than hand build it themselves
- they claim to have been a data scientist for the last 20 years, when the role only went mainstream in industry in the last 4 or 5 years (job adverts on recruitment boards and within organizations are evidence of when the designated role first appeared)
- they want to apply machine learning to everything, even where it may be overkill
- they hold a PhD but are more than happy to plagiarize other people's work and try to take credit for it; in many cases their contribution is probably just exposing it as an API
- they hold a PhD but try to take credit for the entire work, even when someone else or an entire team has probably done 80% of it
- they use personal pronouns like 'I' in most cases, but rarely 'we' when working in a team
- they only care about their inputs, outputs, and dependencies for building a model rather than being flexible, considerate, and thinking as a team in looking at the bigger picture
- if your 'head of data science' uses phrases like 'I don't understand' to the point of annoyance, it is a likely indication that they lack the technical competence for that capacity
- they think decision trees are just a bunch of rules and not a type of machine learning technique
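Several of the points above (baselines, overfitting, the train/validation/test split) are concrete enough to illustrate. A minimal stdlib-only Python sketch of a majority-class baseline over a proper three-way split; the data and split sizes are synthetic, purely for illustration:

```python
import random

random.seed(0)

# Synthetic labelled dataset: 100 examples, roughly 70% of labels are 1.
data = [(i, 1 if random.random() < 0.7 else 0) for i in range(100)]

# Proper three-way split: fit on train, tune on validation,
# and report final metrics on the held-out test set only once.
random.shuffle(data)
train, validation, test = data[:60], data[60:80], data[80:]

# Majority-class baseline: always predict the most common training
# label. Any real model must beat this to add value.
labels = [y for _, y in train]
majority = max(set(labels), key=labels.count)

def baseline_accuracy(split):
    """Accuracy of always predicting the majority training label."""
    return sum(1 for _, y in split if y == majority) / len(split)

print(f"majority class: {majority}")
print(f"baseline test accuracy: {baseline_accuracy(test):.2f}")
```

The point of the baseline is exactly the one the list makes: a model whose test accuracy does not clearly exceed the majority-class rate has learned nothing useful, however elaborate it is.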
24 March 2020
JATS-XML
Labels: big data, data science, linked data, metadata, natural language processing, publishing, semantic web, text analytics
20 March 2020
Metadata Validators
Google Rich Results Testing Tool
Google Structured Data Testing Tool
Yandex Structured Data Testing Tool
Structured Data Linter
RDF to SVG Bookmarklet
Bing Markup Validator
Apple App Search API Validation Tool
Open Graph Debugger
RDFa Play
Microdata Parser
Google Data Feed Validation Tool
JSON-LD Playground
Schema-DTS
Email Markup Tester
OpenLink Structured Data Sniffer
Microdata.reveal
Microdata/JSON-LD sniffer
Semantic Inspector
META SEO Inspector
Green Turtle RDFa
RDF Translator
Convert RDFa to JSON-LD
Convert Wikipedia URL to DBPedia URL
Wikidata Lookup by Name
Sindice Web Data Inspector
Corporate Contacts Markup Tester
Event Markup Tester
Labels: big data, data science, linked data, metadata, natural language processing, semantic web
9 March 2020
Informal To Formal Ontology Terminology
- building a model of some subject matter -> building an ontology
- things -> individuals
- kinds of things -> classes
- generalizations/specializations -> subClassOf
- some kind of thing -> instance or member of class
- literal is of a certain kind -> has a datatype
- relationships between things -> object properties
- attributes of things -> data properties
- kinds of literals -> datatypes
- saying something -> asserting a triple
- drawing conclusions -> inference
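The mapping above can be made concrete with a tiny stdlib-only Python sketch: "saying something" is asserting a (subject, predicate, object) tuple, and "drawing conclusions" is a simple inference rule that propagates instanceOf through subClassOf. All names here are illustrative, not from any real ontology:

```python
# Assert triples as (subject, predicate, object) tuples.
triples = {
    ("Dog", "subClassOf", "Mammal"),   # generalization/specialization
    ("Mammal", "subClassOf", "Animal"),
    ("fido", "instanceOf", "Dog"),     # individual of a certain kind
    ("fido", "hasOwner", "alice"),     # object property (thing -> thing)
    ("fido", "hasAge", 7),             # data property (thing -> literal)
}

def infer_types(triples):
    """Inference: if x instanceOf C and C subClassOf D, then
    x instanceOf D. Repeat until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {
            (x, "instanceOf", d)
            for (x, p1, c) in inferred if p1 == "instanceOf"
            for (c2, p2, d) in inferred if p2 == "subClassOf" and c2 == c
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = infer_types(triples)
print(("fido", "instanceOf", "Animal") in closure)  # True
```

A real OWL reasoner does far more (property characteristics, restrictions, consistency checking), but the subClassOf closure above is the simplest instance of "drawing conclusions" from asserted triples.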
Labels: big data, data science, linked data, natural language processing, ontology, semantic web, text analytics
8 March 2020
Bartoc
Labels: big data, data science, linked data, natural language processing, ontology, semantic web, text analytics
Industry-Specific Taxonomies and Ontologies
General Domains:
- Universal Decimal Classification Summary
- School Online Thesaurus
- Getty Art & Architecture Thesaurus
- UNESCO Thesaurus
- GeoNames
- Springer Nature SciGraph
- UKAT Archival Thesaurus
Financial:
Energy & Environment:
Ecommerce & SEO:
Pharma & Healthcare:
Business:
Science & Technology:
- ACM Computing Classification
- Information Technology Glossary
- ITIL
- UDC
- Springer Nature SciGraph
- GESIS
- BioPortal
Government Organizations:
Labels: big data, data science, linked data, natural language processing, ontology, semantic web, text analytics
Chebi
Labels: big data, data science, natural language processing, ontology, semantic web, text analytics
Six Types of OWL Assertions
- Individual things
- Kinds of things
- Individual of a certain kind
- More specific things and more general kinds of things
- Relationships that connect things to other things
- Relationships that connect things to literals
Labels: big data, data science, linked data, natural language processing, ontology, semantic web, text analytics
Strigi
Labels: big data, data science, linked data, natural language processing, semantic web, text analytics
6 March 2020
Core Representations in OWL
- Individual Things
- Kinds of Things
- Kinds of Relationships
Labels: artificial intelligence, big data, data science, linked data, natural language processing, ontology, semantic web, text analytics
5 March 2020
Ontology Preliminary Development Activities
- Select a subset of a domain, use cases, and a term list
- Identify concepts and preliminary relationships between them
- Research existing ontologies/vocabularies to determine reusability
- Identify and extend concepts in existing ontologies/vocabularies
- Connect concepts through relationships and constraints, derived from term list and competency questions
- Conduct basic tests to make sure what is produced is consistent
Labels: big data, data science, linked data, natural language processing, ontology, semantic web, text analytics
2 March 2020
Infura
Labels: big data, data science, distributed systems, finance, intelligent web, security, semantic web, webservices