Brief recap of the Open Data Science Conference - San Francisco, October 2019

On AI ROI: the questions you need to be asking – Kerstin Frailey, Metis

Success is unpredictable in AI – feasibility is often unknown before a project has begun. Projects are esoteric – they require highly specialized training. Application is new – methods to track ROI haven’t been adjusted for AI; managing AI is a challenge.

Performance is volatile – and there’s an iterative lifecycle. Feedback loops and responses to AI intervention speed up the expiration of data and its dependent models.

Targets are fuzzy – executives don’t have experience with AI projects, so they can’t set clear expectations. Data scientists inevitably miss the intangible goals.

Data science teams strive to achieve good data science, which doesn’t always translate to achieving business goals.

Data scientists are not informed of the business strategy.

AI ROI is urgent – companies continue to double down on AI investments even without seeing a return on them. Eventually, investments will dry up.

Planning – what type of ROI should we expect? Some projects are explorations for gaining knowledge, some make recommendations or inform decisions, and some are automation interventions, e.g. a fraud prevention project that intervenes so fraudulent transactions don’t go through. Identifying fraudulent transactions isn’t the impact in itself; reducing the costs due to fraud is the desired impact. Sanity check – is this a pure automation play? Could descriptive statistics do the job without AI? Does it just seem fun but promise no clear impact? Is it more cost-effective than pre-existing or vendor-based solutions? Ensure that the project is strategically positioned, and check whether the system can be leveraged elsewhere.

Development – what kinds of errors do we prefer? When we compare models and present them to the business, we have to show we’ve thought about this. What kind of volatility can we handle? Volatility in performance is expected; ask your stakeholders what they can accept. Benchmark – simulate, and use ghost mode (running the model alongside the current process without acting on its output) and a control group to see how performance compares. Sanity check – is the solution user friendly? Can the solution be scaled to address the entirety of the problem? Are we building feedback loops? Will the data become stale quickly? Will we need to update this model often?
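The question “what kinds of errors do we prefer” becomes concrete once each error type is given a price. The sketch below is an illustration, not something presented in the talk: the two fraud models, their confusion counts and the per-error costs (COST_FP, COST_FN) are all made-up assumptions. It compares a typical data science metric (accuracy) with a business-facing one (total cost of mistakes).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for one model on a shared validation set."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int


def accuracy(c: ConfusionCounts) -> float:
    """A typical data science metric: share of correct predictions."""
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    return (c.true_pos + c.true_neg) / total


def error_cost(c: ConfusionCounts, cost_fp: float, cost_fn: float) -> float:
    """A business-facing metric: total cost of the model's mistakes."""
    return c.false_pos * cost_fp + c.false_neg * cost_fn


# Hypothetical results for two fraud models on the same 10,000 transactions
# (100 of which are actually fraudulent).
model_a = ConfusionCounts(true_pos=80, false_pos=400, false_neg=20, true_neg=9500)
model_b = ConfusionCounts(true_pos=60, false_pos=100, false_neg=40, true_neg=9800)

# Assumed per-error costs: a blocked legitimate transaction vs. a missed fraud.
COST_FP, COST_FN = 5.0, 200.0

for name, m in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: accuracy={accuracy(m):.3f}, "
          f"error cost=${error_cost(m, COST_FP, COST_FN):,.0f}")
```

With these invented numbers, model B is more accurate (0.986 vs. 0.958) but its errors cost more ($8,500 vs. $6,000), because it misses more fraud. The point is to agree on the error costs with stakeholders before comparing models.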

Deployment – is it performing close to expectations? Are business metrics moving as expected in response to the data science metrics? Is variability close to expectations?
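One way to make “is variability close to expectations?” checkable is to agree on an expected level and a maximum spread for a business metric up front, then test recent values against both. This is a minimal sketch under that assumption; the function name, thresholds and weekly loss figures are hypothetical.

```python
import statistics


def within_expectations(observed, expected_mean, tolerance, max_stdev):
    """Return True if a metric's level and spread stay inside agreed bounds.

    observed: recent values of a business metric (e.g. weekly fraud losses).
    expected_mean, tolerance: the level stakeholders expected, +/- slack.
    max_stdev: the largest period-to-period variability they said they accept.
    """
    level_ok = abs(statistics.mean(observed) - expected_mean) <= tolerance
    spread_ok = statistics.stdev(observed) <= max_stdev
    return level_ok and spread_ok


# Hypothetical weekly fraud losses in $k after deployment.
steady = [41, 39, 44, 40, 42]
erratic = [30, 55, 38, 60, 35]

print(within_expectations(steady, expected_mean=40, tolerance=2, max_stdev=3))   # True
print(within_expectations(erratic, expected_mean=40, tolerance=2, max_stdev=3))  # False
```

Checking level and spread separately mirrors the two questions above: a model can hit the expected average while swinging more than the business can tolerate.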

Governance – sanity check: are we monitoring feedback loops, are we iterating to the point of over-optimizing, are we duct-taping updates and iterations?

Peter Welinder, OpenAI – Learning Robot Dexterity

In order to interact with the world, we have to make contact with it. OpenAI wants to combine learning and manipulation to allow robots to carry out useful tasks. They decided to test this with a Rubik’s Cube – they can do this in a fraction of a second

Past – high robotics expertise

Future – all you need is learning

  1. Deep reinforcement learning – learning by trial and error, like teaching a dog. The drawback is that it takes a long time to get results
  2. Simulation to reality
  3. Cool results
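To make “learning by trial and error” concrete, here is a deliberately tiny example: an epsilon-greedy multi-armed bandit, the simplest form of reinforcement learning (no state, no neural network). It is not OpenAI’s method, which used deep reinforcement learning trained in simulation; the success probabilities, step count and epsilon are arbitrary assumptions.

```python
import random


def epsilon_greedy(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off best.

    success_probs: hidden probability that each action earns a reward of 1.
    Returns the agent's reward estimates and how often it tried each action.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                           # explore: try something random
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit: best guess so far
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
print([round(e, 2) for e in estimates], counts)
```

Even this toy agent needs thousands of tries to settle on the best action, which hints at why trial-and-error learning on a real robot hand is slow and why training in simulation first is attractive.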

Harry Glaser, Sisense – Sources of bias: strategies for tackling inherent bias in AI

AI judge developed by UCL computer scientists

AI could identify gang crimes and ignite an ethical firestorm

False positives are a societal issue here: a machine treats a specific geography as “gang related”, which results in severe punishments.

Another example is using facial recognition during TSA security checks. This was an issue for Asian travelers. You have to build a team that represents the wider world you plan to apply your model to; the broader and more diverse the team you can get, the better the outcomes of the models.

AI unchallenged runs a strong risk of delivering immoral outcomes. It is your job and responsibility as a data professional to use your skills to be the moral compass of your organization and make it right.

Turning data teams into superheroes – Sisense – think abouthuman outcomes

Google employees quit over controversial Pentagon work: targeting ads vs. targeting drone strikes. This got to the right outcome because the engineers who developed the system took ownership and knew the difference between the outcomes.

How can you incorporate data ethically?

Holistic metrics – grade yourself on metrics you think match well with societal outcomes

Representative & diverse teams – key to building models with a positive outcome on society

Check your sources – get diverse sources and consider the bias in your data

Who do you report to?

Data professionals reporting directly to sales lean towards the biases of sales objectives. Centralized data teams are more likely to remove bias.

You need a Chief Data Officer (CDO) who thinks holistically about the ethical use of data; the CDO is the conscience of the organization.

You are the conscience of AI – this is your responsibility

Building AI Products: delivery vs. discovery

Companies face challenges getting data science to work for them:

Information technology – integrate, deploy, and manage finished products in production

Software engineering – design and code new products using best practices

Data engineering – build data pipelines that collect, organize, and validate data

Data science – discover the unknown patterns in data and the algorithms that add business value

Data science is different – it is cross-functional (engineering, product, marketing, finance) and must work autonomously, separate from the traditional engineering product lifecycle, self-organizing and self-managing. It’s also experimental: form a hypothesis, analyze data, make predictions, run backtests, run A/B tests. It’s also self-sustaining – not a cost center, it generates revenue.
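As an example of the experimental loop above, an A/B test on a conversion rate can be read with a two-proportion z-test. The talk did not prescribe a particular test; this is a common textbook choice, shown as a sketch, and the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate if there is no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail probability
    return z, p_value


# Hypothetical experiment: 10,000 users per arm, 2.0% vs. 2.6% conversion.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z={z:.2f}, p={p:.4f}")
```

Here z comes out at roughly 2.8 with p well below 0.05, so the lift in B is unlikely to be noise. Because the test is so cheap to run, it also fits the “make tests, not plans” approach described below.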

The problem is that companies hoard data; data stores are a cost sink.

Rapid prototyping is key for data science: back-of-the-envelope calculations and simple experiments. Don’t make plans – make tests. Repeat until it works.

Kirk Borne – Adapting Machine Learning Algorithms to Novel Use Cases

It’s not about telling the business about the coolness of your algorithm – it’s about connecting it to their needs and using storytelling to do this

Innovations are inspired by data, informed by data, enabled by data, and create value from data

Confucius says, “study your past to know your future” – that’s machine learning

Travel sites raise prices for Mac users because they assume those users make more money and would be willing to pay more

The most important thing in your data is metadata

Hilary Mason – Getting specific about algorithm bias

Facial recognition products fail to do a good job with darker shades of skin: 99.7% accuracy for white men vs. 65.3% for darker-skinned women.
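Gaps like 99.7% vs. 65.3% only become visible when accuracy is broken down by group; a single headline number can hide them when one group dominates the evaluation data. A minimal sketch of disaggregated evaluation follows. The records are toy data invented for illustration, and the group labels "x" and "y" are placeholders.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples.

    Returns overall accuracy plus accuracy per group, so gaps are visible.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Toy, made-up predictions: group "x" dominates the data set.
records = [("x", 1, 1)] * 18 + [("y", 1, 1), ("y", 0, 1)]
print(accuracy_by_group(records))  # (0.95, {'x': 1.0, 'y': 0.5})
```

The overall 95% looks fine, yet the under-represented group gets only 50%. This is one way bias can pass unnoticed when an algorithm is evaluated, and then deployed, at scale.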

Sources of bias enter at different stages; machine learning can amplify bias

People are more likely to assume algorithms are objective or error free, even if they’re given the option of a human override

Algorithms are more likely to be implemented with no appeals process in place

Algorithms are often used at scale

The privileged are processed by people; the poor are processed by algorithms (Cathy O’Neil)