Suppose a startup-stage company just getting started on its AI/ML journey has budget for 3 FTEs. The company already has traditional ETL/BI expertise and a DWH. Who would you hire (data scientist, ML engineer, data engineer), and how would you divide the responsibilities?
1. Depending on your application, the end users may come from different target markets/backgrounds. If that's the case for your app, list the top X markets and create a specific landing page that "speaks" the language of each target market.
2. Cold outreach: find your ideal target customers on LinkedIn/Twitter, Google them, message/email them on social media (lead-finding tools help), and ask for help. Be willing to offer to pay them for 10-15 minutes of their time. At least a few will help without asking for money.
3. Assuming what you are selling is described on a landing page (it doesn't have to be), you can run a user test by asking consumers questions with survey tools. The goal is to find out whether users understand what you are trying to sell (clarity of message, trustworthiness). You can use tools like SurveyMonkey, Google Surveys, or even Facebook ads.
Here are a couple of examples from a purpose-built feedback tool called ninjafeedback:
@legg0myegg0 thanks for bringing up Dremio. Can Dremio connect to different data sources, e.g. RDBMSs, Kafka, and data-lake file formats like ORC? Or is it limited to certain data stores? And is the primary use case a unified query engine (so that the code remains consistent across DB engines), or is it query acceleration?
@gwittel, appreciate you sharing your insights. Could you elaborate on "RDBMS will have natural limitations"? Can you provide a specific example?
Presto gets most of its speed from parallelizing work and taking advantage of columnar formats when it can.
In the case of an RDBMS, can you get performance gains by parallelizing a query across many clients? It depends on the DB adapter and the query. In the general case, slicing a query into N shards won't necessarily make it faster: it's still the same database underneath, bound by the same hardware performance limits.
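To make the "slice a query into N shards" idea concrete, here is a minimal sketch that turns one range scan into N WHERE predicates a client could issue in parallel. The helper name, table, and column are hypothetical; the point is that all N shards still land on the same database and the same hardware.

```python
def shard_predicates(column, lo, hi, n):
    """Split the half-open id range [lo, hi) into up to n WHERE clauses."""
    step = (hi - lo + n - 1) // n  # ceiling division so the shards cover [lo, hi)
    preds = []
    for i in range(n):
        start = lo + i * step
        end = min(start + step, hi)
        if start >= hi:
            break
        preds.append(f"{column} >= {start} AND {column} < {end}")
    return preds

# Hypothetical example: four shards over ids [0, 100)
for pred in shard_predicates("id", 0, 100, 4):
    print(f"SELECT * FROM orders WHERE {pred}")
```

Each printed query could run from a separate client, but since every shard hits the same disks and CPUs, N shards rarely translates to an N-times speedup on a single RDBMS.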
Yeah this is a common misconception. Trino and Presto were aimed to replace and speed up the Hive engine.
As you say gwittel, adding Trino in front of an RDBMS won't by itself speed things up. However, if you have operational data sitting in that RDBMS and other data sitting in a data lake somewhere like S3, then you can quickly join those datasets together.
Trino does its best to take advantage of any existing indexes in the RDBMS by pushing predicates down, but it won't return that data any faster than the underlying database could. It's the ability to join with datasets from other data sources that makes the RDBMS connector worthwhile.
If you have a 1GB customer dataset in MySQL and a 100TB dataset of all your orders on S3, Trino will first run a quick query against your MySQL database, get the list of customer ids that match the query, and then use that list to filter the orders.
SELECT *
FROM mysql.db_name.customer AS c
JOIN s3.db_name.orders AS o ON c.id = o.customer_id
WHERE c.credit_card_num = 123456789;
• There are many databases and tools but where is the data platform?
• There are only three roles in data
• Data visualization color guide
• The Future of the (Modern) Data Stack
• Indexes in Postgres